From Tim.Arnold at sas.com Mon Jun 1 16:14:53 2009
From: Tim.Arnold at sas.com (Tim Arnold)
Date: Mon, 1 Jun 2009 10:14:53 -0400
Subject: [XML-SIG] docbook 5, lxml and rng
In-Reply-To: <4A221DFF.1080006@behnel.de>
References: <4A221DFF.1080006@behnel.de>
Message-ID:

> -----Original Message-----
> From: Stefan Behnel [mailto:stefan_ml at behnel.de]
> Sent: Sunday, May 31, 2009 2:05 AM
> To: Tim Arnold
> Cc: xml-sig at python.org
> Subject: Re: [XML-SIG] docbook 5, lxml and rng
>
> Hi,
>
> Tim Arnold wrote:
> > Hi, this is a newbie question I'm sure. I'm trying to validate an
> > example straight out of the docbook 5 documentation (example given
> > on the 'inlineequation' page). As it stands, the file doesn't pass
> > as valid.
> >
> > The code:
> > =======================================
> > from lxml import etree
> > import os
> > # RNGDIR = 'path to docbook.rng'
> > # XMLDIR = 'path to the xml file'
> > relaxng_doc = etree.parse(os.path.join(RNGDIR,'docbook.rng'))
> > relaxng = etree.RelaxNG(relaxng_doc)
> >
> > doc = etree.parse(os.path.join(XMLDIR,'myfile.xml'))
> > print relaxng.validate(doc)
>
> What does the validator tell you why it's not considered valid? Note that
> there's a property "error_log" which returns a sequence of messages that
> were collected during validation.
>
> http://codespeak.net/lxml/validation.html#relaxng
>
> Stefan
>

Thanks, I should have looked at the documentation more before posting. I see
what you're talking about now and I think I might have an explanation of
what's going on.
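For reference, a minimal sketch of reading the "error_log" property that Stefan points to, assuming lxml is installed. The tiny inline schema and document are made up for illustration and only stand in for docbook.rng and myfile.xml; the helper name validation_messages is hypothetical.

```python
from io import BytesIO

from lxml import etree

# Hypothetical stand-ins for docbook.rng and myfile.xml, kept inline so the
# sketch is self-contained.
RNG = b"""<element name="article" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="title"><text/></element>
  </oneOrMore>
</element>"""
XML = b"<article><para>not allowed here</para></article>"


def validation_messages(rng_bytes, xml_bytes):
    """Validate xml_bytes against a RelaxNG schema; return (ok, messages)."""
    relaxng = etree.RelaxNG(etree.parse(BytesIO(rng_bytes)))
    doc = etree.parse(BytesIO(xml_bytes))
    ok = relaxng.validate(doc)
    # error_log holds the entries collected during the validate() call.
    messages = ["%s:%s:%s:%s" % (e.line, e.column, e.type_name, e.message)
                for e in relaxng.error_log]
    return ok, messages


ok, messages = validation_messages(RNG, XML)
print(ok)
for line in messages:
    print(line)
```

The same two calls (validate, then iterate error_log) apply unchanged when the schema and document are parsed from files as in the original code.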
The error_log says:
---------------------
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMWRONG: Did not expect element para there
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element example, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element bridgehead, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element para has extra content: text
4:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element annotation, got para
4:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element article failed to validate content
---------------------

But my libxml2 version is 5, which I think means that schematron isn't
supported. And the docbook.rng contains some embedded schematron. From the
DocBook 5 documentation:

---------------------
If you want to validate against the DocBook 5 RelaxNG schema, then you
have to find the right validation tool. The DocBook 5 RelaxNG schema
includes embedded Schematron rules to express certain constraints on
some content models. For example, a Schematron rule is added to prevent
a sidebar element from containing another sidebar. For complete
validation, a validator needs to check both the RelaxNG content models
and the Schematron rules.
---------------------

Does that make sense?
thanks,
--Tim Arnold

From swtest123 at gmail.com Thu Jun 4 11:29:35 2009
From: swtest123 at gmail.com (testing123 test)
Date: Thu, 4 Jun 2009 14:59:35 +0530
Subject: [XML-SIG] Regarding 2 XML Files Comparision using Python
Message-ID: <8f7c146d0906040229i2d8a0a46i70a01886d119543d@mail.gmail.com>

Hi all,
I am Prasad. I need help writing a Python script to compare two XML
files. Is there a tutorial? Should we include any library? Please help
me. How do I start?

Rgds,
Prasad

From stefan_ml at behnel.de Thu Jun 4 13:35:16 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 4 Jun 2009 13:35:16 +0200 (CEST)
Subject: [XML-SIG] Regarding 2 XML Files Comparision using Python
In-Reply-To: <8f7c146d0906040229i2d8a0a46i70a01886d119543d@mail.gmail.com>
References: <8f7c146d0906040229i2d8a0a46i70a01886d119543d@mail.gmail.com>
Message-ID: <13f45ea2de40bad6ab3039730ad7442a.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

testing123 test wrote:
> Hi all,
> I am Prasad. I need help writing a Python script to compare two XML
> files. Is there a tutorial? Should we include any library? Please help
> me. How do I start?

... by looking at the Python package index?

If your XML files are small, you may get away with the xmldiff package.
Also, a very simple way to do that is to pretty print your XML files and
then run a normal line diff on them. Depends on what you want to achieve
with your 'script'.

If you need more than that and want to implement it in Python, you may
consider using lxml (or cElementTree if you can afford to ignore comments)
to parse the two files and then run through the two trees to look for
differences. But note that this is not trivial. There is some scientific
literature on good algorithms to compare XML tree structures.

Note that lxml.html comes with an HTML diff algorithm, which you can look
at for inspiration.

Stefan

From stefan_ml at behnel.de Sat Jun 6 17:29:39 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 06 Jun 2009 17:29:39 +0200
Subject: [XML-SIG] docbook 5, lxml and rng
In-Reply-To:
References: <4A221DFF.1080006@behnel.de>
Message-ID: <4A2A8B63.2040205@behnel.de>

Tim Arnold wrote:
> my libxml2 version is 5, which I think means that schematron isn't
> supported. And the docbook.rng contains some embedded schematron. From
> the DocBook 5 documentation:
>
> ---------------------
> If you want to validate against the DocBook 5 RelaxNG schema, then you
> have to find the right validation tool.
> The DocBook 5 RelaxNG schema
> includes embedded Schematron rules to express certain constraints on
> some content models. For example, a Schematron rule is added to prevent
> a sidebar element from containing another sidebar. For complete
> validation, a validator needs to check both the RelaxNG content models
> and the Schematron rules.
> ---------------------

Yes, it looks like libxml2 can't handle Schematron annotations that are
embedded in RelaxNG schemas, even if both languages are supported separately.

Stefan

From billk at sunflower.com Sat Jun 6 23:08:27 2009
From: billk at sunflower.com (Bill Kinnersley)
Date: Sat, 06 Jun 2009 16:08:27 -0500
Subject: [XML-SIG] docbook 5, lxml and rng
In-Reply-To: <4A2A8B63.2040205@behnel.de>
References: <4A221DFF.1080006@behnel.de> <4A2A8B63.2040205@behnel.de>
Message-ID: <4A2ADACB.3010000@sunflower.com>

Stefan Behnel wrote:
> Tim Arnold wrote:
>> my libxml2 version is 5, which I think means that schematron isn't
>> supported. And the docbook.rng contains some embedded schematron. From
>> the DocBook 5 documentation:
>>
>> ---------------------
>> If you want to validate against the DocBook 5 RelaxNG schema, then you
>> have to find the right validation tool. The DocBook 5 RelaxNG schema
>> includes embedded Schematron rules to express certain constraints on
>> some content models. For example, a Schematron rule is added to prevent
>> a sidebar element from containing another sidebar. For complete
>> validation, a validator needs to check both the RelaxNG content models
>> and the Schematron rules.
>> ---------------------
>
> Yes, it looks like libxml2 can't handle Schematron annotations that are
> embedded in RelaxNG schemas, even if both languages are supported separately.

Doesn't that just mean it skips over them? I don't see how the error_log
entries Tim was getting would implicate Schematron.

Anyway, the RelaxNG specification for Docbook, I believe, is still quite
experimental.
Both jing and trang choke on it, so perhaps libxml2 may be
forgiven for choking also.

From AndiDog at web.de Wed Jun 17 16:53:15 2009
From: AndiDog at web.de (Andreas Sommer)
Date: Wed, 17 Jun 2009 15:53:15 +0100
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
Message-ID: <4A39035B.8050506@web.de>

Hi,

I just wanted to ask if there's any Python XML implementation which
supports XSLT 2.0 (e.g. ). The only thing I found was
Saxon, but it's only for Java/.NET (and I don't want to use Jython).

Cheers
Andreas

From stefan_ml at behnel.de Wed Jun 17 18:02:05 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 17 Jun 2009 18:02:05 +0200
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
In-Reply-To: <4A39035B.8050506@web.de>
References: <4A39035B.8050506@web.de>
Message-ID: <4A39137D.1060808@behnel.de>

Hi,

Andreas Sommer wrote:
> I just wanted to ask if there's any Python XML implementation which
> supports XSLT 2.0 (e.g. ). The only thing I found was
> Saxon, but it's only for Java/.NET (and I don't want to use Jython).

This is a bit of a FAQ. You may want to search the list archives for some
answers.

Stefan

From csad7 at t-online.de Thu Jun 18 12:38:15 2009
From: csad7 at t-online.de (Christof Hoeke)
Date: Thu, 18 Jun 2009 12:38:15 +0200
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
Message-ID: <4A3A1917.3060301@t-online.de>

> I just wanted to ask if there's any Python XML implementation which
> supports XSLT 2.0 (e.g. ). The only thing I found was
> Saxon, but it's only for Java/.NET (and I don't want to use Jython).

I have looked for a native Python implementation for some time now but no
chance it seems.

With Jython 2.5final out Saxon is an alternative (I currently try to use
e.g. web.py with it to be able to use XSLT 2 for web site templating). You
could also try Saxon with IronPython, should work but I have not tried it
yet.

The only option to use Java/Saxon via Python would be to call Saxon as an OS
command and pipe the result back to your Python program. This does work, but
you still need Java in addition to Python; at least you can write your
program in (C)Python. Also you cannot transform any e.g. lxml tree directly,
you would have to reserialize any XML.

But if you find anything let me know!

Chris

From stefan_ml at behnel.de Thu Jun 18 14:54:02 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 18 Jun 2009 14:54:02 +0200 (CEST)
Subject: [XML-SIG] XSLT 2.0 implementation in Python?
In-Reply-To: <4A3A1917.3060301@t-online.de>
References: <4A3A1917.3060301@t-online.de>
Message-ID: <4d06f0bd68dce9b32ba723764c058c90.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

Christof Hoeke wrote:
>> I just wanted to ask if there's any Python XML implementation which
>> supports XSLT 2.0 (e.g. ). The only thing I found was
>> Saxon, but it's only for Java/.NET (and I don't want to use Jython).
>
> The only option to use Java/Saxon via Python would be to call Saxon as an
> OS command and pipe the result back to your Python program.

Sounds awfully slow, given the startup time of the average JVM, plus the
time it takes HotSpot to heat up.

There's also JPype, GCJ or JCC if running Java is an option, see e.g.

http://ubuntuforums.org/archive/index.php/t-593327.html
http://pypi.python.org/pypi/JCC/
http://jpype.sourceforge.net/

> you still need Java in addition to Python; at least you can write your
> program in (C)Python. Also you cannot transform any e.g. lxml tree
> directly, you would have to reserialize any XML.

Should I say it? Serialisation and parsing are *fast* in lxml - don't know
about Saxon in Java, though. But given that both XSLT input and output can
be streamed, the I/O performance might not be that much of a problem either
(assuming large documents). Benchmarks will tell.

You could also write an HTTP based transformation service in Jython that
calls Saxon, and just run it in a permanently running JVM.
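
A rough sketch of the "call Saxon as an OS command and pipe the result back" approach discussed above, using only the standard library. The jar path, the file names and both helper names are hypothetical. The -s:, -xsl: and -o: options are believed to match the Saxon 9 command line, so check them against the Saxon version actually installed.

```python
import subprocess


def saxon_command(jar_path, source_path, stylesheet_path, output_path=None):
    """Build the argument list for a Saxon command-line transformation."""
    cmd = ["java", "-jar", jar_path,
           "-s:" + source_path, "-xsl:" + stylesheet_path]
    if output_path is not None:
        cmd.append("-o:" + output_path)
    return cmd


def transform(jar_path, source_path, stylesheet_path):
    """Run Saxon in a child JVM and return the serialized result as bytes."""
    cmd = saxon_command(jar_path, source_path, stylesheet_path)
    # Without -o: the result goes to stdout, which is captured here.
    # Every call starts a fresh JVM, so the startup cost is paid each time.
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
```

A call such as transform("saxon9.jar", "in.xml", "style.xsl") returns the serialized output, which would then have to be parsed again (for example with lxml) to get a tree back.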

Stefan

From bigotp at acm.org Sun Jun 21 02:22:20 2009
From: bigotp at acm.org (Peter A. Bigot)
Date: Sat, 20 Jun 2009 19:22:20 -0500
Subject: [XML-SIG] Python Bindings to XML Schema system released
Message-ID: <4A3D7D3C.4040508@acm.org>

PyXB ("pixbee") is a pure Python package that generates Python source code
for classes that correspond to data structures defined by XMLSchema. The
generated classes support bi-directional conversion between XML documents
and Python instances. In concept it is similar to JAXB for Java and
CodeSynthesis XSD for C++.

Version 0.4.0, available from https://sourceforge.net/projects/pyxb, is
fairly complete, and supports the following features:

* Simple and complex type definitions
* List and union datatypes
* Constraints on (simple) datatypes (e.g., minInclusive, length)
* Model groups and attribute groups
* Complex content models (all, sequence, choice); minOccurs and maxOccurs
* Abstract types, xsi:type, substitution groups
* Nillable elements with xsi:nil
* Namespace qualified attributes and elements
* Class constants corresponding to string enumeration constraints

It successfully generates bindings for many of the major WS-I schemas, such
as WSDL and SOAP, as well as others like KML and SAML. A variety of examples
show how to use it with demonstration web services such as the National
Digital Forecast Database. The generated code can easily be customized by
subclassing the generated bindings. Both DOM and SAX-based parsing are
supported.

PyXB assumes a fairly strict interpretation of the XML Schema specification,
so web services using SOAP encodings with schemas but being lax about
namespaces and validation against content models can sometimes be difficult
to use. This may be addressed in a future release.

The documentation serves as the project's home page, and can be viewed at
http://pyxb.sourceforge.net/.

This is the initial public release, and I would appreciate any feedback.

Peter
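
To make the idea of a schema binding concrete, here is a hand-written sketch of the kind of bi-directional conversion the announcement describes. It does not use PyXB and is not what PyXB generates; the Address class, its fields and the XML layout are invented for illustration, and only the standard library is used.

```python
import xml.etree.ElementTree as ET


class Address:
    """Hand-written stand-in for a class a binding generator might emit."""

    def __init__(self, street, city):
        self.street = street
        self.city = city

    @classmethod
    def from_xml(cls, text):
        """Build an instance from an <address> document."""
        root = ET.fromstring(text)
        if root.tag != "address":
            raise ValueError("expected <address>, got <%s>" % root.tag)
        return cls(root.findtext("street"), root.findtext("city"))

    def to_xml(self):
        """Serialize the instance back to an <address> document."""
        root = ET.Element("address")
        ET.SubElement(root, "street").text = self.street
        ET.SubElement(root, "city").text = self.city
        return ET.tostring(root, encoding="unicode")


# XML -> Python instance -> modified -> XML -> Python instance
addr = Address.from_xml(
    "<address><street>1 Main St</street><city>Lawrence</city></address>")
addr.city = "Topeka"
round_trip = Address.from_xml(addr.to_xml())
```

A generator such as the one announced would derive classes like this from the schema itself, including the type and occurrence constraints that this sketch does not check.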