From KRJackson at lbl.gov Mon Aug 1 04:12:44 2005
From: KRJackson at lbl.gov (Keith Jackson)
Date: Sun, 31 Jul 2005 19:12:44 -0700
Subject: [XML-SIG] the state of WSDL implementation
Message-ID: <87F4849C-5960-43F7-8E74-01F81A13B067@lbl.gov>

Try ZSI. I just used wsdl2py to generate bindings, and everything went OK. I haven't tried the code out yet, nor am I going to have time to in the near future. I used the latest CVS version of the "serialize-dom-scheme" branch of ZSI.

If you have trouble trying this, or it doesn't work, please email the pywebservices mailing list (pywebsvcs-talk at lists.sourceforge.net).

--keith

On Jul 29, 2005, at 9:11 AM, Christoph Pingel wrote:

> There's a nice project at the Leipzig University
> http://wortschatz.uni-leipzig.de providing several web services
> (SOAP) for linguistic use: co-occurrences of words, base forms, left
> and right neighbours, and so forth.
>
> To give you an example of a web service, here's a WSDL describing the
> baseform service:
> http://wortschatz.uni-leipzig.de/axis/services/Baseform?wsdl
>
> According to the people in Leipzig, Java and .NET clients are doing
> fine with this (actually, the server software is part of the Java
> Axis project), but they say Perl and Python can't handle the complex
> WSDL descriptions. And indeed, SOAPpy fails to come up with a valid
> SOAP envelope for this service.
>
> I'm wondering if there are similar experiences with Python SOAP
> clients in other areas.
>
> I want to use *just this* service, so I could probably (as a
> workaround) use a pre-built envelope as a template and just fill in
> the word of which I need the base form. But this is obviously not the
> way web services are meant to work.
>
> Are there any ideas or comments? Perhaps there are independent
> implementations of SOAP besides pywebsvcs?
>
> any input is highly welcome!
>
> best regards,
> Christoph Pingel
> _______________________________________________
> XML-SIG maillist - XML-SIG at python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>

From alin at gentoo.org Thu Aug 4 11:39:57 2005
From: alin at gentoo.org (Alin Dobre)
Date: Thu, 04 Aug 2005 12:39:57 +0300
Subject: [XML-SIG] SAX and DTDs
Message-ID: <42F1E26D.3010504@gentoo.org>

Hey list,

I have a Python script that does a simple parsing of an XML document using SAX. The problem is that I cannot get it to validate the XML using an external DTD file.

------------
#!/usr/bin/env python
import sys
from xml.sax import saxlib, saxexts

class mySaxDH(saxlib.HandlerBase):
    def startDocument(self):
        print 'Document start'

handler = mySaxDH()
parser = saxexts.make_parser()
parser.setDocumentHandler(handler)
inFile = file(sys.argv[1], 'r')
parser.parseFile(inFile)
inFile.close()
------------

data
------------

For the examples shown above, I want to validate the XML stream against the my.dtd file. Any idea how to do this using SAX?

Thanks,
Alin.

--
Alin DOBRE
Romanian Lead Translator
Gentoo Documentation Project: http://www.gentoo.org/doc/en/
Gentoo.RO Community: http://www.gentoo.ro/

From anders at norrbom.info Thu Aug 4 18:48:55 2005
From: anders at norrbom.info (Anders)
Date: Thu, 04 Aug 2005 18:48:55 +0200
Subject: [XML-SIG] 'utf8' codec can't decode byte 0xc3 - bug in xmlproc?
Message-ID: <42F246F7.4020203@norrbom.info>

I'm having a hard time debugging this error:

::: character set conversion problem: 'utf8' codec can't decode byte 0xc3 in position 65535: unexpected end of data

The file I'm trying to parse with xmlproc contains no illegal UTF-8 byte sequences, and this error does not occur when I switch to pyexpat. This is a hexdump of the row it's complaining about:

00020030 64 65 73 20 6c c3 a8 76 72 65 73 20 42 6f 72 64 |des l..vres Bord|

There's nothing wrong with this byte sequence, as far as I can see.
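The hexdumped bytes are valid UTF-8 (c3 a8 encodes "è"), so the error comes from how the bytes reach the decoder, not from the data itself. Mike Brown's reply further down explains it: the buffer being decoded held 2^16 bytes and ended on the lead byte 0xc3. The sketch below reproduces that failure in modern Python. It assumes, without confirmation from xmlproc's source, that each buffer is decoded on its own. It also shows how the standard library's incremental decoder carries a split sequence over into the next chunk. The split point and the function names are illustrative only.

```python
import codecs

# The 16 bytes from the hexdump: "des l\xc3\xa8vres Bord" ("des lèvres Bord").
data = bytes.fromhex("64 65 73 20 6c c3 a8 76 72 65 73 20 42 6f 72 64")

# Simulate a read buffer that ends right after the lead byte 0xc3,
# as at position 65535 of a 2^16-byte buffer.
chunks = [data[:6], data[6:]]


def decode_each_chunk(chunks):
    """Decode every chunk on its own: this breaks when a multi-byte
    UTF-8 sequence straddles a chunk boundary."""
    return "".join(chunk.decode("utf-8") for chunk in chunks)


def decode_incrementally(chunks):
    """Use an incremental decoder, which holds back an incomplete
    trailing sequence until the next chunk supplies the rest."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    # final=True flushes the decoder; it raises if input really ends mid-sequence.
    return text + decoder.decode(b"", final=True)


try:
    decode_each_chunk(chunks)
except UnicodeDecodeError as exc:
    print("per-chunk decode failed:", exc.reason)  # unexpected end of data

result = decode_incrementally(chunks)
print("incremental decode:", ascii(result))  # 'des l\xe8vres Bord'
```

This matches Anders' observation that pyexpat does not show the error on the same file.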

Has anyone else experienced this problem and found a solution? All help appreciated.

/Anders

From mike at skew.org Thu Aug 4 20:59:17 2005
From: mike at skew.org (Mike Brown)
Date: Thu, 4 Aug 2005 12:59:17 -0600 (MDT)
Subject: [XML-SIG] 'utf8' codec can't decode byte 0xc3 - bug in xmlproc?
In-Reply-To: <42F246F7.4020203@norrbom.info>
Message-ID: <200508041859.j74IxHjm088553@chilled.skew.org>

Anders wrote:
> I'm having a hard time debugging this error:
>
> ::: character set conversion problem: 'utf8' codec can't decode byte 0xc3 in position 65535: unexpected end of data
>
> The file I'm trying to parse with xmlproc contains no illegal UTF-8 byte
> sequences, and this error does not occur when I switch to pyexpat. This
> is a hexdump of the row it's complaining about:
> 00020030 64 65 73 20 6c c3 a8 76 72 65 73 20 42 6f 72 64 |des l..vres Bord|
> There's nothing wrong with this byte sequence, as far as I can see.
>
> Has anyone else experienced this problem and found a solution? All help
> appreciated.

Apparently it's a buffering issue; the stream it's decoding only consists of 2^16 bytes, and the last one is that c3. What does your Python code look like? What platform/OS is this on, and what versions of Python and PyXML?

From anders at norrbom.info Fri Aug 5 10:07:43 2005
From: anders at norrbom.info (Anders Norrbom)
Date: Fri, 05 Aug 2005 10:07:43 +0200
Subject: [XML-SIG] 'utf8' codec can't decode byte 0xc3 - bug in xmlproc?
In-Reply-To: <200508041859.j74IxHjm088553@chilled.skew.org>
References: <200508041859.j74IxHjm088553@chilled.skew.org>
Message-ID: <42F31E4F.3020403@norrbom.info>

That makes sense. Any idea how to deal with it? Flush the buffer somehow?

This is what the code looks like:

from xml.sax import make_parser
from xml.sax.handler import feature_namespaces, feature_validation
from xml.sax.handler import ContentHandler, ErrorHandler, DTDHandler
.
.
.
evalHandler = EvaluateKeyHandler()
parser = make_parser(['_xmlplus.sax.drivers2.drv_xmlproc'])
parser.setFeature(feature_validation, 1)
parser.setFeature(feature_namespaces, 0)
parser.setContentHandler(evalHandler)
parser.setErrorHandler(evalHandler)
f = open(file)
parser.parse(f)
.
.
.
    def characters(self, content):
        self.keywordData += content

>>> xmlproc.version
'0.70'

PyXML-0.8.3
Python 2.3.3
Red Hat Linux 3.3.3-7

Mike Brown wrote:
>Anders wrote:
>
>>I'm having a hard time debugging this error:
>>
>>::: character set conversion problem: 'utf8' codec can't decode byte 0xc3 in position 65535: unexpected end of data
>>
>>The file I'm trying to parse with xmlproc contains no illegal UTF-8 byte
>>sequences, and this error does not occur when I switch to pyexpat. This
>>is a hexdump of the row it's complaining about:
>>00020030 64 65 73 20 6c c3 a8 76 72 65 73 20 42 6f 72 64 |des l..vres Bord|
>>There's nothing wrong with this byte sequence, as far as I can see.
>>
>>Has anyone else experienced this problem and found a solution? All help
>>appreciated.
>
>Apparently it's a buffering issue; the stream it's decoding only consists of
>2^16 bytes, and the last one is that c3. What does your Python code look like?
>What platform/OS is this on, and what versions of Python and PyXML?

From Uche.Ogbuji at fourthought.com Sat Aug 6 19:15:36 2005
From: Uche.Ogbuji at fourthought.com (Uche Ogbuji)
Date: Sat, 06 Aug 2005 11:15:36 -0600
Subject: [XML-SIG] SAX and DTDs
In-Reply-To: <42F1E26D.3010504@gentoo.org>
References: <42F1E26D.3010504@gentoo.org>
Message-ID: <1123348536.4356.28.camel@borgia>

On Thu, 2005-08-04 at 12:39 +0300, Alin Dobre wrote:
> Hey list,
>
> I have a Python script that does a simple parsing of an XML document
> using SAX. The problem is that I cannot get it to validate the XML using an
> external DTD file.
>
> ------------
> #!/usr/bin/env python
> import sys
> from xml.sax import saxlib, saxexts
>
> class mySaxDH(saxlib.HandlerBase):
>     def startDocument(self):
>         print 'Document start'
>
> handler = mySaxDH()
> parser = saxexts.make_parser()
> parser.setDocumentHandler(handler)
> inFile = file(sys.argv[1], 'r')
> parser.parseFile(inFile)
> inFile.close()
> ------------
>
> data
> ------------
>
> For the examples shown above, I want to validate the XML stream against
> the my.dtd file. Any idea how to do this using SAX?

Use a validating parser, e.g. saxexts.XMLValParserFactory.make_parser()

See the example in listings 5 & 6 of

http://www.xml.com/pub/a/2004/11/24/py-xml.html?page=2

--
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net    http://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html

From alin at gentoo.org Sat Aug 6 19:23:07 2005
From: alin at gentoo.org (Alin Dobre)
Date: Sat, 06 Aug 2005 20:23:07 +0300
Subject: [XML-SIG] SAX and DTDs
In-Reply-To: <42F1E26D.3010504@gentoo.org>
References: <42F1E26D.3010504@gentoo.org>
Message-ID: <42F4F1FB.9090505@gentoo.org>

Alin Dobre wrote:
> Hey list,
>
> I have a Python script that does a simple parsing of an XML document
> using SAX. The problem is that I cannot get it to validate the XML using an
> external DTD file.
>
> ------------
> #!/usr/bin/env python
> import sys
> from xml.sax import saxlib, saxexts
>
> class mySaxDH(saxlib.HandlerBase):
>     def startDocument(self):
>         print 'Document start'
>
> handler = mySaxDH()
> parser = saxexts.make_parser()
> parser.setDocumentHandler(handler)
> inFile = file(sys.argv[1], 'r')
> parser.parseFile(inFile)
> inFile.close()
> ------------
>
> data
> ------------
>
> For the examples shown above, I want to validate the XML stream against
> the my.dtd file. Any idea how to do this using SAX?
>
> Thanks,
> Alin.

Solved. I have used sax.sax2exts.XMLValParserFactory and now it validates the XML against the DTD file.

PS: I've already seen that Uche Ogbuji gave me the same solution. Thanks.

From uche.ogbuji at fourthought.com Tue Aug 9 22:46:49 2005
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Tue, 09 Aug 2005 14:46:49 -0600
Subject: [XML-SIG] ANN: Amara XML Toolkit 1.0
Message-ID: <1123620409.3753.18.camel@borgia>

http://uche.ogbuji.net/tech/4suite/amara
ftp://ftp.4suite.org/pub/Amara/

Changes in this release:

* Bug fixes and documentation improvements
* Incorporation of prerequisites (from 4Suite) into a compact allinone package. You no longer need anything except Python to install Amara from one package in one step.

Amara XML Toolkit is a collection of Python tools for XML processing: not just tools that happen to be written in Python, but tools built from the ground up to use Python idioms and take advantage of the many strengths of Python.

Amara builds on 4Suite [http://4Suite.org], but whereas 4Suite focuses more on literal implementation of XML standards in Python, Amara focuses on Pythonic idiom. It provides tools you can trust to conform with XML standards without losing the familiar Python feel.

The components of Amara are:

* Bindery: data binding tool (a very Pythonic XML API)
* Scimitar: implementation of the ISO Schematron schema language for XML; converts Schematron files to Python scripts
* domtools: set of tools to augment Python DOMs
* saxtools: set of tools to make SAX easier to use in Python
* Flextyper: user-defined datatypes in Python for XML processing

There's a lot in Amara, but here are the highlights:

Amara Bindery: XML as easy as py
--------------------------------

Bindery turns an XML document into a tree of Python objects corresponding to the vocabulary used in the XML document, for maximum clarity. For example, the document

What do you mean "bleh"
But I was looking for argument

becomes a data structure such that you can write

binding.monty.python.spam

in order to get the value "eggs", or

binding.monty.python[1]

in order to get the value "But I was looking for argument".

There are other such tools for Python, and what makes Anobind unique is that it's driven by a very declarative rules-based system for binding XML to the Python data. You can register rules, triggered by XPattern expressions, that specialize binding behavior. It includes XPath support and supports mutation. Bindery is very efficient, using SAX to generate bindings.

Scimitar: Schematron for Python
-------------------------------

Merged in from a separate project, Scimitar is an implementation of ISO Schematron that compiles a Schematron schema into a Python validator script.

You typically use Scimitar in two phases. Say you have a Schematron schema schema1.stron and you want to validate multiple XML files against it: instance1.xml, instance2.xml, instance3.xml.

First you run schema1.stron through the Scimitar compiler script, scimitar.py:

scimitar.py schema1.stron

The generated file, schema1.py, can be used to validate XML instances:

python schema1.py instance1.xml

which emits a validation report.

Amara DOM Tools: giving DOM a more Pythonic face
------------------------------------------------

DOM came from the Java world and is hardly the most Pythonic API possible. Some DOM-like implementations, such as 4Suite's Domlettes, mix in some Pythonic idiom. Amara DOM Tools goes even further.

Amara DOM Tools features pushdom, which is similar to xml.dom.pulldom but easier to use. It also includes Python generator-based tools for DOM processing, and a function to return an XPath location for any DOM node.

Amara SAX Tools: SAX without the brain explosion
------------------------------------------------

Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing" SAX logic so that it flows more naturally and needs a lot less state-machine wizardry.

License
-------

Amara is open source, provided under the 4Suite variant of the Apache license. See the file COPYING for details.

Installation
------------

Amara requires Python 2.3 or more recent. If you do not have 4Suite, grab the Amara-allinone package. If you already have 4Suite installed, grab the standalone Amara package. In either case, unpack to a convenient location and run:

python setup.py install

--
Uche Ogbuji                               Fourthought, Inc.

From Pravin.Nagarajan at cognizant.com Wed Aug 10 11:59:10 2005
From: Pravin.Nagarajan at cognizant.com (Nagarajan, Pravin (Cognizant))
Date: Wed, 10 Aug 2005 15:29:10 +0530
Subject: [XML-SIG] Error on Page
Message-ID: <6AFEB65BC42525489741012C03F8A5435CBE17@ctsinchnsxua.cts.com>

Hi,

The following page

http://pyxml.sourceforge.net/topics/download.html

has a redundancy error:

"The distribution contains contains XML parsers implemented in both C and Python, a Python implementation of SAX and DOM, sample code, documentation, and a test suite."

Regards,
Pravin

From Leif.Hardison at comverse.com Tue Aug 16 20:09:28 2005
From: Leif.Hardison at comverse.com (Hardison Leif)
Date: Tue, 16 Aug 2005 14:09:28 -0400
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
Message-ID: <21558CA25042D2438FB3CFE70B31A6EB03410ACA@US-WKF-MAIL1.comverse.com>

Hello,

I'm currently working on learning Python's tools for working with XML. So far I have installed Python 2.4.1 and PyXML 0.8.4 on a Debian 3.1 Linux host.

The error I'm receiving when running an example file is:

my-host:/home/.../scripts# /usr/local/bin/python xml2.py
Traceback (most recent call last):
  File "xml2.py", line 2, in ?
    from xml.dom.ext.reader import Sax2
  File "/home/.../scripts/xml.py", line 1, in ?
    from xml.dom.ext.reader import Sax2
ImportError: No module named dom.ext.reader

The xml2.py file looks like this:

import sys
from xml.dom.ext.reader import Sax2
from xml.dom.ext import PrettyPrint

reader = Sax2.Reader()
# get DOM object
doc = reader.fromStream("extract.xml")
PrettyPrint(doc)

Python and PyXML both installed without producing any noticeable errors. I'm stumped, as even the demo files listed in the PyXML distribution say that they are out of date. Any pointers and troubleshooting advice appreciated.

Thanks,

Leif Hardison
Data Center Engineer
Comverse
leif.hardison at comverse.com

From fdrake at acm.org Thu Aug 18 15:57:05 2005
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 18 Aug 2005 09:57:05 -0400
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
In-Reply-To: <21558CA25042D2438FB3CFE70B31A6EB03410ACA@US-WKF-MAIL1.comverse.com>
References: <21558CA25042D2438FB3CFE70B31A6EB03410ACA@US-WKF-MAIL1.comverse.com>
Message-ID: <200508180957.05517.fdrake@acm.org>

On Tuesday 16 August 2005 14:09, Hardison Leif wrote:
> The error I'm receiving when running an example file is:
>
> my-host:/home/.../scripts# /usr/local/bin/python xml2.py
> Traceback (most recent call last):
>   File "xml2.py", line 2, in ?
>     from xml.dom.ext.reader import Sax2
>   File "/home/.../scripts/xml.py", line 1, in ?
>     from xml.dom.ext.reader import Sax2
> ImportError: No module named dom.ext.reader

It looks like you have a file "xml.py" on your sys.path before the PyXML installation or the standard library. Is there an "xml.py" in the same directory as your xml2.py script?

  -Fred

--
Fred L. Drake, Jr.

From Leif.Hardison at comverse.com Thu Aug 18 17:47:13 2005
From: Leif.Hardison at comverse.com (Hardison Leif)
Date: Thu, 18 Aug 2005 11:47:13 -0400
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
Message-ID: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com>

Oi!

Fred, indeed that looks to be the problem... The script runs now; however it takes forever to parse the input file. Does anyone happen to have any generalized benchmarks on the performance one could expect from PyXML?

The XML data files I'm working on currently are around 100MB in size and grow approximately 20MB per month if not more.

Thanks!

Leif Hardison
Data Center Engineer
Comverse

-----Original Message-----
From: Fred L. Drake, Jr.
[mailto:fdrake at acm.org]
Sent: Thursday, August 18, 2005 9:57 AM
To: xml-sig at python.org
Cc: Hardison Leif
Subject: Re: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'

On Tuesday 16 August 2005 14:09, Hardison Leif wrote:
> The error I'm receiving when running an example file is:
>
> my-host:/home/.../scripts# /usr/local/bin/python xml2.py
> Traceback (most recent call last):
>   File "xml2.py", line 2, in ?
>     from xml.dom.ext.reader import Sax2
>   File "/home/.../scripts/xml.py", line 1, in ?
>     from xml.dom.ext.reader import Sax2
> ImportError: No module named dom.ext.reader

It looks like you have a file "xml.py" on your sys.path before the PyXML installation or the standard library. Is there an "xml.py" in the same directory as your xml2.py script?

  -Fred

--
Fred L. Drake, Jr.

From timh at zute.net Fri Aug 19 00:58:12 2005
From: timh at zute.net (Tim Hoffman)
Date: Fri, 19 Aug 2005 06:58:12 +0800
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
In-Reply-To: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com>
References: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com>
Message-ID: <43051284.7070103@zute.net>

Maybe you should look at ElementTree, or more specifically cElementTree, though it does have a different API/model.

T

Hardison Leif wrote:

>Oi!
>
>Fred, indeed that looks to be the problem... The script runs now;
>however it takes forever to parse the input file. Does anyone happen to
>have any generalized benchmarks on the performance one could expect from
>PyXML?
>
>The XML data files I'm working on currently are around 100MB in size and
>grow approximately 20MB per month if not more.
>
>Thanks!
>
>Leif Hardison
>Data Center Engineer
>Comverse
>
>-----Original Message-----
>From: Fred L. Drake, Jr. [mailto:fdrake at acm.org]
>Sent: Thursday, August 18, 2005 9:57 AM
>To: xml-sig at python.org
>Cc: Hardison Leif
>Subject: Re: [XML-SIG] Assistant trouble shooting 'ImportError: No
>module named dom.ext.reader'
>
>On Tuesday 16 August 2005 14:09, Hardison Leif wrote:
> > The error I'm receiving when running an example file is:
> >
> > my-host:/home/.../scripts# /usr/local/bin/python xml2.py
> > Traceback (most recent call last):
> >   File "xml2.py", line 2, in ?
> >     from xml.dom.ext.reader import Sax2
> >   File "/home/.../scripts/xml.py", line 1, in ?
> >     from xml.dom.ext.reader import Sax2
> > ImportError: No module named dom.ext.reader
>
>It looks like you have a file "xml.py" on your sys.path before the PyXML
>installation or the standard library. Is there an "xml.py" in the same
>directory as your xml2.py script?
>
>  -Fred

From fredrik at pythonware.com Fri Aug 19 14:08:39 2005
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 19 Aug 2005 14:08:39 +0200
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No modulenamed dom.ext.reader'
References: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com>

Leif Hardison wrote:

> Fred, indeed that looks to be the problem... The script runs now;
> however it takes forever to parse the input file. Does anyone happen to
> have any generalized benchmarks on the performance one could expect from
> PyXML?
>
> The XML data files I'm working on currently are around 100MB in size and
> grow approximately 20MB per month if not more.

benchmarking XML tools is hard, and is a great way to get lots of nasty mails from people who don't know anything about software engineering, but here are some parse-only figures for common Python XML parsers:

http://effbot.org/zone/celementtree.htm#benchmarks

if your XML files are mostly regular (e.g.
uses a record-like structure), I doubt you can beat cElementTree's iterparse function:

http://effbot.org/zone/element-iterparse.htm

From Leif.Hardison at comverse.com Fri Aug 19 15:49:08 2005
From: Leif.Hardison at comverse.com (Hardison Leif)
Date: Fri, 19 Aug 2005 09:49:08 -0400
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
Message-ID: <21558CA25042D2438FB3CFE70B31A6EB03410AFA@US-WKF-MAIL1.comverse.com>

Thanks for the suggestion, Tim. Right now, I'm evaluating my different options for working with this XML file using Python. I'm excited because the project gives me one of those legit excuses to spend more time learning a language I've always wanted to spend time with.

Leif

Leif Hardison
Data Center Engineer
Comverse
+1 781 223 6754 (mobile)

-----Original Message-----
From: Tim Hoffman [mailto:timh at zute.net]
Sent: Thursday, August 18, 2005 6:58 PM
To: Hardison Leif
Cc: xml-sig at python.org
Subject: Re: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'

Maybe you should look at ElementTree, or more specifically cElementTree, though it does have a different API/model.

T

Hardison Leif wrote:

>Oi!
>
>Fred, indeed that looks to be the problem... The script runs now;
>however it takes forever to parse the input file. Does anyone happen
>to have any generalized benchmarks on the performance one could expect
>from PyXML?
>
>The XML data files I'm working on currently are around 100MB in size
>and grow approximately 20MB per month if not more.
>
>Thanks!
>
>Leif Hardison
>Data Center Engineer
>Comverse
>
>-----Original Message-----
>From: Fred L. Drake, Jr.
>[mailto:fdrake at acm.org]
>Sent: Thursday, August 18, 2005 9:57 AM
>To: xml-sig at python.org
>Cc: Hardison Leif
>Subject: Re: [XML-SIG] Assistant trouble shooting 'ImportError: No
>module named dom.ext.reader'
>
>On Tuesday 16 August 2005 14:09, Hardison Leif wrote:
> > The error I'm receiving when running an example file is:
> >
> > my-host:/home/.../scripts# /usr/local/bin/python xml2.py
> > Traceback (most recent call last):
> >   File "xml2.py", line 2, in ?
> >     from xml.dom.ext.reader import Sax2
> >   File "/home/.../scripts/xml.py", line 1, in ?
> >     from xml.dom.ext.reader import Sax2
> > ImportError: No module named dom.ext.reader
>
>It looks like you have a file "xml.py" on your sys.path before the
>PyXML installation or the standard library. Is there an "xml.py" in
>the same directory as your xml2.py script?
>
>  -Fred

From Uche.Ogbuji at fourthought.com Mon Aug 22 20:29:27 2005
From: Uche.Ogbuji at fourthought.com (Uche Ogbuji)
Date: Mon, 22 Aug 2005 12:29:27 -0600
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
In-Reply-To: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com>
References: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com>
Message-ID: <1124735367.18767.370.camel@borgia>

Later on, this thread proves that certain grade school playground games haven't lost their attraction, but I'll instead ask a useful question.

Leif, what is it precisely you're planning to do with these large files? Maybe a snippet of the XML and some pseudocode of the processing would help us help you (those who want to help you).

--
Uche Ogbuji                               Fourthought, Inc.

From fredrik at pythonware.com Mon Aug 22 21:47:36 2005
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 22 Aug 2005 21:47:36 +0200
Subject: [XML-SIG] Assistant trouble shooting 'ImportError: No module named dom.ext.reader'
References: <21558CA25042D2438FB3CFE70B31A6EB03410AEC@US-WKF-MAIL1.comverse.com> <1124735367.18767.370.camel@borgia>

Uche Ogbuji wrote:

> Later on, this thread proves that certain grade school playground games
> haven't lost their attraction

so when did you escape from the day care center?

From ken.beesley at xrce.xerox.com Tue Aug 23 16:32:47 2005
From: ken.beesley at xrce.xerox.com (Ken Beesley)
Date: Tue, 23 Aug 2005 16:32:47 +0200
Subject: [XML-SIG] Status of XML 1.1 processing in Python?
Message-ID: <430B338F.6010404@xrce.xerox.com>

In a few sentences, could some kind soul summarize the status of XML 1.1 processing using Python XML modules?

Thanks,

Ken

From fredrik at pythonware.com Tue Aug 23 18:13:58 2005
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 23 Aug 2005 18:13:58 +0200
Subject: [XML-SIG] Status of XML 1.1 processing in Python?
References: <430B338F.6010404@xrce.xerox.com>

Ken Beesley wrote:

> In a few sentences, could some kind soul summarize the
> status of XML 1.1 processing using Python XML modules?

I haven't done any extensive testing, but I'm quite sure that sgmlop 1.1 supports it.

I'm quite sure that expat doesn't support it, which means that most expat-based toolkits (pyexpat, PyXML, ET's default parser, Gnosis, Amara, et al) won't handle it.

As for the libxml2-based toolkits (libxml itself, lxml), I don't know.

From fdrake at acm.org Tue Aug 23 19:05:02 2005
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 23 Aug 2005 13:05:02 -0400
Subject: [XML-SIG] Status of XML 1.1 processing in Python?
References: <430B338F.6010404@xrce.xerox.com>
Message-ID: <200508231305.02322.fdrake@acm.org>

On Tuesday 23 August 2005 12:13, Fredrik Lundh wrote:
> I'm quite sure that expat doesn't support it, which means that most
> expat-based toolkits (pyexpat, PyXML, ET's default parser, Gnosis,
> Amara, et al) won't handle it.

Unfortunately, that's correct. I need to spend some time on Expat; the 2.0 release has been delayed way too long due to my recent non-availability. :-(

  -Fred

--
Fred L. Drake, Jr.

From veillard at redhat.com Wed Aug 24 01:04:13 2005
From: veillard at redhat.com (Daniel Veillard)
Date: Tue, 23 Aug 2005 19:04:13 -0400
Subject: [XML-SIG] Status of XML 1.1 processing in Python?
References: <430B338F.6010404@xrce.xerox.com>
Message-ID: <20050823230413.GQ11470@redhat.com>

On Tue, Aug 23, 2005 at 06:13:58PM +0200, Fredrik Lundh wrote:
> As for the libxml2-based toolkits (libxml itself, lxml), I don't know.

  libxml2 doesn't. I think that lxml does though. I never really got
demand to implement it, so I focused on stuff people were actually asking for ...

Daniel

--
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard at redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From ken.beesley at xrce.xerox.com Wed Aug 24 13:27:04 2005
From: ken.beesley at xrce.xerox.com (Ken Beesley)
Date: Wed, 24 Aug 2005 13:27:04 +0200
Subject: [XML-SIG] Status of XML 1.1 processing in Python?
Message-ID: <430C5988.8040809@xrce.xerox.com>

Many thanks to Fredrik Lundh, Fred Drake and Daniel Veillard for information on the status of XML 1.1 processing in Python. I'll do my best to do some testing and report back.

Why I need XML 1.1 characters

In case anyone is interested, my goal is to facilitate the definition of new Unicode input methods for Mac OS X. Apple already supplies a very human-UNfriendly XML language for defining new input methods. I have defined a new human-friendly XML language and need to convert my human-friendly XML files automatically to Apple's human-UNfriendly XML.

The basic idea of input methods is that they intercept incoming key events, or sequences of key events, and map them into Unicode-character outputs that are sent to the destination, e.g. to the buffer of a Unicode text editor. Some of these Unicode output characters are control characters that are invalid in XML 1.0 but valid in XML 1.1. (I.e. when you press appropriate "control" keys on your keyboard, the output to the application is naturally a "control character".)

If you define a new OS X input method in Apple's current XML format, the XML file contains control characters that are valid only in XML 1.1. The underlying (mystery) Apple parser that processes that XML file does _not_ choke on the control characters, so this processor is assuming the XML 1.1 character set, even if the XML file is overtly marked version="1.0". That's a no-no, of course; if the file is marked version="1.0", then any kosher XML processor should refuse to parse/process the file if it contains control characters not valid in XML 1.0.

My human-friendly XML language is defined in Relax NG, and when I specify version="1.1", the files validate as they should using Jing. (If I change the attribute to version="1.0", then Jing properly refuses to validate the files because of the invalid control characters.) So far so good.
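The failure Ken describes below is easy to reproduce with the expat parser bundled with Python's standard library, which implements XML 1.0 only (as Fredrik and Fred say above). XML 1.1 allows a control character such as U+0001 when it is written as a character reference like `&#x1;`. XML 1.0 rejects it in any form. The sketch below is a minimal illustration. The `key` element and `output` attribute are made-up names, not Apple's input-method format.

```python
import xml.etree.ElementTree as ET


def accepted_by_expat(document):
    """Return True if Python's expat-based (XML 1.0) parser accepts the text."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical element/attribute names, not Apple's schema.
as_char_ref = '<key output="&#x1;"/>'   # legal in XML 1.1, illegal in XML 1.0
as_raw_char = '<key output="\x01"/>'    # raw control character: illegal in both
printable = '<key output="&#x41;"/>'    # "A": legal in every version

for doc in (as_char_ref, as_raw_char, printable):
    # ascii() keeps the raw control character visible when printed
    print(ascii(doc), accepted_by_expat(doc))
```

Because expat offers no XML 1.1 mode, a script like Ken's needs a parser that actually implements 1.1. The replies above mention sgmlop as a possible candidate.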
But then when I try to write a Python script to parse the human-friendly XML language and convert it (very non-trivially) to the human-unfriendly XML language defined by Apple, the Python script (if limited to XML 1.0 processing) chokes as soon as it sees the offending control characters. Sigh. Hence my need for a Python XML parsing/processing module that handles XML 1.1 characters when the file is appropriately marked version="1.1". Thanks again for the pointers. Ken From piers.finlayson at metaswitch.com Thu Aug 25 15:04:23 2005 From: piers.finlayson at metaswitch.com (Piers Finlayson) Date: Thu, 25 Aug 2005 14:04:23 +0100 Subject: [XML-SIG] XMLRPC and SOAPpy installation problem? Message-ID: <7397E98E3B94C544BB0485F2283EF8BD23D8AE@enfimail1.datcon.co.uk> Hi, I have installed PyXML-0.8.4 on a Solaris 9 x86 machine running python 2.2 and SOAPpy 0.11.6. I am hitting this problem whenever my application receives a SOAP message: *** Internal exception xsd ********************************************* Traceback (most recent call last): File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Server.py", line 229, in do_POST (r, header, body, attrs) = \ File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Parser.py", line 1006, in parseSOAPRPC t = _parseSOAP(xml_str, rules = rules) File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Parser.py", line 985, in _parseSOAP parser.parse(inpsrc) File "/opt/sfw/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse self.feed(buffer) File "/opt/sfw/lib/python2.2/site-packages/_xmlplus/sax/drivers2/drv_xmlproc. py", line 96, in feed self._parser.feed(data) File "/opt/sfw/lib/python2.2/site-packages/_xmlplus/parsers/xmlproc/xmlutils.
py", line 332, in feed self.do_parse() File "/opt/sfw/lib/python2.2/site-packages/_xmlplus/parsers/xmlproc/xmlproc.p y", line 91, in do_parse self.parse_end_tag() File "/opt/sfw/lib/python2.2/site-packages/_xmlplus/parsers/xmlproc/xmlproc.p y", line 357, in parse_end_tag self.app.handle_end_tag(name) File "/opt/sfw/lib/python2.2/site-packages/_xmlplus/sax/drivers2/drv_xmlproc. py", line 381, in handle_end_tag self._cont_handler.endElementNS(name, rawname) File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Parser.py", line 234, in endElementNS kind = (self._prem[kind[:i]], kind[i + 1:]) KeyError: xsd ************************************************************************ Does this imply some sort of installation or other problem with xmlrpc? Cheers, Piers -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/xml-sig/attachments/20050825/70123056/attachment.htm From JRBoverhof at lbl.gov Thu Aug 25 19:56:57 2005 From: JRBoverhof at lbl.gov (Joshua Boverhof) Date: Thu, 25 Aug 2005 10:56:57 -0700 Subject: [XML-SIG] XMLRPC and SOAPpy installation problem? In-Reply-To: <7397E98E3B94C544BB0485F2283EF8BD23D8AE@enfimail1.datcon.co.uk> References: <7397E98E3B94C544BB0485F2283EF8BD23D8AE@enfimail1.datcon.co.uk> Message-ID: <430E0669.8010807@lbl.gov> Most likely the "xsd" prefix is used, but not set in the XML instance. Missing <... xmlns:xsd="..." > -josh Piers Finlayson wrote: > Hi, > > I have installed PyXML-0.8.4 on a Solaris 9 x86 machine running python > 2.2 and SOAPpy 0.11.6. 
I am hitting this problem whenever my > application receives a SOAP message: > > *** Internal exception xsd ********************************************* > Traceback (most recent call last): > File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Server.py", line > 229, in do_POST > (r, header, body, attrs) = \ > File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Parser.py", line > 1006, in parseSOAPRPC > t = _parseSOAP(xml_str, rules = rules) > File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Parser.py", line > 985, in _parseSOAP > parser.parse(inpsrc) > File > "/opt/sfw/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line > 123, in parse > self.feed(buffer) > File > "/opt/sfw/lib/python2.2/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py", > line 96, in feed > self._parser.feed(data) > File > "/opt/sfw/lib/python2.2/site-packages/_xmlplus/parsers/xmlproc/xmlutils.py", > line 332, in feed > self.do_parse() > File > "/opt/sfw/lib/python2.2/site-packages/_xmlplus/parsers/xmlproc/xmlproc.py", > line 91, in do_parse > self.parse_end_tag() > File > "/opt/sfw/lib/python2.2/site-packages/_xmlplus/parsers/xmlproc/xmlproc.py", > line 357, in parse_end_tag > self.app.handle_end_tag(name) > File > "/opt/sfw/lib/python2.2/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py", > line 381, in handle_end_tag > self._cont_handler.endElementNS(name, rawname) > File "/opt/sfw/lib/python2.2/site-packages/SOAPpy/Parser.py", line > 234, in endElementNS > kind = (self._prem[kind[:i]], kind[i + 1:]) > KeyError: xsd > ************************************************************************ > > Does this imply some sort of installation or other problem with xmlrpc?
> > Cheers, > Piers > >------------------------------------------------------------------------ > >_______________________________________________ >XML-SIG maillist - XML-SIG at python.org >http://mail.python.org/mailman/listinfo/xml-sig > > From xml-sig at mlists.thewrittenword.com Sun Aug 28 08:02:48 2005 From: xml-sig at mlists.thewrittenword.com (Albert Chin) Date: Sun, 28 Aug 2005 01:02:48 -0500 Subject: [XML-SIG] SAX and DTDs In-Reply-To: <1123348536.4356.28.camel@borgia> References: <42F1E26D.3010504@gentoo.org> <1123348536.4356.28.camel@borgia> Message-ID: <20050828060247.GA85655@mail1.thewrittenword.com> On Sat, Aug 06, 2005 at 11:15:36AM -0600, Uche Ogbuji wrote: > On Thu, 2005-08-04 at 12:39 +0300, Alin Dobre wrote: > > I have a python script that does a simple parsing of a XML document > > using SAX. The problem is that I cannot get to validate the XML using an > > external DTD file. > > > > ------------ > > #!/bin/env python > > import sys > > from xml.sax import saxlib, saxexts > > class mySaxDH(saxlib.HandlerBase): > > def startDocument(self): > > print 'Document start' > > handler = manSaxDH(sys.stdout) > > parser = saxexts.make_parser() > > parser.setDocumentHandler(handler) > > inFile = file(sys.argv[1], 'r') > > parser.parseFile(inFile) > > inFile.close() > > ------------ > > > > > > data > > ------------ > > > > For the examples shown above, I want to validate the xml stream against > > the my.dtd file. Any idea how to do this using SAX? > > Use a validating parser. e.g. > > saxexts.XMLValParserFactory.make_parser() What if you wanted to validate against an external DTD that you wish to load separately? 
The following doesn't work: p = saxexts.XMLValParserFactory.make_parser () p.parser.dtd = load_dtd ("[DTD File]") p.setDocumentHandler (xmlh) p.feed ([XML FILE AS STRING]) -- albert chin (china at thewrittenword.com) From xml-sig at mlists.thewrittenword.com Sun Aug 28 08:18:11 2005 From: xml-sig at mlists.thewrittenword.com (Albert Chin) Date: Sun, 28 Aug 2005 01:18:11 -0500 Subject: [XML-SIG] SAX and DTDs In-Reply-To: <20050828060247.GA85655@mail1.thewrittenword.com> References: <42F1E26D.3010504@gentoo.org> <1123348536.4356.28.camel@borgia> <20050828060247.GA85655@mail1.thewrittenword.com> Message-ID: <20050828061811.GB85655@mail1.thewrittenword.com> On Sun, Aug 28, 2005 at 01:02:48AM -0500, Albert Chin wrote: > On Sat, Aug 06, 2005 at 11:15:36AM -0600, Uche Ogbuji wrote: > > On Thu, 2005-08-04 at 12:39 +0300, Alin Dobre wrote: > > > I have a python script that does a simple parsing of a XML document > > > using SAX. The problem is that I cannot get to validate the XML using an > > > external DTD file. > > > > > > ------------ > > > #!/bin/env python > > > import sys > > > from xml.sax import saxlib, saxexts > > > class mySaxDH(saxlib.HandlerBase): > > > def startDocument(self): > > > print 'Document start' > > > handler = manSaxDH(sys.stdout) > > > parser = saxexts.make_parser() > > > parser.setDocumentHandler(handler) > > > inFile = file(sys.argv[1], 'r') > > > parser.parseFile(inFile) > > > inFile.close() > > > ------------ > > > > > > > > > data > > > ------------ > > > > > > For the examples shown above, I want to validate the xml stream against > > > the my.dtd file. Any idea how to do this using SAX? > > > > Use a validating parser. e.g. > > > > saxexts.XMLValParserFactory.make_parser() > > What if you wanted to validate against an external DTD that you wish > to load separately? 
The following doesn't work: > p = saxexts.XMLValParserFactory.make_parser () > p.parser.dtd = load_dtd ("[DTD File]") > p.setDocumentHandler (xmlh) > p.feed ([XML FILE AS STRING]) This seems to work: p = saxexts.XMLValParserFactory.make_parser () p.parser.dtd = load_dtd ("[DTD File]") p.parser.val.dtd = p.parser.dtd p.setDocumentHandler (xmlh) p.feed ([XML FILE AS STRING]) However, it doesn't work when using p.parseFile() instead of p.feed (). -- albert chin (china at thewrittenword.com) From marcelolrocha at yahoo.com.br Mon Aug 29 14:26:45 2005 From: marcelolrocha at yahoo.com.br (Marcelo Rocha) Date: Mon, 29 Aug 2005 09:26:45 -0300 (ART) Subject: [XML-SIG] problems in PyXML 0.8.4 instalation Message-ID: <20050829122645.16432.qmail@web32912.mail.mud.yahoo.com> Hi, When I try install PyXML0.8.4 in my Linux (Debian 3.0) receive this error message: $ python setup.py build Traceback (most recent call last): File "setup.py", line 127, in ? config_h_vars = parse_config_h(open(config_h)) IOError: [Errno 2] No such file or directory: '/usr/include/python2.3/pyconfig.h What is this? I use Python 2.3. []'s Marcelo __________________________________________________ Converse com seus amigos em tempo real com o Yahoo! Messenger http://br.download.yahoo.com/messenger/ From ping at pingyeh.net Mon Aug 29 14:44:37 2005 From: ping at pingyeh.net (Ping Yeh) Date: Mon, 29 Aug 2005 20:44:37 +0800 Subject: [XML-SIG] problems in PyXML 0.8.4 instalation In-Reply-To: <20050829122645.16432.qmail@web32912.mail.mud.yahoo.com> References: <20050829122645.16432.qmail@web32912.mail.mud.yahoo.com> Message-ID: <43130335.10106@pingyeh.net> I'm not using debian but it seems you don't have the python development package ("python-dev" or something like that) installed in your system. Ping Marcelo Rocha wrote: > Hi, > > When I try install PyXML0.8.4 in my Linux (Debian 3.0) > receive this error message: > $ python setup.py build > Traceback (most recent call last): > File "setup.py", line 127, in ? 
> config_h_vars = parse_config_h(open(config_h)) > IOError: [Errno 2] No such file or directory: > '/usr/include/python2.3/pyconfig.h > > What is this? > I use Python 2.3. > > []'s > > Marcelo > > > __________________________________________________ > Converse com seus amigos em tempo real com o Yahoo! Messenger http://br.download.yahoo.com/messenger/ _______________________________________________ > XML-SIG maillist - XML-SIG at python.org > http://mail.python.org/mailman/listinfo/xml-sig From Uche.Ogbuji at fourthought.com Mon Aug 29 19:58:46 2005 From: Uche.Ogbuji at fourthought.com (Uche Ogbuji) Date: Mon, 29 Aug 2005 11:58:46 -0600 Subject: [XML-SIG] Status of XML 1.1 processing in Python? In-Reply-To: <430C5988.8040809@xrce.xerox.com> References: <430C5988.8040809@xrce.xerox.com> Message-ID: <1125338326.1220.36.camel@borgia> On Wed, 2005-08-24 at 13:27 +0200, Ken Beesley wrote: > Many thanks to Fredrik Lundh, Fred Drake and Daniel Veillard > for information on the status of XML 1.1 processing in Python. > I'll do my best to do some testing and report back. > > Why I need XML 1.1 characters > > In case anyone is interested, my goal is to facilitate > the definition of new Unicode input methods for Mac OS X. > Apple already supplies a very human-UNfriendly XML > language for defining new input methods. I have defined a > new human-friendly XML language and need to convert my > human-friendly XML files automatically to Apple's human- > UNfriendly XML. > > The basic idea of input methods is that they > intercept incoming key events, or sequences of key events, and > map them into Unicode-character outputs that are sent to > the destination, > e.g. to the buffer of a Unicode text editor. Some of these > Unicode output characters are control characters that are > invalid in XML 1.0 but valid in XML 1.1. (I.e. when you > press appropriate "control" keys on your keyboard, the output > to the application is naturally a "control character".) 
> > If you define a new OS X input method in Apple's current > XML format, the XML file contains control characters that > are valid only in XML 1.1. The underlying (mystery) Apple > parser that processes > that XML file does _not_ choke on the control characters, > so this processor is assuming the XML 1.1 character set, > even if the XML file is overtly marked version="1.0". That's > a no-no, of course; if the file is marked version="1.0", then > any kosher XML processor should refuse to parse/process > the file if it contains control characters not valid in XML 1.0. > > My human-friendly XML language is defined in Relax NG, > and when I specify version="1.1", the files validate as they > should using Jing. (If I change the attribute to version="1.0", then > Jing properly refuses to validate the files because of the invalid > control characters.) So far so good. > But then when I try to write a Python script to parse > the human-friendly XML language and convert it (very non-trivially) > to the human-unfriendly XML language defined by Apple, > the Python script (if limited to XML 1.0 processing) chokes > as soon as it sees the offending control characters. Sigh. > > Hence my need for a Python XML parsing/processing module > that handles XML 1.1 characters when the file is > appropriately marked version="1.1". Interesting. And thanks for taking the time to state your case so clearly. XML 1.1 does allow more control chars than 1.0, but some are still banned, so I don't think you have a comprehensive solution here. My suggested solution would be something Mike Brown and I have often discussed: mapping the illegal characters to the Unicode private use area (PUA), and then back, as needed. Python should make this an easy solution. You can also use special elements or processing instructions to encode the problem characters. I do not suggest relying on XML 1.1 in the way you propose because uptake for it is slow not only in the Expat world.
It's a pretty controversial spec (like every second-gen spec the W3C produces, it seems). Even if libxml folks and the Expat folks actually commit to XML 1.1 support, it will probably be a little while in coming, so I suggest a more general workaround, such as I've suggested within the bounds of XML 1.0. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://fourthought.com http://copia.ogbuji.net http://4Suite.org Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/ Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html From ken.beesley at xrce.xerox.com Tue Aug 30 10:07:10 2005 From: ken.beesley at xrce.xerox.com (Ken Beesley) Date: Tue, 30 Aug 2005 10:07:10 +0200 Subject: [XML-SIG] Status of XML 1.1 processing in Python? In-Reply-To: <1125338326.1220.36.camel@borgia> References: <430C5988.8040809@xrce.xerox.com> <1125338326.1220.36.camel@borgia> Message-ID: <431413AE.1000309@xrce.xerox.com> Thanks to Uche Ogbuji for the response. I insert some comments below. Uche Ogbuji wrote: >On Wed, 2005-08-24 at 13:27 +0200, Ken Beesley wrote: > > >>Many thanks to Fredrik Lundh, Fred Drake and Daniel Veillard >>for information on the status of XML 1.1 processing in Python. >>I'll do my best to do some testing and report back. >> >> Why I need XML 1.1 characters >> >>In case anyone is interested, my goal is to facilitate >>the definition of new Unicode input methods for Mac OS X. >>Apple already supplies a very human-UNfriendly XML >>language for defining new input methods. I have defined a >>new human-friendly XML language and need to convert my >>human-friendly XML files automatically to Apple's human- >>UNfriendly XML.
>> >>The basic idea of input methods is that they >>intercept incoming key events, or sequences of key events, and >>map them into Unicode-character outputs that are sent to >>the destination, >>e.g. to the buffer of a Unicode text editor. Some of these >>Unicode output characters are control characters that are >>invalid in XML 1.0 but valid in XML 1.1. (I.e. when you >>press appropriate "control" keys on your keyboard, the output >>to the application is naturally a "control character".) >> >>If you define a new OS X input method in Apple's current >>XML format, the XML file contains control characters that >>are valid only in XML 1.1. The underlying (mystery) Apple >>parser that processes >>that XML file does _not_ choke on the control characters, >>so this processor is assuming the XML 1.1 character set, >>even if the XML file is overtly marked version="1.0". That's >>a no-no, of course; if the file is marked version="1.0", then >>any kosher XML processor should refuse to parse/process >>the file if it contains control characters not valid in XML 1.0. >> >>My human-friendly XML language is defined in Relax NG, >>and when I specify version="1.1", the files validate as they >>should using Jing. (If I change the attribute to version="1.0", then >>Jing properly refuses to validate the files because of the invalid >>control characters.) So far so good. >>But then when I try to write a Python script to parse >>the human-friendly XML language and convert it (very non-trivially) >>to the human-unfriendly XML language defined by Apple, >>the Python script (if limited to XML 1.0 processing) chokes >>as soon as it sees the offending control characters. Sigh. >> >>Hence my need for a Python XML parsing/processing module >>that handles XML 1.1 characters when the file is >>appropriately marked version="1.1". >> >> > >Interesting. And thanks for taking the time to state your case so clearly.
> >XML 1.1 does allow more control chars than 1.0, but some are still >banned, so I don't think you have a comprehensive solution here. > > The value 0x0000 (null) is still banned, but in XML 1.1 all characters from 0x0001 through 0x001F are now legal, as long as they are expressed inside an XML 1.1 document as Character References, e.g.  The addition of these characters solves the problem for Apple OS X input methods. In fact, characters like  (Backspace) already appear (as character references) in existing Apple OS X input methods (written in the unfriendly XML format mentioned in my previous message). If you press the Backspace key on your keyboard, you want the input method to pass "the backspace character"  through to the application. One might suspect that applications like this prompted the change in 1.1. The existing (hidden) Mac parser that parses XML specifications of input methods (into a low-level binary format) already handles  and other control characters now legal in XML 1.1. So this hidden Mac parser is XML 1.1-capable, at least as far as control characters are concerned. >My suggested solution would be something Mike Brown and I have often >discussed: mapping the illegal characters to the Unicode private use >area (PUA), and then back, as needed. Python should make this an easy >solution. You can also use special elements or processing instructions >to encode the problem characters. > >I do not suggest relying on XML 1.1 in the way you propose because >uptake for it is slow not only in the Expat world. It's a pretty >controversial spec (like every second-gen spec the W3C produces, it >seems). > >Even if libxml folks and the Expat folks actually commit to XML 1.1 >support, it will probably be a little while in coming, so I suggest a >more general workaround, such as I've suggested within the bounds of XML >1.0.
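Uche's Private Use Area suggestion can be sketched in a few lines of present-day Python. This is a minimal illustration, not code from the thread: it assumes the characters to protect are the C0 controls that XML 1.0 forbids (U+0001 to U+0008, U+000B, U+000C and U+000E to U+001F), and it assumes the PUA block starting at the hypothetical offset U+E000 is not otherwise used in your data. The names `PUA_BASE`, `to_pua` and `from_pua` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical offset: each forbidden C0 control cp is stored as PUA_BASE + cp,
# so U+0008 (Backspace) becomes U+E008.
PUA_BASE = 0xE000

# C0 controls that XML 1.0 forbids everywhere (tab, LF and CR are left alone).
_FORBIDDEN_C0 = re.compile("[\x01-\x08\x0b\x0c\x0e-\x1f]")

# The PUA code points that to_pua() produces, used to undo the mapping.
_MAPPED_PUA = re.compile("[\ue001-\ue008\ue00b\ue00c\ue00e-\ue01f]")


def to_pua(text):
    """Replace XML 1.0-illegal control characters with PUA stand-ins."""
    return _FORBIDDEN_C0.sub(lambda m: chr(PUA_BASE + ord(m.group())), text)


def from_pua(text):
    """Undo to_pua(), restoring the original control characters."""
    return _MAPPED_PUA.sub(lambda m: chr(ord(m.group()) - PUA_BASE), text)


# Round trip through an ordinary XML 1.0 serializer and parser.
original = "press \x08 to delete"
elem = ET.Element("output")
elem.text = to_pua(original)        # now legal XML 1.0 character data
xml_bytes = ET.tostring(elem)       # serializes without complaint
restored = from_pua(ET.fromstring(xml_bytes).text)
print(restored == original)
```

The scheme only works while the mapping stays reversible: if the source text can already contain characters in U+E001 to U+E01F, the round trip becomes ambiguous, and a different offset (or the element or processing-instruction encoding Uche also mentions) would be needed.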
> > Yes, one can obviously cobble together some kind of work-around, but it's unattractive when XML 1.1 has existed for a year and a half and would solve the problem (and when references like  are already being handled in Apple's own XML input-method language). When I define my own more human-friendly XML, forcing the use of PUA characters (which would get mapped to 1.1 control character references in the unfriendly XML output) puts an unattractive and unintuitive gap between my (hopefully) friendly XML language and Apple's existing unfriendly XML. Sigh. I see that pxdom claims to be pure Python and claims to handle XML 1.1 I'm not very excited about using DOM of any kind, but perhaps it's a solution. Thanks again, Ken From ht at inf.ed.ac.uk Tue Aug 30 10:16:05 2005 From: ht at inf.ed.ac.uk (Henry S. Thompson) Date: Tue, 30 Aug 2005 09:16:05 +0100 Subject: [XML-SIG] Status of XML 1.1 processing in Python? In-Reply-To: <431413AE.1000309@xrce.xerox.com> (Ken Beesley's message of "Tue, 30 Aug 2005 10:07:10 +0200") References: <430C5988.8040809@xrce.xerox.com> <1125338326.1220.36.camel@borgia> <431413AE.1000309@xrce.xerox.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 PyLTXML [1] has supported XML 1.1 for over a year. ht [1] http://www.ltg.ed.ac.uk/software/xml/ - -- Henry S. 
Thompson, HCRC Language Technology Group, University of Edinburgh Half-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht at inf.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/ [mail really from me _always_ has this .sig -- mail without it is forged spam] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) iD8DBQFDFBXFkjnJixAXWBoRAn/jAJ0XxSz6P7QWpMHj2dUtY8zJ+zGlqgCfYfWK dUnKyO5cQNi450ZLCsiyTLQ= =A+lI -----END PGP SIGNATURE----- From uche.ogbuji at fourthought.com Wed Aug 31 00:20:30 2005 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 30 Aug 2005 16:20:30 -0600 Subject: [XML-SIG] Status of XML 1.1 processing in Python? In-Reply-To: <431413AE.1000309@xrce.xerox.com> References: <430C5988.8040809@xrce.xerox.com> <1125338326.1220.36.camel@borgia> <431413AE.1000309@xrce.xerox.com> Message-ID: <1125440430.14255.112.camel@borgia> On Tue, 2005-08-30 at 10:07 +0200, Ken Beesley wrote: > Yes, one can obviously cobble together some kind of work-around, > but it's unattractive when XML 1.1 has existed for a year and a > half and would solve the > problem (and when references like  are already being > handled in Apple's own XML input-method language). I'm glad you're on the path to a solution that works for you, but I could not pass on responding to the above. Just because XML 1.1 has been out for N months doesn't mean we have to like it. And you might find it hard to get people to implement something they don't care for (unless, of course, there's pay involved). Anyway, best of luck with pxdom. http://www.xml.com/pub/a/2003/12/17/py-xml.html -- Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com http://copia.ogbuji.net http://4Suite.org Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/ Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html From fredrik at pythonware.com Wed Aug 31 00:57:51 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 31 Aug 2005 00:57:51 +0200 Subject: [XML-SIG] Status of XML 1.1 processing in Python? References: <430B338F.6010404@xrce.xerox.com> Message-ID: I wrote: >> In a few sentences, could some kind soul summarize the >> status of XML 1.1 processing using Python XML modules? > > I haven't done any extensive testing, but I'm quite sure that sgmlop > 1.1 supports it. fwiw, as the following snippet illustrates, ET+sgmlop can read files with 1.1-style character references, but the ET serializer doesn't encode such characters on the way out. this script

    from elementtree import ElementTree, SgmlopXMLTreeBuilder
    from StringIO import StringIO

    file = StringIO("this is a backspace: ")
    doc = ElementTree.parse(file, SgmlopXMLTreeBuilder.TreeBuilder())
    root = doc.getroot()

    print repr(root.text)
    print repr(ElementTree.tostring(root))

prints

    'this is a backspace: \x08'
    'this is a backspace: \x08'

which isn't entirely correct. fixing this in ElementTree is pretty straightforward; just tweak the RE, and make sure _encode_entity is called for all cdata sections.
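Fredrik's point is that a serializer has to write these characters back out as numeric character references. The idea can be shown with a standalone escaping function in present-day Python; this is a sketch, not the ElementTree fix itself. The function name is hypothetical, and the set of characters it escapes is taken from the RestrictedChar production of the XML 1.1 specification (U+0001 to U+0008, U+000B, U+000C, U+000E to U+001F, U+007F to U+0084, U+0086 to U+009F).

```python
import re

# XML 1.1 RestrictedChar ranges: legal in a 1.1 document only when written
# as a numeric character reference.
_RESTRICTED = "\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f"

# Markup characters plus the restricted controls all need escaping.
_NEEDS_ESCAPE = re.compile("[&<>" + _RESTRICTED + "]")

_NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def escape_text_xml11(text):
    """Escape character data for the body of an XML 1.1 element."""
    def repl(match):
        ch = match.group()
        # Named entities for markup, hex character references for controls.
        return _NAMED.get(ch) or "&#x%X;" % ord(ch)
    return _NEEDS_ESCAPE.sub(repl, text)


print(escape_text_xml11("this is a backspace: \x08"))
```

The output is only well-formed inside a document declared version="1.1"; an XML 1.0 parser such as expat will still reject the `&#x8;` reference.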
you can also use the following brute-force runtime patch:

    # patch the ET serializer (works with 1.2.X, may break beyond that)

    import re
    from elementtree import ElementTree

    escape = re.compile(u'[&<>\"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')

    ElementTree._encode_entity.func_defaults = (escape,)
    ElementTree._escape_cdata = lambda a, b: ElementTree._encode_entity(a)

    # end

From veillard at redhat.com Wed Aug 31 18:02:15 2005 From: veillard at redhat.com (Daniel Veillard) Date: Wed, 31 Aug 2005 12:02:15 -0400 Subject: [XML-SIG] Status of XML 1.1 processing in Python? In-Reply-To: <431413AE.1000309@xrce.xerox.com> References: <430C5988.8040809@xrce.xerox.com> <1125338326.1220.36.camel@borgia> <431413AE.1000309@xrce.xerox.com> Message-ID: <20050831160215.GL10194@redhat.com> On Tue, Aug 30, 2005 at 10:07:10AM +0200, Ken Beesley wrote: > The existing (hidden) Mac parser that parses XML specifications > of input methods (into a low-level binary format) already > handles  and other control characters now legal in XML 1.1 > So this hidden Mac parser is XML 1.1-capable, at least as far as > control characters are concerned. The real problem is that "parser" is from your initial description not an XML-1.0 parser nor an XML-1.1 parser. Send some flames to Apple for breaking a standard that everybody else tried to conform to. Then work around that broken piece in their stack if you want but as always for conformance problems workarounds it's just lost time in the long term. Daniel -- Daniel Veillard | Red Hat Desktop team http://redhat.com/ veillard at redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/ From ken.beesley at xrce.xerox.com Wed Aug 31 19:55:52 2005 From: ken.beesley at xrce.xerox.com (Ken Beesley) Date: Wed, 31 Aug 2005 19:55:52 +0200 Subject: [XML-SIG] Status of XML 1.1 processing in Python?
In-Reply-To: <20050831160215.GL10194@redhat.com> References: <430C5988.8040809@xrce.xerox.com> <1125338326.1220.36.camel@borgia> <431413AE.1000309@xrce.xerox.com> <20050831160215.GL10194@redhat.com> Message-ID: <4315EF28.4010203@xrce.xerox.com> Daniel Veillard wrote: >On Tue, Aug 30, 2005 at 10:07:10AM +0200, Ken Beesley wrote: > > >>The existing (hidden) Mac parser that parses XML specifications >>of input methods (into a low-level binary format) already >>handles  and other control characters now legal in XML 1.1 >>So this hidden Mac parser is XML 1.1-capable, at least as far as >>control characters are concerned. >> >> > > The real problem is that "parser" is from your initial description >not an XML-1.0 parser nor an XML-1.1 parser. Send some flames to Apple >for breaking a standard that everybody else tried to conform to. Then >work around that broken piece in their stack if you want but as always >for conformance problems workarounds it's just lost time in the long term. > > > First, I'd like to thank experts like Daniel Veillard, Uche Ogbuji and others who have responded to my XML 1.1 messages. I very much appreciate your volunteer work in creating and maintaining tools for XML processing. Yes, as I pointed out in an earlier message, this Apple behavior is formally a no-no. It is of course the official duty of a respectable XML parser to refuse to parse a document marked version="1.0" if it contains character references like  that are legal only in XML 1.1. Apple is faultable here, but it should be understood that it's their own private HIDDEN parser, used for exactly one specific application: this hidden parser translates OS-X-input-method-defining XML files, defined by a DTD documented in http://developer.apple.com/technotes/tn2002/tn2056.html, into an even less human-friendly binary format that OS X really uses internally. This hidden parser has only one purpose in life; it's a dog that knows only one trick. 
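The rule Ken describes can be made concrete with a small classifier for a single code point. This sketch follows the Char production of XML 1.0 and the Char and RestrictedChar productions of XML 1.1; the function name and its three result labels are hypothetical. Under those productions NUL stays illegal in both versions, tab, LF and CR may still appear literally, and U+007F to U+0084 and U+0086 to U+009F become reference-only in XML 1.1 as well.

```python
def char_status(cp, version):
    """Classify code point cp under XML "1.0" or "1.1".

    Returns "literal" (may appear as-is), "reference-only" (legal only as a
    numeric character reference such as &#x8;), or "illegal" (not allowed,
    not even as a reference).
    """
    upper = (0xE000 <= cp <= 0xFFFD) or (0x10000 <= cp <= 0x10FFFF)
    if version == "1.0":
        # XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | ...
        ok = cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF or upper
        return "literal" if ok else "illegal"
    if version == "1.1":
        # XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        if not (0x1 <= cp <= 0xD7FF or upper):
            return "illegal"
        # RestrictedChar: [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
        restricted = (0x1 <= cp <= 0x8 or cp in (0xB, 0xC)
                      or 0xE <= cp <= 0x1F
                      or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)
        return "reference-only" if restricted else "literal"
    raise ValueError("unsupported XML version: %r" % (version,))


# Backspace: rejected by a 1.0 processor, accepted as &#x8; by a 1.1 one.
print(char_status(0x8, "1.0"), char_status(0x8, "1.1"))
```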
This OS X input-method application naturally "needs" to refer to XML 1.1 characters; and Apple has apparently wired XML 1.1 assumptions into this hidden, one-trick parser. Their sin would be wiped away if they simply required that the input files be marked properly as version="1.1". But, again, that's not my "real problem". I need and want to validate and parse XML 1.1 documents containing character references that are legal only in XML 1.1. I'm willing and anxious to mark the files properly as version="1.1". I don't want to force XML 1.1 on anyone; but it's _exactly_ what I need for my application. There must be some other people out there with the same needs, in particular the people who went out of their way to write the XML 1.1 recommendation. The "real problem" or real nuisance for me is that so few of the open, general-purpose XML tools can handle XML 1.1 at all. Even if I mark my XML files properly as version="1.1", the tools can't handle them because they're limited to XML 1.0. Here's what I've found so far: The Jing validating parser, for Relax NG schemas, seems to validate XML 1.0 vs. XML 1.1 correctly. Nice. http://www.thaiopensource.com/relaxng/jing.html pxdom (http://www.doxdesk.com/software/py/pxdom.html) is a pure Python implementation of DOM, not dependent on Expat, and claims to handle XML 1.0 and XML 1.1 PyLTXML, from the Univ. of Edinburgh, also claims to handle XML 1.0 and XML 1.1. (http://www.ltg.ed.ac.uk/software/xml/) With pxdom or PyLTXML (still to be tested) it would appear that I can do what I need to do, using real XML 1.1, and not have to resort to any workarounds. I'd _prefer_ to use pulldom or perhaps Ogbuji's very attractive binderytools.pushbind(). If I were half as dedicated to XML 1.1 as Veillard and Ogbuji are to XML in general, I'd roll up my sleeves and contribute to the development rather than just begging. :) Thanks again to all those working on XML tools, Ken
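For the testing Ken has in mind, a quick probe can tell whether a given parse function accepts a version="1.1" document containing a character reference that only XML 1.1 allows. This sketch assumes nothing about the toolkit except that its parser can be wrapped as a callable taking a string; the probe document, its element and attribute names, and the function name are hypothetical. The stdlib parser used in the example is expat-based, so, as Fredrik and Fred noted earlier in the thread, it should fail the probe.

```python
import xml.etree.ElementTree as ET

# Hypothetical probe: an XML 1.1 document that references U+0008 (Backspace),
# which XML 1.1 allows only as a character reference and XML 1.0 never allows.
PROBE_DOC = '<?xml version="1.1"?><key output="&#x8;">&#x8;</key>'


def accepts_xml11_charrefs(parse):
    """Return True if parse(PROBE_DOC) completes without raising."""
    try:
        parse(PROBE_DOC)
    except Exception:
        # Toolkits raise different exception types; any failure counts.
        return False
    return True


print("xml.etree.ElementTree (expat):", accepts_xml11_charrefs(ET.fromstring))
```

Passing the probe only shows that the reference is accepted; a stricter test would also check that the parsed text equals "\x08" and that the same reference is rejected when the declaration says version="1.0".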