From faassen at infrae.com Tue Feb 1 20:03:53 2005
From: faassen at infrae.com (Martijn Faassen)
Date: Tue Feb 1 20:03:55 2005
Subject: [XML-SIG] SOAPpy streaming base64
In-Reply-To: <200501311627.14973.erik@cq2.nl>
References: <200501311627.14973.erik@cq2.nl>
Message-ID: <41FFD299.1030902@infrae.com>

Erik J. Groeneveld wrote:
> I am new to this list. I am developing a web site that harvests OAI
> repositories using the oai-mph protocol, and uploads the records to a
> indexing service using SOAPpy.

Just in case you hadn't seen it yet, have you seen Infrae's oaipmh
module? Our software stack does much more than that module (including
indexing using Zope and CMS integration), but it may be interesting to
you. The stuff is all open source.

The python module:

http://www.infrae.com/download/oaipmh/

Our stack of OAI stuff:

http://www.infrae.com/products/oaipack

Regards,

Martijn

From prasad_st at beceem.com Wed Feb 2 10:40:00 2005
From: prasad_st at beceem.com (Prasad PS)
Date: Wed Feb 2 10:40:10 2005
Subject: [XML-SIG] Re: Could somebody help me?
Message-ID:

Hi,

Using the code below, I have created a file "TempView.xml". Could
anybody tell me how to append another "Employee" to the existing xml
file? Here's the snippet I did to create an xml file.

import getopt
import os
import string
import sys
import xml.dom.minidom
from xml.dom.minidom import Node
from xml.dom import minidom
from xml.dom.ext.reader.Sax2 import FromXmlStream
from xml.dom.ext.reader import Sax2
from xml.dom.ext import PrettyPrint
from xml.dom.DOMImplementation import implementation
import xml.sax.writer
import xml.utils

class LogView:
    def __init__(self):
        self.LogViewFile = open("TempView.xml",'w')
        self.document = implementation.createDocument(None,None,None)
        self.logViews = self.document.createElement("EmpDetails")
        self.document.appendChild(self.logViews)

    def createViewFile(self):
        # 'doc' was never defined in the snippet as posted; it has to be
        # the document created in __init__
        doc = self.document
        self.logViews.appendChild(doc.createTextNode("\n "))
        logdetail = doc.createElement("Address")
        self.logViews.appendChild(logdetail)
        logdetail.appendChild(doc.createTextNode("\n "))
        tcidNode = doc.createElement("Name")
        tcidNode.appendChild(doc.createTextNode("Prasad"))
        logdetail.appendChild(tcidNode)
        logdetail.appendChild(doc.createTextNode("\n "))
        grpNode = doc.createElement("Age")
        grpNode.appendChild(doc.createTextNode("28"))
        logdetail.appendChild(grpNode)
        logdetail.appendChild(doc.createTextNode("\n "))

    def finalStep(self):
        t = self.document.createTextNode("\n")
        self.logViews.appendChild(t)
        PrettyPrint(self.document, self.LogViewFile)
        self.LogViewFile.write("\n")

Prasad.p.s.

-----Original Message-----
From: Uche Ogbuji [mailto:Uche.Ogbuji@fourthought.com]
Sent: Friday, January 28, 2005 8:43 PM
To: Prasad PS
Cc: XML-SIG
Subject: RE: [XML-SIG] Re: Could somebody help me?

On Fri, 2005-01-28 at 15:48 +0530, Prasad PS wrote:
> Sure, here is the code
>
> In the code below, what I am doing is - I am opening an xml file and
> appending a node to the root document. Then I add this root document to
> the xml file
> fp = open (string.strip(self.cnfDtls.GetLogFilePath()), 'w')
> xml.dom.ext.PrettyPrint(doc, self.xmlFile)
> self.xmlFile.write("\n")
> fp.close().

So you tried the first choice (PyXML) rather than the second (Amara).
OK. You were not clear on that.

Your first problem is that you're using
xml.dom.ext.reader.FromXmlStream rather than

from xml.dom import minidom
doc = minidom.parse(string.strip(self.cnfDtls.GetLogFilePath()))
...
doc.toprettyxml()

(rather than xml.dom.ext.PrettyPrint)

That's the fault of the PyXML docs, which should really be updated.

Side question: you mean you're appending a node to the document
element, right? Not the root document. The latter would result in an
invalid XML document entity. In the code you posted, it looks as if you
only append to subsidiary nodes, so that should be OK.

Even using 4DOM, your general approach should work, and I've used it
oftentimes before (in the far-off past), with no problem, so I wonder:
Are you sure self.xmlFile is "empty" at the point of the
xml.dom.ext.PrettyPrint?

If so, I suggest you whittle down a test case that reveals the apparent
bug, and post data and complete, runnable code (preferably after
switching to minidom). If it seems a clear bug, you can use the PyXML
bug tracker.

--
Uche Ogbuji
Fourthought, Inc.
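
The minidom route suggested above can be sketched roughly as follows.
This is only an illustration written for current Python 3, not code
from the thread: the helper name append_employee and the
Employee/Name/Age layout are assumptions taken from the wording of the
question.

```python
from xml.dom import minidom


def append_employee(path, name, age):
    """Hypothetical helper: parse an existing file, add one <Employee>
    under the document element, and write the result back."""
    doc = minidom.parse(path)
    root = doc.documentElement  # append here, not to the Document node

    employee = doc.createElement("Employee")  # assumed element name
    for tag, value in (("Name", name), ("Age", age)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(str(value)))
        employee.appendChild(child)
    root.appendChild(employee)

    # Rewrite the whole file; the XML declaration records the encoding.
    with open(path, "w", encoding="utf-8") as out:
        doc.writexml(out, encoding="utf-8")
    return doc
```

The point of the sketch is the one made in the side question: the new
node goes under the document element, and the file is rewritten as a
whole rather than appended to.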
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
UBL 1.0 - http://www-106.ibm.com/developerworks/xml/library/x-think28.html
Manage XML collections with XAPI - http://www-106.ibm.com/developerworks/xml/library/x-xapi.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html

From Chandra.Reddy01 at ca.com Wed Feb 2 12:54:00 2005
From: Chandra.Reddy01 at ca.com (Reddy, Chandra B)
Date: Wed Feb 2 12:54:05 2005
Subject: [XML-SIG] import Error No module named ext.reader.Sax2
Message-ID: <16C3BD3BBB0FA04D967519E9FED1E8C0840144@inhyms21.ca.com>

Hi,

When I try to import xml.dom.ext.reader.Sax2 I get the following
error. Can anyone help me solve this problem?

from xml.dom.ext.reader.Sax2 import FromXmlStream,

ImportError: No module named ext.reader.Sax2

Thanks & Regards,
B. Chandra Reddy

From fredrik at pythonware.com Wed Feb 2 18:50:33 2005
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Feb 2 18:51:14 2005
Subject: [XML-SIG] Re: import Error No module named ext.reader.Sax2
References: <16C3BD3BBB0FA04D967519E9FED1E8C0840144@inhyms21.ca.com>
Message-ID:

"Reddy, Chandra B" wrote:

> When I try to import the xml.dom.ext.reader.Sax2 I am getting the
> following error.Can any one help me how to solve this problem.
>
> from xml.dom.ext.reader.Sax2 import FromXmlStream,
>
> ImportError: No module named ext.reader.Sax2

have you installed the PyXML extension?

http://pyxml.sourceforge.net/
http://pyxml.sourceforge.net/topics/howto/section-install.html

From jedp at ilm.com Wed Feb 2 18:59:49 2005
From: jedp at ilm.com (Jed Parsons)
Date: Wed Feb 2 18:59:56 2005
Subject: [XML-SIG] chaining sax handlers
Message-ID: <20050202095949.T11196@ilm.com>

Hi, all,

I would like to do with sax processors what I can do with the
document() function in xslt, namely include other documents into the
one that's being parsed.

Here's a sample handler, and three xml files. This approach seems to
work for simple cases, but appears to break the innards of the handler
(described below):

# ----------------------------------------------------------------------
# chaining handler

import xml.sax
import xml.sax.handler

class FooHandler(xml.sax.handler.ContentHandler):

    def characters(self, data):
        print data

    def include_proc(self, href):
        # parse the included document with a second parser that feeds
        # this same handler
        filter = xml.sax.make_parser()
        filter.setContentHandler(self)
        filter.parse(href)

    def startElement(self, name, attrs):
        if name == 'include':
            self.include_proc(attrs.get('href'))

# ----------------------------------------------------------------------
# some xml files to work with:

# file1.xml:
# This is file1

# file2.xml:
# This is file2 Back in file2 again after include

# file3.xml:
# This is file3.

# ----------------------------------------------------------------------
# results (with whitespace removed):

>>> filter = xml.sax.make_parser()
>>> handler = FooHandler()
>>> filter.setContentHandler(handler)
>>> filter.parse('file1.xml')
This is file1
This is file3.
>>> filter.parse('file2.xml')
This is file2
This is file3.
Back in file2 again after include
>>>

So this seems to work in a simple case. w00t!

But in a more involved handler, I get errors like "weakly-referenced
object no longer exists" when I try to access the document locator
after re-entering. Can anyone tell me what I'm doing wrong?
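
One possible explanation, offered as a guess rather than a confirmed
diagnosis: each parser hands its own locator to the handler through
setDocumentLocator(), so the nested parse overwrites the outer locator,
and the nested locator only holds a weak reference to a parser that
goes away once the include is done. A minimal sketch of saving and
restoring the locator around the nested parse, written for current
Python 3 and the stdlib expat driver; the class name IncludingHandler
and the open_href callable are hypothetical:

```python
import io
import xml.sax
import xml.sax.handler


class IncludingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical variant of FooHandler that keeps the outer locator."""

    def __init__(self, open_href):
        super().__init__()
        self.open_href = open_href  # assumed callable: href -> byte stream
        self.chunks = []            # character data from all documents
        self.lines = []             # outer line numbers seen after includes

    def characters(self, data):
        self.chunks.append(data)

    def startElement(self, name, attrs):
        if name == "include":
            outer_locator = self._locator  # stored by setDocumentLocator()
            nested = xml.sax.make_parser()
            nested.setContentHandler(self)
            try:
                nested.parse(self.open_href(attrs.get("href")))
            finally:
                # The nested parse replaced the locator; restore the outer one.
                self._locator = outer_locator
            self.lines.append(self._locator.getLineNumber())
```

An alternative design would give the nested parser a separate handler
object that forwards events, so that the two locators never share an
attribute in the first place.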

Many thanks for any help,
Jed

--
Jed Parsons / Industrial Light + Magic : 415.448.2974
grep(do{for(ord){$o+=$_&7;grep(vec($j,+$o++,1)=1,5..($_>>3||print"$j\n"))}},
(split(//,"))*))2+29*2:.*4:1A1+9,1))2*:..)))2*:31.-1)4131)1))2*:\7Glug!")));

From uche.ogbuji at fourthought.com Wed Feb 2 23:57:04 2005
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Wed Feb 2 23:57:30 2005
Subject: [XML-SIG] ANN: Amara XML Toolkit 0.9.4
Message-ID: <1107385024.4527.3.camel@borgia>

http://uche.ogbuji.net/tech/4Suite/amara
ftp://ftp.4suite.org/pub/Amara/

Changes in this release:

* Add binderytools.type_inference rule which automatically converts
  XML nodes to native Python objects such as int, float and datetime
* Improve threading and signal behavior of pushdom and pushbind
* Add support for attributes() method on nodes. Can now call
  Ft.Xml.Domlette.PrettyPrint on bindery nodes
* Add lazy attributes support by default.
  amara.binderytools.preserve_attribute_details rule now obsolete
  XPath always supports attribute access, now
* rename prefixes node property to xmlns_prefixes
* Update demos and tests
* Add CherryPy demo (CherryPy rocks: http://www.cherrypy.org/)
* Bug fixes

The new binderytools.type_inference is similar to what's popularly
called "XML marshalling":

TYPE_MIX = """\
5 2003-01-30T17:48:07.848769Z good
"""
rules=[binderytools.type_inference()]
doc = binderytools.bind_string(TYPE_MIX, rules=rules)
doc.a.a1 == 1 #type int
doc.a.b.b1 == 2.1 #type float
doc.a.c.c1 == datetime.datetime(2005, 1, 31) #type datetime.

So wherever it's reasonable to interpret an XML node as one of these
simple Python types, this new rule will work them naturally into the
data binding.

Amara XML Toolkit is a collection of Python tools for XML processing--
not just tools that happen to be written in Python, but tools built from
the ground up to use Python idioms and take advantage of the many
advantages of Python.

Amara builds on 4Suite [http://4Suite.org], but whereas 4Suite focuses
more on literal implementation of XML standards in Python, Amara
focuses on Pythonic idiom. It provides tools you can trust to conform
with XML standards without losing the familiar Python feel.

The components of Amara are:

* Bindery: data binding tool (a very Pythonic XML API)
* Scimitar: implementation of the ISO Schematron schema language for
  XML; converts Schematron files to Python scripts
* domtools: set of tools to augment Python DOMs
* saxtools: set of tools to make SAX easier to use in Python
* Flextyper: user-defined datatypes in Python for XML processing

There's a lot in Amara, but here are highlights:

Amara Bindery: XML as easy as py
--------------------------------

Based on the retired project Anobind, but updated to use SAX rather
than DOM to create bindings. Bindery reads an XML document and returns
a data structure of Python objects corresponding to the vocabulary used
in the XML document, for maximum clarity.

Bindery turns the document

What do you mean "bleh" But I was looking for argument

Into a set of objects such that you can write

binding.monty.python.spam

In order to get the value "eggs" or

binding.monty.python[1]

In order to get the value "But I was looking for argument".

There are other such tools for Python, and what makes Anobind unique is
that it's driven by a very declarative rules-based system for binding
XML to the Python data. You can register rules that are triggered by
XPattern expressions specialized binding behavior. It includes XPath
support and supports mutation.

Bindery is very efficient, using SAX to generate bindings.

Scimitar: Schematron for Python
--------------------------------

Merged in from a separate project, Scimitar is an implementation of ISO
Schematron that compiles a Schematron schema into a Python validator
script.

You typically use scimitar in two phases.
Say you have a schematron schema schema1.stron and you want to validate
multiple XML files against it, instance1.xml, instance2.xml,
instance3.xml.

First you run schema1.stron through the scimitar compiler script,
scimitar.py:

scimitar.py schema1.stron

The generated file, schema1.py, can be used to validate XML instances:

python schema1.py instance1.xml

Which emits a validation report.

Amara DOM Tools: giving DOM a more Pythonic face
------------------------------------------------

DOM came from the Java world, hardly the most Pythonic API possible.
Some DOM-like implementations such as 4Suite's Domlettes mix in some
Pythonic idiom. Amara DOM Tools goes even further.

Amara DOM Tools feature pushdom, similar to xml.dom.pulldom, but easier
to use. It also includes Python generator-based tools for DOM
processing, and a function to return an XPath location for any DOM
node.

Amara SAX Tools: SAX without the brain explosion
------------------------------------------------

Tenorsax (amara.saxtools.tenorsax) is a framework for "linearizing" SAX
logic so that it flows more naturally, and needs a lot less state
machine wizardry.

License
-------

Amara is open source, provided under the 4Suite variant of the Apache
license. See the file COPYING for details.

Installation
------------

Amara requires Python 2.3 or more recent and 4Suite 1.0a4 or more
recent. Make sure these are installed, unpack Amara to a convenient
location and run

python setup.py install

--
Uche Ogbuji
Fourthought, Inc.

From uche.ogbuji at fourthought.com Fri Feb 4 07:23:21 2005
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Fri Feb 4 07:23:29 2005
Subject: [XML-SIG] Article on converting WordNet to XML using Python
Message-ID: <1107498201.4527.45.camel@borgia>

Thought I should mention it, since it's not in a spot where you'd
usually find about Python/XML.

http://www.ibm.com/developerworks/xml/library/x-think29.html

--
Uche Ogbuji
Fourthought, Inc.

From uche.ogbuji at fourthought.com Fri Feb 4 18:20:56 2005
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Fri Feb 4 18:21:00 2005
Subject: [XML-SIG] XBEL resource page updates
In-Reply-To: <41FE8261.4020705@v.loewis.de>
References: <1106898215.8243.44.camel@borgia> <41FCA80D.4050709@v.loewis.de>
 <1107096130.8243.172.camel@borgia> <41FD23AB.1020302@v.loewis.de>
 <1107182375.8243.194.camel@borgia> <41FE8261.4020705@v.loewis.de>
Message-ID: <1107537656.4527.74.camel@borgia>

On Mon, 2005-01-31 at 20:09 +0100, "Martin v. Löwis" wrote:
> Uche Ogbuji wrote:
> > Well, I've done the last few Web page updates, anyway, and I'm already
> > set up as a developer. Besides the 1.2 discussion, it's light enough
> > work that I'm willing to take responsibility as XBEL maintainer.
>
> Very good! If I can help with more infrastructure (mailing lists on SF
> or python.org, etc) please let me know.

Other XBEL folks, what do you think of:

* An XBEL SF project of its own
* Its own SF mailing list
* Its own SF home page, file releases, etc.

?

If this sounds good, I'll need some help getting it all set up. My time
is limited. I'm OK making the basic SF project request, and some
initial set-up.

--
Uche Ogbuji
Fourthought, Inc.

From junkc at fh-trier.de Sat Feb 5 11:57:36 2005
From: junkc at fh-trier.de (Christian Junk)
Date: Sat Feb 5 11:57:26 2005
Subject: [XML-SIG] XBEL resource page updates
In-Reply-To: <1107537656.4527.74.camel@borgia>
References: <1106898215.8243.44.camel@borgia> <41FE8261.4020705@v.loewis.de>
 <1107537656.4527.74.camel@borgia>
Message-ID: <200502051157.36582.junkc@fh-trier.de>

On Friday, 4 February 2005 18:20, Uche Ogbuji wrote:
> Other XBEL folks, what do you think of:
>
> * An XBEL SF project of its own
> * Its own SF mailing list
> * Its own SF home page, file releases, etc.
>
> ?
>
> If this sounds good, I'll need some help getting it all set up. My time
> is limited. I'm OK making the basic SF project request, and some
> initial set-up.

Hi!

I think it is a very good idea and this was my intention when I created
the site: http://xbel.webinternals.de

I'm able to help you with the design of the home page.

Regards,
Christian

--
Christian Junk
FH Trier, University of Applied Sciences
Faculty of Design and Applied Computer Science
http://christianjunk.webinternals.de
http://xbel.webinternals.de

From fredrik at pythonware.com Sat Feb 5 14:51:16 2005
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Feb 5 14:51:12 2005
Subject: [XML-SIG] ANN: ElementTidy 1.0 beta 1 (january 3, 2005)
Message-ID:

The ElementTidy library is an add-on to ElementTree that provides an
alternative tree builder that can read (almost) arbitrary HTML, and
turn it into well-formed XHTML element trees.

The ElementTidy library uses a library version of Dave Raggett's HTML
Tidy utility to do the cleanup (source code is included), and does not
rely on external utilities.

The beta 1 release adds improved support for source document encoding,
and more aggressive tidying (producing output also for seriously
malformed HTML).

For downloads and more information, see:

http://effbot.org/downloads#elementtidy
http://effbot.org/zone/element-tidylib.htm

enjoy /F

From Sylvain.Thenault at logilab.fr Mon Feb 7 12:18:48 2005
From: Sylvain.Thenault at logilab.fr (Sylvain Thénault)
Date: Mon Feb 7 12:18:51 2005
Subject: [XML-SIG] prepare_input_source and relative path
Message-ID: <20050207111848.GA4540@logilab.fr>

Hey,

I've been hitting a bug which is already registered as #616431 in the
bug tracker. I find it very annoying and I've patched the function to
make it work before noticing a patch was already available. Is there
any reason to still wait to apply it?

Anyway I've attached to this mail my version of the fix, which fixes
the following cases:

- prepare_input_source('relative.xml', '/base') -> /base/relative.xml
  the sf submitted patch fixes this one too.

- prepare_input_source('file:relative.xml', '/base') ->
  file:/base/relative.xml

this allows having an xml file containing relative system identifiers
such as:

where parse(open('path to my xml file')) should not fail as it
currently does.

If this patch sounds good to you, I can check it in.

--
Sylvain Thénault                               LOGILAB, Paris (France).

http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org

-------------- next part --------------
--- /usr/lib/python2.3/site-packages/_xmlplus/sax/saxutils.py 2004-11-29 13:36:36.000000000 +0100
+++ cvs_work/_xmlplus/sax/saxutils.py 2005-02-07 12:01:42.000000000 +0100
@@ -5,7 +5,7 @@
 $Id: saxutils.py,v 1.35 2004/03/20 07:46:04 fdrake Exp $
 """
 
-import os, urlparse, urllib2, types
+import os, urlparse, urllib, urllib2, types
 import handler
 import xmlreader
 import sys, _exceptions, saxlib
@@ -511,14 +511,24 @@
             source.setByteStream(f)
         if hasattr(f, "name"):
             source.setSystemId(f.name)
-
     if source.getByteStream() is None:
         sysid = source.getSystemId()
-        if os.path.isfile(sysid):
+        # if a base is given, sysid may be relative to it, make the
+        # join before isfile() test
+        if base:
             basehead = os.path.split(os.path.normpath(base))[0]
-            source.setSystemId(os.path.join(basehead, sysid))
-            f = open(sysid, "rb")
+            path = os.path.join(basehead, sysid)
+        else:
+            path = sysid
+        if os.path.isfile(path):
+            source.setSystemId(path)
+            f = open(path, "rb")
         else:
+            # if sysid is an url while base isn't, urljoin will fail, so
+            # insert the protocol identifier into base
+            proto = urlparse.urlparse(sysid)[0]
+            if proto and not urlparse.urlparse(base)[0]:
+                base = '%s:%s' % (proto, urllib.pathname2url(base))
             source.setSystemId(urlparse.urljoin(base, sysid))
             f = urllib2.urlopen(source.getSystemId())

From Uche.Ogbuji at fourthought.com Mon Feb 7 18:04:34 2005
From: Uche.Ogbuji at fourthought.com (Uche Ogbuji)
Date: Mon Feb 7 18:04:43 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <20050207111848.GA4540@logilab.fr>
References: <20050207111848.GA4540@logilab.fr>
Message-ID: <1107795874.4527.140.camel@borgia>

On Mon, 2005-02-07 at 12:18 +0100, Sylvain Thénault wrote:
> Hey,
>
> I've been hitting a bug which is already registered as #616431 in the
> bug
> tracker. I find it very annoying and I've patched the function to
> make it work before noticing a patch was already available. Is there any
> reason to still wait to apply it ?
> Anyway I've joined to this mail my version of the fix, which fix the
> following cases:
>
> - prepare_input_source('relative.xml', '/base') -> /base/relative.xml
>   the sf submitted patch fix this one too.
>
> - prepare_input_source('file:relative.xml', '/base') ->
>   file:/base/relative.xml
>
>
> this allow to have a xml file containing relative system identifiers
> such as:
>
>
>
>
> where parse(open('path to my xml file')) should not fail as it currently
> does.
>
> If this patch sounds good to you, I can check it in.

Wow. I'm always amazed at some of the bugs that have lived on for so
long in PyXML.

Your patch seems fine to me, but there is one area that is probably
worth discussion. I hope Mike Brown has a moment to chip in because
he's an expert at such matters.

For the case of the file: URL scheme (BTW, you might want to consider
replacing your variable name "proto" with "scheme"), it's probably OK
to have

file:///base + file:relative.xml -> file:///base/relative.xml

Since the file scheme's semantics are so wooly. But this wouldn't make
sense if you replaced "file" with "http".

Then there's the matter of a base URI given as

/base

in 4Suite we require all base URIs to be proper base URIs (so they must
at least have a scheme). I think this is a reasonable restriction based
on RFC requirements. Is there a valid user case where there would not
be a proper base URI, anyway?

--
Uche Ogbuji
Fourthought, Inc.

From Sylvain.Thenault at logilab.fr Mon Feb 7 18:17:36 2005
From: Sylvain.Thenault at logilab.fr (Sylvain Thénault)
Date: Mon Feb 7 18:17:40 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <1107795874.4527.140.camel@borgia>
References: <20050207111848.GA4540@logilab.fr>
 <1107795874.4527.140.camel@borgia>
Message-ID: <20050207171736.GA5096@logilab.fr>

On Monday 07 February à 10:04, Uche Ogbuji wrote:
> On Mon, 2005-02-07 at 12:18 +0100, Sylvain Thénault wrote:
> > Hey,
> >
> > I've been hitting a bug which is already registered as #616431 in the
> > bug tracker. I find it very annoying and I've patched the function to
> > make it work before noticing a patch was already available. Is there any
> > reason to still wait to apply it ?
> > Anyway I've joined to this mail my version of the fix, which fix the
> > following cases:
> >
> > - prepare_input_source('relative.xml', '/base') -> /base/relative.xml
> >   the sf submitted patch fix this one too.
> >
> > - prepare_input_source('file:relative.xml', '/base') ->
> >   file:/base/relative.xml
> >
> >
> > this allow to have a xml file containing relative system identifiers
> > such as:
> >
> >
> >
> >
> > where parse(open('path to my xml file')) should not fail as it currently
> > does.
> >
> > If this patch sounds good to you, I can check it in.
>
> Wow. I'm always amazed at some of the bugs that have lived on for so long
> in PyXML.

isn't it...

> Your patch seems fine to me, but there is one area that is probably
> worth discussion. I hope Mike Brown has a moment to chip in because
> he's an expert at such matters.
>
> For the case of the file: URL scheme (BTW, you might want to consider
> replacing your variable name "proto" with "scheme"), it's probably OK to
> have

thanks for fixing my url's vocabulary :)

> file:///base + file:relative.xml -> file:///base/relative.xml
>
> Since the file scheme's semantics are so wooly. But this wouldn't make
> sense if you replaced "file" with "http".

yep. But notice my patch doesn't change anything in that case, which
will then behave according to urlparse.urljoin's behaviour:

>>> urlparse.urljoin('file:///base', 'file:relative.xml')
'file:///relative.xml'
>>> urlparse.urljoin('file:///base', 'http:relative.xml')
'http:relative.xml'

> Then there's the matter of a base URI given as
>
> /base
>
> in 4Suite we require all base URIs to be proper base URIs (so they must
> at least have a scheme). I think this is a reasonable restriction based
> on RFC requirements. Is there a valid user case where there would not
> be a proper base URI, anyway?

always having a proper URI as base sounds like a reasonable restriction
to me too, and I can't see a use case where it would not be one. But we
may have a backward compat problem here if we decide to care about it.
Maybe InputSource.setSystemId could check for scheme presence, and if
there is none add a file: and issue a deprecation warning?

--
Sylvain Thénault                               LOGILAB, Paris (France).

http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org

From uche.ogbuji at fourthought.com Tue Feb 8 00:33:01 2005
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Tue Feb 8 00:33:06 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <20050207171736.GA5096@logilab.fr>
References: <20050207111848.GA4540@logilab.fr>
 <1107795874.4527.140.camel@borgia> <20050207171736.GA5096@logilab.fr>
Message-ID: <1107819181.4527.151.camel@borgia>

On Mon, 2005-02-07 at 18:17 +0100, Sylvain Thénault wrote:
> On Monday 07 February à 10:04, Uche Ogbuji wrote:
> > file:///base + file:relative.xml -> file:///base/relative.xml
> >
> > Since the file scheme's semantics are so wooly. But this wouldn't make
> > sense if you replaced "file" with "http".
>
> yep. But notice my patch doesn't change anything in that case, which
> will so behave according to urlparse.urljoin's behaviour:
>
> >>> urlparse.urljoin('file:///base', 'file:relative.xml')
> 'file:///relative.xml'
> >>> urlparse.urljoin('file:///base', 'http:relative.xml')
> 'http:relative.xml'

Bleah. I guess that's why Mike Brown has had to create fixed versions
of all the Python stdlib URI functions for 4Suite :-)

> > Then there's the matter of a base URI given as
> >
> > /base
> >
> > in 4Suite we require all base URIs to be proper base URIs (so they must
> > at least have a scheme). I think this is a reasonable restriction based
> > on RFC requirements. Is there a valid user case where there would not
> > be a proper base URI, anyway?
>
> always having proper URI as base sounds like a reasonable restriction to
> me too, and I can't see user case where it would not. But we may have
> backward compat problem here if decide to care about it. Maybe
> InputSource.setSystemId could check for scheme presence, and if not add
> a file: and issue a deprecation warning ?

I do like the idea of a deprecation warning for this case, but what
about backwards compat?
The warnings module dates from Python 2.1.

--
Uche Ogbuji
Fourthought, Inc.

From mike at skew.org Tue Feb 8 12:13:01 2005
From: mike at skew.org (Mike Brown)
Date: Tue Feb 8 12:13:17 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <20050207111848.GA4540@logilab.fr>
Message-ID: <200502081113.j18BD1Zw090277@chilled.skew.org>

Sylvain Thénault wrote:
> - prepare_input_source('relative.xml', '/base') -> /base/relative.xml
>   the sf submitted patch fix this one too.

Under no circumstances should '/base' + 'relative.xml' ==
'/base/relative.xml'. It would only be an acceptable result if you had
'/base/' instead of '/base'.

> - prepare_input_source('file:relative.xml', '/base') ->
>   file:/base/relative.xml

Same here. This is incorrect.

> this allow to have a xml file containing relative system identifiers
> such as:
>

(1) 'file:plans.xml' is not a relative URI reference.

(2) The result of merging the reference 'file:plans.xml' with *any*
base URI must be 'file:plans.xml'. RFC 3986 sec. 5 governs this
resolution.

>
>
> where parse(open('path to my xml file')) should not fail as it currently
> does.

Trust me, you'll find that it is much easier to implement RFC 3986 sec.
5 than it is to work around bugs in urllib and urlparse.
I suggest porting Absolutize() and BaseJoin() from 4Suite's Ft.Lib.Uri.

-Mike

From mike at skew.org Tue Feb 8 12:30:53 2005
From: mike at skew.org (Mike Brown)
Date: Tue Feb 8 12:30:57 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <1107819181.4527.151.camel@borgia>
Message-ID: <200502081130.j18BUr3r090319@chilled.skew.org>

Uche Ogbuji wrote:
> Bleah. I guess that's why Mike Brown has had to create fixed versions
> of all the Python stdlib URI functions for 4Suite :-)

Yes. All of the URL functions in stdlib are either undocumented and for
use within stdlib only, or are about 8 years out of date. Or both.

I'm using Ft.Lib.Uri as proving grounds for APIs that I'll eventually
propose for inclusion in urllib2. I don't anticipate making any headway
on such proposals for a while, though.

In Ft.Lib.Uri everything is RFC 3986 compliant (I was tracking
development of the RFC), except for the percent-encoding APIs, which,
like every other, are fraught with various gotchas that I wouldn't want
to have to explain to anyone in any more detail than "everything you
know is wrong" :) I hope to have those looking better "soon" but it
involves some serious brain twisting.

Relevant to this discussion, the API for resolution of URI references
to absolute form -- Ft.Lib.Uri.Absolutize() -- is stable, and the
algorithm it implements is well-defined by the RFC. The algorithm does
not change for different URI schemes; it works the same for 'file' as
for 'http'.

It would not be too hard to copy Absolutize() and BaseJoin() from
Ft.Lib.Uri over into PyXML as a temporary workaround until urllib2 is
knocked into shape. I would just make it and its dependent functions
semi-private, and change the exceptions to ValueErrors.

> > > Then there's the matter of a base URI given as
> > >
> > > /base
> > >
> > > in 4Suite we require all base URIs to be proper base URIs (so they must
> > > at least have a scheme).
> > > I think this is a reasonable restriction based
> > > on RFC requirements. Is there a valid user case where there would not
> > > be a proper base URI, anyway?
> >
> > always having proper URI as base sounds like a reasonable restriction to
> > me too, and I can't see user case where it would not. But we may have
> > backward compat problem here if decide to care about it. Maybe
> > InputSource.setSystemId could check for scheme presence, and if not add
> > a file: and issue a deprecation warning ?

Adding 'file:' blindly can cause difficulties or unexpected results.

These are all very different things:

  'xyz'         - relative URI reference (relative path)
  '/xyz'        - relative URI reference (absolute path)
  'file:xyz'    - absolute URI (undef authority, non-hierarchical path)
  'file:/xyz'   - absolute URI (undef authority, absolute path; dubious usage)
  'file://xyz'  - absolute URI (authority xyz; no path)
  'file:///xyz' - absolute URI (empty authority, absolute path)

And then there's what happens when you start throwing in dot segments
('file:./xyz')... and people guessing at how to convert an OS path into a URI
reference... it gets ugly.

It is better to just check for the presence of a scheme and reject the base if
it doesn't have one. Or, if you can tolerate receiving a result that has no
scheme, prepend a dummy scheme, apply the proper resolution algorithm, and
strip the scheme from the result.
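The six forms listed above can be told apart mechanically. The following is a
small sketch, assuming the generic component regex from RFC 3986 Appendix B
and the scheme grammar from sec. 3.1; the names components and has_scheme are
illustrative only and are not the Ft.Lib.Uri API.

```python
import re

# Appendix B of RFC 3986, truncated to the first three components.
_SPLIT = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)")

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def components(ref):
    """Return (scheme, authority, path); None means 'undefined'."""
    return _SPLIT.match(ref).groups()


def has_scheme(ref):
    """True only when the reference starts with a syntactically valid scheme."""
    return _SCHEME.match(ref) is not None


if __name__ == "__main__":
    for ref in ("xyz", "/xyz", "file:xyz", "file:/xyz",
                "file://xyz", "file:///xyz"):
        print("%-14r scheme=%r authority=%r path=%r"
              % ((ref,) + components(ref)))
```

Note that an undefined authority (None) and an empty authority ('') come out
differently, which is exactly the distinction between 'file:/xyz' and
'file:///xyz'. The scheme test also shows why looking for any ':' is not
enough: has_scheme('a/b:c') is False.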
Again this may not give the results that the user expected, but IMHO there's
no need to give the user what they expect when what they expect is wrong :)

-Mike

From Sylvain.Thenault at logilab.fr Tue Feb 8 16:42:10 2005
From: Sylvain.Thenault at logilab.fr (Sylvain Thénault)
Date: Tue Feb 8 16:42:13 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <200502081113.j18BD1Zw090277@chilled.skew.org>
References: <20050207111848.GA4540@logilab.fr> <200502081113.j18BD1Zw090277@chilled.skew.org>
Message-ID: <20050208154210.GA4113@logilab.fr>

On Tuesday 08 February à 04:13, Mike Brown wrote:
> Sylvain Thénault wrote:
> > - prepare_input_source('relative.xml', '/base') -> /base/relative.xml
> >   the sf submitted patch fix this one to.
>
> Under no circumstances should '/base' + 'relative.xml' == '/base/relative.xml'.
> It would only be an acceptable result if you had '/base/' instead of '/base'.
>
> > - prepare_input_source('file:relative.xml', '/base') ->
> >   file:/base/relative.xml
>
> Same here. This is incorrect.

Yes, sorry for the wrong examples. Anyway, in PyXML the base argument is
usually something like '/base/starthere.xml', so the patch fixes this case
correctly. However, you're right that it should probably be fixed to handle
the "no trailing slash" problem.

> > this allow to have a xml file containing relative system identifiers
> > such as:
> >
>
> (1) 'file:plans.xml' is not a relative URI reference.
>
> (2) The result of merging the reference 'file:plans.xml' with *any* base URI
> must be 'file:plans.xml'. RFC 3986 sec. 5 governs this resolution.
>
> >
> > where parse(open('path to my xml file')) should not fail as it currently
> > does.
>
> Trust me, you'll find that it is much easier to implement RFC 3986 sec. 5 than
> it is to work around bugs in urllib and urlparse. I suggest porting Absolutize()
> and BaseJoin() from 4Suite's Ft.Lib.Uri.

I guess you're right.
I wrote this patch because it was fixing my problem. Now if it doesn't take too much time to have every cases correctly fixed by implementing RFC 3986, I may take some time to do so or to help having it done. And if parts of the job is already done in 4suite, that's great. However what's in 4suite, what's not and need to be implemented is not yet clear to me. -- Sylvain Th?nault LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org From mike at skew.org Wed Feb 9 03:01:35 2005 From: mike at skew.org (Mike Brown) Date: Wed Feb 9 03:01:44 2005 Subject: [XML-SIG] prepare_input_source and relative path In-Reply-To: <20050208154210.GA4113@logilab.fr> Message-ID: <200502090201.j1921ZwP092976@chilled.skew.org> Sylvain Th?nault wrote: > I guess you're right. I wrote this patch because it was fixing my > problem. Now if it doesn't take too much time to have every cases > correctly fixed by implementing RFC 3986, I may take some time to do so > or to help having it done. And if parts of the job is already done in > 4suite, that's great. However what's in 4suite, what's not and need to > be implemented is not yet clear to me. The current version of Ft.Lib.Uri is here: http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/Uri.py?view=markup [1] If you see "rfc2396bis" in the doc strings, you may safely interpret them to mean "RFC 3986". The functions that you should look at are the following: MakeUrllibSafe(uriRef) ====================== This exists in order to convert a proper URI reference into one that can be handled by urllib.urlopen(). It does the following: 1. If the reference contains an Internationalized Domain Name, recodes it so that it is resolvable. (Py 2.3+ only) 2. Strips the fragment component, if any. 3. Ensures that the reference is a byte string, not unicode. 4. On Windows, assumes that the first ':' appearing in the path component is part of a drivespec, and converts it to '|'. 

If you port this function, the reference to PercentDecode() may be replaced
with urllib.unquote(), but you must move the byte string check (#3, above) to
occur before calling unquote. The references to the functions SplitUriRef and
UnsplitUriRef can be replaced with urlsplit() and urlunsplit() from the
urlparse module.


Absolutize(uriRef, baseUri)
===========================
This does strict merging of a URI reference and a base URI. The base URI
*must* be absolute (must have a scheme). If you port this function, the
UriException may be replaced with a ValueError, and SplitUriRef &
UnsplitUriRef may be replaced with their urlparse equivalents, as
mentioned above. The RemoveDotSegments function must also be ported and
should be made semi-private because it is not for general use. I've
implemented it using two segment stacks, as alluded to in the spec,
rather than the explicit string-walking algorithm, which would be too
inefficient.


BaseJoin(base, UriRef)
======================
This does lenient merging of a base URI and a URI reference (note the
argument order is different than that of Absolutize). It allows the base
URI to be a relative reference. In such cases, we use a dummy scheme
(we don't say "assume 'file:'" because the spec says all schemes must be
resolved the same), run it through Absolutize, and then remove the scheme
from the result. If you port this function, you will need to port the
IsAbsolute function, which just checks to see if the URI has a scheme.
I prefer to use a regex for this, as it is fast and accurate (':' can
appear in more than one place in a URI reference, so it is not safe to
assume that its presence means there is a scheme).

-Mike

[1] ...well, not really.
The current version is on my hard drive :) From Sylvain.Thenault at logilab.fr Wed Feb 9 15:39:38 2005 From: Sylvain.Thenault at logilab.fr (Sylvain =?iso-8859-1?Q?Th=E9nault?=) Date: Wed Feb 9 16:26:19 2005 Subject: [XML-SIG] prepare_input_source and relative path In-Reply-To: <200502090201.j1921ZwP092976@chilled.skew.org> References: <20050208154210.GA4113@logilab.fr> <200502090201.j1921ZwP092976@chilled.skew.org> Message-ID: <20050209143938.GA4381@logilab.fr> On Tuesday 08 February ? 19:01, Mike Brown wrote: > Sylvain Th?nault wrote: > > I guess you're right. I wrote this patch because it was fixing my > > problem. Now if it doesn't take too much time to have every cases > > correctly fixed by implementing RFC 3986, I may take some time to do so > > or to help having it done. And if parts of the job is already done in > > 4suite, that's great. However what's in 4suite, what's not and need to > > be implemented is not yet clear to me. > > The current version of Ft.Lib.Uri is here: > http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/Uri.py?view=markup [1] > > If you see "rfc2396bis" in the doc strings, you may safely interpret > them to mean "RFC 3986". > > > The functions that you should look at are the following: > > MakeUrllibSafe(uriRef) > ====================== > This exists in order to convert a proper URI reference into one that > can be handled by urllib.urlopen(). It does the following: > 1. If the reference contains an Internationalized Domain Name, > recodes it so that it is resolvable. (Py 2.3+ only) > 2. Strips the fragment component, if any. > 3. Ensures that the reference is a byte string, not unicode. > 4. On Windows, assumes that the first ':' appearing in the path > component is part of a drivespec, and converts it to '|'. > > If you port this function, the reference to PercentDecode() may be replaced > with urllib.unquote(), but you must move the byte string check (#3, above) to > occur before calling unquote. 
The references to the functions SplitUriRef and > UnsplitUriRef can be replaced with urlsplit() and urlunsplit() from the > urlparse module. > > > Absolutize(uriRef, baseUri) > =========================== > This does strict merging of a URI reference and a base URI. The base URI > *must* be absolute (must have a scheme). If you port this function, the > UriException may be replaced with a ValueError, and SplitUriRef & > UnsplitUriRef may be replaced with their urlparse equivalents, as > mentioned above. The RemoveDotSegments function must also be ported and > should be made semi-private because it is not for general use. I've > implemented it using two segment stacks, as alluded to in the spec, > rather than the explicit string-walking algorithm that would be too > inefficient. > > > BaseJoin(base, UriRef) > ====================== > This does lenient merging of a base URI and a URI reference (note the > argument order is different than that of Absolutize). It allows the base > URI to be a relative reference. In such cases, we use a dummy scheme > (we don't say "assume 'file:' because the spec says all schemes must be > resolved the same), run it through Absolutize, and then remove the scheme > from the result. If you port this function, you will need to port the > IsAbsolute function, which just checks to see if the URI has a scheme. > I prefer to use a regex for this, as it is fast and accurate (':' can > appear in more than one place in a URI reference, so it is not safe to > assume that its presence means there is a scheme). thanks a lot. Actually almost all the work is already done right there. Here is what I've worked on. Once we'll reach a consensus, I'll add that to pyxml. 
So I've attached to this mail:

- a light version of 4Suite's Uri.py including the following functions:
  SplitUriRef, UnsplitUriRef (it was really less annoying to use those
  two functions than the equivalent urllib ones), Absolutize,
  MakeUrllibSafe, _RemoveDotSegments, BaseJoin, GetScheme and
  IsAbsolute. With the presented solution, the last three are not used
  and could be removed, but I've kept them in for now.
  All the tests for Absolutize from 4Suite still pass.

- a modified version of saxutils, expecting the Uri module above to be
  in the _xmlplus directory (i.e. importable as xml.Uri). I've refactored
  prepare_input_source to ease testing of the URI merging stuff.

- a unittest file, which includes some test cases for the URI merging
  function. Please take a look at the existing test cases to check that
  everything looks fine to you. If you have other cases to add, please let
  me know (or maybe I can add this file to CVS first). Notice that
  to run the tests, you should have a "quotes.xml" file in the same
  directory as the test file (there is one in the test directory of
  PyXML). As a bonus, I've converted the escape function test from
  test_utils into a unittest in the same file.

Anyway, having SplitUriRef/UnsplitUriRef replace
urlparse.urlsplit/urlunsplit, and Absolutize or BaseJoin replace
urlparse.urljoin, would definitely be the right thing.

-- 
Sylvain Thénault                               LOGILAB, Paris (France).

http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_saxutils.py
Type: text/x-python
Size: 2062 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20050209/631a585c/test_saxutils-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Uri.py Type: text/x-python Size: 16423 bytes Desc: not available Url : http://mail.python.org/pipermail/xml-sig/attachments/20050209/631a585c/Uri-0001.py -------------- next part -------------- A non-text attachment was scrubbed... Name: saxutils.py Type: text/x-python Size: 24925 bytes Desc: not available Url : http://mail.python.org/pipermail/xml-sig/attachments/20050209/631a585c/saxutils-0001.py From john at nmt.edu Wed Feb 9 20:46:19 2005 From: john at nmt.edu (John W. Shipman) Date: Wed Feb 9 20:46:28 2005 Subject: [XML-SIG] Generating XML from scratch Message-ID: I've been all through python.org site and carefully read ``Python & XML'' by Jones and Drake, but I can't find any body of practice about the generation of XML files from scratch. All the existing practice seems to be about reading or modifying existing XML documents. I want to capture data from a GUI or other source and store it as an XML document. I've been doing this for a while, using the minidom in 2.2, but apparently all the (admittedly undocumented) features I was using went away in 2.3, and the new methods are a lot uglier. This means that when we upgrade to 2.3 or 2.4 locally, I have to go back and rewrite a lot of existing, working scripts. Here's the document I wrote that describes how I did it in 2.2: http://www.nmt.edu/tcc/help/pubs/pyxml/ Look under the last chapter, ``Creating a document from scratch.'' I use the constructors such as Document() and Element() in that minidom version, but now they want me to use the .createElement() and other factory methods from the Document object. This is much more awkward. Either I have to pass the Document object to any piece of code that needs to create an Element object, or the code needs to dig the .ownerDocument attribute out of some handy Node object so it has access to the factory methods. There's one situation where even this approach doesn't work. 
I have a script that generates a document fragment that gets included in an
XHTML page using server-side includes. I can't instantiate it as a Document
object, because then I would get an processing instruction at the top, which
is not something I want inside the element of a web page.

Previously I was getting around this problem by using a DocumentFragment
object, but such objects in the minidom have an .ownerDocument attribute set
to None. So I have to instantiate an empty Document object *just* to get
access to the factory methods. This is what we software old-timers call a
KLUGE.

Comments? Is there something out there I don't know about?

Best regards,
John Shipman (john@nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (505) 835-5950, http://www.nmt.edu/~john
  ``Let's go outside and commiserate with nature.'' --Dave Farber

From fredrik at pythonware.com Wed Feb 9 20:59:45 2005
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Feb 9 20:59:49 2005
Subject: [XML-SIG] Re: Generating XML from scratch
References: 
Message-ID: 

John W. Shipman wrote:

> I want to capture data from a GUI or other source and store
> it as an XML document.
>
> I've been doing this for a while, using the minidom in 2.2, but
> apparently all the (admittedly undocumented) features I was using
> went away in 2.3, and the new methods are a lot uglier. This
> means that when we upgrade to 2.3 or 2.4 locally, I have to go
> back and rewrite a lot of existing, working scripts.

> Comments? Is there something out there I don't know about?

rule 1: don't use DOM, if you can avoid it.
rule 2: you can always avoid it.

some alternatives:

    http://www.xml.com/pub/a/2003/04/09/py-xml.html
    http://www.xml.com/pub/a/2003/10/15/py-xml.html
    http://effbot.org/zone/xml-writer.htm
    http://effbot.org/zone/element-index.htm

etc.
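
For readers wondering what "avoiding DOM" can look like for the
generate-from-scratch case, here is a minimal sketch using ElementTree, one
of the alternatives linked above. It assumes the xml.etree.ElementTree module
that ships with later Python versions (at the time of this thread ElementTree
was a separate download). The helper record_element and the tag and field
names are hypothetical.

```python
import xml.etree.ElementTree as ET


def record_element(tag, fields):
    """Build <tag> with one child element per (name, value) pair."""
    node = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(node, name).text = str(value)
    return node


# No Document object is needed: elements are created directly and
# appended wherever they belong.
root = ET.Element("records")
root.append(record_element("record", {"name": "Ada", "age": 36}))

# Serializing an element this way yields no XML declaration at the top,
# which suits a fragment meant for inclusion in another page.
fragment = ET.tostring(root, encoding="unicode")
print(fragment)
```

Because elements are plain objects, code that builds part of a tree does not
need a factory or an owner document passed in.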
From frans.englich at telia.com Wed Feb 9 21:19:27 2005 From: frans.englich at telia.com (Frans Englich) Date: Wed Feb 9 21:11:41 2005 Subject: [XML-SIG] Re: Generating XML from scratch In-Reply-To: References: Message-ID: <200502092019.27937.frans.englich@telia.com> On Wednesday 09 February 2005 19:59, Fredrik Lundh wrote: > rule 1: don't use DOM, if you can avoid it. What's wrong with DOM? What makes one want to avoid the DOM interface? Do you know any docs/links which discuss this further? How tied to Python is your opinion on DOM? Cheers, Frans From hostetlerm at gmail.com Wed Feb 9 21:19:52 2005 From: hostetlerm at gmail.com (Mike Hostetler) Date: Wed Feb 9 21:20:28 2005 Subject: [XML-SIG] Generating XML from scratch In-Reply-To: References: Message-ID: On Wed, 9 Feb 2005 12:46:19 -0700 (MST), John W. Shipman wrote: > I've been all through python.org site and carefully read ``Python > & XML'' by Jones and Drake, but I can't find any body of practice > about the generation of XML files from scratch. All the existing > practice seems to be about reading or modifying existing XML > documents. I want to capture data from a GUI or other source and > store it as an XML document. [snip] Here is a snippet of how I did it with the Sax parser a few years ago. At the time, minidom didn't do all I needed, but in Py > 2.1 minidom has matured . . . from xml.dom.ext.reader import Sax dom = Sax.FromXml("") assert dom.documentElement.tagName == 'root' -- Mike Hostetler http://www.binary.net/thehaas From rsalz at datapower.com Wed Feb 9 21:22:09 2005 From: rsalz at datapower.com (Rich Salz) Date: Wed Feb 9 21:21:15 2005 Subject: [XML-SIG] Re: Generating XML from scratch In-Reply-To: <200502092019.27937.frans.englich@telia.com> References: <200502092019.27937.frans.englich@telia.com> Message-ID: <420A70F1.8070903@datapower.com> > What makes one want to avoid the DOM interface? Do you know any docs/links > which discuss this further? How tied to Python is your opinion on DOM? 
I think that last question is the key point. DOM is very much un-python. If you are "just" generating XML, then you will probably go faster if you use things that naturally fit into the python programming idioms. /r$ -- Rich Salz, Chief Security Architect DataPower Technology http://www.datapower.com XS40 XML Security Gateway http://www.datapower.com/products/xs40.html XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html From frans.englich at telia.com Wed Feb 9 21:35:38 2005 From: frans.englich at telia.com (Frans Englich) Date: Wed Feb 9 21:27:52 2005 Subject: [XML-SIG] Re: Generating XML from scratch In-Reply-To: <420A70F1.8070903@datapower.com> References: <200502092019.27937.frans.englich@telia.com> <420A70F1.8070903@datapower.com> Message-ID: <200502092035.38585.frans.englich@telia.com> On Wednesday 09 February 2005 20:22, Rich Salz wrote: > > What makes one want to avoid the DOM interface? Do you know any > > docs/links which discuss this further? How tied to Python is your opinion > > on DOM? > > I think that last question is the key point. > > DOM is very much un-python. I would say so too; it follows the usual "function interfacing" which IMO is strongly present in languages like Java and C++. I'm wondering if there's any disadvantages beyond its un-pithonity(now _that's_ duck typing), and/or if DOM should be avoided in other languages too. Cheers, Frans From mike at skew.org Thu Feb 10 00:06:31 2005 From: mike at skew.org (Mike Brown) Date: Thu Feb 10 00:07:32 2005 Subject: [XML-SIG] prepare_input_source and relative path In-Reply-To: <20050209143938.GA4381@logilab.fr> Message-ID: <200502092306.j19N6VJR003704@chilled.skew.org> Sylvain Th?nault wrote: > thanks a lot. Actually almost all the work is already done right there. > Here is what I've worked on. Once we'll reach a consensus, I'll add that > to pyxml. 
So I've joined to this mail: > > - a light version of 4Suite Uri.py including the following functions: > SplitUriRef, UnsplitUriRef (it was really less annoying to use those > two functions than the equivalent urllib's ones), Absolutize, > MakeUrllibSafe, _RemoveDotSegments, BaseJoin, GetScheme and > IsAbsolute. With the presented solution, the 3 last ones are not used > and could be removed, but I've kept them in for now. Doc strings will need to be updated to reflect the promotion from "rfc2396bis" to RFC 3986. Also there's one place where I have "RFC (newline)2396bis" which should also be fixed. In MakeUrllibSafe, you should catch the UnicodeError that could result from the attempt to force unicode to a byte string: if isinstance(uri, unicode): try: uri = uri.encode('us-ascii') except UnicodeError: raise ValueError("uri %r must consist of ASCII characters." % uri) > Every tests for Absolutize from 4suite are still passing. I forgot to point you to my tests. They do not use unittest, so they would need to be adapted, but it would be easy since the comparisons are string-in to string-out (or exception), and I've labeled them pretty clearly: http://cvs.4suite.org/viewcvs/4Suite/test/Lib/test_uri.py?view=markup As you will see, they are fairly comprehensive. > - a modified version of saxutils, expecting the Uri module above to be > in the _xmlplus directory (ie importable as xml.Uri). I've refactored > prepare_input_source to ease testing of the URI merging stuff. You might want to grep for "emacspymodestink" in your code. :) > - a unittest file, which include some test cases for the URI merging > function. Please take a look at the existant test cases to check > everything looks fine to you. If you have other case to add, please let > me know (or maybe can I add this file to the cvs first). Notice that > to run the tests, you should have a "quotes.xml" file in the same > directory as the test file (there is one in the test directory of > pyxml). 
> As a bonus, I've converted the escape function test from
> test_utils into a unittest in the same file.
>
> Anyway, having SplitUriRef/UnsplitUriRef replacing
> urlparse.urlsplit/urlunsplit and Absolutize or BaseJoin replacing
> urlparse.urljoin would definitly be the right thing.

On python-dev in Sep 2004, I was discussing with Martin v. Löwis what
principles we think should be embraced by urlparse, urllib and urllib2. He
feels that we should simultaneously shoot for both URI and IRI support
according to the RFCs (3986 and 3987), with unicode arguments being assumed to
be IRIs.

I would hold off on any stdlib changes until the APIs can be discussed in
more detail.

From Sylvain.Thenault at logilab.fr Thu Feb 10 11:02:17 2005
From: Sylvain.Thenault at logilab.fr (Sylvain Thénault)
Date: Thu Feb 10 11:02:20 2005
Subject: [XML-SIG] prepare_input_source and relative path
In-Reply-To: <200502092306.j19N6VJR003704@chilled.skew.org>
References: <20050209143938.GA4381@logilab.fr> <200502092306.j19N6VJR003704@chilled.skew.org>
Message-ID: <20050210100217.GE3811@logilab.fr>

On Wednesday 09 February à 16:06, Mike Brown wrote:
> Sylvain Thénault wrote:
> > thanks a lot. Actually almost all the work is already done right there.
> > Here is what I've worked on. Once we'll reach a consensus, I'll add that
> > to pyxml. So I've joined to this mail:
> >
> > - a light version of 4Suite Uri.py including the following functions:
> >   SplitUriRef, UnsplitUriRef (it was really less annoying to use those
> >   two functions than the equivalent urllib's ones), Absolutize,
> >   MakeUrllibSafe, _RemoveDotSegments, BaseJoin, GetScheme and
> >   IsAbsolute. With the presented solution, the 3 last ones are not used
> >   and could be removed, but I've kept them in for now.
>
> Doc strings will need to be updated to reflect the promotion from
> "rfc2396bis" to RFC 3986. Also there's one place where I have "RFC
> (newline)2396bis" which should also be fixed.

done.
However, does sections of rfc 2396bis match sections of rfc 3986 ? > In MakeUrllibSafe, you should catch the UnicodeError that could result > from the attempt to force unicode to a byte string: > > if isinstance(uri, unicode): > try: > uri = uri.encode('us-ascii') > except UnicodeError: > raise ValueError("uri %r must consist of ASCII characters." % uri) done. > > Every tests for Absolutize from 4suite are still passing. > > I forgot to point you to my tests. They do not use unittest, so they > would need to be adapted, but it would be easy since the comparisons > are string-in to string-out (or exception), and I've labeled them > pretty clearly: > > http://cvs.4suite.org/viewcvs/4Suite/test/Lib/test_uri.py?view=markup > > As you will see, they are fairly comprehensive. I did found them. As I said I've run relevant tests again the restricted version of Uri.py and all of them pass. > > - a modified version of saxutils, expecting the Uri module above to be > > in the _xmlplus directory (ie importable as xml.Uri). I've refactored > > prepare_input_source to ease testing of the URI merging stuff. > > You might want to grep for "emacspymodestink" in your code. :) right, forgot that :) And I've also added the following modification to prepare_input_source since I send it here: @@ -510,7 +510,7 @@ source = xmlreader.InputSource() source.setByteStream(f) if hasattr(f, "name"): - source.setSystemId(f.name) + source.setSystemId('file:%s' % f.name) if source.getByteStream() is None: sysid = absolute_system_id(source.getSystemId(), base) source.setSystemId(sysid) > > - a unittest file, which include some test cases for the URI merging > > function. Please take a look at the existant test cases to check > > everything looks fine to you. If you have other case to add, please let > > me know (or maybe can I add this file to the cvs first). 
Notice that > > to run the tests, you should have a "quotes.xml" file in the same > > directory as the test file (there is one in the test directory of > > pyxml). As a bonus, I've converted the escape function test from > > test_utils into a unittest in the same file. did you take a look at those tests ? Sounds good to anyone here ? More tests to add ? > > Anyway, having SplitUriRef/UnsplitUriRef replacing > > urlparse.urlsplit/urlunsplit and Absolutize or BaseJoin replacing > > urlparse.urljoin would definitly be the right thing. > > On python-dev in Sep 2004, I was discussing with Martin v. L?wi swhat > principles we think should be embraced by urlparse, urllib and urllib2. He > feels that we should simultaneously shoot for both URI and IRI support > according to the RFCs (3986 and 3987), with unicode arguments being assumed to > be IRIs. > > I would hold off on any stdlib changes until the APIs can be discussed in > more detail. ok. -- Sylvain Th?nault LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org From mike at skew.org Thu Feb 10 21:15:25 2005 From: mike at skew.org (Mike Brown) Date: Thu Feb 10 21:15:34 2005 Subject: [XML-SIG] prepare_input_source and relative path In-Reply-To: <20050210100217.GE3811@logilab.fr> Message-ID: <200502102015.j1AKFPsR009831@chilled.skew.org> Sylvain Th?nault wrote: > done. However, does sections of rfc 2396bis match sections of rfc 3986 ? Yes. There were only very minor editorial changes in the last drafts before rfc2396bis became RFC 3986. > I did found them. As I said I've run relevant tests again the restricted > version of Uri.py and all of them pass. Ah, OK. I wasn't sure what you meant at first. 
> And I've also added the following modification to > prepare_input_source since I send it here: > > @@ -510,7 +510,7 @@ > source = xmlreader.InputSource() > source.setByteStream(f) > if hasattr(f, "name"): > - source.setSystemId(f.name) > + source.setSystemId('file:%s' % f.name) > if source.getByteStream() is None: > sysid = absolute_system_id(source.getSystemId(), base) > source.setSystemId(sysid) I'm not sure without seeing it in action, but this does not look right to me (the change, as well as its context). I need to look at what it's doing more closely. If you need to be lenient, be lenient with the base URI. When you prepend 'file:' to something, you're making it be absolute, which probably isn't what you wanted, and probably won't be ideal. > did you take a look at those tests ? Not yet, sorry. :) Busy. From postmaster at python.org Fri Feb 11 03:21:22 2005 From: postmaster at python.org (MAILER-DAEMON) Date: Fri Feb 11 03:31:23 2005 Subject: [XML-SIG] Returned mail: see transcript for details Message-ID: <20050211023121.4A9BF1E400E@bag.python.org> The original message was received at Thu, 10 Feb 2005 16:21:22 -1000 from python.org [5.86.142.153] ----- The following addresses had permanent fatal errors ----- xml-sig@python.org ----- Transcript of session follows ----- ... while talking to python.org.: >>> MAIL From:"MAILER-DAEMON" <<< 509 "MAILER-DAEMON" ... Domain blacklisted -------------- next part -------------- Scanner: MMSMTP2.0 The message body part has been replaced with this note. Problem description: Body part: 2 [file.zip] SAV sweep results: A virus was detected. Virus found: W32/MyDoom-O Virus found: W32/MyDoom-O condition: virus infection action taken: disinfect condition: virus disinfection failed action taken: replace attachment From john at nmt.edu Fri Feb 11 01:27:16 2005 From: john at nmt.edu (John W. 
Shipman) Date: Fri Feb 11 05:07:59 2005 Subject: [XML-SIG] More Pythonic XML creation Message-ID: Thanks for all the replies to my inquiry about creation of documents from scratch using the DOM. I've rewritten my document "Python and the XML DOM" to conform to the way the Python 2.3 xml.dom.minidom module wants you to use factory methods: see section 6, `Creating a document from scratch' in this document: http://www.nmt.edu/tcc/help/pubs/pyxml/ Also included in this document is a module that makes document creation more Pythonic. It is described in section 7 of the document, and section 8 contains a "literate programming" presentation of the code of the new module. I would greatly appreciate any comments. Best regards, John Shipman (john@nmt.edu), Applications Specialist, NM Tech Computer Center, Speare 119, Socorro, NM 87801, (505) 835-5950, http://www.nmt.edu/~john ``Let's go outside and commiserate with nature.'' --Dave Farber From Sylvain.Thenault at logilab.fr Fri Feb 11 09:46:31 2005 From: Sylvain.Thenault at logilab.fr (Sylvain =?iso-8859-1?Q?Th=E9nault?=) Date: Fri Feb 11 09:46:35 2005 Subject: [XML-SIG] prepare_input_source and relative path In-Reply-To: <200502102015.j1AKFPsR009831@chilled.skew.org> References: <20050210100217.GE3811@logilab.fr> <200502102015.j1AKFPsR009831@chilled.skew.org> Message-ID: <20050211084631.GA3844@logilab.fr> On Thursday 10 February ? 13:15, Mike Brown wrote: > Sylvain Th?nault wrote: > > > And I've also added the following modification to > > prepare_input_source since I send it here: > > > > @@ -510,7 +510,7 @@ > > source = xmlreader.InputSource() > > source.setByteStream(f) > > if hasattr(f, "name"): > > - source.setSystemId(f.name) > > + source.setSystemId('file:%s' % f.name) > > if source.getByteStream() is None: > > sysid = absolute_system_id(source.getSystemId(), base) > > source.setSystemId(sysid) > > I'm not sure without seeing it in action, but this does not look > right to me (the change, as well as its context). 
I need to look at > what it's doing more closely. > > If you need to be lenient, be lenient with the base URI. When you > prepend 'file:' to something, you're making it be absolute, which > probably isn't what you wanted, and probably won't be ideal. To be honest, I don't feel really good with this either. What I wished to solve here is the case where prepare_input_source get a opened file as argument, which is a really common case since it's happen each time we do parser.parse(open('myfile.xml')). If the parsed file contains any reference to external resource, it's system id will be used as base uri, and that may be a problem if it's just as in the example 'myfile.xml'. Maybe adding "file:" if exists(abspath(f.name)) would be a good compromise. > > did you take a look at those tests ? > > Not yet, sorry. :) Busy. ok. It's just that since this is a sensitive part of pyxml, I wished to get some code review before to check anything in. Now I guess that other people on this list may also have an opinion on this... ;) -- Sylvain Th?nault LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org From dkuhlman at cutter.rexx.com Fri Feb 11 18:48:26 2005 From: dkuhlman at cutter.rexx.com (Dave Kuhlman) Date: Fri Feb 11 18:48:28 2005 Subject: [XML-SIG] Re: Generating XML from scratch In-Reply-To: <420A70F1.8070903@datapower.com> References: <200502092019.27937.frans.englich@telia.com> <420A70F1.8070903@datapower.com> Message-ID: <20050211174826.GA29929@cutter.rexx.com> On Wed, Feb 09, 2005 at 03:22:09PM -0500, Rich Salz wrote: > >What makes one want to avoid the DOM interface? Do you know any docs/links > >which discuss this further? How tied to Python is your opinion on DOM? > > I think that last question is the key point. > > DOM is very much un-python. OK, I'll bite. Which characteristics make DOM un-pythonic? Or are we just talking about a general ick-factor here? 
Maybe the minidom API is somewhat of a mess, but then so are XML and the XML documents that minidom must be able to represent. > > If you are "just" generating XML, then you will probably go faster if > you use things that naturally fit into the python programming idioms. Which things are those "that naturally fit into the python programming idioms"? Is it the writer idiom stuff? My understanding is that ElementTree occupies the same niche and satisfies the same needs as the Python implementation of DOM (for example, minidom). I'd like to see some sort of comparison of minidom and ElementTree. Are there some real reasons why I should choose ElementTree over minidom for future work? Is there a consensus that we should be using ElementTree instead of minidom? If so, it seems that this should be mentioned in the standard "Python Library Reference" sections on DOM and minidom rather than being a "secret known but to a few" on this list. Dave -- Dave Kuhlman http://www.rexx.com/~dkuhlman From rsalz at datapower.com Fri Feb 11 21:26:22 2005 From: rsalz at datapower.com (Rich Salz) Date: Fri Feb 11 21:25:20 2005 Subject: [XML-SIG] Re: Generating XML from scratch In-Reply-To: <20050211174826.GA29929@cutter.rexx.com> References: <200502092019.27937.frans.englich@telia.com> <420A70F1.8070903@datapower.com> <20050211174826.GA29929@cutter.rexx.com> Message-ID: <420D14EE.80902@datapower.com> > OK, I'll bite. Which characteristics make DOM un-pythonic? Quick reply, with some items off the top of my head. XML says that the order of attributes and namespace nodes doesn't matter, just the name and value. This maps naturally to Python dictionary. On the other hand, the order of an element's children does matter. This maps naturally to a Python list. Starting from those two basic concepts, think about how simpler many things become -- no addBefore, addAfter, etc, just standard Python list slices. Much other stuff can be thrown out. 
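As a rough illustration of the dictionary-and-list mapping described above, here is a minimal sketch using ElementTree, which a later paragraph of this message names as working this way. It assumes the modern standard-library module xml.etree.ElementTree (at the time of this thread ElementTree was a separate download), and the element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Attributes are held in a plain dict (root.attrib); order does not matter.
root = ET.Element("EmpDetails", {"dept": "tools"})
root.attrib["site"] = "lab"
attr_names = sorted(root.attrib)

# Children behave like a list: append, insert, index, slice, delete.
for name in ("Prasad", "Uche", "Rich"):
    ET.SubElement(root, "Employee").text = name

newcomer = ET.Element("Employee")
newcomer.text = "Dave"
root.insert(1, newcomer)   # list-style insert instead of DOM insertBefore()
del root[0]                # list-style delete instead of DOM removeChild()

first_two = [e.text for e in root[:2]]
xml_text = ET.tostring(root, encoding="unicode")
```

The point is only that no insertBefore/removeChild calls are needed; whether this suits a given application is a separate question.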
The element object should have a "resolve_qname" method which takes a
'foo:bar' qname and returns a (nsuri, localname) tuple. This removes the
need for many of the DOM get.../get...NS routines.

    for k, v in curelt.attributes.items():
        (ns, localname) = curelt.resolve_qname(k)
        ... now look at all attributes, by qname, ns, or localname

And so on.

>>If you are "just" generating XML, then you will probably go faster if
>>you use things that naturally fit into the python programming idioms.

Don't call complex APIs. Instead set attributes on objects. That seems
to be how ElementTree and amara work, for example.

But I think that generating XML is not a very hard or interesting
problem, and that it is very application specific -- i.e., it depends
too much on what the local object that you are trying to serialize is.
But I'm apparently in a real minority here, so don't listen to me. :)

/r$

-- 
Rich Salz, Chief Security Architect
DataPower Technology http://www.datapower.com
XS40 XML Security Gateway http://www.datapower.com/products/xs40.html
XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html

From and-xml at doxdesk.com Sat Feb 12 15:37:36 2005
From: and-xml at doxdesk.com (Andrew Clover)
Date: Sat Feb 12 15:33:40 2005
Subject: [XML-SIG] More Pythonic XML creation
In-Reply-To:
References:
Message-ID: <420E14B0.6040804@doxdesk.com>

John W. Shipman wrote:

> I've rewritten my document "Python and the XML DOM" to
> conform to the way the Python 2.3 xml.dom.minidom module
> wants you to use factory methods: see section 6, `Creating
> a document from scratch'

Actually you haven't quite gone far enough. Document and DocumentType
should themselves be created from factory methods. You're supposed to
use minidom.getDOMImplementation(), or the 'implementation' property of
an existing Document, to get a DOMImplementation object, then call
createDocument() and createDocumentType() on it.

These constructors work for now, but can't be guaranteed; there are no
constructors in the W3C DOM standard itself.
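The factory-method route described above might look roughly like this. It is a minimal sketch against the standard-library xml.dom.minidom; the XHTML doctype identifiers and the element names are only illustrative choices, not anything taken from John's document.

```python
from xml.dom import minidom

# Get a DOMImplementation instead of calling Document() directly.
impl = minidom.getDOMImplementation()

# DocumentType and Document both come from factory methods on it.
doctype = impl.createDocumentType(
    "html",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)
doc = impl.createDocument(None, "html", doctype)

# Every other node is created through the Document's own factory methods.
root = doc.documentElement
body = doc.createElement("body")
body.appendChild(doc.createTextNode("hello"))
root.appendChild(body)

# An existing Document also exposes a DOMImplementation.
impl_from_doc = doc.implementation
```

Nodes made this way carry the right ownerDocument, which is what the plain constructors fail to set up.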
Using the constructor for DocumentFragment, on the other hand, could well cause errors (like what you get with Element etc). Use Document.createDocumentFragment(). > I would greatly appreciate any comments. >> XML (eXtended Markup Language) and SGML (Standard General Markup Language) eXtensible Markup Language and Standard Generalized Markup Language. Possibly you wanted less nit-picky comments, but you've got to take what you can get eh? -- Andrew Clover mailto:and@doxdesk.com http://www.doxdesk.com/ From and-xml at doxdesk.com Sat Feb 12 15:49:12 2005 From: and-xml at doxdesk.com (Andrew Clover) Date: Sat Feb 12 15:45:16 2005 Subject: [XML-SIG] Generating XML from scratch In-Reply-To: References: Message-ID: <420E1768.1090402@doxdesk.com> John W. Shipman wrote: > Look under the last chapter, ``Creating a document from scratch.'' > I use the constructors such as Document() and Element() in that > minidom version, but now they want me to use the .createElement() > and other factory methods from the Document object. They always did - it's the DOM standard. Minidom was just less fussy about it a long time ago; you're more likely to get errors about it these days. > I can't instantiate it as a Document object, because then I > would get an processing instruction at the top, > which is not something I want inside the element of > a web page. That's not a good reason not to use a Document. An XML serializer *may* allow a Document to be output without the XML declaration (pxdom supports the DOM Level 3 LS parameter 'xml-declaration', for example). Alternatively, just serialise the Document.documentElement or its children instead of the Document object itself. > Previously I was getting around this problem by using a > DocumentFragment object, but such objects in the minidom have an > .ownerDocument attribute set to None. A DocumentFragment still has to have an owner Document. 
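Both points above (serialising the document element rather than the Document, and creating fragments through the Document) can be sketched with the standard-library minidom as follows. The element names are placeholders, and this assumes a current Python rather than the 2.2/2.3 versions discussed in the thread.

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "div", None)
root = doc.documentElement

# A fragment made by the factory method knows its owner Document.
frag = doc.createDocumentFragment()
for text in ("one", "two"):
    p = doc.createElement("p")
    p.appendChild(doc.createTextNode(text))
    frag.appendChild(p)
owner_ok = frag.ownerDocument is doc

# Appending a fragment moves its children into the target node.
root.appendChild(frag)

with_decl = doc.toxml()       # serialising the Document includes the XML declaration
without_decl = root.toxml()   # serialising the root element alone does not
```

So a page fragment can still be built inside a real Document and emitted without the declaration.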
Minidom DocumentFragments only have a null ownerDocument if you have constructed them wrong, using minidom's own private constructors. -- Andrew Clover mailto:and@doxdesk.com http://www.doxdesk.com/ From fredrik at pythonware.com Sat Feb 12 18:22:22 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Feb 12 18:23:49 2005 Subject: [XML-SIG] Re: Re: Generating XML from scratch References: <200502092019.27937.frans.englich@telia.com><420A70F1.8070903@datapower.com> <20050211174826.GA29929@cutter.rexx.com> Message-ID: Dave Kuhlman wrote: > Maybe the minidom API is somewhat of a mess, but then so are > XML and the XML documents that minidom must be able to represent. that's a popular myth. other popular myths are that XML parsers have to be slow, because they process Unicode; that XML DOM representations have to use tons of memory, because they have to; and that tools that don't fully support all kinds of XML processing are unusable for any kind of XML processing. > I'd like to see some sort of comparison of minidom and ElementTree. > Are there some real reasons why I should choose ElementTree over > minidom for future work? that's a "python vs. perl" or "static typing vs. dynamic typing" question. I suggest trying it, to see if it fits your brain, and the kind of XML programming you do. > Is there a consensus that we should be using ElementTree instead > of minidom? if you ask toolmakers, they'll tell you that their own tool is the best one. if you ask users, you may get more consistent answers ;-) From wunder at verity.com Sat Feb 12 20:07:43 2005 From: wunder at verity.com (Walter Underwood) Date: Sat Feb 12 20:07:41 2005 Subject: [XML-SIG] Re: Re: Generating XML from scratch In-Reply-To: References: <200502092019.27937.frans.englich@telia.com><420A70F1.8070903@datapower.com> <20050211174826.GA29929@cutter.rexx.com> Message-ID: <01980FCE930EFC5AC55A83BD@adsl-64-166-133-243.dsl.snfc21.pacbell.net> Might want to use something designed for generating XML. 
The DOM is really designed for representing it, which isn't quite the same thing. GenX: Python wrapper: wunder --On February 12, 2005 6:22:22 PM +0100 Fredrik Lundh wrote: > Dave Kuhlman wrote: > >> Maybe the minidom API is somewhat of a mess, but then so are >> XML and the XML documents that minidom must be able to represent. > > that's a popular myth. > > other popular myths are that XML parsers have to be slow, because they > process Unicode; that XML DOM representations have to use tons of > memory, because they have to; and that tools that don't fully support all > kinds of XML processing are unusable for any kind of XML processing. > >> I'd like to see some sort of comparison of minidom and ElementTree. >> Are there some real reasons why I should choose ElementTree over >> minidom for future work? > > that's a "python vs. perl" or "static typing vs. dynamic typing" question. I suggest > trying it, to see if it fits your brain, and the kind of XML programming you do. > >> Is there a consensus that we should be using ElementTree instead >> of minidom? > > if you ask toolmakers, they'll tell you that their own tool is the best one. if you > ask users, you may get more consistent answers ;-) > > > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig > -- Walter Underwood Principal Architect, Verity From and-xml at doxdesk.com Sun Feb 13 10:42:21 2005 From: and-xml at doxdesk.com (Andrew Clover) Date: Sun Feb 13 10:38:25 2005 Subject: [XML-SIG] Generating XML from scratch In-Reply-To: References: Message-ID: <420F20FD.1090908@doxdesk.com> John W. Shipman wrote: > True for Python 2.3, but my principal workstation still has > Python 2.2, and even when I use the factory methods, the > .ownerDocument attribute of the DocumentFragment is None. Ugh, you're right, it's a typo in createDocumentFragment: d = DocumentFragment() d.ownerDoc = self (instead of ownerDocument.) 
Could you perhaps install PyXML on the 2.2 setup? The 2.2 minidom has a
few other bugs you might also wish to avoid.

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From noreply at sourceforge.net Mon Feb 14 12:04:00 2005
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Feb 14 12:04:02 2005
Subject: [XML-SIG] [ pyxml-Patches-1122297 ] ASP.NET interoperability
Message-ID:

Patches item #1122297, was opened at 2005-02-14 12:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=1122297&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: lode leroy (lode_leroy)
Assigned to: Nobody/Anonymous (nobody)
Summary: ASP.NET interoperability

Initial Comment:
# this patch adds composition of the SOAPAction header
# as expected by ASP.NET

--- SOAPpy/Client.py-0.11.6 2005-02-14 11:58:17.858539200 +0100
+++ SOAPpy/Client.py 2005-02-14 12:06:57.876288000 +0100
@@ -317,7 +317,10 @@
         if self.soapaction:
             sa = self.soapaction
         else:
-            sa = ns + name
+            if ns and self.config.DotNetSoapAction:
+                sa = ns + name
+            else:
+                sa = name
 
         if hd: # Get header
             if type(hd) == TupleType:

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=1122297&group_id=6473

From noreply at sourceforge.net Mon Feb 14 23:05:51 2005
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Feb 14 23:05:54 2005
Subject: [XML-SIG] [ pyxml-Bugs-1122726 ] Cannot find ext.reader module
Message-ID:

Bugs item #1122726, was opened at 2005-02-14 17:05
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=1122726&group_id=6473

Category: DOM
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: wildsolution (mfjacobs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cannot find ext.reader module

Initial Comment:
I was able to compile and install Python 2.4. I built and installed
PyXML-0.8.4. No errors. When I try to import the Sax2 module I get an
error. Does anyone have any suggestions on why this is happening?
Security permissions, pathing? I do not have much experience with XML;
any suggestions would be appreciated.

Thanks,
Mike

Python 2.4 (#1, Feb 14 2005, 12:27:33)
[GCC 3.2.2] on irix6
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import sys
>>> from xml.dom.ext.reader import Sax2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named ext.reader
>>>

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=1122726&group_id=6473

From Uche.Ogbuji at fourthought.com Wed Feb 16 19:55:59 2005
From: Uche.Ogbuji at fourthought.com (Uche Ogbuji)
Date: Wed Feb 16 19:56:03 2005
Subject: [XML-SIG] Generating XML from scratch
In-Reply-To:
References:
Message-ID: <1108580159.27858.24.camel@borgia>

On Wed, 2005-02-09 at 12:46 -0700, John W. Shipman wrote:
> I've been all through python.org site and carefully read ``Python
> & XML'' by Jones and Drake, but I can't find any body of practice
> about the generation of XML files from scratch. All the existing
> practice seems to be about reading or modifying existing XML
> documents.
I want to capture data from a GUI or other source and > store it as an XML document. http://www.xml.com/pub/a/2002/11/13/py-xml.html http://www.xml.com/pub/a/2003/03/12/py-xml.html http://www.xml.com/pub/a/2003/10/15/py-xml.html http://software.translucentcode.org/pygenx/ etc. DOM can be a pretty awkward way to generate XML. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://4Suite.org http://fourthought.com Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286 Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html Manage XML collections with XAPI - http://www-106.ibm.com/developerworks/xml/library/x-xapi.html Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html From rtomayko at gmail.com Wed Feb 16 20:39:59 2005 From: rtomayko at gmail.com (Ryan Tomayko) Date: Wed Feb 16 20:40:14 2005 Subject: [XML-SIG] Generating XML from scratch In-Reply-To: <1108580159.27858.24.camel@borgia> References: <1108580159.27858.24.camel@borgia> Message-ID: <6e9a74e6bb0cc0df3236f750980fae84@gmail.com> You may also want to consider using an XML aware template language: Ryan On Feb 16, 2005, at 1:55 PM, Uche Ogbuji wrote: > On Wed, 2005-02-09 at 12:46 -0700, John W. Shipman wrote: >> I've been all through python.org site and carefully read ``Python >> & XML'' by Jones and Drake, but I can't find any body of practice >> about the generation of XML files from scratch. All the existing >> practice seems to be about reading or modifying existing XML >> documents. I want to capture data from a GUI or other source and >> store it as an XML document. 
> > http://www.xml.com/pub/a/2002/11/13/py-xml.html > http://www.xml.com/pub/a/2003/03/12/py-xml.html > http://www.xml.com/pub/a/2003/10/15/py-xml.html > http://software.translucentcode.org/pygenx/ > > etc. > > DOM can be a pretty awkward way to generate XML. > > > -- > Uche Ogbuji Fourthought, Inc. > http://uche.ogbuji.net http://4Suite.org http://fourthought.com > Use CSS to display XML - > http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html > Introducing the Amara XML Toolkit - > http://www.xml.com/pub/a/2005/01/19/amara.html > Be humble, not imperial (in design) - > http://www.adtmag.com/article.asp?id=10286 > Querying WordNet as XML - > http://www.ibm.com/developerworks/xml/library/x-think29.html > Manage XML collections with XAPI - > http://www-106.ibm.com/developerworks/xml/library/x-xapi.html > Default and error handling in XSLT lookup tables - > http://www.ibm.com/developerworks/xml/library/x-tiplook.html > Packaging XSLT lookup tables as EXSLT functions - > http://www.ibm.com/developerworks/xml/library/x-tiplook2.html > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig > From uche.ogbuji at fourthought.com Wed Feb 16 21:45:28 2005 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Wed Feb 16 21:45:42 2005 Subject: [XML-SIG] Generating XML from scratch In-Reply-To: <6e9a74e6bb0cc0df3236f750980fae84@gmail.com> References: <1108580159.27858.24.camel@borgia> <6e9a74e6bb0cc0df3236f750980fae84@gmail.com> Message-ID: <1108586728.27858.46.camel@borgia> On Wed, 2005-02-16 at 14:39 -0500, Ryan Tomayko wrote: > You may also want to consider using an XML aware template language: > > For my own preference, I really dislike hybrid XML template languages. They seem hacky and too much of a blurring of the layers to me. I prefer a chain of Python feeding XSLT every time. But to each his own, of course. -- Uche Ogbuji Fourthought, Inc. 
http://uche.ogbuji.net http://4Suite.org http://fourthought.com Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286 Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html Manage XML collections with XAPI - http://www-106.ibm.com/developerworks/xml/library/x-xapi.html Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html From junkc at fh-trier.de Thu Feb 17 13:43:23 2005 From: junkc at fh-trier.de (Christian Junk) Date: Thu Feb 17 13:43:05 2005 Subject: [XML-SIG] XBEL resource page updates In-Reply-To: <1107537656.4527.74.camel@borgia> References: <1106898215.8243.44.camel@borgia> <41FE8261.4020705@v.loewis.de> <1107537656.4527.74.camel@borgia> Message-ID: <200502171343.23559.junkc@fh-trier.de> Am Freitag, 4. Februar 2005 18:20 schrieb Uche Ogbuji: > [..] > Of this sounds good, I'll need some help getting it all set up. My time > is limited. I'm OK making the basic SF project request, and some > initial set-up. Hi, there! I would like to ask, if there is any interim development? Can we help? What is the next step? 
Regards,
Christian

-- 
Christian Junk
FH Trier, University of Applied Sciences
Faculty of Design and Applied Computer Science
http://christianjunk.webinternals.de
http://xbel.webinternals.de

From Uche.Ogbuji at fourthought.com Fri Feb 18 00:45:43 2005
From: Uche.Ogbuji at fourthought.com (Uche Ogbuji)
Date: Fri Feb 18 00:46:10 2005
Subject: [XML-SIG] XBEL resource page updates
In-Reply-To: <200502171343.23559.junkc@fh-trier.de>
References: <1106898215.8243.44.camel@borgia> <41FE8261.4020705@v.loewis.de> <1107537656.4527.74.camel@borgia> <200502171343.23559.junkc@fh-trier.de>
Message-ID: <1108683943.27858.71.camel@borgia>

On Thu, 2005-02-17 at 13:43 +0100, Christian Junk wrote:
> On Friday, 4 February 2005 18:20, Uche Ogbuji wrote:
> > [..]
> > Of this sounds good, I'll need some help getting it all set up. My time
> > is limited. I'm OK making the basic SF project request, and some
> > initial set-up.
>
> Hi, there!
>
> I would like to ask, if there is any interim development? Can we help? What is
> the next step?

Well, I posted the idea, and you and Martin responded positively.
That's not (yet) an overwhelming endorsement, given the number of people
I've seen post on XBEL. I thought it might be better to give people
time to mull it over before embarking on such a potentially disruptive
change.

Maybe I'm being too cautious?

Does anyone think that giving XBEL its own project space is *not* a good
idea?

-- 
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html
Manage XML collections with XAPI - http://www-106.ibm.com/developerworks/xml/library/x-xapi.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html

From Alexandre.Fayolle at logilab.fr Fri Feb 18 08:34:56 2005
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Fri Feb 18 08:34:58 2005
Subject: [XML-SIG] XBEL resource page updates
In-Reply-To: <1108683943.27858.71.camel@borgia>
References: <1106898215.8243.44.camel@borgia> <41FE8261.4020705@v.loewis.de> <1107537656.4527.74.camel@borgia> <200502171343.23559.junkc@fh-trier.de> <1108683943.27858.71.camel@borgia>
Message-ID: <20050218073456.GB7309@crater.logilab.fr>

On Thu, Feb 17, 2005 at 04:45:43PM -0700, Uche Ogbuji wrote:
> On Thu, 2005-02-17 at 13:43 +0100, Christian Junk wrote:
> > On Friday, 4 February 2005 18:20, Uche Ogbuji wrote:
> > > [..]
> > > Of this sounds good, I'll need some help getting it all set up. My time
> > > is limited. I'm OK making the basic SF project request, and some
> > > initial set-up.
> >
> > Hi, there!
> >
> > I would like to ask, if there is any interim development? Can we help? What is
> > the next step?
>
> Well, I posted the idea, and you and Martin responded positively.
> That's not (yet) an overwhelming endorsement, given the number of people
> I've seen post on XBEL. I thought it might be better to give people
> time to mull it over before embarking on such a potentially disruptive
> change.
>
> Maybe I'm being too cautious?
>
> Does anyone think that giving XBEL its own project space is *not* a good
> idea?

I think it would be a *good* idea.

-- 
Alexandre Fayolle LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://mail.python.org/pipermail/xml-sig/attachments/20050218/790f68b7/attachment.pgp

From benn at cenix-bioscience.com Fri Feb 18 13:20:04 2005
From: benn at cenix-bioscience.com (Neil Benn)
Date: Fri Feb 18 13:19:47 2005
Subject: [XML-SIG] SAX Events to DOM Tree
Message-ID: <4215DD74.8020606@cenix-bioscience.com>

Hello,

I have a couple of questions:

I'm looking for some code which takes SAX events and converts them to a
DOM tree. The SAX events don't have a namespace declaration. However,
I'm lost; from this page
(http://pyxml.sourceforge.net/topics/howto/node10.html) I can see:

    dom.ext.reader
        Classes for building DOM trees from various input sources: SAX1
        and SAX2 parsers, htmllib, and directly using Expat.

However, running help from the command line brings back a reader which
states that it can only receive streams of strings (I would expect it to
implement ContentHandler). I also found a class in the ActiveState
cookbook, but it only works with namespaced SAX events. Before I go
about writing something, is there a class which can do what I need?

The next bit is that the XMLGenerator only prints out code in one huge
contiguous sequence of chars; I would like to have something that
pretty-prints (with \t and \n) to the 'file-like' object (aka output
stream ;-)). Again, this seems like a common thing, so I wanted to check
whether there is already a class that does this.

As you can see, I don't like writing code if it already exists in a
standard distribution. I've checked for the API docs with Google (used
'pyxml doc', 'pyxml api doc', 'pyxml api docs', 'pyxml docs') and found
nothing (only howtos and tutorials on XML).

Thanks in advance for your help.
Cheers,

Neil

--
Neil Benn
Senior Automation Engineer
Cenix BioScience
BioInnovations Zentrum
Tatzberg 46
D-01307 Dresden
Germany
Tel : +49 (0)351 4173 154
e-mail : benn@cenix-bioscience.com
Cenix Website : http://www.cenix-bioscience.com

From michael.prilla at iaw.rub.de Fri Feb 18 13:55:43 2005
From: michael.prilla at iaw.rub.de (Michael Prilla)
Date: Fri Feb 18 13:51:52 2005
Subject: [XML-SIG] Trouble installing PyXML
Message-ID: <1A1B1BAC56DD014CBF1E23CA6FBF782F1F6A@exchge-imtm.IAW.RUHR-UNI-BOCHUM.DE>

Hi,

I'm getting several errors while installing PyXML. I tried different versions of PyXML (0.8.1, 0.8.3, 0.8.4) to verify the problem, but the errors are always the same. The first problem arises when I start 'setup.py' with 'python setup.py build', which gives back:

File "sysconfig.py", line 172, in customize_compiler
    cc_cmd = cc + ' ' + opt
TypeError: cannot concatenate 'str' and 'NoneType' objects

I worked around this by checking that none of the parts is None. After this part of the setup, the process starts gcc and stops with the following messages:

extensions/pyexpat.c:2065: warning: excess elements in struct initializer
extensions/pyexpat.c:2065: warning: (near initialization for `handler_info[21]')
extensions/pyexpat.c:2065: warning: excess elements in array initializer
extensions/pyexpat.c:2065: warning: (near initialization for `handler_info')
extensions/pyexpat.c:1998: error: storage size of `handler_info' isn't known
error: command 'gcc' failed with exit status 1

A few lines before that, it produces several warnings and errors:

extensions/pyexpat.c:1664: warning: (near initialization for `Xmlparsetype')
extensions/pyexpat.c:1664: error: parse error before "xmlparse_setattr"
extensions/pyexpat.c:1665: error: `cmpfunc' undeclared here (not in a function)
extensions/pyexpat.c:1665: warning: excess elements in scalar initializer
extensions/pyexpat.c:1665: warning: (near initialization for `Xmlparsetype')
extensions/pyexpat.c:1665: error: parse error before numeric constant
extensions/pyexpat.c:1666: error: `reprfunc' undeclared here (not in a function)
extensions/pyexpat.c:1666: warning: excess elements in scalar initializer
extensions/pyexpat.c:1666: warning: (near initialization for `Xmlparsetype')
extensions/pyexpat.c:1666: error: parse error before numeric constant
extensions/pyexpat.c:1667: warning: excess elements in scalar initializer

This is the point where I can't get the installation any further. I'm working on SuSE Linux 9.1; gcc is 3.3.4 and Python is installed in version 2.3.3. Does anyone have an idea how to get the installation working, or whether it might be a gcc compatibility issue?

--
Michael Prilla
www.imtm-iaw.rub.de

From brian at sweetapp.com Fri Feb 18 15:06:34 2005
From: brian at sweetapp.com (Brian Quinlan)
Date: Fri Feb 18 15:06:44 2005
Subject: [XML-SIG] SAX Events to DOM Tree
In-Reply-To: <4215DD74.8020606@cenix-bioscience.com>
References: <4215DD74.8020606@cenix-bioscience.com>
Message-ID: <4215F66A.9050300@sweetapp.com>

Neil Benn wrote:
> Hello,
>
> I have a couple of questions :
>
> I'm looking for some code which takes SAX Events and converts
> them to a DOM Tree, the sax events don't have a namespace declaration -
> however I'm lost, from this page
> (http://pyxml.sourceforge.net/topics/howto/node10.html) I can see :
>
> dom.ext.reader
> Classes for building DOM trees from various input sources: SAX1 and
> SAX2 parsers, htmllib, and directly using Expat.
>
> However running help from the command line brings back a reader which
> states that it can only receive streams of strings (I would expect it to
> implement ContentHandler). I also found a class on ActiveState
> cookbooks but it only works with name spaced SAX events. Before I go
> about writing something is there a class which can do what I need?

>>> help('xml.dom.ext.reader.Sax')

This module might do what you want.
> The next bit is that the XMLGenerator only prints out code in one
> huge big contiguous sequence of chars, I would like to have something
> that pretty prints (with \t and \n) to the 'file-like' object (aka
> output stream ;-)). Again this seems like a common thing so I was gonna
> check to see if there is already a class that does this?

I don't know anything about XMLGenerator, but the problem that you are likely going to have is that the serializer doesn't know where whitespace can be added without changing the semantics of your document. For example, this:

123

and this:

1 2 3

would generate different DOMs (depending on the whitespace mode). See here:

http://www.w3.org/TR/2000/REC-xml-20001006#sec-white-space

Maybe there is a flag to control this somewhere in the XMLGenerator (whatever that is) API.

Cheers,
Brian

From benn at cenix-bioscience.com Fri Feb 18 16:15:47 2005
From: benn at cenix-bioscience.com (Neil Benn)
Date: Fri Feb 18 16:15:27 2005
Subject: [XML-SIG] XML stuff
Message-ID: <421606A3.2090102@cenix-bioscience.com>

Hello,

Thanks for the response Brian:

---1

Cool - although I tested it straight off, binding to a Reader emitting start/end doc, start/end elements and no characters (both making a characters call with an empty string and not making a characters call at all). Anyways, I get a traceback:

Traceback (most recent call last):
  File "CeLMA\Automation\Parsers\ParsingFramework.py", line 165, in ?
    objParser.parse(objTestFile)
  File "C:\Documents and Settings\benn.CENIX-SCIENCE\My Documents\svnfiles\CeLMA\Automation\Parsers\Implementation\HTDParser.py", line 85, in parse
    self.__startDoc()
  File "C:\Documents and Settings\benn.CENIX-SCIENCE\My Documents\svnfiles\CeLMA\Automation\Parsers\Implementation\HTDParser.py", line 193, in __startDoc
    self.__objHandler.startElement('data', AttributesImpl({}))
  File "C:\PROGRA~1\Python23\Lib\site-packages\_xmlplus\dom\ext\reader\Sax.py", line 73, in startElement
    self._completeTextNode()
  File "C:\PROGRA~1\Python23\Lib\site-packages\_xmlplus\dom\ext\reader\Sax.py", line 52, in _completeTextNode
    if self._currText:
AttributeError: XmlDomGenerator instance has no attribute '_currText'

self.__startDoc() looks like

---
def __startDoc(self):
    self.__objHandler.startDocument()
    self.__objHandler.startElement('data', AttributesImpl({}))
---

How's that for a method!! It looks to me like a characters problem, but I can't call characters without calling startElement. When I get time I'll dig around to look for a solution. In the meantime, I've written a simple version meself.

---2

For the XMLGenerator, there is not a flag in XMLGenerator that I can find - it doesn't appear in the dir, and something like that would be in the dir as I would need to access it. I get the point about the insignificant whitespace, and that is why a pretty print should be an option. Although in most cases people don't care about insignificant whitespace (i.e. whitespace outside of an element); in fact I can't think of a _sensible_ reason to care about insignificant whitespace - can you (it's a Friday afternoon, go on wonder away!)?

Have a good weekend all.
Cheers,

Neil

--
Neil Benn
Senior Automation Engineer
Cenix BioScience
BioInnovations Zentrum
Tatzberg 46
D-01307 Dresden
Germany
Tel : +49 (0)351 4173 154
e-mail : benn@cenix-bioscience.com
Cenix Website : http://www.cenix-bioscience.com

From brian at sweetapp.com Fri Feb 18 16:39:49 2005
From: brian at sweetapp.com (Brian Quinlan)
Date: Fri Feb 18 16:39:51 2005
Subject: [XML-SIG] XML stuff
In-Reply-To: <421606A3.2090102@cenix-bioscience.com>
References: <421606A3.2090102@cenix-bioscience.com>
Message-ID: <42160C45.7050405@sweetapp.com>

Neil Benn wrote:
> For the XMLGenerator, there is not a flag in XMLGenerator that I can
> find - it doesn't appear in the dir and something like that would be in
> the dir as I would need to access it. I get the point about the
> insignificant white space and that is why a pretty print should be an
> option. Although in most cases people don't care about insignificant
> whitespace (i.e. white space outside of an element) in fact I can't
> think of a _sensible_ reason to care about insignificant whitespace -
> can you (it's a Friday afternoon, go on wonder away!)?

But all whitespace (that appears in the DOM) is in an element, at least the document element, so what whitespace do you consider insignificant? For example, in XHTML, these two are different:

Neil Benn

And:

NeilBenn
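
To make the difference concrete, here is a minimal sketch using xml.dom.minidom from the standard library. The element names (`<p>` and `<b>`) are assumptions chosen for illustration and are not taken from the message; the only point is that a single space between two inline elements shows up as an extra text node in the DOM, and that a pretty-printer which inserts tabs and newlines changes the tree in the same way.

```python
from xml.dom import minidom

# Hypothetical markup: the <p> and <b> tags are assumptions for illustration.
WITH_SPACE = "<p><b>Neil</b> <b>Benn</b></p>"
WITHOUT_SPACE = "<p><b>Neil</b><b>Benn</b></p>"


def child_summary(xml_text):
    """Return (kind, value) pairs for the children of the document element."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
        else:
            summary.append(("element", node.tagName))
    return summary


# The space between the two <b> elements becomes its own text node.
print(child_summary(WITH_SPACE))
print(child_summary(WITHOUT_SPACE))

# Pretty-printing inserts whitespace, so re-parsing the pretty output
# yields a tree with additional text nodes.
pretty = minidom.parseString(WITHOUT_SPACE).toprettyxml(indent="\t")
print(child_summary(pretty))
```

Whether the extra text nodes matter depends on the vocabulary: in a data-oriented format they can usually be ignored, while in mixed content such as XHTML they change what is rendered.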

Cheers,
Brian

From Uche.Ogbuji at fourthought.com Fri Feb 18 19:13:22 2005
From: Uche.Ogbuji at fourthought.com (Uche Ogbuji)
Date: Fri Feb 18 19:13:43 2005
Subject: [XML-SIG] SAX Events to DOM Tree
In-Reply-To: <4215DD74.8020606@cenix-bioscience.com>
References: <4215DD74.8020606@cenix-bioscience.com>
Message-ID: <1108750403.16835.69.camel@borgia>

On Fri, 2005-02-18 at 13:20 +0100, Neil Benn wrote:
> Hello,
>
> I have a couple of questions :
>
> I'm looking for some code which takes SAX Events and converts
> them to a DOM Tree

See http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/298343

Available with updates in Amara XML Toolkit [1]

If you don't specify any chunking rules, it turns all the SAX events into one DOM document node.

[1] http://www.xml.com/pub/a/2005/01/19/amara.html

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML - http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html
Manage XML collections with XAPI - http://www-106.ibm.com/developerworks/xml/library/x-xapi.html
Default and error handling in XSLT lookup tables - http://www.ibm.com/developerworks/xml/library/x-tiplook.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html

From mzhangyh at yahoo.com Sun Feb 20 00:21:47 2005
From: mzhangyh at yahoo.com (Michael Zhang)
Date: Sun Feb 20 00:21:50 2005
Subject: [XML-SIG] xml parsing error
Message-ID: <20050219232148.3080.qmail@web53709.mail.yahoo.com>

Hi,

When I used the xml package to parse a document loaded from a server, I got the following error message. Could anybody tell me what's wrong with that?

thanks,

  File "ShowAllData.py", line 143, in ?
    main(sys.argv)
  File "ShowAllData.py", line 117, in main
    win = MainWindow()
  File "ShowAllData.py", line 48, in __init__
    videoInfo = CaMLDocumentParser.getVideoInfo(GenericParser.parse(StringIO(result)))
  File "/home/vraid1/mzhang/CaMLServer3/lib/GenericParser.py", line 39, in parse
    xml.sax.parse (file, g)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
    parser.parse(filename_or_stream)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:0: not well-formed (invalid token)

From and-xml at doxdesk.com Sun Feb 20 01:32:02 2005
From: and-xml at doxdesk.com (Andrew Clover)
Date: Sun Feb 20 09:32:03 2005
Subject: [XML-SIG] xml parsing error
In-Reply-To: <20050219232148.3080.qmail@web53709.mail.yahoo.com>
References: <20050219232148.3080.qmail@web53709.mail.yahoo.com>
Message-ID: <4217DA82.6020201@doxdesk.com>

Michael Zhang wrote:
> Could anybody tell what's wrong with that?

Not without seeing the file you're trying to parse. sax.handler says the document isn't well-formed - perhaps it's right?

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From mike at skew.org Sun Feb 20 20:02:20 2005
From: mike at skew.org (Mike Brown)
Date: Sun Feb 20 20:02:23 2005
Subject: [XML-SIG] xml parsing error
In-Reply-To: <4217DA82.6020201@doxdesk.com>
Message-ID: <200502201902.j1KJ2Kq5001673@chilled.skew.org>

Andrew Clover wrote:
> Michael Zhang wrote:
>
> > Could anybody tell what's wrong with that?
>
> Not without seeing the file you're trying to parse.
> sax.handler says the document isn't well-formed - perhaps it's right?
>

I think the error message said 1:0, which means it saw a problem at the very beginning of the document. Perhaps the file is empty or begins with something other than "<" or a BOM. He should check the file for extraneous whitespace at the top.
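
Following Mike Brown's point about the 1:0 position in the "xml parsing error" thread, a quick way to narrow such an error down is to look at the first bytes of the data before handing it to the parser. The sketch below is only an illustration using the standard library's xml.sax; the helper names describe_start and parse_error_position are hypothetical, and only a UTF-8 BOM is recognised, which is a simplification.

```python
import xml.sax
from xml.sax.handler import ContentHandler

UTF8_BOM = b"\xef\xbb\xbf"


def describe_start(data):
    """Describe what the first bytes of a would-be XML document look like."""
    if not data:
        return "empty input"
    if data.startswith(UTF8_BOM):
        # A BOM is allowed at the very start, so skip it before looking further.
        data = data[len(UTF8_BOM):]
    if data[:1].isspace():
        return "leading whitespace before the first '<'"
    if not data.startswith(b"<"):
        return "starts with %r rather than '<'" % data[:20]
    return "starts with '<'"


def parse_error_position(data):
    """Return (line, column) of the first fatal parse error, or None if it parses."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber()
    return None


# Example: a server response that still carries non-XML text in front of the document.
response = b"HTTP/1.1 200 OK <a/>"
print(describe_start(response))
print(parse_error_position(response))
```

Note that leading whitespace is only an error when an XML declaration follows it, so a report of leading whitespace is a hint to look closer, not proof of a problem.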