From bhartsho at yahoo.com Sun May 2 15:01:38 2004
From: bhartsho at yahoo.com (brett hartshorn)
Date: Sun May 2 15:01:41 2004
Subject: [XML-SIG] xmlapi 0.2.1
Message-ID: <20040502190138.91074.qmail@web13424.mail.yahoo.com>

Hi,

I have started using this, and others may find it useful for making specialized DOMs. I'm interested to hear people's reactions to it, and whether there are parts of the API they would want changed.

-brett

Xmlapi is an even smaller XML DOM implementation than Python's standard xml.dom.minidom. This version should be faster and easy to use. It includes some extra features that make it easier to build DOM-like APIs on top of it.

http://opart.org/xmlapi/

From kevin.thackray at clarisys.fr Tue May 4 09:21:12 2004
From: kevin.thackray at clarisys.fr (kevin Thackray)
Date: Tue May 4 09:20:16 2004
Subject: [XML-SIG] DocumentFragment ??
Message-ID: <409798C8.1080900@clarisys.fr>

Hello everyone,

I want to write an XML document from scratch, but with a particular object model: I have one class that handles document creation (and, later, validation), and other classes that handle the various parts of my document.

    class MyDoc:
        newDoc = implementation.createDocument(None, "TheDoc", None)
        newDoc = Document()
        header = Header()
        newDoc.importNode(header, True)

    class Header(DocumentFragment):
        def __init__(self):
            DocumentFragment.__init__(self)
            self.appendChild(Element("Autor"))

This raises the following exception:

      File "/home/kevin/python/sdp/db2xml/DocTemplate.py", line 16, in __init__
        newDoc.importNode(header, True)
      File "/usr/lib/python2.3/xml/dom/minidom.py", line 1730, in importNode
        return _clone_node(node, deep, self)
      File "/usr/lib/python2.3/xml/dom/minidom.py", line 1807, in _clone_node
        if node.ownerDocument.isSameNode(newOwnerDocument):
    AttributeError: 'NoneType' object has no attribute 'isSameNode'

I do it this way because I don't want my "aggregate" classes to know about the "aggregator", i.e. I don't want my Header class to know about the MyDoc class!

If anybody has any ideas, any sample code would help me!

Regards,
Kevin Thackray
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kevin.thackray.vcf
Type: text/x-vcard
Size: 70 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20040504/40a20eaa/kevin.thackray.vcf

From and-xml at doxdesk.com Tue May 4 08:48:17 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Tue May 4 09:47:06 2004
Subject: [XML-SIG] DocumentFragment ??
In-Reply-To: <409798C8.1080900@clarisys.fr>
References: <409798C8.1080900@clarisys.fr>
Message-ID: <40979111.2090007@doxdesk.com>

Kevin Thackray wrote:

> newDoc = Document()
> class Header(DocumentFragment):
> DocumentFragment.__init__(self)

minidom makes no promises about using its internal classes directly rather than through the W3C standard DOM interfaces (createDocument and so on)*. Instantiating Node objects manually in later versions of minidom is likely to result in broken objects that can cause exceptions like these.

In particular, you can't construct a DocumentFragment without telling it which Document it belongs to. Normally this would happen in the Document.createDocumentFragment method.

Subclassing the DOM objects from minidom (or the other implementations I know of) is unsupported. You would have to either override all the relevant factory methods, or write back your extended versions into their namespace. Both approaches would require in-depth knowledge of the implementation's internal data structures, and would be incompatible across versions.

I suggest using a containment relationship between Header and DocumentFragment, instead of inheritance.

* - except in earlier versions of the Reference Manual, where Document was specified to be directly instantiable.

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From kevin.thackray at clarisys.fr Tue May 4 10:16:16 2004
From: kevin.thackray at clarisys.fr (kevin Thackray)
Date: Tue May 4 10:15:19 2004
Subject: [XML-SIG] DocumentFragment ??
In-Reply-To: <40979111.2090007@doxdesk.com>
References: <409798C8.1080900@clarisys.fr> <40979111.2090007@doxdesk.com>
Message-ID: <4097A5B0.2070404@clarisys.fr>

Hello,

Thank you for your quick reply. I tried your suggestion about linking the classes, still with minidom:

>
> I suggest using a containment relationship between Header and
> DocumentFragment, instead of inheritance.
>

And I get an exception at the same place:

    class MyDocument:
        def __init__(self, demande):
            newDoc = implementation.createDocument(None, "CompteRendu", None)
            header = Header()
            newDoc.importNode(header.headerDoc, True)
            self.newDoc = newDoc

        def __str__(self):
            return str(PrettyPrint(self.newDoc))

    class Header:
        def __init__(self):
            self.headerDoc = DocumentFragment()
            self.headerDoc.appendChild(Element("author"))

Traceback:

    Traceback (most recent call last):
      File "output.py", line 35, in ?
        doc = MyDocument(demande)
      File "/home/kevin/python/sdp/db2xml/DocTemplate.py", line 16, in __init__
        newDoc.importNode(header.headerDoc, True)
      File "/usr/lib/python2.3/xml/dom/Document.py", line 163, in importNode
        return importedNode.cloneNode(deep, newOwner=self)
    TypeError: cloneNode() got an unexpected keyword argument 'newOwner'

***********

I wasn't sure whether the problem was minidom plus inheritance, or just something tricky in minidom. Anyway, I will do the same thing with a standard DOM implementation (4Suite).

Thank you so much,

Regards,
Kevin Thackray

From and at doxdesk.com Tue May 4 09:56:35 2004
From: and at doxdesk.com (Andrew Clover)
Date: Tue May 4 10:55:23 2004
Subject: [XML-SIG] DocumentFragment ??
In-Reply-To: <4097A5B0.2070404@clarisys.fr>
References: <409798C8.1080900@clarisys.fr> <40979111.2090007@doxdesk.com> <4097A5B0.2070404@clarisys.fr>
Message-ID: <4097A113.5030204@doxdesk.com>

Kevin Thackray wrote:

> And i raise an exception at the same place :

Indeed, the same problem remains:

> self.headerDoc = DocumentFragment()

You shouldn't instantiate a DocumentFragment directly. Instead use the standard createDocumentFragment method of the Document you want it to belong to.
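
As a minimal sketch of this advice, assuming the minidom shipped with a current Python 3 (the thread itself concerns Python 2.3): the fragment is created by the Document it will belong to, and Header holds it by containment instead of inheriting from DocumentFragment. The Header class and its `fragment` attribute are illustrative names, not part of minidom.

```python
from xml.dom.minidom import getDOMImplementation


class Header:
    """Hypothetical helper: builds a header fragment for a given document."""

    def __init__(self, doc):
        # The owning Document creates the fragment, so ownerDocument is set.
        self.fragment = doc.createDocumentFragment()
        self.fragment.appendChild(doc.createElement("author"))


impl = getDOMImplementation()
doc = impl.createDocument(None, "CompteRendu", None)

header = Header(doc)
# Appending a fragment moves its children into the target node.
doc.documentElement.appendChild(header.fragment)

print(doc.documentElement.toxml())
```

In this sketch Header needs to be handed a Document, but it still knows nothing about whichever class aggregates it.
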
Practically, what happens in this case is that it creates a DocumentFragment with no ownerDocument. This then causes an error when the ownerDocument is next checked (when you try to clone it).

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From hawkeye.parker at autodesk.com Wed May 5 14:41:15 2004
From: hawkeye.parker at autodesk.com (Hawkeye Parker)
Date: Wed May 5 14:41:18 2004
Subject: [XML-SIG] programmatic xml schema validation
Message-ID: <9BDC80F712DD0C4CB5FBF48D2ED3DD8B04E40DE4@msgusawmb02.ads.autodesk.com>

hi all,

is there any xml schema validating parser for Python aside from XSV?

hawkeye parker
QA Programmer
Autodesk, Inc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/xml-sig/attachments/20040505/d0eb29a7/attachment.html

From kevin.thackray at clarisys.fr Fri May 7 06:28:39 2004
From: kevin.thackray at clarisys.fr (kevin Thackray)
Date: Fri May 7 06:27:47 2004
Subject: [XML-SIG] encoding problem with DOM Writing
Message-ID: <409B64D7.3070909@clarisys.fr>

Hi everyone,

I have an encoding problem when writing a DOM document "from scratch". By the way, thank you Mr Andrew Clover for the help with writing a DOM from scratch; I got my DOM wrapper working :) !

I am using 4DOM, and PyPgSql for extracting data from Postgres. I create my new document through the interface:

    newDoc = implementation.createDocument(None, "DOC", None)

Also, I am dealing with French data, so the database contains some iso-8859-1 characters.
Then I add some nodes, and when I want to print the document with PrettyPrint, it raises this error:

    (PrettyPrint(newDoc, encoding="iso-8859-1"))

    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 37-38: unexpected end of data

When I try to encode the incoming database data with this function:

    def _encode(v):
        v = v.encode("iso-8859-1")
        return v

it raises this exception:

        v = v.encode("iso-8859-1")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 37: ordinal not in range(128)

I think my problem is that I don't handle the encoding while writing the new document, but I found no way to specify it with the createDocument() interface.

If anyone has any ideas or clues, that would help me!

Regards,
Kevin Thackray.

From kevin.thackray at clarisys.fr Fri May 7 10:31:40 2004
From: kevin.thackray at clarisys.fr (kevin Thackray)
Date: Fri May 7 10:30:52 2004
Subject: [XML-SIG] encoding problem with DOM Writing
Message-ID: <409B9DCC.50304@clarisys.fr>

Hi everyone,

I finally figured out my problem: after a lot of Googling, I came across a PrettyPrint bug. The encoding is hard-coded to utf-8, so my iso-8859-1 encoding was ignored.

>
> Then i add somes nodes, and when i want to print that out with
> PrettyPrint, it raise this error :
> (PrettyPrint(newDoc, encoding="iso-8859-1"))
>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 37-38:
> unexpected end of data
>

I patched it myself in /usr/lib/python2.3/xml/dom/ext/Printer.py:

    def utf8_to_code(text, encoding):
        encoder = codecs.lookup(encoding)[0] # encode,decode,reader,writer
        if type(text) is not UnicodeType:
            #text = unicode(text, "utf-8")
            text = unicode(text, encoding)

And everything works perfectly.
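
For comparison, a minimal sketch of the same idea in a current Python 3 using the standard library's minidom in place of 4DOM's PrettyPrint (an assumption, since the thread uses Python 2.3 and 4DOM): decode the database bytes to text with the encoding they were actually stored in, build the DOM from text, and name the output encoding only when serializing. The sample byte string is made up for illustration.

```python
from xml.dom.minidom import getDOMImplementation

# Hypothetical value as it might arrive from the database: Latin-1 bytes.
raw = b"caf\xe9"

# Decode once, at the boundary, with the encoding the data really uses.
text = raw.decode("iso-8859-1")

doc = getDOMImplementation().createDocument(None, "DOC", None)
doc.documentElement.appendChild(doc.createTextNode(text))

# The output encoding is chosen at serialization time; the result is bytes.
out = doc.toxml(encoding="iso-8859-1")
print(out)
```

In Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, which is where an 'ascii' codec error like the one reported for _encode() comes from.
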
I just wonder why this bug, of which I found traces in a mail from 2001, was still in my standard Python distribution:

    Python 2.3.1 (#1, Sep 24 2003, 16:45:45)

installed on Slackware, with Slackware packaging:

    /var/log/packages/python-2.3.1-i486-1
    /var/log/packages/python-demo-2.3.1-noarch-1
    /var/log/packages/python-tools-2.3.1-noarch-1

Best Regards,
Kevin Thackray

From gael.pegliasco at free.fr Mon May 10 07:17:49 2004
From: gael.pegliasco at free.fr (Gaël Pegliasco)
Date: Mon May 10 07:18:03 2004
Subject: [XML-SIG]PyXML and XPath : simple Path expression seems to crash with brutality
Message-ID: <1084187868.409f64dd01321@imp6-q.free.fr>

Hello,

I'm trying to test XPath with this simple program:

    import xml.dom.minidom
    from xml.xpath.Context import Context
    import xml.xpath

    s = ''' Someroto text More text '''
    d = xml.dom.minidom.parseString(s)
    result=xml.xpath.Evaluate( '//elem/test', d.documentElement )
    for node in result:
        print node, node.nodeName

But it crashes with the message below:

    Traceback (most recent call last):
      File "./xpath.py", line 12, in ?
        result=xml.xpath.Evaluate( '//elem/test', d.documentElement )
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/__init__.py", line 70, in Evaluate
        retval = parser.new().parse(expr).evaluate(con)
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/ParsedAbbreviatedAbsoluteLocationPath.py", line 44, in evaluate
        sub_rt.extend(self._rel.select(context))
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/ParsedRelativeLocationPath.py", line 23, in evaluate
        raise Exception("Expected node set from relative expression. Got %s"%str(rt))
    Exception: Expected node set from relative expression. Got ()

If I use this syntax instead, it works:

    result=xml.xpath.Evaluate( 'descendant::elem/test', d.documentElement )

I have the same problem with more complex queries, such as "//DOC.PRINCIPAL//FILE/@VOL_PAGE_SEQ", which I must rewrite as "descendant::DOC.PRINCIPAL/descendant::FILE/@VOL_PAGE_SEQ"; otherwise it crashes with a different message:

    Traceback (most recent call last):
      File "/edika/vol1/users/gpegliasco/projets/dev/tools/src/xpathgrep.py", line 233, in ?
        searchInFile( file, searchedObject )
      File "/edika/vol1/users/gpegliasco/projets/dev/tools/src/xpathgrep.py", line 77, in searchInFile
        listNodes = xpath.Evaluate( xpathquery, context=con )
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/__init__.py", line 70, in Evaluate
        retval = parser.new().parse(expr).evaluate(con)
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/ParsedAbbreviatedAbsoluteLocationPath.py", line 44, in evaluate
        sub_rt.extend(self._rel.select(context))
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/ParsedRelativeLocationPath.py", line 21, in evaluate
        rt = self._left.select(context)
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/ParsedAbbreviatedRelativeLocationPath.py", line 52, in evaluate
        res = Set.Union(res,subRt)
      File "/usr/local/lib/python2.3/site-packages/_xmlplus/xpath/Set.py", line 25, in Union
        return compare + filter(lambda x,compare = compare:x not in compare,loop)
    TypeError: can only concatenate list (not "tuple") to list

Does anyone know what's wrong with these queries? Am I misusing the API, or is this a bug or something else in PyXML?

Thanks for your help.

With kind regards,
Gaël

From uche.ogbuji at fourthought.com Mon May 10 12:57:15 2004
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Mon May 10 12:57:21 2004
Subject: [XML-SIG] XInclude
In-Reply-To: <407C1C6D.6070805@inet.com>
References: <407C1C6D.6070805@inet.com>
Message-ID: <1084208234.9994.10496.camel@borgia>

On Tue, 2004-04-13 at 10:59, hao xing wrote:
> Hello:
>
> Is there any support to XInclude in the latest PyXml package?

No, but 4Suite supports XInclude in all the basic libraries, including Domlette, if you are looking for a DOM-like package.

http://uche.ogbuji.net

--
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
When to use elements versus attributes - http://www-106.ibm.com/developerworks/xml/library/x-eleatt.html
Introducing PyRXP - http://www.xml.com/pub/a/2004/02/11/py-xml.html
XML in the financial services industry - http://www-106.ibm.com/developerworks/xml/library/x-think22.html
Python Web services developer: The real world, Part 2 - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth16/
Keep your XML clean - http://www.adtmag.com/article.asp?id=9012

From uche.ogbuji at fourthought.com Mon May 10 13:03:38 2004
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Mon May 10 13:03:43 2004
Subject: [XML-SIG] DocumentFragment ??
In-Reply-To: <4097A5B0.2070404@clarisys.fr>
References: <409798C8.1080900@clarisys.fr> <40979111.2090007@doxdesk.com> <4097A5B0.2070404@clarisys.fr>
Message-ID: <1084208617.9994.10514.camel@borgia>

On Tue, 2004-05-04 at 08:16, kevin Thackray wrote:

> I wasn't sure if the problem was minidom+inheritance or juste trucky
> things with minidom?
> Anyway, I will do the same thing that with standard DOM implementation
> (4Suite)

Umm. It's worth pointing out that in this case the DOM standard *is* part of the problem (IMHO).

Anyway, 4Suite's Domlettes are by admission *not* standard DOM, although they follow DOM wherever it doesn't get in the way of practicality. We view them as "close enough to DOM but not too close for comfort".

--
Uche Ogbuji                                    Fourthought, Inc.

From waitman at emkdesign.com Fri May 14 18:19:41 2004
From: waitman at emkdesign.com (Waitman C. Gobble, II)
Date: Fri May 14 18:19:52 2004
Subject: [XML-SIG] simple php xbel parser/outputter
Message-ID: <40A545FD.4000901@emkdesign.com>

Hello

I came across your site looking for a simple PHP script to format XBEL and display it on a web page. I couldn't find any, so I made my own. Nothing fancy, but it seems to work pretty well.

http://wcg2.com/xbook/index.php

If you would like to make it accessible to your visitors, that would be great.

Thanks and Best

Waitman Gobble

From bkline at rksystems.com Tue May 18 12:48:16 2004
From: bkline at rksystems.com (Bob Kline)
Date: Tue May 18 12:37:41 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
Message-ID:

We recently upgraded our servers from Python 2.2 to Python 2.3, and we noticed a change in the way xml.dom.minidom returns CDATA sections. In 2.2 the parser returned CDATA sections as TEXT_NODE objects. In 2.3 this changed to CDATA_SECTION_NODE. Could someone direct us to any online discussion behind the decision to make this change?
We've looked back through the subject lines for the last few months' worth of messages for this mailing list (unfortunately there doesn't appear to be a search interface), but didn't see anything.

Thanks!

--
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

From martin at v.loewis.de Tue May 18 13:29:43 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue May 18 13:29:46 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To:
References:
Message-ID: <40AA4807.5060906@v.loewis.de>

Bob Kline wrote:

> We recently upgraded our servers from Python 2.2 to Python 2.3, and we
> noticed a change in the way xml.dom.minidom returns CDATA sections. In
> 2.2 the parser returned CDATA sections as TEXT_NODE objects. In 2.3
> this changed to CDATA_SECTION_NODE. Could someone direct us to any
> online discussion behind the decision to make this change?

There was no online discussion of this specific change. Instead, it came about as a side effect of the new expatbuilder module, which was announced with PyXML 0.8.

Why do you ask?

Regards,
Martin

From jason at mobarak.name Tue May 18 13:40:03 2004
From: jason at mobarak.name (Jason Mobarak)
Date: Tue May 18 13:38:10 2004
Subject: [XML-SIG] XML Validation
Message-ID: <40AA4A73.3000601@mobarak.name>

Hello --

In short: what methods do you use to validate proper structure of an XML document?

In programs I've written I've used minidom to traverse the nodes of an XML document and an FSM to verify that all tags appeared where they were allowed. This particular validation solution also allowed me to verify that the nodes of the XML document had the proper data.

I'm definitely not an XML expert but from what I've heard my validation solution is not part of the "real" XML tool chain -- the valid structure of my document is described by Python code. I should be able to have a schema or DTD that specifies structure and valid content.
However, DTDs seem limited; I haven't seen that they can verify the kind of complex structure that I might want. Am I wrong? As far as I've seen, a schema would provide the best solution.

For validating XML documents with schemas, I haven't found any Python libraries that I wanted to use (i.e. I don't want to try to use XSV; its major focus seems to be the command line/web, and I can't find adequate documentation).

--
Jason

From bkline at rksystems.com Tue May 18 14:32:51 2004
From: bkline at rksystems.com (Bob Kline)
Date: Tue May 18 14:22:16 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <40AA4807.5060906@v.loewis.de>
Message-ID:

On Tue, 18 May 2004, "Martin v. Löwis" wrote:

> Bob Kline wrote:
> > We recently upgraded our servers from Python 2.2 to Python 2.3, and we
> > noticed a change in the way xml.dom.minidom returns CDATA sections. In
> > 2.2 the parser returned CDATA sections as TEXT_NODE objects. In 2.3
> > this changed to CDATA_SECTION_NODE. Could someone direct us to any
> > online discussion behind the decision to make this change?
>
> There was no online discussion of this specific change. Instead, it
> came about as a side effect of the new expatbuilder module, which was
> announced with PyXML 0.8.

Martin:

Thanks for your reply.

>
> Why do you ask?

Because I always like to find out as much as I can about the rationale for interface designs and specs for the APIs that I use. And I like to know why software behaves the way it does (including an understanding of which parts behave the way they do because the specifications behind the software say they must behave that way, and which parts behave the way they do by "chance" -- that is, the programmer was writing code in the absence of or in contradiction to a clear specification for the implemented behavior). I like to think that when a software package changes its behavior it does so because of a conscious decision made at some level for a good reason.
I was hoping that someone in this forum might know where I could read about that reason (and ideally, the reasoning behind the old behavior, too).

--
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

From walter at livinglogic.de Tue May 18 15:29:56 2004
From: walter at livinglogic.de (Walter Dörwald)
Date: Tue May 18 15:30:02 2004
Subject: [XML-SIG] ANN: ll-toxic 0.1
Message-ID: <40AA6434.4080602@livinglogic.de>

ll-toxic 0.1 has been released!

What is it?
===========

ll-toxic is an XIST namespace that can be used for generating Oracle database functions that return XML strings. This is done by embedding processing instructions containing PL/SQL code into XML files and transforming those files with XIST.

Where can I get it?
===================

ll-toxic can be downloaded from http://ftp.livinglogic.de/toxic/ or ftp://ftp.livinglogic.de/pub/livinglogic/toxic/

Web pages are at http://www.livinglogic.de/Python/toxic/

ViewCVS access is available at http://www.livinglogic.de/viewcvs/

Bye,
Walter Dörwald

From chrish at cryptocard.com Tue May 18 15:34:30 2004
From: chrish at cryptocard.com (Chris Herborth)
Date: Tue May 18 15:34:35 2004
Subject: [XML-SIG] XML Validation
In-Reply-To: <40AA4A73.3000601@mobarak.name>
References: <40AA4A73.3000601@mobarak.name>
Message-ID: <40AA6546.8090201@cryptocard.com>

Jason Mobarak wrote:

> In short: what methods do you use to validate proper structure of an XML
> document?

I'm using pyRXPU to validate and parse the document into a DOM (using a pyRXPU -> DOM translator I wrote), along with a DTD.

> I'm definitely not an XML expert but from what I've heard my validation
> solution is not part of the "real" XML tool chain -- the valid
> structure of my document is described by Python code. I should be able
> to have a schema or DTD that specifies structure and valid content.
>
> However, DTDs seem limited, I haven't seen that they can verify the kind
> of complex structure that I might want, am I wrong?
DTDs very accurately and fairly concisely describe the _structure_ of the document; schemas describe the possible _contents_ and most of the structure. Yes, most of it, not all of it.

So, if you want to be 100% accurate in all cases, you need a DTD and an XSD.

It's been eight months since I used schemas for anything (previous job), so I can't remember exactly which bits of the structure the schema couldn't describe... oh, wait, maybe I can. It was the content model for each tag. I don't think it could be properly expressed in the XSD.

--
Chris Herborth                                   chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp.    http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.

From martin at v.loewis.de Tue May 18 16:02:11 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue May 18 16:02:13 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To:
References:
Message-ID: <40AA6BC3.8020004@v.loewis.de>

Bob Kline wrote:

> I was hoping that someone in this forum
> might know where I could read about that reason (and ideally, the
> reasoning behind the old behavior, too).

Ah, just ask :-) The current implementation tries to implement the DOM Level 3 Load-Store specification, which gives the application control over various details of how precisely the DOM tree is created. One such detail is whether CDATA sections are represented by text nodes or CDATA section nodes. The default for this setting is "on" (i.e. do create CDATA section nodes).

The old implementation did not create CDATA section nodes, because, when it was first implemented, minidom did not even have CDATA section nodes. When they were added, the creation process was not changed.

In any case, the old implementation also predates DOM L3 LS (but that is not the reason for not creating CDATA section nodes).
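
To make the difference concrete, here is a short sketch assuming a current Python 3, where minidom.parseString reports CDATA sections as separate nodes: code that wants the character data regardless of how it was written can accept both node types. The helper name character_data is made up for this example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def character_data(element):
    """Concatenate text from Text and CDATASection children alike."""
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return "".join(
        child.data for child in element.childNodes if child.nodeType in wanted
    )


doc = parseString("<a>one<![CDATA[ <two> ]]>three</a>")
root = doc.documentElement

# Node types of the children, in document order.
print([child.nodeType for child in root.childNodes])
print(character_data(root))
```

Code written this way would not have noticed the change between the two parser behaviours described above, since it never depends on which of the two node types carries the text.
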

Regards,
Martin

From martin at v.loewis.de Tue May 18 16:07:45 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue May 18 16:07:47 2004
Subject: [XML-SIG] XML Validation
In-Reply-To: <40AA4A73.3000601@mobarak.name>
References: <40AA4A73.3000601@mobarak.name>
Message-ID: <40AA6D11.8070503@v.loewis.de>

Jason Mobarak wrote:

> In short: what methods do you use to validate proper structure of an XML
> document?

I don't believe in validation, at least not as an online activity. This is like typing in Python: you don't declare types, you just use the values, and if your assumptions about the interface of the objects are wrong, you get an exception. For XML, that means that if you only need to look at a part of the document, don't complain that the rest doesn't follow some predefined structure.

> However, DTDs seem limited, I haven't seen that they can verify the kind
> of complex structure that I might want, am I wrong?

Hard to tell. I don't know what complex structure you may want. DTDs can support quite complex structures.

Regards,
Martin

From bkline at rksystems.com Tue May 18 16:46:14 2004
From: bkline at rksystems.com (Bob Kline)
Date: Tue May 18 16:35:40 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <40AA6BC3.8020004@v.loewis.de>
Message-ID:

On Tue, 18 May 2004, "Martin v. Löwis" wrote:

> Ah, just ask :-) The current implementation tries to implement the DOM
> Level 3 Load-Store specification, which gives the application control
> over various details of how precisely the DOM tree is created. One
> such detail is whether CDATA sections are represented by text nodes or
> CDATA section nodes. The default for this setting is "on" (i.e. do
> create CDATA section nodes).
>
> The old implementation did not create CDATA section nodes, because,
> when it was first implemented, minidom did not even have CDATA section
> nodes. When they were added, the creation process was not changed.
>
> In any case, the old implementation also predates DOM L3 LS (but that
> is not the reason for not creating CDATA section nodes).

Thanks very much, Martin. This clears things up considerably. I hadn't gotten around to digging into the DOM3 specs (earlier versions of Python's xml.dom having been based on DOM Level 2), and to be honest I never was successful in figuring out whether the earlier DOM specifications spelled out unambiguously whether a DOM parser was allowed to silently transform CDATA sections into text nodes. I do remember being surprised when I first discovered that the minidom implementation was doing this (and presumably the programmer who wrote the code in our system that was broken by the new behavior was equally surprised to see text nodes coming back where he knew that the parsed document had CDATA sections). Did the earlier versions of the DOM spec allow this, or was that just a bug in the implementation?

--
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

From martin at v.loewis.de Tue May 18 17:54:35 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue May 18 17:54:37 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To:
References:
Message-ID: <40AA861B.8090009@v.loewis.de>

Bob Kline wrote:

> Did the earlier versions of the DOM spec allow
> this, or was that just a bug in the implementation?

DOM never had the notion of parsing, or of XML documents, for that matter. The DOM spec simply did not care where a DOM tree comes from; the only way to create one was to construct it from scratch (and then, there was no way to serialize a DOM tree into an XML document).

Any way of integrating a parser into DOM was completely proprietary - both in the programming interface, and the semantics of that interface.

Only with DOM Level 3 are load and store considered. Given past practice, the only way to approach this is to make things highly customizable.
So DOM LS has dozens of parameters that specify how to construct a DOM tree from an XML document.

I personally think that CDATA sections are a mistake. They shouldn't have been in XML, and people should not use them. If that principle is followed, their representation in the DOM is irrelevant.

Regards,
Martin

From bkline at rksystems.com Tue May 18 18:15:02 2004
From: bkline at rksystems.com (Bob Kline)
Date: Tue May 18 18:04:22 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <40AA861B.8090009@v.loewis.de>
Message-ID:

On Tue, 18 May 2004, "Martin v. Löwis" wrote:

> DOM never had the notion of parsing, or of XML documents, ....

Once again, thanks very much for your excellent and clear explanations!

--
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

From and at doxdesk.com Tue May 18 17:47:40 2004
From: and at doxdesk.com (Andrew Clover)
Date: Tue May 18 18:46:21 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <40AA6BC3.8020004@v.loewis.de>
References: <40AA6BC3.8020004@v.loewis.de>
Message-ID: <40AA847C.8060204@doxdesk.com>

Martin v. Löwis wrote:

> One such detail is whether CDATA sections are represented
> by text nodes or CDATA section nodes. The default for this setting
> is "on" (i.e. do create CDATA section nodes).

In general, the default for the parameter 'cdata-sections' is indeed True. However, due to a well-hidden trap in the Load/Save spec, parsing operations default 'cdata-sections' to False:

LS #parameter-infoset:

    See the definition of DOMConfiguration for a description of this parameter. Unlike in [DOM Level 3 Core], this parameter will default to true for LSParser.

Core #parameter-infoset:

    This forces the following parameters to false: "validate-if-schema", "entities", "datatype-normalization", "cdata-sections".

So by default both CDATA sections and entity references (which don't currently happen in minidom anyway) should not be generated.

> I personally think that CDATA sections are a mistake. They
> shouldn't have been in XML, and people should not use them.

I kind of agree; their use-case is very marginal. Typing a few extra &lt; and &amp;s is not usually much of a hardship even to hand-coders.

I would have liked to have seen XML lose all the nonsense about DTDs, default attributes and entity references too, but it's a bit too late now!

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From uche.ogbuji at fourthought.com Sat May 22 18:17:25 2004
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Sat May 22 18:17:30 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <40AA847C.8060204@doxdesk.com>
References: <40AA6BC3.8020004@v.loewis.de> <40AA847C.8060204@doxdesk.com>
Message-ID: <1085264245.4164.3491.camel@borgia>

On Tue, 2004-05-18 at 15:47, Andrew Clover wrote:

> > I personally think that CDATA sections are a mistake. They
> > shouldn't have been in XML, and people should not use them.
>
> I kind of agree; their use-case is very marginal. Typing a few extra
> &lt; and &amp;s is not usually much of a hardship even to hand-coders.
> I would have liked to have seen XML lose all the nonsense about DTDs,
> default attributes and entity references too, but it's a bit too late now!

I disagree, and I use CDATA sections a lot. Try writing an article about XML *in* XML (e.g. XHTML). You might also become a fan :-)

They are also useful to augment the toolchain in some scenarios where you're dealing with unknown data sources or unsophisticated authors.

As long as people understand that they're a simple lexical convenience, I'm not sure what their harm is.

I'm not sure any level of DOM has a sane treatment of CDATA sections, but this is the fault of DOM, not XML 1.0.

--
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
When to use elements versus attributes - http://www-106.ibm.com/developerworks/xml/library/x-eleatt.html
Introducing PyRXP - http://www.xml.com/pub/a/2004/02/11/py-xml.html
XML in the financial services industry - http://www-106.ibm.com/developerworks/xml/library/x-think22.html
Python Web services developer: The real world, Part 2 - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth16/
Keep your XML clean - http://www.adtmag.com/article.asp?id=9012

From and-xml at doxdesk.com Sat May 22 18:23:00 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Sat May 22 19:21:35 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <1085264245.4164.3491.camel@borgia>
References: <40AA6BC3.8020004@v.loewis.de> <40AA847C.8060204@doxdesk.com> <1085264245.4164.3491.camel@borgia>
Message-ID: <40AFD2C4.3070106@doxdesk.com>

Uche Ogbuji wrote:

> I disagree, and I use CDATA sections a lot. Try writing an article
> about XML *in* XML (e.g. XHTML). You might also become a fan :-)

I think that's the toolchain's job. In an ideal world there'd be an XML editor that wasn't awful (!) but it's easy enough with a decent text editor to write some XML, select it and encode/decode the offending characters. S'what I do, anyway. :-)

> As long as people understand that they're a simple lexical convenience,
> I'm not sure what their harm is.

You're right: at an XML-parsing level they're not too bad, but still only a rather minor convenience. The problem is that they add complexity without completely solving the problem - if you are writing an XML article about CDATA sections, for example, you can't use a literal ']]>'!
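
The usual workaround for a literal ']]>' is to end the section after ']]' and open a new one before '>', which is the splitting that serializers end up doing. The sketch below illustrates that trick under stated assumptions and is not code from this thread; cdata_wrap and char_data are hypothetical helper names, and minidom is used only to check the round trip.

```python
from xml.dom.minidom import parseString


def cdata_wrap(text):
    """Wrap text in CDATA markup, splitting every ']]>' across two
    adjacent sections so the terminator never appears inside one."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def char_data(element):
    """Join the data of the element's character-data children."""
    return "".join(child.data for child in element.childNodes)


# Hypothetical sample text containing the troublesome terminator.
sample = "a CDATA section ends with ]]> and nothing else"
doc = parseString("<p>" + cdata_wrap(sample) + "</p>")

# The parsed character data matches the original text.
assert char_data(doc.documentElement) == sample
```

The cost is visible in the markup: one logical run of text becomes two sections, which is exactly the sort of extra case DOM code then has to cope with.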

> I'm not sure any level of DOM has a sane treatment of CDATA sections

I'm with you here, it's the DOM that's the real problem. Aside from normalising text together being defeated by them, the issues with splitting CDATA sections for ']]>' and out-of-encoding characters in DOM3 are an extra annoyance and likely source of bugs for implementations.

The legacy nonsense from DTDs is a much worse issue in my book: it turns XML from a simple, easy-to-grok-and-knock-up-a-noddy-parser-for notation into a maze of twisty little bugs, all alike.

Manifesto for a cleaner XML more suited to simple tasks (ohmygod Microsoft want to put XML in the DNS argh etc.):

- no doctypes

  DTD validation is underpowered, ineffective for namespaces, and does not deserve to be part of the basic required XML syntax. Validation should be done as a layer on top of XML (Schema, RNG), not as part of the basic required syntax.

- no entity references

  most common use case: named character escapes: character references are almost as convenient and anyway you should be using an encoding that doesn't require you escape them. Further use case: inclusions: use XInclude or similar processing layer on top of XML. Entity references are not worth the *enormous* complexity they add to the DOM (if implemented completely, anyway)

- no default attribute values

  how hard is it for an application to take null (or '') for an answer?

- no CDATA sections

  at least at a DOM level

- no attribute normalisation

  seems to be barely used, and confuses DOM a treat

- xmlns: declarations on the root element only, unique URIs

  being able to reuse prefixes over the document for eg. inclusions is not worth the pain of namespace fixup and broken interaction between DOM1 and DOM2 methods

any I missed?

Been having a grim day tracking down obscure DOM bugs and interactions, hope everyone is having a fun weekend. I'll stop ranting now then.

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From and-xml at doxdesk.com Mon May 24 08:54:26 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Mon May 24 09:52:56 2004
Subject: [XML-SIG] Python domts 0.5 available
Message-ID: <40B1F082.8080200@doxdesk.com>

domts is a package for running the W3C DOM Test Suite against various Python DOM implementations.

I've updated domts to understand some of the recent additions to TSML, and added a proper info page for it, namely:

http://www.doxdesk.com/software/py/domts.html

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From dflists at iinet.net.au Tue May 25 00:57:56 2004
From: dflists at iinet.net.au (Derek Fountain)
Date: Tue May 25 00:55:34 2004
Subject: [XML-SIG] 4Suite.org down?
Message-ID: <200405251257.56751.dflists@iinet.net.au>

Is http://www.4suite.org down? I'm getting "You don't have permission to access / on this server." messages from that and fourthought.com. I can't post to the 4suite mailing list either - I get error returns back saying the recipient isn't known.

Is it them or is it me?

From mike at skew.org Tue May 25 11:44:14 2004
From: mike at skew.org (Mike Brown)
Date: Tue May 25 11:44:28 2004
Subject: [XML-SIG] 4Suite.org down?
In-Reply-To: <200405251257.56751.dflists@iinet.net.au> "from Derek Fountain at May 25, 2004 12:57:56 pm"
Message-ID: <200405251544.i4PFiEvg033720@chilled.skew.org>

Derek Fountain wrote:

> Is http://www.4suite.org down?

Yes, as is fourthought.com and uche.ogbuji.net. It'll be fixed soon. Sorry for the downtime. It wasn't a software problem, I'm relieved to say!

From martin at v.loewis.de Tue May 25 16:01:11 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue May 25 16:01:17 2004
Subject: [XML-SIG] Change in reporting of CDATA sections
In-Reply-To: <1085264245.4164.3491.camel@borgia>
References: <40AA6BC3.8020004@v.loewis.de> <40AA847C.8060204@doxdesk.com> <1085264245.4164.3491.camel@borgia>
Message-ID: <40B3A607.4050608@v.loewis.de>

Uche Ogbuji wrote:

> They are also useful to augment the toolchain in some scenarios where
> you're dealing with unknown data sources or unsophisticated authors.
>
> As long as people understand that they're a simple lexical convenience,
> I'm not sure what their harm is.

The harm is precisely that they are a pitfall for unsophisticated authors. They *think* they can put arbitrary binary data into a CDATA section, and they cannot. They then wonder how Web Services manage to transmit octet strings, if not in CDATA sections.

Regards,
Martin

From tpassin at comcast.net Sun May 30 13:42:14 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Sun May 30 13:40:37 2004
Subject: [XML-SIG] PyXML Windows Installation to Multiple Python Installations?
Message-ID: <40BA1CF6.5030704@comcast.net>

I have two Python 2.3 installations on my machine - a standard one and the one that comes with Zope. However, there is only one registry entry pointing to the Python location. The PyXML 2.3 Windows installer points only to the location stored in the registry, and does not allow me to browse to another. So I cannot install to my standard Python installation.

The workaround I have used in the past for other installs (not PyXML) is to install to the one place found by the installer, then copy to the other (erasing all .pyc and .pyo files).

I have three questions -

1) Is there a better workaround?

2) Can the installer be revised to allow the user to browse for a Python location?

3) Is it possible to add additional Python installations to the registry?
The key structure used by Python does not look like it allows this, but maybe there is some way to do it. I haven't found anything on the Web so far (e.g., Google).

Cheers,

Tom P

From martin at v.loewis.de Mon May 31 04:13:40 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon May 31 04:13:50 2004
Subject: [XML-SIG] PyXML Windows Installation to Multiple Python Installations?
In-Reply-To: <40BA1CF6.5030704@comcast.net>
References: <40BA1CF6.5030704@comcast.net>
Message-ID: <40BAE934.5000905@v.loewis.de>

Thomas B. Passin wrote:

> 1) Is there a better workaround?

Yes: Install the source package, using python.exe setup.py install. Just use the desired python.exe in the process.

> 2) Can the installer be revised to allow the user to browse for a Python
> location?

No. It is built using bdist_wininst, and it is very limited.

> 3) Is it possible to add additional Python installations to the
> registry?

Only for different 2.x versions, which wouldn't help, because the PyXML distribution requires a match for the 2.x version.

Regards,
Martin

From tpassin at comcast.net Mon May 31 11:02:33 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Mon May 31 11:00:55 2004
Subject: [XML-SIG] PyXML Windows Installation to Multiple Python Installations?
In-Reply-To: <40BAE934.5000905@v.loewis.de>
References: <40BA1CF6.5030704@comcast.net> <40BAE934.5000905@v.loewis.de>
Message-ID: <40BB4909.6080300@comcast.net>

Martin v. Löwis wrote:
> Thomas B. Passin wrote:
>
>> 1) Is there a better workaround?
>
>
> Yes: Install the source package, using python.exe setup.py install.
> Just use the desired python.exe in the process.
>

Unfortunately I wouldn't be able to compile the parts written in C - they need to be compiled with the Microsoft compiler and libraries, do they not?

Here is how I finally solved the problem -

1) Run regedit and go to the key for PythonCore 2.3 (or whichever version).

2) Export the whole key to a .reg file so you can restore it later.

3) Change the default value of the InstallPath key to point to the location where the desired copy of python.exe is located.

4) Run the install program for PyXML.

5) Merge your saved .reg file back (or change the InstallPath value by hand back to its original value).

This sounds cumbersome, but I actually found it pretty easy and quick, once I had figured out what to do.

It does point to a weakness in the design of the keys that the Python installer adds to the registry, though. I think it is not uncommon to have several Python installations of the same version on the same machine, and the registry setup should be able to reflect that. This is nothing to do with PyXML, of course, but since Martin is closely involved with packaging Python itself (that's right, isn't it, Martin?), this seemed like a good place to make the remark.

Cheers,

Tom P

From martin at v.loewis.de Mon May 31 15:15:47 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon May 31 15:16:03 2004
Subject: [XML-SIG] PyXML Windows Installation to Multiple Python Installations?
Message-ID: <40BB8463.8030508@v.loewis.de>

Thomas B. Passin wrote:

> It does point to a weakness in the design of the keys that the Python
> installer adds to the registry, though. I think it is not uncommon to
> have several Python installations of the same version on the same
> machine, and the registry setup should be able to reflect that.

With the current setup, that wouldn't really work: each installation would need to know which part of the registry is theirs.

> This is
> nothing to do with PyXML, of course, but since Martin is closely
> involved with packaging Python itself (that's right, isn't it, Martin?),

Yes, although bdist_wininst is Thomas Heller's doing. It is intentionally simple to use for the packager; this, in turn, is essential because it otherwise would not get used.
As a result, it only supports the most common case.

I plan to add another command to distutils, bdist_msi, which generates a Microsoft Installer package. I'll try to incorporate the idea of allowing users to browse for a Python installation there, but that library is still months away.

Regards,
Martin

From jdukarm at hydracen.com Mon May 31 15:22:54 2004
From: jdukarm at hydracen.com (Jim Dukarm)
Date: Mon May 31 15:23:00 2004
Subject: [XML-SIG] PyXML Windows Installation to Multiple Python Installations?
In-Reply-To: <40BB8463.8030508@v.loewis.de>
References: <40BB8463.8030508@v.loewis.de>
Message-ID: <200405311222.54749@CYGNUS-KMAIL>

Would a .pth file placed in the relevant Python directories solve this problem?

Jim Dukarm
DELTA-X RESEARCH
Victoria BC Canada

From tpassin at comcast.net Mon May 31 18:01:10 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Mon May 31 17:59:33 2004
Subject: [XML-SIG] PyXML Windows Installation to Multiple Python Installations?
In-Reply-To: <200405311222.54749@CYGNUS-KMAIL>
References: <40BB8463.8030508@v.loewis.de> <200405311222.54749@CYGNUS-KMAIL>
Message-ID: <40BBAB26.3070906@comcast.net>

Jim Dukarm wrote:

> Would a .pth file placed in the relevant Python directories solve this
> problem?

Unfortunately not, because the installer would first have to find those directories before it runs. I'm referring to the Windows .exe installers (to become .msi in the future, it seems), and they do not use Python. setup.py would, but most of us would not be able to compile the package on Windows.

Cheers,

Tom P