From kamel.hamard at ericsson.com Wed Oct 1 23:13:47 2003
From: kamel.hamard at ericsson.com (Kamel Hamard (QC/EMC))
Date: Wed Oct 1 23:18:19 2003
Subject: [XML-SIG] Memory Problem
Message-ID: <2DBF697D5B36014ABA46E66A96107DA072061F@lmc37.lmc.ericsson.se>

Hi,

I'm getting errors when executing my Python program, which uses 4Suite.
Here are the errors:

sys.excepthook is missing
Traceback (most recent call last):
  File "/opt/Py/lib/python2.2/site-packages/Ft/Xml/Xslt/StylesheetHandler.py", line 618, in _combine_stylesheet
    include = self.clone().fromSrc(new_source)
  File "/opt/Py/lib/python2.2/site-packages/Ft/Xml/Xslt/StylesheetReader.py", line 159, in fromSrc
    stylesheet = self._parseSrc(new_source)
  File "/opt/Py/lib/python2.2/site-packages/Ft/Xml/Xslt/StylesheetReader.py", line 179, in _parseSrc
    success = self.parser.ParseFile(src.stream)
  File "/opt/Py/lib/python2.2/site-packages/Ft/Xml/Xslt/StylesheetHandler.py", line 360, in startElement
    inst_dict[instance_var_name] = value
MemoryError
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/opt/Py/lib/python2.2/atexit.py", line 20, in _run_exitfuncs
    apply(func, targs, kargs)
  File "/opt/Py/lib/python2.2/threading.py", line 538, in __exitfunc
    t = _pickSomeNonDaemonThread()
  File "/opt/Py/lib/python2.2/threading.py", line 550, in _pickSomeNonDaemonThread
    for t in enumerate():
MemoryError

Is it possible to get an idea of what the problem is? Could the DOM be
consuming a lot of memory when the XSL API is used? I'm able to get my
document in XML, but when I start transforming it with the XSL API, the
script crashes.
/Regards

From noreply at sourceforge.net Thu Oct 2 05:28:35 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Oct 2 05:28:40 2003
Subject: [XML-SIG] [ pyxml-Bugs-816442 ] Code error on Windows, py2.3 : no attribute "_id_cache"
Message-ID:

Bugs item #816442, was opened at 2003-10-02 11:28
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=816442&group_id=6473

Category: DOM
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Cédric Dutoit (dutoitc)
Assigned to: Nobody/Anonymous (nobody)
Summary: Code error on Windows, py2.3 : no attribute "_id_cache"

Initial Comment:
Code error on Windows, py2.3 :

The following code :
  a=Document()
  b=Element("x")
  a.appendChild(b)

Give me the following error :
  "NoneType" object has no attribute "_id_cache"
  exceptions.AttributeError in ...
  ...minidom.py, 126, appendChild, _clear_id_cache(self)
  ... minidom.py, 1461, _clear_id_cache, node.ownerDocument._id_cache.clear()

Where does this error come from ?
I think that this is related to Python 2.3.

I've the following versions :
  Python 2.3.1
  pyxml-0.8.3.win32-py2.3.exe

Thanks in advance for your help,

C.Dutoit

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=816442&group_id=6473

From noreply at sourceforge.net Thu Oct 2 06:50:05 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Oct 2 06:50:08 2003
Subject: [XML-SIG] [ pyxml-Bugs-816478 ] Code error on Windows, py2.3 : no attribute "_id_cache"
Message-ID:

Bugs item #816478, was opened at 2003-10-02 12:50
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=816478&group_id=6473

Category: DOM
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Cédric Dutoit (dutoitc)
Assigned to: Nobody/Anonymous (nobody)
Summary: Code error on Windows, py2.3 : no attribute "_id_cache"

Initial Comment:
Code error on Windows, py2.3 :

The following code :
  a=Document()
  b=Element("x")
  a.appendChild(b)

Give me the following error :
  "NoneType" object has no attribute "_id_cache"
  exceptions.AttributeError in ...
  ...minidom.py, 126, appendChild, _clear_id_cache(self)
  ... minidom.py, 1461, _clear_id_cache, node.ownerDocument._id_cache.clear()

Where does this error come from ?
I think that this is related to Python 2.3.

I've the following versions :
  Python 2.3.1
  pyxml-0.8.3.win32-py2.3.exe

Thanks in advance for your help,

C.Dutoit

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=816478&group_id=6473

From KSBeattie at lbl.gov Thu Oct 2 18:33:00 2003
From: KSBeattie at lbl.gov (Keith Beattie)
Date: Thu Oct 2 18:33:04 2003
Subject: [XML-SIG] Source Forge cvs login troubles
Message-ID: <3F7CA79C.5080903@lbl.gov>

Thought I'd try to patch the recent issue I found in c14n, but I can't
seem to get started with cvs from sourceforge:

$ cvs -t -d :pserver:anonymous@cvs.sourceforge.net:/cvsroot/pyxml login
 -> main loop with CVSROOT=:pserver:anonymous@cvs.sourceforge.net:/cvsroot/pyxml
Logging in to :pserver:anonymous@cvs.sourceforge.net:2401/cvsroot/pyxml
CVS password:
 -> Connecting to cvs.sourceforge.net(66.35.250.207):2401
cvs [login aborted]: end of file from server (consult above messages if any)
 -> Lock_Cleanup()
$

~/.cvspass isn't created, and this has been happening all day.

Any suggestions? Or is this a load issue on cvs.sourceforge.net?

Thanks,
ksb

From rsalz at datapower.com Thu Oct 2 21:27:31 2003
From: rsalz at datapower.com (Rich Salz)
Date: Thu Oct 2 21:27:35 2003
Subject: [XML-SIG] Source Forge cvs login troubles
In-Reply-To: <3F7CA79C.5080903@lbl.gov>
Message-ID:

SF recently put out a note about how their CVS servers are way
overloaded, and that they're moving anon-cvs off to another bank of
machines, but that copying the repositories takes days.

--
Rich Salz                  Chief Security Architect
DataPower Technology       http://www.datapower.com
XS40 XML Security Gateway  http://www.datapower.com/products/xs40.html
XML Security Overview      http://www.datapower.com/xmldev/xmlsecurity.html

From Ferg.Stephen at bls.gov Fri Oct 3 08:42:28 2003
From: Ferg.Stephen at bls.gov (Ferg, Stephen - BLS)
Date: Fri Oct 3 08:42:35 2003
Subject: [XML-SIG] a PDF version of the XML how-to?
Message-ID:

On http://pyxml.sourceforge.net/topics/docs.html there is a link to the
XML how-to in HTML format. Is there a PDF version available, so I can
print it all easily?

-- Steve

Stephen Ferg
ferg_s@bls.gov
202-691-7257

Bureau of Labor Statistics, Room 5110
2 Mass. Ave. NE
Washington, DC 20212-0001
USA

From Radovan.Chytracek at cern.ch Fri Oct 3 08:49:09 2003
From: Radovan.Chytracek at cern.ch (Radovan Chytracek)
Date: Fri Oct 3 09:38:53 2003
Subject: [XML-SIG] a PDF version of the XML how-to?
Message-ID:

Hi,

I produced it from text source, see attached

Cheers

Radovan

> -----Original Message-----
> From: Ferg, Stephen - BLS [mailto:Ferg.Stephen@bls.gov]
> Sent: Friday, October 03, 2003 2:42 PM
> To: 'xml-sig@python.org'
> Subject: [XML-SIG] a PDF version of the XML how-to?
>
>
> On http://pyxml.sourceforge.net/topics/docs.html
> there is a
> link to the XML how-to in HTML format. Is there a PDF
> version available, so I can print it all easily?
>
> -- Steve
> Stephen Ferg
> ferg_s@bls.gov
> 202-691-7257
>
> Bureau of Labor Statistics, Room 5110
> 2 Mass. Ave. NE
> Washington, DC 20212-0001
> USA
>
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml-ref.pdf
Type: application/octet-stream
Size: 91672 bytes
Desc: xml-ref.pdf
Url : http://mail.python.org/pipermail/xml-sig/attachments/20031003/28fd56c8/xml-ref-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml-howto.pdf
Type: application/octet-stream
Size: 86653 bytes
Desc: xml-howto.pdf
Url : http://mail.python.org/pipermail/xml-sig/attachments/20031003/28fd56c8/xml-howto-0001.obj

From dkgunter at lbl.gov Fri Oct 3 11:58:56 2003
From: dkgunter at lbl.gov (Dan Gunter)
Date: Fri Oct 3 11:59:38 2003
Subject: [XML-SIG] xml book recommendation
Message-ID: <3F7D9CC0.9050100@lbl.gov>

As someone who knows XML but not all the details of W3C's XML Schema, I
found the book "Definitive XML Schema" very useful. It is clearly
written and covers a lot of ground. Unlike some other books I've seen,
the author uses actual English prose to describe how things fit
together, as opposed to some hideous BNF grammar that belongs in a spec,
not a how-to book.

title: Definitive XML Schema
author: Priscilla Walmsley

- Dan Gunter

From cobra96 at ms9.hinet.net Mon Oct 6 21:11:48 2003
From: cobra96 at ms9.hinet.net (cobra96@ms9.hinet.net)
Date: Mon Oct 6 23:40:38 2003
Subject: [XML-SIG] windows 98 se can't install pyxml
Message-ID: <200310070111.JAA06297@msr21.hinet.net>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/xml-sig/attachments/20031007/851934f4/attachment.html

From KSBeattie at lbl.gov Tue Oct 7 21:52:31 2003
From: KSBeattie at lbl.gov (Keith Beattie)
Date: Tue Oct 7 21:53:10 2003
Subject: [XML-SIG] c14n attribute ordering problem?
In-Reply-To:
References:
Message-ID: <3F836DDF.5040803@lbl.gov>

Rich Salz wrote:
> The Python code (which I now think is wrong, not libxml/xmlsec) is:
> def _sorter_ns(n1,n2):
>     '''_sorter_ns((n,v),(n,v)) -> int
>     "(an empty namespace URI is lexicographically least)."'''
>
>     if n1[0] == 'xmlns': return -1
>     if n2[0] == 'xmlns': return 1
>     return cmp(n1[0], n2[0])
>
> Should that cmp be using [1] instead of [0]?

So I've taken a walk through c14n.py and that code isn't getting called
by my counter-example (of c14n'ing the sub-node). It looks like _sorter
(rather than _sorter_ns, which will do the right thing) is being used
for sorting those inherited NS attrs at line 324.

I'm still digging at how to fix that, but should I submit a new bug at
http://sourceforge.net/projects/pyxml/ ? (I just created an account but
otherwise I'm a newbie when it comes to contributing to SF projects.)
If so, I assume it would be under the DOM category?

Thanks,
ksb

From amarre at mac.com Wed Oct 8 20:56:07 2003
From: amarre at mac.com (Alexis Marrero)
Date: Wed Oct 8 20:51:11 2003
Subject: [XML-SIG] Empty string namespace - to bug or not to bug?
Message-ID: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com>

Is this a bug? Is there any workaround?

doc = minidom.parseString('value')
doc.toxml()
Traceback (most recent call last):
  File "", line 1, in ?
  File "/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/_xmlplus/dom/minidom.py", line 47, in toxml
    return self.toprettyxml("", "", encoding)
  File "/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/_xmlplus/dom/minidom.py", line 59, in toprettyxml
    self.writexml(writer, "", indent, newl, encoding)
  File "/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/_xmlplus/dom/minidom.py", line 1739, in writexml
    node.writexml(writer, indent, addindent, newl)
  File "/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/_xmlplus/dom/minidom.py", line 816, in writexml
    _write_data(writer, attrs[a_name].value)
  File "/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/_xmlplus/dom/minidom.py", line 304, in _write_data
    data = data.replace("&", "&amp;").replace("<", "&lt;")
AttributeError: 'NoneType' object has no attribute 'replace'

My environment is:

Python 2.3 (#2, Jul 30 2003, 11:45:28)
[GCC 3.1 20020420 (prerelease)] on darwin

Regards,
amn

From Alexandre.Fayolle at logilab.fr Thu Oct 9 06:13:47 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Thu Oct 9 06:13:55 2003
Subject: [XML-SIG] [ANN] xmldiff-0.6.4
Message-ID: <20031009101347.GP23519@calvin>

Logilab has released xmldiff-0.6.4

What's new?
-----------

This release fixes a bug which broke directory processing in recursive
mode. A --help command line option was added to xmlrev, and the
documentation was rewritten in reStructuredText.

About xmldiff
-------------

Xmldiff is a Python tool that figures out the differences between two
similar XML files, in the same way the diff utility does for text files.
The output can use a home-brewed format, or XUpdate.

Xmldiff can also be used as a Python module, and can take already parsed
DOM trees as input. It requires pyxml[1] to be installed.

Xmldiff is released under the GPL.

URLs
----

Homepage: http://www.logilab.org/projects/xmldiff/
Download: ftp://ftp.logilab.org/pub/common/xmldiff-0.6.4.tar.gz
Mailing list: http://lists.logilab.org/pipermail/xml-projects/

[1] http://pyxml.sf.net/

--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From Alexandre.Fayolle at logilab.fr Thu Oct 9 06:20:04 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Thu Oct 9 06:20:11 2003
Subject: [XML-SIG] Re: [ANN] xmldiff-0.6.4
In-Reply-To: <20031009101347.GP23519@calvin>
References: <20031009101347.GP23519@calvin>
Message-ID: <20031009102003.GQ23519@calvin>

On Thu, Oct 09, 2003 at 12:13:47PM +0200, Alexandre Fayolle wrote:
> Download: ftp://ftp.logilab.org/pub/common/xmldiff-0.6.4.tar.gz

Sorry, wrong URL. Please use
ftp://ftp.logilab.org/pub/xmldiff/xmldiff-0.6.4.tar.gz

--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From and-xml at doxdesk.com Mon Oct 13 04:40:05 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Mon Oct 13 04:55:51 2003
Subject: [XML-SIG] Empty string namespace - to bug or not to bug?
In-Reply-To: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com>
References: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com>
Message-ID: <20031013084005.GA31515@doxdesk.com>

Alexis Marrero wrote:

> Is this a bug?

> doc = minidom.parseString('value')
> doc.toxml()

IMO yes, but the bug occurs before doc.toxml. After the parseString
call, doc.documentElement.attributes.item(0).value evaluates to None
instead of '' (and it is this that causes the later exception).

I suspect this code in expatbuilder.py, line 764:

  d = a.childNodes[0].__dict__
  d['data'] = d['nodeValue'] = uri
  d = a.__dict__
  d['value'] = d['nodeValue'] = uri

Adding:

  if uri is None:
      uri= ''

immediately preceding this code seems to fix the problem, though I
haven't investigated whether this is the best way of doing it.

Add to bug tracker [Y|N]?

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From tpassin at comcast.net Mon Oct 13 11:49:51 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Mon Oct 13 11:47:56 2003
Subject: [XML-SIG] Empty string namespace - to bug or not to bug?
In-Reply-To: <20031013084005.GA31515@doxdesk.com>
References: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com> <20031013084005.GA31515@doxdesk.com>
Message-ID: <3F8AC99F.7050805@comcast.net>

Andrew Clover wrote:
> Alexis Marrero wrote:
>
>>Is this a bug?
>
>>doc = minidom.parseString('value')
>>doc.toxml()
>
> IMO yes, but the bug occurs before doc.toxml. After the
> parseString call, doc.documentElement.attributes.item(0).value
> evaluates to None instead of '' (and it is this that causes the
> later exception).
>

However, the empty namespace is _supposed_ to be represented by None.
So the bug actually occurs in whatever code still thinks (from long ago)
that it should be an empty string instead.

> I suspect this code in expatbuilder.py, line 764:
>
>   d = a.childNodes[0].__dict__
>   d['data'] = d['nodeValue'] = uri
>   d = a.__dict__
>   d['value'] = d['nodeValue'] = uri
>
> Adding:
>
>   if uri is None:
>       uri= ''
>
> immediately preceding this code seems to fix the problem, though
> I haven't investigated whether this is the best way of doing it.
>
> Add to bug tracker [Y|N]?
>

N (better to fix the real problem, which is apparently not here).

Cheers,

Tom P

From fdrake at acm.org Mon Oct 13 12:27:02 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon Oct 13 12:27:11 2003
Subject: [XML-SIG] Empty string namespace - to bug or not to bug?
In-Reply-To: <20031013084005.GA31515@doxdesk.com>
References: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com> <20031013084005.GA31515@doxdesk.com> <3F8AC99F.7050805@comcast.net>
Message-ID: <16266.53846.885905.191406@grendel.zope.com>

Andrew Clover writes:
 > IMO yes, but the bug occurs before doc.toxml. After the
 > parseString call, doc.documentElement.attributes.item(0).value
 > evaluates to None instead of '' (and it is this that causes the
 > later exception).

Yep, that's the bug.

 > I suspect this code in expatbuilder.py, line 764:
 >
 >   d = a.childNodes[0].__dict__
 >   d['data'] = d['nodeValue'] = uri
 >   d = a.__dict__
 >   d['value'] = d['nodeValue'] = uri

As well you should! ;-)

 > Adding:
 >
 >   if uri is None:
 >       uri= ''
 >
 > immediately preceding this code seems to fix the problem, though
 > I haven't investigated whether this is the best way of doing it.
 >
 > Add to bug tracker [Y|N]?

No. I've just checked in the fix, so there's no need.

Thomas: You're right that None should be used for the namespaceURI for
the empty namespace, but it shouldn't be used as the value for the
attribute. The attribute value itself is still the empty string.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From tpassin at comcast.net Mon Oct 13 15:54:36 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Mon Oct 13 15:52:39 2003
Subject: [XML-SIG] Empty string namespace - to bug or not to bug?
In-Reply-To: <16266.53846.885905.191406@grendel.zope.com>
References: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com> <20031013084005.GA31515@doxdesk.com> <3F8AC99F.7050805@comcast.net> <16266.53846.885905.191406@grendel.zope.com>
Message-ID: <3F8B02FC.2070608@comcast.net>

Fred L. Drake, Jr. wrote:

> Thomas: You're right that None should be used for the namespaceURI
> for the empty namespace, but it shouldn't be used as the value for the
> attribute. The attribute value itself is still the empty string.
>

Oh, yes, I see it. I missed that we were still looking at the attribute
itself.

Cheers,

Tom P

From and-xml at doxdesk.com Tue Oct 14 05:10:30 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Tue Oct 14 05:26:14 2003
Subject: [XML-SIG] Empty string namespace - to bug or not to bug?
In-Reply-To: <3F8AC99F.7050805@comcast.net>
References: <5D99ABE6-F9F3-11D7-95D8-000393D153BC@mac.com> <20031013084005.GA31515@doxdesk.com> <3F8AC99F.7050805@comcast.net>
Message-ID: <20031014091030.GA32079@doxdesk.com>

Thomas B. Passin wrote:

> However, the empty namespace is _supposed_ to be represented by
> None.

Incidentally, judging by this posting:

  http://lists.w3.org/Archives/Public/www-dom/2003JulSep/0141.html

it would seem that in the next draft (CR) of DOM 3 Core, it will be
required for implementations to accept '' as an input value to functions
that accept a namespaceURI, as an alternative to None.

How irritating! It seems to me this is *more* likely to cause bugs in
XML applications...

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From petri.savolainen at iki.fi Tue Oct 14 13:51:33 2003
From: petri.savolainen at iki.fi (Petri Savolainen)
Date: Tue Oct 14 13:50:48 2003
Subject: [XML-SIG] pyxml0.8.3/minidom/getElementById still broken???
Message-ID:

Information at http://pyxml.sourceforge.net/topics/compliance.html
claims it is fixed in this version. The same document seems to say a
version of minidom that comes with python2.3 (running that) also works.

Yet, running RH9 and having the right version installed (on python 2.3),
it fails on me, not returning anything where it clearly should. I've
doublechecked and doublechecked everything I can think of.

Also, using the same PyXML, xpath query:

  "//*[id=%s]" % myid

delivers the node I want to get. But not document.getElementById...

I tried google and newsgroup search but found nothing... any pointers to
what is the case?

Thanks!

Petri

From hawkeye.parker at autodesk.com Tue Oct 14 21:38:10 2003
From: hawkeye.parker at autodesk.com (Hawkeye Parker)
Date: Tue Oct 14 21:37:34 2003
Subject: [XML-SIG] validating with XML schema (long)
Message-ID: <9BDC80F712DD0C4CB5FBF48D2ED3DD8B44AD4D@msgusawmb02.ads.autodesk.com>

hi all

i need to validate the structure/content of some xml, and i'm parsing
etc. with python. i've been learning a bit about XML Schema and i'd like
to confirm some basic assumptions:

-validation with XML Schema (or any other validation language) doesn't "just happen".
i.e., just because you specify an .xsd file in your xml, you still need
to explicitly "call" it to validate the xml. it must be, correct?

assuming i'm right so far: in terms of validation, it seems that DTD is
unwieldy and that XML Schema (.xsd) is a much better choice, except that
there's little support for it in general, and specifically in python. in
fact, there doesn't seem to be a whole lot of xml validation support at
all . . . . this makes me think that:

-there are other (more sensible?) ways to validate the xml, like parsing
into DOM and then using python to validate according to your desires.
maybe messy but obvious.

-xml is new, validation of xml is newer, validation with XML Schema is
newer yet.

in any case, i've gotten XSV and run it against a few of my own examples.
again, i'm confused: XSV seems to validate the XML Schema itself
(schemaErrors) as much as the XML (instanceErrors). i guess this is good.
moreover, i was expecting to write something like this:

  XSV.validate('foo.xml', 'foo.xsd')

which would raise an exception if anything went wrong with the validation
of the XML (.xml) file according to the XML Schema file (.xsd). instead,
i get an (opaque) xml object that i will have to parse further,
eventually to raise my own custom exceptions. not meaning to complain at
all, this makes me think i should just do everything in python: then, at
least i know exactly what i'm doing. why am i wrong?

lastly, here's an example of some simple xml and an empty schema:

here's the xsv output:

XSV does not complain about this example, though none of the elements
(, , etc.) are specified in the Schema. i expect i'm missing something
basic about xml, validation, and XML Schema, but this is just the sort
of *very bad* xml that i want to be able to catch during validation.

any help, thoughts, nudges would be wonderful.

thanks,

hawkeye

From tpassin at comcast.net Tue Oct 14 23:06:46 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Tue Oct 14 23:05:05 2003
Subject: [XML-SIG] validating with XML schema (long)
In-Reply-To: <9BDC80F712DD0C4CB5FBF48D2ED3DD8B44AD4D@msgusawmb02.ads.autodesk.com>
References: <9BDC80F712DD0C4CB5FBF48D2ED3DD8B44AD4D@msgusawmb02.ads.autodesk.com>
Message-ID: <3F8CB9C6.5060708@comcast.net>

Hawkeye Parker wrote:

> i need to validate the structure/content of some xml, and i'm parsing etc. with python. i've been learning a bit about XML Schema and i'd like to confirm some basic assumptions:
>
> -validation with XML Schema (or any other validation language) doesn't "just happen". i.e., just because you specify an .xsd file in your xml, you still need to explicitly "call" it to validate the xml. it must be, correct?
>

You must make validation happen, but not by "calling" a specified xsd
file.

> assuming i'm right so far: in terms of validation, it seems that DTD is unwieldy and that XML Schema (.xsd) is a much better choice,

Huh??? Most people think that XML Schema is an unwieldy beast, not the
DTD.

>except that there's little support for it in general, and specifically in python. in fact, there doesn't seem to be a whole lot of xml validation support at all . . . . this makes me think that:
>
> -there are other (more sensible?) ways to validate the xml, like parsing into DOM and then using python to validate according to your desires. maybe messy but obvious.

Not very feasible except for quite restricted kinds of validation,
though.

> -xml is new, validation of xml is newer, validation with XML Schema is newer yet.
>
> in any case, i've gotten XSV and run it against a few of my own examples. again, i'm confused: XSV seems to validate the XML Schema itself (schemaErrors) as much as the XML (instanceErrors). i guess this is good. moreover, i was expecting to write something like this:
>
> XSV.validate('foo.xml', 'foo.xsd')
>
> which would raise an exception if anything went wrong with the validation of the XML (.xml) file according to the XML Schema file (.xsd). instead, i get an (opaque) xml object that i will have to parse further, eventually to raise my own custom exceptions.

xsv comes with an XSLT stylesheet to make the results easier to read.
You could start with that (command line operation) until you understand
what xsv is telling you.

>
> lastly, here's an example of some simple xml and an empty schema:
>
> here's the xsv output:
>
>
> instanceAssessed="true" instanceErrors="0" schemaErrors="0"
> schemaLocs="None -> PPSiteBuilderSchema.xsd; None -> PPSiteBuilderSchema.xsd"
> target="file:///C:/sandbox/site_builder/siteBuilder.xml" validation="lax"
> version="XSV 2.5-2 of 2003/07/09 13:08:04">
> URI="file:///C:/sandbox/site_builder/PPSiteBuilderSchema.xsd"
> outcome="success" source="schemaLoc"/>
> URI="file:///C:/sandbox/site_builder/PPSiteBuilderSchema.xsd"
> outcome="redundant" source="schemaLoc"/>
>
> XSV does not complain about this example,

But it does tell you what it did. In this case, xsv could not find any
elements to validate (since the schema is empty), so it went to "lax"
mode - validation='lax'. This means it did not check the elements it
found. XML Schema validation can be either lax or strict - you have to
read up on it. With lax validation, xsv found no errors since all schema
elements were satisfied or at least not failed (since there were none).

> though none of the elements (, , etc.) are specified in the Schema. i expect i'm missing something basic about xml, validation, and XML Schema, but this is just the sort of *very bad* xml that i want to be able to catch during validation.
>

Learn how to enforce strict validation, or just use a DTD, or go to
RELAX NG. If you use a DTD, you have to use a validating parser and tell
it to validate - Python can do this. Search Google, you should find
enough information.
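
[Editorial sketch] The report attributes quoted above (instanceErrors, schemaErrors, validation) are enough to get the "raise an exception" behaviour asked for earlier in this thread. What follows is only a minimal sketch under stated assumptions: the helper check_xsv_report, the ValidationFailed exception and the sample report string are hypothetical and are not part of XSV, and the sketch assumes the report is available as an XML string whose root element carries those attributes, with validation="strict" marking a fully checked run. It uses only the standard library's xml.dom.minidom.

```python
from xml.dom import minidom


class ValidationFailed(Exception):
    """Raised when a report shows errors or non-strict validation."""


def check_xsv_report(report_xml):
    """Return True if the report shows a clean, strict validation run."""
    # Only the root element's attributes are read; its name is not relied on.
    root = minidom.parseString(report_xml).documentElement
    # getAttribute() returns '' for a missing attribute, hence the "or 0".
    instance_errors = int(root.getAttribute("instanceErrors") or 0)
    schema_errors = int(root.getAttribute("schemaErrors") or 0)
    mode = root.getAttribute("validation")

    if instance_errors or schema_errors:
        raise ValidationFailed(
            "%d instance error(s), %d schema error(s)"
            % (instance_errors, schema_errors))
    # Assumption: a fully checked run reports validation="strict".
    if mode != "strict":
        raise ValidationFailed(
            "validation mode was %r, so elements may not have been checked"
            % mode)
    return True


# Hypothetical report, modelled on the attributes quoted in this thread.
SAMPLE = ('<report instanceAssessed="true" instanceErrors="0" '
          'schemaErrors="0" validation="lax"/>')

try:
    check_xsv_report(SAMPLE)
except ValidationFailed as exc:
    print("rejected:", exc)
```

Treating "lax" as a failure is a design choice of this sketch: it makes the empty-schema case from the thread fail loudly instead of passing with zero errors.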

Cheers,

Tom P

From Alexandre.Fayolle at logilab.fr Wed Oct 15 02:02:23 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Wed Oct 15 02:02:28 2003
Subject: [XML-SIG] pyxml0.8.3/minidom/getElementById still broken???
In-Reply-To:
References:
Message-ID: <20031015060223.GK13178@calvin>

On Tue, Oct 14, 2003 at 08:51:33PM +0300, Petri Savolainen wrote:
> Information at http://pyxml.sourceforge.net/topics/compliance.html claims it
> is fixed in this version. The same document seems to say a version of
> minidom that comes with python2.3 (running that) also works.
>
> Yet, running RH9 and having the right version installed (on python 2.3),
> it fails on me, not returning anything where it clearly should. I've
> doublechecked and doublechecked everything I can think of.

Have you parsed the file with a validating parser? This is required for
getElementById to work, since the parser needs to parse the DTD in order
to know which attributes are IDs.

--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From ht at cogsci.ed.ac.uk Wed Oct 15 02:20:00 2003
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Wed Oct 15 02:21:27 2003
Subject: [XML-SIG] validating with XML schema (long)
In-Reply-To: <3F8CB9C6.5060708@comcast.net> (Thomas B. Passin's message of "Tue, 14 Oct 2003 23:06:46 -0400")
References: <9BDC80F712DD0C4CB5FBF48D2ED3DD8B44AD4D@msgusawmb02.ads.autodesk.com> <3F8CB9C6.5060708@comcast.net>
Message-ID:

Only one thing to add to Thomas's sensible reply -- if you want
programmatic access to XSV, forget XSV.commandLine and go straight to
XSV.driver, probably the 'runit' method.

ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

From and-xml at doxdesk.com Wed Oct 15 08:46:05 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Wed Oct 15 09:01:56 2003
Subject: [XML-SIG] pyxml0.8.3/minidom/getElementById still broken???
In-Reply-To:
References:
Message-ID: <20031015124605.GA27953@doxdesk.com>

Petri Savolainen wrote:

> Information at http://pyxml.sourceforge.net/topics/compliance.html claims it
> is fixed in this version. The same document seems to say a version of
> minidom that comes with python2.3 (running that) also works.

Works for me. I haven't tried all possible testcases, but this is okay:

  from xml.dom import minidom
  content= ']>'
  doc= minidom.parseString(content)
  doc.getElementById('b')

- resulting in element .

If the IDness of the attribute concerned is declared in the external
subset, minidom won't use that as it's a non-validating,
non-external-entity-reading DOM implementation.

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From fdrake at acm.org Wed Oct 15 09:57:03 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed Oct 15 09:57:13 2003
Subject: [XML-SIG] pyxml0.8.3/minidom/getElementById still broken???
In-Reply-To: <20031015124605.GA27953@doxdesk.com>
References: <20031015124605.GA27953@doxdesk.com>
Message-ID: <16269.21039.724817.186733@grendel.zope.com>

Andrew Clover writes:
 > If the IDness of the attribute concerned is declared in the external subset,
 > minidom won't use that as it's a non-validating, non-external-entity-reading
 > DOM implementation.

That, theoretically, can be controlled using the DOM 3 Load & Save
interface, though doing so isn't as well tested as it should be. That
will change as the spec for that approaches Recommendation status.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From petri.savolainen at iki.fi Wed Oct 15 10:30:24 2003
From: petri.savolainen at iki.fi (Petri Savolainen)
Date: Wed Oct 15 10:27:02 2003
Subject: [XML-SIG] Re: pyxml0.8.3/minidom/getElementById still broken???
References: <20031015124605.GA27953@doxdesk.com> <16269.21039.724817.186733@grendel.zope.com>
Message-ID:

Alexandre, Fred, Andrew, thank you!

Indeed, I had no DTD, no validation. I assumed that by default,
attributes named "id" are treated as id attributes, for the purposes of
getElementById.

Using setIdAttribute('id') from DOM3 Load & Save (?) works great, too.

Thanks again!

Petri

Fred L. Drake, Jr. wrote:

>
> Andrew Clover writes:
> > If the IDness of the attribute concerned is declared in the external
> > subset, minidom won't use that as it's a non-validating,
> > non-external-entity-reading DOM implementation.
>
> That, theoretically, can be controlled using the DOM 3 Load & Save
> interface, though doing so isn't as well tested as it should be. That
> will change as the spec for that approaches Recommendation status.
>
>
> -Fred
>

From hawkeye.parker at autodesk.com Wed Oct 15 13:29:28 2003
From: hawkeye.parker at autodesk.com (Hawkeye Parker)
Date: Wed Oct 15 13:29:05 2003
Subject: [XML-SIG] validating with XML schema (long)
Message-ID: <9BDC80F712DD0C4CB5FBF48D2ED3DD8B44AD50@msgusawmb02.ads.autodesk.com>

>> Is there a way to specify to
>> XSV, "hey, validate it in strict mode.

>There isn't, there should be.

>ht

found the above from 2000. is this still the case? (it would seem so
after having a look at runit())

i've discovered:

  processContents="strict"

am i on the right track? sorry for all the unlearned questions: i had no
idea the curve would be so steep (for me).

hawkeye

> -----Original Message-----
> From: Henry S. Thompson [mailto:ht@cogsci.ed.ac.uk]
> Sent: Tuesday, October 14, 2003 11:20 PM
> To: Thomas B. Passin
> Cc: Hawkeye Parker; xml-sig@python.org
> Subject: Re: [XML-SIG] validating with XML schema (long)
>
>
> Only one thing to add to Thomas's sensible reply -- if you want
> programmatic access to XSV, forget XSV.commandLine and go straight to
> XSV.driver, probably the 'runit' method.
>
> ht
> --
> Henry S. Thompson, HCRC Language Technology Group,
> University of Edinburgh
> Half-time member of W3C Team
> 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44)
> 131 650-4440
> Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
> URL: http://www.ltg.ed.ac.uk/~ht/
> [mail really from me _always_ has this .sig -- mail without
> it is forged spam]
>

From fdrake at acm.org Wed Oct 15 14:20:49 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed Oct 15 14:36:46 2003
Subject: [XML-SIG] Snapshot of upcoming Expat 1.95.7
Message-ID: <16269.36865.322742.443709@grendel.zope.com>

The following is the text of a message I just sent to the expat-discuss
mailing list. There are a few people here who may be interested in
this. ;-)

I've made a second snapshot of the current state of Expat in CVS; it's
available as

  http://www.libexpat.org/expat-2003-10-15.tar.gz

The implementation hasn't really changed... but the declarations of
every entry point have been re-written. The purpose of making a second
snapshot available is to catch any unintentional changes to the API by
the restructuring of the macros in the source.

The changes don't really affect the internals of Expat, and should not
affect the API used by any working application, but are pervasive. The
changes were needed to address the concerns described in this patch
report:

  http://sourceforge.net/tracker/index.php?func=detail&aid=820946&group_id=10127&atid=310127

If you have time to test this snapshot, please report your results to
the list. If all goes well, I'll release Expat 1.95.7 this weekend.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation From brian at sweetapp.com Thu Oct 16 00:25:18 2003 From: brian at sweetapp.com (Brian Quinlan) Date: Thu Oct 16 00:22:43 2003 Subject: [XML-SIG] ANN: Pyana 0.8.1 Released Message-ID: <00e801c3939d$824def40$21795418@dell1700> ANN: Pyana 0.8.1 Released You can find it here: http://sourceforge.net/project/showfiles.php?group_id=28142 Changes: - Updated for Xalan 1.6/Xerces 2.3 - Fixed a bug where some Unicode strings would be misinterpreted as numbers What is Pyana? Pyana is a Python interface to the Xalan C XSLT processor. Example: import Pyana inputExampleXSL = r''' ''' inputExampleXML = r''' Hello World! ''' print Pyana.transform2String( source=inputExampleXML, style=inputExampleXSL) # prints 'Hello World' Some more complex examples are provided here: http://pyana.sourceforge.net/examples/ Cheers, Brian From kamel.hamard at ericsson.com Thu Oct 16 11:36:58 2003 From: kamel.hamard at ericsson.com (Kamel Hamard (QC/EMC)) Date: Thu Oct 16 11:41:38 2003 Subject: [XML-SIG] Using getElementById Message-ID: <2DBF697D5B36014ABA46E66A96107DA072064F@lmc37.lmc.ericsson.se> Hi Guys, I'm using 4suite API and I know that I can't access to my element using this document's method. To be able to do that, I use minidom. I have a xml file and a dtd file where I'm declaring which attivutes are considered as IDs. Here is the code to create the document: import xml.dom.minidom file = "xml file path name" doc = xml.dom.minidom.parseString(open(file).read()) print doc.getElementById('myid') I'm getting None as result. Any body has an Idea why? 
Here is the sction of the DTD for the declaration: Thanks Kamel From noreply at sourceforge.net Thu Oct 16 17:28:35 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Oct 16 17:28:48 2003 Subject: [XML-SIG] [ pyxml-Bugs-825115 ] c14n: 'xmlns=""' attrib ordering problem on subelements Message-ID: Bugs item #825115, was opened at 2003-10-16 14:28 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=825115&group_id=6473 Category: DOM Group: None Status: Open Resolution: None Priority: 5 Submitted By: Keith Beattie (ksbeattie) Assigned to: Nobody/Anonymous (nobody) Summary: c14n: 'xmlns=""' attrib ordering problem on subelements Initial Comment: When c14n'ing a sub element of a document which set the default xmlns, that xmlns attribute is not ordered first. For exmaple the program: --- from xml.dom import minidom from xml.dom.ext import c14n str = """ """ dom = minidom.parseString(str) B_el = dom.getElementsByTagNameNS("b", "B")[0] print c14n.Canonicalize(B_el) --- prints: When it should place the 'xmlns="b"' first: ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=825115&group_id=6473 From tpassin at comcast.net Thu Oct 16 19:19:26 2003 From: tpassin at comcast.net (Thomas B. Passin) Date: Thu Oct 16 19:17:20 2003 Subject: [XML-SIG] Using getElementById In-Reply-To: <2DBF697D5B36014ABA46E66A96107DA072064F@lmc37.lmc.ericsson.se> References: <2DBF697D5B36014ABA46E66A96107DA072064F@lmc37.lmc.ericsson.se> Message-ID: <3F8F277E.1020901@comcast.net> Kamel Hamard (QC/EMC) wrote: > > I'm using 4suite API and I know that I can't access to my element using this document's method. > > To be able to do that, I use minidom. I have a xml file and a dtd file where I'm declaring which attivutes are considered as IDs. 
> > Here is the code to create the document: > > import xml.dom.minidom > file = "xml file path name" > doc = xml.dom.minidom.parseString(open(file).read()) > print doc.getElementById('myid') > > > > I'm getting None as result. Any body has an Idea why? > > Here is the sction of the DTD for the declaration: > > id ID #REQUIRED > version CDATA #REQUIRED > > Well, you did not show your xml source. The dtd only declares that attributes names "id" are of type ID, and you asked for. If your source has no "id" attribute with a value 'myid', there will be nothing to return. Cheers, Tom P From KSBeattie at lbl.gov Thu Oct 16 22:05:27 2003 From: KSBeattie at lbl.gov (Keith Beattie) Date: Thu Oct 16 22:05:31 2003 Subject: [XML-SIG] c14n and subset Message-ID: <3F8F4E67.5010107@lbl.gov> So, after staring at c14n.py for far too long and filing a bug against it (825115) (see thread titles "c14n attribute ordering problem?"), I gave the subset arg of Canonicalize() a try. This doesn't exibit #825115 but does have another problem, it doesn't include the non-xmlns attributes. For example: $ cat c14n_simple.py from xml.dom import minidom from xml.dom.ext import c14n str = """ """ dom = minidom.parseString(str) B_el = dom.getElementsByTagNameNS("urn:b", "B") print c14n.Canonicalize(dom, subset=B_el) $ python c14n_simple.py I was expecting: Any ideas on what might be wrong? ksb From JRBoverhof at lbl.gov Thu Oct 16 23:06:01 2003 From: JRBoverhof at lbl.gov (Joshua Boverhof) Date: Thu Oct 16 23:05:14 2003 Subject: [XML-SIG] c14n and subset References: <3F8F4E67.5010107@lbl.gov> Message-ID: <3F8F5C99.50909@lbl.gov> Is this what you're trying to do? 
-josh $ python c14n_simple.py [boverhof@dahmer Canonicalize]$ !cat cat c14n_simple.py from xml.dom import minidom from xml.dom.ext import c14n str = """ """ dom = minidom.parseString(str) B_el = dom.getElementsByTagNameNS("urn:b", "B") #print c14n.Canonicalize(dom, subset=B_el) print c14n.Canonicalize(B_el[0]) $ python c14n_simple.py Keith Beattie wrote: > So, after staring at c14n.py for far too long and filing a bug against > it (825115) (see thread titles "c14n attribute ordering problem?"), I > gave the subset arg of Canonicalize() a try. > > This doesn't exibit #825115 but does have another problem, it doesn't > include the non-xmlns attributes. For example: > > $ cat c14n_simple.py > from xml.dom import minidom > from xml.dom.ext import c14n > > str = """ > > """ > dom = minidom.parseString(str) > B_el = dom.getElementsByTagNameNS("urn:b", "B") > print c14n.Canonicalize(dom, subset=B_el) > > $ python c14n_simple.py > > > I was expecting: > > > Any ideas on what might be wrong? > > ksb > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig From KSBeattie at lbl.gov Fri Oct 17 03:02:21 2003 From: KSBeattie at lbl.gov (Keith Beattie) Date: Fri Oct 17 03:06:38 2003 Subject: [XML-SIG] c14n and subset In-Reply-To: <3F8F5C99.50909@lbl.gov> References: <3F8F4E67.5010107@lbl.gov> <3F8F5C99.50909@lbl.gov> Message-ID: <3F8F93FD.9030308@lbl.gov> Joshua Boverhof wrote: > Is this what you're trying to do? > > #print c14n.Canonicalize(dom, subset=B_el) > print c14n.Canonicalize(B_el[0]) > > $ python c14n_simple.py > Well, no, because that isn't canonical form. 
:/ Canonical form for this would be: c14n ordering of attributes is: 1st) the default namespace declaration ('xmlns=""') first 2nd) namespace declarations, sorted by prefix (the part after the ':') 3rd) unqualified attributes, sorted by name 4th) qualified attributes, sorted by namespace URI I initially tried what you suggest, which led me to filing bug #825115 (which at the time I thought was only a violation of my rule #1, but it appears to be a bit more). After using the subset arg of Canonicalize(), it follows rule #1, but appears to drop attributes under rules 3 & 4. Looking at the code in c14n.py it appears that all the proper logic is there (and the use of subset getting rule #1 right is encouraging), but honestly my head is spinning with all its recursion and terse comments. I'm hoping that perhaps c14n.py can do what I need (an exc c14n of an xpath subset of a doc) and that my initial test of using a sub-element as the lone arg to Canonicalize (my filed bug) is a known limitation. ksb From rsalz at datapower.com Fri Oct 17 07:40:25 2003 From: rsalz at datapower.com (Rich Salz) Date: Fri Oct 17 07:40:29 2003 Subject: [XML-SIG] c14n and subset In-Reply-To: <3F8F93FD.9030308@lbl.gov> Message-ID: > Looking at the code in c14n.py it appears that all the proper logic is > there (and the use of subset getting rule #1 right is encouraging), but > honestly my head is spinning with all its recursion and terse comments. yes. > I'm hoping that perhaps c14n.py can do what I need (an exc c14n of an > xpath subset of a doc) and that my initial test of using a sub-element > as the lone arg to Canonicalize (my filed bug) is a known limitation. no known limitations in the code. just bugs.
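The attribute ordering listed in this thread can be sketched as two sort keys. This is only an illustration of the ordering rules in the Canonical XML Recommendation, written for current Python; it is not the code in c14n.py, and the tuple layouts and function names below are assumptions made up for the example.

```python
def namespace_sort_key(decl):
    """Sort key for a namespace declaration given as (name, uri).

    `name` is 'xmlns' for the default namespace or 'xmlns:prefix'.
    The default namespace has no prefix, so its key is the empty
    string and it sorts before every prefixed declaration.
    """
    name, _uri = decl
    return name[6:] if name.startswith("xmlns:") else ""


def attribute_sort_key(attr):
    """Sort key for an ordinary attribute given as (uri, local_name, value).

    Namespace URI is the primary key and local name the secondary key.
    Unqualified attributes (uri None or '') use the empty string, so
    they come before all qualified attributes.
    """
    uri, local_name, _value = attr
    return (uri or "", local_name)


def order_attributes(ns_decls, attrs):
    """Return (sorted namespace declarations, sorted attributes).

    Namespace declarations are always rendered before other attributes.
    """
    return (sorted(ns_decls, key=namespace_sort_key),
            sorted(attrs, key=attribute_sort_key))
```

Calling order_attributes() with a mixed list returns the default namespace declaration first, then the prefixed declarations by prefix, then unqualified attributes by name, then qualified attributes by namespace URI.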
/r$ -- Rich Salz Chief Security Architect DataPower Technology http://www.datapower.com XS40 XML Security Gateway http://www.datapower.com/products/xs40.html XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html From Alexandre.Fayolle at logilab.fr Fri Oct 17 09:02:42 2003 From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle) Date: Fri Oct 17 09:02:53 2003 Subject: [XML-SIG] expat and external DTDs Message-ID: <20031017130242.GG27316@calvin> Hello, I think this has been discussed here not too long ago, but I could not find it in the archives. I'm using expat through the SAX interface, and it chokes on relative links to the DTD in the DOCTYPE of the documents I parse. How can I tell expat not to attempt to read the DTD? -- Alexandre Fayolle LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Développement logiciel avancé - Intelligence Artificielle - Formations From JRBoverhof at lbl.gov Fri Oct 17 12:42:12 2003 From: JRBoverhof at lbl.gov (Joshua Boverhof) Date: Fri Oct 17 12:42:10 2003 Subject: [XML-SIG] c14n and subset References: <3F8F4E67.5010107@lbl.gov> <3F8F5C99.50909@lbl.gov> <3F8F93FD.9030308@lbl.gov> Message-ID: <3F901BE4.3090300@lbl.gov> I made a small change to the _implementation._do_element method. The initial_other_attrs parameter was just being added to other_attrs, so 'xmlns' attributes passed in this way are not being sorted correctly. I made a new little list 'sort_these_attrs', and I dump everything in it and then it passes this FIRST test. Hope this helps. -josh $ python c14n_simple.py CORRECT: CANONIC: $ more c14n_simple.py from xml.dom import minidom from xml.dom.ext import c14n str = """ """ dom = minidom.parseString(str) B_el = dom.getElementsByTagNameNS("urn:b", "B") #print c14n.Canonicalize(dom, subset=B_el) print 'CORRECT: ' print "CANONIC: ", c14n.Canonicalize(B_el[0]) Keith Beattie wrote: > Joshua Boverhof wrote: > >> Is this what you're trying to do?
>>
>> #print c14n.Canonicalize(dom, subset=B_el)
>> print c14n.Canonicalize(B_el[0])
>>
>> $ python c14n_simple.py
>>
>
>
> Well, no, because that isn't canonical form. :/
>
> Canonical form for this would be:
>
>
> c14n ordering of attributes is:
> 1st) the default namespace declaration ('xmlns=""') first
> 2nd) namespace declarations, sorted by prefix (the part after the ':')
> 3rd) unqualified attributes, sorted by name
> 4th) qualified attributes, sorted by namespace URI
>
> I initially tried what you suggest which led me to filing bug #825115
> (which at the time I thought was only a violation of my rule #1, but
> it appears to be a bit more). After using the subset arg of
> Canonicalize(), it follows rule #1, but appears to drop attributes
> under rules 3 & 4.
>
> Looking at the code in c14n.py it appears that all the proper logic is
> there (and the use of subset getting rule #1 right is encouraging),
> but honestly my head is spinning with all its recursion and terse
> comments.
>
> I'm hoping that perhaps c14n.py can do what I need (an exc c14n of an
> xpath subset of a doc) and that my initial test of using a sub-element
> as the lone arg to Canonicalize (my filed bug) is a known limitation.
>
> ksb
>
>
-------------- next part --------------
#! /usr/bin/env python
'''XML Canonicalization

This module generates canonical XML of a document or element.
    http://www.w3.org/TR/2001/REC-xml-c14n-20010315
and includes a prototype of exclusive canonicalization
    http://www.w3.org/Signature/Drafts/xml-exc-c14n

Requires PyXML 0.7.0 or later.

Known issues if using Ft.Lib.pDomlette:
    1. Unicode
    2. does not white space normalize attributes of type NMTOKEN and ID?
    3. seems to be include "\n" after importing external entities?

Note, this version processes a DOM tree, and consequently it processes
namespace nodes as attributes, not from a node's namespace axis. This
permits simple document and element canonicalization without
XPath. When XPath is used, the XPath result node list is passed and used to
determine if the node is in the XPath result list, but little else.

Authors:
    "Joseph M. Reagle Jr."
    "Rich Salz"

$Date: 2003/01/25 11:41:21 $ by $Author: loewis $
'''

_copyright = '''Copyright 2001, Zolera Systems Inc. All Rights Reserved.
Copyright 2001, MIT. All Rights Reserved.

Distributed under the terms of:
  Python 2.0 License or later.
  http://www.python.org/2.0.1/license.html
or
  W3C Software License
  http://www.w3.org/Consortium/Legal/copyright-software-19980720
'''

import string
from xml.dom import Node
try:
    from xml.ns import XMLNS
except:
    class XMLNS:
        BASE = "http://www.w3.org/2000/xmlns/"
        XML = "http://www.w3.org/XML/1998/namespace"
try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

_attrs = lambda E: (E.attributes and E.attributes.values()) or []
_children = lambda E: E.childNodes or []
_IN_XML_NS = lambda n: n.name.startswith("xmlns")
_inclusive = lambda n: n.unsuppressedPrefixes == None

# Does a document/PI has lesser/greater document order than the
# first element?
_LesserElement, _Element, _GreaterElement = range(3)

def _sorter(n1,n2):
    '''_sorter(n1,n2) -> int
    Sorting predicate for non-NS attributes.'''

    i = cmp(n1.namespaceURI, n2.namespaceURI)
    if i: return i
    return cmp(n1.localName, n2.localName)


def _sorter_ns(n1,n2):
    '''_sorter_ns((n,v),(n,v)) -> int
    "(an empty namespace URI is lexicographically least)."'''

    if n1[0] == 'xmlns': return -1
    if n2[0] == 'xmlns': return 1
    return cmp(n1[0], n2[0])

def _utilized(n, node, other_attrs, unsuppressedPrefixes):
    '''_utilized(n, node, other_attrs, unsuppressedPrefixes) -> boolean
    Return true if that nodespace is utilized within the node'''

    if n.startswith('xmlns:'):
        n = n[6:]
    elif n.startswith('xmlns'):
        n = n[5:]
    if (n=="" and node.prefix in ["#default", None]) or \
        n == node.prefix or n in unsuppressedPrefixes:
            return 1
    for attr in other_attrs:
        if n == attr.prefix: return 1
    return 0

#_in_subset = lambda subset, node: not subset or node in subset
_in_subset = lambda subset, node: subset is None or node in subset # rich's tweak

class _implementation:
    '''Implementation class for C14N. This accompanies a node during it's
    processing and includes the parameters and processing state.'''

    # Handler for each node type; populated during module instantiation.
    handlers = {}

    def __init__(self, node, write, **kw):
        '''Create and run the implementation.'''
        self.write = write
        self.subset = kw.get('subset')
        self.comments = kw.get('comments', 0)
        self.unsuppressedPrefixes = kw.get('unsuppressedPrefixes')
        nsdict = kw.get('nsdict', { 'xml': XMLNS.XML, 'xmlns': XMLNS.BASE })

        # Processing state.
        self.state = (nsdict, {'xml':''}, {}) #0422

        if node.nodeType == Node.DOCUMENT_NODE:
            self._do_document(node)
        elif node.nodeType == Node.ELEMENT_NODE:
            self.documentOrder = _Element        # At document element
            if not _inclusive(self):
                self._do_element(node)
            else:
                inherited = self._inherit_context(node)
                self._do_element(node, inherited)
        elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
            pass
        else:
            raise TypeError, str(node)


    def _inherit_context(self, node):
        '''_inherit_context(self, node) -> list
        Scan ancestors of attribute and namespace context. Used only
        for single element node canonicalization, not for subset
        canonicalization.'''

        # Collect the initial list of xml:foo attributes.
        xmlattrs = filter(_IN_XML_NS, _attrs(node))

        # Walk up and get all xml:XXX attributes we inherit.
        inherited, parent = [], node.parentNode
        while parent and parent.nodeType == Node.ELEMENT_NODE:
            for a in filter(_IN_XML_NS, _attrs(parent)):
                n = a.localName
                if n not in xmlattrs:
                    xmlattrs.append(n)
                    inherited.append(a)
            parent = parent.parentNode
        return inherited


    def _do_document(self, node):
        '''_do_document(self, node) -> None
        Process a document node. documentOrder holds whether the document
        element has been encountered such that PIs/comments can be written
        as specified.'''

        self.documentOrder = _LesserElement
        for child in node.childNodes:
            if child.nodeType == Node.ELEMENT_NODE:
                self.documentOrder = _Element        # At document element
                self._do_element(child)
                self.documentOrder = _GreaterElement # After document element
            elif child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                self._do_pi(child)
            elif child.nodeType == Node.COMMENT_NODE:
                self._do_comment(child)
            elif child.nodeType == Node.DOCUMENT_TYPE_NODE:
                pass
            else:
                raise TypeError, str(child)
    handlers[Node.DOCUMENT_NODE] = _do_document


    def _do_text(self, node):
        '''_do_text(self, node) -> None
        Process a text or CDATA node. Render various special characters
        as their C14N entity representations.'''
        if not _in_subset(self.subset, node): return
        s = string.replace(node.data, "&", "&amp;")
        s = string.replace(s, "<", "&lt;")
        s = string.replace(s, ">", "&gt;")
        s = string.replace(s, "\015", "&#xD;")
        if s: self.write(s)
    handlers[Node.TEXT_NODE] = _do_text
    handlers[Node.CDATA_SECTION_NODE] = _do_text


    def _do_pi(self, node):
        '''_do_pi(self, node) -> None
        Process a PI node. Render a leading or trailing #xA if the
        document order of the PI is greater or lesser (respectively)
        than the document element.
        '''
        if not _in_subset(self.subset, node): return
        W = self.write
        if self.documentOrder == _GreaterElement: W('\n')
        W('<?')
        W(node.nodeName)
        s = node.data
        if s:
            W(' ')
            W(s)
        W('?>')
        if self.documentOrder == _LesserElement: W('\n')
    handlers[Node.PROCESSING_INSTRUCTION_NODE] = _do_pi


    def _do_comment(self, node):
        '''_do_comment(self, node) -> None
        Process a comment node. Render a leading or trailing #xA if the
        document order of the comment is greater or lesser (respectively)
        than the document element.
        '''
        if not _in_subset(self.subset, node): return
        if self.comments:
            W = self.write
            if self.documentOrder == _GreaterElement: W('\n')
            W('<!--')
            W(node.data)
            W('-->')
            if self.documentOrder == _LesserElement: W('\n')
    handlers[Node.COMMENT_NODE] = _do_comment


    def _do_attr(self, n, value):
        ''''_do_attr(self, node) -> None
        Process an attribute.'''

        W = self.write
        W(' ')
        W(n)
        W('="')
        s = string.replace(value, "&", "&amp;")
        s = string.replace(s, "<", "&lt;")
        s = string.replace(s, '"', '&quot;')
        s = string.replace(s, '\011', '&#x9')
        s = string.replace(s, '\012', '&#xA')
        s = string.replace(s, '\015', '&#xD')
        W(s)
        W('"')


    def _do_element(self, node, initial_other_attrs = []):
        '''_do_element(self, node, initial_other_attrs = []) -> None
        Process an element (and its children).'''

        # Get state (from the stack) make local copies.
        #   ns_parent -- NS declarations in parent
        #   ns_rendered -- NS nodes rendered by ancestors
        #   ns_local -- NS declarations relevant to this element
        #   xml_attrs -- Attributes in XML namespace from parent
        #   xml_attrs_local -- Local attributes in XML namespace.
        ns_parent, ns_rendered, xml_attrs = \
                self.state[0], self.state[1].copy(), self.state[2].copy() #0422
        ns_local = ns_parent.copy()
        xml_attrs_local = {}

        # Divide attributes into NS, XML, and others.
        #other_attrs = initial_other_attrs[:]
        other_attrs = []
        sort_these_attrs = initial_other_attrs[:]

        in_subset = _in_subset(self.subset, node)
        #for a in _attrs(node):
        sort_these_attrs += _attrs(node)

        for a in sort_these_attrs:
            if a.namespaceURI == XMLNS.BASE:
                n = a.nodeName
                if n == "xmlns:": n = "xmlns"        # DOM bug workaround
                ns_local[n] = a.nodeValue
            elif a.namespaceURI == XMLNS.XML:
                if _inclusive(self) or (in_subset and _in_subset(self.subset, a)): #020925 Test to see if attribute node in subset
                    xml_attrs_local[a.nodeName] = a #0426
            else:
                if _in_subset(self.subset, a): #020925 Test to see if attribute node in subset
                    other_attrs.append(a)
            #add local xml:foo attributes to ancestor's xml:foo attributes
            xml_attrs.update(xml_attrs_local)

        # Render the node
        W, name = self.write, None
        if in_subset:
            name = node.nodeName
            W('<')
            W(name)

            # Create list of NS attributes to render.
            ns_to_render = []
            for n,v in ns_local.items():

                # If default namespace is XMLNS.BASE or empty,
                # and if an ancestor was the same
                if n == "xmlns" and v in [ XMLNS.BASE, '' ] \
                and ns_rendered.get('xmlns') in [ XMLNS.BASE, '', None ]:
                    continue

                # "omit namespace node with local name xml, which defines
                # the xml prefix, if its string value is
                # http://www.w3.org/XML/1998/namespace."
                if n in ["xmlns:xml", "xml"] \
                and v in [ 'http://www.w3.org/XML/1998/namespace' ]:
                    continue

                # If not previously rendered
                # and it's inclusive or utilized
                if (n,v) not in ns_rendered.items() \
                  and (_inclusive(self) or \
                  _utilized(n, node, other_attrs, self.unsuppressedPrefixes)):
                    ns_to_render.append((n, v))

            # Sort and render the ns, marking what was rendered.
            ns_to_render.sort(_sorter_ns)
            for n,v in ns_to_render:
                self._do_attr(n, v)
                ns_rendered[n]=v    #0417

            # If exclusive or the parent is in the subset, add the local xml attributes
            # Else, add all local and ancestor xml attributes
            # Sort and render the attributes.
            if not _inclusive(self) or _in_subset(self.subset,node.parentNode): #0426
                other_attrs.extend(xml_attrs_local.values())
            else:
                other_attrs.extend(xml_attrs.values())
            other_attrs.sort(_sorter)
            for a in other_attrs:
                self._do_attr(a.nodeName, a.value)
            W('>')

        # Push state, recurse, pop state.
        state, self.state = self.state, (ns_local, ns_rendered, xml_attrs)
        for c in _children(node):
            _implementation.handlers[c.nodeType](self, c)
        self.state = state

        if name: W('</%s>' % name)
    handlers[Node.ELEMENT_NODE] = _do_element


def Canonicalize(node, output=None, **kw):
    '''Canonicalize(node, output=None, **kw) -> UTF-8

    Canonicalize a DOM document/element node and all descendents.
    Return the text; if output is specified then output.write will
    be called to output the text and None will be returned
    Keyword parameters:
        nsdict: a dictionary of prefix:uri namespace entries
                assumed to exist in the surrounding context
        comments: keep comments if non-zero (default is 0)
        subset: Canonical XML subsetting resulting from XPath
                (default is [])
        unsuppressedPrefixes: do exclusive C14N, and this specifies the
                prefixes that should be inherited.
    '''
    if output:
        apply(_implementation, (node, output.write), kw)
    else:
        s = StringIO.StringIO()
        apply(_implementation, (node, s.write), kw)
        return s.getvalue()

From martin at v.loewis.de Fri Oct 17 18:19:18 2003 From: martin at v.loewis.de (Martin v.
Löwis) Date: Fri Oct 17 18:19:33 2003 Subject: [XML-SIG] expat and external DTDs In-Reply-To: <20031017130242.GG27316@calvin> References: <20031017130242.GG27316@calvin> Message-ID: Alexandre Fayolle writes: > I'm using expat through the SAX interface, and it chokes on relative > links to the DTD in the DOCTYPE of the documents I parse. How can I tell > expat not to attempt to read the DTD ? There are two ways: one is to declare the document standalone (through the XML header), and the other one is to call SetParamEntityParsing with XML_PARAM_ENTITY_PARSING_NEVER (or some such). Regards, Martin From ashearer at shearersoftware.com Sat Oct 18 15:58:31 2003 From: ashearer at shearersoftware.com (Andrew Shearer) Date: Sat Oct 18 15:58:42 2003 Subject: [XML-SIG] expat and external DTDs In-Reply-To: Message-ID: <72AAE804-01A5-11D8-A5EB-000393B3AC06@shearersoftware.com> I had a similar problem with the first cut of a parser for Apple's PList format. It worked fine when I was connected to the Internet, but failed when I wasn't. The documents specified an Internet-based DTD when they really could stand alone, so I used code similar to the following:
import xml.sax
import xml.sax.handler
parser = xml.sax.make_parser()
# The feature constants live in xml.sax.handler, not in xml.sax itself.
parser.setFeature(xml.sax.handler.feature_external_ges, 0)
parser.setFeature(xml.sax.handler.feature_external_pes, 0)
The complete code is at . On Friday, October 17, 2003, at 12:02 PM, xml-sig-request@python.org wrote: > From: Alexandre Fayolle > I'm using expat through the SAX interface, and it chokes on relative > links to the DTD in the DOCTYPE of the documents I parse. How can I > tell > expat not to attempt to read the DTD ? -- Andrew Shearer http://www.shearersoftware.com/ From fdrake at acm.org Tue Oct 21 00:26:45 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Oct 21 00:27:00 2003 Subject: [XML-SIG] Expat 1.95.7 released Message-ID: <16276.46469.146115.443017@grendel.zope.com> Expat 1.95.7 is now available for download.
This release fixes a variety of problems, including the truly nasty "enum XML_Status" problem, which affected many compilers (just not those used by the maintainers at the time, unfortunately), and many build issues that affect one platform or another. Existing working applications which use Expat should not need any changes to work with the new version, though application maintainers should review the changes carefully to determine the potential impact of the specific changes that have been made. The following changes have been made since the previous release: - Fixed enum XML_Status issue (reported on SourceForge many times), so compilers that are properly picky will be happy. - Introduced an XMLCALL macro to control the calling convention used by the Expat API; this macro should be used to annotate prototypes and definitions of callback implementations in code compiled with a calling convention other than the default convention for the host platform. - Improved ability to build without the configure-generated expat_config.h header. This is useful for applications which embed Expat rather than linking in the library. - Fixed a variety of bugs: see SF issues 458907, 609603, 676844, 679754, 692878, 692964, 695401, 699323, 699487, 820946. - Improved hash table lookups. - Added more regression tests and improved documentation. Additional information about Expat can be found at the Expat website: http://www.libexpat.org/ This release can be downloaded from SourceForge: http://sourceforge.net/project/showfiles.php?group_id=10127 -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From tree at basistech.com Tue Oct 21 12:23:29 2003 From: tree at basistech.com (Tom Emerson) Date: Tue Oct 21 12:24:08 2003 Subject: [XML-SIG] Good visual diff for XML files? Message-ID: <16277.23937.786680.637520@magrathea.basistech.com> Is anyone aware of a good visual diff tool for XML documents? 
Rational/IBM has a nice one in the latest ClearCase release, but it isn't easily used for non-ClearCase objects. TIA, -tree -- Tom Emerson Basis Technology Corp. Software Architect http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From Nicolas.Chauvat at logilab.fr Tue Oct 21 12:28:47 2003 From: Nicolas.Chauvat at logilab.fr (Nicolas Chauvat) Date: Tue Oct 21 12:29:00 2003 Subject: [XML-SIG] Good visual diff for XML files? In-Reply-To: <16277.23937.786680.637520@magrathea.basistech.com> References: <16277.23937.786680.637520@magrathea.basistech.com> Message-ID: <20031021162847.GF11238@logilab.fr> On Tue, Oct 21, 2003 at 12:23:29PM -0400, Tom Emerson wrote: > Is anyone aware of a good visual diff tool for XML documents? > Rational/IBM has a nice one in the latest ClearCase release, but it > isn't easily used for non-ClearCase objects. http://www.logilab.org/projects/xmldiff HTH, -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From noreply at sourceforge.net Tue Oct 21 17:01:20 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Oct 21 17:01:23 2003 Subject: [XML-SIG] [ pyxml-Bugs-827796 ] c14n: \011, \012 and \015 incorrectly escaped in XML strings Message-ID: Bugs item #827796, was opened at 2003-10-21 21:01 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=827796&group_id=6473 Category: DOM Group: None Status: Open Resolution: None Priority: 5 Submitted By: Benoit Goudreault-Emond (bge) Assigned to: Nobody/Anonymous (nobody) Summary: c14n: \011, \012 and \015 incorrectly escaped in XML strings Initial Comment: It seems the c14n algorithm incorrectly replaces \011 (tab) by &#x9 (without semicolon!).
If one reads http://www.w3.org/TR/xml-c14n#Example-Chars , it is clear from the example that \011 should be replaced by &#x9; (with the semicolon). The same goes for \012 (LF) and \015 (CR).

Here's some test code that demonstrates the problem:

----
#! /usr/bin/env python
import sys
from xml.dom import minidom
from xml.dom.ext import c14n

example = ''''''
dom = minidom.parseString(example)
print c14n.Canonicalize(dom)
----

Prints:

According to the spec, it should print:

Interestingly enough, test_c14n.py's test data reflects the problem--that is, eg4's expected result, base64 encoded, does not contain the semicolons :{)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=827796&group_id=6473

From noreply at sourceforge.net Tue Oct 21 17:05:29 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Tue Oct 21 17:07:48 2003
Subject: [XML-SIG] [ pyxml-Patches-827802 ] Fix to bug ID 827796
Message-ID:

Patches item #827802, was opened at 2003-10-21 21:05
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=827802&group_id=6473

Category: DOM
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Benoit Goudreault-Emond (bge)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix to bug ID 827796

Initial Comment:
This patch fixes the c14n.py problem with escaping characters \011, \012 and \015 in attributes. It also contains an update to test_c14n.py so the test reflects the W3C's example 4 in the C14N TR (http://www.w3.org/TR/xml-c14n#Example-Chars).

The patch assumes both files are in the current directory--sorry about that.
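
For readers following this bug and its patch, the attribute-value escaping that Canonical XML 1.0 describes can be sketched in a few lines of Python. This is only an illustration of the rule, not the code from c14n.py or from the attached patch; the function name escape_c14n_attr_value is made up for this example.

```python
def escape_c14n_attr_value(value):
    """Escape an attribute value as Canonical XML 1.0 describes.

    Hypothetical helper for illustration; it is not part of PyXML.
    """
    # '&' is handled first so that the references produced by the
    # later replacements are not escaped a second time.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        ('"', "&quot;"),
        ("\t", "&#x9;"),   # \011, note the trailing semicolon
        ("\n", "&#xA;"),   # \012
        ("\r", "&#xD;"),   # \015
    ]
    for char, reference in replacements:
        value = value.replace(char, reference)
    return value


print(escape_c14n_attr_value('a\tb\nc\rd & "e" <f'))
```

The point of the bug report is visible in the last three table entries: each character reference must end with a semicolon to be well-formed.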
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=827802&group_id=6473

From mnunes at inf.pucrs.br Wed Oct 22 23:25:25 2003
From: mnunes at inf.pucrs.br (mnunes@inf.pucrs.br)
Date: Wed Oct 22 21:25:41 2003
Subject: [XML-SIG] Python module for OWL
Message-ID: <200310230125.h9N1PPXb003440@toledo.inf.pucrs.br>

Hi Folks,

I don't know if this is in the scope of this SIG, but do you know if there's anybody working on a Python module to deal with the OWL language? OWL is the brand new XML-based language for ontology description proposed by the W3C.

Greetings,

Marcelo

--
Marcelo Pereira Nunes
http://www.inf.pucrs.br/~mnunes/

From phthenry at earthlink.net Thu Oct 23 02:03:01 2003
From: phthenry at earthlink.net (Paul Tremblay)
Date: Thu Oct 23 02:03:29 2003
Subject: [XML-SIG] turning off dtd resolver in sax
Message-ID: <20031023060301.GA20683@localhost.localdomain>

Is there a way to run SAX if the dtd in the document points to a url, and the url cannot be retrieved? I can parse my documents with SAX so long as I am connected to the internet. But when I am not connected, I get an error that the url cannot be found.

I could just delete the dtd, but this is kind of a hack. I am distributing code, and I would like for the XML document which I am parsing to have a dtd.
Here is a snippet of my code:

    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, 0)
    parser.setErrorHandler(ErrorHandler())
    section_handler = FormSections(
        write_obj = write_obj,
        highest_level = self.__highest_level,
    )
    parser.setContentHandler(section_handler)
    parser.parse(self.__input_file)

thanks

Paul

--
************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************

From gia at webde-ag.de Thu Oct 23 03:41:36 2003
From: gia at webde-ag.de (Gisbert Amm)
Date: Thu Oct 23 03:41:43 2003
Subject: [XML-SIG] Python module for OWL
Message-ID: <74ADFA8C453ED611A71E00508BBBA13510E9477C@exchange1.cinetic.de>

> if there's anybody working on a Python module to deal
> with the OWL language.
>

Did you have a look at Tim Berners-Lee's Closed World Machine already? It's written in Python and has some basic OWL features that might be inspiring.

http://www.w3.org/2000/10/swap/doc/cwm.html

Regards,
Gisbert Amm

From Radovan.Chytracek at cern.ch Thu Oct 23 08:17:05 2003
From: Radovan.Chytracek at cern.ch (Radovan Chytracek)
Date: Thu Oct 23 08:17:11 2003
Subject: [XML-SIG] turning off dtd resolver in sax
Message-ID:

Hi,

I was having the same problem and tried to discuss it here without any feedback. For details see:

http://mail.python.org/pipermail/xml-sig/2003-September/009876.html

where I have provided a solution to this. However, it does not work as is because PyXML has, for some reason, disabled the proper EntityResolver call sequence. I had to hack my local Python installation to make it work.

Hope this helps you a bit.

Radovan

> -----Original Message-----
> From: Paul Tremblay [mailto:phthenry@earthlink.net]
> Sent: Thursday, October 23, 2003 8:03 AM
> To: xml-sig@python.org
> Subject: [XML-SIG] turning off dtd resolver in sax
>
>
> Is there a way to run SAX if the dtd in the document points
> to a url, and the url cannot be retrieved? I can parse my
> documents with SAX so long as I am connected to the internet.
> But when I am not connected, I get an error that the url > cannot be found. > > I could just delete the dtd, but this is kind of a hack. I am > distributing code, and I would like for the XML document > which I am parsing to have a dtd. > > Here is a snippet of my code: > > > parser = xml.sax.make_parser() > parser.setFeature(feature_namespaces, 0) > parser.setErrorHandler(ErrorHandler()) > section_handler = FormSections( > write_obj = write_obj, > highest_level = self.__highest_level, > ) > > parser.setContentHandler(section_handler) > parser.parse(self.__input_file) > > thanks > > Paul > > > > > -- > > ************************ > *Paul Tremblay * > *phthenry@earthlink.net* > ************************ > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig > From ashearer at shearersoftware.com Thu Oct 23 22:02:42 2003 From: ashearer at shearersoftware.com (Andrew Shearer) Date: Thu Oct 23 22:02:48 2003 Subject: [XML-SIG] Re: turning off dtd resolver in sax In-Reply-To: Message-ID: <27402551-05C6-11D8-86D9-000393B3AC06@shearersoftware.com> Apologies to the list for posting the same answer twice in a week, but hopefully this will be helpful: > I had a similar problem with the first cut of a parser for Apple's > PList format. It worked fine when I was connected to the Internet, but > failed when I wasn't. The documents specified an Internet-based DTD > when they really could stand alone, so I used code similar to the > following: > > parser = xml.sax.make_parser() > parser.setFeature(xml.sax.feature_external_ges, 0) > parser.setFeature(xml.sax.feature_external_pes, 0) > > The complete code is at > . On Thursday, October 23, 2003, at 12:10 PM, Paul Tremblay wrote: > Is there a way to run SAX if the dtd in the document points to a url, > and the url cannot be retrieved? I can parse my documents with SAX so > long as I am connected to the internet. 
But when I am not connected, I
> get an error that the url cannot be found.
>
> I could just delete the dtd, but this is kind of a hack. I am
> distributing code, and I would like for the XML document which I am
> parsing to have a dtd.

--
Andrew Shearer
http://www.shearersoftware.com/

From noreply at sourceforge.net Fri Oct 24 19:43:31 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Oct 24 19:43:35 2003
Subject: [XML-SIG] [ pyxml-Patches-829905 ] c14n.py fix for bug #825115
Message-ID:

Patches item #829905, was opened at 2003-10-24 16:43
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=829905&group_id=6473

Category: DOM
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Keith Beattie (ksbeattie)
Assigned to: Nobody/Anonymous (nobody)
Summary: c14n.py fix for bug #825115

Initial Comment:
The problem is that when inheriting attributes via the initial_other_attrs arg of _do_element(), the xmlns ones are not sorted by _sorter_ns() as they should be. The attached patch to c14n.py causes such inherited attribs to be sorted properly.

Also attached is a patch to test_c14n.py to add a test case for this bug.

What is very curious about this is that the test case I've added will pass with the old code. This is because the PyExpat dom parser has a bug in it that counteracts this one. Namely, default namespace attribs (xmlns='foo') are given a localName of None by pyexpat and not 'xmlns'. Since default namespace attribs are ordered first by c14n, when using PyExpat, c14n luckily does the right thing.

Changing test_c14n.py to use another dom is probably a good idea...
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=306473&aid=829905&group_id=6473 From noreply at sourceforge.net Fri Oct 24 21:12:29 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Oct 24 21:12:46 2003 Subject: [XML-SIG] [ pyxml-Bugs-829930 ] localName is None for default namespace Message-ID: Bugs item #829930, was opened at 2003-10-24 18:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=829930&group_id=6473 Category: pyexpat Group: None Status: Open Resolution: None Priority: 5 Submitted By: Keith Beattie (ksbeattie) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: localName is None for default namespace Initial Comment: I believe that for default namespaces, localName should be set to 'xmlns' not None. minidom and 4Suite do this and PyExpat uses None. Attached is a short script that demonstrates the difference between minidom, 4Suite and PyExpat in this respect. fwiw, this was discovered by noticing that my regression test for another bug (#825115) in c14n.py failed to fail on the original code. See my comment in patch #829905, if you're curious. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=829930&group_id=6473 From phthenry at earthlink.net Sat Oct 25 14:55:01 2003 From: phthenry at earthlink.net (Paul Tremblay) Date: Sat Oct 25 14:56:27 2003 Subject: [XML-SIG] turning of dtd checker Message-ID: <20031025185501.GA1766@localhost.localdomain> Is there a way to run SAX if the dtd in the document points to a url, and the url cannot be retrieved? I can parse my documents with SAX so long as I am connected to the internet. But when I am not connected, I get an error that the url cannot be found. 
I could just delete the dtd, but this is kind of a hack. I am distributing code, and I would like for the XML document which I am parsing to have a dtd. Here is a snippet of my code: parser = xml.sax.make_parser() parser.setFeature(feature_namespaces, 0) parser.setErrorHandler(ErrorHandler()) section_handler = FormSections( write_obj = write_obj, highest_level = self.__highest_level, ) parser.setContentHandler(section_handler) parser.parse(self.__input_file) Thanks Paul -- ************************ *Paul Tremblay * *phthenry@earthlink.net* ************************ From phthenry at earthlink.net Sat Oct 25 15:15:27 2003 From: phthenry at earthlink.net (Paul Tremblay) Date: Sat Oct 25 15:15:50 2003 Subject: [XML-SIG] turn off dtd check in SAX Message-ID: <1067109344.1957.2.camel@localhost.localdomain> Is there a way to run SAX if the dtd in the document points to a url, and the url cannot be retrieved? I can parse my documents with SAX so long as I am connected to the internet. But when I am not connected, I get an error that the url cannot be found. I could just delete the dtd, but this is kind of a hack. I am distributing code, and I would like for the XML document which I am parsing to have a dtd. Here is a snippet of my code: parser = xml.sax.make_parser() parser.setFeature(feature_namespaces, 0) parser.setErrorHandler(ErrorHandler()) section_handler = FormSections( write_obj = write_obj, highest_level = self.__highest_level, ) parser.setContentHandler(section_handler) parser.parse(self.__input_file) Thanks Paul From phthenry at earthlink.net Sat Oct 25 15:22:10 2003 From: phthenry at earthlink.net (Paul Tremblay) Date: Sat Oct 25 15:22:28 2003 Subject: [XML-SIG] turning of dtd checker In-Reply-To: <20031025185501.GA1766@localhost.localdomain> References: <20031025185501.GA1766@localhost.localdomain> Message-ID: <20031025192210.GA1962@localhost.localdomain> Apologies for posting my question three times. I didn't think my email was getting through. 
In fact, the mail was getting put in a mail box I didn't know existed. Sorry again! Paul On Sat, Oct 25, 2003 at 02:55:01PM -0400, Paul Tremblay wrote: > Date: Sat, 25 Oct 2003 14:55:01 -0400 > From: Paul Tremblay > To: xml-sig@python.org > User-Agent: Mutt/1.4i > Subject: [XML-SIG] turning of dtd checker > > > Is there a way to run SAX if the dtd in the document points to a url, > and the url cannot be retrieved? I can parse my documents with SAX so > long as I am connected to the internet. But when I am not connected, I > get an error that the url cannot be found. > > I could just delete the dtd, but this is kind of a hack. I am > distributing code, and I would like for the XML document which I am > parsing to have a dtd. > > Here is a snippet of my code: > > > parser = xml.sax.make_parser() > parser.setFeature(feature_namespaces, 0) > parser.setErrorHandler(ErrorHandler()) > section_handler = FormSections( > write_obj = write_obj, > highest_level = self.__highest_level, > ) > > parser.setContentHandler(section_handler) > parser.parse(self.__input_file) > > Thanks > > Paul > > -- > > ************************ > *Paul Tremblay * > *phthenry@earthlink.net* > ************************ > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- ************************ *Paul Tremblay * *phthenry@earthlink.net* ************************ From martin at v.loewis.de Sun Oct 26 05:21:18 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sun Oct 26 05:21:40 2003 Subject: [XML-SIG] turning of dtd checker In-Reply-To: <20031025185501.GA1766@localhost.localdomain> References: <20031025185501.GA1766@localhost.localdomain> Message-ID: Paul Tremblay writes: > Is there a way to run SAX if the dtd in the document points to a url, > and the url cannot be retrieved? 
You need to turn off processing of external general entities:

    p.setFeature("http://xml.org/sax/features/external-general-entities",False)

In that case, the parser won't attempt to resolve the external DTD subset, or references to external entities defined in the internal DTD subset.

Regards,
Martin

From phthenry at earthlink.net Sun Oct 26 13:48:24 2003
From: phthenry at earthlink.net (Paul Tremblay)
Date: Sun Oct 26 13:49:02 2003
Subject: [XML-SIG] turning of dtd checker
In-Reply-To:
References: <20031025185501.GA1766@localhost.localdomain>
Message-ID: <20031026184824.GA1754@localhost.localdomain>

On Sun, Oct 26, 2003 at 11:21:18AM +0100, Martin v. Löwis wrote:
>
> Paul Tremblay writes:
>
> > Is there a way to run SAX if the dtd in the document points to a url,
> > and the url cannot be retrieved?
>
> You need to turn off processing of external general entities:
>
> p.setFeature("http://xml.org/sax/features/external-general-entities",False)
>

Thanks! That works exactly like I want.

Paul

--
************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************

From Radovan.Chytracek at cern.ch Mon Oct 27 05:15:15 2003
From: Radovan.Chytracek at cern.ch (Radovan Chytracek)
Date: Mon Oct 27 05:15:19 2003
Subject: [XML-SIG] turning of dtd checker
Message-ID:

> Paul Tremblay writes:
>
> > Is there a way to run SAX if the dtd in the document points to a url,
> > and the url cannot be retrieved?
>
> You need to turn off processing of external general entities:
>
> p.setFeature("http://xml.org/sax/features/external-general-entities",False)
>
> In that case, the parser won't attempt to resolve the
> external DTD subset, or references to external entities
> defined in the internal DTD subset.
>
> Regards,
> Martin

Well, this is a solution only to the problem where one wants to parse an XML document successfully in any case, whether a DTD is accessible or not.
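
For reference, the setFeature() call suggested in this thread can be tried with the standard library's xml.sax as in the sketch below. It is a minimal illustration, not Paul's actual code: the document, the unreachable DTD URL and the ElementCollector handler are invented for the example, and it is written for current Python rather than the Python 2.2/2.3 of the original posts.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# The DTD URL is a made-up, unreachable address used only for illustration.
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "http://unreachable.invalid/doc.dtd">
<doc><item>a</item><item>b</item></doc>
"""


class ElementCollector(ContentHandler):
    """Hypothetical handler that records the names of the elements seen."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data):
    parser = xml.sax.make_parser()
    # feature_external_ges is the constant for
    # "http://xml.org/sax/features/external-general-entities".
    parser.setFeature(feature_external_ges, False)
    handler = ElementCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


print(parse_without_external_entities(DOCUMENT))
```

With the feature switched off the parser never asks for the external DTD, so the document parses without a network connection. The trade-off raised in the surrounding messages still applies: entities declared only in that external DTD are not available to the parser.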
This might not be a correct approach because, even when parsing in non-validating mode, the DTD might still contain some entities important to the document structure.

On the other hand, nobody has answered my question where I needed to guarantee that parsing an XML document succeeds, including references to all external entities, even when the DTD is not physically accessible. This is required for a disconnected mode of operation where, for example, the user has no network connection available on his/her laptop. The solution to this is to use EntityResolver, which for some reason does not really work as expected. Well, it works if one enables it, but my feeling is that for some reason it has been intentionally disabled, so even if one implements and registers an EntityResolver it is not called, and this causes not really intuitive exception(s) to be raised.

I would like to know what the showstopper is for EntityResolver functioning properly. I suspect there is some infrastructure missing between the SAX API and the parsers in PyXML.

Cheers

Radovan

From martin at v.loewis.de Mon Oct 27 18:43:01 2003
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon Oct 27 18:43:17 2003
Subject: [XML-SIG] turning of dtd checker
In-Reply-To:
References:
Message-ID: <3F9DAD85.3020102@v.loewis.de>

Radovan Chytracek wrote:
> On the other hand, nobody has answered my question where I needed to
> guarantee, that an XML document parsing succeeds including references to
> all external entities even in the case a DTD is not physically
> accessible. This is required for disconnected mode of operation where
> user has no network connection available on his/her laptop for example.
> The solution to this is to use EntityResolver

Right.

> which for some reason
> does not really work as expected.

Why do you say that? What did you do, what did you expect to happen, and what happened instead?
> Well it works if one enables it but my > feeling is that for some reasons it has been intentionally disabled so > even if one implements and registers EntityResolver it is not called and > causes not really intuitive exception(s) being raised. Nothing like this is intentionally happening. > I would like to know what's a showstooper for EntityResolver proper > function. Why do you say it does not function properly? Regards, Martin From Radovan.Chytracek at cern.ch Tue Oct 28 03:55:54 2003 From: Radovan.Chytracek at cern.ch (Radovan Chytracek) Date: Tue Oct 28 03:55:59 2003 Subject: [XML-SIG] turning of dtd checker Message-ID: Hi Martin, see: http://mail.python.org/pipermail/xml-sig/2003-September/009871.html http://mail.python.org/pipermail/xml-sig/2003-September/009875.html http://mail.python.org/pipermail/xml-sig/2003-September/009876.html http://mail.python.org/pipermail/xml-sig/2003-October/009937.html Any comments welcome. Cheers Radovan > -----Original Message----- > From: "Martin v. L?wis" [mailto:martin@v.loewis.de] > Sent: Tuesday, October 28, 2003 12:43 AM > To: Radovan Chytracek > Cc: xml-sig@python.org > Subject: Re: [XML-SIG] turning of dtd checker > > > Radovan Chytracek wrote: > > On the other hand, nobody has anwereed my question where I > needed to > > guarantee, that an XML document parsing succeeds including > references > > to all external entities even in the case a DTD is not physically > > accessible. This is required for disconnected mode of > operation where > > user has no network connection available on his/her laptop for > > example. The solution to this is to use EntityResolver > > Right. > > > which for some reaqson > > does not really work as expected. > > Why do you say that? What did you do, what did you expect to > happen, and what happened instead? 
> > > Well it works if one enables it but my > > feeling is that for some reasons it has been intentionally > disabled so > > even if one implements and registers EntityResolver it is > not called > > and causes not really intuitive exception(s) being raised. > > Nothing like this is intentionally happening. > > > I would like to know what's a showstooper for EntityResolver proper > > function. > > Why do you say it does not function properly? > > Regards, > Martin > > From martin at v.loewis.de Tue Oct 28 15:47:43 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Tue Oct 28 15:48:08 2003 Subject: [XML-SIG] turning of dtd checker In-Reply-To: References: Message-ID: "Radovan Chytracek" writes: > http://mail.python.org/pipermail/xml-sig/2003-September/009871.html > http://mail.python.org/pipermail/xml-sig/2003-September/009875.html > http://mail.python.org/pipermail/xml-sig/2003-September/009876.html > http://mail.python.org/pipermail/xml-sig/2003-October/009937.html > > Any comments welcome. I still can't tell, from these messages, what precisely you are doing. Can you create a small example, please, with a Python script, an XML file, and a DTD file? The bug 814935, which being a true bug, should not matter at all. You are looking at the source code of XMLFilterBase, which is never used for anything. Instead, the standard entity resolver is xml.sax.handler.EntityResolver, which does not suffer from the problem you see. Regards, Martin From roman at interview-machine.com Wed Oct 29 05:55:37 2003 From: roman at interview-machine.com (roman@interview-machine.com) Date: Wed Oct 29 05:55:47 2003 Subject: [XML-SIG] implementation of lookup... methods DOM Level 3 Message-ID: <32902.217.187.88.185.1067424937.squirrel@interview-machine.com> Hi list, In the current implementation of the DOM I missed some very useful methods from DOM Level 3, lookupPrefix and lookupNamespaceURI of the interface Node. 
In the spec there is also a good algorithm for these methods, so I took it and implemented the methods as helper functions. They work very well and it should be easy to integrate them as methods of Node.

I attach my helper module in case someone is interested...

Ciao,
Roman

(BTW: I am new on this list, so please forgive me if this issue has been discussed before or something like that)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: domhelper.py
Type: application/octet-stream
Size: 6335 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20031029/7e452d1b/domhelper.obj

From Radovan.Chytracek at cern.ch Wed Oct 29 07:08:31 2003
From: Radovan.Chytracek at cern.ch (Radovan Chytracek)
Date: Wed Oct 29 07:08:35 2003
Subject: [XML-SIG] turning of dtd checker
Message-ID:

Hi Martin,

that is exactly my problem. I am using a chain of XMLFilterBase objects in my application and that is where this problem has shown up. Do you have any suggestion how to proceed in case one wants to use XMLFilterBase together with EntityResolver?

BTW, the problem has appeared even for the example code I showed in one of the e-mails below (i.e. no XMLFilterBase used at all), but only after I installed PyXML 0.8.3, not before. Any suggestion about that?

Regards

Radovan

> "Radovan Chytracek" writes:
>
> > http://mail.python.org/pipermail/xml-sig/2003-September/009871.html
> > http://mail.python.org/pipermail/xml-sig/2003-September/009875.html
> > http://mail.python.org/pipermail/xml-sig/2003-September/009876.html
> > http://mail.python.org/pipermail/xml-sig/2003-October/009937.html
> >
> > Any comments welcome.
>
> I still can't tell, from these messages, what precisely you
> are doing. Can you create a small example, please, with a
> Python script, an XML file, and a DTD file?
>
> The bug 814935, while being a true bug, should not matter at
> all.
You are looking at the source code of XMLFilterBase,
> which is never used for anything. Instead, the standard
> entity resolver is xml.sax.handler.EntityResolver, which does
> not suffer from the problem you see.
>
> Regards,
> Martin
>
>

From tlo at alias.com Wed Oct 29 16:47:25 2003
From: tlo at alias.com (Terence Lo)
Date: Wed Oct 29 16:49:07 2003
Subject: [XML-SIG] XML Objectify.. how to iterate through child tags?
Message-ID: <00ef01c39e66$3deab2b0$8d411dc6@ms.aliaswavefront.com>

Say I have an xml file (test.xml) that contains:

aaa bbb ccc

source = XML_Objectify('test.xml').make_instance()

How do I dynamically iterate through the children tags and values of ? I'd like to do something similar to what DOM does, (ie. childnode[0], childnode[1].. etc)

Is this possible? Anyone know? Any help would be greatly appreciated.

Thanks in advance.

t

From amk at amk.ca Thu Oct 30 14:27:18 2003
From: amk at amk.ca (amk@amk.ca)
Date: Thu Oct 30 14:27:31 2003
Subject: [XML-SIG] HTML parsing: anyone use formatter?
Message-ID: <20031030192718.GA13220@rogue.amk.ca>

[Crossposted to python-dev, web-sig, and xml-sig. Followups to web-sig@python.org, please.]

I'm working on bringing htmllib.py up to HTML 4.01 by adding handlers for all the missing elements. I've currently been adding just empty methods to the HTMLParser class, but the existing methods actually help render the HTML by calling methods on a Formatter object. For example, the definitions for the H1 element look like this:

    def start_h1(self, attrs):
        self.formatter.end_paragraph(1)
        self.formatter.push_font(('h1', 0, 1, 0))

    def end_h1(self):
        self.formatter.end_paragraph(1)
        self.formatter.pop_font()

Question: should I continue supporting this in new methods? This can only go so far; a tag such as or is easy for me to handle, but handling
or or would require greatly expanding the Formatter class's repertoire.

I suppose the more general question is, does anyone use Python's formatter module? Do we want to keep it around, or should htmllib be pushed toward doing just HTML parsing? formatter.py is a long way from being able to handle modern web pages and it would be a lot of work to build a decent renderer.

--amk

From fincher.8 at osu.edu Thu Oct 30 16:03:15 2003
From: fincher.8 at osu.edu (Jeremy Fincher)
Date: Thu Oct 30 15:04:57 2003
Subject: [XML-SIG] Re: [Python-Dev] HTML parsing: anyone use formatter?
In-Reply-To: <20031030192718.GA13220@rogue.amk.ca>
References: <20031030192718.GA13220@rogue.amk.ca>
Message-ID: <200310301603.15437.fincher.8@osu.edu>

On Thursday 30 October 2003 02:27 pm, amk@amk.ca wrote:
> I suppose the more general question is, does anyone use Python's formatter
> module? Do we want to keep it around, or should htmllib be pushed toward
> doing just HTML parsing? formatter.py is a long way from being able to
> handle modern web pages and it would be a lot of work to build a decent
> renderer.

I've never used it myself, though I'll admit that some software I've used (for searching the IMDB) does use it.

Jeremy

From markus_jais at yahoo.de Thu Oct 30 16:19:23 2003
From: markus_jais at yahoo.de (Markus Jais)
Date: Thu Oct 30 16:20:50 2003
Subject: [XML-SIG] question on xmlproc source code
Message-ID: <1067548765.1920.6.camel@eagle>

hello

I am trying to understand some of xmlproc's code, but now I have a maybe somewhat weird question.

the method "parse_end_tag" in xmlproc.py seems to have a problem with indentation on my system. (I have the current cvs version of PyXML)

but the code seems to work, so this is probably a problem with my system. but I never had this with any other python source code before, and the whole rest of xmlproc is ok.

for example there is something like this:

if name != elem: self.report_error(3023,(name,elem))

I hope it is not screwed up by the mail client.
I tried it with 3 editors (vim, jedit and xemacs), both with only tabs and with tabs emulated with spaces.

I know this may be strange, but has anyone had the same problem, or any hints what to do ??

thanks in advance

Markus

From noreply at sourceforge.net Thu Oct 30 21:09:06 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Oct 30 21:09:11 2003
Subject: [XML-SIG] [ pyxml-Patches-833461 ] Updating ns.py with c14n standard URI
Message-ID:

Patches item #833461, was opened at 2003-10-30 18:09
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=833461&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Keith Beattie (ksbeattie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Updating ns.py with c14n standard URI

Initial Comment:
Attached is a patch for updating the identifier for canonicalization. This URI is taken from the c14n spec at:

http://www.w3.org/TR/xml-c14n

which is the same as the identifier XML Dig Sig wants, as described in its spec:

http://www.w3.org/TR/xmldsig-core/#sec-c14nAlg

I raised this question on the list in this post:

http://mail.python.org/pipermail/xml-sig/2003-June/009598.html

and got a positive reply, so here it is.

Also added is a definition for exclusive c14n with comments, which follows the same pattern as 'regular' c14n, as described in the spec for exc-c14n here:

http://www.w3.org/TR/xml-exc-c14n/#sec-Use

The first change really should happen; the second one is candy.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=833461&group_id=6473