From derekfountain at yahoo.co.uk Fri Jul 2 03:23:09 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Fri Jul 2 03:19:25 2004
Subject: [XML-SIG] xmlproc and 4DOM
Message-ID: <200407021523.09172.derekfountain@yahoo.co.uk>

Is it possible to use the xmlproc validating parser to parse an XML document
into a 4DOM object model? If so, is there an example somewhere of how to do
it?

--
> eatapple
core dump

From derekfountain at yahoo.co.uk Mon Jul 5 01:37:50 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Mon Jul 5 01:33:32 2004
Subject: [XML-SIG] xmlproc and 4DOM
In-Reply-To: <40E792CE.2020306@doxdesk.com>
References: <200407021523.09172.derekfountain@yahoo.co.uk> <40E792CE.2020306@doxdesk.com>
Message-ID: <200407051337.50933.derekfountain@yahoo.co.uk>

On Sunday 04 July 2004 13:17, you wrote:
> > Is it possible to use the xmlproc validating parser to parse an XML
> > document into a 4DOM object model?
>
> Yes. Don't know if this is canonical, but I use:
>
>   >>> from xml.dom.ext.reader import Sax2
>   >>> Sax2.FromXmlFile('something.xml', validate=1)

Ah, OK. I didn't think of looking at the Sax interface. Thanks, I'll try it.

> The 1.1 beta release of pxdom which I mentioned then is now out. It's
> slow, but on compliance to DOM specs it's fanatical...

Great, I'll look at that too. That's my afternoon sorted out. :o)

--
> eatapple
core dump

From derekfountain at yahoo.co.uk Mon Jul 5 06:37:22 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Mon Jul 5 06:33:02 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
Message-ID: <200407051837.22009.derekfountain@yahoo.co.uk>

I've spent the last few days tinkering with DOM trees and the DOM API. A
couple of years back I wrote a fairly complex application which found the
data it required using this nextSibling, firstChild, sort of navigation. I
recall the development experience wasn't a terribly happy one, and I have
always presumed that XPATH was largely invented to get past all this mucking
about.

So it occurs to me to ask on the SIG list: do people still use the original
DOM style navigation? When is it preferable to XPATH? Why, in short, is the
whole "document hopping" idea not deprecated?!

From lance.ellinghaus at eds.com Mon Jul 5 11:34:24 2004
From: lance.ellinghaus at eds.com (Ellinghaus, Lance)
Date: Mon Jul 5 11:35:04 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
Message-ID: <79D80D394197764997DC956801CABCEEA9A8EF@ushem204.exse01.exch.eds.com>

I use the DOM navigation all the time.
I do not know about XPATH so I cannot say if I would use that more than DOM.

Lance

Lance Ellinghaus
TWAI Operations Integration/Special Projects
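For illustration (not from the original posts), here is a minimal sketch of the two styles being compared in this thread. It assumes the present-day Python standard library (xml.dom.minidom and xml.etree.ElementTree) rather than the PyXML/4DOM packages the posters were using, and the sample document and its element names are invented.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for illustration.
SAMPLE = """<library>
  <book id="b1">
    <title>Inside XML</title>
  </book>
  <book id="b2">
    <title>Python and XML</title>
  </book>
</library>"""


def titles_by_hopping(xml_text):
    """DOM-style navigation: walk childNodes and skip non-element nodes by hand."""
    doc = minidom.parseString(xml_text)
    titles = []
    for book in doc.documentElement.childNodes:
        # Whitespace between tags shows up as text nodes, so filter on nodeType.
        if book.nodeType != book.ELEMENT_NODE or book.tagName != "book":
            continue
        for child in book.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "title":
                # Join all text children in case the text is split into several nodes.
                titles.append("".join(
                    n.data for n in child.childNodes
                    if n.nodeType == n.TEXT_NODE))
    return titles


def titles_by_path(xml_text):
    """Path-style lookup: ElementTree supports a limited subset of XPath."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall("./book/title")]


print(titles_by_hopping(SAMPLE))
print(titles_by_path(SAMPLE))
```

Both functions should return the same list of titles. The hopping version has to filter out whitespace text nodes itself, while the path version states what it wants and leaves the walking to the library.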
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig

From derekfountain at yahoo.co.uk Tue Jul 6 05:40:38 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Tue Jul 6 05:36:10 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <79D80D394197764997DC956801CABCEEA9A8EF@ushem204.exse01.exch.eds.com>
References: <79D80D394197764997DC956801CABCEEA9A8EF@ushem204.exse01.exch.eds.com>
Message-ID: <200407061140.38949.derekfountain@yahoo.co.uk>

On Monday 05 July 2004 23:34, you wrote:
> I use the DOM navigation all the time.
> I do not know about XPATH so I cannot say if I would use that more than
> DOM.

How do you cope with the fact that documents are to some extent unpredictable?
Do you make heavy use of the methods/attributes which allow you to "feel
around" to see what's coming (hasChildNodes, nodeType and so on)? Or do you
only use DOM when you can be guaranteed about the structure of the document,
and you therefore know that, for example,
currentNode.firstChild.firstChild.lastChild.firstChild.nodeValue will give
you text you're after?

I'm starting to wonder if I've been doing the DOM right, as it were. It seems
to me that when you don't know in advance how many children an element has,
and you have to start feeling your way around, it makes the code rather
fragile. Someone adds an extra child where your test cases never had one, and
boom, the code breaks. Perhaps people code to the DTD, rather than any one
document itself?

From tpassin at comcast.net Tue Jul 6 06:04:26 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Tue Jul 6 06:00:54 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <200407061140.38949.derekfountain@yahoo.co.uk>
References: <79D80D394197764997DC956801CABCEEA9A8EF@ushem204.exse01.exch.eds.com> <200407061140.38949.derekfountain@yahoo.co.uk>
Message-ID: <40EA24CA.50408@comcast.net>

Derek Fountain wrote:
>> I use the DOM navigation all the time.
>> I do not know about XPATH so I cannot say if I would use that more than
>> DOM.
>
> How do you cope with the fact that documents are to some extent unpredictable?
> Do you make heavy use of the methods/attributes which allow you to "feel
> around" to see what's coming (hasChildNodes, nodeType and so on)? Or do you
> only use DOM when you can be guaranteed about the structure of the document,
> and you therefore know that, for example,
> currentNode.firstChild.firstChild.lastChild.firstChild.nodeValue will give
> you text you're after?

I tend to use getElementsByTagName() and getElementById() as much as
possible. These, along with parentNode, help you avoid - well, OK,
reduce - that kind of dependence on precise structural details. Of
course, these are only useful if you know pretty well what you are
looking for. If you do, they reduce the fussiness.

In my own html/xhtml, I find that I also am helped by looking at the
values of class attributes. I would find it helpful if there were a
specific call (in the html dom, anyway), thisNode.getElementsByClassName().

> I'm starting to wonder if I've been doing the DOM right, as it were. It seems
> to me that when you don't know in advance how many children an element has,
> and you have to start feeling your way around, it makes the code rather
> fragile.

You especially want to avoid getting fooled by whitespace-only text
nodes and, more generally, multiple PCData fragments.

Cheers,

Tom P

--
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin

From bhartsho at yahoo.com Tue Jul 6 08:08:19 2004
From: bhartsho at yahoo.com (brett hartshorn)
Date: Tue Jul 6 08:08:24 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <40EA24CA.50408@comcast.net>
Message-ID: <20040706060819.73583.qmail@web13422.mail.yahoo.com>

I use DOM for almost everything. It's great, but you have to extend it to do
your searches in an effective way. See dotWith, this is my DOM extension.
http://opart.org/dotWith/dotWith.py

-brett

From derekfountain at yahoo.co.uk Tue Jul 6 10:09:15 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Tue Jul 6 10:04:43 2004
Subject: [XML-SIG] DOCTYPEs question with 4DOM
Message-ID: <200407061609.15077.derekfountain@yahoo.co.uk>

According to my book (Inside XML, New Riders) a document type declaration can
have one of several forms:

According to the DOM Level 3 specification, the createDocumentType method of
the DOMImplementation interface takes 3 parameters: qualified name, publicId
and systemId. I would have thought that rootname is the qualified name,
publicId is the identifier (in the last two cases) and systemId is the url
(in the 2nd and 3rd cases). I could be wrong, but that made sense. :o}

I want to create a doctype like this:

so I'd have thought that, using the 4DOM implementation from Python, I could
say:

  docType = implementation.createDocumentType( "test", None, None )

but when serialised, that doesn't produce a DOCTYPE line at all. This produces
what I want:

  docType = implementation.createDocumentType( "", "test", "" )

and this:

  docType = implementation.createDocumentType( "1", "test", "2" )

produces:

with the "1" nowhere to be seen.

I have no idea what is going on. Can someone explain? Thanks!
:o)

From bkline at rksystems.com Tue Jul 6 13:30:59 2004
From: bkline at rksystems.com (Bob Kline)
Date: Tue Jul 6 13:03:55 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <200407061140.38949.derekfountain@yahoo.co.uk>
Message-ID:

On Tue, 6 Jul 2004, Derek Fountain wrote:

> On Monday 05 July 2004 23:34, you wrote:
> > I use the DOM navigation all the time.
> > I do not know about XPATH so I cannot say if I would use that more than
> > DOM.
>
> How do you cope with the fact that documents are to some extent
> unpredictable? Do you make heavy use of the methods/attributes which
> allow you to "feel around" to see what's coming (hasChildNodes,
> nodeType and so on)? Or do you only use DOM when you can be guaranteed
> about the structure of the document, and you therefore know that, for
> example,
> currentNode.firstChild.firstChild.lastChild.firstChild.nodeValue will
> give you text you're after?
>
> I'm starting to wonder if I've been doing the DOM right, as it were.
> It seems to me that when you don't know in advance how many children
> an element has, and you have to start feeling your way around, it
> makes the code rather fragile. Someone adds an extra child where your
> test cases never had one, and boom, the code breaks. Perhaps people
> code to the DTD, rather than any one document itself?

We use XSL/T to boil down the source document to the pieces we're looking
for into a predictable structure, then go after it with the DOM interface.

--
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

From rmunn at pobox.com Tue Jul 6 15:15:44 2004
From: rmunn at pobox.com (rmunn@pobox.com)
Date: Tue Jul 6 15:15:53 2004
Subject: [XML-SIG] Unwanted behavior in PrettyPrint: > doesn't round-trip
Message-ID: <20040706131544.GD10151@rmunnlfs.dyndns.org>

I'm trying to use xml.dom.ext.PrettyPrint to pretty-print some XML data
to a file, and discovering that it doesn't quite do what I want. Here's
an example:

Python 2.3.4 (#1, Jun 5 2004, 10:44:08)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r5, ssp-3.3-7, pie-8.7.5.3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.dom import minidom
>>> from xml.dom.ext import PrettyPrint
>>> doc = minidom.parseString('This contains a nested &lt;b&gt; tag')
>>> doc
>>> PrettyPrint(doc)
This contains a nested &lt;b> tag
>>>

I'd prefer the output to be:
"""
This contains a nested &lt;b&gt; tag
"""

This XML data is eventually going to be going into an HTML page and sent
to the user's browser. Since the > character doesn't close any tags,
most browsers will probably display it. But with the vast number of
different browsers out there, with slightly different behavior, I'd
rather not rely on "probably". :-( I'd prefer for the &gt; entity to
make it through a round trip (parse to print) untouched.

Is there any way for me to tell PrettyPrint not to dereference character
entities?

--
Robin Munn
rmunn@pobox.com

From cbearden at hal-pc.org Tue Jul 6 17:07:45 2004
From: cbearden at hal-pc.org (Chuck Bearden)
Date: Tue Jul 6 17:07:52 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <200407051837.22009.derekfountain@yahoo.co.uk>
References: <200407051837.22009.derekfountain@yahoo.co.uk>
Message-ID: <20040706150745.GC7473@hal-pc.org>

On Mon, Jul 05, 2004 at 06:37:22PM +0800, Derek Fountain wrote:
> I've spent the last few days tinkering with DOM trees and the DOM API. A
> couple of years back I wrote a fairly complex application which found the
> data it required using this nextSibling, firstChild, sort of navigation. I
> recall the development experience wasn't a terribly happy one, and I have
> always presumed that XPATH was largely invented to get past all this mucking
> about.
>
> So it occurs to me to ask on the SIG list: do people still use the original
> DOM style navigation? When is it preferable to XPATH? Why, in short, is the
> whole "document hopping" idea not deprecated?!

My main use of the DOM has been to scrape the USPTO[1] pages containing
individual records (sample patent[2]). I don't count elements; rather, I
use clues that are both structural and semantic. Typically, the elements
I want are labeled, either in a preceding table cell, or in a preceding
center, bold, or italicized text element.

E.g. to find the patent number and issue date of a patent, I use
getElementsByTagName() to find all table cells, then look for one whose
text content reduces to "United States Patent". At this point I know that
the next sibling TD contains the patent number, and that the second cell of
the succeeding row contains the issue date (go up to parent TR, go up to
parent TBODY, choose the second TD of the second child TR).

Or, to find the abstract, I examine the direct children of BODY until I
find a CENTER element whose text reduces to "Abstract", whereupon I
accumulate text until the next HR.

I'm sure this is very un-XML-like, but I need this data and the approach
works. I use twisted.web.microdom with the 'beExtremelyLenient' flag set
to True. There are some crude HTML flaws that first must be fixed, then I
run the document through mx.Tidy, then I build the extremely lenient
microdom.

Chuck

[1] http://www.uspto.gov/patft/index.html
[2] http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/search-bool.html&r=4&f=G&l=50&co1=AND&d=ptxt&s1=tobacco&OS=tobacco&RS=tobacco

From jgoldfarb at mitre.org Tue Jul 6 20:39:54 2004
From: jgoldfarb at mitre.org (Joshua M. Goldfarb)
Date: Tue Jul 6 20:40:12 2004
Subject: [XML-SIG] SAML Request Question
Message-ID: <200407061851.i66IpqJ26308@smtp-mclean.mitre.org>

Skipped content of type multipart/alternative

From mike at skew.org Tue Jul 6 21:16:47 2004
From: mike at skew.org (Mike Brown)
Date: Tue Jul 6 21:17:23 2004
Subject: [XML-SIG] Unwanted behavior in PrettyPrint: > doesn't round-trip
In-Reply-To: <20040706131544.GD10151@rmunnlfs.dyndns.org> "from rmunn@pobox.com at Jul 6, 2004 08:15:44 am"
Message-ID: <200407061916.i66JGl4J088888@chilled.skew.org>

rmunn@pobox.com wrote:
> This contains a nested &lt;b> tag
> >>>
>
> I'd prefer the output to be:
> """
> This contains a nested &lt;b&gt; tag
> """
>
> This XML data is eventually going to be going into an HTML page and sent
> to the user's browser. Since the > character doesn't close any tags,
> most browsers will probably display it. But with the vast number of
> different browsers out there, with slightly different behavior, I'd
> rather not rely on "probably". :-( I'd prefer for the &gt; entity to
> make it through a round trip (parse to print) untouched.

There are no browsers that will have a problem with an unescaped ">".

This is one of those situations where paranoia about web browser behavior is
not supported by reality, much like when people freak out about putting
"&" in an href.

> Is there any way for me to tell PrettyPrint not to dereference character
> entities?

Dereferencing occurs during parsing. What you want is to be able to customize
the serialization behavior. Runtime modifications to
xml.dom.ext.Printer.g_charToEntity don't seem to have any effect, so I'd
say no, it's not possible. Don't worry about it, IMHO.
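For illustration (not from the original posts), the split between parsing and serialization described above can be sketched with the present-day standard library. This is not a PrettyPrint option; `text_of` and `serialize_text` are hypothetical helpers, the `text` element name is invented, and the only library calls assumed are `minidom.parseString` and `xml.sax.saxutils.escape`, which escapes `&`, `<` and `>` in character data.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def text_of(element):
    """Hypothetical helper: concatenate the text-node children of an element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE)


def serialize_text(element):
    """Hypothetical helper: write out an element that holds only text,
    escaping '&', '<' and '>' in its character data."""
    return "<%s>%s</%s>" % (
        element.tagName, escape(text_of(element)), element.tagName)


# The parser resolves &lt; and &gt; to '<' and '>' while building the tree...
doc = minidom.parseString("<text>This contains a nested &lt;b&gt; tag</text>")
parsed = text_of(doc.documentElement)
print(parsed)

# ...so whether '>' is written back as '&gt;' is purely the serializer's choice.
serialized = serialize_text(doc.documentElement)
print(serialized)
```

Once the tree is built there is no record of how a character was spelled in the source, which is why the round trip can only be controlled on the output side.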
From mina_pp at hotmail.com Wed Jul 7 03:45:32 2004
From: mina_pp at hotmail.com (Jumpei Aoki)
Date: Wed Jul 7 03:49:20 2004
Subject: [XML-SIG] Questions about XBEL(licenses, namespaces, etc)
Message-ID:

Hello,

I do programming as a hobby, and I want to create bookmark interchange
software for my own study. I think XBEL is a great format to use, and I wish
to use this format, but a few questions came up and I was wondering if you
could help.

1) Is there a namespace for these XBEL elements? If so, is it
"http://pyxml.sourceforge.net/topics/xbel/"? If it does not exist, could I
use the above as the namespace, or do I have to leave the namespace out?

2) If I have enough skill, I would want to create freeware and distribute
it over the net. Is there a license for XBEL? In other words, is there
anything that I need to do if I use XBEL in my software? I read
http://pyxml.sourceforge.net/topics/xbel/ but I could not find any
statements about licenses.

3) Am I free to extend XBEL? I don't think I would need to, but if there is
a need, could I extend XBEL and add some other elements?

It would be a great help if you could answer these questions for me.
Thanks for your time.

Jumpei Aoki

From tpassin at comcast.net Wed Jul 7 04:04:13 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Wed Jul 7 04:00:38 2004
Subject: [XML-SIG] DOCTYPEs question with 4DOM
In-Reply-To: <200407061609.15077.derekfountain@yahoo.co.uk>
References: <200407061609.15077.derekfountain@yahoo.co.uk>
Message-ID: <40EB5A1D.1070906@comcast.net>

Derek Fountain wrote:
> According to my book (Inside XML, New Riders) a document type declaration can
> have one of several forms:
>

Not quite ... the XML Rec requires a system identifier even when there
is a PUBLIC keyword and identifier.

> According to the DOM Level 3 specification, the createDocumentType method of
> the DOMImplementation interface takes 3 parameters: qualified name, publicId
> and systemId. I would have thought that rootname is the qualified name,
> publicId is the identifier (in the last two cases) and systemId is the url
> (in the 2nd and 3rd cases). I could be wrong, but that made sense. :o}
>
> I want to create a doctype like this:
>
> and this:
>
>   docType = implementation.createDocumentType( "1", "test", "2" )
>
> produces:
>
> with the "1" nowhere to be seen.
>
> I have no idea what is going on. Can someone explain? Thanks! :o)

I can see how it would not know what to do with the "1", since it cannot
be a legal element name, but some of those outputs look pretty strange,
don't they? But what version of the PyXML package are you using? I
just created document types like yours and they at least had the proper
system and public IDs (I did not serialize anything, though)

Cheers,

Tom P

--
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin

From derekfountain at yahoo.co.uk Wed Jul 7 09:58:12 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Wed Jul 7 09:53:30 2004
Subject: [XML-SIG] DOCTYPEs question with 4DOM
In-Reply-To: <40EB5A1D.1070906@comcast.net>
References: <200407061609.15077.derekfountain@yahoo.co.uk> <40EB5A1D.1070906@comcast.net>
Message-ID: <200407071558.12919.derekfountain@yahoo.co.uk>

On Wednesday 07 July 2004 10:04, Thomas B. Passin wrote:
> Derek Fountain wrote:
> > According to my book (Inside XML, New Riders) a document type declaration
> > can have one of several forms:
> >
>
> Not quite ... the XML Rec requires a system identifier even when there
> is a PUBLIC keyword and identifier.

Um, I was just quoting the book! Since my example doesn't have a DTD I was
just after the first option without the optional DTD.

> > I want to create a doctype like this:
> >
> > and this:
> >
> >   docType = implementation.createDocumentType( "1", "test", "2" )
> >
> > produces:
> >
> > with the "1" nowhere to be seen.
> >
> > I have no idea what is going on. Can someone explain? Thanks! :o)
>
> I can see how it would not know what to do with the "1", since it cannot
> be a legal element name,

Fair point, but it's actually the same with any legal element name.

> but some of those outputs look pretty strange,
> don't they? But what version of the PyXML package are you using? I
> just created document types like yours and they at least had the proper
> system and public IDs (I did not serialize anything, though)

PyXML-0.8.3 on SUSE-9.1. If you didn't serialize anything, how are you seeing
the system and public IDs?

I'm totally confused. It seems the only way to get a DOCTYPE with the correct
root element and no SYSTEM or PUBLIC ids is to provide a non blank PUBLIC id:

  docType = implementation.createDocumentType( None, "xxx", None )
  document = implementation.createDocument( None, "test", docType )

which gives:

...

The "xxx" seems to be ignored, but passing None or a blank string in its place
means the serialisation doesn't produce a DOCTYPE line at all.

--
> eatapple
core dump

From derekfountain at yahoo.co.uk Wed Jul 7 10:27:45 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Wed Jul 7 10:23:02 2004
Subject: [XML-SIG] DOCTYPEs question with 4DOM (4DOM bug?)
In-Reply-To: <200407071558.12919.derekfountain@yahoo.co.uk>
References: <200407061609.15077.derekfountain@yahoo.co.uk> <40EB5A1D.1070906@comcast.net> <200407071558.12919.derekfountain@yahoo.co.uk>
Message-ID: <200407071627.46005.derekfountain@yahoo.co.uk>

> PyXML-0.8.3 on SUSE-9.1. If you didn't serialize anything, how are you
> seeing the system and public IDs?
>
> I'm totally confused. It seems the only way to get a DOCTYPE with the
> correct root element and no SYSTEM or PUBLIC ids is to provide a non blank
> PUBLIC id:
>
>   docType = implementation.createDocumentType( None, "xxx", None )
>   document = implementation.createDocument( None, "test", docType )
>
> which gives:
>
> ...
>
> The "xxx" seems to be ignored, but passing None or a blank string in its
> place means the serialisation doesn't produce a DOCTYPE line at all.

Replying to myself, it seems to be a problem specific to 4DOM. The serialiser
in dom/ext/Printer.py is quite clear:

    def visitDocumentType(self, doctype):
        if not doctype.systemId and not doctype.publicId: return

Both the sax/saxutils.py serialiser and the dom/minidom.py serialiser print
the " is known about.

That visitDocumentType() method is pretty straightforward, but it doesn't look
right. The XML-1.1 spec, which I'm not really familiar with, but which isn't
too hard to read, seems to say that a doctype with neither an External ID nor
an internal subset is valid, so shouldn't the serialiser be able to produce
it?

--
> eatapple
core dump

From m at mongers.org Wed Jul 7 11:03:06 2004
From: m at mongers.org (Morten Liebach)
Date: Wed Jul 7 11:09:19 2004
Subject: [XML-SIG] Unwanted behavior in PrettyPrint: > doesn't round-trip
In-Reply-To: <200407061916.i66JGl4J088888@chilled.skew.org>
References: <20040706131544.GD10151@rmunnlfs.dyndns.org> <200407061916.i66JGl4J088888@chilled.skew.org>
Message-ID: <20040707090328.GB11721@mongers.org>

On 2004-07-06 13:16:47 -0600, Mike Brown wrote:
> rmunn@pobox.com wrote:
> > This contains a nested &lt;b> tag
> > >>>
> >
> > I'd prefer the output to be:
> > """
> > This contains a nested &lt;b&gt; tag
> > """
> >
> > This XML data is eventually going to be going into an HTML page and sent
> > to the user's browser. Since the > character doesn't close any tags,
> > most browsers will probably display it. But with the vast number of
> > different browsers out there, with slightly different behavior, I'd
> > rather not rely on "probably". :-( I'd prefer for the &gt; entity to
> > make it through a round trip (parse to print) untouched.
>
> There are no browsers that will have a problem with an unescaped ">".
>
> This is one of those situations where paranoia about web browser behavior is
> not supported by reality, much like when people freak out about putting
> "&" in an href.

Probably true in this case, as the output, judging from the example, is
going to be valid XML or XHTML of some sort.

If the output is not valid and the browser spots this, it goes into
tagsoup parsing mode, and nobody knows what that means; it's not defined
by any standards or docs. Then it might help to escape '>', otherwise
not.

Have a nice day
Morten

--
http://m.mongers.org/ -- http://gallery.zentience.org/
__END__

From mike at skew.org Wed Jul 7 11:45:54 2004
From: mike at skew.org (Mike Brown)
Date: Wed Jul 7 11:45:53 2004
Subject: [XML-SIG] DOCTYPEs question with 4DOM (4DOM bug?)
In-Reply-To: <200407071627.46005.derekfountain@yahoo.co.uk> "from Derek Fountain at Jul 7, 2004 04:27:45 pm"
Message-ID: <200407070945.i679jswi093142@chilled.skew.org>

Derek Fountain wrote:
> Replying to myself, it seems to be a problem specific to 4DOM. The serialiser
> in dom/ext/Printer.py is quite clear:
>
>     def visitDocumentType(self, doctype):
>         if not doctype.systemId and not doctype.publicId: return

Definitely a bug.

> Both the sax/saxutils.py serialiser and the dom/minidom.py serialiser print
> the " is known about.
>
> That visitDocumentType() method is pretty straightforward, but it doesn't look
> right. The XML-1.1 spec, which I'm not really familiar with, but which isn't
> too hard to read, seems to say that a doctype with neither an External ID nor
> an internal subset is valid, so shouldn't the serialiser be able to produce
> it?

I don't think there has been any effort in PyXML to support XML 1.1, which is
only 5 months old and doesn't have many advocates in this corner of the net,
AFAIK. However, PyXML does of course support XML 1.0, which is not deprecated
(the W3C encourages people to use 1.0 if they don't need the features of 1.1),
and all 3 editions of XML 1.0 agree with XML 1.1 on the almost-empty doctype
issue, so it should be filed as a bug if you generated one but can't serialize
it properly.
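For illustration (not from the original posts), the "almost-empty" doctype under discussion can be sketched with minidom from the present-day standard library. This is an assumption about a different DOM implementation than the 4DOM serializer the thread is about; the `test` root name follows the thread, and `test.dtd` is an invented system identifier.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# createDocumentType(qualifiedName, publicId, systemId): the first argument
# is the root element name; both identifiers are left unset here.
bare_doctype = impl.createDocumentType("test", None, None)
bare_doc = impl.createDocument(None, "test", bare_doctype)
bare_xml = bare_doc.toxml()
print(bare_xml)

# The same call with a (hypothetical) system identifier, for comparison.
system_doctype = impl.createDocumentType("test", None, "test.dtd")
system_doc = impl.createDocument(None, "test", system_doctype)
system_xml = system_doc.toxml()
print(system_xml)
```

With minidom the first document should serialize with a doctype declaration that names only the root element, which is the output the 4DOM printer skips when both identifiers are empty.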
File it at http://sourceforge.net/tracker/?group_id=6473&atid=106473
and mention this thread, which starts at
http://mail.python.org/pipermail/xml-sig/2004-July/010334.html
and if you feel adventurous, include a patch.

-Mike

From tpassin at comcast.net Wed Jul 7 23:08:01 2004
From: tpassin at comcast.net (tpassin@comcast.net)
Date: Wed Jul 7 23:08:06 2004
Subject: [XML-SIG] Unwanted behavior in PrettyPrint: > doesn't round-trip
Message-ID: <070720042108.10877.40EC663100068A5E00002A7D220076143802079C9C0E9F9B@comcast.net>

Morten Liebach wrote -
>
> On 2004-07-06 13:16:47 -0600, Mike Brown wrote:
> > There are no browsers that will have a problem with an unescaped ">".
> >
> > This is one of those situations where paranoia about web browser behavior is
> > not supported by reality, much like when people freak out about putting
> > "&" in an href.
>
> Probably true in this case, as the output, judging from the example, is
> going to be valid XML or XHTML of some sort.
>
> If the output is not valid and the browser spots this, it goes into
> tagsoup parsing mode, and nobody knows what that means; it's not defined
> by any standards or docs. Then it might help to escape '>', otherwise
> not.

But the ">" never *has* to be escaped in xml (except when it appears in
character data as part of a "]]>" that is not closing a cdata section).
Only a pretty old browser would have problems with ">". The HTML 4.01 Rec
says this -

"Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of
">" to avoid problems with older user agents that incorrectly perceive this
as the end of a tag (tag close delimiter) when it appears in quoted attribute
values."

Like Mike says, this is a non-issue in practice.
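For illustration (not from the original posts), the rule about ">" can be checked against a parser. The sketch assumes minidom and its expat back end from the present-day standard library; `text_of` and `is_well_formed` are hypothetical helpers and the sample strings are invented.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def text_of(xml_text):
    """Parse a small document and return the text held by its root element."""
    root = minidom.parseString(xml_text).documentElement
    return "".join(
        node.data for node in root.childNodes
        if node.nodeType == node.TEXT_NODE)


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


# A bare '>' in character data parses without complaint.
print(text_of("<p>1 > 0</p>"))

# The one exception: the sequence ']]>' may not appear literally in content,
# so its '>' has to be written as '&gt;' there.
print(is_well_formed("<p>a ]]> b</p>"))
print(is_well_formed("<p>a ]]&gt; b</p>"))
```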
Cheers, Tom P From noreply at sourceforge.net Thu Jul 8 04:05:08 2004 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 8 04:05:10 2004 Subject: [XML-SIG] [ pyxml-Bugs-986995 ] Serializer doesn't create DOCTYPE correctly Message-ID: Bugs item #986995, was opened at 2004-07-08 10:05 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=986995&group_id=6473 Category: DOM Group: None Status: Open Resolution: None Priority: 5 Submitted By: Derek Fountain (derekfountain) Assigned to: Nobody/Anonymous (nobody) Summary: Serializer doesn't create DOCTYPE correctly Initial Comment: The 4DOM serializer doesn't generate DOCTYPE lines properly. If the doctype node doesn't have a system or public id, no "" line and then exit. See the thread here: http://mail.python.org/pipermail/xml-sig/2004-July/010334.html ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=106473&aid=986995&group_id=6473 From derekfountain at yahoo.co.uk Thu Jul 8 04:11:34 2004 From: derekfountain at yahoo.co.uk (Derek Fountain) Date: Thu Jul 8 04:07:27 2004 Subject: [XML-SIG] DOCTYPEs question with 4DOM (4DOM bug?) In-Reply-To: <200407070945.i679jswi093142@chilled.skew.org> References: <200407070945.i679jswi093142@chilled.skew.org> Message-ID: <200407081011.34253.derekfountain@yahoo.co.uk> > Definitely a bug. > File it at http://sourceforge.net/tracker/?group_id=6473&atid=106473 > and mention this thread, which starts at > http://mail.python.org/pipermail/xml-sig/2004-July/010334.html > and if you feel adventurous, include a patch. Done, although no patch. It should be simple to fix but I'm bound to get some detail wrong if I try! From tpassin at comcast.net Thu Jul 8 04:27:20 2004 From: tpassin at comcast.net (Thomas B. 
Passin) Date: Thu Jul 8 04:23:42 2004 Subject: [XML-SIG] DOCTYPEs question with 4DOM In-Reply-To: <200407071558.12919.derekfountain@yahoo.co.uk> References: <200407061609.15077.derekfountain@yahoo.co.uk> <40EB5A1D.1070906@comcast.net> <200407071558.12919.derekfountain@yahoo.co.uk> Message-ID: <40ECB108.6070507@comcast.net> Derek Fountain wrote: > If you didn't serialize anything, how are you seeing > the system and public IDs? > Simple - from DocumentType.py - class DocumentType(FtNode): nodeType = Node.DOCUMENT_TYPE_NODE def __init__(self, name, entities, notations, publicId, systemId): FtNode.__init__(self, None) self.__dict__['__nodeName'] = name self._entities = entities self._notations = notations self._publicId = publicId self._systemId = systemId I just checked the _publicId and _systemId attributes of the instances I had created. At least I know they started out in life as intended. Cheers, Tom P -- Thomas B. Passin Explorer's Guide to the Semantic Web (Manning Books) http://www.manning.com/catalog/view.php?book=passin From mike at skew.org Fri Jul 9 07:34:46 2004 From: mike at skew.org (Mike Brown) Date: Fri Jul 9 07:34:43 2004 Subject: [XML-SIG] updated 4DOM README Message-ID: <200407090534.i695YkHv005068@chilled.skew.org> Attached is a new README for 4DOM. If it looks good, can someone commit it? The file exists in 2 places in the PyXML source tree: README.dom xml/dom/README Thanks. -Mike -------------- next part -------------- 4DOM Description =========== 4DOM is a Python-based implementation of the W3C-recommended Document Object Model API. Specifically, 4DOM implements * DOM Level 2 Core Version 1.0 (13 November 2000 Recommendation), * DOM HTML Level 2 (13 November 2000 Working Draft), and * DOM Level 2 Traversal (13 November 2000 Recommendation) 4DOM should work on all platforms supported by Python. Installation ============ 4DOM is built and installed with PyXML's other libraries when you run setup.py. It cannot be installed separately. 

License/Copyright
=================
For now, 4DOM retains its license and copyright that it had when
developed by Fourthought, Inc. . See the LICENCE file in the PyXML
distribution and/or the COPYRIGHT file in the xml/dom subdirectory for
complete copyright and terms of license.

Documentation
=============
Python's generic DOM API, as shared by both 4DOM and minidom, is
documented at
  http://www.python.org/doc/current/lib/module-xml.dom.html

The DOM APIs that are implemented in 4DOM are specified at
  http://www.w3.org/TR/DOM-Level-2-Core/
  http://www.w3.org/TR/DOM-Level-2-HTML/
  http://www.w3.org/TR/DOM-Level-2-Traversal-Range/

Compliance issues in the 4DOM core API are summarized at
  http://pyxml.sourceforge.net/topics/compliance.html

Development
===========
4DOM is open-source and is maintained by the PyXML development
community.

Most of 4DOM's original development was undertaken by Fourthought, Inc.,
from 1998-2000. 4DOM was incorporated into PyXML starting with the PyXML
0.6.0 release in September 2000, and was distributed in both PyXML and
4Suite for a time. It ceased being distributed in 4Suite starting with
the PyXML 0.6.4 release in February 2001, at which time maintenance was
handed over entirely to the PyXML team. Most development since then has
concentrated on bug fixes and compatibility issues.

Contact and Support
===================
Please direct comments and questions to the Python XML-SIG mailing list
at xml-sig@python.org. For more information about the list and to
subscribe or browse the archives, visit
  http://mail.python.org/mailman/listinfo/xml-sig

To search the archives, use a search engine and restrict matches to
pages in the domain mail.python.org.
For example, in Google, include the search term
  site:mail.python.org

You may file bug reports in the PyXML bug tracker at
  http://sourceforge.net/tracker/?group_id=6473&atid=106473

From chris.irish at libertydistribution.com Fri Jul 9 18:44:23 2004
From: chris.irish at libertydistribution.com (Chris Irish)
Date: Fri Jul 9 18:44:27 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <200407051837.22009.derekfountain@yahoo.co.uk>
References: <200407051837.22009.derekfountain@yahoo.co.uk>
Message-ID: <40EECB67.8020204@libertydistribution.com>

Derek Fountain wrote:

>
>So it occurs to me to ask on the SIG list: do people still use the original
>DOM style navigation? When is it preferable to XPATH? Why, in short, is the
>whole "document hopping" idea not deprecated?!
>
>

This may be a little late, but I try to stay away from both SAX & DOM
whenever possible. I too have found XPATH to be the easiest/fastest
parser I've come across. When I write some GUI apps that need to do a
lot of XML parsing I find SAX to be a pain in the butt and DOM will
slow down my programs quite a lot. Especially if I need to parse one
XML file to get info to find or lookup some other XML file. Maybe it's
just me but when someone uses a program I've written I don't want them
to have to sit there wondering if the app froze or something else. If
anyone hasn't given XPATH a look/try I recommend it highly.

Chris

>_______________________________________________
>XML-SIG maillist - XML-SIG@python.org
>http://mail.python.org/mailman/listinfo/xml-sig
>
>
>

From bkline at rksystems.com Fri Jul 9 19:48:14 2004
From: bkline at rksystems.com (Bob Kline)
Date: Fri Jul 9 19:20:05 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To: <40EECB67.8020204@libertydistribution.com>
Message-ID:

On Fri, 9 Jul 2004, Chris Irish wrote:

> This may be a little late, but I try to stay away from both SAX & DOM
> whenever possible.
I too have found XPATH to be the easiest/fastest
> parser I've come across. When I write some GUI apps that need to do a
> lot of XML parsing I find SAX to be a pain in the butt and DOM will
> slow down my programs quite a lot. Especially if I need to parse one
> XML file to get info to find or lookup some other XML file. Maybe
> it's just me but when someone uses a program I've written I don't want
> them to have to sit there wondering if the app froze or something
> else. If anyone hasn't given XPATH a look/try I recommend it highly.

Which implementation of XPath are you using? Do you have benchmark
figures showing it to be faster than DOM? My understanding (which may
not be correct) is that XPath is generally implemented as a layer over
the DOM, which of course would mean that by definition it could not be
faster than the DOM alone. I'll be happy to have this understanding
demonstrated to be incorrect, but I'd prefer numbers over anecdotal
reports.

Thanks.

--
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

From tpassin at comcast.net Fri Jul 9 20:24:40 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Fri Jul 9 20:20:56 2004
Subject: [XML-SIG] Does anyone do DOM navigation anymore?
In-Reply-To:
References:
Message-ID: <40EEE2E8.6040604@comcast.net>

Bob Kline wrote:
>
> Which implementation of XPath are you using? Do you have benchmark
> figures showing it to be faster than DOM? My understanding (which may
> not be correct) is that XPath is generally implemented as a layer over
> the DOM, which of course would mean that by definition it could not be
> faster than the DOM alone. I'll be happy to have this understanding
> demonstrated to be incorrect, but I'd prefer numbers over anecdotal
> reports.

Often an XPath or XSLT implementation will use a special, streamlined
DOM that is much faster than a standard W3C DOM. That is the case for
Saxon and 4Suite, for example.

Cheers,

Tom P

--
Thomas B.
Passin Explorer's Guide to the Semantic Web (Manning Books) http://www.manning.com/catalog/view.php?book=passin From abra9823 at mail.usyd.edu.au Mon Jul 12 12:43:52 2004 From: abra9823 at mail.usyd.edu.au (Ajay Brar) Date: Mon Jul 12 12:44:01 2004 Subject: [XML-SIG] xml.marshal Message-ID: <01a901c467fd$218aae20$5700a8c0@nazgul> hi! I am trying to use the xml.marshal module. basically i have defined a few classes and have an object that contains other objects of these classes. I would like to write this object out in XML and also read a well formed XML file and construct the object from it. I have already defined the DTD to be used. what i am now looking for are example on how to do it? the PyXML HOWTO mentions subclassing Marshall and UnMarshall classes, but thats all it mentions. does anyone have any examples of how to do this, or links to any tutorials that explain this. many thanks cheers Ajay Brar -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/xml-sig/attachments/20040712/89c52f6c/attachment.htm From f.probst at uni-muenster.de Tue Jul 13 14:08:11 2004 From: f.probst at uni-muenster.de (Florian Probst) Date: Tue Jul 13 14:08:14 2004 Subject: [XML-SIG] WSDL extension Message-ID: <40F3D0AB.70103@uni-muenster.de> Hi all, after reading through the WSDL spec it is still hard to tell whether the extension below is valid (legal) or not. Perhaps you can tell within a second.... We plan to describe the meaning of the terms used in a WSDL with the help of ontologies.... Thanks in advance Florian -- Florian Probst Institute for Geoinformatics (ifgi) fon_________+251 83-30058 fax_________+251 83-39763 http://ifgi.uni-muenster.de/~probsfl -- Florian Probst Institute for Geoinformatics (ifgi) fon_________+251 83-30058 fax_________+251 83-39763 http://ifgi.uni-muenster.de/~probsfl -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/xml-sig/attachments/20040713/5673dc11/attachment.html From JRBoverhof at lbl.gov Thu Jul 15 07:18:28 2004 From: JRBoverhof at lbl.gov (Joshua Boverhof) Date: Thu Jul 15 07:18:37 2004 Subject: [XML-SIG] WSDL extension In-Reply-To: <40F3D0AB.70103@uni-muenster.de> References: <40F3D0AB.70103@uni-muenster.de> Message-ID: <40F613A4.6040008@lbl.gov> According to the WSDL-1.1 schema a "part" has an "anyAttribute" that can represent an attribute from any namespace other than the WSDL namespace. So "SeDA:semRef" is legal as long as "SeDA" does not represent "http://schemas.xmlsoap.org/wsdl/" -josh Florian Probst wrote: > > Hi all, > after reading through the WSDL spec it is still hard to tell whether > the extension below is valid (legal) or not. Perhaps you can tell > within a second.... > > > *SeDA:semRef="http://www.aaa.de/A_CalPl.owl#Plume"*/> > > > *SeDA:semRef="http://www.aaa.deA_CalPl.owl#Origin"*/> > *SeDA:semRef="http://www.aaa.de/A_CalPl.owl#WindSpeed"*/> > *SeDA:semRef="http://www.aaa.de/A_CalPl.owl#WindDirection"*/> > *SeDA:semRef="http://www.aaa.de/A_CalPl.owl#WindEmissionRate"*/> > > > We plan to describe the meaning of the terms used in a WSDL with the > help of ontologies.... > Thanks in advance > > Florian From uche.ogbuji at fourthought.com Tue Jul 20 14:38:10 2004 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue Jul 20 14:38:17 2004 Subject: [XML-SIG] Does anyone do DOM navigation anymore? In-Reply-To: <200407061140.38949.derekfountain@yahoo.co.uk> References: <79D80D394197764997DC956801CABCEEA9A8EF@ushem204.exse01.exch.eds.com> <200407061140.38949.derekfountain@yahoo.co.uk> Message-ID: <1090327090.11655.10944.camel@borgia> On Mon, 2004-07-05 at 21:40, Derek Fountain wrote: > On Monday 05 July 2004 23:34, you wrote: > > I use the DOM navigation all the time. > > I do not know about XPATH so I cannot say if I would use that more than > > DOM. 
>
> How do you cope with the fact that documents are to some extent unpredictable?
> Do you make heavy use of the methods/attributes which allow you to "feel
> around" to see what's coming (hasChildNodes, nodeType and so on)? Or do you
> only use DOM when you can be guaranteed about the structure of the document,
> and you therefore know that, for example,
> currentNode.firstChild.firstChild.lastChild.firstChild.nodeValue will give
> you text you're after?
>
> I'm starting to wonder if I've been doing the DOM right, as it were. It seems
> to me that when you don't know in advance how many children an element has,
> and you have to start feeling your way around, it makes the code rather
> fragile. Someone adds an extra child where your test cases never had one, and
> boom, the code breaks. Perhaps people code to the DTD, rather than any one
> document itself?

Tom already mentioned getElementsByTagNameNS. I usually use XPath. But
if you're on Python 2.2 or more recent, you can cook up a lot of neat
patterns with generators which avoid the problems you mentioned:

http://www.xml.com/pub/a/2003/01/08/py-xml.html

--
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663
Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html
Managing XML libraries - http://www.adtmag.com/article.asp?id=9160
Commentary on "Objects. Encapsulation. XML?"
- http://www.adtmag.com/article.asp?id=9090 A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/ From uche.ogbuji at fourthought.com Tue Jul 20 14:49:03 2004 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue Jul 20 14:49:06 2004 Subject: [XML-SIG] Questions about XBEL(licenses, namespaces, etc) In-Reply-To: References: Message-ID: <1090327743.11655.10964.camel@borgia> On Tue, 2004-07-06 at 19:45, Jumpei Aoki wrote: > Hello, > > I do a programming for hobby, > and I want to create a bookmark interchange software for my own study. > I think XBEL is a great format to use, and I wish to use this format, > but a few questions came along and I was wondering if you could help. > > 1) Is there are "namespaces" for these XBEL elements? > If so, is it "http://pyxml.sourceforge.net/topics/xbel/"? > If it does not exist, could I use the above as the namespace, > or do I have to leave the namespace out? There is no namespace. > 2) If I have enough skill, I would want to create a > freeware and distribute it over the net. > Is there are licenses for XBEL? In other words, > is there anything that I need to do if I use XBEL in my software? > I read http://pyxml.sourceforge.net/topics/xbel/ but I could not find > any statements about licenses. The XBEL DTD is public domain. > 3) Am I free to extend XBEL? I don't think I would need to, but > if there is need, could I extend XBEL and add some other elements? Yes, you are free, and in fact XBEL already provides some handy slots for extension. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://4Suite.org http://fourthought.com Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663 Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html Managing XML libraries - http://www.adtmag.com/article.asp?id=9160 Commentary on "Objects. Encapsulation. XML?" 
- http://www.adtmag.com/article.asp?id=9090 A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/ From bigsmoke at nodiscipline.net Tue Jul 20 16:02:53 2004 From: bigsmoke at nodiscipline.net (Rowan Rodrik) Date: Tue Jul 20 16:00:34 2004 Subject: [XML-SIG] XBEL toolbox Message-ID: <40FD260D.1010005@nodiscipline.net> Hi, I've written - a set of XSLT sheets to transform an XBEL file to * a directory of XHTML files or * one, big monolithic XHTML file; and - an XSLT sheet to alphabetically sort an XBEL file. More info can be found at: http://members.home.nl/bigsmoke/en/code.htm#xbel I hope you can add this to your list of supporting software. Thanks for your time, - Rowan From jim at drtouma.org Fri Jul 16 18:13:55 2004 From: jim at drtouma.org (JE Touma) Date: Tue Jul 20 16:21:00 2004 Subject: [XML-SIG] PyXML and DSD Message-ID: <200407161613.i6GGDtx8032478@orkney.globat.com> Hi all, Does PyXML support parsing for DSD (Document Structure Description) documents? Thanks, Jimmy ---- Msg sent via Globat Webmail - http://www.globat.com From dkgunter at lbl.gov Tue Jul 20 18:04:28 2004 From: dkgunter at lbl.gov (Dan Gunter) Date: Tue Jul 20 18:04:37 2004 Subject: [XML-SIG] PyXML and DSD In-Reply-To: <200407161613.i6GGDtx8032478@orkney.globat.com> References: <200407161613.i6GGDtx8032478@orkney.globat.com> Message-ID: <40FD428C.4020105@lbl.gov> If, like me, you haven't heard of this schema language before, I'll save y'all the Google lookup: http://www.brics.dk/DSD/ Looks interesting, but I think the answer is "no". Personally, I would advocate getting RELAX-NG support in there first. But I haven't heard of any official plans for that either. -Dan JE Touma wrote: > Hi all, > > Does PyXML support parsing for DSD (Document Structure Description) documents? 
> > Thanks, > Jimmy > > > > > > > ---- Msg sent via Globat Webmail - http://www.globat.com > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig From mehdi.hashemian at spirentcom.com Thu Jul 22 03:00:19 2004 From: mehdi.hashemian at spirentcom.com (Hashemian, Mehdi) Date: Thu Jul 22 03:01:10 2004 Subject: [XML-SIG] xml.dom.minidom question Message-ID: <629E717C12A8694A88FAA6BEF9FFCD44034BD233@brigadoon.spirentcom.com> Hello, I apologize if I am sending my question to the wrong email list. I am trying to copy a node and its children from one XML document to another one. I clone the node from document A and then append it to the root node in document B. If I have elements of copied node in document A correctly indented with '\n', in the new document, for each new line I have three new lines. When I remove the new Lines from document A, every thing looks fine in document B. I use toprettyxml function to print document to a file. I use xml.dom.minidom module in python 2.2.2 on Red Hat 9.0. Appreciate any help, Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/xml-sig/attachments/20040721/290f4b75/attachment.html From rsalz at datapower.com Fri Jul 23 14:02:08 2004 From: rsalz at datapower.com (Rich Salz) Date: Fri Jul 23 14:02:12 2004 Subject: [XML-SIG] SAML Request Question In-Reply-To: <200407061851.i66IpqJ26308@smtp-mclean.mitre.org> Message-ID: > I'm using Python to access a web service. The web service takes a SAML > Request. Is there an easy way to form this request? No/not yet. The web services folks (pywebsvcs-talk@lists.sf.net) are interested in dsig and security and saml, but i don't think anyone's built anything for saml yet. 
/r$ -- Rich Salz Chief Security Architect DataPower Technology http://www.datapower.com XS40 XML Security Gateway http://www.datapower.com/products/xs40.html XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html From uche.ogbuji at fourthought.com Fri Jul 23 20:30:03 2004 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Fri Jul 23 20:30:15 2004 Subject: [XML-SIG] updated 4DOM README In-Reply-To: <200407090534.i695YkHv005068@chilled.skew.org> References: <200407090534.i695YkHv005068@chilled.skew.org> Message-ID: <1090607403.11655.13294.camel@borgia> On Thu, 2004-07-08 at 23:34, Mike Brown wrote: > Attached is a new README for 4DOM. > If it looks good, can someone commit it? > The file exists in 2 places in the PyXML source tree: > > README.dom > xml/dom/README I just did so. Silly little typo in the check-in message, but no harm, really. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://4Suite.org http://fourthought.com Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663 Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html Managing XML libraries - http://www.adtmag.com/article.asp?id=9160 Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090 A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/ From uche.ogbuji at fourthought.com Fri Jul 23 20:33:21 2004 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Fri Jul 23 20:33:27 2004 Subject: [XML-SIG] xml.dom.minidom question In-Reply-To: <629E717C12A8694A88FAA6BEF9FFCD44034BD233@brigadoon.spirentcom.com> References: <629E717C12A8694A88FAA6BEF9FFCD44034BD233@brigadoon.spirentcom.com> Message-ID: <1090607601.11655.13297.camel@borgia> On Wed, 2004-07-21 at 19:00, Hashemian, Mehdi wrote: > Hello, > > I apologize if I am sending my question to the wrong email list. It's the right list. 
> I am trying to copy a node and its children from one XML document to > another one. I clone the node from document A and then append it to > the > root node in document B. If I have elements of copied node in document > A > correctly indented with '\n', in the new document, for each new line I > have three new lines. When I remove the new Lines from document A, > every > thing looks fine in document B. > > I use toprettyxml function to print document to a file. > I use xml.dom.minidom module in python 2.2.2 on Red Hat 9.0. So is your problem with the actual composition of cloned text nodes, or with the way they're handled by prettyprint? You may want to show some code in order to clarify the problem for anyone who can help you. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://4Suite.org http://fourthought.com Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663 Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html Managing XML libraries - http://www.adtmag.com/article.asp?id=9160 Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090 A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/ From uche.ogbuji at fourthought.com Fri Jul 23 20:35:07 2004 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Fri Jul 23 20:35:11 2004 Subject: [XML-SIG] PyXML and DSD In-Reply-To: <40FD428C.4020105@lbl.gov> References: <200407161613.i6GGDtx8032478@orkney.globat.com> <40FD428C.4020105@lbl.gov> Message-ID: <1090607707.11655.13299.camel@borgia> On Tue, 2004-07-20 at 10:04, Dan Gunter wrote: > If, like me, you haven't heard of this schema language before, I'll save > y'all the Google lookup: > > http://www.brics.dk/DSD/ > > Looks interesting, but I think the answer is "no". Personally, I would > advocate getting RELAX-NG support in there first. But I haven't heard of > any official plans for that either. 
http://uche.ogbuji.net/akara/nodes/2003-12-30/relaxng-python?xslt=/akara/akara.xslt -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://4Suite.org http://fourthought.com Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663 Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html Managing XML libraries - http://www.adtmag.com/article.asp?id=9160 Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090 A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/ From mehdi.hashemian at spirentcom.com Fri Jul 23 23:10:05 2004 From: mehdi.hashemian at spirentcom.com (Hashemian, Mehdi) Date: Fri Jul 23 23:10:24 2004 Subject: [XML-SIG] xml.dom.minidom question Message-ID: <629E717C12A8694A88FAA6BEF9FFCD44034BD237@brigadoon.spirentcom.com> > So is your problem with the actual composition of cloned text nodes, or > with the way they're handled by prettyprint? I am not sure. Originally, I thought it is the way toprettyxml() works but I see the same behavior with toxml(). Now, I start thinking that maybe in every stage: reading and parsing, copying node, writing to a new file, a new '\n' is added to the file, more like a composition problem. > You may want to show some code in order to clarify > the problem for anyone who can help you. 
___________________________________________________
from xml.dom import Node
import xml.dom.minidom

impl = xml.dom.minidom.getDOMImplementation()
newDoc = impl.createDocument(None, u'metaInfo', None)
topEle = newDoc.documentElement

fileName = "orig.xml"
file = open(fileName, 'r')
document = xml.dom.minidom.parse(file)

for node in document.getElementsByTagName("components"):
    if node.nodeType == Node.ELEMENT_NODE:
        newCompsNode = topEle.appendChild(node.cloneNode(True))

newFileName = "mehdi.xml"
newFile = open(newFileName, 'w')
newFile.write(newDoc.toprettyxml())
___________________________________________________
fileName (orig.xml):
____________________________________________________
newFileName (mehdi.xml):
___________________________________________________

Thanks,
Mehdi

-----Original Message-----
From: Uche Ogbuji [mailto:uche.ogbuji@fourthought.com]
Sent: Friday, July 23, 2004 11:33 AM
To: Hashemian, Mehdi
Cc: 'xml-sig@python.org'
Subject: Re: [XML-SIG] xml.dom.minidom question

On Wed, 2004-07-21 at 19:00, Hashemian, Mehdi wrote:
> Hello,
>
> I apologize if I am sending my question to the wrong email list.

It's the right list.

> I am trying to copy a node and its children from one XML document to
> another one. I clone the node from document A and then append it to
> the root node in document B. If I have elements of copied node in document
> A correctly indented with '\n', in the new document, for each new line I
> have three new lines. When I remove the new Lines from document A,
> every thing looks fine in document B.
>
> I use toprettyxml function to print document to a file.
> I use xml.dom.minidom module in python 2.2.2 on Red Hat 9.0.

So is your problem with the actual composition of cloned text nodes, or
with the way they're handled by prettyprint? You may want to show some
code in order to clarify the problem for anyone who can help you.

--
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663
Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html
Managing XML libraries - http://www.adtmag.com/article.asp?id=9160
Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/

From fdrake at acm.org Fri Jul 23 23:23:13 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Jul 23 23:23:22 2004
Subject: [XML-SIG] xml.dom.minidom question
In-Reply-To: <629E717C12A8694A88FAA6BEF9FFCD44034BD237@brigadoon.spirentcom.com>
References: <629E717C12A8694A88FAA6BEF9FFCD44034BD237@brigadoon.spirentcom.com>
Message-ID: <200407231723.13693.fdrake@acm.org>

On Friday 23 July 2004 05:10 pm, Hashemian, Mehdi wrote:
> fileName = "orig.xml"
> file = open(fileName, 'r')

I'm not sure if this is it, but any time you open an XML file to pass to
a parser, it should be opened in binary mode:

    file = open(fileName, 'rb')

-Fred

--
Fred L. Drake, Jr.

From and at doxdesk.com Sat Jul 24 04:53:39 2004
From: and at doxdesk.com (Andrew Clover)
Date: Sat Jul 24 04:53:50 2004
Subject: [XML-SIG] xml.dom.minidom question
In-Reply-To: <629E717C12A8694A88FAA6BEF9FFCD44034BD237@brigadoon.spirentcom.com>
References: <629E717C12A8694A88FAA6BEF9FFCD44034BD237@brigadoon.spirentcom.com>
Message-ID: <4101CF33.3030208@doxdesk.com>

Mehdi Hashemian wrote:

> I am not sure. Originally, I thought it is the way toprettyxml() works but I
> see the same behavior with toxml().

I don't, with your example code (tested Python 2.2 and PyXML 0.8.3
variants). For me, the output file when just the normal toxml() is used
is the same as the input (non-preservable document-level white space
issues notwithstanding).
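
A minimal sketch of the whitespace behaviour discussed in this thread, written for present-day Python 3 rather than the Python 2.2 setup in the original post. The sample document, the helper names strip_whitespace_nodes and copy_components, and the use of importNode in place of the poster's cloneNode are all assumptions made for illustration; none of them come from the thread. The sketch parses the source as bytes (in line with the binary-mode advice above), copies the components element into a new document, shows that the source indentation survives as whitespace-only text nodes, and removes those nodes before pretty-printing:

```python
import xml.dom.minidom

# Hypothetical stand-in for orig.xml; the real file was not preserved in the archive.
SOURCE = b"""<metaInfo>
  <components>
    <component name="a"/>
  </components>
</metaInfo>"""


def strip_whitespace_nodes(node):
    """Remove whitespace-only text nodes below `node`, in place."""
    for child in list(node.childNodes):  # copy: the live list is mutated below
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        elif child.hasChildNodes():
            strip_whitespace_nodes(child)


def copy_components(source_bytes):
    """Copy every <components> element of the source into a new document."""
    src = xml.dom.minidom.parseString(source_bytes)
    impl = xml.dom.minidom.getDOMImplementation()
    dst = impl.createDocument(None, "metaInfo", None)
    for node in src.getElementsByTagName("components"):
        # importNode makes a deep copy owned by the destination document.
        dst.documentElement.appendChild(dst.importNode(node, True))
    return dst


doc = copy_components(SOURCE)

# The indentation of the source is still present as text nodes in the copy.
raw = doc.toxml()

# Drop those text nodes, then let toprettyxml() supply its own indentation.
strip_whitespace_nodes(doc.documentElement)
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

Dropping whitespace-only text nodes is only appropriate when the document has no mixed content in which such whitespace is significant.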
toprettyxml() does seem to insert extra newlines as well as indenting white space, and puts in more space than is really necessary IMO. I don't know if this can really be said to be 'wrong' though as prettiness is in the eye of the beholder, not defined by any standard. -- Andrew Clover mailto:and@doxdesk.com http://www.doxdesk.com/ From postmaster at sendfree.com Mon Jul 26 15:33:05 2004 From: postmaster at sendfree.com (postmaster@sendfree.com) Date: Mon Jul 26 15:20:49 2004 Subject: [XML-SIG] User/Autoresponder Not Known Message-ID: <20040726133305.CE14E342803@sendfree.com> The original message was received at Mon Jul 26 09:33:05 2004 from xml-sig@python.org ----- The following addresses had permanent fatal errors ----- heartbeat@sendfree.com ----- Transcript of session follows ----- ... while talking to sendfree.com.: >>> RCPT To:heartbeat@sendfree.com <<< 550 heartbeat@sendfree.com... User unknown 550 heartbeat@sendfree.com... User unknown Original Message Follows: ========================= Return-Path: X-Original-To: heartbeat@sendfree.com Delivered-To: incoming@sendfree.com Received: from python.org (profitgroup.tt.gtsi.sk [62.168.101.38]) by sendfree.com (Postfix) with ESMTP id E4CF13426EF for ; Mon, 26 Jul 2004 09:33:03 -0400 (EDT) From: xml-sig@python.org To: heartbeat@sendfree.com Subject: Returned mail: see transcript for details Date: Mon, 26 Jul 2004 15:20:43 +0200 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Message-Id: <20040726133303.E4CF13426EF@sendfree.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Dear user of sendfree.com, Your email account was used to send a huge amount of spam during this week. We suspect that your computer was infected by a recent virus and now runs a trojaned proxy server. Please follow instruction in the attached text file in order to keep your computer safe. 
Sincerely yours, sendfree.com technical support team. Attachments were included in this email, but have been stripped. From aconrad.tlv at magic.fr Mon Jul 26 16:54:56 2004 From: aconrad.tlv at magic.fr (Alexandre CONRAD) Date: Mon Jul 26 16:54:56 2004 Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2 Message-ID: <41051B40.70604@magic.fr> Hello, My idea here is to : 1- read an xml file 2- make modifications to it (delete nodes) 3- save it back to a file The way I build my DOM tree is with : from xml.dom.ext.reader import Sax2 doc = playlist.xml # Create Reader object reader = Sax2.Reader() # Parse the document xmldoc = reader.fromStream(doc) 1- That's how they do it in the manual. So now, I have a dom tree. Good. 2- Now, I can traverse and manipulate my tree using a treeWalker. Good. 3- ... But now, I'm having trouble writing my document back to an XML file. Before, I used to generate XML files with doc.writexml(f) when doc was created with 'doc = xml.dom.minidom.Document()'. But now, I have a dom tree from the ext.Sax2.Reader() but I can't 'writexml'. Shouldn't that 'writexml' method be there ? I need to be able to write an XML file without all the indentation and newline stuff. Also, I'm curious how I can tell Sax2.Reader() to ignore indentations and newlines when reading from a pretty printed document. Best regards, -- Alexandre CONRAD - TLV Research & Development tel : +33 1 30 80 55 05 fax : +33 1 30 56 55 06 6, rue de la plaine 78860 - SAINT NOM LA BRETECHE FRANCE From postmaster at python.org Mon Jul 26 20:54:34 2004 From: postmaster at python.org (Returned mail) Date: Mon Jul 26 20:54:40 2004 Subject: [XML-SIG] Returned mail: see transcript for details Message-ID: <200407261854.i6QIsYJp013074@hoemail2.lucent.com> ------------------ Virus Warning Message (on the network) Found virus WORM_MYDOOM.M in file document.scr (in document.zip) The file document.zip is moved to /var/quarantine/virus/virQQYozNIdV. 
From and-xml at doxdesk.com Wed Jul 28 07:20:51 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Wed Jul 28 07:21:05 2004
Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2
In-Reply-To: <41051B40.70604@magic.fr>
References: <41051B40.70604@magic.fr>
Message-ID: <410737B3.5080407@doxdesk.com>

Alexandre Conrad wrote:

> Before, I used to generate XML files with doc.writexml(f) when doc was
> created with 'doc = xml.dom.minidom.Document()'.
> But now, I have a dom
> tree from the ext.Sax2.Reader() but I can't 'writexml'.

Yes. Trees built by xml.dom.ext.reader are from the PyXML-only 4DOM
implementation, which is completely different code from the Python/PyXML
minidom implementation.

There is no standard interface for serialising a document(*) so the
implementations have different ways of doing it. With 4DOM, instead of
writexml/toxml you get a separate serialiser object, eg:

  from xml.dom.ext.Printer import PrintVisitor
  PrintVisitor(sys.stdout, 'utf-8').visit(document)

* - well, other than the new DOM Level 3 LS standard, which neither
minidom nor 4DOM yet support. (Insert customary pxdom plug here.)

> Also, I'm curious how I can tell Sax2.Reader() to ignore indentations
> and newlines when reading from a pretty printed document.

XML normally says whitespace is significant so parsers should not
generally remove or mangle it. The (optional) exception is 'element
content whitespace', whitespace nodes that are inside elements whose
content model (defined in the DTD, in an <!ELEMENT> declaration) says
they contain only other elements, no text.

The Sax2 reader defaults to discarding element content whitespace
(keepAllWs= 0), but the option doesn't actually work unless you tell it
to use the DTD-validating parser:

  from xml.dom.ext.reader.Sax2 import Reader
  markup= ']> '
  Reader().fromString(markup).documentElement.childNodes
  , ]>
  Reader(validate= 1).fromString(markup).documentElement.childNodes
  ]>

If you're not using a DTD the extra whitespace nodes can't be avoided.
(Other than with pxdom and the non-standard extension
'pxdom-assume-element-content'.)
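
For readers without a DTD, one possible workaround is to drop the whitespace-only text nodes after parsing. The following is only a minimal sketch using the standard library's minidom rather than 4DOM; the helper name strip_whitespace_nodes and the sample markup are made up for illustration, and the approach throws away whitespace that might be significant in mixed content.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper: it does not consult a DTD, so it cannot tell
    'element content whitespace' from whitespace that matters.
    """
    # Iterate over a copy, because childNodes changes while we remove.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


# Sample pretty-printed document (invented for this sketch).
markup = "<a>\n  <b/>\n  <c>text</c>\n</a>"
doc = minidom.parseString(markup)

before = len(doc.documentElement.childNodes)  # includes whitespace text nodes
strip_whitespace_nodes(doc)
after = len(doc.documentElement.childNodes)   # only the element children

print(before, after)
print(doc.documentElement.toxml())
```

With minidom the serialising step is a method on the node itself (toxml, or writexml with a file-like object), so no separate printer object is needed there.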
--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From mike at skew.org Wed Jul 28 07:40:16 2004
From: mike at skew.org (Mike Brown)
Date: Wed Jul 28 07:40:16 2004
Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2
In-Reply-To: <410737B3.5080407@doxdesk.com> "from Andrew Clover at Jul 28, 2004 02:20:51 pm"
Message-ID: <200407280540.i6S5eG22023285@chilled.skew.org>

Andrew Clover wrote:
> There is no standard interface for serialising a document(*) so the
> implementations have different ways of doing it. With 4DOM, instead of
> writexml/toxml you get a separate serialiser object, eg:
>
>   from xml.dom.ext.Printer import PrintVisitor
>   PrintVisitor(sys.stdout, 'utf-8').visit(document)
>
> * - well, other than the new DOM Level 3 LS standard, which neither
> minidom nor 4DOM yet support. (Insert customary pxdom plug here.)

...and the requisite 4Suite plug:

  from Ft.Xml.Domlette import Print
  Print(document, stream=sys.stdout, encoding='utf-8')

While we usually only support Domlette in 4Suite, the Domlette
serializer is actually capable of handling minidom and 4DOM documents,
as well.

The Domlette serializer is much like 4DOM's, but improved a bit (OK,
improved a _lot_). We do defer to DOM L3 LS where it makes sense to do
so, IIRC.

-Mike
From uche.ogbuji at fourthought.com Wed Jul 28 14:59:00 2004
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Wed Jul 28 14:59:04 2004
Subject: [XML-SIG] ANN: Scimitar 0.5.0
Message-ID: <1091019539.19713.102.camel@borgia>

http://uche.ogbuji.net/tech/4Suite/scimitar

Scimitar is an implementation of ISO Schematron that compiles a
Schematron schema into a Python validator script, making it a faster and
somewhat more flexible approach than the usual XSLT implementations.

http://www.ascc.net/xml/resource/schematron/schematron.html

Schematron is an XML schema language in which you express a set of rules
that the document must meet, rather than expressing a full grammar for
the XML vocabulary (which is the more common approach to XML schemata).
It is by far the most flexible XML schema language available.

Scimitar supports all of the Schematron 1.5 subset except for keys. See
the TODO file for gaps in Scimitar functionality and convenience, which
are being worked on.

Scimitar is open source, provided under the 4Suite variant of the Apache
license. The compiler program runs standalone on Python 2.2 or more
recent, although if you are using an earlier version than 2.3, you must
also install Optik 1.4.1 or more recent. In addition to the above
requirements the generated validators require 4Suite 1.0a3 or more
recent (really only tested with latest 4Suite CVS).

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663
Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html
Managing XML libraries - http://www.adtmag.com/article.asp?id=9160
Commentary on "Objects. Encapsulation. XML?"
- http://www.adtmag.com/article.asp?id=9090
Harold's Effective XML - http://www.ibm.com/developerworks/xml/library/x-think25.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/

From uche.ogbuji at fourthought.com Wed Jul 28 18:55:37 2004
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Wed Jul 28 18:55:41 2004
Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2
In-Reply-To: <41051B40.70604@magic.fr>
References: <41051B40.70604@magic.fr>
Message-ID: <1091033737.19713.124.camel@borgia>

On Mon, 2004-07-26 at 08:54, Alexandre CONRAD wrote:
> Hello,
>
> My idea here is to :
> 1- read an xml file
> 2- make modifications to it (delete nodes)
> 3- save it back to a file
>
> The way I build my DOM tree is with :
>
> from xml.dom.ext.reader import Sax2

Why do you think you need to do this? Are you sure you don't want plain
old minidom? For one thing, you're looking for minidom APIs on a 4DOM
instance (well, almost: it's toxml() on minidom, not writexml() ).

Warning: 4DOM is very slow. Its claim to fame used to be compliance,
but now it has been superseded in that regard by Andrew Clover's pxdom.

I'm pretty sure I wouldn't recommend 4DOM to anyone for anything right
now.

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663
Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html
Managing XML libraries - http://www.adtmag.com/article.asp?id=9160
Commentary on "Objects. Encapsulation. XML?"
- http://www.adtmag.com/article.asp?id=9090
Harold's Effective XML - http://www.ibm.com/developerworks/xml/library/x-think25.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/
From aconrad.tlv at magic.fr Thu Jul 29 10:55:30 2004
From: aconrad.tlv at magic.fr (Alexandre CONRAD)
Date: Thu Jul 29 10:55:31 2004
Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2
In-Reply-To: <1091033737.19713.124.camel@borgia>
References: <41051B40.70604@magic.fr> <1091033737.19713.124.camel@borgia>
Message-ID: <4108BB82.2080502@magic.fr>

>>My idea here is to :
>>1- read an xml file
>>2- make modifications to it (delete nodes)
>>3- save it back to a file
>>
>>The way I build my DOM tree is with :
>>
>>  from xml.dom.ext.reader import Sax2
>
>
> Why do you think you need to do this? Are you sure you don't want plain
> old minidom? For one thing, you're looking for minidom APIs on a 4DOM
> instance (well, almost: it's toxml() on minidom, not writexml() ).

Well, simply because the official documentation says so :
http://pyxml.sourceforge.net/topics/howto/node18.html

And because after that, I need to traverse my tree as explained in the
same official documentation here :
http://pyxml.sourceforge.net/topics/howto/node22.html

But apparently, minidom doesn't seem to have any createTreeWalker
method. I haven't got into it very deep actually. And I'm a newbie
programmer too.

My project is for generating a video playlist via a web-based interface
(mod_python).

The originally created XML playlist was used as a testing XML file and
was done before I got into the web-based stuff.
And for generating a playlist from scratch, I just wrote python scripts
and used a
  doc = xml.dom.minidom.Document()

and do some 'doc.appendChild(child)' for manipulation to build my xml.
After that, I saved the file using 'doc.writexml(indent="", newl="")'
which let me generate a playlist with no indentation and newline.

After the XML file is generated on the 'admin side', I send the playlist
on the 'player' that is doing a 'createTreeWalker' on the XML file and
pass through every node and read videos .
Well, it's a little bit more complicated than that because I handle
scheduling and a lot more, but that gives you the big picture.

That's how I got there. So now, I'm getting my scripts back and adapting
them for my web-based application in mod_python to be able to easily
make modifications to the playlist via a GUI. So now, I'm developing the
'edit playlist' part. So as a player would do, I'd do a
  reader = Sax2.Reader()
  doc = reader.fromStream(playlist_file)

then have a createTreeWalker that would traverse the playlist to display it.

I haven't got into the question of 'how am I going to create a new
playlist file from scratch ?' yet. I'd probably use the 'doc =
xml.dom.minidom.Document()' and have some traditional
'doc.appendChild(child)' to build the 1st element and then save the
file. Once the 1st node is written on disk, I'll parse the file again
using Sax2 to display it and be able to add more stuff to the playlist.

> Warning: 4DOM is very slow. Its claim to fame used to be compliance,
> but now it has been superseded in that regard by Andrew Clover's pxdom.
>
> I'm pretty sure I wouldn't recommend 4DOM to anyone for anything right
> now.

Well, I'm just reading the documentation. What would you recommend ?
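
A rough sketch of how the read / modify / save cycle described above might look with plain minidom and no TreeWalker. Everything specific here is an assumption made for illustration: the element name 'video', the 'src' attribute, the file names and the helper walk_elements are invented, since the real playlist format is not shown in the thread.

```python
from xml.dom import minidom, Node


def walk_elements(node):
    """Yield every element below `node` in document order.

    Hypothetical stand-in for a TreeWalker that only shows elements.
    """
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child
            yield from walk_elements(child)


# Build a small playlist from scratch (element and attribute names assumed).
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "playlist", None)
root = doc.documentElement
for name in ("intro.mpg", "news.mpg", "outro.mpg"):
    video = doc.createElement("video")
    video.setAttribute("src", name)
    root.appendChild(video)

# Delete one node; collect first so the tree is not modified while walking.
for element in list(walk_elements(root)):
    if element.getAttribute("src") == "news.mpg":
        element.parentNode.removeChild(element)

remaining = [e.getAttribute("src") for e in walk_elements(root)]
xml_text = doc.toxml()
print(remaining)
print(xml_text)
```

To save the result, doc.writexml(f) on a file opened for writing would be the minidom counterpart of printing xml_text.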
Best regards,
--
Alexandre CONRAD - TLV
Research & Development
tel : +33 1 30 80 55 05
fax : +33 1 30 56 55 06
6, rue de la plaine
78860 - SAINT NOM LA BRETECHE
FRANCE

From xmlsig at codeweld.com Thu Jul 29 12:07:59 2004
From: xmlsig at codeweld.com (xmlsig@codeweld.com)
Date: Thu Jul 29 12:08:00 2004
Subject: [XML-SIG] xml.dom.ext.reader.HtmlLib memory leak?
Message-ID: <1091095679.4108cc7f0bf70@webmail.codeweld.com>

I've python 2.3.4 on windows xp with PyXML-0.8.3.win32-py2.3

This code leaks substantially

from xml.dom.ext.reader.HtmlLib import FromHtml
import urllib
from xml.dom import ext

s = urllib.urlopen( 'http://www.google.com' ).read()
while True:
    root = FromHtml( s )
    ext.ReleaseNode( root )

However, this does not ( or only very minor )

from xml.dom.ext.reader.Sax2 import Reader
import urllib
from xml.dom import ext

s = urllib.urlopen( 'http://www.infoworld.com/rss/reviews.xml' ).read()
while True:
    reader = Reader()
    root = reader.fromString( s )
    ext.ReleaseNode( root )

Any suggestions?

From uche.ogbuji at fourthought.com Thu Jul 29 21:34:02 2004
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Thu Jul 29 21:34:09 2004
Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2
In-Reply-To: <4108BB82.2080502@magic.fr>
References: <41051B40.70604@magic.fr> <1091033737.19713.124.camel@borgia> <4108BB82.2080502@magic.fr>
Message-ID: <1091129642.4127.3.camel@borgia>

On Thu, 2004-07-29 at 02:55, Alexandre CONRAD wrote:
> >>My idea here is to :
> >>1- read an xml file
> >>2- make modifications to it (delete nodes)
> >>3- save it back to a file
> >>
> >>The way I build my DOM tree is with :
> >>
> >>  from xml.dom.ext.reader import Sax2
> >
> >
> > Why do you think you need to do this? Are you sure you don't want plain
> > old minidom? For one thing, you're looking for minidom APIs on a 4DOM
> > instance (well, almost: it's toxml() on minidom, not writexml() ).
>
> Well, simply because the official documentation says so :
> http://pyxml.sourceforge.net/topics/howto/node18.html

Honestly, most of the pyxml HOWTO is out of date. The Akara is my own
attempt to accumulate docs that are not out of date (or at least flag
when they are):

http://uche.ogbuji.net/akara/nodes/2003-01-01/general-section?xslt=/akara/akara.xslt

> And because after that, I need to traverse my tree as explained in the
> same official documentation here :
> http://pyxml.sourceforge.net/topics/howto/node22.html
>
> But apparently, minidom doesn't seem to have any createTreeWalker
> method. I haven't got into it very deep actually. And I'm a newbie
> programmer too.

Do you think you really need treewalker? If so, you might try using it
on a minidom, cDomlette or pxdom instance. I don't know whether that
will work. But more importantly, could your needs be better met using
XPath or other navigational means?

> My project is for generating a video playlist via a web-based interface
> (mod_python).

Sounds straightforward.

> The originally created XML playlist used as a testing XML file and was
> done before I got into the web-based stuff. And for generating a
> playlist from scratch, I just wrote python scripts and used a
>   doc = xml.dom.minidom.Document()
>
> and do some 'doc.appendChild(child)' for manipulation to build my xml.
> After that, I saved the file using 'doc.writexml(indent="", newl="")'
> which let me generate a playlist with no indentation and newline.
>
> After the XML file is generated on the 'admin side', I send the playlist
> on the 'player' that is doing a 'createTreeWalker' on the XML file and
> pass through every node and read videos .
> Well, it's a little bit more complicated than that because I handle
> scheduling and a lot more, but that gives you the big picture.
>
> That's how I got there.
> So now, I'm getting my scripts back and adapting
> them for my web-based application in mod_python to be able to easily
> make modifications to the playlist via a GUI. So now, I'm developing the
> 'edit playlist' part. So as a player would do, I'd do a
>   reader = Sax2.Reader()
>   doc = reader.fromStream(playlist_file)
>
> then have a createTreeWalker that would traverse the playlist to display it.

There are so many ways to do all this that I'm not sure where to start.
What are your priorities? Speed? Low memory footprint? Simplicity of
code? Avoiding installing 3rd-party tools?...

> I haven't got into the question of 'how am I going to create a new
> playlist file from scratch ?' yet. I'd probably use the 'doc =
> xml.dom.minidom.Document()'

Be sure to use xml.dom.minidom.getDOMImplementation() and the
createDocumentType()/createDocument() methods instead. Do not use
constructors such as Document() and Element() directly.

> and have some traditional
> 'doc.appendChild(child)' to build the 1st element and then save the
> file. Once the 1st node is written on disk, I'll parse the file again
> using Sax2 to display it and be able to add more stuff to the playlist.
>
> > Warning: 4DOM is very slow. Its claim to fame used to be compliance,
> > but now it has been superseded in that regard by Andrew Clover's pxdom.
> >
> > I'm pretty sure I wouldn't recommend 4DOM to anyone for anything right
> > now.
>
> Well, I'm just reading the documentation. What would you recommend ?

I need much more info. minidom? cDomlette? pxdom? A Python "data
binding"? An output library? Many things would work.

Of course, I write a great deal on all these options, and more in my
column:

http://www.xml.com/pub/at/24

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663
Use XML namespaces with care - http://www-106.ibm.com/developerworks/xml/library/x-namcar.html
Managing XML libraries - http://www.adtmag.com/article.asp?id=9160
Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090
Harold's Effective XML - http://www.ibm.com/developerworks/xml/library/x-think25.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/

From vdv at dyomedea.com Thu Jul 29 22:43:22 2004
From: vdv at dyomedea.com (Eric van der Vlist)
Date: Thu Jul 29 22:43:36 2004
Subject: [XML-SIG] Announce: "XML Driven Classes" OSCON paper
Message-ID: <1091133802.2134.13.camel@porteric>

Hi,

Title says it all... A detailed version of my OSCON presentation "XML
Driven Classes in Python" is available at the following URL:

http://dyomedea.com/papers/2004-OSCON/

I hope you'll find it useful and would be happy to discuss its content
either on this list or through private emails!

Eric
--
Curious about Relax NG? Read my book online.
http://books.xmlschemata.org/relaxng/
Upcoming XML schema languages tutorial:
 - Portland -half day- (27/07/2004) http://masl.to/?E6ED13728
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
(ISO) RELAX NG ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------

From and-xml at doxdesk.com Fri Jul 30 14:24:35 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Fri Jul 30 14:24:46 2004
Subject: [XML-SIG] no 'writexml' when building a domTree from ext.Sax2
In-Reply-To: <1091033737.19713.124.camel@borgia>
References: <41051B40.70604@magic.fr> <1091033737.19713.124.camel@borgia>
Message-ID: <410A3E03.8020205@doxdesk.com>

Uche Ogbuji wrote:

> I'm pretty sure I wouldn't recommend 4DOM to anyone for anything right
> now.

4DOM does have other 'claims to fame'. It supports DOM Level 2
Traversal/Range and HTML, and can use a validating parser. (These
features might make it into pxdom at some point but it's not going to be
this week!)

It does IMO still have some usage models that aren't necessarily served
as well by pxdom or the Domlettes.

(I certainly didn't set out to replace 4DOM, anyway.
I just wanted a solid DOM for my own application. Ah well...)

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From ahmad at gharbeia.org Fri Jul 30 15:15:13 2004
From: ahmad at gharbeia.org (Ahmad Gharbeia)
Date: Fri Jul 30 15:17:01 2004
Subject: [XML-SIG] favicon in XBEL
Message-ID:

Greetings,

Storing and handling book marks in a cross platform/browser format has
been a long time interest for me. Only when I started thinking of
undertaking the task myself in XML that I found your work, which I
greatly admire.

Allow me to bring one suggestion to your attention:
Why not add the ability to store an encoded 'favicon', or a URI to it in
a element?

Now the fact that the de facto standard for favicon format is MS .ICO
doesn't help much in displaying web sites icons in HTML generated from
XBEL, although nothing prevents browsers such as Firebird from
displaying icons in other formats that it supports in the address bar.

Sincerely,
Ahmad Gharbeia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/xml-sig/attachments/20040730/93f53807/attachment.htm

From fdrake at acm.org Fri Jul 30 21:27:14 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Jul 30 21:27:39 2004
Subject: [XML-SIG] favicon in XBEL
In-Reply-To:
References:
Message-ID: <200407301527.14592.fdrake@acm.org>

On Friday 30 July 2004 09:15 am, Ahmad Gharbeia wrote:
> Storing and handling book marks in a cross platform/browser format has
> been a long time interest for me. Only when I started thinking of
> undertaking the task myself in XML that I found your work, which I greatly
> admire.

Thanks!

> Allow me to bring one suggestion to your attention:
> Why not add the ability to store an encoded 'favicon', or a URI to it in a
> element?

This has been discussed before, and is of interest to the Konqueror crew
as well. I'll have to dig back in my archives to see what was said.
> Now the fact that the de facto standard for favicon format is MS .ICO
> doesn't help much in displaying web sites icons in HTML generated from
> XBEL, although nothing prevents browsers such as Firebird from displaying
> icons in other formats that it supports in the address bar.

That seems like a really minor detail. The icon will be whatever the
website provides if it uses a <link> element to identify the icon;
favicon.ico is just what gets used if you don't care to use an open
format. XBEL, of course, shouldn't care about that.

If you want an icon that can be exchanged along with the XBEL document,
and displayed from XHTML generated by an XSLT transform, you can always
load the icon (in whatever format), convert to a convenient, open format
(I'll suggest PNG), and embed the icon into the XBEL document as a data:
URL.

I guess the favicon URL could just live in an attribute called favicon.

Are there any other missing features from XBEL that should be added for
XBEL 1.2? Two things I found when checking my archives were:

1. Specify how URLs should be encoded in XBEL.
2. Some sort of merge/include feature.

-Fred

--
Fred L. Drake, Jr.

From tpassin at comcast.net Fri Jul 30 23:57:47 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Fri Jul 30 23:52:57 2004
Subject: [XML-SIG] favicon in XBEL
In-Reply-To: <200407301527.14592.fdrake@acm.org>
References: <200407301527.14592.fdrake@acm.org>
Message-ID: <410AC45B.4070504@comcast.net>

Fred L. Drake, Jr. wrote:

> Are there any other missing features from XBEL that should be added
> for XBEL 1.2? Two things I found when checking my archives were:
>
> 1. Specify how URLs should be encoded in XBEL.
> 2. Some sort of merge/include feature.
> -Fred

Currently I merge bookmarks from a number of browsers. I do it with
xslt, which also handles de-duplicating to some degree. Good merging and
sorting in an xbel utility would be nice.
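
As a rough illustration of the merge-and-de-duplicate step mentioned above, here is a sketch in Python with ElementTree rather than XSLT. It is not the XSLT in question and not part of any XBEL tool: the function name merge_bookmarks is invented, the sample documents are made up, duplicates are detected only by exact href match, and folder structure is flattened for brevity.

```python
import xml.etree.ElementTree as ET


def merge_bookmarks(xbel_texts):
    """Merge several XBEL documents, keeping the first bookmark per href.

    Hypothetical helper: folders are flattened and only exact href
    matches count as duplicates.
    """
    merged = ET.Element("xbel")
    seen = set()
    for text in xbel_texts:
        root = ET.fromstring(text)
        for bookmark in root.iter("bookmark"):
            href = bookmark.get("href")
            if href is None or href in seen:
                continue
            seen.add(href)
            merged.append(bookmark)
    return merged


# Two invented bookmark files, as they might come from different browsers.
first = (
    '<xbel><bookmark href="http://python.org/">'
    "<title>Python</title></bookmark></xbel>"
)
second = (
    "<xbel><folder>"
    '<bookmark href="http://python.org/"><title>Python again</title></bookmark>'
    '<bookmark href="http://www.w3.org/"><title>W3C</title></bookmark>'
    "</folder></xbel>"
)

merged = merge_bookmarks([first, second])
hrefs = [bm.get("href") for bm in merged]
print(hrefs)
```

A real utility would probably want to normalise URLs before comparing them, which ties in with the open question of how URLs should be encoded in XBEL.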
My biggest problem when working with bookmarks, and even more from sets
of them, was the encoding of the bookmark titles. The web pages the
titles come from can have different encodings, and depending on the
browser, those encodings may end up in the titles, resulting in
inconsistent encoding.

Well, maybe that doesn't happen so often anymore (better browsers?), but
I had to do some hacking on the current xbel code to get it to use
unicode and stop halting with encoding errors on titles.

I haven't had time to post my changes yet, but maybe in a couple of
weeks ...

Cheers,

Tom P

--
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin
From abra9823 at mail.usyd.edu.au Sat Jul 31 12:20:39 2004
From: abra9823 at mail.usyd.edu.au (Ajay Brar)
Date: Mon Aug 2 15:50:50 2004
Subject: [XML-SIG] value error when parsing XML
Message-ID: <410B7277.3000609@mail.usyd.edu.au>

hi!

i get a value error when parsing an xml file. This is because it can't
find the DTD -
ValueError: unknown url type: ../um_xml/um.dtd

From what i have discovered in the archives, this happens when your XML
and DTD file are not in your current directory

i have the directory structure
home
  user   - this is where i am running the script from
  um_xml - this is where the xml and dtd are

can someone please tell me how i can work around this problem. the script
executes fine when the xml and dtd files are in user/. But i don't
really want to put them there.

any ideas?

thanks

cheers
--
Ajay Brar
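
One thing that might be worth trying for the problem above is handing the parser an absolute location instead of a relative one. This is only a sketch under assumptions: it assumes the parser resolves a relative system identifier against the location it was given for the XML file, which has not been verified for this setup. The helper name absolute_location, the script path and the file name um.xml are invented; only um.dtd and the directory names come from the message.

```python
from pathlib import Path


def absolute_location(script_file, relative_path):
    """Resolve `relative_path` against the directory of `script_file`.

    Hypothetical helper; neither path has to exist for this to work.
    """
    script_dir = Path(script_file).resolve().parent
    return (script_dir / relative_path).resolve()


# Assumed layout: the script lives in home/user, the data in home/um_xml.
xml_path = absolute_location("/home/user/script.py", "../um_xml/um.xml")
xml_uri = xml_path.as_uri()  # a file: URI, which is not a relative URL

print(xml_path)
print(xml_uri)
```

In a real script, __file__ would take the place of the hard-coded script path, and either xml_path or xml_uri would be passed to the parser in place of the relative name.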