From larsga@ifi.uio.no Tue Dec 1 14:12:05 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 01 Dec 1998 15:12:05 +0100
Subject: [XML-SIG] Trivial DOM patch
Message-ID:

This patch fixes a trivial buglet in the DOM example in xml.dom.core:

[larsga@birk105 dom]$ cvs diff core.py
Index: core.py
===================================================================
RCS file: /projects/cvsroot/xml/dom/core.py,v
retrieving revision 1.33
diff -r1.33 core.py
30c30,31
< doc.appendChild (head) # and this
---
> html.appendChild(head) # and this
> doc.appendChild (html) # and this

--Lars M.

From akuchlin@cnri.reston.va.us Tue Dec 1 14:14:49 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue, 1 Dec 1998 09:14:49 -0500 (EST)
Subject: [XML-SIG] New xml-0.5 prerelease
Message-ID: <13923.63646.128559.398310@amarok.cnri.reston.va.us>

Round the loop we go again! I've put a new prerelease of the XML package up at http://www.python.org/sigs/xml-sig/files/ ; look for xml-0.5.tgz or .zip.

I really want to announce 0.5, so please try compiling it and let me know if it goes smoothly; send me private e-mail even if you have no problems, so that I know that people have actually tried it. If I hear no problem reports by Thursday or Friday, I'll call it 0.5 final and announce it in various places, because we really need to start grabbing some mindshare for Python in the XML field.

-- A.M. Kuchling http://starship.skyport.net/crew/amk/
Chemistry is physics without thought; mathematics is physics without purpose. -- Anonymous

From Jeff.Johnson@icn.siemens.com Tue Dec 1 16:05:41 1998
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 1 Dec 1998 11:05:41 -0500
Subject: [XML-SIG] SGML to DOM?
Message-ID: <852566CD.00588464.00@li01.lm.ssc.siemens.com>

I sure hope this is a stupid question with an easy answer...

I have a large SGML document that I am converting to HTML and another SGML DTD.
For my prototype, I opened the document in ArborText's Adept SGML editor and saved it as XML. This made it well formed and escaped some '<' and '>' characters that were not markup. Unfortunately, it took the line feeds out of some pre-formatted text. I figured that was not a problem because I wanted to have Python read in the native SGML anyway. Then I briefly read the sgmllib docs; the part about it not supporting full SGML, just whatever HTML needs. Could someone tell me if there is a way to read in a non-well formed SGML document, with preformatted text into a DOM tree? Thanks in advance, Jeff From akuchlin@cnri.reston.va.us Tue Dec 1 16:28:39 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Tue, 1 Dec 1998 11:28:39 -0500 (EST) Subject: [XML-SIG] SGML to DOM? In-Reply-To: <852566CD.00588464.00@li01.lm.ssc.siemens.com> References: <852566CD.00588464.00@li01.lm.ssc.siemens.com> Message-ID: <13924.6050.370984.918423@amarok.cnri.reston.va.us> Jeff.Johnson@icn.siemens.com writes: >Could someone tell me if there is a way to read in a non-well formed SGML >document, with preformatted text into a DOM tree? If your SGML parser can output ESIS-formatted data, there's xml.dom.esis_builder which might help you. It doesn't support all of ESIS, though, and that really needs to be fixed; I think ESIS support is important to the really serious SGML users, as opposed to the XML dilettantes. Unfortunately, I'm just a dilettante. BTW, does anyone have a test SGML document which exercises all of the ESIS command characters? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ It is not that I wanted to know a great deal, in order to acquire what is now called expertise, and which enables one to become an expert-tease to people who don't know as much as you do about the tiny corner you have made your own. 
-- Robertson Davies, _The Rebel Angels_

From digitome@iol.ie Wed Dec 2 13:49:32 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Wed, 2 Dec 1998 13:49:32 GMT
Subject: [XML-SIG] SGML to DOM?
Message-ID: <199812021349.NAA19589@GPO.iol.ie>

A couple of ideas for you:-

1) Instead of "save-as" from Adept (and losing the line feeds you mention) have you tried converting with James Clark's SX utility?

2) If you generate ESIS (James Clark's nsgmls) you can get into DOM in Python via the ESIS builder. There will be ESIS event types in your SGML that are invalid if your SGML uses features not available in XML. These are easily spotted owing to the line-oriented nature of ESIS.

SELUR NOHTYP

From Jeff.Johnson@icn.siemens.com Wed Dec 2 15:11:34 1998
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Wed, 2 Dec 1998 10:11:34 -0500
Subject: [XML-SIG] SGML to DOM?
Message-ID: <852566CE.00538F2B.00@li01.lm.ssc.siemens.com> I was lucky enough to have found SX yesterday and within 20 minutes had a working solution :) I haven't tried ESIS yet but my files look really good after I convert them to HTML so I think everything is working fine. I do have a small problem with SX. My files convert well on Win98 but get truncated on Win NT. I'll try to figure out what is wrong (maybe an EOF character?) and if it's a problem with SX I'll send the info to James Clark. Thanks for the help from Sean and Andrew! _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://www.python.org/mailman/listinfo/xml-sig
From akuchlin@cnri.reston.va.us Wed Dec 2 15:31:14 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 2 Dec 1998 10:31:14 -0500 (EST) Subject: [XML-SIG] SGML to DOM? In-Reply-To: <852566CE.00538F2B.00@li01.lm.ssc.siemens.com> References: <852566CE.00538F2B.00@li01.lm.ssc.siemens.com> Message-ID: <13925.23562.658406.15353@amarok.cnri.reston.va.us> Jeff.Johnson@icn.siemens.com writes: >I haven't tried ESIS yet but my files look really good after I convert them >to HTML so I think everything is working fine. BTW, last night I checked in a few changes to dom.esis_builder which add support for a few more ESIS events, but I'm not sure which ones are important to support. Also, I fiddled a bit with demo/xbel to get handling of Lynx bookmark files working again, and added a .toxml() method for the dom.core.Notation class. Has anyone tried the 0.5 prerelease yet? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ When one has stopped loving somebody, one feels that he has become someone else, even though he is still the same person. -- Sei Shonagon, _The Pillow Book_
From Fred L. Drake, Jr." Is the unicode/ directory in the xml tree supposed to be a package? If so, it needs an __init__.py. I'd also recommend moving wstrop.* into that package. The other C modules also should be moved into appropriate directories, and not installed in the site-packages/ directory but within the xml package at appropriate points.
I think the C modules should end up being the following modules: intl xml.unicode.intl pyexpat xml.parsers.expat sgmlop xml.parsers._sgmlop wstrop xml.unicode._wstrop Note the addition of underscore prefixes for implementation-only modules. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From akuchlin@cnri.reston.va.us Wed Dec 2 16:51:26 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 2 Dec 1998 11:51:26 -0500 (EST) Subject: [XML-SIG] unicode package? In-Reply-To: <13925.25723.635465.305860@weyr.cnri.reston.va.us> References: <13925.25723.635465.305860@weyr.cnri.reston.va.us> Message-ID: <13925.27965.508563.446147@amarok.cnri.reston.va.us> Fred L. Drake writes: > Is the unicode/ directory in the xml tree supposed to be a package? >If so, it needs an __init__.py. I'd also recommend moving wstrop.* >into that package. At the moment, no. The modules in unicode/ all get installed in site-packages, so once installed, they're not associated with the XML code at all. > The other C modules also should be moved into appropriate >directories, and not installed in the site-packages/ directory but >within the xml package at appropriate points. > I think the C modules should end up being the following modules: > > intl xml.unicode.intl > pyexpat xml.parsers.expat > sgmlop xml.parsers._sgmlop > wstrop xml.unicode._wstrop This is a good question; should the Unicode support be included as a subpackage of xml, or should it be a standalone system that just happens to come with the XML package? I can see arguments for both possibilities; what does everyone think? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another's code; too little and expressiveness is endangered. -- Guido van Rossum, 13 Aug 1996 From Fred L. Drake, Jr." 
References: <13925.25723.635465.305860@weyr.cnri.reston.va.us> <13925.27965.508563.446147@amarok.cnri.reston.va.us> Message-ID: <13925.30117.718226.570922@weyr.cnri.reston.va.us> Andrew M. Kuchling writes: > At the moment, no. The modules in unicode/ all get installed > in site-packages, so once installed, they're not associated with the > XML code at all. Ugh! > This is a good question; should the Unicode support be > included as a subpackage of xml, or should it be a standalone system > that just happens to come with the XML package? I can see arguments > for both possibilities; what does everyone think? The xml package should not install *anything* outside the xml package. My understanding from the break-out session at IPC7 was that the support included in the package is largely a stop-gap solution until a more general solution for Python 1.6 has been implemented. At that point, the xml.unicode support can be either updated to use the standard support or removed; which we pick should depend on how much of the installed base won't be migrating to Python 1.6 quickly. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From digitome@iol.ie Wed Dec 2 17:18:55 1998 From: digitome@iol.ie (Sean Mc Grath) Date: Wed, 2 Dec 1998 17:18:55 GMT Subject: [XML-SIG] unicode package? Message-ID: <199812021718.RAA32292@GPO.iol.ie> [AMK] > > This is a good question; should the Unicode support be >included as a subpackage of xml, or should it be a standalone system >that just happens to come with the XML package? I can see arguments >for both possibilities; what does everyone think? > I think it should just happen to come with the XML package. As Unicode support grows, we will see Unicode popping out of relational databases, bad HTML 4.0 and plain text files. IOW, lots of other Python modules will want to make use of it. SELUR NOHTYP From Fred L. Drake, Jr." 
References: <199812021718.RAA32292@GPO.iol.ie> Message-ID: <13925.32049.57046.953765@weyr.cnri.reston.va.us> Sean Mc Grath writes: > I think it should just happen to come with the XML package. > As Unicode support grows, we will see Unicode popping out > of relational databases, bad HTML 4.0 and plain text > files. IOW, lots of other Python modules will want to > make use of it. Sean, Since Python will eventually provide these facilities as part of the base installation, the support provided by/with the XML package should only be at the "global" level if we're sure that the public interfaces to these things won't be significantly different. There's no reason other packages can't use what we provide, but it needs to be clear that what's being provided is a stop-gap solution and may behave differently from what's eventually provided with Python. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From kajiyama@etl.go.jp Thu Dec 3 03:03:47 1998 From: kajiyama@etl.go.jp (Tamito Kajiyama) Date: Thu, 3 Dec 98 03:03:47 JST Subject: [XML-SIG] New xml-0.5 prerelease In-Reply-To: <13923.63646.128559.398310@amarok.cnri.reston.va.us> (akuchlin@cnri.reston.va.us) Message-ID: <9812021803.AA26308@etlibs2.etl.go.jp> "Andrew M. Kuchling" writes: | Round the loop we go again! I've put a new prerelease of the XML | package up at http://www.python.org/sigs/xml-sig/files/ ; look for | xml-0.5.tgz or .zip. | | I really want to announce 0.5, so please try compiling it and | let me know if it goes smoothly; send me private e-mail even if you | have no problems, so that I know that people have actually tried it. I tried xml-0.5.tgz on SunOS 4.1.4_JL. This OS does not seem to have libintl.h, so compiling intl.c failed. I built the XML package by removing 'intl*' from Makefile.pre.in and Setup.in.
IMHO, I don't think it's worth supporting SunOS 4.x, since this version branch seems no longer supported by Sun and many vendors, and it would be out-of-date in the near future. Also, Python 1.5.1 seems not to define PySys_WriteStderr, and I had the following error: Python 1.5.1 (#45, Jul 16 1998, 10:46:19) [GCC 2.7.2.1] on sunos4 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> from xml.sax import saxexts >>> parser = saxexts.make_parser() ld.so: Undefined symbol: _PySys_WriteStderr I avoided this error by explicitly giving saxexts.make_parser() a parser name (e.g. 'xmlproc'). BTW, if the directory $prefix/lib/python1.5/site-packages does not exist, the installation process simply fails. How about creating it if it does not exist as it is for subdirectories? I wonder if this is a general installation problem of Python... Regards, -- KAJIYAMA, Tamito From akuchlin@cnri.reston.va.us Wed Dec 2 18:30:59 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 2 Dec 1998 13:30:59 -0500 (EST) Subject: [XML-SIG] New xml-0.5 prerelease In-Reply-To: <9812021803.AA26308@etlibs2.etl.go.jp> References: <13923.63646.128559.398310@amarok.cnri.reston.va.us> <9812021803.AA26308@etlibs2.etl.go.jp> Message-ID: <13925.34498.718079.620238@amarok.cnri.reston.va.us> Tamito Kajiyama writes: >I tried xml-0.5.tgz on SunOS 4.1.4_JL. Thank you very much; these installation issues are all rather serious, and just the sort of thing that we don't want to have in a formal release. I'll work on fixing them tonight, and will try to issue a new prerelease tomorrow. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ There is no excellent beauty that hath not some strangeness in the proportion. -- Francis Bacon, "Of Beauty" From gstein@lyra.org Wed Dec 2 21:35:06 1998 From: gstein@lyra.org (Greg Stein) Date: Wed, 02 Dec 1998 13:35:06 -0800 Subject: [XML-SIG] unicode package? 
References: <199812021718.RAA32292@GPO.iol.ie> <13925.32049.57046.953765@weyr.cnri.reston.va.us> Message-ID: <3665B28A.12C15E2C@lyra.org> Fred L. Drake wrote: > > Sean Mc Grath writes: > > I think it should just happen to come with the XML package. > > As Unicode support grows, we will see Unicode popping out > > of relational databases, bad HTML 4.0 and plain text > > files. IOW, lots of other Python modules will want to > > make use of it. > > Sean, > Since Python will eventually provide these facilities as part of the > base installation, the support provided by/with the XML pacakge should > only be at the "global" level if we're sure that the public interfaces > to these things won't be significantly different. There's no reason > other packages can't use what we provide, but it needs to be clear > that what's being provided is a stop-gap solution and may behave > differently from what's eventually provided with Python. I *very* strongly agree with Fred's position here and in his prior email. We shouldn't mess around with trying to pretend something is applicable generally until we're sure that it is right. Here is a perfect case in point: the existence of _wstrop alone is not right -- the final implementation should use Unicode object methods, not external functions. -g -- Greg Stein, http://www.lyra.org/ From Fred L. Drake, Jr." References: <199812021718.RAA32292@GPO.iol.ie> <13925.32049.57046.953765@weyr.cnri.reston.va.us> <3665B28A.12C15E2C@lyra.org> Message-ID: <13925.49609.390429.961575@weyr.cnri.reston.va.us> Greg Stein writes: > I *very* strongly agree with Fred's position here and in his prior Wow, I must be having a good day! (And to think I spent half of it in a meeting! ;-) > Here is a perfect case in point: the existence of _wstrop alone is not > right -- the final implementation should use Unicode object methods, not Good point. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191 From akuchlin@cnri.reston.va.us Thu Dec 3 15:13:21 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Thu, 3 Dec 1998 10:13:21 -0500 (EST) Subject: [XML-SIG] xml-0.5pre2 released Message-ID: <13926.42346.660340.459393@amarok.cnri.reston.va.us> Here's the second pre-release of the XML package: http://www.python.org/sigs/xml-sig/files/xml-0.5pre2.tgz I've fixed the 1.5.2-ism of PySys_WriteStderr that crept into the pyexpat module, and also moved the Unicode stuff into xml.unicode, as argued by Fred and Greg S.; the package should now not install anything outside of site-packages/xml. While I was at it, I twiddled the test suite a bit and moved some files around. Dieter Maurer sent me a lengthy list of errors in private e-mail; most are minor things like broken links and demo programs, so I'm not sure if I'll do a third pre-release, though I will fix the problems. Anyway, please try this new version, and let me know if anything broke in the process. I'd still like to do an announcement tomorrow... -- A.M. Kuchling http://starship.skyport.net/crew/amk/ On Tuesdays he also wears the blue socks and the grey underwear and counts his bath towels. He has twenty-five bath towels. But how could anyone survive with less? -- The narrator introduces us to Michael Smith, in ENIGMA #1: "The Lizard, The Head, The Enigma" From jday@csihq.com Thu Dec 3 15:56:49 1998 From: jday@csihq.com (John Day) Date: Thu, 03 Dec 1998 10:56:49 -0500 Subject: [XML-SIG] xmlproc/DOM vs. WebL? Message-ID: <3.0.1.32.19981203105649.00687874@mail.csihq.com> Hi, I'm a newbie to both Python and XML, trying to figure out how it works and how to make it useful for creating Web agents, concept databases etc. I have recently stumbled across another scripting language called "WebL", which seems to be a smallish but elegant XML/HTML interpreter written in Java. 
COMPAQ is giving it away free w/src for non-commercial use: http://www.research.digital.com/SRC/WebL/ My reason for addressing this group is that I would like to know how it stacks up against Python/XML. In particular, does Python have any XML functions that do 'markup algebra' as described in the WebL docs? How would you compare their respective capabilities in general? (I'm hoping one of you Python gurus has already looked at WebL). The WebL script examples for web-crawling and other agent actions are amazingly small. (On the downside, they seem to run extremely slowly on my machine). Perhaps there is some functionality here that could be applied to Python/XML. My hunch is that this markup algebra stuff could run a lot faster in Python. John Day From Fred L. Drake, Jr." References: <13926.42346.660340.459393@amarok.cnri.reston.va.us> Message-ID: <13926.46622.322046.458881@weyr.cnri.reston.va.us> Andrew M. Kuchling writes: > Here's the second pre-release of the XML package: doc/xml-ref.txt needs to be regenerated since xml-ref.tex has changed. I'll take a look at the updated installation. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From michael@graphion.com Thu Dec 3 18:25:21 1998 From: michael@graphion.com (Michael Sanborn) Date: Thu, 03 Dec 1998 10:25:21 -0800 Subject: [XML-SIG] Why is Builder.push() the way it is? Message-ID: <3666D791.F1252EB0@graphion.com> I'm new to Python, and am very interested in using the DOM implementation. So I'm puzzling over something in builder.py, and hope to have help understanding it.
There's a section in push() like this:

    if self.current_element:
        self.current_element.insertBefore(node, None)
    elif nodetype in _LEGAL_DOCUMENT_CHILDREN:
        if nodetype == TEXT_NODE:
            if string.strip(node.get_nodeValue()) != "":
                self.document.appendChild(node)
        else:
            self.document.appendChild(node)

Now, as far as I can see from the DOM spec and the definition of _LEGAL_DOCUMENT_CHILDREN, if nodetype is in _LEGAL_DOCUMENT_CHILDREN, nodetype will never be equal to TEXT_NODE. I was rather imagining that this section would read:

    if self.current_element:
        if nodetype == TEXT_NODE:
            if string.strip(node.get_nodeValue()) != "":
                self.document.appendChild(node)
        else:
            self.current_element.insertBefore(node, None)
    elif nodetype in _LEGAL_DOCUMENT_CHILDREN:
        self.document.appendChild(node)

I expect that I'm mistaken, but I'd like to know why.

As a second, minor question, why does one sometimes use appendChild(node) and other times insertBefore(node, None)?

Best regards,
Michael Sanborn
Graphion Typesetting

From akuchlin@cnri.reston.va.us Thu Dec 3 19:52:31 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 3 Dec 1998 14:52:31 -0500 (EST)
Subject: [XML-SIG] Why is Builder.push() the way it is?
In-Reply-To: <3666D791.F1252EB0@graphion.com>
References: <3666D791.F1252EB0@graphion.com>
Message-ID: <13926.58361.528160.505768@amarok.cnri.reston.va.us>

Michael Sanborn writes:
>Now, as far as I can see from the DOM spec and the definition of
>_LEGAL_DOCUMENT_CHILDREN, if nodetype is in _LEGAL_DOCUMENT_CHILDREN,
>nodetype will never be equal to TEXT_NODE. I was rather imagining that
>this section would read:

Hmm... Actually, you're not mistaken; that code does look suspicious. Thanks for the bug report! Certainly there's nothing clever going on under the covers that makes that code reasonable. I'll do some archaeology in the CVS logs and try to figure out when the problem crept in, and fix it. (Maybe not in time for the 0.5 release, though.)
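
A minimal sketch of the point Michael raises, written against today's standard-library xml.dom.minidom rather than the 1998 xml.dom.core classes. LEGAL_DOCUMENT_CHILDREN below is an assumption that mirrors the DOM rule for what may sit directly under a Document; the real _LEGAL_DOCUMENT_CHILDREN tuple in builder.py is not shown in this thread and may differ. Both helper names are hypothetical.

```python
import xml.dom
from xml.dom import Node
from xml.dom.minidom import Document

# Assumption: the node types the DOM allows directly under a Document.
# The actual _LEGAL_DOCUMENT_CHILDREN in builder.py may be spelled differently.
LEGAL_DOCUMENT_CHILDREN = (
    Node.ELEMENT_NODE,
    Node.PROCESSING_INSTRUCTION_NODE,
    Node.COMMENT_NODE,
    Node.DOCUMENT_TYPE_NODE,
)


def text_branch_is_reachable():
    """True only if a Text node could pass the membership test in push()."""
    return Node.TEXT_NODE in LEGAL_DOCUMENT_CHILDREN


def document_accepts_text():
    """Try to attach a Text node directly to a minidom Document."""
    doc = Document()
    try:
        doc.appendChild(doc.createTextNode("stray text"))
    except xml.dom.HierarchyRequestErr:
        # minidom rejects node types that are not legal Document children
        return False
    return True
```

Under these assumptions both helpers return False, which is why a TEXT_NODE test nested beneath the membership check can never fire.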
>As a second, minor question, why does one sometimes use >appendChild(node) and other times insertBefore(node, None)? appendChild(node) actually just calls insertBefore(node, None), so there's no real difference other than the extra method call. If you were trying to do something very high-performance, you might use the latter form just to avoid the extra function call, but for most uses it doesn't matter. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ I suppose I had vaguely hoped that you had changed, my brother. That you'd noticed that there were other people in the world. That you had begun to see people as other than things that dream, as creatures of stories. -- Destruction to Dream, in SANDMAN #48: "Brief Lives:8" From dkuhlman@enterpriselink.com Fri Dec 4 20:07:27 1998 From: dkuhlman@enterpriselink.com (Dave Kuhlman) Date: Fri, 04 Dec 1998 12:07:27 -0800 Subject: [XML-SIG] Installing and Test xml-0.5pre2 Message-ID: <366840FF.3EDAA9CC@EnterpriseLink.com> xml-0.5pre2 looks very good to me. I installed and tested under Linux Debian 2.0. I'm using Python 1.5.1. I had no problems compiling and installing. In case it is not obvious, I'm extremely grateful for the work you have done in support of XML for Python. I really believe that Python is going to be one of the best tools for processing XML. And, you people are making it happen. Thanks. Here are some notes about changes I made when running the demos. Read these with some skepticism. Please don't spend too much time replying to my comments. I'm happier when you're fixing the code, and I need to learn some of this for myself. 
In demo/quotes/qtfmt.py, I changed:

19c19,21
< import wstring, iso8859 # For fixing UTF-8 encoding
---
> from xml.unicode import wstring
> from xml.unicode import iso8859
>
353c355
< p=xml.sax.saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")
---
> p=xml.sax.saxexts.XMLParserFactory.make_parser("pyexpat")

This probably should have been fixed by setting something in my environment. (And, I believe that you fixed the path to pyexpat in pre3.) You might explain how to set up for these tests in the README in demo/quotes.

Whoa! I just saw an xml-0.5pre3 on the download site. You guys work fast. I downloaded it and installed it on my WinNT 4.0 machine here at work. (I only get to use Linux at home; here at work the un-enlightened make me use WinNT.) Again, with Python 1.5.1. Here is what I did to make it work:

1. Unzip the .zip file in a directory named, say, C:\Python\Test. It created a sub-directory C:\Python\Test\xml-0.5.

2. Rename directory "xml-0.5" to "xml" (because on my Linux machine, that's the name of the sub-directory it looks for under site-packages, so I guessed that this is the PYTHONPATH we need).

3. Create and run a batch file set_envir.bat containing the following:

       set PATH=C:\Python\Test\xml-0.5\windows;%PATH%
       set PYTHONPATH=C:\Python\Test;%PYTHONPATH%

4. Run some demos.

You should consider including a README.windows file containing the above instructions or the correct ones if mine are wrong. (And it doesn't quite work for pyexpat. See below.)

A comment on SAX drivers -- Are all the files in site-packages/xml/SAX/drivers that begin with "drv_" supposed to be SAX drivers? There were several that didn't work when I gave them as arguments to saxtimer.py. Testing on WinNT 4.0, now. Specifically I got the error message "ERROR: Parser not available" when I tried to use:

    xmltoolkit
    xmldc
    sgmlop
    pyexpat

The following SAX drivers worked:

    xmllib
    sgmllib
    xmlproc

I can't get pyexpat to load.
This fails in demo/quotes/qtfmt.py and demo/sax/saxtimer.py. I'm guessing that it has something to do with my path or PYTHONPATH, but I have not figured out what. I have to spend more time looking at rec_find_module in saxexts.py, I suppose. I'd like to see a few more notes (in README files) on how to run each of the demos. Also, I'd like a few notes on how to set up my environment to run the demos. I looked at some of the stuff in the doc directory, but have not had time to read it thoroughly. When I do, maybe my questions will be answered. Many thanks again. -- Dave -- Dave Kuhlman EnterpriseLink Technology Corp http://www.enterpriselink.com 2542 S. Bascom Ave., Suite #203 Campbell, CA 95008 dkuhlman@EnterpriseLink.com 408-558-2011 From Jeff.Johnson@icn.siemens.com Fri Dec 4 20:51:30 1998 From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com) Date: Fri, 4 Dec 1998 15:51:30 -0500 Subject: [XML-SIG] Installing and Test xml-0.5pre2 Message-ID: <852566D0.00727B2B.00@li01.lm.ssc.siemens.com> > I can't get pyexpat to load. The way I do it for Windows is... copy xml/windows/pyexpat.dll to xml/parsers copy xml/expat/bin/xmlparse.dll to somewhere in path copy xml/expat/bin/xmltok.dll to somewhere in path For a while, the pyexpat.dll in CVS was corrupted but I think its been fixed. I'm not sure where xmlparse.dll and xmltok.dll should be but as long as they are in a directory in your PATH, they will be used. That's all I can remember, I hope that covers it. From akuchlin@cnri.reston.va.us Fri Dec 4 21:55:57 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 4 Dec 1998 16:55:57 -0500 (EST) Subject: [XML-SIG] Eliminating whitespace Message-ID: <13928.20594.800816.604650@amarok.cnri.reston.va.us> A common task when processing a document using the DOM is to strip out unnecessary whitespace. I'd definitely like to have a function or set of functions to do this, and would like to discuss what the interface should look like. 
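
The simplest variant of this task, dropping Text nodes that contain nothing but whitespace, might look like the sketch below. It assumes the modern standard-library xml.dom.minidom instead of the xml.dom.core classes shipped in the 0.5 package, and the function name is hypothetical, not part of any released API.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def drop_whitespace_only_text(node):
    """Recursively remove Text children that consist solely of whitespace."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and child.data.strip() == "":
            node.removeChild(child)
        else:
            drop_whitespace_only_text(child)
    return node


if __name__ == "__main__":
    doc = parseString("<a>\n  <b> x </b>\n  <c/>\n</a>")
    drop_whitespace_only_text(doc)
    print(doc.documentElement.toxml())
```

Text inside <b> keeps its leading and trailing spaces here; only nodes that are entirely whitespace are removed. Collapsing or trimming would need the extra options discussed in the rest of this message.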
The problem: given a DOM tree, you want to remove whitespace from it. There are several dimensions to the problem:

* Delete whitespace, or collapse it down to a single space?

* Just act on Text nodes that are all whitespace? Or act on Text nodes with leading, trailing, or internal whitespace? (If acting on internal whitespace, you'll probably be collapsing down to a single space, not deleting everything. Though who knows?)

Anyway, I don't think there's any call for making elaborate whitespace-deleting classes that can be customized in various ways. So, how about a function (or method on dom.core.Node?). Strawman interface:

    normalize_whitespace( DOMtree,
                          collapse = [true | false]     default false,
                          inside_node = [true | false]  default false,
                          where = LEFT, RIGHT, INSIDE, or a bitwise OR of these flags
                                  Default = all of them
                        )

Examples:

    normalize_whitespace( DOMtree )
        Drop all whitespace-only nodes

    normalize_whitespace( DOMtree, 1, 1 )
        Collapse all runs of whitespace down to single spaces

    normalize_whitespace( DOMtree, 1, 1, LEFT | RIGHT )
        Strip trailing and leading whitespace from all Text nodes

I have a sneaking feeling that there's one argument too many in that function, and it could be made more compact somehow, but can't think of anything definite. Anyone got suggestions? (Where's Tim Peters when you need him?)

-- A.M. Kuchling http://starship.skyport.net/crew/amk/
"I'll be curious to see what he thinks Hell is." "Garn, I hope he ain't British. Some of that stuff them people dream up... it's enough to gag a maggot." -- Demons awaiting Stanley's arrival in Hell in STANLEY AND HIS MONSTER #4

From akuchlin@cnri.reston.va.us Fri Dec 4 22:13:27 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 4 Dec 1998 17:13:27 -0500 (EST)
Subject: [XML-SIG] pre3 -> 0.5
Message-ID: <13928.24060.914543.362950@amarok.cnri.reston.va.us>

I've copied xml-0.5pre3 and renamed it to xml-0.5, the final package.
The difference between pre2 and pre3 is simply fixing some minor nits, and not compiling the intl module by default. I really don't want to mess around with pre-releases anymore, so, if you try it out and find some hideous bug, let me know at amk1@erols.com . Otherwise, I'll write up an announcement and start sending it out over the weekend. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ The universe may / be as great as they say. / But it wouldn't be missed / if it didn't exist. -- Piet Hein From mazito@softlab.com.ar Fri Dec 4 22:15:29 1998 From: mazito@softlab.com.ar (Mario A. Zito) Date: Fri, 04 Dec 1998 19:15:29 -0300 Subject: [XML-SIG] On going projects (and our own project) Message-ID: <36685F01.D67A5EF6@softlab.com.ar> I am new to XML, this list and to Python. It would be of interest to me (and perhaps to other list members) if more senior members could describe the ongoing projects they plan to use XML and Python for, so that more junior members (like me) can get an idea of the different ways of using this combination. In particular, I am planning to use it to construct an integrated mail based defect tracking, project management and distributed version control system. All mails will be submitted as XML documents, and a Python based mail server will parse them and take the needed actions, correlate versions with defects, send bug reports to the right developer (also as XML docs), save the bugs in a database, store new versions in CVS, generate project status reports, etc. This will be our first Python/XML project, and (we expect) it will be able to support our own development projects. Our idea is to try to assemble it (as much as possible) from existing software (such as CVS), and concentrate on the mail processor. And of course, gain working experience with Python and XML along the way. If we come up with something that (really) works, and others are interested in it, we may release it for public use.
Any ideas, suggestions or any type of collaboration will be welcome. If anyone is interested in them, I can post our proposed DTDs to this list as they evolve. If someone objects to this, please let me know so I don't break any explicit or implicit rules (maybe this belongs on some other list?)

Thanks.
Mario A. Zito
SoftLab SRL

From fleck@informatik.uni-bonn.de Sat Dec 5 13:18:22 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Sat, 05 Dec 1998 14:18:22 +0100
Subject: [XML-SIG] On going projects (and our own project)
References: <36685F01.D67A5EF6@softlab.com.ar>
Message-ID: <3669329E.57@informatik.uni-bonn.de>

Mario A. Zito wrote:
> In particular, I am planning to use it to construct an integrated mail
> based defect tracking, project management and distributed version
> control system. All mails will be submitted as XML documents, and a
> Python based mail server will parse them and take the needed actions,
> correlate versions with defects, send bug reports to the right developer
> (also as XML docs), save the bugs in a database, store new versions in
> CVS, generate project status reports, etc.

Cool. The "GNU Gather" project will use Python and the Roxen WWW server to create a WebDAV/RTSP-based groupware framework, including applications like issue tracking and "knowledge database" management.

We don't have any release schedule yet; in fact, the programming for "GNU Gather" hasn't even started. So while you might want to subscribe yourself to the "GNU Gather" initial announcements mailing list (see my .sig for URL), "GNU Gather" is probably a bit too heavyweight (and will take too long to finish) if all you want at the moment is "just" a mail tracking system.

BTW, there's a web page about bug tracking and problem management tools for Linux at .

Yours, Markus.
-- //////////////////////////////////////////////////////////////////////////// Markus B Fleck - University of Bonn - CS Department IV - WHOIS MF5079 UNIX Administrator - comp.lang.python.announce Moderator "GNU Gather" Free Internet Groupware Project - http://cscw.net/gather/ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ From larsga@ifi.uio.no Sat Dec 5 21:33:42 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 05 Dec 1998 22:33:42 +0100 Subject: [XML-SIG] Installing and Test xml-0.5pre2 In-Reply-To: <366840FF.3EDAA9CC@EnterpriseLink.com> References: <366840FF.3EDAA9CC@EnterpriseLink.com> Message-ID: * Dave Kuhlman | | A comment on SAX drivers -- Are all the files in | site-packages/xml/SAX/drivers that begin with "drv_" supposed to be | SAX drivers? Yes, although drv_xmltok is for XMLTok (the older version of expat). The rest are common libraries shared between different drivers. | There were several that didn't work when I gave them as arguments to | saxtimer.py. Testing on WinNT 4.0, now. Specifically I got the | error message "ERROR: Parser not available" when I tried to use: | | xmltoolkit | xmldc | sgmlop | pyexpat Well, you need to have the parsers installed. They work for me, but if they don't for you I'm very interested in hearing about it. Could you check if they're installed so you can load the parser from the command line and let me know how it works out? | I can't get pyexpat to load. This fails in demo/quotes/qtfmt.py and | demo/sax/saxtimer.py. I'm guessing that it has something to do with | my path or PYTHONPATH, but I have not figured out what. I have to | spend more time looking at rec_find_module in saxexts.py, I suppose. Try importing it from the command-line. If that works and rec_find_module does not, then please send me a bug report and I'll fix it. (Oh, and thanks for giving us some feedback.) --Lars M. From akuchlin@cnri.reston.va.us Sun Dec 6 16:47:14 1998 From: akuchlin@cnri.reston.va.us (A.M. 
Kuchling) Date: Sun, 6 Dec 1998 11:47:14 -0500 Subject: [XML-SIG] Proposed announcement Message-ID: <199812061647.LAA26113@207-172-39-232.s232.tnt10.ann.erols.com> To be sent to: c.l.py.announce, comp.text.xml, xml-dev, www-dom, announcement on freshmeat.net, other suggestions? ================== Version 0.5 of the Python/XML distribution can be downloaded from http://www.python.org/sigs/xml-sig/files/xml-0.5.tgz The Python/XML distribution contains the basic tools required for processing XML data using the Python programming language, assembled into one easy-to-install package. The distribution includes parsers and standard interfaces such as SAX and DOM, along with various other useful modules. Version 0.5 can be considered a beta release. Major changes in this version: * The DOM implementation has been extensively modified, and is now much closer to compliance with the DOM Recommendation. * A Unicode type has been added as the subpackage xml.unicode.wstring. * Various subpackages have been upgraded to their most recent versions. The package currently contains: * XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius Garshol), xmllib.py (Sjoerd Mullender) using the sgmlop.c accelerator module (Fredrik Lundh). * SAX interface (Lars Marius Garshol) * DOM interface (Stefane Fermigier, A.M. Kuchling) * xmlarch.py, for architectural forms processing (Geir Ove Grønmo) * Unicode wide-string module (Martin von Löwis) * Various utility modules and functions (various people) * Documentation and example programs (various people) The code is being developed bazaar-style by contributors from the Python XML Special Interest Group, so please send comments, questions, or bug reports to . For general information about Python, see: http://www.python.org The Python XML-SIG home page is: http://www.python.org/sigs/xml-sig/ -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Problems worthy of attack / prove their worth by hitting back. 
-- Piet Hein From akuchlin@cnri.reston.va.us Sun Dec 6 22:05:23 1998 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Sun, 6 Dec 1998 17:05:23 -0500 Subject: [XML-SIG] XML and Zope Message-ID: <199812062205.RAA26917@207-172-56-151.s151.tnt12.ann.erols.com> As part of experimenting with Zope, I wanted to create a new tag under DocumentTemplate, and chose to create one that formatted some XML; I re-used my quotation formatting code, which made the job pretty trivial. It wasn't too hard to do, and you can see some notes on it at http://starship.skyport.net/crew/amk/zope/new-tag.html . As an example, in a DTML document I can now put: The days come and go... Ralph Waldo Emerson The tag will convert the fragment of XML it contains into HTML; more realistically, the content would come from a database query or some other source, and be present as a variable, something like: This is a very simple example, of course. What could we do that would be more general and more useful? An XSL styler (think of it: ...) would be an obvious prospect, but would also be a sizable job. Is there something smaller that would be easier to implement, but still useful for someone? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Barney turned his little squinty blue eyes on me. "We go to the garrick now and become warbs," he said. "The hell we do!" I thought to myself quickly. -- James Thurber, "The Black Magic of Barney Haller", in _The Thurber Carnival_ From kajiyama@etl.go.jp Mon Dec 7 09:34:34 1998 From: kajiyama@etl.go.jp (Tamito Kajiyama) Date: Mon, 7 Dec 98 09:34:34 JST Subject: [XML-SIG] Proposed announcement In-Reply-To: <199812061647.LAA26113@207-172-39-232.s232.tnt10.ann.erols.com> (amk1@erols.com) Message-ID: <9812070034.AA05534@etlibs2.etl.go.jp> "A.M. Kuchling" writes: | To be sent to: c.l.py.announce, comp.text.xml, xml-dev, www-dom, | announcement on freshmeat.net, | other suggestions? 
Please excuse my sending a problem report at the last moment before the final release (the wide range of destinations for the proposed announcement reminded me ;).

On SunOS 4.1.4_JL, the pyexpat module fails because of a call to an undefined procedure at runtime. The fix is simple: run ranlib on expat/libexpat.a before linking pyexpat.so. Here is a trivial patch:

*** expat/Makefile.orig Sun Dec 6 01:02:48 1998
--- expat/Makefile Sun Dec 6 01:01:56 1998
***************
*** 40,43 ****
--- 40,44 ----
  libexpat.a: $(OBJS)
  	ar cr libexpat.a $(OBJS)
+ 	ranlib libexpat.a

I don't know if this problem happens on platforms other than SunOS.

Regards,
--
KAJIYAMA, Tamito

From fleck@informatik.uni-bonn.de Mon Dec 7 10:41:08 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Mon, 07 Dec 1998 11:41:08 +0100
Subject: [XML-SIG] Pointer: "SPIN_py - SGML Parser Integration Project"
Message-ID: <366BB0C4.105B@informatik.uni-bonn.de>

Hi!

32BITSONLINE has an article about SPIN_py:

> SPIN_py - SGML Parser Integration Project
> [...]
> SPIN is an interface to SP. It delivers edge
> events from SP to your script directly from
> the C++ API to your Python script.

URL:

Greets, Markus.

--
////////////////////////////////////////////////////////////////////////////
Markus B Fleck - University of Bonn - CS Department IV - WHOIS MF5079
UNIX Administrator - comp.lang.python.announce Moderator
"GNU Gather" Free Internet Groupware Project - http://cscw.net/gather/
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

From SHunting@goSPS.com Mon Dec 7 15:23:19 1998
From: SHunting@goSPS.com (Hunting, Sam)
Date: Mon, 7 Dec 1998 10:23:19 -0500
Subject: [XML-SIG] Parameter entity visualization tool
Message-ID: <518E520AF877D111B58100A0C9920BF527B060@SPS01>

In Python, is there such a thing as a parameter entity visualization tool that would show how content model "building blocks" work?
This would seem to be very useful in the context of understanding, maintaining, configuring, extending DTDs/schemas like Voyager (http://www.w3.org/TR/1998/WD-html-in-xml-19981205/) and of course the usual suspects like TEI and docbook %paragraph.stuff #PCDATA | %this.stuff a |b | %that.stuff c |d | %the.other.stuff "" I envision it working like a collapsible outliner, but a printout would be fine too. From p_schneider1@yahoo.com Tue Dec 8 17:47:52 1998 From: p_schneider1@yahoo.com (Paul Schneider) Date: Wed, 9 Dec 1998 04:47:52 +1100 (EST) Subject: [XML-SIG] XML for Windows Message-ID: <19981208174752.5216.rocketmail@send105.yahoomail.com> Hi there! I just downloaded and unpacked the xml-package xml-0_5pre3.zip. The makefile supplied to compile and install it is only for UNIX. -Is there a different pachage for NT? -How do I get the package running under Windows NT? Paul _________________________________________________________ DO YOU YAHOO!? Get your free @yahoo.com address at http://mail.yahoo.com From jday@csihq.com Wed Dec 9 00:19:58 1998 From: jday@csihq.com (John Day) Date: Tue, 08 Dec 1998 19:19:58 -0500 Subject: [XML-SIG] xml install problems Message-ID: <3.0.1.32.19981208191958.00692914@mail.csihq.com> Hi, I'm a Python and Xml newbie, having problems installing the latest xml-0.5 under Linux. I did not have Python installed, so I got the latest Python 1.5.1 from www.python.org and installed it with prefix=/home/jday It seemed to install OK and make test ran OK. Then I unzipped the xml-0.5 into /home/jday/xml/xml-0.5/ and did: make -f Makefile.pre.in Makefile VERSION=1.5 installdir=/home/jday make make install The make seemed to make and install everything OK, _but_ 6 out of 7 tests failed: jday@medusa:/home/jday/xml/xml-0.5> make test cd test ; PYTHONPATH=.. 
python testxml.py test_arch test test_arch skipped -- an optional feature could not be imported test_dom test test_dom skipped -- an optional feature could not be imported test_pyexpat test_sax test test_sax skipped -- an optional feature could not be imported test_unicode test test_unicode skipped -- an optional feature could not be imported test_utils test test_utils skipped -- an optional feature could not be imported test_xmllib test test_xmllib skipped -- an optional feature could not be imported 1 test OK. 6 tests skipped: test_arch test_dom test_sax test_unicode test_utils test_xmllib jday@medusa:/home/jday/xml/xml-0.5/test> python test_arch.py Traceback (innermost last): File "test_arch.py", line 6, in ? from xml.sax import saxexts, saxlib, saxutils ImportError: No module named xml.sax PYTHONPATH was not defined so I tried setenv PYTHONPATH /home/jday/lib/python1.5/ with no improvement. I know very little about python and your XML implementations. What am I doing wrong? Thanks, John Day Palm Bay, Florida From betty@eccnet.eccnet.com Wed Dec 9 01:33:23 1998 From: betty@eccnet.eccnet.com (Betty Harvey) Date: Tue, 8 Dec 1998 20:33:23 -0500 (EST) Subject: [XML-SIG] xml install problems In-Reply-To: <3.0.1.32.19981208191958.00692914@mail.csihq.com> Message-ID: John: I had similar problems but much earlier. I tried installing Python on Linux 5.0. The Makefile.pre.in worked just fine, however, when I tried the 'make' I got the following error: gcc -fPIC -O -I/usr/include/python1.4 -I/usr/include/python1.4 -DHAVE_CONFIG_H -Iexpat/xmlparse -c ./pyexpat.c ./pyexpat.c: In function `mywrite': ./pyexpat.c:64: void value not ignored as it ought to be make: *** [pyexpat.o] Error 1 Betty On Tue, 8 Dec 1998, John Day wrote: > Hi, > > I'm a Python and Xml newbie, having problems installing the latest > xml-0.5 under Linux. 
> > I did not have Python installed, so I got the latest Python > 1.5.1 from www.python.org and installed it with prefix=/home/jday > It seemed to install OK and make test ran OK. > > > Then I unzipped the xml-0.5 into /home/jday/xml/xml-0.5/ and > did: > make -f Makefile.pre.in Makefile VERSION=1.5 installdir=/home/jday > make > make install > > The make seemed to make and install everything OK, _but_ 6 out of 7 tests > failed: > > > jday@medusa:/home/jday/xml/xml-0.5> make test > cd test ; PYTHONPATH=.. python testxml.py > test_arch > test test_arch skipped -- an optional feature could not be imported > test_dom > test test_dom skipped -- an optional feature could not be imported > test_pyexpat > test_sax > test test_sax skipped -- an optional feature could not be imported > test_unicode > test test_unicode skipped -- an optional feature could not be imported > test_utils > test test_utils skipped -- an optional feature could not be imported > test_xmllib > test test_xmllib skipped -- an optional feature could not be imported > 1 test OK. > 6 tests skipped: test_arch test_dom test_sax test_unicode test_utils > test_xmllib > > jday@medusa:/home/jday/xml/xml-0.5/test> python test_arch.py > Traceback (innermost last): > File "test_arch.py", line 6, in ? > from xml.sax import saxexts, saxlib, saxutils > ImportError: No module named xml.sax > > PYTHONPATH was not defined so I tried > setenv PYTHONPATH /home/jday/lib/python1.5/ > with no improvement. > > I know very little about python and your XML implementations. What > am I doing wrong? > > Thanks, > John Day > Palm Bay, Florida > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > From akuchlin@cnri.reston.va.us Wed Dec 9 03:48:56 1998 From: akuchlin@cnri.reston.va.us (A.M. 
Kuchling)
Date: Tue, 8 Dec 1998 22:48:56 -0500
Subject: [XML-SIG] Whitespace stripping functions
Message-ID: <199812090348.WAA03586@207-172-46-251.s251.tnt9.ann.erols.com>

I've added a dom.utils module for small utility functions for the DOM and plan to check it into the CVS tree. cvs.python.org is inaccessible for some reason, so a copy is appended below. It implements tree_print(), strip_whitespace(), and collapse_whitespace(). tree_print() is intended for debugging, and returns a string showing the tree structure of a DOM subtree. strip_whitespace() removes leading/trailing/both whitespace in-place from a DOM tree, and collapse_whitespace() folds runs of whitespace into a single space.

Comments, suggestions?

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
I spent a busy day today, but got little done. This is because I am at last becoming perfect in the art of seeming busy, even when very little is going on in my head or under my hands. This is an art which every man learns, if he does not intend to work himself to death.
-- Robertson Davies, _The Table Talk of Samuel Marchbanks_

# utils.py

import re
from xml.dom import core

# Various utility functions that are often handy.

def tree_print(node, indent = 0):
    """Print a representation of a tree that makes the tree structure
    explicit.  Intended mostly for debugging use, so it's a lossy printout."""
    s = indent*' ' + repr(node) + '\n'
    for n in node.get_childNodes():
        s = s + tree_print(n, indent + 2)
    return s

# this should grow up into a general-purpose whitespace post-processor,
# options to include:
#   - whether to strip (s/\s+//) or collapse (s/\s+/ /)
#   - where to do it: head, tail, or interior of text nodes, or
#     all-whitespace nodes only
# Initial implementation by Greg Ward; modified and collapse_whitespace added
# by AMK.

import string

WS_LEFT, WS_BOTH, WS_RIGHT, WS_INTERNAL = [1,2,3,4]

strip_func = {WS_LEFT: string.lstrip,
              WS_BOTH: string.strip,
              WS_RIGHT: string.rstrip }

collapse_pat = {WS_LEFT: '^\s+',
                WS_BOTH: '(^\s+)|(\s+$)',
                WS_RIGHT: '\s+$',
                WS_INTERNAL: '\s+'}

def strip_whitespace (node, func = WS_BOTH):
    """Remove leading and/or trailing whitespace from a DOM tree.
    node -- top node; its subtree will be traversed
    func -- one of WS_LEFT, WS_RIGHT, WS_BOTH telling which whitespace to strip
    """
    if func == WS_INTERNAL:
        raise ValueError, "WS_INTERNAL not acceptable value for strip_whitespace()"
    func = strip_func[func]

    if node.nodeType == core.DOCUMENT_NODE:
        node = node.documentElement

    stack = [node]
    while (stack):
        # get the top node from the stack
        node = stack[-1]

        # XXX a general-purpose "visit" operation could go right here

        # walk this node's list of children, deleting those that are
        # all whitespace and saving the rest to be pushed onto the stack
        children = []
        for child in node.childNodes[:] :
            if child.nodeType == core.TEXT_NODE:
                orig = child.get_nodeValue()
                v = func( orig )
                if v == "":
                    node.removeChild (child)
                elif v != orig:
                    child.set_nodeValue( v )
            elif child.hasChildNodes():
                children.append (child)

        children.reverse()
        stack[-1:] = children
    # end: while stack not empty
# end strip_whitespace

def collapse_whitespace (node, func = WS_BOTH):
    """Collapse runs of whitespace down to a single space.
    node -- top node; its subtree will be traversed
    func -- one of WS_LEFT, WS_RIGHT, WS_BOTH, WS_INTERNAL telling which
            whitespace should be collapsed.
    """
    pat = collapse_pat[ func ]
    pat = re.compile( pat )

    if node.nodeType == core.DOCUMENT_NODE:
        node = node.documentElement

    stack = [node]
    while (stack):
        # get the top node from the stack
        node = stack[-1]

        # XXX a general-purpose "visit" operation could go right here

        # walk this node's list of children, deleting those that are
        # all whitespace and saving the rest to be pushed onto the stack
        children = []
        for child in node.childNodes[:] :
            if child.nodeType == core.TEXT_NODE:
                orig = child.get_nodeValue()
                v = pat.sub(' ', orig)
                if v != orig:
                    child.set_nodeValue( v )
            elif child.hasChildNodes():
                children.append (child)

        children.reverse()
        stack[-1:] = children
    # end: while stack not empty
# end collapse_whitespace

From jday@csihq.com Wed Dec 9 11:15:57 1998
From: jday@csihq.com (John Day)
Date: Wed, 09 Dec 1998 06:15:57 -0500
Subject: [XML-SIG] xml install problems
Message-ID: <3.0.1.32.19981209061557.006e5384@mail.csihq.com>

Hi,

I wrote yesterday that the Python and xml-0.5 installs proceeded without error, yet I could not run the xml-0.5 test files. I'm still trying to pinpoint the exact problem.

My problem seems to be that Python1.5 can't see the xml site-package.
It exists and seems to contain everything: jday@medusa:/home/jday/lib/python1.5/site-packages/xml> dir total 14 drwxr-xr-x 8 jday csi 1024 Dec 8 19:11 ./ drwxr-xr-x 3 jday csi 1024 Dec 5 06:24 ../ -rw-r--r-- 1 jday csi 37 Dec 8 19:11 __init__.py -rw-r--r-- 1 jday csi 175 Dec 8 19:11 __init__.pyc -rw-r--r-- 1 jday csi 169 Dec 8 19:11 __init__.pyo -rw-r--r-- 1 jday csi 427 Dec 8 19:11 _checkversion.py -rw-r--r-- 1 jday csi 654 Dec 8 19:11 _checkversion.pyc -rw-r--r-- 1 jday csi 621 Dec 8 19:11 _checkversion.pyo drwxrwxr-x 2 jday csi 1024 Dec 8 19:11 arch/ drwxrwxr-x 2 jday csi 1024 Dec 8 19:11 dom/ drwxrwxr-x 3 jday csi 1024 Dec 8 19:11 parsers/ drwxrwxr-x 3 jday csi 1024 Dec 8 19:11 sax/ drwxrwxr-x 2 jday csi 1024 Dec 8 19:11 unicode/ drwxrwxr-x 2 jday csi 1024 Dec 8 19:11 utils/ But when I run python I can't import any of them: jday@medusa:/home/jday/lib/python1.5/site-packages/xml> python Python 1.5.1 (#1, Dec 8 1998, 18:51:08) [GCC 2.8.1] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import xml.sax Traceback (innermost last): File "", line 1, in ? ImportError: No module named xml.sax Python obviously can't see the xml package. I've got python in my home bin and I've set PYTHONPATH to /home/jday/lib/python1.5/ What else is there to do? I'm new to Python so I don't really know the basic mechanism for installing these packages? I can't find it in any of the docs. (I'm guessing this is a simple problem to fix :-) Thanks, John Day From fredrik@pythonware.com Wed Dec 9 11:25:34 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 9 Dec 1998 12:25:34 +0100 Subject: [XML-SIG] xml install problems Message-ID: <001301be2366$a42b4c90$f29b12c2@pythonware.com> >Python obviously can't see the xml package. I've got python in my home >bin and I've set PYTHONPATH to /home/jday/lib/python1.5/ What else is >there to do? try this: $ python >>> import sys >>> sys.path this prints a list of all entries added to the python path. 
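
[Editorial aside on the sys.path check suggested above. What follows is a minimal sketch, assuming a present-day Python interpreter rather than the 1.5 series discussed in this thread. It shows one way to test whether a package directory is reachable from the search path and whether any site-packages directory is on it. The helper names find_package_dirs and has_site_packages are hypothetical; they are not part of Python or of the XML package.]

```python
import os
import sys


def find_package_dirs(package, search_path=None):
    """Return the search-path entries that contain `package` as a
    directory holding an __init__.py file."""
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(entry or os.curdir, package)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            hits.append(entry)
    return hits


def has_site_packages(search_path=None):
    """Return True if some search-path entry is a site-packages directory."""
    if search_path is None:
        search_path = sys.path
    for entry in search_path:
        if os.path.basename(os.path.normpath(entry)) == "site-packages":
            return True
    return False


if __name__ == "__main__":
    print("site-packages on sys.path:", has_site_packages())
    print("entries providing 'xml':", find_package_dirs("xml"))
```

[If has_site_packages() reports false, the interpreter is not looking in the directory where add-on packages were installed, which is the symptom described in this thread.]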
by the way, does "import sax" work ? Cheers /F fredrik@pythonware.com http://www.pythonware.com From jday@csihq.com Wed Dec 9 12:16:52 1998 From: jday@csihq.com (John Day) Date: Wed, 09 Dec 1998 07:16:52 -0500 Subject: [XML-SIG] xml install problems In-Reply-To: <001301be2366$a42b4c90$f29b12c2@pythonware.com> Message-ID: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com> You wrote: ..... >try this: > >$ python >>>> import sys >>>> sys.path > >this prints a list of all entries added to the python path. > >by the way, does "import sax" work ? > ..................................................... Here's my sys.path (thanks, I didn't know about this): >>> import sys >>> for i in sys.path: print i ... /home/jday/bin/lib/python1.5/ /home/jday/bin/lib/python1.5/test /home/jday/bin/lib/python1.5/plat-linux2 /home/jday/bin/lib/python1.5/lib-tk /home/jday/bin/lib/python1.5/lib-dynload >>> import sax Traceback (innermost last): File "", line 1, in ? ImportError: No module named sax ....................................................... Now I'm confused. None of the above packages are members of the "site-packages" directory. (xml is the only entry) I assumed "site-packages" would list _all_ installed pythonic packages. -jday From kajiyama@etl.go.jp Wed Dec 9 23:49:21 1998 From: kajiyama@etl.go.jp (Tamito Kajiyama) Date: Wed, 9 Dec 98 23:49:21 JST Subject: [XML-SIG] xml install problems In-Reply-To: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com> (message from John Day on Wed, 09 Dec 1998 07:16:52 -0500) Message-ID: <9812091449.AA11121@etlibs2.etl.go.jp> John Day writes: | | >>> import sys | >>> for i in sys.path: print i | .. | /home/jday/bin/lib/python1.5/ | /home/jday/bin/lib/python1.5/test | /home/jday/bin/lib/python1.5/plat-linux2 | /home/jday/bin/lib/python1.5/lib-tk | /home/jday/bin/lib/python1.5/lib-dynload | | Now I'm confused. None of the above packages are members | of the "site-packages" directory. 
(xml is the only entry) | I assumed "site-packages" would list _all_ installed pythonic | packages. Each element of sys.path is a directory Python searches modules. See Section 3.1 of the Python Library Manual for more information about `sys.path' (http://www.python.org/doc/lib/module-sys.html). BTW, in your message <3.0.1.32.19981208191958.00692914@mail.csihq.com>, you said you installed Python 1.5.1 with prefix=/home/jday. So, the directories listed in sys.path should be /home/jday/lib/python1.5/ /home/jday/lib/python1.5/test /home/jday/lib/python1.5/plat-linux2 and so on. Also, the directory /home/jday/lib/python1.5/site-packages should be in sys.path if you don't run Python with the -S option. I believe it is an installation problem of Python, not the XML package. I can't understand the cause of your problem. How about posting a message to comp.lang.python? -- KAJIYAMA, Tamito From akuchlin@cnri.reston.va.us Wed Dec 9 15:05:18 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 9 Dec 1998 10:05:18 -0500 (EST) Subject: [XML-SIG] xml install problems In-Reply-To: References: <3.0.1.32.19981208191958.00692914@mail.csihq.com> Message-ID: <13934.36700.619892.696779@amarok.cnri.reston.va.us> Betty Harvey writes: >I had similar problems but much earlier. I tried installing >Python on Linux 5.0. The Makefile.pre.in worked just fine, >however, when I tried the 'make' I got the following >error: >gcc -fPIC -O -I/usr/include/python1.4 -I/usr/include/python1.4 >-DHAVE_CONFIG_H >-Iexpat/xmlparse -c ./pyexpat.c >./pyexpat.c: In function `mywrite': >./pyexpat.c:64: void value not ignored as it ought to be >make: *** [pyexpat.o] Error 1 I'd recommend using Python 1.5, because 1.5 added several new features that are used in the XML code, most notably packages and the class-based exceptions. The compile error you report also stems from a difference between Python 1.4 and 1.5. 
While I can easily produce a patch that fixes pyexpat.c, you'd then run into more difficulties: the missing package support, exceptions, no re module, etc. (Or do XML-SIG people think that compatibility with 1.4 is important? If so, we can work on making sure it works with the older version.) In the meantime, I'll document the dependence on Python 1.5 more explicitly in the README; thanks for your bug report! -- A.M. Kuchling http://starship.skyport.net/crew/amk/ "What are we going to do now?" "Keep it confused, feed it with useless information. I wonder if I have a television set handy?" -- Sgt. Benton and the second Doctor, in "The Three Doctors" From Fred L. Drake, Jr." References: <001301be2366$a42b4c90$f29b12c2@pythonware.com> <3.0.1.32.19981209071652.006fadc8@mail.csihq.com> Message-ID: <13934.39325.895915.517536@weyr.cnri.reston.va.us> John Day writes: > Here's my sys.path (thanks, I didn't know about this): > > >>> import sys > >>> for i in sys.path: print i > .. > /home/jday/bin/lib/python1.5/ > /home/jday/bin/lib/python1.5/test > /home/jday/bin/lib/python1.5/plat-linux2 > /home/jday/bin/lib/python1.5/lib-tk > /home/jday/bin/lib/python1.5/lib-dynload > >>> import sax > Traceback (innermost last): > File "", line 1, in ? > ImportError: No module named sax > ...................................................... > > Now I'm confused. None of the above packages are members > of the "site-packages" directory. (xml is the only entry) > I assumed "site-packages" would list _all_ installed pythonic > packages. I'm going to be bold and guess that you're using Python 1.5, not 1.5.1. Python 1.5 did not automatically import the "site" module, but 1.5.1 does (if I recall correctly ;). Try doing "import site, xml". If that works, then you can do either of two things: - Add "import site" before your "import xml" in you application code, or - Upgrade to Python 1.5.1. Of course, now that I've written all this, you probably already have 1.5.1 and I'm confused. 
;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From larsga@ifi.uio.no Wed Dec 9 16:14:59 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 09 Dec 1998 17:14:59 +0100 Subject: [XML-SIG] xml install problems In-Reply-To: <13934.36700.619892.696779@amarok.cnri.reston.va.us> References: <3.0.1.32.19981208191958.00692914@mail.csihq.com> <13934.36700.619892.696779@amarok.cnri.reston.va.us> Message-ID: * Andrew M. Kuchling | | (Or do XML-SIG people think that compatibility with 1.4 is | important? If so, we can work on making sure it works with the | older version.) Sounds pretty hopeless to me, I'm afraid. Both xmllib and xmlproc use the re module, saxlib uses class exceptions (and has to) and I guess the DOM does too. That leaves the C-based parsers and Dan Connolly's more or less useless parser. Personally, I wouldn't give it priority. --Lars M. From jday@csihq.com Wed Dec 9 17:12:46 1998 From: jday@csihq.com (John Day) Date: Wed, 09 Dec 1998 12:12:46 -0500 Subject: [XML-SIG] xml install problems In-Reply-To: <9812091449.AA11121@etlibs2.etl.go.jp> References: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com> Message-ID: <3.0.1.32.19981209121246.007682ec@mail.csihq.com> At 11:49 PM 12/9/98 JST, Tamito Kajiyama wrote: > >BTW, in your message <3.0.1.32.19981208191958.00692914@mail.csihq.com>, >you said you installed Python 1.5.1 with prefix=/home/jday. So, the >directories listed in sys.path should be > > /home/jday/lib/python1.5/ > /home/jday/lib/python1.5/test > /home/jday/lib/python1.5/plat-linux2 > >and so on. Also, the directory /home/jday/lib/python1.5/site-packages >should be in sys.path if you don't run Python with the -S option. > Tamito, You were right, problem was the Python installation not xml-0.5. I've got everything working more or less OK now. Here's what I did (for the benefit of any silent minority having similar problems): 1. 
[From Python make directory]: Did 'make distclean' to clear out the original Python installation. Rebuilt Python1.5.1 from './configure --prefix=/home/jday' [I must have used the wrong prefix before] 2. After 'make' and 'make install' I ended up with _two_ executables: one in [prefix]/bin and the other in the make directory. The one in the make directory immediately allowed me to 'import xml.sax' etc The one in [prefix]/bin still couldn't see the site-packages until I defined setenv PYTHONPATH /home/jday/lib/python1.5/site-packages Then it allowed 'import xml.sax' also. [I don't understand why the executable in the make directory creates a different sys.path than the one in the [prefix]/bin directory] 3. In the xml-0.5 directory I rebuilt everything. The 'make test' still doesn't work [because it temporarily trashes PYTHONPATH] but I was able to run each test separately 'python test/test_arch.py' etc. So I think I am now XML-enabled. Thanks to everybody for helping me out. Now to figure out how the XML parser and other stuff works ;-) -jday From jday@csihq.com Wed Dec 9 19:30:24 1998 From: jday@csihq.com (John Day) Date: Wed, 09 Dec 1998 14:30:24 -0500 Subject: [XML-SIG] sax demo Message-ID: <3.0.1.32.19981209143024.0076a5b8@mail.csihq.com> FYI, in demo/sax/saxhack.py: line 82 class slowParser(xmllib.SlowXMLParser): causes error: only parser in xmllib appears to be class slowParser(xmllib.TestXMLParser): This works OK. -jday From akuchlin@cnri.reston.va.us Wed Dec 9 20:36:19 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 9 Dec 1998 15:36:19 -0500 (EST) Subject: [XML-SIG] xml install problems In-Reply-To: <3.0.1.32.19981209121246.007682ec@mail.csihq.com> References: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com> <9812091449.AA11121@etlibs2.etl.go.jp> <3.0.1.32.19981209121246.007682ec@mail.csihq.com> Message-ID: <13934.50329.296793.922448@amarok.cnri.reston.va.us> John Day writes: >3. In the xml-0.5 directory I rebuilt everything. 
The 'make test' still > doesn't work [because it temporarily trashes PYTHONPATH] but I was > able to run each test separately 'python test/test_arch.py' etc. I should fix that; the intention is that you can run the test suite without having to actually install the package, but that relies on having a symlink from xml to '.' in the main directory. Perhaps it should behave in a way being suggested in the Distutils-sig, and construct a fake installation tree inside the package; actual installation would then be a matter of just copying the tree. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ It's the same old story... Whatever it turns into on the way, whatever it is you originally undertake to spin or knit or weave, keep it going long enough and, in the end, my lilies, it's always a winding sheet. -- One of the three Fates, in SANDMAN #69: "The Kindly Ones:13" From betty@eccnet.eccnet.com Thu Dec 10 17:41:06 1998 From: betty@eccnet.eccnet.com (Betty Harvey) Date: Thu, 10 Dec 1998 12:41:06 -0500 (EST) Subject: [XML-SIG] xml install problems In-Reply-To: <13934.36700.619892.696779@amarok.cnri.reston.va.us> Message-ID: On Wed, 9 Dec 1998, Andrew M. Kuchling wrote: > I'd recommend using Python 1.5, because 1.5 added several new > features that are used in the XML code, most notably packages and the > class-based exceptions. The compile error you report also stems from > a difference between Python 1.4 and 1.5. While I can easily produce a > patch that fixes pyexpat.c, you'd then run into more difficulties: the > missing package support, exceptions, no re module, etc. Question about installing 1.5 on LINUX 5.0. I am unable to install 1.5 because LINUX is using Python 1.4 for some system support, including RPM. Is there a safe method for installing 1.5? Has anyone installed Linux 5.2? Is Python 1.5 available on 5.2. I have the CD for 5.2 but haven't upgraded yet. 
Betty /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/ Betty Harvey | Phone: 301-540-8251 FAX: 4268 Electronic Commerce Connection, Inc. | 13017 Wisteria Drive, P.O. Box 333 | Germantown, Md. 20874 | harvey@eccnet.eccnet.com | Washington,DC SGML Users Grp URL: http://www.eccnet.com | http://www.eccnet.com/sgmlug/ /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\\/\/ From fredrik@pythonware.com Thu Dec 10 18:01:29 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 10 Dec 1998 19:01:29 +0100 Subject: [XML-SIG] xml install problems Message-ID: <00d301be2467$1e7ef220$f29b12c2@pythonware.com> >Question about installing 1.5 on LINUX 5.0. I am unable to install 1.5 >because LINUX is using Python 1.4 for some system support, including >RPM. Is there a safe method for installing 1.5? Sure. Quoting from the README file: All subdirectories created will have Python's version number in their name, e.g. the library modules are installed in "/usr/local/lib/python1.5/" by default. The Python binary is installed as "python1.5" and a hard link named "python" is created. The only file not installed with a version number in its name is the manual page, installed as "/usr/local/man/man1/python.1" by default. If you have a previous installation of a pre-1.5 Python that you don't want to replace yet, use make altinstall This installs the same set of files as "make install" except it doesn't create the hard link to "python1.5" named "python" and it doesn't install the manual page at all. Dunno about RedHat 5.2; we're still on 4.2 over here. But http://www.redhat.com/product.phtml/RH5020 says it's using Python 1.5.1. Cheers /F fredrik@pythonware.com http://www.pythonware.com From Fred L. Drake, Jr." I've attached a patch to add a get() method to NamedNodeMap, to make it a little more dictionary like. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191 Index: core.py =================================================================== RCS file: /projects/cvsroot/xml/dom/core.py,v retrieving revision 1.36 diff -c -c -r1.36 core.py *** core.py 1998/12/09 03:18:58 1.36 --- core.py 1998/12/10 18:30:30 *************** *** 191,196 **** --- 191,201 ---- key = arg.nodeName self[key] = arg + def get(self, key, default=None): + if self.data.has_key(key): + return self[key] + return default + def item(self, index): return self.data.values[ index ] From gstein@lyra.org Thu Dec 10 18:52:29 1998 From: gstein@lyra.org (Greg Stein) Date: Thu, 10 Dec 1998 10:52:29 -0800 Subject: [XML-SIG] xml install problems References: <00d301be2467$1e7ef220$f29b12c2@pythonware.com> Message-ID: <3670186D.50B8090A@lyra.org> Fredrik Lundh wrote: > > >Question about installing 1.5 on LINUX 5.0. I am unable to install 1.5 > >because LINUX is using Python 1.4 for some system support, including > >RPM. Is there a safe method for installing 1.5? > ... > Dunno about RedHat 5.2; we're still on 4.2 over here. But > http://www.redhat.com/product.phtml/RH5020 says it's > using Python 1.5.1. RedHat started installing 1.5.1 as part of RedHat 5.1. In other words, RedHat 5.1 and 5.2 have the most recent (public) version of Python. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Jean-Michel.Bruel@univ-pau.fr Fri Dec 11 11:02:59 1998 From: Jean-Michel.Bruel@univ-pau.fr (Jean-Michel BRUEL) Date: Fri, 11 Dec 1998 12:02:59 +0100 (MET) Subject: [XML-SIG] [CFP] UML'99 Message-ID: <199812111102.MAA02794@crisv4.univ-pau.fr> [apologies if you receive multiple copies of this announcement] ================================================================= Call for Papers UML'99 ================================================================= Second International Conference on the Unified Modeling Language October 28-30, 1999, Fort Collins, Colorado, USA (just before OOPSLA) ================================================================= http://www.cs.colostate.edu/UML99 ================================================================= Invited Speaker: Grady Booch Scope: UML'99 will bring together researchers in academia and industry who are developing processes, methods, techniques, and semantic foundations for the UML. The conference will provide a forum for discussing and evaluating promising approaches that will enhance the application of UML. The UML'99 organizing committee invites authors to submit papers presenting original and unpublished research and experience reports on UML or related topics. Typical areas include (but are not limited to): - Integration of software development techniques - Significant or useful extensions - Metamodels and model interchange - Formal semantics - Business processes and modeling - Experience reports that contribute significant research ideas or make unconventional use of UML - OCL and other constraint notations - Reuse at the modeling level - Patterns, pattern mining - Extensions and restrictions of UML - UML compared to other notations - Mapping of UML to programming languages, frameworks, databases, and architectures - Modeling software architectures with UML - Verification with UML models - Transformation of UML models (incl.
code generation) - Refinement and composition of UML models - Method engineering in the large - Management of UML projects - Modeling of distributed systems - UML and real-time - Metrics and measures based on UML Important dates (deadlines are hard!): Deadline for abstract 05 May 1999 Deadline for submission 15 May 1999 Notification to authors 15 July 1999 Final version of accepted papers 25 August 1999 Conference web page: http://www.cs.colostate.edu/UML99 Submissions: Submit your 10-15 page manuscript electronically in Postscript or pdf using the Springer LNCS style. Details are available at the conference web page. The UML'99 proceedings will be published by Springer-Verlag in the LNCS series. Program Committee: C. Atkinson, Germany J. Bezivin, France J. Bieman, USA G. v. Bochmann, Canada R. Breu, Germany J.-M. Bruel, France F. Buschmann, Germany B. Cheng, USA D. Coleman, USA S. Cook, UK D. D'Souza, USA J. Daniels, UK G. Engels, Germany A. S. Evans, UK E. Fernandez, USA M. Fowler, USA E. Gery, Israel M. Gogolla, Germany M. Griss, USA R. Grosu, USA D. Harel, Israel B. Henderson-Sellers, Australia P. Hruby, Denmark H. Hussmann, Germany I. Jacobson, USA G. Kappel, Austria S. Kent, UK H. Kilov, USA C. Kobryn, USA P. Kruchten, USA K. Lano, UK G. Leavens, USA M. Loomis, USA S. Mellor, USA R. Mitchell, UK A. Moreira, Portugal P.-A. Muller, France L. Northrop, USA G. Overgaard, Sweden B. Paech, Germany J. Rumbaugh, USA A. Schurr, Germany E. Seidewitz, USA B. Selic, Canada R. Soley, USA J. Warmer, Netherlands T. Wasserman, USA A. Wills, UK R. Wirfs-Brock, USA Organizing Committee: Conference Chair: Robert B. France, USA Program Chair: Bernhard Rumpe, Germany Publicity Chairs: J.-M. Bruel, France J. Bieman, USA J. Suzuki, Japan Steering Committee: J. Bezivin, France R. B. France, USA P.-A. Muller, France B. Rumpe, Germany Further Information: Robert B.
France E-mail: france@cs.colostate.edu Computer Science Department Tel: 970-491-6356 Colorado State University Fax: 970-491-2466 Fort Collins, CO 80523, USA Bernhard Rumpe E-mail: rumpe@in.tum.de Institut fuer Informatik Tel: 0049-89-289-28129 T. Universitaet Muenchen Fax: 0049-89-289-28183 80290 Muenchen, Germany From George McNinch Fri Dec 11 14:47:42 1998 From: George McNinch (George J McNinch) Date: 11 Dec 1998 09:47:42 -0500 Subject: [XML-SIG] build problems: xml-0.5 Message-ID: This is a MIME multipart message. If you are reading this, you shouldn't. --=-=-= Hi-- I have not been able to build xml-0.4 or xml-0.5 gmcninch@galois 7% uname -a IRIX galois 6.2 03131015 IP22 I'm _not_ using gcc, but IRIX cc. Find attached the compile log. Best, George McNinch --=-=-= Content-Disposition: inline; filename="~/lib/python/xml-0.5/compile_outcome.txt" cd /usr/people/gmcninch/lib/python/xml-0.5/ make -k cd expat ; make libexpat.a CC="cc -n32" CFLAGS=" -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse" cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -o gennmtab/gennmtab gennmtab/gennmtab.c rm -f xmltok/nametab.h gennmtab/gennmtab >xmltok/nametab.h cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmltok/xmltok.o xmltok/xmltok.c cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmltok/xmlrole.o xmltok/xmlrole.c cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlwf/xmlwf.o xmlwf/xmlwf.c cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlwf/codepage.o xmlwf/codepage.c cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok 
-Ixmlparse -c -o xmlparse/xmlparse.o xmlparse/xmlparse.c "xmlparse/xmlparse.c", line 723: error(1131): expected a field name int tok = XmlContentTok(encoding, start, end, &next); ^ "xmlparse/xmlparse.c", line 723: error(1131): expected a field name int tok = XmlContentTok(encoding, start, end, &next); ^ "xmlparse/xmlparse.c", line 754: error(1131): expected a field name int tok = XmlContentTok(encoding, start, end, &next); ^ "xmlparse/xmlparse.c", line 754: error(1131): expected a field name int tok = XmlContentTok(encoding, start, end, &next); ^ "xmlparse/xmlparse.c", line 1510: error(1131): expected a field name int tok = XmlPrologTok(encoding, s, end, &next); ^ "xmlparse/xmlparse.c", line 1510: error(1131): expected a field name int tok = XmlPrologTok(encoding, s, end, &next); ^ "xmlparse/xmlparse.c", line 1807: error(1131): expected a field name int tok = XmlPrologTok(encoding, s, end, &next); ^ "xmlparse/xmlparse.c", line 1807: error(1131): expected a field name int tok = XmlPrologTok(encoding, s, end, &next); ^ "xmlparse/xmlparse.c", line 1925: warning(1110): statement is unreachable break; ^ "xmlparse/xmlparse.c", line 2007: error(1131): expected a field name int tok = XmlEntityValueTok(encoding, entityTextPtr, entityTextEnd, &next); ^ "xmlparse/xmlparse.c", line 2007: error(1131): expected a field name int tok = XmlEntityValueTok(encoding, entityTextPtr, entityTextEnd, &next); ^ 10 errors detected in the compilation of "xmlparse/xmlparse.c". 
*** Error code 2 (bu21) cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlparse/hashtable.o xmlparse/hashtable.c cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlwf/unixfilemap.o xmlwf/unixfilemap.c `libexpat.a' not remade because of errors (bu14) cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Iexpat/xmlparse -c ./pyexpat.c "./pyexpat.c", line 297: warning(1164): argument of type "void (*)()" is incompatible with parameter of type "XML_StartElementHandler" XML_SetElementHandler(self->itself, my_StartElementHandler, ^ "./pyexpat.c", line 298: warning(1164): argument of type "void (*)()" is incompatible with parameter of type "XML_EndElementHandler" my_EndElementHandler); ^ "./pyexpat.c", line 299: warning(1164): argument of type "void (*)()" is incompatible with parameter of type "XML_CharacterDataHandler" XML_SetCharacterDataHandler(self->itself, my_CharacterDataHandler); ^ "./pyexpat.c", line 301: warning(1164): argument of type "void (*)()" is incompatible with parameter of type "XML_ProcessingInstructionHandler" my_ProcessingInstructionHandler); ^ ld -n32 -shared -all pyexpat.o expat/libexpat.a -o pyexpat.so ld32: FATAL 9: I/O error (expat/libexpat.a): No such file or directory *** Error code 32 (bu21) cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -c ./sgmlop.c ld -n32 -shared -all sgmlop.o -o sgmlop.so cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -c ./wstrop.c "./wstrop.c", line 417: warning(1164): argument of type "char *" is incompatible with parameter of type "const unsigned char *" l1=from_utf8(string+i,&wtmp); ^ "./wstrop.c", line 426: warning(1164): argument of type "char *" is incompatible with parameter of type "const unsigned char *" 
tmp+=from_utf8(tmp,wstr->string+i); ^ "./wstrop.c", line 627: warning(1164): argument of type "char *" is incompatible with parameter of type "unsigned char *" str+=to_utf8(self->string[i],str); ^ "./wstrop.c", line 807: warning(1164): argument of type "char *" is incompatible with parameter of type "unsigned char *" utf7_to_ucs2(PyString_AsString(ucs2),string,len,flags); ^ "./wstrop.c", line 829: warning(1164): argument of type "char *" is incompatible with parameter of type "unsigned char *" len=ucs2_to_utf7(0,PyString_AsString(ucs2),PyObject_Length(ucs2), ^ "./wstrop.c", line 838: warning(1164): argument of type "char *" is incompatible with parameter of type "unsigned char *" ucs2_to_utf7(PyString_AsString(utf7),PyString_AsString(ucs2), ^ "./wstrop.c", line 892: warning(1515): a value of type "char *" cannot be assigned to an entity of type "unsigned char *" s=PyString_AsString(result); ^ ld -n32 -shared -all wstrop.o -o wstrop.so `default' not remade because of errors (bu14) Compilation finished at Fri Dec 11 09:29:57 --=-=-=-- From akuchlin@cnri.reston.va.us Fri Dec 11 16:53:02 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 11 Dec 1998 11:53:02 -0500 (EST) Subject: [XML-SIG] Equality tests on DOM nodes Message-ID: <13937.18858.948855.840376@amarok.cnri.reston.va.us> [CC'ed to xml-sig@python.org and www-dom@w3.org; followups to www-dom@w3.org] With reference to the Python DOM implementation, someone has raised the question of testing the equality of nodes. I don't think there's anything in the DOM Recommendation that discusses this question, possibly because the issue doesn't raise its head in Java. Briefly, what should 'node1 == node2' do? In Python, object identity is tested using the 'is' operator, so 'node1 is node2' returns true iff node1 and node2 are actually the same object. 'node1 == node2' should therefore test for equal values of the node. 
This differs from Java, where n1==n2 tests object identity, and a further comparison would have to be implemented as a method. It seems fairly obvious that node1==node2 should check whether the node type and value are identical, and return false if they're not. But there are some trickier questions: * Should Element instances also compare their attributes? I would say 'yes', since the attributes are really associated with the Element node. * If the two nodes have identical type and value, should the comparison be recursive, comparing the children of the nodes. The == operator would then be comparing entire subtrees rooted at node1 and node2. I'm not certain if this is the best choice for the meaning of ==, but see no clear reason to choose recursive vs. non-recursive ==. Any suggestions? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Q. Does Kibo believe in furniture? A. No. Go away, furniture! -- The alt.religion.kibology FAQ From Fred L. Drake, Jr." References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> Message-ID: <13937.21569.329411.356332@weyr.cnri.reston.va.us> Andrew M. Kuchling writes: > * If the two nodes have identical type and value, should the > comparison be recursive, comparing the children of the nodes. The == > operator would then be comparing entire subtrees rooted at node1 and > node2. I'm not certain if this is the best choice for the meaning of > ==, but see no clear reason to choose recursive vs. non-recursive ==. > Any suggestions? Since I'm the one who raised this with Andrew, I'll mention that my first reaction was that it would be recursive. I don't see any clear indication that "shallow" equality has any real meaning. This corresponds to the basic notion of equality testing in Python. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191 From arabbit@earthlink.net Fri Dec 11 17:59:20 1998 From: arabbit@earthlink.net (Paul Butkiewicz) Date: Fri, 11 Dec 1998 12:59:20 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <13937.18858.948855.840376@amarok.cnri.reston.va.us> Message-ID: <000101be252f$fa764c60$da39bfa8@arabbit> Not to sound facetious, but to put this question in context, I might well ask how we implement < and > for nodes? We generally don't use those particular operators on something real. I would never say rock a > rock b, but I might say rock a weighs more than rock b. With respect to the equality and equivalence, I am very safe saying book a has the same author as book b, because she's really the same person. If I'm talking about book a and someone else is talking about book b, I might point out that they are talking about the same book. But if I say book a is the same as book b for two different books, while this is a commonly used construct, it invites argument --- "No, this book is dog-eared and has coffee stains on it. I want *my* book back!" "Honey, these two coffee tables are identical. Let's get the cheaper one." "No. This one is particle board and veneer, while this one is mission oak! How can you think they're the same?" What was my point? I think it was to say that it invites folly, especially when you're talking about an international, world-wide, universal standard, to specify that two things are equal when they do not refer to the same thing and/or measurable differences exist between them. It seems obvious ( perhaps only to me ) that attributes must be equal and the equality must be true recursively, if you dare to define equality for nodes. I think the next question might be, Does context make a difference in equality or equivalence?
I could easily say that this paragraph is identical to that paragraph when we're talking about a printed page, but XML, in its most commonly discussed usage, is about document metadata, and context is a part of that metadata. A node is, after all, part of a larger document. Paul -----Original Message----- From: www-dom-request@w3.org [mailto:www-dom-request@w3.org]On Behalf Of Andrew M. Kuchling Sent: Friday, December 11, 1998 11:53 AM To: www-dom@w3.org Cc: xml-sig@python.org Subject: Equality tests on DOM nodes [CC'ed to xml-sig@python.org and www-dom@w3.org; followups to www-dom@w3.org] With reference to the Python DOM implementation, someone has raised the question of testing the equality of nodes. I don't think there's anything in the DOM Recommendation that discusses this question, possibly because the issue doesn't raise its head in Java. Briefly, what should 'node1 == node2' do? In Python, object identity is tested using the 'is' operator, so 'node1 is node2' returns true iff node1 and node2 are actually the same object. 'node1 == node2' should therefore test for equal values of the node. This differs from Java, where n1==n2 tests object identity, and a further comparison would have to be implemented as a method. It seems fairly obvious that node1==node2 should check whether the node type and value are identical, and return false if they're not. But there are some trickier questions: * Should Element instances also compare their attributes? I would say 'yes', since the attributes are really associated with the Element node. * If the two nodes have identical type and value, should the comparison be recursive, comparing the children of the nodes. The == operator would then be comparing entire subtrees rooted at node1 and node2. I'm not certain if this is the best choice for the meaning of ==, but see no clear reason to choose recursive vs. non-recursive ==. Any suggestions? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Q.
Does Kibo believe in furniture? A. No. Go away, furniture! -- The alt.religion.kibology FAQ From ray@imall.com Fri Dec 11 17:58:35 1998 From: ray@imall.com (Ray Whitmer) Date: Fri, 11 Dec 1998 10:58:35 -0700 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> Message-ID: <36715D4A.9660A0D0@imall.com> Andrew M. Kuchling wrote: > [CC'ed to xml-sig@python.org and www-dom@w3.org; followups to > www-dom@w3.org] > > With reference to the Python DOM implementation, someone has raised > the question of testing the equality of nodes. I don't think there's > anything in the DOM Recommendation that discusses this question, > possibly because the issue doesn't raise its head in Java. I don't know Python, but every object in Java has an equals method to signify deeper comparison than "==", for example, String.equals tells whether the contents of two strings are identical. > * Should Element instances also compare their attributes? > I would say 'yes', since the attributes are really associated with the > Element node. > > * If the two nodes have identical type and value, should the > comparison be recursive, comparing the children of the nodes. The == > operator would then be comparing entire subtrees rooted at node1 and > node2. I'm not certain if this is the best choice for the meaning of > ==, but see no clear reason to choose recursive vs. non-recursive ==. > Any suggestions? For my own uses on both the client and server (in Java, not Python), the full/deep comparison is the most useful and as such I implemented it in a private API extension extremely efficiently. A full/deep comparison is very useful in many situations, and can be implemented much more efficiently than forcing the user to check equality one attribute or recursive child at a time (with acceptable tradeoffs in other parts of the implementation).
But I would recommend NOT using the built-in Python operator, just as I am not using the built-in equals method in Java, until it has been defined in the standard how this should be implemented. Otherwise, users of your implementation will not be interoperable with users of other implementations, and also possibly not interoperable with the standard definition if one is ever officially formulated. Instead, define the operator to raise an exception, if Python has one, and if you need an equality check, write one in a private API with your own name on it so it will be clear to users that by using your method, they will be sacrificing portability, in exchange for a concise, permanent definition of its behavior. The problem in Python is much bigger -- possibly rendering my advice irrelevant -- since no official DOM API binding has been released for that language in the first place. I am just following how I would tell someone to deal with the equals function in Java where users will expect portability between implementations. I don't know Python, so it is also possible that Python may impose more rigidity on the requirements of == (than Java does on equals), making it possible to know what the standard implementation should be, but your raising the question would seem to indicate that it does not. Ray Whitmer From Fred L. Drake, Jr." References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <000101be252f$fa764c60$da39bfa8@arabbit> Message-ID: <13937.24366.729293.26105@weyr.cnri.reston.va.us> Paul Butkiewicz writes: > Not to sound facetious, but to put this question in context, I might well > ask how we implement < and > for nodes? We generally don't use those This is a very real concern. I think comparison of nodes is only interesting for equality. When Python finally implements the "rich comparison" semantics that have been proposed, equality will be testable independently of ordering.
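A minimal sketch of the "equality without ordering" idea discussed above, assuming a Python that has rich comparisons (Python 3 here, not the 1.5 series this thread is about). The Node class and its field names are hypothetical stand-ins for illustration; this is not the xml.dom.core implementation, and it deliberately ignores the parent, so context plays no part in the comparison.

```python
class Node:
    """Hypothetical DOM-like node, used only to illustrate the discussion."""

    def __init__(self, node_type, value=None, attributes=None, children=None):
        self.node_type = node_type
        self.value = value
        self.attributes = dict(attributes or {})
        self.children = list(children or [])

    def __eq__(self, other):
        # Value equality: type, value, attributes, then children.
        # Comparing the child lists calls __eq__ on each pair of children,
        # so the comparison is recursive over the whole subtree.
        if not isinstance(other, Node):
            return NotImplemented
        return (self.node_type == other.node_type
                and self.value == other.value
                and self.attributes == other.attributes
                and self.children == other.children)

    # No __lt__/__gt__ are defined, so ordering nodes raises TypeError.


a = Node("element", "p", {"class": "note"}, [Node("text", "hello")])
b = Node("element", "p", {"class": "note"}, [Node("text", "hello")])
c = Node("element", "p", {"class": "note"}, [Node("text", "goodbye")])

print(a == b)   # equal subtrees
print(a is b)   # but distinct objects
print(a == c)   # differs in a child, so the recursive test fails
```

With this shape, 'is' stays the identity test and '==' is a deep value test; an order-insensitive or context-aware comparison would need a separate, explicitly named routine.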
> to specify that two things are equal when they do not refer to the same > thing and/or measurable differences exist between them. It seems obvious > perhaps only to me ) that attributes must be equal and the equality must be > true recursively, if you dare to define equality for nodes. I think the Well said. > Does context make a difference in equality or equivalence? > > I could easily say that this paragraph is identical to that paragraph when > we're talking about a printed page, but XML, in it's most commonly discussed > usage, is about document metadata, and context is a part of that metadata. > A node is, after all, part of a larger document. A good point. I was particularly interested in equality *without consideration for parent*. So, I was ignoring context. Perhaps there is no fully general equality that isn't identity? I think the Python implementation would still require implementation of a comparison method to achieve this since it uses proxy nodes, but that's really just an implementation detail. Python's native identity operator doesn't work in the presence of proxies that represent the same node. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From michael@graphion.com Fri Dec 11 18:11:58 1998 From: michael@graphion.com (Michael Sanborn) Date: Fri, 11 Dec 1998 10:11:58 -0800 Subject: [XML-SIG] New to Python OO Message-ID: <3671606D.6D731B98@graphion.com> Suppose I wanted to create a customized method to write out a DOM tree, say as plain text, like a totxt() paralleling toxml(). And say my program imports xml.dom.core and xml.dom.builder. I would have thought that the way to approach this would be to define a local Node class derived from core.py that added an empty totxt() method, and then to define local subclasses of Node (such as Text) with specific totxt() methods. My reasoning was that the Builder class would then build the tree with my enhanced Nodes. 
But that doesn't seem to be happening. Instead, Builder seems to be constructing the tree with regular core Nodes that don't recognize my totxt() method. Can anyone give me advice on how to achieve this? Thanks, Michael Sanborn Graphion Typesetting From Fred L. Drake, Jr." References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <36715D4A.9660A0D0@imall.com> Message-ID: <13937.25009.925375.550977@weyr.cnri.reston.va.us> Ray Whitmer writes: > The problem in Python is much bigger -- possibly rendering my advice > irrelevant -- since no official DOM API binding has been released for that The spec does include IDL, and a Python binding for IDL is being developed. (Now, I've not checked that the Python DOM uses the Python IDL binding. Andrew, perhaps you can address this in the Python XML-SIG?) > I don't know Python, so it is also possible that Python may impose more > rigidity on the requirements of == (than Java does on equals), making it > possible to know what the standard implementation should be, but your > raising the question would seem to indicate that it does not. A couple of issues seem apparent to me, but depth is not one of them. First, the current implementation of Python's comparison semantics requires complete ordering, which doesn't make sense in this case. That can be ignored for now if the documentation states that only equality/inequality is supported. Future versions of Python are expected to correct this problem. Second, the concerns Paul Butkiewicz raised about the relevance of context need to be addressed. Basic equality may have to be interpreted as node identity. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr.
Reston, VA 20191 From arabbit@earthlink.net Fri Dec 11 18:19:45 1998 From: arabbit@earthlink.net (Paul Butkiewicz) Date: Fri, 11 Dec 1998 13:19:45 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <36715D4A.9660A0D0@imall.com> Message-ID: <000401be2532$d4647b20$da39bfa8@arabbit> >I don't know Python, but [e]very object in Java has an equals method to >signify deeper comparison than "==", for example, String.equals tells >whether the contents of two strings are identical. I must be feeling contrary today, but I think what you're saying isn't true. String.equals( String ) does examine the contents of two different objects to determine that they are identical. But this is the case only because String explicitly overrides the equals( Object ) method in Object, which isn't true of many objects. The equals( Object ) method in Object only returns true if the objects are actually the same object, ie. ( *x )->equals( *y ) if and only if x == y. Paul From Fred L. Drake, Jr." References: <3671606D.6D731B98@graphion.com> Message-ID: <13937.25840.255411.454141@weyr.cnri.reston.va.us> Michael Sanborn writes: > Suppose I wanted to create a customized method to write out a DOM tree, > say as plain text, like a totxt() paralleling toxml(). And say my > program > imports xml.dom.core and xml.dom.builder. I would have thought that the > way to approach this would be to define a local Node class derived from > core.py that added an empty totxt() method, and then to define local > subclasses of Node (such as Text) with specific totxt() methods. My > reasoning was that the Builder class would then build the tree with my > enhanced Nodes. But that doesn't seem to be happening. Instead, Builder > seems to be constructing the tree with regular core Nodes that don't > recognize my totxt() method. Can anyone give me advice on how to achieve > this? There are two questions that need to be addressed here: 1) How should all this work, and 2) how to make it work now.
Let's start with the second question, since it's easier. This is an approach I've used to write out an ESIS stream, so I can claim it works. Write the transform you want as a function (or maybe an object, if that's more convenient for state management), and pass the document to it. It just needs to walk the tree and handle each node type appropriately. From your brief description, I'd say this wouldn't be too hard. (You may find the stuff in the formatter module from the standard library handy as well.) What *should* be done is different. ;-) First, the DOM should support the visitor pattern. Not difficult to implement, but it's not in the DOM spec (yet). This would allow transforms to be written more cleanly. The ability to subclass the node types and have the subclasses be used would be really nice. The builder (and anything else) should only use the methods on the document object to create new nodes. You should then be able to subclass the Document class to make the factory methods do the right thing. (Some details may need to change on the builder, but that's trivial.) The biggest issue with this is the performance hit. That may be more than is acceptable. Use of the visitor pattern would certainly be more useful and easier in most cases. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From arabbit@earthlink.net Fri Dec 11 18:35:23 1998 From: arabbit@earthlink.net (Paul Butkiewicz) Date: Fri, 11 Dec 1998 13:35:23 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <000401be2532$d4647b20$da39bfa8@arabbit> Message-ID: <000501be2535$03afcd60$da39bfa8@arabbit> Wow. I'm replying to myself. If I did that walking down the street, people would stare at me. A further implementation difficulty has occurred to me: There are likely many people out there who would like to or are using the DOM in conjunction with a database, making the node objects persistent.
These folks would probably prefer that equality indicate not just that two nodes are identical but that they represent the same record in the database. Paul -----Original Message----- From: www-dom-request@w3.org [mailto:www-dom-request@w3.org]On Behalf Of Paul Butkiewicz Sent: Friday, December 11, 1998 1:20 PM To: Ray Whitmer; Andrew M. Kuchling Cc: www-dom@w3.org; xml-sig@python.org Subject: RE: Equality tests on DOM nodes >I don't know Python, but [e]very object in Java has an equals method to >signify deeper comparison than "==", for example, String.equals tells >whether the contents of two strings are identical. I must be feeling contrary today, but I think you're saying isn't true. String.equals( String ) does examine the contents of two different objects to determine that they are identical. But this is the case only because String explicitly overrides the equals( Object ) method in Object, which isn't true of many objects. The equals( Object ) method in Object only returns true if the objects are actually the same object, ie. ( *x )->equals( *y ) if and only if x == y. Paul From gwachob@aimnet.com Fri Dec 11 18:48:06 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Fri, 11 Dec 1998 10:48:06 -0800 (PST) Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <13937.24366.729293.26105@weyr.cnri.reston.va.us> Message-ID: On Fri, 11 Dec 1998, Fred L. Drake wrote: > > Paul Butkiewicz writes: > > Not to sound facetious, but to put this question in context, I might well > > ask how we implement < and > for nodes? We generally don't use those > Perhaps there is no fully general equality that isn't identity? I > think the Python implementation would still require implementation of > a comparison method to achieve this since it uses proxy nodes, but > that's really just an implementation detail. Python's native identity > operator doesn't work in the presence of proxies that represent the > same node. 
Before you define equality generally for nodes, don't you have to define equality for each element and even each attribute? This may be a trivial task, but I suspect there are some issues (like if a Text node contains an entity reference, which, after being evaluated, results in a text string which is the same text string contained in another Text node without that entity reference) that are not specified. Another issue would be order of children. Without a DTD, how do you tell when order of child elements is significant? Perhaps this has to be a parameter to the deep comparison operator. If you made a decision on these issues and defined a comparison operator, I would say that it should be recursive because otherwise, the comparison operator isn't all that useful. Of course, given all the vagaries in the mapping of semantics of the word "equal" to the semantic meanings of various subtrees of two DOM trees, I wonder whether a single generalized equality operator will be useful to many people.. -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From Fred L. Drake, Jr." References: <13937.24366.729293.26105@weyr.cnri.reston.va.us> Message-ID: <13937.27240.670116.621025@weyr.cnri.reston.va.us> Gabe Wachob writes: > Before you define equality generally for nodes, don't you have to define > equality for each element and even each attribute? This may be a trivial ... > Another issue would be order of children. Without a DTD, how do you tell > when order of child elements is significant? Perhaps this has to be an Very good points. This makes it incredibly expensive to "do it right" with any level of abstraction.
I guess it's not that hard to just write a routine that "does the right thing" for exactly what is needed in each case. And it's looking increasingly appropriate. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From jday@csihq.com Fri Dec 11 19:12:05 1998 From: jday@csihq.com (John Day) Date: Fri, 11 Dec 1998 14:12:05 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <000101be252f$fa764c60$da39bfa8@arabbit> References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> Message-ID: <3.0.1.32.19981211141205.00767594@mail.csihq.com> At 12:59 PM 12/11/98 -0500, you wrote: >Not to sound facetious, but to put this question in context, I might well >ask how we implement < and > for nodes? We generally don't use those >particular operators on something real. I would never say rock a > rock b, >but I might say rock a weighs more than rock b. This is a valid question with a meaningful reply. Operators like '<' and '>' can be implemented by any relation which is transitive, reflexive, and anti-symmetric. Since reflexive implies A B=' and '<='. The relation doesn't have to mean 'greater' or 'less'. It can be _any_ relation which satisfies the partial order definition. A very useful one is "IS_A_SUBSET_OF". [It is understood that 'rock' itself is an "extensional" object, understood by some set of "intents" (attributes) such as 'heavy', 'gray', 'hard', 'big' etc. The relation can be written in extensional form but its meaning is usually applied to the intents. An extent like a rock cannot be perceived unless it has intents] Such relations define a "partial order" which has many uses in information retrieval, which XML certainly applies to. Let's say I'm searching for documents containing Concept X, where a concept is defined by the presence of a certain element node ("extent"), possibly qualified by attributes ("intents").
So 'equality' could be viewed as equivalence in the sense that two documents are equivalent if they contain the same concept(s). There may be other concepts in the documents that don't match, but this does not necessarily destroy the equivalence that we're searching for. Doesn't this imply that there is room for 'shallow' kinds of matching' to support this kind of reasoning? Of course, there is still a need for relations like "exactly identical", but subsethood is also a useful relation. -jday From arabbit@earthlink.net Fri Dec 11 19:29:54 1998 From: arabbit@earthlink.net (Paul Butkiewicz) Date: Fri, 11 Dec 1998 14:29:54 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <3.0.1.32.19981211141205.00767594@mail.csihq.com> Message-ID: <000701be253c$a1688680$da39bfa8@arabbit> OK, I hadn't really thought about that. But can you come up with a way of ordering nodes that deserves to be defined as part of a global and timeless standard instead of being merely implementation specific? Paul -----Original Message----- From: John Day [mailto:jday@csihq.com] Sent: Friday, December 11, 1998 2:12 PM To: Paul Butkiewicz; Andrew M. Kuchling; www-dom@w3.org Cc: xml-sig@python.org Subject: Re: [XML-SIG] RE: Equality tests on DOM nodes At 12:59 PM 12/11/98 -0500, you wrote: >Not to sound facetious, but to put this question in context, I might well >ask how we implement < and > for nodes? We generally don't use those >particular operators on something real. I would never say rock a > rock b, >but I might say rock a weighs more than rock b. This is a valid question with a meaningful reply. Operators like '<' and '>' can be implemented by any relation which is transitive, reflexive, and anti-symmetric. Since reflexive implies A B=' and '<='. The relation doesn't have to mean 'greater' or 'less'. It can be _any_ relation which satisfies the partial order defintion. A very useful one is "IS_A_SUBSET_OF". 
[It is understood that 'rock' itself is an "extential" object, understood by some set of "intents" (attributes) such as 'heavy', 'gray', 'hard', 'big' etc. The relation can be written in extential form but its meaning is usually applied to the intents. A extent like a rock cannot be perceived unless it has intents] Such relations define a "partial order" which have many uses in information retrieval, which XML certainly applies to. Let's say I'm searching for documents containing Concept X, where a concept if defined by the presence of a certain element node ("extent"), possibly qualified by attributes("intents". So 'equality' could be viewed as equivalence in the sense that two documents are equivalent if they contain the same concept(s). There may be other concepts in the documents that don't match, but this does not necessarily destroy the equivalence that we're searching for. Doesn't this imply that there is room for 'shallow' kinds of matching' to support this kind of reasoning? Of course, there is still a need for relations like "exactly identical", but subsethood is also a useful relation. -jday From gwachob@aimnet.com Fri Dec 11 19:45:01 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Fri, 11 Dec 1998 11:45:01 -0800 (PST) Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <13937.27240.670116.621025@weyr.cnri.reston.va.us> Message-ID: On Fri, 11 Dec 1998, Fred L. Drake wrote: > > Gabe Wachob writes: > > Before you define equality generally for nodes, don't you have to define > > equality for each element and even each attribute? This may be a trivial > .. > > Another issue would be order of children. Without a DTD, how do you tell > > when order of child elements is significant? Perhaps this has to be an > > Very good points. This makes it incredibly expensive to "do it > right" with any level of abstraction. > I guess it's not that hard to just write a routine that "does the > right thing" for exactly what is needed in each case. 
And it's > looking increasingly appropriate. Someone mentioned in this list or another that a set of objects corresponding to a Visitor pattern is something that should be added to DOM. There could be a default "equalityVisitor" that would have certain default equality rules built in (lets say, a separate equality test method for each DOM class). You could simply subclass the equalityVisitor to modify the semantics of the equality test for whatever particular elements you needed. Perhaps the equality visitor could simply have enough configurable parameters to make it do what you want without having to subclass. The original context of the Visitor suggestion was for rendering XML into HTML (I believe). For info on Visitor Pattern see the Gang of Four book "Design Patterns" http://iamwww.unibe.ch/CHOOSE/Articles/95-1/DP-book-review.html -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From akuchlin@cnri.reston.va.us Fri Dec 11 20:06:40 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 11 Dec 1998 15:06:40 -0500 (EST) Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <13937.27240.670116.621025@weyr.cnri.reston.va.us> References: <13937.24366.729293.26105@weyr.cnri.reston.va.us> <13937.27240.670116.621025@weyr.cnri.reston.va.us> Message-ID: <13937.31273.223202.338497@amarok.cnri.reston.va.us> Fred L. Drake writes: > >Gabe Wachob writes: > > Another issue would be order of children. Without a DTD, how do you tell > > when order of child elements is significant? Perhaps this has to be an > > I guess it's not that hard to just write a routine that "does the >right thing" for exactly what is needed in each case. And it's >looking increasingly appropriate. 
Indeed; it looks like there are several different variations on what equality would mean for a DOM node, and none seems obvious as the most intuitive meaning for ==. So the best course seems to be to define == applied between nodes to raise an exception, to be changed in case DOM Level N defines it, and have a collection of functions which implement different equality tests. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ The large body of the swan wedged in the shattered glass of the car windscreen fills the film frame. Its head is bent back on itself in a parody of its orthodox gracefulness. -- Peter Greenaway, _A Zed and Two Noughts_ (1986) From akuchlin@cnri.reston.va.us Fri Dec 11 20:30:17 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 11 Dec 1998 15:30:17 -0500 (EST) Subject: [XML-SIG] New to Python OO In-Reply-To: <3671606D.6D731B98@graphion.com> References: <3671606D.6D731B98@graphion.com> Message-ID: <13937.31578.270593.15411@amarok.cnri.reston.va.us> Michael Sanborn writes: >imports xml.dom.core and xml.dom.builder. I would have thought that the >way to approach this would be to define a local Node class derived from >core.py that added an empty totxt() method, and then to define local >subclasses of Node (such as Text) with specific totxt() methods. Things aren't that simple, because of the implementation, which consists of a tree of hidden objects; the classes that you interact with, such as Node, Element, Text, etc. are all proxies for that hidden tree, and create new Node, Element, Text, ... proxies when you request a new portion. So all the retrieval methods would have to be aware >My >reasoning was that the Builder class would then build the tree with my >enhanced Nodes. But that doesn't seem to be happening. Instead, Builder >seems to be constructing the tree with regular core Nodes that don't >recognize my totxt() method. Can anyone give me advice on how to achieve >this? 
My suspicion is that subclassing Node classes isn't the way to go; instead, you'll write functions and classes (probably using existing classes such as Builder and Walker) that operate on DOM trees. However I'd really like to see a discussion of this. We need to work out common Python/DOM patterns, so that we can add appropriate helper modules and functions. (They'll also be useful to document as examples.) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ The multiple human needs and desires that demand privacy among two or more people in the midst of social life must inevitably lead to cryptology wherever men thrive and wherever they write. -- David Kahn, _The Codebreakers_ From akuchlin@cnri.reston.va.us Fri Dec 11 20:35:30 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 11 Dec 1998 15:35:30 -0500 (EST) Subject: [XML-SIG] Re: Equality tests on DOM nodes In-Reply-To: <13937.25009.925375.550977@weyr.cnri.reston.va.us> References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <36715D4A.9660A0D0@imall.com> <13937.25009.925375.550977@weyr.cnri.reston.va.us> Message-ID: <13937.33182.137625.135265@amarok.cnri.reston.va.us> Fred L. Drake writes: >Ray Whitmer writes: > > The problem in Python is much bigger -- possibly rendering my advice > > irrelevant -- since no official DOM API binding has been released for that > > The spec does include IDL, and a Python binding for IDL is being >developed. (Now, I've not checked that the Python DOM uses the Python >IDL binding. Andrew, perhaps you can address this in the Python >XML-SIG?) I haven't checked it either, not having read the Python IDL binding. Since it uses Fnorb, I'd imagine that 4DOM definitely would follow the IDL binding. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ The boast of heraldry, the pomp of power, / And all that beauty, all that wealth e'er gave, / Awaits alike th' inevitable hour: / The paths of glory lead but to the grave. 
-- Thomas Gray From jday@csihq.com Fri Dec 11 20:45:08 1998 From: jday@csihq.com (John Day) Date: Fri, 11 Dec 1998 15:45:08 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <000701be253c$a1688680$da39bfa8@arabbit> References: <3.0.1.32.19981211141205.00767594@mail.csihq.com> Message-ID: <3.0.1.32.19981211154508.0076cc40@mail.csihq.com> At 02:29 PM 12/11/98 -0500, Paul Butkiewicz wrote: >OK, I hadn't really thought about that. But can you come up with a way of >ordering nodes that deserves to be defined as part of a global and timeless >standard instead of being merely implementation specific? > > The branch of mathematics called "Order Theory" (a subset of Discrete Math) is already a 'global and timeless standard'. I don't think we would want to dictate any specific orders. That should be left to specific implementors. Example: Compare two documents: the Bible and the Koran. Under the concept 'testaments of religious beliefs' they are virtually identical. Under the concept '', the books might be completely different. [Jon Bosak's 'tstmt.dtd' is a kind of 'most general unifier' for the first concept above] -jday From ray@imall.com Fri Dec 11 21:01:33 1998 From: ray@imall.com (Ray Whitmer) Date: Fri, 11 Dec 1998 14:01:33 -0700 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <000501be2535$03afcd60$da39bfa8@arabbit> Message-ID: <3671882D.DC7D9E5A@imall.com> Paul Butkiewicz wrote: > A further implementation difficulty has occurred to me: There are likely > many people out there who would like to or are using the DOM in conjunction > with a database, making the node objects persistent. These folks would > probably prefer that equality indicate not just that two nodes are identical > but that they represent the same record in the database. While this would be a useful function, I don't think it makes sense that it should be the function of "equals".
But it does point out the many possible interpretations, which was the point of my original response. As I stated before, overriding equals would be a bad idea without an agreed-upon portable interpretation. People wonder why some of us are not sad that Java doesn't support general operator overloading, which would add yet another whole set of such ambiguities as "equals" provides. > I must be feeling contrary today, but I think you're saying isn't true. > String.equals( String ) does examine the contents of two different objects > to determine that they are identical. But this is the case only because > String explicitly overrides the equals( Object ) method in Object, which > isn't true of many objects. The equals( Object ) method in Object only > returns true if the objects are actually the same object, ie. > ( *x )->equals( *y ) if and only if x == y. The point of equals is so that it can be overridden with a deeper, class-specific interpretation. While Object it is too incomplete for a good deeper sense of equality, equals is only really useful with a set of classes where it is overridden in at least some of the classes to provide a deeper (but still consistent, transitive, symmetric, reflexive, useful) sense of equality. Otherwise, just use the "==" operator. Not only is the Object implementation of equals redundant with the "==" operator, but it is also less efficient. Ray Whitmer From gwachob@aimnet.com Fri Dec 11 21:19:35 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Fri, 11 Dec 1998 13:19:35 -0800 (PST) Subject: [XML-SIG] New to Python OO In-Reply-To: <13937.31578.270593.15411@amarok.cnri.reston.va.us> Message-ID: On Fri, 11 Dec 1998, Andrew M. Kuchling wrote: > >My > >reasoning was that the Builder class would then build the tree with my > >enhanced Nodes. But that doesn't seem to be happening. Instead, Builder > >seems to be constructing the tree with regular core Nodes that don't > >recognize my totxt() method. 
Can anyone give me advice on how to achieve > >this? > > My suspicion is that subclassing Node classes isn't the way to > go; instead, you'll write functions and classes (probably using > existing classes such as Builder and Walker) that operate on DOM > trees. However I'd really like to see a discussion of this. We need > to work out common Python/DOM patterns, so that we can add appropriate > helper modules and functions. (They'll also be useful to document as > examples.) (Speaking of the Python DOM implementation here) The Walker class is sort of a Visitor (not really). The walker "calls back" (really calls methods of its subclass) methods when the walker first visits and when the walker leaves a particular Node (assuming a depth first left-to-right traversal). A Visitor pattern class would not necessarily include the "traversal" function (subclasses could) -- it would simply have "handleElement", "handleAttribute", "handleText", etc (sorta like SAX). A visitor pattern would handle an entire "subtree" at a time (I would guess) instead of thinking of the tree in a traversal sense (ie "startElement", "endElement"). It seems to me conceptually cleaner for most applications (if somewhat less efficient in some cases) to deal with the tree structurally instead of procedurally and that's why I would like to see a Visitor pattern.. Ultimately, it would be nice to be able to encode "transform" functions on trees -- approaching and surpassing the functionality of XSL from a programmatic (instead of stylesheet) point of view. For example (in prose): Take all the children of the "AUTHOR" element which have the attribute "INFORMATION" value of "PRIVATE" and compute a function on that attribute value and put it in a list. XSL can do a lot of this, but not all (or at least not cleanly, IMHO). Thoughts?
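For what it's worth, the prose example above might look roughly like the following when written against a DOM tree. This is only a sketch under assumptions: it uses xml.dom.minidom from the current Python standard library as a stand-in for the xml.dom.core package discussed in this thread, and the sample document, the private_values() helper and the functions passed to it are all hypothetical names made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; only the AUTHOR element and the
# INFORMATION attribute come from the prose example.
SAMPLE = """<BOOK>
  <AUTHOR>
    <NAME INFORMATION="PUBLIC">Jane</NAME>
    <PHONE INFORMATION="PRIVATE">555-0100</PHONE>
    <EMAIL INFORMATION="PRIVATE">jane@example.org</EMAIL>
  </AUTHOR>
</BOOK>"""


def private_values(doc, func):
    """Collect func(value) for each child element of an AUTHOR element
    whose INFORMATION attribute has the value "PRIVATE"."""
    results = []
    for author in doc.getElementsByTagName("AUTHOR"):
        for child in author.childNodes:
            # Skip whitespace text nodes, comments, and so on.
            if child.nodeType != child.ELEMENT_NODE:
                continue
            value = child.getAttribute("INFORMATION")
            if value == "PRIVATE":
                results.append(func(value))
    return results


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE)
    print(private_values(doc, str.lower))
```

The function is applied to the attribute value itself, as the prose literally says; passing the child element to func instead would be a one-line change if that is what is really wanted.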
-Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From rll@eps.inso.com Fri Dec 11 21:47:39 1998 From: rll@eps.inso.com (Richard L. Lavallee) Date: Fri, 11 Dec 1998 16:47:39 -0500 Subject: [XML-SIG] Re: Equality tests on DOM nodes Message-ID: <199812112147.QAA11011@chineseballs.eps.inso.com> Regarding the problem of comparing DOM nodes, one implementation solution is to assign a "DOM node identifier" (DNI) to each DOM node, and use these as the basis for comparison. A DNI is an integer, base 1, which monotonically increases up to the maximum number of nodes in a particular DOM. The root node DNI is assigned "1", and the remainder are assigned in pre-order. When nodes persist their DNI's persist with them, for any given version of the particular DOM instance. So: how may any two DOM nodes be compared? Just examine their respective DNI's numerically. E.g., a DOM node with DNI 42 is "==" to a DOM node with DNI 42. DNI_42 > DNI_5 DNI_9 < DNI_12 Of course, this works best for read-only DOM's; because arbitrary node insertion would disrupt the DNI sequencing. But I would argue that node insertion results in a new document version which necessarily has its own unique set of DNI's anyway. How's that? -rll From ray@imall.com Fri Dec 11 23:21:38 1998 From: ray@imall.com (Ray Whitmer) Date: Fri, 11 Dec 1998 16:21:38 -0700 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <199812112147.QAA11011@chineseballs.eps.inso.com> Message-ID: <3671A902.60DC600@imall.com> Richard L. Lavallee wrote: > Regarding the problem of comparing DOM nodes, > one implementation solution is to assign a "DOM node identifier" (DNI) > to each DOM node, and use these as the basis for comparison.
> > A DNI is an integer, base 1, which monotonically increases up to the > maximum number of nodes in a particular DOM. > The root node DNI is assigned "1", and the remainder are assigned > in pre-order. > > When nodes persist their DNI's persist with them, for any given > version of the particular DOM instance. > > So: how may any two DOM nodes be compared? > > Just examine their respective DNI's numerically. > > E.g.. a DOM node with DNI 42 is "==" to a DOM node with DNI 42. > > DNI_42 > DNI_5 > > DNI_9 < DNI_12 > > Of course, this works best for read-only DOM's; > because arbitrary node insertion would disrupt the DNI sequencing. > But I would argue that node insertion results in a new document version > which necessarily has its own uniques set of DNI's anyway. I think what you are proposing is yet another type of comparison function that detects the order of two nodes in traversal order of the hierarchy. This is a very useful function, too, which should be assigned to yet another function. I had to improve on the methodology you describe as follows to efficiently manage a mutable (modifiable) hierarchy: First, don't run the numbers continuously through the hierarchy, but rather keep different sequences for each set of siblings. Then, count the depth of each node being compared, replace the node that is deeper with its ancestor at the higher level, and go up the tree until you find the siblings with the common ancestor. Then, use the numbers to find the order there. But if you have large numbers of siblings, this is still a problem shifting large ranges, potentially of millions of siblings. So my final solution was to represent siblings in a btree, and then order just within fixed-length btree nodes, so you never have to shift many at all, and you can still compare quite rapidly. 
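The first part of that scheme (equalize depths, climb to siblings under a common parent, then compare sibling positions) can be sketched as below. This is an illustration under assumptions, not the implementation described above: it uses xml.dom.minidom as a stand-in DOM, the function names are hypothetical, both nodes are assumed to belong to the same document and not to be attribute nodes, and the sibling position is found with a linear scan rather than with stored per-sibling numbers or a btree.

```python
from xml.dom import minidom


def depth(node):
    """Number of parentNode steps from node up to the top of the tree."""
    d = 0
    while node.parentNode is not None:
        node = node.parentNode
        d += 1
    return d


def sibling_index(node):
    """Position of node among its siblings (linear scan; a real
    implementation would store or index this number)."""
    for i, sibling in enumerate(node.parentNode.childNodes):
        if sibling is node:
            return i
    raise ValueError("node is not among its parent's children")


def compare_document_order(a, b):
    """Return -1, 0 or 1 as a comes before, is, or comes after b in
    pre-order. Assumes both nodes live in the same document."""
    if a is b:
        return 0
    depth_a, depth_b = depth(a), depth(b)
    x, y = a, b
    dx, dy = depth_a, depth_b
    # Replace the deeper node with its ancestor at the shallower level.
    while dx > dy:
        x, dx = x.parentNode, dx - 1
    while dy > dx:
        y, dy = y.parentNode, dy - 1
    if x is y:
        # One node is an ancestor of the other; the ancestor comes first.
        return -1 if depth_a < depth_b else 1
    # Climb in step until the two nodes are siblings.
    while x.parentNode is not y.parentNode:
        x, y = x.parentNode, y.parentNode
    return -1 if sibling_index(x) < sibling_index(y) else 1
```

Because numbering is local to each set of siblings, inserting a node only affects the positions of its own siblings, which is the property that makes the approach workable for a mutable tree.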
Ray Whitmer From uche.ogbuji@fourthought.com Sat Dec 12 00:30:37 1998 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 11 Dec 1998 17:30:37 -0700 Subject: [XML-SIG] Re: Equality tests on DOM nodes In-Reply-To: Your message of "Fri, 11 Dec 1998 15:35:30 EST." <13937.33182.137625.135265@amarok.cnri.reston.va.us> Message-ID: <199812120030.RAA08745@malatesta.local> > Fred L. Drake writes: > >Ray Whitmer writes: > > > The problem in Python is much bigger -- possibly rendering my advice > > > irrelevant -- since no official DOM API binding has been released for that > > > > The spec does include IDL, and a Python binding for IDL is being > >developed. (Now, I've not checked that the Python DOM uses the Python > >IDL binding. Andrew, perhaps you can address this in the Python > >XML-SIG?) > > I haven't checked it either, not having read the Python IDL > binding. Since it uses Fnorb, I'd imagine that 4DOM definitely would > follow the IDL binding. Yes. In fact, we are participating in the do-sig to complete and formalize the IDL binding. We ran into problematic differences between both of the main current ORBs that support Python: Fnorb and ILU, and there has been discussion of this, all of which should lead to even more clarity in the Python-IDL binding. 
The IDL binding _does_ give good guidance on how to interpret the DOM spec, since so much of DOM is formally specified in IDL, and the Python-IDL binding in its current state is not too difficult a read, so you might want to check it out at: http://www.python.org/sigs/do-sig/corbamap.html -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sat Dec 12 00:59:06 1998 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 11 Dec 1998 17:59:06 -0700 Subject: [XML-SIG] New to Python OO In-Reply-To: Your message of "Fri, 11 Dec 1998 13:19:35 PST." Message-ID: <199812120059.RAA08780@malatesta.local> > The Walker class is sort of a Visitor (not really). The walker "calls > back" (really calls methods of its subclass) methods when the walker first > visits and when the walker leaves a particular Node (assuming a depth > first left-to-right traversal). > > A Visitor pattern class would not neccesarily include the "traversal" > function (subclasses could) -- it would simply have "handleElement", > "handleAttribute", "handleText", etc (sorta like SAX). A visitor pattern > would handle an entire "subtree" at a time (I would guess) instead of > thinking of the tree in a traversal sense (ie "startElement", > "endElement"). This is an excellent point. We are currently working on introducing the visitor pattern into 4DOM for the next version or two, over which we would overlay a global function, tentatively VisitInOrder(), which does the equivalent of the walker on PyDOM by doing an in-order traversal and invoking accept(AppropriateVisitor) on each of the DOM nodes. We like this idea because of the extensibility: we can then have visitors that print out raw text, or that pretty-print with extra whitespace. 
A user could add his own visitor that performs transforms as you mention, etc. > It seems to me conceptually cleaner for most applications (if somewhat > less efficient in some cases) to deal with the tree structurally instead > of procedurally and that's why I would like to see a Visitor pattern.. > > Ultimately, it would be nice to be able to encode "transform" functions on > trees -- approaching and surpassing the functionality of XSL from a > programmatic (instead of stylesheet) point of view. For example (in prose): > > Take all the children of the "AUTHOR" element which have the attribute > "INFORMATION" value of "PRIVATE" and compute a function on that attribute > value and put it in a list. > > XSL can do a lot of this, but not all (or at least not cleanly, IMHO). > > Thoughts? It seems to me that your example above is definitely not in the domain of style-sheets, but DOM programming. I guess I could imagine the ECMAScript to do it in XSL, but it just makes me ask "why?". A DOM visitor, or a SAX application, however, appear far more appropriate ways to do this. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From gwachob@aimnet.com Sat Dec 12 07:54:06 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Fri, 11 Dec 1998 23:54:06 -0800 (PST) Subject: [XML-SIG] My DOM Visitor Class(es) Message-ID: Hi folks- I threw together a very simple DOM Visitor class (it also has a "Walker" mixin to create a Visitor that automatically walks a tree and visits each Node). You can get it at: http://www.aimnet.com/~gwachob/DOMVisitor.py I use the term "Visitor" loosely -- while inspired by the Visitor Design Pattern in the book "Design Patterns", it is technically not following that pattern. It looks to be useful nonetheless.
The basic Visitor class does very very little -- if you subclass it, you must add visit_ELEMENT, visit_TEXT, etc methods. The basic Visitor class simply gets the type of the node you pass it and calls a visit_ method on that Node. (ie visit_TEXT(node)). The WalkerMixin changes this basic behavior by visiting the Node's children after the Node itself is visited. What makes this really nice is that the method which visits the Node returns a value which tells the main dispatcher method (visit) whether or not to visit the Node's children. Thus, whole subtrees can be treated separately (or not at all) depending on a visit to the root node of the subtree (that visit to the "root" node of a subtree can visit parts of the subtree itself, or you may decide in implementing the visit method that you can skip the entire subtree because that subtree is irrelevant for your purposes. You can even build a separate walker for that subtree to do some completely different processing. How about multithreaded parsing?) Wouldn't using a DOM tree in this way (structurally) better allow DOM parsers to only hold part of the DOM tree in memory? Anyway, enough rambling, I'd like people to take a look at the code, tell me what they think (has any body else written code like this?), tell me what improvements the code would need (yeah, yeah, it uses recursion), etc. Oh yeah, and please use it for your own projects! -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From fleck@informatik.uni-bonn.de Sat Dec 12 11:02:13 1998 From: fleck@informatik.uni-bonn.de (Markus Fleck) Date: Sat, 12 Dec 1998 12:02:13 +0100 Subject: [XML-SIG] Python WebDAV at Xerox? Message-ID: <36724D35.3345@informatik.uni-bonn.de> Hi! 
I just found out about Xerox's DAV server & client in Python, . The code hasn't been released yet, but it is mentioned in the WebDAV interoperability matrix at . Quoting from : > About the implementation > > The server is implemented in Python, and runs on > Python 1.4 or later. It runs on Unix and Windows. > The persistent store for resources is a Posix file, > properties are stored in a dbm database. > > I am attempting to make the source code available, but > must secure permission from Xerox lawyers. Please be patient. > > I also have a client-side library in Python. Likewise, I am > attempting to release it. > > Feedback > > Send comments to jdavis@parc.xerox.com. Yours, Markus. -- //////////////////////////////////////////////////////////////////////////// Markus B Fleck - University of Bonn - CS Department IV - WHOIS MF5079 UNIX Administrator - comp.lang.python.announce Moderator "GNU Gather" Free Internet Groupware Project - http://cscw.net/gather/ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ From MHammond@skippinet.com.au Sat Dec 12 11:12:47 1998 From: MHammond@skippinet.com.au (Mark Hammond) Date: Sat, 12 Dec 1998 22:12:47 +1100 Subject: [XML-SIG] XBEL Patch to msie_parse.py Message-ID: <002501be25c0$59c25890$0801a8c0@bobcat> This is a multi-part message in MIME format. ------=_NextPart_000_0026_01BE261C.8D32D090 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Better late than never :-) I've attached a diff which attempts to use the "win32api" module to locate the favorites folder in the registry. It has been tested on NT and 98, but only on English systems - I'm fairly sure that it will also work on non-English systems and Windows 95, but all testing appreciated :-) I took the approach that a command line arg could point to the favorites folder - if not specified, then it attempts to use win32api to find it. If that fails, it prints a message asking for the command line param.
Also note that there where a couple of other changes WRT the arguments to certain functions - it appears this file did not keep up to date with bookmark.py Mark. ------=_NextPart_000_0026_01BE261C.8D32D090 Content-Type: application/octet-stream; name="msie_parse.diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="msie_parse.diff" KioqIFx0ZW1wXG1zaWVfcGFyc2UucHkJV2VkIERlYyAwMiAyMjozNDowMiAxOTk4Ci0tLSBtc2ll X3BhcnNlLnB5CVNhdCBEZWMgMTIgMjE6MDg6NDkgMTk5OAoqKioqKioqKioqKioqKioKKioqIDE3 LDI4ICoqKioKICBjbGFzcyBNU0lFOg0KICAgICAgIyBpbnRlcm5ldCBleHBsb3Jlcg0KICANCiEg ICAgIGRlZiBfX2luaXRfXyhzZWxmLGJvb2ttYXJrcyk6DQohICAgICAgICAgIyBGSVhNRTogdXNl IHJlZ2lzdHJ5IGZvciB0aGlzIQ0KISANCiAgICAgICAgICBzZWxmLmJtcz1ib29rbWFya3MNCiAg ICAgICAgICBzZWxmLnJvb3QgPSBOb25lDQohICAgICAgICAgc2VsZi5wYXRoID0gb3MucGF0aC5q b2luKFVTUkRJUiwgRElSKQ0KICANCiAgICAgICAgICBzZWxmLl9fd2FsaygpDQogIA0KLS0tIDE3 LDI2IC0tLS0KICBjbGFzcyBNU0lFOg0KICAgICAgIyBpbnRlcm5ldCBleHBsb3Jlcg0KICANCiEg ICAgIGRlZiBfX2luaXRfXyhzZWxmLGJvb2ttYXJrcywgcGF0aCk6DQogICAgICAgICAgc2VsZi5i bXM9Ym9va21hcmtzDQogICAgICAgICAgc2VsZi5yb290ID0gTm9uZQ0KISAgICAgICAgIHNlbGYu cGF0aCA9IHBhdGgNCiAgDQogICAgICAgICAgc2VsZi5fX3dhbGsoKQ0KICANCioqKioqKioqKioq KioqKgoqKiogMzIsNDQgKioqKgogICAgICAgICAgZm9yIGZpbGUgaW4gb3MubGlzdGRpcihwYXRo KToNCiAgICAgICAgICAgICAgZnVsbG5hbWUgPSBvcy5wYXRoLmpvaW4ocGF0aCwgZmlsZSkNCiAg ICAgICAgICAgICAgaWYgb3MucGF0aC5pc2RpcihmdWxsbmFtZSk6DQohICAgICAgICAgICAgICAg ICBzZWxmLmJtcy5hZGRfZm9sZGVyKGZpbGUsTm9uZSxOb25lKQ0KICAgICAgICAgICAgICAgICAg c2VsZi5fX3dhbGsoc3VicGF0aCArIFtmaWxlXSkNCiAgICAgICAgICAgICAgZWxzZToNCiAgICAg ICAgICAgICAgICAgIHVybCA9IHNlbGYuX19nZXR1cmwoZnVsbG5hbWUpDQogICAgICAgICAgICAg ICAgICBpZiB1cmw6DQogICAgICAgICAgICAgICAgICAgICAgc2VsZi5ibXMuYWRkX2Jvb2ttYXJr KG9zLnBhdGguc3BsaXRleHQoZmlsZSlbMF0sTm9uZSwNCiEgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgTm9uZSx1cmwpDQogIA0KICAgICAgZGVmIF9fZ2V0dXJsKHNl bGYsIGZpbGUpOg0KICAgICAgICAgIHRyeToNCi0tLSAzMCw0MiAtLS0tCiAgICAgICAgICBmb3Ig 
ZmlsZSBpbiBvcy5saXN0ZGlyKHBhdGgpOg0KICAgICAgICAgICAgICBmdWxsbmFtZSA9IG9zLnBh dGguam9pbihwYXRoLCBmaWxlKQ0KICAgICAgICAgICAgICBpZiBvcy5wYXRoLmlzZGlyKGZ1bGxu YW1lKToNCiEgICAgICAgICAgICAgICAgIHNlbGYuYm1zLmFkZF9mb2xkZXIoZmlsZSxOb25lKQ0K ICAgICAgICAgICAgICAgICAgc2VsZi5fX3dhbGsoc3VicGF0aCArIFtmaWxlXSkNCiAgICAgICAg ICAgICAgZWxzZToNCiAgICAgICAgICAgICAgICAgIHVybCA9IHNlbGYuX19nZXR1cmwoZnVsbG5h bWUpDQogICAgICAgICAgICAgICAgICBpZiB1cmw6DQogICAgICAgICAgICAgICAgICAgICAgc2Vs Zi5ibXMuYWRkX2Jvb2ttYXJrKG9zLnBhdGguc3BsaXRleHQoZmlsZSlbMF0sTm9uZSwNCiEgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgTm9uZSxOb25lLHVybCkNCiAg DQogICAgICBkZWYgX19nZXR1cmwoc2VsZiwgZmlsZSk6DQogICAgICAgICAgdHJ5Og0KKioqKioq KioqKioqKioqCioqKiA1OCw2MiAqKioqCiAgIyAtLS0gVGVzdHByb2dyYW0NCiAgDQogIGlmIF9f bmFtZV9fID09ICdfX21haW5fXyc6DQohICAgICBtc2llPU1TSUUoYm9va21hcmsuQm9va21hcmtz KCkpDQogICAgICBtc2llLmJtcy5kdW1wX3hiZWwoKQ0KLS0tIDU2LDc3IC0tLS0KICAjIC0tLSBU ZXN0cHJvZ3JhbQ0KICANCiAgaWYgX19uYW1lX18gPT0gJ19fbWFpbl9fJzoNCiEgICAgIGltcG9y dCBzeXMNCiEgICAgIGlmIGxlbihzeXMuYXJndik+MToNCiEgICAgICAgICBwYXRoID0gc3lzLmFy Z3ZbMV0NCiEgICAgIGVsc2U6DQohICAgICAgICAgdHJ5Og0KISAgICAgICAgICAgICBpbXBvcnQg d2luMzJhcGksIHdpbjMyY29uDQohICAgICAgICAgZXhjZXB0IEltcG9ydEVycm9yOg0KISAgICAg ICAgICAgICBwcmludCAiVGhlIHdpbjMyYXBpIG1vZHVsZSBpcyBub3QgYXZhaWxhYmxlIG9uIHRo aXMgc3lzdGVtIg0KISAgICAgICAgICAgICBwcmludCAic28gd2UgY2FudCBhdXRvbWF0aWNhbGx5 IGZpbmQgeW91ciBmYXZvcml0ZXMgZm9sZGVyLiINCiEgICAgICAgICAgICAgcHJpbnQgIlBsZWFz ZSByZS1ydW4gdGhpcyBwcm9ncmFtIHNwZWNpZml5aW5nIHRoZSBsb2NhdGlvbiBvZiB5b3VyIg0K ISAgICAgICAgICAgICBwcmludCAiZmF2b3JpdGVzIGZvbGRlciBvbiB0aGUgY29tbWFuZCBsaW5l LiINCiEgICAgICAgICAgICAgc3lzLmV4aXQoMSkNCiEgICAgICAgICBrZXluYW1lID0gciJTb2Z0 d2FyZVxNaWNyb3NvZnRcV2luZG93c1xDdXJyZW50VmVyc2lvblxFeHBsb3JlclxTaGVsbCBGb2xk ZXJzIg0KISAgICAgICAgIGhrZXkgPSB3aW4zMmFwaS5SZWdPcGVuS2V5KHdpbjMyY29uLkhLRVlf Q1VSUkVOVF9VU0VSLCBrZXluYW1lKQ0KISAgICAgICAgIHBhdGgsIHBhdGh0eXBlID0gd2luMzJh 
cGkuUmVnUXVlcnlWYWx1ZUV4KGhrZXksICJGYXZvcml0ZXMiKQ0KISAgICAgICAgIGFzc2VydCBw YXRodHlwZSA9PSB3aW4zMmNvbi5SRUdfU1oNCiEgDQohICAgICBtc2llPU1TSUUoYm9va21hcmsu Qm9va21hcmtzKCksIHBhdGgpDQogICAgICBtc2llLmJtcy5kdW1wX3hiZWwoKQ0K ------=_NextPart_000_0026_01BE261C.8D32D090-- From gstein@lyra.org Sat Dec 12 11:29:12 1998 From: gstein@lyra.org (Greg Stein) Date: Sat, 12 Dec 1998 03:29:12 -0800 Subject: [XML-SIG] Python WebDAV at Xerox? References: <36724D35.3345@informatik.uni-bonn.de> Message-ID: <36725388.36448E78@lyra.org> Markus Fleck wrote: > > Hi! > > I just found out about Xerox's DAV server & client > in Python, . > The code hasn't been released yet, but it is > mentioned in the WebDAV interoperability matrix at > . He has been itching to get it released since September :-) Jim did state there is one benefit to the delay in the release. It guarantees that mod_dav was built independently. The IETF likes independent implementations before moving a Proposed Standard to an Actual Standard. On topic: I'm not sure what XML parser he uses for the message bodies. I got a couple tracebacks during some initial interop testing, but I didn't immediately recognize anything. Jim also has a DAV client written in Python. No idea on the XML stuff there either. Note that he doesn't deal with some of the encoding issues yet. mod_dav uses James Clark's Expat parser (nice parser!). Cheers, -g p.s. okay. so I didn't really say anything interesting or useful. bleh. :-) -- Greg Stein, http://www.lyra.org/ From arabbit@earthlink.net Sat Dec 12 15:34:21 1998 From: arabbit@earthlink.net (Paul Butkiewicz) Date: Sat, 12 Dec 1998 10:34:21 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes In-Reply-To: <3671A902.60DC600@imall.com> Message-ID: <000701be25e4$e3826f60$5839bfa8@arabbit> >First, don't run the numbers continuously through the hierarchy, but rather >keep different sequences for each set of siblings. 
Then, count the depth >of each node being compared, replace the node that is deeper with its >ancestor at the higher level, and go up the tree until you find the >siblings with the common ancestor. Then, use the numbers to find the order >there. But if you have large numbers of siblings, this is still a problem >shifting large ranges, potentially of millions of siblings. >So my final solution was to represent siblings in a btree, and then order >just within fixed-length btree nodes, so you never have to shift many at >all, and you can still compare quite rapidly. We're getting way into implementation-specific details here, but in the first proposed solution: Suppose we are in an environment that requires us to both be able to insert nodes quickly and obtain a node's order quickly and we have a large number of nodes. And we're implementing the first solution. There isn't really a reason that the number has to be an integer, is there? For quick insertion and ordering, we could very well keep two integers, numerator and denominator, and if something belongs between 1/1 and 2/1 we just stick it at 3/2 rather than changing the numbers on the next 20000 nodes. And then, later, when the system is taking a breather, we can come back, lock the whole set of siblings, and rearrange the numbers? Not that anyone actually implements things this way, probably for good reason, but if I can't throw out crazy ideas here, where can I? Paul P.S. Ray, you missed my point on the whole Object.equals thing. My point is that if we look to Java for guidance (which must make *someone* out there cringe :), then the way equals is implemented in String is the exception rather than the norm. I don't think nodes are like strings at all. 
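[Editorial sketch] The numerator/denominator idea in the message above can be tried out in a few lines of Python. This is only a sketch under assumptions: the class name SiblingOrder and its methods are made up for illustration and come from no DOM implementation, and exact fractions from the standard fractions module stand in for the pair of integers. Inserting between keys 1/1 and 2/1 takes the midpoint, 3/2, and touches no other sibling; renumber() plays the role of the later "breather" pass.

```python
from fractions import Fraction


class SiblingOrder:
    """Order keys for one set of siblings, kept as exact fractions.

    Hypothetical helper for illustration; not part of any DOM package.
    """

    def __init__(self):
        self.keys = {}  # node -> Fraction

    def append(self, node):
        # New last sibling: one past the current largest key.
        last = max(self.keys.values(), default=Fraction(0))
        self.keys[node] = last + 1

    def insert_between(self, node, before, after):
        # Midpoint of the two neighbours; no other key changes.
        self.keys[node] = (self.keys[before] + self.keys[after]) / 2

    def precedes(self, a, b):
        # Document order among siblings is a plain key comparison.
        return self.keys[a] < self.keys[b]

    def renumber(self):
        # The "breather" pass: reassign 1, 2, 3, ... in current order.
        ordered = sorted(self.keys, key=self.keys.get)
        for position, node in enumerate(ordered, 1):
            self.keys[node] = Fraction(position)


order = SiblingOrder()
order.append("first")
order.append("second")
order.insert_between("new", "first", "second")
```

Repeated insertion into the same gap doubles the denominator each time, so a periodic renumbering pass would probably still be needed in practice.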
From bbennett@unixg.ubc.ca Sat Dec 12 22:40:28 1998 From: bbennett@unixg.ubc.ca (Bruce Bennett) Date: Sat, 12 Dec 1998 14:40:28 -0800 Subject: [XML-SIG] Mac Python (CFM68K) won't import pyexpat Message-ID: Greetings, xml-sig folk -- When I try to import pyexpat (as at the beginning of pyexpattest.py), I see a curious error message: Python 1.5.1 (#37, Apr 27 1998, 13:36:17) [CW CFM68K w/GUSI w/MSL] Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys >>> import pyexpat Traceback (innermost last): File "", line 1, in ? ImportError: PythonCore--PySys_WriteStderr: A fragment had "hard" unresolved imports. Having no clue what this errmsg really means, I then try to import 'Pyexpat', hoping to confirm that Python is seeing the pyexpat lib file: >>> import Pyexpat Traceback (innermost last): File "", line 1, in ? NameError: Case mismatch for module name Pyexpat (filename pyexpat.cfm68k.slb) So yes, it's seeing it. Is there a problem with pyexpat.cfm68k.slb? Or with something I'm (not) doing? Does importing pyexpat require the definition of paths to other dependencies in the xml-0.5 package? I'm running System 7.5.5 with CFM-68K Runtime Enabler v. 4.0, and encountering no other problems with Mac Python 1.5.1. -- BTW, in the recently-released xml-0.5 package for Python, the file README.pyexpat says the requisite Macintosh binaries are available as > ftp://ftp.cwi.nl/pub/jack/python/pyexpat.hqx (macintosh binary-only). ^^^^^^^^^^^ At present, however, the filename in fact seems to be 'pyexpat.sit.hqx'. Further (in case the preceding observation wasn't petty enough), conformity with other Mac Python shared libs suggests the orthography 'pyexpat.CFM68k.slb' instead of the current 'pyexpat.cfm68k.slb'. 
Regards, -- Bruce Bennett From kajiyama@etl.go.jp Sun Dec 13 08:14:49 1998 From: kajiyama@etl.go.jp (Tamito Kajiyama) Date: Sun, 13 Dec 98 08:14:49 JST Subject: [XML-SIG] Mac Python (CFM68K) won't import pyexpat In-Reply-To: (bbennett@unixg.ubc.ca) Message-ID: <9812122314.AA16915@etlibs2.etl.go.jp> bbennett@unixg.ubc.ca (Bruce Bennett) writes: | | When I try to import pyexpat (as at the beginning of pyexpattest.py), I see | a curious error message: | | Python 1.5.1 (#37, Apr 27 1998, 13:36:17) [CW CFM68K w/GUSI w/MSL] | Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam | >>> import sys | >>> import pyexpat | Traceback (innermost last): | File "", line 1, in ? | ImportError: PythonCore--PySys_WriteStderr: | A fragment had "hard" unresolved imports. I suspect that the XML package you are using is of a pre-release version (probably xml-0.5pre1). If you have compiled the XML package yourself, try the final version of it. If you have installed a binary distribution for Macintosh, ask the maintainer of the binary distribution ;-) -- KAJIYAMA, Tamito From dieter@handshake.de Sun Dec 13 21:17:23 1998 From: dieter@handshake.de (Dieter Maurer) Date: 13 Dec 1998 22:17:23 +0100 Subject: [XML-SIG] ANN: WeakDict's: addressing CPython's problem with cyclic structures Message-ID: The following message is a courtesy copy of an article that has been posted as well. WeakDict (Weak Dictionaries) have been designed to address CPython's problems with cyclic references. More precisely, WeakDict's allow the realization of weak references, references that are **NOT** counted in the reference count and can therefore be used to build cyclic structures without obstructing the reference counting scheme. This might be interesting e.g. for the DOM implementation of the XML-SIG. Other applications include object maps and caches of various kinds. 
WeakDict's are very similar to normal Python dictionaries, with the following essential exceptions: - all values in a WeakDict must be instances of 'WeakValue' (or a derived class) - the reference to a value in a WeakDict is *NOT* counted in the reference count of the value. Thus, it does not prevent the value from being garbage collected. - When a value is garbage collected, the corresponding entry disappears from the WeakDict. More information and download: URL:http://www.handshake.de/~dieter/weakdict.html From paul@prescod.net Sun Dec 13 21:32:33 1998 From: paul@prescod.net (Paul Prescod) Date: Sun, 13 Dec 1998 15:32:33 -0600 Subject: [XML-SIG] Zope, DTML and XML Message-ID: <36743271.376A09A8@prescod.net> Of course Zope must eventually move into the XML world. Zope needs to do templates. XSL also does templates. In fact templates are almost as central to XSL as they are to Zope. I would suggest that Zope should use XSL template syntax for DTML templates as far as is possible. In fact, maybe when XSL becomes popular enough, it might make sense to describe the interaction between Zope and the Python runtime in terms of XML transformations. That's for the future, though. In the meantime, the point is that the template syntax should be the same. Here are the details from the current XSL spec: "The value of an attribute of a literal result element is interpreted as an attribute value template: it can contain string expressions contained in curly braces ({})." "Within a template, the xsl:value-of element can be used to compute generated text, for example by extracting text from the source tree or by inserting the value of a string constant. The xsl:value-of element does this with a string expression that is specified as the value of the expr attribute. String expressions can also be used inside attribute values of literal result elements by enclosing the string expression in curly brace ({})." 
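[Editorial sketch] For readers who want the attribute value template rule spelled out, here is a minimal Python sketch. It is not an XSL processor: the function name instantiate_avt and the evaluate callback are assumptions made for illustration, and a plain dictionary lookup stands in for real XSL string expressions. The brace-doubling behaviour follows the draft text quoted further on in this message; right braces inside quoted literals within an expression are not handled, and an unmatched left brace is simply left in place.

```python
import re

# "{{", "}}", "{expr}", or a stray "}"; plain text between tokens is kept as-is.
_TOKEN = re.compile(r"\{\{|\}\}|\{([^{}]*)\}|\}")


def instantiate_avt(template, evaluate):
    """Instantiate an attribute value template (illustrative sketch).

    `evaluate` is a hypothetical callback mapping an expression string
    to its value; real XSL string expressions are far richer.
    """

    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token in ("}}", "}"):
            # A lone "}" is an error in the draft; recover as if doubled.
            return "}"
        return str(evaluate(match.group(1)))

    return _TOKEN.sub(replace, template)


# Made-up values standing in for the source tree and constants.
values = {"image_dir": "/images", "href": "headquarters.jpg", "width": "300"}
src = instantiate_avt("{image_dir}/{href}", values.get)
```

Passing the evaluator in as a callback keeps the brace handling separate from whatever expression language ends up inside the braces, which is the part the thread is actually debating.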
"The xsl:value-of element is replaced by the value of the string expression specified by the expr attribute. The expr attribute is required." "e.g. " "In an attribute value that is interpreted as an attribute value template, such as an attribute of a literal result element, string expressions can be used by surrounding the string expression with curly braces ({}). The attribute value template is instantiated by replacing the string expression together with surrounding curly braces by the value of the string expression. The following example creates an IMG result element from a photograph element in the source; the value of the SRC attribute of the IMG element is computed from the value of the image-dir constant and the content of the href child of the photograph element; the value of the WIDTH attribute of the IMG element is computed from the value of the the width attribute of the size child of the photograph element: With this source headquarters.jpg the result would be When an attribute value template is instantiated, a double left or right curly brace outside a string expression will be replaced by a single curly brace. It is an error if a right curly brace occurs in an attribute value template outside a string expression without being followed by a second right curly brace; an XSL processor may signal the error or recover by treating the right curly brace as if it had been doubled. A right curly brace inside an AttributeValue in a string expression is not recognized as terminating the string expression." 
http://www.w3.org/TR/WD-xsl Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From jeremy@allaire.com Mon Dec 14 01:33:05 1998 From: jeremy@allaire.com (Jeremy Allaire) Date: Sun, 13 Dec 1998 20:33:05 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: Hello folks- I'm interested in engaging anyone/everyone from the Python community to work with us on a WDDX platform module for Python. With the help of a few developers, we've been able to muster/ship WDDX modules for ASP/COM, Java, ColdFusion, Perl and JavaScript, and would love to see a Python implementation. Given the recent XML release for Python, seems like it would be a great project to make cross-language distributed web applications even more possible. Take a visit to www.WDDX.org, and most importantly take a view of the SDK, developed by Nate Weiss, which brings it all together with all of the above languages. Best and regards, Jeremy Allaire From jim@Digicool.com Mon Dec 14 12:57:54 1998 From: jim@Digicool.com (Jim Fulton) Date: Mon, 14 Dec 1998 12:57:54 +0000 Subject: [XML-SIG] Re: [Zope] - Zope, DTML and XML References: <36743271.376A09A8@prescod.net> Message-ID: <36750B52.EE1EBC7D@digicool.com> Paul Prescod wrote: > > Of course Zope must eventually move into the XML world. Zope needs to do > templates. It already does, via DTML. > XSL also does templates. I would have thought that XSL *was* a template mechanism. What do you mean by "template"? > In fact templates are almost as central to XSL as they are to Zope. I would say far more so, > I would suggest that Zope should use > XSL template syntax for DTML templates as far as is possible. It appears to me that DTML and XSL represent two very different approaches to solving the same or similar problems. They are both intended for generating text from objects. DTML generates text from Python objects. XSL generates text from XML objects. 
DTML takes a highly procedural approach. In DTML, you generate text directly. In XSL (as I understand it) you specify a set of rules for applying transformations to XML elements. This is fairly declarative in nature. In the example you gave, you didn't render a specific picture element. Instead, you have a rule for converting picture elements to img tags. Another difference between DTML and XSL is in how content is determined. DTML is typically used to define as well as format content. A DTML document directly specifies data that is often extracted from large object spaces. In XSL, it appears that the content is largely defined by a source document and an XSL "template" simply specifies transformations. Of course, an XSL specification can also filter, so there is some ability to extract, but it is much less direct than with DTML. Given the very different natures of DTML and XSL, I don't see much point in making the syntaxes all that consistent. > In fact, > maybe when XSL becomes popular enough, it might make sense to describe the > interaction between Zope and the Python runtime in terms of XML > transformations. It may very well. If Zope made it easy to generate XML from Zope (i.e. Python) objects, then people who like XSL could apply XSL transformations to the resulting XSL, bypassing DTML altogether. In other words, I see XSL as an alternative to DTML, not another form of it. Or, DTML may turn out to be a good tool for generating XML from objects, and then XSL could be applied to DTML output, in which case the two would act in tandem. Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (540) 371-6909 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. 
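[Editorial sketch] To make the procedural/declarative contrast in the message above concrete, here is a small Python sketch of the rule-driven style, using the standard xml.etree.ElementTree module (which postdates this thread). It is neither XSL nor DTML: the RULES table, the transform function and the photograph/href element names are illustrative assumptions. One rule says how any photograph element becomes an IMG element; everything else is copied through by a default rule.

```python
import xml.etree.ElementTree as ET


def photograph_to_img(elem):
    # Rule: any <photograph> becomes an <IMG> whose SRC is its <href> text.
    img = ET.Element("IMG")
    img.set("SRC", elem.findtext("href", default=""))
    return img


# Hypothetical rule table: element name -> function producing the result node.
RULES = {"photograph": photograph_to_img}


def transform(elem):
    """Apply the matching rule, or copy the element and recurse."""
    rule = RULES.get(elem.tag)
    if rule is not None:
        return rule(elem)
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    for child in elem:
        new_child = transform(child)
        new_child.tail = child.tail
        copy.append(new_child)
    return copy


source = ET.fromstring(
    "<page><photograph><href>headquarters.jpg</href></photograph></page>"
)
result = transform(source)
```

Nothing in the rule names a particular picture; the source document decides which pictures exist, which is roughly the distinction being drawn between the two approaches.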
From paul@prescod.net Mon Dec 14 12:52:27 1998 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Dec 1998 06:52:27 -0600 Subject: [XML-SIG] Perl and character encodings Message-ID: <36750A0B.EBEB7355@prescod.net> Thought this might be of interest: > Version 2.17 of XML::Parser has been uploaded to CPAN. With this version, > the entire API of James Clark's expat library is accessible from perl. > > The major new feature is access to character set encodings other than > expat's built-in set (UTF-8, UTF-16, ISO-8859-1, US-ASCII). This is done > through binary character encoding maps appearing in the pathlist > represented by @XML::Parser::Expat::Encoding_Path. The following encoding > maps come with this distribution and require no further action on the part > of the user, i.e. if expat comes across the encoding, it will just use it > without user intervention: > > Big5 > ISO-8859-2 > ISO-8859-3 > ISO-8859-4 > ISO-8859-5 > ISO-8859-7 > ISO-8859-8 > ISO-8859-9 > Shift_JIS > windows-1250 > > Other maps may be created and installed in the encoding search path by > using the tools in the newly released XML::Encoding distribution. -- > Subject: Re: XML::Parser Version 2.17 has been uploaded to CPAN > From: MURATA Makoto > Date: Mon, 14 Dec 1998 15:53:48 +0900 > X-Message-Number: 4 > > I tried an XML document in Shift_JIS and an equivalent document in UTF-16. > XML::Parser created exactly the same result. Great work! 
> > Cheers, > > Makoto > > Fuji Xerox Information Systems -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From ray@imall.com Mon Dec 14 18:12:23 1998 From: ray@imall.com (Ray Whitmer) Date: Mon, 14 Dec 1998 11:12:23 -0700 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <000701be25e4$e3826f60$5839bfa8@arabbit> Message-ID: <36755507.F8657565@imall.com> Paul Butkiewicz wrote: > We're getting way into implementation-specific details here, but in the > first proposed solution: Suppose we are in an environment that requires us > to both be able to insert nodes quickly and obtain a node's order quickly > and we have a large number of nodes. And we're implementing the first > solution. There isn't really a reason that the number has to be an integer, > is there? For quick insertion and ordering, we could very well keep two > integers, numerator and denominator, and if something belongs between 1/1 > and 2/1 we just stick it at 1/2 rather than changing the numbers on the next > 20000 nodes. And then, later, when the system is taking a breather, we can > come back, lock the whole set of siblings, and rearrange the numbers? > > Not that anyone actually implements things this way, probably for good > reason, but if I can't throw out crazy ideas here, where can I? Yes, or leave huge gaps in your integer values, or use something like a bit string, where you can keep tacking bits on. I pursued this type of solutions for quite a while before I used the BTree solution. It still gets quite messy in large situations. I came up with the BTree solution because it was far less messy, im my experience, and scaled much better. It is not clear when you talk about "the first solution" if you mean keeping consecutive ordering throughout the hierarchy, or only of siblings. Keeping it throughout the hierarchy is even less managable. > P.S. 
Ray, you missed my point on the whole Object.equals thing. My point > is that if we look to java for guidance (which must make *someone* out there > cringe :), than the way equals is implemented in String is the exception > rather than the norm. I don't think nodes are like strings at all. I don't think I missed the point. You didn't say to look to Java for guidance. You said to look to the default implementation in Java Object, which I argued does not and can not represent the purpose of equals in Java, which String, Color, DataFlavor, Dimension, Font, Insets, MenuShortcut, Point, Rectangle, File ... -- any of the 63 classes in jdk1.1.7a that override equals -- do a better job of representing. String and these other classes are not the exception. They are the rule, point, and whole purpose of having an equals method. Classes which have not overridden equals have a less meaningful definition. I use "equals" on classes which have overridden it much more often than on those which have not overridden it. If I want to know whether two are the same allocation, I will use "==". If I want to know if one successfully substitutes for the other without changing meaning, I use "equals". There can be ambiguity in judging what should be significant in the equals call, but it is not unreasonable to expect that the Java DOM binding might eventually specify some behavior here, which would not be the "==" comparison. Ray Whitmer From paul@prescod.net Mon Dec 14 19:03:23 1998 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Dec 1998 13:03:23 -0600 Subject: [XML-SIG] Re: [Zope] - Zope, DTML and XML References: <36743271.376A09A8@prescod.net> <36750B52.EE1EBC7D@digicool.com> Message-ID: <367560FB.632C3ED5@prescod.net> Jim Fulton wrote: > > Paul Prescod wrote: > > > > Of course Zope must eventually move into the XML world. Zope needs to do > > templates. > > It already does, via DTML. Right, but DTML code is not valid XML code. 
It can't be edited in an XML editor, stored in an XML repository, routed through XML-based workflow, etc. etc. > > XSL also does templates. > > I would have thought that XSL *was* a template mechanism. What do you mean > by "template"? XSL can be thought of as a template mechanism. But an XSL stylesheet has many templates and describes a flow of control between them, whereas DTML documents are a single template. > > In fact templates are almost as central to XSL as they are to Zope. > > I would say far more so, Fair enough. I meant to say that they are almost as central to XSL as they are to DTML. six of one... > > I would suggest that Zope should use > > XSL template syntax for DTML templates as far as is possible. > > It appears to me that DTML and XSL represent two very different > approaches to solving the same or similar problems. They are > both intended for generating text from objects. DTML generates text > from Python objects. XSL generates text from XML objects. Not quite. XSL generates XML objects (technically speaking, "nodes") from other XML objects (other nodes). > DTML takes a highly procedural approach. In DTML, you generate > text directly. In XSL (as I understand it) you specify a set of > rules for applying transformations to XML elements. This is fairly > declarative in nature. In the example you gave, you didn't render a > specific picture element. Instead, you have a rule for converting > picture elements to img tags. Right. But the same holds for DTML. You don't write DTML to generate an IMG tag for a specific picture. If you knew exactly what picture you wanted, you would use the HTML for it. You use DTML extensions when you want to figure out the picture to use at runtime, just like in XSL. I don't see this as a difference. > Another difference between DTML and XSL is in how content is determined. > DTML is typically used to define as well as format content. 
A DTML > document directly specifies data that is often extracted from large > object spaces. In XSL, it appears that the content is largely defined > by a source document and an XSL "template" simply specifies transformations. > Of course, an XSL specification can also filter, so there is some > ability to extract, but it is much less direct than with DTML. What you seem to be saying is that DTML works on large Python object-bases and XSL works on small XML document inputs. But that is a difference in degree, not in kind. I could encode a phonebook as a single XML document and use XSL to generate a list of all of the numbers in a particular zipcode. How is that different from using DTML in the same context to solve the same problem? The big difference, of course, is that XSL's set of expressions is quite limited whereas Python is quite flexible. That's why I propose using the same syntax but changing the expressions to be Python expressions. > Given the very different natures of DTML and XSL, I don't see much > point in making the syntaxes all that consistent. Do you have another XML-compliant syntax in mind or have you decided that XML compliance isn't critical? > It may very well. If Zope made it easy to generate XML from Zope (ie Python) > objects, then people who like XSL could apply XSL transformations to the > resulting XSL, bypassing DTML altogether. Sure, but how do I specify the objects that I want to work on from the XSL stylesheet? You can't [*] export the database as a single XML document, so you must allow a syntax that allows drilling into Python objects: Python syntax. [*] It is vaguely possible that un-extended XSL could work directly on a Zope database if we could express all Python objects as XML data... this requires more thought...but even so, you couldn't evaluate arbitrary Python code, you could only refer to preexisting objects. > In other words, I see XSL as an alternative to DTML, not another form of it. I don't really see the difference. 
Either an extended XSL replaces DTML or an XSL-syntax DTML replaces DTML. All I'm saying is that the next generation templating syntax should be XSL-based. > Or, DTML may turn out to be a good tool for generating XML from objects, and > then XSL could be applied to DTML output, in which case the two would > act in tandem. Why have two steps? It seems better to just use XSL syntax, either extended with Python expression syntax or not. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From hinsen@cnrs-orleans.fr Mon Dec 14 20:37:48 1998 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Mon, 14 Dec 1998 21:37:48 +0100 Subject: [XML-SIG] XML 0.5 problems Message-ID: <199812142037.VAA19848@dirac.cnrs-orleans.fr> I just tried to install the latest XML package release, to make sure that my XML exploration session planned for the Christmas vacation won't be spoiled by technical problems. And here they are. I did the test installation on an AIX 4.3 machine running Python 1.5.1. 1) At first try nothing looked right, and nothing worked. Some exploration revealed that my standard reflex of replacing Makefile.pre.in by my patched one was not such a good idea, because the XML package includes a modified version. I understand that this is the easiest way to handle installation, but it also presents problems: - Makefile.pre.in varies with Python versions - Some people need patched versions; for example, the standard version does not work for AIX. It wasn't much trouble for me to patch the file coming with XML to work with AIX, but only because I had also done the original patch. I recommend a more robust installation approach for the final release (perhaps a short Python script...) 2) I then tried some of the demos, again with little success. Some examples: cd unicode; python test.py Traceback (innermost last): File "test.py", line 1, in ? 
from xml.unicode import wstring ImportError: No module named unicode cd sax; python saxdemo.py Traceback (innermost last): File "saxdemo.py", line 5, in ? from xml.sax import saxexts, saxlib, saxutils ImportError: No module named sax Then I tried a few simple imports, with the result that I can import xml, but none of its subpackages, although all the directories exist and contain something that looks right. But all imports *do* work if the current directory is the xml-0.5 installation directory. I do have . in PYTHONPATH, which probably explains the difference. My conclusion: something is wrong with the installation! Happy bug hunting, Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From cowan@locke.ccil.org Mon Dec 14 20:45:48 1998 From: cowan@locke.ccil.org (John Cowan) Date: Mon, 14 Dec 1998 15:45:48 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <000101be252f$fa764c60$da39bfa8@arabbit> <13937.24366.729293.26105@weyr.cnri.reston.va.us> Message-ID: <367578FC.373DACD1@locke.ccil.org> Fred L. Drake wrote: > Perhaps there is no fully general equality that isn't identity? To be precise: Fully general equality (fge) for mutable objects is identity. Fge for immutable objects is the fge-ness of their parts, since indiscernible objects are identical (Leibniz's criterion). E.g. immutable strings are equal if their characters are equal, but (mutable) vectors are equal only if they are identical objects. (There are other definitions of equality, of course, but they are not general.) 
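[Editorial sketch] The rule stated above can be written down for a handful of Python built-in types. This is a sketch under assumptions, not anything from the XML package: the function name fge and the choice of which types count as immutable are made up for illustration. Immutable atoms compare by value, immutable tuples compare part by part, and everything else is treated as mutable, so only identity counts.

```python
# Types treated as immutable atoms in this sketch (an assumption, not a full list).
IMMUTABLE_ATOMS = (int, float, str, bytes, type(None))


def fge(a, b):
    """Fully general equality in the sense described above (illustrative)."""
    if type(a) is not type(b):
        return False
    if isinstance(a, IMMUTABLE_ATOMS):
        return a == b
    if isinstance(a, tuple):
        # Immutable container: equal when the parts are pairwise fge.
        return len(a) == len(b) and all(fge(x, y) for x, y in zip(a, b))
    # Anything else is assumed mutable: only the same object is equal.
    return a is b


vector = [1, 2]
same_object = fge(vector, vector)      # identical mutable object
equal_contents = fge([1, 2], [1, 2])   # distinct lists with equal contents
```

Python's own == on lists compares contents instead, which is the looser, state-dependent notion of equality that the thread contrasts with this one.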
-- John Cowan http://www.ccil.org/~cowan cowan@ccil.org You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5) From Fred L. Drake, Jr." References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <000101be252f$fa764c60$da39bfa8@arabbit> <13937.24366.729293.26105@weyr.cnri.reston.va.us> <367578FC.373DACD1@locke.ccil.org> Message-ID: <13941.31549.873183.1048@weyr.cnri.reston.va.us> John Cowan writes: > To be precise: Fully general equality (fge) for mutable objects is > identity. Fge for immutable objects is the fge-ness of their parts, > since indiscernable objects are identical (Leibniz's criterion). Leibniz? Wow, and to think I actually know the name! Shades of a day long past! (I first heard of Leibniz when I studied architecture, of all things!) I think this is just about where we've ended up on this one, but it is definitely stricter than is generally used for Python. Typically, two Python objects (let's take lists as an example) are considered equal if their contents are the same; equality of two objects is not considered to be an unchangeable characteristic. If I have two lists: a = [1, 2] b = [1, 2] they are considered equal now, but if I then do this: a.reverse() they are no longer equal. I think the biggest problem for doing this with DOM nodes is the issue of context: if the parents are different, the nodes should probably be considered different. Now, if I create two different nodes and insert equivalent data into each (say, character data nodes that contain equal data), I think they should compare equal. The problem is that this is not the interesting case in practice. What I *wanted* was less clearly a matter of equality, and more a matter of a particularly strong correspondence. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191 From Jeffrey@digicool.com Mon Dec 14 21:15:55 1998 From: Jeffrey@digicool.com (Jeffrey Shell) Date: Mon, 14 Dec 1998 16:15:55 -0500 Subject: [XML-SIG] RE: [Zope] - Re: [XML-SIG] Re: [Zope] - Zope, DTML and XML Message-ID: <613145F79272D211914B0020AFF6401901AD4A@gandalf.digicool.com> > Right, but DTML code is not valid XML code. It can't be > edited in an XML > editor, stored in an XML repository, routed through XML-based > workflow, > etc. etc. But using DTML you can generate any kind of XML that you want, and get the level of effectiveness that you are stating. And then that generated XML can be routed through all the XML based workflow that you want. I've done a couple of relatively small experiments with this, once using DTML over a sequence of Tabula records to generate a good size XML file to test an XML parser. So it was an XML expression of objects in the database, but using DTML allowed me to make the XML structure that I wanted independant of the structure of the Zope database or the Tabula database or what-have-you. And even that doesn't stop you from using XSL directly in Zope. On NT *twitch* with IE5 *twitch* you could make a Zope object to call Microsofts XSL processor. Then you could have a DTML page that goes over some sort of query to return an XML document linked to the XSL style sheet. Add another document that calls the XSL processor and passes in the rendered XML document, and Walla! > > > I would suggest that Zope should use > > > XSL template syntax for DTML templates as far as is possible. > > > > It appears to me that DTML and XSL represent two very different > > approaches to solving the same or similar problems. They are > > both intended for generating text from objects. DTML generates text > > from Python objects. XSL generates text from XML objects. > > Not quite. XSL generates XML objects (technically speaking, > "nodes") from > other XML objects (other nodes). Using XML to go through XML and generate XML? 
:) > The big difference, of course, is that XSL's set of > expressions is quite > limited where as Python is quite flexible. That's why I > propose using the > same syntax but changing the expressions to be Python expressions. Doesn't this kill off any sort of 'XSL portability'? I can imagine a system where both Zope and, say, Access *twitch* (independantly of Zope) could generate XML documents of the same or similar DTD and have the same XSL document(s) be able to render them both on an entirely different machine. > > Given the very different natures of DTML and XSL, I don't see much > > point in making the syntaxes all that consistent. > > Do you have another XML-compliant syntax in mind or have you > decided that > XML compliance isn't critical? It's easy to write some DTML to generate XML. There's a big piece of compliance right there. Currently, there's no XML on the intake side. I think this is a _far_ more important thing to do than spend a bunch of time writing yet another XSL parser. I would rather be able to generate that phone book file as XML and be able to upload it into Zope as intelligent-ish Zope objects (into a Tabula, as Zope Folders, or who-knows-what) and write a simple DTML document rather than the complex XSL involved. Then I can add the ability for people to add new phone numbers and modify their entries in this dataset and re-export it to a new updated XML file, and use some other XSL parser to generate a printable phone book, PDF, RTF, and HTML from that. The XML file can be easily done in DTML by: > > It may very well. If Zope made it easy to generate XML from > Zope (ie Python) > > objects, then people who like XSL could apply XSL > transformations to the > > resulting XSL, bypassing DTML altogether. > > Sure, but how do I specify the objects that I want to work on > from the XSL > stylesheet? 
You can't [*] export the database as a single XML > document, so > you must allow a syntax that allows drilling into Python > objects: Python > syntax. See above. There's a few variations that can be done. > I don't really see the difference. Either an extended XSL > replaces DTML or > an XSL-syntax DTML replaces DTML. All I'm saying is that the next > generation templating syntax should be XSL-based. XSL is _much much much_ tougher for beginners to Grok. It's very very powerful, yes, but sometimes just to do a simple tabular based report in it is waaaay too much of a headache. We discussed this a long time ago here at digicool with just an XML based replacement for DTML (XSL was just barely off the drawing board at the time of these discussions). I still emphasize that it's (a) not that hard to generate complient XML using DTML (the DTML document itself doesn't have to be XML complient, just the document as rendered), and (b) importing XML should be a bigger priority. just my .02. From jday@csihq.com Mon Dec 14 21:20:14 1998 From: jday@csihq.com (John Day) Date: Mon, 14 Dec 1998 16:20:14 -0500 Subject: [XML-SIG] Normalized AttVals Message-ID: <3.0.1.32.19981214162014.006a5290@mail.csihq.com> Forgive my ignorance of Python and the XML standards, but I am confused by the behavior of pyexpat. Re: quoted attribute contents ("AttVal") When '>' is encountered e.g. it is "normalized" to '>', however, when '&' is encountered it is a fatal error e.g. Is this pyexpat behavior correct? Why can't the parser tell that '&b' above is _not_ a defined entity because it is not terminated by ';'? It seems to me that this usage could be normalized to '&b', just like pyexpat did for '>'. Then it would be backward compatible with HTML (sort of). The impact of this seems to be enormous. All of the existing HTML parameter generators will have to change the way they post arguments, when HTML is replaced by XML, right? 
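The behaviour described above can be reproduced with the expat binding in today's Python standard library (`xml.parsers.expat`, the module that pyexpat provides). This is a minimal sketch; the helper name `parse_attributes` and the sample attribute values are hypothetical and are not the examples from the original message.

```python
import xml.parsers.expat

def parse_attributes(document):
    """Return the attribute dict of every start tag, in document order."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(document, True)  # True: this is the final chunk of input
    return seen

# An escaped ampersand is expanded to a plain "&" in the reported value.
escaped = parse_attributes('<a href="query?x=1&amp;b=2"/>')

# A literal ">" is allowed inside an attribute value.
literal_gt = parse_attributes('<a href="a>b"/>')

# A bare "&" that does not start a well-formed reference is a fatal error.
try:
    parse_attributes('<a href="query?x=1&b=2"/>')
    bare_ampersand_accepted = True
except xml.parsers.expat.ExpatError:
    bare_ampersand_accepted = False

print(escaped, literal_gt, bare_ampersand_accepted)
```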
-jday From michael@graphion.com Mon Dec 14 21:25:10 1998 From: michael@graphion.com (Michael Sanborn) Date: Mon, 14 Dec 1998 13:25:10 -0800 Subject: [XML-SIG] Re: New to Python OO Message-ID: <36758235.57BC4FBE@graphion.com> Fred L. Drake writes: > There are two questions that need to be addressed here: 1) How > should all this work, and 2) how to make it work now. > Let's start with the second question, since it's easier. This is > an > approach I've used to write out an ESIS stream, so I can claim it > works. Write the transform you want as a function (or maybe an > object, if that's more conventient for state management), and pass > the > document to it. It just needs to walk the tree and handle each node > type appropriately. Yes, this gets me over the hump just fine, thanks. I'm now able to write out the result of SQL queries as XML and then, with only a few lines of additional code (subclassing Walker), alternatively write it out in my company's proprietary typesetting markup. And this after less than a month's acquaintance with Python. I think I'm in love! When I have a little more time, I'll also look at Gabe Wachob's Visitor class (recently posted to this list), to see if I can also do it the way it 'should' be done. :-) Thanks for everything. Michael Sanborn Graphion Typesetting From akuchlin@cnri.reston.va.us Mon Dec 14 21:51:01 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 14 Dec 1998 16:51:01 -0500 (EST) Subject: [XML-SIG] Normalized AttVals In-Reply-To: <3.0.1.32.19981214162014.006a5290@mail.csihq.com> References: <3.0.1.32.19981214162014.006a5290@mail.csihq.com> Message-ID: <13941.34375.768950.944498@amarok.cnri.reston.va.us> John Day writes: >Re: quoted attribute contents ("AttVal") >When '>' is encountered e.g. it is "normalized" >to '>', however, when '&' is encountered it is a fatal >error e.g. > >Is this pyexpat behavior correct? 
Why can't the parser tell that >'&b' above is _not_ a defined entity because it is not terminated >by ';'? It seems to me that this usage could be normalized to >'&b', just like pyexpat did for '>'. Then it would be backward >compatible with HTML (sort of). Actually, the fact that the above HTML href works is an artifact of the error recovery in HTML parsers; you really are supposed to write . There were some lengthy threads about this in comp.infosystems.www.authoring.html a few months ago, when someone found that in "a=1&section=4", their browser was picking up &sect and turning it into a character, which made the link not behave as expected. I think the XML community wishes to avoid depending on error recovery in this way, because it leads to the same pit that HTML fell into. HTML parsers were really forgiving of invalid HTML, so few authors bothered to check whether their HTML was valid, so you could never, ever switch to using a stricter parser because so little of the HTML in existence would be accepted by it. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ And Herakles was full of it. He just got dead drunk for a couple of weeks in Phrygia and told everyone he'd been to the land of the dead. -- Death, in SANDMAN: "The Song of Orpheus" From akuchlin@cnri.reston.va.us Mon Dec 14 21:55:47 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 14 Dec 1998 16:55:47 -0500 (EST) Subject: [XML-SIG] XML 0.5 problems In-Reply-To: <199812142037.VAA19848@dirac.cnrs-orleans.fr> References: <199812142037.VAA19848@dirac.cnrs-orleans.fr> Message-ID: <13941.35069.861693.617350@amarok.cnri.reston.va.us> Konrad Hinsen writes: > - Some people need patched versions; for example, the standard > version does not work for AIX. What's the patch that's required for AIX? And is there some reason it can't be rolled into the Makefile.pre.in for 1.5.2? >2) I then tried some of the demos, again with little success.
Some > examples: > > cd unicode; python test.py > Traceback (innermost last): > File "test.py", line 1, in ? > from xml.unicode import wstring > ImportError: No module named unicode Are you getting this error after you've installed the package under site-packages? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Americans are benevolently ignorant about Canada, while Canadians are malevolently well informed about the United States. -- J. Bartlett Brebner From cowan@locke.ccil.org Mon Dec 14 21:56:38 1998 From: cowan@locke.ccil.org (John Cowan) Date: Mon, 14 Dec 1998 16:56:38 -0500 Subject: [XML-SIG] RE: Equality tests on DOM nodes References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <000101be252f$fa764c60$da39bfa8@arabbit> <13937.24366.729293.26105@weyr.cnri.reston.va.us> <367578FC.373DACD1@locke.ccil.org> <13941.31549.873183.1048@weyr.cnri.reston.va.us> Message-ID: <36758996.B9842B78@locke.ccil.org> Fred L. Drake wrote: > Typically, > two Python objects (let's take lists as an examples) are considered > equal if their contents are the same; equality of two objects is not > considered to be an unchangable characteristic. The trouble with that scheme is that it makes equality hard to reason about. Intuitively, we expect equality to be transitive, (if a = b and b = c then a = c), reflexive (a = a), and symmetrical (if a = b then b = a). Making equality depend on mutable properties defeats this: a might = b at one time, but a later check for b = a might fail. > a.reverse() I presume this is a *destructive* reverse (leaves a reversed)? -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5) From Michael.Scharf@gmx.de Mon Dec 14 21:56:11 1998 From: Michael.Scharf@gmx.de (Michael Scharf) Date: Mon, 14 Dec 1998 22:56:11 +0100 Subject: [XML-SIG] Q: which XML would you recommend? 
Message-ID: <3675897B.D926F51@gmx.de> I need a Christmas present for myself ;-) Today I was in the bookstore looking for a XML book. There are very many (some have ~1000 pages?!)! What I am looking for is a Python-Tutorial/O'Reiley style book. Something for someone who knows programming and HTML and a bit of SGML. No XML for dummies with 10 pages explaining what means. Also nothing that explains everything 'very theoretically' without any example. A practical introduction where I can start doing while I read (or imagining what and how I could do it). 100-200 pages would be best. Thanks for your help. Michael -- ''''\ Michael Scharf ` c-@@ TakeFive Software ` > http://www.TakeFive.com \_ V mailto:Michael_Scharf@TakeFive.co.at From akuchlin@cnri.reston.va.us Mon Dec 14 22:02:44 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 14 Dec 1998 17:02:44 -0500 (EST) Subject: [XML-SIG] Mac Python (CFM68K) won't import pyexpat In-Reply-To: References: Message-ID: <13941.35434.210358.532950@amarok.cnri.reston.va.us> Bruce Bennett writes: > Python 1.5.1 (#37, Apr 27 1998, 13:36:17) [CW CFM68K w/GUSI w/MSL] > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import sys > >>> import pyexpat > Traceback (innermost last): > File "", line 1, in ? > ImportError: PythonCore--PySys_WriteStderr: > A fragment had "hard" unresolved imports. Are you using one of the pre-releases of the xml-0.5 package? PySys_WriteStderr is a C function that was added after 1.5.1; this problem was fixed in the final release by adding a private version of PySys_WriteStderr. Possibly the #ifdef that enables the private version is wrong, or perhaps you have one of the prereleases of the code. (Let me know what you find...) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ A slovenly action repeated thrice has become a habit. -- Robertson Davies, _Leaven of Malice_ (?) 
From jim.fulton@Digicool.com Mon Dec 14 22:13:02 1998 From: jim.fulton@Digicool.com (Jim Fulton) Date: Mon, 14 Dec 1998 17:13:02 -0500 Subject: [XML-SIG] Re: [Zope] - Zope, DTML and XML References: <36743271.376A09A8@prescod.net> <36750B52.EE1EBC7D@digicool.com> <367560FB.632C3ED5@prescod.net> Message-ID: <36758D6E.A925B0FF@digicool.com> Paul Prescod wrote: > > Jim Fulton wrote: > > > > Paul Prescod wrote: > > > > > > Of course Zope must eventually move into the XML world. Zope needs to do > > > templates. > > > > It already does, via DTML. > > Right, but DTML code is not valid XML code. It can't be edited in an XML > editor, stored in an XML repository, routed through XML-based workflow, > etc. etc. Is that important? Python isn't valid XML code either, but it's still useful. I think it would be useful if there was an XML-compatible syntax for DTML, but I don't see that having much to do with XSL. The difference between XSL and DTML run far deeper than syntax. (I had a similar discussion with some folks a while back wrt ASP and DTML. On the surface, DTML and ASP are similar, but the semantics are really very different.) > > > XSL also does templates. > > > > I would have thought that XSL *was* a template mechanism. What do you mean > > by "template"? > > XSL can be thought of as a template mechanism. But an XSL stylesheet has > many templates and describes a flow of control between them, whereas DTML > documents are a single template. OK, OK, what ever. You know alot more about XSL that I do. :) > > > In fact templates are almost as central to XSL as they are to Zope. > > > > I would say far more so, > > Fair enough. I meant to say that that they are almost as central to XSL as > they are to DTML. six of one... > > > > I would suggest that Zope should use > > > XSL template syntax for DTML templates as far as is possible. > > > > It appears to me that DTML and XSL represent two very different > > approaches to solving the same or similar problems. 
They are > > both intended for generating text from objects. DTML generates text > > from Python objects. XSL generates text from XML objects. > > Not quite. XSL generates XML objects (technically speaking, "nodes") from > other XML objects (other nodes). Ditto. > > DTML takes a higly procedural approach. In DTML, you generate > > text directly. In XSL (as I understand it) you specify a set of > > rules for applying transformations to XML elements. This is fairly > > declarative in nature. In the example you gave, you didn't render a > > specific picture element. Instead, you have a rule for converting > > picture elements to img tags. > > Right. But the same holds for DTML. You don't write DTML to generate an > IMG tag for a specific picture. Often you do. Or at least, you typically start out with a relatively specific thing. For example, an in tag is applied to a specific collection or to the results of a specific call (e.g. a database query). Then, code is applied to elements within the collection. > If you knew exactly what picture you > wanted, you would use the HTML for it. You use DTML extensions when you > want to figure out the picture to use at runtime, just like in XSL. I > don't see this as a difference. XSL is rule-based. You don't say "interate over this and within this iteration output X and then output Y". In XSL (speaking as someone pretty ignorant of XSL ;) you say things like "if you see a Foo, convert it to a bar ....". It's like the difference between Python and Prolog (or, uh, sendmail.cf ... sorry, low blow revealing XSL skepticism ;). > > Another difference between DTML and XSL is in how content is determined. > > DTML is typically used to define as well as format content. A DTML > > document directly specifies data that is often extracted from large > > object spaces. In XSL, it appears that the content is largely defined > > by a source document and an XSL "template" simply specifies transformations. 
> > Of course, an XSL specification can also filter, so there is some > > ability to extract, but it is much less direct than with DTML. > > What you seem to be saying is that DTML works on large Python object-bases > and XSL works on small XML document inputs. DTML makes calls into a large object base, typically pulling out a small subset. XSL on the other hand seems to be geared toward transforming a body of data. > But that is a difference in > degree, not in kind. It doesn't feel like the same sort of thing to me. Perhaps I'm just too ignorant of XSL. > I could encode a phonebook as a single XML document > and use XSL to generate a list of all of the numbers in a particular > zipcode. How is that different from using DTML in the same context to > solve the same problem? It's not different. I think DTML and XML problem spaces definately overlap. In fact, I'd be happy to drop the argument that the problem spaces are different. I still think the approaches are too different to make it worthwhile to try to turn one into the other. > The big difference, of course, is that XSL's set of expressions is quite > limited where as Python is quite flexible. That's why I propose using the > same syntax but changing the expressions to be Python expressions. > > > Given the very different natures of DTML and XSL, I don't see much > > point in making the syntaxes all that consistent. > > Do you have another XML-compliant syntax in mind or have you decided that > XML compliance isn't critical? I have a syntax in mind. But that seems to me to be beside the point. This discussion isn't really about syntax issues, is it? > > It may very well. If Zope made it easy to generate XML from Zope (ie Python) > > objects, then people who like XSL could apply XSL transformations to the > > resulting XSL, bypassing DTML altogether. > > Sure, but how do I specify the objects that I want to work on from the XSL > stylesheet? 
You can't [*] export the database as a single XML document, so > you must allow a syntax that allows drilling into Python objects: Python > syntax. If this is true, then you seem to be supporting my argument that one way that the two is different is that DTML is geared toward drilling into an object space while XSL is geared to transforming a body of data. > [*] It is vaguely possible that un-extended XSL could work directly on a > Zope database if we could express all Python objects as XML data... this > requires more thought...but even so, you couldn't evaluate arbitrary > Python code, you could only refer to preexisting objects. > > > In other words, I see XSL as an alternative to DTML, not another form of it. > > I don't really see the difference. Either an extended XSL replaces DTML or > an XSL-syntax DTML replaces DTML. Why must one replace the other? You don't believe that there should be only one programming language, do you? I think that the approaches of these two systems apeal to different users. > All I'm saying is that the next > generation templating syntax should be XSL-based. This is what you think. We'll have to agree to disagree. > > Or, DTML may turn out to be a good tool for generating XML from objects, and > > then XSL could be applied to DTML output, in which case the two would > > act in tandem. > > Why have two steps? OK, let's eliminate the XSL step. ;) Seriously, DTML and XSL have different strengths. Sometimes we combine DTML and Python, or even C, because DTML isn't good for everything. In fact, an idea that we are very fond of with Zope if that objects can have methods written in a multitude of languages (possibly by a multitude of people). Right now, it's not unusual to have objects with methods written in 4 different languages (Python, C, DTML, SQL). I'm perfectly happy to see XSL thrown into the mix. > It seems better to just use XSL syntax, either > extended with Python expression syntax or not. I don't agree. 
Of course, I'm happy to see people experiment. It doesn't sound to me like you want an XSL syntax for DTML. It sounds more to me like you want some sort of XSL processor in Zope (or just Python) that is extended to make calls into an object system. If you think you can adapt DTML to this somehow, go for it. I'll be interested to see what you come up with. Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gwachob@aimnet.com Mon Dec 14 22:43:31 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Mon, 14 Dec 1998 14:43:31 -0800 (PST) Subject: [XML-SIG] Re: New to Python OO In-Reply-To: <36758235.57BC4FBE@graphion.com> Message-ID: On Mon, 14 Dec 1998, Michael Sanborn wrote: > Fred L. Drake writes: > When I have a little more time, I'll also look at Gabe Wachob's Visitor > class (recently posted to this list), to see if I can also do it the way > it 'should' be done. :-) I'm reworking it constantly -- I've added a notion of "subtree value" to it -- the idea that an entire subtree can be visited and produce a string which represents its "value" (alternatively, visiting a subtree can produce side effects like populating a dictionary for use by another subtree). It's not terribly clean: the default behavior for a node's value is to take the node's "Value" as returned by the visit method called on that node and append to it the value of each of the node's children. This will basically "flatten" an XML file (the default value of a text node is the text itself -- other nodes' default values are "").
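The default "subtree value" rule described above can be sketched in a few lines. This is not the DOMVisitor.py code itself: it is an illustration written against `xml.dom.minidom` from the current standard library, and the class and method names (`FlatteningVisitor`, `visit`, `subtree_value`) are hypothetical.

```python
from xml.dom import minidom, Node

class FlatteningVisitor:
    def visit(self, node):
        """Return the node's own value: its text for text nodes, "" otherwise."""
        if node.nodeType == Node.TEXT_NODE:
            return node.data
        return ""

    def subtree_value(self, node):
        """Own value first, then the subtree value of each child, in order."""
        parts = [self.visit(node)]
        for child in node.childNodes:
            parts.append(self.subtree_value(child))
        return "".join(parts)

doc = minidom.parseString("<p>Hello, <b>DOM</b> world</p>")
flat = FlatteningVisitor().subtree_value(doc)
print(flat)
```

A subclass could override `visit` for particular node types to produce side effects, such as filling a dictionary, instead of returning text.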
Anyway, when I get the current version working I'll post it up at the same URL -- http://www.aimnet.com/~gwachob/DOMVisitor.py -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From paul@prescod.net Mon Dec 14 23:55:52 1998 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Dec 1998 17:55:52 -0600 Subject: [XML-SIG] Normalized AttVals References: <3.0.1.32.19981214162014.006a5290@mail.csihq.com> Message-ID: <3675A588.E7A7D999@prescod.net> John Day wrote: > > Re: quoted attribute contents ("AttVal") > When '>' is encountered e.g. it is "normalized" > to '>', however, when '&' is encountered it is a fatal > error e.g. That's what the XML spec says. AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" That means that "<" and "&" are never allowed in attribute values except as parts of an attribute reference. > Is this pyexpat behavior correct? Why can't the parser tell that > '&b' above is _not_ a defined entity because it is not terminated > by ';'? That's what full SGML does, but that's not what XML does. XML is supposed to be easier to implement. > It seems to me that this usage could be normalized to > '&b', just like pyexpat did for '>'. Then it would be backward > compatible with HTML (sort of). There are several ways that it isn't backwards compatible with HTML > The impact of this seems to be enormous. All of the existing HTML > parameter generators will have to change the way they post arguments, > when HTML is replaced by XML, right? This has been a known problem for a long time. 
http://www.uni-ulm.de/uni/fak/natwis/strudo/ampersand.html Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From SBEAKLEY@uact.edu Tue Dec 15 02:36:51 1998 From: SBEAKLEY@uact.edu (Sara Beakley) Date: Mon, 14 Dec 1998 19:36:51 -0700 Subject: [XML-SIG] unsubscribe Message-ID: This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------ =_NextPart_001_01BE27D3.C5D32D70 Content-Type: text/plain unsubscribe > ---------- > From: John Cowan[SMTP:cowan@locke.ccil.org] > Sent: Monday, December 14, 1998 2:56 PM > To: DOM List; xml-sig@python.org > Subject: Re: [XML-SIG] RE: Equality tests on DOM nodes > > Fred L. Drake wrote: > > > Typically, > > two Python objects (let's take lists as an examples) are considered > > equal if their contents are the same; equality of two objects is not > > considered to be an unchangable characteristic. > > The trouble with that scheme is that it makes equality hard to > reason about. Intuitively, we expect equality to be transitive, > (if a = b and b = c then a = c), reflexive (a = a), and symmetrical > (if a = b then b = a). Making equality depend on mutable properties > defeats this: a might = b at one time, but a later check for > b = a might fail. > > > a.reverse() > > I presume this is a *destructive* reverse (leaves a reversed)? > > -- > John Cowan http://www.ccil.org/~cowan cowan@ccil.org > You tollerday donsk? N. You tolkatiff scowegian? Nn. > You spigotty anglease? Nnn. You phonio saxo? Nnnn. > Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5) > ------ =_NextPart_001_01BE27D3.C5D32D70 Content-Type: text/html Content-Transfer-Encoding: quoted-printable unsubscribe
------ =_NextPart_001_01BE27D3.C5D32D70-- From gwachob@aimnet.com Tue Dec 15 07:05:27 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Mon, 14 Dec 1998 23:05:27 -0800 (PST) Subject: [XML-SIG] Parsers which include external parsed entities Message-ID: Are there any parsers out there which automatically include external parsed entities? I am building an app which has (to begin with) a list of items (urls) and a categorization breakdown of those items. I'd like to keep the items in a separate file from the categorizations and "glue" them together for purposes of the application in a third file by including external parsed entity references to those other xml files. I could parse them as separate files, but thats not "pretty" (but it probably is more efficient ;-) I've found that I have to pipe my xml files through SGMLNORM (which, ugh, upcases all my tags) to get this effect. Is there a technical reason why these parses DON'T automatically "include" externally parsed entities when producing SAX or ESIS (and then DOM) output? I know there is no requirement that the external entities be parsed, but are there parsers (written in Python or other languages) that you can force to include external parsed entities? -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From fredrik@pythonware.com Tue Dec 15 08:56:59 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 15 Dec 1998 09:56:59 +0100 Subject: [XML-SIG] RE: Equality tests on DOM nodes Message-ID: <00db01be2808$f0d01250$f29b12c2@pythonware.com> John Cowan wrote: > Fred L. 
Drake wrote: > >> Typically, >> two Python objects (let's take lists as an examples) are considered >> equal if their contents are the same; equality of two objects is not >> considered to be an unchangable characteristic. > >The trouble with that scheme is that it makes equality hard to >reason about. Intuitively, we expect equality to be transitive, >(if a = b and b = c then a = c), reflexive (a = a), and symmetrical >(if a = b then b = a). Making equality depend on mutable properties >defeats this: a might = b at one time, but a later check for >b = a might fail. Do your bank agree with you on this one? ("hey, I know there was $1000 on this account a week ago, and it's definitely the same account number!") (but sure, Python provides the "is" operator if you really want to test for object identity. Beginners seem to have trouble grasping that concept, though, so I doubt it qualifies as "intuitive"...) Cheers /F fredrik@pythonware.com http://www.pythonware.com From hinsen@cnrs-orleans.fr Tue Dec 15 09:48:25 1998 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Tue, 15 Dec 1998 10:48:25 +0100 Subject: [XML-SIG] XML 0.5 problems In-Reply-To: <13941.35069.861693.617350@amarok.cnri.reston.va.us> (akuchlin@cnri.reston.va.us) References: <199812142037.VAA19848@dirac.cnrs-orleans.fr> <13941.35069.861693.617350@amarok.cnri.reston.va.us> Message-ID: <199812150948.KAA13922@dirac.cnrs-orleans.fr> > > - Some people need patched versions; for example, the standard > > version does not work for AIX. > > What's the patch that's required for AIX? And is there some > reason it can't be rolled into the Makefile.pre.in for 1.5.2? I don't know the system well enough to decide. The problem is that shared library linking is a rather complicated process under AIX, which is handled by two shell scripts. These shell scripts come with the Python distribution (they are "ld_so_aix" and "makexp_aix") and are ultimately installed in the "config" subdirectory of the Python library. 
But during the compilation of the interpreter and its standard library modules, they reside in the "Modules" subdirectory of the Python distribution. The settings in the configuration reflect this initial situation, not the one after installation. So if you use the standard Makefile.pre.in, the two critical definitions becom LINKCC= $(srcdir)/makexp_aix python.exp "" $(LIBRARY); $(PURIFY) $(CC) LDSHARED= $(srcdir)/ld_so_aix $(CC) whereas they should be LINKCC= $(LIBPL)/makexp_aix $(LIBPL)/python.exp "" $(LIBRARY); $(PURIFY) $(CC) LDSHARED= $(LIBPL)/ld_so_aix $(CC) -bI:$(LIBPL)/python.exp I suppose this could be arranged during the installation process, but I don't really want to figure out how that works! > > cd unicode; python test.py > > Traceback (innermost last): > > File "test.py", line 1, in ? > > from xml.unicode import wstring > > ImportError: No module named unicode > > Are you getting this error after you've installed the package > under site-packages? Forget about this problem; I found out that I had a file xml.py somewhere else on my PYTHONPATH. I have no idea where it comes from, but deleting it didn't seem to have any negative effect. Sorry for the false alarm! Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From larsga@ifi.uio.no Tue Dec 15 10:27:55 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 15 Dec 1998 11:27:55 +0100 Subject: [XML-SIG] Parsers which include external parsed entities In-Reply-To: References: Message-ID: * Gabe Wachob | | Are there any parsers out there which automatically include external | parsed entities? xmlproc does, both in validating and well-formedness mode. 
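For comparison, the expat-based SAX driver in today's Python standard library can also be asked to include external parsed entities. This sketch does not use xmlproc; it assumes the `xml.sax` feature flag `feature_external_ges`, and the resolver class, the system identifier "items.xml", and the sample data are all hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

class InMemoryResolver(EntityResolver):
    """Hypothetical resolver: serves the entity text from memory, not from disk."""
    def resolveEntity(self, publicId, systemId):
        source = InputSource()
        source.setByteStream(io.BytesIO(b"<item>one</item>"))
        return source

class NameCollector(ContentHandler):
    """Records element names in the order their start tags are reported."""
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

collector = NameCollector()
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # include external general entities
parser.setEntityResolver(InMemoryResolver())
parser.setContentHandler(collector)

parser.feed('<!DOCTYPE list [\n')
parser.feed('  <!ENTITY items SYSTEM "items.xml">\n')
parser.feed(']>\n')
parser.feed('<list>&items;</list>')
parser.close()

print(collector.names)
```

The content handler sees the elements of the included entity as if they had been written inline in the referring document, which is the "glue" effect asked about.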
| I've found that I have to pipe my xml files through SGMLNORM (which, | ugh, upcases all my tags) to get this effect. Why not use SX instead? That shouldn't have the same problem. | Is there a technical reason why these parsers DON'T automatically | "include" externally parsed entities when producing SAX or ESIS (and | then DOM) output? Many parsers don't bother to parse the internal DTD subset and so don't have any entity information. For the rest I don't really know. --Lars M. From akuchlin@cnri.reston.va.us Tue Dec 15 13:42:50 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Tue, 15 Dec 1998 08:42:50 -0500 (EST) Subject: [XML-SIG] Q: which XML would you recommend? In-Reply-To: <3675897B.D926F51@gmx.de> References: <3675897B.D926F51@gmx.de> Message-ID: <13942.25794.548851.696945@amarok.cnri.reston.va.us> Michael Scharf writes: >Today I was in the bookstore looking for a XML book. There >are very many (some have ~1000 pages?!)! What I am looking >for is a Python-Tutorial/O'Reiley style book. Something for >someone who knows programming and HTML and a bit of SGML. No I've read the issue of O'Reilly's late _Web Journal_ about XML; it was a nice overview, but it's now outdated in many respects. Sean McGrath's _XML By Example_ is sitting in my to-read pile, but I haven't gotten around to it yet. If anyone has read other XML books, brief recommendations (or warnings) for the book page would be great... -- A.M. Kuchling http://starship.skyport.net/crew/amk/ "And are your Lord's lessons learned in *you*, Cannon?" "I am confident that I will pass through St. Peter's gates with only minor negotiations." -- The Sandman and the Cannon, in SANDMAN MYSTERY THEATRE: "The Cannon", act IV From Fred L. Drake, Jr." 
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us> <000101be252f$fa764c60$da39bfa8@arabbit> <13937.24366.729293.26105@weyr.cnri.reston.va.us> <367578FC.373DACD1@locke.ccil.org> <13941.31549.873183.1048@weyr.cnri.reston.va.us> <36758996.B9842B78@locke.ccil.org> Message-ID: <13942.29753.814023.621356@weyr.cnri.reston.va.us> John Cowan writes: > The trouble with that scheme is that it makes equality hard to > reason about. Intuitively, we expect equality to be transitive, > (if a = b and b = c then a = c), reflexive (a = a), and symmetrical > (if a = b then b = a). Making equality depend on mutable properties > defeats this: a might = b at one time, but a later check for > b = a might fail. That is correct. This is very important for the programmer to know about, and is a real consideration when designing a class for which equality or ordering is an important issue. This is one reason why many Python programmers use a minimalist approach for immutable data: it's clear that a particular value will not change underneath you. However, I don't think comparison of mutable objects is necessarily a significant problem. I think most programmers expect equality of objects to be meaningful only at the moment the comparison is made; any longevity of the result depends on the specific guarantees made by that object. > > a.reverse() > > I presume this is a *destructive* reverse (leaves a reversed)? Yes, that's exactly how the list .reverse() method operates. I think we're sufficiently off-topic; we can move this to personal email or some other forum if you wish to continue. The topic is interesting. This might be good for comp.lang.python. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr.
Reston, VA 20191 From db@Eng.Sun.COM Tue Dec 15 17:42:09 1998 From: db@Eng.Sun.COM (David Brownell) Date: Tue, 15 Dec 1998 09:42:09 -0800 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <000701be25e4$e3826f60$5839bfa8@arabbit> <36755507.F8657565@imall.com> Message-ID: <36769F71.A90D8C2A@eng.sun.com> On the general topic of "equality", I hope that it's clear to everyone that there are almost innumerable definitions of the notion based on the particular task being performed ... don't go hoping for a single universal "always useful" definition!!! Ray Whitmer wrote: > > it is not unreasonable to expect that the Java DOM binding > might eventually specify some behavior here, which would not be the "==" > comparison. Though there's one thing to consider: The behavior of Object.equals() and Object.hashCode() is specified to make objects work as hashtable keys in the natural manner. For example, strings can be used as keys since they're immutable and equals() is overridden ... were they mutable, or did they not override equals(), that'd not be so. If org.w3c.dom.Node.equals(Object) were defined to invoke the DOM method equals(Node, true) then when a node was changed, it'd need to get moved to a different location in any hashtable. For the moment, I have a hard time seeing any better implementations of Object.equals() and Object.hashCode() for DOM nodes than the default! - Dave From gwachob@aimnet.com Tue Dec 15 18:44:59 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Tue, 15 Dec 1998 10:44:59 -0800 (PST) Subject: [XML-SIG] Parsers which include external parsed entities In-Reply-To: Message-ID: On 15 Dec 1998, Lars Marius Garshol wrote: > > * Gabe Wachob > | > | Are there any parsers out there which automatically include external > | parsed entities? > > xmlproc does, both in validating and well-formedness mode. I'm having problems parsing this with all of the python xml parsers: ] > &linklist; Is my brain mush? Whats wrong with this? 
I get no errors, but I also get no DOM tree. Is this a problem with the XML here, the parser, or the DOM builder? If I try to parse a more "vanilla" XML file, I get a DOM tree just fine: This is head text This is body text > | I've found that I have to pipe my xml files through SGMLNORM (which, > | ugh, upcases all my tags) to get this effect. > > Why not use SX instead? That shouldn't have the same problem. It does have the same problem (upcasing). XML is case sensitive, while SGML is not -- SX assumes it the incoming data is SGML and therefore ignores the case of the incoming element tags text. Is there an option for SX to behave case sensitively? (as an aside, I wish they had named it something besides SX, since the xmodem protocol handler also has a binary named sx) -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From spepping@scaprea.hobby.nl Mon Dec 14 19:24:23 1998 From: spepping@scaprea.hobby.nl (Simon Pepping) Date: Mon, 14 Dec 1998 20:24:23 +0100 (MET) Subject: [XML-SIG] Installing XML package Message-ID: Hello, When installing the Python XML package, I encountered the following difficulties: - make install did not work, because the site-packages directory did not yet exist. I think the installation should check for this. - Many py files in the parsers directory had mixed tabs/spaces. This is awkward when exchanging files; e.g., in my settings tab = 4 spaces, so that the first and second level indentations were identical. I hope this helps. 
Simon Pepping email: spepping@scaprea.hobby.nl From paul@prescod.net Tue Dec 15 22:02:56 1998 From: paul@prescod.net (Paul Prescod) Date: Tue, 15 Dec 1998 16:02:56 -0600 Subject: [XML-SIG] WDDX for Python References: Message-ID: <3676DC90.2AD41AA2@prescod.net> Simeon, I am looking into the development of the Python binding for WDDX as I said I might a few weeks ago. I'm cc:ing the Python xml-sig because they might be interested. I'm not entirely happy with the logical level of WDDX. My problem is I can't easily understand when I would use WDDX. Information passing situations seem to fall under two wide categories. Either we have a negotiated format (i.e. packet template) or we do not. If we DO, then why do we want to tag things ..., ... etc. Can't we infer the types of things from our pre-negotiated template? If we DO NOT, then wouldn't it be useful to be able to linearize objects of *named types* instead of only primitive types? i.e. Using the TYPE attribute, we could look up the constuctor for the appropriate type and invoke it. The problem is that Python programmers seldom work with data structures made of primitive and compound types. Rather they work with structures of objects. If you can't encode and decode those easily then you haven't made the job of encoding data structures much easier. We could encode objects as structs, but then their type gets lost so that they cannot be rebuilt "on the other end." Maybe you could help me to understand a typical usage situation. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From simeons@allaire.com Tue Dec 15 23:02:30 1998 From: simeons@allaire.com (Simeon Simeonov) Date: Tue, 15 Dec 1998 18:02:30 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <009e01be287e$fe0b3410$7315b5cd@ssimeonov.allaire.com> Hi, Paul! It's good to hear from you. 
My comments are intersperced below: >I'm not entirely happy with the logical level of WDDX. My problem is I >can't easily understand when I would use WDDX. Think of WDDX as the epitome of the 80/20 rule. It tries to provide a solution for 80% of the meaningful data exchange problems with 20% of the effort. The 80% that WDDX focuses on involve the easy and efficient exchange of complex structured _data_ (not objects) between different language platforms. So far, WDDX can be used with C++, Java, COM (VBScript, ASP, Delphi, PowerBuilder, etc.), ColdFusion, JavaScript, and Perl. WDDX is particularly well-suited for use on the Web because its XML data format can be easily transported over HTTP. Example apps: - At Allaire we use WDDX to exchange data between the ColdFusion Application Server and the ColdFusion Studio remote debugger. - Some big public ecommerce and content providers are working on WDDX-enabling their sites to expose data for application use. - Try this URL for another cool example of WDDX use: http://forums.allaire.com/Forums/Index.cfm?CFApp=49&Message_ID=225377 >Information passing situations seem to fall under two wide categories. >Either we have a negotiated format (i.e. packet template) or we do not. > >If we DO, then why do we want to tag things ..., >... etc. Can't we infer the types of things from our >pre-negotiated template? > I agree. We fall in the latter category. >If we DO NOT, then wouldn't it be useful to be able to linearize objects >of *named types* instead of only primitive types? i.e. > > > > > > > > >Using the TYPE attribute, we could look up the constuctor for the >appropriate type and invoke it. The problem is that Python programmers >seldom work with data structures made of primitive and compound types. >Rather they work with structures of objects. When you want to reach such a wide audience you have to make concessions. 
In particular, we had to decide that we couldn't exchange objects because some of the target languages have no notion of such. >If you can't encode and >decode those easily then you haven't made the job of encoding data >structures much easier. I would disagree with you here... How can a Python app exchange data with an ecommerce app written in ColdFusion? Or a book browser that's written in Perl? Or with Microsoft Word? How can it send a recordset and a three dimensional array to a web browser where these data can be used to build cool DHTML UI? The core problem of cross-language data exchange is very difficult. WDDX offers you one way to talk to a _huge_ audience of applications. It is not perfect, but it is far better than the "roll-your-own" approach. >We could encode objects as structs, but then their type gets lost so that >they cannot be rebuilt "on the other end." This is correct. It will be easy to work with objects in Python and encode them as structs using something like the dynamic serialization shown by the JavaScript serializer. And, yes, it is not easy to wrap objects around the data returned by a deserializer. Probably the easiest way to do this is to build an object factory for particular types of WDDX packets and apply it on the result of the deserialization. Whether this will be worth doing depends on your application. >Maybe you could help me to understand a typical usage situation. Bottom line: WDDX is not a solution for python-python object serialization. It can, however, open python apps up and let them communicate with a _huge_ number of other applications. Hope this help. Stay in touch. 
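The object-factory idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for whatever a WDDX deserializer would return for a struct, and Point, to_struct, FACTORIES and from_struct are hypothetical names, not part of any WDDX SDK.

```python
class Point:
    """Hypothetical application class, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def to_struct(obj):
    """Flatten an object into a struct-like dict; the class name is lost."""
    return dict(vars(obj))


# Hypothetical registry: kind of packet -> function that rebuilds the object.
FACTORIES = {
    "point": lambda struct: Point(struct["x"], struct["y"]),
}


def from_struct(kind, struct):
    """Apply the factory registered for this kind of packet."""
    return FACTORIES[kind](struct)


struct = to_struct(Point(1, 2))         # a plain dict, no type information
rebuilt = from_struct("point", struct)  # a Point again, thanks to the factory
print(type(rebuilt).__name__, rebuilt.x, rebuilt.y)
```

The receiving side has to know out of band which kind of packet it is handling; that is exactly the information a struct alone does not carry.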
Regards, Sim Allaire From gwachob@aimnet.com Wed Dec 16 00:13:32 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Tue, 15 Dec 1998 16:13:32 -0800 (PST) Subject: [XML-SIG] WDDX for Python In-Reply-To: Message-ID: Hi folks- In response to this request, I put together a Deserializer (there are some issues in serializing that I didn't want to address yet) for WDDX data into a python object. One question I had is this: In the DTD, you show that a data element can contain one or more of any of the data types plus recordset/struct/array. Does this mean that this is a valid XML fragment: 43 ... I made the assumption that is was, so in my deserialization, I create an object WDDXObject which contains an array items -- in the previous case the array would contain a number as its first element, and the struct object (WDDXStruct) as its second element. If data has more than one child, then how do you refer to each child if you don't implement the deserialization the way I do with an array as the "top level" child of the deserialized object (I ask because I didn't want to do it this way, but I couldn't think of another simple way of doing it). What if you have two structs with two element variables with the same name? So, anyway, my deserializer fully implements the DTD and the spec as far as I understand it. It does not parse the timeDate type (I could throw it in a wrapper object with nice methods and all). The URL is http://www.aimnet.com/~gwachob/software.html It uses my current rev of my DOMVisitor.py Everything is not well tested, and in fact, may not be the most efficient. However, here it is... -Gabe On Sun, 13 Dec 1998, Jeremy Allaire wrote: > Hello folks- > > I'm interesting in engaging anyone/everyone from the Python community to > work with us on a WDDX platform module for Python. With the help of a few > developers, we've been able to muster/ship WDDX modules for ASP/COM, Java, > ColdFusion, Perl and JavaScript, and would love to see a Python > implementation. 
> > Given the recent XML release for Python, seems like it would be a great > project to make cross-language distributed web applications even more > possible. > > Take a visit to www.WDDX.org, and most importantly take a view of the SDK, > developed by Nate Weiss, which brings it all together with all of the above > languages. > > Best and regards, > Jeremy Allaire > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From JackUnger@aol.com Wed Dec 16 04:43:44 1998 From: JackUnger@aol.com (JackUnger@aol.com) Date: Tue, 15 Dec 1998 23:43:44 EST Subject: [XML-SIG] (no subject) Message-ID: In a message dated 12/15/98 7:56:59 AM Central Standard Time, akuchlin@cnri.reston.va.us writes: << Michael Scharf writes: >Today I was in the bookstore looking for a XML book. There >are very many (some have ~1000 pages?!)! What I am looking >for is a Python-Tutorial/O'Reiley style book. Something for >someone who knows programming and HTML and a bit of SGML. No I've read the issue of O'Reilly's late _Web Journal_ about XML; it was a nice overview, but it's now outdated in many respects. Sean McGrath's _XML By Example_ is sitting in my to-read pile, but I haven't gotten around to it yet. If anyone has read other XML books, brief recommendations (or warnings) for the book page would be great... >> One comment on Steven Holzner's XML Complete. Its Java based and when I got last March there were already problems with the code. 
It was based on an early version of MS Parser for Java and by the time I got the book JDK 1.1 and a corresponding version of the MS Parser were in place and most of the code in the book didn't work. I haven't revisited the book with Python in hand to translate and see if it works. Back to mode. 8^) Jack Ungerleider From simeons@allaire.com Wed Dec 16 14:40:04 1998 From: simeons@allaire.com (Simeon Simeonov) Date: Wed, 16 Dec 1998 09:40:04 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <00ca01be2901$f824a5d0$7315b5cd@ssimeonov.allaire.com> Hi, Gabe! Great work! > One question I had is this: > >In the DTD, you show that a data element can contain one or more of any of >the data types plus recordset/struct/array. Does this mean that this is a >valid XML fragment: > > >43 > >... > > Nope, it does not. There was "bug" in the DTD. The content of the data element had a *. It really should have one and only one child element. The version on the site must not have been updated. I'll make sure it is. Regards, Sim Allaire From akuchlin@cnri.reston.va.us Wed Dec 16 15:20:47 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 16 Dec 1998 10:20:47 -0500 (EST) Subject: [XML-SIG] WDDX for Python In-Reply-To: References: Message-ID: <13943.53067.284156.866042@amarok.cnri.reston.va.us> Gabe Wachob writes: >Hi folks- > In response to this request, I put together a Deserializer (there >are some issues in serializing that I didn't want to address yet) for WDDX >data into a python object. Neat! FYI, I've also been working on marshalling a bit, trying to produce a generic Python-to-XML-marshalling class that can be subclassed to implement a specific format like WDDX or XML-RPC. It's too early to report any results, since I haven't actually implemented unmarshalling yet, and the code hasn't been added to the CVS tree. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Things in Python are very clear, but are harder to find than the secrets of wizards. 
Things in Perl are easy to find, but look like arcane spells to invoke magic. -- Mike Meyer, 6 Nov 1997 From larsga@ifi.uio.no Wed Dec 16 15:29:51 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 16 Dec 1998 16:29:51 +0100 Subject: [XML-SIG] Parsers which include external parsed entities In-Reply-To: References: Message-ID: * Gabe Wachob | | I'm having problems parsing this with all of the python xml parsers: | | | | | ] | > Congratulations! You've found a bug in xmlproc! It turns out that this way to end the internal DTD subset is well-formed after all. I'll fix this now so that it will work with the next release (which shouldn't be too far off). Meanwhile, just change ']\n>' to ']>' and it should work. | Is my brain mush? Don't think so. I'm more worried about mine... :) | I get no errors, but I also get no DOM tree. This is probably because you don't set any errorhandler so the errors are just silently swallowed. saxutils.ErrorPrinter is handy if you want one that simply prints the error messages. | XML is case sensitive, while SGML is not -- SX assumes it the | incoming data is SGML and therefore ignores the case of the incoming | element tags text. I should have guessed that, of course. | Is there an option for SX to behave case sensitively? In a sense, yes. If you use an SGML declaration where you set element type names to be case sensitive it shouldn't behave in this way. Another trick you can try is jade with an identity-transform DSSSL stylesheet like: jade -d id.dsl -t xml mydoc.sgml and the stylesheet: (default (make element)) You'll lose comments and PIs. You'll also lose attributes, but it shouldn't be too hard to write a little snippet that puts them in, using queries on (current-node). Don't have time to put that together now, unfortunately. --Lars M. 
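The advice about installing an error handler can be sketched with today's standard library. The assumptions are loud ones: the modern xml.sax module stands in for the 1998 SAX package (saxutils.ErrorPrinter is not used here), the class and function names are made up for the example, and because the original test document did not survive in the archive, a small internal entity is used instead of an external one. Note also that the modern default handler raises on fatal errors rather than swallowing them.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class ElementNames(ContentHandler):
    """Record the name of every element that is started."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class CollectingErrorHandler(ErrorHandler):
    """Keep every problem the parser reports instead of losing it."""

    def __init__(self):
        self.problems = []

    def warning(self, exception):
        self.problems.append(("warning", str(exception)))

    def error(self, exception):
        self.problems.append(("error", str(exception)))

    def fatalError(self, exception):
        self.problems.append(("fatal", str(exception)))
        raise exception


def parse(data):
    """Parse bytes; return (element names seen, problems reported)."""
    content, errors = ElementNames(), CollectingErrorHandler()
    try:
        xml.sax.parseString(data, content, errors)
    except xml.sax.SAXParseException:
        pass  # already recorded by the error handler
    return content.names, errors.problems


# Internal DTD subset closed with "]", a newline, then ">", as in the report.
GOOD = b'<!DOCTYPE doc [\n<!ENTITY linklist "<link/>">\n]\n>\n<doc>&linklist;</doc>'
BAD = b"<doc><unclosed></doc>"

good_names, good_problems = parse(GOOD)
bad_names, bad_problems = parse(BAD)
print(good_names, good_problems)
print(bad_names, [kind for kind, _ in bad_problems])
```

With a handler like this attached, a document that yields no tree at least yields a message saying why.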
From gwachob@aimnet.com Wed Dec 16 16:06:54 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Wed, 16 Dec 1998 08:06:54 -0800 (PST) Subject: [XML-SIG] WDDX for Python In-Reply-To: <13943.53067.284156.866042@amarok.cnri.reston.va.us> Message-ID: On Wed, 16 Dec 1998, Andrew M. Kuchling wrote: > Gabe Wachob writes: > >Hi folks- > > In response to this request, I put together a Deserializer (there > >are some issues in serializing that I didn't want to address yet) for WDDX > >data into a python object. > > Neat! FYI, I've also been working on marshalling a bit, trying > to produce a generic Python-to-XML-marshalling class that can be > subclassed to implement a specific format like WDDX or XML-RPC. It's > too early to report any results, since I haven't actually implemented > unmarshalling yet, and the code hasn't been added to the CVS tree. If you get this done, I know there are people at the casbah project who might want to use such a thing for LDO (their lightweight distributed object) component. I discussed LDO with Ken MacLeod, and there are some thorny issues that I'm sure if you haven't run across you may (my memory on the specific issues are cloudy). Anyway the Casbah URL is http://www.ntlug.org/casbah -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." 
-- James Madison import std.disclaimer From ray@imall.com Wed Dec 16 16:45:18 1998 From: ray@imall.com (Ray Whitmer) Date: Wed, 16 Dec 1998 09:45:18 -0700 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <000701be25e4$e3826f60$5839bfa8@arabbit> <36755507.F8657565@imall.com> <36769F71.A90D8C2A@eng.sun.com> Message-ID: <3677E39E.8BAECF6E@imall.com> David Brownell wrote: > Though there's one thing to consider: The behavior of Object.equals() > and Object.hashCode() is specified to make objects work as hashtable > keys in the natural manner. For example, strings can be used as keys > since they're immutable and equals() is overridden ... were they mutable, > or did they not override equals(), that'd not be so. That is a sad but true that Hashtable influenced the implementation of Object. Equals is problematic in Object's API because of its ambiguity, but about every other language seems to do something similarly ambiguous. You raise a connection between equals and immutability that I generally tend to overlook as nonessential. There are plenty of other examples in the jdk that also overlook it that I cited before (like Point or Rectangle), again demonstrating the ambiguity of the interpretation of equals, which I think we are mostly agreed upon. Users of Hashtable must rely on discipline, because there is not enough typing to otherwise guarantee that the interpretation of equals will not change. In any case, equals should not be usable for Node until a clear portable definition is established, whether that be the identity interpretation or some deeper interpretation. 
Ray From db@Eng.Sun.COM Wed Dec 16 17:39:27 1998 From: db@Eng.Sun.COM (David Brownell) Date: Wed, 16 Dec 1998 09:39:27 -0800 Subject: [XML-SIG] Re: Equality tests on DOM nodes References: <000701be25e4$e3826f60$5839bfa8@arabbit> <36755507.F8657565@imall.com> <36769F71.A90D8C2A@eng.sun.com> <3677E39E.8BAECF6E@imall.com> Message-ID: <3677F04F.391A3B35@eng.sun.com> Ray Whitmer wrote: > > David Brownell wrote: > > > Though there's one thing to consider: The behavior of Object.equals() > > and Object.hashCode() is specified to make objects work as hashtable > > keys in the natural manner. For example, strings can be used as keys > > since they're immutable and equals() is overridden ... were they mutable, > > or did they not override equals(), that'd not be so. > > That is a sad but true that Hashtable influenced the implementation of > Object. Equals is problematic in Object's API because of its ambiguity, but > about every other language seems to do something similarly ambiguous. I don't see anything being "sad" in the influence you mention. Any answer that's picked to define "equality" (or "identity") is going to be pretty arbitrary, and become (in some context/task) "ambiguous". So there will always be a need to define application-specific definitions for this. I'll also note that after several years (!) of discussion on the topic, OMG decided to -- gasp! -- let objects be used as keys into hashtables in CORBA 2.0, as its first foray into the murky waters of this problem. It's got a low system-wide cost, and provides the benefits folk need. > You > raise a connection between equals and immutability that I generally tend to > overlook as nonessential. There are plenty of other examples in the jdk that > also overlook it that I cited before (like Point or Rectangle), There may be no official API policy with respect to immutability, though I'll ask about that one. 
One can adopt a policy (with some imperfect degree of success) like "if you want to change it, don't use it as a hashtable key...". I mentioned it to highlight some of the complexity behind the notion of one thing being "equal" to another -- it could change over time! > In any case, equals should not be usable for Node until a clear portable > definition is established, whether that be the identity interpretation or some > deeper interpretation. At this point in time, the definition would seem to be the default that's supported by all java.lang.Object instances. - Dave From akuchlin@cnri.reston.va.us Thu Dec 17 01:48:09 1998 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Wed, 16 Dec 1998 20:48:09 -0500 Subject: [XML-SIG] Open issues: Namespaces and Unicode Message-ID: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com> There are two major issues still unresolved at this point, from the list assembled during the Developer's Day session at IPC7. Other things, like WDDX and all that, are more minor and not showstoppers. 1) Unicode support. The wstring type was added in version 0.5 of the package, but it was just added to the installation, not integrated with the XML parsers. sgmlop and pyexpat are probably the only parsers that stand a chance of handling 16-bit Unicode. xmlproc relies on the re module, and making re handle Unicode would be a big job, so users would have to UTF-8 encode their data first. From poking around inside Expat, it looks like it can handle UTF-16, agreeing with a simple test with xmlwf; try running this test program to generate a file named t.xml and then run it through xmlwf: from xml.unicode import wstring s=wstring.L(""" text""") f = open('t.xml', 'w') ; f.write(s.utf16() ) ; f.close() Amazingly, if the resulting file is then parsed by Python code using pyexpat, the resulting UTF8 output is correct, even though the code doesn't do anything special about Unicode at all. 
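The experiment described above can be approximated today without the wstring module. This is a sketch under assumptions: it uses the modern xml.parsers.expat binding rather than the 1998 pyexpat, the sample document and the character_data helper are invented for the example, and a byte order mark is written explicitly so that both byte orders can be tried on the same machine.

```python
import codecs
import xml.parsers.expat


def character_data(data):
    """Parse a bytes document with Expat and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


document = "<doc>caf\u00e9</doc>"

# The same document in both byte orders, each announced by its BOM.
little = codecs.BOM_UTF16_LE + document.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + document.encode("utf-16-be")

print(character_data(little) == character_data(big) == "caf\u00e9")
```

Feeding the parser raw bytes, rather than a decoded string, is what lets Expat pick the encoding from the byte order mark.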
I suspect that this is only a coincidence, and won't work on a machine of different endianness. Anyway, we should probably modify at least one of the parsers to handle a wide string. Pyexpat is probably the best candidate, since the Unicode support is already there in Expat itself. Does this seem to be a reasonable course of action? Any volunteers? 2) Namespace support. We also wanted to arrive at some form of namespace support for the SAX and DOM interfaces. Unfortunately, no one responsible seems to be defining what namespace support should look like in SAX and DOM. The plan for SAX might be to use a parser filter that implemented the additional namespace processing; in a Nov. 13 xml-dev post David Megginson supported this idea, and said he'd like to formalise the idea of a SAX filter in SAX 1.0.1. I'm not aware of any public info about the changes, but have written Megginson asking about it. There also seems no sign of namespace support for the DOM, though I've posted to the www-dom mailing list asking about it. This presents us with two options: ignore DOM namespaces completely for 1.0 and wait for some guidance from the working group; or add some utility function or module to do it, knowing that it will probably be made obsolete in the future. (For example, there might be a do_namespaces() function in xml.dom.utils that walked over a DOM tree looking for xmlns:* attributes and decorated all the nodes with an attribute containing the namespace URI, or a Node method that scanned its ancestors looking for namespace declarations.) What do you think? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ It is in this matter that I fall foul of so many American writers on writing; they seem to think that writing is a confidence game by means of which the author cajoles a restless, dull-witted, shallow audience into hearing his point of view. Such an attitude is base, and can only beget base prose. -- Robertson Davies, "Elements of Style" From Fred L. Drake, Jr." 
References: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com> Message-ID: <13945.13411.498583.532812@weyr.cnri.reston.va.us> A.M. Kuchling writes: > 1) Unicode support. ... > is only a coincidence, and won't work on a machine of different > endianness. I suspect expat is able to determine endianness and takes care of byteswapping as needed. > Anyway, we should probably modify at least one of the parsers to > handle a wide string. Pyexpat is probably the best candidate, since > the Unicode support is already there in Expat itself. Does this seem > to be a reasonable course of action? Any volunteers? Yes, and no. ;-) > 2) Namespace support. ... > and DOM interfaces. Unfortunately, no one responsible seems to be > defining what namespace support should look like in SAX and DOM. The > plan for SAX might be to use a parser filter that implemented the If we can get an agreement as to just what SAX filters are supposed to look like, I'm willing to do any new coding needed to implement a namespace handler. I understand that someone has already done some work on SAX filters, but SAX itself really needs to define this, and preferably define the SAX interface in IDL as well. Let us know if you get any info from Dave. The results of the last call on the Namespace draft should be known in early January. We should wait until that's done before worrying about it much. > There also seems no sign of namespace support for the DOM, ... > function or module to do it, knowing that it will probably be made > obsolete in the future. (For example, there might be a > do_namespaces() function in xml.dom.utils that walked over a DOM tree > looking for xmlns:* attributes and decorated all the nodes with an I think a "decorating" function like this would be a sufficient interim solution. I would not place it in the Node or Element class because of the expected obsolescence. I'm willing, but for either project (SAX or DOM namespaces), I won't have time until January. 
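As a rough illustration of the interim approach discussed above, here is a minimal sketch that resolves namespaces by scanning ancestors for xmlns declarations. The assumptions: xml.dom.minidom from the modern standard library stands in for the 1998 xml.dom classes, do_namespaces borrows the name floated in the thread but is otherwise hypothetical, and it returns (tag name, URI) pairs instead of decorating the nodes.

```python
from xml.dom import minidom


def namespace_uri(element):
    """Return the namespace URI bound to the element's prefix, or None."""
    prefix, _, _local = element.tagName.rpartition(":")
    declaration = "xmlns:" + prefix if prefix else "xmlns"
    node = element
    # Climb towards the root until a matching declaration is found.
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute(declaration):
            return node.getAttribute(declaration) or None
        node = node.parentNode
    return None


def do_namespaces(node, found=None):
    """Walk the tree and collect (tag name, namespace URI) per element."""
    if found is None:
        found = []
    if node.nodeType == node.ELEMENT_NODE:
        found.append((node.tagName, namespace_uri(node)))
    for child in node.childNodes:
        do_namespaces(child, found)
    return found


doc = minidom.parseString(
    '<a:root xmlns:a="urn:example:a"><a:child/><plain/></a:root>'
)
for tag, uri in do_namespaces(doc):
    print(tag, uri)
```

Keeping the helper outside the Node and Element classes matches the concern about obsolescence: it can be deleted once the DOM defines its own namespace support.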
-Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191
From Jeff.Johnson@icn.siemens.com Thu Dec 17 17:54:00 1998 From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com) Date: Thu, 17 Dec 1998 12:54:00 -0500 Subject: [XML-SIG] new code for xml.dom.utils Message-ID: <852566DD.0062463D.00@li01.lm.ssc.siemens.com> I threw together this class and thought it might be a good candidate for the xml.dom.utils module. It makes it really easy to get a DOM tree from a file. I made it a class even though it could just as easily be a bunch of functions, but as a class it might be subclassed for some neat things I can't think of right now. The following subclass would allow an HTML or XML file to be pretty printed with a single line of code (a pretty silly example but it's just an example):

class DomDumper(DomHelper):
    def __init__(self, filename):
        DomHelper.__init__(self, filename)
        self.dom.dump()

d = DomDumper(sys.argv[1])

Here's the file:

import sys, string, os
from xml.dom import core
from xml.dom import html_builder
from xml.sax import saxexts
from xml.dom.sax_builder import SaxBuilder

class DomHelper:
    def __init__(self, filename=None):
        self.filename = filename
        if filename != None:
            self.dom = self.readFile(filename)

    def readFile(self, filename):
        """Given an XML, HTML, or SGML filename with appropriate
        file extension, return the DOM document."""
        type = self.getFileType(filename)
        file = open(filename, 'r')
        dom = self.readStream(file, type)
        file.close()
        return dom

    def readStream(self, stream, type='XML'):
        # Dispatch on the document type; unknown types give None.
        if type == 'XML':
            dom = self.readXml(stream)
        elif type == 'HTML':
            dom = self.readHtml(stream)
        elif type == 'SGML':
            dom = self.readSgml(stream)
        else:
            dom = None
        return dom

    def readXml(self, stream, parserName=None):
        """parserName could be 'pyexpat', 'sgmlop', etc."""
        p = saxexts.make_parser(parserName)
        dh = SaxBuilder()
        p.setDocumentHandler(dh)
        p.feed(stream.read())
        doc = dh.document
        p.close()
        return doc

    def readHtml(self, stream):
        b = html_builder.HtmlBuilder()
        b.feed(stream.read())
        b.close()
        doc = b.document
        # There was some bug that prevents the builder from
        # freeing itself (maybe it has already been fixed?).
        # The next two lines break its references to the DOM
        # tree so that it can be freed.
        b.document = None
        b.current_element = None
        return doc

    def readSgml(self, stream):
        # Don't know much about this part. This could call SX to
        # convert the SGML to XML, then read it in. That's what I
        # do for some SGML files I need to convert. Any suggestions?
        print "This is not implemented."

    def getFileType(self, filename):
        """Given a filename, figure out if the file contains XML,
        HTML, or SGML. For now, use the file extension to make the
        determination."""
        filename = string.lower(filename)
        (name, ext) = os.path.splitext(filename)
        if ext in ('.htm', '.html'):
            type = 'HTML'
        elif ext in ('.sgm', '.sgml'):
            type = 'SGML'
        elif ext == '.xml':
            type = 'XML'
        else:
            type = ''  # should this return None instead?
        return type

if __name__ == '__main__':
    if len(sys.argv) == 2:
        d = DomHelper()
        dom = d.readFile(sys.argv[1])
        dom.dump()
    else:
        print "Usage: python %s " % sys.argv[0]

From jeremy@allaire.com Thu Dec 17 20:43:57 1998 From: jeremy@allaire.com (Jeremy Allaire) Date: Thu, 17 Dec 1998 15:43:57 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <005001be29fd$f80bcc20$2b15b5cd@jallaire_lt.allaire.com> Gabe- This is awesome! Lovin' support for Python in the mix. Is there anything we can do to help solve problems on the serialization side? Also, you should drop a note to Nate Weiss (nweiss@icesinc.com) who is the creator of the SDK so he can include your bits and build some samples off of it.
Also, FYI, PCWeek just ran a story on WDDX: http://www.zdnet.com/pcweek/stories/news/0,4153,380476,00.html Thanks and regards, Jeremy -----Original Message----- From: Gabe Wachob To: Jeremy Allaire Cc: 'xml-sig@python.org' ; Simeon Simeonov Date: Tuesday, December 15, 1998 7:15 PM Subject: Re: [XML-SIG] WDDX for Python >Hi folks- > In response to this request, I put together a Deserializer (there >are some issues in serializing that I didn't want to address yet) for WDDX >data into a python object. > > One question I had is this: > >In the DTD, you show that a data element can contain one or more of any of >the data types plus recordset/struct/array. Does this mean that this is a >valid XML fragment: > > >43 > >... > > > >I made the assumption that is was, so in my deserialization, I create an >object WDDXObject which contains an array items -- in the previous case >the array would contain a number as its first element, and the struct >object (WDDXStruct) as its second element. > >If data has more than one child, then how do you refer to each child if >you don't implement the deserialization the way I do with an array as the >"top level" child of the deserialized object (I ask because I didn't want >to do it this way, but I couldn't think of another simple way of doing >it). What if you have two structs with two element variables with the same >name? > >So, anyway, my deserializer fully implements the DTD and the spec as far >as I understand it. It does not parse the timeDate type (I could throw it >in a wrapper object with nice methods and all). > >The URL is http://www.aimnet.com/~gwachob/software.html > >It uses my current rev of my DOMVisitor.py Everything is not well tested, >and in fact, may not be the most efficient. However, here it is... > > -Gabe > >On Sun, 13 Dec 1998, Jeremy Allaire wrote: > >> Hello folks- >> >> I'm interesting in engaging anyone/everyone from the Python community to >> work with us on a WDDX platform module for Python. 
With the help of a few >> developers, we've been able to muster/ship WDDX modules for ASP/COM, Java, >> ColdFusion, Perl and JavaScript, and would love to see a Python >> implementation. >> >> Given the recent XML release for Python, seems like it would be a great >> project to make cross-language distributed web applications even more >> possible. >> >> Take a visit to www.WDDX.org, and most importantly take a view of the SDK, >> developed by Nate Weiss, which brings it all together with all of the above >> languages. >> >> Best and regards, >> Jeremy Allaire >> >> _______________________________________________ >> XML-SIG maillist - XML-SIG@python.org >> http://www.python.org/mailman/listinfo/xml-sig >> > >------------------------------------------------------------------- >http://www.aimnet.com/~gwachob http://www.findlaw.com >"A popular Government, without popular information, or the means of >acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps >both." -- James Madison > import std.disclaimer > > From gwachob@aimnet.com Thu Dec 17 21:12:18 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Thu, 17 Dec 1998 13:12:18 -0800 (PST) Subject: [XML-SIG] WDDX for Python In-Reply-To: <005001be29fd$f80bcc20$2b15b5cd@jallaire_lt.allaire.com> Message-ID: On Thu, 17 Dec 1998, Jeremy Allaire wrote: > Gabe- > > This is awsome! Lovin support for Python in the mix. Is there anything we > can do to help solve problems on the serialization side? Well, I wonder aloud whether its possible (or worth attempting) to write a serializer for arbitrary python objects. What is the approach taken in other languages? I have not looked at much WDDX stuff besides the DTD.. (in fact, the first time I had ever looked at the WDDX stuff for more than a minute was when I sat down to write the Deserializer). Also, I'm not sure what sort of Python objects or data types would map to a timeDate WDDX element. 
I'm thinking that the best thing to do would be to create a WDDXCreator object that would work on WDDXObjects (ie WDDXStruct, WDDXdateTime, etc). I don't know -- looking at how other languages like Java do it would be instructional.. > Also, you should > drop a note to Nate Weiss (nweiss@icesinc.com) who is the creator of the SDK > so he can include your bits and build some samples off of it. Done > Also, FYI, PCWeek just ran a story on WDDX: > http://www.zdnet.com/pcweek/stories/news/0,4153,380476,00.html Nice. I actually have no immediate use for WDDX, nor any past experience in it. I've recently been getting into XML using Python and your message to XML-SIG (the Python XML SIG list) was timed perfectly for a "Gee, that looks like a cool thing to play around with" project... Turns out that Python is such a cool language that it only took an hour or so to write... -Gabe > > Thanks and regards, > Jeremy ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer
From jeremy@allaire.com Thu Dec 17 21:31:19 1998 From: jeremy@allaire.com (Jeremy Allaire) Date: Thu, 17 Dec 1998 16:31:19 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <009001be2a04$95f3a100$2b15b5cd@jallaire_lt.allaire.com> >Well, I wonder aloud whether its possible (or worth attempting) to write a >serializer for arbitrary python objects. What is the approach taken in >other languages? I have not looked at much WDDX stuff besides the DTD.. >(in fact, the first time I had ever looked at the WDDX stuff for more than >a minute was when I sat down to write the Deserializer). > >Also, I'm not sure what sort of Python objects or data types would map to >a timeDate WDDX element. > >I'm thinking that the best thing to do would be to create a WDDXCreator >object that would work on WDDXObjects (ie WDDXStruct, WDDXdateTime, etc). > >I don't know -- looking at how other languages like Java do it would be >instructional..
You should look at the Perl and COM implementations -- they're part of the downloadable SDK, including references and examples. >I actually have no immediate use for WDDX, nor any past experience in it. >I've recently been getting into XML using Python and your message to >XML-SIG (the Python XML SIG list) was timed perfectly for a "Gee, that >looks like a cool thing to play around with" project... Turns out that >Python is such a cool language that it only took an hour or so to write... That's awesome that it was so easy to put together. I think the serializer side is dooable without a lot of work. That would then let Python be a 'object server' to any other scripting language on the Web, as opposed to the deserializer which would allow Python to be a 'client' to other distributed web applications. WDDX is useful for a lot of things. For one, it allows you to tie together applications created with different applications. It also allows you to expose your Python apps as 'services' that can be leveraged over the net by any other web application, creating what we're calling 'web syndicate networks'. It's even useful for doing rich DHTML/JavaScript front-ends with Python back-ends, as with WDDX you can pass live objects from your server to the browser and have them load automagically as JavaScript objects in the page. There's some good examples in the SDK of this behavior. Regards, Jeremy
From simeons@allaire.com Thu Dec 17 22:44:20 1998 From: simeons@allaire.com (Simeon Simeonov) Date: Thu, 17 Dec 1998 17:44:20 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <023c01be2a0e$c92de710$7315b5cd@ssimeonov.allaire.com> Gabe, >Also, I'm not sure what sort of Python objects or data types would map to >a timeDate WDDX element. To do WDDX serialization you really need to define a set of Python objects / interfaces that other developers should use. Probably the best example code to look at is the JavaScript serializer. Here is what I did: - I created a WddxRecordset object because JS did not have the notion of a recordset. Internally, I used it just as you deserialize recordsets--as an object with property arrays.
However, making it an object allowed me to provide custom serialization semantics via a wddxSerialize(serializer) method. - All arrays and simple types I mapped to WDDX directly. - All objects that did not define a custom serialization method I serialized as structs. This allows for convenient serialization of any JS object. Hope this provides some food for thought. Regards, Sim Allaire From paul@prescod.net Thu Dec 17 22:07:03 1998 From: paul@prescod.net (Paul Prescod) Date: Thu, 17 Dec 1998 16:07:03 -0600 Subject: [XML-SIG] WDDX for Python References: Message-ID: <36798087.FC366A71@prescod.net> The serializer is a little bit more tricky. We should probably discuss what the right thing here is. Gabe Wachob wrote: > > Well, I wonder aloud whether its possible (or worth attempting) to write a > serializer for arbitrary python objects. Depends on your definition: * arbitrary Python instances and a finite list of builtin types? Yes. * transient objects such as file handles and TKinter windows? No. * what about objects like compiled regular expressions and AST trees? According to the Pickle documentation, no C built-ins can be pickled except the most basic types. I'm surprised that there isn't any way to make user-defined built-in types (e.g. a C-programmed DOM-node) picklable. Anyone know more about this? The docs say: > Classes can further influence how their instances are pickled -- if > the class defines the method __getstate__(), it is called and the > return state is pickled as the contents for the instance, Does this really apply ONLY to classes, or also to built-in types? Another issue is whether we try to be smart about Python instances that represent lists of things and mappings. Do we map them to lists and structs or not? > Also, I'm not sure what sort of Python objects or data types would map to > a timeDate WDDX element. This is a problem I have been discussing in the newsgroup. 
We would have to define a WDDX time object and Python programmers could convert seconds-past-the-epoch integers or time tuple-lists to time objects: wddx.time( time.gmtime()). It would be nicer to have 1.5.2 contain some tiny time class but I haven't got any feedback to indicate that that will happen, so shipping our own is the next best thing. > I'm thinking that the best thing to do would be to create a WDDXCreator > object that would work on WDDXObjects (ie WDDXStruct, WDDXdateTime, etc). That's fine for date/time and for the top-level packets, but you don't want to force the programmer to convert every item in a list (e.g.) to a WDDX type. That would be onerous. > I don't know -- looking at how other languages like Java do it would be > instructional.. I think that Javascript is a better guide because it is a more dynamic language like Python. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From Fred L. Drake, Jr." References: <36798087.FC366A71@prescod.net> Message-ID: <13945.36696.210677.726104@weyr.cnri.reston.va.us> Paul Prescod writes: > According to the Pickle documentation, no C built-ins can be pickled > except the most basic types. I'm surprised that there isn't any way to > make user-defined built-in types (e.g. a C-programmed DOM-node) picklable. Hey Paul! You can use the copy_reg module to register pickling operations on built-in types that aren't already picklable. To see how do this from C, look at Modules/parsermodule.c. > This is a problem I have been discussing in the newsgroup. We would have > to define a WDDX time object and Python programmers could convert > seconds-past-the-epoch integers or time tuple-lists to time objects: > wddx.time( time.gmtime()). It would be nicer to have 1.5.2 contain some I've not had time to keep up with the newsgroup / list, but agree we need this. 
I've thought a little about this for the iso8601 module; I'd like a class that can represent dates that are "not precise", like "december, 1998". The ISO 8601 standard includes such things, and being able to represent them is useful. (I've not had time to look at mxDateTime yet.) > want to force the programmer to convert every item in a list (e.g.) to a > WDDX type. That would be onerous. Support for a commonly used type (mxDataTime stuff?) might be the best way, and provide a type for people without that extension. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From paul@prescod.net Thu Dec 17 22:43:43 1998 From: paul@prescod.net (Paul Prescod) Date: Thu, 17 Dec 1998 16:43:43 -0600 Subject: [XML-SIG] WDDX for Python References: <009e01be287e$fe0b3410$7315b5cd@ssimeonov.allaire.com> Message-ID: <3679891F.865B7550@prescod.net> Simeon Simeonov wrote: > > Think of WDDX as the epitome of the 80/20 rule. Maybe 60/40? :) > When you want to reach such a wide audience you have to make concessions. In > particular, we had to decide that we couldn't exchange objects because some > of the target languages have no notion of such. Objects??? Are we integrating with Perl 4.0? Okay, what if we just add an *optional* attribute called "type" to structs. People could ignore it if they want to but as a Python programmer I wouldn't feel like I was throwing away Really Important Information. Also, what if we added an optional "id" attribute and a type...(maybe I can wait on the reference type for WDDX 2, but I'd rather not) > I would disagree with you here... How can a Python app exchange data with an > ecommerce app written in ColdFusion? Or a book browser that's written in > Perl? Or with Microsoft Word? How can it send a recordset and a three > dimensional array to a web browser where these data can be used to build > cool DHTML UI? 
If I had to send a 3D array of integers to Perl, I would send a bunch of lines like this: 23 43 564 234 40 203 03 203 23 430 23 10 It is presumably two lines of Perl code to split that up and convert it to integers. To me, the big win comes when I can send an OBJECT to Perl without dumbing it down into basic types. In fact, I think that the only features that I need to do this AS WELL AS the native Python tool called "pickle" is the "type" attribute and ID/IDREF. If I could just not throw away types then I could at least handle simple, non-recursive data structures okay (i.e. ID/IDREF can maybe wait). > And, yes, it is not easy to wrap objects around the data returned by a > deserializer. Probably the easiest way to do this is to build an object > factory for particular types of WDDX packets and apply it on the result of > the deserialization. Whether this will be worth doing depends on your > application. Except that packet types aren't self-labelling either. They do have a place for meta-data, however. If we could provide a place in structs for arbitrary metadata, we would be almost home. BTW, wouldn't the packet metadata be more useful if there was some attribute that let me say what kind of metadata it was, like HTML META tags? > Bottom line: WDDX is not a solution for python-python object serialization. > It can, however, open python apps up and let them communicate with a _huge_ > number of other applications. Sure, but we're so close to making it useful for Python->Python and (more interesting) Python->arbitrary OO language (including Perl 5) object exchange. I think that all we need is one attribute. The attribute should contain a URI (URIs are language independent) and each deserializer could have a mapping from URIs to class constructors. Languages that don't have a notion of class would ignore the URI. URIs are verbose but of course they compress beautifully. 
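A rough sketch of the URI-to-constructor mapping proposed above, in present-day Python. None of this is WDDX API: the registry, the function names, the Point class and the example.org URI are all made up for illustration, and the optional "type" attribute on structs is only a proposal in this thread.

```python
# Hypothetical registry: type URI -> callable that rebuilds an object
# from the fields of a deserialized struct.
CONSTRUCTORS = {}


def register_type(uri, constructor):
    """Associate a language-independent type URI with a local constructor."""
    CONSTRUCTORS[uri] = constructor


def build_struct(fields, type_uri=None):
    """Turn a struct's fields into an object.

    If the URI is missing or unknown, fall back to a plain dict, which is
    what a language without classes would do by ignoring the attribute.
    """
    constructor = CONSTRUCTORS.get(type_uri)
    if constructor is None:
        return dict(fields)
    return constructor(**fields)


class Point:
    """Illustrative application class."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


register_type("http://example.org/types/point", Point)
```

With this in place, build_struct({"x": 1, "y": 2}, "http://example.org/types/point") gives back a Point, while the same fields with an unregistered URI come back as an ordinary dictionary, so a receiver that knows nothing about the type loses nothing it would have had before.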
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From Daniel Biddle Thu Dec 17 23:25:09 1998 From: Daniel Biddle (Daniel Biddle) Date: Thu, 17 Dec 1998 23:25:09 +0000 (GMT) Subject: [XML-SIG] WDDX for Python In-Reply-To: <13945.36696.210677.726104@weyr.cnri.reston.va.us> Message-ID: On 1998-12-17 (Thu) Fred L. Drake wrote: > I've not had time to keep up with the newsgroup / list, but agree we > need this. I've thought a little about this for the iso8601 module; > I'd like a class that can represent dates that are "not precise", like > "december, 1998". The ISO 8601 standard includes such things, and > being able to represent them is useful. (I've not had time to look at > mxDateTime yet.) Does it? I've typed out the whole standard and am about to convert it into HTML, and I've not noticed anything like "december, 1998" being possible. Do you mean "1998-12"? -- Daniel Biddle From simeons@allaire.com Thu Dec 17 23:52:36 1998 From: simeons@allaire.com (Simeon Simeonov) Date: Thu, 17 Dec 1998 18:52:36 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com> Hi, Paul! Many nice comments here. >Okay, what if we just add an *optional* attribute called "type" to >structs. People could ignore it if they want to but as a Python programmer >I wouldn't feel like I was throwing away Really Important Information. Yup, this is probably the easiest way to go about providing some basic object serialization. I don't have a problem with this. >Also, what if we added an optional "id" attribute and a >type...(maybe I can wait on the reference type for WDDX 2, but I'd rather >not) This is a much nastier problem as it complicates and slows down both the serialization and deserialization algorithms. 
Not that it's a difficult thing to implement, but it does require the maintenance of data global to the entire serialization/deserialization process and it slows the process down considerably. We should probably handle this by optionally notifying the serializer/deserializer that they are dealing with aggregate data and no references. >> I would disagree with you here... How can a Python app exchange data with an >> ecommerce app written in ColdFusion? Or a book browser that's written in >> Perl? Or with Microsoft Word? How can it send a recordset and a three >> dimensional array to a web browser where these data can be used to build >> cool DHTML UI? > >If I had to send a 3D array of integers to Perl, I would send a bunch of >lines like this: > >23 43 564 234 >40 203 03 203 >23 430 23 10 > >It is presumably two lines of Perl code to split that up and convert it to >integers. To me, the big win comes when I can send an OBJECT to Perl >without dumbing it down into basic types. In fact, I think that the only >features that I need to do this AS WELL AS the native Python tool called >"pickle" is the "type" attribute and ID/IDREF. If I could just not throw >away types then I could at least handle simple, non-recursive data >structures okay (i.e. ID/IDREF can maybe wait). Humor me and try to do the same using JavaScript or VBScript. Humor me even further and exchange an array of arbitrary strings in a safe and efficient manner. :) I think you'll find the problem unpleasantly fickle... >> Bottom line: WDDX is not a solution for python-python object serialization. >> It can, however, open python apps up and let them communicate with a _huge_ >> number of other applications. > >Sure, but we're so close to making it useful for Python->Python and (more >interesting) Python->arbitrary OO language (including Perl 5) object >exchange. I think that all we need is one attribute. 
> >The attribute should contain a URI (URIs are language independent) and >each deserializer could have a mapping from URIs to class constructors. >Languages that don't have a notion of class would ignore the URI. URIs are >verbose but of course they compress beautifully. I agree with you here. Do you have a particular URI type (look'n'feel) in mind? Sim From paul@prescod.net Fri Dec 18 03:54:15 1998 From: paul@prescod.net (Paul Prescod) Date: Thu, 17 Dec 1998 21:54:15 -0600 Subject: [XML-SIG] WDDX for Python References: <36798087.FC366A71@prescod.net> <13945.36696.210677.726104@weyr.cnri.reston.va.us> Message-ID: <3679D1E7.B6881DB4@prescod.net> "Fred L. Drake" wrote: > > You can use the copy_reg module to register pickling operations on > built-in types that aren't already picklable. To see how do this from > C, look at Modules/parsermodule.c. I'll have to implement a similar module for WDDX. I can't use copy_reg because WDDX has a cross-language requirement. I can't encode the type name in terms of modules and constructor functions: I must indirect through a URI. > Support for a commonly used type (mxDataTime stuff?) might be the > best way, and provide a type for people without that extension. I can make a two-way registry which describes mappings both way. Then I'll prime the registry with any date/time classes people give me URLs for. Then users can choose their own date/time class. Whichever one appears as input (mxDateTime, /F's, etc.) will get interpreted correctly as a date. 
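One possible shape for the two-way date/time registry described above, sketched in present-day Python. The function names, the example.org URI and the ISO 8601 text form are assumptions for illustration only; mxDateTime and the other 1998 classes are stood in for by the standard library's datetime.datetime and time.struct_time.

```python
import datetime
import time

# Made-up URI standing in for "the WDDX date/time type".
DATE_URI = "http://example.org/wddx/dateTime"

ENCODERS = {}  # Python type -> (URI, function producing text)
DECODERS = {}  # URI -> function producing the user's preferred type


def register_encoder(pytype, uri, encode):
    """Outgoing direction: any registered type can be written under a URI."""
    ENCODERS[pytype] = (uri, encode)


def set_decoder(uri, decode):
    """Incoming direction: the user picks one class per URI."""
    DECODERS[uri] = decode


def to_wire(obj):
    uri, encode = ENCODERS[type(obj)]
    return uri, encode(obj)


def from_wire(uri, text):
    return DECODERS[uri](text)


# Prime the registry: two different input types map to the same URI.
register_encoder(datetime.datetime, DATE_URI, datetime.datetime.isoformat)
register_encoder(time.struct_time, DATE_URI,
                 lambda t: time.strftime("%Y-%m-%dT%H:%M:%S", t))
# The reader's chosen date/time class.
set_decoder(DATE_URI, datetime.datetime.fromisoformat)
```

Separating the two directions is the design choice worth noting: several date/time classes can be accepted on output, while each program decides for itself which class it wants back on input.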
-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From paul@prescod.net Fri Dec 18 03:02:04 1998 From: paul@prescod.net (Paul Prescod) Date: Thu, 17 Dec 1998 21:02:04 -0600 Subject: [XML-SIG] WDDX for Python References: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com> Message-ID: <3679C5AC.83D10000@prescod.net> Simeon Simeonov wrote: > > Yup, this is probably the easiest way to go about providing some basic > object serialization. I don't have a problem with this. Great! > I agree with you here. Do you have a particular URI type (look'n'feel) in > mind? There are three conventions that should be followed: * SGML convention is that the URI should be to a document describing the object type. That way if you ever "find" a packet, (e.g. as a serialization of a large data structure) then you can research it. * XML Namespaces convention is that applications should not depend on any particular type of data at the other end (or of the URI pointing to anything at all) * general URL convention is that you or your organization should own the domain name. > >Also, what if we added an optional "id" attribute and a > >type...(maybe I can wait on the reference type for WDDX 2, but I'd rather > >not) > > This is a much nastier problem as it complicates and slows down both the > serialization and deserialization algorithms. Not that it's a difficult > thing to implement, but it does require the maintenance of data global to > the entire serialization/deserialization process and it slows the process > down considerably. We should probably handle this by optionally notifying > the serializer/deserializer that they are dealing with aggregate data and no > references. I admit that this increases the complexity alot. The biggest problem is dealing with mutually recursive references between objects: especially in strongly typed programming languages. 
In dynamically typed languages you can easily build proxies for the object that isn't available yet. In a static language I don't know offhand what you would do. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From akuchlin@cnri.reston.va.us Fri Dec 18 04:22:35 1998 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Thu, 17 Dec 1998 23:22:35 -0500 Subject: [XML-SIG] Recent CVS changes Message-ID: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> Some stuff that's been added to the XML CVS tree tonight: * Jeff Johnson's DomHelper class has been added to xml.dom.utils, renamed to FileReader and with some minor changes to allow passing in a file-like object. I hope I didn't break anything in those changes. * While waiting for a friend to show up for dinner, I got my generic marshalling code finished and cleaned up, and also worked on subclassing it to handle WDDX and XML-RPC, finishing neither of them but getting pretty close. XML-RPC is complete except for the datetime.iso8601 type; I'm not sure how the caller should pass in something to be marshalled as a date. (This ties in to the absence of a standard date-time type.) WDDX is still missing dateTime, recordSet, and some other things I can't remember. Another hour should suffice to finish it. (That's what I like about Python: writing 90% of the code takes 10% of the time, and the other 10% also takes 10% of the time.) I'd be interested in seeing what people think of xml.marshal.generic; does its structure seem easily amenable to further subclassing to implement other data serializers? Also, does anyone know of other DTDs for data serialization? I'd like to take a crack at implementing them all, and seeing if they're all fairly clean to implement. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Who was it that designed brown envelopes? 
I feel sure that he hated people whoever he was. I wonder where he's buried? -- Tom Baker, in his autobiography From paul@prescod.net Fri Dec 18 06:28:35 1998 From: paul@prescod.net (Paul Prescod) Date: Fri, 18 Dec 1998 00:28:35 -0600 Subject: [XML-SIG] Marshalling References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> Message-ID: <3679F613.68A22D40@prescod.net> "A.M. Kuchling" wrote: > > I'd be interested in seeing what people think of > xml.marshal.generic; does its structure seem easily amenable to > further subclassing to implement other data serializers? Also, does > anyone know of other DTDs for data serialization? I'd like to take a > crack at implementing them all, and seeing if they're all fairly clean > to implement. It looks like you've put a lot of thought into it, so please forgive my random, partially thought-out questions: * why have a single class for marshalling and unmarshalling? * this stuff is a little weird: "m = self.__class__()" Could we put all of the mutable data in a separate class and avoid it? Maybe I'm just skittish about strange idioms... * Could m_unimplemented be called by default for unhandled classes? * Maybe string handling should be safer...i.e. control characters User defined types issues: 1. What do we do about instances? I suggest looping over data-properties and saving them as named structs. The names should be unique URIs. 2. what do we do about built-in types (i.e. complex)? I suggest using copy_reg to deconstruct ... and using URI-named structs again. 3. pickle uses various magic methods: __reduce__, __getinitargs__, __getstate__. Should XML marshalling support some or all of that stuff? My modest contribution is the following code which handles the mapping from URIs to types and also registers types with copy_reg.

"""Type_reg.py

Type registry -- mapping from URLs to builders and decomposers.
"""

import copy_reg

# Maps a type URL to a (type, constructor) pair.
registry={}

def register( url, type, pickle_function, constructor ):
    # Make the type known to copy_reg and remember it under its URL.
    copy_reg.pickle( type, pickle_function, constructor )
    registry[url]=type, constructor

def rebuild( url, args ):
    # Look up the constructor registered for this URL and call it.
    type, cons = registry[url]
    return apply( cons, args )

def decompose( obj ):
    # The pickle function returns (constructor, args); keep only the args.
    pickle_function = copy_reg.dispatch_table[type( obj )]
    return pickle_function( obj )[1]

register( "http://www.python.org/doc/ref/types.html#complex",
          type( 1j ), copy_reg.pickle_complex, complex )

### Todo: register various date/time types

-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From gstein@lyra.org Fri Dec 18 08:34:03 1998 From: gstein@lyra.org (Greg Stein) Date: Fri, 18 Dec 1998 00:34:03 -0800 Subject: [XML-SIG] WDDX for Python References: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com> <3679C5AC.83D10000@prescod.net> Message-ID: <367A137B.470C3AC4@lyra.org> Paul Prescod wrote: > ... > * XML Namespaces convention is that applications should not depend on any > particular type of data at the other end (or of the URI pointing to > anything at all) In short: it is a URI, not a URL. It doesn't locate anything; it just identifies something uniquely. Nominally, if an XML element looks like: Then, the element is uniquely identified as "URI_goes_hereELEM" (they're appended). In a more familiar form, you might have a URI of "http://my.domain.com/some_app/xml_elems/" so that you end up with final URIs like "http://my.domain.com/some_app/xml_elems/ELEM" Cheers, -g -- Greg Stein, http://www.lyra.org/ From Sjoerd.Mullender@cwi.nl Fri Dec 18 12:11:07 1998 From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender) Date: Fri, 18 Dec 1998 13:11:07 +0100 Subject: [XML-SIG] Open issues: Namespaces and Unicode In-Reply-To: Your message of Thu, 17 Dec 1998 11:42:11 -0500.
<13945.13411.498583.532812@weyr.cnri.reston.va.us> References: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com> <13945.13411.498583.532812@weyr.cnri.reston.va.us> Message-ID: On Thu, Dec 17 1998 "Fred L. Drake" wrote: > > 2) Namespace support. In my private version of xmllib I have support for XML namespaces. I haven't submitted this version to Guido yet for several reasons: - The namespace support (at least for the current namespace proposal) is very new (like 1 day). - My current version isn't compatible with the old version that is in the Python core. - I haven't documented the new interface yet. Is anybody interested in taking a look at my new version anyway? The most important API changes are: - I don't look to see if there are any methods with a name matching start_TAG and end_TAG since TAG can contain characters that aren't allowed in Python identifiers. Instead I look in a dictionary that maps tag names to start and end methods. - You can specify the valid attributes and default values for all elements. The way this is done has also changed. -- Sjoerd Mullender From akuchlin@cnri.reston.va.us Fri Dec 18 14:13:59 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 18 Dec 1998 09:13:59 -0500 (EST) Subject: [XML-SIG] Marshalling In-Reply-To: <3679F613.68A22D40@prescod.net> References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> <3679F613.68A22D40@prescod.net> Message-ID: <13946.24429.905213.372579@amarok.cnri.reston.va.us> Paul Prescod writes: > * why have a single class for marshalling and unmarshalling? My fuzzy argument for this was that I wanted the user to write only a single subclass, not two of them. > * this stuff is a little weird: "m = self.__class__()" Could we put all >of the mutable data in a separate class and avoid it? Maybe I'm just >skittish about strange idioms... Probably data_stack shouldn't be an attribute of the class, but be passed to each of the unmarshalling functions.
That would mean that the Marshaller class would have no mutable attributes at all, and the self.__class__ thing would be unnecessary. > * Could m_unimplemented be called by default for unhandled classes? Good point; I'll clean that up, and also make the listing of unmarshalling functions tidier. > * Maybe string handling should be safer...i.e. control characters Shouldn't control characters, such as chr(9) or chr(7) be fine? The code already escapes <,&,>, and aren't those the only characters to worry about? Another potential problem is that on unmarshalling, the XML parser may change newlines around inside your string. If you care, then you'd have to base64-encode all your strings. I may add code to check for Tim Bray's proposed attribute, xml:packed="base64" (or whatever it is), and automatically decode it. >User defined types issues: > 1. What do we do about instances? I suggest looping over data-properties >and saving them as named structs. The names should be unique URIs. > 2. what do we do about built-in types (i.e. complex)? I suggest using >copy_reg to deconstruct ... and using URI-named structs again. The generic code actually does complex numbers, but I see your point. > 3. pickle uses various magic methods: __reduce__, __getinitargs__, >__getstate__. Should XML marshalling support some or all of that stuff? Definitely, if it supports generic Python instances. However, I'm less interested in reproducing pickle in XML than in providing a base for supporting all the various DTDs that are popping up. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ A wise man can do no better than to turn from the churches and look up through the airy majesty of the wayside trees with exultation, with resignation, at the unconquerable unimplicated sun. -- Llewelyn Powys, _The Pathetic Fallacy_ From Fred L. Drake, Jr." 
References: <13945.36696.210677.726104@weyr.cnri.reston.va.us> Message-ID: <13946.28333.391076.100292@weyr.cnri.reston.va.us> I wrote: > "december, 1998". The ISO 8601 standard includes such things, and > being able to represent them is useful. (I've not had time to look at Daniel Biddle replied: > Does it? I've typed out the whole standard and am about to convert it into > HTML, and I've not noticed anything like "december, 1998" being possible. > Do you mean "1998-12"? Yes. I was not meaning that the syntax I presented was ISO 8601 compliant, only that the date I described was expressible. Sorry for any confusion; the ISO 8601 syntax is quite strict, and terse (and appropriately so). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From paul@prescod.net Fri Dec 18 15:04:55 1998 From: paul@prescod.net (Paul Prescod) Date: Fri, 18 Dec 1998 09:04:55 -0600 Subject: [XML-SIG] Marshalling References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> <3679F613.68A22D40@prescod.net> <13946.24429.905213.372579@amarok.cnri.reston.va.us> Message-ID: <367A6F17.3823C9B3@prescod.net> "Andrew M. Kuchling" wrote: > > My fuzzy argument for this was that I wanted the user to write > only a single subclass, not two of them. Consider having some kind of DTD-adapter class. Python is sufficiently flexible that sometimes delegation and adapters are simpler than subclassing. > Probably data_stack shouldn't be an attribute of the class, > but be passed to each of the unmarshalling functions. That would mean > that the Marshaller class would have no mutable attributes at all, and > the self.__class__ thing would be unnecessary. Good idea. > > * Maybe string handling should be safer...i.e. control characters > > Shouldn't control characters, such as chr(9) or chr(7) be > fine? The code already escapes <,&,>, and aren't those the only > characters to worry about? chr(9), yes. chr(7) no. 
From REC-XML: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] > Definitely, if it supports generic Python instances. However, > I'm less interested in reproducing pickle in XML than in providing a > base for supporting all the various DTDs that are popping up. Presumably the number of new DTDs is going to slow down. That territory on the noosphere is getting crowded. To me, transporting instances is the difference between being useful and being mildly convenient. Allaire has agreed to support my "type" attribute, which strikes me as the major thing required to make this stuff useful for Python->Python object transmission. Also, I think it would be a good idea for Python's ASCII pickle format to (eventually!) be standards-based (i.e. WDDX or something). Sure, it would result in a blow-up, but ASCII pickle is already verbose and slow. Given the choice between proprietary, verbose and slow or open, really verbose and very slow, I think that the latter would be better. If ASCII pickle is intended for human readability and debugging, then why not make it more readable and even editable in XML editors? The whole basis for WDDX and XML-RPC is that XML is bloody verbose but it is also very human-friendly. Anyhow, I'm not trying to invent work for you. If there is some easy way I can add instance marshalling support to only the WDDX subclass (or "adapter") then I will do that. We can migrate it towards full pickle functionality when and if it becomes popular enough to justify the work. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From Fred L. Drake, Jr." References: <36798087.FC366A71@prescod.net> <13945.36696.210677.726104@weyr.cnri.reston.va.us> <3679D1E7.B6881DB4@prescod.net> Message-ID: <13946.30292.59954.75623@weyr.cnri.reston.va.us> Paul Prescod writes: > I'll have to implement a similar module for WDDX.
I can't use copy_reg > because WDDX has a cross-language requirement. I can't encode the type Yes; my response was only to the pickle part of your question. I don't see why there can't be an xml.wddx.registry module or something like that which implements the specific mechanics. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Fred L. Drake, Jr." References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> <3679F613.68A22D40@prescod.net> <13946.24429.905213.372579@amarok.cnri.reston.va.us> Message-ID: <13946.31647.653312.303899@weyr.cnri.reston.va.us> Paul Prescod writes: > * why have a single class for marshalling and unmarshalling? Andrew M. Kuchling writes: > My fuzzy argument for this was that I wanted the user to write > only a single subclass, not two of them. Now's my turn to say "this is bogus". This is bogus. It's entirely appropriate to separate the two functions. This also makes sense if you only need to support one or the other for some format not provided with the base package. There is precedent for separate classes in pickle and xdrlib. Paul Prescod writes: > 3. pickle uses various magic methods: __reduce__, __getinitargs__, >__getstate__. Should XML marshalling support some or all of that stuff? Andrew M. Kuchling writes: > Definitely, if it supports generic Python instances. However, > I'm less interested in reproducing pickle in XML than in providing a > base for supporting all the various DTDs that are popping up. This seems to be an issue for the specific subclasses; some systems will support more than others, and the Python implementations should "do the right thing" as appropriate for the specific requirements. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Fred L. Drake, Jr."
References: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com> <3679C5AC.83D10000@prescod.net> Message-ID: <13946.32271.869068.621724@weyr.cnri.reston.va.us> Paul Prescod writes: > dealing with mutually recursive references between objects: especially in > strongly typed programming languages. In dynamically typed languages you > can easily build proxies for the object that isn't available yet. In a > static language I don't know offhand what you would do. Either static or dynamic languages can be supported using a patch list. Using a patch list eliminates the need to construct proxies as well. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From simeons@allaire.com Fri Dec 18 16:06:06 1998 From: simeons@allaire.com (Simeon Simeonov) Date: Fri, 18 Dec 1998 11:06:06 -0500 Subject: [XML-SIG] WDDX for Python Message-ID: <029b01be2aa0$51b939e0$7315b5cd@ssimeonov.allaire.com> >There are three conventions that should be followed: > > * SGML convention is that the URI should be to a document describing the >object type. That way if you ever "find" a packet, (e.g. as a >serialization of a large data structure) then you can research it. > > * XML Namespaces convention is that applications should not depend on any >particular type of data at the other end (or of the URI pointing to >anything at all) > > * general URL convention is that you or your organization should own the >domain name. > I like the XML Namespaces approach. The type URI should be no more than a unique ID that both ends of a data exchange will use (most likely) to plug into some kind of an object factory. So a generic Python object can serialize its data to a structure w/ a type= attribute obtained from this object factory. 
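[Editorial sketch, not part of the original message.] The object-factory idea above could look roughly like the following in present-day Python. This is a minimal sketch under stated assumptions: the names TypeFactory and Point and the example.org URI are hypothetical and come from neither WDDX nor the XML-SIG code, and the dictionary it produces only stands in for a WDDX structure carrying a type attribute. The URI is treated purely as an opaque key and is never dereferenced.

```python
# Hypothetical sketch: a type URI used only as a unique key into an
# object factory. Nothing here is WDDX or xml.marshal API.

class TypeFactory:
    """Maps opaque type URIs to constructors, and classes back to URIs."""

    def __init__(self):
        self._by_uri = {}
        self._by_class = {}

    def register(self, uri, cls, constructor=None):
        # The URI is never fetched; it only has to be unique.
        self._by_uri[uri] = constructor or cls
        self._by_class[cls] = uri

    def serialize(self, obj):
        # Produce a plain structure tagged with the registered type URI.
        uri = self._by_class[type(obj)]
        return {"type": uri, "fields": dict(vars(obj))}

    def deserialize(self, struct):
        # Look the constructor up by URI and rebuild the instance.
        constructor = self._by_uri[struct["type"]]
        return constructor(**struct["fields"])


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


factory = TypeFactory()
factory.register("http://example.org/types/point", Point)

packet = factory.serialize(Point(1, 2))
copy = factory.deserialize(packet)
```

Both ends of an exchange would need to register the same URI for this to work; a receiver that does not know the URI could still fall back to treating the packet as an untyped structure.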
Sim Allaire From ken@bitsko.slc.ut.us Fri Dec 18 16:51:26 1998 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: Fri, 18 Dec 1998 10:51:26 -0600 (CST) Subject: [XML-SIG] Recent CVS changes In-Reply-To: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> from "A.M. Kuchling" at Dec 17, 98 11:22:35 pm Message-ID: <199812181651.KAA31392@bitsko.slc.ut.us> Andrew Kuchling wrote: > I'd be interested in seeing what people think of > xml.marshal.generic; does its structure seem easily amenable to further > subclassing to implement other data serializers? Also, does anyone > know of other DTDs for data serialization? I'd like to take a crack > at implementing them all, and seeing if they're all fairly clean > to implement. Another is LDO's XML serialization: The DTD itself has basic specs and I hope to complete more docs over Christmas vacation. -- Ken From Sjoerd.Mullender@cwi.nl Fri Dec 18 17:33:02 1998 From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender) Date: Fri, 18 Dec 1998 18:33:02 +0100 Subject: [XML-SIG] New version of xmllib Message-ID: ------- =_aaaaaaaaaa0 Content-Type: text/plain; charset="us-ascii" Content-ID: <10433.914002226.1@bireme.cwi.nl> Here is my current version of xmllib.py and the documentation. This version has some API changes with respect to the version currently in Python (also the one in 1.5.2a). This version supports XML namespaces. -- Sjoerd Mullender ------- =_aaaaaaaaaa0 Content-Type: text/plain; charset="us-ascii" Content-ID: <10433.914002226.2@bireme.cwi.nl> Content-Description: xmllib.py Content-Disposition: attachment; filename="xmllib.py" # A parser for XML, using the derived class as static DTD. # Author: Sjoerd Mullender. 
import re import string version = '0.2' # Regular expressions used for parsing _S = '[ \t\r\n]+' # white space _opS = '[ \t\r\n]*' # optional white space _Name = '[a-zA-Z_:][-a-zA-Z0-9._:]*' # valid XML name _QStr = "(?:'[^']*'|\"[^\"]*\")" # quoted XML string illegal = re.compile('[^\t\r\n -\176\240-\377]') # illegal chars in content interesting = re.compile('[]&<]') amp = re.compile('&') ref = re.compile('&(' + _Name + '|#[0-9]+|#x[0-9a-fA-F]+)[^-a-zA-Z0-9._:]') entityref = re.compile('&(?P' + _Name + ')[^-a-zA-Z0-9._:]') charref = re.compile('&#(?P[0-9]+[^0-9]|x[0-9a-fA-F]+[^0-9a-fA-F])') space = re.compile(_S + '$') newline = re.compile('\n') attrfind = re.compile( _S + '(?P' + _Name + ')' '(' + _opS + '=' + _opS + '(?P'+_QStr+'|[-a-zA-Z0-9.:+*%?!()_#=~]+))?') starttagopen = re.compile('<' + _Name) starttagend = re.compile(_opS + '(?P/?)>') starttagmatch = re.compile('<(?P'+_Name+')' '(?P(?:'+attrfind.pattern+')*)'+ starttagend.pattern) endtagopen = re.compile('') endbracketfind = re.compile('(?:[^>\'"]|'+_QStr+')*>') tagfind = re.compile(_Name) cdataopen = re.compile(r'') # this matches one of the following: # SYSTEM SystemLiteral # PUBLIC PubidLiteral SystemLiteral _SystemLiteral = '(?P<%s>'+_QStr+')' _PublicLiteral = '(?P<%s>"[-\'()+,./:=?;!*#@$_%% \n\ra-zA-Z0-9]*"|' \ "'[-()+,./:=?;!*#@$_%% \n\ra-zA-Z0-9]*')" _ExternalId = '(?:SYSTEM|' \ 'PUBLIC'+_S+_PublicLiteral%'pubid'+ \ ')'+_S+_SystemLiteral%'syslit' doctype = re.compile(''+_Name+')' '(?:'+_S+_ExternalId+')?'+_opS) xmldecl = re.compile('<\?xml'+_S+ 'version'+_opS+'='+_opS+'(?P'+_QStr+')'+ '(?:'+_S+'encoding'+_opS+'='+_opS+ "(?P'[A-Za-z][-A-Za-z0-9._]*'|" '"[A-Za-z][-A-Za-z0-9._]*"))?' 
'(?:'+_S+'standalone'+_opS+'='+_opS+ '(?P\'(?:yes|no)\'|"(?:yes|no)"))?'+ _opS+'\?>') procopen = re.compile(r'<\?(?P' + _Name + ')' + _opS) procclose = re.compile(_opS + r'\?>') commentopen = re.compile('') doubledash = re.compile('--') attrtrans = string.maketrans(' \r\n\t', ' ') # definitions for XML namespaces _NCName = '[a-zA-Z_][-a-zA-Z0-9._]*' # XML Name, minus the ":" ncname = re.compile(_NCName + '$') qname = re.compile('(?:(?P' + _NCName + '):)?' # optional prefix '(?P' + _NCName + ')$') xmlns = re.compile('xmlns(?::(?P'+_NCName+'))?$') # XML parser base class -- find tags and call handler functions. # Usage: p = XMLParser(); p.feed(data); ...; p.close(). # The dtd is defined by deriving a class which defines methods with # special names to handle tags: start_foo and end_foo to handle <foo> # and </foo>, respectively. The data between tags is passed to the # parser by calling self.handle_data() with some data as argument (the # data may be split up in arbitrary chunks). Entity references are # passed by calling self.handle_entityref() with the entity reference # as argument. class XMLParser: attributes = {} # default, to be overridden elements = {} # default, to be overridden # Interface -- initialize and reset this instance def __init__(self): self.reset() # Interface -- reset this instance. Loses all unprocessed data def reset(self): self.rawdata = '' self.stack = [] self.nomoretags = 0 self.literal = 0 self.lineno = 1 self.__at_start = 1 self.__seen_doctype = None self.__seen_starttag = 0 self.__namespaces = {'xml':None} # xml is implicitly declared # For derived classes only -- enter literal mode (CDATA) till EOF def setnomoretags(self): self.nomoretags = self.literal = 1 # For derived classes only -- enter literal mode (CDATA) def setliteral(self, *args): self.literal = 1 # Interface -- feed some data to the parser. Call this as # often as you want, with as little or as much text as you # want (may include '\n').
(This just saves the text, all the # processing is done by goahead().) def feed(self, data): self.rawdata = self.rawdata + data self.goahead(0) # Interface -- handle the remaining data def close(self): self.goahead(1) # Interface -- translate references def translate_references(self, data, all = 1): i = 0 while 1: res = amp.search(data, i) if res is None: return data res = ref.match(data, res.start(0)) if res is None: self.syntax_error("bogus `&'") i =i+1 continue i = res.end(0) if data[i - 1] != ';': self.syntax_error("`;' missing after entity/char reference") i = i-1 str = res.group(1) pre = data[:res.start(0)] post = data[i:] if str[0] == '#': if str[1] == 'x': str = chr(string.atoi(str[2:], 16)) else: str = chr(string.atoi(str[1:])) data = pre + str + post i = res.start(0)+len(str) elif all: if self.entitydefs.has_key(str): data = pre + self.entitydefs[str] + post i = res.start(0) # rescan substituted text else: self.syntax_error('reference to unknown entity') # can't do it, so keep the entity ref in data = pre + '&' + str + ';' + post i = res.start(0) + len(str) + 2 else: # just translating character references pass # i is already positioned correctly # Internal -- handle data as far as reasonable. May leave state # and data to be processed by a subsequent call. If 'end' is # true, force handling all data as if followed by EOF marker.
def goahead(self, end): rawdata = self.rawdata i = 0 n = len(rawdata) while i < n: if i > 0: self.__at_start = 0 if self.nomoretags: data = rawdata[i:n] self.handle_data(data) self.lineno = self.lineno + string.count(data, '\n') i = n break res = interesting.search(rawdata, i) if res: j = res.start(0) else: j = n if i < j: if self.__at_start: self.syntax_error('illegal data at start of file') self.__at_start = 0 data = rawdata[i:j] if not self.stack and space.match(data) is None: self.syntax_error('data not in content') if illegal.search(data): self.syntax_error('illegal character in content') self.handle_data(data) self.lineno = self.lineno + string.count(data, '\n') i = j if i == n: break if rawdata[i] == '<': if starttagopen.match(rawdata, i): if self.literal: data = rawdata[i] self.handle_data(data) self.lineno = self.lineno + string.count(data, '\n') i = i+1 continue k = self.parse_starttag(i) if k < 0: break self.__seen_starttag = 1 self.lineno = self.lineno + string.count(rawdata[i:k], '\n') i = k continue if endtagopen.match(rawdata, i): k = self.parse_endtag(i) if k < 0: break self.lineno = self.lineno + string.count(rawdata[i:k], '\n') i = k continue if commentopen.match(rawdata, i): if self.literal: data = rawdata[i] self.handle_data(data) self.lineno = self.lineno + string.count(data, '\n') i = i+1 continue k = self.parse_comment(i) if k < 0: break self.lineno = self.lineno + string.count(rawdata[i:k], '\n') i = k continue if cdataopen.match(rawdata, i): k = self.parse_cdata(i) if k < 0: break self.lineno = self.lineno + string.count(rawdata[i:k], '\n') i = k continue res = xmldecl.match(rawdata, i) if res: if not self.__at_start: self.syntax_error(" declaration not at start of document") version, encoding, standalone = res.group('version', 'encoding', 'standalone') if version[1:-1] != '1.0': raise RuntimeError, 'only XML version 1.0 supported' if encoding: encoding = encoding[1:-1] if standalone: standalone = standalone[1:-1] self.handle_xml(encoding,
standalone) i = res.end(0) continue res = procopen.match(rawdata, i) if res: k = self.parse_proc(i) if k < 0: break self.lineno = self.lineno + string.count(rawdata[i:k], '\n') i = k continue res = doctype.match(rawdata, i) if res: if self.literal: data = rawdata[i] self.handle_data(data) self.lineno = self.lineno + string.count(data, '\n') i = i+1 continue if self.__seen_doctype: self.syntax_error('multiple DOCTYPE elements') if self.__seen_starttag: self.syntax_error('DOCTYPE not at beginning of document') k = self.parse_doctype(res) if k < 0: break self.__seen_doctype = res.group('name') self.lineno = self.lineno + string.count(rawdata[i:k], '\n') i = k continue elif rawdata[i] == '&': if self.literal: data = rawdata[i] self.handle_data(data) i = i+1 continue res = charref.match(rawdata, i) if res is not None: i = res.end(0) if rawdata[i-1] != ';': self.syntax_error("`;' missing in charref") i = i-1 if not self.stack: self.syntax_error('data not in content') self.handle_charref(res.group('char')[:-1]) self.lineno = self.lineno + string.count(res.group(0), '\n') continue res = entityref.match(rawdata, i) if res is not None: i = res.end(0) if rawdata[i-1] != ';': self.syntax_error("`;' missing in entityref") i = i-1 name = res.group('name') if self.entitydefs.has_key(name): self.rawdata = rawdata = rawdata[:res.start(0)] + self.entitydefs[name] + rawdata[i:] n = len(rawdata) i = res.start(0) else: self.syntax_error('reference to unknown entity') self.unknown_entityref(name) self.lineno = self.lineno + string.count(res.group(0), '\n') continue elif rawdata[i] == ']': if self.literal: data = rawdata[i] self.handle_data(data) i = i+1 continue if n-i < 3: break if cdataclose.match(rawdata, i): self.syntax_error("bogus `]]>'") self.handle_data(rawdata[i]) i = i+1 continue else: raise RuntimeError, 'neither < nor & ??' 
# We get here only if incomplete matches but # nothing else break # end while if i > 0: self.__at_start = 0 if end and i < n: data = rawdata[i] self.syntax_error("bogus `%s'" % data) if illegal.search(data): self.syntax_error('illegal character in content') self.handle_data(data) self.lineno = self.lineno + string.count(data, '\n') self.rawdata = rawdata[i+1:] return self.goahead(end) self.rawdata = rawdata[i:] if end: if not self.__seen_starttag: self.syntax_error('no elements in file') if self.stack: self.syntax_error('missing end tags') while self.stack: self.finish_endtag(self.stack[-1][0]) # Internal -- parse comment, return length or -1 if not terminated def parse_comment(self, i): rawdata = self.rawdata if rawdata[i:i+4] <> '} delimiters, but not the delimiters themselves. For example, the comment \samp{} will cause this method to be called with the argument \code{'text'}. The default method does nothing. \end{methoddesc} \begin{methoddesc}{handle_cdata}{data} This method is called when a CDATA element is encountered. The \var{data} argument is a string containing the text between the \samp{} delimiters, but not the delimiters themselves. For example, the entity \samp{} will cause this method to be called with the argument \code{'text'}. The default method does nothing, and is intended to be overridden. \end{methoddesc} \begin{methoddesc}{handle_proc}{name, data} This method is called when a processing instruction (PI) is encountered. The \var{name} is the PI target, and the \var{data} argument is a string containing the text between the PI target and the closing delimiter, but not the delimiter itself. For example, the instruction \samp{} will cause this method to be called with the arguments \code{'XML'} and \code{'text'}. The default method does nothing. Note that if a document starts with \samp{}, \method{handle_xml()} is called to handle it. 
\end{methoddesc} \begin{methoddesc}{handle_special}{data} This method is called when a declaration is encountered. The \var{data} argument is a string containing the text between the \samp{} delimiters, but not the delimiters themselves. For example, the entity \samp{} will cause this method to be called with the argument \code{'ENTITY text'}. The default method does nothing. Note that \samp{} is handled separately if it is located at the start of the document. \end{methoddesc} \begin{methoddesc}{syntax_error}{message} This method is called when a syntax error is encountered. The \var{message} is a description of what was wrong. The default method raises a \exception{RuntimeError} exception. If this method is overridden, it is permissible for it to return. This method is only called when the error can be recovered from. Unrecoverable errors raise a \exception{RuntimeError} without first calling \method{syntax_error()}. \end{methoddesc} \begin{methoddesc}{unknown_starttag}{tag, attributes} This method is called to process an unknown start tag. It is intended to be overridden by a derived class; the base class implementation does nothing. \end{methoddesc} \begin{methoddesc}{unknown_endtag}{tag} This method is called to process an unknown end tag. It is intended to be overridden by a derived class; the base class implementation does nothing. \end{methoddesc} \begin{methoddesc}{unknown_charref}{ref} This method is called to process unresolvable numeric character references. It is intended to be overridden by a derived class; the base class implementation does nothing. \end{methoddesc} \begin{methoddesc}{unknown_entityref}{ref} This method is called to process an unknown entity reference. It is intended to be overridden by a derived class; the base class implementation does nothing. \end{methoddesc} \subsection{XML Namespaces} This module has support for XML namespaces as defined in the XML Namespaces proposed recommendation.
Tag and attribute names that are defined in an XML namespace are handled as if the name of the tag or element consisted of the namespace (i.e. the URL that defines the namespace) followed by a space and the name of the tag or attribute. For instance, the tag \code{} is treated as if the tag name was \code{'http://www.w3.org/TR/REC-html40 html'}, and the tag \code{} inside the above mentioned element is treated as if the tag name were \code{'http://www.w3.org/TR/REC-html40 a'} and the attribute name as if it were \code{'http://www.w3.org/TR/REC-html40 src'}. An older draft of the XML Namespaces proposal is also recognized, but triggers a warning. ------- =_aaaaaaaaaa0-- From paul@prescod.net Fri Dec 18 20:10:16 1998 From: paul@prescod.net (Paul Prescod) Date: Fri, 18 Dec 1998 14:10:16 -0600 Subject: [XML-SIG] Recent CVS changes References: <199812181651.KAA31392@bitsko.slc.ut.us> Message-ID: <367AB6A8.F2D7B76@prescod.net> Ken MacLeod wrote: > > Another is LDO's XML serialization: > > > > The DTD itself has basic specs and I hope to complete more docs over > Christmas vacation. Can you people please explain why we need all of these competing proposals? XML-RPC looks like a superset of WDDX (in that it has a concept of "method"). It could be described as a superset of WDDX, couldn't it? LDO looks like a *subset* of WDDX except for the REF element type. Can't we all just get along? Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Sports utility vehicles are gated communities on wheels" - Anon From akuchlin@cnri.reston.va.us Fri Dec 18 22:10:22 1998 From: akuchlin@cnri.reston.va.us (Andrew M. 
Kuchling) Date: Fri, 18 Dec 1998 17:10:22 -0500 (EST) Subject: [XML-SIG] Marshalling In-Reply-To: <13946.31647.653312.303899@weyr.cnri.reston.va.us> References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> <3679F613.68A22D40@prescod.net> <13946.24429.905213.372579@amarok.cnri.reston.va.us> <13946.31647.653312.303899@weyr.cnri.reston.va.us> Message-ID: <13946.53571.550184.123458@amarok.cnri.reston.va.us> Fred L. Drake writes: > >>>Paul Prescod writes: >>> * why have a single class for marshalling and unmarshalling? >>Andrew M. Kuchling writes: >> My fuzzy argument for this was that I wanted the user to write >> only a single subclass, not two of them. > > Now's my turn to say "this is bogus". This is bogus. It's And Prescod slams Kuchling into the mat, stunning him! Now Drake has him in a headlock! Oh, the humanity... OK, I'll try to divide the two functions into separate classes, and see how it goes. Would it be all right if I left both the m_* and um_* methods on the basic Marshalling class, and just pushed out the SAX handler methods? Or should there be different Marshaller and Unmarshaller classes? Incidentally, Paul's idea of changing Python's pickle module to XML is an interesting one for Python 2.0, but not really possible before then. It would be nice if xml.marshal could do what pickle does, though. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Kids! Bringing about Armageddon can be dangerous. Do not attempt it in your home. -- Terry Pratchett & Neil Gaiman, _Good Omens_ From Fred L. Drake, Jr." References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> <3679F613.68A22D40@prescod.net> <13946.24429.905213.372579@amarok.cnri.reston.va.us> <13946.31647.653312.303899@weyr.cnri.reston.va.us> <13946.53571.550184.123458@amarok.cnri.reston.va.us> Message-ID: <13946.54712.149166.228072@weyr.cnri.reston.va.us> Andrew M. 
Kuchling writes: > right if I left both the m_* and um_* methods on the basic Marshalling > class, and just pushed out the SAX handler methods? Or should there > be different Marshaller and Unmarshaller classes? Wasn't that the point? I think pickle and xdrlib got the model right: packing and unpacking are two different functions. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From gstein@lyra.org Fri Dec 18 22:59:57 1998 From: gstein@lyra.org (Greg Stein) Date: Fri, 18 Dec 1998 14:59:57 -0800 Subject: [XML-SIG] Recent CVS changes References: <199812181651.KAA31392@bitsko.slc.ut.us> <367AB6A8.F2D7B76@prescod.net> Message-ID: <367ADE6D.357DABD3@lyra.org> Paul Prescod wrote: > > Ken MacLeod wrote: > > > > Another is LDO's XML serialization: > > > > > > > > The DTD itself has basic specs and I hope to complete more docs over > > Christmas vacation. > > Can you people please explain why we need all of these competing > proposals? XML-RPC looks like a superset of WDDX (in that it has a concept > of "method"). It could be described as a superset of WDDX, couldn't it? > > LDO looks like a *subset* of WDDX except for the REF element type. > > Can't we all just get along? Reality says "no" I think we would be in error to create a new one, but since those others are already out there, then (IMO) it is best if we can work with them. Put politics and ideals aside -- pragmatism says "damn it, I need a connector because I need to work with XYZ". It would be nice to keep Python in the game here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From digitome@iol.ie Sat Dec 19 10:58:44 1998 From: digitome@iol.ie (Sean Mc Grath) Date: Sat, 19 Dec 1998 10:58:44 +0000 Subject: [XML-SIG] Python tutorial at XML Europe '99 Message-ID: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie> My proposal to present a half day tutorial on Python at XML Europe '98 in Spain has been accepted. 
Python goes mainstream at GCA SGML/XML Conference. Great! I will also be doing a half day Python tutorial at WWW8 where XML will receive more than a passing mention:-) The Python/XML combo marches ever onward... See www.gca.org and www. www8.org for conference details. Regards, Sean From Fred L. Drake, Jr." References: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie> Message-ID: <13947.51295.280708.515434@weyr.cnri.reston.va.us> Sean Mc Grath writes: > My proposal to present a half day tutorial on > Python at XML Europe '98 in Spain has > been accepted. Python goes mainstream > at GCA SGML/XML Conference. Great! Congratulations! > I will also be doing a half day Python tutorial > at WWW8 where XML will receive more than > a passing mention:-) Sounds like you've been busy! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From ken@bitsko.slc.ut.us Sat Dec 19 16:55:22 1998 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 19 Dec 1998 10:55:22 -0600 Subject: [XML-SIG] Recent CVS changes In-Reply-To: Paul Prescod's message of Fri, 18 Dec 1998 14:10:16 -0600 References: <199812181651.KAA31392@bitsko.slc.ut.us> <367AB6A8.F2D7B76@prescod.net> Message-ID: Paul Prescod writes: > Ken MacLeod wrote: > > > > Another is LDO's XML serialization: > > > > > > > > The DTD itself has basic specs and I hope to complete more docs over > > Christmas vacation. > > Can you people please explain why we need all of these competing > proposals? XML-RPC looks like a superset of WDDX (in that it has a concept > of "method"). It could be described as a superset of WDDX, couldn't it? > > Can't we all just get along? Serialization in LDO is modular, and LDO includes binary and XML serialization specs that are a ``best fit'' for how LDO handles distributed objects. Python's `pickle' and Perl's `Storable' also work well within LDO for python-to-python or perl-to-perl messages. 
I would be glad to support WDDX serialization too, or in place of LDO's XML serialization, but it's not a ``best fit'' for LDO right now, in part because it's not specified how to handle binary values (using base64 for example), null values are explicitly unsupported, there's no type or class attributes, no support for object references, and no support for non-string keys in dictionaries (structures). > LDO looks like a *subset* of WDDX except for the REF element type. LDO's XML serialization may have fewer tags, but it does support all the semantics described above. I would say it is actually a superset, because everything in WDDX can be encoded in LDO's XML serialization, but the reverse is not true. -- Ken MacLeod ken@bitsko.slc.ut.us From gwachob@aimnet.com Sat Dec 19 20:28:01 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Sat, 19 Dec 1998 12:28:01 -0800 (PST) Subject: [XML-SIG] Recent CVS changes In-Reply-To: Message-ID: On 19 Dec 1998, Ken MacLeod wrote: > Paul Prescod writes: > > > Ken MacLeod wrote: > > LDO looks like a *subset* of WDDX except for the REF element type. > > LDO's XML serialization may have fewer tags, but it does support all > the semantics described above. I would say it is actually a superset, > because everything in WDDX can be encoded in LDO's XML serialization, > but the reverse is not true. Having worked a little with Ken on the LDO/Python stuff as well as the WDDX stuff, I must say that they do serve very similar functions. Ken's stuff I think has more "requirements" and thus is a little more complicated. WDDX is simpler, and is much easier to implement (thats not a knock against Ken's work -- his work is more ambitious, IMHO). The one thing I would say is that Ken's LDO specification relies more on the processes at each end of the wire to decode what the information traveling over the wire means in a semantic sense. 
LDO explicitly has no concept of type (which leads to some thorny issues ;-), whereas WDDX has hints or outright imposition of type information. I look at LDO as the "XML" of serialization, whereas WDDX is more like the "HTML" of serialization (in that LDO can be used for more different things, but it requires more work on the processing ends by application writers). If that isn't flame bait, I don't know what is ;-) I like both. Why can't we all get along! One will probably be used more widely than the other. I think LDO is more consistent, but I think WDDX is obviously easier to use. One is written by a really bright guy for a great opensource project (Casbah - http://www.ntlug.org/casbah), one is written by a well-known application-server company who have a lot of recognition. I don't know which one will survive (hell, maybe they *both* will -- that'd be ok) -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From jtauber@jtauber.com Sun Dec 20 06:21:25 1998 From: jtauber@jtauber.com (James Tauber) Date: Sun, 20 Dec 1998 14:21:25 +0800 Subject: [XML-SIG] Python tutorial at XML Europe '99 Message-ID: <00c201be2be1$4988f160$0300000a@othniel.cygnus.uwa.edu.au> -----Original Message----- From: Sean Mc Grath >I will also be doing a half day Python tutorial >at WWW8 where XML will receive more than >a passing mention:-) And I will be doing a full day XML tutorial at WWW8 where Python will receive mention :-) >The Python/XML combo marches ever onward... Indeed.
James -- James Tauber / jtauber@jtauber.com / www.jtauber.com Associate Researcher, Electronic Commerce Network Curtin University of Technology, Perth, Western Australia Maintainer of : www.xmlinfo.com, www.xmlsoftware.com and www.schema.net From larsga@ifi.uio.no Sun Dec 20 12:02:55 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 20 Dec 1998 13:02:55 +0100 Subject: [XML-SIG] Python tutorial at XML Europe '99 In-Reply-To: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie> References: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie> Message-ID: * Sean Mc Grath | | My proposal to present a half day tutorial on Python at XML Europe | '98 in Spain has been accepted. Cool! :) For my own part, I will give a full-day tutorial on XML processing (sort of an expansion of the workshop Paul, Geir Ove and I did at SGML/XML Norway '98 a couple of weeks ago) at the same conference. Of course, Python will receive more than a passing mention. --Lars M. From larsga@ifi.uio.no Sun Dec 20 14:29:45 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 20 Dec 1998 15:29:45 +0100 Subject: [XML-SIG] Perl and character encodings In-Reply-To: <36750A0B.EBEB7355@prescod.net> References: <36750A0B.EBEB7355@prescod.net> Message-ID: * Paul Prescod [quoting an XML::Parser release announcement] | | > The major new feature is access to character set encodings other than | > expat's built-in set (UTF-8, UTF-16, ISO-8859-1, US-ASCII). This is done | > through binary character encoding maps appearing in the pathlist | > represented by @XML::Parser::Expat::Encoding_Path. Just for the record: xmlproc has something similar in its charconv module. This module is currently not used by the parser, but modifying xmlproc to use it is a very simple job. I've not given these changes priority, since the conversions that are not simple mappings that can be handled by string.translate are way too slow (and these are of course the most interesting ones, such as utf-8 -> iso-8859-1 and vice versa).
Martin von Löwis' module looks like it has some stuff I can use, so this may appear soon if anyone wants it enough to ask for it (or if I one day feel like making it). If anyone else feels like having a go at this, then feel free. --Lars M. From Sjoerd.Mullender@cwi.nl Mon Dec 21 10:35:19 1998 From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender) Date: Mon, 21 Dec 1998 11:35:19 +0100 Subject: [XML-SIG] New version of xmllib In-Reply-To: Your message of Fri, 18 Dec 1998 18:33:02 +0100. References: Message-ID: On Fri, Dec 18 1998 Sjoerd Mullender wrote: > Here is my current version of xmllib.py and the documentation. This > version has some API changes with respect to the version currently in > Python (also the one in 1.5.2a). > This version supports XML namespaces. And here is a patch to this version. There are two improvements: - Fixed a bug where a syntax error was reported when a document started with white space. (White space at the start of a document is valid if there is no XML declaration.) - Improved the speed quite a bit for documents that don't make use of namespaces. 
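For readers following the namespace support mentioned in this message: the naming rule (namespace URI, then a space, then the local name) can be sketched roughly as below. This is only an illustration in present-day Python, not the code in xmllib.py; the names `split_qname`, `expand_name` and the `scopes` argument are made up for the example, and it covers tag names only.

```python
def split_qname(name):
    """Split 'prefix:local' into (prefix, local); no colon means prefix ''."""
    prefix, sep, local = name.partition(":")
    if not sep:
        return "", name
    return prefix, local


def expand_name(name, scopes):
    """Return 'URI local' if the prefix of `name` is declared, else `name`.

    `scopes` is a hypothetical list of {prefix: URI} dictionaries, one per
    open element, outermost first.
    """
    prefix, local = split_qname(name)
    uri = None
    for scope in scopes:
        # Later (inner) scopes overwrite earlier ones, so the innermost
        # declaration of a prefix wins.
        if prefix in scope:
            uri = scope[prefix]
    if uri is not None:
        return uri + " " + local
    # Undeclared prefix: leave the name exactly as it was written.
    return name


scopes = [{"html": "http://www.w3.org/TR/REC-html40"}]
print(expand_name("html:a", scopes))
print(expand_name("foo:bar", scopes))
```

An unprefixed name is looked up under the empty prefix, so a default namespace declaration can be modelled as `{"": uri}` in the innermost scope that declares it.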
-- Sjoerd Mullender Index: xmllib.py =================================================================== RCS file: /ufs/sjoerd/.CVSroot/mm/demo/pylib/xmllib.py,v retrieving revision 1.24 diff -u -r1.24 xmllib.py --- xmllib.py 1998/12/18 17:33:50 1.24 +++ xmllib.py 1998/12/21 10:25:29 @@ -100,6 +100,7 @@ self.__at_start = 1 self.__seen_doctype = None self.__seen_starttag = 0 + self.__use_namespaces = 0 self.__namespaces = {'xml':None} # xml is implicitly declared # For derived classes only -- enter literal mode (CDATA) till EOF @@ -183,10 +184,10 @@ else: j = n if i < j: - if self.__at_start: + data = rawdata[i:j] + if self.__at_start and space.match(data) is None: self.syntax_error('illegal data at start of file') self.__at_start = 0 - data = rawdata[i:j] if not self.stack and space.match(data) is None: self.syntax_error('data not in content') if illegal.search(data): @@ -439,6 +440,7 @@ name = res.group(0) if name == 'xml:namespace': self.syntax_error('old-fashioned namespace declaration') + self.__use_namespaces = -1 # namespace declaration # this must come after the declaration (if any) # and before the (if any). 
@@ -489,6 +491,8 @@ # namespace declaration ncname = res.group('ncname') namespace[ncname or ''] = attrvalue or None + if not self.__use_namespaces: + self.__use_namespaces = len(self.stack)+1 continue if '<' in attrvalue: self.syntax_error("`<' illegal in attribute value") @@ -518,7 +522,10 @@ k, j = tag.span('attrs') attrdict, nsdict, k = self.parse_attributes(tagname, k, j) self.stack.append((tagname, nsdict, nstag)) - res = qname.match(tagname) + if self.__use_namespaces: + res = qname.match(tagname) + else: + res = None if res is not None: prefix, nstag = res.group('prefix', 'local') if prefix is None: @@ -535,27 +542,28 @@ nstag = prefix + ':' + nstag # undo split self.stack[-1] = tagname, nsdict, nstag # translate namespace of attributes - nattrdict = {} - for key, val in attrdict.items(): - res = qname.match(key) - if res is not None: - aprefix, key = res.group('prefix', 'local') - if aprefix is None: - aprefix = '' - ans = None - for t, d, nst in self.stack: - if d.has_key(aprefix): - ans = d[aprefix] - if ans is None and aprefix != '': - ans = self.__namespaces.get(aprefix) - if ans is not None: - key = ans + ' ' + key - elif aprefix != '': - key = aprefix + ':' + key - elif ns is not None: - key = ns + ' ' + key - nattrdict[key] = val - attrdict = nattrdict + if self.__use_namespaces: + nattrdict = {} + for key, val in attrdict.items(): + res = qname.match(key) + if res is not None: + aprefix, key = res.group('prefix', 'local') + if aprefix is None: + aprefix = '' + ans = None + for t, d, nst in self.stack: + if d.has_key(aprefix): + ans = d[aprefix] + if ans is None and aprefix != '': + ans = self.__namespaces.get(aprefix) + if ans is not None: + key = ans + ' ' + key + elif aprefix != '': + key = aprefix + ':' + key + elif ns is not None: + key = ns + ' ' + key + nattrdict[key] = val + attrdict = nattrdict attributes = self.attributes.get(nstag) if attributes is not None: for key in attrdict.keys(): @@ -634,6 +642,8 @@ self.handle_endtag(nstag, method) 
else: self.unknown_endtag(nstag) + if self.__use_namespaces == len(self.stack): + self.__use_namespaces = 0 del self.stack[-1] # Overridable -- handle xml processing instruction From Milan.Hemzal@pvt.cz Mon Dec 21 15:02:38 1998 From: Milan.Hemzal@pvt.cz (=?ISO-8859-2?Q?Hem=BEal_Milan?=) Date: Mon, 21 Dec 1998 16:02:38 +0100 Subject: [XML-SIG] (no subject) Message-ID: <6CD0F60F48F9D1119E4B0000F87A9AE2287724@p40w13.plz.pvt.cz> From gwachob@aimnet.com Tue Dec 22 02:06:04 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Mon, 21 Dec 1998 18:06:04 -0800 (PST) Subject: [XML-SIG] Simple WDDX Serialization Message-ID: OK, I have not been following the serialization thread very closely. I want to put together a simple WDDX serializer, and I want to throw out my idea to see if anyone can see any major problems. Basically, serialization is easy for most objects. Tuples, Arrays -> WDDX Arrays Objects -> Structs (obviously, skipping methods) Number -> Numbers String -> String For the dateTime WDDX type, I am thinking either 1) do pattern matching on strings to determine if they are valid time/dates -- if so, make them dateTime WDDX elements, or 2) if a string begins with a magic code, then the rest of the string is interpreted as a dateTime element. We could also have a flag in the serializer which turns on or off serialization into dateTime globally for the serialization of a particular object. I'm thinking that the serializer would only serialize a whole object at a time (ie it would not allow for "building" WDDX packets programmatically) Thoughts? Bumps in the road? -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." 
-- James Madison import std.disclaimer From gwachob@aimnet.com Tue Dec 22 02:17:15 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Mon, 21 Dec 1998 18:17:15 -0800 (PST) Subject: [XML-SIG] More on WDDX Serialization Message-ID: Oh yeah, the thorny Recordset issue. I guess the rule would be if a dictionary contains a number of keys which map to arrays of equal size, then the dictionary should encoded as a recordset (but this also requires that the arrays of equal size contain only "simple" types -- otherwise, we are to use an array of structures (have to think about this one)). There are also issue about enforcing the distinguishability of recordset field names (thats not too difficult). -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From paul@prescod.net Tue Dec 22 20:54:02 1998 From: paul@prescod.net (Paul Prescod) Date: Tue, 22 Dec 1998 14:54:02 -0600 Subject: [XML-SIG] Simple WDDX Serialization References: Message-ID: <368006EA.C3328307@prescod.net> Gabe Wachob wrote: > > OK, I have not been following the serialization thread very closely. > > I want to put together a simple WDDX serializer, and I want to throw out > my idea to see if anyone can see any major problems. You should probably build on the work that Andrew Kuchling is doing in his "universal serializer." > String -> String > > For the dateTime WDDX type, I am thinking either 1) do pattern matching on > strings to determine if they are valid time/dates -- if so, make them > dateTime WDDX elements, or 2) if a string begins with a magic code, then > the rest of the string is interpreted as a dateTime element. Autosensing of either sort seems dangerous. Also, Python dates can be encoded as integers and tuples. 
(see the time module for more information. What we need is to ship some particular date/time class with the XML package and require people to use it on both input and output. > We could also > have a flag in the serializer which turns on or off serialization into > dateTime globally for the serialization of a particular object. I'm not sure what you mean. > I'm thinking that the serializer would only serialize a whole object at a > time (ie it would not allow for "building" WDDX packets programmatically) That sounds fine. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Are the social and economic benefits of capital punishment sufficient to outweigh the injustice of accidentally executing innocents?" "What benefits???" From akuchlin@cnri.reston.va.us Tue Dec 22 21:17:13 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Tue, 22 Dec 1998 16:17:13 -0500 (EST) Subject: [XML-SIG] Simple WDDX Serialization In-Reply-To: References: Message-ID: <13952.2782.43087.9551@amarok.cnri.reston.va.us> Gabe Wachob writes: >Tuples, Arrays -> WDDX Arrays >Objects -> Structs (obviously, skipping methods) >Number -> Numbers >String -> String Dictionaries -> Structs would be another possibility. >For the dateTime WDDX type, I am thinking either 1) do pattern matching on >strings to determine if they are valid time/dates -- if so, make them >dateTime WDDX elements, or 2) if a string begins with a magic code, then >the rest of the string is interpreted as a dateTime element. We could also For dateTime, we would really need a standard date/time object, included in either the Python standard library or in the XML package. Instances of this object would then become dateTime elements in the generated WDDX. For record sets, I haven't thought up anything yet, but I like your idea of a dictionary of keys mapping to equal-sized lists. -- A.M. 
Kuchling http://starship.skyport.net/crew/amk/ The world is full of people whose notion of a satisfactory future is, in fact, a return to an idealised past. -- Robertson Davies, _A Voice from the Attic_ From gwachob@aimnet.com Wed Dec 23 01:30:43 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Tue, 22 Dec 1998 17:30:43 -0800 (PST) Subject: [XML-SIG] Simple WDDX Serialization In-Reply-To: <368006EA.C3328307@prescod.net> Message-ID: On Tue, 22 Dec 1998, Paul Prescod wrote: > Gabe Wachob wrote: > > > > OK, I have not been following the serialization thread very closely. > > > > I want to put together a simple WDDX serializer, and I want to throw out > > my idea to see if anyone can see any major problems. > > You should probably build on the work that Andrew Kuchling is doing in his > "universal serializer." I saw mentions of this, but I haven't seen it. Pointers? > > > String -> String > > > > For the dateTime WDDX type, I am thinking either 1) do pattern matching on > > strings to determine if they are valid time/dates -- if so, make them > > dateTime WDDX elements, or 2) if a string begins with a magic code, then > > the rest of the string is interpreted as a dateTime element. > > Autosensing of either sort seems dangerous. Also, Python dates can be > encoded as integers and tuples. (see the time module for more information. > What we need is to ship some particular date/time class with the XML > package and require people to use it on both input and output. Well, that's fine -- I'm just trying to suggest something I can do now... > > We could also > > have a flag in the serializer which turns on or off serialization into > > dateTime globally for the serialization of a particular object. > > I'm not sure what you mean. Not important -- basically to allow globally for NOT doing the autosensing.
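To make the mapping under discussion concrete, here is a minimal sketch of a whole-object serializer that uses an explicit date/time type instead of autosensing strings. It is written in present-day Python and leans on the standard `datetime` class, which did not exist when this thread was written; the function names `to_wddx` and `to_packet` are hypothetical, the element names follow the WDDX DTD as I understand it, and recordsets and instances are deliberately left out.

```python
from datetime import datetime
from xml.sax.saxutils import escape, quoteattr


def to_wddx(value):
    """Serialize a plain Python value to a WDDX fragment (sketch only)."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "<boolean value='%s'/>" % ("true" if value else "false")
    if isinstance(value, (int, float)):
        return "<number>%s</number>" % value
    if isinstance(value, str):
        return "<string>%s</string>" % escape(value)
    if isinstance(value, datetime):
        # Only real date/time objects become dateTime; strings are never sniffed.
        return "<dateTime>%s</dateTime>" % value.isoformat()
    if isinstance(value, (list, tuple)):
        items = "".join(to_wddx(item) for item in value)
        return "<array length='%d'>%s</array>" % (len(value), items)
    if isinstance(value, dict):
        parts = []
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("WDDX struct keys must be strings")
            parts.append("<var name=%s>%s</var>" % (quoteattr(key), to_wddx(item)))
        return "<struct>%s</struct>" % "".join(parts)
    raise TypeError("cannot serialize %r" % type(value))


def to_packet(value):
    """Wrap a serialized value in a complete WDDX packet."""
    return "<wddxPacket version='1.0'><header/><data>%s</data></wddxPacket>" % to_wddx(value)


print(to_packet({"when": datetime(1998, 12, 22, 17, 30, 43), "tags": ["a", "b"]}))
```

Because the dispatch is on the Python type alone, a string that merely looks like a date stays a string, which is the behaviour the global "no autosensing" flag was meant to give.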
-Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From gwachob@aimnet.com Wed Dec 23 01:37:36 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Tue, 22 Dec 1998 17:37:36 -0800 (PST) Subject: [XML-SIG] Simple WDDX Serialization In-Reply-To: <13952.2782.43087.9551@amarok.cnri.reston.va.us> Message-ID: On Tue, 22 Dec 1998, Andrew M. Kuchling wrote: > Gabe Wachob writes: > >Tuples, Arrays -> WDDX Arrays > >Objects -> Structs (obviously, skipping methods) > >Number -> Numbers > >String -> String > > Dictionaries -> Structs would be another possibility. I think I mentioned that in another email.. If I didn't then oops. > >For the dateTime WDDX type, I am thinking either 1) do pattern matching on > >strings to determine if they are valid time/dates -- if so, make them > >dateTime WDDX elements, or 2) if a string begins with a magic code, then > >the rest of the string is interpreted as a dateTime element. We could also > > For dateTime, we would really need a standard date/time object, > included in either the Python standard library or in the XML package. > Instances of this object would then become dateTime elements in the > generated WDDX. Hey, I'm all for a dateTime object in the Python lib... However, isn't the point of writing a WDDX serializer to make WDDX *transparent*? That is, don't you want to eliminate special effort on the part of the Python programmer in composing WDDX packets from Python entities? It seems unclean to have the WDDX serializer be transparent *except* for the dateTime object -- perhaps this is WDDX's fault (dateTime seems to me to be a higher level abstraction than String, Number, Array, etc). 
> For record sets, I haven't thought up anything yet, but I like your > idea of a dictionary of keys mapping to equal-sized lists. I see this an unavoidable kludge, actually. The problem is that the array elements have to consist solely of "simple" types (according to the DTD). That means that to "autodetect" that a dictionary should be mapped to a recordset, we need to figure out the type of every element in every array in the dictionary. Now, I suppose this may not be a big issue if we assume that the data structures involved are not too complex or large (a valid assumption given the type of applications likely to use WDDX, I would think). -Gabe ------------------------------------------------------------------- http://www.aimnet.com/~gwachob http://www.findlaw.com "A popular Government, without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps both." -- James Madison import std.disclaimer From paul@prescod.net Wed Dec 23 05:19:29 1998 From: paul@prescod.net (Paul Prescod) Date: Tue, 22 Dec 1998 23:19:29 -0600 Subject: [XML-SIG] Simple WDDX Serialization References: Message-ID: <36807D61.13A47F5C@prescod.net> Gabe Wachob wrote: > > It seems unclean to have the WDDX serializer be transparent *except* for > the dateTime object -- perhaps this is WDDX's fault (dateTime seems to me > to be a higher level abstraction than String, Number, Array, etc). WDDX is not going to be transparent unless it handles instances and none of the implementations handle those yet. I can't remember the last time I created a Python data structure that consisted of only dictionaries, tuples, lists and other built-in types. Anyhow, there is no way to make date/time handling transparent in Python until there is a Python date/time class. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "In spite of everything I still believe that people are basically good at heart." 
- Anne Frank From Frank McGeough" Hi, Is it possible to run the test release of XML on NT? I downloaded the software from : http://www.python.org/topics/xml/download.html The README says to run make. I don't have a Unix style make. Is there a version that would work with Microsoft's nmake and VC compiler. Thanks, Frank Synchrologic, Inc. http://www.synchrologic.com T: 770.754.5600 F: 770.619.5612 From akuchlin@cnri.reston.va.us Sun Dec 27 16:30:28 1998 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Sun, 27 Dec 1998 11:30:28 -0500 Subject: [XML-SIG] Marshalling In-Reply-To: <13946.24429.905213.372579@amarok.cnri.reston.va.us> References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> <3679F613.68A22D40@prescod.net> <13946.24429.905213.372579@amarok.cnri.reston.va.us> Message-ID: <199812271630.LAA14883@207-172-46-235.s235.tnt9.ann.erols.com> I've been working on the XML marshalling some more, and have implemented handling of Python instances. In the generic module, instances are marshalled as: ... init args ... contents of __dict__ ... I don't know what to do for WDDX and XML-RPC, if anything. Earlier, I wrote: > Probably data_stack shouldn't be an attribute of the class, > but be passed to each of the unmarshalling functions. That would mean > that the Marshaller class would have no mutable attributes at all, and > the self.__class__ thing would be unnecessary. Unfortunately, I realized this isn't possible because I'm using SAX to parse the XML when unmarshalling, and that gives me no way to pass in an additional argument. So the self.__class__() hack has to stay. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Perhaps they are leaving the village. They are going up to the high place, to wait there for the end of their world. And here in my room (I will be fifty soon. I wonder if I will see that birthday, if I will be here to celebrate?)... all alone, I am going with them. 
-- The director's last screenplay in SIGNAL TO NOISE From akuchlin@cnri.reston.va.us Sun Dec 27 17:07:42 1998 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Sun, 27 Dec 1998 12:07:42 -0500 Subject: [XML-SIG] Namespace support for DOM Message-ID: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com> After reading over the current namespace working draft, I thought a little bit about how PyDOM should support it. I'd like to hear opinions on this... Thought 1: the idea of walking over the whole tree and annotating it is bad, because if you modify the tree, the annotations become outdated and you have to recompute them. Thought 2: similar reasoning applies to modifying element or attribute names by removing the namespace prefix. Thought 3: therefore, the better course of action is to have functions or methods that dynamically compute what namespaces apply by looking at a node's ancestors. Thought 4: looking at the draft, the bits that are needed are functions or methods to do the following: 1) Get a dictionary mapping namespace prefixes to URIs, and vice versa; this would be done by walking up the tree looking at xmlns:* attributes. 2) Get the default namespace (might be prefix = "" in the dictionary returned from the previous function) 3) Divide an element or attribute name into the prefix and the rest of the name. This means that namespace-using applications won't have everything done for them; Python code might look vaguely like: XSL_URI = "http://www.w3.org/..." uri = node.get_namespace_mapping() # Next line assumes node is an Element tag nsp, name = divide_qualified_name( node.tagName ) if uri[nsp] == XSL_URI: # node is a tag in the XSL namespace; react appropriately elif uri[nsp] == other_namespace: # do something else Proposed interfaces need to be tried out by actually implementing something on top of them, in order to find areas that have been missed. Can anyone suggest some namespace-using application that would be useful as a test case? 
It would also provide another demo application. The transformation portion of XSL is one candidate, but I haven't read enough of the XSL draft to get an idea of how big the job would be. Anyone know of something small? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ The bitterest tears shed over graves are for words left unsaid and for deeds left undone. -- Harriet Beecher Stowe From gwachob@aimnet.com Mon Dec 28 03:11:51 1998 From: gwachob@aimnet.com (Gabe Wachob) Date: Sun, 27 Dec 1998 19:11:51 -0800 (PST) Subject: [XML-SIG] Namespace support for DOM In-Reply-To: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com> Message-ID: On Sun, 27 Dec 1998, A.M. Kuchling wrote: > After reading over the current namespace working draft, I thought a > little bit about how PyDOM should support it. I'd like to hear > opinions on this... > > Thought 1: the idea of walking over the whole tree and > annotating it is bad, because if you modify the tree, the annotations > become outdated and you have to recompute them. What about annotation where, when you modify a node, you simply recompute the namespace annotations for all the nodes in the subtree of that changed node? I *think* you can do this efficiently (in other words, the newly changed node can be scanned to see if it could possibly have an effect on its children's namespaces). If it does not contain any namespace-related declarations, for example, there shouldn't be any need to update subtree namespace annotations... > Thought 2: similar reasoning applies to modifying element or > attribute names by removing the namespace prefix. I think similar reasoning would also apply to my previous comment (though not sure). > Thought 3: therefore, the better course of action is to have > functions or methods that dynamically compute what namespaces apply by > looking at a node's ancestors.
I'm not sure if you are suggesting what I mention in my first response, or whether (as I think) you are suggesting a "get_namespace" (I assume that's what get_namespace_mapping() is below). Is there a concise statement of the algorithm for determining the namespace of an element or attribute somewhere? I have not been able to find one.. > Proposed interfaces need to be tried out by actually implementing > something on top of them, in order to find areas that have been > missed. Can anyone suggest some namespace-using application that > would be useful as a test case? It would also provide another demo > application. The transformation portion of XSL is one candidate, but > I haven't read enough of the XSL draft to get an idea of how big the > job would be. Anyone know of something small? Well, to be a nontrivial test, wouldn't we want some app built on documents using multiple namespaces?? I mean, if everything is xsl:, what's the point -- you'll always get either xml or "another" namespace (whatever your output namespace is I guess). How about something simple with RDF? How about a RDF equality tool? Takes two RDF XML documents and determines if the two are semantically equivalent forms? Ora Lassila Ora Lassila Ora Lassila All three of these are "semantically equivalent" (or are they -- I notice a lack of namespace declarations in the latter two examples?), but not syntactically equivalent. This would be a more interesting tool if more than two namespaces (RDF and then multiple schemas) were involved. I don't know how complicated this would be (I'm not extremely familiar with RDF). If, as I gather, RDF documents can be treated as directed graphs, it would seem to me that equivalence shouldn't be too hard a task to take on...
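On the question of how the namespace of an element is worked out: the approach in Thought 3 (walk up the ancestors, reading xmlns declarations as you go) might be sketched as follows. This is only a sketch against `xml.dom.minidom` from the present-day standard library, not PyDOM. The names `get_namespace_mapping` and `divide_qualified_name` are borrowed from the pseudocode earlier in the thread, but their bodies here are assumptions, and `resolve_element` is a made-up helper. It handles element names only; unprefixed attributes would need separate treatment.

```python
from xml.dom import minidom


def get_namespace_mapping(node):
    """Return {prefix: URI} for the declarations in scope at `node`.

    The default namespace, if any, is stored under the empty prefix ''.
    """
    mapping = {}
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            # setdefault keeps the nearest declaration, since we start at
            # the node itself and move outwards.
            if name == "xmlns":
                mapping.setdefault("", value)
            elif name.startswith("xmlns:"):
                mapping.setdefault(name[len("xmlns:"):], value)
        node = node.parentNode
    return mapping


def divide_qualified_name(name):
    """Split 'prefix:local' into (prefix, local); no colon gives prefix ''."""
    prefix, _, local = name.rpartition(":")
    return prefix, local


def resolve_element(node):
    """Return (namespace URI or None, local name) for an element node."""
    prefix, local = divide_qualified_name(node.tagName)
    return get_namespace_mapping(node).get(prefix), local


doc = minidom.parseString(
    '<r xmlns="urn:d" xmlns:x="urn:x"><x:a><b xmlns:x="urn:y"><x:c/></b></x:a></r>'
)
for tag in ("x:a", "b", "x:c"):
    print(tag, resolve_element(doc.getElementsByTagName(tag)[0]))
```

Nothing is cached on the tree, so edits to the document never leave stale annotations behind; the cost is one walk to the root per lookup.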
    -Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps
both." -- James Madison
import std.disclaimer

From prescod@prescod.net Mon Dec 28 14:01:43 1998
From: prescod@prescod.net (Paul)
Date: Mon, 28 Dec 1998 08:01:43 -0600 (CST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com>
Message-ID:

On Sun, 27 Dec 1998, A.M. Kuchling wrote:
>
> Thought 3: therefore, the better course of action is to have
> functions or methods that dynamically compute what namespaces apply by
> looking at a node's ancestors.

This is probably the best, mostly because you want your
namespace-enhanced DOM to be a superset of the regular DOM.

> 1) Get a dictionary mapping namespace prefixes to URIs, and
> vice versa; this would be done by walking up the tree looking at
> xmlns:* attributes.

I don't think that the programmer needs access to this dictionary.
Internally you need it, but I don't think that the programmer should.

> This means that namespace-using applications won't have everything
> done for them;

Why not?

> Python code might look vaguely like:
>
> XSL_URI = "http://www.w3.org/..."
> uri = node.get_namespace_mapping()
>
> # Next line assumes node is an Element tag
> nsp, name = divide_qualified_name( node.tagName )
>
> if uri[nsp] == XSL_URI:

I think that this would be better:

uri, name = namespace_divide( node.tagName )

You can do the lookup internally, whether you use a dictionary or a
walk up the tree is your business.

> would be useful as a test case? It would also provide another demo
> application. The transformation portion of XSL is one candidate, but
> I haven't read enough of the XSL draft to get an idea of how big the
> job would be. Anyone know of something small?
How about an app that rewrote namespace prefixes to some canonical form
to allow simple diff-ing? So maybe you have a configuration file like
this:

and you would feed in a document like this:

... ... ...

and would rewrite it (based on the elided namespace declarations and
the configuration file) as:

... ... ...

This is useful for all of the usual reasons canonicalization is useful:
to write simpler software that depends on the output instead of
understanding the input. For instance, if you were writing an RDF
processor but were too lazy to handle the various requirements of
namespaces, you would pipe your data through the canonicalizer and do
"dumb checks" like tagName=="rdf:description".

To be totally useful to programmers and not just as a demo app, it
should actually transform one DOM into another (or, better, act as a
lazy proxy).

I think that this app would use all features of the namespace draft.

 Paul Prescod

From akuchlin@cnri.reston.va.us Mon Dec 28 15:15:57 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 28 Dec 1998 10:15:57 -0500 (EST)
Subject: [XML-SIG] Updated XML HOWTO with DOM coverage
Message-ID: <199812281515.KAA19087@amarok.cnri.reston.va.us>

I've added more coverage of how to use PyDOM to the XML HOWTO. The new
material starts at:

    http://www.python.org/doc/howto/xml/DOM.html

As usual, comments are welcome.

--
A.M. Kuchling           http://starship.skyport.net/crew/amk/
"I... I did not intend to hurt you."
"And what if you did not? Intent and outcome are so rarely coincident."
    -- Dream and Larissa, in SANDMAN #65: "The Kindly Ones:9"

From spepping@scaprea.hobby.nl Tue Dec 29 18:31:23 1998
From: spepping@scaprea.hobby.nl (Simon Pepping)
Date: Tue, 29 Dec 1998 19:31:23 +0100 (MET)
Subject: [XML-SIG] Documentation and problems
Message-ID:

Hi,

I have spent quite some time with the XML package, mainly with the SAX
interface and xmlproc.
As a result I have written a(nother) document about the interaction of
an application and a SAX parser, and how to write a SAX application. I
also wrote a simple application to demonstrate it.

Check it out at http://www.hobby.nl/~scaprea/XML/index.html.

I also made a short list of problems I encountered:

Pr. SAXParseException.__str__ reads:

    return "%s at %s:%d:%d" % (self.msg,self.getSystemId(),
                               self.getColumnNumber(),self.getLineNumber())

getColumnNumber and getLineNumber should be swapped.

========================

Pr. pyexpat does not report the document name with the getSystemId
method:

Document: Fatal error: not well-formed at :5:1
(SAXParseException.__str__)

========================

Pr. XMLValidator does not use my error handlers:

ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at ./waarnemingen.dtd:17:25
TEXT: '#PCDATA )>'

Possible cause: XMLValidator.reset() and
XMLValidator.set_dtd_listener(). With these modifications it works, but
now the location in the DTD is no longer reported:

    def reset(self):
        self.dtd=CompleteDTD(ErrorHandler(self.parser))
        # added SP 1998/12/23
        self.dtd.set_dtd_listener(self.parser.dtd_listener)
        # added SP 1998/12/23
        self.dtd.set_error_handler(self.parser.err)
        self.val=ValidatingApp(self.dtd)
        self.val.set_real_app(self.app)
        # added SP 1998/12/23
        self.val.set_error_handler(self.parser.err)

        self.parser.reset()
        self.parser.set_application(self.val)
        self.parser.dtd=self.dtd
        self.parser.ent=self.dtd

    def set_dtd_listener(self,dtd_listener):
        self.parser.set_dtd_listener(dtd_listener)
        # added SP 1998/12/23
        self.dtd.set_dtd_listener(dtd_listener)

========================

Pr. drv_xmlproc does not implement a getPublicId method:

    # added SP 1998/12/24
    def getPublicId(self):
        # Hmmm, the parser has no method to get the PubID
        # return self.parser.get_current_pubid()
        return 'unknown'

=========================

Pr. XMLValidator does not accept spaces around #PCDATA as content in an
element type declaration:

ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at ./waarnemingen.dtd:17:25
TEXT: '#PCDATA )>'

=========================

Pr. XMLValidator does not accept the following construction in an
external DTD:

ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at waarnemingen.dtd:22:38
TEXT: '%tekst;)> '

(the declaration of p is line 22)

I am not sure whether this is allowed. nsgmls gives the warning:
'#PCDATA in nested model group'.

I hope this is useful. And thanks for the work you have already put
into this. It generally works fine.

Simon Pepping
email: spepping@scaprea.hobby.nl

From fm@synchrologic.com Tue Dec 29 20:11:51 1998
From: fm@synchrologic.com (Frank McGeough)
Date: Tue, 29 Dec 1998 15:11:51 -0500
Subject: [XML-SIG] Documentation and problems
Message-ID: <009e01be3367$794b5220$529b90d1@synchrologic.com>

Simon,

In your doc at :
http://www.hobby.nl/~scaprea/XML/t173.html

I believe the

    2. Call the parser factory with the name of a known driver module,
    e.g.,
    SAXparser=xml.sax.saxexts.make_parser("xml.sax.drivers.drv_xmlproc")

is incorrect. The saxexts.py has the following code in it:

    parser_name = 'xml.sax.drivers.drv_' + parser_name

therefore you should create the parser with :

    SAXparser=xml.sax.saxexts.make_parser("xmlproc")

This may have been a recent change. I just started in with Python XML
stuff. I have downloaded the xml-0_5.zip version.

Thanks for putting that doc on-line. I found it very helpful.

-----Original Message-----
From: Simon Pepping
To: Python XML-SIG
Date: Tuesday, December 29, 1998 2:56 PM
Subject: [XML-SIG] Documentation and problems

>Hi,
>
>I have spent quite some time with the XML package, mainly with the SAX
>interface and xmlproc. As a result I have written a(nother) document
>about the interaction of an application and a SAX parser, and how to
>write a SAX application. I also wrote a simple application to
>demonstrate it.
>
>Check it out at http://www.hobby.nl/~scaprea/XML/index.html.
>

From dieter@handshake.de Wed Dec 30 18:39:28 1998
From: dieter@handshake.de (Dieter Maurer)
Date: Wed, 30 Dec 1998 19:39:28 +0100
Subject: [XML-SIG] Experiences with xml-0.5
Message-ID: <199812301839.TAA01200@lindm.dm>

This is a multi-part MIME message.

--------------FC5583E803777E8ABB8C4995
Content-Type: text/plain; charset=iso-8859-1

Based on our xml-0.5 release, I have made a small tool which adds a
hierarchical content table to HTML documents:

  URL:http://www.handshake.de/~dieter/pyprojects/addContentTable.html

I encountered three bugs:

 1. "xml.dom.core.Document"s methods "get_firstChild" and
    "get_lastChild" (inherited from "xml.dom.core.Node") fail to
    initialize the "ownerDocument" in the children correctly (patch
    attached).

 2. "xml.dom.write.OutputStream.write" folds successive '\n' into a
    single '\n' (i.e. it eliminates empty lines). This is bad for
    preformatted elements (patch attached).

 3. The "NodeList" returned by "get_childNodes" is live (as required by
    the standard). This can make children processing a bit hazardous
    (the downside of liveness), e.g.

        f= dom.createDocumentFragment()
        for c in node.childNodes: f.appendChild(c)

    will *NOT* put all children of "node" into "f" (it moves about
    every second child and leaves the remaining children) because the
    list is modified as a side effect.

    This is a well known problem with Python's for loop. However, the
    standard workaround (using a slice copy of the list) does not work
    in this case, because "NodeList[:]" does not yield a NodeList but
    rather a "_nodeData".

Dieter

--------------FC5583E803777E8ABB8C4995
Content-Type: application/x-patch; name="docowner.pat"
Content-Description: Patch to provide "xml.dom.core.Document" its own
 implementation of "get_firstChild" and "get_lastChild" correctly
 initializing "ownerDocument" of the children.
--- :core.py	Tue Dec 29 10:45:25 1998
+++ core.py	Tue Dec 29 14:59:35 1998
@@ -1041,6 +1041,27 @@
     def get_childNodes(self):
         return NodeList(self._node.children, self, self)
 
+    ## DM: the inherited method fails to set "._document" correctly
+    def get_firstChild(self):
+        """Return the first child of this node. If there is no such node, this
+        returns null."""
+
+        if self._node.children:
+            n = self._node.children[0]
+            return NODE_CLASS[ n.type ] (n, self, self)
+        else:
+            return None
+
+    ## DM: the inherited method fails to set "._document" correctly
+    def get_lastChild(self):
+        """Return the last child of this node. If there is no such node, this
+        returns null."""
+        if self._node.children:
+            n = self._node.children[-1]
+            return NODE_CLASS[ n.type ] (n, self, self)
+        else:
+            return None
+
     def get_documentElement(self):
         """Return the root element of the Document object, or None
         if there is no root element."""

--------------FC5583E803777E8ABB8C4995
Content-Type: application/x-patch; name="emptyline.pat"
Content-Description: Patch to remove empty line removal in
 "xml.dom.writer"

--- :writer.py	Tue Dec 29 10:45:27 1998
+++ writer.py	Wed Dec 30 11:51:50 1998
@@ -16,7 +16,9 @@
 
     def write(self, s):
         #print 'write', `s`
-        self.file.write(re.sub('\n+', '\n', s))
+        #self.file.write(re.sub('\n+', '\n', s))
+        # removing newlines is not a good idea for 'pre', e.g.
+        self.file.write(s)
         if s and s[-1] == '\n':
             self.new_line = 1
         else:

--------------FC5583E803777E8ABB8C4995--

From pas@xis.xerox.com Tue Dec 29 21:42:54 1998
From: pas@xis.xerox.com (Perry Stoll)
Date: Tue, 29 Dec 1998 13:42:54 -0800
Subject: [XML-SIG] Running XML on NT
Message-ID: <004601be3374$6a718500$b54cf60d@bushido>

In case no one has responded to you offline, I'll respond here.

Yes, I have the xml-0.5 release running on NT. There are some dlls in
the xml-0.5/windows/ directory. The wstrop module is not there,
although it's not strictly necessary (if you don't have utf8 encoded
data).
I compiled the sgmlop, wstring and pyexpat modules. I also copied over
the xml.* modules into a directory on my path by hand. It seems to do
the trick.

If the packager (Andrew?) of the xml module would like a wstrop.dll,
I'd pass it along.

Frank, if you want more specific instructions (as opposed to "Yep, I've
done it!"), let me know.

-Perry

-----Original Message-----
From: Frank McGeough
To: xml-sig@python.org
Date: Saturday, December 26, 1998 8:41 PM
Subject: [XML-SIG] Running XML on NT

>Hi,
>
>Is it possible to run the test release of XML
>on NT? I downloaded the software from :
>http://www.python.org/topics/xml/download.html
>
>The README says to run make. I don't have a
>Unix style make. Is there a version that would
>work with Microsoft's nmake and VC compiler.
>
>Thanks,
>Frank
>
>Synchrologic, Inc.
>http://www.synchrologic.com
>T: 770.754.5600
>F: 770.619.5612
>
>
>_______________________________________________
>XML-SIG maillist - XML-SIG@python.org
>http://www.python.org/mailman/listinfo/xml-sig
>
>

From akuchlin@cnri.reston.va.us Mon Dec 28 22:07:56 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 28 Dec 1998 17:07:56 -0500 (EST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To:
References: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com>
Message-ID: <13959.65326.842240.956946@amarok.cnri.reston.va.us>

Paul writes:
>> 1) Get a dictionary mapping namespace prefixes to URIs, and
>> vice versa; this would be done by walking up the tree looking at
>> xmlns:* attributes.
>
>I don't think that the programmer needs access to this dictionary.
>Internally you need it, but I don't think that the programmer should.
>I think that this would be better:
>uri, name = namespace_divide( node.tagName )

Talking about this with Fred at lunch today, I realized that this is
probably not sufficient, and that you really do need access to the
dictionary.
Consider an Element node with no namespace prefix; its namespace is
therefore assumed to be the default one. Take that node out of the
tree, and insert it somewhere else, where the default namespace is
*different*. Assume that this behaviour isn't what you want; instead,
you want to keep the element in the same namespace as it was originally
in.

This may mean adding the right prefix for the namespace's URI, which
means you need some way of getting at the prefixes and URIs available
at the new location. (It could also be done by adding an xmlns="URI"
attribute to the element, but that makes solving this problem too easy.
:) More seriously, there might be applications where adding the NS
prefix is the only way to go.)

I like the idea of the namespace canonicalizer as a demo app, BTW.

--
A.M. Kuchling           http://starship.skyport.net/crew/amk/
I must be strong. And in my head a voice says, Yes, Dear, you must. And
in my head another voice is muttering Oh that I were a man, or that I
had power to execute my apprehended wishes: I would whip some with
scorpions... And a voice says, You know what you must do.
    -- Lyta is told her son is dead, in SANDMAN #59: "The Kindly Ones:3"

From prescod@prescod.net Thu Dec 31 04:47:14 1998
From: prescod@prescod.net (Paul)
Date: Wed, 30 Dec 1998 22:47:14 -0600 (CST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <13959.65326.842240.956946@amarok.cnri.reston.va.us>
Message-ID:

On Mon, 28 Dec 1998, Andrew M. Kuchling wrote:

> Talking about this with Fred at lunch today, I realized that
> this is probably not sufficient, and that you really do need access to
> the dictionary. Consider an Element node with no namespace prefix;
> its namespace is therefore assumed to be the default one. Take that
> node out of the tree, and insert it somewhere else, where the default
> namespace is *different*. Assume that this behaviour isn't what you
> want; instead, you want to keep the element in the same namespace as
> it was originally in.
Since namespace defaulting is just a typing convenience, I would argue
that moving a node should never change its namespace.

> This may mean adding the right prefix for the namespace's URI,
> which means you need some way of getting at the prefixes and URIs
> available at the new location. (It could also be done by adding an
> xmlns="URI" attribute to the element, but that makes solving this
> problem too easy. :) More seriously, there might be applications where
> adding the NS prefix is the only way to go.)

I think that the namespace-aware node-moving method should do the fixup
automatically.

Maybe my desire to have everything be automatic and semantically clean
is at odds with your desire to have this be a transparent extension to
the DOM that doesn't change the behavior of any DOM-builtin method.

 Paul Prescod

From akuchlin@cnri.reston.va.us Thu Dec 31 19:08:41 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 31 Dec 1998 14:08:41 -0500 (EST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To:
References: <13959.65326.842240.956946@amarok.cnri.reston.va.us>
Message-ID: <13963.51590.804687.473078@amarok.cnri.reston.va.us>

Paul writes:
>I think that the namespace-aware node-moving method should do the fixup
>automatically.
>
>Maybe my desire to have everything be automatic and semantically clean is
>at odds with your desire to have this be a transparent extension to the
>DOM that doesn't change the behavior of any DOM-builtin method.

Indeed; I'm frightened of adding some sort of clever,
invalidate-namespaces-on-a-move, scheme and opening the door to lots of
subtle bugs.

Also, the PyDOM representation has nodes with a list of their children,
and no parent pointers; this makes the traversing of ancestors
difficult. I'm somewhat tempted to toss the recently announced WeakDict
object into the XML package and add parent pointers, but it may be too
late to undertake such a large change to the DOM code. Any opinions?

--
A.M. Kuchling           http://starship.skyport.net/crew/amk/
"Wow. That's wicked! Like _Star Wars_."
"A strange analogy, child, but indeed, there was a war in heaven, and
you see the vanquished now, burning as they fall, like stars. In the
darkness before the first dawn, theirs was the first folly; theirs the
first rebellion."
    -- Tim and Dr Occult, in BOOKS OF MAGIC #1
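
The parent-pointer idea raised in the last message can be sketched with
weak references. This is a guess at the shape of such a change, not
PyDOM code: it uses the weakref module from today's standard library
rather than the WeakDict object mentioned in the message, and Node,
append_child and ancestors are hypothetical names. Children hold only a
weak reference to their parent, so the tree contains no reference
cycles, and walking up towards the root (for example to collect xmlns
declarations) becomes a simple loop.

```python
import weakref


class Node:
    """Hypothetical tree node: strong child links, weak parent link."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent_ref = None  # weakref.ref to the parent, or None

    @property
    def parent(self):
        # Dereference the weak reference. The result is None for a root
        # node, or when the parent object no longer exists.
        if self._parent_ref is None:
            return None
        return self._parent_ref()

    def append_child(self, child):
        old = child.parent
        if old is not None:
            # A node lives in one place only, so detach it first.
            old.children.remove(child)
        self.children.append(child)
        child._parent_ref = weakref.ref(self)
        return child

    def ancestors(self):
        """Yield the parent, grandparent, ... up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


root = Node("html")
body = root.append_child(Node("body"))
para = body.append_child(Node("p"))
print([n.name for n in para.ancestors()])  # ['body', 'html']
```

One consequence of this design: a node whose parent has been dropped
reports its parent as None, so callers have to keep the root (or the
Document) alive for as long as they work with its descendants.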