From boud2@rempt.xs4all.nl Tue Feb 1 06:45:00 2000 From: boud2@rempt.xs4all.nl (Boudewijn Rempt (KDE test user)) Date: Tue, 1 Feb 2000 07:45:00 +0100 (CET) Subject: [XML-SIG] DevDay results In-Reply-To: <38961207.EF6835F2@prescod.net> Message-ID: (I ought to introduce myself first - while I'm not working on the XML modules myself, I've spent the past month working on a XML editor for KDE, using Python.) On Mon, 31 Jan 2000, Paul Prescod wrote: > uche.ogbuji@fourthought.com wrote: > > > > > My vote would be to bundle SAX and Expat, which will do for many uses. If > > they need more sophisticated XML, they can download the XML package to get > > DOM, XPath, XSLT, etc. > > My concern is that I don't consider the DOM "advanced". Hell, Visual > Basic and Javascript programmers can't even spell SAX but they all use > the DOM. If a new user asked me which to learn first, I'd say "the DOM" > because any semi-competent newbie can find their way around a tree(?) to > get the information they need whereas being smart enough to buffer the > right information in the right order takes a little more algorithmic > fore-though ('scuse me). > As far as I'm concerned, the DOM is absolutely basic. At least, it's what I turned to immediately when I started writing my editor. If a DOM isn't included in the standard XML package, would it be allowable to include it in every application that needs one? From uche.ogbuji@fourthought.com Tue Feb 1 15:02:37 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 01 Feb 2000 08:02:37 -0700 Subject: [XML-SIG] DevDay results In-Reply-To: Your message of "Tue, 01 Feb 2000 07:45:00 +0100." Message-ID: <200002011502.IAA04939@localhost.localdomain> > (I ought to introduce myself first - while I'm not working > on the XML modules myself, I've spent the past month working > on a XML editor for KDE, using Python.) [snip] > As far as I'm concerned, the DOM is absolutely basic. 
At least, it's what > I turned to immediately when I started writing my editor. If a DOM isn't > included in the standard XML package, would it be allowable to include > it in every application that needs one? I doubt anyone would disagree that the core of the DOM is basic, but as I've already witnessed elsewhere, if you got all these people together, there would be no easy consensus on what constitutes that core. We all seem to be agreed that 4DOM is (and even PyDOM would have been) too bulky to be bundled with Python. Most have also expressed that some DOM interface would be good for bundling with Python. Perhaps you can bring your fresh perspective to the question of exactly how we go about this. If you don't mind, take a look at the xml-sig thread beginning at: http://www.python.org/pipermail/xml-sig/1999-April/002712.html and Paul's final proposal at: http://www.python.org/pipermail/xml-sig/1999-April/002763.html Except for the hard-core DOM-haters, most of us liked Paul's proposal, and it is only time that has prevented us from building in a conversion layer from 4DOM to miniDOM. I think we should review Paul's proposal in the face of DOM Level 2, and come up with a miniDOM which _can_ be bundled with Python, knowing that miniDOM code could be easily migrated to 4DOM if bigger guns are needed. You'll also see a lot of discussion of qp_xml in that thread. qp_xml is nice and lightweight, but my resistance to it (and others) is that it doesn't follow the XML Infoset, which, rightly or wrongly, makes many concessions to DOM. I'm sure we have all written our own quick and effective XML APIs (Mike and I have written our share). We moved to DOM, warts and all, because standardization and intellectual cohesiveness are more important than memory and processor footprint for a general API. 
-- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From paul@prescod.net Tue Feb 1 20:14:58 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 01 Feb 2000 12:14:58 -0800 Subject: [XML-SIG] DevDay results References: <200002011502.IAA04939@localhost.localdomain> Message-ID: <38973EC2.704AD82B@prescod.net> Are congratulations in order yet, Uche? uche.ogbuji@fourthought.com wrote: > > ... > I doubt anyone would disagree that the core of the DOM is basic, but as I've > already witnessed elsewhere, if you got all these people together, there would > be no easy consensus on what constitutes that core. And the "core" would likely not be sufficient to support an XML editor. I am most interested in enabling XML->Foo transformations that require walking around the DOM tree. On the other hand, walking around the DOM tree without something like XPath is a little painful so maybe just providing the DOM is not so useful....decisions, decisions. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world´s greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From boud2@rempt.xs4all.nl Tue Feb 1 20:45:21 2000 From: boud2@rempt.xs4all.nl (Boudewijn Rempt (KDE test user)) Date: Tue, 1 Feb 2000 21:45:21 +0100 (CET) Subject: [XML-SIG] DevDay results In-Reply-To: <38973EC2.704AD82B@prescod.net> Message-ID: On Tue, 1 Feb 2000, Paul Prescod wrote: > Are congratulations in order yet, Uche? > > uche.ogbuji@fourthought.com wrote: > > > > ... > > I doubt anyone would disagree that the core of the DOM is basic, but as I've > > already witnessed elsewhere, if you got all these people together, there would > > be no easy consensus on what constitutes that core. 
> > And the "core" would likely not be sufficient to support an XML editor. > I was getting nicely underway with what I had - of course, I was only trying to build a simple editor, only showing the tree, node attributes and text nodes. But then, I'm very much a layman when it comes to these issues. From uche.ogbuji@fourthought.com Tue Feb 1 23:31:41 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 01 Feb 2000 16:31:41 -0700 Subject: [XML-SIG] DOM in Python 1.6 Message-ID: <200002012331.QAA06963@localhost.localdomain> Did Guido set a timetable for Python 1.6? What deadlines are we facing if we want to try to get a lightweight DOM into Python 1.6? -- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From steve@renlabs.com Tue Feb 1 23:33:07 2000 From: steve@renlabs.com (Steven Work) Date: 01 Feb 2000 15:33:07 -0800 Subject: [XML-SIG] Please resolve external parameter entity references In-Reply-To: "Boudewijn Rempt's message of "Tue, 1 Feb 2000 21:45:21 +0100 (CET)" Message-ID: <87g0vcwlak.fsf@solano.in.renlabs.com> May I weigh in on the feature list question? For many purposes the core XML processor should resolve external parameter entity references; expat currently doesn't. W3C only *requires* this of a validating parser, and that appears to be expat's justification for skipping them. I'd like to argue that a good general-use *non-validating* parser should do it too, at least optionally; and I don't think it would bloat code measurably or slow things down any when there are no external parameter entity references, or when the option is turned off. Why does this matter? Here's one example. I find myself logging (accumulating) information in XML-derived formats pretty frequently these days. The only way I know to do this in a strictly append-only and atomic way is this: 1. 
Start with a (unchanging) top-level document like "log.xml" here: %log.decls; ]> &log.ents; 2. For each "thing" to log, do these steps in order: a. Write a well-formed chunk of XML, valid within a entity, to a uniquely-named new file. b. Append something like this to "log.decls": c. Append something like this to "log.ents": &unique-name; If you can assume the writes in 2b and 2c are atomic (happen to completion without other writes to the same file intervening; for small writes on most systems this is an OK assumption) then "log.xml" remains valid at all times -- no need for locks or other interprocess communications to avoid scrambling the data, even with many processes writing data "simultaneously." But to process "log.xml" I have to fall back from the very-fast expat, usually to an ESIS parser chewing the data stream from nsgmls in a separate process (validating xmlproc works too but it's even slower). These systems don't need validating parsers, but to my knowledge the XML developer community hasn't built any good non-validating parsers that don't just ignore external parameter entity references. Only they can't ignore them entirely (Section 5.1 of the W3C recommendation requires a non-validating parser to notice when it has chosen NOT to read an external parameter entity, so it can know at what point it is absolved of its responsibility to process entity declarations or attribute-list declarations that come later). So there's essentially no speed cost to having the *option* of reading external parameter entities, and choosing *not* to. And you're already expanding internally-declared parameter entities, so it won't add a measurable amount of code to do so from another file. I think I'm talking myself into patching expat. Would some kind soul please point out flaws in the above, so I can save myself the trouble? 
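The writer side of steps 2a-2c might look roughly like the sketch below. The markup in this archived copy of the message has been lost, so the declaration written to "log.decls" (an `<!ENTITY ... SYSTEM ...>` line) and the function name `log_chunk` are assumptions made for illustration, not a reconstruction of the original text. Each append is issued as a single `os.write` on a descriptor opened with `O_APPEND`, which is the "small atomic write" the scheme relies on; whether that holds depends on the operating system and filesystem.

```python
import os
import uuid


def append_line(path, text):
    """Append text with one write() call on an O_APPEND descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, text.encode("utf-8"))
    finally:
        os.close(fd)


def log_chunk(directory, xml_chunk):
    """Store one well-formed chunk and register it in log.decls and log.ents."""
    # Hypothetical naming scheme: a letter prefix keeps the entity name legal.
    name = "e" + uuid.uuid4().hex

    # Step 2a: write the chunk to a uniquely named new file ("x" refuses to overwrite).
    with open(os.path.join(directory, name + ".xml"), "x", encoding="utf-8") as f:
        f.write(xml_chunk)

    # Step 2b: declare the entity (assumed declaration syntax, see lead-in).
    append_line(os.path.join(directory, "log.decls"),
                '<!ENTITY %s SYSTEM "%s.xml">\n' % (name, name))

    # Step 2c: reference the entity so it appears in the log body.
    append_line(os.path.join(directory, "log.ents"), "&%s;\n" % name)
    return name
```

The order matters: the chunk file exists before it is declared, and it is declared before it is referenced, so a reader that resolves external entities should never see a dangling reference.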
-- Steven Work Renaissance Labs steve@renlabs.com 360 647-1833 From paul@prescod.net Wed Feb 2 02:00:46 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 01 Feb 2000 18:00:46 -0800 Subject: [XML-SIG] Please resolve external parameter entity references References: <87g0vcwlak.fsf@solano.in.renlabs.com> Message-ID: <38978FCE.86BC3FB2@prescod.net> The current experimental (beta) version of expat resolves parameter entities. This "test version" is now about 6 months old and was last updated in October. I have no way of knowing if there are any known bugs in it. http://www.oasis-open.org/cover/news1999Q2.html -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world´s greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From wunder@infoseek.com Wed Feb 2 17:12:21 2000 From: wunder@infoseek.com (Walter Underwood) Date: Wed, 02 Feb 2000 09:12:21 -0800 Subject: [XML-SIG] Please resolve external parameter entity references In-Reply-To: <87g0vcwlak.fsf@solano.in.renlabs.com> References: <"Boudewijn Rempt's message of "Tue, 1 Feb 2000 21:45:21 +0100 (CET)"> Message-ID: <4.3.0.33.1.20000202090220.00cd5650@corp.infoseek.com> At 03:33 PM 2/1/00 -0800, Steven Work wrote: >I find myself logging (accumulating) information in XML-derived >formats pretty frequently these days. The only way I know to do this >in a strictly append-only and atomic way is this: This exact problem has been discussed a few times on xml-dev. The way to break out of it is to look at each log entry as a separate document, rather than try to make the entire log one document. To separate the documents in the log, use a character not allowed in XML. Formfeed is a fine choice, since it even means "next page" which is pretty close to the semantics wanted. 
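A minimal sketch of the one-document-per-entry idea, assuming a current Python and its standard `xml.etree.ElementTree` module. The element names (`entry`, `time`, `user`, `action`) and the function names are hypothetical; the original example markup did not survive in this archive. Formfeed (U+000C) is outside the character range XML 1.0 permits, so a chunk that still contains one, or that was cut short, simply fails to parse and can be skipped.

```python
import xml.etree.ElementTree as ET

FORMFEED = "\f"  # U+000C, not a legal character in an XML 1.0 document


def append_entry(path, when, user, action):
    """Append one log entry as its own small XML document, ended by a formfeed."""
    entry = ET.Element("entry")  # hypothetical element names
    ET.SubElement(entry, "time").text = when
    ET.SubElement(entry, "user").text = user
    ET.SubElement(entry, "action").text = action
    with open(path, "a", encoding="utf-8") as log:
        log.write(ET.tostring(entry, encoding="unicode") + FORMFEED)


def read_entries(path):
    """Return the parsed entries, skipping any chunk that is not well-formed."""
    entries = []
    with open(path, encoding="utf-8") as log:
        for chunk in log.read().split(FORMFEED):
            if not chunk.strip():
                continue
            try:
                entries.append(ET.fromstring(chunk))
            except ET.ParseError:
                # A partial write damages only this entry; carry on with the next.
                continue
    return entries
```

Reading the whole file at once keeps the sketch short; a long-running log reader would more likely scan for the separator incrementally.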
Using a character that doesn't appear in XML means that even if a partial write is made to the file, the log can be re-sync'ed at the beginning of the next log entry. So the penalty for non-atomic writes is lessened (from "partial write wrecks the whole file" to "partial write wrecks one entry"). So an entry looks like this: 2000-01-02T08:30:22 wunder logon [formfeed] But the xml declaration and doctype are optional, so the space-conscious logger can do this: 2000-01-02T08:30:22 wunder logon [formfeed] Or even lose the ignorable whitespace and put it all on one line. wunder -- Walter R. Underwood Senior Staff Engineer Infoseek Software GO Network, part of The Walt Disney Company wunder@infoseek.com http://software.infoseek.com/cce/ (my product) http://www.best.com/~wunder/ 1-408-543-6946 From hansv@net4all.be Thu Feb 3 16:43:07 2000 From: hansv@net4all.be (hansv@net4all.be) Date: Thu, 3 Feb 2000 17:43:07 +0100 Subject: [XML-SIG] XML on the Mac Message-ID: <9E725728E7D0D311B24D00508B6A05430A34BC@PDC> Hi, is there anybody who could point me in the right direction to install the Python-XML package on a mac. I have no Codewarrior or any other compiler, so binaries would be appreciated. I'm planning on using the 4DOM package. Any help is appreciated, Hans From uche.ogbuji@fourthought.com Fri Feb 4 02:57:22 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 03 Feb 2000 19:57:22 -0700 Subject: [XML-SIG] XML on the Mac In-Reply-To: Your message of "Thu, 03 Feb 2000 17:43:07 +0100." <9E725728E7D0D311B24D00508B6A05430A34BC@PDC> Message-ID: <200002040257.TAA01589@localhost.localdomain> > is there anybody who could point me in the right direction to install the > Python-XML package on a mac. > > I have no Codewarrior or any other compiler, so binaries would be > appreciated. I'm planning on using the 4DOM package. 
4DOM should work just fine with xmlproc and the SAX package, both of which are pure Python, are developed by Lars Marius Garshol, and can be obtained from his web site independently from the XML package. See http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/index.html -- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From dwallace@udel.edu Fri Feb 4 17:45:04 2000 From: dwallace@udel.edu (Dave Wallace) Date: Fri, 04 Feb 2000 12:45:04 -0500 Subject: [XML-SIG] XML-SIG and 4DOM Message-ID: <389B1020.19D48865@udel.edu> Hello, I am beginning a project that will be using Python to manipulate a series of HTML and XML documents. My first thought was of course to check out the xml-sig here, but I also see that there is a another implementation. Everyone seems to be co-existing well enough, but I am confused as to which I should be using, are the xml-sig tools ready for use? Is there a comparison of the two somewhere? Dave. -- ************************************* * Dave Wallace (dwallace@udel.edu) * * MIS-TRG, University of Delaware * ************************************* From uche.ogbuji@fourthought.com Fri Feb 4 18:52:17 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 04 Feb 2000 11:52:17 -0700 Subject: [XML-SIG] XML-SIG and 4DOM In-Reply-To: Your message of "Fri, 04 Feb 2000 12:45:04 EST." <389B1020.19D48865@udel.edu> Message-ID: <200002041852.LAA04248@localhost.localdomain> > I am beginning a project that will be using Python to manipulate a > series of HTML and XML documents. My first thought was of course to > check out the xml-sig here, but I also see that there is a another > implementation. Everyone seems to be co-existing well enough, but I am > confused as to which I should be using, are the xml-sig tools ready for > use? Is there a comparison of the two somewhere? 
The XML SIG has actually adopted 4DOM, 4XSLT and 4XPath, and once we sort out some details such as the packaging, they will be in the xml-sig distribution. I would say it's pretty "safe" to just go ahead with 4DOM except for the point that as it is now packaged, you would use it like this: import Ft.Dom ... While as part of the xml-sig distro it will probably be import xml.dom ... At least for a while, we shall probably maintain a version using the old packaging, and if the needs of Fourthought and its clients diverge from the sig, there will probably be closely parallel versions for a while. The key thing is that if your code depends on the "Ft.Dom" form, you needn't worry about having to hack it all into the "xml.dom" form for a while. Hopefully this didn't just confuse you further. -- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From hansv@net4all.be Mon Feb 7 10:45:56 2000 From: hansv@net4all.be (hansv@net4all.be) Date: Mon, 7 Feb 2000 11:45:56 +0100 Subject: [XML-SIG] Problem using 4DOM for xml parsing Message-ID: <9E725728E7D0D311B24D00508B6A05430A34BE@PDC> Hi, I can't seem to get 4DOM for xml parsing working for me, when I try the demo "python dom_from_xml_file.py addr_book1.xml" (I ran it with a script passing "read_xml_from_file('Ft/Dom/demo/addr_book1.xml')" from idle). I get following errors. Traceback (innermost last): File "E:\Python\Tools\idle\ScriptBinding.py", line 131, in run_module_event execfile(filename, mod.__dict__) File "E:\Python\Ft\Dom\demo\dom_from_xml_file.py", line 22, in ? 
read_xml_from_file('Ft/Dom/demo/addr_book1.xml') File "E:\Python\Ft\Dom\demo\dom_from_xml_file.py", line 7, in read_xml_from_file xml_dom_object = Sax.FromXmlFile(fileName, validate=0) File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 155, in FromXmlFile rv = FromXmlStream(fp,ownerDocument,validate,keepAllWs,catName,saxHandlerClass) File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 135, in FromXmlStream parser.parseFile(stream) File "E:\Python\xml\sax\drivers\pylibs.py", line 32, in parseFile self.feed(buf) File "E:\Python\xml\sax\drivers\drv_xmllib.py", line 68, in feed xmllib.XMLParser.feed(self,data) File "E:\Python\Lib\xmllib.py", line 149, in feed self.goahead(0) File "E:\Python\Lib\xmllib.py", line 240, in goahead k = self.parse_starttag(i) File "E:\Python\Lib\xmllib.py", line 609, in parse_starttag self.finish_starttag(nstag, attrdict, method) File "E:\Python\Lib\xmllib.py", line 646, in finish_starttag self.unknown_starttag(tagname, attrdict) File "E:\Python\xml\sax\drivers\drv_xmllib.py", line 24, in unknown_starttag self.doc_handler.startElement(tag,saxutils.AttributeMap(attributes)) File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 71, in startElement self.__completeTextNode() File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 51, in __completeTextNode self.__nodeStack[-1].appendChild(new_text) File "E:\Python\Ft\Dom\Document.py", line 223, in appendChild return Node.appendChild(self,newChild) File "E:\Python\Ft\Dom\Node.py", line 225, in appendChild self._4dom_validateNode(newChild) File "E:\Python\Ft\Dom\Node.py", line 298, in _4dom_validateNode raise DOMException(HIERARCHY_REQUEST_ERR) I get similar errors with Python on Mac. I'm a newbie to Python and probably forgot to install something. Could you please send me a list of things I should have installed to get this working. Any help is appreciated, Hans verschooten From mmc@r-l.de Mon Feb 7 19:55:23 2000 From: mmc@r-l.de (Morten M. 
Christensen) Date: Mon, 07 Feb 2000 11:55:23 -0800 Subject: [XML-SIG] Installing the XML Toolkit on Windows ? Message-ID: <389F232B.952509C1@r-l.de> Hi, Is there a recompiled version of the XML Toolkit that one can use for Windows? Thanks in advance! Cheers, Morten Christensen From paul@prescod.net Mon Feb 7 20:50:11 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 14:50:11 -0600 Subject: [XML-SIG] PyExpat update Message-ID: <389F3003.FE2DCA77@prescod.net> I did some work on pyexpat over the weekend. Modulo bugs I have introduced, I think that my changes so far have all been backwards compatible. I list my new features at the bottom of this message. Before I release, I want some xml-sig opinions on things I would like to change that are NOT backwards compatible. 1. Attributes would be returned as a mapping {key:value, key:value} and not a list [key,value,key,value] . Obviously this will break code that expected the list. 2. Errors will be returned as strings, not integers. You can check for string equality using "==" The intention is not that you would hard-code strings into your code, but would rather use pre-defined string constants: foo = parser.Parse( data ) if foo is pyexpat.unclosed_token: print "Oops:"+pyexpat.unclosed_token (IIRC, Python is smart about checking for pointer equality before string equality, right?) 3. There will be no list of exceptions in the module's interface. Here's what it looks like now: >>> import pyexpat >>> for name in dir( pyexpat ): ... if name[0:3]=="XML": ... print name, getattr( pyexpat, name ) ... 
XML_ERROR_ASYNC_ENTITY 13 XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF 16 XML_ERROR_BAD_CHAR_REF 14 XML_ERROR_BINARY_ENTITY_REF 15 XML_ERROR_DUPLICATE_ATTRIBUTE 8 XML_ERROR_INCORRECT_ENCODING 19 XML_ERROR_INVALID_TOKEN 4 XML_ERROR_JUNK_AFTER_DOC_ELEMENT 9 XML_ERROR_MISPLACED_XML_PI 17 XML_ERROR_NONE 0 XML_ERROR_NO_ELEMENTS 3 XML_ERROR_NO_MEMORY 1 XML_ERROR_PARAM_ENTITY_REF 10 XML_ERROR_PARTIAL_CHAR 6 XML_ERROR_RECURSIVE_ENTITY_REF 12 XML_ERROR_SYNTAX 2 XML_ERROR_TAG_MISMATCH 7 XML_ERROR_UNCLOSED_TOKEN 5 XML_ERROR_UNDEFINED_ENTITY 11 XML_ERROR_UNKNOWN_ENCODING 18 I would rather move all of these to an "errors" dictionary so they don't clutter up the main module namespace (after converting them to strings instead of integers). ----------------- Here are the new features I have already added. * more handlers: StartElement, EndElement, ProcessingInstruction, CharacterData, UnparsedEntityDecl, NotationDecl, StartNamespaceDecl, EndNamespaceDecl, Comment, StartCdataSection, EndCdataSection, Default, * new error handling: setjmp/longjmp is gone exceptions are propagated properly even on Windows I believe the new code is thread-safe. * ParseFile: now possible to parse an open file or file-like object. * bug fixes: setattr throws a proper exception when you do a bad assignment setjmp/longjmp works on Windows * new bugs: ??? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Mon Feb 7 20:54:58 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 14:54:58 -0600 Subject: [XML-SIG] Pyexpat error handling Message-ID: <389F3122.D1E9BF57@prescod.net> I'd like to improve the error handling in one case but am not sure how. 
>>> from pyexpat import ParserCreate >>> p=ParserCreate() >>> p.StartElementHandler=lambda x:x >>> p.ParseFile( open( "../hamlet.xml" ) ) Traceback (innermost last): File "", line 1, in ? TypeError: too many arguments; expected 1, got 2 You see how it looks like it was the ParseFile that had too many arguments but really it was the call to the callback. I'm not sure of the best way to make this more clear. Perhaps add a bogus traceback entry??? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Mon Feb 7 21:21:47 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 15:21:47 -0600 Subject: [XML-SIG] Re: PyExpat update References: <389F1CD5.102E6757@prescod.net> <14495.8295.588575.301610@weyr.cnri.reston.va.us> Message-ID: <389F376B.2F77F810@prescod.net> I'll take this to xml-sig where I meant to post in the first place. "Fred L. Drake, Jr." wrote: > > Paul Prescod writes: > > 1. Attributes would be returned as a mapping {key:value, key:value} and > > not a list [key,value,key,value] . Obviously this will break code that > > expected the former. > > This is good. > > > 2. Errors will be returned as strings, not integers. You can check for > > string equality using "==" The intention is not that you would hard-code > > strings into your code, but would rather use pre-defined string > > constants: > > Please explain *why* you need this change; could the constants not > still be numbers? (I'm not saying they *should* be numbers, just > trying to understand the rationale for the change.)they're just IDs, Well, why use numbers? The numbers are meaningless. Strings are at least meaningful for some percentage of the world. 
> > foo = parser.Parse( data ) > > if foo is pyexpat.unclosed_token: > > print "Oops:"+pyexpat.unclosed_token > > Are the strings the error messages or some sort of identifier? If > they're IDs, this code fragment doesn't make sense. If they're > messages, you tie the C component to a specific (human) language. They are both messages and identifiers. As you can see above they can be used as "dumb" identifiers (just like the integers) and they can be used as strings if you happen to want to output English error messages (which will be the case in the vast majority of situations just because most programmers are too lazy/busy to localize). > My inclination is to stick with IDs (numeric or string) and map that > to natural language in the application. If you want to map in your application, you can do that. If you want to print out the string, you can do that too. Think of them as IDs that have a __str__ that happens to be English readable. Oh, and they happen to be implemented as Python strings. :) > > 3. There will be no list of exceptions in the modules interface. Here's > > what it looks like now: > ... > > I would rather move all of these to an "errors" dictionary so they don't > > clutter up the main module namespace (after converting them to strings > > instead of integers). > > So what's the dictionary look like? I imagine something like: > > errors = { > "XML_ERROR_SYNTAX": "Syntax error!", > ... > } > > or are the integers still there? No integers. On second thought, instead of a dictionary I'll use an instance so that you can say if rv == errors.XML_ERROR_SYNTAX: ... > > setattr throws an proper exeption when you do a bad assignment > > setjmp/longjmp works on Windows > > So is setjmp/longjmp still used, or not? No. I meant to say that handler error reporting now works on Windows. 
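For readers trying this on a current Python rather than the pyexpat build discussed here: the standard library's `xml.parsers.expat` has an `errors` submodule whose `XML_ERROR_*` names are message strings, with `errors.messages` mapping numeric codes to those strings and `errors.codes` mapping back. The sketch below uses only those documented names; the helper `first_error_code` and the Spanish text are made up for illustration, and nothing here claims to show the interface that was actually released after this thread.

```python
from xml.parsers.expat import ParserCreate, ExpatError, errors


def first_error_code(data):
    """Parse data and return expat's numeric error code, or None if it parsed."""
    parser = ParserCreate()
    try:
        parser.Parse(data, True)
    except ExpatError as err:
        return err.code
    return None


code = first_error_code(b"<a><b></a>")  # closing tag does not match <b>

# The numeric code maps to an English message string...
english = errors.messages[code]

# ...and that string doubles as an identifier, so an application can
# localize with an ordinary dictionary keyed on the module's constants.
spanish_errors = {errors.XML_ERROR_TAG_MISMATCH: "etiqueta de cierre incorrecta"}
localized = spanish_errors.get(english, english)
```

The comparison uses the module constant rather than a hard-coded English literal, which is the usage pattern argued for in the message above.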
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Mon Feb 7 21:30:10 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 15:30:10 -0600 Subject: [XML-SIG] Re: PyExpat update References: <389F1CD5.102E6757@prescod.net> <3daelcg4s9.fsf@amarok.cnri.reston.va.us> Message-ID: <389F3962.43AF77A3@prescod.net> "Andrew M. Kuchling" wrote: > > I'd really much rather write: > > if foo is pyexpat.UNCLOSED_TOKEN: > print 'Oops:', pyexpat.errors[ foo ] > > That makes it clear that UNCLOSED_TOKEN is a constant. (Losing the > 'XML_' prefix from all the errors is definitely a good idea; losing > the 'ERROR_' prefix might be not. The above might be clearer if it > was pyexpat.ERROR_UNCLOSED_TOKEN or UNCLOSED_TOKEN_ERROR.) Upper case is one issue. Naming is a second. A third is whether the referent is an integer or a string. In your example above you make no use of the fact that it is an integer and it could just as easily be a string. The only thing making an integer does is force an extra list lookup in the common case of wanting to report the string error. if foo is pyexpat.UNCLOSED_TOKEN: print 'Oops:', foo > I have no problem with cluttering the module's namespace with error > constants, if that's the only reason for the change. How would you > code error checks with an 'errors' dictionary? Well I've been thinking it should be an instance instead of a dictionary if foo is pyexpat.errors.UNCLOSED_TOKEN: print 'Ooops:', foo Note that there are NO occurrences of dependence on the English "spelling" of these messages in the code. 
If you want to localize for Spanish then you just do: spanish_errors={pyexpat.errors.UNCLOSED_TOKEN: "Something in Spanish", ...} if foo is pyexpat.errors.UNCLOSED_TOKEN: print 'Ooops:', spanish_errors[ foo ] -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Mon Feb 7 23:12:22 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 17:12:22 -0600 Subject: [XML-SIG] Re: PyExpat update References: <389F1CD5.102E6757@prescod.net> Message-ID: <389F5156.F69D9CC9@prescod.net> Mark C Favas wrote: > > Is there any chance that pyexpat could handle DTDs and thus default values for > attributes (I believe there was a test version of expat that added this > capability...) Yes, it makes sense to use the version of expat with external subset support. > I'd like to use the SAX interface to that to speed parsing up - > I currently use the validating xmlproc part of the PyXML-0.5.3 package. Pyexpat should meet your needs soon. > >setjmp/longjmp is gone > >setjmp/longjmp works on Windows > > Umm - did setjmp/longjmp come back? No, just a think-o. Error reporting from handlers now works on Windows. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From uche.ogbuji@fourthought.com Tue Feb 8 00:10:27 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 07 Feb 2000 17:10:27 -0700 Subject: [XML-SIG] Pyexpat error handling In-Reply-To: Your message of "Mon, 07 Feb 2000 14:54:58 CST." 
<389F3122.D1E9BF57@prescod.net> Message-ID: <200002080010.RAA09581@localhost.localdomain> > >>> from pyexpat import ParserCreate > >>> p=ParserCreate() > >>> p.StartElementHandler=lambda x:x > >>> p.ParseFile( open( "../hamlet.xml" ) ) > Traceback (innermost last): > File "", line 1, in ? > TypeError: too many arguments; expected 1, got 2 > > You see how it looks like it was the ParseFile that had too many > arguments but really it was the call to the callback. I'm not sure of > the best way to make this more clear. Perhaps add a bogus traceback > entry??? If I'm following correctly, we often run into this problem with C/Python call-backs. We usually pass back an error-code that we can use to generate a custom exception. I suppose adding a traceback entry would be another approach. I'm curious as to how well that would work. -- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From paul@prescod.net Tue Feb 8 01:15:11 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 19:15:11 -0600 Subject: [XML-SIG] Pyexpat error handling References: <200002080010.RAA09581@localhost.localdomain> Message-ID: <389F6E1F.F1539944@prescod.net> uche.ogbuji@fourthought.com wrote: > > If I'm following correctly, we often run into this problem with C/Python > call-backs. We usually pass back an error-code that we can use to generate a > custom exception. Okay, but can you distinguish the TypeError generated from a bad arglist from a regular TypeError in the code? If it actually got into the code then you have a decent traceback and I'd rather not blow it away. > I suppose adding a traceback entry would be another > approach. I'm curious as to how well that would work. Me too. 
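One application-side workaround for the confusing TypeError, sketched for a current Python's `xml.parsers.expat`: wrap each callback so that a failure is re-raised with the handler's name attached. The wrapper `named_handler` is a hypothetical helper, not part of pyexpat. It has exactly the limitation raised above: it cannot tell a bad argument list apart from a TypeError raised inside the handler body, so it chains the original exception instead of discarding its traceback.

```python
from xml.parsers.expat import ParserCreate


def named_handler(name, func):
    """Wrap a callback so a TypeError names the handler that was being called."""
    def wrapper(*args):
        try:
            return func(*args)
        except TypeError as exc:
            # Chain rather than replace, so the original traceback is kept.
            raise TypeError("%s callback failed: %s" % (name, exc)) from exc
    return wrapper


parser = ParserCreate()
# Deliberately wrong: StartElementHandler is called with (name, attributes).
parser.StartElementHandler = named_handler("StartElementHandler", lambda x: x)

try:
    parser.Parse(b"<doc/>", True)
    message = None
except TypeError as exc:
    message = str(exc)
```

After this runs, `message` begins with the handler name, which points at the callback rather than at the `Parse` call.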
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From paul@prescod.net Tue Feb 8 01:24:21 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 07 Feb 2000 19:24:21 -0600 Subject: [XML-SIG] Installer Message-ID: <389F7045.CAE6C5D9@prescod.net> People are really nervous about installing the xml package on Windows. Why don't we ask Christian Tismer to keep his Windows installer up to date for us and then link to it from the Python XML web page ftp://ftp.pns.cc/pub/xml/PythonXML.EXE While I am at it, how hard would it be to add Python as a "special topic" along with JPython, Tkinter and (???) Emacs support on the main python.org page. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "If I say something, yet it does not fill you with the immediate burning desire to voluntarily show it to everyone you know, well then, it's probably not all that important." - http://www.bespoke.org/viridian/ From larsga@garshol.priv.no Tue Feb 8 08:19:18 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 08 Feb 2000 09:19:18 +0100 Subject: [XML-SIG] Installing the XML Toolkit on Windows ? In-Reply-To: <389F232B.952509C1@r-l.de> References: <389F232B.952509C1@r-l.de> Message-ID: * Morten M. Christensen | | Is there a recompiled version of the XML Toolkit that one can use | for Windows? Depends on what you mean by the XML Toolkit, but in the standard XML-SIG package there are precompiled versions of the C tools for Windows. --Lars M. From fdrake@acm.org Tue Feb 8 14:38:45 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Tue, 8 Feb 2000 09:38:45 -0500 (EST) Subject: [XML-SIG] Re: PyExpat update In-Reply-To: <389F376B.2F77F810@prescod.net> References: <389F1CD5.102E6757@prescod.net> <14495.8295.588575.301610@weyr.cnri.reston.va.us> <389F376B.2F77F810@prescod.net> Message-ID: <14496.10869.954432.391776@weyr.cnri.reston.va.us> Paul Prescod writes: > Well, why use numbers? The numbers are meaningless. Strings are at least > meaningful for some percentage of the world. Paul, If they are identifiers, they are meaningless regardless. They can only be used as messages if they are natural language, which doesn't appeal to me. As long as they're identifiers, I think it's fine for them to be strings; I really am not *advocating* the use of numbers. I do think that API changes to a known-working module need to be justified in some way. > They are both messages and identifiers. As you can see above they can be > used as "dumb" identifiers (just like the integers) and they can be used > as strings if you happen to want to output English error messages (which > will be the case in the vast majority of situations just because most > programmers are too lazy/busy to localize). What I'm disturbed by is the conflation of use. I'd rather see some identifier be used and let the user take care of *all* messages provided to the user. A "default" set of English messages can (and should) be provided, but it's better to ask the client code to perform some transformation (dictionary lookup, whatever the guise); this allows better flexibility both for application writers and for future maintainers of the pyexpat module. > On second thought, instead of a dictionary I'll use an instance so that > you can say > > if rv == errors.XML_ERROR_SYNTAX: > ... That's a bit nicer. I'm not sure that the namespace needs to be separated from the module namespace, but I don't object, either. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives

From guglielmetti@dynabits.com Thu Feb 10 09:32:34 2000
From: guglielmetti@dynabits.com (guglielmetti@dynabits.com)
Date: Thu, 10 Feb 2000 10:32:34 +0100
Subject: [XML-SIG] XBEL tool and remard about its DTD
Message-ID: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>

I wrote an XBEL export template for the excellent Compass bookmarks manager
(on MS Windows...)(http://www.softgauge.com/compass/) You can download this
file (XBEL.TPL) from my page http://membres.tripod.fr/Guglielmetti/files/.

By the way, the problem I have is that XBEL DTD at
http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
"european" accented characters such as éàîö... I think it should as long as
some billions people on this Earth will use a different language than
English...

Philippe Guglielmetti    http://i.am/goulu/    "C'est de la folie, mais
Courtines 16             goulu@i.am            avec de la méthode"
1242 Satigny (GE)        +41 22 753 4138
Suisse                   ICQ 30265921          (Hamlet, Acte 2 Scene 2)

From fdrake@acm.org Thu Feb 10 15:04:31 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 Feb 2000 10:04:31 -0500 (EST)
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <14498.54143.475911.873929@weyr.cnri.reston.va.us>

goulu@i.am writes:
 > I wrote an XBEL export template for the excellent Compass bookmarks manager
 > (on MS Windows...)(http://www.softgauge.com/compass/) You can download this
 > file (XBEL.TPL) from my page http://membres.tripod.fr/Guglielmetti/files/.

Philippe,
  Cool! Do you mind if I provide a link to this from the XBEL pages
on python.org?

 > By the way, the problem I have is that XBEL DTD at
 > http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
 > "european" accented characters such as éàîö... I think it should as long as
 > some billions people on this Earth will use a different language than
 > English...

  This I don't understand; what's missing that needs to be added to
the DTD? This is XML, so the character set is Unicode.
  Now, if it's the *tools* that don't support a wide range of
encodings, that I do understand. I think this will be fixed when
Python provides direct support for Unicode in the core. I'll fix them
myself if I have to!

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From fdrake@acm.org Thu Feb 10 16:17:00 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 Feb 2000 11:17:00 -0500 (EST)
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
References: <14498.54143.475911.873929@weyr.cnri.reston.va.us> <000f01bf73de$ab0ad110$bfc0e6c2@HYDRE> <14498.57205.913845.298240@weyr.cnri.reston.va.us> <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <14498.58492.684421.784894@weyr.cnri.reston.va.us>

goulu@i.am writes:
 > I wrote an XBEL export template for the excellent Compass bookmarks manager
 > (on MS Windows...)(http://www.softgauge.com/compass/) You can download this
 > file (XBEL.TPL) from my page http://membres.tripod.fr/Guglielmetti/files/.

Fred L. Drake, Jr. writes:
 > Do you have a URL for Compass?

  Sheesh, I can't read today. Nevermind.... ;)

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From wunder@infoseek.com Thu Feb 10 17:32:02 2000
From: wunder@infoseek.com (Walter Underwood)
Date: Thu, 10 Feb 2000 09:32:02 -0800
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <14498.54143.475911.873929@weyr.cnri.reston.va.us>
References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE> <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <4.3.0.40.1.20000210093037.00d65b60@corp.infoseek.com>

At 10:04 AM 2/10/00 -0500, Fred L. Drake, Jr. wrote:
> > By the way, the problem I have is that XBEL DTD at
> > http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
> > "european" accented characters such as éàîö... I think it should as long as
> > some billions people on this Earth will use a different language than
> > English...
>
>  This I don't understand; what's missing that needs to be added to
>the DTD? This is XML, so the character set is Unicode.

Some DTDs (like XHTML) provide entities for those characters,
like &ouml;. The characters are supported, but you may need
to enter them as numeric references.

wunder
--
Walter R.
Underwood Senior Staff Engineer Infoseek Software GO Network, part of The Walt Disney Company wunder@infoseek.com http://software.infoseek.com/cce/ (my product) http://www.best.com/~wunder/ 1-408-543-6946 From paul@prescod.net Thu Feb 10 19:25:55 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 10 Feb 2000 11:25:55 -0800 Subject: [XML-SIG] XBEL tool and remard about its DTD References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE> Message-ID: <38A310C3.A346049@prescod.net> > goulu@i.am wrote: > > By the way, the problem I have is that XBEL DTD at > http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support > "european" accented characters such as éàîö... I think it should as > long as some billions people on this Earth will use a different > language than English... A DTD cannot prohibit you from using a non-English character. You can either do it directly with a Unicode text editor or you can use &#some_unicode_number; syntax. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From Finlay.Thompson@MCS.VUW.AC.NZ Fri Feb 11 01:21:18 2000 From: Finlay.Thompson@MCS.VUW.AC.NZ (Finlay Thompson) Date: Fri, 11 Feb 2000 14:21:18 +1300 Subject: [XML-SIG] Swig, Xerces and python, Message-ID: <00021114295502.02347@delta.mcs.vuw.ac.nz> Hi there, Im just at the stage of choosing tools and I would greatly appreciate advice: The task is to upgrade an existing and busy internet news site. The problem is that the existing formating is all in perl and very rigid. The idea is to create a XPath interface onto the existing database, leave all the publishing software intact, and provide an XML front end for the graphic people to work with. 
The existing system is running on a FreeBSD server with Apache and lots of perl. My experience, and that of others in our group, is with python, so we want to use python tools. After looking at the xml.apache.org site I had the idea of running the Xerces C++ XML parser, that already supports dom and sax and .... , through SWIG to produce a python interface. What do people think? Does anyone know a what is wrong with Xerces?(apart from not having a python interface) Finlay. From gstein@lyra.org Fri Feb 11 03:57:47 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 10 Feb 2000 19:57:47 -0800 (PST) Subject: [XML-SIG] Swig, Xerces and python, In-Reply-To: <00021114295502.02347@delta.mcs.vuw.ac.nz> Message-ID: If Xerces is not a requirement (it appears that you have only recently decided to use it), then I might recommend Expat and the PyExpat module. You'll have your XML parsing and Python interface as quick as you can install them :-) Using the XML-SIG release, you'll also have a DOM to work with. SAX operates inside of there, constructing the DOM -- you won't really need to worry about it (unless you want to skip the DOM). Fourthought has got a DOM and, IIRC, an XPath implementation. I think it is all in Python, but there may be some remaining C cruft. You'll have to follow up on that. Anyhow... in a nutshell: I think there are ample alternatives without going and fooling around with a C++ XML Parser, SWIG, and developing your own Python/C Extension. Cheers, -g On Fri, 11 Feb 2000, Finlay Thompson wrote: > Hi there, > > Im just at the stage of choosing tools and I would greatly appreciate advice: > > The task is to upgrade an existing and busy internet news site. The problem is > that the existing formating is all in perl and very rigid. The idea is to > create a XPath interface onto the existing database, leave all the publishing > software intact, and provide an XML front end for the graphic people to work > with. 
> > The existing system is running on a FreeBSD server with Apache and lots of perl. > My experience, and that of others in our group, is with python, so we want to > use python tools. > > After looking at the xml.apache.org site I had the idea of running the Xerces > C++ XML parser, that already supports dom and sax and .... , through SWIG to > produce a python interface. > > What do people think? Does anyone know a what is > wrong with Xerces?(apart from not having a python interface) > > Finlay. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > -- Greg Stein, http://www.lyra.org/ From larsga@garshol.priv.no Mon Feb 14 08:08:38 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 14 Feb 2000 09:08:38 +0100 Subject: [XML-SIG] XBEL tool and remard about its DTD In-Reply-To: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE> References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE> Message-ID: * goulu@i.am | | By the way, the problem I have is that XBEL DTD at | http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support | "european" accented characters such as éàîö... I think it should as | long as some billions people on this Earth will use a different | language than English... I guess the problem you've run into is that you've produced something like this: Here is an accented char: é. That this causes a problem has nothing to do with the XBEL DTD, but rather with the fact that conforming XML parsers must assume that this document is UTF-8-encoded, and your 'é.' is not a legal UTF-8 bit sequence, hence the problems. So if you do like this instead, everything should be fine (provided I've guessed correctly what your problem is): Here is an accented char: é. --Lars M. 
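
To make the encoding point above concrete, here is a small sketch using the standard library's xml.etree.ElementTree. The element name desc is purely an assumption chosen for illustration, and the encoding declaration shown is one common way to label a Latin-1 document, not necessarily the exact fix that was originally posted.

```python
import xml.etree.ElementTree as ET

E_ACUTE_LATIN1 = b"\xe9"  # the character e-acute as a single ISO-8859-1 byte

# No encoding declaration: a conforming parser assumes UTF-8, and a lone
# 0xE9 byte followed by "." is not valid UTF-8, so parsing fails.
undeclared = b"<desc>Here is an accented char: " + E_ACUTE_LATIN1 + b".</desc>"
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# Same bytes, but the document now declares its encoding.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + undeclared
declared_text = ET.fromstring(declared).text

# Pure-ASCII alternative: a numeric character reference.
referenced = b"<desc>Here is an accented char: &#233;.</desc>"
referenced_text = ET.fromstring(referenced).text
```

Both the declared-encoding document and the numeric-reference document parse to the same text; only the undeclared Latin-1 bytes are rejected. None of this involves the DTD.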
From gstein@lyra.org Tue Feb 15 03:01:17 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 14 Feb 2000 19:01:17 -0800 (PST) Subject: [XML-SIG] Re: qp_xml check-in In-Reply-To: <200001241745.MAA13251@amarok.cnri.reston.va.us> Message-ID: On Mon, 24 Jan 2000, Andrew M. Kuchling wrote: > You don't seem to have checked in qp_xml.py into the XML-SIG's CVS > tree. Going to? (And have you decided between xml.parsers and > xml.utils ?) I've checked this into xml.utils, along with an update to CREDITS and LICENCE. I've got doc due to Fred, so qp_xml doc will be deferred for a bit; I left a marker in TODO. Since it isn't truly a parser, it made a bit more sense under utils. Cheers, -g -- Greg Stein, http://www.lyra.org/ From FightHunger@4mycommunity.com Fri Feb 18 10:50:47 2000 From: FightHunger@4mycommunity.com (Fight Hunger) Date: Fri, 18 Feb 2000 02:50:47 -0800 Subject: [XML-SIG] Every Click Counts Message-ID: This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01BF79FE.05B2924A Content-Type: text/plain; charset="iso-8859-1" Every 3.6 seconds someone in the world dies of hunger -- 75% of these deaths are children under 5. Make A FREE DONATION to fight hunger by visiting http://www.4mycommunity.com/online/ent/wfp.asp?tag=sc217ei106716 and clicking on one of our "Every Click Counts" links. Each click buys a hungry person 1.5 cups of a staple food. * It costs you nothing * We don't ask you for any personal information * All donations go to The United Nations World Food Programme (http://www.wfp.org) Thank you, FightHunger@4MyCommunity.com P.S. If you think this is a good idea, please pass this message to a friend. P.P.S. You can also support 300,000 schools and churches via "Every Click Counts" or online shopping. See http://www.4MyCommunity.com for details. To be removed from this mailing list, please reply with the word "Unsubscribe" in the subject. 
------_=_NextPart_001_01BF79FE.05B2924A--

From paul@prescod.net Fri Feb 18 15:34:03 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 07:34:03 -0800
Subject: [XML-SIG] DOM and Proxies
Message-ID: <38AD666B.E5C3E20C@prescod.net>

I propose that for Python 1.6 we define a generic Proxy mechanism with
the following properties:

You wrap an object by calling Proxy( object ). For an object to be
proxy-wrappable it must have an __unlink__ method. The object that you
pass to the original Proxy() call is called the LemmingLeader.

Proxies proxy all method calls, field accesses and tp_...methods. When a
field is accessed or a method called it looks at the returned object. If
it is proxy-wrappable, (e.g. a DOM or grove node) it is wrapped. If it
isn't, (e.g. an integer) it isn't.

Proxied objects have "families". All objects in a family live for the
same length of time. Families are expected to be completely internally
linked. There is one proxy "family" for every LemmingLeader (created
through an explicit call to the proxy method) (e.g. one per DOM). There
is a hidden "proxy family object" -- it is used only for its refcount
and its reference to the patriarch. When a proxy generates a proxy, it
passes a reference to the family object.

When all proxies go away (the user is no longer interested in the object
family) the family object calls the LemmingLeader's __unlink__ method
which is presumed to unlink the object and recursively unlink and thus
destroy all children.

Proxies have a __realnode__ method to get back the real, real node. If
you hold a real reference to a real node and throw away the last proxy
then you will find that everything in that node's family except the node
is gone.

All of the proxy stuff is implemented in C so that it is very efficient.
Proxied objects can be implemented in C or Python.

We would use this class for both xml.dom and a minidom in the standard
library.
It would also be usable from Pyxie, groves, easysax, and anywhere else that
reference counting of cyclic objects is necessary.

Opinions? We could actually sneak this class into a C-coded minidom
library (built directly on top of expat) for use by anyone who knows it
is there.

No, I am not volunteering to do it -- at least not for another several
weeks.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world."
- from "Advent of the Algorithm" David Berlinski
http://www.opengroup.com/mabooks/015/0151003386.shtml

From paul@prescod.net Fri Feb 18 15:53:56 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 07:53:56 -0800
Subject: [XML-SIG] Minidom proposal
Message-ID: <38AD6B14.F4F7E84E@prescod.net>

I propose the following interface for a module that would go
into Python 1.6 (excuse my IDLish shorthand)

class Node :
      [List of Node] childNodes
      Node parent

class Document(Node):
      Element documentElement

class Attribute(Node):
      string namespaceURI
      string prefix
      string localName
      string value
      element ownerElement

class Element(Node):
      string tagname
      # check what the DOM does with namespaces
      {Dictionary of Name->Value} attributes
      GetElementsByTagName( tagname ) -> List[Node]:
      getElementsByTagNameNS(
             DOMString namespaceURI,
             DOMString localName) -> NodeList
      string namespaceURI
      string prefix
      string localName

class Comment(Node):
      String data

class ProcessingInstruction(Node):
      String target
      String data

class Text( Node ):
      String data

All properties could be read-write but there would be no special cut and
paste/clone methods.

Opinions?

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world."
- from "Advent of the Algorithm" David Berlinski
http://www.opengroup.com/mabooks/015/0151003386.shtml

From paul@prescod.net Fri Feb 18 15:54:03 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 07:54:03 -0800
Subject: [XML-SIG] DOM and Proxies
Message-ID: <38AD6B1B.C57AD16B@prescod.net>

I propose that for Python 1.6 we define a generic Proxy mechanism with
the following properties:

You wrap an object by calling Proxy( object ). For an object to be
proxy-wrappable it must have an __unlink__ method. The object that you
pass to the original Proxy() call is called the LemmingLeader.

Proxies proxy all method calls, field accesses and tp_...methods. When a
field is accessed or a method called it looks at the returned object. If
it is proxy-wrappable, (e.g. a DOM or grove node) it is wrapped. If it
isn't, (e.g. an integer) it isn't.

Proxied objects have "families". All objects in a family live for the
same length of time. Families are expected to be completely internally
linked. There is one proxy "family" for every LemmingLeader (created
through an explicit call to the proxy method) (e.g. one per DOM). There
is a hidden "proxy family object" -- it is used only for its refcount
and its reference to the patriarch. When a proxy generates a proxy, it
passes a reference to the family object.

When all proxies go away (the user is no longer interested in the object
family) the family object calls the LemmingLeader's __unlink__ method
which is presumed to unlink the object and recursively unlink and thus
destroy all children.

Proxies have a __realnode__ method to get back the real, real node. If
you hold a real reference to a real node and throw away the last proxy
then you will find that everything in that node's family except the node
is gone.

All of the proxy stuff is implemented in C so that it is very efficient.
Proxied objects can be implemented in C or Python.

We would use this class for both xml.dom and a minidom in the standard
library.
It would also be usable from Pyxie, groves easysax, and anywhere else that reference counting of cyclic objects is necessary. Opinions? I would actually sneak this class into a C-coded minidom library (built directly on top of expat) for use by anyone who knows it is there. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From akuchlin@mems-exchange.org Fri Feb 18 18:56:46 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 18 Feb 2000 13:56:46 -0500 (EST) Subject: [XML-SIG] DOM and Proxies In-Reply-To: <38AD6B1B.C57AD16B@prescod.net> References: <38AD6B1B.C57AD16B@prescod.net> Message-ID: <14509.38382.968129.719917@amarok.cnri.reston.va.us> Paul Prescod writes: >Opinions? I would actually sneak this class into a C-coded minidom >library (built directly on top of expat) for use by anyone who knows it >is there. Open question: is the proxy mechanism still useful if a garbage collection mechanism for collecting cycles gets into 1.6? (Neal Schemenauer is working on something, but it's too early to tell if it'll get into 1.6; perhaps the cost will be too high.) If cyclic trash was collected, would you still need a proxy mechanism? Maybe you'd use it for performance reasons, to save the GC some work, making less trash for it to scan through, but then you're losing a tiny bit of performance from the extra indirection on every access to the object. My concern is simply to avoid spending time building something that turns out to be unneeded. -- A.M. Kuchling http://starship.python.net/crew/amk/ "What's that awful noise?" "I beg your pardon... "Awful noise"? A good way to talk about my singing!" "No, Doctor, not that awful noise -- the other one!" 
-- Barbara and the Doctor, in "The Chase" From ken@bitsko.slc.ut.us Fri Feb 18 20:07:42 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 18 Feb 2000 14:07:42 -0600 Subject: [XML-SIG] DOM and Proxies In-Reply-To: "Andrew M. Kuchling"'s message of Fri, 18 Feb 2000 13:56:46 -0500 (EST) References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us> Message-ID: "Andrew M. Kuchling" writes: > Paul Prescod writes: > >Opinions? I would actually sneak this class into a C-coded minidom > >library (built directly on top of expat) for use by anyone who knows it > >is there. > > Open question: is the proxy mechanism still useful if a garbage > collection mechanism for collecting cycles gets into 1.6? (Neal > Schemenauer is working on something, but it's too early to tell if > it'll get into 1.6; perhaps the cost will be too high.) Probably not needed purely for GC reasons. > Maybe you'd use it for performance reasons, to save the GC some work, > making less trash for it to scan through, but then you're losing a > tiny bit of performance from the extra indirection on every access to > the object. My concern is simply to avoid spending time building > something that turns out to be unneeded. Probably not a performance boost either, a GC would still likely scan all the objects and using proxies would actually add more objects to be scanned. > If cyclic trash was collected, would you still need a proxy mechanism? I'd like to offer up a different reason for using proxies: to remove the concept of "ownership" from fragments of the tree so that they can be shared by multiple processing steps. It's not clear to me why the DOM processing model has such a strict concept of "owning document". To a lesser extent, a lot of data models use parent references because the data is inherently hierarchic but ignore the usefulness of being able to share tree fragments between different trees. 
I have found proxies to be very good at providing the illusion of heritage while in reality allowing fragments to be shared among trees. -- Ken From ken@bitsko.slc.ut.us Fri Feb 18 20:12:24 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 18 Feb 2000 14:12:24 -0600 Subject: [XML-SIG] Minidom proposal In-Reply-To: Paul Prescod's message of Fri, 18 Feb 2000 07:53:56 -0800 References: <38AD6B14.F4F7E84E@prescod.net> Message-ID: Paul Prescod writes: > I propose the following interface for a module that would go > into Python 1.6 (excuse my IDLish shorthand) All looks good to me. > class Element(Node): > string tagname > # check what the DOM does with namespaces > {Dictionary of Name->Value} attributes To clarify, this dictionary follows the earlier proposal that attributes are keyed by (namespaceURI, localName) tuples, correct? -- Ken P.S. I wish Perl could do that gracefully. :-/ From fdrake@acm.org Fri Feb 18 22:36:32 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 18 Feb 2000 17:36:32 -0500 (EST) Subject: [XML-SIG] DOM and Proxies In-Reply-To: References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us> Message-ID: <14509.51568.114533.714616@weyr.cnri.reston.va.us> Ken MacLeod writes: > I'd like to offer up a different reason for using proxies: to remove > the concept of "ownership" from fragments of the tree so that they can > be shared by multiple processing steps. I like this! This also requires proxies to work cleanly, as far as I can tell. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Fri Feb 18 23:26:31 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 18 Feb 2000 18:26:31 -0500 (EST) Subject: [XML-SIG] DOM and Proxies In-Reply-To: <38ADAD92.BE9948FB@prescod.net> References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us> <14509.51568.114533.714616@weyr.cnri.reston.va.us> <38ADAD92.BE9948FB@prescod.net> Message-ID: <14509.54567.163149.694183@weyr.cnri.reston.va.us> Paul Prescod writes: > Insofar as this is minidom and provides minimal support for moving > things around, cloning them and so forth, I wouldn't put in proxies just > to get object reuse. In the full PyDOM they would be more appropriate. I was thinking more of the general case, DOM or otherwise. I think it would be really nice to have this sort of proxy available in a "high performance" implementation. The reality is that several variants might be needed (with various support for mappings, sequences, etc.), but that's a detail symptomatic of the type/class dichotomy and not a long-term issue. It may not be realistic to share one implementation, and may not be worth the C code if not. But to support "sharable sub-hierarchies" as Ken described, we would need to use some sort of proxy solution. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Mike.Olson@Fourthought.com Mon Feb 21 08:45:43 2000 From: Mike.Olson@Fourthought.com (Mike Olson) Date: Mon, 21 Feb 2000 01:45:43 -0700 Subject: [XML-SIG] Minidom proposal References: <38AD6B14.F4F7E84E@prescod.net> Message-ID: <38B0FB36.955F2C9A@Fourthought.com> --------------7AC834847477473E2914850B Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Ken MacLeod wrote: > Paul Prescod writes: > > > I propose the following interface for a module that would go > > into Python 1.6 (excuse my IDLish shorthand) > > All looks good to me. 
>
> > class Element(Node):
> >       string tagname
> >       # check what the DOM does with namespaces
> >       {Dictionary of Name->Value} attributes
>
> To clarify, this dictionary follows the earlier proposal that
> attributes are keyed by (namespaceURI, localName) tuples, correct?
>

What if we are in a non-namespace-aware system? Should the key be
(None,localName) or just localName?

Mike

>
>   -- Ken
>
> P.S. I wish Perl could do that gracefully.  :-/
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Mike Olson
Senior Consultant Fourthought, Inc.
http://www.fourthought.com http://www.opentechnology.com
720-304-0152
--------------7AC834847477473E2914850B--

From ken@bitsko.slc.ut.us Mon Feb 21 16:21:14 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 21 Feb 2000 10:21:14 -0600
Subject: [XML-SIG] Proposal: Marrying SAX2 and DOM
Message-ID:

As SAX2 comes near to being finalized, I'd like to make a proposal for
the Python binding that could make SAX2/Python a lot simpler.

SAX2 adds support for specifying "features" that the parser supports.
Many of these features include additional properties be made available
to handlers. In the Java binding these additional properties are only
available through "callbacks" to the parser.

What I would like to propose is that the Python SAX2 binding pass
objects, specifically DOM-conformant objects, as a single parameter
rather than using both positional parameters and callbacks.

Benefits:

  * Will allow additional properties to be passed to handlers in a
    straightforward way, making parser extensions and filters much
    simpler to use and implement.

  * Becomes much easier using SAX to traverse a DOM, each SAX event
    simply passes the DOM node itself, rather than having a domNode()
    callback on the "parser".

Drawbacks:

  * A wider gap between the Java binding and the Python binding.

  * Creating objects for each event is a performance hit.

The parser would most likely use a DOMFactory specific to the type of
DOM objects the user would want, MiniDOM, PyDOM, etc. If the parse is
being used simply to create a DOM tree, then the DOM objects passed in
the events can be used to create the tree (by just appending children
to their parent).

This pattern has been used in the Perl SAX binding and I've found it
to be extremely convenient.

I would propose using DOM nodes for SAX2 (Java) altogether for the
same reasons, but I think Java's strict typing would be very
prohibitive to this sort of idea.

Comments?
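
As a rough sketch of what passing a single node per event could look like, the following adapter sits on top of today's xml.sax positional callbacks and hands one object to the application handler. All of the class and method names here (ElementNode, NodeEventAdapter, TagLogger, start_element, end_element) are hypothetical and belong to no SAX or DOM binding; the node is a bare stand-in rather than a DOM-conformant object.

```python
import xml.sax


class ElementNode:
    """Hypothetical DOM-like node; a stand-in, not a conformant DOM class."""

    def __init__(self, tag_name, attributes, parent=None):
        self.tagName = tag_name
        self.attributes = attributes
        self.parent = parent
        self.childNodes = []


class NodeEventAdapter(xml.sax.ContentHandler):
    """Turns positional SAX events into single-node events."""

    def __init__(self, node_handler):
        super().__init__()
        self.node_handler = node_handler
        self.stack = []   # currently open elements
        self.root = None  # the tree falls out of the events for free

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        node = ElementNode(name, dict(attrs.items()), parent)
        if parent is None:
            self.root = node
        else:
            parent.childNodes.append(node)  # build the tree by appending
        self.stack.append(node)
        self.node_handler.start_element(node)

    def endElement(self, name):
        self.node_handler.end_element(self.stack.pop())


class TagLogger:
    """Example application handler that receives one node per event."""

    def __init__(self):
        self.events = []

    def start_element(self, node):
        self.events.append(("start", node.tagName))

    def end_element(self, node):
        self.events.append(("end", node.tagName))


logger = TagLogger()
adapter = NodeEventAdapter(logger)
xml.sax.parseString(b"<a x='1'><b/></a>", adapter)
```

Extra properties can be added to the node later without changing any handler signature, which is the first benefit listed above. The cost is one object allocation per event, which is the second drawback.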
-- Ken

From gstein@lyra.org Fri Feb 18 23:53:29 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 18 Feb 2000 15:53:29 -0800 (PST)
Subject: [XML-SIG] Minidom proposal
In-Reply-To: <38AD6B14.F4F7E84E@prescod.net>
Message-ID:

On Fri, 18 Feb 2000, Paul Prescod wrote:
>...
> class Attribute(Node):
>     string namespaceURI
>     string prefix
>     string localName
>     string value
>     element ownerElement

Attribute is a subclass of Node, which has a parent. Why not use the parent for the owner?

> class Element(Node):
>     string tagname
>     # check what the DOM does with namespaces
>     {Dictionary of Name->Value} attributes
>     GetElementsByTagName( tagname ) -> List[Node]:
>     getElementsByTagNameNS(
>         DOMString namespaceURI,
>         DOMString localName) -> NodeList

GetElementsByTagName* should have a matching capitalization.
DOMString??
NodeList -> List[Node]

>     string namespaceURI
>     string prefix
>     string localName

Isn't this a duplicate of tagname? Why have both?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From larsga@garshol.priv.no Mon Feb 21 08:23:36 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 21 Feb 2000 09:23:36 +0100
Subject: [XML-SIG] SAX 2.0, again
Message-ID:

Some weeks ago David Megginson released a SAX 2.0 beta in Java, and this release appears to be quite close to the final form of SAX 2.0. I've started working on translating this release into Python, but there are some general design issues that need to be thought through before this can be completed.

### XML names

The first problem is that of how to represent XML names. SAX 2.0 can handle namespaces, and so we must somehow represent namespace-names. I can see several different ways of doing this, all with their advantages and disadvantages, and would very much like to hear the opinion of the XML-SIG on this.
The alternatives I've thought of are - use (uri, localpart) tuple for namespace-names, simple strings for ordinary names - use (uri, localpart, rawname) for namespace-names, simple strings for ordinary names; rawname must be communicated out of band somehow - use XMLName objects for names, regardless of kind. If these were made immutable and drivers used hashtables of these this might not be too inefficient. - use separate parameters for uri, localpart and rawname, letting some of these be None depending on what was in the document and what the parser supports. ### Driver maintenance Given that SAX 2.0 is larger than SAX 1.0 and also supports various possibilities for extensions, writing a good and complete SAX 2.0 driver can be quite a bit of work. If any parser writers or others feel like contributing to this work by writing and maintaining drivers, then please feel encouraged to do so. If nobody does write drivers, I will do it, but it will probably take longer and they may not be as complete. ### Unicode support Python 1.6 will have Unicode support, and so we should make PySAX 2.0 Unicode-ready. The main part of this is really adding the InputSource object to the library, since this allows applications to feed byte or character streams to the parser in a convenient way. The question is: how will this distinction look in Python 1.6? Will there be one? How should we relate to it? Could we do it simply by using file-like objects with different semantics? ### easySAX vs Pyxie What should we do with this? Should we try to turn Pyxie into what we envisioned easySAX to be, or should we maintain two such libraries? I see advantages and disadvantages to both approaches. One idea I've had for easySAX is something inspired by John Aycock's Spark parser generator, that one could write SAX document handlers with three kinds of special methods: start-element, end-element and element content methods. These could use the 's_', 'e_' and 'c_' prefixes, respectively. 
Unlike in xmllib, though, the names of these methods would have no significance beyond the prefix. Instead, the documentation string could contain very simple XPath expressions to be used to dispatch events onto the various methods.

This should allow us to write easySAX applications that look somewhat like this (self.out is some XML generator class which may or may not be part of easySAX):

  class MyHandler(GenericEZSAXHandler):

      def s_doc(self, attrs):
          ' document '
          self.out.write_template("top")

      def c_sec_title(self, contents, attrs):
          ' section / title '
          self.out.make_element('h1', contents)

      def c_subsec_title(self, contents, attrs):
          ' subsection / title '
          self.out.make_element('h2', contents)

      def e_doc(self):
          ' document '
          self.out.write_template("bottom")

I'm fairly confident that a layer on top of SAX 2.0 to enable such easySAX applications could be made fairly fast and it should be pretty easy to implement as well. (I've made an early sketch of this.)

The only question is what to do with namespace-names. Perhaps the application could declare constant namespace prefixes to be used in the documentation strings in its constructor.

--Lars M.

From paul@prescod.net Fri Feb 18 20:30:13 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 12:30:13 -0800
Subject: [XML-SIG] Minidom proposal
References: <38AD6B14.F4F7E84E@prescod.net>
Message-ID: <38ADABD5.E0E02CBA@prescod.net>

> To clarify, this dictionary follows the earlier proposal that
> attributes are keyed by (namespaceURI, localName) tuples, correct?

Good question. I think that for simplicity we should index attributes both with tuples AND with a simple tagname. I don't want to mess up node["href"] just to support the much less common node[("http://www.w3.org/TR/xlink","href")]. This is especially the case since attributes do not "namespace default." So namespaced attributes will actually be relatively rare.

> -- Ken
>
> P.S. I wish Perl could do that gracefully. :-/

You'd better get used to saying that.
<0.9 wink> -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From paul@prescod.net Fri Feb 18 20:37:38 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 18 Feb 2000 12:37:38 -0800 Subject: [XML-SIG] DOM and Proxies References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us> <14509.51568.114533.714616@weyr.cnri.reston.va.us> Message-ID: <38ADAD92.BE9948FB@prescod.net> "Fred L. Drake, Jr." wrote: > > I like this! This also requires proxies to work cleanly, as far as > I can tell. > > -Fred Insofar as this is minidom and provides minimal support for moving things around, cloning them and so forth, I wouldn't put in proxies just to get object reuse. In the full PyDOM they would be more appropriate. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From tpassin@idsonline.com Mon Feb 21 17:25:29 2000 From: tpassin@idsonline.com (THOMAS PASSIN) Date: Mon, 21 Feb 2000 12:25:29 -0500 Subject: [XML-SIG] SAX 2.0, again References: Message-ID: <002101bf7c90$a8f18a80$5da4fea9@tompassin> Lars Marius Garshol wrote: > > > Some weeks ago David Megginson released a SAX 2.0 beta in Java, and > this release appears to be quite close to the final form of SAX 2.0. > I've started working on translating this release into Python, but > there are some general design issues that need to be thought through > before this can be completed. 
> > > ### XML names > > The first problem is that of how to represent XML names. SAX 2.0 can > handle namespaces, and so we must somehow represent namespace-names. I think we should make it as easy as possible to use either namespace-style names or ordinary names, so both can be used in the same way as far as possible. The application shouldn't have to figure out the structure before it can even extract the value. So I don't think the xml name should be a tuple if it has a declared namespace but a string if there is no namespace. With this in mind, how about ((prefix,localpart),uri) If namespaces were not being used, prefix and uri would be None (or possibly the empty string). This allows the use of alternative values for the prefix (so you could, for example, use xslt:template for xsl:template if you wanted to, which is the way it is supposed to work), and you could check the uri value anytime you needed to learn the exact namespace. localpart would always be a string. Also, if you had a document containing several prefixes for the same namespace, you could easily use the localpart and uri, rather than the prefix. I don't recall how it shook out on XML-DEV, but there were a number of posts that said it was important to keep the actual prefix value, and this approach would do that. BTW, "uri" doesn't actually need to be a uri, any unique string will do. > I can see several different ways of doing this, all with their > advantages and disadvantages, and would very much like to hear the > opinion of the XML-SIG on this. > > The alternatives I've thought of are > > - use (uri, localpart) tuple for namespace-names, simple strings for > ordinary names > > - use (uri, localpart, rawname) for namespace-names, simple strings > for ordinary names; rawname must be communicated out of band > somehow > > - use XMLName objects for names, regardless of kind. If these were > made immutable and drivers used hashtables of these this might not > be too inefficient. 
> > - use separate parameters for uri, localpart and rawname, letting > some of these be None depending on what was in the document and > what the parser supports. > Tom Passin From ken@bitsko.slc.ut.us Mon Feb 21 19:37:46 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 21 Feb 2000 13:37:46 -0600 Subject: [XML-SIG] SAX 2.0, again In-Reply-To: Lars Marius Garshol's message of "21 Feb 2000 09:23:36 +0100" References: Message-ID: Lars Marius Garshol writes: > ### XML names > > The first problem is that of how to represent XML names. SAX 2.0 can > handle namespaces, and so we must somehow represent namespace-names. > I can see several different ways of doing this, all with their > advantages and disadvantages, and would very much like to hear the > opinion of the XML-SIG on this. > > The alternatives I've thought of are > > - use (uri, localpart) tuple for namespace-names, simple strings for > ordinary names > > - use (uri, localpart, rawname) for namespace-names, simple strings > for ordinary names; rawname must be communicated out of band > somehow > > - use XMLName objects for names, regardless of kind. If these were > made immutable and drivers used hashtables of these this might not > be too inefficient. > > - use separate parameters for uri, localpart and rawname, letting > some of these be None depending on what was in the document and > what the parser supports. The proposal I made earlier (passing objects instead of positional parameters) is another solution. From my proposal and Paul's miniDOM proposal earlier, start_element would be passed an Element object: class Element(Node): string tagName {Dictionary of Name->Value} attributes string namespaceURI string prefix string localName I believe tagName is the raw name and the remaining three are set depending on whether NS processing is turned on. For attributes to be a dictionary and support both NS and no-NS processing, I like (uri, localName) for NS and (None, tagName) for no-NS. 
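
As a rough sketch of that keying scheme (not an existing API): the attribute_key helper and the prefix map argument below are invented for illustration, and the XLink URI is simply the one used as an example elsewhere in this thread.

```python
XLINK = "http://www.w3.org/TR/xlink"  # example URI taken from this thread


def attribute_key(raw_name, prefix_map=None):
    """Return the dictionary key for an attribute, given its raw name.

    prefix_map=None stands for "namespace processing is off"; otherwise it
    maps the prefixes currently in scope to their namespace URIs.
    (Hypothetical helper, for illustration only.)
    """
    if prefix_map is None:
        # no-NS processing: (None, tagName-style raw name)
        return (None, raw_name)
    prefix, colon, local_name = raw_name.partition(":")
    if not colon:
        # Unprefixed attributes do not pick up a default namespace.
        return (None, raw_name)
    return (prefix_map[prefix], local_name)


# Usage: the same lookup style works in both modes.
prefixes = {"xlink": XLINK}
attributes = {
    attribute_key("xlink:href", prefixes): "chapter2.xml",
    attribute_key("id", prefixes): "c2",
}
href = attributes[(XLINK, "href")]
ident = attributes[(None, "id")]
```

With this shape every key is a 2-tuple, so application code never has to test whether a name is a string or a tuple before using it.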
> ### Unicode support > > Python 1.6 will have Unicode support, and so we should make PySAX 2.0 > Unicode-ready. The main part of this is really adding the InputSource > object to the library, since this allows applications to feed byte or > character streams to the parser in a convenient way. Adding InputSource may not be necessary if there was a method parseCharFile() to specify character streams. > ### easySAX vs Pyxie > > What should we do with this? Should we try to turn Pyxie into what we > envisioned easySAX to be, or should we maintain two such libraries? I > see advantages and disadvantages to both approaches. > > One idea I've had for easySAX is something inspired by John Aycock's > Spark parser generator, that one could write SAX document handlers > with three kinds of special methods: start-element, end-element and > element content methods. These could use the 's_', 'e_' and 'c_' > prefixes, respectively. > I'm fairly confident that a layer on top of SAX 2.0 to enable such > easySAX applications could be made fairly fast and it should be pretty > easy to implement as well. (I've made an early sketch of this.) If I understand correctly, yes, having a SAX filter that calls tag-based methods names should be really easy. I think the part I don't understand about easySAX and Pyxie (and it's probably from not having the opportunity to use them) is: why isn't the SAX binding already this easy? -- Ken From paul@prescod.net Sat Feb 19 16:12:21 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 19 Feb 2000 08:12:21 -0800 Subject: [XML-SIG] Minidom proposal References: Message-ID: <38AEC0E5.95F54819@prescod.net> Greg Stein wrote: > > Attribute is a subclass of Node, which has a parent. Why not use the > parent for the owner? This is a common debate in the XML world. Attributes are not considered "children" of elements so it is somewhat weird to call the owner "parent". You're my parent but I'm not your child. 
Given that the argument could go either way we might as well do it the way that PyDOM and 4DOM currently do (AFAIK).

> GetElementsByTagName* should have a matching capitalization.

True.

> DOMString??

Cut and paste error. For Python, DOMString is just PyString -- especially since we will soon have Unicode.

> NodeList -> List[Node]

Right.

> Isn't this a duplicate of tagname? Why have both?

tagname = html:a or a
localname = a (always)

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world."
- from "Advent of the Algorithm" David Berlinski
http://www.opengroup.com/mabooks/015/0151003386.shtml

From mrnolta@princeton.edu Tue Feb 22 09:23:10 2000
From: mrnolta@princeton.edu (Michael Nolta)
Date: Tue, 22 Feb 2000 04:23:10 -0500 (EST)
Subject: [XML-SIG] installation problem
Message-ID:

I'm having problems installing. It can't find the file

  /usr/lib/python1.5/config/Makefile

which it needs to make sedscript. I'm using RedHat 6.1, and there's no config/ directory in /usr/lib/python1.5.

-Mike

---
VERSION=`python -c "import sys; print sys.version[:3]"`; \
installdir=`python -c "import sys; print sys.prefix"`; \
exec_installdir=`python -c "import sys; print sys.exec_prefix"`; \
make -f ./Makefile.pre.in VPATH=. srcdir=. \
        VERSION=$VERSION \
        installdir=$installdir \
        exec_installdir=$exec_installdir \
        Makefile
make[1]: Entering directory `/scr1/build/PyXML-0.5.2/extensions'
make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile', needed by `sedscript'. Stop.
---

From hannu@tm.ee Tue Feb 22 09:45:42 2000
From: hannu@tm.ee (Hannu Krosing)
Date: Tue, 22 Feb 2000 11:45:42 +0200
Subject: [XML-SIG] installation problem
References:
Message-ID: <38B25AC6.13963FA@tm.ee>

Michael Nolta wrote:
>
> I'm having problems installing.
It can't file the file > > /usr/lib/python1.5/config/Makefile > > which it needs to make sedscript. I'm using RedHat 6.1, and there's no > config/ directory in /usr/lib/python1.5. install python-devel-*.rpm ------- Hannu From larsga@garshol.priv.no Tue Feb 22 16:06:12 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Feb 2000 17:06:12 +0100 Subject: [XML-SIG] SAX 2.0, again In-Reply-To: <002101bf7c90$a8f18a80$5da4fea9@tompassin> References: <002101bf7c90$a8f18a80$5da4fea9@tompassin> Message-ID: * THOMAS PASSIN | | I think we should make it as easy as possible to use either | namespace-style names or ordinary names, so both can be used in the | same way as far as possible. Agreed. This has to be the overall goal. | The application shouldn't have to figure out the structure before it | can even extract the value. So I don't think the xml name should be | a tuple if it has a declared namespace but a string if there is no | namespace. This is a valid point, unless we can work around the problem somehow. | With this in mind, how about | | ((prefix,localpart),uri) For performance and convenience it would be better to do this as (prefix, localpart, uri) but I agree that this is better than (uri, localpart, rawname) since you rarely want the rawname anyway, and when you want it you can get it from the prefix + localpart. The only problem I have with this is that it means that names with different prefixes do not compare as equal. This is why I would prefer to have the prefix reported somewhere else. (Any good ideas for where?) | I don't recall how it shook out on XML-DEV, but there were a number | of posts that said it was important to keep the actual prefix value, | and this approach would do that. I think it was needed for the DOM, and it's also part of the lexical information that one sometimes needs, so there are definitely reasons to keep it. The question is where. | BTW, "uri" doesn't actually need to be a uri, any unique string will | do. 
Perhaps, but it doesn't really matter to us. :-) --Lars M. From larsga@garshol.priv.no Tue Feb 22 15:59:11 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Feb 2000 16:59:11 +0100 Subject: [XML-SIG] SAX 2.0, again In-Reply-To: References: Message-ID: * Ken MacLeod | | [representing XML names] | The proposal I made earlier (passing objects instead of positional | parameters) is another solution. Yeah, I saw that after I'd posted and the list email had been fixed again. I've looked at that proposal and want to think it through a little before I say anything about it. | Adding InputSource may not be necessary if there was a method | parseCharFile() to specify character streams. We still need to be able to return something from EntityResolver that the parser can read from correctly, and I think InputSource is the way to go. It's a very simple class anyway, and would be implemented only once (in the SAX library). | [easySAX vs Pyxie] | | If I understand correctly, yes, having a SAX filter that calls | tag-based methods names should be really easy. It would, and we already have that. What I was thinking of was using documentation comments to do dispatching on instead, since this would give us more advanced dispatching. You could do things like def c_beep(self, contents, attrs): ' section / title ' self.out.make_element('h1', contents) | I think the part I don't understand about easySAX and Pyxie (and | it's probably from not having the opportunity to use them) is: why | isn't the SAX binding already this easy? It's a good question. The main reason is that I wanted something very simple that could be implemented by parser libraries without too much fuss and also something that could easily be put on top of databases, converters and other kinds of tools to produce XML output. 
Similarly, I wanted it to be possible to make competing toolkits for making XML processing simple on top of a standard parser API so that it would be trivially easy for all these toolkits to support all XML parsers (and other XML generators) available in Python. --Lars M. From tpassin@idsonline.com Wed Feb 23 02:29:25 2000 From: tpassin@idsonline.com (THOMAS PASSIN) Date: Tue, 22 Feb 2000 21:29:25 -0500 Subject: [XML-SIG] SAX 2.0, again References: <002101bf7c90$a8f18a80$5da4fea9@tompassin> Message-ID: <002d01bf7da5$d1ee6280$5c2a08d1@idsonline.com> Lars Marius Garshol wrote, replying to my post: > | The application shouldn't have to figure out the structure before it > | can even extract the value. So I don't think the xml name should be > | a tuple if it has a declared namespace but a string if there is no > | namespace. > > This is a valid point, unless we can work around the problem somehow. > > | With this in mind, how about > | > | ((prefix,localpart),uri) > > For performance and convenience it would be better to do this as > > (prefix, localpart, uri) > > but I agree that this is better than > > (uri, localpart, rawname) > > since you rarely want the rawname anyway, and when you want it you can > get it from the prefix + localpart. > > The only problem I have with this is that it means that names with > different prefixes do not compare as equal. This is why I would prefer > to have the prefix reported somewhere else. (Any good ideas for where?) > OK, what about (prefix,(localpart,uri)). Then we compare names with names_compare=(name1[1]==name2[1]). Since names are the same by definition if the localpart and namespace are identical, this should work fine. And the prefix is still there, tagging along for the ride. As for performance, you know far more about Python performance than I. But maybe some analysis... say we are processing 10,000 elements using SAX with some typical kind of element processing methods. 
What fraction of the total processing time would be lost by using this structure and name test instead of some optimized structure? If the loss might be, say, 5%, I say don't worry about it one little bit. If it's 25% of the ***overall*** processing time, probably that is too much. Who can shed some reasonably definitive light on this? Regards, Tom Passin From paul@prescod.net Fri Feb 18 20:21:07 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 18 Feb 2000 12:21:07 -0800 Subject: [XML-SIG] DOM and Proxies References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us> Message-ID: <38ADA9B3.B5C01BE1@prescod.net> "Andrew M. Kuchling" wrote: > > Open question: is the proxy mechanism still useful if a garbage > collection mechanism for collecting cycles gets into 1.6? (Neal > Schemenauer is working on something, but it's too early to tell if > it'll get into 1.6; perhaps the cost will be too high.) No, if the cycle-reaper gets into 1.6 then I wouldn't bother with the proxies. The mere fact that I've spent too much of my life thinking up cycle-avoidance mechanisms suggests that we should give Neal's patch a high priority (scuse the pun)! Obviously the proxy has the benefit of not slowing down anything that doesn't use it. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From harri.pasanen@trema.com Wed Feb 23 17:12:56 2000 From: harri.pasanen@trema.com (Harri Pasanen) Date: Wed, 23 Feb 2000 18:12:56 +0100 Subject: [XML-SIG] small installation problem Message-ID: <38B41518.3548D87B@trema.com> I installed PyXML-0.5.3 on Solaris 2.7 following the README instructions. 
python setup.py install failed at first, because of missing /usr/local/lib/python1.5/site-packages/ directory. That directory does not appear the be created when Python 1.5.2 is installed from the tar-ball. After manually creating the directory, the install went through without complaints. Regards, -Harri From fdrake@acm.org Wed Feb 23 17:24:36 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 23 Feb 2000 12:24:36 -0500 (EST) Subject: [XML-SIG] small installation problem In-Reply-To: <38B41518.3548D87B@trema.com> References: <38B41518.3548D87B@trema.com> Message-ID: <14516.6100.748263.302872@weyr.cnri.reston.va.us> Harri, This is (in part) a distutils issue. The distutils package should always create this directory if it doesn't exist. Raw Python installations should not create it, since adding it to the search path would slow down the module search. Greg, I don't know if you're reading the XML-SIG list, so I'm adding the distutils list to the list of recipients. Harri Pasanen writes: > I installed PyXML-0.5.3 on Solaris 2.7 following the README > instructions. > > > python setup.py install > > failed at first, because of missing > /usr/local/lib/python1.5/site-packages/ directory. > That directory does not appear the be created when Python 1.5.2 is > installed from the tar-ball. > > After manually creating the directory, the install went through without > complaints. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gward@python.net Thu Feb 24 02:53:56 2000 From: gward@python.net (Greg Ward) Date: Wed, 23 Feb 2000 21:53:56 -0500 Subject: [Distutils] Re: [XML-SIG] small installation problem In-Reply-To: <14516.6100.748263.302872@weyr.cnri.reston.va.us>; from Fred L. Drake, Jr. on Wed, Feb 23, 2000 at 12:24:36PM -0500 References: <38B41518.3548D87B@trema.com> <14516.6100.748263.302872@weyr.cnri.reston.va.us> Message-ID: <20000223215356.A3815@beelzebub> On 23 February 2000, Fred L. Drake, Jr. 
said: > This is (in part) a distutils issue. The distutils package should > always create this directory if it doesn't exist. Fred is correct -- Distutils should create any directories it needs to install files. (In fact, it creates any directories it needs to do anything.) In fact, I've just tested this: both my current development version (on Linux) and the 0.1.3 release (Solaris 2.6) work peachy keen. If I remove or rename my site-packages directory, Distutils recreates it *as long as I have permission to write in $prefix/lib/python1.5*. Harri Pasanen writes: > python setup.py install > > failed at first, because of missing > /usr/local/lib/python1.5/site-packages/ directory. > That directory does not appear the be created when Python 1.5.2 is > installed from the tar-ball. Since I can't reproduce the bug, I'm going to need more information. Could you supply an exact transcript of the session where Distutils failed to create the site-packages directory? (I'm guessing there's a traceback that will reveal useful information.) Also, what version of Distutils did you use? Greg -- Greg Ward - Linux bigot gward@python.net http://starship.python.net/~gward/ Whatever became of eternal truth? From Tony.McDonald@newcastle.ac.uk Thu Feb 24 07:11:39 2000 From: Tony.McDonald@newcastle.ac.uk (Tony.McDonald@newcastle.ac.uk) Date: Thu, 24 Feb 2000 07:11:39 +0000 Subject: [XML-SIG] Compiled mac version of pyexpat anywhere? Message-ID: Can someone point me to a source for a pyexpat library for the Mac that will work with the latest Python version (1.5.2fc). With the library available at the Mac Python site, I keep getting 'ImportError: PythonCore: An import library was too new for a client.' messages. As the pystones benchmark indicates that my iMac is *twice* (4300 vs 2400 pystones) as fast as our Sun iron at work, I'd like to try and get this working! 
:)

cheers
tone

From bradmars@yahoo.com Thu Feb 24 21:54:40 2000
From: bradmars@yahoo.com (Bradley Marshall)
Date: Thu, 24 Feb 2000 13:54:40 -0800 (PST)
Subject: [XML-SIG] Returning data from DocumentHandler
Message-ID: <20000224215440.23382.rocketmail@web222.mail.yahoo.com>

Hey guys,

How do I return data from a DocumentHandler? I am using sax to build a data structure from xml files. I want to do something like:

  class DocHandler(DocumentHandler):
      ....
      def endDocument(self):
          return self.data

Then I'm calling it like:

  dh = DocHandler()
  p = parser()
  p.setDocumentHandler(dh)
  data = p.parseFile(file)
  p.close()

but if I do:

  print data

I get:

  None

If I do all my manipulations in endDocument(), it's fine, but I'd like to separate those functionalities.

Thanks a lot,
Brad Marshall

From akuchlin@mems-exchange.org Thu Feb 24 22:19:50 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 24 Feb 2000 17:19:50 -0500 (EST)
Subject: [XML-SIG] Returning data from DocumentHandler
In-Reply-To: <20000224215440.23382.rocketmail@web222.mail.yahoo.com>
References: <20000224215440.23382.rocketmail@web222.mail.yahoo.com>
Message-ID: <14517.44678.636035.291431@amarok.cnri.reston.va.us>

Bradley Marshall writes:
>class DocHandler(DocumentHandler):
>....
>    def endDocument(self):
>        return self.data
>
>dh = DocHandler()
>p = parser()
>p.setDocumentHandler(dh)
>data = p.parseFile(file)
>p.close()

The parseFile() method doesn't return anything, so data will always be None; the value returned by endDocument() is simply discarded. In other words, in the Java version of SAX the parse() method is declared as void. Why not just access the .data attribute of your DocHandler instance (dh.data) after parsing? You can also add an accessor method, .getWhateverData(), to your class, if you prefer accessor methods to attributes.

-- A.M.
Kuchling http://starship.python.net/crew/amk/
Perhaps God made cats so that man might have the pleasure of fondling the tiger...
    -- Robertson Davies, _The Diary of Samuel Marchbanks_

From uche.ogbuji@fourthought.com Sun Feb 27 08:04:11 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 27 Feb 2000 01:04:11 -0700
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: Your message of "21 Feb 2000 09:23:36 +0100."
Message-ID: <200002270804.BAA04277@localhost.localdomain>

> The first problem is that of how to represent XML names. SAX 2.0 can
> handle namespaces, and so we must somehow represent namespace-names.
> I can see several different ways of doing this, all with their
> advantages and disadvantages, and would very much like to hear the
> opinion of the XML-SIG on this.
>
> The alternatives I've thought of are
>
> - use (uri, localpart) tuple for namespace-names, simple strings for
>   ordinary names

This is how names are indexed in 4DOM. However, it can cause some odd problems if namespace-aware code is mixed with non-ns code.

> - use (uri, localpart, rawname) for namespace-names, simple strings
>   for ordinary names; rawname must be communicated out of band
>   somehow

I do think it is very important to at least keep track of the prefix, even though we'd admonish users not to attach semantic value to it.

> - use XMLName objects for names, regardless of kind. If these were
>   made immutable and drivers used hashtables of these this might not
>   be too inefficient.

What interface do you have in mind? What hashing approach? Simple string hashing for string names, and maybe some concatenation into a single string for namespace names?

> - use separate parameters for uri, localpart and rawname, letting
>   some of these be None depending on what was in the document and
>   what the parser supports.
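
One possible shape for the XMLName option, offered only as a sketch and not as anything Lars has proposed in detail: an immutable object that compares and hashes on (uri, localpart) and carries the prefix along without letting it affect equality. The class layout and the rawname property are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class XMLName:
    """Hypothetical immutable XML name.

    Equality and hashing use only (uri, localpart); the prefix is kept as
    lexical information and is excluded from comparison.
    """
    uri: Optional[str]          # None when the name is in no namespace
    localpart: str
    prefix: Optional[str] = field(default=None, compare=False)

    @property
    def rawname(self):
        # Rebuild the name as written in the document from prefix + localpart.
        if self.prefix:
            return f"{self.prefix}:{self.localpart}"
        return self.localpart


# Usage: two spellings of the same name act as one dictionary key.
XSLT = "http://www.w3.org/1999/XSL/Transform"
a = XMLName(XSLT, "template", prefix="xsl")
b = XMLName(XSLT, "template", prefix="xslt")
plain = XMLName(None, "template")
handlers = {a: "handle_template"}
found = handlers[b]
```

Hashing falls out of the tuple of compared fields, so no string concatenation is needed, and plain names use the same type with uri set to None.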
-- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From uche.ogbuji@fourthought.com Sun Feb 27 08:15:57 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 27 Feb 2000 01:15:57 -0700 Subject: [XML-SIG] SAX 2.0, again In-Reply-To: Your message of "Mon, 21 Feb 2000 12:25:29 EST." <002101bf7c90$a8f18a80$5da4fea9@tompassin> Message-ID: <200002270815.BAA04321@localhost.localdomain> > > The first problem is that of how to represent XML names. SAX 2.0 can > > handle namespaces, and so we must somehow represent namespace-names. > > I think we should make it as easy as possible to use either namespace-style > names or ordinary names, so both can be used in the same way as far as > possible. The application shouldn't have to figure out the structure before > it can even extract the value. So I don't think the xml name should be a > tuple if it has a declared namespace but a string if there is no namespace. > > With this in mind, how about > > ((prefix,localpart),uri) > > If namespaces were not being used, prefix and uri would be None (or possibly > the empty string). It would have to be the former, to avoid confusion with default namespaces and null NS in an NS-aware system. > This allows the use of alternative values for the prefix > (so you could, for example, use xslt:template for xsl:template if you wanted > to, which is the way it is supposed to work), and you could check the uri > value anytime you needed to learn the exact namespace. localpart would > always be a string. This is pretty much essential. > Also, if you had a document containing several prefixes for the same > namespace, you could easily use the localpart and uri, rather than the > prefix. 
The prefix shouldn't be used except for convenient uniformity from input to output, and for the few W3C-sanctioned cases such as XPath name tests. > I don't recall how it shook out on XML-DEV, but there were a number of posts > that said it was important to keep the actual prefix value, and this > approach would do that. I was a champion of that on XML-DEV, for the above reasons. > BTW, "uri" doesn't actually need to be a uri, any unique string will do. Actually, it does have to be a URI or it is in contradiction of the spec (although they didn't go the natural step to make URI conformance a formal namespace constraint, they do have pretty conclusive wording to that effect in section 1). -- Uche Ogbuji Fourthought, Inc., IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software-engineering, project-management, knowledge-management http://Fourthought.com http://OpenTechnology.org From tpassin@idsonline.com Sun Feb 27 17:20:24 2000 From: tpassin@idsonline.com (THOMAS PASSIN) Date: Sun, 27 Feb 2000 12:20:24 -0500 Subject: [XML-SIG] SAX 2.0, again References: <200002270815.BAA04321@localhost.localdomain> Message-ID: <001901bf8146$f04041a0$b92a08d1@idsonline.com> wrote > > BTW, "uri" doesn't actually need to be a uri, any unique string will do. > > Actually, it does have to be a URI or it is in contradiction of the spec > (although they didn't go the natural step to make URI conformance a formal > namespace constraint, they do have pretty conclusive wording to that effect in > section 1). > Actually I mis-spoke slightly. I really meant it doesn't have to look like a regular ***URL***. I was thinking that the "scheme" of a URI could be blank, but checking the RFC I see it has to have at least one letter plus the ":". The rest of it can just be a string (modulo using legal characters, etc.). The namespace spec specifically says "It is not a goal that it be directly usable for retrieval of a schema (if any exists).
" So it doesn't have to be any existing URL or even an existing scheme, as long as it is unique. Regards, Tom Passin From josh@shock.pobox.com Sun Feb 27 22:59:58 2000 From: josh@shock.pobox.com (Josh Marcus) Date: Sun, 27 Feb 2000 17:59:58 -0500 Subject: [XML-SIG] XML database options Message-ID: <20000227175958.A22001@shock.pobox.com> Can I ask a slightly off-topic question? Recently, I've been implementing an XML server that is capable of storing XML documents in a relational database in such a way that the documents can be quickly queried. I found an interesting paper that compares the performance of alternative mapping schemes ("A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database") and decided to follow the scheme their experimentation found most efficient -- with a few changes to trade storage space for speed. I was just wondering: o Is there an open source application that I can use for this? o If not, is there conventional wisdom regarding how one might go about storing and querying XML data (short of buying a commercial oo-db)? Thanks, --j From jack@oratrix.nl Sun Feb 27 23:28:15 2000 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 28 Feb 2000 00:28:15 +0100 Subject: [XML-SIG] SAX 2.0, again In-Reply-To: Message by uche.ogbuji@fourthought.com , Sun, 27 Feb 2000 01:15:57 -0700 , <200002270815.BAA04321@localhost.localdomain> Message-ID: <20000227232820.25A81D71F2@oratrix.oratrix.nl> Sjoerds mods to xmllib (which I don't think are publicly available, but they might be in the CVS archive) use a single string ns+' '+attr. This has the advantage of being pretty easy to use: it doesn't matter much whether you check for an attribute "foo" or an attribute "myns bar". The only addition you would need would be an optional mapping of external namespaces, i.e.
there'd have to be a way to specify that if a certain namespace was used in a document you'd like to see it with a specific name in the parser regardless of what is used in the document. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From m.favas@per.dem.csiro.au Mon Feb 28 03:16:49 2000 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Mon, 28 Feb 2000 11:16:49 +0800 Subject: [XML-SIG] Patches for PyXML-0.5.3 re single arg to list.append Message-ID: <38B9E8A1.2F27364F@per.dem.csiro.au> Recently, the CVS version of Python has been changed to flag as an error usages of list.append() with more than one argument. (Previously, multiple args were silently converted to a tuple.) The following two patches fix this type of append() use for the released PyXML-0.5.3 (apologies if already fixed in XML CVS). Both occur in xml/parsers/xmlproc. *** dtdparser.py.orig Mon Feb 28 10:31:17 2000 --- dtdparser.py Mon Feb 28 10:33:00 2000 *************** *** 598,604 **** self.scan_to(">") self.skip_ws() ! cont_list.append(self.get_match(reg_name),"") if sep=="|" and not self.now_at("*"): self.report_error(3005,"*") --- 598,604 ---- self.scan_to(">") self.skip_ws() ! cont_list.append((self.get_match(reg_name),"")) if sep=="|" and not self.now_at("*"): self.report_error(3005,"*") *** xmlutils.py.orig Mon Feb 28 10:31:26 2000 --- xmlutils.py Mon Feb 28 10:32:15 2000 *************** *** 406,414 **** # --- Internal methods def _push_ent_stack(self): ! self.ent_stack.append(self.get_current_sysid(),self.data,self.pos,\ ! self.line,self.last_break,self.datasize,\ ! self.last_upd_pos,self.block_offset,self.final) def _pop_ent_stack(self): (self.current_sysID,self.data,self.pos,self.line,self.last_break,\ --- 406,414 ---- # --- Internal methods def _push_ent_stack(self): ! 
self.ent_stack.append((self.get_current_sysid(),self.data,self.pos, ! self.line,self.last_break,self.datasize, ! self.last_upd_pos,self.block_offset,self.final)) def _pop_ent_stack(self): (self.current_sysID,self.data,self.pos,self.line,self.last_break,\ Cheers, Mark -- Email - m.favas@per.dem.csiro.au Postal - Mark C Favas Phone - +61 8 9333 6268, 041 892 6074 CSIRO Exploration & Mining Fax - +61 8 9333 6121 Private Bag No 5 Wembley, Western Australia 6913 From fdrake@acm.org Mon Feb 28 20:33:34 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 28 Feb 2000 15:33:34 -0500 (EST) Subject: [XML-SIG] Patches for PyXML-0.5.3 re single arg to list.append In-Reply-To: <38B9E8A1.2F27364F@per.dem.csiro.au> References: <38B9E8A1.2F27364F@per.dem.csiro.au> Message-ID: <14522.56222.262861.886569@weyr.cnri.reston.va.us> Mark Favas writes: > Recently, the CVS version of Python has been changed to flag as an error > usages of list.append() with more than one argument. (Previously, > multiple args were silently converted to a tuple.) The following two > patches fix this type of append() use for the released PyXML-0.5.3 Mark, Thanks! I've just checked this in. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From rehankhwaja@yahoo.com Mon Feb 28 21:49:38 2000 From: rehankhwaja@yahoo.com (Rehan Khwaja) Date: Mon, 28 Feb 2000 13:49:38 -0800 (PST) Subject: [XML-SIG] xslt stylesheet for xbel Message-ID: <20000228214938.17798.qmail@web114.yahoomail.com> i've made an xslt stylesheet for transforming an xbel document into a collapsing/expanding tree. the dhtml for the collapsing/expanding stuff works in Internet Explorer, at least. is anybody interested in this? i'd like to post it somewhere if possible. thanks, rehan khwaja rehankhwaja@yahoo.com __________________________________________________ Do You Yahoo!? Talk to your friends online with Yahoo! Messenger. http://im.yahoo.com From fdrake@acm.org Mon Feb 28 21:57:14 2000 From: fdrake@acm.org (Fred L.
Drake, Jr.) Date: Mon, 28 Feb 2000 16:57:14 -0500 (EST) Subject: [XML-SIG] xslt stylesheet for xbel In-Reply-To: <20000228214938.17798.qmail@web114.yahoomail.com> References: <20000228214938.17798.qmail@web114.yahoomail.com> Message-ID: <14522.61242.974334.713338@weyr.cnri.reston.va.us> Rehan Khwaja writes: > i've made an xslt stylesheet for transforming an xbel > document into a collapsing/expanding tree. Cool! I played with one a while back for display, but was just learning XSL (there wasn't a "T" back then!) and wasn't very pleased with the result. > is anybody interested in this? i'd like to post it > somewhere if possible. I'd love to see it. I can add it to the xbel directory in the PyXML package if you think others will be interested. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein@lyra.org Mon Feb 28 23:13:48 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 28 Feb 2000 15:13:48 -0800 (PST) Subject: [XML-SIG] URI schemes (was: SAX 2.0, again) In-Reply-To: <001901bf8146$f04041a0$b92a08d1@idsonline.com> Message-ID: On Sun, 27 Feb 2000, THOMAS PASSIN wrote: > wrote > > > > > > BTW, "uri" doesn't actually need to be a uri, any unique string will do. > > > > Actually, it does have to be a URI or it is in contradiction of the spec > > (although they didn't go the natural step to make URI conformance a formal > > namespace constraint, they do have pretty conclusive wording to that > effect in > > section 1). > > > Actually I mis-spoke slightly. I really meant it doesn't have to look like > a regular ***URL***. I was thinking that the "scheme" of a URI could be > blank, but checking the RFC I see it has to have at least one letter plus > the ":". The rest of it can just be a string (modulo using legal > characters. etc). The namespace spec specifically says > "It is not a goal that it be directly usable for retrieval of a schema (if > any exists).
" So it doesn't have to be any existing URL or even an > existing scheme, as long as it is unique. Minor nit: For it to be called a URI, the scheme must be registered with the IANA. If you just willy-nilly use arbitrary, unregistered schemes, then you *do* run the chance that it is not unique. Cheers, -g -- Greg Stein, http://www.lyra.org/ From rehankhwaja@yahoo.com Tue Feb 29 00:26:35 2000 From: rehankhwaja@yahoo.com (Rehan Khwaja) Date: Mon, 28 Feb 2000 16:26:35 -0800 (PST) Subject: [XML-SIG] xslt stylesheet for xbel Message-ID: <20000229002635.23704.qmail@web111.yahoomail.com> --0-596516649-951783995=:21007 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline ok - here it is, with a few of my bookmarks that i tested it with. i know that the dhtml it produces doesn't work in navigator :( i'd like to know of other bugs, otherwise enjoy. > > is anybody interested in this? i'd like to post > it > > somewhere if possible. > > I'd love to see it. I can add it to the xbel > directory in the PyXML > package if you think others will be interested. > that sounds great :o) cheers, rehan __________________________________________________ Do You Yahoo!? Talk to your friends online with Yahoo! Messenger. 
http://im.yahoo.com --0-596516649-951783995=:21007 Content-Type: application/x-zip-compressed; name="xbel-xsl.zip" Content-Transfer-Encoding: base64 Content-Description: xbel-xsl.zip Content-Disposition: attachment; filename="xbel-xsl.zip" [base64-encoded attachment omitted] --0-596516649-951783995=:21007-- From larsga@garshol.priv.no Tue Feb 29
07:21:44 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Feb 2000 08:21:44 +0100 Subject: [XML-SIG] SAX 2.0 names Message-ID: I've done some more thinking about this now, and this is the result: Element and attribute type names in XML have the following properties: - a namespace URI - a local name - a raw name (SAX 2.0b2 is now reporting this instead of the prefix) The essential operations on these names are: - comparison - indexing (that is, using them as keys in a dictionary) - decomposition (which includes partial comparison, where you check only the namespace or local name of the name) After the discussions we've had so far, these are the best alternatives for representations I can think of: - as objects (with __cmp__, __hash__, get_uri, get_local_name and get_rawname methods) - requires a bit of machinery in drivers to be effective - all operations will be slow - a natural way to model this - as strings (of the form 'uri localname', with the rawname in a separate parameter) - comparison and indexing will be fast, especially with interned names - decomposition will be slow and awkward - feels kind of like a hack - as tuples (of the form ('uri', 'localname'), with the rawname in a separate parameter) - all operations are convenient - comparison and indexing may not be as fast as with strings - a natural way to model this I have to go to a meeting before too long, but I'll try to make two benchmarks to compare the performance of the different representations. --Lars M. From larsga@garshol.priv.no Tue Feb 29 07:23:36 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Feb 2000 08:23:36 +0100 Subject: [XML-SIG] SAX 2.0, again In-Reply-To: <200002270815.BAA04321@localhost.localdomain> References: <200002270815.BAA04321@localhost.localdomain> Message-ID: * THOMAS PASSIN | | Also, if you had a document containing several prefixes for the same | namespace, you could easily use the localpart and uri, rather than | the prefix. 
* uche ogbuji | | The prefix shouldn't be used except for convenient uniformity from | input to output, and for the few W3C-sanctioned cases such as XPath | name tests. Agreed. The prefix is just a lexical detail, only useful for roundtripping. --Lars M. From tpassin@idsonline.com Tue Feb 29 12:57:54 2000 From: tpassin@idsonline.com (THOMAS PASSIN) Date: Tue, 29 Feb 2000 07:57:54 -0500 Subject: [XML-SIG] SAX 2.0, again References: <200002270815.BAA04321@localhost.localdomain> Message-ID: <001901bf82b4$98c2f160$3415b0cf@idsonline.com> Lars Marius Garshol > * THOMAS PASSIN > | > | Also, if you had a document containing several prefixes for the same > | namespace, you could easily use the localpart and uri, rather than > | the prefix. > > * uche ogbuji > | > | The prefix shouldn't be used except for convenient uniformity from > | input to output, and for the few W3C-sanctioned cases such as XPath > | name tests. > > Agreed. The prefix is just a lexical detail, only useful for > roundtripping. > Agreed here, too. Tom Passin From fdrake@acm.org Tue Feb 29 15:31:05 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 29 Feb 2000 10:31:05 -0500 (EST) Subject: [XML-SIG] SAX 2.0 names In-Reply-To: References: Message-ID: <14523.58937.766784.166817@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > - as objects (with __cmp__, __hash__, get_uri, get_local_name and > get_rawname methods) > > - requires a bit of machinery in drivers to be effective > - all operations will be slow > - a natural way to model this If the objects are implemented as a C/Java extension type, it should be plenty fast. A 100% Pure Python implementation can be a fallback if the extension isn't available. > - as strings (of the form 'uri localname', with the rawname in a > separate parameter) > > - comparison and indexing will be fast, especially with interned > names > - decomposition will be slow and awkward > - feels kind of like a hack Very much. 
> - as tuples (of the form ('uri', 'localname'), with the rawname in a > separate parameter) > > - all operations are convenient > - comparison and indexing may not be as fast as with strings > - a natural way to model this And the convenient tuple unpacking could also be provided using the object approach; the objects can easily implement the sequence protocol. I'd be willing to write a C implementation of the object version if that's the API we decide on, but I'd also be fine with the third option. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From larsga@garshol.priv.no Tue Feb 29 16:21:06 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Feb 2000 17:21:06 +0100 Subject: [XML-SIG] SAX 2.0 names In-Reply-To: <14523.58937.766784.166817@weyr.cnri.reston.va.us> References: <14523.58937.766784.166817@weyr.cnri.reston.va.us> Message-ID: * Lars Marius Garshol | | - as objects (with __cmp__, __hash__, get_uri, get_local_name and | get_rawname methods) | | - requires a bit of machinery in drivers to be effective | - all operations will be slow | - a natural way to model this * Fred L. Drake, Jr. | | If the objects are implemented as a C/Java extension type, it should | be plenty fast. A 100% Pure Python implementation can be a fallback | if the extension isn't available. Hmmm. That might be the way to go. I still wonder about the speed, though. | And the convenient tuple unpacking could also be provided using the | object approach; the objects can easily implement the sequence | protocol. Good idea. This makes objects even more attractive. | I'd be willing to write a C implementation of the object version if | that's the API we decide on, but I'd also be fine with the third | option. Hmmm. Let's chew on this a little more and hear some more opinions before deciding. I did the benchmark I spoke of, and the results indicate that the performance differences are very small between strings and tuples. 
Also, how you put together the strings influences the speed a bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium II with plenty of RAM and MHz.

[larsga@pc-larsga python]$ python sax2bench.py
Pure parsing time: 28.73

---Generic:
__main__.NamespaceFilterString          30.25
__main__.NamespaceFilterInternedString  30.85
__main__.NamespaceFilterTuple           30.15

---Specific:
__main__.NamespaceFilterString          30.71
__main__.NamespaceFilterInternedString  30.7
__main__.NamespaceFilterTuple           29.67

# A simple benchmark of various ways to represent namespace-names and
# how this affects performance.

# ==================== NAMESPACEFILTER

# This is xmlproc's normal namespace filter, but modified to use
# different name representations

import string
from xml.parsers.xmlproc import xmlapp

# --- Name objects

class SAXName:

    def __init__(self, uri, localname, rawname):
        self.__uri = uri
        self.__localname = localname
        self.__rawname = rawname
        self.__hash = hash(uri) + hash(localname)

    def get_uri(self):
        return self.__uri

    def get_localname(self):
        return self.__localname

    def get_rawname(self):
        return self.__rawname

    def __cmp__(self, other):
        # NB! Does not sort properly: returns 0 for equal names and 1
        # for anything else. The rawname is not part of the comparison.
        if isinstance(other, SAXName) and self.__hash == hash(other) and \
           self.__uri == other.get_uri() and \
           self.__localname == other.get_localname():
            return 0
        else:
            return 1

    def __hash__(self):
        return self.__hash

# --- ParserFilter

class ParserFilter(xmlapp.Application):
    "A generic parser filter class."

    def __init__(self):
        xmlapp.Application.__init__(self)
        self.app=xmlapp.Application()

    def set_application(self,app):
        "Sets the application to report events to."
        self.app=app

    # --- Methods inherited from xmlapp.Application

    def set_locator(self,locator):
        xmlapp.Application.set_locator(self,locator)
        self.app.set_locator(locator)

    def doc_start(self):
        self.app.doc_start()

    def doc_end(self):
        self.app.doc_end()

    def handle_comment(self,data):
        self.app.handle_comment(data)

    def handle_start_tag(self,name,attrs):
        self.app.handle_start_tag(name,attrs)

    def handle_end_tag(self,name):
        self.app.handle_end_tag(name)

    def handle_data(self,data,start,end):
        self.app.handle_data(data,start,end)

    def handle_ignorable_data(self,data,start,end):
        self.app.handle_ignorable_data(data,start,end)

    def handle_pi(self,target,data):
        self.app.handle_pi(target,data)

    def handle_doctype(self,root,pubID,sysID):
        self.app.handle_doctype(root,pubID,sysID)

    def set_entity_info(self,xmlver,enc,sddecl):
        self.app.set_entity_info(xmlver,enc,sddecl)

# --- NamespaceFilter

class NamespaceFilterGeneric(ParserFilter):
    """An xmlproc application that processes qualified names and reports them
    as 'URI local-part' names. It reports errors through the error reporting
    mechanisms of the parser."""

    def __init__(self,parser):
        ParserFilter.__init__(self)
        self.ns_map={}       # Current prefix -> URI map
        self.ns_stack=[]     # Pushed for each element, used to maint ns_map
        self.rep_ns_attrs=0  # Report xmlns-attributes?
        self.parser=parser

    def set_report_ns_attributes(self,action):
        "Tells the filter whether to report or delete xmlns-attributes."
        self.rep_ns_attrs=action

    # --- Overridden event methods

    def handle_start_tag(self,name,attrs):
        old_ns={} # Reset ns_map to these values when we leave this element
        del_ns=[] # Delete these prefixes from ns_map when we leave element

        # attrs=attrs.copy()   Will have to do this if more filters are made

        # Find declarations, update self.ns_map and self.ns_stack
        for (a,v) in attrs.items():
            if a[:6]=="xmlns:":
                prefix=a[6:]
                if string.find(prefix,":")!=-1:
                    self.parser.report_error(1900)

                if v=="":
                    self.parser.report_error(1901)
            elif a=="xmlns":
                prefix=""
            else:
                continue

            if self.ns_map.has_key(prefix):
                old_ns[prefix]=self.ns_map[prefix]
            else:
                del_ns.append(prefix)

            if prefix=="" and v=="":
                del self.ns_map[prefix]
            else:
                self.ns_map[prefix]=v

            if not self.rep_ns_attrs:
                del attrs[a]

        self.ns_stack.append((old_ns,del_ns))

        # Process elem and attr names

        name=self._process_name(name)

        for (a,v) in attrs.items():
            del attrs[a]
            attrs[self._process_name(a)]=v

        # Report event

        self.app.handle_start_tag(name,attrs)

    def handle_end_tag(self,name):
        name=self._process_name(name)

        # Clean up self.ns_map and self.ns_stack
        (old_ns,del_ns)=self.ns_stack[-1]
        del self.ns_stack[-1]

        self.ns_map.update(old_ns)
        for prefix in del_ns:
            del self.ns_map[prefix]

        self.app.handle_end_tag(name)

class NamespaceFilterString(NamespaceFilterGeneric):

    def _process_name(self,name):
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name

            try:
                #return string.join(self.ns_map[n[0]],n[1]) (slowest)
                #return "%s %s" % (self.ns_map[n[0]],n[1])   (slower)
                return self.ns_map[n[0]] + " " + n[1]
            except KeyError:
                self.parser.report_error(1902)
                return name
        elif self.ns_map.has_key("") and name!="xmlns":
            return "%s %s" % (self.ns_map[""],name)
        else:
            return name

class NamespaceFilterInternedString(NamespaceFilterGeneric):

    def _process_name(self,name):
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name

            try:
                #return intern(string.join(self.ns_map[n[0]],n[1])) (slowest)
                #return intern("%s %s" % (self.ns_map[n[0]],n[1]))   (slower)
                return intern(self.ns_map[n[0]] + " " + n[1])
            except KeyError:
                self.parser.report_error(1902)
                return name
        elif self.ns_map.has_key("") and name!="xmlns":
            return intern("%s %s" % (self.ns_map[""],name))
        else:
            return name

class NamespaceFilterTuple(NamespaceFilterGeneric):

    def _process_name(self,name):
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name

            try:
                return (self.ns_map[n[0]], n[1])
            except KeyError:
                self.parser.report_error(1902)
                return (None, name)
        elif self.ns_map.has_key("") and name!="xmlns":
            return (self.ns_map[""], name)
        else:
            return (None, name)

class NamespaceFilterObject(NamespaceFilterGeneric):

    def __init__(self, parser):
        NamespaceFilterGeneric.__init__(self, parser)
        self.__objs = {}

    def _process_name(self,name):
        # FIXME: implement! (still returns tuples, not SAXName objects)
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name

            try:
                return (self.ns_map[n[0]], n[1])
            except KeyError:
                self.parser.report_error(1902)
                return (None, name)
        elif self.ns_map.has_key("") and name!="xmlns":
            return (self.ns_map[""], name)
        else:
            return name

# ==================== GENERIC BENCHMARK

class GenericStats(xmlapp.Application):

    def __init__(self):
        self.__elemtypes = {}
        self.__attrtypes = {}

    def handle_start_tag(self, name, attrs):
        # Count occurrences of each element type name
        try:
            self.__elemtypes[name] = self.__elemtypes[name] + 1
        except KeyError:
            self.__elemtypes[name] = 1

        # Count occurrences of each attribute name
        for (attr, value) in attrs.items():
            try:
                self.__attrtypes[attr] = self.__attrtypes[attr] + 1
            except KeyError:
                self.__attrtypes[attr] = 1

# ==================== SPECIFIC BENCHMARK

apt_airport = intern("http://www.megginson.com/exp/ns/airports# Airport")
apt_latitude = intern("http://www.megginson.com/exp/ns/airports# latitude")
apt_uri = "http://www.megginson.com/exp/ns/airports#"
apt_len = len(apt_uri)
rdf_uri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf_len = len(rdf_uri)

apt_airport2 = ("http://www.megginson.com/exp/ns/airports#",
                intern("Airport"))
apt_latitude2 = ("http://www.megginson.com/exp/ns/airports#",
                 intern("latitude"))

class SpecificStatsString(xmlapp.Application):

    def __init__(self):
        self.__airports = 0
        self.__with_coords = 0
        self.__apt_elems = 0
        self.__rdf_elems = 0

    def handle_start_tag(self, name, attrs):
        if name == apt_airport:
            self.__airports = self.__airports + 1
        elif name == apt_latitude:
            self.__with_coords = self.__with_coords + 1

        if name[:apt_len] == apt_uri:
            self.__apt_elems = self.__apt_elems + 1
        elif name[:rdf_len] == rdf_uri:
            self.__rdf_elems = self.__rdf_elems + 1

class SpecificStatsTuple(xmlapp.Application):

    def __init__(self):
        self.__airports = 0
        self.__with_coords = 0
        self.__apt_elems = 0
        self.__rdf_elems = 0

    def handle_start_tag(self, name, attrs):
        if name == apt_airport2:
            self.__airports = self.__airports + 1
        elif name == apt_latitude2:
            self.__with_coords = self.__with_coords + 1

        if name[0] == apt_uri:
            self.__apt_elems = self.__apt_elems + 1
        elif name[0] == rdf_uri:
            self.__rdf_elems = self.__rdf_elems + 1

# ==================== MAIN PROGRAM

from xml.parsers.xmlproc import xmlproc
import time

p = xmlproc.XMLProcessor()
start = time.clock()
p.set_application(NamespaceFilterTuple(p))
p.parse_resource("airports.rdf")
used = time.clock() - start
print "Pure parsing time:", used

print
print "---Generic:"

for filter in [NamespaceFilterString, NamespaceFilterInternedString,
               NamespaceFilterTuple]:
    p = xmlproc.XMLProcessor()
    nsfilter = filter(p)
    nsfilter.set_application(GenericStats())
    p.set_application(nsfilter)
    start = time.clock()
    p.parse_resource("airports.rdf")
    used = time.clock() - start
    print "%30s\t%s" % (filter, used)

print
print "---Specific:"

for (Filter, App) in [(NamespaceFilterString, SpecificStatsString),
                      (NamespaceFilterInternedString, SpecificStatsString),
                      (NamespaceFilterTuple, SpecificStatsTuple)]:
    p = xmlproc.XMLProcessor()
    nsfilter = Filter(p)
    nsfilter.set_application(App())
    p.set_application(nsfilter)
    start = time.clock()
    p.parse_resource("airports.rdf")
    used = time.clock() - start
    print "%30s\t%s" % (Filter, used)

#--Lars M.
From fdrake@acm.org Tue Feb 29 16:31:35 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 29 Feb 2000 11:31:35 -0500 (EST) Subject: [XML-SIG] SAX 2.0 names In-Reply-To: References: <14523.58937.766784.166817@weyr.cnri.reston.va.us> Message-ID: <14523.62567.356663.65659@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > Hmmm. That might be the way to go. I still wonder about the speed, > though. If the C extension is actually available, it should be about the same as building a tuple; perhaps a *little* faster, but the difference would come out in the wash. > Hmmm. Let's chew on this a little more and hear some more opinions > before deciding. Agreed; I won't have time to write a bunch of new C code for a couple of weeks anyway. > I did the benchmark I spoke of, and the results indicate that the > performance differences are very small between strings and tuples. > Also, how you put together the strings influences the speed a > bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium > II with plenty of RAM and MHz. Looks good! As for string construction, "%s %s" % (uri, localpart) requires 1 malloc() more for the new string than just creating the tuple, and uri + " " + localpart would require the same number of malloc() calls, but slightly more data copying when uri isn't "". Very close, but both require the extra malloc() compared to just using a tuple. -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From tpassin@idsonline.com Tue Feb 29 20:44:09 2000 From: tpassin@idsonline.com (THOMAS PASSIN) Date: Tue, 29 Feb 2000 15:44:09 -0500 Subject: [XML-SIG] SAX 2.0 names References: <14523.58937.766784.166817@weyr.cnri.reston.va.us> <14523.62567.356663.65659@weyr.cnri.reston.va.us> Message-ID: <001801bf82f5$bba340e0$4d15b0cf@idsonline.com> Fred L. Drake, Jr. wrote > > Lars Marius Garshol writes: > > Hmmm. That might be the way to go. I still wonder about the speed, > > though. > > If the C extension is actually available, it should be about the > same as building a tuple; perhaps a *little* faster, but the difference > would come out in the wash. > > > Hmmm. Let's chew on this a little more and hear some more opinions > > before deciding. > > Agreed; I won't have time to write a bunch of new C code for a > couple of weeks anyway. > > > I did the benchmark I spoke of, and the results indicate that the > > performance differences are very small between strings and tuples. > > Also, how you put together the strings influences the speed a > > bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium > > II with plenty of RAM and MHz. > > Looks good! As for string construction, "%s %s" % (uri, localpart) > requires 1 malloc() more for the new string than just creating the > tuple, and uri + " " + localpart would require the same number of > malloc() calls, but slightly more data copying when uri isn't "". > Very close, but both require the extra malloc() compared to just using > a tuple. > In earlier posts I suggested tuples. Fred and Lars' posts seem to be saying that tuples shouldn't cause a big performance hit, and that could possibly be finessed anyway. Have I summarized what you have said correctly, Fred and Lars?
Then I think we should go with tuples, because
1) they are easy for a non-expert Python programmer to understand and work
with,
2) they capitalize on a Python strength (nice data structures),
3) an expert can make them perform even better with extension modules, and
4) as Fred said, if the extension module were not available one could fall
back to a 100% Python implementation with practically no changes to existing
code.

Regards,

Tom P

From fdrake@acm.org Tue Feb 29 21:17:31 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 29 Feb 2000 16:17:31 -0500 (EST)
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To: <001801bf82f5$bba340e0$4d15b0cf@idsonline.com>
References: <14523.58937.766784.166817@weyr.cnri.reston.va.us> <14523.62567.356663.65659@weyr.cnri.reston.va.us> <001801bf82f5$bba340e0$4d15b0cf@idsonline.com>
Message-ID: <14524.14187.795572.189167@weyr.cnri.reston.va.us>

THOMAS PASSIN writes:
> In earlier posts I suggested tuples. Fred and Lars' posts seem to be saying
> that tuples shouldn't cause a big performance hit, and that could possibly
> be finessed anyway. Have I summarized what you have said correctly, Fred
> and Lars?

That's my interpretation.

> Then I think we should go with tuples, because
> 1) They are easy for a non-expert Python programmer to understand and work
> with,
> 2) they capitalize on a Python strength (nice data structures),
> 3) an expert can make them perform even better with extension modules, and
> 4) as Fred said, if the extension module were not available one could fall
> back to a 100% Python implementation with practically no changes to existing
> code.

The "object" I imagine has three attributes: namespace URI,
localpart, and prefix. It would unpack to two values: URI &
localpart, and comparisons would only operate on those two as well.
The advantage is that we get the prefix for those who want it, single
object comparisons, and no extraneous parameters to the call.
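A minimal sketch of the kind of name object described above, written for modern Python 3 rather than the 1.5.2 of this thread. The class name `QName`, its methods, and the example.org URI are all hypothetical; nothing here is part of any SAX 2.0 proposal, and it says nothing about whether such an object would be an extension type or a class.

```python
class QName:
    """Hypothetical namespace name: URI, localpart, and prefix."""

    __slots__ = ("uri", "localpart", "prefix")

    def __init__(self, uri, localpart, prefix=None):
        self.uri = uri
        self.localpart = localpart
        self.prefix = prefix

    def __iter__(self):
        # Unpacks to two values only; the prefix is deliberately left out.
        return iter((self.uri, self.localpart))

    def __eq__(self, other):
        # Comparison looks at (uri, localpart) only, so a plain 2-tuple matches.
        if isinstance(other, QName):
            other = (other.uri, other.localpart)
        if isinstance(other, tuple):
            return (self.uri, self.localpart) == other
        return NotImplemented

    def __hash__(self):
        # Hash like the equivalent tuple so both find the same dictionary slot.
        return hash((self.uri, self.localpart))


name = QName("http://example.org/ns", "airport", prefix="air")
uri, localpart = name                      # two values, as with a tuple
same = QName("http://example.org/ns", "airport", prefix="a")
handlers = {("http://example.org/ns", "airport"): "airport handler"}

print(name == same)      # prefixes differ, names compare equal
print(handlers[name])    # usable where a (uri, localpart) tuple is the key
print(name.prefix)       # the prefix is still there for those who want it
```

Code written against plain `(uri, localpart)` tuples keeps working with such an object, which is the migration property point 4) above asks for.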
I don't think *this* is available using the non-object approaches.
Whether the objects are extension types or classes is irrelevant to
this.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From gstein@lyra.org Tue Feb 29 21:34:47 2000
From: gstein@lyra.org (Greg Stein)
Date: Tue, 29 Feb 2000 13:34:47 -0800 (PST)
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To:
Message-ID:

I'm all for using tuples. If somebody wants extended capabilities through
the use of objects, then they can use them on top of tuples. If you start
with objects, then you've set a minimum.

As Thomas said, using tuples is simple, clean, and Pythonic.

KISS

On 29 Feb 2000, Lars Marius Garshol wrote:
>...
> I did the benchmark I spoke of, and the results indicate that the
> performance differences are very small between strings and tuples.
> Also, how you put together the strings influences the speed a
> bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium
> II with plenty of RAM and MHz.
>
> [larsga@pc-larsga python]$ python sax2bench.py
> Pure parsing time: 28.73
>
> ---Generic:
> __main__.NamespaceFilterString           30.25
> __main__.NamespaceFilterInternedString   30.85
> __main__.NamespaceFilterTuple            30.15
>
> ---Specific:
> __main__.NamespaceFilterString           30.71
> __main__.NamespaceFilterInternedString   30.7
> __main__.NamespaceFilterTuple            29.67

The differences seem small because the "benchmark" is bogus. You have a
HUGE constant factor. Just look at the thing: hundreds of lines. Classes
here and there, function calls over that way, etc.

If you want to truly benchmark the varieties, then initialize a number of
sample objects and time their *usage*. Alternatively, you can time their
*construction* from some fake data.

As it is, your test has *way* too much noise in it to provide adequate
information about the performance of the alternatives.

And besides... performance isn't everything. The use of tuples is clean
and straightforward. That counts for quite a lot.
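The focused measurement suggested above (time the *construction* of names from fake data, or time the *usage* of pre-built ones) could look something like the following. This is only a sketch in modern Python 3 using the standard `timeit` module, which postdates Python 1.5.2; the sample URI, the local name, and the function names are made up for illustration, and no timings are claimed here.

```python
import timeit

# Made-up sample data standing in for a namespace URI and a local name.
URI = "http://example.org/ns"
LOCAL = "airport"

# The three ways of building a name that the thread compares.
BUILDERS = {
    "format": lambda: "%s %s" % (URI, LOCAL),
    "concat": lambda: URI + " " + LOCAL,
    "tuple": lambda: (URI, LOCAL),
}


def time_construction(number=200000, repeat=5):
    """Best-of-`repeat` seconds to build each kind of name `number` times."""
    return {
        label: min(timeit.repeat(build, number=number, repeat=repeat))
        for label, build in BUILDERS.items()
    }


def time_usage(number=200000, repeat=5):
    """Best-of-`repeat` seconds to use pre-built names as dictionary keys."""
    results = {}
    for label, build in BUILDERS.items():
        key = build()        # built once, outside the timed code
        table = {key: 1}
        results[label] = min(
            timeit.repeat(lambda: table[key], number=number, repeat=repeat)
        )
    return results


if __name__ == "__main__":
    for title, measure in (("construction", time_construction),
                           ("usage", time_usage)):
        print("---%s:" % title)
        for label, seconds in measure().items():
            print("%10s\t%.4f" % (label, seconds))
```

Nothing but the operation under test runs inside the timed call, so the parser, filter classes, and handler dispatch that dominated the original figures are out of the picture. Each lambda still adds the same small call overhead to every variant.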
The fact that it appears they are more efficient (based on your
rough test) is just another wonderful boon for them.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From ken@bitsko.slc.ut.us Fri Feb 18 23:04:27 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 18 Feb 2000 17:04:27 -0600
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: Paul Prescod's message of Fri, 18 Feb 2000 12:37:38 -0800
References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us> <14509.51568.114533.714616@weyr.cnri.reston.va.us> <38ADAD92.BE9948FB@prescod.net>
Message-ID:

Paul Prescod writes:

> "Fred L. Drake, Jr." wrote:
> >
> > I like this! This also requires proxies to work cleanly, as far as
> > I can tell.
>
> Insofar as this is minidom and provides minimal support for moving
> things around, cloning them and so forth, I wouldn't put in proxies
> just to get object reuse. In the full PyDOM they would be more
> appropriate.

AFAIK, "real" DOM doesn't support object reuse anyway, so compliant
DOM code wouldn't need proxies for that reason. I was thinking more
of the general case, which a mini-dom could optionally support where
a full DOM really wouldn't, or shouldn't according to spec.

So, I still like proxies for data (especially grove-like data), but if
1.6 doesn't need 'em for DOM, I'm OK.

-- Ken

From ken@bitsko.slc.ut.us Fri Feb 18 21:11:38 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 18 Feb 2000 15:11:38 -0600
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: Paul Prescod's message of Fri, 18 Feb 2000 07:34:03 -0800
References: <38AD666B.E5C3E20C@prescod.net>
Message-ID:

Paul Prescod writes:

> Proxied objects have "families". All objects in a family live for
> the same length of time. Families are expected to be completely
> internally linked. There is one proxy "family" for every
> LemmingLeader (created through an explicit call to the proxy method)
> (e.g. one per DOM).

I kinda got lost in this.
What's the need for the "family" and
LemmingLeader?

In my usages of proxies, a reference to the root node is almost always
kept somewhere for the life of the tree, so the tree never gets
collected until that reference is released. All of the proxies that
are generated from the tree are usually just temporary.

Even if the root-proxy is created implicitly (not by the user), it
still holds the only reference to the root of the tree; when the
root proxy is no longer referenced, the tree goes away.

This may be what you said, but it sounded like quite a bit more was
being described.

-- Ken

From ken@bitsko.slc.ut.us Tue Feb 29 19:50:56 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 29 Feb 2000 13:50:56 -0600
Subject: C extension (was Re: [XML-SIG] SAX 2.0 names)
In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 29 Feb 2000 11:31:35 -0500 (EST)"
References: <14523.58937.766784.166817@weyr.cnri.reston.va.us> <14523.62567.356663.65659@weyr.cnri.reston.va.us>
Message-ID:

"Fred L. Drake, Jr." writes:

> Lars Marius Garshol writes:
> > Hmmm. That might be the way to go. I still wonder about the
> > speed, though.
>
> If the C extension is actually available, it should be about the
> same as building a tuple; perhaps a *little* faster, but the
> difference would come out in the wash.

Speaking of C extensions, I've started some work on a C library
similar to what was discussed here a few months back: the ability to
capture attribute (property) access in an efficient way and support
generated values, parent proxies, and inherited properties (SVG comes
to mind).

The core is very grove-like and the implementation is strongly
influenced by Objective-C and Python, even though I started it to
implement solutions for my Perl modules. Even though the core is
grove-like, I definitely want to be able to support a DOM layer over
it for those who prefer DOM.
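To make the idea of captured property access concrete, here is a rough Python 3 illustration of two of the features mentioned above, generated values and inherited properties. It is not the C library's API and implies nothing about its design: the `Node` class, the `inherited` set, and the SVG-flavoured property names are assumptions made up for the example, and parent proxies are not covered.

```python
import math


class Node:
    """Hypothetical tree node that captures attribute (property) access."""

    # Assumed set of properties that fall back to the nearest ancestor.
    inherited = frozenset(["fill", "stroke"])

    def __init__(self, name, parent=None, **props):
        self.name = name
        self.parent = parent
        self.props = props      # values stored directly on this node
        self.generated = {}     # property name -> function(node) -> value

    def __getattr__(self, key):
        # Python calls this only after normal attribute lookup has failed.
        state = self.__dict__
        if key in state.get("props", ()):
            return state["props"][key]
        if key in state.get("generated", ()):
            return state["generated"][key](self)   # computed on each access
        parent = state.get("parent")
        if key in self.inherited and parent is not None:
            return getattr(parent, key)            # walk up the tree
        raise AttributeError(key)


svg = Node("svg", fill="black")
group = Node("g", parent=svg)
circle = Node("circle", parent=group, r=5)
circle.generated["area"] = lambda node: math.pi * node.r ** 2

print(circle.fill)   # found on the root, two levels up
print(circle.area)   # computed from circle.r when asked for
```

Only names listed in `inherited` are looked up on ancestors, so `group.r` still raises `AttributeError` even though a descendant defines `r`.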
The core of the library has no intentional Perl-isms in it, and I would
really like to have a Python co-developer work with me so we can share
the resources developing it. I will/would have made a Python binding
for it ASAP, but it'd be really nice if it happened earlier in the
development.

I'm just finishing up the core data model and will check the source
into my CVS server as soon as that's complete. I'll crosspost between
both lists as developments occur.

The "basic" integration of the core library and Perl only took a
couple of hours; I would expect about the same for Python.

-- Ken