From RoD@qnet20.com Thu Feb 1 23:24:23 2001
From: RoD@qnet20.com (Rod)
Date: Thu, 1 Feb 2001 23:24:23
Subject: [XML-SIG] Diamond x Jungle Carpet Python
Message-ID: <20010202072446.AF667F506@mail.python.org>

I have several Diamond x Jungle Carpet Pythons for SALE.

Make me an offer....

Go to: www.qnet20.com

From mal@lemburg.com Fri Feb 2 09:25:53 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 02 Feb 2001 10:25:53 +0100
Subject: [XML-SIG] Diamond x Jungle Carpet Python
References: <20010202072446.AF667F506@mail.python.org>
Message-ID: <3A7A7D21.B43614F8@lemburg.com>

Rod wrote:
>
> I have several Diamond x Jungle Carpet Pythons for SALE.
>
> Make me an offer....
>
> Go to: www.qnet20.com

Perhaps we ought to throw together and buy Guido one of these
elegant Pythons for the conference ?!

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From eugeneai@icc.ru Fri Feb 2 09:42:45 2001
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Fri, 2 Feb 2001 17:42:45 +0800
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
Message-ID: <200102020943.RAA23939@monster.icc.ru>

Hi!

I just installed the codecs-aware pyXML-0.6.3 package and figured out
that, at least for Python 2.0, the package installer should replace
Python's original pyexpat.pyd module (in Python's DLLs folder under
Windows), as it is usually the one loaded by pyXML (not the new one in
the package). Or, maybe, remove all old pyexpat.pyd files before
installation. As a result, pyXML does not support codecs.

Thank you for codecs inclusion in the package.

Evgeny
--

From martin@mira.cs.tu-berlin.de Fri Feb 2 13:33:46 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 2 Feb 2001 14:33:46 +0100
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102020943.RAA23939@monster.icc.ru> (message from Evgeny
 Cherkashin on Fri, 2 Feb 2001 17:42:45 +0800)
References: <200102020943.RAA23939@monster.icc.ru>
Message-ID: <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de>

> I just installed the codecs-aware pyXML-0.6.3 package and figured out
> that, at least for Python 2.0, the package installer should replace
> Python's original pyexpat.pyd module (in Python's DLLs folder under
> Windows), as it is usually the one loaded by pyXML (not the new one in
> the package). Or, maybe, remove all old pyexpat.pyd files before
> installation. As a result, pyXML does not support codecs.

Why is that? It should work just fine if you use xml.parsers.expat.

Regards,
Martin

From noreply@sourceforge.net Fri Feb 2 21:26:58 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 02 Feb 2001 13:26:58 -0800
Subject: [XML-SIG] [Bug #130913] XML processing instruction being output wrong
Message-ID:

Bug #130913, was updated on 2001-Feb-02 13:26
Here is a current snapshot of the bug.

Project: Python/XML
Category: SAX
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: nobody
Assigned to : nobody
Summary: XML processing instruction being output wrong

Details: The version="1.0" which is required in the XML processing
instruction is not included when the XmlWrite.startDocument is done.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=130913&group_id=6473

From guido@digicool.com Sat Feb 3 19:39:54 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 03 Feb 2001 14:39:54 -0500
Subject: [XML-SIG] Minidom bugs/questions
Message-ID: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>

I'm making my first steps into XML, so please forgive me.

I wrote a simple XML application using a DOM implementation by Digital
Creations folks.
Then I was trapped in a hotel room with my code on a laptop
without a copy of DC's code, but with the Python 2.1a1 release
installed.

Converting my app to use minidom was easy enough, but I found out
about a bunch of differences between the two DOM implementations. Some
of these are fine with me (e.g. minidom doesn't preserve comments,
doesn't prefix its output with "<?xml version="1.0"?>" when writing
XML output, minidom returns Unicode strings even for ASCII input).

But others suggest that either the DOM standard isn't very strict or
unambiguous, or one of the implementations has a bug. Here's the list
of things that I had to fix in my code:

1. The other DOM has a hasAttributes() predicate; minidom is missing
   this and I have to use the more expensive form "if node.attributes".

2. In minidom, Element.getAttribute() and .getAttributeNS() raise
   KeyError for a non-existing attribute; in the other DOM, they return
   "". (Personally, I'd prefer KeyError or perhaps None, but according
   to Fred, the DOM standard requires "". Note that this is poorly
   documented -- from the docs for getAttribute*() it's not clear
   *what* is returned in this case.)

3. Note that getAttributeNode() correctly returns None if the attribute
   doesn't exist, but getAttributeNodeNS() looks like it will raise
   KeyError too!

4. In minidom, createDocument() leaves doc.documentElement set to None;
   in the other DOM, doc.documentElement is initialized to an Element
   node created from the second argument to createDocument(). (Again,
   according to Fred, the DOM standard requires the latter.)

5. When writing XML output from a DOM tree that uses namespace
   attributes, minidom doesn't insert the proper "xmlns:="
   attributes. The other DOM gets this right. (This is a bit tricky
   to do, although I've figured a good way to do it which I'll gladly
   donate to minidom if it's deemed useful.)

6. When writing XML output from a DOM tree that has a default
   namespace, minidom writes <:tag>...</:tag> instead of <tag>...</tag>
   like the other DOM, and like I would have expected.

Other comments:

7. I noticed that minidom's __getattr__ special-cases requests for an
   attribute whose name begins with _get_, and makes up a lambda on the
   fly. This suggests that the caller is asking for _get_foo() where
   there is no such method, but there is a foo attribute. Since
   _get_foo() is a detail of the implementation (I hope), doesn't this
   mean that the implementation is doing something silly? Shouldn't
   the implementation be fixed rather than accommodated? Or am I
   missing something?

Here are proposed patches for items 1, 2, 3, 4 and 6 above (fixing 6
turns out to require a patch to pulldom.py). 5 is more work; 7 is a
trivial patch but I expect there's a reason (in which case a comment
would be a nice idea :-).

I'd like some feedback before checking this in...

*** pulldom.py  2001/01/27 08:47:37  1.17
--- pulldom.py  2001/02/03 19:38:26
***************
*** 56,62 ****
              # provide us with the original name. If not, create
              # *a* valid tagName from the current context.
              if tagName is None:
!                 tagName = self._current_context[uri] + ":" + localname
              node = self.document.createElementNS(uri, tagName)
          else:
              # When the tagname is not prefixed, it just appears as
--- 56,66 ----
              # provide us with the original name. If not, create
              # *a* valid tagName from the current context.
              if tagName is None:
!                 prefix = self._current_context[uri]
!                 if prefix:
!                     tagName = prefix + ":" + localname
!                 else:
!                     tagName = localname
              node = self.document.createElementNS(uri, tagName)
          else:
              # When the tagname is not prefixed, it just appears as
***************
*** 66,72 ****
          for aname,value in attrs.items():
              a_uri, a_localname = aname
              if a_uri:
!                 qname = self._current_context[a_uri] + ":" + a_localname
                  attr = self.document.createAttributeNS(a_uri, qname)
              else:
                  attr = self.document.createAttribute(a_localname)
--- 70,80 ----
          for aname,value in attrs.items():
              a_uri, a_localname = aname
              if a_uri:
!                 prefix = self._current_context[a_uri]
!                 if prefix:
!                     qname = prefix + ":" + a_localname
!                 else:
!                     qname = a_localname
                  attr = self.document.createAttributeNS(a_uri, qname)
              else:
                  attr = self.document.createAttribute(a_localname)
*** minidom.py  2001/02/02 19:40:19  1.22
--- minidom.py  2001/02/03 19:38:50
***************
*** 435,444 ****
          Node.unlink(self)

      def getAttribute(self, attname):
!         return self._attrs[attname].value

      def getAttributeNS(self, namespaceURI, localName):
!         return self._attrsNS[(namespaceURI, localName)].value

      def setAttribute(self, attname, value):
          attr = Attr(attname)
--- 435,450 ----
          Node.unlink(self)

      def getAttribute(self, attname):
!         try:
!             return self._attrs[attname].value
!         except KeyError:
!             return ""

      def getAttributeNS(self, namespaceURI, localName):
!         try:
!             return self._attrsNS[(namespaceURI, localName)].value
!         except KeyError:
!             return ""

      def setAttribute(self, attname, value):
          attr = Attr(attname)
***************
*** 457,463 ****
          return self._attrs.get(attrname)

      def getAttributeNodeNS(self, namespaceURI, localName):
!         return self._attrsNS[(namespaceURI, localName)]

      def setAttributeNode(self, attr):
          if attr.ownerElement not in (None, self):
--- 463,469 ----
          return self._attrs.get(attrname)

      def getAttributeNodeNS(self, namespaceURI, localName):
!         return self._attrsNS.get((namespaceURI, localName))

      def setAttributeNode(self, attr):
          if attr.ownerElement not in (None, self):
***************
*** 528,533 ****
--- 534,545 ----
      def _get_attributes(self):
          return AttributeList(self._attrs, self._attrsNS)

+     def hasAttributes(self):
+         if self._attrs or self._attrsNS:
+             return 1
+         else:
+             return 0
+
  class Comment(Node):
      nodeType = Node.COMMENT_NODE
      nodeName = "#comment"
***************
*** 635,640 ****
--- 647,654 ----
                  raise xml.dom.NamespaceErr("illegal use of 'xml' prefix")
              if prefix and not namespaceURI:
                  raise xml.dom.NamespaceErr("illegal use of prefix without namespaces")
+             element = doc.createElementNS(namespaceURI, qualifiedName)
+             doc.appendChild(element)
          doctype.parentNode = doc
          doc.doctype = doctype
          doc.implementation = self

--Guido van Rossum (home page: http://www.python.org/~guido/)

From Mike.Olson@fourthought.com Sat Feb 3 23:24:47 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sat, 03 Feb 2001 16:24:47 -0700
Subject: [XML-SIG] Updating our servers
Message-ID: <3A7C933F.D34EFEE2@FourThought.com>

Sorry if you get this twice.

Just wanted to send a quick message to everyone to say that we are in
the middle of updating our web servers, so connections to
fourthought.com and 4suite.org will be spotty for the rest of the
weekend. We hope to have it all configured and running by the end of
the day, and then it will take a day for name servers to update and
point to the new machines. In the meantime we will be running on both
the new and old machine so you _should_ be able to get to the site, but
errors might pop up as caches are updated etc.

Sorry for the inconvenience, but this should help performance of these
sites greatly.

Mike

--
Mike Olson
Principal Consultant
mike.olson@fourthought.com
(303)583-9900 x 102
Fourthought, Inc.
http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From martin@loewis.home.cs.tu-berlin.de Sat Feb 3 23:23:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 4 Feb 2001 00:23:15 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>
 (message from Guido van Rossum on Sat, 03 Feb 2001 14:39:54 -0500)
References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>
Message-ID: <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de>

> I'm making my first steps into XML, so please forgive me.

Hi Guido,

I will forgive, but I will still comment :-)

> Converting my app to use minidom was easy enough, but I found out
> about a bunch of differences between the two DOM implementations. Some
> of these are fine with me (e.g. minidom doesn't preserve comments,
> doesn't prefix its output with "<?xml version="1.0"?>" when writing
> XML output, minidom returns Unicode strings even for ASCII input).

Actually, input in XML is always Unicode. If no encoding is specified
in the document, it is treated as UTF-8. If an encoding is specified,
DOM implementations shall transform it into Unicode before giving it
to the user. It is only that older Python versions did not support
Unicode; I guess that's the reason why the Zope one does not comply
here.

> 1. The other DOM has a hasAttributes() predicate; minidom is missing
> this and I have to use the more expensive form "if node.attributes".

Right; that's a bug in minidom: hasAttributes was introduced in "DOM
Level 2".

The original idea of minidom was that it should be "minimal"; clearly
that has not worked out, so we probably should review it carefully to
achieve completeness (with respect to "DOM 2 Core").

> 2. In minidom, Element.getAttribute() and .getAttributeNS() raise
> KeyError for a non-existing attribute; in the other DOM, they return
> "". (Personally, I'd prefer KeyError or perhaps None, but according
> to Fred, the DOM standard requires "".

Right. To get the KeyError, use .attributes['attrname'], which is a
Python extension to the DOM.

> 3. Note that getAttributeNode() correctly returns None if the attribute
> doesn't exist, but getAttributeNodeNS() looks like it will raise
> KeyError too!

Yes, that's yet another error.

> 4. In minidom, createDocument() leaves doc.documentElement set to None;
> in the other DOM, doc.documentElement is initialized to an Element
> node created from the second argument to createDocument(). (Again,
> according to Fred, the DOM standard requires the latter.)

That was a surprise to me. After reading the spec and a number of
implementations, I think the requirement is much stronger: You MUST
pass a qualifiedName; only the namespaceURI and the doctype are
optional.

So your patch is incomplete in this respect; you also need to correct
pulldom to pass meaningful content (with your patch, you could get two
document elements).

It appears to be a common trick to allow null in createDocument, so
that the first element found during parsing can be introduced with
appendChild, but that appears to be non-conforming (somebody please
correct me if it is).

I could try to come up with a separate patch for that issue.

> 5. When writing XML output from a DOM tree that uses namespace
> attributes, minidom doesn't insert the proper "xmlns:="
> attributes. The other DOM gets this right. (This is a bit tricky
> to do, although I've figured a good way to do it which I'll gladly
> donate to minidom if it's deemed useful.)

Yes, that is certainly desirable; minidom should support namespaces
fully.

>
> 6. When writing XML output from a DOM tree that has a default
> namespace, minidom writes <:tag>...</:tag> instead of
> <tag>...</tag> like the other DOM, and like I would have expected.

Certainly a bug. When writing out namespace declarations, dealing with
the default namespace is really tricky (e.g.
when a tree that had a default namespace is extended with an element
with no namespace).

> 7. I noticed that minidom's __getattr__ special-cases requests for an
> attribute whose name begins with _get_, and makes up a lambda on the
> fly. This suggests that the caller is asking for _get_foo() where
> there is no such method, but there is a foo attribute. Since
> _get_foo() is a detail of the implementation (I hope)

No, it's actually not. The DOM is defined in terms of CORBA IDL,
unfortunately with a massive use of attributes. Attributes, in CORBA,
map to two functions, _get_ and _set_; this is also how the IDL
language mapping for Python works.

So the canonical way of using DOM in Python would be to use the _get_
and _set_ methods; a number of Python DOM implementations support that,
although the now-official Python DOM mapping marks these methods as
optional.

Some people might be using this interface, e.g. when they access a DOM
both locally and remotely. Some may use it because they consider
accessor functions cleaner than attribute access. Since it does not
cost anything to have that feature, I'd leave it.

> Here are proposed patches for items 1, 2, 3, 4 and 6 above (fixing 6
> turns out to require a patch to pulldom.py).

The ones for 1, 2, 3 and 6 look fine; for the one to 4, see my comments
above.

> 7 is a trivial patch but I expect there's a reason (in which case a
> comment would be a nice idea :-).

It is elaborated at

http://python.sourceforge.net/devel-docs/lib/dom-accessor-methods.html

So referring the reader to the documentation may be appropriate.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Sun Feb 4 23:12:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 00:12:15 +0100
Subject: [XML-SIG] Providing a DOMImplementationFactory
Message-ID: <200102042312.f14NCF501636@mira.informatik.hu-berlin.de>

The DOM level 3 draft proposes a mechanism for Java to locate a
DOMImplementation object.
In short, Java programs can invoke

  org.w3c.dom.DOMImplementationFactory.getDOMImplementation()

which loads the implementation defined in the property
org.w3c.dom.DOMImplementation. Should Python offer a similar
mechanism? If so, how should it work?

I can think of the following strategy:

- offer two functions,
    xml.dom.getDOMImplementation([name])
    xml.dom.registerDOMImplementation(name, implementation)

  That is not really a factory, but rather a locator (should that be
  an implementation factory?)

- In getDOMImplementation, use various approaches of returning an
  implementation:
  * if a name was given, and an implementation with that name was
    registered, return it. Well-known names should be published by
    posting to xml-sig@python.org, and subsequently recorded in
    xml.dom.__init__
  * if no name is given, but the PYTHON_DOM environment variable is set,
    this variable names a module which should have an .implementation
    attribute; this is then used. I don't know whether it is good or bad
    that Python does not provide Java-style properties...
  * if no name was given, an attempt to return a "best" implementation
    should be made, where best means "most featureful". Not sure how
    to compute this, though.

- The implementation of xml.dom.__init__ would provide a number of
  pre-registered DOM implementations, which would always include
  minidom and would include 4DOM if PyXML is installed.

- add-on packages (like 4Suite, or Zope) can install .pth files which
  register additional DOM implementations (starting with Python 2.1).

Please comment.
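
For illustration only, the lookup order described above could be
sketched roughly as follows. The _registry dict, the ImportError raised
on failure, and the "first registered wins" stand-in for the "best"
implementation are assumptions of this sketch, not part of the
proposal; only the two function names and PYTHON_DOM come from the
text above.

```python
import importlib
import os

from xml.dom import minidom

# Hypothetical module-level registry: name -> DOMImplementation object.
_registry = {}


def registerDOMImplementation(name, implementation):
    """Record an implementation object under a well-known name."""
    _registry[name] = implementation


def getDOMImplementation(name=None):
    """Locate a DOMImplementation using the three steps of the proposal."""
    # 1. An explicit name must have been registered beforehand.
    if name is not None:
        try:
            return _registry[name]
        except KeyError:
            raise ImportError(
                "no DOM implementation registered as %r" % name) from None
    # 2. Otherwise PYTHON_DOM names a module with an .implementation
    #    attribute.
    module_name = os.environ.get("PYTHON_DOM")
    if module_name:
        return importlib.import_module(module_name).implementation
    # 3. Placeholder for "best": the proposal leaves the ranking open,
    #    so this sketch simply returns the first implementation registered.
    for implementation in _registry.values():
        return implementation
    raise ImportError("no DOM implementation available")


# minidom would always be pre-registered.
registerDOMImplementation("minidom", minidom.getDOMImplementation())
```

A caller would then write getDOMImplementation("minidom") and call
createDocument() on the result; add-on packages would call
registerDOMImplementation() at import time.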

Regards,
Martin

From eugeneai@icc.ru Mon Feb 5 02:32:13 2001
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Mon, 5 Feb 2001 10:32:13 +0800
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de>
References: <200102020943.RAA23939@monster.icc.ru>
 <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de>
Message-ID: <200102050233.KAA24522@monster.icc.ru>

On Fri, 2 Feb 2001 14:33:46 +0100
"Martin v. Loewis" wrote:

MVL> > I just installed the codecs-aware pyXML-0.6.3 package and figured out
MVL> > that, at least for Python 2.0, the package installer should replace
MVL> > Python's original pyexpat.pyd module (in Python's DLLs folder under
MVL> > Windows), as it is usually the one loaded by pyXML (not the new one in
MVL> > the package). Or, maybe, remove all old pyexpat.pyd files before
MVL> > installation. As a result, pyXML does not support codecs.
MVL>
MVL> Why is that? It should work just fine if you use xml.parsers.expat.
MVL>

But in the automatic mode (without explicit notification) it does not.

MVL> Regards,
MVL> Martin
MVL>

--

From uche.ogbuji@fourthought.com Mon Feb 5 04:46:35 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 04 Feb 2001 21:46:35 -0700
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Message from "Martin v. Loewis" of "Sun, 04 Feb 2001
 00:23:15 +0100." <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de>
Message-ID: <200102050446.VAA29599@localhost.localdomain>

> > Converting my app to use minidom was easy enough, but I found out
> > about a bunch of differences between the two DOM implementations. Some
> > of these are fine with me (e.g. minidom doesn't preserve comments,
> > doesn't prefix its output with "<?xml version="1.0"?>" when writing
> > XML output,

minidom should be fixed to put out an XML declaration, preferably with
the encoding. This is hardly a burden, and is *highly* recommended XML
practice.

> > minidom returns Unicode strings even for ASCII input).

> > 1. The other DOM has a hasAttributes() predicate; minidom is missing
> > this and I have to use the more expensive form "if node.attributes".
>
> Right; that's a bug in minidom: hasAttributes was introduced in "DOM
> Level 2".
>
> The original idea of minidom was that it should be "minimal"; clearly
> that has not worked out, so we probably should review it carefully to
> achieve completeness (with respect to "DOM 2 Core").

Well, we should think about exactly what makes minidom "mini". It's
debatable whether it is possible to implement all of DOM Level 2 core
and still be "mini". And what about DOM level 3?

> > 4. In minidom, createDocument() leaves doc.documentElement set to None;
> > in the other DOM, doc.documentElement is initialized to an Element
> > node created from the second argument to createDocument(). (Again,
> > according to Fred, the DOM standard requires the latter.)
>
> That was a surprise to me. After reading the spec and a number of
> implementations, I think the requirement is much stronger: You MUST
> pass a qualifiedName; only the namespaceURI and the doctype are
> optional.

Yes. This is a pain, but it is clearly fundamental to the DOM WG
conceptual model.

> It appears to be a common trick to allow null in createDocument, so
> that the first element found during parsing can be introduced with
> appendChild, but that appears to be non-conforming (somebody please
> correct me if it is).

I think it is, even though 4DOM does this. Mike or Jeremy will probably
remind me if I'm missing something. From what I see of the readers, we
don't need this convenience.

> I could try to come up with a separate patch for that issue.
>
> > 5. When writing XML output from a DOM tree that uses namespace
> > attributes, minidom doesn't insert the proper "xmlns:="
> > attributes. The other DOM gets this right. (This is a bit tricky
> > to do, although I've figured a good way to do it which I'll gladly
> > donate to minidom if it's deemed useful.)
>
> Yes, that is certainly desirable; minidom should support namespaces
> fully.

Of course if it isn't Level 2 compliant, it needn't do so. I wouldn't
consider it unreasonable to have minidom L1 only. If users want Level
2, they install PyXML or other.

> > 6. When writing XML output from a DOM tree that has a default
> > namespace, minidom writes <:tag>...</:tag> instead of
> > <tag>...</tag> like the other DOM, and like I would have expected.
>
> Certainly a bug. When writing out namespace declarations, dealing with
> the default namespace is really tricky (e.g. when a tree that had
> a default namespace is extended with an element with no namespace).

Horrid bug. Those are invalid XML 1.0 NMTOKENS.

--
Uche Ogbuji
Principal Consultant
uche.ogbuji@fourthought.com
+1 303 583 9900 x 101
Fourthought, Inc.
http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Mon Feb 5 04:50:37 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 04 Feb 2001 21:50:37 -0700
Subject: [XML-SIG] Providing a DOMImplementationFactory
In-Reply-To: Message from "Martin v. Loewis" of "Mon, 05 Feb 2001
 00:12:15 +0100." <200102042312.f14NCF501636@mira.informatik.hu-berlin.de>
Message-ID: <200102050450.VAA29629@localhost.localdomain>

> The DOM level 3 draft proposes a mechanism for Java to locate a
> DOMImplementation object. In short, Java programs can invoke
>
> org.w3c.dom.DOMImplementationFactory.getDOMImplementation()
>
> which loads the implementation defined in the property
> org.w3c.dom.DOMImplementation. Should Python offer a similar
> mechanism? If so, how should it work?
>
> I can think of the following strategy:
> - offer two functions,
> xml.dom.getDOMImplementation([name])
> xml.dom.registerDOMImplementation(name, implementation)
>
> That is not really a factory, but rather a locator (should that be
> an implementation factory?)

I think it should be a factory, because I've just been thinking about
the ability to set properties non-globally on DOM implementations. For
instance, I think 4DOM should come with the mutation event system
disabled unless support for this is set as a property. A factory would
be a perfect place to set such properties.

> - In getDOMImplementation, use various approaches of returning an
> implementation:
> * if a name was given, and an implementation with that name was
> registered, return it. Well-known names should be published by
> posting to xml-sig@python.org, and subsequently recorded in
> xml.dom.__init__
> * if no name is given, but the PYTHON_DOM environment variable is set,
> this variable names a module which should have an .implementation
> attribute; this is then used. I don't know whether it is good or bad
> that Python does not provide Java-style properties...
> * if no name was given, an attempt to return a "best" implementation
> should be made, where best means "most featureful". Not sure how
> to compute this, though.
>
> - The implementation of xml.dom.__init__ would provide a number of
> pre-registered DOM implementations, which would always include
> minidom and would include 4DOM if PyXML is installed.
>
> - add-on packages (like 4Suite, or Zope) can install .pth files which
> register additional DOM implementations (starting with Python 2.1).

Sounds good enough to try out.

--
Uche Ogbuji
Principal Consultant
uche.ogbuji@fourthought.com
+1 303 583 9900 x 101
Fourthought, Inc.
http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From tpassin@home.com Mon Feb 5 05:38:22 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 5 Feb 2001 00:38:22 -0500
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
References: <200101292050.NAA11485@localhost.localdomain>
 <006d01c08a64$086567c0$7cac1218@reston1.va.home.com>
 <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>
Message-ID: <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com>

Here is a revised version of the PEP about using None for namespace
URIs. It's extended quite a bit. I've tried to use the suggestions of
Ken MacLeod, Uche, and Martin, among others, and I've spent a fair
amount of time rummaging through the various Recs.

I was disappointed to learn that the PyXML docs, as well as the 4DOM
docs, don't really say anything about this issue. So I made a start on
reading through some of the code (not the most recent versions in the
CVS tree, but what I've got from version 0.6.2 and from downloading
from 4Thought).

I'd appreciate it if anyone who is quite familiar with the SAX and SAX2
code would help out and verify whether using None would cause any
problems for the existing code.

The PEP (below) makes for a longish posting, but I didn't want to use
an attachment unless everyone agrees it's OK to do so. What do you all
think about using attachments for this kind of thing?

Cheers,

Tom P

=============================================

xmlpep-1
Values for Null Or Empty Namespace URIs
0.20
Draft
Standards Track
29-Jan-2001

This PEP specifies the proper values of the Namespace URI property when
its value might otherwise appear to be either "null", "None", or the
empty string. Such Namespace URIs are discussed in SAX[1], DOM2[2], and
XML-Namespaces[3]. These three recommendations do not appear to be in
full agreement. This fact, and differences between Java and Python, has
led to some confusion and some disagreement between various
implementations supported by PyXML.

The language in these three Recommendations is reviewed.
The recommendation is made to use None as the URI value in all cases
where no URI applies to an element or attribute.

The XMLPEP, when approved, will apply to all namespace-aware software
maintained by the pyxml interest group.

When no namespace has been declared whose scope applies to a particular
element or attribute, the application MUST report the URI of the
namespace of the element or attribute as None. When there is no
namespace prefix, the application MUST report the value of the prefix
as None. This requirement does not apply for applications that are not
namespace-aware. This requirement applies to all XML processing
software maintained by the PyXML interest group.

This PEP is needed because of continued uncertainty among various PyXML
developers as to the proper values to use, and because of inconsistency
among various PyXML products. Differences between Python, IDL, and Java
make an unambiguous interpretation unclear. A definitive and consistent
treatment is needed so that all the PyXML software may be made
consistent.

The Namespaces Recommendation recognizes that a namespace URI may be
given no value - called "empty" in the Recommendation - even though a
structure for a URI is provided in the document. Two relevant passages
are quoted here:

    Section 2. ... [Definition:] If the attribute name matches
    DefaultAttName, then the namespace name in the attribute value is
    that of the default namespace in the scope of the element to which
    the declaration is attached. In such a default declaration, the
    attribute value may be empty.

    5.2 Namespace Defaulting

    A default namespace is considered to apply to the element where it
    is declared (if that element has no namespace prefix), and to all
    elements with no prefix within the content of that element. If the
    URI reference in a default namespace declaration is empty, then
    unprefixed elements in the scope of the declaration are not
    considered to be in any namespace.
    Note that default namespaces do not apply directly to attributes.

    ...The default namespace can be set to the empty string. This has
    the same effect, within the scope of the declaration, of there
    being no default namespace.

The term "empty" is not defined further, but in the context of the
Recommendation, it must mean a missing string value. The last fragment
quoted above suggests, but does not require, that an empty string may
be returned for an "empty" URI value. This has no direct applicability
to values returned by implementations, since 1) the word "can" is used,
rather than "must", and 2) the Recommendation seems to apply to XML
documents, not to implementations.

The W3C DOM Level 2 Recommendation refers to "null" namespaces in
several places. The thrust is clear and consistent: a "null" value is
to be used to indicate a non-existent namespace URI value. Here are
some relevant extracts from the Recommendation:

    Note that because the DOM does no lexical checking, the empty
    string will be treated as a real namespace URI in DOM Level 2
    methods. Applications must use the value null as the namespaceURI
    parameter for methods if they wish to have no namespace.

The IDL definition for the createAttributeNS() method creates an
attribute with these characteristics:

    A new Attr object with the following attributes:

    Attribute           Value
    Node.nodeName       qualifiedName
    Node.namespaceURI   namespaceURI
    Node.prefix         prefix, extracted from qualifiedName, or null
                        if there is no prefix
    Node.localName      local name, extracted from qualifiedName
    Attr.name           qualifiedName
    Node.nodeValue      the empty string

For the older, non-NS aware createAttribute() method, the
Recommendation says

    ...localName, prefix, and namespaceURI set to null.

This is typical - a "null" is returned if there is no prefix or URI. It
is clear that the IDL specifies the use of "null" for empty namespaces,
rather than the empty string. The java binding does not specify any
particular value.
Thus there seems to be nothing in the DOM Recommendation that suggests
that empty strings should be used, and there is clear language that
"null" values should be used.

The SAX2 java API clearly says that an empty string is to be returned.
The following extracts demonstrate this:

    In SAX2, the startElement and endElement callbacks in a content
    handler look like this:

    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
      throws SAXException;

    public void endElement (String uri, String localName, String qName)
      throws SAXException;

    By default, an XML reader will report a Namespace URI and a local
    name for every element, in both the start and end handler. Consider
    the following example:

    With the default SAX2 Namespace processing, the XML reader would
    report a start and end element event with the Namespace URI
    "http://www.w3.org/1999/xhtml" and the local name "hr". The XML
    reader might also report the original qName "html:hr", but that
    parameter might simply be an empty string.

    If namespaces is true and namespace-prefixes is true, then a SAX2
    XML reader will report the following:

    an element with the Namespace URI "http://www.greeting.com/ns/",
    the local name "hello", and the qName "h:hello";
    an attribute with no Namespace URI (empty string), no local name
    (empty string), and the qName "xmlns:h";
    an attribute with no Namespace URI (empty string), the local name
    "id", and the qName "id"; and
    an attribute with the Namespace URI "http://www.greeting.com/ns/",
    the local name "person", and the qName "h:person".

To summarize, the Namespace Recommendation is essentially silent on the
subject, the DOM clearly specifies "null" values, and SAX2 clearly
specifies the use of empty strings. The "highest" level Recommendation
is presumably the DOM.

Python offers a data object similar to "null" - the None object.
The None object can be tested for exactly as for an empty string:

    if uri:
        doYourThing()

Alternatively, None can be tested for explicitly, as in:

    if uri is not None:
        doYourThing()

Thus, None is flexible enough to be useful for this purpose. Many posts to the PyXML list have favored the use of None, although not all. Either None or the empty string would seem to work in this context. "None" agrees with the DOM Recommendation, and would seem (in a mnemonic sense) to suggest the absence of a prefix or URI.

The 4DOM code will handle a None URI correctly in many places, since it uses tests like this typical example:

    if namespaceURI and namespaceURI != XML_NAMESPACE:
        # ...

This code works correctly if the namespaceURI is None. Another test used in 4DOM is as follows:

    def getElementsByTagNameNS(self,namespaceURI,localName):
        root = self.documentElement
        if root == None:
            return implementation.createNodeList([])
        py = root.getElementsByTagNameNS(namespaceURI,localName)
        if namespaceURI == '*' or namespaceURI == root.namespaceURI:
            if localName == '*' or localName == root.localName:
                py.insert(0,root)
        return py

The expression "namespaceURI == '*'" also evaluates correctly when the URI is None. If handling code is consistent throughout 4DOM, then it will handle None correctly.

[Need material here]

[Should there be a reference here to one particular processor, such as xmlproc?]

This PEP may be used by anyone.

From Mike.Olson@fourthought.com Mon Feb 5 06:17:47 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 04 Feb 2001 23:17:47 -0700
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain>
Message-ID: <3A7E458B.E787CD07@FourThought.com>

Uche Ogbuji wrote:
> > The original idea of minidom was that it should be "minimal"; clearly
> > that has not worked out, so we probably should review it carefully to
> > achieve completeness (with respect to "DOM 2 Core").
>
> Well, we should think about exactly what makes minidom "mini".
It's debatable > whether it is possible to implement all of DOM Level 2 core and still be > "mini". And what about DOm level 3? I think we should also look at merging minidom and pDomlette. Both are supposed to be "mini" and I think they both support about the same sets of functionality. No sense keeping both of them around. I can look at the differences and try to merge them. > > > It appears to be a common trick to allow null in createDocument, so > > that the first element found during parsing can be introduced with > > appendChild, but that appears to be non-conforming (somebody please > > correct me if it is). > > I think it is, even though 4DOM does this. Mike or Jeremy will probably > remind me if I'm missing something. From what I see of the readers, we don't > need this convenience. It was originally there for the readers and to allow a user to create a document with out a document type. I don't think the readers need this functionality any more (I'd have to look at all of them). -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Mon Feb 5 06:19:37 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 04 Feb 2001 23:19:37 -0700 Subject: [XML-SIG] Providing a DOMImplementationFactory References: <200102042312.f14NCF501636@mira.informatik.hu-berlin.de> Message-ID: <3A7E45F9.1AF5A19C@FourThought.com> "Martin v. Loewis" wrote: > > The DOM level 3 draft proposes a mechanism for Java to locate a > DOMImplementation object. In short, Java programs can invoke > > org.w3c.dom.DOMImplementationFactory.getDOMImplementation() > > which loads the implementation defined in the property > org.w3c.dom.DOMImplementation. Should Python offer a similar > mechanism? If so, how should it work? 
> > I can think of the following strategy: > - offer two functions, > xml.dom.getDOMImplementation([name]) > xml.dom.registerDOMImplementation(name, implementation) > > That is not really a factory, but rather a locator (should that be > an implementation factory?) > > - In getDOMImplementation, use various approaches of returning an > implementation: > * if a name was given, and an implementation with that name was > registered, return it. Well-known names should be published by > posting to xml-sig@python.org, and subsequently recorded in > xml.dom.__init__ > * if no name is given, but the PYTHON_DOM environment variable is set, > this variable names a module which should have an .implementation > attribute; this is then used. I don't know whether it is good or bad > that Python does not provide Java-style properties... > * if no name was given, and attempt to return a "best" implementation > should be done, where best means "most featureful". Not sure how > to compute this, though. > > - The implementation of xml.dom.__init__ would provide a number of > pre-registered DOM implementations, which would always include > minidom and would include 4DOM if PyXML is installed. > > - add-on packages (like 4Suite, or Zope) can install .pth files which > register additional DOM implementations (starting with Python 2.1). I like this approach. It is the one I recommended to Jeremy for the XPath interface and I wouldn't mind seeing it used for all of the xml libraries (XPath, XPointer, DOM, etc). +1 for me Mike > > Please comment. > > Regards, > Martin > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. 
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Feb 5 06:29:08 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 04 Feb 2001 23:29:08 -0700 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: Message from Mike Olson of "Sun, 04 Feb 2001 23:17:47 MST." <3A7E458B.E787CD07@FourThought.com> Message-ID: <200102050629.XAA29875@localhost.localdomain> > Uche Ogbuji wrote: > > > The original idea of minidom was that it should be "minimal"; clearly > > > that has not worked out, so we probably should review it carefully to > > > achieve completeness (with respect to "DOM 2 Core"). > > > > Well, we should think about exactly what makes minidom "mini". It's debatable > > whether it is possible to implement all of DOM Level 2 core and still be > > "mini". And what about DOm level 3? > > I think we should also look at merging minidom and pDomlette. Both are > supposed to be "mini" and I think they both support about the same sets > of functionality. No sense keeping both of them around. I can look at > the differences and try to merge them. Before you start doing this, I think we need to really air the matter out. It wouldn't normally be such a big deal except for the special status of minidom (as the default Python DOM). My sentiments are in favor of the idea. Probably the biggest issues would be the DOM extension interfaces, e.g. PrettyPrint vs. toXML. Of course DOM Level 3 should settle that. This would be a very opportune time for Paul Prescod to make a re-appearance. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 06:57:29 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Mon, 5 Feb 2001 07:57:29 +0100
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102050233.KAA24522@monster.icc.ru> (message from Evgeny Cherkashin on Mon, 5 Feb 2001 10:32:13 +0800)
References: <200102020943.RAA23939@monster.icc.ru> <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru>
Message-ID: <200102050657.f156vTe00831@mira.informatik.hu-berlin.de>

> MVL> Why is that? It should work just fine if you use xml.parsers.expat.
> MVL>
>
> But in the automatical mode (without explicit notification) does not.

Can you please elaborate? If one writes

    from xml.parsers import expat

it works fine; the PyXML version of pyexpat is used. What is the automatical mode? What is explicit notification?

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 07:21:25 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:21:25 +0100
Subject: [XML-SIG] Providing a DOMImplementationFactory
In-Reply-To: <200102050450.VAA29629@localhost.localdomain> (message from Uche Ogbuji on Sun, 04 Feb 2001 21:50:37 -0700)
References: <200102050450.VAA29629@localhost.localdomain>
Message-ID: <200102050721.f157LPo00881@mira.informatik.hu-berlin.de>

> > xml.dom.getDOMImplementation([name])
> > xml.dom.registerDOMImplementation(name, implementation)
> >
> I think it should be a factory, because I've just been thinking
> about the ability to set properties non-globally on DOM
> implementations. For instance, I think 4DOM should come with the
> mutation event system disabled unless support for this is set as a
> property.

Ok. So register gets a callable as its second argument then, and modules which provide an implementation should provide a getDOMImplementation function (in addition to the implementation singleton that they may provide for backwards compatibility).

> Sounds good enough to try out.

Thanks. I'll prepare a patch for PyXML and 2.1b1.
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 07:14:15 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 5 Feb 2001 08:14:15 +0100 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: <200102050446.VAA29599@localhost.localdomain> (message from Uche Ogbuji on Sun, 04 Feb 2001 21:46:35 -0700) References: <200102050446.VAA29599@localhost.localdomain> Message-ID: <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> > minidom should be fixed to put out an XML declaration, preferably > with the encoding. This is hardly a burden, and is *highly* > recommended XML practice. Certainly. I'll look into that once Guido has committed his patches; there is also a "pretty-print" patch pending that I'll have to commit. > > The original idea of minidom was that it should be "minimal"; clearly > > that has not worked out, so we probably should review it carefully to > > achieve completeness (with respect to "DOM 2 Core"). > > Well, we should think about exactly what makes minidom "mini". It's > debatable whether it is possible to implement all of DOM Level 2 > core and still be "mini". And what about DOm level 3? I think the original understanding was that everything that is "convenience", ie. can be composed from other interfaces, should not be included. In addition, minidom originally had no DOMImplementation, you had to know the implementation class names to build a tree. That approach has failed; people have been contributing bits and pieces so that what they wanted to use is there. These days, I think it is mini by only implementing DOM Core. That probably makes it a AA battery. [supporting namespaces] > Of course if it isn't Level 2 compliant, it needn't do so. I > wouldn't consider it unreasonable to have minidom L1 only. If users > want Level 2, they install PyXML or other. I'd say that this is a matter of internal consistency. Since the SAX part in Python supports namespaces, the DOM part should do so as well. 
That means L2. It also turns out that what I hope is the larger half of NS support is already in minidom as of Python 2.0, so ripping it out would not be sensible. As for supporting L3, following your advice to not do anything until the spec nears completion is reasonable. If there is any interest, providing a standard definition for the enumerations (inside Node3) would be feasible, if the exact version of the draft is documented in the code. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 07:25:33 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 5 Feb 2001 08:25:33 +0100 Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs In-Reply-To: <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200101292050.NAA11485@localhost.localdomain> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com> Message-ID: <200102050725.f157PXD00934@mira.informatik.hu-berlin.de> > The PEP (below) makes for a longish posting, but I didn't want to > use an attachment unless everyone agrees it's OK to do so.. What do > you all think about using attachments for this kind of thing? While hopefully looking at the actual text later, I think a major point of the Python PEPs is that they are online even when in draft status. That way, interested people don't have to react when it is published, as they can always go to a well-known repository and look what's there. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 07:45:24 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 5 Feb 2001 08:45:24 +0100 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: <3A7E458B.E787CD07@FourThought.com> (message from Mike Olson on Sun, 04 Feb 2001 23:17:47 -0700) References: <200102050446.VAA29599@localhost.localdomain> <3A7E458B.E787CD07@FourThought.com> Message-ID: <200102050745.f157jOm00978@mira.informatik.hu-berlin.de> > I think we should also look at merging minidom and pDomlette. Both are > supposed to be "mini" and I think they both support about the same sets > of functionality. No sense keeping both of them around. I can look at > the differences and try to merge them. I never quite understood where the "p" in pDomlette came from. To date, pDomlette is just 200 lines longer than minidom, so yes, merging them is a genuine option. Bear in mind that a new Python release is upcoming, and that the final beta release is probably the last point to add missing features (i.e. bug corrections with regard to DOM conformance). It is not inherently wrong to include a more complete version of minidom with PyXML, but it would be nice if it was stable after 2.1. As for the differences, I wonder what to do with the auto-normalization feature of pDomlette. I can't figure out what exactly that means: auto-normalization during parsing, or auto-normalization during insertion of nodes. While I can see that it is useful, I'm concerned about standards compliance here. > > > It appears to be a common trick to allow null in createDocument, > > > so that the first element found during parsing can be introduced > > > with appendChild, but that appears to be non-conforming > > > (somebody please correct me if it is [conforming, I meant]). > > > > I think it is, even though 4DOM does this. Mike or Jeremy will > > probably remind me if I'm missing something. From what I see of > > the readers, we don't need this convenience. > > It was originally there for the readers and to allow a user to > create a document with out a document type. 
I don't think the
> readers need this functionality any more (I'd have to look at all of
> them).

I feel some misunderstanding here. I'm talking about code like

    if ownerDoc == None:
        dt = implementation.createDocumentType('', '', '')
        self._ownerDoc = implementation.createDocument('', None, dt)
        self._rootNode = self._ownerDoc

(from xml.dom.ext.reader.Sax), in particular about the invocation of createDocument with a null qualifiedName. I could not find any permission in the DOM spec for such usage, and Xerces/C++ has code like

    something::createDocument(DOMString& uri, DOMString& qualifiedName,
                              DocumentType*dt){
      Document *d = new DocumentImpl(dt);
      d->appendChild(new ElementImpl(uri, qualifiedName));
      return d;
    }

I.e. they create an element unconditionally, whereas 4DOM.DOMImplementation creates it only if qualifiedName.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 07:59:26 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:59:26 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102050629.XAA29875@localhost.localdomain> (message from Uche Ogbuji on Sun, 04 Feb 2001 23:29:08 -0700)
References: <200102050629.XAA29875@localhost.localdomain>
Message-ID: <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>

> Before you start doing this, I think we need to really air the
> matter out. It wouldn't normally be such a big deal except for the
> special status of minidom (as the default Python DOM).

> My sentiments are in favor of the idea. Probably the biggest issues
> would be the DOM extension interfaces, e.g. PrettyPrint vs. toXML.
> Of course DOM Level 3 should settle that.

> This would be a very opportune time for Paul Prescod to make a
> re-appearance.

I agree on all three points, in particular with the last one :-)

On the second point, I think a PEP "standard DOM extensions" would be good. Even if that is not ready for Python 2.1, it would be desirable unless L3 supersedes it before.
In particular, it should deal with the following aspects: - getting an implementation; I think I can provide the proposed interface RSN. - getting a tree from a parser. For SAX parsers, we could publish the pulldom contents, which has a standard DOM builder, as long as it is provided with a SAX parser and a DOM implementation. That would not cover the "smart" 4Suite DOM builders which directly interact with a parser, or do other stuff besides building the tree. - pretty printing. Any volunteers who want to draft a proposal? This is the time to get your own share of fame :-) Regards, Martin P.S. As for people who I'd like to appear or re-appear: Anybody from digicool interested? Fred's and Guido's comments are always a pleasure to read, but who is the person or the place I could bombard with questions about XML-in-Zope? From eugeneai@icc.ru Mon Feb 5 08:09:29 2001 From: eugeneai@icc.ru (Evgeny Cherkashin) Date: Mon, 5 Feb 2001 16:09:29 +0800 Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3 In-Reply-To: <200102050657.f156vTe00831@mira.informatik.hu-berlin.de> References: <200102020943.RAA23939@monster.icc.ru> <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru> <200102050657.f156vTe00831@mira.informatik.hu-berlin.de> Message-ID: <200102050810.QAA28414@monster.icc.ru> On Mon, 5 Feb 2001 07:57:29 +0100 "Martin v. Loewis" wrote: MVL> from xml.parsers import expat MVL> Okay. I undestood. MVL> Regards, MVL> Martin MVL> -- From tpassin@home.com Mon Feb 5 12:58:07 2001 From: tpassin@home.com (Thomas B. 
Passin) Date: Mon, 5 Feb 2001 07:58:07 -0500 Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs References: <200101292050.NAA11485@localhost.localdomain> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com> <200102050725.f157PXD00934@mira.informatik.hu-berlin.de> Message-ID: <001401c08f73$491b22a0$7cac1218@reston1.va.home.com> Martin v. Loewis > > While hopefully looking at the actual text later, I think a major > point of the Python PEPs is that they are online even when in draft > status. That way, interested people don't have to react when it is > published, as they can always go to a well-known repository and look > what's there. > I agree. Shouldn't these xmlPEPs go onto the SF site? Who can set that up? Cheers, Tom P From akuchlin@mems-exchange.org Mon Feb 5 13:19:28 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 5 Feb 2001 08:19:28 -0500 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, Feb 05, 2001 at 08:59:26AM +0100 References: <200102050629.XAA29875@localhost.localdomain> <200102050759.f157xQc01033@mira.informatik.hu-berlin.de> Message-ID: <20010205081928.A15233@newcnri.cnri.reston.va.us> On Mon, Feb 05, 2001 at 08:59:26AM +0100, Martin v. Loewis wrote: >P.S. As for people who I'd like to appear or re-appear: Anybody from >digicool interested? Fred's and Guido's comments are always a pleasure >to read, but who is the person or the place I could bombard with >questions about XML-in-Zope? It might be Fred, now; see http://www.advogato.org/person/fdrake/ . --amk From guido@digicool.com Mon Feb 5 15:11:38 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 05 Feb 2001 10:11:38 -0500 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: Your message of "Mon, 05 Feb 2001 08:59:26 +0100." 
<200102050759.f157xQc01033@mira.informatik.hu-berlin.de> References: <200102050629.XAA29875@localhost.localdomain> <200102050759.f157xQc01033@mira.informatik.hu-berlin.de> Message-ID: <200102051511.KAA31888@cj20424-a.reston1.va.home.com> > P.S. As for people who I'd like to appear or re-appear: Anybody from > digicool interested? Fred's and Guido's comments are always a pleasure > to read, but who is the person or the place I could bombard with > questions about XML-in-Zope? I'm subscribed again. Both Fred and I can handle questions about XML-in-Zope; Fred has more implementation knowledge, I've been more involved in architectural issues (plus one prototype app). I'm pretty excited about the Template Attribute Language and Zope Presentation Templates, but I'm not sure if it is the right time to describe that yet. (I'll know more later this week.) The ParsedXML stuff is open though: http://www.zope.org/Wikis/DevSite/Projects/ParsedXML/FrontPage --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Feb 5 15:22:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 05 Feb 2001 10:22:23 -0500 Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3 In-Reply-To: Your message of "Mon, 05 Feb 2001 07:57:29 +0100." <200102050657.f156vTe00831@mira.informatik.hu-berlin.de> References: <200102020943.RAA23939@monster.icc.ru> <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru> <200102050657.f156vTe00831@mira.informatik.hu-berlin.de> Message-ID: <200102051522.KAA31951@cj20424-a.reston1.va.home.com> > > MVL> Why is that? It should work just fine if you use xml.parsers.expat. > > MVL> > > > > But in the automatical mode (without explicit notification) does not. > > Can you please elaborate? If one writes > > from xml.parsers import expat > > it works fine; the PyXML version of pyexpat is used. What is the > automatical mode? What is explicit notification? 
Adding more confusion: I recently got bitten by a really nasty convention in Zope where you must do "import ZODB" for a side effect it has. (It installs a persistency implementation in another module. If you import that other module before ZODB, it fails with a mysterious error.) I would hope that the standard xml package (nor PyXML) does not repeat that trick (requiring an import for its side effect). Factory functions should be used, and if there's a generic factory function, it should also have a sensible default. I don't know enough about the xml package to be able to figure out whether or not it engages in such tricks, and offer my apologies if this is already taken care of! --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 18:00:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 5 Feb 2001 19:00:49 +0100 Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs In-Reply-To: <001401c08f73$491b22a0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200101292050.NAA11485@localhost.localdomain> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com> <200102050725.f157PXD00934@mira.informatik.hu-berlin.de> <001401c08f73$491b22a0$7cac1218@reston1.va.home.com> Message-ID: <200102051800.f15I0n600860@mira.informatik.hu-berlin.de> > I agree. Shouldn't these xmlPEPs go onto the SF site? Who can set that up? It would be best if you check it into the www project in the CVS, which will also provide the versioning of the document. At the moment, you have to run //pyxml/doupdate on an SF shell machine to propagate the content to the Web page; I hope I can restore the cron job for that. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Feb 5 18:07:02 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 5 Feb 2001 19:07:02 +0100 Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3 In-Reply-To: <200102051522.KAA31951@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Mon, 05 Feb 2001 10:22:23 -0500) References: <200102020943.RAA23939@monster.icc.ru> <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru> <200102050657.f156vTe00831@mira.informatik.hu-berlin.de> <200102051522.KAA31951@cj20424-a.reston1.va.home.com> Message-ID: <200102051807.f15I72E00863@mira.informatik.hu-berlin.de> > I would hope that the standard xml package (nor PyXML) does not repeat > that trick (requiring an import for its side effect). Factory > functions should be used, and if there's a generic factory function, > it should also have a sensible default. No, the trick it engages in is that xml/parsers/expat.py reads from pyexpat import * Now, PyXML provides a pyexpat copy that sometimes supercedes the one in Python (if bugs are detected in the Python version at installation time). It installs xml/parsers/pyexpat.pyd, so that is found in the package before the builtin module; if it is not present, the builtin is used. I'm not sure what Evgeny's concern was, perhaps that a plain import pyexpat in the application would not get the PyXML-provided replacement; there is not much we could do about that. Regards, Martin From guido@digicool.com Mon Feb 5 19:22:09 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 05 Feb 2001 14:22:09 -0500 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: Your message of "Sun, 04 Feb 2001 00:23:15 +0100." <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de> References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com> <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de> Message-ID: <200102051922.OAA01298@cj20424-a.reston1.va.home.com> Thanks to Martin and Uche, I've gathered confidence and checked in my changes to minidom and pulldom in the Python tree. 
(Are there also PyXML versions? Someone should update them too then.) I'll leave it to Martin to add code to raise hell when createDocument() is passed an empty qualified name, and also to change pulldom to do the right thing when it in fact passes a non-null qualified name: The code in startDocument() looks like it would insert two document elements if self._locator is set and its getPublicId() returns a non-null qualified name. I don't know how to fix that, or how common this is. (My checkin comment has a bug: it claims that hasAttributes() is DOM level 3, but it is really level 2.) --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Tue Feb 6 01:31:46 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 6 Feb 2001 02:31:46 +0100 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: <200102051922.OAA01298@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Mon, 05 Feb 2001 14:22:09 -0500) References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com> <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de> <200102051922.OAA01298@cj20424-a.reston1.va.home.com> Message-ID: <200102060131.f161Vk311008@mira.informatik.hu-berlin.de> > Are there also PyXML versions? Someone should update them too then. At the moment, the Python versions are a couple of revisions ahead. I'll add the recently-discussed factory finder for DOM implementations to xml.dom.__init__, and merge all that into PyXML afterwards. > I'll leave it to Martin to add code to raise hell when > createDocument() is passed an empty qualified name, and also to change > pulldom to do the right thing when it in fact passes a non-null > qualified name Done. > The code in startDocument() looks like it would insert two document > elements if self._locator is set and its getPublicId() returns a > non-null qualified name. I don't know how to fix that, or how > common this is. I think this code was completely bogus. 
The author apparently thought of creating DocumentTypes, in which case publicId and systemId would be required. However, the SAX locator does not provide that information (atleast not for the DTD; rather for the document itself), nor were we in the process of creating document types. It seems that the processing of the doctype argument is also incorrect: It should *not* create one given the qualifiedName, atleast I can't find any indication that it should. It MUST set the ownerDocument, though, which it doesn't. I'm not sure whether the doctype needs to appear in the childNodes of the Document, can anybody clarify this? Regards, Martin From mclay@nist.gov Tue Feb 6 02:03:41 2001 From: mclay@nist.gov (Michael McLay) Date: Mon, 5 Feb 2001 21:03:41 -0500 Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2 Message-ID: <0102052103410E.03631@fermi.eeel.nist.gov> The calls to PyCode_New and PyFrame_New in pyexpat.c need to be updated to include the addition of the freevars and cellvars arguments that were added to PyCode_New and closure that was added to PyFrame_New copying xml/utils/iso8601.py -> build/lib.linux-i586-2.1/_xmlplus/utils copying xml/utils/qp_xml.py -> build/lib.linux-i586-2.1/_xmlplus/utils running build_ext building '_xmlplus.parsers.pyexpat' extension creating build/temp.linux-i586-2.1 gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS -DXML_DTD -DEXPAT_VERSION=0x010200 -Iextensions/expat/xmltok -Iextensions/expat/xmlparse -I/usr/local/include/python2.1 -c extensions/pyexpat.c -o build/temp.linux-i586-2.1/pyexpat.o extensions/pyexpat.c: In function `getcode': extensions/pyexpat.c:266: warning: passing arg 11 of `PyCode_New' makes pointer from integer without a cast extensions/pyexpat.c:266: too few arguments to function `PyCode_New' extensions/pyexpat.c: In function `call_with_frame': extensions/pyexpat.c:293: too few arguments to function `PyFrame_New' error: command 'gcc' failed with exit status 1 according to modsupport.h 25-Jan-2001 FLD 
1010 Parameters added to PyCode_New() and PyFrame_New(); Python 2.1a2

In compile.c

PyCodeObject *
PyCode_New(int argcount, int nlocals, int stacksize, int flags,
           PyObject *code, PyObject *consts, PyObject *names,
           PyObject *varnames, PyObject *freevars, PyObject *cellvars,
           PyObject *filename, PyObject *name, int firstlineno,
           PyObject *lnotab)
{

and frameobject.c

PyFrame_New(PyThreadState *tstate, PyCodeObject *code, PyObject *globals,
            PyObject *locals, PyObject *closure)
{

From rnd@onego.ru Tue Feb 6 05:23:26 2001
From: rnd@onego.ru (Roman Suzi)
Date: Tue, 6 Feb 2001 08:23:26 +0300 (MSK)
Subject: [XML-SIG] [OT] locale.py doesn't work? (fwd)
Message-ID:

I am sorry for offtopic, but I can't contact Martin at martin@mira.cs.tu-berlin.de for a week already. (connection refused)

Roman.

---------- Forwarded message ----------
Date: Tue, 30 Jan 2001 15:46:02 +0300 (MSK)
From: Roman Suzi
To: Martin v. Loewis
Subject: locale.py doesn't work?

Hello, Martin!

I am trying to use ru_RU.koi8-r locale (collation, uppercase, etc) but it doesn't work for unknown reason. Here is a code:

> cat ./try_locale.py
#!/usr/bin/env python
import locale
import os
# os.environ["LC_ALL"] = "ru_RU.CP1251"
# print locale.getdefaultlocale()
locale.setlocale(locale.LC_ALL,['ru_RU','koi8-r'])
#locale.setlocale(locale.LC_ALL,['ru_RU','KOI8-R'])
print locale.getlocale()
print locale.string.uppercase
print locale.string.lowercase
# End of try_locale.py

> ./try_locale.py
['ru_RU', 'ISO8859-5']
ABCDEFGHIJKLMNOPQRSTUVWXYZ=A1=A2=A3=A4=A5=A6=A7=A8=A9=AA=AB=AC=AE=AF=B0=B1=B2=B3=B4=B5=B6=B7=B8=A0=BA=BB=BC=BD=BE=BF=C0=C1=C2=C3=C4=C5=C6=C7=C8=C9=CA=CB=CC=CD=CE=CF
abcdefghijklmnopqrstuvwxyz=D0=D1=D2=D3=D4=D5=D6=D7=D8=D9=DA=DB=DC=DD=DE=DF=E0=E1=E2=E3=E4=E5=E6=E7=E8=E9=EA=EB=EC=ED=EE=EF=F1=F2=F3=F4=F5=F6=F7=F8=F9=FA=FB=FC=FE=FF

- which is wrong, because koi8-r uppercase letters are different.
It is not even ISO8859-5 (I tried with recode:

> ./try_locale.py | recode ISO8859-5..koi8-r
['ru_RU', 'ISO8859-5']
ABCDEFGHIJKLMNOPQRSTUVWXYZ³recode: Invalid input in step `ISO-8859-5..KOI8-R'

- and this is strange.

Am I missing something important? Or is it a bug in Python 2.0?
(All this in BlackCat Linux 6.2 ~= RH 6.2)

(You are the author of the article "Internationalizing Python"
so probably you could answer this question.)

Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Tuesday, January 30, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Give instruction to a wise man and he will be yet wiser." _/

From uche.ogbuji@fourthought.com Tue Feb 6 05:29:13 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 05 Feb 2001 22:29:13 -0700
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Message from "Martin v. Loewis" of "Tue, 06 Feb 2001 02:31:46 +0100." <200102060131.f161Vk311008@mira.informatik.hu-berlin.de>
Message-ID: <200102060529.WAA18345@localhost.localdomain>

> > The code in startDocument() looks like it would insert two document
> > elements if self._locator is set and its getPublicId() returns a
> > non-null qualified name. I don't know how to fix that, or how
> > common this is.
>
> I think this code was completely bogus. The author apparently thought
> of creating DocumentTypes, in which case publicId and systemId would
> be required. However, the SAX locator does not provide that
> information (atleast not for the DTD; rather for the document itself),
> nor were we in the process of creating document types.
>
> It seems that the processing of the doctype argument is also
> incorrect: It should *not* create one given the qualifiedName, atleast
> I can't find any indication that it should. It MUST set the
> ownerDocument, though, which it doesn't.
I'm not sure whether the > doctype needs to appear in the childNodes of the Document, can anybody > clarify this? Yes. The doctype is a child of the Document, along with any comments and PIs in the prolog. This is the main reason for having a documentElement() method. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Feb 6 05:30:54 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 05 Feb 2001 22:30:54 -0700 Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2 In-Reply-To: Message from Michael McLay of "Mon, 05 Feb 2001 21:03:41 EST." <0102052103410E.03631@fermi.eeel.nist.gov> Message-ID: <200102060530.WAA18464@localhost.localdomain> > The calls to PyCode_New and PyFrame_New in pyexpat.c need to be updated to > include the addition of the freevars and cellvars arguments that were added > to PyCode_New and closure that was added to PyFrame_New > > copying xml/utils/iso8601.py -> build/lib.linux-i586-2.1/_xmlplus/utils > copying xml/utils/qp_xml.py -> build/lib.linux-i586-2.1/_xmlplus/utils > running build_ext > building '_xmlplus.parsers.pyexpat' extension > creating build/temp.linux-i586-2.1 > gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS -DXML_DTD > -DEXPAT_VERSION=0x010200 -Iextensions/expat/xmltok > -Iextensions/expat/xmlparse -I/usr/local/include/python2.1 -c > extensions/pyexpat.c -o build/temp.linux-i586-2.1/pyexpat.o > extensions/pyexpat.c: In function `getcode': > extensions/pyexpat.c:266: warning: passing arg 11 of `PyCode_New' makes > pointer from integer without a cast > extensions/pyexpat.c:266: too few arguments to function `PyCode_New' > extensions/pyexpat.c: In function `call_with_frame': > extensions/pyexpat.c:293: too few arguments to function `PyFrame_New' > error: command 'gcc' 
failed with exit status 1 Odd. It does compile for me with Python 2.1a2, but then I'm using PyXML from CVS. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Tue Feb 6 08:53:47 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 6 Feb 2001 09:53:47 +0100 Subject: [XML-SIG] [OT] locale.py doesn't work? (fwd) In-Reply-To: (message from Roman Suzi on Tue, 6 Feb 2001 08:23:26 +0300 (MSK)) References: Message-ID: <200102060853.f168rlX01040@mira.informatik.hu-berlin.de> > I am sorry for offtopic, but I can't contact Martin at > martin@mira.cs.tu-berlin.de for a week already. > (connection refused) Sorry for any confusion this has caused; please use martin@loewis.home.cs.tu-berlin.de (which *should* be the From: address in this message). BTW, i18n-sig@python.org would have been the right for this kind of issue. > print locale.string.uppercase > print locale.string.lowercase > > # End of try_locale.py > > > ./try_locale.py > ['ru_RU', 'ISO8859-5'] > ABCDEFGHIJKLMNOPQRSTUVWXYZ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸ º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ > abcdefghijklmnopqrstuvwxyzÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïñòóôõö÷øùúûüþÿ > > - which is wrong, because koi8-r uppercase letters are different. Interesting. Please run the C program #include #include int main() { int i; printf("%s\n",setlocale(LC_ALL,"ru_RU")); for(i=1;i<256;i++){ if(islower(i)) printf("%d, ",i); } printf("\n"); } on your system. It is supposed to print the decimal values of all lowercase letters. As you'll find, it prints the numeric values of all letters in string.letters (try map(ord, string.letters) to obtain such a list in Python). I get the same results on my Linux installation, which uses glibc 2.1.3. 
So I'd say it is a bug in the C library; please submit a bug using the glibcbug script if you agree, or complain to your Linux distributor. If you find out a solution, please let us know. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Feb 6 09:44:29 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 6 Feb 2001 10:44:29 +0100 Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2 In-Reply-To: <0102052103410E.03631@fermi.eeel.nist.gov> (message from Michael McLay on Mon, 5 Feb 2001 21:03:41 -0500) References: <0102052103410E.03631@fermi.eeel.nist.gov> Message-ID: <200102060944.f169iTe01589@mira.informatik.hu-berlin.de> > The calls to PyCode_New and PyFrame_New in pyexpat.c need to be updated to > include the addition of the freevars and cellvars arguments that were added > to PyCode_New and closure that was added to PyFrame_New Thanks for the reminder. The pyexpat copy in 2.0a2 already had these changes, yet in a manner that only worked for the modified API. I have modified both copies of pyexpat.c to support the 2.0a2 API. pyexpat.c has good chances of being the Python module with the highest number of independently-maintained copies; a third copy lives in the Zope CVS. People building PyXML might not have noticed the problem, setup.py won't build pyexpat if it finds that the Python one is good enough. Regards, Martin From edd@usefulinc.com Tue Feb 6 10:14:56 2001 From: edd@usefulinc.com (Edd Dumbill) Date: Tue, 6 Feb 2001 10:14:56 +0000 Subject: [XML-SIG] REMINDER: Days left for O'Reilly Open Source Conference XML CFP Message-ID: <20010206101456.N25446@usefulinc.com> A reminder -- just days left to submit a proposal for the XML track at O'Reilly's Open Source convention this year. I include the original CFP below. Please get in touch if you have any questions or need more time. 
(sent to XML-DEV, copied to Apache General, Python XML-SIG and Perl-XML lists) Call for Participation XTech 2001 Conference (in co-operation with GCA) Part of the O'Reilly Open Source Convention July 23-27, 2001 in San Diego, California The Open Source Convention is a five-day event designed for programmers, developers, and technical staff involved in Open Source technology and its applications. The Convention includes two days of intensely focused tutorials aimed at novices and experienced users, and three days of multi-tracked convention sessions, including an XML track, XTech 2001. The XML program committee invites submissions of tutorials or convention presentations on pure XML topics, open source XML applications and the use of XML in open source platforms. Submissions tailored for open source developers new to XML, as well as those that highlight the cutting edge of XML technology are sought. Submissions by marketing staff or with a marketing focus will not be accepted. The deadline for tutorial and presentation proposals is February 9, 2001 Further details and guidelines for submission may be found at -- Edd Dumbill, XML Track Chair From mclay@nist.gov Tue Feb 6 01:31:25 2001 From: mclay@nist.gov (Michael McLay) Date: Mon, 5 Feb 2001 20:31:25 -0500 Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2 In-Reply-To: <200102060944.f169iTe01589@mira.informatik.hu-berlin.de> References: <0102052103410E.03631@fermi.eeel.nist.gov> <200102060944.f169iTe01589@mira.informatik.hu-berlin.de> Message-ID: <01020520312500.01559@fermi.eeel.nist.gov> On Tuesday 06 February 2001 04:44, Martin v. Loewis wrote: > > People building PyXML might not have noticed the problem, setup.py > won't build pyexpat if it finds that the Python one is good enough. When I ran make on 2.0a2 and then ran build the pyexpat module wasn't built. 
I'm running Redhat linux: Linux 2.2.14-5.0 #1 Tue Mar 7 20:53:41 EST 2000 i586 unknown I suspect it may have failed to build because I did not have expat installed. There wasn't a warning to this effect and the instructions in the README file did not say I need to have it installed. (The README file still talks about editing the Module/Setup file so I think it is probably out of date.) The 2.1a2 download page does nor reference source code for expat. It was included in the 2.0 download page, http://www.python.org/2.0/. It was not clear to me if the expat source code was included in the 2.1a2 source distribution. The 2.1 download page, http://www.python.org/2.1/, references the 2.1a1 release on SourceForge. (The returned page highlights the older release instead of the 2.1a2 release.) The http://www.python.org/ftp/python/2.1/ download location is also reference on the /2.1/ page. None of the pages reference expat source or mention the need to install expat. From martin@loewis.home.cs.tu-berlin.de Tue Feb 6 19:42:36 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 6 Feb 2001 20:42:36 +0100 Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2 In-Reply-To: <01020520312500.01559@fermi.eeel.nist.gov> (message from Michael McLay on Mon, 5 Feb 2001 20:31:25 -0500) References: <0102052103410E.03631@fermi.eeel.nist.gov> <200102060944.f169iTe01589@mira.informatik.hu-berlin.de> <01020520312500.01559@fermi.eeel.nist.gov> Message-ID: <200102061942.f16Jga700919@mira.informatik.hu-berlin.de> > > People building PyXML might not have noticed the problem, setup.py > > won't build pyexpat if it finds that the Python one is good enough. > > When I ran make on 2.0a2 and then ran build the pyexpat module > wasn't built. I'm running Redhat linux: > > Linux 2.2.14-5.0 #1 Tue Mar 7 20:53:41 EST 2000 i586 unknown > > I suspect it may have failed to build because I did not have expat > installed. That is the likely cause, indeed. 
I was not saying that you did anything wrong, or that others did anything wrong - I just tried to explain the differences. > There wasn't a warning to this effect and the instructions in the > README file did not say I need to have it installed. (The README > file still talks about editing the Module/Setup file so I think it > is probably out of date.) It is not an error for an extension module not being built - it just won't be there when you need it. The autoconfiguration can't know what modules you meant to be built; instead, it will build everything it can (sometimes, it errs at guessing what it can build, such case is a genuine bug). > The 2.1a2 download page does nor reference source code for expat. > It was included in the 2.0 download page, > http://www.python.org/2.0/. It was not clear to me if the expat > source code was included in the 2.1a2 source distribution. It wasn't included, and likely will not be. Instead, you need to install it separately (as you did for 2.0 - the Python download pages just provided a copy that was known to work). > The 2.1 download page, http://www.python.org/2.1/, references the > 2.1a1 release on SourceForge. (The returned page highlights the > older release instead of the 2.1a2 release.) The > http://www.python.org/ftp/python/2.1/ download location is also > reference on the /2.1/ page. None of the pages reference expat > source or mention the need to install expat. It is not needed, at least not more than Tkinter, zlib, BSDDB, OpenGL, Purify, readline, OpenSSL, or GDBM. Different manual installation will provide different sets of extension modules, but that is really no change. It will be the responsibility of packagers (i.e. Windows installation authors and Linux distributors) to make sure a common set of extension modules is always available. It is certainly recommended that pyexpat is in this common set. 
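
Since a skipped extension module is silent at build time, one way to notice the gap before a script fails is to probe for the modules after installation. This is a minimal sketch for present-day CPython 3, not something the Python 2.1 build offered; the names in OPTIONAL_MODULES are the modern extension-module names as far as I know them and are an assumption, and missing_modules is a helper invented for the illustration.

```python
import importlib.util

# Assumed names of optional C extension modules on a modern CPython;
# adjust the list for the interpreter actually being checked.
OPTIONAL_MODULES = ["pyexpat", "zlib", "readline", "_tkinter", "_ssl", "_gdbm"]


def missing_modules(names):
    """Return the subset of names that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(OPTIONAL_MODULES):
        print("not available in this installation:", name)
```

The check only reports that a module is absent; it cannot say which library was missing at build time.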
Regards,
Martin

From paulp@ActiveState.com Tue Feb 6 21:08:26 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Tue, 06 Feb 2001 13:08:26 -0800
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050629.XAA29875@localhost.localdomain>
Message-ID: <3A8067CA.3983596@ActiveState.com>

Uche Ogbuji wrote:
>
> ...
>
> This would be a very opportune time for Paul Prescod to make a re-appearance.

Your invocation cut my vacation short. Thanks a lot!

I think that minidom should remain as mini as possible. I'll comment on
the other issues later today...

Paul Prescod

From mclay@nist.gov Tue Feb 6 10:17:20 2001
From: mclay@nist.gov (Michael McLay)
Date: Tue, 6 Feb 2001 05:17:20 -0500
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <200102061942.f16Jga700919@mira.informatik.hu-berlin.de>
References: <0102052103410E.03631@fermi.eeel.nist.gov> <01020520312500.01559@fermi.eeel.nist.gov> <200102061942.f16Jga700919@mira.informatik.hu-berlin.de>
Message-ID: <01020605172007.01559@fermi.eeel.nist.gov>

On Tuesday 06 February 2001 14:42, Martin v. Loewis wrote:
> > I suspect it may have failed to build because I did not have expat
> > installed.
>
> That is the likely cause, indeed. I was not saying that you did
> anything wrong, or that others did anything wrong - I just tried to
> explain the differences.

Is there a specific version of expat that needs to be installed? I
found no reference to where to get the library or which version was
required. I could go to freshmeat to look it up, but what if I find a
stale reference to an old version of the library?

A list of URLs for the libraries required to build the extensions
would make it much safer and less error-prone for end users when they
are trying to build a fully populated module library.

>
> > There wasn't a warning to this effect and the instructions in the
> > README file did not say I need to have it installed.
(The README > > file still talks about editing the Module/Setup file so I think it > > is probably out of date.) > > It is not an error for an extension module not being built - it just > won't be there when you need it. The autoconfiguration can't know what > modules you meant to be built; instead, it will build everything it > can (sometimes, it errs at guessing what it can build, such case is a > genuine bug). I found out the pyexpat wasn't built when I tried executing a script that required the module. Fortunately I was still running ground tests on my flight control system when the exception was raised:-) > > The 2.1a2 download page does nor reference source code for expat. > > It was included in the 2.0 download page, > > http://www.python.org/2.0/. It was not clear to me if the expat > > source code was included in the 2.1a2 source distribution. > > It wasn't included, and likely will not be. Instead, you need to > install it separately (as you did for 2.0 - the Python download pages > just provided a copy that was known to work). I understand why you don't bundle it with the distribution, I just was pointing out that the documentation didn't make it clear that I needed to have the expat library installed before PyXML would work. > > The 2.1 download page, http://www.python.org/2.1/, references the > > 2.1a1 release on SourceForge. (The returned page highlights the > > older release instead of the 2.1a2 release.) The > > http://www.python.org/ftp/python/2.1/ download location is also > > reference on the /2.1/ page. None of the pages reference expat > > source or mention the need to install expat. > > It is not needed, at least not more than Tkinter, zlib, BSDDB, OpenGL, > Purify, readline, OpenSSL, or GDBM. Different manual installation will > provide different sets of extension modules, but that is really no > change. It will be the responsibility of packagers (i.e. 
Windows
> installation authors and Linux distributors) to make sure a common set
> of extension modules is always available. It is certainly recommended
> that pyexpat is in this common set.

The Linux distributions are not consistent about which modules are
built. I assume that the included modules are based on what they need
internally and maybe what is easy to build.

It would help if it were possible to easily identify which modules
were not built and what was missing that prevented them from being
built. With that additional information it is more likely that the
maintainers of Linux distributions would add the missing libraries so
the build would be complete. This would help Python to have a more
uniform base of preinstalled modules.

That is just speculation on my part. I would expect the report
generation could be done automatically by the build process. The
build tool would need to track which modules were skipped and generate
a report at the end of what was not built and why. To enable the
reporting, module maintainers would add a dictionary that maps
libraries to a list of URLs where the libraries can be retrieved.

From martin@loewis.home.cs.tu-berlin.de Tue Feb 6 23:29:38 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 7 Feb 2001 00:29:38 +0100
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <01020605172007.01559@fermi.eeel.nist.gov> (message from Michael McLay on Tue, 6 Feb 2001 05:17:20 -0500)
References: <0102052103410E.03631@fermi.eeel.nist.gov> <01020520312500.01559@fermi.eeel.nist.gov> <200102061942.f16Jga700919@mira.informatik.hu-berlin.de> <01020605172007.01559@fermi.eeel.nist.gov>
Message-ID: <200102062329.f16NTcs02332@mira.informatik.hu-berlin.de>

> Is there a specific version of expat that needs to be installed? I
> found no reference to where to get the library or which version was
> required.

Both expat 1.1 and 1.2 are known to work.
Unfortunately, these releases are not self-identifying, so it is hard to tell which one you got after you've installed them. For 1.95.1, I think there are some issues that pyexpat will behave differently - I don't know whether that is due to bug fixes, new bugs, or simply changed behaviour; it was also a while ago that I've used this version (by accident at that time). > There is an advantage in having a list of URLs required to build > extensions would make it much safer and error free for end users > when they are trying to build a fully populated module library. Certainly. The disadvantage of having such a list is that it requires a volunteer to maintain it. Please have a look at Modules/Setup.dist, though. As it is not required in the actual build process, it may be inaccurate. > I found out the pyexpat wasn't built when I tried executing a script > that required the module. Fortunately I was still running ground > tests on my flight control system when the exception was raised:-) Yes, packaging and deployment is a hard business, and often requires expert knowledge of the system being deployed. > > > The 2.1a2 download page does nor reference source code for expat. > > > It was included in the 2.0 download page, > > > http://www.python.org/2.0/. It was not clear to me if the expat > > > source code was included in the 2.1a2 source distribution. > > > > It wasn't included, and likely will not be. Instead, you need to > > install it separately (as you did for 2.0 - the Python download pages > > just provided a copy that was known to work). > > I understand why you don't bundle it with the distribution, I just was > pointing out that the documentation didn't make it clear that I needed to > have the expat library installed before PyXML would work. There is some confusion here: PyXML *does* include expat. Python 2.0 does not include it, nor will 2.1. 
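
On the point that the expat releases are not self-identifying: in present-day Python the wrapper module does report which Expat it was linked against. The sketch below assumes a modern Python 3 interpreter; whether the pyexpat of this thread exposed the same attributes is not stated here, so treat it as an illustration only.

```python
from xml.parsers import expat

# Version string of the Expat library that pyexpat was built against.
print(expat.EXPAT_VERSION)

# The same information as a (major, minor, micro) tuple, handy for comparisons.
print(expat.version_info)
```

Comparing version_info against a minimum tuple is one way a script could refuse to run against an Expat release that is known to behave differently.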
> If it were possible to easily identify which modules were not built > and what was missing that prevented them from being built. With > some additional information it is more likely that the maintainers > of Linux distributions would add imissing libraries so the buiild > would be complete. In Python 1.5, Modules/Setup* did provide such a complete list, yet there were still differences - apparently caused by distributors being unwilling to make the required headers and libraries available on the build system. So I do not believe your claim that a comprehensive list would solve this matter. > That is just speculation on my part. I would expect the report > generation could be done automatically by the build process. The > build tool would need to track which modules were skipped and > generate a report at the end of what was not built and why. It is certainly possible; it just requires a volunteer to implement such a feature. It would then require cooperation of all contributors to use the feature properly when they make changes to the build process. To officially request a feature, please file a bug report at sourceforge.net/projects/python. This is also the place where patches can be contributed. Regards, Martin From Mike.Olson@fourthought.com Wed Feb 7 00:56:34 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 06 Feb 2001 17:56:34 -0700 Subject: [XML-SIG] Minidom bugs/questions References: <200102050446.VAA29599@localhost.localdomain> <3A7E458B.E787CD07@FourThought.com> <200102050745.f157jOm00978@mira.informatik.hu-berlin.de> Message-ID: <3A809D42.95B40680@FourThought.com> "Martin v. Loewis" wrote: > > > I think we should also look at merging minidom and pDomlette. Both are > > supposed to be "mini" and I think they both support about the same sets > > of functionality. No sense keeping both of them around. I can look at > > the differences and try to merge them. > > I never quite understood where the "p" in pDomlette came from. 
To
> date, pDomlette is just 200 lines longer than minidom, so yes, merging
> them is a genuine option. Bear in mind that a new Python release is
> upcoming, and that the final beta release is probably the last point
> to add missing features (i.e. bug corrections with regard to DOM
> conformance). It is not inherently wrong to include a more complete
> version of minidom with PyXML, but it would be nice if it was stable
> after 2.1.

We've been using pDomlette pretty heavily for quite some time now. It
is the default DOM in 4XSLT. One nice result of combining the two would
be that the DOM that ships with Python 2.0 will work with 4XSLT (and
4XPath, 4XLink, et al).

>
> As for the differences, I wonder what to do with the
> auto-normalization feature of pDomlette. I can't figure out what
> exactly that means: auto-normalization during parsing, or
> auto-normalization during insertion of nodes. While I can see that it
> is useful, I'm concerned about standards compliance here.

It is during parsing. If you append a text node after another text
node, it will keep them as two separate nodes. I think this is
standards compliant, though if I recall the spec is a little bit hazy
there...

> I feel some misunderstanding here. I'm talking about code like
>
>     if ownerDoc == None:
>         dt = implementation.createDocumentType('', '', '')
>         self._ownerDoc = implementation.createDocument('', None, dt)
>         self._rootNode = self._ownerDoc
>
> (from xml.dom.ext.reader.Sax), in particular about the invocation of
> createDocument with a null qualifiedName. I could not find any
> permission in the DOM spec for such usage, and Xerces/C++ has code like

It's for the readers. If you pass in a namespaceURI to createDocument it
will add the root element. Then we would need special handling code in
start_element to determine if a document element has been added, or if
it even has a document element.
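
For readers trying to picture the two cases, here is a small sketch using the xml.dom.minidom that ships with present-day Python 3, which is not necessarily how minidom or the 4DOM readers behaved at the time of this thread. The element name addrbook and the system identifier addrbook.dtd are made up for the illustration.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# All three arguments None: a bare Document with no document element yet,
# which is the state a reader wants before it sees the first start tag.
empty_doc = impl.createDocument(None, None, None)
print(empty_doc.documentElement)

# With a qualified name and a doctype, the Document gets both as children:
# the DocumentType first, then the root element.
doctype = impl.createDocumentType("addrbook", None, "addrbook.dtd")
doc = impl.createDocument(None, "addrbook", doctype)
print(doc.childNodes[0] is doctype)
print(doc.childNodes[1] is doc.documentElement)
print(doctype.ownerDocument is doc)
```

In this implementation the doctype appears in childNodes ahead of the document element, and its ownerDocument is set by createDocument.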
Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Wed Feb 7 01:18:22 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 06 Feb 2001 18:18:22 -0700 Subject: [XML-SIG] Minidom bugs/questions References: <200102050446.VAA29599@localhost.localdomain> <3A7E458B.E787CD07@FourThought.com> <200102050745.f157jOm00978@mira.informatik.hu-berlin.de> <3A809D42.95B40680@FourThought.com> Message-ID: <3A80A25E.8BC0A360@FourThought.com> > > I never quite understood where the "p" in pDomlette came from. For Python DOM, as opposed to cDomlette, our C DOM. Its a naming convention I stole from Zope....our eventual idea is to have Ft.Lib.domlette and at import time decide if it should be "p" or "c" but we have a fair amount of work to do on our cDomlette first. Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Wed Feb 7 04:52:12 2001 From: tpassin@home.com (Thomas B. Passin) Date: Tue, 6 Feb 2001 23:52:12 -0500 Subject: [XML-SIG] Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2 Message-ID: <007b01c090c1$bc07f5a0$7cac1218@reston1.va.home.com> I started trying out some of the demos that come with PyXML and 4Suite - actually I got them by downloading the source for 4Suite from the 4Suite server. I've picked up some bugs. Most, but not all, are in test or demo scripts, but some are in actual working modules. I've only tried out a few things, so this is not comprhensive at all. My system is Python 1.5.2. on Windows 98, with PyXML 0.6.2. First, a number of modules reference the "core" module, which isn't there (any more, I assume?). Some are tests, some are not. I don't have a list at the moment, but they should be flushed out. 
Second, in xml\dom\ext\PyExpat.py, the import pyexpat statement doesn't throw
an ImportError, but a NameError (seems strange). I fixed it like this (I
used Exception instead of NameError, in case some other version should throw
ImportError as you would expect).

line 27:
try:
    #Python 2.0
    import pyexpat
#except ImportError: ==>Currently this import throws NameError, not ImportError
except Exception:
    #Python 1.x with PyXML
    from xml.parsers import pyexpat

I don't think this is really the way to fix it, though - there must be some
reason I'm getting an unexpected type of exception, and that is what ought to
be fixed.

Finally, in Ft\Xlink\XLinkElements.py, reader.fromUri() has an additional
argument which is no longer used in the reader's parent class. I fixed it
like this, commenting out the extra arg so you can see it:

line 51:
frag = reader.fromUri(self.href)#, doc = doc) ==> API doesn't include 'doc' arg

It looks to me like there are a lot of left-over things that haven't gotten
caught yet, and a lot of the tests haven't run for me - DOM seems OK but XLink
and XPointer have given problems. They look like the kind of things that
wouldn't have been worked for 0.6.3, but I haven't tried that yet.

I'm not sure who should be putting fixes for these bugs in once they are agreed
on. I'm still not getting secure access negotiated properly, so it's not
going to be me for a while yet.

Cheers,

Tom P

From tpassin@home.com Wed Feb 7 04:55:54 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 6 Feb 2001 23:55:54 -0500
Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2
Message-ID: <008101c090c2$40afeba0$7cac1218@reston1.va.home.com>

I forgot to add - a lot of FT's test scripts use "import TestSuite", but
that's now at Ft.Lib.TestSuite, and I had to make a number of corresponding
import changes, too. There were a lot of them in the xpath test directory,
though I didn't make a list yet.

Cheers,

Tom P

----- Original Message -----
From: "Thomas B.
Passin" To: Sent: Tuesday, February 06, 2001 11:52 PM Subject: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2 > I started trying out some of the demos that come with PyXML and 4Suite - > actually I got them by downloading the source for 4Suite from the 4Suite > server. I've picked up some bugs. Most, but not all, are in test or demo > scripts, but some are in actual working modules. I've only tried out a few > things, so this is not comprhensive at all. > > My system is Python 1.5.2. on Windows 98, with PyXML 0.6.2. > > First, a number of modules reference the "core" module, which isn't there (any > more, I assume?). Some are tests, some are not. I don't have a list at the > moment, but they should be flushed out. > > Second, in xml\dom\ext\PyExpat.py, the import pyexpat statement doesn't throw > an ImportError, but a NameError (seems strange). I fixed it like this (I > used Exception instead of NameError, in case some other versoin should throw > ImportError as you would expect. > line 27: > try: > #Python 2.0 > import pyexpat > #except ImportError: ==>Currently this import throws NameError, not > ImportError > except Exception: > #Python 1.x with PyXML > from xml.parsers import pyexpat > > I don't think this is really the way to fix it, though - there must be some > reason I'm getting an unexpected type of exception, and that is what ought to > be fixed. > > Finally, in Ft\Xlink\XLinkElements.py, reader.fromURI() has an additional > argument which is no longer used in the reader's parent class. I fixed it > like this, commenting out the extra arg so you can see it: > > line 51: > frag = reader.fromUri(self.href)#, doc = doc) ==> API doesn't include > 'doc' arg > > It looks to me like there are a lot of left-over things that haven't gotten > caught yet, and a lot of the tests haven't run for me - DOM seems OK but XLink > and XPointer have given problems. 
They look like the kind of things that > wouldn't have been worked for 0.6.3, but I haven't tried that yet. > > I'm not sure who shoud be putting fixes for these bugs in once they are agreed > on. I'm still not getting secure access negotiated properly, so it's not > going to be me for while yet. > > Cheers, > > Tom P > From tpassin@home.com Wed Feb 7 05:16:53 2001 From: tpassin@home.com (Thomas B. Passin) Date: Wed, 7 Feb 2001 00:16:53 -0500 Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2 Message-ID: <008801c090c5$2ed14e80$7cac1218@reston1.va.home.com> More problems with the XPointer TestParser script. At line 12, the ReadFromUri() no longer exists. I hacked up a fix as shown below: ********** XPointerParser ********** Traceback (innermost last): File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 67, in ? retval = test() File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 12, in test doc = pDomlette.ReadFromUri('addrbook.xml') AttributeError: ReadFromUri ## doc = pDomlette.ReadFromUri('addrbook.xml') #Original code reader=pDomlette.PyExpatReader() # Need a reader, original call is no more doc = reader.fromUri('addrbook.xml') Now the code runs, but the example being tested still fails - Apparently the xpointer expression no longer works. When I try various plausible variations, a few of them work, some of them return None (which causes an error) , and some give this same error. Seems to me that if no match is found for an expression, returning None would always be appropriate. It shouldn't return an exception unless there was invalid syntax in the xpointer expression. Here's the trace: C:>TestParser.py ********** XPointerParser ********** Creating test environment [ OK ] Traceback (innermost last): File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 69, in ? 
retval = test() File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 49, in test result = XPointer.SelectNode(doc, frag) File "C:\Program Files\Python\Ft\XPointer\__init__.py", line 57, in SelectNode return xptr.select(doc, contextNode, nss) File "C:\Program Files\Python\Ft\XPointer\ParsedXPointer.py", line 47, in sele ct raise XPtrException(XPtrException.SUB_RESOURCE_ERROR) Ft.XPointer.XPtrException.XPtrException: Expression does not locate a resource Cheers, Tom P p.s. - That's it for tonight, no more posts on this! From paulp@ActiveState.com Wed Feb 7 21:02:31 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 07 Feb 2001 13:02:31 -0800 Subject: [XML-SIG] Minidom bugs/questions References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com> <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de> <200102051922.OAA01298@cj20424-a.reston1.va.home.com> <200102060131.f161Vk311008@mira.informatik.hu-berlin.de> Message-ID: <3A81B7E7.7B55C9A1@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I think this code was completely bogus. The author apparently thought > of creating DocumentTypes, in which case publicId and systemId would > be required. However, the SAX locator does not provide that > information (atleast not for the DTD; rather for the document itself), > nor were we in the process of creating document types. I do not think that minidom should support any DTD information. I would advise that you should just remove any code relating to public and system identifiers. It was not there originally and I don't think it is useful. > It seems that the processing of the doctype argument is also > incorrect: It should *not* create one given the qualifiedName, atleast > I can't find any indication that it should. It MUST set the > ownerDocument, though, which it doesn't. I'm not sure whether the > doctype needs to appear in the childNodes of the Document, can anybody > clarify this? 
Yes, the DocumentType would be a child of the Document. But I don't think we should have doctype at all...leave that to 4dom. Paul Prescod From jeremy.kloth@fourthought.com Wed Feb 7 21:12:44 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Wed, 07 Feb 2001 14:12:44 -0700 Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2 References: <008101c090c2$40afeba0$7cac1218@reston1.va.home.com> Message-ID: <3A81BA4C.9741C89B@fourthought.com> "Thomas B. Passin" wrote: > > I forget to add - a lot of FT's test scripts use "import TestSuite", but > that's now at Ft.Lib.TestSuite, and I had to make a number of corresponding > import changes, too. There were a lot of them in the xpath test directory, > though I didn't make a list yet. > Actually, the local import works just fine when the test scripts are run from the directory where they are installed, the same as the documentation directory. During install, that file gets copied into the directory as well. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paulp@ActiveState.com Wed Feb 7 21:09:57 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 07 Feb 2001 13:09:57 -0800 Subject: [XML-SIG] Minidom bugs/questions References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> Message-ID: <3A81B9A5.CE897F06@ActiveState.com> "Martin v. Loewis" wrote: > > > . > > Well, we should think about exactly what makes minidom "mini". It's > > debatable whether it is possible to implement all of DOM Level 2 > > core and still be "mini". And what about DOm level 3? > > I think the original understanding was that everything that is > "convenience", ie. can be composed from other interfaces, should not > be included. 
In addition, minidom originally had no DOMImplementation, > you had to know the implementation class names to build a tree. > > That approach has failed; people have been contributing bits and > pieces so that what they wanted to use is there. These days, I think > it is mini by only implementing DOM Core. That probably makes it a AA > battery. First, would implementing DOM core include entities, notations, document types, entity references etc.? If so, I think you're increasing the conceptual load quite a bit. I also originally wanted minidom to be readonly but yes, that has gone away also. > [supporting namespaces] > > Of course if it isn't Level 2 compliant, it needn't do so. I > > wouldn't consider it unreasonable to have minidom L1 only. If users > > want Level 2, they install PyXML or other. > > I'd say that this is a matter of internal consistency. Since the SAX > part in Python supports namespaces, the DOM part should do so as > well. That means L2. It also turns out that what I hope is the larger > half of NS support is already in minidom as of Python 2.0, so ripping > it out would not be sensible. I put off namespace support as long as I could trying to keep it simple. The tricky part of doing namespaces "right" is doing movement of nodes across namespace boundaries right. You've got issues of prefix clashes, element type renaming and so forth. Having proper namespace support would not be trivial. I admittedly should have not have started adding any namespace support at all until I had figured out the end-game...is it too late to go back and make it readonly again? :) Paul Prescod From jeremy.kloth@fourthought.com Wed Feb 7 21:17:32 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Wed, 07 Feb 2001 14:17:32 -0700 Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2 References: <008801c090c5$2ed14e80$7cac1218@reston1.va.home.com> Message-ID: <3A81BB6C.1600F1EE@fourthought.com> "Thomas B. 
Passin" wrote: > > Now the code runs, but the example being tested still fails - Apparently the > xpointer expression no longer works. When I try various plausible variations, > a few of them work, some of them return None (which causes an error) , and > some give this same error. > > Seems to me that if no match is found for an expression, returning None would > always be appropriate. It shouldn't return an exception unless there was > invalid syntax in the xpointer expression. > According the the XPointer specification, it is an error. In section 3.4: [Definition: If a syntactically correct XPointer, suitably escaped, fails as discussed in 4.3 Schemes, the XPointer has a sub-resource error.] Note that XPath allows expressions that return empty node-sets as their results and does not regard this situation as an error. Because the XPointer language is intended as a specification of document locations rather than a broader query language, an empty result is an error. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Thu Feb 8 01:14:03 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 8 Feb 2001 02:14:03 +0100 Subject: [XML-SIG] Minidom bugs/questions In-Reply-To: <3A81B9A5.CE897F06@ActiveState.com> (message from Paul Prescod on Wed, 07 Feb 2001 13:09:57 -0800) References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> <3A81B9A5.CE897F06@ActiveState.com> Message-ID: <200102080114.f181E3k01764@mira.informatik.hu-berlin.de> > First, would implementing DOM core include entities, notations, document > types, entity references etc.? If so, I think you're increasing the > conceptual load quite a bit. I think it should include anything that users want to use. 
Refusing patches because they extend beyond an originally-set feature set is not good.

Regards,
Martin

From paulp@ActiveState.com Thu Feb 8 02:00:48 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 07 Feb 2001 18:00:48 -0800
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> <3A81B9A5.CE897F06@ActiveState.com> <200102080114.f181E3k01764@mira.informatik.hu-berlin.de>
Message-ID: <3A81FDD0.DEBCE83C@ActiveState.com>

"Martin v. Loewis" wrote:
>
> > First, would implementing DOM core include entities, notations, document
> > types, entity references etc.? If so, I think you're increasing the
> > conceptual load quite a bit.
>
> I think it should include anything that users want to use. Refusing
> patches because they extend beyond an originally-set feature set is
> not good.

Guido does that all of the time. It's part of how we keep things simple! Every extra feature must be added to the documentation and increases people's XML-a-phobia by that much more.

Paul Prescod

From chris@rpgarchive.com Thu Feb 8 06:00:21 2001
From: chris@rpgarchive.com (Chris Davis)
Date: Thu, 8 Feb 2001 00:00:21 -0600
Subject: [XML-SIG] SAXReaderNotAvailable
Message-ID: <0102080000210J.12796@lab.rpgarchive.com>

I'm sorry to just throw an error out like this, but can anyone tell me what might be the cause of this expection. I trying to help a friend get python20 installed properly. He has expat installed and running slackware.
  File "./minidom.py", line 581, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "./minidom.py", line 570, in _doparse
    events = apply(func, args, kwargs)
  File "./pulldom.py", line 244, in parseString
    parser = xml.sax.make_parser()
  File "/usr/lib/python2.0/xml/sax/__init__.py", line 88, in make_parser
    raise SAXReaderNotAvailable("No parsers found", None)
xml.sax._exceptions.SAXReaderNotAvailable: No parsers found

xml dir:

darkstar:/usr/lib/python2.0:>ls -R xml
xml:
__init__.py  __init__.pyc  __init__.pyo  dom/  parsers/  sax/

xml/dom:
__init__.py   __init__.pyo  minidom.pyc  pulldom.py   pulldom.pyo
__init__.pyc  minidom.py    minidom.pyo  pulldom.pyc

xml/parsers:
__init__.py  __init__.pyc  __init__.pyo  expat.py  expat.pyc  expat.pyo

xml/sax:
__init__.py     _exceptions.pyc  expatreader.pyc  handler.pyo   saxutils.pyo
__init__.pyc    _exceptions.pyo  expatreader.pyo  parsers@      xmlreader.py
__init__.pyo    dom@             handler.py       saxutils.py   xmlreader.pyc
_exceptions.py  expatreader.py   handler.pyc      saxutils.pyc  xmlreader.pyo

Thanks
--
Chris Davis
chris@rpgarchive.com
RPGArchive http://rpgarchive.com
OpenRPG http://openrpg.com

From stefan.marsiske@sysdata.siemens.hu Thu Feb 8 12:40:21 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Thu, 8 Feb 2001 13:40:21 +0100
Subject: [XML-SIG] sax and entities
Message-ID: <20010208134021.D4340@sysdata.siemens.hu>

hi,

i got a little problem. when i want to load an xml file using sax2, i loose entities. in one file (which is actually almost html) i have a "&nbsp;" entity, but once loaded that entity in the dom tree is already converted to a space. that is quite unfortunate. because i want to write this dom tree back after a few changes, but then this &nbsp; is lost... "one in a million..."
how can i force the sax2 reader not to expand entities? or do i miss the point here?

ciao
--
Stefan [http://web.interware.hu/stef] UPDATED:001031 quote: "happy(y2k++)" gpg-key: http://web.interware.hu/stef/gpg.txt

From larsga@garshol.priv.no Thu Feb 8 13:31:05 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 08 Feb 2001 14:31:05 +0100
Subject: [XML-SIG] sax and entities
In-Reply-To: <20010208134021.D4340@sysdata.siemens.hu>
References: <20010208134021.D4340@sysdata.siemens.hu>
Message-ID:

* Marsiske Stefan
|
| i got a little problem. when i want to load an xml file using sax2,
| i loose entities.

You are quite right that you lose information about which character data came from character entities, and that this information is not passed on to the DOM. The reason this is so is that this information is hardly ever wanted, and keeping all information of this kind would make the SAX API a lot more complicated.

| in one file (which is actually almost html) i have a "&nbsp;"
| entity, but once loaded that entity in the dom tree is already
| converted to a space. that is quite unfortunate. because i want to
| write this dom tree back after a few changes, but then this &nbsp;
| is lost...

Well, first of all, it should not be converted to a space, but to the NBSP character, ISO Latin-1 character 160, U+00A0. If it is converted to an NBSP character, you still have it, and it will still be there when you write your DOM tree back, although in a different form.

If you really want to have it as an '&nbsp;' in your output XML rather than as an NBSP character you should do something like

  string.replace(text, "\240", "&nbsp;")

when you write the DOM tree out. Exactly how to do this will depend on your DOM implementation.

I think it would make very good sense, BTW, for the DOM serializers to provide some mechanism for doing escapings of this kind when serializing the DOM. It might be that you pass a dictionary like

  {"\240" : "&nbsp;"}

or perhaps a function. What say ye, DOM implementors?
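A minimal sketch of the dictionary-based escaping suggested above, written in present-day Python rather than the `string.replace` idiom of Python 1.5.2. The names `escape_for_output` and `NBSP_TABLE` are hypothetical; nothing in this thread shows a DOM serializer that offers such an interface.

```python
def escape_for_output(text, entities):
    """Replace each character listed in `entities` with its replacement string."""
    # Work character by character so that replacement text is never
    # itself scanned again for further substitutions.
    return "".join(entities.get(ch, ch) for ch in text)


# Map NBSP (U+00A0, written "\240" in octal) back to the named entity.
NBSP_TABLE = {"\u00a0": "&nbsp;"}

print(escape_for_output("one\u00a0in\u00a0a\u00a0million", NBSP_TABLE))
```

Note that the output is only well-formed XML if `&nbsp;` is declared, for instance by the document's DTD; in a document without such a declaration a numeric reference like `&#160;` would be the safer replacement value.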
--Lars M.

From uche.ogbuji@fourthought.com Thu Feb 8 17:41:26 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 08 Feb 2001 10:41:26 -0700
Subject: [XML-SIG] sax and entities
In-Reply-To: Message from Lars Marius Garshol of "08 Feb 2001 14:31:05 +0100."
Message-ID: <200102081741.KAA19184@localhost.localdomain>

> If you really want to have it as an '&nbsp;' in your output XML rather
> than as an NBSP character you should do something like
>
> string.replace(text, "\240", "&nbsp;")
>
> when you write the DOM tree out. Exactly how to do this will depend
> on your DOM implementation.
>
>
> I think it would make very good sense, BTW, for the DOM serializers to
> provide some mechanism for doing escapings of this kind when
> serializing the DOM. It might be that you pass a dictionary like
>
> {"\240" : "&nbsp;"}
>
> or perhaps a function. What say ye, DOM implementors?

4DOM and 4XSLT already automatically do this for HTML output.

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From ken@bitsko.slc.ut.us Thu Feb 8 17:46:38 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 08 Feb 2001 11:46:38 -0600
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: "Martin v. Loewis"'s message of "Thu, 8 Feb 2001 02:14:03 +0100"
References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> <3A81B9A5.CE897F06@ActiveState.com> <200102080114.f181E3k01764@mira.informatik.hu-berlin.de>
Message-ID:

"Martin v. Loewis" writes:

> [Paul Prescod wrote:]
> > First, would implementing DOM core include entities, notations,
> > document types, entity references etc.? If so, I think you're
> > increasing the conceptual load quite a bit.
>
> I think it should include anything that users want to use.
Refusing > patches because they extend beyond an originally-set feature set is > not good.

Separating basic usage from extended usage in the documentation would go a long way towards satisfying both requirements: keeping the initial conceptual load down while still allowing richer use.

I think what makes minidom lightweight is its implementation and footprint, mostly due to relaxing many of the requirements of W3C DOM.

-- Ken

From martin@loewis.home.cs.tu-berlin.de Thu Feb 8 20:06:41 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Feb 2001 21:06:41 +0100
Subject: [XML-SIG] sax and entities
In-Reply-To: (message from Lars Marius Garshol on 08 Feb 2001 14:31:05 +0100)
References: <20010208134021.D4340@sysdata.siemens.hu>
Message-ID: <200102082006.f18K6fO01195@mira.informatik.hu-berlin.de>

> I think it would make very good sense, BTW, for the DOM serializers to
> provide some mechanism for doing escapings of this kind when
> serializing the DOM. It might be that you pass a dictionary like
>
> {"\240" : "&nbsp;"}
>
> or perhaps a function. What say ye, DOM implementors?

That would make another issue on a "standard Python DOM extensions" PEP. Unfortunately, so far, nobody has offered to draft one. Once it is there and agreed, I think it won't be too hard to provide such a feature in DOM implementations. The purpose of the PEP would be to present the feature uniformly across DOM implementations, of course.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Thu Feb 8 20:31:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 8 Feb 2001 21:31:15 +0100 Subject: [XML-SIG] SAXReaderNotAvailable In-Reply-To: <0102080000210J.12796@lab.rpgarchive.com> (message from Chris Davis on Thu, 8 Feb 2001 00:00:21 -0600) References: <0102080000210J.12796@lab.rpgarchive.com> Message-ID: <200102082031.f18KVFx01306@mira.informatik.hu-berlin.de> > I'm sorry to just throw an error out like this, but can anyone tell me what > might be the cause of this expection. I trying to help a friend get python20 > installed properly. He has expat installed and running slackware. What do you mean with "has expat installed"? That expat.py is present? That the expat 1.1 header files and library is present? That the pyexpat module is present? The last one is required to find a parser. You need to enable pyexpat in Modules/Setup. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Feb 9 08:51:56 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 9 Feb 2001 09:51:56 +0100 Subject: [XML-SIG] Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2 In-Reply-To: <007b01c090c1$bc07f5a0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <007b01c090c1$bc07f5a0$7cac1218@reston1.va.home.com> Message-ID: <200102090851.f198puK01486@mira.informatik.hu-berlin.de> > First, a number of modules reference the "core" module, which isn't > there (any more, I assume?). Some are tests, some are not. I don't > have a list at the moment, but they should be flushed out. That's a known issue; not all of that has been ported to 4DOM. Contributions are welcome. > Second, in xml\dom\ext\PyExpat.py, the import pyexpat statement > doesn't throw an ImportError, but a NameError (seems strange). That code will look differently once we get an updated copy of 4DOM. > I'm not sure who shoud be putting fixes for these bugs in once they > are agreed on. I'm still not getting secure access negotiated > properly, so it's not going to be me for while yet. 
Patches can be submitted to sourceforge.net/projects/pyxml also. Regards, Martin From fdrake@acm.org Fri Feb 9 18:09:33 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 9 Feb 2001 13:09:33 -0500 (EST) Subject: [XML-SIG] Question about namespace declarations Message-ID: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com> I took a look in the namespaces recommendation (http://www.w3.org/TR/REC-xml-names/), and it makes me think that this isn't quite right. I vaguely recall that "xmlns" was supposed to magically map into that URI, but I don't see it in the recommendation. Further, the recommendation says (section 4, in "Namespace Constraint: Prefix Declared") that the "xmlns" prefix is not bound to any namespace URI. This makes me think that both "xmlns" and "xmlns:*" should be presented as attributes without namespaces in the DOM. Can anyone point to references that extend or override this recommendation? Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From paulp@ActiveState.com Fri Feb 9 18:27:12 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 09 Feb 2001 10:27:12 -0800 Subject: [XML-SIG] Question about namespace declarations References: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com> Message-ID: <3A843680.64D8235B@ActiveState.com> I am probably missing some context but your reading of the XML namespaces specification is correct. Minidom does not bind xmlns to REC-xml-names. Which DOM does? Paul Prescod From paulp@ActiveState.com Fri Feb 9 18:31:40 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 09 Feb 2001 10:31:40 -0800 Subject: [XML-SIG] Question about namespace declarations References: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com> Message-ID: <3A84378C.19B0AFD2@ActiveState.com> Oops, I just found this: "Note: In the DOM, all namespace declaration attributes are by definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/". 
These are the attributes whose namespace prefix or qualified name is "xmlns". Although, at the time of writing, this is not part of the XML Namespaces specification [Namespaces], it is planned to be incorporated in a future revision." http://www.w3.org/TR/DOM-Level-2-Core/core.html Paul Prescod From dsturtevant@comversens.com Fri Feb 9 18:50:23 2001 From: dsturtevant@comversens.com (Sturtevant, Dean) Date: Fri, 9 Feb 2001 13:50:23 -0500 Subject: [XML-SIG] problem with saxdemo.py Message-ID: Hi - I'm trying to find a simple example of the usage of the xml parser, so I thought I'd try saxdemo. But there's a problem: saxdemo wants saxexts from xml/sax, which doesn't exist in the python 2.0 installation. Should I look to another example? Which one? - Dean From fdrake@acm.org Fri Feb 9 18:46:07 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 9 Feb 2001 13:46:07 -0500 (EST) Subject: [Parsed-XML-Dev] Re: [XML-SIG] Question about namespace declarations In-Reply-To: <3A84378C.19B0AFD2@ActiveState.com> References: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com> <3A84378C.19B0AFD2@ActiveState.com> Message-ID: <14980.15087.450232.203602@cj42289-a.reston1.va.home.com> Paul Prescod writes: > "Note: In the DOM, all namespace declaration attributes are by > definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/". > These are the attributes whose namespace prefix or qualified name is > "xmlns". Although, at the time of writing, this is not part of the XML > Namespaces specification [Namespaces], it is planned to be incorporated > in a future revision." Boy is this stuff messy! The context is a DOM implementation I'm working on for use in Zope and some DOM client code Guido is working on. Our current implementation does what the DOM Level 2 recommendation says, but Guido complained because exposed a bug elsewhere in our DOM when he tried to insert namespace declarations. 
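For readers who want to see the DOM Level 2 rule quoted above in practice, here is a small sketch using the `xml.dom.minidom` that ships with current Python releases, which postdates this thread and so may not match the behaviour of the Python 2.0 module under discussion. The document string and the `urn:example:x` namespace are made up for illustration.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

# A made-up document with one namespace declaration attribute.
doc = minidom.parseString('<root xmlns:x="urn:example:x"><x:item/></root>')
decl = doc.documentElement.getAttributeNode("xmlns:x")

# The declaration shows up as an ordinary attribute node...
print(decl.name, "=", decl.value)
# ...and its namespaceURI can be compared with the reserved xmlns namespace.
print(decl.namespaceURI == XMLNS_NAMESPACE)
```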
There's some information about our DOM project at: http://dev.zope.org/Wikis/DevSite/Projects/ParsedXML/ -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Fri Feb 9 18:52:39 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 09 Feb 2001 11:52:39 -0700 Subject: [XML-SIG] Question about namespace declarations In-Reply-To: Message from Paul Prescod of "Fri, 09 Feb 2001 10:31:40 PST." <3A84378C.19B0AFD2@ActiveState.com> Message-ID: <200102091852.LAA18011@localhost.localdomain> > Oops, I just found this: > > "Note: In the DOM, all namespace declaration attributes are by > definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/". > These are the attributes whose namespace prefix or qualified name is > "xmlns". Although, at the time of writing, this is not part of the XML > Namespaces specification [Namespaces], it is planned to be incorporated > in a future revision." > > http://www.w3.org/TR/DOM-Level-2-Core/core.html Yes. And this is precisely how 4DOM, pDomlette and cDomlette are implemented. If minidom doesn't have a namespaceURI assigned for namespace declaration attributes, it should be fixed. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Sat Feb 10 00:29:01 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 09 Feb 2001 16:29:01 -0800 Subject: [XML-SIG] [Bug #131797] failed build on 2.1a2 and 2.0 Message-ID: Bug #131797, was updated on 2001-Feb-09 16:29 Here is a current snapshot of the bug. Project: Python/XML Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: bcollar Assigned to : nobody Summary: failed build on 2.1a2 and 2.0 Details: I get the following output from "python2.1 ./setup.py build", on debian. 
The same occurs when using python2.0: running build_ext building '_xmlplus.parsers.pyexpat' extension gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS -DXML_DTD -DEXPAT_VERSION=0x010200 -Iextensions/expat/xmltok -Iextensions/expat/xmlparse -I/usr/local/include/python2.1 -c extensions/pyexpat.c -o build/temp.linux-i586-2.1/pyexpat.o extensions/pyexpat.c: In function `getcode': extensions/pyexpat.c:262: warning: passing arg 11 of `PyCode_New' makes pointer from integer without a cast extensions/pyexpat.c:262: too few arguments to function `PyCode_New' extensions/pyexpat.c: In function `call_with_frame': extensions/pyexpat.c:289: too few arguments to function `PyFrame_New' error: command 'gcc' failed with exit status 1 For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=131797&group_id=6473 From don_wakefield@mentorg.com Sat Feb 10 02:05:55 2001 From: don_wakefield@mentorg.com (Don Wakefield) Date: Fri, 9 Feb 2001 18:05:55 -0800 (PST) Subject: [XML-SIG] Using PyExpat.py Message-ID: <14980.41475.888529.565845@gargle.gargle.HOWL> I'm trying to construct a DOM using PyExpat.py. My environment is: Python 1.5.2 PyXML 0.6.2 Here's the simple code. I've added a few lines like reader._override = None to get past errors that I didn't understand until I came to this point. Now I don't know what more to do... 
------------------------------------------ from xml.parsers import pyexpat from xml.dom.ext.reader import PyExpat import time class Cells: def __init__(self, filename): try: reader = PyExpat.Reader() reader._override = None fp = open(filename, 'r') xml_dom_object = reader.fromStream(fp) except Exception, msg: print "Exception caught:", msg return self.root = xml_dom_object.documentElement if __name__ == '__main__': import sys if len(sys.argv) == 2: start = time.clock() x = Cells(sys.argv[1]) end = time.clock() print "Finished loading:", end - start else: print "Usage: python %s [XML-filename]" % sys.argv[0] Here is the output of a run: ---------------------------- <343 : /user/donw/src/Demo/bigproto> !334 python_ic Time2.py big.xml Exception caught: pyexpat Finished loading: 0.0 ---------------------------- Can anybody tell me what I'm doing wrong? The goal here is to use pyexpat.so to speed the building of the DOM. Thanks for any comments. -- Don Wakefield Mentor Graphics Corporation (503) 685-1262 8005 S.W. Boeckman Road don_wakefield@mentorg.com Wilsonville, OR 97070-7777 From martin@loewis.home.cs.tu-berlin.de Sat Feb 10 07:00:33 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 10 Feb 2001 08:00:33 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <14980.41475.888529.565845@gargle.gargle.HOWL> (message from Don Wakefield on Fri, 9 Feb 2001 18:05:55 -0800 (PST)) References: <14980.41475.888529.565845@gargle.gargle.HOWL> Message-ID: <200102100700.f1A70X701220@mira.informatik.hu-berlin.de> > I'm trying to construct a DOM using PyExpat.py. My environment is: > > Python 1.5.2 > PyXML 0.6.2 [...] > try: > reader = PyExpat.Reader() > reader._override = None > fp = open(filename, 'r') > xml_dom_object = reader.fromStream(fp) > except Exception, msg: > print "Exception caught:", msg > return [...] > Can anybody tell me what I'm doing wrong? That's hard to say. 
First, a number of changes have been made since 0.6.2; I can't reproduce your problem. In any case, I recommend to let the exception through instead of trying to print it this way: it is much more informative to get a full traceback, and full information about the exception. Regards, Martin From don_wakefield@mentorg.com Sat Feb 10 18:52:20 2001 From: don_wakefield@mentorg.com (Don Wakefield) Date: Sat, 10 Feb 2001 10:52:20 -0800 (PST) Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102100700.f1A70X701220@mira.informatik.hu-berlin.de> References: <14980.41475.888529.565845@gargle.gargle.HOWL> <200102100700.f1A70X701220@mira.informatik.hu-berlin.de> Message-ID: <14981.36324.913804.941652@gargle.gargle.HOWL> >>>>> "Martin" == Martin v Loewis writes: Martin> I recommend to let the exception through instead of trying to Martin> print it this way Thanks. I tried that, and got the following traceback: python_ic Timediag.py cs39.xml Traceback (innermost last): File "Timediag.py", line 20, in ? x = Cells(sys.argv[1]) File "Timediag.py", line 12, in __init__ xml_dom_object = reader.fromStream(fp) File "/wv/icdet/python_src/12-19-00/BUILD_AREA/ss6/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 65, in fromStream self.initParser() File "/wv/icdet/python_src/12-19-00/BUILD_AREA/ss6/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 51, in initParser self.parser=pyexpat.ParserCreate() NameError: pyexpat But if I start python from the command line, I can do: <47 : /user/donw/src/Demo/bigproto> python Python 1.5.2 (#1, Dec 20 2000, 08:50:14) [GCC 2.9-mentor-98r2p24] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> from xml.parsers import pyexpat >>> parser=pyexpat.ParserCreate() >>> ^D So my environment is fine. PyExpat.py does not import pyexpat, but I do in my calling test script: from xml.parsers import pyexpat from xml.dom.ext.reader import PyExpat Martin> That's hard to say. 
First, a number of changes have been made since Martin> 0.6.2; I can't reproduce your problem. Note that I've downloaded PyXML-0.6.3 from Sourceforge (haven't installed it yet) and PyExpat.py in *that* version does not import pyexpat either. So if you are not able to duplicate the problem with that version, it must be something deeper... I'll try installing 0.6.3 and hammer on it for awhile. Note that I'm able to build a DOM using the lines: from xml.dom import ext from xml.dom.ext.reader import Sax2 : : xml_dom_object = Sax2.FromXmlFile(filename, validate=0) I'm just hoping to build a DOM with expat to improve performance. Thanks for any suggestions. -- Don Wakefield Mentor Graphics Corporation (503) 685-1262 8005 S.W. Boeckman Road don_wakefield@mentorg.com Wilsonville, OR 97070-7777 From uche.ogbuji@fourthought.com Sat Feb 10 21:07:40 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 10 Feb 2001 14:07:40 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Don Wakefield of "Sat, 10 Feb 2001 10:52:20 PST." <14981.36324.913804.941652@gargle.gargle.HOWL> Message-ID: <200102102107.OAA00904@localhost.localdomain> > I'll try installing 0.6.3 and hammer on it for awhile. Note that I'm > able to build a DOM using the lines: > > from xml.dom import ext > from xml.dom.ext.reader import Sax2 > : > : > xml_dom_object = Sax2.FromXmlFile(filename, validate=0) > > I'm just hoping to build a DOM with expat to improve performance. Thanks > for any suggestions. I do recommend the upgrade, and 0.6.4 is on its way. As a forewarning, the 0.6.3 and up way is from xml.dom.ext.reader import PyExpat #or Sax2 reader = PyExpat.Reader() xml_dom_object = reader.fromUri(filename) #should work for either URL or file Good luck. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From guido@digicool.com Sat Feb 10 22:13:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 10 Feb 2001 17:13:23 -0500 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Your message of "Sat, 10 Feb 2001 14:07:40 MST." <200102102107.OAA00904@localhost.localdomain> References: <200102102107.OAA00904@localhost.localdomain> Message-ID: <200102102213.RAA28403@cj20424-a.reston1.va.home.com> > xml_dom_object = reader.fromUri(filename) #should work for either URL or file Let's talk about this comment. Is it really a good idea to build URL access right into the API here? For apps that need this, it's trivial to write as long as the reader takes an open file object ("stream") as an alternative to a filename: just call urllib.urlopen(uri) and pass it as the argument. Case in point: I found this bit in saxutilx.py: if os.path.isfile(sysid): basehead = os.path.split(os.path.normpath(base))[0] source.setSystemId(os.path.join(basehead, sysid)) f = open(sysid, "rb") else: source.setSystemId(urlparse.urljoin(base, sysid)) f = urllib.urlopen(source.getSystemId()) Now I don't know under which circumstances this get triggered (the context is obscure), but I'd say it's a bad idea to just try to open a URL when a string isn't a local file. Maybe *you* live in a world where the network is "always on" (and I do too!), but for plenty of folks, it's rather annoying to find that their modem starts dialing out each time they make a typo in a filename. Besides, the syntax for local filenames and URLs is not the same; the quoting conventions are different and it's quite possible to find that the same name could be either a URL or a filename, with vastly different interpretations. (See nturl2path.) Without more context, it's unclear which syntax should be tried first. The application knows this, but the library doesn't. 
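The separation argued for here can be sketched as three small entry points: one that only accepts an open stream, one that only opens local files, and one that only fetches URLs. The function names are hypothetical, `xml.dom.minidom` stands in for whichever reader is in use, and `urllib.request.urlopen` is the current spelling of the `urllib.urlopen` call mentioned above.

```python
import io
import urllib.request
from xml.dom import minidom


def parse_stream(stream):
    """Build a DOM from an already-open binary stream."""
    return minidom.parse(stream)


def parse_file(path):
    """Open a local file only; this never touches the network."""
    with open(path, "rb") as stream:
        return parse_stream(stream)


def parse_url(url):
    """Fetch a URL only; the caller asked for network access explicitly."""
    with urllib.request.urlopen(url) as stream:
        return parse_stream(stream)


# The stream form works with anything that has a read() method.
doc = parse_stream(io.BytesIO(b"<addrbook><entry/></addrbook>"))
print(doc.documentElement.tagName)
```

With this split, a mistyped filename raises an error from `open` in `parse_file` instead of silently falling through to a network request.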
It's also fine to have an alternative API that takes a URL instead of a local filename -- but it's not okay to attempt to overlap the two namespaces. --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji@fourthought.com Sun Feb 11 00:41:24 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 10 Feb 2001 17:41:24 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Guido van Rossum of "Sat, 10 Feb 2001 17:13:23 EST." <200102102213.RAA28403@cj20424-a.reston1.va.home.com> Message-ID: <200102110041.RAA09847@localhost.localdomain> > > xml_dom_object = reader.fromUri(filename) #should work for either URL or file > > Let's talk about this comment. Is it really a good idea to build URL > access right into the API here? For apps that need this, it's trivial > to write as long as the reader takes an open file object ("stream") as > an alternative to a filename: just call urllib.urlopen(uri) and pass > it as the argument. Yes, but XML's interactions with URIs are by no means straightforward. The reason that URIs are built into so many APIs side-by-side with stream APIs (and this is the case in all implementations I know of, Python or not) is to allow a smooth interface to all the URI complications XML brings about, mainly the network of rules for look-up according to base-URI resolution. Basically, in XML just about everything is a URI. Some implementations (such as PySAX) resolve to local file names merely as a convenience to the user. And, for instance, there is the matter that URIs are a superset of URLs, and esoterica such as URNs actually do exist in the XML fairy land.
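The base-URI resolution mentioned above can be sketched roughly as follows. This is only an illustration, assuming modern Python 3, where the urlparse.urljoin used elsewhere in this thread lives at urllib.parse.urljoin. The helper name resolve_system_id and the example.org addresses are hypothetical and do not come from PyXML or 4Suite.

```python
from urllib.parse import urljoin


def resolve_system_id(base_uri, system_id):
    """Resolve a possibly relative system identifier against a base URI.

    Hypothetical helper for illustration only; not part of PyXML or 4Suite.
    """
    return urljoin(base_uri, system_id)


# Hypothetical document location used as the base URI.
doc_uri = "http://example.org/docs/report.xml"

# A relative reference resolves next to the document.
print(resolve_system_id(doc_uri, "report.dtd"))

# Dot segments are resolved against the base path.
print(resolve_system_id(doc_uri, "../dtds/common.ent"))

# A reference that carries its own scheme (here a URN) is returned unchanged.
print(resolve_system_id(doc_uri, "urn:x-example:catalog"))
```

Nothing here touches the file system or the network; it only computes the identifier that a parser would then have to open.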
> Case in point: I found this bit in saxutils.py:
>
>     if os.path.isfile(sysid):
>         basehead = os.path.split(os.path.normpath(base))[0]
>         source.setSystemId(os.path.join(basehead, sysid))
>         f = open(sysid, "rb")
>     else:
>         source.setSystemId(urlparse.urljoin(base, sysid))
>         f = urllib.urlopen(source.getSystemId())
>
> Now I don't know under which circumstances this gets triggered (the > context is obscure), but I'd say it's a bad idea to just try to open a > URL when a string isn't a local file. Maybe *you* live in a world > where the network is "always on" (and I do too!), but for plenty of > folks, it's rather annoying to find that their modem starts dialing > out each time they make a typo in a filename. I think this is a good point in general, but the attitude embodied in many XML practices is just this "always on" mentality. This matter is the subject of debate every month or so on XML-DEV. In fact, there are far nastier implications of XML's URI-happiness than just the modem dialing example. But I must say: unless urllib is broken, I don't see why this would cause any modem dialing in any environment other than Windows, where unfortunately drive specifiers look like URL schemes. And even in Windows, why would this cause dialing in a case other than when someone has ill-advisedly set up a share drive called http: or ftp:? > Besides, the syntax for local filenames and URLs is not the same; I didn't know there was any universal syntax for local filenames. > the quoting conventions are different and it's quite possible to find that > the same name could be either a URL or a filename, with vastly > different interpretations. I don't see where this is a problem. If someone wants file "hello\\ world" on his local drive, he can just specify it as so, and if someone wants "http://spam.com/hello%20world", he can just specify it as so. If he tries to resolve "http://spam.com/hello\\ world", he should get a malformed URL error from his user agent or library.
The solution is to use URL quoting if you want a URL, and your local quoting convention if you want a local file. > (See nturl2path.) Ah. I don't claim to be able to speak intelligently about Windows NT. > Without more context, > it's unclear which syntax should be tried first. The application > knows this, but the library doesn't. It's also fine to have an > alternative API that takes a URL instead of a local filename -- but > it's not okay to attempt to overlap the two namespaces. Actually, the library does know. There is very little about XML that has anything to do with file names. Pretty much everything is a URI. In most cases, the library's trying to resolve a file name first is merely a convenience to the user so that he doesn't need to deal with URI arcana for local resources, say by type "file:" before every path. If anything is to be done, I'd say this convenience should be taken away. But I don't see a problem big enough to warrant doing so. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From don_wakefield@mentorg.com Sun Feb 11 01:45:20 2001 From: don_wakefield@mentorg.com (Don Wakefield) Date: Sat, 10 Feb 2001 17:45:20 -0800 (PST) Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102102107.OAA00904@localhost.localdomain> References: <14981.36324.913804.941652@gargle.gargle.HOWL> <200102102107.OAA00904@localhost.localdomain> Message-ID: <14981.61104.276292.699921@gargle.gargle.HOWL> >>>>> "Uche" == Uche Ogbuji writes: Uche> I do recommend the upgrade, and 0.6.4 is on its way. I installed 0.6.3, and immediately encountered several problems. Part of this may be my freshness to Python. My environment may not be complete in some way. 
First things first: - Using the code supplied by Uche below, I got a complaint of 'os' not being visible within PyExpat.py. It isn't imported there, and my importing it into my calling script didn't help. I had to add the 'import os' to the top of PyExpat.py to eliminate this error. - The next problem was this: > python Timediag.py cs39.xml Traceback (innermost last): File "Timediag.py", line 19, in ? x = Cells(sys.argv[1]) File "Timediag.py", line 11, in __init__ xml_dom_object = reader.fromUri(filename) File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 80, in fromUri rt = self.fromStream(stream, doc,stripElements) File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 64, in fromStream if not self._override: AttributeError: _override And indeed this variable is not defined in PyExpat.py. I added a line to my own script thusly: reader._override = None This eliminated that error, allowing me to move on to the next one. - Here is the next traceback: > python Timediag.py cs39.xml Traceback (innermost last): File "Timediag.py", line 20, in ? x = Cells(sys.argv[1]) File "Timediag.py", line 12, in __init__ xml_dom_object = reader.fromUri(filename) File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 80, in fromUri rt = self.fromStream(stream, doc,stripElements) File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 65, in fromStream self.initParser() File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 51, in initParser self.parser=pyexpat.ParserCreate() NameError: pyexpat Based on my experience with 'os', I placed the line 'from xml.parsers import pyexpat' directly into PyExpat.py. Now that error has gone away... - I ran again, and got: > python Timediag.py cs39.xml Traceback (innermost last): File "Timediag.py", line 20, in ? 
x = Cells(sys.argv[1]) File "Timediag.py", line 12, in __init__ xml_dom_object = reader.fromUri(filename) File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 81, in fromUri rt = self.fromStream(stream, doc,stripElements) File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 67, in fromStream self.initState(doc, stripElements) TypeError: too many arguments; expected 2, got 3 Checking in PyExpat.py, I indeed discovered that the caller was supplying stripElements, while the method did not take such an argument. I.e.: class Reader: : def initState(self, doc=None): : def fromStream(self, stream, doc=None, stripElements=None): if not self._override: self.initParser() self.initState(doc, stripElements) I've stopped here. Admittedly some of these problems may stem from my limited understanding of Python modules and/or namespaces. But the checking of an undefined variable, and the calling of a method with more than the defined number of arguments, leads me to believe that I've somehow managed to pick up a corrupted/scrambled version of PyXML-0.6.3.tar.gz. Is this possible? Or do 0.6.2 and 0.6.3 just work better with Python 2.0? I'm currently stuck with 1.5.2, so I hope not. Uche> As a forewarning, the 0.6.3 and up way is Uche> from xml.dom.ext.reader import PyExpat #or Sax2 Uche> reader = PyExpat.Reader() Uche> xml_dom_object = reader.fromUri(filename) #should work for either URL or file By the way, thanks for all the friendly advice so far. I've noticed that this list has more traffic by far relating to development work than questions like mine, so I hope this isn't an intrusion. -- Don Wakefield Mentor Graphics Corporation (503) 685-1262 8005 S.W. 
Boeckman Road don_wakefield@mentorg.com Wilsonville, OR 97070-7777 From uche.ogbuji@fourthought.com Sun Feb 11 04:19:12 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 10 Feb 2001 21:19:12 -0700 Subject: [XML-SIG] 4Suite 0.10.2 beta 1 Message-ID: <200102110419.VAA19771@localhost.localdomain> The 4Suite and 4SS 0.10.2 releases are about a week behind schedule. Fingers crossed for Monday or Tuesday. Great stuff in the offing, though. I've posted a beta. Source only for now, but Windows binaries should be along soon. ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b1.tar.gz Please help us find and squash the remaining bugs. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun Feb 11 04:33:09 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 10 Feb 2001 21:33:09 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Don Wakefield of "Sat, 10 Feb 2001 17:45:20 PST." <14981.61104.276292.699921@gargle.gargle.HOWL> Message-ID: <200102110433.VAA22469@localhost.localdomain> > > >>>>> "Uche" == Uche Ogbuji writes: > > Uche> I do recommend the upgrade, and 0.6.4 is on its way. > > I installed 0.6.3, and immediately encountered several problems. Part of > this may be my freshness to Python. My environment may not be complete > in some way. First things first: [Tale of woes snipped] Ouch! I don't use PyXML standalone, but even so I would have imagined screams from every quarter if 0.6.3 was really so broken. I suspect something might have gone wrong with your installation. I'd suggest either using python setup.py install -f To force file overwrites or just blow away the _xmlplus directory in your Python library and reinstall.
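Before reinstalling, it may help to confirm which directory Python actually imports a package from, since a stale copy earlier on the path would explain this kind of behaviour. The following is a minimal sketch in modern Python 3, which is an assumption, as the installations discussed in this thread are Python 1.5.2 and 2.0. The helper name package_location is hypothetical and is not part of PyXML.

```python
import importlib
import os


def package_location(name):
    """Return the directory a package or module was imported from.

    Hypothetical diagnostic helper; returns None for modules that have
    no file on disk (for example built-in modules such as sys).
    """
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)
    if not path:
        return None
    return os.path.dirname(os.path.abspath(path))


# Show where the top-level xml package is being picked up from.
print(package_location("xml"))
```

If the printed directory is not the one the installer wrote to, the import path is the first thing to check.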
Here are the results I get with Python 2.1a2 and 4Suite 0.10.2beta1 (which includes an updated PyXML). Should be the same results with Python 1.5 or 2.0. ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b1.tar.gz [uogbuji@borgia uogbuji]$ cat test.xml toast [uogbuji@borgia uogbuji]$ python Python 2.1a2 (#1, Feb 3 2001, 14:38:13) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from xml.dom.ext.reader import PyExpat >>> from xml.dom.ext import Print >>> reader = PyExpat.Reader() >>> xml_dom_object = reader.fromUri('test.xml') >>> Print(xml_dom_object) toast >>> >>> Huh? Where'd that broken doctype come from? Looks as if I found my own first beta bug. Anyway, in general, you can see that the PyExpat reader works in 4Suite 0.10.2beta1 Note that if your need is for speed and your pattern is just parse and read, you might want to consider cDomlette (in 4Suite only) which is *very* fast, but read-only: [uogbuji@borgia uogbuji]$ python Python 2.1a2 (#1, Feb 3 2001, 14:38:13) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from Ft.Lib import cDomlette >>> reader = cDomlette.RawExpatReader() >>> xml_dom_object = reader.fromUri('test.xml') >>> from xml.dom.ext import Print >>> Print(xml_dom_object) toast >>> >>> Hmm. Interesting BTW: no broken doctype. My guess is that the PyExpat reader is inserting an incomplete DocumentType node, but again, this seems to be unrelated to your problems with PyXML 0.6.3. > Uche> As a forewarning, the 0.6.3 and up way is > > Uche> from xml.dom.ext.reader import PyExpat #or Sax2 > Uche> reader = PyExpat.Reader() > Uche> xml_dom_object = reader.fromUri(filename) #should work for either URL or file > > By the way, thanks for all the friendly advice so far. 
I've noticed that > this list has more traffic by far relating to development work than > questions like mine, so I hope this isn't an intrusion. Not even close. Your messages are *right* on-topic, and highly appreciated. We love to hear all the field-testing reports we can. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From don_wakefield@mentorg.com Sun Feb 11 18:35:58 2001 From: don_wakefield@mentorg.com (Don Wakefield) Date: Sun, 11 Feb 2001 10:35:58 -0800 (PST) Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102110433.VAA22469@localhost.localdomain> References: <14981.61104.276292.699921@gargle.gargle.HOWL> <200102110433.VAA22469@localhost.localdomain> Message-ID: <14982.56206.740790.679411@gargle.gargle.HOWL> >>>>> "Uche" == Uche Ogbuji writes: Uche> [Tale of woes snipped] Uche> [...] I'd suggest either using Uche> python setup.py install -f Uche> To force file overwrites or just blow away the _xmlplus directory in your Uche> Python library and reinstall. Here's an interesting discrepancy. I don't *have* an _xmlplus directory in my Python library. I instead have, starting from PYTHONHOME: lib/python1.5/site-packages/xml. When I installed PyXML-0.6.3, I mv'ed xml away, and sure enough, 'python setup.py install --prefix=$MYDIR' put a new xml directory there, not _xmlplus. Just for chuckles, I tried it your way, and still only got an xml directory. 
Running your test case with test.xml, I get the following after the '-f' install: <37 : /user/donw/src/Demo/bigproto> python Python 1.5.2 (#1, Feb 10 2001, 16:25:02) [GCC 2.9-mentor-98r2p24] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> from xml.dom.ext.reader import PyExpat >>> from xml.dom.ext import Print >>> reader = PyExpat.Reader() >>> xml_dom_object = reader.fromUri('test.xml') Traceback (innermost last): File "", line 1, in ? File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 76, in fromUri if os.path.exists(uri): NameError: os >>> ^D So I'm seeing the same cascade of problems... At least, PyExpat.py doesn't look any different from my last try at an install... I'll work on this more during the week. I'm beginning to think that I'm missing stuff anyway. Does PyXML require anything in the Python environment other than what comes by default? For instance, PyExpat.py has the following line (fragment quoted): raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR... But to the best of my knowledge, I don't have any Ft module anywhere in my Python install... Could this be part of the problem? Uche> Note that if your need is for speed and your pattern is just parse and read, Uche> you might want to consider cDomlette (in 4Suite only) which is *very* fast, Uche> but read-only: Thanks. Some of my usages will be read-only, so I'll try this out (probably on Monday, since Sundays are busy and I'm fighting a cold ;^)~ ). Uche> Not even close. Your messages are *right* on-topic, and highly appreciated. Uche> We love to hear all the field-testing reports we can. Thanks for the encouragement! -- Don Wakefield Mentor Graphics Corporation (503) 685-1262 8005 S.W.
Boeckman Road don_wakefield@mentorg.com Wilsonville, OR 97070-7777 From guenter.radestock@sap.com Mon Feb 12 10:36:06 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Mon, 12 Feb 2001 11:36:06 +0100 Subject: [XML-SIG] windows installer for XML package failing on Windows 95 Message-ID: Hello, I tried to install the XML package onto a Windoze 95 box a few days ago and it does not work. The installer crashes without unpacking source or opening any window. This may be a distutils issue. First: I can successfully unpack the executable with winzip and move the package directory into Python20/Lib. This seems to work, but I am not sure if I should also patch any existing files. Is there a script inside the installer that I should run after unpacking? I did not find a setup.py; the source package won't help me because I would have to install a compiler for the extensions, right? Second: To get the problem (distutils or not?) fixed, I have observed the following: 1. The installer crashes only on this one Libretto 50ct Laptop with Windows 95, second edition. I have successfully used it on other Windows computers. 2. Before installing the XML package, I first removed Python 1.5.2, then removed the TCL/TK that came with 1.5.2, then installed Python 2.0. I did not have Python 1.5.2 on the other systems I installed the package on. I also have an older (don't remember the exact) version of Winzip on the Libretto - can the Winzip DLL be the source of my problem? - Guenter From guenter.radestock@sap.com Mon Feb 12 10:47:13 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Mon, 12 Feb 2001 11:47:13 +0100 Subject: [XML-SIG] Parsing DTDs Message-ID: Hello, in a current project, I want to parse simple DTDs and generate a kind of recursive descent parser from them. I have built a few of these parsers and they work well. I wanted to use a DTD parser from the XML utilities to do the DTD parsing.
Looking into it, I have some problems - maybe someone who knows the utilities better can hint me at what to do next. 1. I have not seen much documentation for the XML package. Is anybody currently working on documentation? Is there any way to extract documentation from the classes? 2. There is a DTD parser inside xmlproc. This seems to be pretty closely coupled to the validating XML parser. At first sight it looks like it gets very low level DTD events and generates finite state automata objects among other things used to validate XML later on. It looks like there is no intermediate representation of the DTD that can (or should) be used for other purposes than validating XML. Is this correct? Have I looked at the wrong piece of code (i.e. is there something in the 4suite package I could use? Thanks in advance for any help. - Guenter From larsga@garshol.priv.no Mon Feb 12 11:05:31 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Feb 2001 12:05:31 +0100 Subject: [XML-SIG] Parsing DTDs In-Reply-To: References: Message-ID: * Guenter Radestock | | 2. There is a DTD parser inside xmlproc. Yup. It is documented at This documentation should also be in the XML-SIG CVS. | This seems to be pretty closely coupled to the validating XML | parser. It is not. The DTD API consists of two parts: an event-based parser and an object structure for representing DTDs that also implements the application interface of the event-based parser. The event-based parser is not tied to the validating XML parser at all. The DTD structure needs a reference to the event-based parser to produce error messages. This is a weakness of the current design, but shouldn't really cause any problems for your application. | At first sight it looks like it gets very low level DTD events and | generates finite state automata objects among other things used to | validate XML later on. 
It looks like there is no intermediate | representation of the DTD that can (or should) be used for other | purposes than validating XML. Is this correct? Yes, it is. Look at the xmldtd module. That contains the object structure that is built by the parser. The finite state automata are used by the ElementType objects, and are hidden within their interface. You can get access to the information in them, but not the automata themselves. Do let me know if you have problems with the interface in any way. --Lars M. From Alexandre.Fayolle@logilab.fr Mon Feb 12 11:15:05 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 12 Feb 2001 12:15:05 +0100 (CET) Subject: [XML-SIG] Parsing DTDs In-Reply-To: Message-ID: On Mon, 12 Feb 2001, Radestock, Guenter wrote: > 2. There is a DTD parser inside xmlproc. This seems to be pretty closely > coupled to the validating XML parser. At first sight it looks like it > gets very low level DTD events and generates finite state automata > objects among other things used to validate XML later on. It looks > like there is no intermediate representation of the DTD that can (or should) > be used for other purposes than validating XML. Is this correct? Have > I looked at the wrong piece of code (i.e. is there something in the > 4suite package I could use? You can access a DTD object that gets generated from the parsing. The following sample code comes from the xmltools utility set that uses the DTD information to generate contextual menus for an XML editor.
There is extensive API documentation on Lars Marius Garshol's page (http://www.garshol.priv.no/download/software/xmlproc/)

-------------------------8<-------------------------------------
from xml.parsers.xmlproc.dtdparser import DTDParser
from xml.parsers.xmlproc.xmldtd import CompleteDTD

def parse_dtd_file(dtd_file,dtd_obj=None):
    parser = DTDParser()
    dtd = dtd_obj or CompleteDTD(parser)
    parser.set_dtd_consumer(dtd)
    parser.set_dtd_object(dtd)
    parser.parse_resource(dtd_file)
    parser.deref()
    return dtd

def getElementsName(child,dtd,list=None):
    """
    A recursive function that permits to extract allowed elements
    name from the complex output tuple of ElementType.get_content_model
    (something like
    (',', [('caption', '?'), ('|', [('col', '*'), ('colgroup', '*')], ''),
    ('thead', '?'), ('tfoot', '?'), ('|', [('tbody', '+'), ('tr', '+')], '')], '')
    : example of the allowed elements of an HTML table tag)

    Inputs child: the complex tuple to be processed.
    Inputs dtd: the dtd object from which the elements have been read
    Inputs list: the list in which will be stored the elements name
    Returns the list
    """
    templist = list or []
    # processes the case of child == None (occurs when element content
    # is specified to be ANY)
    if (child == None) :
        # the return list is set to all of the elements declared in the
        # DTD
        templist = dtd.get_elements()
    else :
        # if the penultimate element of the complex tuple is a list,
        # then we have to recursively process each element of the list.
        if type(child[-2])==type([]):
            for c in child[-2]:
                templist = getElementsName(c,dtd,templist)
        # if the penultimate element of the complex tuple is a tuple,
        # then we have to recursively process this last tuple.
        elif type(child[-2])==type(()):
            templist = getElementsName(child[-2],dtd,templist)
        # else the penultimate element of the complex tuple is a string
        # containing an allowed element name. We just have to append it
        # to the return list.
        else:
            templist.append(child[-2])
    return templist
------------------------------8<----------------------------------------

Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From akuchlin@mems-exchange.org Mon Feb 12 16:26:51 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 12 Feb 2001 11:26:51 -0500 Subject: [XML-SIG] XML icons Message-ID: <20010212112651.B3637@thrak.cnri.reston.va.us> A minor thing that could be added to the PyXML distribution would be icons for representing downloadable XML content on Web pages. In April/May 1999, there was an xml-dev discussion of this, and numerous candidates were submitted: http://www.iol.ie/~alank/xml/icons.htm A vote was held so people could choose their favorites, but that page now returns a 404: http://users.javanet.com/~sbrown/icons.html Does anyone recall the results? Or should we just pick some set of graphics and ask the designer's permission to include them? --amk From Alexandre.Fayolle@logilab.fr Mon Feb 12 16:55:21 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 12 Feb 2001 17:55:21 +0100 (CET) Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <14982.56206.740790.679411@gargle.gargle.HOWL> Message-ID: On Sun, 11 Feb 2001, Don Wakefield wrote: > > Uche> To force file overwrites or just blow away the _xmlplus directory in your > Uche> Python library and reinstall. > > Here's an interesting discrepancy. I don't *have* an _xmlplus directory > in my Python library. I instead have, starting from PYTHONHOME: Uche was wrong, there. He forgot you were using Python 1.5.2. _xmlplus comes on python 2.0 to avoid a name conflict. > I'll work on this more during the week. I'm beginning to think that I'm > missing stuff anyway. Does PyXML require anything in the Python > environment other than what comes by default?
For instance, PyExpat.py > has the following line (fragment quoted): > > raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR... > > But to the best of my knowledge, I don't have any Ft module anywhere in > my Python install... Could this be part of the problem? This sounds like a 4Suite problem. I won't attempt to solve this, just give you an idea of what's going on. To the best of my knowledge, xml.dom comes from 4Suite, an XML library from Fourthought. This tiny part of the library is part of PyXML. When used within 4Suite, it uses several other modules in Ft.* (Ft stands for Fourthought). Periodically, changes from the 4S cvs repository are committed to the PyXML cvs repository. And sometimes, these changes were not intended to get there ;o) I think this is what happened in this case. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From guenter.radestock@sap.com Mon Feb 12 17:26:29 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Mon, 12 Feb 2001 18:26:29 +0100 Subject: Expat crashing Python (was: RE: [XML-SIG] Parsing DTDs) Message-ID: > > like there is no intermediate representation of the DTD > that can (or should) > > be used for other purposes than validating XML. Is this > correct? Have > > I looked at the wrong piece of code (i.e. is there something in the > > 4suite package I could use? > > You can access a DTD object that gets generated from the parsing. The > following sample code comes from the xmltools utility set > that uses the > DTD information to generate contextual menus for an XML > editor. There is > extensive API documentation on Lars Marius Garshol's page > (http://www.garshol.priv.no/download/software/xmlproc/) > Thanks a lot for the quick help. It works perfectly well now. There seems to be a problem in pyexpat.
It crashes, when I feed it a file with an incorrect XML prefix, something like: or I can reproduce this under Windows 2000, Python 2.0 (bombs out of python with a memory error): --- from xml.parsers import expat po = expat.ParserCreate('ISO-8859-1') po.Parse("""""", 1) --- ***thinking a little*** trying outside emacs, I see a stack trace before it bombs out. so I insert an exception handler: --- from xml.parsers import expat po = expat.ParserCreate('ISO-8859-1') exc = None try: po.Parse("""""", 1) except exc, arg: global xxx xxx = (exc, arg) --- and now I get: --- Traceback (most recent call last): File "C:\perforce\workplace\ims\dev\python-api\python\xml\expattest.py", line 9, in ? SystemError: 'finally' pops bad exception --- Seems to be a problem of some exception handler in the Expat module. From Alexandre.Fayolle@logilab.fr Mon Feb 12 17:38:11 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 12 Feb 2001 18:38:11 +0100 (CET) Subject: [XML-SIG] Re: Expat crashing Python In-Reply-To: Message-ID: On Mon, 12 Feb 2001, Radestock, Guenter wrote: > There seems to be a problem in pyexpat. It crashes, when I feed it > a file with an incorrect XML prefix, something like: I think this has been fixed, or else the bug does not show up on Linux. On a redhat 6.2 box, with python 1.5.2 and 4Suite0.10.2b1 (and whatever version of PyXML comes bundled with it), I get: Traceback (innermost last): File "", line 2, in ? xml.parsers.expat.error: syntax error: line 1, column 6 Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From guenter.radestock@sap.com Mon Feb 12 18:06:56 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Mon, 12 Feb 2001 19:06:56 +0100 Subject: [XML-SIG] RE: Expat crashing Python Message-ID: > -----Original Message----- > From: Alexandre Fayolle [mailto:Alexandre.Fayolle@logilab.fr] > Sent: Montag, 12. 
Februar 2001 18:38 > To: Radestock, Guenter > Cc: 'XML-SIG@python.org' > Subject: Re: Expat crashing Python > > > On Mon, 12 Feb 2001, Radestock, Guenter wrote: > > > There seems to be a problem in pyexpat. It crashes, when I feed it > > a file with an incorrect XML prefix, something like: > > I think this has been fixed, or else the bug does not show up > on Linux. > > On a redhat 6.2 box, with python 1.5.2 and 4Suite0.10.2b1 > (and whatever > version of PyXML comes bundled with it), I get: > > Traceback (innermost last): > File "", line 2, in ? > xml.parsers.expat.error: syntax error: line 1, column 6 Thanks again. I tried to reproduce it, too under Linux (SuSE 7) and Windows me and another (nt) system without the XML package. I could not reproduce it on any of those systems. Mine is Win2000. Hope I can find out what the problem is, or at least reproduce it. From Alexandre.Fayolle@logilab.fr Mon Feb 12 19:33:18 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 12 Feb 2001 20:33:18 +0100 (CET) Subject: [XML-SIG] [ANN] VCalsSax : VCal parser with SAX API Message-ID: We have just released vcalsax, which provides a vcal file parser with a SAX API. It is thus possible to see such file as a DOM tree, to manipulate it as if it were some XML data, and then store it back in the native format using an XSL Transformation, or some other scheme It is easy to integrate vcalsax with the PyXML and 4Suite tools. VCal is the file format used by many calendar programs, including KOrganiser and Evolution. http://www.logilab.org/vcalsax/ ftp://ftp.logilab.org/pub/vcalsax/ Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From uche.ogbuji@fourthought.com Mon Feb 12 21:30:40 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 12 Feb 2001 14:30:40 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Don Wakefield of "Sun, 11 Feb 2001 10:35:58 PST." 
<14982.56206.740790.679411@gargle.gargle.HOWL> Message-ID: <200102122130.OAA18163@localhost.localdomain> > > >>>>> "Uche" == Uche Ogbuji writes: > > Uche> [Tale of woes snipped] > > Uche> [...] I'd suggest either using > > Uche> python setup.py install -f > > Uche> To force file overwrites or just blow away the _xmlplus directory in your > Uche> Python library and reinstall. > > Here's an interesting discrepancy. I don't *have* an _xmlplus directory > in my Python library. I instead have, starting from PYTHONHOME: > lib/python1.5/site-packages/xml. When I installed PyXML-0.6.3, I mv'ed > xml away, and sure enough, 'python setup.py install --prefix=$MYDIR' put > a new xml directory there, not _xmlplus. Sorry, I forgot that in Python 1.5 it is the xml dir not the _xmlplus dir. > I'll work on this more during the week. I'm beginning to think that I'm > missing stuff anyway. Does PyXML require anything in the Python > environment other than what comes by default? For instance, PyExpat.py > has the following line (fragment qyoted): > > raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR... > > But to the best of my knowledge, I don't have any Ft module anywhere in > my Python install... Could this be part of the problem? Hmm. I thought this was removed from PyXML 0.6.3. The Ft module is part of 4Suite. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From MichaelDyck@home.com Tue Feb 13 08:42:01 2001 From: MichaelDyck@home.com (Michael Dyck) Date: Tue, 13 Feb 2001 00:42:01 -0800 Subject: [XML-SIG] problems with PyXML 0.6.3 Message-ID: <3A88F359.991E26FD@home.com> I downloaded PyXML-0.6.3.win32-py2.0.exe and ran it. Here are some comments: The first time I ran it, it installed into my existing _xmlplus directory, which left some old files, which confused python. 
Shouldn't the installer remove or rename the existing _xmlplus dir first? xmldoc/README says it's "v0.6.2" xmldoc/README could note that if you've just run an installer, you don't have to do any of the "python setup.py ..." commands. (At least, I *think* you don't have to.) xmldoc/test: Either xmldoc/README or (new file) xmldoc/test/README should tell you how to run the tests in this dir (`python testxml.py -g', I think), and how to interpret what happens. Similarly for subdirs. Maybe tests should be run automatically on installation. I had 2 tests fail: test test_sax crashed -- exception.SystemError : 'finally' pops bad exception test test_saxdrivers crashed -- exceptions.IOError : [Errno url error] unknown url type: 'c' xmldoc/test/dom: When I tried `python test.py', I got "Error in syntax" right away. When I ran one of my DOM programs, I got this exception: from xml.dom.Node import Node ImportError: No module named Node When I tried removing the ".Node" from the import statement, the program ran as before, so apparently that is the fix, but shouldn't this be noted fairly prominently in xmldoc/README or xmldoc/README.dom? xmldoc/doc/4DOM/index.html has links to ../PACKAGES.html and ../README.html, which do not exist. -Michael Dyck From guenter.radestock@sap.com Tue Feb 13 10:10:12 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Tue, 13 Feb 2001 11:10:12 +0100 Subject: [XML-SIG] problems with PyXML 0.6.3 Message-ID: > I had 2 tests fail: > test test_sax crashed -- > exception.SystemError : 'finally' pops bad exception > test test_saxdrivers crashed -- > exceptions.IOError : [Errno url error] unknown url type: 'c' The first is the same problem I tried to reproduce yesterday. It happens only on Windows NT or Windows 2000 with the installed XML package. I only wish I had some time to look into the coding (I will try). Downgrading to an older version of the expat extension may help; the one supplied with Python2.0 does not have the problem. 
The problem is whenever you parse incorrect XML, Python may crash, instead of just raising an exception (very unfortunate e.g. in an http server process). From guenter.radestock@sap.com Tue Feb 13 10:28:53 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Tue, 13 Feb 2001 11:28:53 +0100 Subject: [XML-SIG] Bug: XML Prolog from xml.sax.writer should contain version Message-ID: The xml.sax.writer in 0.6.3 (and previous) will output a prolog like this is incorrect, according to the XML 1.0 specification and will not be parsed by expat. When outputting an encoding, the writer must also say an XML version number. I changed the code in sax/writer.py (unfortunately, I don't have diff available here):

    def startDocument(self):
        if self.__syntax.pic == "?>":
            lit = self.__syntax.lit
            s = '%sxml version="1.0" encoding%s%siso-8859-1%s' % (
                self.__syntax.pio, self.__syntax.vi, lit, lit)
            if self.__standalone:
                s = '%s standalone%s%s%s%s' % (
                    s, self.__syntax.vi, lit, self.__standalone, lit)
            self._write("%s%s\n" % (s, self.__syntax.pic))

please anybody fix this on sourceforge so it will be OK in the next release. - Guenter From don_wakefield@mentorg.com Tue Feb 13 18:28:21 2001 From: don_wakefield@mentorg.com (Don Wakefield) Date: Tue, 13 Feb 2001 10:28:21 -0800 (PST) Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102122130.OAA18163@localhost.localdomain> References: <14982.56206.740790.679411@gargle.gargle.HOWL> <200102122130.OAA18163@localhost.localdomain> Message-ID: <14985.31941.949839.528973@gargle.gargle.HOWL> >>>>> "Uche" == Uche Ogbuji writes: >> [...] When I installed PyXML-0.6.3 [...] >> raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR... >> >> But to the best of my knowledge, I don't have any Ft module anywhere >> in my Python install [...] Uche> Hmm. I thought this was removed from PyXML 0.6.3. The Ft module Uche> is part of 4Suite.
Since I downloaded the 0.6.3 tarball from Sourceforge, and PyExpat.py contains that line, there must have been a merge error... -- Don Wakefield Mentor Graphics Corporation (503) 685-1262 8005 S.W. Boeckman Road don_wakefield@mentorg.com Wilsonville, OR 97070-7777 From MichaelDyck@home.com Wed Feb 14 04:02:15 2001 From: MichaelDyck@home.com (Michael Dyck) Date: Tue, 13 Feb 2001 20:02:15 -0800 Subject: [XML-SIG] problems with PyXML 0.6.3 References: Message-ID: <3A8A0347.FE109E3C@home.com> "Radestock, Guenter" wrote: > > > I had 2 tests fail: > > test test_sax crashed -- > > exception.SystemError : 'finally' pops bad exception > > test test_saxdrivers crashed -- > > exceptions.IOError : [Errno url error] unknown url type: 'c' > > The first is the same problem I tried to reproduce yesterday. It > happens only on Windows NT or Windows 2000 with the installed > XML package. I'm using Windows 95, so you can add that to the list. -Michael Dyck From MichaelDyck@home.com Wed Feb 14 08:42:08 2001 From: MichaelDyck@home.com (Michael Dyck) Date: Wed, 14 Feb 2001 00:42:08 -0800 Subject: [XML-SIG] bug in xml.dom.Document.importNode? Message-ID: <3A8A44E0.FEBD419C@home.com> When I "import" a node from one document into another, it loses attributes. To reproduce:
-------------
from xml.dom import Document
from xml.dom.ext.reader.Sax import FromXml
from xml.dom.ext import PrettyPrint

doc1 = FromXml("")
original_node = doc1.documentElement
PrettyPrint( original_node )

doc2 = Document.Document( None )
imported_node = doc2.importNode( original_node, deep=1 )
PrettyPrint( imported_node )
-------------
prints out: This happened with Python 2.0, and also happens with PyXML 0.6.3. (I'm on Windows 95, if that makes a difference.) I think the problem is somewhere near Element.__setstate__'s call to setNamedItemNS. If someone could provide a fix or workaround, I would appreciate it.
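One possible workaround, sketched under assumptions: it uses the standard library's xml.dom.minidom on a current Python 3 as a stand-in for 4DOM, and the helper names import_with_attributes and _copy_attributes are made up for illustration. The idea is only to re-copy attributes by hand after importNode, in case the import drops them; whether the same calls behave identically under 4DOM in PyXML 0.6.3 is not confirmed here.

```python
from xml.dom import minidom


def _copy_attributes(src, dst):
    """Recursively copy element attributes from src onto the matching dst node."""
    if src.nodeType != src.ELEMENT_NODE:
        return
    attrs = src.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI:
            # Namespace-aware attribute: keep its namespace URI and qualified name.
            dst.setAttributeNS(attr.namespaceURI, attr.name, attr.value)
        else:
            dst.setAttribute(attr.name, attr.value)
    # A deep import keeps child order, so children can be paired up one to one.
    for src_child, dst_child in zip(src.childNodes, dst.childNodes):
        _copy_attributes(src_child, dst_child)


def import_with_attributes(target_doc, source_node):
    """Deep-import source_node into target_doc, then restore attributes by hand."""
    imported = target_doc.importNode(source_node, True)
    _copy_attributes(source_node, imported)
    return imported
```

With a DOM whose importNode already works, the extra copy is harmless, since it just sets the same attribute values again.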
-Michael Dyck From jere.kahanpaa@helsinki.fi Wed Feb 14 11:09:51 2001 From: jere.kahanpaa@helsinki.fi (Jere Kahanpää) Date: Wed, 14 Feb 2001 13:09:51 +0200 Subject: [XML-SIG] Unicode support problems in parsers Message-ID: <3A8A677F.B227C56B@helsinki.fi> Dear XML/Python-gurus, I've encountered a slight problem while using the otherwise quite excellent PyXML package (version 0.6.2, IIRC). One of my functions iterates through a long list of long XML files with varying encodings, which makes it quite sensitive to both memory use and Unicode issues. I'm using the DOM interface and read the XML data using

    import xml.dom.ext.reader.Sax2
    f = open('myfile')
    doc = xml.dom.ext.reader.Sax2.FromXMLStream(f)
    f.close()

Unfortunately the default parser seems to have serious memory management problems: the total amount of used memory grows by 1-2 megabytes for each processed file. A forced garbage collection (this is Py2.0) doesn't help at all. The most obvious solution was to use a different parser - we needed a validating parser anyhow. And adding the keyword 'validate=1' to the 'FromXMLStream' call did indeed solve the memory leak bug. However, an even more serious problem was now encountered; the default *validating* parser returns normal Python strings, while the default parser returns Unicode strings as any sensible XML-processing tool should do. This behaviour does cause any amount of trouble elsewhere in the code: The PrettyPrinter, for example, doesn't work at all with normal strings with non-ascii chars. I don't have the names of the parsers with problems right here, but the test runs were done on a Linux box with PyXML 0.6.2. Yours Jere Kahanpää jere.kahanpaa@helsinki.fi From Alexandre.Fayolle@logilab.fr Wed Feb 14 12:57:37 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 14 Feb 2001 13:57:37 +0100 (CET) Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: <3A8A44E0.FEBD419C@home.com> Message-ID: On Wed, 14 Feb 2001, Michael Dyck wrote: > When I "import" a node from one document into another, it loses attributes. This is a known bug in the 4DOM version that shipped with PyXML 0.6.3. It has been fixed in the CVS. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From uche.ogbuji@fourthought.com Wed Feb 14 14:20:01 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 07:20:01 -0700 Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: Message from "Radestock, Guenter" of "Tue, 13 Feb 2001 11:10:12 +0100." Message-ID: <200102141420.HAA01615@localhost.localdomain> > > I had 2 tests fail: > > test test_sax crashed -- > > exception.SystemError : 'finally' pops bad exception > > test test_saxdrivers crashed -- > > exceptions.IOError : [Errno url error] unknown url type: 'c' > > The first is the same problem I tried to reproduce yesterday. It > happens only on Windows NT or Windows 2000 with the installed > XML package. I only wish I had some time to look into the coding > (I will try). Downgrading to an older version of the expat extension > may help; the one supplied with Python2.0 does not have the problem. I think this may be the sort of problem Guido was pointing to this weekend. My guess is that you specified "c:\foo\bar.xml" as for parsing, and the software checked and saw that that file did not exist, and then tried to interpret it as a URL. So as usual, it seems the BDFL is right, but not for the reasons he originally gave. So can we think of a better algorithm than the current "check for file, and if it doesn't exist, just blindly toss it to urllib)? I personally think it's more important to be able to interpret things as URL than to interpret things as a file-name. Maybe a flag named "force_file_interpretation" or the like is in order. This problem affects 4Suite as well. 
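A rough sketch of what such a flag could look like, written for a current Python 3 rather than the Python 2.0 of this thread. The function name classify_source is made up for illustration and is not part of PyXML, 4Suite or urllib; it only decides how a name would be interpreted and opens nothing.

```python
from urllib.parse import urlsplit


def classify_source(name, force_file_interpretation=False):
    """Decide whether `name` should be opened as a local file or as a URL.

    Hypothetical helper: returns the string "file" or "url".
    """
    if force_file_interpretation:
        # The caller has said explicitly that this is a filename.
        return "file"
    scheme = urlsplit(name).scheme
    # A one-letter "scheme" such as the "c" in c:\foo\bar.xml is almost
    # certainly a Windows drive letter, so treat the name as a path.
    if len(scheme) > 1:
        return "url"
    return "file"
```

The point of the design is that the decision no longer depends on whether the file happens to exist, so a mistyped Windows path fails as a missing file instead of as "unknown url type: 'c'".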
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Wed Feb 14 14:22:56 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 07:22:56 -0700 Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: Message from Michael Dyck of "Wed, 14 Feb 2001 00:42:08 PST." <3A8A44E0.FEBD419C@home.com> Message-ID: <200102141422.HAA01626@localhost.localdomain> > When I "import" a node from one document into another, it loses attributes. > > To reproduce: > ------------- > from xml.dom import Document > from xml.dom.ext.reader.Sax import FromXml > from xml.dom.ext import PrettyPrint > > doc1 = FromXml("") > original_node = doc1.documentElement > PrettyPrint( original_node ) > > doc2 = Document.Document( None ) > imported_node = doc2.importNode( original_node, deep=1 ) > PrettyPrint( imported_node ) > ------------- > prints out: > > > > This happened with Python 2.0, and also happens with PyXML 0.6.3. > (I'm on Windows 95, if that makes a difference.) > > I think the problem is somewhere near Element.__setstate__'s call to > setNamedItemNS. > > If someone could provide a fix or workaround, I would appreciate it. I think Jeremey fixed this in 4Suite, and we'll be checking this into PyXML. Hopefully, based on all the problems reported lately, there will soon be a PyXML 0.6.4. After today's 4Suite release (yes, we're in final packaging. Hooray!) we'll be removing 4DOM from the package so it lives completely in PyXML. This should accelerate maintenance. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Juergen Hermann" Hi! I know of two SOAP implementations for Python: * soaplib.py by PythonWare, more or less beta software * Scarab - they WANT to implement SOAP, there's already a module named SOAP.py, but they're seemingly not ready yet Any other implementations you know of? Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From fdrake@acm.org Wed Feb 14 14:36:09 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 14 Feb 2001 09:36:09 -0500 (EST) Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: <200102141422.HAA01626@localhost.localdomain> References: <3A8A44E0.FEBD419C@home.com> <200102141422.HAA01626@localhost.localdomain> Message-ID: <14986.38873.297269.693330@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > After today's 4Suite release (yes, we're in final packaging. > Hooray!) we'll be removing 4DOM from the package so it lives > completely in PyXML. This should accelerate maintenance. Excellent news! Are you planning to update the PyXML CVS as soon as 4Suite is released? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From Alexandre.Fayolle@logilab.fr Wed Feb 14 14:48:42 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 14 Feb 2001 15:48:42 +0100 (CET) Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: <200102141420.HAA01615@localhost.localdomain> Message-ID: On Wed, 14 Feb 2001, Uche Ogbuji wrote: > So can we think of a better algorithm than the current "check for file, and if > it doesn't exist, just blindly toss it to urllib)? If running Windows, and the second character of the 'url' is a colon, replace it with a pipe and prepend file: to the url? > This problem affects 4Suite as well.
I had to use a similar hack when generating a CATALOG file for Narval, for use with xmlproc, since urllib would choke on C:\fooo\dtd_base\, and whine until it got C|\fooo\dtd_base\ Maybe what we need is a new function in os.path or similar that would perform the file -> URL conversion described above. This would ease the work of application writers. I, for one, would be much more at ease if I knew that no implicit assumptions are made on what I pass. If the API requires an URI/URL, then this is what it should get. Opinions? Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From Alexandre.Fayolle@logilab.fr Wed Feb 14 14:52:54 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 14 Feb 2001 15:52:54 +0100 (CET) Subject: [XML-SIG] Python SOAP Implementations In-Reply-To: Message-ID: On Wed, 14 Feb 2001, Juergen Hermann wrote: > Hi! > > I know of two SOAP implementations for Python: > * soaplib.py by PythonWare, more or less beta software We're using this in Narval. It works well. However it chokes on unicode strings, so beware if you're planning to use it with python2.0 > * Scarab - the WANT to implement SOAP, there's already a module named > SOAP.py, but they're seemingly not ready yet Do you have a URL for this one? Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From Alexandre.Fayolle@logilab.fr Wed Feb 14 14:54:44 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 14 Feb 2001 15:54:44 +0100 (CET) Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: <200102141422.HAA01626@localhost.localdomain> Message-ID: On Wed, 14 Feb 2001, Uche Ogbuji wrote: > > If someone could provide a fix or workaround, I would appreciate it. > > I think Jeremey fixed this in 4Suite, and we'll be checking this into PyXML. 
> Hopefully, based on all the problems reported lately, there will soon be a > PyXML 0.6.4. Well, I thought it was in PyXML CVS. Sorry for the misinformation. It is most certainly fixed in 4Suite. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From fdrake@acm.org Wed Feb 14 15:10:17 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 14 Feb 2001 10:10:17 -0500 (EST) Subject: [XML-SIG] Python SOAP Implementations In-Reply-To: References: Message-ID: <14986.40921.565960.792080@cj42289-a.reston1.va.home.com> Alexandre Fayolle writes: > On Wed, 14 Feb 2001, Juergen Hermann wrote: > > * Scarab - the WANT to implement SOAP, there's already a module named > > SOAP.py, but they're seemingly not ready yet > > Do you have a URL for this one? http://casbah.org/Scarab/ I don't see a date on the page stating when this was last updated, and the casbah.org pages seem old. (Ken MacLeod, can you inform us on this? Or add dates to the Web pages?) The front page at casbah.org doesn't contain a link to Scarab, and the "Casbah Glossary" link is broken (which is one place I'd expect to see a reference to Scarab). There might be more information in the download package. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From rsalz@caveosystems.com Wed Feb 14 15:25:50 2001 From: rsalz@caveosystems.com (Rich Salz) Date: Wed, 14 Feb 2001 10:25:50 -0500 Subject: [XML-SIG] problems with PyXML 0.6.3 References: Message-ID: <3A8AA37E.36AD9DF6@caveosystems.com> > If running windows, and the second character of the 'url' is a colon, > replace it with a pipe and prepend file: to the url? Yes, it *IS* really gross, but internal Windows code does this; I've seen it, as part of a DCOM port (monikers, anyone?). The test is

    if (isalpha(name[0]) && name[1] == ':') ...

actually, it might be isupper not isalpha, I can't recall.
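In Python, the same test plus the colon-to-pipe rewrite suggested above might look roughly like this. It is a sketch for a current Python 3; the function names looks_like_dos_path and dos_path_to_file_url are invented for illustration and do not exist in os.path or urllib.

```python
import string


def looks_like_dos_path(name):
    """True if name starts with an ASCII drive letter followed by a colon."""
    return len(name) >= 2 and name[0] in string.ascii_letters and name[1] == ":"


def dos_path_to_file_url(name):
    """Rewrite a drive-letter path as a file: URL using the pipe convention.

    Anything that does not look like a drive-letter path is returned unchanged.
    """
    if not looks_like_dos_path(name):
        return name
    drive = name[0]
    rest = name[2:].replace("\\", "/")  # URLs use forward slashes
    if not rest.startswith("/"):
        rest = "/" + rest
    return "file:///" + drive + "|" + rest
```

Note that a genuine URL with a one-letter scheme would be misread as a drive letter by this check, which is the ambiguity being discussed in this thread.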
/r$ From fdrake@acm.org Wed Feb 14 15:47:21 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 14 Feb 2001 10:47:21 -0500 (EST) Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: References: <200102141420.HAA01615@localhost.localdomain> Message-ID: <14986.43145.945513.277893@cj42289-a.reston1.va.home.com> Alexandre Fayolle writes: > Maybe what we need is a new function in os.path or similar that would > perform the file -> URL conversion described above. This would ease the > work of application writers. I, for one, would be much more at ease if I > knew that no implicit assumptions are made on what I pass. If the API > requires an URI/URL, then this is what it should get. I started to write a response saying "take a look at urllib.pathname2url()", but upon thinking more about it and chatting with Guido on the topic, have concluded that that's not the right response. Aside from urllib.pathname2url() being undocumented. ;) What we decided was that while the "XML world" uses URIs for system identifiers, it still doesn't make a lot of sense for the Python APIs to hide the distinction between URLs and filenames (and URNs, if you're using those). What it comes down to is that there is no way to ensure proper conversion from a filename to a URL for an arbitrary system, and the application will generally need to know the difference anyway. There are two places which need to feed data to an XML parser: the public API which tells it to start parsing, and the internal entity management. The latter can either be disabled (or nonexistent), or should allow the application to provide an entity manager which can do whatever makes sense with regard to opening network resources. From this, it is reasonable to infer that we should be able to provide data to the parser by passing it a string and/or a file object. Anything which opens a file based on a filename or URL is a convenience method, and the URL and filename forms should be distinct.
(And let's face it: while urllib may be a convenient entity manager, it's not an efficient one!) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From ken@bitsko.slc.ut.us Wed Feb 14 17:30:52 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 14 Feb 2001 11:30:52 -0600 Subject: [XML-SIG] Python SOAP Implementations In-Reply-To: "Fred L. Drake, Jr."'s message of "Wed, 14 Feb 2001 10:10:17 -0500 (EST)" References: <14986.40921.565960.792080@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Alexandre Fayolle writes: > > On Wed, 14 Feb 2001, Juergen Hermann wrote: > > > * Scarab - the WANT to implement SOAP, there's already a module named > > > SOAP.py, but they're seemingly not ready yet > > > > Do you have a URL for this one? > > > http://casbah.org/Scarab/ > > I don't see a date on the page stating when this was last updated, > and the casbah.org pages seem old. (Ken MacLeod, can you inform us > on this? Or add dates to the Web pages?) The front page at > casbah.org doesn't contain a link to Scarab, and the "Casbah > Glossary" link is broken (which is one place I'd expect to see a > reference to Scarab). > There might be more information in the download package. The Scarab comm library went on hold when I went to rewrite some of the underlying code (Casbah as a whole went on hold quite a bit earlier :(. That underlying code has resulted in Orchard[1] which has a couple of features that not uncoincidentally make working with SOAP (in particular, XML Namespaces) a *lot* easier. The Orchard/Python implementation includes a new SOAP client[2,3] module that we're using successfully with Apache SOAP. This module supports both SOAP pickling and RPC over HTTP. Note: I just found a bug last week: we're not encoding &<>"'. Doh! A SOAP server would be similarly easy, but we haven't penciled it in yet. The pure Python implementation of Orchard was written as an API prototype.
Eventually it will go away in favor of the Mostly-C bridge, and SOAP will be ported to Mostly-C (can you say *screaming fast*? ;-). SOAP encoding will be the standard XML pickling format for Orchard, so it will be a heavily used module, with corresponding levels of maintenance and support. Although we're not expecting to maintain the pure Python implementation moving forward, this implementation is well tested so I can recommend using it until the Mostly-C bridge is available. -- Ken [1] [2] [3] From uche.ogbuji@fourthought.com Wed Feb 14 18:40:46 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 11:40:46 -0700 Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: Message from "Fred L. Drake, Jr." of "Wed, 14 Feb 2001 09:36:09 EST." <14986.38873.297269.693330@cj42289-a.reston1.va.home.com> Message-ID: <200102141840.LAA02710@localhost.localdomain> > > Uche Ogbuji writes: > > After today's 4Suite release (yes, we're in final packaging. > > Hooray!) we'll be removing 4DOM from the package so it lives > > completely in PyXML. This should accelerate maintenance. > > Excellent news! > Are you planning to update the PyXML CVS as soon as 4Suite is > released? Yep. And you'll be happy to know isSameNode() is implemented (but not documented). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Wed Feb 14 18:43:01 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 11:43:01 -0700 Subject: [XML-SIG] Python SOAP Implementations In-Reply-To: Message from Alexandre Fayolle of "Wed, 14 Feb 2001 15:52:54 +0100." Message-ID: <200102141843.LAA02730@localhost.localdomain> > On Wed, 14 Feb 2001, Juergen Hermann wrote: > > > Hi! 
> > > > I know of two SOAP implementations for Python: > > * soaplib.py by PythonWare, more or less beta software > > We're using this in Narval. It works well. However it chokes on unicode > strings, so beware if you're planning to use it with python2.0 Note, /F says they are working on soaplib 0.9.5. Sounds as if they have a hurdle or two, but it will probably emerge soon. I imagine, based on his involvement with Python/Unicode that the next release will have better UNicode support. Just in case, I'd send him e-mail mentioning the need. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Wed Feb 14 18:41:41 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 14 Feb 2001 13:41:41 -0500 (EST) Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: <200102141840.LAA02710@localhost.localdomain> References: <14986.38873.297269.693330@cj42289-a.reston1.va.home.com> <200102141840.LAA02710@localhost.localdomain> Message-ID: <14986.53605.447049.324038@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > Yep. And you'll be happy to know isSameNode() is implemented (but not > documented). Even better! I'm not worried about the 4Suite documentation since the Python DOM API spec (in the Python Library Reference under "xml.dom") covers it. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Wed Feb 14 18:57:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 11:57:47 -0700 Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: Message from "Fred L. Drake, Jr." of "Wed, 14 Feb 2001 13:41:41 EST." <14986.53605.447049.324038@cj42289-a.reston1.va.home.com> Message-ID: <200102141857.LAA02816@localhost.localdomain> > > Uche Ogbuji writes: > > Yep. 
And you'll be happy to know isSameNode() is implemented (but not > > documented). > > Even better! I'm not worried about the 4Suite documentation since > the Python DOM API spec (in the Python Library Reference under > "xml.dom") covers it. ;-) I don't see it. At least not in the Node interface docs. I think we should carefully mark this, since it could possibly change or even go away before DOM Level 3 goes gold. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Wed Feb 14 19:01:34 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 14 Feb 2001 20:01:34 +0100 (CET) Subject: [XML-SIG] Python SOAP Implementations In-Reply-To: Message-ID: On Wed, 14 Feb 2001, Juergen Hermann wrote: > Hi! > > I know of two SOAP implementations for Python: > Any other implementations you know of? http://python.scripting.com/directory/13/soap/implementations lists a few things under that topic, but it looks like it's mostly java and perl stuff. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From Alexandre.Fayolle@logilab.fr Wed Feb 14 19:04:17 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 14 Feb 2001 20:04:17 +0100 (CET) Subject: [XML-SIG] Python SOAP Implementations In-Reply-To: <200102141843.LAA02730@localhost.localdomain> Message-ID: On Wed, 14 Feb 2001, Uche Ogbuji wrote: > > We're using this in Narval. It works well. However it chokes on unicode > > strings, so beware if you're planning to use it with python2.0 > > Note, /F says they are working on soaplib 0.9.5. Sounds as if they have a > hurdle or two, but it will probably emerge soon. 
> > I imagine, based on his involvement with Python/Unicode that the next release > will have better UNicode support. > > Just in case, I'd send him e-mail mentioning the need. I'm pretty sure that he's aware of this: it is explicitly mentioned on http://www.pythonware.com/products/soap/profile.htm : "soaplib.py only supports 8-bit character sets. Future versions will add support for arbitrary character sets (but only under Python 1.6)." Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From fdrake@acm.org Wed Feb 14 19:35:17 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 14 Feb 2001 14:35:17 -0500 (EST) Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: <200102141857.LAA02816@localhost.localdomain> References: <14986.53605.447049.324038@cj42289-a.reston1.va.home.com> <200102141857.LAA02816@localhost.localdomain> Message-ID: <14986.56821.158358.561035@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > I don't see it. At least not in the Node interface docs. It's in the CVS version, so it becomes part of the "official" API in Python 2.1. > I think we should carefully mark this, since it could possibly > change or even go away before DOM Level 3 goes gold. That's not necessarily the case for the Python bindings, but a note about it shouldn't hurt. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Wed Feb 14 19:44:46 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 12:44:46 -0700 Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: Message from Alexandre Fayolle of "Wed, 14 Feb 2001 15:48:42 +0100." Message-ID: <200102141944.MAA02965@localhost.localdomain> > On Wed, 14 Feb 2001, Uche Ogbuji wrote: > > > So can we think of a better algorithm than the current "check for file, and if > > it doesn't exist, just blindly toss it to urllib)?
> > If running windows, and the second character of the 'url' is a colon, > replace it with a pipe and prepend file: to the url? > > > This problem affects 4Suite as well. > > I had to use a similar hack when generating a CATALOG file for Narval, for > use with xmlproc, since urllib would choke on C:\fooo\dtd_base\, and whine > until it got C|\fooo\dtd_base\ > > Maybe what we need is a new function in os.path or similar that would > perform the file -> URL conversion described above. This would ease the > work of application writers. I, for one, would be much more at ease if I > knew that no implicit assumptions are made on what I pass. If the API > requires an URI/URL, then this is what it should get. Here's what Tom Passim suggested to us a while back """ - Handle "file:" with no slashes because rightly or wrongly they're often used. - For Windows, allow constructions like file:///c|... even though it isn't in the rfc, because this form too is used a lot (Who started it, Netscape or Tim BL??)(The rfc doesn't require or suggest replacing a colon with a bar). - For Windows, treat file:///c:\.... as an opaque url and just use the embedded path literally. - For Windows, treat file:///c:/... as a parsable path starting at c:\, or at least replace the forward with back slashes. - Make sure that file://localhost/ acts the same as file:/// because the rfc says to do so. - What have I missed? Something for the Mac? """ I meant to implement these heuristics for 4Suite, but I forgot. Any comments? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rvprasad@cis.ksu.edu Wed Feb 14 21:00:25 2001 From: rvprasad@cis.ksu.edu (Venkatesh Prasad Ranganath) Date: 14 Feb 2001 15:00:25 -0600 Subject: [XML-SIG] Re: DOM creation In-Reply-To: Uche Ogbuji's message of "Wed, 14 Feb 2001 14:12:30 GMT" References: <3A8A9254.C4F9FDC@ogbuji.net> Message-ID: The following message is a courtesy copy of an article that has been posted to comp.lang.python as well. >>>>> "Uche" == Uche Ogbuji writes: Uche> Venkatesh Prasad Ranganath wrote: >> I have a question on how DOM for a XML document conforming to DOM 2 >> should be constructed? >> >> Now if there are no namespaces specified in the document then should >> attributes be added to DOM using setAttributeNS('', Name, Value) or >> setAttribute(Name, Value)? >> >> The problem I am facing is when reading in a XML document with no >> explicit namespace specified in it through PyXML the attributes are added >> to the DOM using setAttributeNS with an empty NameSpace. So, I wanted to >> clarify if this is a problem with PyXML or is this how other DOM >> Constructors work. Uche> This is correct behavior. Of course, if you use the Uche> xml.dom.ext.reader.Sax reader, you get a tree with no namespace Uche> specifiers at all, which is also correct. Uche> If you plan to migrate to namespaces in future, or to mix namespace Uche> with non-namespace behavior, I'd suggest sticking to the PyExpat and Uche> Sax2 readers and using the DOM Level 2 methods (with appended "NS"). If this is the case then should or shouldn't set/getAttribute() in DOM2 "intelligently" assume empty namespace ('')? Also, is there any specification on construction of DOM for XML documents? Or do the DOM specs available at W3C describe the construction process? If so, can somebody tell me in which section? >> waiting for reply, Uche> This reminds me. I'm not sure I sent a reply to your last enquiry.
Uche> I had a few questions, such as which Reader you were trying to use (It Uche> looked as if you didn't paste all of your example code in). Uche> However, I'd suggest trying the latest 4Suite 0.10.2 beta that was Uche> announced (since I noticed you're using XPath), and see if you still Uche> have those problems. If so, pleace copy follow-ups to Uche> xml-sig@python.org, which I check more regularly than this newsgroup. I guess it works fine on with 0.10.2b. Thankx -- Venkatesh Prasad Ranganath From fdrake@acm.org Wed Feb 14 21:17:35 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 14 Feb 2001 16:17:35 -0500 (EST) Subject: [XML-SIG] Re: DOM creation In-Reply-To: References: <3A8A9254.C4F9FDC@ogbuji.net> Message-ID: <14986.62959.130541.7707@cj42289-a.reston1.va.home.com> Venkatesh Prasad Ranganath writes: > If this is the case then should or shouldn't set/getAttribute() in > DOM2 "intelligently" assume empty namespace ('')? The level 1 methods should ignore the namespaceURI attribute and use only the nodeName attribute when matching against existing nodes, and new nodes created via setAttribute() should have a namespaceURI of None (the Python way to spell the "empty" namespace). > Also, is there any specification on construction of DOM for XML > documents? Or does the DOM specs available at W3C describe the > construction process? If so, can somebody tell me in which > section? I presume you mean from a string or file containing marked text rather than programmatically via DOM node constructors and tree-manipulation methods. There is some effort being made in the DOM Level 3 working drafts which covers this, but that's still pretty raw. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Wed Feb 14 21:52:06 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 14:52:06 -0700 Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: Message from Uche Ogbuji of "Wed, 14 Feb 2001 12:44:46 MST." 
<200102141944.MAA02965@localhost.localdomain> Message-ID: <200102142152.OAA03320@localhost.localdomain> > > On Wed, 14 Feb 2001, Uche Ogbuji wrote: > Here's what Tom Passim suggested to us a while back My apologies to Tom Passin. I also spelled Jeremy's name wrongly today. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Wed Feb 14 23:01:54 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 14 Feb 2001 16:01:54 -0700 Subject: [XML-SIG] bug in xml.dom.Document.importNode? In-Reply-To: Message from Michael Dyck of "Wed, 14 Feb 2001 00:42:08 PST." <3A8A44E0.FEBD419C@home.com> Message-ID: <200102142301.QAA03637@localhost.localdomain> > When I "import" a node from one document into another, it loses attributes. > If someone could provide a fix or workaround, I would appreciate it. This is another bug fixed in 4DOM CVS: from xml.dom import Document from xml.dom.ext.reader import PyExpat from xml.dom.ext import PrettyPrint reader = PyExpat.Reader() doc1 = reader.fromString("") original_node = doc1.documentElement PrettyPrint( original_node ) doc2 = Document.Document( None ) imported_node = doc2.importNode( original_node, deep=1 ) PrettyPrint(imported_node) prints We're wrapping up the 4Suite release, which will have the fix, then we'll check in to PyXML CVS to propagate the fix there. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Thu Feb 15 01:02:49 2001 From: tpassin@home.com (Thomas B. 
Passin)
Date: Wed, 14 Feb 2001 20:02:49 -0500
Subject: [XML-SIG] problems with PyXML 0.6.3
References:
Message-ID: <001901c096eb$04a9ed20$7cac1218@reston1.va.home.com>

This file: business is trickier than it seems, because the RFC is ambiguous for file: urls. A pipe character isn't in the rfc at all even though it's used by some of the browsers.

I strongly suggest that when a local file is intended, one should use the file: scheme. That way, the application doesn't have to guess and it won't try a spurious url if the file isn't found. The way it's done in this example is just asking for continuous trouble, as I guess we're seeing now.

I think we should come to an agreement with the maintainer of the urllib about the allowed forms for file: schemes. It's mainly on Windows (and, perhaps, Macs) that there would be a problem. My preferred forms are these, for a file at d:\temp\python\thefile.xml -

1) file:///d:/temp/python/thefile.xml

2) file:///d:\temp\python\thefile.xml

Both of these comply fully with the rfc. 2) is an "opaque" form - no further parsing would be done by the url processor, it would just pass it to the os. 1) is what you get according to the rfc when you want the url processor to be able to parse out the path parts. The processor is supposed to know to replace slashes by backslashes if appropriate for the os.

Either 1) or 2) would also work for files on a network file system, if you put the host name in there -

file://host/temp/python/thefile.xml

1) would be more portable, and is my preference. The processor should be able to handle both, however.

For backwards compatibility, form 3) should also be accepted, I suppose:

3) file:d:\temp\python\thefile.xml

This could be negotiated, though.

Let's agree on this and get it working right!
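To make form 1) concrete, here is a minimal sketch of the kind of file -> URL helper discussed in this thread. It is only an illustration under stated assumptions: the function name windows_path_to_file_url is hypothetical (nothing like it is proposed by name above), it is written for a present-day Python rather than the Python 2.0 of this thread, and it only handles absolute drive-letter paths. It keeps the drive letter and colon, as in form 1) as written in this message.

```python
from urllib.parse import quote


def windows_path_to_file_url(path):
    """Turn an absolute Windows path into a file: URL (hypothetical helper).

    Assumption: the path starts with a drive letter, a colon and a
    separator, e.g. d:\\temp\\python\\thefile.xml. UNC paths and relative
    paths are deliberately rejected instead of being guessed at.
    """
    looks_absolute = (
        len(path) >= 3
        and path[0].isalpha()
        and path[1] == ":"
        and path[2] in "\\/"
    )
    if not looks_absolute:
        raise ValueError("expected an absolute drive-letter path: %r" % (path,))

    drive, rest = path[:2], path[2:]
    # Backslashes become forward slashes so the URL path can be parsed.
    rest = rest.replace("\\", "/")
    # Percent-encode anything that is not legal in a URL path (spaces, etc.).
    return "file:///" + drive + quote(rest)


if __name__ == "__main__":
    print(windows_path_to_file_url(r"d:\temp\python\thefile.xml"))
```

The design choice is to refuse anything that is not clearly a local absolute path, so that the caller, not the library, decides what to do with ambiguous input.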
Cheers, Tom P Alexandre Fayolle wrote - > On Wed, 14 Feb 2001, Uche Ogbuji wrote: > > > So can we think of a better algorithm than the current "check for file, and if > > it doesn't exist, just blindly toss it to urllib)? > > If running windows, and the second character of the 'url' is a colon, > replace it with a pipe and prepend file: to the url? > > > This problem affects 4Suite as well. > > I had to use a similar hack when generating a CATALOG file for Narval, for > use with xmlproc, since urllib would choke on C:\fooo\dtd_base\, and whine > until it got C|\fooo\dtd_base\ > > Maybe what we need is a new function in os.path or similar that would > perform the file -> URL conversion described above. This would ease the > work of application writers. I, for one, would be much more at ease if I > knew that no implicit assumptions are made on what I pass. If the API > requires an URI/URL, then this is what it should get. > > Opinions? From tpassin@home.com Thu Feb 15 03:40:52 2001 From: tpassin@home.com (Thomas B. Passin) Date: Wed, 14 Feb 2001 22:40:52 -0500 Subject: [XML-SIG] problems with PyXML 0.6.3 References: <001901c096eb$04a9ed20$7cac1218@reston1.va.home.com> Message-ID: <001e01c09701$19653b00$7cac1218@reston1.va.home.com> Sorry, for style 1) I meant this instead: 1) file:///d/temp/python/thefile.xml Using this style, the root of the path would be d/, and you don't need the colon. > I think we should come to an agreement with the maintainer of the urllib about > the allowed forms for file: schemes. It's mainly on Windows (and, perhaps, > Macs) that there would be a problem. 
My preferred forms are these, for a file > at d:\temp\python\thefile.xml - > > 1) file:///d:/temp/python/thefile.xml > > 2) file:///d:\temp\python\thefile.xml > From uche.ogbuji@fourthought.com Thu Feb 15 07:22:41 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 15 Feb 2001 00:22:41 -0700 Subject: [XML-SIG] Murphy strikes again Message-ID: <200102150722.AAA00484@localhost.localdomain> Our final tests turned up some more work needed in ODS and elsewhere. Since we're trying to be extra-cautious with this release, and the others on the road to 1.0, we decided to hold off for a little more trouble-shooting and testing. Unfortunately we have other obligations in the morning, so it could be until Friday before the next release. I'll try to post another beta with today's fixes tomorrow morning. Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From l.szyster@ibm.net Thu Feb 15 12:36:10 2001 From: l.szyster@ibm.net (Laurent Szyster) Date: Thu, 15 Feb 2001 13:36:10 +0100 Subject: [XML-SIG] Python SOAP Implementations References: Message-ID: <3A8BCD3A.7D9BF7CC@ibm.net> Juergen Hermann wrote: > > I know of two SOAP implementations for Python: > * soaplib.py by PythonWare, more or less beta software > * Scarab - the WANT to implement SOAP, there's already a module named > SOAP.py, but they're seemingly not ready yet > > Any other implementations you know of? > I've wrote a small SOAP server prototype for a customer, using Medusa, pyexpat and a simplistic DOM based on qp_xml.py (from Greg Stein). But I did not implement a SOAP library (something that instanciate objects from an XML stream and reverse). 
The technique used for processing a SOAP request is to pass a simple DOM instance to a function (actually, to call the __call__ method of the DOM instance), along with a file-like instance to which the response SOAP envelope is "printed".

    class SOAP_request (DOM.DOM):
        def __call__ (dom, stdout):

It's then up to this function to walk down the tree for parameters, do what the procedure must do and output a SOAP envelope response.

I cannot publish this prototype code, but I'm ready to share my experience with the Apache SOAP toolkit.

Laurent Szyster

From guenter.radestock@sap.com Thu Feb 15 14:34:35 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Thu, 15 Feb 2001 15:34:35 +0100
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
Message-ID:

I have tried to find the problem in the expat parser module that comes with PyXML-0.6.3 and that leads to Python crashes on Windows when an exception is thrown while parsing incorrect stuff like:

    from xml.parsers import expat
    import sys
    po = expat.ParserCreate('ISO-8859-1')
    po.Parse(u'', 1)

(The xml version is missing in the above example)

I have found the following:

1. The problems will go away if you remove the _xmlplus/parsers/pyexpat.pyd extension. Then the extension supplied with Python2.0 will be used. Because this has fewer features, things like SAX2 will probably not work any more, but xml.parsers.expat will be usable, as well as features of the XML package that do not require expat.

2. In the file pyexpat.c, the variable "ErrorObject" is not initialized (there is a test for null in the init method of the module). This is clearly a bug, but unfortunately not the (only) source of the problem. ErrorObject should be declared as:

    static PyObject *ErrorObject = NULL;

3. Inserting debug prints into the function xmlparse_Parse() shows that the pointer ErrorObject gets destroyed while parsing the incorrect XML. It does not get destroyed when correct XML is parsed.

4.
If I put the line

    static int *willNotBeUsed;

immediately after the declaration of ErrorObject, the module becomes more stable - it did not crash anymore with my tests. This cannot be the solution, though.

I have no idea right now how to get this straight and little experience in debugging, and would appreciate it a lot if somebody else could look into this. This may be a problem with expat itself and not the module?

- Guenter

From noreply@sourceforge.net Thu Feb 15 16:46:39 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 15 Feb 2001 08:46:39 -0800
Subject: [XML-SIG] [Bug #132541] xml.dom.WrongDocumentErr is missing redefinition of __init__
Message-ID:

Bug #132541, was updated on 2001-Feb-15 08:46
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : nobody
Summary: xml.dom.WrongDocumentErr is missing redefinition of __init__

Details: xml.dom.WrongDocumentErr (defined in xml/dom/__init__.py) is missing the following line:

    __init__ = DOMException._derived_init

Trying to raise xml.dom.WrongDocumentErr() will therefore fail.
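The report does not show the surrounding source, so the following is only a guess at the general pattern, written for a present-day Python with stand-in classes. Everything except the names DOMException, _derived_init and the rebinding line quoted above is an assumption: the constructor signature, the "code" class attribute and the Broken/Fixed class names are hypothetical and are not PyXML's actual code.

```python
# Hypothetical stand-ins: the real classes live in PyXML's
# xml/dom/__init__.py, whose contents are not shown in the bug report.
WRONG_DOCUMENT_ERR = 4  # numeric code the DOM assigns to this error


class DOMException(Exception):
    """Base class whose constructor needs an explicit numeric code."""

    def __init__(self, code, msg=""):
        Exception.__init__(self, msg)
        self.code = code

    def _derived_init(self, msg=""):
        # Subclasses rebind __init__ to this method, so they accept only a
        # message and take the code from their own class attribute.
        DOMException.__init__(self, self.code, msg)


class BrokenWrongDocumentErr(DOMException):
    # Rebinding is missing: __init__(self, code, msg="") is inherited,
    # so calling the class with no arguments raises TypeError.
    code = WRONG_DOCUMENT_ERR


class FixedWrongDocumentErr(DOMException):
    code = WRONG_DOCUMENT_ERR
    __init__ = DOMException._derived_init
```

Under these assumptions, BrokenWrongDocumentErr() fails at construction time for lack of a code argument, while FixedWrongDocumentErr() can be raised with or without a message.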
For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=132541&group_id=6473 From uche.ogbuji@fourthought.com Thu Feb 15 18:16:44 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 15 Feb 2001 11:16:44 -0700 Subject: [XML-SIG] Python Web Services Column Message-ID: <200102151816.LAA09197@localhost.localdomain> Also wanted to note that Mike and I are newly columnists on the Web Services Zone of IBM developerWorks: http://www-106.ibm.com/developerworks/webservices/ The column is called "The Python Web services developer" First installment is at http://www-106.ibm.com/developerworks/library/ws-pyth1.html?dwzone=ws Blurb: "Python's motto has always been "batteries included," referring to the large array of standard libraries and facilities that come with the language installation. This article presents an overview and survey of tools and facilities available for Web services development in Python. This includes built-in Python features and third-party open-source tools." Unfortunately, we only mentioned Ken MacLeod's Scarab, and not Orchard. Didn't know any better. Next time. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Thu Feb 15 19:53:23 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Thu, 15 Feb 2001 20:53:23 +0100 Subject: [XML-SIG] broken expat module in PyXML-0.6.3 In-Reply-To: (guenter.radestock@sap.com) References: Message-ID: <200102151953.f1FJrNM02423@mira.informatik.hu-berlin.de> > I have tried to find the problem in the expat parser module that > comes with PyXML-0.6.3 and that leads to Python crashes on > Windows when an exception is thrown while parsing incorrect > stuff like I believe this bug is fixed on both Python CVS and PyXML CVS: an array should have 257 instead of 256 elements. You can either take the corrected version from the CVS, or wait for 0.6.4. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Feb 15 20:05:44 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 15 Feb 2001 21:05:44 +0100 Subject: [XML-SIG] windows installer for XML package failing on Windows 95 In-Reply-To: (guenter.radestock@sap.com) References: Message-ID: <200102152005.f1FK5iK02500@mira.informatik.hu-berlin.de> > I tried to install the XML package onto a Windoze 95 box a few days > ago and it does not work. The installer crashes without unpacking > source or opening any window. This may be a distutils issue. It certainly sounds like one. I recommend to contact the author of the bdist_wininst command, Thomas Heller; or to post a message to the distutils SIG. I believe I've used distutils 1.0 to create teh installer. > First: I can successfully unpack the executable with winzip and move > the package directory into Python20/Lib. This seems to work, but I > am not sure if I should also patch any existing files. Is there a > script inside the installer that I should run after unpacking? No, nothing. The installer does not support any post-processing, AFAIK. > I did not find a setup.py; the source package won't help me because > I would have to install a compiler for the extensions, right? Right. > 1. The installer crashes only on this one Libretto 50ct Laptop with > Windows 95, second edition. 
I have successfully used it on other > Windows computers. Unfortunately, this is a multi-level bootstrapping. The installer GUI might use some Windows DLLs or Windows API in the wrong way. However, the installer itself is compressed with an auto-uncompression program, which might also fail. > 2. Before installing the XML package, I first removed Python 1.5.2, then > removed the TCL/TK that came with 1.5.2, then installed Python 2.0. I did > not have Python 1.5.2 on the other systems I installed the package on. > I also have an older (don't rememver the exact) version of Winzip on the > Libretto - can the Winzip DLL be the source of my problem? Unlikely. The installer has the InfoZip library statically linked. Regards, Martin From karl@digicool.com Fri Feb 16 00:31:21 2001 From: karl@digicool.com (Karl Anderson) Date: 15 Feb 2001 16:31:21 -0800 Subject: [XML-SIG] Python IDL mapping reference? Message-ID: I can't find the Python IDL mapping reference that IIRC used to be in the xml-sig area. Could someone send me an URL? -- Karl Anderson karl@digicool.com From martin@loewis.home.cs.tu-berlin.de Fri Feb 16 07:11:17 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 16 Feb 2001 08:11:17 +0100 Subject: [XML-SIG] Python IDL mapping reference? In-Reply-To: (message from Karl Anderson on 15 Feb 2001 16:31:21 -0800) References: Message-ID: <200102160711.f1G7BHm00848@mira.informatik.hu-berlin.de> > I can't find the Python IDL mapping reference that IIRC used to be in > the xml-sig area. Could someone send me an URL? Not sure where it was supposed to be in the xml-sig area, but the OMG-adopted Python language mapping is at http://cgi.omg.org/cgi-bin/doc?ptc/00-04-08 Regards, Martin From guenter.radestock@sap.com Fri Feb 16 08:47:24 2001 From: guenter.radestock@sap.com (Radestock, Guenter) Date: Fri, 16 Feb 2001 09:47:24 +0100 Subject: [XML-SIG] broken expat module in PyXML-0.6.3 Message-ID: > -----Original Message----- > From: Martin v. 
Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
> Sent: Thursday, 15 February 2001 20:53
> To: Radestock, Guenter
> Cc: XML-SIG@python.org
> Subject: Re: [XML-SIG] broken expat module in PyXML-0.6.3
>
>
> > I have tried to find the problem in the expat parser module that
> > comes with PyXML-0.6.3 and that leads to Python crashes on
> > Windows when an exception is thrown while parsing incorrect
> > stuff like
>
> I believe this bug is fixed on both Python CVS and PyXML CVS: an array
> should have 257 instead of 256 elements.
>
> You can either take the corrected version from the CVS, or wait for
> 0.6.4.

Thanks a lot. I got the corrected file from CVS. Unfortunately, it does not compile (revision 1.31 of pyexpat.c) because my_StartElementHandler() is defined twice (from a macro and literally). I deleted one definition (the literal one at the top of the file) and it seems the problem has gone away.

- Guenter

From Juergen Hermann" Message-ID:

On Fri, 16 Feb 2001 08:11:17 +0100, Martin v. Loewis wrote:

>> I can't find the Python IDL mapping reference that IIRC used to be
>Not sure where it was supposed to be in the xml-sig area, but the
>OMG-adopted Python language mapping is at
>
>http://cgi.omg.org/cgi-bin/doc?ptc/00-04-08

Great, so far we only had two URLs with the original spec and the corrections for it. BTW, Martin, is there anything you are NOT involved with? ;)

Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/

From noreply@sourceforge.net Fri Feb 16 11:36:46 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 16 Feb 2001 03:36:46 -0800
Subject: [XML-SIG] [Bug #132683] DOMImplementation.hasFeature('Core', None) returns 0
Message-ID:

Bug #132683, was updated on 2001-Feb-16 03:36
Here is a current snapshot of the bug.
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: mjpieters Assigned to : nobody Summary: DOMImplementation.hasFeature('Core', None) returns 0 Details: The following is wrong: > python Python 1.5.2 (#0, Dec 27 2000, 14:53:01) [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> from xml.dom import implementation >>> implementation.hasFeature('Core', None) 0 >>> The spec says that any DOM implementation compliant with the DOM API should at least implement the 'Core' feature set; PyXML certainly does, so the call to hasFeature should succeed. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=132683&group_id=6473 From Dan.Rolander@marriott.com Fri Feb 16 19:03:58 2001 From: Dan.Rolander@marriott.com (Rolander, Dan) Date: Fri, 16 Feb 2001 14:03:58 -0500 Subject: [XML-SIG] windows installer for XML package failing on Window s 95 Message-ID: <6176E3D8E36FD111B58900805FA7E0F80CCF63A9@mcnc-mdm1-ex01> There are two possible problems. Either your missing MSVCRT.DLL, or you need to update COMCTL32.DLL. The latter is probably the problem, because if you're missing the first dll you'll get a warning telling you that, but if you have an older version of comctl32.dll the installer will crash (I had this same problem). You can get the update from http://www.microsoft.com/msdownload/ieplatform/ie/comctrlx86.asp. There are installers for IE 4.01 and IE 5.0. HTH, Dan -----Original Message----- From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de] Sent: Thursday, February 15, 2001 3:06 PM To: guenter.radestock@sap.com Cc: XML-SIG@python.org Subject: Re: [XML-SIG] windows installer for XML package failing on Windows 95 > I tried to install the XML package onto a Windoze 95 box a few days > ago and it does not work. The installer crashes without unpacking > source or opening any window. 
This may be a distutils issue. It certainly sounds like one. I recommend to contact the author of the bdist_wininst command, Thomas Heller; or to post a message to the distutils SIG. I believe I've used distutils 1.0 to create teh installer. > First: I can successfully unpack the executable with winzip and move > the package directory into Python20/Lib. This seems to work, but I > am not sure if I should also patch any existing files. Is there a > script inside the installer that I should run after unpacking? No, nothing. The installer does not support any post-processing, AFAIK. > I did not find a setup.py; the source package won't help me because > I would have to install a compiler for the extensions, right? Right. > 1. The installer crashes only on this one Libretto 50ct Laptop with > Windows 95, second edition. I have successfully used it on other > Windows computers. Unfortunately, this is a multi-level bootstrapping. The installer GUI might use some Windows DLLs or Windows API in the wrong way. However, the installer itself is compressed with an auto-uncompression program, which might also fail. > 2. Before installing the XML package, I first removed Python 1.5.2, then > removed the TCL/TK that came with 1.5.2, then installed Python 2.0. I did > not have Python 1.5.2 on the other systems I installed the package on. > I also have an older (don't rememver the exact) version of Winzip on the > Libretto - can the Winzip DLL be the source of my problem? Unlikely. The installer has the InfoZip library statically linked. 
Regards, Martin _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig From jeremy.kloth@fourthought.com Fri Feb 16 20:23:39 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Fri, 16 Feb 2001 13:23:39 -0700 Subject: [XML-SIG] broken expat module in PyXML-0.6.3 References: Message-ID: <3A8D8C4B.3B16F3D5@fourthought.com> "Radestock, Guenter" wrote: > > Unfortunately, it does not compile (revision 1.31 of pyexpat.c) because > my_StartElementHandler() is defined twice (from a macro and literally). > I deleted one definition (the literal one at the top of the file) and > it seems the problem has gone away. > Is the literal handler there for Expat 1.95? If so, we should probably have a #if..#endif around it for that version. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Fri Feb 16 20:35:41 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 16 Feb 2001 15:35:41 -0500 (EST) Subject: [XML-SIG] broken expat module in PyXML-0.6.3 In-Reply-To: <3A8D8C4B.3B16F3D5@fourthought.com> References: <3A8D8C4B.3B16F3D5@fourthought.com> Message-ID: <14989.36637.359790.864097@cj42289-a.reston1.va.home.com> Jeremy Kloth writes: > Is the literal handler there for Expat 1.95? If so, we should probably > have > a #if..#endif around it for that version. Actually, the literal handler should be there for all versions, and the macro-ized version should be removed. I'll get this fixed. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Fri Feb 16 20:37:08 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 16 Feb 2001 15:37:08 -0500 (EST) Subject: [XML-SIG] broken expat module in PyXML-0.6.3 In-Reply-To: References: Message-ID: <14989.36724.278682.884483@cj42289-a.reston1.va.home.com> Radestock, Guenter writes: > is defined twice (from a macro and literally). I deleted one definition > (the literal one at the top of the file) and it seems the problem has > gone away. Try removing the macro-ized version; it doesn't have all the features of the first implementation. I'll make corrections in CVS shortly. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From larsga@garshol.priv.no Sat Feb 17 14:04:37 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Feb 2001 15:04:37 +0100 Subject: [XML-SIG] Roadmap document - finally! Message-ID: After going through lots of trouble with mail servers and crashed disk drives I've now written the roadmap document (twice) and posted it at (once): Please have a look at it and tell me what you think. I haven't yet added any links to it, but will do so as soon as it is accepted by the group. --Lars M. From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 15:55:58 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 17 Feb 2001 16:55:58 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: (message from Lars Marius Garshol on 17 Feb 2001 15:04:37 +0100) References: Message-ID: <200102171555.f1HFtw208907@mira.informatik.hu-berlin.de> > Please have a look at it and tell me what you think. It looks good to me. On the pyexpat lexical handler: Uche already contributed such support, which reports comments and CDATA. Do you think you can talk pyexpat into reporting more than that? Would that require some minimal expat version to work? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 15:44:51 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sat, 17 Feb 2001 16:44:51 +0100 Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2 In-Reply-To: (message from Lars Marius Garshol on Sat, 17 Feb 2001 05:59:54 -0800) References: Message-ID: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> Hi Lars, Thanks for maintaining the roadmap. >
  • Re-indent everything to 4-space indents I'll do that. It is actually done for most of the code that does not have an explicit owner, only 4DOM and xmlproc still need to go through reindent.py. >
  • Move development to the PyXML CVS tree. This includes moving > the test suite. >
  • Release version 0.80 with updates for XML 1.0 2nd edition
> compliance, better validator independence (from parser), better
> location reporting and base sysid handling, Unicode support and
> improved convenience APIs. It may be that there will be several
> releases on the road to 0.80.

Can you give an estimate point in time for completion of these items? Or perhaps just the first one?

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 16:05:37 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 17:05:37 +0100
Subject: [XML-SIG] Unicode support problems in parsers
In-Reply-To: <3A8A677F.B227C56B@helsinki.fi> (message from Jere Kahanpää on Wed, 14 Feb 2001 13:09:51 +0200)
References: <3A8A677F.B227C56B@helsinki.fi>
Message-ID: <200102171605.f1HG5bL09016@mira.informatik.hu-berlin.de>

> Unfortunately the default parser seems to have serious memory
> management problems: the total amount of used memory grows by 1-2
> megabytes for each processed file. A forced garbage collection (this
> is Py2.0) doesn't help at all.

pyexpat in 0.6.2 had a number of memory leaks, most of which got fixed in 0.6.3, although some are only fixed in the CVS. So if you take the pyexpat.c from CVS, things should look much better.

There were two problems: the SAX reader created cyclic garbage (which it shouldn't), and pyexpat would not participate in garbage collection, which caused cycles involving Parser objects not to be collected.

> However, an even more serious problem was now encountered; the
> default *validating* parser returns normal Python string, while the
> default parser returns Unicode strings as any sensible
> XML-processing tool should do.

Yes, this is a known problem with xmlproc in the Python CVS, I hope Lars Marius will contribute an updated version soon.
> This behaviour do cause any amount of trouble elsewhere in the code: > The PrettyPrinter, for example, don't work at all with normal > strings with non-ascii chars. Which, in turn, is a bug in the pretty printer - since we are attempting backwards compatibility with 1.5.2, it *should* support plain strings. > I don't have the names of the parsers with problems right here, but > the test runs were done on a Linux box with PyXML 0.6.2. Sorry for the inconvenience. If you need a fix right away, I suggest you either use the PyXML CVS, or the 4Suite 0.10.2 beta, which has many of the components updated. If you can wait somewhat longer - I hope that I can release PyXML 0.6.4 in the near future. Regards, Martin From larsga@garshol.priv.no Sat Feb 17 16:14:13 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Feb 2001 17:14:13 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <200102171555.f1HFtw208907@mira.informatik.hu-berlin.de> References: <200102171555.f1HFtw208907@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | On the pyexpat lexical handler: Uche already contributed such | support, which reports comments and CDATA. Do you think you can talk | pyexpat into reporting more than that? It should be able to support reporting of entity boundaries, at least. | Would that require some minimal expat version to work? The current expat version should be sufficient for entity boundaries. I forget whether the LexicalHandler contains anything more. If it does and if that requires anything special from expat I'll raise the issue at that point. I haven't got all this stuff in my head now, so I can't say anything more yet. --Lars M. 
From larsga@garshol.priv.no Sat Feb 17 16:19:00 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Feb 2001 17:19:00 +0100 Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2 In-Reply-To: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> References: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> Message-ID: Hi Martin, * Martin v. Loewis | | Thanks for maintaining the roadmap. No problem. :) * Lars Marius Garshol | |
  • Re-indent everything to 4-space indents * Martin v. Loewis | | I'll do that. OK. Should I remove it from the list or leave it there until you've done it? * Lars Marius Garshol | |
  • Move development to the PyXML CVS tree. This includes moving | the test suite. |
  • Release version 0.80 with updates for XML 1.0 2nd edition | compliance, better validator independence (from parser), better | location reporting and base sysid handling, Unicode support and | improved convenience APIs. It may be that there will be several | releases on the road to 0.80. * Martin v. Loewis | | Can you give an estimate point in time for completion of these items? | Or perhaps just the first one? The first one I hope to do very soon. I would have done it already had not my laptop crashed and taken some of this work with it. As it is I am not sure how much I need to do over, but this is the first XML-SIG related thing I'll do[1], and it shouldn't take too long. Getting all of version 0.80 done will take several months, I expect, mostly because I'll be taking a lot of time off from all kinds of work. Since I have yet to provide an accurrate estimate of this kind of thing I won't try to be more specific. --Lars M. [1] Provided I can resist the temptation to implement Rick Jelliffe's Hook schema language. From tpassin@home.com Sat Feb 17 16:46:54 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sat, 17 Feb 2001 11:46:54 -0500 Subject: [XML-SIG] Roadmap document - finally! References: Message-ID: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> Lars Marius Garshol wrote - > > After going through lots of trouble with mail servers and crashed disk > drives I've now written the roadmap document (twice) and posted it at > (once): > > > > Please have a look at it and tell me what you think. > Thanks, Lars, for doing this. It's a big service. I'd like to suggest a few things, and see what people think. First of all, I think we need to address testing and especially regression testing. From reading various posts lately, it seems like a lot of things pop up, get fixed in some version on the cvs tree, and later on, who knows which version has what fixed, or how to prevent it from popping up again. 
We would benefit from a good test suite that is easy to run, self-evaluates the results, contains plenty of regression tests, and makes it easy to add tests. Although I know that no one (including me) wants to spend time on this, once it's accomplished, we should be able to improve the quality of the results while spending less effort on testing and bug fixing. I suggest we look at using pyUnit for this. I only looked at it for a few minutes, but it looks promising. It might make sense to use the OASIS parser test cases as a part of the test suite. Second, I think the road map should include directions for future work. What's in there now is mostly finishing up on current work. What might we want to get into? One thing is to keep the standard tools up with newer versions of existing W3C Recs. This would include DOM 3, and the new releases of xpath, xslt, and xpointer. We did this for SAX2, and surely we will want/need to do the same for the other key recs. Let's sketch out these intents in the Roadmap. Next in the way of future directions would be important new Recs. Xml Schemas would seem to be a prime candidate. Is anyone working or wanting to work on py-xml-xchemas? Can we get some of Henry Thompson's code? What about an API for xml schemas? Can we take the lead in that? Or do we not want to (or no one is personally interested?). Let's get it into the Roadmap. Then there are the non-standards things. Is pyXml going to do anything with RDF? Topic maps? What else? Into the roadmap, even if there is no one to work on such projects at the moment. Finally, let's add some direction for some of the other efforts that keep popping up, like miniDOM. How will it fit into the picture. We've been talking about it recently. Into the roadmap, I say! I apologise for the length of this post, but there is a lot to think about here! 
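A minimal sketch of what such a self-evaluating regression test could look like with PyUnit (today the standard library's `unittest`). It pins down the kind of bug reported on this list, an XML declaration written without `version="1.0"`. As an assumption, it uses the standard library's `xml.sax.saxutils.XMLGenerator` as a stand-in for PyXML's own writer, and the names `render_minimal_document` and `run_regression_suite` are hypothetical helpers introduced only for this illustration.

```python
import io
import unittest
from xml.sax.saxutils import XMLGenerator


def render_minimal_document():
    """Serialize a one-element document and return it as a string.

    Assumption: XMLGenerator stands in for the writer under test.
    """
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()          # this call is what emits the XML declaration
    gen.startElement("doc", {})
    gen.endElement("doc")
    gen.endDocument()
    return out.getvalue()


class XMLDeclarationRegressionTest(unittest.TestCase):
    """Fails if the declaration ever loses its version pseudo-attribute again."""

    def test_declaration_carries_version(self):
        first_line = render_minimal_document().splitlines()[0]
        self.assertTrue(first_line.startswith("<?xml "))
        self.assertIn('version="1.0"', first_line)


def run_regression_suite():
    """Run the suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        XMLDeclarationRegressionTest)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Once a bug is fixed, a test like this stays in the suite, so the same failure cannot quietly return in a later release; adding a new case is one more `test_...` method.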
Cheers, Tom Passin From uche.ogbuji@fourthought.com Sat Feb 17 16:51:36 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 17 Feb 2001 09:51:36 -0700 Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2 In-Reply-To: Message from "Martin v. Loewis" of "Sat, 17 Feb 2001 16:44:51 +0100." <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> Message-ID: <200102171651.JAA05661@localhost.localdomain> > Hi Lars, > > Thanks for maintaining the roadmap. > > >
  • Re-indent everything to 4-space indents > > I'll do that. It is actually done for most of the code that does not > have an explicit owner, only 4DOM and xmlproc still need to go through > reindent.py. Our internal standard is 4-space indents as well, so 4DOM should be a simple enough task. Speaking of: any point mentioning the full and permanent merging of 4DOM into the PyXML core? Then again, it will happen too soon to bother adding it now. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From larsga@garshol.priv.no Sat Feb 17 16:57:10 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Feb 2001 17:57:10 +0100 Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2 In-Reply-To: <200102171651.JAA05661@localhost.localdomain> References: <200102171651.JAA05661@localhost.localdomain> Message-ID: * Uche Ogbuji | | Speaking of: any point mentioning the full and permanent merging of | 4DOM into the PyXML core? Then again, it will happen too soon to | bother adding it now. Seems like you answered your own question. :-) --Lars M. From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 16:46:33 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 17 Feb 2001 17:46:33 +0100 Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: <3A88F359.991E26FD@home.com> (message from Michael Dyck on Tue, 13 Feb 2001 00:42:01 -0800) References: <3A88F359.991E26FD@home.com> Message-ID: <200102171646.f1HGkXm09267@mira.informatik.hu-berlin.de> From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 16:23:41 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 17 Feb 2001 17:23:41 +0100 Subject: [XML-SIG] Bug: XML Prolog from xml.sax.writer should contain version In-Reply-To: (guenter.radestock@sap.com) References: Message-ID: <200102171623.f1HGNfe09147@mira.informatik.hu-berlin.de> > please anybody fix this on sourceforge so it will be OK in the > next release. This was fixed in revision 1.4 of writer.py. Regards, Martin From uche.ogbuji@fourthought.com Sat Feb 17 17:06:35 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 17 Feb 2001 10:06:35 -0700 Subject: [XML-SIG] Hook In-Reply-To: Message from Lars Marius Garshol of "17 Feb 2001 17:19:00 +0100." Message-ID: <200102171706.KAA06679@localhost.localdomain> > [1] Provided I can resist the temptation to implement Rick Jelliffe's > Hook schema language. Ah. You too? I'm also quite intrigued by Hook. Interesting to see how such an extremely minimalist schema language will hold up to real-world cases. In case anyone is wondering what Hook is, here is a complete schema for XHTML Basic. html head [ title; meta. link. base. ] body [ a br. blockquote caption; div dl; form h1; h2; h3; h4; h5; h6; img. ol; p; pre; table; ul; ] [ tr; dt; dd; li; input; label; select; textarea; ] [ td option. ] [ abbr acronym address cite code dfn em kbd q samp span strong var object; ] param Me, I like. See http://www.ascc.net/xml/hook. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 17:16:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sat, 17 Feb 2001 18:16:00 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <14981.36324.913804.941652@gargle.gargle.HOWL> (message from Don Wakefield on Sat, 10 Feb 2001 10:52:20 -0800 (PST)) References: <14980.41475.888529.565845@gargle.gargle.HOWL> <200102100700.f1A70X701220@mira.informatik.hu-berlin.de> <14981.36324.913804.941652@gargle.gargle.HOWL> Message-ID: <200102171716.f1HHG0B09368@mira.informatik.hu-berlin.de> > So my environment is fine. PyExpat.py does not import pyexpat, but I do > in my calling test script: > > from xml.parsers import pyexpat > from xml.dom.ext.reader import PyExpat That does not matter. An import is always local to the module, so if you import it into __main__, it still won't be in PyExpat - so there is a clear bug in PyExpat. > Note that I've downloaded PyXML-0.6.3 from Sourceforge (haven't > installed it yet) and PyExpat.py in *that* version does not import > pyexpat either. So if you are not able to duplicate the problem with > that version, it must be something deeper... Sorry for the confusion. I had some 4Suite release installed, not PyXML 0.6.3, which indeed has this bug. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 17:06:21 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 17 Feb 2001 18:06:21 +0100 Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: <3A88F359.991E26FD@home.com> (message from Michael Dyck on Tue, 13 Feb 2001 00:42:01 -0800) References: <3A88F359.991E26FD@home.com> Message-ID: <200102171706.f1HH6Ls09345@mira.informatik.hu-berlin.de> Hi Michael, Thanks for your comments. > Shouldn't the installer remove or rename the existing _xmlplus dir > first? Unfortunately, the installer is based on distutils, which does not provide such a capability. Patches are welcome, of course. > xmldoc/README says it's "v0.6.2" Thanks, it will read 0.6.4 in the next release. 
> xmldoc/README could note that if you've just run an installer, > you don't have to do any of the "python setup.py ..." commands. Ok, I added such a comment. > xmldoc/test: > Either xmldoc/README or (new file) xmldoc/test/README should tell you > how to run the tests in this dir (`python testxml.py -g', I think), > and how to interpret what happens. Similarly for subdirs. > Maybe tests should be run automatically on installation. Not sure about that. Perhaps I should add a note that the tests should *not* be run, unless you know what you do. Contributions of more elaborate documentation would be welcome, of course. > I had 2 tests fail: > test test_sax crashed -- > exception.SystemError : 'finally' pops bad exception That is a serious bug of pyexpat in 0.6.3 on Windows, which basically means that the Windows distribution is useless. It was subsequently fixed with the pyexpat.c in the Python and PyXML CVS. > test test_saxdrivers crashed -- > exceptions.IOError : [Errno url error] unknown url type: 'c' Not sure about this one. It might be a problem with drive letters and urllib. > xmldoc/test/dom: > When I tried `python test.py', I got "Error in syntax" right away. I hope that we'll get an update to this code soon, so there is probably no need to investigate it further. > When I ran one of my DOM programs, I got this exception: > from xml.dom.Node import Node > ImportError: No module named Node Yes, xml.dom.Node is gone. Why did you need to import it? If it was to get at the node type constants, they live in xml.dom.Node now. > When I tried removing the ".Node" from the import statement, the > program ran as before, so apparently that is the fix, but shouldn't > this be noted fairly prominently in xmldoc/README or > xmldoc/README.dom? Contributions of documentation are welcome. I'd rather not maintain a change log of all API changes; having the current state of the API documented somewhere would be good, though. 
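For anyone hitting the same ImportError, the change discussed above can be sketched as follows. This is an illustration only, assuming the standard library's `xml.dom` package (where `Node` is a class on the package itself) and `xml.dom.minidom`; `describe` is a hypothetical helper, not part of any API.

```python
# Old spelling, which fails once the xml.dom.Node *module* is gone:
#     from xml.dom.Node import Node
# New spelling: Node is a class defined in the xml.dom package itself.
from xml.dom import Node
from xml.dom import minidom


def describe(node):
    """Return a readable name for a DOM node's type (hypothetical helper)."""
    names = {
        Node.ELEMENT_NODE: "element",
        Node.TEXT_NODE: "text",
        Node.COMMENT_NODE: "comment",
    }
    return names.get(node.nodeType, "other")


doc = minidom.parseString("<greeting>hello<!-- note --></greeting>")
root = doc.documentElement
# One label per child of <greeting>: the text node, then the comment node.
kinds = [describe(child) for child in root.childNodes]
```

Code that only needs the node type constants can reach them as `Node.ELEMENT_NODE` and friends after the single import shown at the top.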
> xmldoc/doc/4DOM/index.html has links to ../PACKAGES.html and > ../README.html, which do not exist. Again, with the next 4DOM update, this might look completely different. Regards, Martin From uche.ogbuji@fourthought.com Sat Feb 17 17:19:09 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 17 Feb 2001 10:19:09 -0700 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: Message from "Thomas B. Passin" of "Sat, 17 Feb 2001 11:46:54 EST." <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> Message-ID: <200102171719.KAA07503@localhost.localdomain> > We would benefit from a good test suite that is easy to run, self-evaluates > the results, contains plenty of regression tests, and makes it easy to add > tests. Although I know that no one (including me) wants to spend time on > this, once it's accomplished, we should be able to improve the quality of the > results while spending less effort on testing and bug fixing. > > I suggest we look at using pyUnit for this. I only looked at it for a few > minutes, but it looks promising. It might make sense to use the OASIS parser > test cases as a part of the test suite. Looks as if PyUnit is about to be elevated to The True Python Unit Testing System, so I guess this makes sense. > Second, I think the road map should include directions for future work. > What's in there now is mostly finishing up on current work. What might we > want to get into? One thing is to keep the standard tools up with newer > versions of existing W3C Recs. This would include DOM 3, On its way. > and the new releases > of xpath, None yet. > xslt, I'm still conducting a Jihad against XSLT 1.1 on xsl-list (and the xsl-editors@w3.org list). Hopefully I can get them to ditch xsl:script. Looks as if I have quite a bit of support, but who ever knows what the W3C will do? > and xpointer. 4XPointer in 0.10.2 is about 90% there. A bit of work left on points and ranges. 
> We did this for SAX2, and surely we will > want/need to do the same for the other key recs. Let's sketch out these > intents in the Roadmap. > > Next in the way of future directions would be important new Recs. Xml Schemas > would seem to be a prime candidate. Is anyone working or wanting to work on > py-xml-xchemas? Eww! XSchemas got cooties! I'm not touching it. I'd rather see if Lars comes up with anything on Hook. But I know, I know, someone will have to implement XSchemas for maximum Python Buzzworthiness. > Can we get some of Henry Thompson's code? What about an API > for xml schemas? Can we take the lead in that? Or do we not want to (or no one > is personally interested?). Let's get it into the Roadmap. > > Then there are the non-standards things. Is pyXml going to do anything with > RDF? There is 4RDF. Does PyXML really need to dupe the effort? 4RDF is a *very* advanced RDF implementation, even though I say so myself. See http://www.xml.com/2000/10/11/rdf/index.html > Topic maps? I think Lars and Geir are manning this fort. > What else? Into the roadmap, even if there is no one to work > on such projects at the moment. Off-head: XQL has finally awoken from its funk Experimental parser-level XInclude and XML:Base support maybe A low-level Infoset API would be interesting Schematron implemented in Python rather than XSLT RELAX TREX UDDI WebDAV client services -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sat Feb 17 17:31:00 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 17 Feb 2001 10:31:00 -0700 Subject: [XML-SIG] 4Suite Beta 3 (pretty much release candidate) Message-ID: <200102171731.KAA08295@localhost.localdomain> 4Suite is pretty much all done. 4SS was the reason the release didn't go out on Friday. 
We'll be in to finish the job tomorrow. Meanwhile, here is a version with the ODS fixes I mentioned a few days ago and other fixes. The only changes I expect between this one and the final are l10n changes based on discussions with Martin and Alexandre, so please help us keep off the brown paper bag. Thanks. ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b3.tar.gz -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 17:21:24 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 17 Feb 2001 18:21:24 +0100 Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2 In-Reply-To: (message from Lars Marius Garshol on 17 Feb 2001 17:19:00 +0100) References: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> Message-ID: <200102171721.f1HHLO109490@mira.informatik.hu-berlin.de> > OK. Should I remove it from the list or leave it there until you've > done it? Please leave it as a reminder. > The first one I hope to do very soon. I would have done it already had > not my laptop crashed and taken some of this work with it. As it is I > am not sure how much I need to do over, but this is the first XML-SIG > related thing I'll do[1], and it shouldn't take too long. > > Getting all of version 0.80 done will take several months, I expect, > mostly because I'll be taking a lot of time off from all kinds of work. > > Since I have yet to provide an accurate estimate of this kind of > thing I won't try to be more specific. Thanks. This is accurate enough. I'm looking forward to the integration of the current xmlproc then, since I'd like to look into generating Unicode strings in xmlproc myself, unless this is already done.
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Feb 17 17:48:38 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 17 Feb 2001 18:48:38 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> Message-ID: <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de> > We would benefit from a good test suite that is easy to run, self-evaluates > the results, contains plenty of regression tests, and makes it easy to add > tests. Although I know that no one (including me) wants to spend time on > this, once it's accomplished, we should be able to improve the quality of the > results while spending less effort on testing and bug fixing. I'd like to point out that PyXML already has such a thing. I run it regularly before building releases, and won't produce a release that has new test failures. Of course, additions to this test suite are infrequent. > I suggest we look at using pyUnit for this. I only looked at it for > a few minutes, but it looks promising. It might make sense to use > the OASIS parser test cases as a part of the test suite. Currently, the PyXML test suite uses regrtest for many tests; 4DOM has its own framework. Could you please say what is wrong with these frameworks? It seems that we don't really need a new framework; we need more tests. Of course, if somebody would contribute additional tests, requiring a new framework would be acceptable if we can bundle the framework with PyXML. > Second, I think the road map should include directions for future work. I'd avoid maintaining a pure wishlist. Additions to the roadmap should include commitments of individual contributors to actually contribute; ideally with a commitment to contribute at a specific time in the future (which may be well several months from now). 
Otherwise, people will think that they will get something soon, only to find out that they did not get it two years from now. > Xml Schemas would seem to be a prime candidate. Is anyone working > or wanting to work on py-xml-xchemas? Can we get some of Henry > Thompson's code? What about an API for xml schemas? Can we take the > lead in that? Or do we not want to (or no one is personally > interested?). Let's get it into the Roadmap. These are good questions. Without answers, I'd like to avoid giving the impression that any work on this is actually done. E.g. if somebody stands up and offers to define an XML Schema API, that would be a good thing to add to the roadmap, since it gives people a contact point, and may keep discussion alive. > Then there are the non-standards things. Is pyXml going to do > anything with RDF? Topic maps? What else? Into the roadmap, even if > there is no one to work on such projects at the moment. Please, no. Maybe I misunderstand the purpose of this document. If so, can you please explain what its purpose is? > Finally, let's add some direction for some of the other efforts that keep > popping up, like miniDOM. How will it fit into the picture. We've been > talking about it recently. Into the roadmap, I say! I think the direction of minidom should be best documented in the minidom documentation. If anybody can provide a specific patch against the minidom documentation, I'm sure there is interest in discussing that. When that is documented, it could give a clear guideline for the maintenance of the package. Regards, Martin From tpassin@home.com Sat Feb 17 19:12:10 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sat, 17 Feb 2001 14:12:10 -0500 Subject: [XML-SIG] Roadmap document - finally! References: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de> Message-ID: <003401c09915$878e8080$7cac1218@reston1.va.home.com> Martin v. 
Loewis wrote > > I'd avoid maintaining a pure wishlist. Additions to the roadmap should > include commitments of individual contributors to actually contribute; > ideally with a commitment to contribute at a specific time in the > future (which may be well several months from now). > > Otherwise, people will think that they will get something soon, only > to find out that they did not get it two years from now. ... I see the roadmap as more of a guide than a wishlist. To the extent that "we" have an idea of where we'd like to go, it should get into the roadmap. If there are some projects that have no contributor right now, the roadmap would show that there is a hole. The "Documentation" item in the current Roadmap is an example. Perhaps someone will decide to fill it. The wish-list things I see as different (although there is probably no clear line). A roadmap like this could also help people coordinate things, since some things might need to happen before others. > Please, no. Maybe I misunderstand the purpose of this document. If so, > can you please explain what its purpose is? > Maybe "roadmap" isn't the best term, then. Lars might want to say what he thought it was going to be, since he's the one who posted it. Regards, Tom P From fdrake@acm.org Sat Feb 17 19:05:38 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 17 Feb 2001 14:05:38 -0500 (EST) Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de> References: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de> Message-ID: <14990.52098.942491.452239@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > Of course, if somebody would contribute additional tests, requiring a > new framework would be acceptable if we can bundle the framework with > PyXML. 
This might be a good time to note that some of us at Digital Creations (mostly Martijn Pieters) have created a DOM test suite that can test for DOM Level 1 & 2 compliance of the "Core" and "XML" features (so far); we hope to make this a standard test for Python DOM implementations. The XML crew at DC will have to talk about how to make the suite readily available, but I hope it won't be too far off. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Sat Feb 17 19:39:43 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 17 Feb 2001 12:39:43 -0700 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: Message from "Fred L. Drake, Jr." of "Sat, 17 Feb 2001 14:05:38 EST." <14990.52098.942491.452239@cj42289-a.reston1.va.home.com> Message-ID: <200102171939.MAA16669@localhost.localdomain> > > Martin v. Loewis writes: > > Of course, if somebody would contribute additional tests, requiring a > > new framework would be acceptable if we can bundle the framework with > > PyXML. > > This might be a good time to note that some of us at Digital > Creations (mostly Martijn Pieters) have created a DOM test suite that > can test for DOM Level 1 & 2 compliance of the "Core" and "XML" > features (so far); we hope to make this a standard test for Python DOM > implementations. > The XML crew at DC will have to talk about how to make the suite > readily available, but I hope it won't be too far off. Lars already has such a beast. Does your test suite incorporate or work with his? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rob@pangolin.org.uk Sat Feb 17 19:47:49 2001 From: rob@pangolin.org.uk (rob) Date: Sat, 17 Feb 2001 19:47:49 +0000 Subject: [XML-SIG] possible bug with xml dom events Message-ID: <20010217194749.A7835@samantha.inRobsRoom> Hi, I couldn't see this mentioned anywhere, so I thought I'd mention it: if you change a cdata using "node.nodeValue = xxxx", no DOMCharacterDataModified event is generated; if you do "node.data = xxxxx", the event is generated properly. Is this a bug or a feature? From reading the stuff at W3C I expected "node.nodeValue = xxxx" to generate an event. NB: I'm using Python 2.0 and PyXML 0.6.3. rob From MichaelDyck@home.com Sat Feb 17 21:33:24 2001 From: MichaelDyck@home.com (Michael Dyck) Date: Sat, 17 Feb 2001 13:33:24 -0800 Subject: [XML-SIG] problems with PyXML 0.6.3 References: <3A88F359.991E26FD@home.com> <200102171706.f1HH6Ls09345@mira.informatik.hu-berlin.de> Message-ID: <3A8EEE24.75A025C4@home.com> "Martin v. Loewis" wrote: > > Michael Dyck wrote: > > Shouldn't the installer remove or rename the existing _xmlplus dir > > first? > > Unfortunately, the installer is based on distutils, which does not > provide such a capability. In that case, it might be nice for the download page (or message) to advise the user to do it before running the installer. > > xmldoc/test: > > Either xmldoc/README or (new file) xmldoc/test/README should tell you > > how to run the tests in this dir (`python testxml.py -g', I think), > > and how to interpret what happens. Similarly for subdirs. > > Maybe tests should be run automatically on installation. > > Not sure about that. Perhaps I should add a note that the tests should > *not* be run, unless you know what you do. Yeah, perhaps. But I think there should still be instructions somewhere. Otherwise the only way to *become* someone who knows what they're doing wrt tests is to read the code. Or maybe that's sufficient.
> > When I ran one of my DOM programs, I got this exception: > > from xml.dom.Node import Node > > ImportError: No module named Node > > Yes, xml.dom.Node is gone. Why did you need to import it? If it was to > get at the node type constants, they live in xml.dom.Node now. Yup. > > When I tried removing the ".Node" from the import statement, the > > program ran as before, so apparently that is the fix, but shouldn't > > this be noted fairly prominently in xmldoc/README or > > xmldoc/README.dom? > > Contributions of documentation are welcome. I'd rather not maintain a > change log of all API changes; having the current state of the API > documented somewhere would be good, though. Well, it wouldn't have to be a log of *all* changes. What I'm really concerned about are the non-backwards-compatible changes. > > xmldoc/doc/4DOM/index.html has links to ../PACKAGES.html and > > ../README.html, which do not exist. > > Again, with the next 4DOM update, this might look completely > different. Will that be in PyXML 0.6.4? -Michael From larsga@garshol.priv.no Sun Feb 18 15:44:32 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Feb 2001 16:44:32 +0100 Subject: [XML-SIG] pysp released Message-ID: I've now put an experimental release of pysp on pysp is a wrapper for the SP SGML parser which can be used to develop SGML processing applications. No SAX driver is provided yet, since I'm not entirely certain where to put it. I think it should be distributed with pysp, but if there are any other opinions on this I'd like to hear them. --Lars M. From larsga@garshol.priv.no Sun Feb 18 16:29:12 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Feb 2001 17:29:12 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> References: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> Message-ID: Tom: thank you for this posting. 
You managed to start a discussion of lots of issues that I've wanted to see discussed for quite a while now. * Thomas B. Passin | | I'd like to suggest a few things, and see what people think. First | of all, I think we need to address testing and especially regression | testing. From reading various posts lately, it seems like a lot of | things pop up, get fixed in some version on the cvs tree, and later | on, who knows which version has what fixed, or how to prevent it | from popping up again. I certainly agree with this. As you can see from the roadmap I plan to improve the SAX test suite to ensure that it is well tested. xmlproc already has a good test suite. I don't believe anything more is needed there. javadom has an acceptable test suite, which, BTW, can be applied to any Python DOM implementation. Doing this might be a good idea. The test suite could be larger, but for something as seemingly little-used as javadom it probably is not worthwhile. | I suggest we look at using pyUnit for this. I only looked at it for | a few minutes, but it looks promising. It might make sense to use | the OASIS parser test cases as a part of the test suite. This is what test_javadom uses and it worked very well for that test case. This also has the benefit that PyUnit is already in the package. :) For some test suites, however, PyUnit is not suitable. The xmlproc tests use a homespun set of scripts because most of them parse an XML document and produce some output that is then compared with the output from a baseline run. PyUnit is not very suitable for this. (There are some API tests, however, that are tested with PyUnit.) So the question is, I guess, what is there that needs to be improved about the current test suite? The SAX tests for sure. Do we need a description of how to run it and how to add new tests? Does the suite need tighter integration? | Second, I think the road map should include directions for future | work. 
What's in there now is mostly finishing up on current work. | What might we want to get into? One thing is to keep the standard | tools up with newer versions of existing W3C Recs. This would | include DOM 3, and the new releases of xpath, xslt, and xpointer. | We did this for SAX2, and surely we will want/need to do the same | for the other key recs. Let's sketch out these intents in the | Roadmap. I agree with this, though I also agree with Martin that it might be confusing if we do this. So if we do, let's make sure that the text leaves no doubt that these are wishes for the future rather than planned work. | Next in the way of future directions would be important new Recs. | Xml Schemas would seem to be a prime candidate. Is anyone working | or wanting to work on py-xml-xchemas? Not me. If I wanted to do something like this I'd start with Hook, RELAX and TREX, in that order. Other than that I agree. If we can agree that we want it it might be useful to list it as an open task. | Then there are the non-standards things. Is pyXml going to do | anything with RDF? Topic maps? What else? Into the roadmap, even if | there is no one to work on such projects at the moment. I think RDF and topic maps are both outside the scope of the XML-SIG. Neither are really XML standards. | Finally, let's add some direction for some of the other efforts that | keep popping up, like miniDOM. How will it fit into the picture. | We've been talking about it recently. Into the roadmap, I say! If there is anything that needs to be done about minidom, then, yes, I think it should go in. | I apologise for the length of this post, but there is a lot to think | about here! There sure is. :) --Lars M. From larsga@garshol.priv.no Sun Feb 18 16:34:16 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Feb 2001 17:34:16 +0100 Subject: [XML-SIG] Roadmap document - finally! 
In-Reply-To: <003401c09915$878e8080$7cac1218@reston1.va.home.com> References: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de> <003401c09915$878e8080$7cac1218@reston1.va.home.com> Message-ID: * Thomas B. Passin | | Maybe "roadmap" isn't the best term, then. Lars might want to say | what he thought it was going to be, since he's the one who posted | it. My idea was to have a single document that people could look at to see where XML-SIG development is headed, what is going on and what may show up in the future. Martin is of course right that adding pure wishlist items may turn out to be disinformation, but I guess that can be avoided by putting a warning in the document. --Lars M. From larsga@garshol.priv.no Sun Feb 18 16:45:52 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Feb 2001 17:45:52 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <200102171719.KAA07503@localhost.localdomain> References: <200102171719.KAA07503@localhost.localdomain> Message-ID: * Tom Passin | | Second, I think the road map should include directions for future work. | What's in there now is mostly finishing up on current work. What might we | want to get into? One thing is to keep the standard tools up with newer | versions of existing W3C Recs. This would include DOM 3, * Uche Ogbuji | | On its way. Should I add a 4DOM section and note that DOM 3 support is in the pipeline? | But I know, I know, someone will have to implement XSchemas for | maximum Python Buzzworthiness. Henry Thompson has already done this for us. As far as I understand his implementation can be used with any parser, so we may want to make a SAX filter that can do schema validation based on his stuff. Does anyone have opinions on this? | [Topic maps] | I think Lars and Geir are manning this fort. Geir Ove is probably not going to work much more on tmproc. At least not in the near future. 
(He's got a commercial Java implementation to worry about.) I'll probably have to add SAX 2.0 and XTM support at some stage, but those holding their breaths waiting for this do so at their peril. | Off-head: | | XQL has finally awoken from its funk Would be interesting to see an implementation based on DbDom. | Experimental parser-level XInclude and XML:Base support maybe I would say that this belongs in SAX filters. This is planned for saxtools. | A low-level Infoset API would be interesting Personally I would prefer to see a nice tree-based XML API. My personal opinion is that the DOM stinks and needs replacement. Sean McGrath's xTree looks far better, in my opinion. | Schematron implemented in Python rather than XSLT | RELAX | TREX Yes. | UDDI | WebDAV client services Maybe, though probably not in the XML-SIG package. --Lars M. From larsga@garshol.priv.no Sun Feb 18 16:47:17 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Feb 2001 17:47:17 +0100 Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2 In-Reply-To: <200102171721.f1HHLO109490@mira.informatik.hu-berlin.de> References: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> <200102171721.f1HHLO109490@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | Please leave it as a reminder. Me do. | Thanks. This is accurate enough. I'm looking forward to the | integration of the current xmlproc then, since I'd like to look into | generating Unicode strings in xmlproc myself, unless this is already | done. It is not done, and we are now two people looking forward to this. I've been itching to do this ever since the first Python 2.0 beta. We'll see who gets there first. :-) --Lars M. From martin@loewis.home.cs.tu-berlin.de Sun Feb 18 21:13:02 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 18 Feb 2001 22:13:02 +0100 Subject: [XML-SIG] Roadmap document - finally! 
In-Reply-To: (message from Lars Marius Garshol on 18 Feb 2001 17:45:52 +0100) References: <200102171719.KAA07503@localhost.localdomain> Message-ID: <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> > Henry Thompson has already done this for us. As far as I understand > his implementation can be used with any parser, so we may want to > make a SAX filter that can do schema validation based on his > stuff. Does anyone have opinions on this? Assuming you are talking about XSV (http://dev.w3.org/cvsweb/xmlschema/), I had a short look at this once when I studied XPath. Unless I'm missing something obvious, it seems that the XPath support in it is quite incomplete. E.g. where is the evaluation of binary operators, or function calls to the builtin functions? Appart from that, I find the implementation strategy for XPath, well, interesting... I can't comment on the schema validation itself, as I don't understand that spec at all (I haven't even read it). > | A low-level Infoset API would be interesting > > Personally I would prefer to see a nice tree-based XML API. My > personal opinion is that the DOM stinks and needs replacement. Sean > McGrath's xTree looks far better, in my opinion. XSV also has a file called XMLInfoset.py. I'm not sure how that integrates with a parser; you may need to use LT XML. Regards, Martin From Nicolas.Chauvat@logilab.fr Mon Feb 19 09:25:59 2001 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Mon, 19 Feb 2001 10:25:59 +0100 (CET) Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: Message-ID: On 18 Feb 2001, Lars Marius Garshol wrote: > My idea was to have a single document that people could look at to see > where XML-SIG development is headed, what is going on and what may > show up in the future. Martin is of course right that adding pure > wishlist items may turn out to be disinformation, but I guess that can > be avoided by putting a warning in the document. FWIW, I'm voting +1 on that. 
-- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From larsga@garshol.priv.no Mon Feb 19 10:06:03 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Feb 2001 11:06:03 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> References: <200102171719.KAA07503@localhost.localdomain> <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | Assuming you are talking about XSV | (http://dev.w3.org/cvsweb/xmlschema/), I was. | Appart from that, I find the implementation strategy for XPath, well, | interesting... How so? You've made me curious now. :) | XSV also has a file called XMLInfoset.py. I'm not sure how that | integrates with a parser; you may need to use LT XML. It doesn't integrate directly. XMLInfoset.py is just the data structure. The LTXMLInfoset.py module has the code for using LTXML to build a data structure. As far as I can tell no other parsers are used, but it seems that layer.py is the place to look to integrate them. It also seems that a SAX filter may be difficult, because from what I can tell one needs to build the entire tree before validating. --Lars M. From mj@digicool.com Mon Feb 19 10:48:27 2001 From: mj@digicool.com (Martijn Pieters) Date: Mon, 19 Feb 2001 11:48:27 +0100 Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: <200102171939.MAA16669@localhost.localdomain>; from uche.ogbuji@fourthought.com on Sat, Feb 17, 2001 at 12:39:43PM -0700 References: <200102171939.MAA16669@localhost.localdomain> Message-ID: <20010219114827.B28553@zopatista.com> --XsQoSWH+UP9D9v3l Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Sat, Feb 17, 2001 at 12:39:43PM -0700, Uche Ogbuji wrote: > > This might be a good time to note that some of us at Digital > > Creations (mostly Martijn Pieters) have created a DOM test suite > > that can test for DOM Level 1 & 2 compliance of the "Core" and "XML" > > features (so far); we hope to make this a standard test for Python > > DOM implementations. > > > > The XML crew at DC will have to talk about how to make the suite > > readily available, but I hope it won't be too far off. > > Lars already has such a beast. Does your test suite incorporate or work > with his? I cannot find any references to Lars' test suite, so I don't know if it will work with his. Maybe a small overview of what our suite does may help:
- We use PyUnit; the whole Zope testing framework is based on it.
- The suite tests only for DOM compliance; nothing implementation-specific should be in there. There are some Python binding tests, which we may want to move out.
- The tests are organized by interface; the test classes follow the same inheritance structure as the interfaces in the DOM specs. So the CDATASection interface tests inherit the Text interface tests, which in turn inherit the Node interface tests. This has made the tests far more complete.
- The test suites are further organised by feature set and compliance level. There are separate files for Core level 1 and Core level 2 tests, and the same for the XML tests. Adding tests for a different DOM feature is trivial.
- The "Core" feature is almost fully tested now; only some NO_MODIFICATION_ALLOWED and default attribute situations aren't tested for yet.
- The "XML" feature tests are still missing Entity and Notation Node tests; adding these is my next priority.
- I have made a first go at tests for the "Traversal" feature; only the DocumentTraversal interface is tested.
- DOMString and text manipulating interface methods are not tested beyond ASCII text due to an implementation limitation of ParsedXML.DOM. So implementations are not tested for correct handling of text that involves multi-byte UTF-16 characters.
- Currently, about 650 tests will be run on a DOM supporting all the features we can test for.
To obtain the tests, you'll have to do a CVS checkout from cvs.zope.org:
% cvs -d :pserver:anonymous@cvs.zope.org:/cvs-repository login
(Logging in to anonymous@cvs.zope.org)
CVS Password: anonymous # So the password is 'anonymous'
% cvs -z7 -d :pserver:anonymous@cvs.zope.org:/cvs-repository checkout \
      -d DOMTests Products/DC/ParsedXML/test/domapi
To test a DOM implementation, you need to pass in your DOMImplementation object, and a parsing method that will create a DOM tree for a given XML string. The latter is used to create Notation, Entity and default Attr Nodes, which you can't produce with the current DOM API. I attached a sample script which tests the PyXML DOM; it assumes you made a stand-alone checkout of the tests as described above into a DOMTests directory on the Python path. It requires a patched PyXML that will return true on DOMImplementation.hasFeature('Core', '2.0') (fixed in the Fourthought CVS, I believe). See bug #132683 on SourceForge (now closed). When running the tests, there are three that trigger an infinite loop in the PyXML 0.6.3 suite. When a test seems to take too long, a keyboard interrupt will cause PyUnit to skip to the next test (and log a traceback on KeyboardInterrupt for the offending test).
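The interface-mirroring layout described above (CDATASection tests inheriting the Text tests, which inherit the Node tests) might look roughly like the sketch below. This is not the DC suite itself: the class names are made up for illustration, and it runs against the standard library's xml.dom.minidom in modern Python 3 rather than against a pluggable DOMImplementation.

```python
import unittest
from xml.dom import Node, minidom


class NodeChecks:
    """Checks every Node must pass; concrete cases supply make_node()."""

    def test_owner_document(self):
        self.assertIsNotNone(self.make_node().ownerDocument)

    def test_node_type(self):
        self.assertEqual(self.make_node().nodeType, self.expected_node_type)


class TextChecks(NodeChecks):
    """Text inherits from Node in the DOM, so its checks inherit too."""

    def test_append_data(self):
        node = self.make_node()
        node.appendData("eggs")
        self.assertEqual(node.data, "spameggs")


class CDATASectionChecks(TextChecks):
    """CDATASection inherits from Text, so every Text check applies."""

    def test_node_name(self):
        self.assertEqual(self.make_node().nodeName, "#cdata-section")


class TextTestCase(TextChecks, unittest.TestCase):
    expected_node_type = Node.TEXT_NODE

    def make_node(self):
        return minidom.Document().createTextNode("spam")


class CDATASectionTestCase(CDATASectionChecks, unittest.TestCase):
    expected_node_type = Node.CDATA_SECTION_NODE

    def make_node(self):
        return minidom.Document().createCDATASection("spam")


def run_checks():
    """Run both concrete cases and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TextTestCase, CDATASectionTestCase):
        suite.addTests(loader.loadTestsFromTestCase(case))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the checks live in plain mixin classes, a check added at the Node level is picked up automatically by every interface further down the hierarchy, which is presumably why this layout makes a suite more complete.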
-- Martijn Pieters | Software Engineer mailto:mj@digicool.com | Digital Creations http://www.digicool.com/ | Creators of Zope http://www.zope.org/ --------------------------------------------- --XsQoSWH+UP9D9v3l Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="test_PyXMLDOM.py"
#!/usr/bin/env python
from xml.dom import implementation
from xml.dom.ext.reader.Sax2 import Reader
from DOMTests import DOMImplementationTestSuite

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

def Sax2ParseString(self, xml):
    file = StringIO(xml)
    return Reader().fromStream(file)

def test_suite():
    """Create a test suite for a DOM implementation."""
    return DOMImplementationTestSuite(implementation, Sax2ParseString)

if __name__ == '__main__':
    import unittest
    unittest.TextTestRunner().run(test_suite())
--XsQoSWH+UP9D9v3l-- From martin@loewis.home.cs.tu-berlin.de Mon Feb 19 20:08:40 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 19 Feb 2001 21:08:40 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: (message from Lars Marius Garshol on 19 Feb 2001 11:06:03 +0100) References: <200102171719.KAA07503@localhost.localdomain> <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> Message-ID: <200102192008.f1JK8em01092@mira.informatik.hu-berlin.de> > | Appart from that, I find the implementation strategy for XPath, well, > | interesting... > > How so? Well, try to understand
    def parse(self,str):
        disjuncts=map(lambda s:string.split(s,'/'),string.split(str,'|'))
        return map(lambda d,ss=self:map(lambda p,s=ss:s.patBit(p),
                                        d),
                   disjuncts)
where patbit will return things like
    return lambda e,y=None,s=self,a=part,ns=ns:s.attrs(e,a,ns,y)
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Feb 19 20:59:18 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 19 Feb 2001 21:59:18 +0100 Subject: [XML-SIG] problems with PyXML 0.6.3 In-Reply-To: <3A8EEE24.75A025C4@home.com> (message from Michael Dyck on Sat, 17 Feb 2001 13:33:24 -0800) References: <3A88F359.991E26FD@home.com> <200102171706.f1HH6Ls09345@mira.informatik.hu-berlin.de> <3A8EEE24.75A025C4@home.com> Message-ID: <200102192059.f1JKxIL01502@mira.informatik.hu-berlin.de> > In that case, it might be nice for the download page (or message) to advise > the user to do it before running the installer. Ok, the next release will do this in the installer. > Yeah, perhaps. But I think there should still be instructions > somewhere. Otherwise the only way to *become* someone who knows > what they're doing wrt tests is to read the code. Or maybe that's > sufficient. Again: contributions are welcome. I personally won't change the status quo in this respect. > Well, it wouldn't have to be a log of *all* changes. What I'm really > concerned about are the non-backwards-compatible changes. Same issue: I'd be happy if there was any documentation describing the current API in detail; I cannot find the time to produce a detailed report of what has changed between releases - especially if these packages are updated by third-party contributors. It is much easier if people that run into problems report them, and ask for help in porting to a new release. If many people are affected, and no easy transition is possible, API breakage should be considered a bug and fixed in a subsequent release, instead of being documented. > > Again, with the next 4DOM update, this might look completely > > different. > > Will that be in PyXML 0.6.4? Probably yes; I hope that the 4DOM integration will happen RSN. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Feb 19 21:11:47 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 19 Feb 2001 22:11:47 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102102213.RAA28403@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sat, 10 Feb 2001 17:13:23 -0500) References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com> Message-ID: <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> > > xml_dom_object = reader.fromUri(filename) #should work for either > > URL or file > Let's talk about this comment. Is it really a good idea to build URL > access right into the API here? I can't find out whether this has been settled. Did you propose to drop the support for URLs in the API, or the one for local files. We just had a report where urllib apparently decided to use "c" as the protocol name; I'm not entirely sure what the exact cause was. > Case in point: I found this bit in saxutilx.py: > > if os.path.isfile(sysid): > basehead = os.path.split(os.path.normpath(base))[0] > source.setSystemId(os.path.join(basehead, sysid)) > f = open(sysid, "rb") > else: > source.setSystemId(urlparse.urljoin(base, sysid)) > f = urllib.urlopen(source.getSystemId()) > > Now I don't know under which circumstances this get triggered (the > context is obscure) prepare_input_source is invoked by every parser when processing the argument to .parse(), so the common usage is p = make_parser() p.setContentHandler(something) p.parse(filename) Instead of filename, you can have URLs, stream, and InputSource objects (the Java API only supports InputSource here). > but I'd say it's a bad idea to just try to open a URL when a string > isn't a local file. Maybe *you* live in a world where the network > is "always on" (and I do too!), but for plenty of folks, it's rather > annoying to find that their modem starts dialing out each time they > make a typo in a filename. But would the modem actually start dialling? 
Wouldn't it rather determine that the protocol is "file" and the report that the file is missing? So I think it would either report an unknown url type, or an ENOENT. What kind of typo did you think of? > The application knows this, but the library doesn't. It's also fine > to have an alternative API that takes a URL instead of a local > filename -- but it's not okay to attempt to overlap the two > namespaces. The application can always make sure that the right thing is processed by opening it itself, and then passing that to the parser. Regards, Martin From larsga@garshol.priv.no Mon Feb 19 21:31:27 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Feb 2001 22:31:27 +0100 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: <200102192008.f1JK8em01092@mira.informatik.hu-berlin.de> References: <200102171719.KAA07503@localhost.localdomain> <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> <200102192008.f1JK8em01092@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | Well, try to understand | | def parse(self,str): | disjuncts=map(lambda s:string.split(s,'/'),string.split(str,'|')) | return map(lambda d,ss=self:map(lambda p,s=ss:s.patBit(p), | d), | disjuncts) | | where patbit will return things like | | return lambda e,y=None,s=self,a=part,ns=ns:s.attrs(e,a,ns,y) I see what you mean. Interesting, indeed. :-) --Lars M. From guido@digicool.com Mon Feb 19 21:49:20 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 19 Feb 2001 16:49:20 -0500 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Your message of "Mon, 19 Feb 2001 22:11:47 +0100." 
<200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com> <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> Message-ID: <200102192149.QAA24348@cj20424-a.reston1.va.home.com> > > > xml_dom_object = reader.fromUri(filename) #should work for either > > > URL or file > > > Let's talk about this comment. Is it really a good idea to build URL > > access right into the API here? > > I can't find out whether this has been settled. Did you propose to > drop the support for URLs in the API, or the one for local files. I'd like to drop support for URLs; I don't think the typical computer is sufficiently networked to make this work well. > We just had a report where urllib apparently decided to use "c" as the > protocol name; I'm not entirely sure what the exact cause was. That's the ambiguity between local filenames and URLs. You have to decide whether filenames passed to APIs are in local filename space or in URL space, and not try to guess based on what the name looks like. On the Mac, all absolute filenames look like foo:bar or foo:bar:bletch, so there you have even less to work with. > > Case in point: I found this bit in saxutilx.py: > > > > if os.path.isfile(sysid): > > basehead = os.path.split(os.path.normpath(base))[0] > > source.setSystemId(os.path.join(basehead, sysid)) > > f = open(sysid, "rb") > > else: > > source.setSystemId(urlparse.urljoin(base, sysid)) > > f = urllib.urlopen(source.getSystemId()) > > > > Now I don't know under which circumstances this get triggered (the > > context is obscure) > > prepare_input_source is invoked by every parser when processing the > argument to .parse(), so the common usage is > > p = make_parser() > p.setContentHandler(something) > p.parse(filename) > > Instead of filename, you can have URLs, stream, and InputSource > objects (the Java API only supports InputSource here). 
I would suggest to have separate APIs depending on the argument type, e.g. p.parseFile(filename), p.parseURL(url), p.parseStream(InputSource), p.parseString(text). (And no, Java overloading wouldn't help much here, since three out of four APIs have string arguments.) > > but I'd say it's a bad idea to just try to open a URL when a string > > isn't a local file. Maybe *you* live in a world where the network > > is "always on" (and I do too!), but for plenty of folks, it's rather > > annoying to find that their modem starts dialing out each time they > > make a typo in a filename. > > But would the modem actually start dialling? Wouldn't it rather > determine that the protocol is "file" and the report that the file is > missing? So I think it would either report an unknown url type, or an > ENOENT. What kind of typo did you think of? Maybe I was thinking of another case (not involving PyXML) that was reported to me third hand, where a filename containing a colon on Windows (using Cygwin tools) ended up being interpreted as Unix rcp filename syntax, and the system was doing a host lookup on the part before the colon -- that really does make the modem dial! > > The application knows this, but the library doesn't. It's also fine > > to have an alternative API that takes a URL instead of a local > > filename -- but it's not okay to attempt to overlap the two > > namespaces. > > The application can always make sure that the right thing is processed > by opening it itself, and then passing that to the parser. Sure, and if a string is given, it should be assumed to be a local filename unless the API name has "URL" in it. --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji@fourthought.com Mon Feb 19 22:06:44 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 19 Feb 2001 15:06:44 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Guido van Rossum of "Mon, 19 Feb 2001 16:49:20 EST." 
<200102192149.QAA24348@cj20424-a.reston1.va.home.com> Message-ID: <200102192206.PAA15107@localhost.localdomain> > > > > xml_dom_object = reader.fromUri(filename) #should work for either > > > > URL or file > > > > > Let's talk about this comment. Is it really a good idea to build URL > > > access right into the API here? > > > > I can't find out whether this has been settled. Did you propose to > > drop the support for URLs in the API, or the one for local files. > > I'd like to drop support for URLs; I don't think the typical computer > is sufficiently networked to make this work well. In this case, the typical computer user will have a great deal of trouble using any XML application in any language. Almost all of them use URIs as basis, and for good reason. Special support for local files are almost universally a mere convenience. Most XML processing specifications mandate that the URI of the XML entity that contains an infoset node is used as the basis for further processing. To me, this argues strongly for dropping local files rather than URIs if we must choose. Some XML specs would be very difficult to implement properly if the low-level tools became file-system-only readers. > > We just had a report where urllib apparently decided to use "c" as the > > protocol name; I'm not entirely sure what the exact cause was. > > That's the ambiguity between local filenames and URLs. You have to > decide whether filenames passed to APIs are in local filename space or > in URL space, and not try to guess based on what the name looks like. > On the Mac, all absolute filenames look like foo:bar or > foo:bar:bletch, so there you have even less to work with. The Mac people should have spoken to the IETF a decade ago when URLs emerged, or a bit later when URIs came out. I suspect, again that if this is the case, they suffer much more pain in XML processing than is inflicted on them by PyXML. 
> > > Case in point: I found this bit in saxutilx.py: > > > > > > if os.path.isfile(sysid): > > > basehead = os.path.split(os.path.normpath(base))[0] > > > source.setSystemId(os.path.join(basehead, sysid)) > > > f = open(sysid, "rb") > > > else: > > > source.setSystemId(urlparse.urljoin(base, sysid)) > > > f = urllib.urlopen(source.getSystemId()) > > > > > > Now I don't know under which circumstances this get triggered (the > > > context is obscure) > > > > prepare_input_source is invoked by every parser when processing the > > argument to .parse(), so the common usage is > > > > p = make_parser() > > p.setContentHandler(something) > > p.parse(filename) > > > > Instead of filename, you can have URLs, stream, and InputSource > > objects (the Java API only supports InputSource here). > > I would suggest to have separate APIs depending on the argument type, > e.g. p.parseFile(filename), p.parseURL(url), > p.parseStream(InputSource), p.parseString(text). (And no, Java > overloading wouldn't help much here, since three out of four APIs have > string arguments.) Sure, one can add a parseFile, but what do you do with <!DOCTYPE spam [ <!ENTITY foo SYSTEM "foo.bar"> ]> <spam>&foo;</spam> URI or file? Note that this is a trick question, and the "trick" is *exactly* my point. > > > but I'd say it's a bad idea to just try to open a URL when a string > > > isn't a local file. Maybe *you* live in a world where the network > > > is "always on" (and I do too!), but for plenty of folks, it's rather > > > annoying to find that their modem starts dialing out each time they > > > make a typo in a filename. > > > > But would the modem actually start dialling? Wouldn't it rather > > determine that the protocol is "file" and the report that the file is > > missing? So I think it would either report an unknown url type, or an > > ENOENT. What kind of typo did you think of?
> > Maybe I was thinking of another case (not involving PyXML) that was > reported to me third hand, where a filename containing a colon on > Windows (using Cygwin tools) ended up being interpreted as Unix rcp > filename syntax, and the system was doing a host lookup on the part > before the colon -- that really does make the modem dial! Yes, but that does sound like a bug elsewhere. > > > The application knows this, but the library doesn't. It's also fine > > > to have an alternative API that takes a URL instead of a local > > > filename -- but it's not okay to attempt to overlap the two > > > namespaces. > > > > The application can always make sure that the right thing is processed > > by opening it itself, and then passing that to the parser. > > Sure, and if a string is given, it should be assumed to be a local > filename unless the API name has "URL" in it. It's not all that easy, as evidenced by my example above. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Feb 19 22:22:15 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 19 Feb 2001 15:22:15 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Uche Ogbuji of "Mon, 19 Feb 2001 15:06:44 MST." <200102192206.PAA15107@localhost.localdomain> Message-ID: <200102192222.PAA16128@localhost.localdomain> > Sure, one can add a parseFile, but what do you do with > > <!DOCTYPE spam [ > <!ENTITY foo SYSTEM "foo.bar"> > ]> > <spam>&foo;</spam> > > URI or file? > > Note that this is a trick question, and the "trick" is *exactly* my point. On re-reading, it seems as if I'm trying to be coy, but I'm not. My point is that "foo.bar" must be evaluated against the base URI of the entity in which it is contained. Here we have no choice of letting the user say "parseFile" or "parseUri".
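A minimal sketch of that resolution step, assuming modern Python 3 (urllib.parse.urljoin; the 2001-era spelling was urlparse.urljoin). The helper name and the base URIs are hypothetical, chosen only to show that the same relative system identifier resolves differently depending on where the containing entity came from:

```python
from urllib.parse import urljoin


def resolve_system_id(base_uri, system_id):
    """Resolve a possibly relative system identifier against the base URI
    of the entity in which the declaration occurs."""
    return urljoin(base_uri, system_id)


# Hypothetical base URIs, for illustration only.
remote = resolve_system_id("http://example.org/xml/doc.xml", "foo.bar")
local = resolve_system_id("file:///home/user/xml/doc.xml", "foo.bar")

print(remote)  # http://example.org/xml/foo.bar
print(local)   # file:///home/user/xml/foo.bar
```

The caller never gets to choose between "file" and "URI" for foo.bar; that follows from the base URI of the document being parsed.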
The same trap is all over the place: Basically, if you want to play with XML, you have to play with URI. There's not much for it. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From guido@digicool.com Mon Feb 19 22:34:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 19 Feb 2001 17:34:16 -0500 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Your message of "Mon, 19 Feb 2001 15:06:44 MST." <200102192206.PAA15107@localhost.localdomain> References: <200102192206.PAA15107@localhost.localdomain> Message-ID: <200102192234.RAA24747@cj20424-a.reston1.va.home.com> > > I'd like to drop support for URLs; I don't think the typical computer > > is sufficiently networked to make this work well. > > In this case, the typical computer user will have a great deal of trouble > using any XML application in any language. Almost all of them use URIs as > basis, and for good reason. Special support for local files are almost > universally a mere convenience. > > Most XML processing specifications mandate that the URI of the XML > entity that contains an infoset node is used as the basis for > further processing. To me, this argues strongly for dropping local > files rather than URIs if we must choose. Some XML specs would be > very difficult to implement properly if the low-level tools became > file-system-only readers. Can you give more details of how this is used? I've got very limited XML experience, and so far it all falls in the category of "here's a file; give me a DOM tree for it" or "here's a DOM tree, write it to a file". There are no URLs anywhere. Sometimes instead of a file it'll be text data read from or written to a database. But no URLs. > The Mac people should have spoken to the IETF a decade ago when URLs > emerged, or a bit later when URIs came out. 
I suspect, again that > if this is the case, they suffer much more pain in XML processing > than is inflicted on them by PyXML. That's a pretty intolerant attitude you're displaying there. They need not suffer at all if at all times it is clear whether a name is a URL or a filename. It's trying to fold the two namespaces into one that I'm fighting here. > > I would suggest to have separate APIs depending on the argument type, > > e.g. p.parseFile(filename), p.parseURL(url), > > p.parseStream(InputSource), p.parseString(text). (And no, Java > > overloading wouldn't help much here, since three out of four APIs have > > string arguments.) > > Sure, one can add a parseFile, but what do you do with > > <!DOCTYPE spam [ > <!ENTITY foo SYSTEM "foo.bar"> > ]> > <spam>&foo;</spam> > > URI or file? > > Note that this is a trick question, and the "trick" is *exactly* my point. So explain the trick. I don't know enough XML to understand what it means. I don't even know which thing you are asking about! spam? foo? foo.bar? &foo;? --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Mon Feb 19 22:38:24 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 19 Feb 2001 23:38:24 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102192206.PAA15107@localhost.localdomain> (message from Uche Ogbuji on Mon, 19 Feb 2001 15:06:44 -0700) References: <200102192206.PAA15107@localhost.localdomain> Message-ID: <200102192238.f1JMcOF06853@mira.informatik.hu-berlin.de> > Most XML processing specifications mandate that the URI of the XML > entity that contains an infoset node is used as the basis for > further processing. I agree. The XML recommendation is quite clear about this: # The SystemLiteral is called the entity's system identifier. It is a # URI, which may be used to retrieve the entity. So in XML, a system identifier is an URI, even though in SGML, it is system dependent (as the name suggests).
It goes on # Unless otherwise provided by information outside the scope of this # specification (...), relative URIs are relative to the location of # the resource within which the entity declaration occurs. A URI might # thus be relative to the document entity, to the entity containing # the external DTD subset, or to some other external parameter entity. So if a document was downloaded from http://www.python.org/xml/foo.xml, and encounter a system identifier of "../bar/bar.dtd", it MUST be interpreted as http://www.python.org/bar/bar.dtd. Regards, Martin From uche.ogbuji@fourthought.com Mon Feb 19 22:48:04 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 19 Feb 2001 15:48:04 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from Guido van Rossum of "Mon, 19 Feb 2001 17:34:16 EST." <200102192234.RAA24747@cj20424-a.reston1.va.home.com> Message-ID: <200102192248.PAA17821@localhost.localdomain> > > > I'd like to drop support for URLs; I don't think the typical computer > > > is sufficiently networked to make this work well. > > > > In this case, the typical computer user will have a great deal of trouble > > using any XML application in any language. Almost all of them use URIs as > > basis, and for good reason. Special support for local files are almost > > universally a mere convenience. > > > > Most XML processing specifications mandate that the URI of the XML > > entity that contains an infoset node is used as the basis for > > further processing. To me, this argues strongly for dropping local > > files rather than URIs if we must choose. Some XML specs would be > > very difficult to implement properly if the low-level tools became > > file-system-only readers. > > Can you give more details of how this is used? I've got very limited > XML experience, and so far it all falls in the category of "here's a > file; give me a DOM tree for it" or "here's a DOM tree, write it to a > file". There are no URLs anywhere. 
Sometimes instead of a file it'll > be text data read from or written to a database. But no URLs. Sorry. Basically, it's what you do with the DOM, and especially how attributes, system identifiers and other such creatures are interpreted. Basically, parseFile or parseUri in a top-level URI is typically only a small cross-section of the usage pattern in any XML processor. Other functions such as Stylesheet processing, XIncludes, xml:base, RDF, and pretty much anything else, gets these strings and are *required* to interpret these as URIs. If they were originally interpreted purely as files, then all the points of confusion you pointed out are immediately compounded as the system tries to reconcile the relative URIs against the "base URI" which is actually a file system file. This is actually a problem that I have seen people run into far more often than any worries about computers not having network connections. I'be been sorely tempted to remove file support just because it eliminates confusion with the large body of XML processing that requires relative URI normalization and resolution. > > The Mac people should have spoken to the IETF a decade ago when URLs > > emerged, or a bit later when URIs came out. I suspect, again that > > if this is the case, they suffer much more pain in XML processing > > than is inflicted on them by PyXML. > > That's a pretty intolerant attitude you're displaying there. They > need not suffer at all if at all times it is clear whether a name is a > URL or a filename. It's trying to fold the two namespaces into one > that I'm fighting here. Not my intention. My point is that I can't imagine PyXML is an outstanding problem for XML developers on a platform that uses colons as path separators. It's a purely technical argument. I don't know a thing about the Mac. > > > I would suggest to have separate APIs depending on the argument type, > > > e.g. 
p.parseFile(filename), p.parseURL(url),
> > > p.parseStream(InputSource), p.parseString(text). (And no, Java
> > > overloading wouldn't help much here, since three out of four APIs have
> > > string arguments.)
> >
> > Sure, one can add a parseFile, but what do you do with
> >
> >
> > >
> > ]>
> > &foo;
> >
> > URI or file?
> >
> > Note that this is a trick question, and the "trick" is *exactly* my point.
>
> So explain the trick. I don't know enough XML to understand what it
> means. I don't even know which thing you are asking about! spam?
> foo? foo.bar? &foo;?

"foo.bar". I think I explained it better in my succeeding message.

--
Uche Ogbuji                         Principal Consultant
uche.ogbuji@fourthought.com         +1 303 583 9900 x 101
Fourthought, Inc.                   http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From larsga@garshol.priv.no Mon Feb 19 22:49:03 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 19 Feb 2001 23:49:03 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com> <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
Message-ID:

* Guido van Rossum
|
| I'd like to drop support for URLs; I don't think the typical
| computer is sufficiently networked to make this work well.

Dropping support for URLs is not really an option when dealing with XML.
The XML recommendation states clearly that all system identifiers[1]
are URIs in XML.
What this really means is that we have two cases to deal with: - XML software is provided a reference to an XML document - XML document references as used internally by XML software and also as passed back out to client software In the second case the references must be URIs, since it is a deep-seated assumption in the entire XML family of specifications that all such references will be URIs. This is especially clear in the case of entity references (as Uche illustrated), but most other XML specifications are equally clear on this point, such as the infoset, XSLT, XBase and so on. Of course, in the first case there is no reason why it shouldn't be allowed to pass file names into the APIs to have them converted into URIs there. In fact, I think there is very good reason to do so, since my experience with the Java tools that require URIs have been fairly painful. (Who remembers the precise syntax for file URIs on all kinds of platforms anyway?) Outlawing URIs, however is not really an option. [1] What most people would call 'references to external resources', usually files. | I would suggest to have separate APIs depending on the argument | type, e.g. p.parseFile(filename), p.parseURL(url), | p.parseStream(InputSource), p.parseString(text). That may be a better option than to have a single function/method, but that is really separate from the issue of whether to allow URIs or not. --Lars M. From larsga@garshol.priv.no Mon Feb 19 23:03:51 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Feb 2001 00:03:51 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102192234.RAA24747@cj20424-a.reston1.va.home.com> References: <200102192206.PAA15107@localhost.localdomain> <200102192234.RAA24747@cj20424-a.reston1.va.home.com> Message-ID: * Uche Ogbuji | | Most XML processing specifications mandate that the URI of the XML | entity that contains an infoset node is used as the basis for | further processing. 
To me, this argues strongly for dropping local | files rather than URIs if we must choose. Some XML specs would be | very difficult to implement properly if the low-level tools became | file-system-only readers. * Guido van Rossum | | Can you give more details of how this is used? The simplest example is perhaps ]> The Meaning of Life Life, the Universe and Everything &chapter1; &chapter2; &chapter3; This XML document is really a hub document for a book, which contains metadata about the book (the title), the part structure and references to each chapter. The chapters, however, reside in files. The XML recommendation says clearly that the bit after the 'SYSTEM' must be a URI, and that it is turned into an absolute URI by being resolved against the base URI of the document. With the XML Base specification you can put attributes named 'xml:base' into your documents to locally change the base URI in a part of the document. This then interacts with other XML specifications that allow URI references to appear in the contents of the document. The XML syntax for topic maps is one example of this. This does not mean that we can't have a parseFile method, but that the file name given must be converted into a URI before the XML system starts using it. | I've got very limited XML experience, and so far it all falls in the | category of "here's a file; give me a DOM tree for it" or "here's a | DOM tree, write it to a file". There are no URLs anywhere. | Sometimes instead of a file it'll be text data read from or written | to a database. But no URLs. That is probably the most common use case in the near future, but not everyone uses XML like that and the entire family of standards assumes that the basic framework is that of the web. Quite a few XML applications work across the network and really rely on it being possible to parse remote documents (RSS perhaps being the most famous), and I think this will only be more common in the future. 
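A minimal sketch of that resolution step, assuming a modern Python where urllib.parse.urljoin has taken over from the urlparse.urljoin call quoted in this thread. The function name and the example.org book URLs are made up for illustration; the python.org pair is the example Martin gave earlier in the thread.

```python
from urllib.parse import urljoin


def resolve_system_id(base_uri, system_id):
    """Resolve a (possibly relative) system identifier against a base URI.

    Hypothetical helper for illustration; it is not part of PyXML.
    """
    return urljoin(base_uri, system_id)


# Martin's example: a document fetched over HTTP that refers to a DTD.
print(resolve_system_id("http://www.python.org/xml/foo.xml", "../bar/bar.dtd"))
# -> http://www.python.org/bar/bar.dtd

# A hub document (made-up URL) pulling in a chapter via SYSTEM "chapter1.xml".
print(resolve_system_id("http://example.org/book/hub.xml", "chapter1.xml"))
# -> http://example.org/book/chapter1.xml

# The same rule applies when the base happens to be a file: URI.
print(resolve_system_id("file:///home/lars/book/hub.xml", "chapters/ch1.xml"))
# -> file:///home/lars/book/chapters/ch1.xml
```

The point of the sketch is only that the base is always a URI, so one resolution rule covers local and remote documents alike.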
And in any case it works just fine already. :-) [larsga@pc36 dist]$ python xvcmd.py http://www.w3.org/TR/2000/REC-xml-20001006.xml xmlproc version 0.70 Parsing 'http://www.w3.org/TR/2000/REC-xml-20001006.xml' W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:2:9: Attribute 'id' defined more than once W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:3:9: Attribute 'role' defined more than once W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:414:9: Attribute 'diff' defined more than once E:http://www.w3.org/TR/2000/REC-xml-20001006.xml:2816:76: Actual value of attribute 'xmlns:xlink' does not match fixed value Parse complete, 1 error(s) and 3 warning(s) --Lars M. From tpassin@home.com Mon Feb 19 23:27:20 2001 From: tpassin@home.com (Thomas B. Passin) Date: Mon, 19 Feb 2001 18:27:20 -0500 Subject: [XML-SIG] Using PyExpat.py References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com> <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> <200102192149.QAA24348@cj20424-a.reston1.va.home.com> Message-ID: <002801c09acb$823dca20$7cac1218@reston1.va.home.com> This discussion highlights why I've said several times that you should use file:/// if you mean a file on your local machine. I've used a few commandline tools where you actually had to write that (I forget which ones). I was annoyed at first, but soon got used to it. As soon as you do insist on using file:///, distinctions about local files go away, and it becomes the responsibility of the url handler code to figure out where to go to get that particular resource. Also, you can get files on network file systems with no extra work, as in file://yourcomputer/... It's a convenience to let the code try to figure it out from a bare filename. But all that code should do is to translate a bare absolute local file reference to the file:/// scheme, then hand it off. 
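That translation step could look roughly like the sketch below. It assumes a current Python with pathlib; the function name is hypothetical and nothing here is PyXML's own API. It also makes relative names absolute against the current directory first, which goes slightly beyond the "bare absolute reference" case described above.

```python
from pathlib import Path


def filename_to_file_uri(filename):
    """Translate a bare local file name into an absolute file: URI.

    Hypothetical helper for illustration only.
    """
    # resolve() makes the path absolute (relative names are taken against
    # the current working directory); as_uri() then adds the file: scheme
    # and percent-encodes characters such as spaces.
    return Path(filename).resolve().as_uri()


# The caller says explicitly that this is a file name; everything
# downstream only ever sees the resulting URI.
print(filename_to_file_uri("my doc.xml"))
```

Once the name has been converted, the rest of the processing can treat it like any other URI and hand it to whatever URL handler is in use.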
Cheers, Tom P From fredrik@effbot.org Mon Feb 19 23:38:04 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 20 Feb 2001 00:38:04 +0100 Subject: [XML-SIG] Using PyExpat.py References: <200102192206.PAA15107@localhost.localdomain> Message-ID: <00ba01c09acd$020db5c0$e46940d5@hagrid> Uche Ogbuji wrote: > > > I can't find out whether this has been settled. Did you propose to > > > drop the support for URLs in the API, or the one for local files. > > > > I'd like to drop support for URLs; I don't think the typical computer > > is sufficiently networked to make this work well. > > In this case, the typical computer user will have a great deal of trouble > using any XML application in any language. Almost all of them use URIs as > basis, and for good reason. Special support for local files are almost > universally a mere convenience. > > Most XML processing specifications mandate that the URI of the XML entity that > contains an infoset node is used as the basis for further processing. To me, > this argues strongly for dropping local files rather than URIs if we must > choose. Some XML specs would be very difficult to implement properly if the > low-level tools became file-system-only readers. is the code Guido quoted taken from a utility function (e.g. a standard input handler), or is it part of the core library: if os.path.isfile(sysid): basehead = os.path.split(os.path.normpath(base))[0] source.setSystemId(os.path.join(basehead, sysid)) f = open(sysid, "rb") else: source.setSystemId(urlparse.urljoin(base, sysid)) f = urllib.urlopen(source.getSystemId()) if the latter, I hope you realize that this can be abused in all sorts of interesting ways... Cheers /F From uche.ogbuji@fourthought.com Mon Feb 19 23:40:43 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 19 Feb 2001 16:40:43 -0700 Subject: [XML-SIG] ANN: 4Suite and 4Suite Server 0.10.2 Message-ID: <200102192340.QAA21305@localhost.localdomain> Fourthought, Inc. 
(http://Fourthought.com) announces the release of 4Suite 0.10.2 and 4Suite Server 0.10.2 ---------------------------- Open source XML processing tools and an XML data server http://4Suite.org http://Fourthought.com/4SuiteServer 4Suite News ----------- * ODS: optimized back end * ODS: Better collection support * ODS: DBM and Oracle driver fixes * XSLT: format-number overhaul * XPath: C boolean extension implemented for performance * XPath: Added extension functs search-re, base-uri * RDF: serialization fixes * RDF: shelve (DBM) driver * Localization support * Friendlier error messages * URI handling fixes * Many misc bug-fixes 4Suite Server News ------------------ * Many usability improvements * omniNotify: Removed our implementation of Event Channel and replaced with omniNotify * TxFactory: Rewrote to avoid common race conditions * Strobe: (formerly Reaper) Added a test harness * UserServer: Moved many user specific things out of the common IDL * UserServer: Added a test harness * RdfServer: Now uses system exceptions for common exception cases. * RdfServer: Added a test harness * XmlServer: Allow Raw files * XmlServer: Now uses the standard system exceptions * XmlServer: Added a proper test harness * XmlServer: Added XSLT-based API to 4SS * MetaUserServer: Completed the implementation * MetaUserServer: Added a proper test harness * MetaXmlServer: Completed the implementation * MetaXmlServer: Added a proper test harness * HTTPListener: Added a test harness * HTTPListener: XSLT support * HTTPListener: Custom handler support * webDAV: Incorporated pydav into 4SS * webDAV: Finished initial implementation * All: Renamed interfaces (where approriate) to follow Create/Fetch/Update/Delete naming convention. 
* All: Added command-line tools * All: Added console * All: Added populate script to bootstrap useful resources * All: More comprehensive documentation * All: Many, many fixes and optimizations 4Suite is a collection of Python tools for XML processing and object database management. It provides support for XML parsing, several transient and persistent DOM implementations, XPath expressions, XPointer, XSLT transforms, XLink, RDF and ODMG object databases. 4Suite Server is a platform for XML processing. It features an XML data repository, a rules-based engine, and XSLT transforms, XPath and RDF-based indexing and query, XLink resolution and many other XML services. It also supports related services such as distributed transactions and access control lists. Along with basic console and command-line management, it supports remote, cross-platform and cross-language access through CORBA, WebDAV, HTTP and other request protocols to be added shortly. 4Suite Server is not meant to be a full-blown application server. It provides highly-specialized services for XML processing that can be used with other application servers. All the software is open-source and free to download. Priority support and customization is available from Fourthought, Inc. For more information on this, see the http://FourThought.com, or contact Fourthought at info@fourthought.com or +1 303 583 9900 More info and Obtaining 4Suite and 4Suite Server ------------------------------------------------ Please see http://4Suite.org http://Fourthought.com/4SuiteServer >From where you can download source, Windows and Linux binaries. 4Suite is distributed under a license similar to that of the Apache Web Server. From uche.ogbuji@fourthought.com Mon Feb 19 23:43:24 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 19 Feb 2001 16:43:24 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from "Thomas B. Passin" of "Mon, 19 Feb 2001 18:27:20 EST." 
<002801c09acb$823dca20$7cac1218@reston1.va.home.com> Message-ID: <200102192343.QAA21554@localhost.localdomain> > This discussion highlights why I've said several times that you should use > file:/// if you mean a file on your local machine. Agreed. That's why I was saying I sometimes had a mind to banish regular file names. People can always use "file:" if they need to. Sometimes I think the extra typing is worth the minimized confusion. > I've used a few > commandline tools where you actually had to write that (I forget which ones). > I was annoyed at first, but soon got used to it. That was another point I was trying to make: PyXML is hardly unique in this. URI is the native form for most compliant XML processors. > As soon as you do insist on > using file:///, distinctions about local files go away, and it becomes the > responsibility of the url handler code to figure out where to go to get that > particular resource. Also, you can get files on network file systems with no > extra work, as in file://yourcomputer/... Yes. > It's a convenience to let the code try to figure it out from a bare filename. > But all that code should do is to translate a bare absolute local file > reference to the file:/// scheme, then hand it off. Agreed. I still think the algorithm you posted, and your follow-up, make most sense. It's just a matter of implementing it. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Feb 19 23:45:51 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 19 Feb 2001 16:45:51 -0700 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Message from "Fredrik Lundh" of "Tue, 20 Feb 2001 00:38:04 +0100." 
<00ba01c09acd$020db5c0$e46940d5@hagrid> Message-ID: <200102192345.QAA21718@localhost.localdomain> > Uche Ogbuji wrote: > > Most XML processing specifications mandate that the URI of the XML entity that > > contains an infoset node is used as the basis for further processing. To me, > > this argues strongly for dropping local files rather than URIs if we must > > choose. Some XML specs would be very difficult to implement properly if the > > low-level tools became file-system-only readers. > > is the code Guido quoted taken from a utility function (e.g. a standard > input handler), or is it part of the core library: > > if os.path.isfile(sysid): > basehead = os.path.split(os.path.normpath(base))[0] > source.setSystemId(os.path.join(basehead, sysid)) > f = open(sysid, "rb") > else: > source.setSystemId(urlparse.urljoin(base, sysid)) > f = urllib.urlopen(source.getSystemId()) > > if the latter, I hope you realize that this can be abused in all sorts of > interesting ways... I forgot who it was on XML-DEV who said that XML is a dream for malicious network abusers. I'm not arguing whether or not it's a good thing that XML is so URI-happy. I'm just stating the fact. As for your precise question, Guido said it came from saxutils.py -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Mon Feb 19 23:53:44 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 20 Feb 2001 00:53:44 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <00ba01c09acd$020db5c0$e46940d5@hagrid> (fredrik@effbot.org) References: <200102192206.PAA15107@localhost.localdomain> <00ba01c09acd$020db5c0$e46940d5@hagrid> Message-ID: <200102192353.f1JNrim07455@mira.informatik.hu-berlin.de> > is the code Guido quoted taken from a utility function (e.g. a standard > input handler), or is it part of the core library: > > if os.path.isfile(sysid): > basehead = os.path.split(os.path.normpath(base))[0] > source.setSystemId(os.path.join(basehead, sysid)) > f = open(sysid, "rb") > else: > source.setSystemId(urlparse.urljoin(base, sysid)) > f = urllib.urlopen(source.getSystemId()) > > if the latter, I hope you realize that this can be abused in all sorts of > interesting ways... That is part of xml.sax.saxlib.prepare_input_source. I don't realize all the sorts in which this can be abused, though - can you elaborate some? Regards, Martin From larsga@garshol.priv.no Tue Feb 20 07:48:32 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Feb 2001 08:48:32 +0100 Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!) In-Reply-To: <20010219114827.B28553@zopatista.com> References: <200102171939.MAA16669@localhost.localdomain> <20010219114827.B28553@zopatista.com> Message-ID: * Martijn Pieters | | I cannot find any references to Lars' test suite; so I don't know if it | will work with his. I think Uche is referring to test/test_javadom.py in the PyXML package. It's not very big, and it sounds like you've probably covered what it does already. It also uses PyUnit. | - The suite tests only for DOM compliance, nothing implementation specific | should be in there. There are some python binding tests, we may want to | move those out. I don't think they should be. The Python extensions are more a part of the interface than some of the W3C-defined stuff, I would say. 
| - DOMString and text manipulating interface methods are not tested beyond | ASCII text due to an implementation limitation of ParsedXML.DOM. So, | implementations will not be tested if text is correctly treated when | multi-byte UTF-16 characters are involved. By "multi-byte UTF-16 characters" I assume you mean Unicode characters outside the BMP that are represented using two surrogates? But this test suite really sounds like an excellent piece of work. It would be great if we could start using it in the PyXML package, and also if some scheme could be worked out so that both groups could easily contribute to the package. --Lars M. From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 07:50:37 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 08:50:37 +0100 Subject: [XML-SIG] Preparing for 0.6.4 Message-ID: <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de> I'm going to release PyXML 0.6.4 later this week or early next week. If you have any pending changes that you want to integrate, please let me know, or commit them yourself. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 07:56:09 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 08:56:09 +0100 Subject: [XML-SIG] Pending patches Message-ID: <200102200756.f1K7u9A01449@mira.informatik.hu-berlin.de> There is a number of patches pending on SF which need review, in particular: 4DOM: 103418, 103417 wddx: 103408 xmlproc: 103470 I'd appreciate if the owners of these modules could review the patches and accept or reject them. If you think you ought to review them but cannot do so in any foreseeable future, please let me know. Thanks, Martin From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 08:02:12 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 09:02:12 +0100 Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!) 
In-Reply-To: (message from Lars Marius Garshol on 20 Feb 2001 08:48:32 +0100)
References: <200102171939.MAA16669@localhost.localdomain> <20010219114827.B28553@zopatista.com>
Message-ID: <200102200802.f1K82Cn01522@mira.informatik.hu-berlin.de>

> | - DOMString and text manipulating interface methods are not tested beyond
> |   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
> |   implementations will not be tested if text is correctly treated when
> |   multi-byte UTF-16 characters are involved.
>
> By "multi-byte UTF-16 characters" I assume you mean Unicode characters
> outside the BMP that are represented using two surrogates?

I rather read that as "Unicode characters outside row 0", i.e.
non-Latin-1 - although problems likely occur for "multibyte UTF-8
characters", i.e. non-ASCII.

> But this test suite really sounds like an excellent piece of work.

I definitely agree.

> It would be great if we could start using it in the PyXML package,
> and also if some scheme could be worked out so that both groups
> could easily contribute to the package.

I'm not sure it needs to be incorporated in PyXML; getting our DOM
implementations to pass and then running the suite regularly as
regression tests should be sufficient.

An official feedback procedure (patch submission address, or CVS write
access) would be good, though. I actually don't know how people
contribute to Zope - although I could probably find out with a little
research.

Regards,
Martin

From jerome.marant@free.fr Tue Feb 20 08:42:56 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 20 Feb 2001 09:42:56 +0100
Subject: [XML-SIG] Preparing for 0.6.4
In-Reply-To: "Martin v. Loewis"'s message of "Tue, 20 Feb 2001 08:50:37 +0100"
References: <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de>
Message-ID: <7z1yst219r.fsf@amboise.ird.idealx.com>

"Martin v. Loewis" writes:

> I'm going to release PyXML 0.6.4 later this week or early next
> week.
If you have any pending changes that you want to integrate,
> please let me know, or commit them yourself.

Yes, please.

There is a missing #!/usr/bin/env python in demo/xbel/xbel2html.py

Please also make sure that the right version number appears in the
README file.

Thanks.

--
Jérôme Marant
http://jerome.marant.free.fr

From larsga@garshol.priv.no Tue Feb 20 09:00:23 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Feb 2001 10:00:23 +0100
Subject: [XML-SIG] SAX: Names with no namespace
Message-ID:

We had a discussion earlier about how to represent the namespace URI
of names that are not in any namespace, and this discussion was never
properly concluded.

The alternatives seem to be None and '', and the question is which to
choose. I see that the Java version of SAX has chosen '', but I think
this is in large part because anything else would be very inconvenient
because of the way Java and Java SAX are put together.

Personally, I am leaning toward None, since that seems to me the best
way to represent a missing namespace URI. That is also my only
argument in favour.

Does anyone else have any opinions on this?

--Lars M.

From ken@bitsko.slc.ut.us Tue Feb 20 14:06:29 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 20 Feb 2001 08:06:29 -0600
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: Lars Marius Garshol's message of "18 Feb 2001 17:45:52 +0100"
References: <200102171719.KAA07503@localhost.localdomain>
Message-ID:

Lars Marius Garshol writes:

> | A low-level Infoset API would be interesting
>
> Personally I would prefer to see a nice tree-based XML API. My
> personal opinion is that the DOM stinks and needs replacement. Sean
> McGrath's xTree looks far better, in my opinion.

Orchard[1] exposes *just* the infoset in the simplest possible way[2]
(that is, an element's attributes are a mapping, contents are sequences,
other attributes are simple values).
Orchard's nodes differ from DOM nodes in that they have no navigation methods or attributes (firstChild, nextSibling) or DOM-special manipulation (insertBefore, replaceChild) -- depending solely on Python's standard mapping and sequence interface. Orchard also uses a (URI, LocalName) tuple for supporting XML Namespaces, instead of additional *NS methods. Like Python's DOM binding, Orchard uses normal attribute accessors instead of (or in addition to) get/set methods. Essentially the whole API (the XML node attributes for common XML nodes), in language-neutral form, less a few convenience methods like getElementsByTagName(), load(), and save(), is attached below. >From a quick re-review, Pyxie's xTree also has navigation methods (Up, Down, HasUp). I would be very interested to find out if people have a preference for navigation methods vs. using the mappings and sequences directly. Again, Orchard nodes use direct access, no navigation methods. Like Pyxie's xDispatch (and discussed here earlier[3,4]), Orchard uses node-based events/dispatch (SAX). Event handlers, pull modules, or dispatch functions all use the same node types as trees do. "But Wait!! That's not all!" :-) As a last note, the C optimization is well underway. Orchard/Mostly-C is about 3-10x faster than pure Python/Perl while still retaining attribute accessors (with overrides), garbage collection, and no problems with cycles. Current status is that we have a pure Python prototype of the Orchard APIs, and the Python binding is scheduled for early post-1.0 (as always, volunteers can change that!). We have ported Matt Sergeant's XPath step evaluator to C as an example of C optimization for higher language modules[5]. 
-- Ken [1] [2] [3] [4] [5] Orchard's common XML nodes: document element attribute characters -------- -------------- -------------- ---------- contents name name data root attributes value contents namespace-uri* namespace-uri* local-name* local-name* prefix* prefix* * Available when namespace processing is enabled (the default). The `contents' property of a document or element node is a list of the nodes within that document or element. The `name' of an element or attribute node is name of the element/attribute, including prefix, if any. The `root' of a document is the root element of the document. An element's `attributes' is a container indexed by the attribute's `name' property. The `value' of an attribute is the normalized, string value of the attribute. The `data' of a characters node is XML text. *** XML Namespaces If an XML document uses XML Namespaces, the following additional properties are available on element and attribute nodes. `namespace-uri' is the XML Namespace URI string. `local-name' is local-name portion of the element name (the element name without the prefix). `prefix' is the prefix portion of the element name (the element name without the local-name). The `attributes' container is indexed also by the namespace-uri/local-name pair of each attribute. When accessing documents using XML Namespaces, you should only use the namespace-uri/local-name indexes for attributes. XML Namespace processing is used by default if the document uses XML Namespaces. From uche.ogbuji@fourthought.com Tue Feb 20 14:07:25 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 20 Feb 2001 07:07:25 -0700 Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!) In-Reply-To: Message from Lars Marius Garshol of "20 Feb 2001 08:48:32 +0100." Message-ID: <200102201407.HAA14271@localhost.localdomain> > | - DOMString and text manipulating interface methods are not tested beyond > | ASCII text due to an implementation limitation of ParsedXML.DOM. 
So, > | implementations will not be tested if text is correctly treated when > | multi-byte UTF-16 characters are involved. > > By "multi-byte UTF-16 characters" I assume you mean Unicode characters > outside the BMP that are represented using two surrogates? I wonder if that's what Martijn means. I've read that most Java implementations have trouble with characters outside the BMP. I wonder if Python handles these properly. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Feb 20 14:07:57 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 20 Feb 2001 07:07:57 -0700 Subject: [XML-SIG] Preparing for 0.6.4 In-Reply-To: Message from "Martin v. Loewis" of "Tue, 20 Feb 2001 08:50:37 +0100." <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de> Message-ID: <200102201407.HAA14282@localhost.localdomain> > I'm going to release PyXML 0.6.4 later this week or early next > week. If you have any pending changes that you want to integrate, > please let me know, or commit them yourself. You probably noticed that Jeremy updated 4DOM. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Feb 20 14:32:02 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 20 Feb 2001 07:32:02 -0700 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: Message from Lars Marius Garshol of "20 Feb 2001 10:00:23 +0100." 
Message-ID: <200102201432.HAA14350@localhost.localdomain>

>
> We had a discussion earlier about how to represent the namespace URI
> of names that are not in any namespace, and this discussion was never
> properly concluded.

I thought it was.

> The alternatives seem to be None and '', and the question is which to
> choose. I see that the Java version of SAX has chosen '', but I think
> this is in large part because anything else would be very inconvenient
> because of the way Java and Java SAX are put together.
>
> Personally, I am leaning toward None, since that seems to me the best
> way to represent a missing namespace URI. That is also my only
> argument in favour.

Well, in the end I don't think there was a single dissent against
"None", so I'd call it a group Pronouncement.

For those looking for these threads in the archive, note that it came up
twice recently. Look for the "DOM documentation update" subject line
back in November/December and the "problem with empty namespace uri"
subject in January.

--
Uche Ogbuji                         Principal Consultant
uche.ogbuji@fourthought.com         +1 303 583 9900 x 101
Fourthought, Inc.                   http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From guido@digicool.com Tue Feb 20 14:36:37 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 20 Feb 2001 09:36:37 -0500
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: Your message of "Tue, 20 Feb 2001 07:07:25 MST." <200102201407.HAA14271@localhost.localdomain>
References: <200102201407.HAA14271@localhost.localdomain>
Message-ID: <200102201436.JAA27994@cj20424-a.reston1.va.home.com>

> > | - DOMString and text manipulating interface methods are not tested beyond
> > |   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
> > |   implementations will not be tested if text is correctly treated when
> > |   multi-byte UTF-16 characters are involved.
> > > > By "multi-byte UTF-16 characters" I assume you mean Unicode characters > > outside the BMP that are represented using two surrogates? > > I wonder if that's what Martijn means. I've read that most Java > implementations have trouble with characters outside the BMP. I wonder if > Python handles these properly. Depends on what you call properly. Can you elaborate on what you would call proper treatment here? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Feb 20 14:41:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 20 Feb 2001 09:41:57 -0500 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Your message of "19 Feb 2001 23:49:03 +0100." References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com> <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> <200102192149.QAA24348@cj20424-a.reston1.va.home.com> Message-ID: <200102201441.JAA28044@cj20424-a.reston1.va.home.com> > * Guido van Rossum > | > | I'd like to drop support for URLs; I don't think the typical > | computer is sufficiently networked to make this work well. [Lars] > Dropping support for URLs not really an option when dealing with XML. > The XML recommendation states clearly that all system identifiers[1] > are URIs in XML. > > What this really means is that we have two cases to deal with: > > - XML software is provided a reference to an XML document > - XML document references as used internally by XML software and > also as passed back out to client software > > In the second case the references must be URIs, since it is a > deep-seated assumption in the entire XML family of specifications that > all such references will be URIs. This is especially clear in the case > of entity references (as Uche illustrated), but most other XML > specifications are equally clear on this point, such as the infoset, > XSLT, XBase and so on. OK, I understand. 
> Of course, in the first case there is no reason why it shouldn't be > allowed to pass file names into the APIs to have them converted into > URIs there. In fact, I think there is very good reason to do so, since > my experience with the Java tools that require URIs have been fairly > painful. (Who remembers the precise syntax for file URIs on all kinds > of platforms anyway?) OK. That's useful information. > Outlawing URIs, however is not really an option. OK, I also understand that. > [1] What most people would call 'references to external resources', > usually files. > > | I would suggest to have separate APIs depending on the argument > | type, e.g. p.parseFile(filename), p.parseURL(url), > | p.parseStream(InputSource), p.parseString(text). > > That may be a better option than to have a single function/method, but > that is really separate from the issue of whether to allow URIs or > not. OK, so let's focus on this then: APIs must be clear in whether they accept a URI or a filename, and not guess based on the form of the string. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Feb 20 14:51:32 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 20 Feb 2001 09:51:32 -0500 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: Your message of "Mon, 19 Feb 2001 15:48:04 MST." <200102192248.PAA17821@localhost.localdomain> References: <200102192248.PAA17821@localhost.localdomain> Message-ID: <200102201451.JAA28102@cj20424-a.reston1.va.home.com> [Uche] > Sorry. Accepted. :-) > Basically, it's what you do with the DOM, and especially how attributes, > system identifiers and other such creatures are interpreted. > > Basically, parseFile or parseUri in a top-level URI is typically > only a small cross-section of the usage pattern in any XML > processor. Other functions such as Stylesheet processing, > XIncludes, xml:base, RDF, and pretty much anything else, gets these > strings and are *required* to interpret these as URIs. 
> > If they were originally interpreted purely as files, then all the > points of confusion you pointed out are immediately compounded as > the system tries to reconcile the relative URIs against the "base > URI" which is actually a file system file. > > This is actually a problem that I have seen people run into far more > often than any worries about computers not having network > connections. I'be been sorely tempted to remove file support just > because it eliminates confusion with the large body of XML > processing that requires relative URI normalization and resolution. OK, I think I understand the issues a bit better now. When XML docs contain references to other things, they typically use (absolute or relative) URL references. I'm guessing that this means that the separator is always "/" and the parent directory is always represented by "..". Fine. But I still maintain that the API used by the application should be clear and explicit about whether it is naming a local file or a URI. Then parseFile(f) can call parseURI("file:" + f) [1] internally and parseURI can set the proper base URI. [1]: don't take this literally; reality is more complicated than tacking "file:" onto the front. On non-Unix platforms, use macurl2path.pathname2url(f) on the Mac, and nturl2path on DOS/Windows. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Feb 20 14:55:41 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 20 Feb 2001 09:55:41 -0500 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: Your message of "Tue, 20 Feb 2001 07:32:02 MST." <200102201432.HAA14350@localhost.localdomain> References: <200102201432.HAA14350@localhost.localdomain> Message-ID: <200102201455.JAA28149@cj20424-a.reston1.va.home.com> > > We had a discussion earlier about how to represent the namespace URI > > of names that are not in any namespace, and this discussion was never > > properly concluded. > > I thought it was. 
> > > The alternatives seem to be None and '', and the question is which to > > choose. I see that the Java version of SAX has chosen '', but I think > > this is in large part because anything else would be very inconvenient > > because of the way Java and Java SAX are put together. > > > > Personally, I am leaning toward None, since that seems to me the best > > way to represent a missing namespace URI. That is also my only > > argument in favour. > > Well, in the end I don't think there was a single dissention against "None", > so I'd call it a group Pronouncement. > > For those looking for these threads in the archive, note that it > came up twice recently. Look for the "DOM documentation update" > subject line back in November/December and the "problem with empty > namespace uri" subject in January. Which reminds me. I've been told that getAttribute() and getAttributeNS() are supposed to return "" for a non-existent attribute, and that if you want to know whether the attribute was really there, you should use getAttributeNode() etc. Again, that may be a good design for Java or IDL, but is it right for Python? I'd much rather see None used as it was intended! --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji@fourthought.com Tue Feb 20 15:47:59 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 20 Feb 2001 08:47:59 -0700 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: Message from Ken MacLeod of "20 Feb 2001 08:06:29 CST." Message-ID: <200102201547.IAA14655@localhost.localdomain> > Lars Marius Garshol writes: > > > | A low-level Infoset API would be interesting > > > > Personally I would prefer to see a nice tree-based XML API. My > > personal opinion is that the DOM stinks and needs replacement. Sean > > McGrath's xTree looks far better, in my opinion. 
> > Orchard[1] exposes *just* the infoset in the simplest possible way[2] > (that is, an element's attributes is a mapping, contents are > sequences, other attributes are simple values). > > Orchard's nodes differ from DOM nodes in that they have no navigation > methods or attributes (firstChild, nextSibling) or DOM-special > manipulation (insertBefore, replaceChild) -- depending solely on > Python's standard mapping and sequence interface. Orchard also uses a > (URI, LocalName) tuple for supporting XML Namespaces, instead of > additional *NS methods. Like Python's DOM binding, Orchard uses > normal attribute accessors instead of (or in addition to) get/set > methods. Wow. Sounds very clean and Pythonic. I'll have to dig. > "But Wait!! That's not all!" :-) > > As a last note, the C optimization is well underway. Orchard/Mostly-C > is about 3-10x faster than pure Python/Perl while still retaining > attribute accessors (with overrides), garbage collection, and no > problems with cycles. Current status is that we have a pure Python > prototype of the Orchard APIs, and the Python binding is scheduled for > early post-1.0 (as always, volunteers can change that!). We have > ported Matt Sergeant's XPath step evaluator to C as an example of C > optimization for higher language modules[5]. How is the memory footprint? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Tue Feb 20 16:13:40 2001 From: fdrake@acm.org (Fred L. 
Drake) Date: Tue, 20 Feb 2001 11:13:40 -0500 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: Message-ID: On 20 Feb 2001 10:00:23 +0100 Lars Marius Garshol wrote: > We had a discussion earlier about how to represent the > namespace URI > of names that are not in any namespace, and this > discussion was never > properly concluded. Actually, I had though we *had* decided, and None was the concensus. Anyway, I still favor None. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From ken@bitsko.slc.ut.us Tue Feb 20 17:25:05 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 20 Feb 2001 11:25:05 -0600 Subject: [XML-SIG] Roadmap document - finally! In-Reply-To: Uche Ogbuji's message of "Tue, 20 Feb 2001 08:47:59 -0700" References: <200102201547.IAA14655@localhost.localdomain> Message-ID: Uche Ogbuji writes: > > "But Wait!! That's not all!" :-) > > > > As a last note, the C optimization is well underway. > > Orchard/Mostly-C is about 3-10x faster than pure Python/Perl while > > still retaining attribute accessors (with overrides), garbage > > collection, and no problems with cycles. Current status is that > > we have a pure Python prototype of the Orchard APIs, and the > > Python binding is scheduled for early post-1.0 (as always, > > volunteers can change that!). We have ported Matt Sergeant's > > XPath step evaluator to C as an example of C optimization for > > higher language modules. > > How is the memory footprint? The core runtime, liborchard.so, is 129472 bytes (i386 Linux) and requires Boehm-Demers-Weiser libgc.so, which is 74212 bytes. It also supports the expat 1.95.1 .so, but so should everyone else ;-). The data footprint is still very small because the runtime is not maintaining a lot of metainformation yet on classes. The current "fast/small" DOM is running about 8x XML file size with slots for XML Namespaces (XML Rec 159357bytes, 1246003bytes in memory). 
-- Ken From uche.ogbuji@fourthought.com Tue Feb 20 18:54:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 20 Feb 2001 11:54:34 -0700 Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!) In-Reply-To: Message from Guido van Rossum of "Tue, 20 Feb 2001 09:36:37 EST." <200102201436.JAA27994@cj20424-a.reston1.va.home.com> Message-ID: <200102201854.LAA15786@localhost.localdomain> > > > | - DOMString and text manipulating interface methods are not tested beyond > > > | ASCII text due to an implementation limitation of ParsedXML.DOM. So, > > > | implementations will not be tested if text is correctly treated when > > > | multi-byte UTF-16 characters are involved. > > > > > > By "multi-byte UTF-16 characters" I assume you mean Unicode characters > > > outside the BMP that are represented using two surrogates? > > > > I wonder if that's what Martijn means. I've read that most Java > > implementations have trouble with characters outside the BMP. I wonder if > > Python handles these properly. > > Depends on what you call properly. Can you elaborate on what you > would call proper treatment here? Sure. I admit it's hearsay, but I thought I'd read that because Java Unicode is or was underspecified, that there was the possibility of transposition of the high-surrogate with the low-surrogate character between Java implementations or platforms. Now I don't exactly write XML dissertations on "Hello Kitty" , so I'm not likely to run into this myself, but I was wondering whether Python handles surrogate blocks appropriately across platforms and implementations (I guess including cpyhton -> Jpython). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 18:38:55 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 19:38:55 +0100 Subject: [XML-SIG] Preparing for 0.6.4 In-Reply-To: <200102201407.HAA14282@localhost.localdomain> (message from Uche Ogbuji on Tue, 20 Feb 2001 07:07:57 -0700) References: <200102201407.HAA14282@localhost.localdomain> Message-ID: <200102201838.f1KIctu00927@mira.informatik.hu-berlin.de> > You probably noticed that Jeremy updated 4DOM. Yes, thanks indeed for that update. That was actually what initiated the release procedure :-) Martin From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 18:32:08 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 19:32:08 +0100 Subject: [XML-SIG] Preparing for 0.6.4 In-Reply-To: <7z1yst219r.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr) References: <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de> <7z1yst219r.fsf@amboise.ird.idealx.com> Message-ID: <200102201832.f1KIW8T00923@mira.informatik.hu-berlin.de> > There is a missing #!/usr/bin/env python in demo/xbel/xbel2html.py Thanks! changed in my local sandbox. > Please also make sure that the right version number appears in the > README file. That should be alright already... Martin From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 18:53:47 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 19:53:47 +0100 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: (fdrake@acm.org) References: Message-ID: <200102201853.f1KIrlw00957@mira.informatik.hu-berlin.de> > Actually, I had though we *had* decided, and None was the > concensus. That is also my recollection - there is even a PEP document somewhere; you can get a copy from the archives, or from Tom Passin. 
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 18:42:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 19:42:16 +0100 Subject: [XML-SIG] DC DOM tests In-Reply-To: <200102201407.HAA14271@localhost.localdomain> (message from Uche Ogbuji on Tue, 20 Feb 2001 07:07:25 -0700) References: <200102201407.HAA14271@localhost.localdomain> Message-ID: <200102201842.f1KIgGU00930@mira.informatik.hu-berlin.de> > I wonder if that's what Martijn means. I've read that most Java > implementations have trouble with characters outside the BMP. I > wonder if Python handles these properly. Not sure what "properly" would be:
>>> s=unichr(0xD000)+unichr(0xD800)
>>> s
u'\ud000\ud800'
>>> len(s)
2
Do I even use them in the right order here? It can store them, and reproduce what was stored. Apart from that, it does not special-case surrogates at all. Regards, Martin P.S. I really think Python should have used a 32-bit wide character representation instead. From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 18:52:14 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 19:52:14 +0100 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102201451.JAA28102@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Tue, 20 Feb 2001 09:51:32 -0500) References: <200102192248.PAA17821@localhost.localdomain> <200102201451.JAA28102@cj20424-a.reston1.va.home.com> Message-ID: <200102201852.f1KIqEO00955@mira.informatik.hu-berlin.de> > But I still maintain that the API used by the application should be > clear and explicit about whether it is naming a local file or a URI. I agree in principle. When it comes to changing an existing API, I'd hesitate to break existing code. If such breakage is planned, it ought to be carried out sooner rather than later. The specific case in question is the parse() method in the SAX2 API (*); I'd argue it needs a PEP and/or your direct order to change it.
Deprecating it in the documentation is a different matter - that still could be done after 2.1. Regards, Martin (*) in turn, minidom should see a corresponding change. From martin@loewis.home.cs.tu-berlin.de Tue Feb 20 18:47:03 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 20 Feb 2001 19:47:03 +0100 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: <200102201455.JAA28149@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Tue, 20 Feb 2001 09:55:41 -0500) References: <200102201432.HAA14350@localhost.localdomain> <200102201455.JAA28149@cj20424-a.reston1.va.home.com> Message-ID: <200102201847.f1KIl3d00953@mira.informatik.hu-berlin.de> > Which reminds me. I've been told that getAttribute() and > getAttributeNS() are supposed to return "" for a non-existent > attribute, and that if you want to know whether the attribute was > really there, you should use getAttributeNode() etc. Again, that may > be a good design for Java or IDL, but is it right for Python? I'd > much rather see None used as it was intended! I'd have to check again, but I think the current DOM spec is painfully clear about null and empty strings, and it also clear that a null string ought to be None, and an empty string ought to be "". So there is not much choice - except for developing a dislike towards the entire DOM (which I wouldn't do just because of that problem). Regards, Martin From fdrake@acm.org Tue Feb 20 19:37:50 2001 From: fdrake@acm.org (Fred L. Drake) Date: Tue, 20 Feb 2001 14:37:50 -0500 Subject: [XML-SIG] Using PyExpat.py In-Reply-To: <200102201852.f1KIqEO00955@mira.informatik.hu-berlin.de> Message-ID: "Martin v. Loewis" wrote: > it. Deprecating it in the documentation is a different > matter - that still could be done after 2.1. Actually, I'd expect to see the documentation updated as soon as possible, and any new APIs added immediately. 
This would allow people to migrate away from the old way as early as possible, and expose any bugs that are introduced with the changes to the code. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From dieter@handshake.de Tue Feb 20 19:03:12 2001 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 20 Feb 2001 20:03:12 +0100 (CET) Subject: [XML-SIG] Preparing for 0.6.4 In-Reply-To: <321438041@toto.iv> Message-ID: <14994.49008.154170.999605@lindm.dm> --Multipart_Tue_Feb_20_20:03:12_2001-1 Content-Type: text/plain; charset=US-ASCII Martin v. Loewis writes: > I'm going to release PyXML 0.6.4 later this week or early next > week. If you have any pending changes that you want to integrate, > please let me know, or commit them yourself. I hit a strange bug in "expat.c" (still 0.6.2) last Sunday: "expat" reported "no element found". The problem only occurred during parsing of an external entity. It was caused by a buffer switch inside a CDATA section (in the external entity). When "expat.c" left the CDATA, it chose "contentProcessor" as "processor" rather than "externalEntityContentProcessor". When it reached the end of the external entity, "contentProcessor" found an inconsistent state and threw the "no element found" exception. I have appended a patch. I am not sure whether it is still necessary for 0.6.3.
Dieter
----------------------------------------------------------------------
--Multipart_Tue_Feb_20_20:03:12_2001-1
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="xmlparse.pat"
Content-Transfer-Encoding: 7bit

--- :xmlparse.c Fri Sep 24 04:18:38 1999
+++ xmlparse.c Sat Feb 17 22:47:31 2001
@@ -301,6 +301,7 @@
   void (*m_unknownEncodingRelease)(void *);
   PROLOG_STATE m_prologState;
   Processor *m_processor;
+  Processor *m_beforeCdataProcessor;
   enum XML_Error m_errorCode;
   const char *m_eventPtr;
   const char *m_eventEndPtr;
@@ -360,6 +361,7 @@
 #define ns (((Parser *)parser)->m_ns)
 #define prologState (((Parser *)parser)->m_prologState)
 #define processor (((Parser *)parser)->m_processor)
+#define beforeCdataProcessor (((Parser *)parser)->m_beforeCdataProcessor)
 #define errorCode (((Parser *)parser)->m_errorCode)
 #define eventPtr (((Parser *)parser)->m_eventPtr)
 #define eventEndPtr (((Parser *)parser)->m_eventEndPtr)
@@ -1384,6 +1386,9 @@
     case XML_TOK_CDATA_SECT_OPEN:
       {
         enum XML_Error result;
+
+        beforeCdataProcessor= processor;
+
         if (startCdataSectionHandler)
           startCdataSectionHandler(handlerArg);
 #if 0
@@ -1731,8 +1736,8 @@
 {
   enum XML_Error result = doCdataSection(parser, encoding, &start, end, endPtr);
   if (start) {
-    processor = contentProcessor;
-    return contentProcessor(parser, start, end, endPtr);
+    /* processor = contentProcessor; */
+    return processor(parser, start, end, endPtr);
   }
   return result;
 }
@@ -1767,6 +1772,9 @@
   *eventEndPP = next;
   switch (tok) {
   case XML_TOK_CDATA_SECT_CLOSE:
+
+    processor= beforeCdataProcessor;
+
     if (endCdataSectionHandler)
       endCdataSectionHandler(handlerArg);
 #if 0
--Multipart_Tue_Feb_20_20:03:12_2001-1--
From tpassin@home.com Wed Feb 21 01:02:56 2001 From: tpassin@home.com (Thomas B.
Passin) Date: Tue, 20 Feb 2001 20:02:56 -0500 Subject: [XML-SIG] SAX: Names with no namespace References: <200102201853.f1KIrlw00957@mira.informatik.hu-berlin.de> Message-ID: <002301c09ba2$08727560$7cac1218@reston1.va.home.com> Martin v. Loewis wrote - > > Actually, I had though we *had* decided, and None was the > > concensus. > > That is also my recollection - there is even a PEP document somewhere; > you can get a copy from the archives, or from Tom Passin. > I don't recall that anyone actually declared that it was decided, but almost everyone who posted on this issue agreed that using "None" is the way to go. I propose that we do declare that it has been decided - Martin, are you willing to be the temporary benevolent dictator on this? Here's a copy of the draft PEP: ============================================= xmlpep-1 Values for Null Or Empty Namespace URIs 0.20 Draft Standards Track 29-Jan-2001 This PEP specifies the proper values of the Namespace URI property when its value might otherwise appear to be either "null", "None", or the empty string. Such Namespace URIs are discussed in SAX[1], DOM2[2], and XML-Namespaces[3]. These three recommendations do not appear to be in full agreement. This fact, and differences between Java and Python, have led to some confusion and some disagreement between various implementations supported by PyXML. The language in these three Recommendations is reviewed. The recommendation is made to use None as the URI value in all cases where no URI applies to an element or attribute. The XMLPEP, when approved, will apply to all namespace-aware software maintained by the PyXML interest group. When no namespace has been declared whose scope applies to a particular element or attribute, the application MUST report the URI of the namespace of the element or attribute as None. When there is no namespace prefix, the application MUST report the value of the prefix as None.
This requirement does not apply to applications that are not namespace-aware. This requirement applies to all XML processing software maintained by the PyXML interest group. This PEP is needed because of continued uncertainty among various PyXML developers as to the proper values to use, and because of inconsistency among various PyXML products. Differences between Python, IDL, and Java make an unambiguous interpretation difficult. A definitive and consistent treatment is needed so that all the PyXML software may be made consistent. The Namespaces Recommendation recognizes that a namespace URI may be given no value - called "empty" in the Recommendation - even though a structure for a URI is provided in the document. Two relevant passages are quoted here: Section 2. ... [Definition:] If the attribute name matches DefaultAttName, then the namespace name in the attribute value is that of the default namespace in the scope of the element to which the declaration is attached. In such a default declaration, the attribute value may be empty. 5.2 Namespace Defaulting A default namespace is considered to apply to the element where it is declared (if that element has no namespace prefix), and to all elements with no prefix within the content of that element. If the URI reference in a default namespace declaration is empty, then unprefixed elements in the scope of the declaration are not considered to be in any namespace. Note that default namespaces do not apply directly to attributes. ...The default namespace can be set to the empty string. This has the same effect, within the scope of the declaration, of there being no default namespace. The term "empty" is not defined further, but in the context of the Recommendation, it must mean a missing string value. The last fragment quoted above suggests, but does not require, that an empty string may be returned for an "empty" URI value.
This has no direct applicability to values returned by implementations, since 1) the word "can" is used, rather than "must", and 2) the Recommendation seems to apply to XML documents, not to implementations. The W3C DOM Level 2 Recommendation refers to "null" namespaces in several places. The thrust is clear and consistent: a "null" value is to be used to indicate a non-existent namespace URI value. Here are some relevant extracts from the Recommendation: Note that because the DOM does no lexical checking, the empty string will be treated as a real namespace URI in DOM Level 2 methods. Applications must use the value null as the namespaceURI parameter for methods if they wish to have no namespace. The IDL definition for the createAttributeNS() method creates an attribute with these characteristics: A new Attr object with the following attributes:

Attribute            Value
Node.nodeName        qualifiedName
Node.namespaceURI    namespaceURI
Node.prefix          prefix, extracted from qualifiedName, or null if there is no prefix
Node.localName       local name, extracted from qualifiedName
Attr.name            qualifiedName
Node.nodeValue       the empty string

For the older, non-NS aware createAttribute() method, the Recommendation says ...localName, prefix, and namespaceURI set to null. This is typical - a "null" is returned if there is no prefix or URI. It is clear that the IDL specifies the use of "null" for empty namespaces, rather than the empty string. The Java binding does not specify any particular value. Thus there seems to be nothing in the DOM Recommendation that suggests that empty strings should be used, and there is clear language that "null" values should be used. The SAX2 Java API clearly says that an empty string is to be returned.
The following extracts demonstrate this: In SAX2, the startElement and endElement callbacks in a content handler look like this: public void startElement (String uri, String localName, String qName, Attributes atts) throws SAXException; public void endElement (String uri, String localName, String qName) throws SAXException; By default, an XML reader will report a Namespace URI and a local name for every element, in both the start and end handler. Consider the following example: With the default SAX2 Namespace processing, the XML reader would report a start and end element event with the Namespace URI "http://www.w3.org/1999/xhtml" and the local name "hr". The XML reader might also report the original qName "html:hr", but that parameter might simply be an empty string. If namespaces is true and namespace-prefixes is true, then a SAX2 XML reader will report the following: an element with the Namespace URI "http://www.greeting.com/ns/", the local name "hello", and the qName "h:hello"; an attribute with no Namespace URI (empty string), no local name (empty string), and the qName "xmlns:h"; an attribute with no Namespace URI (empty string), the local name "id", and the qName "id"; and an attribute with the Namespace URI "http://www.greeting.com/ns/", the local name "person", and the qName "h:person". To summarize, the Namespace Recommendation is essentially silent on the subject, the DOM clearly specifies "null" values, and SAX2 clearly specifies the use of empty strings. The "highest" level Recommendation is presumably the DOM. Python offers a data object similar to "null" - the None object. The None object can be tested for exactly as for an empty string: if uri: doYourThing() Alternatively, None can be tested for explicitly, as in: if uri is not None: doYourThing() Thus, None is flexible enough to be useful for this purpose. Many posts to the PyXML list have favored the use of None, although not all. Either None or the empty string would seem to work in this context. 
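The difference between the two tests can be sketched as follows. This is a minimal illustration in present-day Python 3 and is not part of the draft; the helper name `describe_namespace` is hypothetical. It assumes only that a parser reports either None, the empty string, or a URI string. A plain truth test (`if uri:`) treats None and the empty string alike, whereas an identity test (`uri is None`) separates them.

```python
def describe_namespace(uri):
    """Label a namespace URI value as a parser might report it (hypothetical helper)."""
    if uri is None:
        # Identity test: only None means "not in any namespace".
        return "no namespace"
    if uri == "":
        # The empty string is a distinct value; the DOM treats it as a real URI.
        return "empty-string URI"
    return "namespace " + uri


def in_some_namespace(uri):
    """Truth test: None and the empty string are both falsy here."""
    return bool(uri)


for value in (None, "", "http://www.w3.org/1999/xhtml"):
    print(repr(value), in_some_namespace(value), describe_namespace(value))
```

Code that only needs to know whether a usable URI is present can keep the short truth test; code that must tell a missing namespace from an empty-string URI needs the explicit comparison against None.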
"None" agrees with the DOM Recommendation, and would seem (in a mnemonic sense)to suggest the absence of a prefix or URI. The 4DOM code will handle a None URI correctly in many places, since it uses tests like this typical example: if namespaceURI and namespaceURI != XML_NAMESPACE: # ... This code works correctly if the namespaceURI is None. Another test used in 4DOM is as follows: def getElementsByTagNameNS(self,namespaceURI,localName): root = self.documentElement if root == None: return implementation.createNodeList([]) py = root.getElementsByTagNameNS(namespaceURI,localName) if namespaceURI == '*' or namespaceURI == root.namespaceURI: if localName == '*' or localName == root.localName: py.insert(0,root) return py The expression "namespaceURI == '*'" also evaluates correctly when the URI is None. If handling code is consistent throughout 4DOM, then it will handle None correctly. [Need material here] [Should there be a reference here to one particular processor, such as xmlproc?] This PEP may be used by anyone. From jeremy.kloth@fourthought.com Wed Feb 21 01:17:23 2001 From: jeremy.kloth@fourthought.com (Jeremy J Kloth) Date: Tue, 20 Feb 2001 18:17:23 -0700 Subject: [XML-SIG] 4DOM and PyXML Message-ID: <006201c09ba4$0b9aa580$1b01a8c0@fourthought.com> The code for 4DOM now exists solely in the PyXML CVS tree. This should prevent any future feature clashes. Happy hacking... -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ken@bitsko.slc.ut.us Wed Feb 21 15:01:58 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 21 Feb 2001 09:01:58 -0600 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: "Martin v. 
Loewis"'s message of "Tue, 20 Feb 2001 19:47:03 +0100" References: <200102201432.HAA14350@localhost.localdomain> <200102201455.JAA28149@cj20424-a.reston1.va.home.com> <200102201847.f1KIl3d00953@mira.informatik.hu-berlin.de> Message-ID: "Martin v. Loewis" writes: > [Guido van Rossum writes:] > > Which reminds me. I've been told that getAttribute() and > > getAttributeNS() are supposed to return "" for a non-existent > > attribute, and that if you want to know whether the attribute was > > really there, you should use getAttributeNode() etc. Again, that > > may be a good design for Java or IDL, but is it right for Python? > > I'd much rather see None used as it was intended! > > I'd have to check again, but I think the current DOM spec is > painfully clear about null and empty strings, and it also clear that > a null string ought to be None, and an empty string ought to be > "". So there is not much choice - except for developing a dislike > towards the entire DOM (which I wouldn't do just because of that > problem). Yes, the only place I see "" should be returned is in the two methods getAttribute() and getAttributeNS(): "The Attr value as a string, or the empty string if that attribute does not have a specified or default value". That does seem odd, and unfortunate, but this would be one of the little places I'd rather adhere to the spec and not have any Python-specific documentation to the contrary, rather than note the difference and emphasize it wherever it might be an issue. -- Ken From martin@loewis.home.cs.tu-berlin.de Wed Feb 21 09:18:36 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 21 Feb 2001 10:18:36 +0100 Subject: [XML-SIG] 4DOM and PyXML In-Reply-To: <006201c09ba4$0b9aa580$1b01a8c0@fourthought.com> (jeremy.kloth@fourthought.com) References: <006201c09ba4$0b9aa580$1b01a8c0@fourthought.com> Message-ID: <200102210918.f1L9IaQ01213@mira.informatik.hu-berlin.de> > The code for 4DOM now exists solely in the PyXML CVS tree. 
> This should prevent any future feature clashes. Thanks a lot. This should reduce the troubles of distributors (primarily of Linux distributions) where 4Suite and PyXML had an overlap of files. Regards, Martin From larsga@garshol.priv.no Wed Feb 21 14:09:55 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Feb 2001 15:09:55 +0100 Subject: [XML-SIG] SAX: Names with no namespace In-Reply-To: References: Message-ID: * Fred L. Drake | | Actually, I had though we *had* decided, and None was the concensus. Then I bow to those with better memories than mine. :-) --Lars M. From paulp@ActiveState.com Wed Feb 21 21:50:44 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 21 Feb 2001 13:50:44 -0800 Subject: [XML-SIG] Encoding autodetection Message-ID: <3A943834.3C0473C@ActiveState.com> Is there Python code around to do the encoding autodetection? I started to write it and then thought I would check first... -- Vote for Your Favorite Python & Perl Programming Accomplishments in the first Active Awards! http://www.ActiveState.com/Awards From martin@loewis.home.cs.tu-berlin.de Wed Feb 21 22:02:01 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 21 Feb 2001 23:02:01 +0100 Subject: [XML-SIG] Encoding autodetection In-Reply-To: <3A943834.3C0473C@ActiveState.com> (message from Paul Prescod on Wed, 21 Feb 2001 13:50:44 -0800) References: <3A943834.3C0473C@ActiveState.com> Message-ID: <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de> > Is there Python code around to do the encoding autodetection? I started > to write it and then thought I would check first... Not that I know of. 
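For readers of the archive, the kind of autodetection being asked about can be sketched roughly as below. This is not the code Paul mentions in this thread; it is a minimal illustration in present-day Python 3, loosely following the approach of Appendix F of the XML 1.0 Recommendation. The function name `detect_xml_encoding` and the 200-byte look-ahead are assumptions made for the example, and EBCDIC and other non-ASCII-compatible families are deliberately left out.

```python
import codecs
import re

# Byte order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# How a leading "<" or "<?" looks in the wide encodings when no BOM is present.
_WIDE_PREFIXES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# Encoding pseudo-attribute of an XML declaration at the very start of the data.
_ENCODING_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data):
    """Guess the encoding of an XML document from its first bytes."""
    # 1. A byte order mark decides the matter.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM: look at the byte pattern of the first characters.
    for prefix, name in _WIDE_PREFIXES:
        if data.startswith(prefix):
            return name
    # 3. ASCII-compatible family: read the declaration itself.
    #    latin-1 maps every byte to one character, so decoding cannot fail.
    head = data[:200].decode("latin-1")
    match = _ENCODING_DECL.match(head)
    if match:
        return match.group(1).lower()
    # 4. XML's default when nothing else is stated.
    return "utf-8"
```

A caller would read the first few hundred bytes of a document, pass them to the function, and use the returned name to decode the rest; an unknown declared name would still have to be checked against the codecs actually available.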
Regards, Martin From stefan.marsiske@sysdata.siemens.hu Thu Feb 22 10:11:07 2001 From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244) Date: Thu, 22 Feb 2001 11:11:07 +0100 Subject: [XML-SIG] cloning nodes Message-ID: <20010222111107.O14235@sysdata.siemens.hu> hi all, once again i ran into a problem, this maybe my fault, or a bug (which alread may have been fixed), anyhow here it is: when i want to clone a dom node (and all its subnodes), the cloned node doesn't contain the attributes for elements. it seems to me that cloneNode doesn't clone Attribute type nodes. i'm using 4suite 0.10.1, is this a bug which is fixed in 0.10.2 or am i missing something? ciao -- Stefan [http://web.interware.hu/stef] UPDATED:001031 quote: "happy(y2k++)" gpg-key: http://web.interware.hu/stef/gpg.txt From uche.ogbuji@fourthought.com Thu Feb 22 15:17:31 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 22 Feb 2001 08:17:31 -0700 Subject: [XML-SIG] cloning nodes In-Reply-To: Message from Marsiske Stefan - 3244 of "Thu, 22 Feb 2001 11:11:07 +0100." <20010222111107.O14235@sysdata.siemens.hu> Message-ID: <200102221517.IAA01939@localhost.localdomain> > hi all, > > once again i ran into a problem, this maybe my fault, or a bug (which alread > may have been fixed), anyhow here it is: > > when i want to clone a dom node (and all its subnodes), the cloned node > doesn't contain the attributes for elements. it seems to me that cloneNode > doesn't clone Attribute type nodes. > > i'm using 4suite 0.10.1, is this a bug which is fixed in 0.10.2 or am i > missing something? Yes. It's a bug in 0.10.1 that was fixed in 0.10.2. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paulp@ActiveState.com Sat Feb 24 00:04:37 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 23 Feb 2001 16:04:37 -0800 Subject: [XML-SIG] Encoding autodetection References: <3A943834.3C0473C@ActiveState.com> <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de> Message-ID: <3A96FA95.93EB1888@ActiveState.com> "Martin v. Loewis" wrote: > > > Is there Python code around to do the encoding autodetection? I started > > to write it and then thought I would check first... > > Not that I know of. Thanks anyways. I've written the code now. Would it be useful to anyone else out there? -- Vote for Your Favorite Python & Perl Programming Accomplishments in the first Active Awards! http://www.ActiveState.com/Awards From larsga@garshol.priv.no Sat Feb 24 10:05:58 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Feb 2001 11:05:58 +0100 Subject: [XML-SIG] Encoding autodetection In-Reply-To: <3A96FA95.93EB1888@ActiveState.com> References: <3A943834.3C0473C@ActiveState.com> <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de> <3A96FA95.93EB1888@ActiveState.com> Message-ID: * Paul Prescod | | Thanks anyways. I've written the code now. Would it be useful to | anyone else out there? xmlproc could use it. When the Unicode support is added it will need to do the same thing. I guess it could also be useful as a utility in some cases, such as in a web server. --Lars M. From paulp@ActiveState.com Sat Feb 24 19:36:13 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Sat, 24 Feb 2001 11:36:13 -0800 Subject: [XML-SIG] Encoding autodetection References: <3A943834.3C0473C@ActiveState.com> <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de> <3A96FA95.93EB1888@ActiveState.com> Message-ID: <3A980D2D.ACA5F980@ActiveState.com> Lars Marius Garshol wrote: > > * Paul Prescod > | > | Thanks anyways. I've written the code now. 
> | Would it be useful to anyone else out there?
>
> xmlproc could use it. When the Unicode support is added it will need
> to do the same thing.

Yeah, that's where I looked first.

> I guess it could also be useful as a utility in some cases, such as in
> a web server.

I'll include it here for the record. If anyone wants to do anything with
it they can. It is hereby in the public domain.

In response to a question I got privately: it will detect any encoding
that has a reasonable resemblance to an ASCII superset (e.g. UTF-8, ISO
8859-*, Shift-JIS) or to a 2-byte Unicode encoding (big or little endian,
with or without BOM). EBCDIC and 4-byte encodings are not tested.

import codecs, encodings

"""Komodo will hand this library a buffer and ask it to either convert
it or auto-detect the type."""

# None represents a potentially variable byte. "##" in the XML spec...
autodetect_dict={ # bytepattern          : "name"
    (0x00, 0x00, 0xFE, 0xFF) : ("ucs4_be"),
    (0xFF, 0xFE, 0x00, 0x00) : ("ucs4_le"),
    (0xFE, 0xFF, None, None) : ("utf_16_be"),
    (0xFF, 0xFE, None, None) : ("utf_16_le"),
    (0x00, 0x3C, 0x00, 0x3F) : ("utf_16_be"),
    (0x3C, 0x00, 0x3F, 0x00) : ("utf_16_le"),
    (0x3C, 0x3F, 0x78, 0x6D) : ("utf_8"),
    (0x4C, 0x6F, 0xA7, 0x94) : ("EBCDIC")
    }

def autoDetectXMLEncoding(buffer):
    """ buffer -> encoding_name
    The buffer should be at least 4 bytes long.
    Returns None if encoding cannot be detected.
    Note that encoding_name might not have an installed
    decoder (e.g. EBCDIC or Shift-JIS)
    """
    # a more efficient implementation would not decode the whole
    # buffer at once but otherwise we'd have to decode a character at
    # a time looking for the quote character...that's a pain

    encoding = "utf_8" # according to the XML spec, this is the default
                       # this code successively tries to refine the default
                       # whenever it fails to refine, it falls back to
                       # the last place encoding was set.
    bytes = (byte1, byte2, byte3, byte4) = tuple(map(ord, buffer[0:4]))
    enc_info = autodetect_dict.get(bytes, None)

    if not enc_info: # try autodetection again removing potentially
                     # variable bytes
        bytes = (byte1, byte2, None, None)
        enc_info = autodetect_dict.get(bytes)

    if enc_info:
        encoding = enc_info # we've got a guess... these are
                            # the new defaults

    # try to find a more precise encoding using xml declaration
    secret_decoder_ring = codecs.lookup(encoding)[1]
    (decoded,length) = secret_decoder_ring(buffer)
    first_line = decoded.split("\n")[0]
    if first_line and first_line.startswith(u"<?xml"):
        encoding_pos = first_line.find(u"encoding")
        if encoding_pos != -1:
            # look for a double quote first, then for a single quote
            quote_pos = first_line.find('"', encoding_pos)
            if quote_pos == -1:
                quote_pos = first_line.find("'", encoding_pos)
            if quote_pos > -1:
                quote_char,rest=(first_line[quote_pos],
                                 first_line[quote_pos+1:])
                encoding=rest[:rest.find(quote_char)]

    return encoding

##### Testing code

big_teststrs = (u"\u2222", u'\u2222')

big_encodings = [
    #name        BOM prefix
    ("utf-16"   , None), # this one already has a BOM prefix
    ("utf-8"    , None),
    ("utf-16-le", None),
    ("utf-16-be", None),
    ("utf-16-le", codecs.BOM_LE),
    ("utf-16-be", codecs.BOM_BE),
    ("MBCS"     , None)]

little_teststrs = (u"q", u'q')

little_encodings = [
    ("ASCII"     , None),
    ("Latin-1"   , None),
    ("ISO 8859-1", None)]

default_teststrs = ("%s", "%s", '%s')

xml_default_encodings = [
    ("utf_8"    , None),
    ("utf_16_le", codecs.BOM_LE),
    ("utf_16_be", codecs.BOM_BE)]

def _assertSame(expr1,expr2):
    if expr1 != expr2:
        raise AssertionError, (expr1, "!=", expr2)

def testDetect(teststrs, test_encodings):
    for (encoding, bom) in test_encodings:
        for teststr in teststrs:
            data = (teststr % encoding).encode(encoding)
            if bom:
                data = bom + data
            _assertSame(autoDetectXMLEncoding(data), encoding)

def test():
    teststr=u"\u2222\u2323\u4343"
    testDetect(big_teststrs, big_encodings)
    testDetect(little_teststrs, little_encodings)
    testDetect(default_teststrs, xml_default_encodings)

if __name__=="__main__":
    test()
    print "All tests succeeded"

--
Vote for Your Favorite Python & Perl Programming Accomplishments in the
first Active Awards!
http://www.ActiveState.com/Awards

From guenter.radestock@sap.com Sun Feb 25 18:25:44 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Sun, 25 Feb 2001 19:25:44 +0100
Subject: [XML-SIG] Whitespace handling in XMLWriter
Message-ID:

I am using xml.sax.writer to produce nicely indented XML output from
Python. The xmlwriter is pretty good at indentation and formatting, but
for some tags, I would like to have whitespace preserved. I did not see
a way to tell this via the doctype info. Right now I am using the
following:

import sys
import xml.sax.writer

class OutputWriter:
    def __init__(self, fo=sys.stdout):
        self.fo = fo
        self.containers = []
        self.docinfo = xml.sax.writer.XMLDoctypeInfo()
        saxout = self.saxout = xml.sax.writer.PrettyPrinter(
            self.fo, dtdinfo=self.docinfo, endtagindentation=-2)
        # put a print here to see how (slow) output is generated.
        # there should not be a visible delay between the message
        # printed and the log output of the http server.
        #print '### starting xml output'
        saxout.startDocument()

    def pcdata_tag(self, name, s):
        s = '%s' % s
        self.saxout.startElement(name)
        self.saxout.characters(s, 0, len(s))
        self.saxout.endElement(name)

    def start_tag(self, name):
        if not name in self.containers:
            # needed to make pretty printing work (the pretty
            # printer needs to know where whitespace is allowed
            # in the output)
            self.containers.append(name)
            self.docinfo.add_element_container(name)
        self.saxout.startElement(name, {})

    def end_tag(self, name):
        self.saxout.endElement(name)

    def comment(self, text):
        text = ' ' + text + ' '
        self.saxout.comment(text, 0, len(text))

    def close(self):
        self.saxout.endDocument()
        self.fo.flush()

This works the way I want only for short content (there is no
whitespace inserted before and after). Passing longer strings, possibly
with whitespace, to pcdata_tag will reformat, changing the internal and
external whitespace contained in my text.

Is there a way to do this with the current xmlwriter or is this missing
right now?
- Guenter

From guenter.radestock@sap.com Sun Feb 25 19:23:13 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Sun, 25 Feb 2001 20:23:13 +0100
Subject: [XML-SIG] Whitespace handling in XMLWriter
Message-ID:

> -----Original Message-----
> From: Radestock, Guenter
> Sent: Sunday, 25 February 2001 19:26
> To: 'XML-SIG@python.org'
> Subject: [XML-SIG] Whitespace handling in XMLWriter
>
>
> I am using xml.sax.writer to produce nicely indented XML
> output from Python.
> The xmlwriter is pretty good at indentation and formatting,
> but for some
> tags, I would like to have whitespace preserved. I did not see a way
> to tell this via the doctype info.
>

Diving in a little deeper, I have found that when I call

self.docinfo.add_attribute_defn(tagname, 'xml:space', None, None, 'preserve')

just before opening the tag tagname, it will preserve the whitespace
the way I want it to. Now two little problems remain:

1. The tag itself will not be indented. The place the tag is put
   into the output should not have anything to do with how its
   content is formatted?

2. The pretty printer does not behave properly when formatting
   empty tags (a linefeed is missing after the empty tag).

You can see both in the output fragment below:

hfk107_6_1010_1010140254.nitf 0.00093607098097 de dpa_german8de

- Guenter

From loewis@informatik.hu-berlin.de Mon Feb 26 08:41:50 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 26 Feb 2001 09:41:50 +0100 (MET)
Subject: [XML-SIG] PyXML 0.6.4 is released
Message-ID: <200102260841.JAA09898@pandora>

Version 0.6.4 of the Python/XML distribution is now available.
It should be considered a beta release, and can be downloaded from
the following URLs:

http://download.sourceforge.net/pyxml/PyXML-0.6.4.tar.gz
http://download.sourceforge.net/pyxml/PyXML-0.6.4.win32-py1.5.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.4.win32-py2.0.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.4-1.5.2.i386.rpm
http://download.sourceforge.net/pyxml/PyXML-0.6.4-2.0.i386.rpm

Changes in this version, compared to 0.6.3:

* 4DOM was integrated from 4Suite 0.10.2. 4DOM is now maintained as a
  part of PyXML. A detailed list of changes can be found in
  xml/dom/ChangeLog.

* minidom now supports the standard methods isSameNode and
  hasAttributes, and the extension toprettyxml. A number of bugs have
  been fixed.

* A DOM implementation registration is now available (functions
  getDOMImplementation and registerDOMImplementation in xml.dom).

* If expat 1.95.x is available on the system, this is used instead of
  the included expat copy; it will then offer additional handlers.

* A pyexpat parser can now return the attributes ordered, and restrict
  the attribute list to the specified attributes.

* The xmllib SAX1 driver now generates Unicode strings in Python 2.

* The xml.unicode emulation was extended to support bidirectional
  conversion, and to support a few more aliases.

The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package. The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.

The package currently contains:

* XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius Garshol),
  sgmlop (Fredrik Lundh).
* SAX interface (Lars Marius Garshol)
* minidom DOM implementation (Paul Prescod)
* 4DOM from Fourthought (Uche Ogbuji, Mike Olson)
* Various utility modules and functions (various people)
* Documentation and example programs (various people)

The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments, questions,
or bug reports to .

For more information about Python and XML, see:
http://www.python.org/topics/xml/

--
Martin v. Löwis http://www.informatik.hu-berlin.de/~loewis

From stefan.marsiske@sysdata.siemens.hu Mon Feb 26 10:44:42 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Mon, 26 Feb 2001 11:44:42 +0100
Subject: [XML-SIG] 4Suite installation problems
Message-ID: <20010226114442.A14235@sysdata.siemens.hu>

hi all,

yesterday i decided to upgrade to 4Suite-0.10.2 at home. at work here
(solaris) i've been using it since the release, and it's very nice.
just one thing: in XHtmlPrint there's one line which has to be changed:
on line 12 in python2.0/site-packages/_xmlplus/dom/ext/XHtmlPrinter.py

self.notations = doctype and doctype.notation or []

needs an "s" after notation...

but! on my home system (linux) i've had a lot of trouble, the
installation script didn't update a lot of files, and so a lot of
missing functions/attributes were the result. somehow 4Suite and PyXML
together seem to screw up. i needed to copy a lot of files by hand. and
i played a lot with the PyXML-0.6.3 installation, the 4Suite
installation, and the PyXML tree included with 4Suite, but in the end i
worked it out.

unfortunately i didn't document this, so i can't really tell you what
went wrong, and why. but i can remember one error: it said that
DOMImplementation is missing the _4dom_importfile() function or
something similar, so i found out which package is carrying this
particular implementation and copied it by hand.

so is my system screwed, or the installation procedure?
or do 4Suite and PyXML clash? bye -- Stefan [http://web.interware.hu/stef] UPDATED:001031 quote: "happy(y2k++)" gpg-key: http://web.interware.hu/stef/gpg.txt From Alexandre.Fayolle@logilab.fr Mon Feb 26 10:56:56 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 26 Feb 2001 11:56:56 +0100 (CET) Subject: [XML-SIG] 4Suite installation problems In-Reply-To: <20010226114442.A14235@sysdata.siemens.hu> Message-ID: On Mon, 26 Feb 2001, Marsiske Stefan - 3244 wrote: > but! on my home system (linux) i've had a lot of trouble, the installation > script didn't update a lot files, and so a lot of missing functions/attributes > where the result. The 4Suite guys are the one who'd really be able to answer your question, but in the meantime, you may want to use 'python setupt.py install -f' to force the overwriting of all the files. I had a similar problem, and the -f option was helpful. In the last resort, maybe manually erasing site-packages/Ft, site-packages/xml or site-packages/_xmlplus and running setup.py install could solve your problem. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From stefan.marsiske@sysdata.siemens.hu Mon Feb 26 10:56:56 2001 From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244) Date: Mon, 26 Feb 2001 11:56:56 +0100 Subject: [XML-SIG] 4Suite installation problems In-Reply-To: ; from Alexandre.Fayolle@logilab.fr on Mon, Feb 26, 2001 at 11:56:56AM +0100 References: <20010226114442.A14235@sysdata.siemens.hu> Message-ID: <20010226115656.D14235@sysdata.siemens.hu> On Mon, Feb 26, 2001 at 11:56:56AM +0100, Alexandre Fayolle wrote: > On Mon, 26 Feb 2001, Marsiske Stefan - 3244 wrote: > > > but! on my home system (linux) i've had a lot of trouble, the installation > > script didn't update a lot files, and so a lot of missing functions/attributes > > where the result. 
> > The 4Suite guys are the one who'd really be able to answer your question, > but in the meantime, you may want to use 'python setupt.py install -f' to > force the overwriting of all the files. I had a similar problem, and the > -f option was helpful. i tried the -f and it didn't work... > In the last resort, maybe manually erasing site-packages/Ft, > site-packages/xml or site-packages/_xmlplus and running setup.py install > could solve your problem. maybe, but i tracked down the missing sources by hand and copied them. actually the problem is solved, i just sent this here, so others don't have to struggle that much... ---end quoted text--- -- Stefan [http://web.interware.hu/stef] UPDATED:001031 quote: "happy(y2k++)" gpg-key: http://web.interware.hu/stef/gpg.txt From uche.ogbuji@fourthought.com Mon Feb 26 13:47:20 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 26 Feb 2001 06:47:20 -0700 Subject: [XML-SIG] 4Suite installation problems In-Reply-To: Message from Marsiske Stefan - 3244 of "Mon, 26 Feb 2001 11:56:56 +0100." <20010226115656.D14235@sysdata.siemens.hu> Message-ID: <200102261347.GAA20296@localhost.localdomain> > On Mon, Feb 26, 2001 at 11:56:56AM +0100, Alexandre Fayolle wrote: > > On Mon, 26 Feb 2001, Marsiske Stefan - 3244 wrote: > > > > > but! on my home system (linux) i've had a lot of trouble, the installation > > > script didn't update a lot files, and so a lot of missing functions/attributes > > > where the result. > > > > The 4Suite guys are the one who'd really be able to answer your question, > > but in the meantime, you may want to use 'python setupt.py install -f' to > > force the overwriting of all the files. I had a similar problem, and the > > -f option was helpful. > > i tried the -f and it didn't work... > > > In the last resort, maybe manually erasing site-packages/Ft, > > site-packages/ml or site-packages/_xmlplus and running setup.py install > > could solve your problem. 
> maybe, but i tracked down the missing sources by hand and copied them. > actually the problem is solved, i just sent this here, so others don't have > to struggle that much... Any clash between 4Suite 0.10.2 and PyXML 0.6.3 is a bug. You said the problem is fixed, but if you have any more specifics, it would be great if you posted them here. The problems you originally posted looked like straightforward "-f needed" problems, but you said this didn't work for you. I should note that I'm not sure that 4Suite 0.10.2 and PyXML 0.6.4 won't clash. Most likely, one would end up with the older revision 4DOM from 0.10.2, rather than the updated revision in PyXML 0.6.4 that contains some bug-fixes. The next 4Suite release will only use the DOM in PyXML. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mclay@nist.gov Mon Feb 26 02:25:17 2001 From: mclay@nist.gov (Michael McLay) Date: Sun, 25 Feb 2001 21:25:17 -0500 Subject: [XML-SIG] Version number question on PyXML 0.6.4 In-Reply-To: <200102260841.JAA09898@pandora> References: <200102260841.JAA09898@pandora> Message-ID: <01022521251706.28858@fermi.eeel.nist.gov> On Monday 26 February 2001 03:41, Martin von Loewis wrote: > Version 0.6.4 of the Python/XML distribution is now available. It > should be considered a beta release, and can be downloaded from > the following URLs: I'm begining to think someone from the Enlightenment window manager project has been given control of the version numbering for PyXML. Version numbers are arbitrary, but some people will mistakenly read the low number on PyXML as an inidcation of unstable and immature software. Based on the improved level of integration of this latest release the version number should have at least been bumped to a 0.7.0 release number. 
What needs to be added/finished before the number can be bumped to 1.0?

From Alexandre.Fayolle@logilab.fr Mon Feb 26 15:35:08 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 26 Feb 2001 16:35:08 +0100 (CET)
Subject: [XML-SIG] How to build a DOM from an HTML file?
Message-ID:

Hello,

I'm trying to parse HTML documents into DOMs, using the 4DOM version
that comes with 4Suite 0.10.2.

I first tried xml.dom.ext.reader.HtmlSax.HtmlDomGenerator with a
xml.dom.ext.reader.Sax.Reader but it seems to be broken (see bug
#404072).

Then I tried xml.dom.ext.reader.HtmlLib.FromHtmlUrl which uses the
Sgmlop parser. However, this parser looks only partially implemented
(it chokes on doctype directives, for example, which means that the
pages most likely to contain valid HTML won't be parsed).

What is the current preferred way to do this?

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From nobody@sourceforge.net Mon Feb 26 12:27:12 2001
From: nobody@sourceforge.net (nobody)
Date: Mon, 26 Feb 2001 04:27:12 -0800
Subject: [XML-SIG] [ pyxml-Bugs-404272 ] HtmlDomGenerator constructor bug
Message-ID:

Artifact #404272, was updated on 2001-02-26 04:27
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=404272&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle
Assigned to: Nobody/Anonymous
Summary: HtmlDomGenerator constructor bug

Initial Comment:
The HtmlDomGenerator's constructor expects 2 arguments, an owner
document and a keepAllWs flag, whereas all other similar classes only
expect the keepAllWs flag.
The result is that when the constructor is invoked by the Reader class,
the flag is passed as the owner document, which in turn badly breaks
the constructor:

>>> from xml.dom.ext.reader.Sax import Reader
>>> from xml.dom.ext.reader.HtmlSax import HtmlDomGenerator
>>> r = Reader(saxHandlerClass = HtmlDomGenerator)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax.py", line 124, in __init__
    self.handler = saxHandlerClass(keepAllWs)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/HtmlSax.py", line 39, in __init__
    self._rootNode = self._ownerDoc.createDocumentFragment()
AttributeError: 'int' object has no attribute 'createDocumentFragment'

----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=404272&group_id=6473

From akuchlin@mems-exchange.org Tue Feb 27 15:11:19 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 10:11:19 -0500
Subject: [XML-SIG] Maintaining catalogs
Message-ID:

For a project, I'd like to install a DTD on the system and
automatically add its public identifier to the catalog. Is there a
standard place to put SGML/XML catalogs on Unix systems?
/usr/(local)?/lib/sgml? /etc/sgml/?

--amk

From gregor@hoffleit.de Tue Feb 27 15:24:34 2001
From: gregor@hoffleit.de (Gregor Hoffleit)
Date: Tue, 27 Feb 2001 16:24:34 +0100
Subject: [XML-SIG] Maintaining catalogs
In-Reply-To: ; from akuchlin@mems-exchange.org on Tue, Feb 27, 2001 at 10:11:19AM -0500
References:
Message-ID: <20010227162434.C20349@mediasupervision.de>

On Tue, Feb 27, 2001 at 10:11:19AM -0500, Andrew Kuchling wrote:
> For a project, I'd like to install a DTD on the system and
> automatically add its public identifier to the catalog. Is there a
> standard place to put SGML/XML catalogs on Unix systems?
> /usr/(local)?/lib/sgml? /etc/sgml/?
Debian has a package sgml-base that sets up some infrastructure for
managing SGML files.

All SGML description files live in /usr/lib/sgml. The catalog file is
/etc/sgml.catalog; /usr/lib/sgml/catalog is a symlink pointing to the
real file /etc/sgml.catalog.

sgml-base contains a tool install-sgmlcatalog that's used to add and
remove entries to the catalog file. The README (see below) contains an
example of how that's supposed to be done.

    Gregor


Guidelines for SGML packages
============================

Package dependencies
--------------------

All SGML packages that provide a DTD or entity description file have to
depend on "sgml-base". This package installs the "install-sgmlcatalog"
script and provides the necessary directory structure.

The SGML Description Files
--------------------------

The location of SGML description files (DTD's, entities, etc.) is
/usr/lib/sgml . All DTD's should be installed in /usr/lib/sgml/dtd ,
all entity description files should go into /usr/lib/sgml/entities .

The SGML Catalog
----------------

The SGML catalog file is /etc/sgml.catalog , but should be referred to
through the symbolic link /usr/lib/sgml/catalog . Furthermore, all path
specifications given in the SGML catalog have to be relative to
/usr/lib/sgml .

Please don't modify the SGML catalog directly in the postinst/postrm
scripts of your package--you should use the install-sgmlcatalog script
for that.

Here is a simple example: Consider the package "foo" which provides the
DTD foo.dtd and an entity description file "foo-general". The package
will install the following files:

  /usr/lib/sgml/dtd/foo.dtd
  /usr/lib/sgml/entities/foo-general
  /usr/lib/foo/sgml.catalog

The sgml.catalog file will look like this:

  DOCTYPE foodoc dtd/foo.dtd
  ENTITY %foo-general entities/foo-general

That's the postinst script:

  #!/bin/sh
  install-sgmlcatalog --install /usr/lib/foo/sgml.catalog foo

and the postrm script:

  #!/bin/sh
  install-sgmlcatalog --remove foo

Please check the install-sgmlcatalog(8) manpage for details.

Feedback
--------

Please send me an email for bugs/suggestions/criticism on these
guidelines.

-- May 8, 1997 Christian Schwarz

From akuchlin@mems-exchange.org Tue Feb 27 15:31:50 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 10:31:50 -0500
Subject: [XML-SIG] Maintaining catalogs
In-Reply-To: <20010227162434.C20349@mediasupervision.de>; from gregor@mediasupervision.de on Tue, Feb 27, 2001 at 04:24:34PM +0100
References: <20010227162434.C20349@mediasupervision.de>
Message-ID: <20010227103150.B17362@ute.cnri.reston.va.us>

On Tue, Feb 27, 2001 at 04:24:34PM +0100, Gregor Hoffleit wrote:
>sgml-base contains a tool install-sgmlcatalog that's used to add and remove
>entries to the catalog file. The README (see below) contains an example how
>that's supposed to be done.

Redhat 7.0 has something similar, though annoyingly the script is
called install-catalog instead.

--amk

From akuchlin@mems-exchange.org Tue Feb 27 16:23:42 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 11:23:42 -0500
Subject: [XML-SIG] DTD design: include categorization, or use RDF?
Message-ID:

I'm revisiting and extending my quotation DTD this week, hence my
suddenly asking a bunch of questions here. I'm wondering about
categorization. A common application would be to group quotations
into categories. I can add a category element or attribute, but then
someone comes along who wants to sort quotes by newsgroup, or by date,
or by some other wacky thing.
I can invent a general syntax, but that's just reinventing RDF badly, so RDF seems like the obvious course. Question: is it better to embed RDF annotations in a single file, or to encourage maintaining an RDF index in a separate file, as a gloss on the original file. In other words, I'm wondering about: Author's Name ... versus: ... and in some other file have: Author's Name The first form has only one file, but I'm wondering if it will complicate the task of modifying the file programmatically too much. (I'd really like to write a Tkinter program for maintaining a collection, which means that the data will have to be round-tripped from XML to Python objects and back again. Hopefully people here will have application experience doing this sort of thing.) --amk From tpassin@home.com Tue Feb 27 22:25:35 2001 From: tpassin@home.com (Thomas B. Passin) Date: Tue, 27 Feb 2001 17:25:35 -0500 Subject: [XML-SIG] DTD design: include categorization, or use RDF? References: Message-ID: <000601c0a10c$3506ee20$7cac1218@reston1.va.home.com> Andrew Kuchling asks > Question: is it better to embed RDF annotations in a single file, or > to encourage maintaining an RDF index in a separate file, as a gloss > on the original file. In other words, I'm wondering about: > I encourage you to use a separate file. This is because we're going to want more tools for working with third-party data, I think, and you will be furthering that if you choose to use a separate file. It really depends on whether you see the annotations as being something separate, and if you might like to apply the same idea to some other data that you don't control. Cheers, Tom P From akuchlin@mems-exchange.org Tue Feb 27 23:03:44 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 27 Feb 2001 18:03:44 -0500 Subject: [XML-SIG] DTD design: include categorization, or use RDF? 
In-Reply-To: <000601c0a10c$3506ee20$7cac1218@reston1.va.home.com>; from tpassin@home.com on Tue, Feb 27, 2001 at 05:25:35PM -0500 References: <000601c0a10c$3506ee20$7cac1218@reston1.va.home.com> Message-ID: <20010227180344.D15343@ute.cnri.reston.va.us> On Tue, Feb 27, 2001 at 05:25:35PM -0500, Thomas B. Passin wrote: >It really depends on whether you see the annotations as being something >separate, and if you might like to apply the same idea to some other data that >you don't control. Author and source seems critical, and therefore suitable as part of the DTD, but additional categorizations seem less important and application-dependent. --amk From martin@loewis.home.cs.tu-berlin.de Wed Feb 28 00:14:47 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 28 Feb 2001 01:14:47 +0100 Subject: [XML-SIG] Maintaining catalogs In-Reply-To: (message from Andrew Kuchling on Tue, 27 Feb 2001 10:11:19 -0500) References: Message-ID: <200102280014.f1S0El901398@mira.informatik.hu-berlin.de> > For a project, I'd like to install a DTD on the system and > automatically add its public identifier to the catalog. Is there a > standard place to put SGML/XML catalogs on Unix systems? > /usr/(local)?/lib/sgml? /etc/sgml/? I believe the standard location is below /usr/share/sgml. Not sure how it is supposed to work; it seems that a tool should look at all files matching CATALOG.* in that directory. In addition, I have a number of subdirectories in /usr/share/sgml, e.g. OASIS, W3C, James_Clark, Normal_Walsh, etc. They seem to correspond to the public identifiers; eg. "-//OASIS//DTD DocBook V3.1//EN" can be found in /usr/share/sgml/OASIS/dtd/DocBook_V3.1. However, these files are referred-to in the CATALOG.* files, so that seems to be the primary resource. In addition, nsgml honors the SGML_CATALOG_FILES environment variable; if this is not set, the documentation says it uses a system-dependent default list of catalog files. 
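
The lookup order described above can be sketched roughly in Python. This is only an illustration under stated assumptions: the function name candidate_catalogs and its parameters are made up here, the list in SGML_CATALOG_FILES is assumed to be separated by the platform path separator (":" on Unix), and the fallback to CATALOG.* files under /usr/share/sgml merely mirrors the layout mentioned above; it is not the default that any particular nsgmls build actually uses.

```python
import glob
import os


def candidate_catalogs(environ=None, share_dir="/usr/share/sgml"):
    """Return catalog file paths to try, in order.

    Hypothetical helper: the name, the parameters and the fallback
    rule are assumptions for illustration only.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("SGML_CATALOG_FILES", "")
    if value:
        # Keep the user's order; drop empty entries such as "a::b".
        return [path for path in value.split(os.pathsep) if path]
    # Assumed fallback: every CATALOG.* file in the shared directory.
    return sorted(glob.glob(os.path.join(share_dir, "CATALOG.*")))


if __name__ == "__main__":
    for path in candidate_catalogs():
        print(path)
```

Passing the environment in as a plain mapping keeps the function easy to try out without touching the real process environment.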
There is something called "open catalogs", but I'm not certain how much that actually specifies. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Feb 28 00:50:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 28 Feb 2001 01:50:00 +0100 Subject: [XML-SIG] Catalogs and LSB Message-ID: <200102280050.f1S0o0D01871@mira.informatik.hu-berlin.de> I just found that the Linux Standards Base addendum R003 specifies locations for catalogs; they say that centralized catalogs must reside in /etc/sgml, end in .cat, and only contain CATALOG declarations. It goes on saying that /etc/sgml/catalog is the central catalog, and managed by means of the install-catalog utility. It seems that Redhat provides that utility, but that this utility manages /usr/lib/sgml/CATALOG (and puts new catalog files into /usr/lib/sgml). Debian apparently puts the central catalog into /etc/sgml.catalog, and the individual catalogs into /usr/lib/sgml; they have a corresponding install-catalog utility. It would probably be worthwhile writing a library that locates the central catalog, or individual catalogs, in a best-effort manner. If we were to set a precedent, it would be probably best to stick to the LSB proposal, regardless whether Debian and Redhat differ from that, and even though only Caldera appears to implement it fully. Regards, Martin From akuchlin@mems-exchange.org Wed Feb 28 05:49:08 2001 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Wed, 28 Feb 2001 00:49:08 -0500 Subject: [XML-SIG] QEL 2.0 DTD Message-ID: <200102280549.AAA01205@mira.erols.com> First stab at a Quotation Exchange Language Web page: http://www.amk.ca/qel/ Take a look at the QEL 2.0 DTD and offer any comments. Now to work on the software... --amk From eric2461@caramail.com Wed Feb 28 17:24:20 2001 From: eric2461@caramail.com (RICO) Date: Wed, 28 Feb 2001 18:24:20 +0100 Subject: [XML-SIG] Gratuitement : le meilleur du web ! 
Message-ID: <200102281809.f1SI9HO07583@bacho.adi.fr> From Alexandre.Fayolle@logilab.fr Wed Feb 28 18:42:58 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 28 Feb 2001 19:42:58 +0100 (CET) Subject: [XML-SIG] [off topic] getting XML on the net Message-ID: This is off topic, but I thought some of you might be interested. I've just learned this from http://www.scripting.com. Google can deliver results as XML: wget http://www.google.com/xml?q=narval It is possible to get NASDAQ stock quotes in XML too: wget "http://quotes.nasdaq.com/quote.dll?page=xml&mode=stock&symbol=AAPL" I'm pretty sure I'll take some time to set up a couple of Narval recipes to take advantage of this. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From xml-sig@teleo.net Wed Feb 28 19:24:32 2001 From: xml-sig@teleo.net (xml-sig@teleo.net) Date: Wed, 28 Feb 2001 11:24:32 -0800 Subject: [XML-SIG] DTD design: include categorization, or use RDF? In-Reply-To: References: Message-ID: <0102281124320Y.04301@quadra.teleo.net> On Tuesday 27 February 2001 08:23, Andrew Kuchling wrote: > I'm revisiting and extending my quotation DTD this week, hence my > suddenly asking a bunch of questions here. I'm wondering about > categorization. A common application would be to group quotations > into categories. I can add a category element or attribute, but then > someone comes along who wants to sort quotes by newsgroup, or by date, > or by some other wacky thing. I can invent a general syntax, but > that's just reinventing RDF badly, so RDF seems like the obvious > course. Have you considered Topic Maps, as a possible alternative to RDF? http://xmlcoverpages.org/topicMaps.html There's now an Open Source TM engine in Python: http://ontopia.net/software/tmproc/