From uche.ogbuji@fourthought.com Tue Jan 2 03:57:47 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 01 Jan 2001 20:57:47 -0700 Subject: [XML-SIG] 4Suite -> gettext Message-ID: <200101020357.UAA21220@localhost.localdomain> I started looking into converting 4Suite from my hacked i18n to Python's gettext, but it seems this is only supported for Python 2.0. Unfortunately, as we've discussed here before, we need to maintain support for Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?) So I'm holding off on the changes for now. If anyone has any tricks for straddling Python versions using gettext, please let me know. Thanks. I will look next at supporting Martin's factory architecture for 4XPath/4XSLT. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Tue Jan 2 04:08:41 2001 From: tpassin@home.com (Thomas B. Passin) Date: Mon, 1 Jan 2001 23:08:41 -0500 Subject: [XML-SIG] 4Suite -> gettext References: <200101020357.UAA21220@localhost.localdomain> Message-ID: <00cf01c07471$b158fee0$7cac1218@reston1.va.home.com> asks - > > Unfortunately, as we've discussed here before, we need to maintain support for > Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?) > > So I'm holding off on the changes for now. If anyone has any tricks for > straddling Python versions using gettext, please let me know. Thanks. > At least 6 months after Zope switches to 2.0. 
Cheers, Tom P From gstein@lyra.org Tue Jan 2 04:16:38 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 1 Jan 2001 20:16:38 -0800 Subject: [XML-SIG] 4Suite -> gettext In-Reply-To: <200101020357.UAA21220@localhost.localdomain>; from uche.ogbuji@fourthought.com on Mon, Jan 01, 2001 at 08:57:47PM -0700 References: <200101020357.UAA21220@localhost.localdomain> Message-ID: <20010101201638.O10567@lyra.org> On Mon, Jan 01, 2001 at 08:57:47PM -0700, uche.ogbuji@fourthought.com wrote: > I started looking into converting 4Suite from my hacked i18n to Python's > gettext, but it seems this is only supported for Python 2.0. > > Unfortunately, as we've discussed here before, we need to maintain support for > Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?) By "we", do you mean Fourthought, or PyXML? IIRC, PyXML 0.5.5.1 is for 1.5.2 and the latest is for Python 2.0 only. Cheers, -g -- Greg Stein, http://www.lyra.org/ From uche.ogbuji@fourthought.com Tue Jan 2 05:20:50 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 01 Jan 2001 22:20:50 -0700 Subject: [XML-SIG] 4Suite -> gettext References: <200101020357.UAA21220@localhost.localdomain> <20010101201638.O10567@lyra.org> Message-ID: <3A516532.5DFA3CA7@fourthought.com> Greg Stein wrote: > > On Mon, Jan 01, 2001 at 08:57:47PM -0700, uche.ogbuji@fourthought.com wrote: > > I started looking into converting 4Suite from my hacked i18n to Python's > > gettext, but it seems this is only supported for Python 2.0. > > > > Unfortunately, as we've discussed here before, we need to maintain support for > > Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?) > > By "we", do you mean Fourthought, or PyXML? Well, to be clear, PyXML debated about it and no firm resolution was come to, except that Martin added back in the Unicode support. Fourthought certainly intends the support. > IIRC, PyXML 0.5.5.1 is for 1.5.2 and the latest is for Python 2.0 only. 
I thought that was the original plan, but that it was decided to continue supporting Python 1.5.2 in PyXML 0.6.3 and up. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Tue Jan 2 07:49:28 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 2 Jan 2001 08:49:28 +0100 Subject: [4suite] Re: [XML-SIG] 4Suite -> gettext In-Reply-To: <3A516532.5DFA3CA7@fourthought.com> (message from Uche Ogbuji on Mon, 01 Jan 2001 22:20:50 -0700) References: <200101020357.UAA21220@localhost.localdomain> <20010101201638.O10567@lyra.org> <3A516532.5DFA3CA7@fourthought.com> Message-ID: <200101020749.IAA00706@loewis.home.cs.tu-berlin.de> > I thought that was the original plan, but that it was decided to > continue supporting Python 1.5.2 in PyXML 0.6.3 and up. Indeed, the rationale being that people using PyXML want to also use Python 1.5. PyXML 0.5.5.1 is not supported in any sense: Nobody answers even questions related to that release, all they can get is a recommendation to use the latest release. That recommendation would be meaningless if the latest versions didn't support Python 1.5. There are even binary distributions of PyXML 0.6 for Python 1.5.2. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Jan 2 07:56:09 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 2 Jan 2001 08:56:09 +0100 Subject: [XML-SIG] Re: [4suite] 4Suite -> gettext In-Reply-To: <200101020357.UAA21220@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101020357.UAA21220@localhost.localdomain> Message-ID: <200101020756.IAA00757@loewis.home.cs.tu-berlin.de> > So I'm holding off on the changes for now. If anyone has any tricks > for straddling Python versions using gettext, please let me know. 
How about "degraded functionality":

    try:
        import gettext
        def _(msg):
            # Look the message up in the "4suite" translation domain.
            return gettext.dgettext("4suite", msg)
    except ImportError:
        def _(msg):
            # No gettext available (e.g. stock Python 1.5.2): pass through.
            return msg

That is, for 1.5, there would be only the English message. That
shouldn't be a major obstacle, since there aren't any translations of
the messages so far, AFAICT.

Regards,
Martin

P.S. On some Linux systems, the above import will even succeed with
1.5, and do the right thing. A gettext module is available as part of
the GNOME package.

From martin@loewis.home.cs.tu-berlin.de Tue Jan 2 08:00:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 2 Jan 2001 09:00:39 +0100
Subject: [XML-SIG] Preparing 0.6.3
Message-ID: <200101020800.JAA00817@loewis.home.cs.tu-berlin.de>

I'd like to release PyXML 0.6.3 later this week or early next week. If
you have any changes that you'd like to get into the release - this
would be the time to check them in. Remember: there will always be
another release.

Regards,
Martin

From Alexandre.Fayolle@logilab.fr Tue Jan 2 14:49:29 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 2 Jan 2001 15:49:29 +0100 (CET)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012231649.JAA02948@localhost.localdomain>
Message-ID:

Sorry to come back on last week's mails, but I was offline for a while
(and this was really a *good* thing for my mental health). Anyway, a
happy new year to everyone here...

On Sat, 23 Dec 2000 uche.ogbuji@fourthought.com wrote:

> Seriously, after a quick survey of my code, the only place I import Node is in
> order to get at the constants.

Yup, I noticed this in 4Suite code, and I kept wondering about the
rationale of doing so, since almost every object you manipulate _is_ a
node, and therefore has access to the class attributes.
In other words a typical line of code is:
"if some_node.nodeType == Node.ELEMENT_NODE :"

Is there a difference in performance with:
"if some_node.nodeType == some_node.ELEMENT_NODE :" ?

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From teg@redhat.com Tue Jan 2 14:53:35 2001
From: teg@redhat.com (Trond Eivind Glomsrød)
Date: 02 Jan 2001 09:53:35 -0500
Subject: [XML-SIG] 4Suite -> gettext
In-Reply-To: <200101020357.UAA21220@localhost.localdomain>
References: <200101020357.UAA21220@localhost.localdomain>
Message-ID:

uche.ogbuji@fourthought.com writes:

> I started looking into converting 4Suite from my hacked i18n to Python's
> gettext, but it seems this is only supported for Python 2.0.

There are modules available for Python 1.5 handling this - we use one
in the installer for Red Hat Linux (which is written in Python), and
there is also one which is part of pygnome.

> Unfortunately, as we've discussed here before, we need to maintain support for
> Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?)

We're using Python 1.5 through the 7 series. We may (or not - right
now, likely not) include a python2 package as well, but it won't be
the primary one.

--
Trond Eivind Glomsrød
Red Hat, Inc.

From uche.ogbuji@fourthought.com Tue Jan 2 16:37:33 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 02 Jan 2001 09:37:33 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from Alexandre Fayolle of "Tue, 02 Jan 2001 15:49:29 +0100."
Message-ID: <200101021637.JAA01405@localhost.localdomain>

> On Sat, 23 Dec 2000 uche.ogbuji@fourthought.com wrote:
>
> > Seriously, after a quick survey of my code, the only place I import Node is in
> > order to get at the constants.
>
> Yup, I noticed this in 4Suite code, and I kept wondering about the
> rationale of doing so, since almost every object you manipulate _is_ a
> node, and therefore has access to the class attributes.
> In other words a typical line of code is:
> "if some_node.nodeType == Node.ELEMENT_NODE :"
>
> Is there a difference in performance with:
> "if some_node.nodeType == some_node.ELEMENT_NODE :" ?

Nope. It's all about developer's inertia, AKA cutnpasteitis.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From martin@loewis.home.cs.tu-berlin.de Tue Jan 2 22:34:55 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 2 Jan 2001 23:34:55 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: (message from Alexandre Fayolle on Tue, 2 Jan 2001 15:49:29 +0100 (CET))
References:
Message-ID: <200101022234.f02MYtN07543@mira.informatik.hu-berlin.de>

> > Seriously, after a quick survey of my code, the only place I
> > import Node is in order to get at the constants.
> Yup, I noticed this in 4Suite code

Actually, when editing 4DOM, I found that a number of places use Node
as a base class, so you still need to import the module.

> Is there a difference in performance with:
> "if some_node.nodeType == some_node.ELEMENT_NODE :" ?

Yes, but it should not matter much. If you have an inheritance depth
of 4 (xml.dom.Node, xml.dom.FtNode.Node, xml.dom.Element.Element,
something that derives from Element), then you get 5 dictionary
lookups to find self.ELEMENT_NODE (for the instance, and for each of
the bases). For Node.ELEMENT_NODE, you get only two (one to find Node,
one to find ELEMENT_NODE); three if you look in FtNode.Node, four if
you write xml.dom.Node.ELEMENT_NODE.

Since dictionary lookups were tuned to be one of the most efficient
operations in Python, and since it is so easy to get many dictionary
lookups in other places, that really shouldn't matter much.

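The lookup counts described above can be illustrated with a minimal
sketch. The class names are hypothetical stand-ins for the 4DOM
hierarchy, and dicts_searched is an illustrative helper, not part of
any library. It models the classic search order (instance dictionary
first, then each class in turn) and ignores the attribute caching that
later Python versions apply.

```python
# Hypothetical stand-ins for the four-level hierarchy discussed above.
class Node:
    ELEMENT_NODE = 1


class FtNode(Node):
    pass


class Element(FtNode):
    pass


class MyElement(Element):
    pass


def dicts_searched(obj, name):
    """Count the namespaces inspected before `name` is found.

    Simplified model: the instance dictionary is searched first,
    then the dictionary of each class along the inheritance chain.
    """
    searched = 1  # the instance's own dictionary
    if name in vars(obj):
        return searched
    for klass in type(obj).__mro__:
        searched += 1
        if name in vars(klass):
            return searched
    raise AttributeError(name)


# Instance dict + MyElement + Element + FtNode + Node
print(dicts_searched(MyElement(), "ELEMENT_NODE"))  # 5
```

Writing Node.ELEMENT_NODE instead costs one lookup for the global name
Node plus one in its class dictionary, which is where the figure of
two comes from.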
So what counts would be clarity, I have to admit that I find Node.ELEMENT_NODE clearer than self.ELEMENT_NODE (although either is obvious if you know the DOM). Regards, Martin From Mike.Olson@fourthought.com Wed Jan 3 06:06:00 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 02 Jan 2001 23:06:00 -0700 Subject: [XML-SIG] PyXPath 1.1 References: <200012270120.SAA02777@localhost.localdomain> Message-ID: <3A52C148.390819BE@FourThought.com> uche.ogbuji@fourthought.com wrote: > > > > Likely! :-) I briefly skimmed the source and 4suite.org and can't seem > > to get a good description of what those structures look like, is there > > a URL I missed? > > There is no such beast. These were originally intended to be purely internal > objects. If we decided to expose them as an API, we'd want to decide on the > naming (Martin doesn't like the "Parsed" prefixes, I'm +0 on killing them) and > document them properly. I'm confused. This thread originally started as an interface from multiple lexers into 4XPath (if I remeber correctly). However, the Parsed* classes in 4XPath are created by the parser (Bison). This is why I originally recommended the interface of a token stream to feed into the parser (currently Bison, but could be replaced with a python only version). Mike > > For now, your best bet is to have a look at XPath/Parsed* in 4Suite (and also > check out Xslt/Parsed* for the associated Pattern machine objects). > > > Note also: I'm getting odd URL redirects going to 4suite.{org|com}, > > with URLs being replaced with quoted strings that then won't resolve: > > > > http://www.4suite.org/ > > --> http://www.4suite.org/"index.epy" > > > > This seems to happen on "directory" URLs. > > Hmm. I looked into this, but I'm not seeing it. I went as bare-bones as > possible to avoid user agent artifacts and all that: > > [uogbuji@borgia uogbuji]$ telnet www.4suite.org 80 > Trying 204.144.146.184... > Connected to dollar.4suite.org. > Escape character is '^]'. 
> GET http://www.4suite.org/ HTTP/1.0 > > HTTP/1.1 200 OK > Date: Wed, 27 Dec 2000 01:14:59 GMT > Server: Apache/1.3.12 (Unix) mod_snake/0.4.1 > Last-Modified: Thu, 02 Nov 2000 19:07:30 GMT > ETag: "36f0d-178-3a01bb72" > Accept-Ranges: bytes > Content-Length: 376 > Connection: close > Content-Type: text/html > > > > > > > > > > > > >

> Click to Enter >
> > > > Connection closed by foreign host. > [uogbuji@borgia uogbuji]$ > > As you can see, the meta refresh goes to the relative "index.epy". I don't > know how this would cause the effect you mention. What user agent are you > using? > > Thanks. > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ken@bitsko.slc.ut.us Wed Jan 3 11:34:52 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 03 Jan 2001 05:34:52 -0600 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: Mike Olson's message of "Tue, 02 Jan 2001 23:06:00 -0700" References: <200012270120.SAA02777@localhost.localdomain> <3A52C148.390819BE@FourThought.com> Message-ID: Mike Olson writes: > uche.ogbuji@fourthought.com wrote: > > > > > > Likely! :-) I briefly skimmed the source and 4suite.org and > > > can't seem to get a good description of what those structures > > > look like, is there a URL I missed? > > > > There is no such beast. These were originally intended to be > > purely internal objects. If we decided to expose them as an API, > > we'd want to decide on the naming (Martin doesn't like the > > "Parsed" prefixes, I'm +0 on killing them) and document them > > properly. > > I'm confused. This thread originally started as an interface from > multiple lexers into 4XPath (if I remeber correctly). However, the > Parsed* classes in 4XPath are created by the parser (Bison). This > is why I originally recommended the interface of a token stream to > feed into the parser (currently Bison, but could be replaced with a > python only version). 
I'm the one who asked for the resulting parse tree, rather than the token stream. I would like to use the XPath (already parsed, thanks everyone!) to traverse other structures, like Py objects (where Py object attribute names stand in for XML element names). -- Ken From martin@loewis.home.cs.tu-berlin.de Wed Jan 3 11:21:48 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 3 Jan 2001 12:21:48 +0100 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: <3A52C148.390819BE@FourThought.com> (message from Mike Olson on Tue, 02 Jan 2001 23:06:00 -0700) References: <200012270120.SAA02777@localhost.localdomain> <3A52C148.390819BE@FourThought.com> Message-ID: <200101031121.f03BLmw01813@mira.informatik.hu-berlin.de> > I'm confused. This thread originally started as an interface from > multiple lexers into 4XPath (if I remeber correctly). It was never clear an interface to *what* is the subject. As the subject still indicates, it started with my announcement that I have multiple pure-Python lexers and parsers. It may be reasonable to get an interface to multiple lexers also, but only if there are actually multiple lexers that are sufficiently different (e.g. C based ones, sre based ones, fast ones, correct ones - assuming you can't be fast and correct simultaneously). Note that an interface to XPath could be even higher-level than the parsing level, since there are multiple independent software blocks involved in your typical XPath application: - the XPath lexer (reading streams, generating tokens) - the XPath parser (reading tokens, generating trees) - the tree implementation (providing expression trees, offering evaluation) - the application (evaluating trees) That gives a total of three potential interfaces. There may be other things that an application wishes to do with an XPath expression (e.g. navigating it), which would require more features from the tree implementation. > However, the Parsed* classes in 4XPath are created by the parser > (Bison). 
This is why I originally recommended the interface of a
> token stream to feed into the parser (currently Bison, but could be
> replaced with a python only version).

As a matter of fact, PyXML 1.2 creates the Parsed* classes without a
bison parser.

Regards,
Martin

From Mike.Olson@fourthought.com Wed Jan 3 17:09:50 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 03 Jan 2001 10:09:50 -0700
Subject: [XML-SIG] Specializing DOM exceptions
References: <200101021637.JAA01405@localhost.localdomain>
Message-ID: <3A535CDE.80C448E6@FourThought.com>

uche.ogbuji@fourthought.com wrote:
>
> > On Sat, 23 Dec 2000 uche.ogbuji@fourthought.com wrote:
> >
> > > Seriously, after a quick survey of my code, the only place I import Node is in
> > > order to get at the constants.
> >
> > Yup, I noticed this in 4Suite code, and I kept wondering about the
> > rationale of doing so, since almost every object you manipulate _is_ a
> > node, and therefore has access to the class attributes.
> > In other words a typical line of code is:
> > "if some_node.nodeType == Node.ELEMENT_NODE :"
> >
> > Is there a difference in performance with:
> > "if some_node.nodeType == some_node.ELEMENT_NODE :" ?
>
> Nope. It's all about developer's inertia, AKA cutnpasteitis.

Actually there may be a small performance advantage in doing it the
way it is done. Looking it up from the instance, it will have to look
into at least 3 dictionaries to find the value, while looking it up
from the class itself it will only have to look into one dictionary.
(though this theory is untested)

Mike

>
> --
> Uche Ogbuji                               Principal Consultant
> uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
> Fourthought, Inc.                         http://Fourthought.com
> 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Wed Jan 3 17:48:20 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 3 Jan 2001 12:48:20 -0500 (EST) Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <3A535CDE.80C448E6@FourThought.com> References: <200101021637.JAA01405@localhost.localdomain> <3A535CDE.80C448E6@FourThought.com> Message-ID: <14931.26084.715041.483820@cj42289-a.reston1.va.home.com> Mike Olson writes: > Actually there may be a small performace advantage doing it they way it > is done. Looking it up from the instance it will have to look into > atleast 3 dictionaries to find the value, while looking it up from the > class itself it will only have to look into one dictionary. (though > this theroy is untested) Mike, It doesn't quite work like that -- looking it up from the class only takes one dict lookup *once you have the class*, but you are also doing one lookup for the class itself, assuming you've imported it into your module's globals. So the difference is a single dictionary lookup for each level of class derivation from Node. For interned strings, this is pretty trivial and you can reasonably expect it to disappear in the wash. On the other hand, picking it up from the class does assure you know the exact access path, and some people think it's more readable. "from xml.dom import Node" is your friend. ;-) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From Mike.Olson@fourthought.com Wed Jan 3 18:56:40 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 03 Jan 2001 11:56:40 -0700 Subject: [XML-SIG] Specializing DOM exceptions References: <200101021637.JAA01405@localhost.localdomain> <3A535CDE.80C448E6@FourThought.com> <14931.26084.715041.483820@cj42289-a.reston1.va.home.com> Message-ID: <3A5375E8.B1B0DB41@FourThought.com> "Fred L. Drake, Jr." wrote: > > Mike Olson writes: > > Actually there may be a small performace advantage doing it they way it > > is done. Looking it up from the instance it will have to look into > > atleast 3 dictionaries to find the value, while looking it up from the > > class itself it will only have to look into one dictionary. (though > > this theroy is untested) > > Mike, > It doesn't quite work like that -- looking it up from the class only > takes one dict lookup *once you have the class*, but you are also > doing one lookup for the class itself, assuming you've imported it > into your module's globals. Of course, I realized after I saw Martin's response. That's what I get for answering email before coffee. Mike So the difference is a single dictionary > lookup for each level of class derivation from Node. For interned > strings, this is pretty trivial and you can reasonably expect it to > disappear in the wash. > On the other hand, picking it up from the class does assure you know > the exact access path, and some people think it's more readable. > "from xml.dom import Node" is your friend. ;-) > > -Fred > > -- > Fred L. Drake, Jr. > PythonLabs at Digital Creations -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. 
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu Jan 4 14:37:08 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 04 Jan 2001 07:37:08 -0700 Subject: [XML-SIG] Python XML topic page Message-ID: <200101041437.HAA07999@localhost.localdomain> http://pyxml.sourceforge.net/topics/ Way out of date in general. I'd like to make bunch of additions and a few corrections. First of all I wanted to be sure no one minded. If not, the next bit is knowing where it is in the sourceforge source tree. While I'm noting the fact, python.org is terribly out of date in general beyond the first few pages. I know there are some unfortunate reasons behind this, but it's pretty sad. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From akuchlin@mems-exchange.org Thu Jan 4 15:13:05 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 4 Jan 2001 10:13:05 -0500 Subject: [XML-SIG] Python XML topic page In-Reply-To: <200101041437.HAA07999@localhost.localdomain>; from uche.ogbuji@fourthought.com on Thu, Jan 04, 2001 at 07:37:08AM -0700 References: <200101041437.HAA07999@localhost.localdomain> Message-ID: <20010104101305.A23803@kronos.cnri.reston.va.us> On Thu, Jan 04, 2001 at 07:37:08AM -0700, uche.ogbuji@fourthought.com wrote: >Way out of date in general. I'd like to make bunch of additions and a few >corrections. First of all I wanted to be sure no one minded. If not, the >next bit is knowing where it is in the sourceforge source tree. Please do. The Web pages are in a separate module, 'www', so you'll have to check that module out from cvs.pyxml.sourceforge.net separately. 
>While I'm noting the fact, python.org is terribly out of date in general >beyond the first few pages. I know there are some unfortunate reasons behind >this, but it's pretty sad. Yes; we should make it a goal to spruce things up through the first half of 2001 (maybe after 2.1 is released). --amk From martin@loewis.home.cs.tu-berlin.de Thu Jan 4 18:44:44 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 4 Jan 2001 19:44:44 +0100 Subject: [XML-SIG] Python XML topic page In-Reply-To: <20010104101305.A23803@kronos.cnri.reston.va.us> (message from Andrew Kuchling on Thu, 4 Jan 2001 10:13:05 -0500) References: <200101041437.HAA07999@localhost.localdomain> <20010104101305.A23803@kronos.cnri.reston.va.us> Message-ID: <200101041844.f04Iii601097@mira.informatik.hu-berlin.de> > >Way out of date in general. I'd like to make bunch of additions and a few > >corrections. First of all I wanted to be sure no one minded. If not, the > >next bit is knowing where it is in the sourceforge source tree. > > Please do. The Web pages are in a separate module, 'www', so you'll > have to check that module out from cvs.pyxml.sourceforge.net > separately. I'd like to add that a cron job is supposed to re-generate the pages within 6 hours after the changes have been committed. You can run the generator manually if you want on pyxml.sourceforge.net, although I recommend running it locally if you only want to check whether it is correct; customize doupdate to your needs to do so. Regards, Martin From loewis@informatik.hu-berlin.de Sun Jan 7 11:22:03 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 7 Jan 2001 12:22:03 +0100 (MET) Subject: [XML-SIG] PyXML 0.6.3 is available Message-ID: <200101071122.MAA15470@pandora.informatik.hu-berlin.de> Version 0.6.3 of the Python/XML distribution is now available. 
It should be considered a beta release, and can be downloaded from
the following URLs:

http://download.sourceforge.net/pyxml/PyXML-0.6.3.tar.gz
http://download.sourceforge.net/pyxml/PyXML-0.6.3.win32-py1.5.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.3.win32-py2.0.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.3-1.5.2.i386.rpm
http://download.sourceforge.net/pyxml/PyXML-0.6.3-2.0.i386.rpm

Changes in this version, compared to 0.6.2:

* Include documentation in binary packages as well.
* Update to Expat 1.2, offer all Python Unicode codecs to expat.
* Support the lexical-handler property in the expat SAX driver.
* Restructure DOM interfaces to better accommodate multiple DOM
  implementations: provide standard exceptions and symbolic constants
  (including those inside of the Node interface) in xml.dom.
* Improve minidom: validate arguments and raise DOM exceptions,
  correct NamedNodeMap operations, offer cloneNode, splitText,
  DocumentType, DOMImplementation, and correct various other errors.
* Restore xml.unicode for compatibility with PyXML 0.5. This is a
  pure-Python implementation of the iso8859 module, which can only
  convert between ISO-8859-x and UTF-8. Python 2 users should use the
  Unicode type instead of this service.
* Fix memory leaks in expat parser and pulldom.

The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package. The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.

The package currently contains:

* XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius Garshol),
  sgmlop (Fredrik Lundh).
* SAX interface (Lars Marius Garshol)
* minidom DOM implementation (Paul Prescod)
* 4DOM from Fourthought (Uche Ogbuji, Mike Olson)
* Various utility modules and functions (various people)
* Documentation and example programs (various people)

The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments,
questions, or bug reports to .

For more information about Python and XML, see:
http://www.python.org/topics/xml/

--
Martin v. Löwis
http://www.informatik.hu-berlin.de/~loewis

From noreply@sourceforge.net Mon Jan 8 15:32:04 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jan 2001 07:32:04 -0800
Subject: [XML-SIG] [Bug #128044] 4DOM is unpickleable
Message-ID:

Bug #128044, was updated on 2001-Jan-08 07:31
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: larsga
Assigned to : nobody
Summary: 4DOM is unpickleable

Details: For some reason, when trying to dump 4DOM Document nodes
with cPickle or pickle under Python 2.0, only the Document node is
serialized.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128044&group_id=6473

From nobody@sourceforge.net Tue Jan 9 15:38:50 2001
From: nobody@sourceforge.net (nobody)
Date: Tue, 09 Jan 2001 07:38:50 -0800
Subject: [XML-SIG] [Bug #128172] [4XSLT] strange behaviour of xsl:import
Message-ID:

From: noreply@sourceforge.net

Bug #128172, was updated on 2001-Jan-09 07:38
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: ornicar
Assigned to : nobody
Summary: [4XSLT] strange behaviour of xsl:import

Details: Hello,

I'm using XSL Transformation to turn XML trees into viewable HTML
documents (again!) and I've just found something looking like a bug
in the 4xslt engine.
The attached xml file (carpool.xml) contains data on a pool of cars. Each node has a 'state' attribute used to know if the car is free or used or in the garage for maintenance. The 'state' value is a number. ... ... In order to display valuable information on a web page, I use the attached xslt stylesheet (pool2html.xsl) and I transform the 'state' numeric value to an understandable string. As this number-to-string transformation should be used in various stylesheets, I put it in a named-template stored in a common XSLT stylesheet (pool-comm.xsl). This common stylesheet is imported at the beginning of pool2html.xsl stylesheet. ... ... ... What I expected to get is the attached html document called expected-pool.html . Nevertheless, I got the attached html document called pool.html ! 4xslt wasn't able to call the template named 'state-value' whereas this template is defined in the imported stylesheet (pool-comm.xsl). Another XSLT engine (e.g. xalan) is able to call the template and outputs the expected html. A stranger behaviour : when I replace the 'xsl:import' with an 'xsl:include', 4xslt can call the named-template and outputs the expected html. I read very carefully the XSLT spec and I didn't find any possible explanation to this strange behaviour ... could you let me know if this is a bug or if there is something I didn't get in stylesheets combination philosophy. Best regards, O. CAYROL. ------------------------------------------------------- carpool.xml Ferrari F40 459 CBO 75 Porsche 911 347 CQQ 75 ----------------------------------------------------- pool2html.xsl Cars Pool Management

Cars Pool Management

State Brand Type Registration Number
------------------------------------------------------- pool-comm.xsl Free Used Getting repaired --------------------------------------------------- pool.html Cars Pool Management

Cars Pool Management

State Brand Type Registration Number
Ferrari F40 459 CBO 75
Porsche 911 347 CQQ 75
---------------------------------------------------- expected-pool.html Cars Pool Management

Cars Pool Management

StateBrandTypeRegistration Number
FreeFerrariF40459 CBO 75
Used Porsche911347 CQQ 75
For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=128172&group_id=6473 From matt@virtualspectator.com Wed Jan 10 05:20:43 2001 From: matt@virtualspectator.com (matt) Date: Wed, 10 Jan 2001 18:20:43 +1300 Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again In-Reply-To: References: Message-ID: <0101101829390Y.00856@localhost.localdomain> --Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC Content-Type: text/plain Content-Transfer-Encoding: 8bit If this is a bug, I will post it, but I'm not sure it is yet. Attached are two files, one a test xml with encoding ISO-8859-1 and the other a test python script. The problem is that if one uses a pyexpat parser, and then renders in ISO-8859-1 then things are ok. If one uses the drv_xmllib driver, then an error occurs as it tries to translate back to ISO-8859-1. My guess is that the ISO-8859-1 transformation into UTF-8 for character data (which is what happens when the original document is parsed) is not being done properly in the drv_xmllib driver. I have also included an xml document created within the script to show that that one is in fact ok, and that it is the parser that is doing something wrong, or me doing something wrong with the parser. My only reason for using drv_xmllib is that pyexpat still has a memory leak in it. I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur. 
regards Matt --Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC Content-Type: text/x-java; name="test.py" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="test.py" ZnJvbSB4bWwuZG9tIGltcG9ydCBpbXBsZW1lbnRhdGlvbgpmcm9tIHhtbC5kb20gaW1wb3J0IGV4 dApmcm9tIHhtbC5kb20uZXh0LnJlYWRlciBpbXBvcnQgU2F4MgoKZHQgPSBpbXBsZW1lbnRhdGlv bi5jcmVhdGVEb2N1bWVudFR5cGUoJycsJycsJycpCmRvYyA9IGltcGxlbWVudGF0aW9uLmNyZWF0 ZURvY3VtZW50KCcnLCd0ZXN0JyxkdCkKY2RzID0gZG9jLmNyZWF0ZUNEQVRBU2VjdGlvbigiaGVs bG8iKQpjZHMuZGF0YT0iaGVsbG8gdGhpcyBpcyB0ZXh0IDog6SIKZm4gPSBkb2MuZ2V0RWxlbWVu dHNCeVRhZ05hbWVOUygnJywnKicpWzBdCmZuLmFwcGVuZENoaWxkKGNkcykKCmV4dC5QcmV0dHlQ cmludChkb2MsZW5jb2Rpbmc9J0lTTy04ODU5LTEnKQoKZnJvbSB4bWwuc2F4IGltcG9ydCBzYXhl eHRzCgpkb2MyID0geG1sX2RvbV9vYmplY3QgPSBTYXgyLkZyb21YbWxGaWxlKCd0ZXN0LnhtbCcp CmRvYzMgPSB4bWxfZG9tX29iamVjdCA9IFNheDIuRnJvbVhtbEZpbGUoJ3Rlc3QueG1sJyxwYXJz ZXI9c2F4ZXh0cy5YTUxQYXJzZXJGYWN0b3J5Lm1ha2VfcGFyc2VyKCd4bWwuc2F4LmRyaXZlcnMu ZHJ2X3B5ZXhwYXQnKSkKZG9jNCA9IHhtbF9kb21fb2JqZWN0ID0gU2F4Mi5Gcm9tWG1sRmlsZSgn dGVzdC54bWwnLHBhcnNlcj1zYXhleHRzLlhNTFBhcnNlckZhY3RvcnkubWFrZV9wYXJzZXIoJ3ht bC5zYXguZHJpdmVycy5kcnZfeG1sbGliJykpCgpwcmludApwcmludCAibm8gcGFyc2VyIHdhcyBz ZWxlY3RlZCAuLiBzaG91bGQgZGVmYXVsdCB0byBweWV4cGF0IgpleHQuUHJldHR5UHJpbnQoZG9j MixlbmNvZGluZz0nSVNPLTg4NTktMScpCgpwcmludApwcmludCAicHlleHBhdCB3YXMgcGFyc2Vy IHNlbGVjdGVkIgpleHQuUHJldHR5UHJpbnQoZG9jMyxlbmNvZGluZz0nSVNPLTg4NTktMScpCgpw cmludApwcmludCAiZHJ2X3htbGxpYiB3YXMgcGFyc2VyIHNlbGVjdGVkIgpleHQuUHJldHR5UHJp bnQoZG9jNCxlbmNvZGluZz0nSVNPLTg4NTktMScpCiMgbm90ZSBpdCBpcyBmaW5lIGlmIHByaW50 ZWQgYXMgVVRGLTggZm9ybWF0Cgo= --Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC Content-Type: text/x-c++; name="test.xml" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="test.xml" PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nSVNPLTg4NTktMSc/Pgo8dGVzdD48IVtDREFU QVtoZWxsbyB0aGlzIGlzIHRleHQgOiDpXV0+CjwvdGVzdD4K --Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC-- From martin@loewis.home.cs.tu-berlin.de Wed Jan 10 07:49:56 
2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 10 Jan 2001 08:49:56 +0100 Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again In-Reply-To: <0101101829390Y.00856@localhost.localdomain> (message from matt on Wed, 10 Jan 2001 18:20:43 +1300) References: <0101101829390Y.00856@localhost.localdomain> Message-ID: <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de> > If this is a bug, I will post it, but I'm not sure it is yet. > Attached are two files, one a test xml with encoding ISO-8859-1 and > the other a test python script. The problem is that if one uses a > pyexpat parser, and then renders in ISO-8859-1 then things are ok. > If one uses the drv_xmllib driver, then an error occurs as it tries > to translate back to ISO-8859-1. My guess is that the ISO-8859-1 > transformation into UTF-8 for character data (which is what happens > when the original document is parsed) is not being done properly in > the drv_xmllib driver. That's a good guess. drv_xmllib does not implement handle_xml at all, so it does not know what the encoding is. However, what it *should* do, at least in Python 2.0, is to produce Unicode objects, not UTF-8 encoded strings. Would you like to look into correcting that? > My only reason for using drv_xmllib is that pyexpat still has a > memory leak in it. Not that I know of, at least not in PyXML 0.6.3. > I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur. I'm confused. Where did you get PyXML 1.2 from? 
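The point made above, that a driver should hand back Unicode objects rather than UTF-8 encoded byte strings, can be illustrated with a small sketch. It is written for present-day Python 3 rather than the Python 1.5.2/2.0 of this thread, it uses only built-in str/bytes methods, and it does not touch PyXML or drv_xmllib at all; the helper names parse_to_unicode and render are made up for illustration.

```python
# Illustrative only: plain Python 3, no PyXML involved.
# The sample text matches the character data in the attached test document.
SAMPLE = "hello this is text : \u00e9"  # ends in e-acute


def parse_to_unicode(raw, declared_encoding):
    """Hypothetical helper: decode document bytes using the encoding
    named in the XML declaration, yielding a Unicode string."""
    return raw.decode(declared_encoding)


def render(text, encoding):
    """Hypothetical helper: serialise character data for output."""
    return text.encode(encoding)


latin1_doc = SAMPLE.encode("iso-8859-1")

# Correct path: bytes -> Unicode -> bytes round-trips cleanly.
text = parse_to_unicode(latin1_doc, "iso-8859-1")
round_trip = render(text, "iso-8859-1")

# Faulty path 1: the declared encoding is ignored and UTF-8 is assumed.
# A lone 0xE9 byte is not valid UTF-8, so decoding fails.
try:
    parse_to_unicode(latin1_doc, "utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# Faulty path 2: UTF-8 bytes are passed along as if each byte were one
# Latin-1 character, so the single e-acute turns into two characters.
mojibake = SAMPLE.encode("utf-8").decode("iso-8859-1")
```

Keeping character data as Unicode between parsing and output means the encoding is chosen exactly twice, once on the way in and once on the way out, which is the behaviour asked of the driver here.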
Regards, Martin From matt@virtualspectator.com Wed Jan 10 08:15:09 2001 From: matt@virtualspectator.com (matt) Date: Wed, 10 Jan 2001 21:15:09 +1300 Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again In-Reply-To: <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de> References: <0101101829390Y.00856@localhost.localdomain> <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de> Message-ID: <01011021320810.00856@localhost.localdomain> --Boundary-=_oQHnWnkUEwHsqmGbbuqCLJJiVswM Content-Type: text/plain Content-Transfer-Encoding: 8bit On Wed, 10 Jan 2001, Martin v. Loewis wrote: > > If this is a bug, I will post it, but I'm not sure it is yet. > > Attached are two files, one a test xml with encoding ISO-8859-1 and > > the other a test python script. The problem is that if one uses a > > pyexpat parser, and then renders in ISO-8859-1 then things are ok. > > If one uses the drv_xmllib driver, then an error occurs as it tries > > to translate back to ISO-8859-1. My guess is that the ISO-8859-1 > > transformation into UTF-8 for character data (which is what happens > > when the original document is parsed) is not being done properly in > > the drv_xmllib driver. > > That's a good guess. drv_xmllib does not implement handle_xml at all, > so it does not know what the encoding is. However, what it *should* > do, at least in Python 2.0, is to produce Unicode objects, not UTF-8 > encoded strings. ahh ... ok. > > Would you like to look into correcting that? > Hmm, that means upgrading to 2.0, which perhaps I should do. The problem is that I use 4DOM in some quite heavy Zope products, and I am unconvinced that Python 2.0 and Zope are stable enough for production environments, and they are too different to split between production and development. I am starting to figure out PyXML's stitching and would love to contribute somewhere. Character encoding is a good area. 
The other part though is making 4DOM pickleable, which was actually my next little project, to look at it some more and see where it is not pickleable. Could be simple, someone may already have the answer. > > My only reason for using drv_xmllib is that pyexpat still has a > > memory leak in it. > > Not that I know of, at least not in PyXML 0.6.3. > Having had a closer look at PyXML 0.6.3, the original memory leak from the parser doing its parsing thing has gone, but there is one that exists purely from making a parser. I used to call FromXML and its derivatives with no parser defined (ugh!!) and after about 77 loops of this memory would suddenly start being eaten. Anyway, now I just create parsers (the same pyexpat that Reader defaults to) as members of any class that needs them, so the memory leak never shows now. Some of my functions need to open up to 100 xml documents from files, import nodes, and write out others, so these leaks tend to show up quickly. > > I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur. > > I'm confused. Where did you get PyXML 1.2 from? > Someone said go get PyXML 1.3 on the 5th January from SourceForge and I only found PyXML 1.2 ..... which has now changed to 1.3 ... and there are differences .. I have attached diff PyXML-0.6.2 PyXML-0.6.3 so you can see. 
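The workaround described above (create one parser, keep it as a member of the class that needs it, and reuse it for every document) can be sketched roughly as follows. This is only an illustration using the xml.sax module from a current Python 3 standard library, not the PyXML Sax2 reader or FromXmlFile discussed in this thread, and it says nothing about whether the leak itself still exists; ElementCounter and DocumentLoader are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts the elements of one document."""

    def startDocument(self):
        # Reset per document, since the handler is reused as well.
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


class DocumentLoader:
    """Hypothetical class that owns a single parser for its lifetime."""

    def __init__(self):
        self.handler = ElementCounter()
        # Created once here instead of once per parsed document.
        self.parser = xml.sax.make_parser()
        self.parser.setContentHandler(self.handler)

    def count_elements(self, data):
        # parse() accepts a file-like object; the same parser object
        # is used for every call.
        self.parser.parse(io.BytesIO(data))
        return self.handler.count


loader = DocumentLoader()
documents = [b"<test><a/><b/></test>", b"<test/>"] * 50
counts = [loader.count_elements(doc) for doc in documents]
```

A loop over 100 documents like this never calls make_parser() more than once, which is the property the workaround relies on.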
regards Matt > Regards, > Martin -- --Boundary-=_oQHnWnkUEwHsqmGbbuqCLJJiVswM Content-Type: text/english; name="diff.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="diff.txt" ZGlmZiBQeVhNTC0wLjYuMi9BTk5PVU5DRSBQeVhNTC0wLjYuMy9BTk5PVU5DRQo2YzYKPCBWZXJz aW9uIDAuNi4yIG9mIHRoZSBQeXRob24vWE1MIGRpc3RyaWJ1dGlvbiBpcyBub3cgYXZhaWxhYmxl LiAgSXQKLS0tCj4gVmVyc2lvbiAwLjYuMyBvZiB0aGUgUHl0aG9uL1hNTCBkaXN0cmlidXRpb24g aXMgbm93IGF2YWlsYWJsZS4gIEl0CjEwLDE0YzEwLDE0CjwgaHR0cDovL2Rvd25sb2FkLnNvdXJj ZWZvcmdlLm5ldC9weXhtbC9QeVhNTC0wLjYuMi50YXIuZ3oKPCBodHRwOi8vZG93bmxvYWQuc291 cmNlZm9yZ2UubmV0L3B5eG1sL1B5WE1MLTAuNi4yLndpbjMyLXB5MS41LmV4ZQo8IGh0dHA6Ly9k b3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQvcHl4bWwvUHlYTUwtMC42LjIud2luMzItcHkyLjAuZXhl CjwgaHR0cDovL2Rvd25sb2FkLnNvdXJjZWZvcmdlLm5ldC9weXhtbC9QeVhNTC0wLjYuMi0xLjUu Mi5pMzg2LnJwbQo8IGh0dHA6Ly9kb3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQvcHl4bWwvUHlYTUwt MC42LjItMi4wLmkzODYucnBtCi0tLQo+IGh0dHA6Ly9kb3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQv cHl4bWwvUHlYTUwtMC42LjMudGFyLmd6Cj4gaHR0cDovL2Rvd25sb2FkLnNvdXJjZWZvcmdlLm5l dC9weXhtbC9QeVhNTC0wLjYuMy53aW4zMi1weTEuNS5leGUKPiBodHRwOi8vZG93bmxvYWQuc291 cmNlZm9yZ2UubmV0L3B5eG1sL1B5WE1MLTAuNi4zLndpbjMyLXB5Mi4wLmV4ZQo+IGh0dHA6Ly9k b3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQvcHl4bWwvUHlYTUwtMC42LjMtMS41LjIuaTM4Ni5ycG0K PiBodHRwOi8vZG93bmxvYWQuc291cmNlZm9yZ2UubmV0L3B5eG1sL1B5WE1MLTAuNi4zLTIuMC5p Mzg2LnJwbQoxNmMxNgo8IENoYW5nZXMgaW4gdGhpcyB2ZXJzaW9uLCBjb21wYXJlZCB0byAwLjYu MToKLS0tCj4gQ2hhbmdlcyBpbiB0aGlzIHZlcnNpb24sIGNvbXBhcmVkIHRvIDAuNi4yOgoxOGMx OAo8IAkqIFN5bmNocm9uaXplIHdpdGggc3RhbmRhcmQgbGlicmFyeSBmcm9tIFB5dGhvbiAyLjAK LS0tCj4gCSogSW5jbHVkZSBkb2N1bWVudGF0aW9uIGluIGJpbmFyeSBwYWNrYWdlcyBhcyB3ZWxs LgoyMCwyM2MyMCwyMQo8IAkqIFVwZGF0ZWQgdG8gNERPTSBmcm9tIDRTdWl0ZSAwLjkuMS4gVGhp cyBjb3JyZWN0cyBtYW55CjwgCWVycm9ycywgc2VlIHRoZSA0U3VpdGUgQ2hhbmdlTG9nIGZvciBk ZXRhaWxzLiBNb3N0IG5vdGFibHksCjwgCXRoZSBTQVggcmVhZGVyIGludGVyZmFjZSBoYXMgYmVl biBleHBhbmRlZCB0byBzdXBwb3J0CjwgCWFyYml0cmFyeSBwYXJzZXJzLCBhbmQgYSBQeUV4cGF0 
IHJlYWRlciBjbGFzcyB3YXMgYWRkZWQuCi0tLQo+IAkqIFVwZGF0ZSB0byBFeHBhdCAxLjIsIG9m ZmVyIGFsbCBQeXRob24gVW5pY29kZSBjb2RlY3MgdG8KPiAgICAgICAgICAgZXhwYXQuCjI1YzIz CjwgCSogQWRkIG1pbmlkb20gZnVuY3Rpb25zOiBub3JtYWxpemUgYW5kIGhhc0F0dHJpYnV0ZS4K LS0tCj4gICAgICAgICAqIHN1cHBvcnQgdGhlIGxleGljYWwtaGFuZGxlciBwcm9wZXJ0eSBpbiB0 aGUgZXhwYXQgU0FYIGRyaXZlci4KMjdjMjUsMjgKPCAJKiBGaXggYSBudW1iZXIgb2YgbWlub3Ig YnVncy4KLS0tCj4gCSogUmVzdHJ1Y3R1cmUgRE9NIGludGVyZmFjZXMgdG8gYmV0dGVyIGFjY29t b2RhdGUgbXVsdGlwbGUKPiAgICAgICAgICAgRE9NIGltcGxlbWVudGF0aW9uczogcHJvdmlkZSBz dGFuZGFyZCBleGNlcHRpb25zIGFuZCBzeW1ib2xpYwo+ICAgICAgICAgICBjb25zdGFudHMgKGlu Y2x1ZGluZyB0aG9zZSBpbnNpZGUgb2YgdGhlIE5vZGUgaW50ZXJmYWNlKSBpbgo+ICAgICAgICAg ICB4bWwuZG9tLgoyOWMzMCw0MAo8IAkqIE1vcmUgdGVzdHMgcGFzcyBub3csIGluIHBhcnRpY3Vs YXIgdGVzdF9kb20sIGFuZCB0ZXN0L2RvbS90ZXN0LgotLS0KPiAJKiBJbXByb3ZlIG1pbmlkb206 IHZhbGlkYXRlIGFyZ3VtZW50cyBhbmQgcmFpc2UgRE9NIGV4Y2VwdGlvbnMsCj4gICAgICAgICAg IGNvcnJlY3QgTmFtZU5vZGVNYXAgb3BlcmF0aW9ucywgb2ZmZXIgY2xvbmVOb2RlLCBzcGxpdFRl eHQsCj4gICAgICAgICAgIERvY3VtZW50VHlwZSwgRE9NSW1wbGVtZW50YXRpb24sIGFuZCBjb3Jy ZWN0IHZhcmlvdXMgb3RoZXIKPiAgICAgICAgICAgZXJyb3JzLgo+IAo+IAkqIFJlc3RvcmUgeG1s LnVuaWNvZGUgZm9yIGNvbXBhdGliaWxpdHkgd2l0aCBQeVhNTCAwLjUuIFRoaXMgaXMKPiAgICAg ICAgICAgYSBwdXJlLVB5dGhvbiBpbXBsZW1lbnRhdGlvbiBvZiB0aGUgaXNvODg1OSBtb2R1bGUs IHdoaWNoIGNhbgo+ICAgICAgICAgICBvbmx5IGNvbnZlcnQgYmV0d2VlbiBJU08tODg1OS14IGFu ZCBVVEYtOC4gUHl0aG9uIDIgdXNlcnMKPiAgICAgICAgICAgc2hvdWxkIHVzZSB0aGUgVW5pY29k ZSB0eXBlIGluc3RlYWQgb2YgdGhpcyBzZXJ2aWNlLgo+IAo+IAkqIEZpeCBtZW1vcnkgbGVha3Mg aW4gZXhwYXQgcGFyc2VyIGFuZCBwdWxsZG9tLgo0MCw0MWM1MQo8IEdhcnNob2wpLCB4bWxsaWIu cHkgKFNqb2VyZCBNdWxsZW5kZXIpIHVzaW5nIHRoZSBzZ21sb3AuYyBhY2NlbGVyYXRvcgo8IG1v ZHVsZSAoRnJlZHJpayBMdW5kaCkuCi0tLQo+IEdhcnNob2wpLCBzZ21sb3AgKEZyZWRyaWsgTHVu ZGgpLgo0NCw0NmM1NCw1NQo8IAkqIERPTSBpbnRlcmZhY2UgKFN0ZWZhbmUgRmVybWlnaWVyLCBB Lk0uIEt1Y2hsaW5nKQo8IAkqIDRET00gaW50ZXJmYWNlIGZyb20gRm91cnRob3VnaHQgKFVjaGUg 
T2didWppLCBNaWtlIE9sc29uKQo8IAkqIHhtbGFyY2gucHksIGZvciBhcmNoaXRlY3R1cmFsIGZv cm1zIHByb2Nlc3NpbmcgKEdlaXIgT3ZlIEdy+G5tbykKLS0tCj4gCSogbWluaWRvbSBET00gaW1w bGVtZW50YXRpb24gKFBhdWwgUHJlc2NvZCkKPiAJKiA0RE9NIGZyb20gRm91cnRob3VnaHQgKFVj aGUgT2didWppLCBNaWtlIE9sc29uKQpkaWZmIFB5WE1MLTAuNi4yL0NSRURJVFMgUHlYTUwtMC42 LjMvQ1JFRElUUwo3YzcKPCAgICA8aGFja2VyPiBlbGVtZW50IGFyZTogPG5hbWU+LCA8ZW1haWw+ LCA8aG9tZS1wYWdlPiwgPHB1YmxpYy1rZXk+LAotLS0KPiAgICA8eG1sLWhhY2tlcj4gZWxlbWVu dCBhcmU6IDxuYW1lPiwgPGVtYWlsPiwgPGhvbWUtcGFnZT4sIDxwdWJsaWMta2V5PiwKMjNjMjMs MjUKPCAgICAgPG5hbWU+IEZyZWQgTC4gRHJha2UgPC9uYW1lPgotLS0KPiAgICAgPG5hbWU+IEZy ZWQgTC4gRHJha2UsIEpyLiA8L25hbWU+Cj4gICAgIDxlbWFpbD4gZmRyYWtlQGFjbS5vcmcgPC9l bWFpbD4KPiAgICAgPGhvbWUtcGFnZT4gaHR0cDovL3B5dGhvbi5zdGFyc2hpcC5uZXQvY3Jldy9m ZHJha2UvIDwvaG9tZS1wYWdlPgo4OWE5Mgo+ICAgICA8dGFzaz4gbWluaWRvbSA8L3Rhc2s+CjEx NWMxMTgsMTI0CjwgICAgIDx0YXNrPiBBZGQgbm9ybWFsaXplKCkgdG8gbWluaWRvbSA8L3Rhc2s+ Ci0tLQo+ICAgICA8dGFzaz4gQWRkIHZhcmlvdXMgbWluaWRvbSBmZWF0dXJlcyA8L3Rhc2s+Cj4g ICA8L3htbC1oYWNrZXI+Cj4gCj4gICA8eG1sLWhhY2tlcj4KPiAgICAgPG5hbWU+IEV2Z2VueSBD aGVya2FzaGluIDwvbmFtZT4KPiAgICAgPGVtYWlsPiBldWdlbmVhaUBpY2MucnUgPC9lbWFpbD4K PiAgICAgPHRhc2s+IEV4cG9zZSBQeXRob24gY29kZWNzIHRvIHB5ZXhwYXQgPC90YXNrPgpkaWZm IFB5WE1MLTAuNi4yL0xJQ0VOQ0UgUHlYTUwtMC42LjMvTElDRU5DRQo1YzUKPCBET006Ci0tLQo+ IDRET006CjcsMTFjNwo8IFB5RXhwYXQ6CjwgCjwgLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KPCBDb3B5cmlnaHQgMTk5 MS0xOTk1IGJ5IFN0aWNodGluZyBNYXRoZW1hdGlzY2ggQ2VudHJ1bSwgQW1zdGVyZGFtLAo8IFRo ZSBOZXRoZXJsYW5kcy4KLS0tCj4gQ29weXJpZ2h0IChjKSAyMDAwIEZvdXJ0aG91Z2h0IEluYywg VVNBCjE5LDIxYzE1LDE2Mgo8IHN1cHBvcnRpbmcgZG9jdW1lbnRhdGlvbiwgYW5kIHRoYXQgdGhl IG5hbWVzIG9mIFN0aWNodGluZyBNYXRoZW1hdGlzY2gKPCBDZW50cnVtIG9yIENXSSBvciBDb3Jw b3JhdGlvbiBmb3IgTmF0aW9uYWwgUmVzZWFyY2ggSW5pdGlhdGl2ZXMgb3IKPCBDTlJJIG5vdCBi ZSB1c2VkIGluIGFkdmVydGlzaW5nIG9yIHB1YmxpY2l0eSBwZXJ0YWluaW5nIHRvCi0tLQo+IHN1 
cHBvcnRpbmcgZG9jdW1lbnRhdGlvbiwgYW5kIHRoYXQgdGhlIG5hbWUgb2YgRm91clRob3VnaHQg TExDIG5vdCBiZQo+IHVzZWQgaW4gYWR2ZXJ0aXNpbmcgb3IgcHVibGljaXR5IHBlcnRhaW5pbmcg dG8gZGlzdHJpYnV0aW9uIG9mIHRoZQo+IHNvZnR3YXJlIHdpdGhvdXQgc3BlY2lmaWMsIHdyaXR0 ZW4gcHJpb3IgcGVybWlzc2lvbi4KPiAKPiBGT1VSVEhPVUdIVCBMTEMgRElTQ0xBSU0gQUxMIFdB UlJBTlRJRVMgV0lUSCBSRUdBUkQgVE8gVEhJUyBTT0ZUV0FSRSwKPiBJTkNMVURJTkcgQUxMIElN UExJRUQgV0FSUkFOVElFUyBPRiBNRVJDSEFOVEFCSUxJVFkgQU5EIEZJVE5FU1MsCj4gSU4gTk8g RVZFTlQgU0hBTEwgRk9VUlRIT1VHSFQgQkUgTElBQkxFIEZPUiBBTlkgU1BFQ0lBTCwgSU5ESVJF Q1QgT1IKPiBDT05TRVFVRU5USUFMIERBTUFHRVMgT1IgQU5ZIERBTUFHRVMgV0hBVFNPRVZFUiBS RVNVTFRJTkcgRlJPTSBMT1NTIE9GCj4gVVNFLCBEQVRBIE9SIFBST0ZJVFMsIFdIRVRIRVIgSU4g QU4gQUNUSU9OIE9GIENPTlRSQUNULCBORUdMSUdFTkNFCj4gT1IgT1RIRVIgVE9SVElPVVMgQUNU SU9OLCBBUklTSU5HIE9VVCBPRiBPUiBJTiBDT05ORUNUSU9OIFdJVEggVEhFCj4gVVNFIE9SIFBF UkZPUk1BTkNFIE9GIFRISVMgU09GVFdBUkUuCj4gCj4gCj4gUHlFeHBhdCwgU0FYIGxpYnJhcmll czoKPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLQo+IEJFT1BFTiBQWVRIT04gT1BFTiBTT1VSQ0UgTElDRU5TRSBBR1JF RU1FTlQgVkVSU0lPTiAxCj4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0KPiAKPiAxLiBUaGlzIExJQ0VOU0UgQUdSRUVNRU5UIGlzIGJldHdlZW4g QmVPcGVuLmNvbSAoIkJlT3BlbiIpLCBoYXZpbmcgYW4KPiBvZmZpY2UgYXQgMTYwIFNhcmF0b2dh IEF2ZW51ZSwgU2FudGEgQ2xhcmEsIENBIDk1MDUxLCBhbmQgdGhlCj4gSW5kaXZpZHVhbCBvciBP cmdhbml6YXRpb24gKCJMaWNlbnNlZSIpIGFjY2Vzc2luZyBhbmQgb3RoZXJ3aXNlIHVzaW5nCj4g dGhpcyBzb2Z0d2FyZSBpbiBzb3VyY2Ugb3IgYmluYXJ5IGZvcm0gYW5kIGl0cyBhc3NvY2lhdGVk Cj4gZG9jdW1lbnRhdGlvbiAoInRoZSBTb2Z0d2FyZSIpLgo+IAo+IDIuIFN1YmplY3QgdG8gdGhl IHRlcm1zIGFuZCBjb25kaXRpb25zIG9mIHRoaXMgQmVPcGVuIFB5dGhvbiBMaWNlbnNlCj4gQWdy ZWVtZW50LCBCZU9wZW4gaGVyZWJ5IGdyYW50cyBMaWNlbnNlZSBhIG5vbi1leGNsdXNpdmUsCj4g cm95YWx0eS1mcmVlLCB3b3JsZC13aWRlIGxpY2Vuc2UgdG8gcmVwcm9kdWNlLCBhbmFseXplLCB0 ZXN0LCBwZXJmb3JtCj4gYW5kL29yIGRpc3BsYXkgcHVibGljbHksIHByZXBhcmUgZGVyaXZhdGl2 
ZSB3b3JrcywgZGlzdHJpYnV0ZSwgYW5kCj4gb3RoZXJ3aXNlIHVzZSB0aGUgU29mdHdhcmUgYWxv bmUgb3IgaW4gYW55IGRlcml2YXRpdmUgdmVyc2lvbiwKPiBwcm92aWRlZCwgaG93ZXZlciwgdGhh dCB0aGUgQmVPcGVuIFB5dGhvbiBMaWNlbnNlIGlzIHJldGFpbmVkIGluIHRoZQo+IFNvZnR3YXJl LCBhbG9uZSBvciBpbiBhbnkgZGVyaXZhdGl2ZSB2ZXJzaW9uIHByZXBhcmVkIGJ5IExpY2Vuc2Vl Lgo+IAo+IDMuIEJlT3BlbiBpcyBtYWtpbmcgdGhlIFNvZnR3YXJlIGF2YWlsYWJsZSB0byBMaWNl bnNlZSBvbiBhbiAiQVMgSVMiCj4gYmFzaXMuICBCRU9QRU4gTUFLRVMgTk8gUkVQUkVTRU5UQVRJ T05TIE9SIFdBUlJBTlRJRVMsIEVYUFJFU1MgT1IKPiBJTVBMSUVELiAgQlkgV0FZIE9GIEVYQU1Q TEUsIEJVVCBOT1QgTElNSVRBVElPTiwgQkVPUEVOIE1BS0VTIE5PIEFORAo+IERJU0NMQUlNUyBB TlkgUkVQUkVTRU5UQVRJT04gT1IgV0FSUkFOVFkgT0YgTUVSQ0hBTlRBQklMSVRZIE9SIEZJVE5F U1MKPiBGT1IgQU5ZIFBBUlRJQ1VMQVIgUFVSUE9TRSBPUiBUSEFUIFRIRSBVU0UgT0YgVEhFIFNP RlRXQVJFIFdJTEwgTk9UCj4gSU5GUklOR0UgQU5ZIFRISVJEIFBBUlRZIFJJR0hUUy4KPiAKPiA0 LiBCRU9QRU4gU0hBTEwgTk9UIEJFIExJQUJMRSBUTyBMSUNFTlNFRSBPUiBBTlkgT1RIRVIgVVNF UlMgT0YgVEhFCj4gU09GVFdBUkUgRk9SIEFOWSBJTkNJREVOVEFMLCBTUEVDSUFMLCBPUiBDT05T RVFVRU5USUFMIERBTUFHRVMgT1IgTE9TUwo+IEFTIEEgUkVTVUxUIE9GIFVTSU5HLCBNT0RJRllJ TkcgT1IgRElTVFJJQlVUSU5HIFRIRSBTT0ZUV0FSRSwgT1IgQU5ZCj4gREVSSVZBVElWRSBUSEVS RU9GLCBFVkVOIElGIEFEVklTRUQgT0YgVEhFIFBPU1NJQklMSVRZIFRIRVJFT0YuCj4gCj4gNS4g VGhpcyBMaWNlbnNlIEFncmVlbWVudCB3aWxsIGF1dG9tYXRpY2FsbHkgdGVybWluYXRlIHVwb24g YSBtYXRlcmlhbAo+IGJyZWFjaCBvZiBpdHMgdGVybXMgYW5kIGNvbmRpdGlvbnMuCj4gCj4gNi4g VGhpcyBMaWNlbnNlIEFncmVlbWVudCBzaGFsbCBiZSBnb3Zlcm5lZCBieSBhbmQgaW50ZXJwcmV0 ZWQgaW4gYWxsCj4gcmVzcGVjdHMgYnkgdGhlIGxhdyBvZiB0aGUgU3RhdGUgb2YgQ2FsaWZvcm5p YSwgZXhjbHVkaW5nIGNvbmZsaWN0IG9mCj4gbGF3IHByb3Zpc2lvbnMuICBOb3RoaW5nIGluIHRo aXMgTGljZW5zZSBBZ3JlZW1lbnQgc2hhbGwgYmUgZGVlbWVkIHRvCj4gY3JlYXRlIGFueSByZWxh dGlvbnNoaXAgb2YgYWdlbmN5LCBwYXJ0bmVyc2hpcCwgb3Igam9pbnQgdmVudHVyZQo+IGJldHdl ZW4gQmVPcGVuIGFuZCBMaWNlbnNlZS4gIFRoaXMgTGljZW5zZSBBZ3JlZW1lbnQgZG9lcyBub3Qg Z3JhbnQKPiBwZXJtaXNzaW9uIHRvIHVzZSBCZU9wZW4gdHJhZGVtYXJrcyBvciB0cmFkZSBuYW1l 
cyBpbiBhIHRyYWRlbWFyawo+IHNlbnNlIHRvIGVuZG9yc2Ugb3IgcHJvbW90ZSBwcm9kdWN0cyBv ciBzZXJ2aWNlcyBvZiBMaWNlbnNlZSwgb3IgYW55Cj4gdGhpcmQgcGFydHkuICBBcyBhbiBleGNl cHRpb24sIHRoZSAiQmVPcGVuIFB5dGhvbiIgbG9nb3MgYXZhaWxhYmxlIGF0Cj4gaHR0cDovL3d3 dy5weXRob25sYWJzLmNvbS9sb2dvcy5odG1sIG1heSBiZSB1c2VkIGFjY29yZGluZyB0byB0aGUK PiBwZXJtaXNzaW9ucyBncmFudGVkIG9uIHRoYXQgd2ViIHBhZ2UuCj4gCj4gNy4gQnkgY29weWlu ZywgaW5zdGFsbGluZyBvciBvdGhlcndpc2UgdXNpbmcgdGhlIHNvZnR3YXJlLCBMaWNlbnNlZQo+ IGFncmVlcyB0byBiZSBib3VuZCBieSB0aGUgdGVybXMgYW5kIGNvbmRpdGlvbnMgb2YgdGhpcyBM aWNlbnNlCj4gQWdyZWVtZW50Lgo+IAo+IAo+IENOUkkgT1BFTiBTT1VSQ0UgTElDRU5TRSBBR1JF RU1FTlQKPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCj4gCj4gUHl0aG9uIDEu NiBDTlJJIE9QRU4gU09VUkNFIExJQ0VOU0UgQUdSRUVNRU5UCj4gCj4gSU1QT1JUQU5UOiBQTEVB U0UgUkVBRCBUSEUgRk9MTE9XSU5HIEFHUkVFTUVOVCBDQVJFRlVMTFkuIEJZIENMSUNLSU5HCj4g T04gIkFDQ0VQVCIgV0hFUkUgSU5ESUNBVEVEIEJFTE9XLCBPUiBCWSBDT1BZSU5HLCBJTlNUQUxM SU5HIE9SCj4gT1RIRVJXSVNFIFVTSU5HIFBZVEhPTiAxLjYgU09GVFdBUkUsIFlPVSBBUkUgREVF TUVEIFRPIEhBVkUgQUdSRUVEIFRPCj4gVEhFIFRFUk1TIEFORCBDT05ESVRJT05TIE9GIFRISVMg TElDRU5TRSBBR1JFRU1FTlQuCj4gCj4gMS4gVGhpcyBMSUNFTlNFIEFHUkVFTUVOVCBpcyBiZXR3 ZWVuIHRoZSBDb3Jwb3JhdGlvbiBmb3IgTmF0aW9uYWwKPiBSZXNlYXJjaCBJbml0aWF0aXZlcywg aGF2aW5nIGFuIG9mZmljZSBhdCAxODk1IFByZXN0b24gV2hpdGUgRHJpdmUsCj4gUmVzdG9uLCBW QSAyMDE5MSAoIkNOUkkiKSwgYW5kIHRoZSBJbmRpdmlkdWFsIG9yIE9yZ2FuaXphdGlvbgo+ICgi TGljZW5zZWUiKSBhY2Nlc3NpbmcgYW5kIG90aGVyd2lzZSB1c2luZyBQeXRob24gMS42IHNvZnR3 YXJlIGluCj4gc291cmNlIG9yIGJpbmFyeSBmb3JtIGFuZCBpdHMgYXNzb2NpYXRlZCBkb2N1bWVu dGF0aW9uLCBhcyByZWxlYXNlZCBhdAo+IHRoZSB3d3cucHl0aG9uLm9yZyBJbnRlcm5ldCBzaXRl IG9uIFNlcHRlbWJlciA1LCAyMDAwICgiUHl0aG9uIDEuNiIpLgo+IAo+IDIuIFN1YmplY3QgdG8g dGhlIHRlcm1zIGFuZCBjb25kaXRpb25zIG9mIHRoaXMgTGljZW5zZSBBZ3JlZW1lbnQsIENOUkkK PiBoZXJlYnkgZ3JhbnRzIExpY2Vuc2VlIGEgbm9uZXhjbHVzaXZlLCByb3lhbHR5LWZyZWUsIHdv cmxkLXdpZGUKPiBsaWNlbnNlIHRvIHJlcHJvZHVjZSwgYW5hbHl6ZSwgdGVzdCwgcGVyZm9ybSBh 
bmQvb3IgZGlzcGxheSBwdWJsaWNseSwKPiBwcmVwYXJlIGRlcml2YXRpdmUgd29ya3MsIGRpc3Ry aWJ1dGUsIGFuZCBvdGhlcndpc2UgdXNlIFB5dGhvbiAxLjYKPiBhbG9uZSBvciBpbiBhbnkgZGVy aXZhdGl2ZSB2ZXJzaW9uLCBwcm92aWRlZCwgaG93ZXZlciwgdGhhdCBDTlJJJ3MKPiBMaWNlbnNl IEFncmVlbWVudCBhbmQgQ05SSSdzIG5vdGljZSBvZiBjb3B5cmlnaHQsIGkuZS4sICJDb3B5cmln aHQgKGMpCj4gMTk5NS0yMDAwIENvcnBvcmF0aW9uIGZvciBOYXRpb25hbCBSZXNlYXJjaCBJbml0 aWF0aXZlczsgQWxsIFJpZ2h0cwo+IFJlc2VydmVkIiBhcmUgcmV0YWluZWQgaW4gUHl0aG9uIDEu NiBhbG9uZSBvciBpbiBhbnkgZGVyaXZhdGl2ZQo+IHZlcnNpb24gcHJlcGFyZWQgYnkKPiAKPiBM aWNlbnNlZS4gQWx0ZXJuYXRlbHksIGluIGxpZXUgb2YgQ05SSSdzIExpY2Vuc2UgQWdyZWVtZW50 LCBMaWNlbnNlZQo+IG1heSBzdWJzdGl0dXRlIHRoZSBmb2xsb3dpbmcgdGV4dCAob21pdHRpbmcg dGhlIHF1b3Rlcyk6ICJQeXRob24gMS42Cj4gaXMgbWFkZSBhdmFpbGFibGUgc3ViamVjdCB0byB0 aGUgdGVybXMgYW5kIGNvbmRpdGlvbnMgaW4gQ05SSSdzCj4gTGljZW5zZSBBZ3JlZW1lbnQuIFRo aXMgQWdyZWVtZW50IHRvZ2V0aGVyIHdpdGggUHl0aG9uIDEuNiBtYXkgYmUKPiBsb2NhdGVkIG9u IHRoZSBJbnRlcm5ldCB1c2luZyB0aGUgZm9sbG93aW5nIHVuaXF1ZSwgcGVyc2lzdGVudAo+IGlk ZW50aWZpZXIgKGtub3duIGFzIGEgaGFuZGxlKTogMTg5NS4yMi8xMDEyLiBUaGlzIEFncmVlbWVu dCBtYXkgYWxzbwo+IGJlIG9idGFpbmVkIGZyb20gYSBwcm94eSBzZXJ2ZXIgb24gdGhlIEludGVy bmV0IHVzaW5nIHRoZSBmb2xsb3dpbmcKPiBVUkw6IGh0dHA6Ly9oZGwuaGFuZGxlLm5ldC8xODk1 LjIyLzEwMTIiLgo+IAo+IDMuIEluIHRoZSBldmVudCBMaWNlbnNlZSBwcmVwYXJlcyBhIGRlcml2 YXRpdmUgd29yayB0aGF0IGlzIGJhc2VkIG9uCj4gb3IgaW5jb3Jwb3JhdGVzIFB5dGhvbiAxLjYg b3IgYW55IHBhcnQgdGhlcmVvZiwgYW5kIHdhbnRzIHRvIG1ha2UgdGhlCj4gZGVyaXZhdGl2ZSB3 b3JrIGF2YWlsYWJsZSB0byBvdGhlcnMgYXMgcHJvdmlkZWQgaGVyZWluLCB0aGVuIExpY2Vuc2Vl Cj4gaGVyZWJ5IGFncmVlcyB0byBpbmNsdWRlIGluIGFueSBzdWNoIHdvcmsgYSBicmllZiBzdW1t YXJ5IG9mIHRoZQo+IGNoYW5nZXMgbWFkZSB0byBQeXRob24gMS42Lgo+IAo+IDQuIENOUkkgaXMg bWFraW5nIFB5dGhvbiAxLjYgYXZhaWxhYmxlIHRvIExpY2Vuc2VlIG9uIGFuICJBUyBJUyIKPiBi YXNpcy4gQ05SSSBNQUtFUyBOTyBSRVBSRVNFTlRBVElPTlMgT1IgV0FSUkFOVElFUywgRVhQUkVT UyBPUgo+IElNUExJRUQuIEJZIFdBWSBPRiBFWEFNUExFLCBCVVQgTk9UIExJTUlUQVRJT04sIENO 
UkkgTUFLRVMgTk8gQU5ECj4gRElTQ0xBSU1TIEFOWSBSRVBSRVNFTlRBVElPTiBPUiBXQVJSQU5U WSBPRiBNRVJDSEFOVEFCSUxJVFkgT1IgRklUTkVTUwo+IEZPUiBBTlkgUEFSVElDVUxBUiBQVVJQ T1NFIE9SIFRIQVQgVEhFIFVTRSBPRiBQWVRIT04gMS42IFdJTEwgTk9UCj4gSU5GUklOR0UgQU5Z IFRISVJEIFBBUlRZIFJJR0hUUy4KPiAKPiA1LiBDTlJJIFNIQUxMIE5PVCBCRSBMSUFCTEUgVE8g TElDRU5TRUUgT1IgQU5ZIE9USEVSIFVTRVJTIE9GIFBZVEhPTgo+IDEuNiBGT1IgQU5ZIElOQ0lE RU5UQUwsIFNQRUNJQUwsIE9SIENPTlNFUVVFTlRJQUwgREFNQUdFUyBPUiBMT1NTIEFTIEEKPiBS RVNVTFQgT0YgTU9ESUZZSU5HLCBESVNUUklCVVRJTkcsIE9SIE9USEVSV0lTRSBVU0lORyBQWVRI T04gMS42LCBPUgo+IEFOWSBERVJJVkFUSVZFIFRIRVJFT0YsIEVWRU4gSUYgQURWSVNFRCBPRiBU SEUgUE9TU0lCSUxJVFkgVEhFUkVPRi4KPiAKPiA2LiBUaGlzIExpY2Vuc2UgQWdyZWVtZW50IHdp bGwgYXV0b21hdGljYWxseSB0ZXJtaW5hdGUgdXBvbiBhIG1hdGVyaWFsCj4gYnJlYWNoIG9mIGl0 cyB0ZXJtcyBhbmQgY29uZGl0aW9ucy4KPiAKPiA3LiBUaGlzIExpY2Vuc2UgQWdyZWVtZW50IHNo YWxsIGJlIGdvdmVybmVkIGJ5IGFuZCBpbnRlcnByZXRlZCBpbiBhbGwKPiByZXNwZWN0cyBieSB0 aGUgbGF3IG9mIHRoZSBTdGF0ZSBvZiBWaXJnaW5pYSwgZXhjbHVkaW5nIGNvbmZsaWN0IG9mCj4g bGF3IHByb3Zpc2lvbnMuIE5vdGhpbmcgaW4gdGhpcyBMaWNlbnNlIEFncmVlbWVudCBzaGFsbCBi ZSBkZWVtZWQgdG8KPiBjcmVhdGUgYW55IHJlbGF0aW9uc2hpcCBvZiBhZ2VuY3ksIHBhcnRuZXJz aGlwLCBvciBqb2ludCB2ZW50dXJlCj4gYmV0d2VlbiBDTlJJIGFuZCBMaWNlbnNlZS4gVGhpcyBM aWNlbnNlIEFncmVlbWVudCBkb2VzIG5vdCBncmFudAo+IHBlcm1pc3Npb24gdG8gdXNlIENOUkkg dHJhZGVtYXJrcyBvciB0cmFkZSBuYW1lIGluIGEgdHJhZGVtYXJrIHNlbnNlCj4gdG8gZW5kb3Jz ZSBvciBwcm9tb3RlIHByb2R1Y3RzIG9yIHNlcnZpY2VzIG9mIExpY2Vuc2VlLCBvciBhbnkgdGhp cmQKPiBwYXJ0eS4KPiAKPiA4LiBCeSBjbGlja2luZyBvbiB0aGUgIkFDQ0VQVCIgYnV0dG9uIHdo ZXJlIGluZGljYXRlZCwgb3IgYnkgY29weWluZywKPiBpbnN0YWxsaW5nIG9yIG90aGVyd2lzZSB1 c2luZyBQeXRob24gMS42LCBMaWNlbnNlZSBhZ3JlZXMgdG8gYmUgYm91bmQKPiBieSB0aGUgdGVy bXMgYW5kIGNvbmRpdGlvbnMgb2YgdGhpcyBMaWNlbnNlIEFncmVlbWVudC4KPiAKPiBBQ0NFUFQK PiAKPiAKPiBDV0kgUEVSTUlTU0lPTlMgU1RBVEVNRU5UIEFORCBESVNDTEFJTUVSCj4gLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQo+IAo+IENvcHlyaWdodCAoYykgMTk5 
MSAtIDE5OTUsIFN0aWNodGluZyBNYXRoZW1hdGlzY2ggQ2VudHJ1bSBBbXN0ZXJkYW0sCj4gVGhl IE5ldGhlcmxhbmRzLiAgQWxsIHJpZ2h0cyByZXNlcnZlZC4KPiAKPiBQZXJtaXNzaW9uIHRvIHVz ZSwgY29weSwgbW9kaWZ5LCBhbmQgZGlzdHJpYnV0ZSB0aGlzIHNvZnR3YXJlIGFuZCBpdHMKPiBk b2N1bWVudGF0aW9uIGZvciBhbnkgcHVycG9zZSBhbmQgd2l0aG91dCBmZWUgaXMgaGVyZWJ5IGdy YW50ZWQsCj4gcHJvdmlkZWQgdGhhdCB0aGUgYWJvdmUgY29weXJpZ2h0IG5vdGljZSBhcHBlYXIg aW4gYWxsIGNvcGllcyBhbmQgdGhhdAo+IGJvdGggdGhhdCBjb3B5cmlnaHQgbm90aWNlIGFuZCB0 aGlzIHBlcm1pc3Npb24gbm90aWNlIGFwcGVhciBpbgo+IHN1cHBvcnRpbmcgZG9jdW1lbnRhdGlv biwgYW5kIHRoYXQgdGhlIG5hbWUgb2YgU3RpY2h0aW5nIE1hdGhlbWF0aXNjaAo+IENlbnRydW0g b3IgQ1dJIG5vdCBiZSB1c2VkIGluIGFkdmVydGlzaW5nIG9yIHB1YmxpY2l0eSBwZXJ0YWluaW5n IHRvCjI1LDM2YzE2NiwxNzIKPCBXaGlsZSBDV0kgaXMgdGhlIGluaXRpYWwgc291cmNlIGZvciB0 aGlzIHNvZnR3YXJlLCBhIG1vZGlmaWVkIHZlcnNpb24KPCBpcyBtYWRlIGF2YWlsYWJsZSBieSB0 aGUgQ29ycG9yYXRpb24gZm9yIE5hdGlvbmFsIFJlc2VhcmNoIEluaXRpYXRpdmVzCjwgKENOUkkp IGF0IHRoZSBJbnRlcm5ldCBhZGRyZXNzIGZ0cDovL2Z0cC5weXRob24ub3JnLgo8IAo8IFNUSUNI VElORyBNQVRIRU1BVElTQ0ggQ0VOVFJVTSBBTkQgQ05SSSBESVNDTEFJTSBBTEwgV0FSUkFOVElF UyBXSVRICjwgUkVHQVJEIFRPIFRISVMgU09GVFdBUkUsIElOQ0xVRElORyBBTEwgSU1QTElFRCBX QVJSQU5USUVTIE9GCjwgTUVSQ0hBTlRBQklMSVRZIEFORCBGSVRORVNTLCBJTiBOTyBFVkVOVCBT SEFMTCBTVElDSFRJTkcgTUFUSEVNQVRJU0NICjwgQ0VOVFJVTSBPUiBDTlJJIEJFIExJQUJMRSBG T1IgQU5ZIFNQRUNJQUwsIElORElSRUNUIE9SIENPTlNFUVVFTlRJQUwKPCBEQU1BR0VTIE9SIEFO WSBEQU1BR0VTIFdIQVRTT0VWRVIgUkVTVUxUSU5HIEZST00gTE9TUyBPRiBVU0UsIERBVEEgT1IK PCBQUk9GSVRTLCBXSEVUSEVSIElOIEFOIEFDVElPTiBPRiBDT05UUkFDVCwgTkVHTElHRU5DRSBP UiBPVEhFUgo8IFRPUlRJT1VTIEFDVElPTiwgQVJJU0lORyBPVVQgT0YgT1IgSU4gQ09OTkVDVElP TiBXSVRIIFRIRSBVU0UgT1IKPCBQRVJGT1JNQU5DRSBPRiBUSElTIFNPRlRXQVJFLgotLS0KPiBT VElDSFRJTkcgTUFUSEVNQVRJU0NIIENFTlRSVU0gRElTQ0xBSU1TIEFMTCBXQVJSQU5USUVTIFdJ VEggUkVHQVJEIFRPCj4gVEhJUyBTT0ZUV0FSRSwgSU5DTFVESU5HIEFMTCBJTVBMSUVEIFdBUlJB TlRJRVMgT0YgTUVSQ0hBTlRBQklMSVRZIEFORAo+IEZJVE5FU1MsIElOIE5PIEVWRU5UIFNIQUxM 
IFNUSUNIVElORyBNQVRIRU1BVElTQ0ggQ0VOVFJVTSBCRSBMSUFCTEUKPiBGT1IgQU5ZIFNQRUNJ QUwsIElORElSRUNUIE9SIENPTlNFUVVFTlRJQUwgREFNQUdFUyBPUiBBTlkgREFNQUdFUwo+IFdI QVRTT0VWRVIgUkVTVUxUSU5HIEZST00gTE9TUyBPRiBVU0UsIERBVEEgT1IgUFJPRklUUywgV0hF VEhFUiBJTiBBTgo+IEFDVElPTiBPRiBDT05UUkFDVCwgTkVHTElHRU5DRSBPUiBPVEhFUiBUT1JU SU9VUyBBQ1RJT04sIEFSSVNJTkcgT1VUCj4gT0YgT1IgSU4gQ09OTkVDVElPTiBXSVRIIFRIRSBV U0UgT1IgUEVSRk9STUFOQ0UgT0YgVEhJUyBTT0ZUV0FSRS4KNDUsNDdjMTgxCjwgc2F4bGliOgo8 IAo8IHNnbWxvcC5jCi0tLQo+IHNnbWxvcC5jOgo1Niw2MmQxODkKPCB4bWxhcmNoOgo8IC0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tCjwgQ29weXJpZ2h0IChDKSAxOTk4IGJ5IEdlaXIgTy4gR3L4bm1vLCBncm92ZUBpbmZv dGVrLm5vCjwgCjwgRnJlZSBmb3IgY29tbWVyY2lhbCBhbmQgbm9uLWNvbW1lcmNpYWwgdXNlLgo8 IC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tCjwgCjY3YTE5NSwxOTcKPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQo+IAo+IHNldHVwZXh0L2lu c3RhbGxfZGF0YS5weToKNjhhMTk5LDIxNgo+IFBlcm1pc3Npb24gaXMgaGVyZWJ5IGdyYW50ZWQs IGZyZWUgb2YgY2hhcmdlLCB0byBhbnkgcGVyc29uIG9idGFpbmluZwo+IGEgY29weSBvZiB0aGlz IHNvZnR3YXJlIGFuZCBhc3NvY2lhdGVkIGRvY3VtZW50YXRpb24gZmlsZXMgKHRoZQo+ICJTb2Z0 d2FyZSIpLCB0byBkZWFsIGluIHRoZSBTb2Z0d2FyZSB3aXRob3V0IHJlc3RyaWN0aW9uLCBpbmNs dWRpbmcKPiB3aXRob3V0IGxpbWl0YXRpb24gdGhlIHJpZ2h0cyB0byB1c2UsIGNvcHksIG1vZGlm eSwgbWVyZ2UsIHB1Ymxpc2gsCj4gZGlzdHJpYnV0ZSwgc3VibGljZW5zZSwgYW5kL29yIHNlbGwg Y29waWVzIG9mIHRoZSBTb2Z0d2FyZSwgYW5kIHRvCj4gcGVybWl0IHBlcnNvbnMgdG8gd2hvbSB0 aGUgU29mdHdhcmUgaXMgZnVybmlzaGVkIHRvIGRvIHNvLCBzdWJqZWN0IHRvCj4gdGhlIGZvbGxv d2luZyBjb25kaXRpb25zOgo+ICAKPiBUaGUgYWJvdmUgY29weXJpZ2h0IG5vdGljZSBhbmQgdGhp cyBwZXJtaXNzaW9uIG5vdGljZSBzaGFsbCBiZSBpbmNsdWRlZAo+IGluIGFsbCBjb3BpZXMgb3Ig c3Vic3RhbnRpYWwgcG9ydGlvbnMgb2YgdGhlIFNvZnR3YXJlLgo+ICAKPiBUSEUgU09GVFdBUkUg SVMgUFJPVklERUQgIkFTIElTIiwgV0lUSE9VVCBXQVJSQU5UWSBPRiBBTlkgS0lORCwKPiBFWFBS 
RVNTIE9SIElNUExJRUQsIElOQ0xVRElORyBCVVQgTk9UIExJTUlURUQgVE8gVEhFIFdBUlJBTlRJ RVMgT0YKPiBNRVJDSEFOVEFCSUxJVFksIEZJVE5FU1MgRk9SIEEgUEFSVElDVUxBUiBQVVJQT1NF IEFORCBOT05JTkZSSU5HRU1FTlQuCj4gSU4gTk8gRVZFTlQgU0hBTEwgVEhFIEFVVEhPUlMgT1Ig Q09QWVJJR0hUIEhPTERFUlMgQkUgTElBQkxFIEZPUiBBTlkKPiBDTEFJTSwgREFNQUdFUyBPUiBP VEhFUiBMSUFCSUxJVFksIFdIRVRIRVIgSU4gQU4gQUNUSU9OIE9GIENPTlRSQUNULAo+IFRPUlQg T1IgT1RIRVJXSVNFLCBBUklTSU5HIEZST00sIE9VVCBPRiBPUiBJTiBDT05ORUNUSU9OIFdJVEgg VEhFCj4gU09GVFdBUkUgT1IgVEhFIFVTRSBPUiBPVEhFUiBERUFMSU5HUyBJTiBUSEUgU09GVFdB UkUuCmRpZmYgUHlYTUwtMC42LjIvTUFOSUZFU1QgUHlYTUwtMC42LjMvTUFOSUZFU1QKMTBhMTEK PiBzZXR1cC5jZmcKNjJjNjMKPCBkZW1vL3hiZWwveGJlbC5kdGQKLS0tCj4gZGVtby94YmVsL3hi ZWwtMS4wLmR0ZAoxMDhkMTA4CjwgZXh0ZW5zaW9ucy9leHBhdC9NUEwtMV8wLmh0bWwKMTEyLDEx M2QxMTEKPCBleHRlbnNpb25zL2V4cGF0L2V4cGF0Lm1hawo8IGV4dGVuc2lvbnMvZXhwYXQvZ3Bs ZWxlY3QuaHRtbAoxMTcsMTE4ZDExNAo8IGV4dGVuc2lvbnMvZXhwYXQveG1scGFyc2UvaGFzaHRh YmxlLmMKPCBleHRlbnNpb25zL2V4cGF0L3htbHBhcnNlL2hhc2h0YWJsZS5oCjEyMWExMTgKPiBl eHRlbnNpb25zL2V4cGF0L3htbHRvay9hc2NpaS5oCjE0OWExNDcsMTQ5Cj4gc2V0dXBleHQvX19p bml0X18ucHkKPiBzZXR1cGV4dC9pbnN0YWxsX2RhdGEucHkKPiB0ZXN0L2VuY190ZXN0LnhtbAox NTVhMTU2Cj4gdGVzdC90ZXN0X2VuY29kaW5ncy5weQoyMDJhMjA0LDI2Mgo+IHRlc3QvZG9tL2h0 bWwvdGVzdC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9hLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0 X2FwcGxldC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9hcmVhLnB5Cj4gdGVzdC9kb20vaHRtbC90 ZXN0X2Jhc2UucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfYmFzZWZvbnQucHkKPiB0ZXN0L2RvbS9o dG1sL3Rlc3RfYmxvY2txdW90ZS5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9ib2R5LnB5Cj4gdGVz dC9kb20vaHRtbC90ZXN0X2JyLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2J1dHRvbi5weQo+IHRl c3QvZG9tL2h0bWwvdGVzdF9jYXB0aW9uLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2NvbC5weQo+ IHRlc3QvZG9tL2h0bWwvdGVzdF9jb2xsZWN0aW9uLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2Rp ci5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9kaXYucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfZGwu cHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfZG9jdW1lbnQucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rf 
ZWxlbWVudC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9maWVsZHNldC5weQo+IHRlc3QvZG9tL2h0 bWwvdGVzdF9mb250LnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2Zvcm0ucHkKPiB0ZXN0L2RvbS9o dG1sL3Rlc3RfZnJhbWUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfZnJhbWVzZXQucHkKPiB0ZXN0 L2RvbS9odG1sL3Rlc3RfaC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9oZWFkLnB5Cj4gdGVzdC9k b20vaHRtbC90ZXN0X2hyLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2h0bWwucHkKPiB0ZXN0L2Rv bS9odG1sL3Rlc3RfaHRtbF9kb21faW1wbGVtZW50YXRpb24ucHkKPiB0ZXN0L2RvbS9odG1sL3Rl c3RfaWZyYW1lLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2ltZy5weQo+IHRlc3QvZG9tL2h0bWwv dGVzdF9pbnB1dC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9pc2luZGV4LnB5Cj4gdGVzdC9kb20v aHRtbC90ZXN0X2xhYmVsLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2xlZ2VuZC5weQo+IHRlc3Qv ZG9tL2h0bWwvdGVzdF9saS5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9saW5rLnB5Cj4gdGVzdC9k b20vaHRtbC90ZXN0X21hcC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9tZW51LnB5Cj4gdGVzdC9k b20vaHRtbC90ZXN0X21ldGEucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfbW9kLnB5Cj4gdGVzdC9k b20vaHRtbC90ZXN0X29iamVjdC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9vbC5weQo+IHRlc3Qv ZG9tL2h0bWwvdGVzdF9vcHRncm91cC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9vcHRpb24ucHkK PiB0ZXN0L2RvbS9odG1sL3Rlc3RfcC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9wYXJhbS5weQo+ IHRlc3QvZG9tL2h0bWwvdGVzdF9wcmUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfcS5weQo+IHRl c3QvZG9tL2h0bWwvdGVzdF9zY3JpcHQucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rfc2VjdGlvbi5w eQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9zZWxlY3QucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rfc3R5 bGUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfdGFibGUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rf dGQucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfdGV4dGFyZWEucHkKPiB0ZXN0L2RvbS9odG1sL3Rl c3RfdGl0bGUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfdHIucHkKPiB0ZXN0L2RvbS9odG1sL3Rl c3RfdWwucHkKPiB0ZXN0L2RvbS9odG1sL3V0aWwucHkKMjA0YTI2NQo+IHRlc3Qvb3V0cHV0L3Rl c3RfZW5jb2RpbmdzCjIzM2EyOTUKPiB4bWwvZG9tL0Z0Tm9kZS5weQoyMzVkMjk2CjwgeG1sL2Rv bS9Ob2RlLnB5CjQyNmE0ODgsNDkwCj4geG1sL3VuaWNvZGUvX19pbml0X18ucHkKPiB4bWwvdW5p Y29kZS9pc284ODU5LnB5Cj4geG1sL3VuaWNvZGUvdXRmOF9pc28ucHkKZGlmZiBQeVhNTC0wLjYu 
Mi9NQU5JRkVTVC5pbiBQeVhNTC0wLjYuMy9NQU5JRkVTVC5pbgo2MmE2Myw2NAo+IAo+IGluY2x1 ZGUgc2V0dXBleHQvKi5weQpkaWZmIFB5WE1MLTAuNi4yL1JFQURNRSBQeVhNTC0wLjYuMy9SRUFE TUUKMzVhMzYKPiAJbWluaWRvbQkJCVBhdWwgUHJlc2NvZApkaWZmIFB5WE1MLTAuNi4yL1JFQURN RS5weWV4cGF0IFB5WE1MLTAuNi4zL1JFQURNRS5weWV4cGF0CjEsMmMxLDIKPCBQeXRob24gRXhw YXQgd3JhcHBlciBtb2R1bGUsIHZlcnNpb24gb2YgMTktTWF5LTk4CjwgPT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0KPiBQeXRob24gRXhwYXQgd3Jh cHBlciBtb2R1bGUKPiA9PT09PT09PT09PT09PT09PT09PT09PT09PT0KNCwxMGM0CjwgSWYgeW91 IGhhdmUgZG93bmxvYWRlZCB0aGUgYmluYXJ5IGRpc3RyaWJ1dGlvbiBmb3IgdGhlIG1hY2ludG9z aCB5b3UKPCBjYW4gc2tpcCB0aGUgImJ1aWxkaW5nIiBzZWN0aW9ucyBhbmQgZ28gc3RyYWlnaHQg dG8gdGhlICJ1c2luZyIKPCBiaXQuIElmIHlvdSBhcmUgdXNpbmcgYSBtYWNpbnRvc2ggYW5kIGRv IHdhbnQgdG8gYnVpbGQgZnJvbSBzb3VyY2UgeW91IAo8IHNob3VsZCBnZXQgdGhlIHB5ZXhwYXQu dGd6IGRpc3RyaWJ1dGlvbiAoU3R1ZmZpdCBFeHBhbmRlciB3aXRoCjwgRXhwYW5kZXIgRW5oYW5j ZXIgd2lsbCBrbm93IGhvdyB0byB1bnBhY2sgYSBnemlwcGVkIHRhciBmaWxlKS4KPCAJCjwgQnVp bGRpbmcgdGhlIHB5ZXhwYXQgbW9kdWxlIHVuZGVyIHVuaXgKLS0tCj4gQnVpbGRpbmcgdGhlIHB5 ZXhwYXQgbW9kdWxlCjEzLDMwYzcKPCAtIEJ1aWxkIGxpYmV4cGF0LmEgaW4gZXhwYXQuIFRoaXMg dmVyc2lvbiBpcyB2ZXJ5IHNsaWdodGx5IGRpZmZlcmVudCAKPCAgIGZyb20gdGhlIG9yaWdpbmFs IGJ5IEphbWVzIENsYXJrICh0aGUgbGliZXhwYXQuYSB0YXJnZXQgd2FzIGFkZGVkLAo8ICAgYW5k IGEgZmV3IEMrKyBjb21tZW50cyB3ZXJlIHJlcGxhY2VkIGJ5IEMgY29tbWVudHMpLgo8IC0gRWRp dCBNYWtlZmlsZS5wcmUuaW4gYW5kIHNldCB5b3VyIGluc3RhbGxkaXIKPCAtIG1ha2UgLWYgTWFr ZWZpbGUucHJlLmluIFZFUlNJT049MS41LjEgTWFrZWZpbGUKPCAtIG1ha2Ugc2hhcmVkbW9kcwo8 IC0gcHV0IHRoZSBzaGFyZWQgbW9kdWxlIHNvbWV3aGVyZSBpbiB5b3VyIHN5cy5wYXRoCjwgCjwg KGlmIHlvdSB3YW50IGEgc3RhdGljIFB5dGhvbiBlZGl0IFNldHVwLmluLCBhbmQgcmVwbGFjZSB0 aGUgbGFzdCBsaW5lCjwgd2l0aCAibWFrZSIpLgo8IAo8IEJ1aWxkaW5nIHRoZSBweWV4cGF0IG1v ZHVsZSBvbiB0aGUgbWFjaW50b3NoCjwgLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0KPCAtIFVucGFjayB0aGUgdmFyaW91cyAuaHF4IHByb2plY3QgZmlsZXMuCjwg 
LSBBbGwgdGhlIHByb2plY3RzIGFyZSBsaW5rZWQsIHNvIGJ1aWxkaW5nIHB5ZXhwYXQucHJqIHNo b3VsZCBidWlsZAo8ICAgZXZlcnl0aGluZy4gSWYgdGhpcyBkb2Vzbid0IHdvcmsgeW91IHdpbGwg ZmluZCB0aGUgbGlicmFyeQo8ICAgc3VicHJvamVjdHMgdG8gYnVpbGQgaW4gdGhlIGV4cGF0IGZv bGRlci4KPCAtIFVzZSBFZGl0UHl0aG9uUHJlZnMgdG8gYWRkIHRoZSBjdXJyZW50IGZvbGRlciB0 byBzeXMucGF0aC4KLS0tCj4gVGhlIG1vZHVsZSBpcyBidWlsdCBhcyBwYXJ0IG9mIHJ1bm5pbmcg c2V0dXAucHkKNDMsNDVjMjAsMjEKPCAJdGhpcyBpcyB0aGUgbGFzdCBiaXQgb2YgZGF0YS4gUmV0 dXJucyB0cnVlIGlmIHBhcnNpbmcKPCAJc3VjY2VlZGVkIChzbyBmYXIpLCBvdGhlcndpc2UgdGhl IGVycm9yIGF0dHJpYnV0ZXMgaGF2ZQo8IAlpbmZvcm1hdGlvbiBvbiB0aGUgZXJyb3IuCi0tLQo+ IAl0aGlzIGlzIHRoZSBsYXN0IGJpdCBvZiBkYXRhLiBSYWlzZXMgYW4gZXhjZXB0aW9uIGluIGNh c2Ugb2YKPiAJYW4gZXJyb3IsIHRoZSBlcnJvciBhdHRyaWJ1dGVzIGhhdmUgaW5mb3JtYXRpb24g b24gdGhlIGVycm9yLgo2Nyw2OWM0Mwo8IFRoaXMgbW9kdWxlIGN1cnJlbnRseSBsaXZlcyBhdAo8 IGZ0cDovL2Z0cC5jd2kubmwvcHViL2phY2svcHl0aG9uL3B5ZXhwYXRzcmMudGd6IChzb3VyY2Up IGFuZAo8IGZ0cDovL2Z0cC5jd2kubmwvcHViL2phY2svcHl0aG9uL3B5ZXhwYXQuaHF4IChtYWNp bnRvc2ggYmluYXJ5LW9ubHkpLgotLS0KPiBQbGVhc2UgcmVwb3J0IHByb2JsZW1zIHRvIHhtbC1z aWdAcHl0aG9uLm9yZy4KZGlmZiBQeVhNTC0wLjYuMi9UT0RPIFB5WE1MLTAuNi4zL1RPRE8KMiw3 ZDEKPCAgICAgICAgICogSW50ZWdyYXRlIHdpZGVzdHJpbmcgc3VwcG9ydCB3aXRoIHRoZSBQeUV4 cGF0IG1vZHVsZSAobWFqb3IgdGhpbmcpCjwgCSogU3dpdGNoIHRvIDRET00ncyBET00gaW1wbGVt ZW50YXRpb24gCjwgCSogQWRkIFNBWDIgc3VwcG9ydAo8IAkqIERyb3Agd3N0cm9wICYgVW5pY29k ZTsgUHl0aG9uIDEuNiB3aWxsIGhhbmRsZSB0aGlzCjwgCSogU3BlZWQgdXAgdGhlIGJ1aWxkZXIg Y2xhc3Mgc29tZWhvdywgYW5kIGRvIHNvbWUgcGVyZm9ybWFuY2UgdGVzdHMKPCAJKiBDaGFuZ2Ug SFRNTEJ1aWxkZXIgdG8gdXNlIFNBWCBpbnN0ZWFkIG9mIFNHTUxsaWIKMTBkMwo8IAkqIEFkZCBS RUFETUVzIHRvIGV4aXN0aW5nIGRlbW8gcHJvZ3JhbXMKMTNkNQo8IAkqIHNheGxpYi5BdHRyaWJ1 dGVMaXN0IHNob3VsZCByZWFsbHkgc3VwcG9ydCBhbGwgZGljdGlvbmFyeSBiZWhhdmlvdXIKMTUs MTZkNgo8IAkqIFVwZGF0ZSB0aGUgV2luZG93cyBETExzIGFuZCBpbnRlZ3JhdGUgQ2hyaXN0aWFu IFRpc21lcidzIFdJU0UgCjwgCSAgaW5zdGFsbGVyCjIzLDI5ZDEyCjwgCSogQ29udmVydCBhbGwg 
dGhlIHJhaXNlIHN0YXRlbWVudCB0byB1c2UgdGhlIGV4Y2VwdGlvbihhcmcpIGZvcm0KPCAKPCAJ KiBJbXBsZW1lbnQgcmVhZGluZyBvZiBTR01MIGRvY3VtZW50cyBmb3IgRmlsZVJlYWRlcgo8IAo8 IAkqIERvY3VtZW50VHlwZSBjbGFzcyBpcyBtb3N0bHkgdW5maW5pc2hlZDsgd2hhdCBzaG91bGQg dGhlCjwgaW50ZXJmYWNlIGZvciBjcmVhdGluZyB0aGVtIGxvb2sgbGlrZT8KPCAKMzIsNDFkMTQK PCAKPCAJKiBXYWxrZXI6IG1lcmdlIHdhbGsoKSBhbmQgd2FsazEoKSBpbnRvIG9uZSBmdW5jdGlv biAob3IgYXQKPCBsZWFzdCBtYWtlIGl0IG1vcmUgZ2VuZXJpYykKPCAKPCAJKiBYbWxMaW5lYXJp c2VyOiB3aGF0IHNob3VsZCBpdCBkbyB3aXRoIFBJcyBhbmQgb3RoZXIgc2ltaWxhciB0aGluZ3M/ CjwgCjwgCSogTm9kZUxpc3QgcmV0dXJuZWQgZnJvbSAuZ2V0RWxlbWVudHNCeVRhZ05hbWUgc2hv dWxkIGJlIGxpdmUuCjwgKEhhcmQsIGFuZCBkb2Vzbid0IHNlZW0gdG8gYmUgdmVyeSB1c2VmdWw7 IEFNSyBkb2Vzbid0IHJlYWxseSBjYXJlLikKPCAKPCAJKiBET00gTGV2ZWwgMiBjaGFuZ2VzCkNv bW1vbiBzdWJkaXJlY3RvcmllczogUHlYTUwtMC42LjIvV2lzZSBhbmQgUHlYTUwtMC42LjMvV2lz ZQpDb21tb24gc3ViZGlyZWN0b3JpZXM6IFB5WE1MLTAuNi4yL2J1aWxkIGFuZCBQeVhNTC0wLjYu My9idWlsZApDb21tb24gc3ViZGlyZWN0b3JpZXM6IFB5WE1MLTAuNi4yL2RlbW8gYW5kIFB5WE1M LTAuNi4zL2RlbW8KQ29tbW9uIHN1YmRpcmVjdG9yaWVzOiBQeVhNTC0wLjYuMi9kb2MgYW5kIFB5 WE1MLTAuNi4zL2RvYwpDb21tb24gc3ViZGlyZWN0b3JpZXM6IFB5WE1MLTAuNi4yL2V4dGVuc2lv bnMgYW5kIFB5WE1MLTAuNi4zL2V4dGVuc2lvbnMKQ29tbW9uIHN1YmRpcmVjdG9yaWVzOiBQeVhN TC0wLjYuMi9tYWMgYW5kIFB5WE1MLTAuNi4zL21hYwpPbmx5IGluIFB5WE1MLTAuNi4zOiBzZXR1 cC5jZmcKZGlmZiBQeVhNTC0wLjYuMi9zZXR1cC5weSBQeVhNTC0wLjYuMy9zZXR1cC5weQo2YzYK PCBpbXBvcnQgc3lzLCBvcwotLS0KPiBpbXBvcnQgc3lzLCBvcywgc3RyaW5nCjhhOQo+IGZyb20g c2V0dXBleHQgaW1wb3J0IERhdGFfRmlsZXMsIGluc3RhbGxfRGF0YV9GaWxlcwo0NmE0OCw1MQo+ ICAgICBpZiAncHlleHBhdCcgaW4gc3lzLmJ1aWx0aW5fbW9kdWxlX25hbWVzOgo+ICAgICAgICAg cHJpbnQgIkVycm9yOiBidWlsdGluIGV4cGF0IGxpYnJhcnkgd2lsbCBjb25mbGljdCB3aXRoIG91 cnMiCj4gICAgICAgICBwcmludCAiUmUtYnVpbGQgcHl0aG9uIHdpdGhvdXQgYnVpbHRpbiBleHBh dCBtb2R1bGUiCj4gICAgICAgICByYWlzZSBTeXN0ZW1FeGl0CjUzYzU4LDYwCjwgICAgICAgICAg ICAgICAgICAgZGVmaW5lX21hY3JvcyA9IFsoJ1hNTF9OUycsIE5vbmUpXSwKLS0tCj4gICAgICAg 
ICAgICAgICAgICAgZGVmaW5lX21hY3JvcyA9IFsoJ1hNTF9OUycsIE5vbmUpLAo+ICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgKCdYTUxfRFREJyxOb25lKSwKPiAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICgnRVhQQVRfVkVSU0lPTicsJzB4MDEwMjAwJyldLAo2 M2M3MCw3MQo8ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICdleHRlbnNpb25zL2V4cGF0 L3htbHBhcnNlL2hhc2h0YWJsZS5jJywKLS0tCj4gICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgIyBHb25lIGluIDEuMgo+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICMnZXh0ZW5z aW9ucy9leHBhdC94bWxwYXJzZS9oYXNodGFibGUuYycsCjcxYzc5LDEzMwo8ICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAKLS0tCj4gCj4gCj4gIyBPbiBXaW5kb3dzLCBpbnN0YWxs IHRoZSBkb2N1bWVudGF0aW9uIGludG8gYSBkaXJlY3RvcnkgeG1sZG9jLCBhbG9uZwo+ICMgd2l0 aCB4bWwvX3htbHBsdXMuIEZvciBSUE1zLCBkb2NzIGFyZSBpbnN0YWxsZWQgaW50byB0aGUgUlBN IGRvYwo+ICMgZGlyZWN0b3J5IHZpYSBzZXR1cC5jZmcgKHVzdWFsbCAvdXNyL2RvYykuIE9uIGFs bCBvdGhlciBzeXN0ZW1zLCB0aGUKPiAjIGRvY3VtZW50YXRpb24gaXMgbm90IGluc3RhbGxlZC4K PiAKPiBkb2MyeG1sZG9jID0gMAo+IGlmIHN5cy5wbGF0Zm9ybSA9PSAnd2luMzInOgo+ICAgICBk b2MyeG1sZG9jID0gMQo+IAo+ICMgVGhpcyBpcyBhIGZyYWdtZW50IGZyb20gTUFOSUZFU1QuaW4g d2hpY2ggc2hvdWxkIGNvbnRhaW4gYWxsCj4gIyBmaWxlcyB3aGljaCBhcmUgY29uc2lkZXJlZCBk b2N1bWVudGF0aW9uIChkb2MsIGRlbW8sIHRlc3QsIHBsdXMgc29tZQo+ICMgdG9wbGV2ZWwgZmls ZXMpCj4gZG9jZmlsZXM9IiIiCj4gcmVjdXJzaXZlLWluY2x1ZGUgZG9jICouaHRtbCAKPiByZWN1 cnNpdmUtaW5jbHVkZSBkb2MgKi50ZXggCj4gcmVjdXJzaXZlLWluY2x1ZGUgZG9jICoudHh0IAo+ IHJlY3Vyc2l2ZS1pbmNsdWRlIGRvYyAqLmdpZiAKPiByZWN1cnNpdmUtaW5jbHVkZSBkb2MgKi5j c3MKPiByZWN1cnNpdmUtaW5jbHVkZSBkb2MgKi5hcGkKPiByZWN1cnNpdmUtaW5jbHVkZSBkb2Mg Ki53ZWIKPiAKPiByZWN1cnNpdmUtaW5jbHVkZSBkZW1vIFJFQURNRSAKPiByZWN1cnNpdmUtaW5j bHVkZSBkZW1vICoucHkgCj4gcmVjdXJzaXZlLWluY2x1ZGUgZGVtbyAqLnhtbAo+IHJlY3Vyc2l2 ZS1pbmNsdWRlIGRlbW8gKi5kdGQKPiByZWN1cnNpdmUtaW5jbHVkZSBkZW1vICouaHRtbAo+IHJl Y3Vyc2l2ZS1pbmNsdWRlIGRlbW8gKi5odG0KPiBpbmNsdWRlIGRlbW8vZ2VueG1sL2RhdGEudHh0 Cj4gaW5jbHVkZSBkZW1vL2RvbS9odG1sMmh0bWwKPiBpbmNsdWRlIGRlbW8veGJlbC9kb2MveGJl 
bC5iaWIKPiBpbmNsdWRlIGRlbW8veGJlbC9kb2MveGJlbC50ZXgKPiBpbmNsdWRlIGRlbW8veG1s cHJvYy9jYXRhbG9nLnNvYwo+IAo+IHJlY3Vyc2l2ZS1pbmNsdWRlIHRlc3QgKi5weSAKPiByZWN1 cnNpdmUtaW5jbHVkZSB0ZXN0ICoueG1sCj4gaW5jbHVkZSB0ZXN0L3Rlc3QueG1sLm91dAo+IHJl Y3Vyc2l2ZS1pbmNsdWRlIHRlc3Qvb3V0cHV0IHRlc3RfKgo+IAo+IGluY2x1ZGUgQU5OT1VOQ0Ug Cj4gaW5jbHVkZSBDUkVESVRTIAo+IGluY2x1ZGUgTElDRU5DRSAKPiBpbmNsdWRlIFJFQURNRSog Cj4gaW5jbHVkZSBUT0RPIAo+ICIiIgo+IAo+IGlmIGRvYzJ4bWxkb2M6Cj4gICAgIHhtbGRvY2Zp bGVzID0gWwo+ICAgICAgICAgRGF0YV9GaWxlcyhjb3B5X3RvID0gJ3htbGRvYycsCj4gICAgICAg ICAgICAgICAgICAgIHRlbXBsYXRlID0gc3RyaW5nLnNwbGl0KGRvY2ZpbGVzLCJcbiIpLAo+ICAg ICAgICAgICAgICAgICAgICBwcmVzZXJ2ZV9wYXRoID0gMSkKPiAgICAgICAgIF0KPiBlbHNlOgo+ ICAgICB4bWxkb2NmaWxlcyA9IFtdCjc0YzEzNgo8ICAgICAgICB2ZXJzaW9uID0gIjAuNi4yIiwg IyBOZWVkcyB0byBtYXRjaCB4bWwvX19pbml0X18udmVyc2lvbl9pbmZvCi0tLQo+ICAgICAgICB2 ZXJzaW9uID0gIjAuNi4zIiwgIyBOZWVkcyB0byBtYXRjaCB4bWwvX19pbml0X18udmVyc2lvbl9p bmZvCjgyYTE0NSwxNDcKPiAKPiAgICAgICAgIyBPdmVycmlkZSBjZXJ0YWluIGNvbW1hbmQgY2xh c3NlcyB3aXRoIG91ciBvd24gb25lcwo+ICAgICAgICBjbWRjbGFzcyA9IHsnaW5zdGFsbF9kYXRh JzppbnN0YWxsX0RhdGFfRmlsZXN9LCAKODRhMTUwLDE1MQo+IAo+ICAgICAgICBkYXRhX2ZpbGVz ID0geG1sZG9jZmlsZXMsCjg5YzE1Ngo8ICAgICAgICAgICAgICAgICAgICB4bWwoJy5tYXJzaGFs JyksCi0tLQo+ICAgICAgICAgICAgICAgICAgICB4bWwoJy5tYXJzaGFsJyksIHhtbCgnLnVuaWNv ZGUnKSwKT25seSBpbiBQeVhNTC0wLjYuMzogc2V0dXBleHQKQ29tbW9uIHN1YmRpcmVjdG9yaWVz OiBQeVhNTC0wLjYuMi90ZXN0IGFuZCBQeVhNTC0wLjYuMy90ZXN0Ck9ubHkgaW4gUHlYTUwtMC42 LjI6IHZzX2FjY2VwdExpdmVGaWxlSW5mbz9mZGF0ZT0wMDEyMjYmZm5hbWU9dzAwMTIyNl8wMDAw JmZ0aW1lPTAwMDAmZnR5cGU9dyZmdmVyc2lvbj0wMyZmc2l6ZT00MjA2OQpDb21tb24gc3ViZGly ZWN0b3JpZXM6IFB5WE1MLTAuNi4yL3dpbmRvd3MgYW5kIFB5WE1MLTAuNi4zL3dpbmRvd3MKQ29t bW9uIHN1YmRpcmVjdG9yaWVzOiBQeVhNTC0wLjYuMi94bWwgYW5kIFB5WE1MLTAuNi4zL3htbAo= --Boundary-=_oQHnWnkUEwHsqmGbbuqCLJJiVswM-- From Olivier Deckmyn" Hi all, Looks like parser modifies my content :( I have the following "xml" string : """ GB-OTAN-santé 20010110T105314Z AFP La polémique loin d'être apaisée par 
l'annonce de tests à Londres LONDRES """ One can notice that there are accents chars (iso-8859-1) inside or tags ; with a well defined encoding value in header... If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and nodes[0].firstChild.nodeValue) ; the tag content becomes : """ La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests \303\240 Londres """ Looks like there has been a unicode (utf-8 ?) conversion ... What can I do, not to have this conversion made ? I don't want the parser to modify my content !!!! Thanx for your support... I've tried with py-xml 0.5.1 and 0.6.2 I use python 1.5.2 under FreeBSD 4.2 My imports (might help ?): from xml import dom from xml.dom.ext.reader import Sax2 from xml.dom import ext from xml.dom.Node import Node Thanx again, Olivier. --- We are Micro$oft. You will be assimilated. Resistance is futile. From matt@virtualspectator.com Wed Jan 10 11:29:38 2001 From: matt@virtualspectator.com (matt) Date: Thu, 11 Jan 2001 00:29:38 +1300 Subject: [XML-SIG] [URGENT] Problem with accent char In-Reply-To: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K> References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K> Message-ID: <0101110034181B.00856@localhost.localdomain> Have a look through the mailing list ... I asked a whol lot of these question earlier ... anyway, comments below : On Thu, 11 Jan 2001, Olivier Deckmyn wrote: > Hi all, > > Looks like parser modifies my content :( > good .. it should ... see later > I have the following "xml" string : > """ > > > > GB-OTAN-santé > 20010110T105314Z > AFP > > > La polémique loin d'être apaisée par l'annonce de tests à > Londres > LONDRES > > > """ > > One can notice that there are accents chars (iso-8859-1) inside or > tags ; with a well defined encoding value in header... 
>
> If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
> nodes[0].firstChild.nodeValue) ; the tag content becomes :
> """
> La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
> \303\240 Londres
> """
>
> Looks like there has been a unicode (utf-8 ?) conversion ...
>

Yes, that is correct, as specified. All xml parsers should recognise the
encoding set and CONVERT it to unicode ... UTF-8 being the common flavour.

> What can I do, not to have this conversion made ? I don't want the parser to
> modify my content !!!!

It's ok, you can get it back out nicely .... try the following little
function I use :

import cStringIO
from xml.dom import ext

def retPrettyPrint(doc):
    # Serialize the DOM tree into an in-memory buffer, asking the
    # printer to write ISO-8859-1 instead of the default UTF-8.
    t = cStringIO.StringIO()
    ext.PrettyPrint(doc, t, encoding='ISO-8859-1')
    return t.getvalue()

regards
Matt

>
> Thanx for your support...
>
> I've tried with py-xml 0.5.1 and 0.6.2
>
> I use python 1.5.2 under FreeBSD 4.2
>
> My imports (might help ?):
> from xml import dom
> from xml.dom.ext.reader import Sax2
> from xml.dom import ext
> from xml.dom.Node import Node
>
> Thanx again,
>
> Olivier.
>
> ---
> We are Micro$oft. You will be assimilated. Resistance is futile.
>
>
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Matt Halstead (PhD)
Research and development
VirtualSpectator
http://www.virtualspectator.com
ph 64-9-9136896

From larsga@garshol.priv.no Wed Jan 10 13:31:50 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 10 Jan 2001 14:31:50 +0100
Subject: [XML-SIG] [URGENT] Problem with accent char
In-Reply-To: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>
References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>
Message-ID:

* Olivier Deckmyn
|
| One can notice that there are accents chars (iso-8859-1) inside
| or tags ; with a well defined encoding value in
| header...
| | If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and | nodes[0].firstChild.nodeValue) ; the tag content becomes : | """ | La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests | \303\240 Londres | """ | | Looks like there has been a unicode (utf-8 ?) conversion ... That is correct. | What can I do, not to have this conversion made ? I don't want the | parser to modify my content !!!! You can use xmlproc, you can convert back to latin1 yourself, or you can use Python 2.0, where you'd get Unicode strings. IMHO this is perfectly reasonable behaviour on the part of pyexpat. --Lars M. From uche.ogbuji@fourthought.com Wed Jan 10 20:23:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 10 Jan 2001 13:23:47 -0700 Subject: [4suite] Re: [XML-SIG] [URGENT] Problem with accent char References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K> Message-ID: <3A5CC4D3.C933C9AD@fourthought.com> Lars Marius Garshol wrote: > | What can I do, not to have this conversion made ? I don't want the > | parser to modify my content !!!! > > You can use xmlproc, you can convert back to latin1 yourself, or you > can use Python 2.0, where you'd get Unicode strings. Bah. Just to illustrate I prepped the following: ----------------------------------%------------------------------------ from xml.dom.ext.reader import Sax2 from xml.sax.sax2exts import make_parser p = make_parser("xml.sax.drivers2.drv_xmlproc") reader = Sax2.Reader(parser=p) src = """ GB-OTAN-santé 20010110T105314Z AFP La polémique loin d'être apaisée par l'annonce de tests à Londres LONDRES """ doc = reader.fromString(src) nodes = doc.getElementsByTagName('HeadLine') print repr(nodes[0].firstChild.nodeValue) ----------------------------------%------------------------------------ But on the fromString I get >>> doc = reader.fromString(src) Traceback (most recent call last): File "", line 1, in ? 
File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString rt = self.fromStream(stream, ownerDoc) File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 270, in fromStream self.parser.parse(stream) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py", line 88, in parse parser.parse_resource(source.getSystemId()) # FIXME: rest! AttributeError: getSystemId Looks as if drv_xmlproc is broken for Sax2. However, Oliver should be OK since the following works. ----------------------------------%------------------------------------ from xml.dom.ext.reader import Sax from xml.sax.saxexts import make_parser p = make_parser("xml.sax.drivers.drv_xmlproc") reader = Sax.Reader(parser=p) src = """ GB-OTAN-santé 20010110T105314Z AFP La polémique loin d'être apaisée par l'annonce de tests à Londres LONDRES """ doc = reader.fromString(src) nodes = doc.getElementsByTagName('HeadLine') print repr(nodes[0].firstChild.nodeValue) ----------------------------------%------------------------------------ I get >>> print repr(nodes[0].firstChild.nodeValue) "La pol\351mique loin d'\352tre apais\351e par l'annonce de tests \340\012Londres" Which is what I think Oliver wants. Lars, is the Sax2 problem something you've fixed in your CVS tree? Any chance of a quick fix? (I know you're still swamped). Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Wed Jan 10 21:18:20 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Wed, 10 Jan 2001 22:18:20 +0100 Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again In-Reply-To: <01011021320810.00856@localhost.localdomain> (message from matt on Wed, 10 Jan 2001 21:15:09 +1300) References: <0101101829390Y.00856@localhost.localdomain> <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de> <01011021320810.00856@localhost.localdomain> Message-ID: <200101102118.f0ALIKA01226@mira.informatik.hu-berlin.de> > > Would you like to look into correcting that? > > > > Hmm, means upgrading to 2.0, which perhaps I should do. Ok, I now had a look at it myself; please try the patch attached. It generates Unicode objects in Python 2, UTF-8 in 1.5. > The problem is that I use 4dom in some quite heavy zope products, > and I am unconvinced that python 2.0 and Zope are stable enough for > production environments, and too different to have split between > production and development. I understand the Zope problems are not resolved, yet, so not upgrading seems still the right thing to do. > The other part though is making 4Dom pickleable, which was actually > my next little project, to look at it some more and see where it is > not pickleable. Could be simple, someone may already have the > answer. I don't know what the state of this is; if you think you can contribute, just go ahead. > Having a closer inspection of PyXML 0.6.3, the original memory leak > from the parser doing it's parsing thing has gone, but there is one > that exists for just purely making a parser. Can you provide sample code showing the problem? Perhaps I'm not seeing it because the Python 2 garbage collector collects the cycles. Also, did you call xml.dom.ext.ReleaseNode? The DOM is full of cycles; without a cyclic gc, the only way to get rid of them is to explicitly release them. > > > I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur. > > > > I'm confused. Where did you get PyXML 1.2 from? 
> > > > Someone said go get PyXML 1.3 on the 5th January from sourcefourge and I only > found PyXML 1.2 ..... which has now changed to 1.3 ... and there are > differences .. I have attached diff PyXML-0.6.2 PyXML-0.6.3 so you can see. Well, I know well what PyXML 0.6.3 is. I'm just curious as to why you are calling it 1.3... Regards, Martin From odeckmyn.list@teaser.fr Thu Jan 11 07:46:16 2001 From: odeckmyn.list@teaser.fr (Olivier Deckmyn) Date: Thu, 11 Jan 2001 08:46:16 +0100 Subject: [4suite] Re: [XML-SIG] [URGENT] Problem with accent char References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K> <3A5CC4D3.C933C9AD@fourthought.com> Message-ID: <003701c07ba2$93d5e1c0$0d00000a@ODECKMYN2K> Hi !! Thanx you all for your support ! I solved using 4T UTF8String class provided in Unicode package, found in xml.dom.ext .... Thanx 4T ;) ----- Original Message ----- From: "Uche Ogbuji" To: "Olivier Deckmyn" Cc: ; "'4suite@lists.fourthought.com'" <4suite@dollar.fourthought.com> Sent: Wednesday, January 10, 2001 9:23 PM Subject: Re: [4suite] Re: [XML-SIG] [URGENT] Problem with accent char > Lars Marius Garshol wrote: > > > | What can I do, not to have this conversion made ? I don't want the > > | parser to modify my content !!!! > > > > You can use xmlproc, you can convert back to latin1 yourself, or you > > can use Python 2.0, where you'd get Unicode strings. > > Bah. 
Just to illustrate I prepped the following: > > ----------------------------------%------------------------------------ > > from xml.dom.ext.reader import Sax2 > from xml.sax.sax2exts import make_parser > p = make_parser("xml.sax.drivers2.drv_xmlproc") > reader = Sax2.Reader(parser=p) > > src = """ > > > GB-OTAN-santé > 20010110T105314Z > AFP > > > La polémique loin d'être apaisée par l'annonce de tests à > Londres > LONDRES > > > """ > > doc = reader.fromString(src) > nodes = doc.getElementsByTagName('HeadLine') > print repr(nodes[0].firstChild.nodeValue) > > ----------------------------------%------------------------------------ > > But on the fromString I get > > >>> doc = reader.fromString(src) > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py", > line 49, in fromString > rt = self.fromStream(stream, ownerDoc) > File > "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", > line 270, in fromStream > self.parser.parse(stream) > File > "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py ", > line 88, in parse > parser.parse_resource(source.getSystemId()) # FIXME: rest! > AttributeError: getSystemId > > > Looks as if drv_xmlproc is broken for Sax2. > > However, Oliver should be OK since the following works. 
> > ----------------------------------%------------------------------------ > > from xml.dom.ext.reader import Sax > from xml.sax.saxexts import make_parser > p = make_parser("xml.sax.drivers.drv_xmlproc") > reader = Sax.Reader(parser=p) > > src = """ > > > GB-OTAN-santé > 20010110T105314Z > AFP > > > La polémique loin d'être apaisée par l'annonce de tests à > Londres > LONDRES > > > """ > > doc = reader.fromString(src) > nodes = doc.getElementsByTagName('HeadLine') > print repr(nodes[0].firstChild.nodeValue) > ----------------------------------%------------------------------------ > > I get > > >>> print repr(nodes[0].firstChild.nodeValue) > "La pol\351mique loin d'\352tre apais\351e par l'annonce de tests > \340\012Londres" > > Which is what I think Oliver wants. > > Lars, is the Sax2 problem something you've fixed in your CVS tree? Any > chance of a quick fix? (I know you're still swamped). > > Thanks. > > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig From martin@loewis.home.cs.tu-berlin.de Thu Jan 11 11:29:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Thu, 11 Jan 2001 12:29:59 +0100 Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again In-Reply-To: <01011111124501.00909@localhost.localdomain> (message from matt on Thu, 11 Jan 2001 10:59:52 +1300) References: <01011021320810.00856@localhost.localdomain> <200101102118.f0ALIKA01226@mira.informatik.hu-berlin.de> <01011111124501.00909@localhost.localdomain> Message-ID: <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de> > > > Having a closer inspection of PyXML 0.6.3, the original memory leak > > > from the parser doing it's parsing thing has gone, but there is one > > > that exists for just purely making a parser. I found the problem: While I updated the SAX2 driver, I had not changed the SAX1 driver. With the patch below, I don't get any memory leak for your example. There where two problems: For one, drv_pyexpat did not use our pyexpat module but the Python one if available, and it would not attempt to break cycles at the end of parsing. Regards, Martin Index: drv_pyexpat.py =================================================================== RCS file: /cvsroot/pyxml/xml/xml/sax/drivers/drv_pyexpat.py,v retrieving revision 1.11 diff -u -r1.11 drv_pyexpat.py --- drv_pyexpat.py 2000/10/05 19:32:52 1.11 +++ drv_pyexpat.py 2001/01/11 11:25:28 @@ -14,10 +14,9 @@ from xml.sax import saxlib,saxutils try: - import pyexpat + from xml.parsers import expat except ImportError: - # pyexpat not built in core installation, use our own - from xml.parsers import pyexpat + raise SAXReaderNotAvailable("expat not supported",None) import urllib,types @@ -57,7 +56,7 @@ def parse(self,sysID): self.parseFile(urllib.urlopen(sysID),sysID) - + def parseFile(self,fileobj,sysID=None): self.reset() self.sysID=sysID @@ -71,6 +70,7 @@ self.parser.Parse("", 1) self.doc_handler.endDocument() + self.close() # --- Locator methods. Only usable after errors. 
@@ -90,7 +90,7 @@ def __report_error(self): errc=self.parser.ErrorCode - msg=pyexpat.ErrorString(errc) + msg=expat.ErrorString(errc) exc=saxlib.SAXParseException(msg,None,self) self.err_handler.fatalError(exc) @@ -113,7 +113,7 @@ def reset(self): self.sysID=None - self.parser=pyexpat.ParserCreate() + self.parser=expat.ParserCreate() self.parser.StartElementHandler = self.startElement self.parser.EndElementHandler = self.endElement self.parser.CharacterDataHandler = self.characters @@ -125,8 +125,12 @@ self.__report_error() def close(self): + if self.parser is None: + # make sure close is idempotent + return if self.parser.Parse("", 0) != 1: self.__report_error() + self.parser = None # --- An expat driver that uses the lazy map From odeckmyn.list@teaser.fr Thu Jan 11 16:30:28 2001 From: odeckmyn.list@teaser.fr (Olivier Deckmyn) Date: Thu, 11 Jan 2001 17:30:28 +0100 Subject: [XML-SIG] Fw: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss Message-ID: <008e01c07beb$cec80580$0d00000a@ODECKMYN2K> C'est un message de format MIME en plusieurs parties. ------=_NextPart_000_008B_01C07BF4.30795AB0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable ----- Original Message -----=20 From: GExpertsDiscuss@egroups.com=20 To: GExpertsDiscuss@egroups.com=20 Sent: Thursday, January 11, 2001 3:54 PM Subject: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss=20 Hello, This email message is a notification to let you know that a file has been uploaded to the Files area of the GExpertsDiscuss=20 group. File : /Enhance002.zip=20 Uploaded by : rschoenaker@hotmail.com=20 Description : Latest and greatest Formdrawer. 
Please test the drawing and spawn flames and comments.

You can access this file at the URL

http://www.egroups.com/files/GExpertsDiscuss/Enhance002%2Ezip

To learn more about eGroups file sharing, please visit

http://www.egroups.com/help/files.html

Regards,

rschoenaker@hotmail.com

eGroups Sponsor
Click here to Win a 2001 Acura MDX

To unsubscribe from this group, send an email to:
GExpertsDiscuss-unsubscribe@egroups.com

------=_NextPart_000_008B_01C07BF4.30795AB0
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
 
------=_NextPart_000_008B_01C07BF4.30795AB0--

From odeckmyn.list@teaser.fr Thu Jan 11 16:48:00 2001
From: odeckmyn.list@teaser.fr (Olivier Deckmyn)
Date: Thu, 11 Jan 2001 17:48:00 +0100
Subject: [XML-SIG] Fw: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss
References: <008e01c07beb$cec80580$0d00000a@ODECKMYN2K>
Message-ID: <011a01c07bee$41f17530$0d00000a@ODECKMYN2K>

This is a multi-part message in MIME format.

------=_NextPart_000_0117_01C07BF6.A3A2CA60
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

bad click - Sorry for the noise :(

----- Original Message -----
From: Olivier Deckmyn
To: xml-sig@python.org
Sent: Thursday, January 11, 2001 5:30 PM
Subject: [XML-SIG] Fw: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss

----- Original Message -----
From: GExpertsDiscuss@egroups.com
To: GExpertsDiscuss@egroups.com
Sent: Thursday, January 11, 2001 3:54 PM
Subject: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss

Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the GExpertsDiscuss
group.

File : /Enhance002.zip
Uploaded by : rschoenaker@hotmail.com
Description : Latest and greatest Formdrawer. Please test the drawing and spawn flames and comments.

You can access this file at the URL

http://www.egroups.com/files/GExpertsDiscuss/Enhance002%2Ezip

To learn more about eGroups file sharing, please visit

http://www.egroups.com/help/files.html

Regards,

rschoenaker@hotmail.com

eGroups Sponsor
Click here to Win a 2001 Acura MDX

To unsubscribe from this group, send an email to:
GExpertsDiscuss-unsubscribe@egroups.com

------=_NextPart_000_0117_01C07BF6.A3A2CA60
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0117_01C07BF6.A3A2CA60--

From teg@redhat.com Thu Jan 11 21:12:53 2001
From: teg@redhat.com (Trond Eivind Glomsrød)
Date: 11 Jan 2001 16:12:53 -0500
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <200101071122.MAA15470@pandora.informatik.hu-berlin.de>
References: <200101071122.MAA15470@pandora.informatik.hu-berlin.de>
Message-ID:

Martin von Loewis writes:

> Version 0.6.3 of the Python/XML distribution is now available. It
> should be considered a beta release, and can be downloaded from
> the following URLs:
>
> * Restructure DOM interfaces to better accommodate multiple
> DOM implementations: provide standard exceptions and symbolic
> constants (including those inside of the Node interface) in
> xml.dom.
>
> * Improve minidom: validate arguments and raise DOM exceptions,
> correct NamedNodeMap operations, offer cloneNode, splitText,
> DocumentType, DOMImplementation, and correct various other
> errors.

Given this, what is the best way to create RPMs of PyXML and 4Suite
which coexist? (no overlapping files). If the dom directory of PyXML
is included (and the one from 4Suite thus not included), things like
XSLT break. OTOH, PyXML has a couple of extra files (minidom, javadom)
etc... would these coexist with the rest of the directory coming from
4Suite?

--
Trond Eivind Glomsrød
Red Hat, Inc.

From martin@loewis.home.cs.tu-berlin.de Thu Jan 11 21:59:00 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 11 Jan 2001 22:59:00 +0100
Subject: [XML-SIG] Jython usage survey
Message-ID: <200101112159.f0BLx0J02401@mira.informatik.hu-berlin.de>

To find out usage of PyXML with Jython, and to play with another SF
facility, I created a roughly-two-question survey. Please take a
moment to answer it.
It is available at http://sourceforge.net/survey/survey.php?group_id=6473&survey_id=11258 Please understand that answering the survey won't have any immediate effect on PyXML; it's rather an indication how Jython support should evolve in the long term. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Jan 11 21:49:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 11 Jan 2001 22:49:49 +0100 Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: (teg@redhat.com) References: <200101071122.MAA15470@pandora.informatik.hu-berlin.de> Message-ID: <200101112149.f0BLnnm02224@mira.informatik.hu-berlin.de> > Given this, what is the best way to create RPMs of PyXML and 4Suite > which coexist? (no overlapping files). If you are speaking as a Linux distributor now, I think the best action is to not distribute PyXML 0.6.3 at all, as it does not cooperate with the current 4Suite release. Instead, I recommend to wait for 0.6.4, and the next release of 4Suite [as it turns out, 4DOM's xml.dom.ext.reader.Sax won't even use the pyexpat improvements]. > If the dom directory of PyXML is included (and the one from 4Suite > thus not included), things like XSLT break. OTOH, PyXML has a couple > of extra files (minidom, javadom) etc... would these coexist with > the rest of the directory coming from 4Suite_ If you are just asking as a user who wants to use the current version of 4Suite and PyXML 0.6.3, then yes, that would be a good combination. I don't know how many javadom users are out there, so just including minidom and pulldom might be sufficient. However, these are strictly necessary in combination with Python 2.0 - otherwise PyXML would break its contract with Python 2, which is to offer a proper superset of the Python 2 functionality. If somebody now wonders why I bothered releasing 0.6.3 at all: I would not have learned about these problems if I hadn't. 
If you really where asking about the long-term co-existance of 4Suite and PyXML, with regard to the 4DOM overlap: I have good faith that things will work out to everybody's liking. Regards, Martin (*) To find out more about that question, I just created a survey: http://sourceforge.net/survey/survey.php?group_id=6473&survey_id=11258 From teg@redhat.com Thu Jan 11 22:17:14 2001 From: teg@redhat.com (Trond Eivind =?iso-8859-1?q?Glomsr=F8d?=) Date: 11 Jan 2001 17:17:14 -0500 Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: <200101112149.f0BLnnm02224@mira.informatik.hu-berlin.de> References: <200101071122.MAA15470@pandora.informatik.hu-berlin.de> <200101112149.f0BLnnm02224@mira.informatik.hu-berlin.de> Message-ID: "Martin v. Loewis" writes: > > Given this, what is the best way to create RPMs of PyXML and 4Suite > > which coexist? (no overlapping files). > > If you are speaking as a Linux distributor now, Both this and as a user - we are distributing it (rawhide), but this is mainly because we'll be using it. > I think the best action is to not distribute PyXML 0.6.3 at all, as >it does not cooperate with the current 4Suite release. Instead, I >recommend to wait for 0.6.4, and the next release of 4Suite [as it >turns out, 4DOM's xml.dom.ext.reader.Sax won't even use the pyexpat >improvements]. Noted. I'll wait before updating - we're currently at 0.5.5.1 and 0.10.0 -- Trond Eivind Glomsrød Red Hat, Inc. From uche.ogbuji@fourthought.com Thu Jan 11 22:39:11 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 11 Jan 2001 15:39:11 -0700 Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: Message from teg@redhat.com (Trond Eivind =?iso-8859-1?q?Glomsr=F8d?=) of "11 Jan 2001 16:12:53 EST." Message-ID: <200101112239.PAA08928@localhost.localdomain> > Martin von Loewis writes: > Given this, what is the best way to create RPMs of PyXML and 4Suite > which coexist? (no overlapping files). 
If the dom directory of PyXML > is included (and the one from 4Suite thus not included), things like > XSLT break. OTOH, PyXML has a couple of extra files (minidom, javadom) > etc... would these coexist with the rest of the directory coming from > 4Suite_

We've pretty much had enough. As of version 0.10.1, PyXML will come bundled with 4Suite.

I'm desperately hacking at xmlproc, trying to get it to behave with SAX2, and we have a few other minor PyXML fixes. 4Suite 0.10.1 needs a lot of these fixes, but I'm not sure a full PyXML 0.6.4 is warranted.

I think at this point, we'll just be sure to run all applicable tests and bundle the version of PyXML we need. Of course all credits and attributions will be maintained.

This should make building your RPMs much simpler. In fact, if I had more RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" RPMs from the single 4Suite source tar.

-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Fri Jan 12 06:38:37 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 11 Jan 2001 23:38:37 -0700 Subject: [XML-SIG] Giving up on xmlproc/SAX2 for now Message-ID: <200101120638.XAA10719@localhost.localdomain> I've fought with it, but I think I'm running into pretty fundamental problems in xmlproc.XMLProcessor. I'm not sure what it is about driver2.drv_xmlproc that brings out these problems, but I'm getting phantom end tags being reported and such weirdness. Hopefully I'll be able to revisit the problem if Lars can't get to it, but for now I must turn back to other issues so we can get out 4Suite 0.10.1. I have fixed quite a few bugs in driver2.drv_xmlproc which I'm about to check in. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From larsga@garshol.priv.no Fri Jan 12 08:30:53 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Jan 2001 09:30:53 +0100 Subject: [XML-SIG] Giving up on xmlproc/SAX2 for now In-Reply-To: <200101120638.XAA10719@localhost.localdomain> References: <200101120638.XAA10719@localhost.localdomain> Message-ID: * uche ogbuji | | I've fought with it, but I think I'm running into pretty fundamental | problems in xmlproc.XMLProcessor. I'm not sure what it is about | driver2.drv_xmlproc that brings out these problems, but I'm getting | phantom end tags being reported and such weirdness. The problem is almost certainly that your application raises an IndexError in one of its handler methods. This causes xmlproc's buffering to get out of whack and will give just the symptoms you report. This is a known weakness of xmlproc that I will fix as soon as I can. Note that it is non-trivial to fix it without impacting performance too much. --Lars M. 
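Lars's diagnosis, that an IndexError escaping from a handler method confuses xmlproc's buffering, suggests one defensive measure on the application side: re-raise such errors under a different type before they reach the parser. The following is only a minimal sketch under stated assumptions. It is written for a current Python 3 interpreter rather than the 1.5.2/2.0 of this thread, it uses the standard library's expat driver as a stand-in for xmlproc, and the names HandlerBug, FragileHandler, GuardedHandler and parse_guarded are hypothetical, not part of PyXML, xmlproc or 4Suite.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class HandlerBug(Exception):
    """Hypothetical wrapper for an IndexError raised inside a handler."""


class FragileHandler(ContentHandler):
    """Pops a stack it never pushed to, similar to the failure in this thread."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.stack = []

    def endElement(self, name):
        self.stack.pop()  # raises IndexError because nothing was pushed


class GuardedHandler(ContentHandler):
    """Forwards element events and re-raises IndexError as HandlerBug."""

    def __init__(self, inner):
        ContentHandler.__init__(self)
        self.inner = inner

    def _call(self, method, *args):
        try:
            return getattr(self.inner, method)(*args)
        except IndexError as exc:
            # The parser never sees an IndexError coming from user code.
            raise HandlerBug("%s raised IndexError: %s" % (method, exc))

    def startElement(self, name, attrs):
        self._call("startElement", name, attrs)

    def endElement(self, name):
        self._call("endElement", name)

    def characters(self, content):
        self._call("characters", content)


def parse_guarded(text, inner):
    """Parse text; return the HandlerBug message, or None on success."""
    try:
        xml.sax.parseString(text.encode("utf-8"), GuardedHandler(inner))
    except HandlerBug as bug:
        return str(bug)
    return None
```

Calling parse_guarded("<a/>", FragileHandler()) returns a message naming endElement as the culprit, which is easier to act on than phantom end tags. Whether xmlproc itself would behave better with such a wrapper is not confirmed here.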
From martin@loewis.home.cs.tu-berlin.de Fri Jan 12 08:19:28 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 09:19:28 +0100 Subject: [XML-SIG] Giving up on xmlproc/SAX2 for now In-Reply-To: <200101120638.XAA10719@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101120638.XAA10719@localhost.localdomain> Message-ID: <200101120819.f0C8JSk00987@mira.informatik.hu-berlin.de> > I've fought with it, but I think I'm running into pretty fundamental > problems in xmlproc.XMLProcessor. I'm not sure what it is about > driver2.drv_xmlproc that brings out these problems, but I'm getting > phantom end tags being reported and such weirdness. Could you please provide a few bug reports for these problems? I'd like to help, but a general "it is broken" is a bad starting point... Regards, Martin From larsga@garshol.priv.no Fri Jan 12 08:40:36 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Jan 2001 09:40:36 +0100 Subject: [XML-SIG] [OT] Compiler problems Message-ID: Whenever I try to compile anything at all using the Python 2.0 sources I get this compilation error: /usr/local/include/python2.0/pyport.h:390: #error "LONG_BIT definition appears wrong for platform (bad gcc config?)." I'm using a stock RedHat 7.0 Linux system, except that I removed the gcc 2.96 version that came with it (it caused problems compiling SP) and replaced it with this: Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/specs gcc version 2.95.1 19990816/Linux (release) Does anyone have any ideas as to what the problem is and how it is best fixed? --Lars M. 
From loewis@informatik.hu-berlin.de Fri Jan 12 11:23:32 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 12 Jan 2001 12:23:32 +0100 (MET) Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: <200101112239.PAA08928@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101112239.PAA08928@localhost.localdomain> Message-ID: <200101121123.MAA15892@pandora.informatik.hu-berlin.de> > This should make building your RPMs much simpler. In fact, if I had more > RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" > RPMs from the single 4Suite source tar. That probably would not be too desirable; depending on how it is done, it might not even work. No matter how packaging is done, xml.dom.minidom should be available in PyXML. In turn, xml.dom.__init__ must be present to provide Node. In turn, xml.dom.en_us must also be included. Regards, Martin From akuchlin@mems-exchange.org Fri Jan 12 15:04:39 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 12 Jan 2001 10:04:39 -0500 Subject: [XML-SIG] [OT] Compiler problems In-Reply-To: ; from larsga@garshol.priv.no on Fri, Jan 12, 2001 at 09:40:36AM +0100 References: Message-ID: <20010112100439.A27688@kronos.cnri.reston.va.us> On Fri, Jan 12, 2001 at 09:40:36AM +0100, Lars Marius Garshol wrote: > >Whenever I try to compile anything at all using the Python 2.0 sources >I get this compilation error: > >/usr/local/include/python2.0/pyport.h:390: #error "LONG_BIT definition appears wrong for platform (bad gcc config?)." I believe that it's actually glibc at fault, and the error message in Python is misleading. Check at Red Hat for an updated glibc. --amk From uche.ogbuji@fourthought.com Fri Jan 12 15:48:12 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 12 Jan 2001 08:48:12 -0700 Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: Message from Martin von Loewis of "Fri, 12 Jan 2001 12:23:32 +0100." 
<200101121123.MAA15892@pandora.informatik.hu-berlin.de> Message-ID: <200101121548.IAA12133@localhost.localdomain> > > This should make building your RPMs much simpler. In fact, if I had more > > RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" > > RPMs from the single 4Suite source tar. > > That probably would not be too desirable; depending on how it is done, > it might not even work. No matter how packaging is done, > xml.dom.minidom should be available in PyXML. In turn, > xml.dom.__init__ must be present to provide Node. In turn, > xml.dom.en_us must also be included. Is this really a problem? PyXML would be a prereq for 4Suite, and would have everything it needs. The 4Suite RPM vould write to the xml/dom dir the additional stuff. This would mandate that we keep at least __init__ and en_us in sync. But right after this release we plan to have a closer look at the co-packaging between PyXML and 4Suite, and I don't think all this will be such a mess for long. I've already taken the preliminary step by updating 4Suite's setup.py to install PyXML as well if it's there. See an announcement coming soon on the 4Suite list. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From teg@redhat.com Fri Jan 12 15:50:05 2001 From: teg@redhat.com (Trond Eivind =?iso-8859-1?q?Glomsr=F8d?=) Date: 12 Jan 2001 10:50:05 -0500 Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: <200101121548.IAA12133@localhost.localdomain> References: <200101121548.IAA12133@localhost.localdomain> Message-ID: uche.ogbuji@fourthought.com writes: > > > This should make building your RPMs much simpler. In fact, if I had more > > > RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" > > > RPMs from the single 4Suite source tar. 
> > > > That probably would not be too desirable; depending on how it is done, > > it might not even work. No matter how packaging is done, > > xml.dom.minidom should be available in PyXML. In turn, > > xml.dom.__init__ must be present to provide Node. In turn, > > xml.dom.en_us must also be included. > > Is this really a problem? PyXML would be a prereq for 4Suite, and would have > everything it needs. The 4Suite RPM vould write to the xml/dom dir the > additional stuff. This would mandate that we keep at least __init__ and en_us > in sync. Note that files being present in both packages is a pain - epescially if you can't use one or the other, but really need a combination of the two... -- Trond Eivind Glomsrød Red Hat, Inc. From larsga@garshol.priv.no Fri Jan 12 16:12:58 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Jan 2001 17:12:58 +0100 Subject: [XML-SIG] [OT] Compiler problems In-Reply-To: <20010112100439.A27688@kronos.cnri.reston.va.us> References: <20010112100439.A27688@kronos.cnri.reston.va.us> Message-ID: * Lars Marius Garshol | | Whenever I try to compile anything at all using the Python 2.0 sources | I get this compilation error: | | /usr/local/include/python2.0/pyport.h:390: #error "LONG_BIT | definition appears wrong for platform (bad gcc config?)." * Andrew Kuchling | | I believe that it's actually glibc at fault, and the error message in | Python is misleading. Check at Red Hat for an updated glibc. That was it! Thank you! I upgraded to glibc-2.2-9 and the problem just disappeared. --Lars M. From uche.ogbuji@fourthought.com Fri Jan 12 16:48:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 12 Jan 2001 09:48:47 -0700 Subject: [XML-SIG] 4Suite-0.10.1beta1 (help please) Message-ID: <3A5F356F.E6768700@fourthought.com> I have prepared a beta for the 4Suite 0.10.1 release. I'd especially like people to help test it because it's the first release that incorporates PyXML. 
If you do care to test it (not on a production machine, of course), please nuke your Ft and _xmlplus directories in your Python library first. Then simply install using python setup.py install And give it a whirl. Send in your bug reports right away so we can get them in. ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.1beta1.tar.gz Windows users will need a C compiler. Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From loewis@informatik.hu-berlin.de Fri Jan 12 17:02:26 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 12 Jan 2001 18:02:26 +0100 (MET) Subject: [XML-SIG] PyXML 0.6.3 is available In-Reply-To: <200101121548.IAA12133@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101121548.IAA12133@localhost.localdomain> Message-ID: <200101121702.SAA05880@pandora.informatik.hu-berlin.de> > Is this really a problem? PyXML would be a prereq for 4Suite, and > would have everything it needs. The 4Suite RPM vould write to the > xml/dom dir the additional stuff. This would mandate that we keep > at least __init__ and en_us in sync. But right after this release > we plan to have a closer look at the co-packaging between PyXML and > 4Suite, and I don't think all this will be such a mess for long. As a short-term solution, that is fine. I'm just worried about somebody installing PyXML and not getting 4DOM. 
Regards, Martin From eric2461@caramail.com Fri Jan 12 17:11:54 2001 From: eric2461@caramail.com (RICO) Date: Fri, 12 Jan 2001 18:11:54 +0100 Subject: [XML-SIG] =?iso-8859-1?Q?Invitations_aux_soldes_priv=E9s_de_Grandes_marques_!?= Message-ID: <200101121800.f0CI0PW13822@bacho.adi.fr> From pg@fluent.com Fri Jan 12 21:33:49 2001 From: pg@fluent.com (Pankaj Gupta) Date: Fri, 12 Jan 2001 16:33:49 -0500 (EST) Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10 Message-ID: Hi, I downloaded PyXML and tried to setup on my ultra. It seems the location of the files expected in Lib/distutils/sysconfig.py is different from the ones that I have. I have not installed Python in /usr/local, but have it in my home area. The exception which comes is: distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/local/lib/python2.0/config/Makefile (No such file or directory) I tried to findout where this config directory is, but didnot get anything. If anyone can suggest any workaround, it will be very helpful. Specifically, can't I simply compile the C files in this distribution with the Python source files and import the .py files once I open the interpreter? Thanks, Pankaj From akuchlin@mems-exchange.org Fri Jan 12 22:13:22 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 12 Jan 2001 17:13:22 -0500 Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10 In-Reply-To: ; from pg@fluent.com on Fri, Jan 12, 2001 at 04:33:49PM -0500 References: Message-ID: <20010112171322.A5372@kronos.cnri.reston.va.us> On Fri, Jan 12, 2001 at 04:33:49PM -0500, Pankaj Gupta wrote: >distutils.errors.DistutilsPlatformError: invalid Python installation: >unable to open /usr/local/lib/python2.0/config/Makefile (No such file or >directory) sysconfig uses the value of sys.prefix and sys.exec_prefix. What are they set to for your Python installation? 
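The question above can be answered from the interpreter itself. A minimal sketch, assuming a current Python 3 where the standard sysconfig module has taken over from the distutils.sysconfig module discussed in this thread; describe_install is a hypothetical helper name:

```python
import os
import sys
import sysconfig


def describe_install():
    """Collect the values that build tooling derives its paths from."""
    makefile = sysconfig.get_makefile_filename()
    return {
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        "executable": sys.executable,
        "stdlib": sysconfig.get_path("stdlib"),
        "makefile": makefile,
        # False here corresponds to the "unable to open .../config/Makefile" error.
        "makefile_exists": os.path.exists(makefile),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_install().items()):
        print("%-16s %s" % (key, value))
```

If prefix points at /usr/local while the interpreter was only built in a home directory and never installed, the Makefile lookup fails in the way reported above.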
--amk From martin@loewis.home.cs.tu-berlin.de Fri Jan 12 22:42:56 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 23:42:56 +0100 Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10 In-Reply-To: (message from Pankaj Gupta on Fri, 12 Jan 2001 16:33:49 -0500 (EST)) References: Message-ID: <200101122242.f0CMguH01448@mira.informatik.hu-berlin.de> > I downloaded PyXML and tried to setup on my ultra. It seems the location > of the files expected in Lib/distutils/sysconfig.py is different from the > ones that I have. I have not installed Python in /usr/local, but have it > in my home area. The exception which comes is: > > distutils.errors.DistutilsPlatformError: invalid Python installation: > unable to open /usr/local/lib/python2.0/config/Makefile (No such file or > directory) Can you give a details description of how you installed Python (what commands in what sequence), and how you attempted to installed PyXML? Is there a Python installation in /usr/local/lib/python2.0? If so, which python binary did you use for setup.py? > Specifically, can't I simply compile the C files in this > distribution with the Python source files and import the .py files > once I open the interpreter? If you do it right, yes, you can. Please be aware that you need to give certain defines when compiling the files, and that Python 2.0 comes with its own xml module which is only superceded by PyXML if the latter is installed in _xmlplus. Regards, Martin From pg@fluent.com Fri Jan 12 23:15:39 2001 From: pg@fluent.com (Pankaj Gupta) Date: Fri, 12 Jan 2001 18:15:39 -0500 (EST) Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10 In-Reply-To: <200101122242.f0CMguH01448@mira.informatik.hu-berlin.de> Message-ID: Hi, > > I downloaded PyXML and tried to setup on my ultra. It seems the location > > of the files expected in Lib/distutils/sysconfig.py is different from the > > ones that I have. 
I have not installed Python in /usr/local, but have it > > in my home area. The exception which comes is: > > > > distutils.errors.DistutilsPlatformError: invalid Python installation: > > unable to open /usr/local/lib/python2.0/config/Makefile (No such file or > > directory) > > Can you give a details description of how you installed Python (what > commands in what sequence), and how you attempted to installed PyXML? > Is there a Python installation in /usr/local/lib/python2.0? If so, > which python binary did you use for setup.py? I am not sure how I installed python. I think I just untarred and gunzipped the distribution and invoked the Makefile after configuring. I have the python directory in ~/Python-2.0 As for PyXML, I downloaded it in ~/Python-2.0 and after untarring it, I used: 'python setup.py install' in ~/Python-2.0/PyXML-0.6.3 directory. I donot have anyother python loaded anywhere in the /usr/local area. I found this path was more or less hardcoded for posix systems in ~/Python-2.0/Lib/distutils/sysconfig.py. Even changing this path didn't help as I could not find any config sub-directory in Python-2.0. Thanks, Pankaj From martin@loewis.home.cs.tu-berlin.de Fri Jan 12 23:41:07 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 13 Jan 2001 00:41:07 +0100 Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10 In-Reply-To: (message from Pankaj Gupta on Fri, 12 Jan 2001 18:15:39 -0500 (EST)) References: Message-ID: <200101122341.f0CNf7S02168@mira.informatik.hu-berlin.de> > I am not sure how I installed python. I think I just untarred and > gunzipped the distribution and invoked the Makefile after configuring. I > have the python directory in ~/Python-2.0 Please have a look at the README file in the python sources. You should invoke 'make install' to really get a working Python installation. You probably want to give a --prefix option to configure. 
I believe if you properly install Python, distutils will properly work as well.

> I donot have anyother python loaded anywhere in the /usr/local area. I
> found this path was more or less hardcoded for posix systems in
> ~/Python-2.0/Lib/distutils/sysconfig.py.

sysconfig.py does not contain the string 'local', so I doubt there is anything hard coded anywhere. Instead, it uses sys.prefix, which is the location you gave to configure's --prefix option.

> Even changing this path didn't help as I could not find any config
> sub-directory in Python-2.0.

Yes, that's because 'make install' will create it.

Regards,
Martin

From noreply@sourceforge.net Sat Jan 13 14:59:38 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 13 Jan 2001 06:59:38 -0800
Subject: [XML-SIG] [Bug #128666] [4S-0.10.1beta2] problem with validating parser
Message-ID:

Bug #128666, was updated on 2001-Jan-13 06:59
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: [4S-0.10.1beta2] problem with validating parser

Details: Hi there,

I'm not sure if this is a 4Suite bug or an xmlproc bug. Attempting to generate a DOM with validate set to 1 fails.

----------------------------------
Sample script:

xml=""" """
from xml.dom.ext.reader import Sax2
d = Sax2.FromXml(xml,validate=1)#,catName=catalog)

-----------------------------------
stack trace:

Traceback (innermost last):
  File "catalog_bug.py", line 9, in ?
    d = Sax2.FromXml(xml,validate=1)#,catName=catalog)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 313, in FromXml
    saxHandlerClass, parser)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 306, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 292, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers2/drv_xmlproc.py", line 93, in parse
    parser.flush()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 206, in flush
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 93, in do_parse
    self.parse_start_tag()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 192, in parse_start_tag
    self.report_error(3017)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 63, in report_error
    EntityParser.report_error(self,number,args)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 372, in report_error
    self.err.fatal(msg)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers2/drv_xmlproc.py", line 215, in fatal
    self._err_handler.fatalError(saxlib.SAXParseException(msg, None, self))
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 260, in fatalError
    raise exception

------------------------------------
test.DTD

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128666&group_id=6473

From noreply@sourceforge.net Sat Jan 13 15:09:02 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 13 Jan 2001 07:09:02 -0800
Subject: [XML-SIG] [Bug #128667] XHtmlPrettyPrint fails
Message-ID:

Bug #128667, was updated on 2001-Jan-13 07:09
Here is a current snapshot of the bug.
Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: XHtmlPrettyPrint fails

Details: This bug was present in 4S-0.10.0, and it's still there in 0.10.1.

>>> from xml.dom.ext.reader import Sax2
>>> d = Sax2.FromXml('')
>>> from xml.dom.ext import XHtmlPrettyPrint
>>> XHtmlPrettyPrint(d)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/__init__.py", line 92, in XHtmlPrettyPrint
    Printer.PrintWalker(visitor, root).run()
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 353, in run
    return self.step()
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 349, in step
    self.visitor.visit(self.start_node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 159, in visit
    return self.visitDocument(node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/XHtmlPrinter.py", line 26, in visitDocument
    Printer.PrintVisitor.visitDocument(self,node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 204, in visitDocument
    self.visitNodeList(node.childNodes, exclude=node.doctype)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 175, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 135, in visit
    return self.visitElement(node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/XHtmlPrinter.py", line 65, in visitElement
    self.stream.write(self._newLine + self._indent*self._depth + '<' + string.lower(node.localName))
TypeError: bad operand type(s) for *

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128667&group_id=6473

From martin@mira.cs.tu-berlin.de Sun Jan 14 00:40:23 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 14 Jan 2001 01:40:23 +0100
Subject: [XML-SIG] Re: [4S-0.10.1beta2] problem with validating parser
In-Reply-To: (noreply@sourceforge.net)
References:
Message-ID: <200101140040.f0E0eNf19753@mira.informatik.hu-berlin.de>

> I'm not sure if this is a 4Suite bug or an xmlproc bug. Attempting to
> generate a DOM with validate set to 1 fails.

It's a bug in 4DOM, although xmlproc could be more robust (as Lars Marius already admitted).

The problem is indeed that the XmlDomGenerator produces an index error in the line

    old_nss, del_nss = self._namespaceStack[-1]

At that point, nothing is on the namespace stack. The reason for that is that xmlproc uses the namespace interface of the content handler by default, ie. it calls startElementNS and endElementNS. Now, while startElement of the XmlDomGenerator extends the _namespacestack, startElementNS doesn't. However, endElementNS invokes endElement, which tries to remove things from the namespace stack.

If the XmlDomGenerator was designed to always do its own namespace processing, I suggest that this is explicitly requested from the SAX parser, by setting xml.sax.handler.feature_namespaces to 0. Then, the SAX parser *should* never invoke startElementNS; those methods might be implemented as raising AssertionErrors just to make sure they aren't.

IOW, the quick fix for this bug is to patch

--- Sax2.py.orig Sun Jan 14 01:07:31 2001
+++ Sax2.py Sun Jan 14 01:08:08 2001
@@ -264,6 +264,7 @@
     def __init__(self, validate=0, keepAllWs=0, catName=None,
                  saxHandlerClass=XmlDomGenerator, parser=None):
         self.parser = parser or (validate and sax2exts.XMLValParserFactory.make_parser()) or sax2exts.XMLParserFactory.make_parser()
+        self.parser.setFeature(handler.feature_namespaces, 0)
         if catName:
             #set up the catalog, if there is one
             from xml.parsers.xmlproc import catalog

into 4DOM.

Regards,
Martin

P.S.
As for xmlproc catching IndexErrors, it appears that the only possible cause for an index error inside do_parse is the assignment to t. So why would it hurt to write

    try:
        t=self.data[self.pos+1] # Optimization
    except IndexError, e:
        raise OutOfDataException()

and to remove the outer IndexError? AFAICT, it only costs a SETUP_EXCEPT/POP_BLOCK pair, which are quite cheap (a function call, and storing a few variables, no memory allocation).

From uche.ogbuji@fourthought.com Sun Jan 14 02:06:57 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 13 Jan 2001 19:06:57 -0700
Subject: [XML-SIG] Re: [4suite] Re: [4S-0.10.1beta2] problem with validating parser
References: <200101140040.f0E0eNf19753@mira.informatik.hu-berlin.de>
Message-ID: <3A6109C1.9EE4D92@fourthought.com>

"Martin v. Loewis" wrote:
>
> > I'm not sure if this is a 4Suite bug or an xmlproc bug. Attempting to
> > generate a DOM with validate set to 1 fails.
>
> It's a bug in 4DOM, although xmlproc could be more robust (as Lars
> Marius already admitted).
>
> The problem is indeed that the XmlDomGenerator produces an index error
> in the line
>
> old_nss, del_nss = self._namespaceStack[-1]
>
> At that point, nothing is on the namespace stack. The reason for that
> is that xmlproc uses the namespace interface of the content handler by
> default, ie. it calls startElementNS and endElementNS.

A bit of a co-incidence. I discovered this bug (and others in Sax2) a few hours ago. Lars's comment about IndexErrors was also my clue. The code on my machine now works with xmlproc and SAX2.

It still appears that the fixes I made to drv_xmlproc and xmlproc itself are valid. For instance, drv_xmlproc's InputSource management was broken and xmlproc itself would incorrectly assign the elements namespace URI to any unprefixed attributes.

I'm currently looking into the minidom and pulldom masking bugs you mentioned and there should be another beta out today.

Thanks.
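The effect of the namespace feature discussed in this thread can be observed with any SAX2 parser. A minimal sketch, assuming a current Python 3 and the standard library's expat driver instead of xmlproc; RecordingHandler and record_calls are hypothetical names used only for illustration:

```python
import io
import xml.sax
from xml.sax import handler


class RecordingHandler(handler.ContentHandler):
    """Records which of the two element interfaces the parser invokes."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.calls = []

    def startElement(self, name, attrs):
        self.calls.append(("startElement", name))

    def endElement(self, name):
        self.calls.append(("endElement", name))

    def startElementNS(self, name, qname, attrs):
        self.calls.append(("startElementNS", name))

    def endElementNS(self, name, qname):
        self.calls.append(("endElementNS", name))


def record_calls(text, namespaces):
    """Parse text with feature_namespaces set as given; return the calls seen."""
    parser = xml.sax.make_parser()
    # 0 asks for startElement/endElement, 1 for the *NS variants.
    parser.setFeature(handler.feature_namespaces, namespaces)
    recorder = RecordingHandler()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return recorder.calls
```

With the feature set to 0 the handler only sees startElement and endElement with the qualified name, so a handler that maintains its own namespace stack in those two methods stays balanced. With it set to 1 only the NS variants are called, which is the situation that emptied the stack in the bug report.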
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche@ogbuji.net Sun Jan 14 05:26:40 2001 From: uche@ogbuji.net (Uche Ogbuji) Date: Sat, 13 Jan 2001 22:26:40 -0700 Subject: [XML-SIG] [Fwd: Anyone use Installer with PyXML?] Message-ID: <3A613890.7E2E3EF4@ogbuji.net> This is a multi-part message in MIME format. --------------3B2DC2B12EA665D671B00EFD Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit -- Uche Ogbuji Personal: uche@ogbuji.net http://uche.ogbuji.net Work: uche.ogbuji@fourthought.com http://Fourthought.com --------------3B2DC2B12EA665D671B00EFD Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Path: newsfeed.intelenet.net!news.service.uci.edu!csulb.edu!logbridge.uoregon.edu!newsfeed.mesh.ad.jp!uunet!osa.uu.net!dfw.uu.net!ash.uu.net!news.baymountain.net!not-for-mail From: "Dan Rolander" Newsgroups: comp.lang.python Subject: Anyone use Installer with PyXML? 
Date: Sat, 13 Jan 2001 11:59:42 -0500 Organization: Baymountain Message-ID: NNTP-Posting-Host: 63.102.49.30 Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Trace: news.baymountain.net 979405324 12892 63.102.49.30 (13 Jan 2001 17:02:04 GMT) X-Complaints-To: abuse@baymountain.net NNTP-Posting-Date: 13 Jan 2001 17:02:04 GMT To: Return-Path: Delivered-To: mm+python-list@python.org X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Errors-To: python-list-admin@python.org X-BeenThere: python-list@python.org X-Mailman-Version: 2.0.1 (101270) Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: Errors-To: python-list-admin@python.org X-BeenThere: python-list@python.org Xref: newsfeed.intelenet.net comp.lang.python:120524 I have not been able to get Gordon McMillan's installer to work with PyXML. The Win32 exe's I create cannot import a parser and I'm not sure how to manually configure the .cfg file. Has anyone done this? Dan --------------3B2DC2B12EA665D671B00EFD-- From uche.ogbuji@fourthought.com Sun Jan 14 08:16:22 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 14 Jan 2001 01:16:22 -0700 Subject: [XML-SIG] 4Suite-0.10.1beta3 Message-ID: <3A616056.285C4DD4@fourthought.com> Bis again, please help us test this thoroughly. The Sax2/xmlprocproblems and the masking of minidom and pulldom appear to be fixed. Let us know if it's not so. On a non-production machine, nuke your Ft and _xmlplus directories in your Python library. Then simply install using python setup.py install And give it a whirl. Send in your bug reports right away so we can get them in. ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.1beta3.tar.gz Windows users will need a C compiler. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Sun Jan 14 18:15:31 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 14 Jan 2001 19:15:31 +0100 Subject: [XML-SIG] 4Suite-0.10.1beta3 In-Reply-To: <3A616056.285C4DD4@fourthought.com> (message from Uche Ogbuji on Sun, 14 Jan 2001 01:16:22 -0700) References: <3A616056.285C4DD4@fourthought.com> Message-ID: <200101141815.f0EIFVC01536@mira.informatik.hu-berlin.de> > Bis again, please help us test this thoroughly. The PyXML test suite fails with it for 'test_dom test_howto test_minidom test_saxdrivers'. At least test_saxdrivers can be fixed by using the current PyXML CVS code. test_howto fails because it now generates an empty in the DOM tests; I'm not sure what change was causing that behaviour. Can somebody comment whether the line is well-formed? Then we could regenerate test_howto - although suppressing the empty DOCTYPE declaration might be a better solution. test_minidom fails because 4DOM's dom/__init__.py deviates from PyXML's; minidom passes string arguments into the DOM exceptions. Perhaps some clarification/agreement is necessary of how exactly the specific DOM exceptions work; bear in mind that Python 2, PyXML and 4Suite must offer consistent definitions of these classes. test_dom fails, again, for writing an empty DOCTYPE. I'd appreciate if you could run the testsuite just before releasing 4Suite; if you run into any problems, please let me know. To run the testsuite, run testxml.py. 
Regards, Martin From noreply@sourceforge.net Mon Jan 15 10:27:38 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 15 Jan 2001 02:27:38 -0800 Subject: [XML-SIG] [Bug #128827] [0.10.1-beta3] cannot Print a validated document Message-ID: Bug #128827, was updated on 2001-Jan-15 02:27 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: [0.10.1-beta3] cannot Print a validated document Details: Attempting to use xml.dom.ext.Print() on a validated document gives the following traceback: >>> Print(tree) ", line 1, in ? File "/usr/lib/python1.5/site-packages/xml/dom/ext/__init__.py", line 65, in Print Printer.PrintWalker(visitor, root).run() File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 353, in run return self.step() File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 349, in step self.visitor.visit(self.start_node) File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 159, in visit return self.visitDocument(node) File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 203, in visitDocument node.doctype and self.visitDocumentType(node.doctype) File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 282, in visitDocumentType self.stream.write(' PUBLIC %s %s' % public, system) TypeError: not enough arguments for format string For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=128827&group_id=6473 From noreply@sourceforge.net Mon Jan 15 13:53:11 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 15 Jan 2001 05:53:11 -0800 Subject: [XML-SIG] [Bug #128851] 4xslt (0.10.0) crash Message-ID: Bug #128851, was updated on 2001-Jan-15 05:53 Here is a current snapshot of the bug. 
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: ornicar Assigned to : nobody Summary: 4xslt (0.10.0) crash Details: When running the command "4xslt bugnicar-database.xml bugnicar-insert.xslt" I get a traceback (see below). The error occurs with both 4Suite 0.10.0 and 0.10.1beta3 Traceback (innermost last): File "/usr/bin/4xslt", line 5, in ? _4xslt.Run(sys.argv) File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 94, in Run topLevelParams=top_level_params) File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 130, in runUri writer, uri, outputStream) File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 202, in runNode self.applyTemplates(context, None) File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 222, in applyTemplates self.applyBuiltins(context, mode) File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 235, in applyBuiltins self.applyTemplates(context, mode) File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 218, in applyTemplates found = sty.applyTemplates(context, mode, self, params) File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 353, in applyTemplates patternInfo[TEMPLATE].instantiate(context, processor, params) File "/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py", line 115, in instantiate context = child.instantiate(context, processor)[0] File "/usr/lib/python1.5/site-packages/xml/xslt/LiteralElement.py", line 91, in instantiate context = child.instantiate(context, processor)[0] File "/usr/lib/python1.5/site-packages/xml/xslt/AttributeElement.py", line 60, in instantiate processor.writers[-1].attribute(name, value, namespace) File "/usr/lib/python1.5/site-packages/xml/xslt/XmlWriter.py", line 89, in attribute self._currElement.attrs[name] = TranslateCdataAttr(value) AttributeError: 'None' object has no attribute 'attrs' Here the XSLT file "bugnicar-insert.xslt": 
---------------------------------------------- Insérer un nouveau bug dans la base de données de Bugnicar. Insert a new bug in Bugnicar's database. bugnicar-task/bug-report bugnicar-database bugnicar-database 1 And here the XML file bugnicar-database.xml: ------------------------------------------- For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=128851&group_id=6473 From noreply@sourceforge.net Mon Jan 15 15:00:54 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 15 Jan 2001 07:00:54 -0800 Subject: [XML-SIG] [Patch #103240] patch for bug #128827 - Print() fails on validated documents Message-ID: Patch #103240 has been updated. Project: pyxml Category: 4Suite Status: Open Submitted by: afayolle Assigned to : nobody Summary: patch for bug #128827 - Print() fails on validated documents ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=103240&group_id=6473 From noreply@sourceforge.net Mon Jan 15 15:14:10 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 15 Jan 2001 07:14:10 -0800 Subject: [XML-SIG] [Bug #128860] [0.10.1beta3] Sax2 parser ignores keepAllWs option Message-ID: Bug #128860, was updated on 2001-Jan-15 07:13 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: [0.10.1beta3] Sax2 parser ignores keepAllWs option Details: When using Sax.FromXmlWhatever with the validate argument set to TRUE and keepAllWs to FALSE, whitespace at the beginning and ending of text nodes is not ignored. 4Suite 0.10.0 had the right behaviour. 
Comments in source code seem to indicate that this may be related to
xmlproc (Sax2.py line 156)

Alexandre Fayolle

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128860&group_id=6473

From uche.ogbuji@fourthought.com Mon Jan 15 18:28:00 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 15 Jan 2001 11:28:00 -0700
Subject: [XML-SIG] 4Suite 0.10.1beta4
Message-ID: <3A634130.258A62FF@fourthought.com>

Thanks so much to all those who reported bugs in the past betas. We
have addressed most of these.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.1beta4.tar.gz

This fixes:

* Problems accessing minidom and pulldom (including a distutils
  work-around for Alexandre's problem with permissions to write to the
  source dir)
* Problems with DOM HTML, XHTML, Sax2 and printers
* XSLT bugs
* etc.

There are still a couple of bugs we want to address before packaging, so
expect a release candidate in a few hours. We'll probably begin final
packaging late afternoon.

Thanks.

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From noreply@sourceforge.net Mon Jan 15 20:59:54 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jan 2001 12:59:54 -0800
Subject: [XML-SIG] [Bug #128924] xmlproc not generating ignorableWhitespace events
Message-ID:

Bug #128924, was updated on 2001-Jan-15 12:59
Here is a current snapshot of the bug.
Project: Python/XML Category: xmlproc Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: uche Assigned to : nobody Summary: xmlproc not generating ignorableWhitespace events Details: Trying the following using 4DOM ----------------------------------%-------------------------------- import cStringIO from xml.dom.ext import Print, PrettyPrint from xml.dom.ext.reader import Sax, Sax2 from xml.sax import sax2exts, saxexts source_1 = """\ ]> Pieter Aaron
404 Error Way
404-555-1234 404-555-4321 404-555-5555 pieter.aaron@inter.net
Emeka Ndubuisi
42 Spam Blvd
767-555-7676 767-555-7642 800-SKY-PAGEx767676 endubuisi@spamtron.com
Vasia Zhugenev
2000 Disaster Plaza
000-987-6543 000-000-0000 vxz@magog.ru
""" p = saxexts.make_parser("xml.sax.drivers.drv_xmlproc") reader = Sax.Reader(parser=p, keepAllWs=0) doc = reader.fromString(source_1) stream = cStringIO.StringIO() Print(doc, stream=stream) result = stream.getvalue() ----------------------------------%-------------------------------- No ignorableWhitespace events are generated. I have checked that drv_xmlproc does not seem to be getting the handle_ignorable_data events. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=128924&group_id=6473 From uche.ogbuji@fourthought.com Mon Jan 15 21:02:10 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 15 Jan 2001 14:02:10 -0700 Subject: [XML-SIG] xmlproc bug, I think Message-ID: <200101152102.OAA11363@localhost.localdomain> Relevant to a problem Alexandre Fayolle is having http://sourceforge.net/bugs/?func=detailbug&bug_id=128924&group_id=6473 I might have time to look into this after the 4Suite release today, but any help is appreciated. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From matt@virtualspectator.com Mon Jan 15 21:45:29 2001 From: matt@virtualspectator.com (matt) Date: Tue, 16 Jan 2001 10:45:29 +1300 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de> References: <01011111124501.00909@localhost.localdomain> <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de> Message-ID: <01011610521502.00889@localhost.localdomain> I'm using PyXML 0.6.3 and python 1.5.2. It seems that CDATA sections are still not handled correctly. The following code demonstrates. 
test_xml = """ a test caption """ from xml.dom import ext from xml.dom.ext.reader import Sax2 from xml.sax import saxexts a_parser = saxexts.XMLParserFactory.make_parser('xml.sax.drivers.drv_pyexpat') doc = Sax2.FromXml(test_xml,None,parser=a_parser, validate=0) ext.PrettyPrint(doc,encoding='ISO-8859-1') from this I get the CDATA element returning as a text node regards Matt From martin@mira.cs.tu-berlin.de Tue Jan 16 00:21:05 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 16 Jan 2001 01:21:05 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011610521502.00889@localhost.localdomain> (message from matt on Tue, 16 Jan 2001 10:45:29 +1300) References: <01011111124501.00909@localhost.localdomain> <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de> <01011610521502.00889@localhost.localdomain> Message-ID: <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de> > I'm using PyXML 0.6.3 and python 1.5.2. It seems that CDATA > sections are still not handled correctly. The following code > demonstrates. Can you elaborate why this is incorrect? Regards, Martin From matt@virtualspectator.com Tue Jan 16 02:25:15 2001 From: matt@virtualspectator.com (matt) Date: Tue, 16 Jan 2001 15:25:15 +1300 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de> References: <01011610521502.00889@localhost.localdomain> <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de> Message-ID: <01011615263906.00889@localhost.localdomain> Sorry, the result of the ext.PrettyPrint is : test_xml = """ a test caption some test data the CDATA escaping has disappeared On Tue, 16 Jan 2001, Martin v. Loewis wrote: > > I'm using PyXML 0.6.3 and python 1.5.2. It seems that CDATA > > sections are still not handled correctly. The following code > > demonstrates. > > Can you elaborate why this is incorrect? 
> > Regards, > Martin From martin@mira.cs.tu-berlin.de Tue Jan 16 07:24:22 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 16 Jan 2001 08:24:22 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011615263906.00889@localhost.localdomain> (message from matt on Tue, 16 Jan 2001 15:25:15 +1300) References: <01011610521502.00889@localhost.localdomain> <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de> <01011615263906.00889@localhost.localdomain> Message-ID: <200101160724.f0G7OMc00811@mira.informatik.hu-berlin.de> > Sorry, the result of the ext.PrettyPrint is : > > > test_xml = """ > > > a test caption > > some test data > > > > the CDATA escaping has disappeared Yes, and why is this incorrect? The two documents are equal. Regards, Martin From ndw@nwalsh.com Tue Jan 16 07:41:25 2001 From: ndw@nwalsh.com (Norman Walsh) Date: 16 Jan 2001 14:41:25 +0700 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: matt's message of "Tue, 16 Jan 2001 15:25:15 +1300" References: <01011610521502.00889@localhost.localdomain> <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de> <01011615263906.00889@localhost.localdomain> Message-ID: <877l3w3pwa.fsf@nwalsh.com> was heard to say: | Sorry, the result of the ext.PrettyPrint is : [...] | | some test data | | | the CDATA escaping has disappeared IMHO, that's the behavior that you should expect. CDATA sections are an escaping mechanism, but a serializer is free to choose an alternate escaping mechanism if it chooses. Note also that CDATA escaping and document encoding are related. It's possible to construct documents (if you combine several input sources) that *cannot* preserve the CDATA escaping and the desired encoding. 
Be seeing you,
norm

--
Norman Walsh | Life is an irritation--Tucker Case
http://nwalsh.com/ | (Christopher Moore)
]]>

From matt@virtualspectator.com Tue Jan 16 09:34:23 2001
From: matt@virtualspectator.com (matt)
Date: Tue, 16 Jan 2001 22:34:23 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <877l3w3pwa.fsf@nwalsh.com>
References: <01011615263906.00889@localhost.localdomain> <877l3w3pwa.fsf@nwalsh.com>
Message-ID: <01011622394806.00912@localhost.localdomain>

I was following the logic that ext.PrettyPrint can write to a stream,
and that it is useful to pick up a document that has escaped data(which
may be xml itself), add some nodes to it, and save it back to the stream
expecting the escaped sections to be still present as escaped sections.

So what I understand now is that I should either use a serializer that
keeps these, or write a DTD and use that to write my xml back out to
file in a more proper way.

Which I guess is my next question, what is the cleanest method in PyXML
for reading in such a file with CDATA sections, and getting them back
out when rewriting?

regards

Matt

On Tue, 16 Jan 2001, Norman Walsh wrote:
> / matt was heard to say:
> | Sorry, the result of the ext.PrettyPrint is :
> [...]
> |
> | some test data
> |
> |
> | the CDATA escaping has disappeared
>
> IMHO, that's the behavior that you should expect. CDATA sections are
> an escaping mechanism, but a serializer is free to choose an alternate
> escaping mechanism if it chooses.
>
> Note also that CDATA escaping and document encoding are related. It's
> possible to construct documents (if you combine several input sources)
> that *cannot* preserve the CDATA escaping and the desired encoding.
>
> Be seeing you,
> norm
>
> --
> Norman Walsh | Life is an irritation--Tucker Case
> http://nwalsh.com/ | (Christopher Moore)
> ]]>
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--

From Wolfgang.Schoeberl@web.de Tue Jan 16 13:34:26 2001
From: Wolfgang.Schoeberl@web.de (Wolfgang Schoeberl)
Date: Tue, 16 Jan 2001 14:34:26 +0100
Subject: [XML-SIG] Problem with 'Bad Request'
Message-ID: <200101161334.f0GDYQh11472@mailgate4.cinetic.de>

Hi,

this is not a specific xml-Problem, but I hope you will help me though.
I've got a problem with catching errors. More specific, I would like to
catch 'Bad Request', which won't work because of the space. Is it a bug
in Python? Does anybody know a neat trick?

Thanks a lot,
Wolfgang

Here's some more code to describe my problem:

def test1():
    try:
        raise "NoProblem"
    except "NoProblem":
        print "Test1: NoProblem catched" # work fine

def test2():
    try:
        raise "No Problem with blank"
    except "No Problem with blank":
        print "Test2: No Problem with blank catched" #work fine

def raiseNoProblem():
    raise "NoProblem"

def raiseProblemWithBlank():
    raise "Problem with blank"

def test3():
    try:
        raiseNoProblem()
    except "NoProblem":
        print "Test3: NoProblem catched" #work fine

def test4():
    try:
        raiseProblemWithBlank()
    except "Problem with blank":
        print "Test4: No Problem with blank" # does not work :-(
    except "Problem":
        print "'Test4: Problem' catched it"
    except "Problem ":
        print "'Test4: Problem_' catched it"
    except:
        print "Test4: 'Problem with blank' not catched - except catched it"

test1()
test2()
test3()
test4()

From uche.ogbuji@fourthought.com Tue Jan 16 14:33:35 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 16 Jan 2001 07:33:35 -0700
Subject: [XML-SIG] ANN: 4Suite 0.10.1
Message-ID: <200101161433.HAA25167@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

4Suite 0.10.1
---------------------------
Open source tools for standards-based XML, DOM, XPath, XSLT, RDF,
XPointer, XLink and object-database development in Python

http://4Suite.org

4Suite is a collection of Python tools for XML processing and object
database management. An integrated packaging of several formerly
separately-distributed components: 4DOM, 4XPath and 4XSLT, 4RDF, 4ODS,
4XPointer, 4XLink and DbDOM.

News
----

* PyXML (0.6.3 + fixes) is now built in
* Implement XInclude
* DbDom: Implement cloneNode and document fragments
* XSLT: More thorough test harness
* XSLT: Support source docs from stdin on 4xslt command line
* XSLT: Implement unparsed-entity-uri
* XSLT: Restricted HTML writer output allowed as security tool
* XPath: Add extension funcs: evaluate,distinct,split,range,if,find
* DOM: Update to 2000-11-13 level 2 recommendation
* DOM: Proper SAX2 support for reader
* DOM: Add native sgmlop reader
* RDF: Add removeAll to Model
* Documentation updates and consolidation
* Domlette reader option to force 8-bit DOM strings even in Python 2.0
* Organize Reader and URI handler APIs to allow easier customizations
* Many Python 1.5.2 and 2.0 compatibility fixes
* Many misc optimizations
* Many misc bug-fixes
* 4Suite.org revamped: much heavier use of 4Suite Server features

More info and Obtaining 4Suite
------------------------------

Please see

http://4Suite.org

From where you can download source, Windows and Linux binaries.

4Suite is distributed under a license similar to that of the Apache Web
Server.
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Jan 16 14:33:59 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 16 Jan 2001 07:33:59 -0700 Subject: [XML-SIG] ANN: 4Suite Server 0.10.1 Message-ID: <200101161433.HAA25292@localhost.localdomain> Fourthought, Inc. (http://Fourthought.com) announces the release of 4Suite Server 0.10.1 ---------------------------- An open source XML data server based on open standards implemented using 4Suite and other tools http://FourThought.com/4SuiteServer http://4Suite.org News ---- * Windows support * Smoother installation and configuration * Comprehensive installation HOWTOs * HTTP server support * Raw file support: can serve arbitrary files given mime type * Very experimental SOAP support * Python 2.0 support * More demos * Many optimizations and bug fixes * 4Suite.org revamped: much heavier use of 4Suite Server features 4Suite Server is a platform for XML processing. It features an XML data repository, a rules-based engine, and XSLT transforms, XPath and RDF-based indexing and query, XLink resolution and many other XML services. It also supports related services such as distributed transactions and access control lists. It supports remote, cross-platform and cross-language access through CORBA, HTTP and other request protocols to be added shortly. It's not meant to be a full-blown application server. It provides highly-specialized services for XML processing that can be used with other application servers. The software is open-source and free to download. Priority support and customization is available from Fourthought, Inc. 
For more information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900

The 4Suite Server home page is

http://FourThought.com/4SuiteServer

From where you can download the software itself or an executive summary
thereof, read usage scenarios and find other information.

From martin@mira.cs.tu-berlin.de Wed Jan 17 00:00:43 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 17 Jan 2001 01:00:43 +0100
Subject: [XML-SIG] Problem with 'Bad Request'
In-Reply-To: <200101161334.f0GDYQh11472@mailgate4.cinetic.de> (Wolfgang.Schoeberl@web.de)
References: <200101161334.f0GDYQh11472@mailgate4.cinetic.de>
Message-ID: <200101170000.f0H00hC00944@mira.informatik.hu-berlin.de>

> this is not a specific xml-Problem, but I hope you will help me
> though. I've got a problem with catching errors. More specific, I
> would like to catch 'Bad Request', which won't work because of the
> space. Is it a bug in Python? Does anybody know a neat trick?

It is not a bug in Python; please look at the description of the intern
builtin to see why that happens (perhaps also at the raise/try
specification, which requires using identical, not equal, strings).

Anyway, the neat trick is to write

problemWithBlank = "Problem with blank"

def raiseProblemWithBlank():
    raise problemWithBlank

def test4():
    try:
        raiseProblemWithBlank()
    except problemWithBlank:
        print "Test4: No Problem with blank" # is caught now

test4()

Please note that string exceptions are deprecated; the Pythonic way to
write this code is

class ProblemWithBlank(Exception): pass

def raiseProblemWithBlank():
    raise ProblemWithBlank

def test4():
    try:
        raiseProblemWithBlank()
    except ProblemWithBlank:
        print "Test4: No Problem with blank" # is caught now

test4()

Regards,
Martin

From martin@mira.cs.tu-berlin.de Tue Jan 16 23:54:14 2001
From: martin@mira.cs.tu-berlin.de (Martin v.
Loewis) Date: Wed, 17 Jan 2001 00:54:14 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011622394806.00912@localhost.localdomain> (message from matt on Tue, 16 Jan 2001 22:34:23 +1300) References: <01011615263906.00889@localhost.localdomain> <877l3w3pwa.fsf@nwalsh.com> <01011622394806.00912@localhost.localdomain> Message-ID: <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de> > I was following the logic that ext.PrettyPrint can write to a stream That assumption is good, it indeed does. > and that it is useful to pick up a document that has escaped > data(which may be xml itself), add some nodes to it, and save it > back to the stream expecting the escaped sections to be still > present as escaped sections. That logic is flawed (or, there is no logic in it - that's just an assertion). Why is that useful? I.e. why would anybody who'll read the resulting document need to know where exactly the CDATA sections where located in the original document? > So what I understand now is that I should either use a serializer > that keeps these, or write a DTD and use that to write my xml back > out to file in a more proper way. I think your understanding is incorrect. It is not possible to write a serializer that produces the original input by just looking at the DOM tree, and having a DTD does not help at all, either. > Which I guess is my next question, what is the cleanest method in > PyXML for reading in such a file with CDATA sections, and getting > them back out when rewriting? The cleanest way is to accept that it is not possible to write the document back so that it equals the original document on a byte-by-byte basis. It is possible to write the document back so that the content is the same as in the original document; the cleanest way for that is to use ext.PrettyPrint. Regards, Martin P.S. What you *can* get back is CDATA sections for every text element, by properly inheriting from the PrettyPrinter. 
However, this will give you CDATA sections in places where the original
document had none.

From matt@virtualspectator.com Wed Jan 17 00:42:17 2001
From: matt@virtualspectator.com (matt)
Date: Wed, 17 Jan 2001 13:42:17 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de>
References: <01011622394806.00912@localhost.localdomain> <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de>
Message-ID: <0101171357420F.00889@localhost.localdomain>

On Wed, 17 Jan 2001, Martin v. Loewis wrote:
> > I was following the logic that ext.PrettyPrint can write to a stream
>
> That assumption is good, it indeed does.
>
> > and that it is useful to pick up a document that has escaped
> > data(which may be xml itself), add some nodes to it, and save it
> > back to the stream expecting the escaped sections to be still
> > present as escaped sections.
>
> That logic is flawed (or, there is no logic in it - that's just an
> assertion). Why is that useful? I.e. why would anybody who'll read the
> resulting document need to know where exactly the CDATA sections where
> located in the original document?

umm, I actually don't care where the CDATA sections are in the document.
I thought the most obvious scenario that I was alluding to is that one
reads in an xml document from a file. Since one has NO interest in
parsing the content, rendering, or interpreting it, but does have an
interest in locating a particular node and adding a new fragment to it,
then saving the modified document, via ext.PrettyPrint (which I am
using), to file again, then one obviously does not want CDATA markers to
be removed, because 1) they may not have written the first document, and
2) they are not trying to interpret it; this will be done at some later
stage, in which case one would use an event handler xml parser.
Considering DOM is useful for document assembly, I don't see any flaw in
this logic.
You missed the point entirely in that I don't care where they are in the document. > > > So what I understand now is that I should either use a serializer > > that keeps these, or write a DTD and use that to write my xml back > > out to file in a more proper way. > > I think your understanding is incorrect. It is not possible to write a > serializer that produces the original input by just looking at the DOM > tree, and having a DTD does not help at all, either. again you are on the wrong track ... I don't care about order ....... > > > Which I guess is my next question, what is the cleanest method in > > PyXML for reading in such a file with CDATA sections, and getting > > them back out when rewriting? > > The cleanest way is to accept that it is not possible to write the > document back so that it equals the original document on a > byte-by-byte basis. maybe the following will explain why it is useful ..... which is the hack I use to get CDATA back into the file again. Presumably you would think that if you opened an xml file into a DOM tree, then saved it again, then it would still be the same "kind" of document, i.e. CDATA nodes would STILL be CDATA nodes. Yes I assume 1) the node name is unique and 2) that it's first child is a text node ...... def convertTextNodeToCDataNodeByName(doc,name): node_list = doc.getElementsByTagNameNS('',name) text_node = node_list[0].firstChild text_data = retPrettyPrint(text_node) new_cdata_node = makeCDataSection(doc,text_data) text_node.parentNode.replaceChild(new_cdata_node,text_node) > > It is possible to write the document back so that the content is the > same as in the original document; the cleanest way for that is to use > ext.PrettyPrint. > > Regards, > Martin > > P.S. What you *can* get back is CDATA sections for every text element, > by properly inheriting from the PrettyPrinter. However, this will give > you CDATA sections in places where the original document had none. 
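The hack above can be sketched with the standard-library xml.dom.minidom in Python 3. This is an assumption: it is not the 4DOM API used in this thread, and text_to_cdata is a hypothetical name. Like the original, it assumes the element name is unique and that the element's first child is a text node:

```python
from xml.dom import minidom


def text_to_cdata(doc, tag_name):
    """Replace the first child text node of <tag_name> with a CDATA section.

    Hypothetical helper: assumes tag_name occurs once and that its first
    child is a text node.
    """
    element = doc.getElementsByTagName(tag_name)[0]
    text_node = element.firstChild
    # Same character data, only the node type (and so the escaping) changes.
    cdata_node = doc.createCDATASection(text_node.data)
    element.replaceChild(cdata_node, text_node)
    return doc


doc = minidom.parseString("<item><data>some &lt;test&gt; data</data></item>")
text_to_cdata(doc, "data")
serialized = doc.documentElement.toxml()
```

The character data of the element is unchanged by the swap; only the way the serializer escapes it differs, and the caller has to know in advance which element to convert.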
-- regards Matt From martin@mira.cs.tu-berlin.de Wed Jan 17 07:40:53 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 17 Jan 2001 08:40:53 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <0101171357420F.00889@localhost.localdomain> (message from matt on Wed, 17 Jan 2001 13:42:17 +1300) References: <01011622394806.00912@localhost.localdomain> <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de> <0101171357420F.00889@localhost.localdomain> Message-ID: <200101170740.f0H7era01202@mira.informatik.hu-berlin.de> > Since one has NO interest in parsing the content, rendering, or > interpreting it, but does have an interest in locating a particular > node and adding a new fragment to it, then saving the modifed > document, via ext.PrettyPrint(which I am using), to file again, I understand you are not interested in parsing the document; if you build a DOM tree, parsing of the document will happen as a side effect. You cannot avoid this: this is the only way to get a DOM tree from a document. So while you are not interested in the parsing, you should accept that it is done. > then one obviously does not want CDATA markers to be removed, > because, 1) they may have not written the first document, and 2) > they are not trying to interpret it, Who is "they" here? The CDATA markers? or the users of your tool? So somebody has not written the document, and that same person/entity/whatever is not trying to interpret it. Why does it follow that this person/entity does not want the CDATA markers to be removed? If that person does not even look at the document, why is there any harm done by removing the CDATA markers. They have *no* meaning in the document. > You missed the point entirely in that I don't care where they are in > the document. I assume "they" is the CDATA markers, here. If you don't care where they are in the document, why is it a problem if there is no CDATA marker in the output of PrettyPrint? 
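The claim that CDATA markers carry no meaning can be checked with a short sketch. It assumes Python's standard xml.dom.minidom (not the PyXML API discussed in the thread), and the element name msg and the helper text_of are hypothetical. It parses the same characters once from a CDATA section and once from entity-escaped text and compares what the DOM holds:

```python
from xml.dom import minidom


def text_of(xml_source):
    """Return the character data directly inside the document element."""
    doc = minidom.parseString(xml_source)
    # Text and CDATASection nodes both expose their characters as .data.
    return "".join(child.data for child in doc.documentElement.childNodes)


with_cdata = "<msg><![CDATA[&&<name><<]]></msg>"
with_entities = "<msg>&amp;&amp;&lt;name&gt;&lt;&lt;</msg>"

# Both spellings describe the same ten characters.
print(text_of(with_cdata) == text_of(with_entities))
```

Under these assumptions both inputs produce the same string, which is why a serializer is free to pick either form when writing the tree back out.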
> maybe the following will explain why it is useful ..... which is the > hack I use to get CDATA back into the file again. Presumably you > would think that if you opened an xml file into a DOM tree, then > saved it again, then it would still be the same "kind" of document, That I would think. It should still be the same "kind" of document, i.e. have the same elements, the elements should have the same attributes, and elements containing text should still contain the same text. > i.e. CDATA nodes would STILL be CDATA nodes. No, I would not think that. Changing CDATA nodes to text does not change the document; it is still the same one. Replacing CDATA fragments with text is the same kind of transformation as replacing &lt; with &#60; - this does not change the document. > Yes I assume 1) the node name is unique and 2) that it's first child is a > text node ...... > > def convertTextNodeToCDataNodeByName(doc,name): > node_list = doc.getElementsByTagNameNS('',name) > text_node = node_list[0].firstChild > text_data = retPrettyPrint(text_node) > new_cdata_node = makeCDataSection(doc,text_data) > text_node.parentNode.replaceChild(new_cdata_node,text_node) That means you know in advance that you only have a single CDATA fragment in the original document, you want to produce one in the output in the same location (i.e. inside the same element as it was in the original input). What if there is more than one CDATA section in the original document? What if there was none? Regards, Martin From matt@virtualspectator.com Wed Jan 17 09:14:49 2001 From: matt@virtualspectator.com (matt) Date: Wed, 17 Jan 2001 22:14:49 +1300 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <200101170740.f0H7era01202@mira.informatik.hu-berlin.de> References: <0101171357420F.00889@localhost.localdomain> <200101170740.f0H7era01202@mira.informatik.hu-berlin.de> Message-ID: <01011722360200.00860@localhost.localdomain> ok, so now I am getting somewhere in understanding this ....
more comments below On Wed, 17 Jan 2001, Martin v. Loewis wrote: > > Since one has NO interest in parsing the content, rendering, or > > interpreting it, but does have an interest in locating a particular > > node and adding a new fragment to it, then saving the modifed > > document, via ext.PrettyPrint(which I am using), to file again, > > I understand you are not interested in parsing the document; if you > build a DOM tree, parsing of the document will happen as a side > effect. You cannot avoid this: this is the only way to get a DOM tree > from a document. So while you are not interested in the parsing, you > should accept that it is done. This is where I see the extra step that is necessary, so tell me if I am on the right track. A CDATA section that contains xml will be translated by a parser into a text node that is still valid by virtue of the character references that it places in place of characters such as "<" ... i.e. <, and that for example if they wrote some naff xml in an input , eg "&&<<" this, if escaped in the original document by CDAT, would be translated into a text node with "&&<name><<". Now if that CDATA was supposed to be xml as well, but was necessarily hidden for a while so that validation could be performed further along a processing chain, then I also need to write a processor to replace the character references, in which case I could possibly define s for such a translation, so that the parser would see < instead of < > > > then one obviously does not want CDATA markers to be removed, > > because, 1) they may have not written the first document, and 2) > > they are not trying to interpret it, > > Who is "they" here? The CDATA markers? or the users of your tool? > many people who pick up a document and modify it and put it back. > So somebody has not written the document, and that same > person/entity/whatever is not trying to interpret it. Why does it > follow that this person/entity does not want the CDATA markers to be > removed? 
If that person does not even look at the document, why is > there any harm done by removing the CDATA markers. They have *no* > meaning in the document. Just the above, one wants to take the CDATA at some point and treat it as either an xml document on its own, or just part of the current xml document. The CDATA simply being used to escape sections that could possibly break validation at earlier points, eg on a server, where there may be no chance of handling bad xml sections, but that at a later point, eg some client application, then an exception can be handled nicely, in which case the CDATA section can now be safely interpreted. This is where I see I need reverse translation, and simply cannot directly parse what use to be a CDATA section. > > > You missed the point entirely in that I don't care where they are in > > the document. > > I assume "they" is the CDATA markers, here. If you don't care where > they are in the document, why is it a problem if there is no CDATA > marker in the output of PrettyPrint? as above > > > maybe the following will explain why it is useful ..... which is the > > hack I use to get CDATA back into the file again. Presumably you > > would think that if you opened an xml file into a DOM tree, then > > saved it again, then it would still be the same "kind" of document, > > That I would think. It should still be the same "kind" of document, > i.e. have the same elements, the elements should have the same > attributes, and elements containing text should still contain the same > text. > > > i.e. CDATA nodes would STILL be CDATA nodes. > > No, I would not think that. Changing CDATA nodes to text does not > change the document; it is still the same one. Replacing CDATA > fragments with text is the same kind of transformation as replacing > < with < - this does not change the document. > > > Yes I assume 1) the node name is unique and 2) that it's first child is a > > text node ...... 
> > > > def convertTextNodeToCDataNodeByName(doc,name): > > node_list = doc.getElementsByTagNameNS('',name) > > text_node = node_list[0].firstChild > > text_data = retPrettyPrint(text_node) > > new_cdata_node = makeCDataSection(doc,text_data) > > text_node.parentNode.replaceChild(new_cdata_node,text_node) > > That means you know in advance that you only have a single CDATA > fragment in the original document, you want to produce one in the > output in the same location (i.e. inside the same element as it was in > the original input). > > What if there is more than one CDATA section in the original document? > What if there was none? > I already do checking for it being a text node and the node names that are searched for are gauranteed to be unique and to be a single child node. > Regards, > Martin -- From martin@mira.cs.tu-berlin.de Wed Jan 17 17:47:19 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 17 Jan 2001 18:47:19 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011722360200.00860@localhost.localdomain> (message from matt on Wed, 17 Jan 2001 22:14:49 +1300) References: <0101171357420F.00889@localhost.localdomain> <200101170740.f0H7era01202@mira.informatik.hu-berlin.de> <01011722360200.00860@localhost.localdomain> Message-ID: <200101171747.f0HHlJU00867@mira.informatik.hu-berlin.de> > A CDATA section that contains xml The entire document is xml; you probably mean "A CDATA section that contains markup delimiters" here. A CDATA section, by definition, contains only characters. It never contains markup. > will be translated by a parser into a text node that is still valid > by virtue of the character references that it places in place of > characters such as "<" ... i.e. <, and that for example if they > wrote some naff xml in an input , eg "&&<<" this, if escaped > in the original document by CDAT, would be translated into a text > node with "&&<name><<". Not exactly. 
Character entities will be replaced with their true characters in the DOM tree, i.e. the CDATA section will appear in the DOM tree as a text node with its contents; a text containing "&lt;" in the input will be translated to "<" when creating the DOM tree. It is the *output* function that does any necessary escaping. So when the CDATA section contained a literal "<", then, on output, the pretty printer has the option of generating &lt; or &#60; or a CDATA section. > Now if that CDATA was supposed to be xml as well, but was > necessarily hidden for a while so that validation could be performed > further along a processing chain, It seems you are trying to use XML in a way not supported by any standard. If you have a CDATA section, it contains characters by definition; you can't suppose that these characters are markup. > then I also need to write a processor to replace the character > references, in which case I could possibly define <!ENTITY>s for > such a translation, so that the parser would see < instead of &lt; No. Each conforming XML parser knows that &lt; represents "<" - you don't need to supply an entity definition for that. It also knows that "<" cannot be represented as "<" in text; section 2.4 of the recommendation clearly says # The ampersand character (&) and the left angle bracket (<) may # appear in their literal form only when used as markup delimiters, or # within a comment, a processing instruction, or a CDATA section. ... # If they are needed elsewhere, they must be escaped using either # numeric character references or the strings "&amp;" and "&lt;" # respectively. So when generating XML, a conforming processor will only emit "<" outside a CDATA section to mean the markup delimiter. > Just the above, one wants to take the CDATA at some point and treat > it as either an xml document on its own, or just part of the current > xml document. That is not supported by the XML recommendation. A CDATA section only contains characters, not markup.
So if you treat CDATA sections in any other way, you violate the XML recommendation. > The CDATA simply being used to escape sections that could possibly > break validation at earlier points, eg on a server, where there may > be no chance of handling bad xml sections, but that at a later > point, eg some client application, then an exception can be handled > nicely, in which case the CDATA section can now be safely > interpreted. This is where I see I need reverse translation, and > simply cannot directly parse what use to be a CDATA section. You need to invented a new markup language for that kind of processing; XML does not support such a kind of interpretation of a document. Regards, Martin From matt@virtualspectator.com Wed Jan 17 21:03:32 2001 From: matt@virtualspectator.com (matt) Date: Thu, 18 Jan 2001 10:03:32 +1300 Subject: [XML-SIG] CDATA sections still not handled Message-ID: <01011810040608.00856@localhost.localdomain> hmm, I'm off track again .... On Thu, 18 Jan 2001, you wrote: > > A CDATA section that contains xml > > The entire document is xml; you probably mean > > "A CDATA section that contains markup delimiters" > > here. A CDATA section, by definition, contains only characters. It > never contains markup. > > > will be translated by a parser into a text node that is still valid > > by virtue of the character references that it places in place of > > characters such as "<" ... i.e. <, and that for example if they > > wrote some naff xml in an input , eg "&&<<" this, if escaped > > in the original document by CDAT, would be translated into a text > > node with "&&<name><<". > > Not exactly. Character entities will be replaced with their true > characters in the DOM tree, i.e. the CDATA section will appear in the > DOM tree as a text node with its contents; a text containing "<" in > the input will be translated to "<" when creating the DOM tree. 
> This translation obviously happens after validation, since invalid xml like data in CDATA will never be validated against. Which is what I want. > It is the *output* function that does any necessary escaping. So when > the CDATA section contained a literal "<", then, on output, the pretty > printer has the option of generating < or < or a CDATA section. > > > Now if that CDATA was supposed to be xml as well, but was > > necessarily hidden for a while so that validation could be performed > > further along a processing chain, > > It seems you are trying to use XML in a way not supported by any > standard. If you have a CDATA section, it contains characters by > definition; you can't suppose that these characters are markup. I don't suppose they are, I know they are. > > > then I also need to write a processor to replace the character > > references, in which case I could possibly define s for > > such a translation, so that the parser would see < instead of < > > No. Each conforming XML parser knows that < represents "<" - you > don't need to supply a entity definition for that. It also knows that > "<" cannot be represented as "<" in text; section 2.4 of the > recommendation clearly says > > # The ampersand character (&) and the left angle bracket (<) may > # appear in their literal form only when used as markup delimiters, or > # within a comment, a processing instruction, or a CDATA section. ... > # If they are needed elsewhere, they must be escaped using either > # numeric character references or the strings "&" and "<" > # respectively. > > So when generating XML, a conforming processor will only emit "<" > outside a CDATA section to mean the markup delimiter. > > > Just the above, one wants to take the CDATA at some point and treat > > it as either an xml document on its own, or just part of the current > > xml document. > > That is not supported by the XML recommendation. A CDATA section only > contains characters, not markup. 
So if you treat CDATA sections in any > other way, you violate the XML recommendation. ummm, here is another confusing part ... the following is from the xml specification : 2.7 CDATA Sections [Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":] ummm, so can you be clearer about my apparent violation of CDATA by putting xml like data in it? > > > The CDATA simply being used to escape sections that could possibly > > break validation at earlier points, eg on a server, where there may > > be no chance of handling bad xml sections, but that at a later > > point, eg some client application, then an exception can be handled > > nicely, in which case the CDATA section can now be safely > > interpreted. This is where I see I need reverse translation, and > > simply cannot directly parse what use to be a CDATA section. > > You need to invented a new markup language for that kind of > processing; XML does not support such a kind of interpretation of a > document. No I don't, because it works fine when the CDATA labels are kept, but you are also saying that a parser can/should translate the character references such as "&lt;", and looking at expat, it does, so, well, it seems to work perfectly fine. But now I am interested why this is a violation. A perfectly acceptable use is that one uses xml to wrap a message, which itself may be xml, but it is up to the message interpreter later on to figure out if it is valid.
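The wrapping use described above can be sketched as follows, assuming Python's standard xml.dom.minidom and hypothetical element names (envelope, body, note). The outer parser only ever sees the payload as character data; a second, separate parse decides later whether that payload is well-formed XML:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def extract_payload(envelope_source):
    """Parse the outer document and return the <body> content as plain text."""
    doc = minidom.parseString(envelope_source)
    body = doc.getElementsByTagName("body")[0]
    return "".join(child.data for child in body.childNodes)


def parse_payload(payload):
    """Try to parse the payload as its own XML document; None if it is not well-formed."""
    try:
        return minidom.parseString(payload)
    except ExpatError:
        return None


envelope = "<envelope><body><![CDATA[<note><to>Matt</to></note>]]></body></envelope>"
payload = extract_payload(envelope)   # characters only, no markup yet
inner = parse_payload(payload)        # markup only for the second parser
bad = parse_payload("&&<name><<")     # not well-formed, so None
```

The envelope parses the same way whether the body was written with a CDATA section or with entity escapes, so the receiving side does not depend on the markers surviving a round trip through a DOM.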
> > Regards, > Martin > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig regards Matt ------------------------------------------------------- -- Matt Halstead (PhD) Research and development VirtualSpectator http://www.virtualspectator.com ph 64-9-9136896 From martin@mira.cs.tu-berlin.de Wed Jan 17 21:57:18 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 17 Jan 2001 22:57:18 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011810040608.00856@localhost.localdomain> (message from matt on Thu, 18 Jan 2001 10:03:32 +1300) References: <01011810040608.00856@localhost.localdomain> Message-ID: <200101172157.f0HLvIS01251@mira.informatik.hu-berlin.de> > This translation obviously happens after validation, since invalid xml like > data in CDATA will never be validated against. Which is what I want. I'm telling you: the data in CDATA are is just character text, not markup. So no matter what text you put in there, it is always well-formed and valid (unless it violates the document charset). > > It seems you are trying to use XML in a way not supported by any > > standard. If you have a CDATA section, it contains characters by > > definition; you can't suppose that these characters are markup. > > I don't suppose they are, I know they are. Maybe in your understanding of how your application should work. Not in XML. > 2.7 CDATA Sections > > [Definition: CDATA sections may occur anywhere character data may occur; > they are used to escape blocks of text containing characters which would > otherwise be recognized as markup. CDATA sections begin with the > string "":] > > ummm, so can you be clearer about my apparent violation of CDATA by > putting xml like data in it? It is completely well-formed to put "xml-like" data into a CDATA section. 
However, an application that suddenly "turns" those data into markup by removing the CDATA markers violates XML; it appears that your application is supposed to operate in such a way. IOW, the data might look like xml. When they are in a CDATA section, they are not markup. Trying to see them as markup at some point and not as markup at some other point means to read something into the XML standard that is not there. > > You need to invented a new markup language for that kind of > > processing; XML does not support such a kind of interpretation of a > > document. > > > No I don't, because it works fine when the CDATA label are kept, but you are > also saying that a parser can/should translate the character references > such as "<", and looking at expat, it does, so, well, it seems to work > perfectly fine. To be precise, I'm saying it can. It might chose to keep the generate rougly the same, or even more, CDATA sections on output as well. >But now I am interested why this is a violation. A perfectly >acceptable use is that one uses xml to wrap a message, which itself >may be xml, but ut is up to the message interpreter later on to >figure out if it valid. It's not a violation to put "xml like" data into a CDATA section, but they are just plain character data. I said # So if you treat CDATA sections in any other way, you violate the XML # recommendation. *That* is something you cannot expect to work. Regards, Martin From matt@virtualspectator.com Wed Jan 17 23:11:26 2001 From: matt@virtualspectator.com (matt) Date: Thu, 18 Jan 2001 12:11:26 +1300 Subject: Fwd: Re: [XML-SIG] CDATA sections still not handled Message-ID: <01011812115302.00886@localhost.localdomain> Now I see where you are coming from. No I don't expect anything to suddenly see xml where CDATA was and interpret it within the same context of the document containing this node. All I am saying is that xml documant A holds a node B. 
Node B happens to contain some xml, because that is part of a message format. A doesn't need to know about the form of B, in only so far as it is CDATA and therefore it should not try to validate it as xml if it contains xml markup, but it will validate the character set, as, yes it is character data. At some point a process picks up A, searches for node B, extracts it, does NOT assume it is xml, but will look through it for any xml that exists, If it finds some then it validates it ... which means that section will be cut out ans passed to an xml parser. The important thing that I think I understand is the following : Any xml in the CDATA section doesn't need to look like xml to the human reader. A parser however, when handling a text node may do the following : a) if the tag CDATA is still there, then call handlers for the start and and CDATA sections, and pass the character data(which may contain markup explicitly) to the character data handler. b) if the CDATA tags are not there, then it will/needs to be represented as character references, such as < and one needs to make sure that it is translated, either by the parser or by the process reading it into the correct characters before being passed to a stream for later processing and possibly validation. On Thu, 18 Jan 2001, you wrote: > > This translation obviously happens after validation, since invalid xml like > > data in CDATA will never be validated against. Which is what I want. > > I'm telling you: the data in CDATA are is just character text, not > markup. So no matter what text you put in there, it is always > well-formed and valid (unless it violates the document charset). > so what's this then ? <<, but we don't want to validate this yet]]> looks like markup inside CDATA to me .... I think you actually mean "unescaped" character data does not contain markup, eg : < is certainly not markup. > > > It seems you are trying to use XML in a way not supported by any > > > standard. 
If you have a CDATA section, it contains characters by > > > definition; you can't suppose that these characters are markup. > > > > I don't suppose they are, I know they are. > > Maybe in your understanding of how your application should work. Not > in XML. what would you say to someone wanting to let other people put html formatting in text node data, but knowing that html is often not written as valid xml, then escaping it is a safe bet .... > > > 2.7 CDATA Sections > > > > [Definition: CDATA sections may occur anywhere character data may occur; > > they are used to escape blocks of text containing characters which would > > otherwise be recognized as markup. CDATA sections begin with the > > string "":] > > > > > ummm, so can you be clearer about my apparent violation of CDATA by > > putting xml like data in it? > > It is completely well-formed to put "xml-like" data into a CDATA > section. However, an application that suddenly "turns" those data into > markup by removing the CDATA markers violates XML; it appears that > your application is supposed to operate in such a way. Nope, nowhere near what I am trying to do. A and B are independent.(see above) > > IOW, the data might look like xml. When they are in a CDATA section, > they are not markup. Trying to see them as markup at some point and > not as markup at some other point means to read something into the XML > standard that is not there. ..... makes my html example look wrong, yet it is a common use for CDATA. > > > > You need to invented a new markup language for that kind of > > > processing; XML does not support such a kind of interpretation of a > > > document. > > > > > > No I don't, because it works fine when the CDATA label are kept, but you are > > also saying that a parser can/should translate the character references > > such as "<", and looking at expat, it does, so, well, it seems to work > > perfectly fine. > > To be precise, I'm saying it can. 
It might chose to keep the generate > rougly the same, or even more, CDATA sections on output as well. > > >But now I am interested why this is a violation. A perfectly > >acceptable use is that one uses xml to wrap a message, which itself > >may be xml, but ut is up to the message interpreter later on to > >figure out if it valid. > > It's not a violation to put "xml like" data into a CDATA section, but > they are just plain character data. I said > > # So if you treat CDATA sections in any other way, you violate the XML > # recommendation. > > *That* is something you cannot expect to work. > All I originally wanted was for CDATA tags to remain in place so that at some point, when looking at B, one could actually look for the markup tags. Now that I know these are often reverse translated when character data is handles then that is fine(I know they are with expat). regards Matt > Regards, > Martin ------------------------------------------------------- -- Matt Halstead (PhD) Research and development VirtualSpectator http://www.virtualspectator.com ph 64-9-9136896 From ken@bitsko.slc.ut.us Wed Jan 17 23:32:09 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 17 Jan 2001 17:32:09 -0600 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: "Martin v. Loewis"'s message of "Wed, 17 Jan 2001 22:57:18 +0100" References: <01011810040608.00856@localhost.localdomain> <200101172157.f0HLvIS01251@mira.informatik.hu-berlin.de> Message-ID: Matt, If I understand this thread correctly, it's the common "how do I pass XML inside XML" question. CDATA sections are not relevant to this question. 
These two XML fragments are equivalent for all practical purposes: <my-tag><![CDATA[Some <tags> &amp; &entities; inside XML]]></my-tag> <my-tag>Some &lt;tags&gt; &amp;amp; &amp;entities; inside XML</my-tag> In both cases your application will see: startElement() with element name 'my-tag' characters() with data "Some <tags> &amp; &entities; inside XML" endElement() with element name 'my-tag' That the data "is" XML is also not relevant to this question, it could be any type of data that contains markup characters. If you want to "do something with the XML" inside the XML, the easiest way is to use another instance of a parser to parse the string as XML. If you are interested in preserving the fact that the original file used a CDATA section to escape the markup, instead of entities to escape the markup, I believe SAX2 does provide that information, but you need to evaluate whether or not that really does what you want. Besides downplaying CDATA sections, a SAX parser is going to normalize a lot of other characters from the original file before it passes it to you, in such a way that you really can't reproduce the original file. Does that help? -- Ken From ndw@nwalsh.com Thu Jan 18 08:08:05 2001 From: ndw@nwalsh.com (Norman Walsh) Date: 18 Jan 2001 15:08:05 +0700 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011722360200.00860@localhost.localdomain> References: <0101171357420F.00889@localhost.localdomain> <200101170740.f0H7era01202@mira.informatik.hu-berlin.de> <01011722360200.00860@localhost.localdomain> Message-ID: <87k87t8hrm.fsf@nwalsh.com> / matt was heard to say: | On Wed, 17 Jan 2001, Martin v. Loewis wrote: [...] | > I understand you are not interested in parsing the document; if you | > build a DOM tree, parsing of the document will happen as a side | > effect. You cannot avoid this: this is the only way to get a DOM tree | > from a document. So while you are not interested in the parsing, you | > should accept that it is done.
| | This is where I see the extra step that is necessary, so tell me if | I am on the right track. I'm not trying to be pedantic, it just looks that way :-) | A CDATA section that contains xml will be translated by a parser A CDATA section cannot contain XML. It contains text, with a particular form of escaping. | into a text node that is still valid by virtue of the character | references that it places in place of characters such as "<" | ... i.e. &lt;, and that for example if they wrote some naff xml in | an input , eg "&&<name><<" this, if escaped in the original document | by CDAT, would be translated into a text node with | "&amp;&amp;&lt;name&gt;&lt;&lt;". I think about this in a different way. Parsing a document that contains <![CDATA[&&<name><<]]> produces an XML information set that includes a text node that contains the Unicode characters "&" "&" "<" "n" "a" "m" "e" ">" "<" "<" These characters are not escaped in any way. If the processor subsequently has reason to serialize the text node in question, it may use any (or all) of the following mechanisms to do so: 1. CDATA sections 2. The predefined entities &lt; and &amp; 3. Using numeric character references, &#60; and &#38; (in either decimal or hex). If the document is known to have additional entity declarations associated with it, these entities may also be used (for example, &gt;). | Now if that CDATA was supposed to be | xml as well, but was necessarily hidden for a while so that | validation could be performed further along a processing chain, then | I also need to write a processor to replace the character | references, in which case I could possibly define <!ENTITY>s for | such a translation, so that the parser would see < instead of &lt; There's no easy means to "unescape" these characters in an XML processor. You can do it with Python, or some other non-XML string processing language, and you could do it with XSLT using disable-output-escaping (in some limited circumstances). | many people who pick up a document and modify it and put it back.
Assuming I haven't made any typos, the following serializations of a text node: <![CDATA[&&<name><<]]> &amp;&amp;&lt;name&gt;&lt;&lt; &#38;&#38;&#60;name&#62;&#60;&#60; <![CDATA[&&]]>&lt;name&gt;<![CDATA[<<]]> are indistinguishable to an XML processor. It *doesn't matter* what escaping mechanism you use, unless you are including non-XML processors. If you're using non-XML processors, you may care about the escaping, but XML isn't designed to help you with that problem. (And you may care about other things that XML can't help you with, like the serialization order of attributes.) | Just the above, one wants to take the CDATA at some point and treat it as | either an xml document on its own, or just part of the current xml document. | The CDATA simply being used to escape sections that could possibly break | validation at earlier points, eg on a server, where there may be no chance of | handling bad xml sections, but that at a later point, eg some client | application, then an exception can be handled nicely, in which case the CDATA | section can now be safely interpreted. This is where I see I need reverse | translation, and simply cannot directly parse what use to be a CDATA section. Don't do that. I'm serious. You don't say exactly what problem you're trying to solve, but the solution you're outlining is ugly and fragile. (IMHO, naturally.) Be seeing you, norm -- Norman Walsh | Life is a great bundle of little http://nwalsh.com/ | things.--Oliver Wendell Holmes From matt@virtualspectator.com Thu Jan 18 12:20:52 2001 From: matt@virtualspectator.com (matt) Date: Fri, 19 Jan 2001 01:20:52 +1300 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: References: <01011810040608.00856@localhost.localdomain> <200101172157.f0HLvIS01251@mira.informatik.hu-berlin.de> Message-ID: <01011901332205.00859@localhost.localdomain> On Thu, 18 Jan 2001, Ken MacLeod wrote: > Matt, > > If I understand this thread correctly, it's the common "how do I pass > XML inside XML" question. sort of ... but that will answer it too.
>
> CDATA sections are not relevant to this question. These two XML
> fragments are equivalent for all practical purposes:
>
> <my-tag><![CDATA[Some <tags> & &entities; inside XML]]></my-tag>
>
> <my-tag>Some &lt;tags&gt; &amp; &amp;entities; inside XML</my-tag>
>
> In both cases your application will see:
>
> startElement() with element name 'my-tag'
> characters() with data "Some <tags> & &entities; inside XML"
> endElement() with element name 'my-tag'
>

Yes, yes, that is what I have been trying to say. CDATA just lets it remain
human readable in the original document. But once through a DOM
implementation all that is gone, and you get the second option back out.
Which is fine w.r.t. parsing down the line, but not much fun when perusing
modified documents.

>
> That the data "is" XML is also not relevant to this question, it could
> be any type of data that contains markup characters.

Yes, I also include program fragments sometimes ..... so that's another good
example.

>
> If you want to "do something with the XML" inside the XML, the easiest
> way is to use another instance of a parser to parse the string as XML.
>

Yep, I mentioned that in about my second email, that some "other" process
will be the thing that reads this data, "possibly" validating it if it
indeed needs to.

> If you are interested in preserving the fact that the original file
> used a CDATA section to escape the markup, instead of entities to
> escape the markup, I believe SAX2 does provide that information, but
> you need to evaluate whether or not that really does what you want.
> Besides downplaying CDATA sections, a SAX parser is going to normalize
> a lot of other characters from the original file before it passes it
> to you, in such a way that you really can't reproduce the original
> file.

Yes, I found that both fortunate and unfortunate.
I now see that if I want my data to remain clean, in the sense that I can
still look at it and read it with some ease, then I need to write my own
reverse-translation method, rewrap those text data nodes with CDATA
tags again, and save that document.

>
> Does that help?

Yes, very much so, it means I WAS on the right track, and that it IS normal
to want to put xml or xml-like data within an xml document and not have it
parsed for well-formedness. Maybe I am a rare exception where my translated
CDATA, i.e. in 'entity references', just looks such a nightmare to read
through. Keeping the original characters speeds debugging of contained data
immensely.

>
> -- Ken
>

thanks

regards

Matt

> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--

From matt@virtualspectator.com Thu Jan 18 09:27:46 2001
From: matt@virtualspectator.com (matt)
Date: Thu, 18 Jan 2001 22:27:46 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <87k87t8hrm.fsf@nwalsh.com>
References: <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com>
Message-ID: <01011823353400.00859@localhost.localdomain>

... comments throughout ...

On Thu, 18 Jan 2001, Norman Walsh wrote:
> / matt was heard to say:
> | On Wed, 17 Jan 2001, Martin v. Loewis wrote:
> [...]
> | > I understand you are not interested in parsing the document; if you
> | > build a DOM tree, parsing of the document will happen as a side
> | > effect. You cannot avoid this: this is the only way to get a DOM tree
> | > from a document. So while you are not interested in the parsing, you
> | > should accept that it is done.
> |
> | This is where I see the extra step that is necessary, so tell me if
> | I am on the right track.
>
> I'm not trying to be pedantic, it just looks that way :-)
>
> | A CDATA section that contains xml will be translated by a parser
>
> A CDATA section cannot contain XML.
> It contains text, with a
> particular form of escaping.

Ok, so now I am being pedantic, but this is good, I'm getting a clearer idea
of xml usage. My entry to xml has been recent and only from the building side
of documents, but now that I have to process them heavily it's nice to reason
out these things.

From what I am seeing it seems CDATA can hold anything it wants, within the
constraints of the character encoding set. Say I formed my own language that
happened to use things like "<" very often, then CDATA seems to give me an
"initial" way to write this in a plain, raw form, without translating it to
entity references first. This is nice, since your new language section within
the xml document is still human readable. It won't matter which way you go
from the point of view of the parser, because, for example, expat will
recognize it as character data by virtue of the CDATA escaping, or by the
alternative replacement of all xml markup in that section by entity
references. There is no way around the fact that CDATA allows you to write
xml, programming code, ..... whatever you want inside CDATA. The parser will
NOT try to parse it. For all I care, I could have encoded it with BASE64
..... I don't need it to be parsed as part of the document.

>
> | into a text node that is still valid by virtue of the character
> | references that it places in place of characters such as "<"
> | ... i.e. &lt;, and that for example if they wrote some naff xml in
> | an input , eg "&&<name><<" this, if escaped in the original document
> | by CDATA, would be translated into a text node with
> | "&amp;&amp;&lt;name&gt;&lt;&lt;".
>
> I think about this in a different way. Parsing a document that contains
> <![CDATA[&&<name><<]]> produces an XML information set that includes
> a text node that contains the Unicode characters
>
> "&" "&" "<" "n" "a" "m" "e" ">" "<" "<"
>
> These characters are not escaped in any way.
Nope, not after they have been parsed, but they certainly were when they
were part of the CDATA section in the original document. As the specification
says, CDATA sections are used to ESCAPE blocks of text containing characters
which would otherwise be recognized as markup. More on this below ....

>
> If the processor subsequently has reason to serialize the text node
> in question, it may use any (or all) of the following mechanisms to
> do so:
>
> 1. CDATA sections
> 2. The predefined entities &lt; and &amp;
> 3. Using numeric character references, &#60; and &#38; (in either
> decimal or hex).
>
> If the document is known to have additional entity declarations associated
> with it, these entities may also be used (for example, >).
>
> | Now if that CDATA was supposed to be
> | xml as well, but was necessarily hidden for a while so that
> | validation could be performed further along a processing chain, then
> | I also need to write a processor to replace the character
> | references, in which case I could possibly define <!ENTITY>s for
> | such a translation, so that the parser would see < instead of &lt;
>
> There's no easy means to "unescape" these characters in an XML
> processor. You can do it with Python, or some other non-XML string
> processing language, and you could do it with XSLT using
> disable-output-escaping (in some limited circumstances).
>
> | many people who pick up a document and modify it and put it back.
>
> Assuming I haven't made any typos, the following serializations of a
> text node:
>
> <![CDATA[&&<name><<]]>
> &amp;&amp;&lt;name>&lt;&lt;
> &#38;&#38;&#60;name>&#60;&#60;
> <![CDATA[&&]]>&lt;name><![CDATA[<<]]>
>
> are indistinguishable to an XML processor.

yes, I realize that.

> It *doesn't matter* what
> escaping mechanism you use, unless you are including non-XML
> processors. If you're using non-XML processors, you may care about
> the escaping, but XML isn't designed to help you with that problem.
> (And you may care about other things that XML can't help you with,
> like the serialization order of attributes.)
>
> | Just the above, one wants to take the CDATA at some point and treat it as
> | either an xml document on its own, or just part of the current xml document.
> | The CDATA simply being used to escape sections that could possibly break
> | validation at earlier points, eg on a server, where there may be no chance of
> | handling bad xml sections, but that at a later point, eg some client
> | application, then an exception can be handled nicely, in which case the CDATA
> | section can now be safely interpreted. This is where I see I need reverse
> | translation, and simply cannot directly parse what used to be a CDATA section.
>
> Don't do that. I'm serious. You don't say exactly what problem you're
> trying to solve, but the solution you're outlining is ugly and
> fragile. (IMHO, naturally.)

No it's not. If I put base64 encoded gzip compressed versions of the same
"escaped xml fragments" that I want to hide, then that would seem to make you
happy. These xml documents are a transport, and when a transport is
interpreted then certain tags may mean do something with the character data
of this node. All seems pretty normal to me.

For example, say one wants to transport html. Now html is usually really ugly
in that it is hardly ever well formed xml. Escaping it with CDATA is an easy
way to hide that, and giving that data to an html renderer some time later
would be fine. Being in CDATA, it is never parsed for "well formedness".

Of course now I understand that a DOM implementation will remove CDATA tags
and replace all character data between them with entity references where
necessary. If this is then persisted to disk and later parsed with an xml
handler, then the real characters will come back out again in the character
stream for the text node. So that is fine too, I get back what I put in, and
who cares whether it was xml, or someone's program code.

So the conclusion is that CDATA is just a useless feature if you are parsing
it into a DOM tree.
All it gives you is a free way of translating markup to entity references.
That is nice in that sense, but not so nice in that you have now rendered
your previously escaped sections not very human readable anymore. And this
can be a problem. If someone complains that, for example, their message,
which was transported via some transport xml, looked weird, and all that you
had was the raw transport packets on your server, then if things are still
wrapped in nice CDATA tags then you can easily look through it and find the
improper formatting in the message. However, if the message has been
translated into entity references, then forget it, you may as well be looking
at binary in a hex editor in some instances.

regards

Matt

>
> Be seeing you,
> norm
>
> --
> Norman Walsh | Life is a great bundle of little
> http://nwalsh.com/ | things.--Oliver Wendell Holmes

From jday@csihq.com Thu Jan 18 16:26:30 2001
From: jday@csihq.com (John Day)
Date: Thu, 18 Jan 2001 11:26:30 -0500
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011823353400.00859@localhost.localdomain>
References: <87k87t8hrm.fsf@nwalsh.com> <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com>
Message-ID: <4.3.1.0.20010118112124.00cf3810@mail.csihq.com>

At 10:27 PM 1/18/01 +1300, matt wrote:
>weird, and all that you had was the raw transport packets on your server, then
>if things are still wrapped in nice CDATA tags then you can easily look
>through it and find the improper formatting in the message. However, if the

Matt,

I think most of your problem is caused by viewing CDATA as a kind of markup
tag. It's not. Your problem is easily solved by inventing some real XML tag
to wrap around your 'encoded' data, e.g.

<html> {HTML-encoded-as-CDATA-or-whatever} </html>

Then you won't care how the html is handled but you can still extract all
of the precisely because it's marked up by 'real' tags.

John Day
Staff Scientist
Computer Science Innovations

From ndw@nwalsh.com Thu Jan 18 16:54:38 2001
From: ndw@nwalsh.com (Norman Walsh)
Date: 18 Jan 2001 23:54:38 +0700
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011823353400.00859@localhost.localdomain>
References: <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com> <01011823353400.00859@localhost.localdomain>
Message-ID: <87ae8on6lt.fsf@nwalsh.com>

/ matt was heard to say:
| happened to use things like "<" very often, then CDATA seems to give me an
| "initial" way to write this in a plain, raw form, without translating it to
| entity references first.

In the interest of technical accuracy, I'll point out that there's nothing
that says a processor is not allowed to use CDATA to escape text. (It might
be an interesting switch on a serializer: use CDATA for any text node that
contains more than 5% entity references or something...)

| > Don't do that. I'm serious. You don't say exactly what problem you're
| > trying to solve, but the solution you're outlining is ugly and
| > fragile. (IMHO, naturally.)
|
| No it's not. If I put base64 encoded gzip compressed versions of the same
| "escaped xml fragments" that I want to hide, then that would seem to make you
| happy. These xml documents are a transport, and when a transport is interpreted
| then certain tags may mean do something with the character data of this node.
| All seems pretty normal to me.

Ok, perhaps I overstated the case. I should have said something like "in
most cases that's going to be ugly and fragile".

XML isn't particularly good at wrapping up other chunks of XML. Using
CDATA sections is dangerous if there's any chance that the text you're
wrapping up might contain "]]>". For example, if one of the documents
that you're wrapping up has its own CDATA section.

| through it and find the improper formatting in the message.
| However, if the
| message has been translated into entity references, then forget it, you may as
| well be looking at binary in a hex editor in some instances.

Yes. That's a problem. Maybe you need that special-purpose serializer
I alluded to above.

Be seeing you,
norm

--
Norman Walsh | It is not impossibilities which fill us
http://nwalsh.com/ | with the deepest despair, but
                   | possibilities which we have failed to
                   | realize.--Robert Mallet

From iron@mso.oz.net Thu Jan 18 16:54:32 2001
From: iron@mso.oz.net (Mike Orr)
Date: Thu, 18 Jan 2001 08:54:32 -0800
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011823353400.00859@localhost.localdomain>; from matt@virtualspectator.com on Thu, Jan 18, 2001 at 10:27:46PM +1300
References: <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com> <01011823353400.00859@localhost.localdomain>
Message-ID: <20010118085431.A15316@mso.oz.net>

On Thu, Jan 18, 2001 at 10:27:46PM +1300, matt wrote:
> For example, say one wants to transport html.
> Now html is usually really ugly in that it is hardly ever well formed xml.
> Escaping it with CDATA is an easy way to hide that, and giving that data to an
> html renderer some time later would be fine. Being in CDATA, it is never
> parsed for "well formedness".

I was just about to suggest looking at it this way. If you have a set
of records and a certain tag contains HTML, which you don't want to
un-CDATA-ize because the (human) editor doesn't want to see or type
&lt;H1&gt; .

Three other questions. Are there certain tags that will always be CDATA,
or does it differ randomly from document to document? Do you care
whether your application changes the whitespace outside that CDATA
section, making an "equivalent" document? Or do you want the
indentation and all to remain exactly as it is?

If you know that a certain tag should always be CDATA, and you're
willing to settle for an "equivalent" document otherwise, then maybe
it doesn't matter that the parser normalizes CDATA on input,
because you can write it out manually and convert that tag body to CDATA.

If the CDATA sections will be coming in at random and you must leave
the document formatted exactly as it is (minus whatever changes your
application is supposed to be making to it), then perhaps you need a
lower-level parser than full XML. Perhaps then you'll want to consider
modifying one of the existing XML parser classes or the sgmllib parser
to fit your needs.

--
-Mike (Iron) Orr, iron@mso.oz.net (if mail problems: mso@jimpick.com)
http://mso.oz.net/ English * Esperanto * Russkiy * Deutsch * Espan~ol

From ndw@nwalsh.com Thu Jan 18 16:58:58 2001
From: ndw@nwalsh.com (Norman Walsh)
Date: 18 Jan 2001 23:58:58 +0700
Subject: Fwd: Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011812115302.00886@localhost.localdomain>
References: <01011812115302.00886@localhost.localdomain>
Message-ID: <8766jcn6el.fsf@nwalsh.com>

/ matt was heard to say:
|
|
| <<, but we don't
| want to validate this yet]]>
|
|
| looks like markup inside CDATA to me .... I think you actually mean
| "unescaped" character data does not contain markup, eg : &lt; is certainly not
| markup.

Yes, it looks like markup to you because you're a human being. At
least, I think you are. Maybe you're just an NSA machine that passes
the turing test, I dunno. Then again, maybe that's all I am, so
nevermind. It does not look like markup to the XML processor.

| what would you say to someone wanting to let other people put html formatting
| in text node data, but knowing that html is often not written as valid xml,
| then escaping it is a safe bet ....

I see your point, but I warned you that we were in danger of pedantry.
:-)

Be seeing you,
norm

--
Norman Walsh | Do not seek to follow in the footsteps
http://nwalsh.com/ | of men of old; seek what they
                   | sought.--Matsuo Basho

From matt@virtualspectator.com Thu Jan 18 20:15:13 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:15:13 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <4.3.1.0.20010118112124.00cf3810@mail.csihq.com>
References: <87k87t8hrm.fsf@nwalsh.com> <4.3.1.0.20010118112124.00cf3810@mail.csihq.com>
Message-ID: <01011909164601.00874@localhost.localdomain>

On Fri, 19 Jan 2001, John Day wrote:
>
> At 10:27 PM 1/18/01 +1300, matt wrote:
> >weird, and all that you had was the raw transport packets on your server, then
> >if things are still wrapped in nice CDATA tags then you can easily look
> >through it and find the improper formatting in the message. However, if the
>
> Matt,
>
> I think most of your problem is caused by viewing CDATA as a kind of markup
> tag. It's not. Your problem is easily solved by inventing some real XML tag
> to wrap around your 'encoded' data, e.g.
>
> <html> {HTML-encoded-as-CDATA-or-whatever} </html>
>
> Then you won't care how the html is handled but you can still extract all
> of the precisely because it's marked up by 'real' tags.

I do that already .... I usually wrap all messages with .... . I certainly
don't use CDATA as an identifier. Any DOM implementation that would allow me
to do that would be wrong in doing so.
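
[Editorial sketch, not part of the original thread.] The point made several times above, that a payload wrapped in a real element comes back as the same characters whether it was written with CDATA or with entity references, can be checked with a few lines of present-day Python. This is only an illustration under assumptions: it uses the Python 3 standard library's xml.dom.minidom rather than the 2001 PyXML tools, and the element names "packet" and "message" and the sample payload are made up for the example.

```python
from xml.dom import minidom

# Two transport documents carrying the same payload. The element names
# "packet" and "message" are hypothetical, chosen only for this sketch.
WITH_CDATA = (
    "<packet><message><![CDATA[<H1>hi &amp; bye</H1>]]></message></packet>"
)
WITH_ENTITIES = (
    "<packet><message>&lt;H1&gt;hi &amp;amp; bye&lt;/H1&gt;</message></packet>"
)


def payload(document_text, tag):
    """Return the character data directly inside the first <tag> element."""
    doc = minidom.parseString(document_text)
    element = doc.getElementsByTagName(tag)[0]
    # Text and CDATASection nodes both expose .data; join them all so the
    # result does not depend on how the parser split the text.
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )


cdata_payload = payload(WITH_CDATA, "message")
entity_payload = payload(WITH_ENTITIES, "message")
print(cdata_payload == entity_payload)

# A second parser instance can then treat the extracted string as XML,
# provided the payload happens to be well formed.
inner_text = payload(cdata_payload, "H1")
print(inner_text)
```

If the payload is sloppy HTML rather than well-formed XML, the second parse would fail, and the extracted string would instead be handed to an HTML renderer, as suggested in the thread.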
>
> John Day
> Staff Scientist
> Computer Science Innovations

From matt@virtualspectator.com Thu Jan 18 20:17:46 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:17:46 +1300
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <20010118085431.A15316@mso.oz.net>
References: <01011823353400.00859@localhost.localdomain> <20010118085431.A15316@mso.oz.net>
Message-ID: <01011909403002.00874@localhost.localdomain>

On Fri, 19 Jan 2001, Mike Orr wrote:
> On Thu, Jan 18, 2001 at 10:27:46PM +1300, matt wrote:
> > For example, say one wants to transport html.
> > Now html is usually really ugly in that it is hardly ever well formed xml.
> > Escaping it with CDATA is an easy way to hide that, and giving that data to an
> > html renderer some time later would be fine. Being in CDATA, it is never
> > parsed for "well formedness".
>
> I was just about to suggest looking at it this way. If you have a set
> of records and a certain tag contains HTML, which you don't want to
> un-CDATA-ize because the (human) editor doesn't want to see or type
> &lt;H1&gt; .

Exactly.

>
> Three other questions. Are there certain tags that will always be CDATA,
> or does it differ randomly from document to document? Do you care
> whether your application changes the whitespace outside that CDATA
> section, making an "equivalent" document? Or do you want the
> indentation and all to remain exactly as it is?

Hmm, no, in my most common case, whitespace is not an issue, eg: html being
transported, but in some instances keeping the correct whitespace within
messages may be useful .... eg : when it is program code, where this could be
a) critical to preserving scope, or b) again the human readability factor.
In any case the message is between message tags, eg : , so it doesn't matter
if there are numerous CDATA sections within it, which would be the case if
one was to append more data to the message instead of doing a node replace.

>
> If you know that a certain tag should always be CDATA, and you're
> willing to settle for an "equivalent" document otherwise, then maybe
> it doesn't matter that the parser normalizes CDATA on input,
> because you can write it out manually and convert that tag body to CDATA.

That is what I currently do, and it works really well, and preserves my
sanity server side.

>
> If the CDATA sections will be coming in at random and you must leave
> the document formatted exactly as it is (minus whatever changes your
> application is supposed to be making to it), then perhaps you need a
> lower-level parser than full XML. Perhaps then you'll want to consider
> modifying one of the existing XML parser classes or the sgmllib parser
> to fit your needs.

That would defeat my intention of using xml from the point of view that it is
a standard.

What you raise though is interesting. If I go full circle and readdress my
original question, that "CDATA sections are still not handled", then I was
just wondering: since one gets CDATA begin and end events while parsing a
document that contains a CDATA section, why couldn't the DOM document still
represent it as a CDATA section internally, as it was when first created?
Furthermore, a parser such as expat will preserve the original form of the
characters that have been escaped, and even convert them if they happened to
be in entity references. It seems to me that the handling of CDATA sits at
the level of its base class, which is a text node, and that the CDATA
sections are only used to say "don't validate the following, it is ALL
character data".
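
[Editorial sketch, not part of the original thread.] The "CDATA begin and end events" mentioned above can be observed directly with expat. The following is a minimal illustration under assumptions: it uses the xml.parsers.expat module from the present-day Python 3 standard library, the helper name text_runs and the sample document are invented for the example, and element boundaries are deliberately ignored to keep it short.

```python
from xml.parsers import expat


def text_runs(document_text):
    """Return (kind, text) pairs, where kind is 'cdata' or 'text'.

    Hypothetical helper: it records only character data and whether it
    arrived inside a CDATA section, ignoring which element it belongs to.
    """
    runs = []
    state = {"in_cdata": False}

    def start_cdata():
        state["in_cdata"] = True

    def end_cdata():
        state["in_cdata"] = False

    def characters(data):
        kind = "cdata" if state["in_cdata"] else "text"
        # expat may deliver one run in several pieces; merge neighbours
        # of the same kind (adjacent CDATA sections would merge too).
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + data)
        else:
            runs.append((kind, data))

    parser = expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = characters
    parser.Parse(document_text, True)
    return runs


runs = text_runs(
    "<message>see <![CDATA[<H1>title</H1>]]> and &lt;p&gt;</message>"
)
for kind, text in runs:
    print(kind, repr(text))
```

A builder that wants to remember where the CDATA sections were only needs this extra flag alongside the character data it already receives; whether a given DOM implementation keeps that information is a separate question.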
>
> --
> -Mike (Iron) Orr, iron@mso.oz.net (if mail problems: mso@jimpick.com)
> http://mso.oz.net/ English * Esperanto * Russkiy * Deutsch * Espan~ol
--

regards

Matt

From matt@virtualspectator.com Thu Jan 18 20:47:57 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:47:57 +1300
Subject: thread 2) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <87ae8on6lt.fsf@nwalsh.com>
References: <01011823353400.00859@localhost.localdomain> <87ae8on6lt.fsf@nwalsh.com>
Message-ID: <01011909515003.00874@localhost.localdomain>

On Fri, 19 Jan 2001, Norman Walsh wrote:
> / matt was heard to say:
> | happened to use things like "<" very often, then CDATA seems to give me an
> | "initial" way to write this in a plain, raw form, without translating it to
> | entity references first.
>
> In the interest of technical accuracy, I'll point out that there's nothing
> that says a processor is not allowed to use CDATA to escape text. (It might
> be an interesting switch on a serializer: use CDATA for any text node that
> contains more than 5% entity references or something...)
>
> | > Don't do that. I'm serious. You don't say exactly what problem you're
> | > trying to solve, but the solution you're outlining is ugly and
> | > fragile. (IMHO, naturally.)
> |
> | No it's not. If I put base64 encoded gzip compressed versions of the same
> | "escaped xml fragments" that I want to hide, then that would seem to make you
> | happy. These xml documents are a transport, and when a transport is interpreted
> | then certain tags may mean do something with the character data of this node.
> | All seems pretty normal to me.
>
> Ok, perhaps I overstated the case. I should have said something like "in
> most cases that's going to be ugly and fragile".
>
> XML isn't particularly good at wrapping up other chunks of XML. Using
> CDATA sections is dangerous if there's any chance that the text you're
> wrapping up might contain "]]>".
> For example, if one of the documents
> that you're wrapping up has its own CDATA section.

Is it perhaps cleaner to use xlinks for the message nodes? I haven't used
these yet, but I gather it would separate transport from message. Though to
maintain performance a server would have to parse it first to see what to
transport in the same network connection.

>
> | through it and find the improper formatting in the message. However, if the
> | message has been translated into entity references, then forget it, you may as
> | well be looking at binary in a hex editor in some instances.
>
> Yes. That's a problem. Maybe you need that special-purpose serializer
> I alluded to above.
>
> Be seeing you,
> norm
>
> --
> Norman Walsh | It is not impossibilities which fill us
> http://nwalsh.com/ | with the deepest despair, but
> | possibilities which we have failed to
> | realize.--Robert Mallet
--

From matt@virtualspectator.com Thu Jan 18 20:53:29 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:53:29 +1300
Subject: thread 3) Re: Fwd: Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <8766jcn6el.fsf@nwalsh.com>
References: <01011812115302.00886@localhost.localdomain> <8766jcn6el.fsf@nwalsh.com>
Message-ID: <01011909565004.00874@localhost.localdomain>

On Fri, 19 Jan 2001, Norman Walsh wrote:
> / matt was heard to say:
> |
> |
> | <<, but we don't
> | want to validate this yet]]>
> |
> |
> | looks like markup inside CDATA to me .... I think you actually mean
> | "unescaped" character data does not contain markup, eg : &lt; is certainly not
> | markup.
>
> Yes, it looks like markup to you because you're a human being.

That is exactly my purpose.

> At least, I think you are. Maybe you're just an NSA machine that passes
> the turing test, I dunno. Then again, maybe that's all I am, so
> nevermind. It does not look like markup to the XML processor.
> | what would you say to someone wanting to let other people put html formatting
> | in text node data, but knowing that html is often not written as valid xml,
> | then escaping it is a safe bet ....
>
> I see your point, but I warned you that we were in danger of pedantry. :-)

The last thing I want is for the xml to become a mess, so pedantry is good.
Perhaps it will force me to keep these messages separate from the transport
and instead just place references within the document.

>
> Be seeing you,
> norm
>
> --
> Norman Walsh | Do not seek to follow in the footsteps
> http://nwalsh.com/ | of men of old; seek what they
> | sought.--Matsuo Basho
--

From iron@mso.oz.net Thu Jan 18 21:11:29 2001
From: iron@mso.oz.net (Mike Orr)
Date: Thu, 18 Jan 2001 13:11:29 -0800
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011909403002.00874@localhost.localdomain>; from matt@virtualspectator.com on Fri, Jan 19, 2001 at 09:17:46AM +1300
References: <01011823353400.00859@localhost.localdomain> <20010118085431.A15316@mso.oz.net> <01011909403002.00874@localhost.localdomain>
Message-ID: <20010118131129.A17157@mso.oz.net>

On Fri, Jan 19, 2001 at 09:17:46AM +1300, matt wrote:
> > Perhaps then you'll want to consider
> > modifying one of the existing XML parser classes or the sgmllib parser
> > to fit your needs.
>
> That would defeat my intention of using xml from the point of view that it is
> a standard.

The purpose of XML is to provide data interchange between diverse
applications. If your application somehow produces a valid XML file,
that should be enough. Of course, if your program may be expanded
later by XML programmers, you'll want something familiar enough that they
can work with it. But trying to contort your application to work with
the standard xml modules if they weren't designed for that job may not
be the answer.

> why couldn't the DOM
> document still represent it as a CDATA section internally?

Do you necessarily need DOM?
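
[Editorial sketch, not part of the original thread.] For the "write it out manually" route that needs no DOM at all, the one trap is the "]]>" case raised earlier in the thread. A common way around it is to end the CDATA section in the middle of the "]]>" and immediately start a new one. The sketch below assumes present-day Python 3; the function names to_cdata and wrap_message and the "message" tag are hypothetical, and xml.etree.ElementTree is used only to check the round trip.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so no section ends early."""
    # "]]>" becomes "]]", then end of section, start of a new section, ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def wrap_message(text, tag="message"):
    """Build a tiny transport document; the tag name is only an example."""
    return "<%s>%s</%s>" % (tag, to_cdata(text), tag)


# A payload that itself contains a CDATA section.
inner = "<doc><![CDATA[a < b]]></doc>"
transport = wrap_message(inner)
print(transport)

# Any XML parser should hand back the original characters unchanged.
recovered = ET.fromstring(transport).text
print(recovered == inner)
```

The result stays readable on the wire except at the split points, which is roughly the trade-off the "special-purpose serializer" idea implies.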

--
-Mike (Iron) Orr, iron@mso.oz.net (if mail problems: mso@jimpick.com)
http://mso.oz.net/ English * Esperanto * Russkiy * Deutsch * Espan~ol

From jeremy.kloth@fourthought.com Fri Jan 19 03:09:41 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Thu, 18 Jan 2001 20:09:41 -0700
Subject: [XML-SIG] Announcing PyXPath 1.2
References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de>
Message-ID: <3A67AFF5.F0895522@fourthought.com>

"Martin v.
Loewis" wrote:

> module XPath{
>
>   typedef wstring DOMString;
>
>   const unsigned short ABSOLUTE_LOCATION_PATH = 1;
>   const unsigned short ABBREVIATED_ABSOLUTE_LOCATION_PATH = 2;
>   const unsigned short RELATIVE_LOCATION_PATH = 3;
>   const unsigned short ABBREVIATED_RELATIVE_LOCATION_PATH = 4;
>   const unsigned short STEP_EXPR = 5; // STEP would conflict with Step in case
>   const unsigned short NODE_TEST = 6;
>   const unsigned short NAME_TEST = 7;
>   const unsigned short BINARY_EXPR = 8;

Since there are two basic types of binary expressions, I suggest splitting
this into a BOOLEAN_EXPR and NUMERIC_EXPR. They do offer quite different
functionality.

>   const unsigned short UNARY_EXPR = 9;

This would be considered a NUMERIC_EXPR.

>   const unsigned short PATH_EXPR = 10;
>   const unsigned short ABBREVIATED_PATH_EXPR = 11; // filter '//' path
>   const unsigned short FILTER_EXPR = 12;
>   const unsigned short VARIABLE_REFERENCE = 13;
>   const unsigned short LITERAL_EXPR = 14;
>   const unsigned short NUMBER_EXPR = 15;
>   const unsigned short FUNCTION_CALL = 16;
>
>   interface Expr{
>     readonly attribute unsigned short exprType;
>   };
>
>   interface AbsoluteLocationPath;
>   interface AbbreviatedAbsoluteLocationPath;
>   interface RelativeLocationPath;
>   interface Step;
>   interface AxisSpecifier;
>   interface NodeTest;
>   typedef sequence<Expr> PredicateList, ExprList;
>   interface NameTest;
>   interface BinaryExpr;
>   interface UnaryExpr;
>   interface UnionExpr;
>   interface PathExpr;
>   interface FilterExpr;
>   interface VariableReference;
>   interface Literal;
>   interface Number;
>   interface FunctionCall;
>
>   interface ExprFactory{
>     AbsoluteLocationPath createAbsoluteLocationPath(in RelativeLocationPath p);
>     AbsoluteLocationPath createAbbreviatedAbsoluteLocationPath(in RelativeLocationPath p);
>     RelativeLocationPath createRelativeLocationPath(in RelativeLocationPath left,
>                                                     in Step right);
>     RelativeLocationPath createAbbreviatedRelativeLocationPath(in RelativeLocationPath left,
>                                                                in Step right);
>
>     Step createStep(in AxisSpecifier axis, in NodeTest test, in PredicateList predicates);
>     // . is represented as self::node(); .. as parent::node()
>     Step createAbbreviatedStep(in boolean dotdot); // false for .; true for ..
>     // An omitted axisname is created as CHILD; @ is created as ATTRIBUTE
>
>     AxisSpecifier createAxisSpecifier(in unsigned short name);
>
>     NodeTest createNodeTest(in unsigned short type);
>     NameTest createNameTest(in DOMString prefix, in DOMString localName);
>
>     BinaryExpr createBinaryExpr(in unsigned short operator, in Expr left, in Expr right);
>
>     UnaryExpr createUnaryExpr(in Expr exp);
>

See above for Binary and Unary expressions.

>     PathExpr createPathExpr(in Expr filter, in Expr path);
>     // filter '//' path
>     PathExpr createAbbreviatedPathExpr(in Expr filter, in Expr path);
>
>     FilterExpr createFilterExpr(in Expr filter, in Expr predicate);
>
>     // the name must still contain the leading $
>     VariableReference createVariableReference(in DOMString name);

name can be a qualified name. Use prefix, localName.

>
>     Literal createLiteral(in DOMString literal);
>     Number createNumber(in DOMString value);
>     FunctionCall createFunctionCall(in DOMString name, in ExprList args);

See createVariableReference.

>   };
>
>   interface Parser{
>     Expr parseLocationPath(in DOMString path); // returns absolute or relative path, or step
>   };

This should probably be parseExpression, since the Expr is the primary
construct. (See XPath spec - sect 1)

>
>   interface AbsoluteLocationPath:Expr{
>     /* '/' relative-opt, or '//' relative */
>     readonly attribute Expr relative; // step or relative path

relative may be null (case of '/')

>   };
>
>   interface RelativeLocationPath:Expr{
>     readonly attribute Expr left; // step or relative path
>     readonly attribute Step right;
>   };
>
>   interface Step:Expr{
>     readonly attribute AxisSpecifier axis;
>     readonly attribute NodeTest test;
>     readonly attribute PredicateList predicates;
>   };
>
>   const unsigned short ANCESTOR = 1;
>   const unsigned short ANCESTOR_OR_SELF = 2;
>   const unsigned short _ATTRIBUTE = 3; // attribute is a keyword
>   const unsigned short CHILD = 4;
>   const unsigned short DESCENDANT = 5;
>   const unsigned short DESCENDANT_OR_SELF = 6;
>   const unsigned short FOLLOWING = 7;
>   const unsigned short FOLLOWING_SIBLING = 8;
>   const unsigned short NAMESPACE = 9;
>   const unsigned short PARENT = 10;
>   const unsigned short PRECEDING = 11;
>   const unsigned short PRECEDING_SIBLING = 12;
>   const unsigned short SELF = 13;

Maybe suffix the types with '_AXIS'?

>   interface AxisSpecifier:Expr{
>     readonly attribute unsigned short name;

Should we use axisType just for consistency?

>   };
>
>   const unsigned short COMMENT = 1;
>   const unsigned short TEXT = 2;
>   const unsigned short PROCESSING_INSTRUCTION = 3;
>   const unsigned short NODE = 4;

suffix of '_NODE_TEST' ??

>   interface NodeTest:Expr{
>     readonly attribute unsigned short test;

testType ??
> readonly attribute DOMString literal; // only for PROCESSING_INSTRUCTION > }; > > interface NameTest:Expr{ > readonly attribute DOMString prefix; // may be null > readonly attribute DOMString localName; // may be "*" > }; > > const unsigned short BINOP_OR = 1; > const unsigned short BINOP_AND = 2; > const unsigned short BINOP_EQ = 3; > const unsigned short BINOP_NEQ = 4; > const unsigned short BINOP_LT = 5; > const unsigned short BINOP_GT = 6; > const unsigned short BINOP_LE = 7; > const unsigned short BINOP_GE = 8; > const unsigned short BINOP_PLUS = 9; > const unsigned short BINOP_MINUS = 10; > const unsigned short BINOP_TIMES = 11; > const unsigned short BINOP_DIV = 12; > const unsigned short BINOP_MOD = 13; > const unsigned short BINOP_UNION = 14; possibly ??_OPERATOR as apposed to BINOP_?? > interface BinaryExpr:Expr{ > readonly attribute unsigned short operator; > readonly attribute Expr left,right; > }; > > UnaryExpr createUnaryExpr(in Expr exp); > See factory functions above. > interface PathExpr:Expr{ > readonly attribute Expr filter; > readonly attribute Expr path; > }; > > interface FilterExpr:Expr{ > readonly attribute Expr filter; > readonly attribute Expr predicate; > }; > > interface VariableReference:Expr{ > readonly attribute DOMString name; > }; > > interface Literal:Expr{ > readonly attribute DOMString value; > }; > > interface Number:Expr{ > readonly attribute double value; > }; > > interface FunctionCall:Expr{ > readonly attribute DOMString name; > readonly attribute ExprList args; > }; > > }; > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. 
http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jeremy.kloth@fourthought.com Fri Jan 19 03:18:28 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Thu, 18 Jan 2001 20:18:28 -0700 Subject: [XML-SIG] Announcing PyXPath 1.2 References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de> Message-ID: <3A67B204.4CC0CABD@fourthought.com> "Martin v. Loewis" wrote: > > The API is IDL based, which is meant in the same way as in the DOM: > there is a (yet to be specified) mapping to Python, which roughly > works that way: > - global constants are defined in the module xml.xpath. > - DOMString means Unicode objects, although normal strings should > be accepted were possible. > - attributes are accessed as attributes; _get_ accessor functions > are optional. Should the constants be defined where they are used? The expression types in the Expr interface, axis specifier types in the AxisSpecifier interface, node test types in the NodeTest interface, This would be similar to node types in Node, filter types in NodeFilter. A benefit from this would be helping to avoiding circular imports. xml.xpath -> (the parser) -> ExprFactory -> (constants from xml.xpath) Sure imports could be done in the functions, but top level imports offer some speed improvements. Slight, but every little bit helps. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Anthony Baxter Fri Jan 19 09:43:13 2001 From: Anthony Baxter (Anthony Baxter) Date: Fri, 19 Jan 2001 09:43:13 +0000 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: Message from Norman Walsh of "18 Jan 2001 23:54:38 +0700." 
<87ae8on6lt.fsf@nwalsh.com> Message-ID: <200101190943.UAA04290@mbuna.arbhome.com.au> >>> Norman Walsh wrote > In the interest of technical accuracy, I'll point out that there's nothing > that says a processor is not allowed to use CDATA to escape text. (It might > be an interesting switch on a serializer: use CDATA for any text node that > contains more than 5% entity references or something...) That was something that occurred to me when reading this thread - aside from the file size issue, it's also going to be faster to write out and read in the documents. Ok, this is assuming a fairly odd slab of text, but hey, look at the number of tags in your average web page today - including slabs of them as text is going to hurt. The readability is surely only an issue if you're editing the XML directly in vi, or whatever, I can't see an XML-aware editor leaving the text as entities... -- Anthony Baxter It's never too late to have a happy childhood. From paulp@ActiveState.com Fri Jan 19 04:01:59 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 18 Jan 2001 20:01:59 -0800 Subject: [XML-SIG] Re: adding the XML to 2.0 to be a mistake? References: <3d8zocuqrd.fsf@kronos.cnri.reston.va.us> Message-ID: <3A67BC37.213BAACB@ActiveState.com> Andrew Kuchling wrote: > > John Schmitt writes: > > Pardon the ignorance, but where is the mistake? Is it in adding PyXML to > > 2.0 or is it the way it was done? Is there no development strategy that > > makes this less of a burden? If a previous release of PyXML had been added > > to 2.0, would you still consider it a mistake? > > Duplicating complex code in two different projects, so that they have > to be kept in sync manually at the cost of time and effort, is the > mistake. I agree with this. I don't think that minidom should have an existence independent of Python. The PyXML minidom should be phased out. The only reason it was not is because some people still use it with older versions of Python. 
But that will always be a problem when code is moved from an "extension" environment to the standard library. > Another one is tying a fast-moving project such as PyXML to > the slower releases of Python; Python 2.0 was released on October 16, > and there have been two PyXML releases (0.6.2 and 0.6.3) since then. I don't know what you mean by saying that PyXML is "tied to Python." PyXML depends on Python, just as PIL and NumPy do. Paul Prescod From martin@mira.cs.tu-berlin.de Fri Jan 19 09:04:13 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 19 Jan 2001 10:04:13 +0100 Subject: [XML-SIG] Announcing PyXPath 1.2 In-Reply-To: <3A67B204.4CC0CABD@fourthought.com> (message from Jeremy Kloth on Thu, 18 Jan 2001 20:18:28 -0700) References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de> <3A67B204.4CC0CABD@fourthought.com> Message-ID: <200101190904.f0J94DV00897@mira.informatik.hu-berlin.de> Hi Jeremy, Thanks for your comments. I'll study them in detail later. > Should the constants be defined where they are used? I think the DOM is proof that this is not desirable. If constants are defined in an interface, applications have to know the names of the interface implementation classes. In the case of the DOM, we just solved this by providing xml.dom.Node in the package, which *just* contains the node type constants. That, in turn, required to rename 4DOM's Node.py to FtNode. > The expression types in the Expr interface, > axis specifier types in the AxisSpecifier interface, > node test types in the NodeTest interface, To get a true separation of interface and implementation, the base package would need to provide xml.xpath.Expr.RELATIVE_LOCATION_PATH - how else are applications supposed to refer to these constants? > A benefit from this would be helping to avoiding circular imports. 
> > xml.xpath -> (the parser) -> ExprFactory -> (constants from xml.xpath) > Sure imports could be done in the functions, but top level imports offer > some speed improvements. Slight, but every little bit helps. I don't see how it would remove circular imports: the constants would still live in xml.xpath.__init__.py. Also, circular imports are not a problem per se: __init__ just needs to guarantee that the constants (and anything else provided to implementations) is defined before anything originating from an implementation is imported. Perhaps it would be even better *not* to provide xml.xpath.{parser|factory}, but to require the user to explicitly specify the implementation to use: from xml.xpath.FtFactory import factory from xml.xpath.PyXPath import parser or, with the "pick an arbitrary one" API from xml.xpath.anyfactory import factory from xml.xpath.anyparser import parser Regards, Martin From larsga@garshol.priv.no Fri Jan 19 09:15:48 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Jan 2001 10:15:48 +0100 Subject: [XML-SIG] CDATA sections still not handled In-Reply-To: <87ae8on6lt.fsf@nwalsh.com> References: <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com> <01011823353400.00859@localhost.localdomain> <87ae8on6lt.fsf@nwalsh.com> Message-ID: * Norman Walsh | | In the interest of technical accuracy, I'll point out that there's | nothing that says a processor is not allowed to use CDATA to escape | text. (It might be an interesting switch on a serializer: use CDATA | for any text node that contains more than 5% entity references or | something...) I think giving serializers a switch similar to that used by the XSLT serializers would be a good idea: a list of elements, the contents of which will be wrapped in CDATA sections. --Lars M. 
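[Editorial sketch] The serializer switch Lars describes (a list of element names whose character data is written as CDATA sections, as with XSLT's cdata-section-elements) can be approximated with the standard library's xml.dom.minidom. This is a minimal sketch, not code from PyXML or 4Suite: the helper name wrap_text_in_cdata and the sample document are made up for illustration, and text containing "]]>" is not handled.

```python
from xml.dom import minidom


def wrap_text_in_cdata(doc, cdata_elements):
    """Replace text children of the named elements with CDATA section nodes.

    Hypothetical helper: the tree is rewritten just before serialization,
    so the choice of CDATA vs. escaped text stays a serializer concern.
    """
    for name in cdata_elements:
        for elem in doc.getElementsByTagName(name):
            # Copy the child list because we modify it while iterating.
            for child in list(elem.childNodes):
                if child.nodeType == child.TEXT_NODE:
                    cdata = doc.createCDATASection(child.data)
                    elem.replaceChild(cdata, child)
    return doc


doc = minidom.parseString(
    "<page><title>a &lt; b</title><code>if a &lt; b &amp;&amp; c</code></page>"
)
wrap_text_in_cdata(doc, ["code"])
output = doc.documentElement.toxml()
print(output)
```

Only the elements named in the list change form; everything else is still escaped with entity references, and a parser reading the result back sees the same character data either way.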
From larsga@garshol.priv.no Fri Jan 19 09:27:02 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Jan 2001 10:27:02 +0100 Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled In-Reply-To: <01011909403002.00874@localhost.localdomain> References: <01011823353400.00859@localhost.localdomain> <20010118085431.A15316@mso.oz.net> <01011909403002.00874@localhost.localdomain> Message-ID: * matt@virtualspectator.com | | [...] since one gets CDATA begin and end events while parsing a | document that contains CDATA section, then why couldn't the DOM | document still represent it as a CDATA section internally? Because it would be a real pain, and would most likely break lots of applications. If text nodes can suddenly be represented as both text and cdata nodes, applications that only test for text nodes (and I assume this is the majority) will be silently losing data. Furthermore, the normalize method, which many applications use to ensure that there are no adjacent text nodes in the DOM tree stops working in the presence of cdata nodes, since these are not normalized. | Furthermore, a parser such as expat will preserve the original form | of the characters that have been escaped, and even convert them if | they happened to be in entity references. What are you trying to say here? | It seems to me that the handling of CDATA sits at the level of it's | base class which is a text node and that the CDATA sections are only | used to say "don't validate the following, it is ALL character | data".. CDATA sections and ordinary 'text'[1] are just two ways to represent the same thing, and applications should not care which of the two ways have been used. The distinction between these two ways of representing character data is information about how the document was put together, as opposed to information about what is in the document. 
In other words, this issue is really the same as the issues 'white space in tags is lost', 'I can't tell what character data came from numeric character references' and so on. I think your current way of handling it, to control what is represented as CDATA in the serializer, is the correct way to do it. One should consider very carefully before adding information of this sort to the document tree (or event stream), because there is such an unbelievably awful lot of it that it needs to be handled with the greatest of care. I have been thinking lately that it would be an interesting experiment to make an XML parser with an interface specialized for representing ALL the lexical information about a document. I guess this could be done by passing along with every event the list of tokens that made up that event. --Lars M. [1] Correct terminology is really to call it character data. Text, as defined by XML, is both markup and character data. From akuchlin@mems-exchange.org Fri Jan 19 18:31:26 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 19 Jan 2001 13:31:26 -0500 Subject: [XML-SIG] Re: adding the XML to 2.0 to be a mistake? In-Reply-To: <3A67BC37.213BAACB@ActiveState.com>; from paulp@activestate.com on Thu, Jan 18, 2001 at 08:01:59PM -0800 References: <3d8zocuqrd.fsf@kronos.cnri.reston.va.us> <3A67BC37.213BAACB@ActiveState.com> Message-ID: <20010119133126.A875@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 08:01:59PM -0800, Paul Prescod wrote: >I agree with this. I don't think that minidom should have an existence >independent of Python. The PyXML minidom should be phased out. The only That won't work, because the _xmlplus package overrides the xml/ package completely, and therefore has to keep copies of everything in Python's package, so we're stuck with the duplication. >I don't know what you mean by saying that PyXML is "tied to Python." >PyXML depends on Python, just as PIL and NumPy do. 
I should have been clearer and said that effectively its release schedule is tied to Python. --amk From matt@virtualspectator.com Fri Jan 19 20:19:33 2001 From: matt@virtualspectator.com (matt) Date: Sat, 20 Jan 2001 09:19:33 +1300 Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled In-Reply-To: References: <01011909403002.00874@localhost.localdomain> Message-ID: <01012009443202.00856@localhost.localdomain> Sorry to keep this thread going, but now it's getting really interesting .... and useful. On Fri, 19 Jan 2001, Lars Marius Garshol wrote: > * matt@virtualspectator.com > | > | [...] since one gets CDATA begin and end events while parsing a > | document that contains CDATA section, then why couldn't the DOM > | document still represent it as a CDATA section internally? > > Because it would be a real pain, and would most likely break lots of > applications. If text nodes can suddenly be represented as both text > and cdata nodes, applications that only test for text nodes (and I > assume this is the majority) will be silently losing data. That would make either the implementation of CDATA wrong, or the way you use it. Text nodes are base classes of CDATA, so process that works on text nodes will implicitly work on CDATA nodes .... which it does fortunately. Even if you try a type cast to assert this you should get a valid base class pointer back .... not that python on it's face worries too much about that. Otherwise I am confused as to what you mean. It seems to me anyway that everyone has been trying to make the argument that they are one in the same, which they are in the interpretation sense. A parser such as expat handles the inheritance perfectly since for a CDATA section it will give you CDATA begin and end events while passing the data itself into character data handlers. I don't see things breaking anywhere. 
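[Editorial sketch] Two of the claims above can be checked with the modern standard library, assuming xml.parsers.expat and xml.dom.minidom behave as they do in current Python 3 (this is not the PyXML 0.6.x code discussed in the thread). The sample document is invented for illustration.

```python
import xml.parsers.expat
from xml.dom import minidom

events = []

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # merge adjacent character data chunks
parser.StartCdataSectionHandler = lambda: events.append(("cdata-start",))
parser.EndCdataSectionHandler = lambda: events.append(("cdata-end",))
parser.CharacterDataHandler = lambda data: events.append(("chars", data))

# The same "<" character is written once as an entity reference and once
# inside a CDATA section.
parser.Parse("<a>x &lt; y<![CDATA[ p < q ]]></a>", True)

# All character data arrives through the one character data handler.
text = "".join(e[1] for e in events if e[0] == "chars")

# In minidom, the CDATA section node class derives from the text node class.
cdata_is_text = issubclass(minidom.CDATASection, minidom.Text)

print(events)
print(cdata_is_text)
```

The CDATA boundaries show up only as separate start and end events; the data itself is delivered the same way as ordinary character data, which is the behaviour both sides of the argument rely on.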
> > Furthermore, the normalize method, which many applications use to > ensure that there are no adjacent text nodes in the DOM tree stops > working in the presence of cdata nodes, since these are not > normalized. Perhaps the specification for normalize on a node's sub-tree is wrong, or you expect it to always give you a nice single replacement node. I think it is equally wrong to flatly remove all CDATA nodes without giving the user a handle to keep them. They serve a useful purpose, and it seems bizarre that the DOM document builder just throws away the events that tell us we have come across a CDATA node. Perhaps it should sit at the level of normalize itself .... pass an extra optional argument that translates CDATA nodes and therefore includes them in the merge? > > | Furthermore, a parser such as expat will preserve the original form > | of the characters that have been escaped, and even convert them if > | they happened to be in entity references. > > What are you trying to say here? That it doesn't matter which way you represent any "hidden" markup, e.g. as &lt; or as < within a CDATA section: expat will give '<' to the character data handler. Which is useful. > > | It seems to me that the handling of CDATA sits at the level of it's > | base class which is a text node and that the CDATA sections are only > | used to say "don't validate the following, it is ALL character > | data".. > > CDATA sections and ordinary 'text'[1] are just two ways to represent > the same thing, and applications should not care which of the two ways > have been used. The distinction between these two ways of representing > character data is information about how the document was put together, > as opposed to information about what is in the document. > > In other words, this issue is really the same as the issues 'white > space in tags is lost', 'I can't tell what character data came from > numeric character references' and so on.
> > I think your current way of handling it, to control what is > represented as CDATA in the serializer, is the correct way to do it. > One should consider very carefully before adding information of this > sort to the document tree (or event stream), because there is such an > unbelievably awful lot of it that it needs to be handled with the > greatest of care. > But when you build a CDATA section in a DOM document you get a CDATA section object, which I assume, should inherit a Text node object. > I have been thinking lately that it would be an interesting experiment > to make an XML parser with an interface specialized for representing > ALL the lexical information about a document. I guess this could be > done by passing along with every event the list of tokens that made up > that event. What sort of representation? > > --Lars M. > > [1] Correct terminology is really to call it character data. Text, as > defined by XML, is both markup and character data. > yes .... but since Text nodes inherit character data I just left that alone ...... regards Matt > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig From dan.rolander@marriott.com Sat Jan 20 04:39:03 2001 From: dan.rolander@marriott.com (Dan Rolander) Date: Fri, 19 Jan 2001 23:39:03 -0500 Subject: [XML-SIG] Using Installer with PyXML Message-ID: <016801c0829a$ea7bf2e0$11260340@yin> David Bolen has been a tremendous help to me figuring out how to use Gordon McMillan's Installer 20_3i to create standalone EXEs for Win32 with Python 2.0 and PyXML 0.6.3. We've discovered a couple of things though that I'd like to point out and perhaps get some explanations on. In order for Installer to properly discover the required PyXML files, we had to rename the _xmlplus directory to xml and rename the core xml directory to something else. According to David... 
"The problem here has to be the way that the xml library tree is replacing itself with the _xmlplus tree from the later PyXML distribution. While runtime re-assigns xml to _xmlplus in the __init__ for xml, the import system used by the installation package can't track that, so it still looks for the actual module tree it loaded from the Python distribution beneath the name xml." So the question is, will this adversely impact normal Python operation, and is there a better way? The other question I have is... Why are there two different pyexpat.pyd files, one as part of the core 2.0 distribution (at only 25 kb) and the other as part of the PyXML distribution in _xmlplus.parsers (at 124 kb). I haven't been able to get the large one to work using Installer, but the small core file works fine. What is the difference? Thanks to anybody who can help here, Dan From martin@mira.cs.tu-berlin.de Sat Jan 20 09:55:54 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 20 Jan 2001 10:55:54 +0100 Subject: [XML-SIG] Using Installer with PyXML In-Reply-To: <016801c0829a$ea7bf2e0$11260340@yin> (dan.rolander@marriott.com) References: <016801c0829a$ea7bf2e0$11260340@yin> Message-ID: <200101200955.f0K9tsL00802@mira.informatik.hu-berlin.de> > "The problem here has to be the way that the xml library tree is replacing > itself with the _xmlplus tree from the later PyXML distribution. While > runtime re-assigns xml to _xmlplus in the __init__ for xml, the import > system used by the installation package can't track that, so it still looks > for the actual module tree it loaded from the Python distribution beneath > the name xml." I'm not sure I understand the problem. Will the packager refuse (or forget) to package the xml package, or will it, at runtime, fail to load it? If it manages to package both xml and _xmlplus: when loading xml, will it execute xml/__init__.py? In there, there is an import of _xmlplus. Will that succeed? 
If so, what happens to the lines import sys sys.modules[__name__] = _xmlplus Will __name__ have a value of "xml"? Will the assignment succeed? Now, suppose we do from xml.sax import sax2exts In normal Python, this will look for sys.modules["xml"] and start from there. Are you saying the installer does not work that way, or that even if it starts from there, it still can't figure out to load _xmlplus.sax? > So the question is, will this adversely impact normal Python operation, and > is there a better way? No, replacing the Python xml package completely with _xmlplus will work just fine - except perhaps for the pyexpat difference. > The other question I have is... Why are there two different pyexpat.pyd > files, one as part of the core 2.0 distribution (at only 25 kb) and the > other as part of the PyXML distribution in _xmlplus.parsers (at 124 kb). I > haven't been able to get the large one to work using Installer, but the > small core file works fine. What is the difference? There are two differences: the one from PyXML contains a number of bug fixes which are not in Python 2. In addition, it contains a literal copy of the expat libraries, so that the expat DLLs in the Python core should not be needed anymore. When you say "get the large one to work", what exactly have you tried, and how exactly did it fail? Regards, Martin From dan.rolander@marriott.com Sat Jan 20 18:48:58 2001 From: dan.rolander@marriott.com (Dan Rolander) Date: Sat, 20 Jan 2001 13:48:58 -0500 Subject: [XML-SIG] Re: Using Installer with PyXML Message-ID: <02c401c08311$a66e8f00$11260340@yin> Hi Martin, Thanks for responding. Here are the specifics-- When I use a script with the statements: from xml.sax import saxexts, saxlib, saxutils and parser = saxexts.make_parser("xml.sax.drivers.drv_pyexpat") the packager (Gordon McMillan's Installer) is able to find xml.sax.saxutils, but is not able to find xml.sax.saxexts or xml.sax.saxlib which actually reside in _xmlplus.sax. 
I can force builder.py to include the entire _xmlplus tree by adding a packages=_xmlplus line to the [APPZLIB] section of the .cfg file, but the exe still fails because it is looking for xml.sax.*: ImportError: cannot import name xml.sax.saxexts When I rename _xmlplus to xml and then run builder again without specifying any additional packages, the EXE fails because it can't find an available parser: File "c:\program files\python20\_xmlplus\sax\saxexts.py", line 77, in make_parser xml.sax._exceptions.SAXReaderNotAvailable: No parsers found If I manually import the entire PyXML tree (now named 'xml') by adding a packages=xml line to the .cfg file, I get a little farther but now the exe isn't able to find pyexpat. ImportError: cannot import name xml.parsers.pyexpat I then try to manually import pyexpat by adding xml.parsers.pyexpat to the misc line in the [MYCOLLECT] section, but finder.py is not able to find it: File "D:\DOCUME~1\Dan\Software\Python\INSTAL~1\MEInc\Dist\finder.py", line 121, in identify ValueError: xml.parsers.pyexpat.pyd not found If I changed the .cfg line in [MYCOLLECT] to misc=pyexpat.pyd then the core \DLLs version of pyexpat.pyd is found and put into the dist directory. Now when the exe is run I get a Windows error stating that the xmlparse.dll couldn't be located. I add xmlparse.dll to the misc= line and then I get an error stating that the xmltok.dll couldn't be found. I add xmltok.dll to the misc= line and voila! it works! I then start to wonder why the exe couldn't find xml.parsers.pyexpat.pyd if I imported the entire xml tree. I study the builder.log some more and realize that it only imported .py files and not .pyd files! I tried using directories= instead of packages= and got the same results. I re-read Gordon's documentation several times and tried different combinations of .cfg statements but nothing I tried resulted in a good import of xml.parsers.pyexpat. 
I then replaced the core version of pyexpat.pyd in \DLLs with the PyXML version and found that I could build a good exe without having to manually include the xmlparse.dll and xmltok.dll. So my final .cfg file looks like this: [MYCOLLECT] type= COLLECT name= dist_testsax bindepends= testsax.py misc= MYSTANDALONE, pyexpat.pyd debug = 0 excludes = PyWinTypes20.dll, win32api [MYSTANDALONE] type= STANDALONE name= testsax.exe script= testsax.py zlib = APPZLIB userunw = 0 support = 0 debug = 0 [APPZLIB] name= testsax.pyz dependencies= testsax.py excludes= dospath, posixpath, macpath directories=xml Now, for another example... Another test script has the statement: from xml.parsers import pyexpat and parser = pyexpat.ParserCreate() I start with the _xmlplus directory renamed to xml, because I know that's necessary, and I build a new standalone installation. This time the pyexpat file is imported to the dist directory as xml.parsers.pyexpat.pyd but the exe won't import it: ImportError: cannot import name xml.parsers.pyexpat Renaming the file to pyexpat.pyd does not help. I add packages=xml to the .cfg file and I still have the same problem. The only fix I can figure out is to change the import statement to: import pyexpat and that works. So in summary, my tests lead me to conclude the following... To use Gordon McMillan's Installer to create standalone executables of scripts that import modules from the PyXML package, the following must be done (depending on what modules are actually being used): 1. Replace the core xml directory with the _xmlplus directory, by renaming _xmlplus to xml. 2. Copy the PyXML pyexpat.pyd file from the xml.parsers directory to the \DLLs directory. 3. If pyexpat is needed, either explicitly import it in your script, or manually include it in the standalone installation by adding an entry to the misc line in the COLLECT section of the builder .cfg file. 4. 
If importing from xml.sax, manually import the entire PyXML tree (source files only) by specifying either packages=xml or directories=xml in the PYZ section of the builder .cfg file. (I have not even tried using DOM yet, so I'm sure there are more issues there to be found.) I am by no means an expert on this, so if anybody understands this better and can provide simpler workarounds I would appreciate hearing it. Thanks, and I hope this helps someone! Dan ----- Original Message ----- From: "Martin v. Loewis" To: Cc: ; ; Sent: Saturday, January 20, 2001 4:55 AM Subject: Re: [XML-SIG] Using Installer with PyXML > > "The problem here has to be the way that the xml library tree is replacing > > itself with the _xmlplus tree from the later PyXML distribution. While > > runtime re-assigns xml to _xmlplus in the __init__ for xml, the import > > system used by the installation package can't track that, so it still looks > > for the actual module tree it loaded from the Python distribution beneath > > the name xml." > > I'm not sure I understand the problem. Will the packager refuse (or > forget) to package the xml package, or will it, at runtime, fail to > load it? > > If it manages to package both xml and _xmlplus: when loading xml, will > it execute xml/__init__.py? In there, there is an import of _xmlplus. > Will that succeed? If so, what happens to the lines > > import sys > sys.modules[__name__] = _xmlplus > > Will __name__ have a value of "xml"? Will the assignment succeed? > > Now, suppose we do > > from xml.sax import sax2exts > > In normal Python, this will look for sys.modules["xml"] and start from > there. Are you saying the installer does not work that way, or that > even if it starts from there, it still can't figure out to load > _xmlplus.sax? > > > So the question is, will this adversely impact normal Python operation, and > > is there a better way? 
> > No, replacing the Python xml package completely with _xmlplus will > work just fine - except perhaps for the pyexpat difference. > > > The other question I have is... Why are there two different pyexpat.pyd > > files, one as part of the core 2.0 distribution (at only 25 kb) and the > > other as part of the PyXML distribution in _xmlplus.parsers (at 124 kb). I > > haven't been able to get the large one to work using Installer, but the > > small core file works fine. What is the difference? > > There are two differences: the one from PyXML contains a number of bug > fixes which are not in Python 2. In addition, it contains a literal > copy of the expat libraries, so that the expat DLLs in the Python core > should not be needed anymore. > > When you say "get the large one to work", what exactly have you tried, > and how exactly did it fail? > > Regards, > Martin > From martin@mira.cs.tu-berlin.de Sat Jan 20 22:25:47 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 20 Jan 2001 23:25:47 +0100 Subject: [XML-SIG] Re: Using Installer with PyXML In-Reply-To: <02c401c08311$a66e8f00$11260340@yin> (dan.rolander@marriott.com) References: <02c401c08311$a66e8f00$11260340@yin> Message-ID: <200101202225.f0KMPl200861@mira.informatik.hu-berlin.de> > from xml.sax import saxexts, saxlib, saxutils [...] > the packager (Gordon McMillan's Installer) is able to find xml.sax.saxutils, > but is not able to find xml.sax.saxexts or xml.sax.saxlib which actually > reside in _xmlplus.sax. I was going to claim this to be a bug in the installer, but it now rather seems like an operator error: The installer has no way of knowing that it ought to load _xmlplus.sax.saxexts into the distribution, since there is no import statement for it. So announcing the full _xmlplus package to it is the right thing to do.
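[Editorial sketch] The runtime replacement quoted earlier (sys.modules[__name__] = _xmlplus) can be reproduced with throwaway modules to show why nothing in the source text names the package that ends up being used. The names demo_core and _demo_plus are hypothetical stand-ins for xml and _xmlplus; the sketch assumes current Python 3 import behaviour and says nothing about how Installer itself scans code.

```python
import sys
import types

# Stand-in for the add-on package (_xmlplus in the thread).
plus = types.ModuleType("_demo_plus")
plus.__path__ = []  # an empty __path__ marks the module as a package
plus.sax = types.ModuleType("_demo_plus.sax")
plus.sax.origin = "_demo_plus"
sys.modules["_demo_plus"] = plus
sys.modules["_demo_plus.sax"] = plus.sax

# What the core package's __init__ does at runtime, per the thread:
#     sys.modules[__name__] = _xmlplus
sys.modules["demo_core"] = plus

# The import statement names demo_core, but the module that is actually
# used comes from _demo_plus, and only the running interpreter knows that.
from demo_core import sax

origin = sax.origin
print(origin)
```

A tool that reads import statements sees only the name demo_core, so the add-on tree has to be listed for it explicitly, which is the point made above.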
> I can force builder.py to include the entire > _xmlplus tree by adding a packages=_xmlplus line to the [APPZLIB] section of > the .cfg file, but the exe still fails because it is looking for xml.sax.*: > > ImportError: cannot import name xml.sax.saxexts It's not clear what is causing that. It could be a bug in the installer, or it could be the distribution contains no pyexpat.pyd. In that case, you'll have to explicitly request inclusion of pyexpat.pyd. It would be good to check what files are actually included. > When I rename _xmlplus to xml and then run builder again without specifying > any additional packages, the EXE fails because it can't find an available > parser: > > File "c:\program files\python20\_xmlplus\sax\saxexts.py", line 77, in > make_parser > xml.sax._exceptions.SAXReaderNotAvailable: No parsers found No surprise. The installer is looking at import statements, but there are no import statements for xml.sax.drivers.*; instead, they are imported by calling __import__ for a computed string. So again, that is an operator error: everything imported "by magic" must be announced explicitly to such a packager. > If I manually import the entire PyXML tree (now named 'xml') by adding a > packages=xml line to the .cfg file, I get a little farther but now the exe > isn't able to find pyexpat. > > ImportError: cannot import name xml.parsers.pyexpat > > I then try to manually import pyexpat by adding xml.parsers.pyexpat to the > misc line in the [MYCOLLECT] section, but finder.py is not able to find it: > > File "D:\DOCUME~1\Dan\Software\Python\INSTAL~1\MEInc\Dist\finder.py", > line 121, in identify > ValueError: xml.parsers.pyexpat.pyd not found You did not say *how* you specified it - it might be that Installer mistook your command as trying to import a module named "pyd" from a package named "pyexpat" - that is not available. 
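The point made in this reply, that a module loaded through __import__ on a computed string leaves no import statement to find, can be illustrated with a small static scan. This is a sketch under assumptions: static_imports is a hypothetical helper built on the standard ast module, the drv_ prefix in the sample source is only illustrative, and none of it is taken from Installer's own code.

```python
import ast

# Sample source: two ordinary import statements plus one import whose
# module name is only assembled at run time.
SOURCE = '''
import sys
from xml.sax import saxutils

def load_driver(name):
    return __import__("xml.sax.drivers.drv_" + name)
'''

def static_imports(source):
    """Return the module names that appear in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return sorted(found)

print(static_imports(SOURCE))
```

The driver package never shows up in the result, because its name exists only as a string expression. That is why such modules have to be listed explicitly in the packager configuration.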
> If I changed the .cfg line in [MYCOLLECT] to misc=pyexpat.pyd then the core > \DLLs version of pyexpat.pyd is found and put into the dist directory. Now > when the exe is run I get a Windows error stating that the xmlparse.dll > couldn't be located. > > I add xmlparse.dll to the misc= line and then I get an error stating that > the xmltok.dll couldn't be found. > > I add xmltok.dll to the misc= line and voila! it works! When you use the pyexpat from PyXML, the difference should be that xmlparse.dll and xmltok.dll are not required. > I then replaced the core version of pyexpat.pyd in \DLLs with the > PyXML version and found that I could build a good exe without having > to manually include the xmlparse.dll and xmltok.dll. Not only do you not need to include them manually - they are not needed at all. Care to write a small howto document for the XML topic guide? > I start with the _xmlplus directory renamed to xml, because I know that's > necessary, and I build a new standalone installation. This time the pyexpat > file is imported to the dist directory as xml.parsers.pyexpat.pyd but the > exe won't import it: > > ImportError: cannot import name xml.parsers.pyexpat Do you have a traceback for that? All applications should import xml.parsers.expat, which should have from pyexpat import * so there should be no request to load xml.parsers.pyexpat. Older PyXML versions had such code, but it should have been wrapped with catching and ImportError, which then should fall back to load pyexpat unqualified. > The only fix I can figure out is to change the import statement to: > > import pyexpat > > and that works. As I said, the real solution is to write from xml.parsers import expat or, if you need to keep the pyexpat name, from xml.parsers import expat as pyexpat > 1. Replace the core xml directory with the _xmlplus directory, by renaming > _xmlplus to xml. I'm not entirely sure *why* this is needed, but it certainly can't hurt. > 2. 
Copy the PyXML pyexpat.pyd file from the xml.parsers directory to the > \DLLs directory. That is a good idea, yes. > 3. If pyexpat is needed, either explicitly import it in your script, or > manually include it in the standalone installation by adding an entry to the > misc line in the COLLECT section of the builder .cfg file. pyexpat should always be included in PyXML applications, so that is also fine. > 4. If importing from xml.sax, manually import the entire PyXML tree (source > files only) by specifying either packages=xml or directories=xml in the PYZ > section of the builder .cfg file. I would guess the same applies when importing DOM stuff - the DOM readers also use make_parser at some point. Regards, Martin From dan.rolander@marriott.com Sat Jan 20 23:25:21 2001 From: dan.rolander@marriott.com (Dan Rolander) Date: Sat, 20 Jan 2001 18:25:21 -0500 Subject: [XML-SIG] Re: Using Installer with PyXML References: <02c401c08311$a66e8f00$11260340@yin> <200101202225.f0KMPl200861@mira.informatik.hu-berlin.de> Message-ID: <037201c08338$42ce08a0$11260340@yin> Thank you for your assistance Martin. Although your analysis of the problem ("operator error") is close, I would probably more correctly identify it as operator ignorance. I'm still trying to figure out how to effectively use PyXML and build standalone executables with it. Since the Installer seems to play by its own rules when it comes to imports, it is especially challenging. I have not found the xml-sig documentation, or the python library reference, to be too helpful for someone new to xml processing, so I bought Sean McGrath's book "XML Processing with Python" and have found that to be *very* helpful. But his examples, which I was testing, use references to pyexpat. I tested your suggestion of using "from xml.parsers import expat" vs. "import pyexpat" and that works fine, but I'm not sure what the benefit of using that form is. 
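For reference, the import form suggested in this thread, together with the ImportError fallback mentioned for older code, might look like the sketch below. It is written for a current CPython, where xml.parsers.expat is part of the standard library; the tiny document and the seen list are made up for the demonstration.

```python
# Prefer the wrapper module; fall back to the bare extension module
# only if the wrapper cannot be imported.
try:
    from xml.parsers import expat
except ImportError:
    import pyexpat as expat

parser = expat.ParserCreate()

seen = []
parser.StartElementHandler = lambda name, attrs: seen.append(name)

# The second argument marks this as the final chunk of input.
parser.Parse("<doc><item/></doc>", True)
print(seen)
```

Code written this way never names the location of the underlying extension module, so whichever package provides xml.parsers decides which pyexpat is used.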
I haven't quite grok'd all of this yet, but once I do I would have no problem with writing a mini-howto. Thanks again, Dan From ole@discus.anu.edu.au Sun Jan 21 08:28:03 2001 From: ole@discus.anu.edu.au (Ole NIELSEN) Date: Sun, 21 Jan 2001 19:28:03 +1100 (EST) Subject: [XML-SIG] Problem Installing PyXML Message-ID: Dear xml-sig specialist I have tried to install PyXML on two different machines with the following problem: "ImportError: cannot import name Extension" We have Python 1.5.2 installed. I have enclosed a transcript of the installation below. Would upgrading to a newer version of Python solve the problem ? Thanks you very much in advance.
Ole Nielsen TRANSCRIPT: ------------------------------------------------- capricorn: ole/PyXML-0.6.3/:python setup.py build Traceback (innermost last): File "setup.py", line 8, in ? from distutils.core import setup, Extension ImportError: cannot import name Extension capricorn: ole/PyXML-0.6.3/:python Python 1.5.2 (#2, Aug 16 2000, 09:31:06) [GCC 2.95.1 19990816 (release)] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam ------------------------------------------------------------------------- ------------------------------------------------------------------- Ole Moller Nielsen | Email: Ole.Nielsen@anu.edu.au Computer Sciences Lab, RSISE, |----------------------------------- Australian National University | Phone: +61 2 6125 8627 (Direct) Canberra ACT 0200 | Phone: +61 2 6125 8644 (Secr.) Australia | Fax: +61 2 6125 8645/8651 ------------------------------------------------------------------- URL: www.bigfoot.com/~uniomni ------------------------------------------------------------------- From martin@mira.cs.tu-berlin.de Sun Jan 21 08:58:40 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 21 Jan 2001 09:58:40 +0100 Subject: [XML-SIG] Problem Installing PyXML In-Reply-To: (message from Ole NIELSEN on Sun, 21 Jan 2001 19:28:03 +1100 (EST)) References: Message-ID: <200101210858.f0L8we501034@mira.informatik.hu-berlin.de> > I have tried to install PyXML on two different machines with the following > problem: "ImportError: cannot import name Extension" > We have Python 1.5.2 installed. I have enclosed a transcript of the > installation below. > > Would upgrading to a newer version of Python solve the problem ? That, or upgrading to distutils 1.0 (which is probably easier to achieve). Regards, Martin From martin@mira.cs.tu-berlin.de Sun Jan 21 08:56:58 2001 From: martin@mira.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 21 Jan 2001 09:56:58 +0100 Subject: [XML-SIG] Re: Using Installer with PyXML In-Reply-To: <037201c08338$42ce08a0$11260340@yin> (dan.rolander@marriott.com) References: <02c401c08311$a66e8f00$11260340@yin> <200101202225.f0KMPl200861@mira.informatik.hu-berlin.de> <037201c08338$42ce08a0$11260340@yin> Message-ID: <200101210856.f0L8uwj01030@mira.informatik.hu-berlin.de> > I have not found the xml-sig documentation, or the python library > reference, to be too helpful for someone new to xml processing, so I > bought Sean McGrath's book "XML Processing with Python" and have > found that to be *very* helpful. I'm glad to hear this. The PyXML documentation is certainly not targetted at people new to XML at all; it is mostly for people that know XML, and want to learn about XML processing in Python. BTW, did you have a look at the PyXML tutorial as well? > But his examples, which I was testing, use references to pyexpat. Not surprising; the wrapper module was created just before the Python 2.0 release. > I tested your suggestion of using "from xml.parsers import expat" > vs. "import pyexpat" and that works fine, but I'm not sure what the > benefit of using that form is. To get independent from the location of the pyexpat module. If you say "import pyexpat", and you use PyXML, you still won't get the PyXML version of that module - this lives in xml.parsers.pyexpat. Regards, Martin From larsga@garshol.priv.no Mon Jan 22 09:27:06 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Jan 2001 10:27:06 +0100 Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled In-Reply-To: <01012009443202.00856@localhost.localdomain> References: <01011909403002.00874@localhost.localdomain> <01012009443202.00856@localhost.localdomain> Message-ID: * Lars Marius Garshol | | Because it would be a real pain, and would most likely break lots of | applications. 
If text nodes can suddenly be represented as both text | and cdata nodes, applications that only test for text nodes (and I | assume this is the majority) will be silently losing data. * matt@virtualspectator.com | | That would make either the implementation of CDATA wrong, or the way | you use it. Well, both, actually. I think the way CDATA is handled by the DOM is wrong, in that it pushes lexical information[1] into your face and forces you to deal with it when in 99% of the cases you do not care at all. SAX and expat handle this much better, by telling you about the CDATA without forcing you to care. xmllib gets it very wrong. And since no Python DOMs currently create CDATA nodes and it requires some extra thought to handle I suspect that the great majority of DOM applications have no code to handle CDATA nodes appearing instead of Text nodes. | Text nodes are base classes of CDATA, so process that works on text | nodes will implicitly work on CDATA nodes .... Nope, because in most cases you will test for the type of node through the nodeType attribute, and that has different values for CDATA and Text. You can also test via the isinstance function, but that would tie your application to a specific implementation and would be a very bad idea. | Even if you try a type cast to assert this you should get a valid | base class pointer back .... not that python on it's face worries | too much about that. :-) | Otherwise I am confused as to what you mean. It seems to me anyway | that everyone has been trying to make the argument that they are one | in the same, which they are in the interpretation sense. Exactly. | A parser such as expat handles the inheritance perfectly since for a | CDATA section it will give you CDATA begin and end events while | passing the data itself into character data handlers. This is the way to handle it, yes. | I don't see things breaking anywhere. 
Not with expat, but with the DOM and xmllib chances are that applications written by people who are not fully into XML and the API they are using will break when CDATA starts appearing. * Lars Marius Garshol | | Furthermore, the normalize method, which many applications use to | ensure that there are no adjacent text nodes in the DOM tree stops | working in the presence of cdata nodes, since these are not | normalized. * matt@virtualspectator.com | | Perhaps the specification for normalize on a nodes sub-tree is | wrong, or, you expect it to always give you a nice single | replacement node. I think it is equally wrong to flatly remove all | CDATA nodes without giving the user a handle to keep them. Well, I think the whole cake should have been cut up differently. Text nodes should have a method isCDATA that could be used to check whether it originally was a CDATA section or not. (Note that this requires CDATA sections to give rise to separate DOM nodes, but they tend to do that anyway.) Normalize would then collapse both text and CDATA, which IMHO is the only reasonable behaviour for it anyway. It is only useful to simplify traversal of the tree, but it doesn't achieve that if CDATA nodes are not normalized. Any user that cares about the CDATA/text distinction will then have to do without normalize(), but I doubt that they will care much, and in any case they are a very small minority. | They serve a useful purpose, and it seems bizarre that the DOM | document builder just throws away the events that tell us we have | come across a CDATA node. Perhaps it should sit at the level of | normalize itself .... pass an extra optional argument that | translates CDATA nodes and therefore includes them in the merge? That is an option, but I don't really like it. If you keep the CDATA interface you should really have HexNumericCharacterReference and DecimalNumericCharacterReference interfaces as well. 
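The failure mode described above can be reproduced with the minidom shipped in current CPython, which (unlike the DOMs of 2001 mentioned in this thread) does create CDATASection nodes. A minimal sketch; the helper names text_only and text_and_cdata are hypothetical:

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<a>x<![CDATA[y]]>z</a>")
root = doc.documentElement

def text_only(element):
    """Collect character data the way a TEXT_NODE-only test would."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == Node.TEXT_NODE)

def text_and_cdata(element):
    """Collect character data from both Text and CDATASection children."""
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return "".join(child.data for child in element.childNodes
                   if child.nodeType in wanted)

count_before = len(root.childNodes)
root.normalize()
count_after = len(root.childNodes)

print(repr(text_only(root)))
print(repr(text_and_cdata(root)))
print(count_before, count_after)
```

An application that tests only for TEXT_NODE silently drops the CDATA content, and normalize() leaves the three children unmerged, which are the two problems raised in the discussion.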
| That it doesn't matter which way you represent any "hidden" markup | eg as &lt; or as < within a CDATA section, expat will give '<' to | the character data handler. Which is useful. Uh, no, it's actually wrong, and that's probably why expat doesn't do it either. :-) * Lars Marius Garshol | | I have been thinking lately that it would be an interesting experiment | to make an XML parser with an interface specialized for representing | ALL the lexical information about a document. I guess this could be | done by passing along with every event the list of tokens that made up | that event. * matt@virtualspectator.com | | What sort of representation? Well, say that you have an event-based interface like SAX, pyexpat or something else, and that the event for character data is character_data(data, raw) where raw is a list of tokens. So for the document A Testing testing. you get these calls character_data('\012 ', ['\012 ']) character_data('A', ['A']) character_data(' wheee! ', ['']) --Lars M. From jerome.marant@free.fr Mon Jan 22 10:03:02 2001 From: jerome.marant@free.fr (Jérôme Marant) Date: 22 Jan 2001 11:03:02 +0100 Subject: [XML-SIG] Problematic use of setupext Message-ID: <7z4ryr52g9.fsf@amboise.ird.idealx.com> Hi, I'm trying to update the package to the new 0.6.3 version for Debian and it seems that setupext is problematic. Whenever I use `python setup.py clean --all', .pyc files are generated (__init__.pyc and install_data.pyc). What I would like is to get rid of all pyc and pyo in the package but for that reason, it seems to be impossible. So I'll have to remove them by hand. Could anyone explain ? Thanks. -- Jérôme Marant http://jerome.marant.free.fr From akuchlin@mems-exchange.org Mon Jan 22 19:46:03 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 22 Jan 2001 14:46:03 -0500 Subject: [XML-SIG] Note on XML for 2.1 Message-ID: I'm working on a "What's New in 2.1" article, and want to add a mention of the improvements to the xml package.
Here's my proposed text; is it accurate? \item The PyXML package has gone through a few releases since Python 2.0, and Python 2.1 includes an updated version of the \module{xml} package. Some of the noteworthy changes include support for Expat 1.2, the ability for Expat parsers to handle files in any encoding supported by Python, and various bugfixes for SAX, DOM, and the \module{minidom} module. --amk From martin@mira.cs.tu-berlin.de Mon Jan 22 22:35:59 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 22 Jan 2001 23:35:59 +0100 Subject: [XML-SIG] Problematic use of setupext In-Reply-To: <7z4ryr52g9.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr) References: <7z4ryr52g9.fsf@amboise.ird.idealx.com> Message-ID: <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de> > Whenever I use `python setup.py clean --all', .pyc files are generated > (__init__.pyc and install_data.pyc). What I would like is to ge rid of > all pyc and pyo in the package but for that reason, It seems to be > impossible. So I'll have to remove them by hand. > > Could anyone explain ? It's not that difficult to explain: setup.py does a straight import of setupext, which results in pyc files being generated. If you think you can fix this: patches are welcome. Regards, Martin From martin@mira.cs.tu-berlin.de Mon Jan 22 22:57:59 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 22 Jan 2001 23:57:59 +0100 Subject: [XML-SIG] Note on XML for 2.1 In-Reply-To: (message from Andrew Kuchling on Mon, 22 Jan 2001 14:46:03 -0500) References: Message-ID: <200101222257.f0MMvxh01639@mira.informatik.hu-berlin.de> > I'm working on a "What's New in 2.1" article, and want to add > a mention of the improvements to the xml package. Here's my proposed > text; is it accurate? It certainly is. 
Thanks, Martin From 935551@ican.net Tue Jan 23 07:52:23 2001 From: 935551@ican.net (Richard Anthony Hein) Date: Tue, 23 Jan 2001 02:52:23 -0500 Subject: [XML-SIG] Newbie confused by output ... Message-ID: <000201c08511$b250ec80$0100a8c0@k6> Hi everyone, I am new to Python and am trying to get a hang of the XML libraries available. I am having trouble finding tutorials and documentation. When I finally found some documentation at http://velocity.activestate.com/docs/ActivePython/lib/expat-example.html, I tried the example for expat (actually used pyexpat and expat), and have the following result: >>> from xml.parsers import expat >>> def start_element(name, attrs): ... print 'Start element:', name, attrs ... >>> def end_element(name): ... print 'End element:', name ... >>> def char_data(data): ... print 'Character data:', repr(data) ... >>> p = pyexpat.ParserCreate() >>> p.StartElementHandler = start_element >>> p.EndElementHandler = end_element >>> p.CharacterDataHandler = char_data >>> p.Parse(""" ... Text goes here ... More text ... """) Start element: parent {u'id': u'top'} Start element: child1 {u'name': u'Paul'} Character data: u'Text goes here' End element: child1 Character data: u'\012' Start element: child2 {u'name': u'Fred'} Character data: u'More text' End element: child2 Character data: u'\012' End element: parent 1 So what are all of those u's doing in there, and why is there a 1 printed? This was unexpected. Also, perhaps you can point me towards some helpful tutorials for getting up to speed with XML processing in Python? TIA, Richard Anthony Hein From larsga@garshol.priv.no Tue Jan 23 09:05:35 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 23 Jan 2001 10:05:35 +0100 Subject: [XML-SIG] Newbie confused by output ... 
In-Reply-To: <000201c08511$b250ec80$0100a8c0@k6> References: <000201c08511$b250ec80$0100a8c0@k6> Message-ID: * Richard Anthony Hein | | When I finally found some documentation at | http://velocity.activestate.com/docs/ActivePython/lib/expat-example.html, | I tried the example for expat (actually used pyexpat and expat), There is documentation in the standard library documentation on python.org, which you can download and also browse online. | So what are all of those u's doing in there, The u prefix means that the string is a Unicode string. In most cases, this is no different from an ordinary string, except that it can contain any Unicode character. | and why is there a 1 printed? The 1 is the return value of your call to Parse(), meaning that there were no errors. | Also, perhaps you can point me towards some helpful tutorials for | getting up to speed with XML processing in Python? The standard documentation is the only one I know of. There is at least one Python XML book listed at which may be worth looking at. --Lars M. From martin@mira.cs.tu-berlin.de Tue Jan 23 09:18:49 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 23 Jan 2001 10:18:49 +0100 Subject: [XML-SIG] Newbie confused by output ... In-Reply-To: <000201c08511$b250ec80$0100a8c0@k6> (935551@ican.net) References: <000201c08511$b250ec80$0100a8c0@k6> Message-ID: <200101230918.f0N9Ink01207@mira.informatik.hu-berlin.de> > I am new to Python and am trying to get a hang of the XML libraries > available. I am having trouble finding tutorials and documentation. Please have a look at http://pyxml.sourceforge.net/topics/ specifically http://www.python.org/doc/howto/xml/ For reference documentation, use http://python.sourceforge.net/devel-docs/lib/markup.html > Character data: u'\012' > End element: parent > 1 > > So what are all of those u's doing in there, and why is there a 1 printed? > This was unexpected. The u indicates that this is a unicode object, not a bytestring object. 
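The two answers given in this thread (strings arrive as Unicode, and Parse() returns 1 on success) can be checked with a short script. This is a sketch for current Python, where every str is Unicode and repr() therefore shows no u prefix. The document text below is an assumption reconstructed from the element and attribute names in the posted output, since the original markup did not survive in the archive.

```python
from xml.parsers import expat

events = []

def start_element(name, attrs):
    events.append(("start", name, attrs))

def end_element(name):
    events.append(("end", name))

def char_data(data):
    events.append(("data", data))

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# Assumed input; the second argument marks the final chunk.
result = parser.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", True)

for event in events:
    print(event)
print("Parse returned:", result)
```

At the interactive prompt the bare return value of Parse() is echoed, which is where the trailing 1 in the posted session comes from; assigning it to a name, as above, suppresses the echo.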
It appears that the feature is not documented in the Python Reference Manual, see http://www.python.org/2.0/new-python.html#SECTION000500000000000000000 The 1 means that the Parse function returned with a value of 1, see http://www.python.org/doc/current/tut/node4.html Regards, Martin From jerome.marant@free.fr Tue Jan 23 11:23:28 2001 From: jerome.marant@free.fr (Jérôme Marant) Date: 23 Jan 2001 12:23:28 +0100 Subject: [XML-SIG] Problematic use of setupext In-Reply-To: "Martin v. Loewis"'s message of "Mon, 22 Jan 2001 23:35:59 +0100" References: <7z4ryr52g9.fsf@amboise.ird.idealx.com> <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de> Message-ID: <7zg0ia8qbz.fsf@amboise.ird.idealx.com> "Martin v.
Loewis" writes: > It's not that difficult to explain: setup.py does a straight import of > setupext, which results in pyc files being generated. AFAIK, as long as this is how the interpreter behaves, I have no clue. For instance, It would be nice to specify the interpreter not to generate pyc files ... -- Jérôme Marant http://jerome.marant.free.fr From larsga@garshol.priv.no Tue Jan 23 12:36:43 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 23 Jan 2001 13:36:43 +0100 Subject: [XML-SIG] Development roadmap? Message-ID: I think it would make sense for the XML-SIG to create a development roadmap document that basically outlines - tasks that we plan to do - who is assigned to what task, if known - rough estimate for task completion, when known I think this would make it easier for ourselves to keep track of what is going on, make it clearer to the rest of the world what we are doing, and also help newcomers to find out where they can chip in and make a contribution. This would obviously have to be a living document that is constantly updated to reflect the current state of affairs. I'd be willing to take on that task, which I guess would consist of writing the first version of it and updating it when others neglect to do so. Comments? Thoughts? Opinions? --Lars M. From Nicolas.Chauvat@logilab.fr Tue Jan 23 12:38:46 2001 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Tue, 23 Jan 2001 13:38:46 +0100 (CET) Subject: [XML-SIG] Development roadmap? In-Reply-To: Message-ID: On 23 Jan 2001, Lars Marius Garshol wrote: > I think this would make it easier for ourselves to keep track of what > is going on, make it clearer to the rest of the world what we are > doing, and also help newcomers to find out where they can chip in and > make a contribution. > > Comments? Thoughts? Opinions? http://www.logilab.org/pygantt/ may help for that. 1. data stored as XML. 2. python script renders data as HTML Gantt diagram. 3.
If you prefer, add your own renderer using XSL or deriving the basic one. Hope this helps, though it's probably not the format you were thinking of. -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From larsga@garshol.priv.no Tue Jan 23 12:51:06 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 23 Jan 2001 13:51:06 +0100 Subject: [XML-SIG] Development roadmap? In-Reply-To: References: Message-ID: * Nicolas Chauvat | | http://www.logilab.org/pygantt/ may help for that. | [...] | Hope this helps, though it's probably not the format you were | thinking to. I think that for something as simple as this it would really be overkill. I don't think we really need something as fancy, and as tight, as a project plan, merely something that describes the direction we intend to go in. Hand-edited HTML would do just fine, I think. --Lars M. From uche.ogbuji@fourthought.com Tue Jan 23 16:58:53 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 23 Jan 2001 09:58:53 -0700 Subject: [XML-SIG] Note on XML for 2.1 In-Reply-To: Message from Andrew Kuchling of "Mon, 22 Jan 2001 14:46:03 EST." Message-ID: <200101231658.JAA03305@localhost.localdomain> > I'm working on a "What's New in 2.1" article, and want to add > a mention of the improvements to the xml package. Here's my proposed > text; is it accurate? > > \item The PyXML package has gone through a few releases since Python > 2.0, and Python 2.1 includes an updated version of the \module{xml} > package. Some of the noteworthy changes include support for Expat > 1.2, the ability for Expat parsers to handle files in any encoding > supported by Python, and various bugfixes for SAX, DOM, and the > \module{minidom} module. I can't think of anything else. Sounds good. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ken@bitsko.slc.ut.us Tue Jan 23 23:25:59 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 23 Jan 2001 17:25:59 -0600 Subject: [XML-SIG] Development roadmap? In-Reply-To: Lars Marius Garshol's message of "23 Jan 2001 13:36:43 +0100" References: Message-ID: Lars Marius Garshol writes: > I think it would make sense for the XML-SIG to create a development > roadmap document that basically outlines [...] > I think this would make it easier for ourselves to keep track of what > is going on, make it clearer to the rest of the world what we are > doing, and also help newcomers to find out where they can chip in and > make a contribution. On a somewhat related note, I've been developing a C-based extension library that's designed for binding to host languages, called Orchard. Orchard implements "node-based" SAX (push, and soon, pull) and a DOM comparible to minidom (minus even a few more gratuitous W3C DOM methods). There will (also soon, I hope) be a Python binding for Orchard. Orchard's C preprocessor and runtime includes garbage collection, attribute syntax, dynamic methods, and accessor override methods, to make binding to languages as simple as possible (think of SWIG in reverse). Orchard also supports namespaces as a core feature (making some XML applications truly simple, see the Python RSS and SOAP implementations for examples). Orchard will not initially be Py SAX and DOM compatible, but compatibility modules are possible. I mention it mostly because it's a parallel development going on at the moment. Source and initial docs are available at . Anyone interested in working on Python support is welcome, drop me an email. -- Ken From tpassin@home.com Wed Jan 24 01:33:20 2001 From: tpassin@home.com (Thomas B. Passin) Date: Tue, 23 Jan 2001 20:33:20 -0500 Subject: [XML-SIG] Development roadmap? 
References: Message-ID: <003601c085a5$a28b2560$7cac1218@reston1.va.home.com> Yes, this would be a good thing to do. Cheers, Tom P Lars Marius Garshol wrote - > > I think it would make sense for the XML-SIG to create a development > roadmap document that basically outlines > > - tasks that we plan to do > - who is assigned to what task, if known > - rough estimate for task completion, when known > > I think this would make it easier for ourselves to keep track of what > is going on, make it clearer to the rest of the world what we are > doing, and also help newcomers to find out where they can chip in and > make a contribution. > > This would obviously have to be a living document that is constantly > updated to reflect the current state of affairs. I'd be willing to > take on that task, which I guess would consist of writing the first > version of it and updating it when others neglect to do so. > > Comments? Thoughts? Opinions? > From martin@mira.cs.tu-berlin.de Wed Jan 24 07:45:30 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 24 Jan 2001 08:45:30 +0100 Subject: [XML-SIG] Development roadmap? In-Reply-To: (message from Lars Marius Garshol on 23 Jan 2001 13:36:43 +0100) References: Message-ID: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> > I think it would make sense for the XML-SIG to create a development > roadmap document that basically outlines > > - tasks that we plan to do > - who is assigned to what task, if known > - rough estimate for task completion, when known It seems that the current PyXML TODO document could be used to hold that information. The only 'maintainance' I did to that document so far was to remove obsolete entries - feel free to add stuff back. If you think this information would be better maintained in a different format or location (e.g. SF task manager), I suggest that the TODO file is deleted altogether. 
> This would obviously have to be a living document that is constantly > updated to reflect the current state of affairs. I'd be willing to > take on that task, which I guess would consist of writing the first > version of it and updating it when others neglect to do so. That sounds like your book is complete :-) Anyway, if you are willing to maintain a roadmap, go just ahead. As for specific things I plan to do (over the course of the next months): I'd like to offer XPath and XSLT support in PyXML. I also like to push contributors to contribute updates of their respective packages :-) Regards, Martin From rob@hooft.net Wed Jan 24 10:09:04 2001 From: rob@hooft.net (Rob W. W. Hooft) Date: Wed, 24 Jan 2001 11:09:04 +0100 Subject: [XML-SIG] Problematic use of setupext In-Reply-To: <7zg0ia8qbz.fsf@amboise.ird.idealx.com> References: <7z4ryr52g9.fsf@amboise.ird.idealx.com> <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de> <7zg0ia8qbz.fsf@amboise.ird.idealx.com> Message-ID: <14958.43456.710851.579062@temoleh.chem.uu.nl> >>>>> "JM" == Jérôme Marant writes: JM> "Martin v. Loewis" writes: >> It's not that difficult to explain: setup.py does a straight >> import of setupext, which results in pyc files being generated. JM> AFAIK, as long as this is how the interpreter behaves, I have JM> no clue. For instance, It would be nice to specify the JM> interpreter not to generate pyc files ... How about changing the directory protection to 555 before import? At least on unix that should prevent .pyc generation. Rob -- ===== rob@hooft.net http://www.hooft.net/people/rob/ ===== ===== R&D, Nonius BV, Delft http://www.nonius.nl/ ===== ===== PGPid 0xFA19277D ========================== Use Linux!
========= From jerome.marant@free.fr Wed Jan 24 10:12:30 2001 From: jerome.marant@free.fr (Jérôme Marant) Date: 24 Jan 2001 11:12:30 +0100 Subject: [XML-SIG] Problematic use of setupext In-Reply-To: rob@hooft.net's message of "Wed, 24 Jan 2001 11:09:04 +0100" References: <7z4ryr52g9.fsf@amboise.ird.idealx.com> <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de> <7zg0ia8qbz.fsf@amboise.ird.idealx.com> <14958.43456.710851.579062@temoleh.chem.uu.nl> Message-ID: <7zk87ljm29.fsf@amboise.ird.idealx.com> rob@hooft.net (Rob W. W. Hooft) writes: > How about changing the directory protection to 555 before import? > At least on unix that should prevent .pyc generation. This is one of the possible solutions, but not the most elegant one :-) Cheers, -- Jérôme Marant http://jerome.marant.free.fr From larsga@garshol.priv.no Wed Jan 24 11:29:57 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Jan 2001 12:29:57 +0100 Subject: [XML-SIG] Development roadmap? In-Reply-To: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> References: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | It seems that the current PyXML TODO document could be used to hold | that information. The only 'maintainance' I did to that document so | far was to remove obsolete entries - feel free to add stuff back. To be honest I have to admit that I didn't even know that it existed. Now that I've found it I think that we could start from it, but that it should be on the web somewhere and also fleshed out somewhat. | If you think this information would be better maintained in a | different format or location (e.g. SF task manager), I suggest that | the TODO file is deleted altogether. I think the main candidates are an HTML file on python.org or sourceforge.net or the SF task manager. I don't really have any strong opinions on either.
The HTML file would probably give a better overview, provide more information about each task and is also more flexible, but the task manager is probably easier to keep up to date and more helpful as an organizational tool. So I think I prefer the HTML page, but it's not a very strong opinion. | That sounds like your book is complete :-) It is. (Yes yes yes yes YES!!! :-) | Anyway, if you are willing to maintain a roadmap, go just ahead. OK, will do, unless people speak up and say they want the SF task manager instead. | I also like to push contributors to contribute updates of their | respective packages :-) Will do, as soon as the problems with my account are fixed, so that I can commit again. :) (See support request 111946.) --Lars M. From tpassin@home.com Wed Jan 24 13:29:19 2001 From: tpassin@home.com (Thomas B. Passin) Date: Wed, 24 Jan 2001 08:29:19 -0500 Subject: [XML-SIG] Development roadmap? References: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> Message-ID: <001401c08609$a838db60$7cac1218@reston1.va.home.com> Lars Marius Garshol wrote - > > * Martin v. Loewis > | > | It seems that the current PyXML TODO document could be used to hold > | that information. The only 'maintainance' I did to that document so > | far was to remove obsolete entries - feel free to add stuff back. > ... > I think the main candidates are an HTML file on python.org or > sourceforge.net or the SF task manager. I don't really have any > strong opinions on either. > Let's do it as an HTML file in the documentation section of the SourceForge Site. > > | That sounds like your book is complete :-) > > It is. (Yes yes yes yes YES!!! :-) > Congratulations! I'm very keen to see the result. Cheers, Tom P From uche.ogbuji@fourthought.com Wed Jan 24 14:13:10 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 24 Jan 2001 07:13:10 -0700 Subject: [XML-SIG] Development roadmap? In-Reply-To: Message from "Martin v. 
Loewis" of "Wed, 24 Jan 2001 08:45:30 +0100." <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> Message-ID: <200101241413.HAA31296@localhost.localdomain> > That sounds like your book is complete :-) Anyway, if you are willing > to maintain a roadmap, go just ahead. > > As for specific things I plan to do (over the course of the next > months): I'd like to offer XPath and XSLT support in PyXML. I also > like to push contributors to contribute updates of their respective > packages :-) I was going to talk about this when I posted the 4Suite road-map, but we've agreed to move 4XPath and 4XSLT into PyXML. Let's get through the current opening of the XPath parser API and we can begin the process, preferably if it can be started not too close to the next release of 4Suite (scheduled second monday Feb). As for your last sentence, Jeremy was going to update PyXML's 4DOM. I know he has been working on 4Suite.org for the past few days, but he said it's high on his to-do list. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Wed Jan 24 19:32:34 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jan 2001 11:32:34 -0800 Subject: [XML-SIG] [Patch #103408] xml/marshal/wddx.py mods Message-ID: Patch #103408 has been updated. Project: pyxml Category: None Status: Open Submitted by: robin900 Assigned to : nobody Summary: xml/marshal/wddx.py mods ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=103408&group_id=6473 From martin@mira.cs.tu-berlin.de Wed Jan 24 19:55:12 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 24 Jan 2001 20:55:12 +0100 Subject: [XML-SIG] Development roadmap? 
In-Reply-To: (message from Lars Marius Garshol on 24 Jan 2001 12:29:57 +0100) References: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> Message-ID: <200101241955.f0OJtCP00959@mira.informatik.hu-berlin.de> > Now that I've found it I think that we could start from it, but that > it should be on the web somewhere and also fleshed out somewhat. Since you just got elected maintainer, you can chose any format you consider appropriate. > I think the main candidates are an HTML file on python.org or > sourceforge.net or the SF task manager. I don't really have any > strong opinions on either. I'd suggest a location inside the topic guide then; that already is CVS-accessible. There is an automatic update procedure so you just need to cvs commit to publish (if you can stand the 6h delay until the cron job runs). I understand that maintaining files on python.org is still possible only for a chosen few. Again, please remove the TODO file from PyXML when you commit the first version of your roadmap document. I'll then come up with a procedure to include the roadmap in the distributions. Regards, Martin From martin@mira.cs.tu-berlin.de Wed Jan 24 19:59:00 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 24 Jan 2001 20:59:00 +0100 Subject: [XML-SIG] Development roadmap? In-Reply-To: <200101241413.HAA31296@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101241413.HAA31296@localhost.localdomain> Message-ID: <200101241959.f0OJx0J00961@mira.informatik.hu-berlin.de> > I was going to talk about this when I posted the 4Suite road-map, > but we've agreed to move 4XPath and 4XSLT into PyXML. I was hoping you'd say that. Sorry for the little pushing :-) > Let's get through the current opening of the XPath parser API and we > can begin the process, preferably if it can be started not too close > to the next release of 4Suite (scheduled second monday Feb). Ok, it seems that I still owe some commentary and updates to the draft... 
> As for your last sentence, Jeremy was going to update PyXML's 4DOM. > I know he has been working on 4Suite.org for the past few days, but > he said it's high on his to-do list. That sounds all very well. Regards, Martin From jeremy.kloth@fourthought.com Wed Jan 24 23:45:05 2001 From: jeremy.kloth@fourthought.com (Jeremy J Kloth) Date: Wed, 24 Jan 2001 16:45:05 -0700 Subject: [XML-SIG] Re: [4suite] Where is xml.xslt? References: <000a01c08648$ace3c730$0100a8c0@k6> Message-ID: <00da01c0865f$b759baa0$1b01a8c0@fourthought.com> > Actually, it hasn't gone well ... is this the problem you were warning me about (I am using Windows 2000 and Python 2.0 btw): > [...snip traceback...] > > I installed the binary distribution first, before I got your message below warning me about the clashes. So I couldn't do what you suggested. I am downloading the sources right now. > > -- Richard A. Hein > There is bug in the HTML DOM implementation that is causing this error. Below is a patch for the related files. 
---PATCH---
diff -u html/HTMLCollection.py devel/Ft/Dom/html/HTMLCollection.py
--- html/HTMLCollection.py  Mon Jan 15 13:21:26 2001
+++ devel/Ft/Dom/html/HTMLCollection.py  Wed Jan 17 15:17:48 2001
@@ -54,11 +54,16 @@
     def namedItem(self, name):
         found_node = None
         for node in self:
+            # IDs take precedence over NAMEs
             if node.getAttribute('ID') == name:
-                return node
+                found_node = node
+                break
             if not found_node and node.getAttribute('NAME') == name \
-               and node.tagName in HTML_NAME_ALLOWED:
+                   and node.tagName in HTML_NAME_ALLOWED:
+                # We found a node with NAME attribute, but we have to wait
+                # until all nodes are done (one might have an ID that matches)
                 found_node = node
+        print 'found:', found_node
         return found_node
diff -u html/HTMLDocument.py devel/Ft/Dom/html/HTMLDocument.py
--- html/HTMLDocument.py  Mon Jan 15 21:00:52 2001
+++ devel/Ft/Dom/html/HTMLDocument.py  Wed Jan 17 15:36:46 2001
@@ -72,7 +72,7 @@
         elements = self.getElementsByTagName('BODY')
         if elements:
             # Replace the existing one
-            oldBody.parentNode.replaceChild(newBody, elements[0])
+            elements[0].parentNode.replaceChild(newBody, elements[0])
         else:
             # Add it
             self.documentElement.appendChild(newBody)
diff -u html/HTMLElement.py devel/Ft/Dom/html/HTMLElement.py
--- html/HTMLElement.py  Mon Jan 15 13:21:16 2001
+++ devel/Ft/Dom/html/HTMLElement.py  Wed Jan 17 15:22:06 2001
@@ -58,19 +57,19 @@
     def getAttribute(self, name):
         attr = self.attributes.getNamedItem(string.upper(name))
-        attr and attr.value or ''
+        return attr and attr.value or ''

     def getAttributeNode(self, name):
-        return self.attribute.getNamedItem(string.upper(name))
+        return self.attributes.getNamedItem(string.upper(name))

     def getElementsByTagName(self, tagName):
         return Element.getElementsByTagName(self, string.upper(tagName))

     def hasAttribute(self, name):
-        return self.attribute.getNamedItem(string.upper(name)) is not None
+        return self.attributes.getNamedItem(string.upper(name)) is not None

     def removeAttribute(self, name):
-        attr = set.attributes.getNamedItem(string.upper(name))
+        attr = self.attributes.getNamedItem(string.upper(name))
         attr and self.removeAttributeNode(attr)

     def setAttribute(self, name, value):
@@ -80,6 +79,18 @@
         return value

     ### Helper Functions For Cloning ###
+
+    def _4dom_clone(self, owner):
+        e = self.__class__(owner,
+                           self.tagName)
+        for attr in self.attributes:
+            clone = attr._4dom_clone(owner)
+            if clone.localName is None:
+                e.attributes.setNamedItem(clone)
+            else:
+                self.attributes.setNamedItemNS(clone)
+            clone._4dom_setOwnerElement(self)
+        return e

     def __getinitargs__(self):
         return (self.ownerDocument,
-- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From eugeneai@icc.ru Thu Jan 25 02:24:46 2001 From: eugeneai@icc.ru (Evgeny Cherkashin) Date: Thu, 25 Jan 2001 10:24:46 +0800 Subject: [XML-SIG] What install builder do You use ... In-Reply-To: <20010123170106.05390EEB0@mail.python.org> References: <20010123170106.05390EEB0@mail.python.org> Message-ID: <200101250226.KAA10772@monster.icc.ru> Hi! What installer builder program do you use for building .exe packages, e.g. PyXML? And where can I find it? Evgeny From martin@mira.cs.tu-berlin.de Thu Jan 25 06:01:39 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 25 Jan 2001 07:01:39 +0100 Subject: [XML-SIG] What install builder do You use ... In-Reply-To: <200101250226.KAA10772@monster.icc.ru> (message from Evgeny Cherkashin on Thu, 25 Jan 2001 10:24:46 +0800) References: <20010123170106.05390EEB0@mail.python.org> <200101250226.KAA10772@monster.icc.ru> Message-ID: <200101250601.f0P61dl01216@mira.informatik.hu-berlin.de> > What installer builder program do you use for building .exe packages? Distutils. python setup.py bdist_wininst. > And where can I find it? It's part of Python 2.0.
Regards, Martin From noreply@sourceforge.net Thu Jan 25 09:22:57 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 25 Jan 2001 01:22:57 -0800 Subject: [XML-SIG] [Bug #130020] 4DOM: cloneNode broken for derived classes Message-ID: Bug #130020, was updated on 2001-Jan-25 01:22 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: 4DOM: cloneNode broken for derived classes Details: I'm just posting this on SF as a reminder. For a description and possible resolution, please refer to http://lists.fourthought.com/pipermail/4suite/2001-January/001199.html and http://lists.fourthought.com/pipermail/4suite/2001-January/001200.html For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=130020&group_id=6473 From larsga@garshol.priv.no Thu Jan 25 09:51:38 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 25 Jan 2001 10:51:38 +0100 Subject: [XML-SIG] Development roadmap? In-Reply-To: <200101241955.f0OJtCP00959@mira.informatik.hu-berlin.de> References: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> <200101241955.f0OJtCP00959@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | Since you just got elected maintainer, you can chose any format you | consider appropriate. Will do. I don't expect to have anything until after the weekend. | I'd suggest a location inside the topic guide then; that already is | CVS-accessible. There is an automatic update procedure so you just | need to cvs commit to publish (if you can stand the 6h delay until | the cron job runs) This sounds fine to me. | Again, please remove the TODO file from PyXML when you commit the | first version of your roadmap document. Will do. --Lars M. 
From noreply@sourceforge.net Thu Jan 25 15:30:02 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 25 Jan 2001 07:30:02 -0800 Subject: [XML-SIG] [Bug #130049] [4DOM] normalize() fails on DocumentFragments Message-ID: Bug #130049, was updated on 2001-Jan-25 07:30 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: [4DOM] normalize() fails on DocumentFragments Details: I only tested this on 4Suite 0.10.0; please tell me if this was fixed in 0.10.1. Calling normalize() on a document fragment does not recurse into the child elements (though it does process text nodes that are immediate children of the DF). The workaround is to iterate through the child nodes of the DF and call normalize() on each one manually.
Sample code:
from xml.dom.ext.reader import Sax2
d = Sax2.FromXml('') # Yes, I know I'm a lazy boy...
df = d.createDocumentFragment()
df.appendChild(d.createElementNS('','foo'))
df.firstChild.appendChild(d.createTextNode('textNode1 '))
df.firstChild.appendChild(d.createTextNode('textNode2 '))
print 'before normalize'
print df.firstChild
df.normalize()
print 'after normalize'
print df.firstChild
For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=130049&group_id=6473 From noreply@sourceforge.net Thu Jan 25 17:24:50 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 25 Jan 2001 09:24:50 -0800 Subject: [XML-SIG] [Patch #103417] 4DOM: Patch for normalize() Message-ID: Patch #103417 has been updated.
Project: pyxml Category: 4Suite Status: Open Submitted by: jkloth Assigned to : nobody Summary: 4DOM: Patch for normalize() ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=103417&group_id=6473 From noreply@sourceforge.net Thu Jan 25 18:19:06 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 25 Jan 2001 10:19:06 -0800 Subject: [XML-SIG] [Patch #103418] 4DOM: Derived class cloning Message-ID: Patch #103418 has been updated. Project: pyxml Category: 4Suite Status: Open Submitted by: jkloth Assigned to : nobody Summary: 4DOM: Derived class cloning ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=103418&group_id=6473 From uche.ogbuji@fourthought.com Fri Jan 26 19:20:44 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 26 Jan 2001 12:20:44 -0700 Subject: [XML-SIG] Can't write to www repo Message-ID: <200101261920.MAA27435@localhost.localdomain> I'm trying to commit my changes to the Python XML topic, but it won't let me:
[uogbuji@borgia www]$ cvs commit
cvs commit: Examining .
cvs commit: Examining ht2html
cvs commit: Examining htdocs
cvs commit: Examining htdocs/topics
cvs commit: Examining htdocs/topics/dtds
cvs commit: Examining htdocs/topics/xbel
cvs commit: Examining htdocs/topics/xbel/docs
cvs commit: Examining htdocs/topics/xbel/docs/html
cvs [server aborted]: "commit" requires write access to the repository
cvs commit: saving log message in /tmp/cvsXssgBm
[uogbuji@borgia www]$ cat CVS/
Entries Repository Root
[uogbuji@borgia www]$ cat CVS/Root
:pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml
[uogbuji@borgia www]$
Can someone fix the permissions? Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Fri Jan 26 22:11:46 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 26 Jan 2001 23:11:46 +0100 Subject: [XML-SIG] Can't write to www repo In-Reply-To: <200101261920.MAA27435@localhost.localdomain> (message from Uche Ogbuji on Fri, 26 Jan 2001 12:20:44 -0700) References: <200101261920.MAA27435@localhost.localdomain> Message-ID: <200101262211.f0QMBkk01145@mira.informatik.hu-berlin.de> > :pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml > [uogbuji@borgia www]$ > > Can someone fix the permissions? It's not a matter of permissions, but of authentication. Please see http://sourceforge.net/cvs/?group_id=6473 pserver only allows anonymous access - you need ssh/CVS_RSH for developer access. Regards, Martin From uche.ogbuji@fourthought.com Fri Jan 26 22:26:13 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 26 Jan 2001 15:26:13 -0700 Subject: [XML-SIG] Can't write to www repo References: <200101261920.MAA27435@localhost.localdomain> <200101262211.f0QMBkk01145@mira.informatik.hu-berlin.de> Message-ID: <3A71F985.CC6D9B88@fourthought.com> "Martin v. Loewis" wrote: > > > :pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml > > [uogbuji@borgia www]$ > > > > Can someone fix the permissions? > > It's not a matter of permissions, but of authentication. Please see > > http://sourceforge.net/cvs/?group_id=6473 > > pserver only allows anonymous access - you need ssh/CVS_RSH for > developer access. I didn't look carefully enough. All I had to do was take out the ":pserver:" part [uogbuji@borgia www]$ cvs -d uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml commit Worked fine. Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Jan 26 22:39:36 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 26 Jan 2001 15:39:36 -0700 Subject: [XML-SIG] Update Python XML topic Message-ID: <200101262239.PAA28063@localhost.localdomain> I've completed and checked in the changes. I updated the front page, status, software, dom and fourthought pages. In the software page I updated links and added Pyxie, python davserver, soaplib, Lye and redfoot. Python/XML software authors, please check http://pyxml.sourceforge.net/topics/software.html and see whether I'm missing or misrepresenting your work. I can make any additions or fixes. One question: the PyPointers link goes to http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/xptr.html which gives a 404. Lars, is this still something you want listed? If so, where do I point to? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Jan 26 22:51:41 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 26 Jan 2001 15:51:41 -0700 Subject: [XML-SIG] Update Python XML topic In-Reply-To: Message from Uche Ogbuji of "Fri, 26 Jan 2001 15:39:36 MST." <200101262239.PAA28063@localhost.localdomain> Message-ID: <200101262251.PAA28146@localhost.localdomain> > I've completed and checked in the changes. > > I updated the front page, status, software, dom and fourthought pages. Note: the changes won't show up until the page auto-regenerates. I believe someone mentioned a 6-hour interval? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Fri Jan 26 23:39:52 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 27 Jan 2001 00:39:52 +0100 Subject: [XML-SIG] Update Python XML topic In-Reply-To: <200101262239.PAA28063@localhost.localdomain> (message from Uche Ogbuji on Fri, 26 Jan 2001 15:39:36 -0700) References: <200101262239.PAA28063@localhost.localdomain> Message-ID: <200101262339.f0QNdqI01809@mira.informatik.hu-berlin.de> > I've completed and checked in the changes. > > I updated the front page, status, software, dom and fourthought pages. Thanks! Contributions of documentation are often more desirable than contributions of code :-) Regards, Martin From martin@mira.cs.tu-berlin.de Fri Jan 26 23:44:28 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 27 Jan 2001 00:44:28 +0100 Subject: [XML-SIG] Update Python XML topic In-Reply-To: <200101262251.PAA28146@localhost.localdomain> (message from Uche Ogbuji on Fri, 26 Jan 2001 15:51:41 -0700) References: <200101262251.PAA28146@localhost.localdomain> Message-ID: <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de> > Note: the changes won't show up until the page auto-regenerates. I believe > someone mentioned a 6-hour interval? Indeed. You can run the generator by invoking doupdate on shell1.sourceforge.net if you want, but I'd take this as a test case whether the mechanism still works. Anybody advise of a mechanism that performs the update on commit is highly appreciated. Please note that the specific problem is not to just execute some script (such a script is in CVSROOT already), but to have that script properly run on shell1 even though the commit occurs on cvs.sourceforge.net. 
Regards, Martin From gstein@lyra.org Sat Jan 27 02:28:13 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 26 Jan 2001 18:28:13 -0800 Subject: [XML-SIG] Update Python XML topic In-Reply-To: <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Sat, Jan 27, 2001 at 12:44:28AM +0100 References: <200101262251.PAA28146@localhost.localdomain> <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de> Message-ID: <20010126182812.Y704@lyra.org> On Sat, Jan 27, 2001 at 12:44:28AM +0100, Martin v. Loewis wrote: > > Note: the changes won't show up until the page auto-regenerates. I believe > > someone mentioned a 6-hour interval? > > Indeed. You can run the generator by invoking doupdate on > shell1.sourceforge.net if you want, but I'd take this as a test case > whether the mechanism still works. > > Anybody advise of a mechanism that performs the update on commit is > highly appreciated. Please note that the specific problem is not to > just execute some script (such a script is in CVSROOT already), but to > have that script properly run on shell1 even though the commit occurs > on cvs.sourceforge.net. Maybe start a script which uses HTTP to invoke a CGI script on the web server? Would that propagate correctly? Have the right permissions? Cheers, -g -- Greg Stein, http://www.lyra.org/ From uche.ogbuji@fourthought.com Sat Jan 27 15:24:40 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 27 Jan 2001 08:24:40 -0700 Subject: [XML-SIG] Update Python XML topic In-Reply-To: Message from "Martin v. Loewis" of "Sat, 27 Jan 2001 00:44:28 +0100." <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de> Message-ID: <200101271524.IAA02505@localhost.localdomain> > > Note: the changes won't show up until the page auto-regenerates. I believe > > someone mentioned a 6-hour interval? > > Indeed. 
You can run the generator by invoking doupdate on > shell1.sourceforge.net if you want, but I'd take this as a test case > whether the mechanism still works. I guess it failed the test. The Web pages are still not updated, so I tried to go to the shell account to see what's what. doupdate yields an error as you can see, and I don't know enough about the SF directory layout to figure out how to correct it.
[uogbuji@borgia uogbuji]$ ssh uche@pyxml.sourceforge.net
Linux usw-cf-linux1 2.2.14-va.4.4-i586 #1 Tue Sep 5 15:18:51 PDT 2000 i686 unknown
Welcome to usf-sf-shell1. (orbital generation two)
Any problems : please submit a support request: http://sourceforge.net/support/?group_id=1
------------------------------------------------------------
uche@usw-pr-shell1:~$ cd /home/groups/pyxml
uche@usw-pr-shell1:/home/groups/pyxml$ ls
cgi-bin doupdate foo ht2html htdocs log
uche@usw-pr-shell1:/home/groups/pyxml$ ./doupdate
cvs [export aborted]: connect to slayer:2401 failed: Connection refused
./doupdate: cd: www/htdocs: No such file or directory
cp: cannot stat `/var/tmp/www5871/www/*': No such file or directory
uche@usw-pr-shell1:/home/groups/pyxml$ ls
cgi-bin doupdate foo ht2html htdocs log
uche@usw-pr-shell1:/home/groups/pyxml$
Any ideas? Looking at the doupdate script I have no idea why it's supposed to work, but I'm not sure how to fix it. I'll keep investigating. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rnd@onego.ru Sun Jan 28 07:53:06 2001 From: rnd@onego.ru (Roman Suzi) Date: Sun, 28 Jan 2001 10:53:06 +0300 (MSK) Subject: [XML-SIG] I am confused... Message-ID: Hello, I've just subscribed to this list and my brief browsing of archives suggested this is right place to ask my question.
I maintain a journal and newspaper sites (in russian and finnish languages) ( for example, http://carelia.onego.ru ) and am thinking about using XML to store articles. (Now I use custom Python scripts to generate sites) However, when I made a prototype program and tried to generate page with Python XML tools (xml.*, not 4Suite, I used Python 1.5.2) - it was so slow that I just thrown the idea out. However, XML is a natural format to represent the data I store in an ad-hoc format anyway. So, my main question is: - are Python XML tools (and which of them?) up to the task of facilitating site-generation with bearable speed? And one more less related to the above: Right now I need to markup raw material for the articles by hand and I want to do it with less keystrokes. Just typing tags for 'This And This' is not less typing How do you solve this? I am planning to do something like: a::This And This h::The Headline ... and then run custom pre-processor which will store this in proper format (I hope it will be XML if I find fast way to deal with it in Python) The other way to do the same is to write special mode for Emacs, but I am not very proficient in that and I take into consideration that if somebody else will need to add material instead of me he will be not happy... Any ideas? Thanks! Sincerely yours, Roman Suzi -- Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "Patience is a virtue, it's just not one of my better virtues" _/ From martin@mira.cs.tu-berlin.de Sun Jan 28 10:27:41 2001 From: martin@mira.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 28 Jan 2001 11:27:41 +0100 Subject: [XML-SIG] Update Python XML topic In-Reply-To: <20010126182812.Y704@lyra.org> (message from Greg Stein on Fri, 26 Jan 2001 18:28:13 -0800) References: <200101262251.PAA28146@localhost.localdomain> <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de> <20010126182812.Y704@lyra.org> Message-ID: <200101281027.f0SARfK01418@mira.informatik.hu-berlin.de> > Maybe start a script which uses HTTP to invoke a CGI script on the web > server? Would that propagate correctly? Have the right permissions? Good idea. I've tried, and when I was almost done, I noticed that it will run as nobody.nobody, thus *not* have the right permissions. I guess this is a sensible thing from the SF point of view, so I'm back to square one. Regards, Martin From martin@mira.cs.tu-berlin.de Sun Jan 28 10:32:57 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 11:32:57 +0100 Subject: [XML-SIG] Update Python XML topic In-Reply-To: <200101271524.IAA02505@localhost.localdomain> (message from Uche Ogbuji on Sat, 27 Jan 2001 08:24:40 -0700) References: <200101271524.IAA02505@localhost.localdomain> Message-ID: <200101281032.f0SAWvf01441@mira.informatik.hu-berlin.de> > I guess it failed the test. Yes, a number of things seems to have broken. I corrected the script so that the hostnames are good now. I also noticed that it is best run on pyxml.sourceforge.net (which, interestingly, is not the Web server when you login, but is the Web server when you come through port 80 :-). IOW, it works fine when I run it; I wouldn't mind somebody else trying to run it. Furthermore, it seems that the crontabs are not user-readable anymore - although they seem to contain varying per-user information. So I can't even be sure that the cron job still runs; I'll ask SF what this is about. So again, any proposals (or, even better, attempts to solve this) are welcome. 
For the moment, you have to run /home/groups/pyxml/doupdate manually on pyxml.sourceforge.net after you've committed something. Regards, Martin From martin@mira.cs.tu-berlin.de Sun Jan 28 11:17:28 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 12:17:28 +0100 Subject: [XML-SIG] I am confused... In-Reply-To: (message from Roman Suzi on Sun, 28 Jan 2001 10:53:06 +0300 (MSK)) References: Message-ID: <200101281117.f0SBHSL01714@mira.informatik.hu-berlin.de> > However, when I made a prototype program > and tried to generate page with Python XML tools (xml.*, > not 4Suite, I used Python 1.5.2) - it was so slow > that I just thrown the idea out. Python 1.5.2 did not come with an xml.* package, so I wonder what exactly you've been using. Perhaps xmllib? That *is* slow. > So, my main question is: > > - are Python XML tools (and which of them?) up to the task of facilitating > site-generation with bearable speed? That probably depends on many things: what exactly you want to achieve, and what approximately you consider bearable. I personally haven't tried myself to produce web sites with PyXML, but I haven't heard complaints about unbearable speed so far. I'd be really curious as to what transformations you wanted to achieve, and how exactly you attempted them. E.g. choice of XML parser matters significantly; there is a number of alternatives in PyXML. > Right now I need to markup raw material for the articles > by hand and I want to do it with less keystrokes. Just > typing tags for 'This And This' is not less typing > How do you solve this? Smart editors can help. For example, the psgml mode of Emacs can perform auto-completion of tags (in particular of closing tags, but also of opening tags if it sees a DTD). > I am planning to do something like: > > a::This And This > h::The Headline > ... 
> > and then run custom pre-processor which will store this > in proper format (I hope it will be XML if I find > fast way to deal with it in Python) That sounds also like a reasonable thing to do. > The other way to do the same is to write special mode for Emacs, but > I am not very proficient in that and I take into consideration that > if somebody else will need to add material instead of me he will be > not happy... That should favour using XML all the time. People use different editors, right. However, putting XML into a text editor is straight-forward. Some people may want to use your Emacs macros for convenience, but they don't *have* to - they might have some other smart XML editor they know, and the output will still be XML. Regards, Martin From martin@mira.cs.tu-berlin.de Sun Jan 28 12:43:33 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 13:43:33 +0100 Subject: [XML-SIG] Announcing PyXPath 1.2 In-Reply-To: <3A67AFF5.F0895522@fourthought.com> (message from Jeremy Kloth on Thu, 18 Jan 2001 20:09:41 -0700) References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de> <3A67AFF5.F0895522@fourthought.com> Message-ID: <200101281243.f0SChX501988@mira.informatik.hu-berlin.de> > > const unsigned short BINARY_EXPR = 8; > Since there are two basic types of binary expressions, I suggest > splitting this into a BOOLEAN_EXPR and NUMERIC_EXPR. They do offer > quite different functionality. Sounds good. How does the UNION_OPERATOR fit in? > > const unsigned short UNARY_EXPR = 9; > This would be considered a NUMERIC_EXPR. How do you represent a '-x' in a NumericExpr object, then? In particular, how to distingiush 'a-b' and '-a'? The first is createNumericExpr(MINUS_OPERATOR, a, b) Some options for the second one: createNumericExpr(UNARY_MINUS_OPERATOR, a, None) createNumericExpr(UNARY_MINUS_OPERATOR, None, a) createNumericExpr(MINUS_OPERATOR, a, None) createNumericExpr(MINUS_OPERATOR, None, a) Which one would you prefer? 
> > // the name must still contain the leading $ > > VariableReference createVariableReference(in DOMString name); > > name can be a qualified name. use prefix, localname Ok. > > Literal createLiteral(in DOMString literal); > > Number createNumber(in DOMString value); > > FunctionCall createFunctionCall(in DOMString name, in ExprList args); > > See createVariableReference Ok. > > Expr parseLocationPath(in DOMString path); // returns absolute or relative path, or step > > This should probably be parseExpression, since the Expr is the primary > construct. (See XPath spec - sect 1) Probably. I'm still not sure certain which start symbol is required in what applications. For the moment, I dropped parseLocationPath in favour of parseExpr. > > interface AbsoluteLocationPath:Expr{ > > /* '/' relative-opt, or '//' relative */ > > readonly attribute Expr relative; // step or relative path > > relative may be null (case of '/') Sure. That is implied in all cases where the grammar has option constructs. > > const unsigned short ANCESTOR = 1; > > const unsigned short ANCESTOR_OR_SELF = 2; > > const unsigned short _ATTRIBUTE = 3; // attribute is a keyword > > const unsigned short CHILD = 4; > > const unsigned short DESCENDANT = 5; > > const unsigned short DESCENDANT_OR_SELF = 6; > > const unsigned short FOLLOWING = 7; > > const unsigned short FOLLOWING_SIBLING = 8; > > const unsigned short NAMESPACE = 9; > > const unsigned short PARENT = 10; > > const unsigned short PRECEDING = 11; > > const unsigned short PRECEDING_SIBLING = 12; > > const unsigned short SELF = 13; > > Maybe suffix the types with '_AXIS'? All of them? Ok. > > interface AxisSpecifier:Expr{ > > readonly attribute unsigned short name; > > Should we use axisType just for consistancy? In the grammar, the non-terminal collecting them is AxisName, so I'm not sure what consistency really means here. 
> > const unsigned short COMMENT = 1; > > const unsigned short TEXT = 2; > > const unsigned short PROCESSING_INSTRUCTION = 3; > > const unsigned short NODE = 4; > > suffix of '_NODE_TEST' ?? So we get NODE_NODE_TEST? Try again :-) > > interface NodeTest:Expr{ > > readonly attribute unsigned short test; > > testType ?? Ok. I guess that also means we get axisType. [...] > > const unsigned short BINOP_UNION = 14; > > possibly ??_OPERATOR as apposed to BINOP_?? Ok. > > UnaryExpr createUnaryExpr(in Expr exp); > > > See factory functions above. Changed (using createNumericExpr(MINUS_OPERATOR, exp, None) instead). I'll release PyXPath 1.3 soon, which will also include a proposal for integration of XSLT match expressions. Then I'll try to patch 4XPath/4XSLT to use PyXPath. I won't change the attribute names in 4XPath to conform with the IDL, though, atleast for the moment. Regards, Martin From rnd@onego.ru Sun Jan 28 13:05:26 2001 From: rnd@onego.ru (Roman Suzi) Date: Sun, 28 Jan 2001 16:05:26 +0300 (MSK) Subject: [XML-SIG] I am confused... In-Reply-To: Message-ID: (for some reason I have not received replies from the list in my mailbox - but I'll try to answer on reading Martin's reply from Web-page) On Sun, 28 Jan 2001, Roman Suzi wrote: >- are Python XML tools (and which of them?) up to the task of facilitating >site-generation with bearable speed? I remember I was doing queries in the form "/article/author/name" - and it was so slow... (0.5 - 1 sec per query on Celeron 400) In my application I need many such queries to fill the template - that is why speed was unbearable. Please, tell me if I did it wrong: - parsed xml-file - quered each variable in a template-file from the xml-file - filled template with values found to produce web-page (some variables go to other pages, for example, content page) I am trying to learn XML for 2 years already but am still a newbie in practice. 
Anyway, before claiming XML tools for Python slow I need to recheck with new versions - if there are no objections to the above scheme. (And what is preferrable tool for queries? XPath?) Is there any on-line tutorial (?) or just example code to learn how to work efficiently with XML from Python? (Python is my favorite language while Java is not) I read code from xml.* but it doesn't give me clues for real usage. Sincerely yours, Roman Suzi -- Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "Patience is a virtue, it's just not one of my better virtues" _/ From Alexandre.Fayolle@logilab.fr Sun Jan 28 15:22:58 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Sun, 28 Jan 2001 16:22:58 +0100 (CET) Subject: [XML-SIG] problem with empty namespace uri Message-ID: Hello, I'm testing Narval with what is currently in the CVS for 4Suite and PyXML. I noticed a weird behaviour in 4DOM which is probably parser-related, so this is why I post here. If I build a DOM using the default non-validating parser, attributes that have no namespace are available by specifying an empty string as the namespace uri parameter to getAttributeNS(). Now, if I build a DOM using the default validating parser, using an empty string won't do the trick. Instead, I have to use None as the namespace uri. I think this is a problem with the sax2 driver for xmlproc, or maybe xmlproc itself. I'll look into it and submit a patch if I can figure it out. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). 
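The None-versus-empty-string mismatch reported above can be papered over in application code with a small normalising helper. The following is only a sketch under stated assumptions: it uses the standard library's xml.dom.minidom as shipped with current Python 3 (not 4DOM or the 2001 xmlproc driver), and the names ns_or_none and get_attr, the sample element and the urn:example namespace are all hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample: one attribute without a namespace, one with.
doc = minidom.parseString(
    '<item id="a1" xmlns:x="urn:example" x:kind="demo"/>')
elem = doc.documentElement


def ns_or_none(uri):
    """Map both '' and None to None, the DOM spelling of 'no namespace'."""
    return uri or None


def get_attr(element, uri, local_name):
    """Look up an attribute, accepting '' or None for 'no namespace'."""
    return element.getAttributeNS(ns_or_none(uri), local_name)


plain = get_attr(elem, "", "id")          # caller used the '' convention
also_plain = get_attr(elem, None, "id")   # caller used the None convention
qualified = get_attr(elem, "urn:example", "kind")
print(plain, also_plain, qualified)
```

With a helper like this, calling code written against either convention keeps working while the drivers are being made consistent.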
From noreply@sourceforge.net Sun Jan 28 15:22:46 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jan 2001 07:22:46 -0800 Subject: [XML-SIG] [Patch #103470] drv_xmlproc reports None instead of empty ns-uri Message-ID: Patch #103470 has been updated. Project: pyxml Category: sax Status: Open Submitted by: afayolle Assigned to : nobody Summary: drv_xmlproc reports None instead of empty ns-uri ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=103470&group_id=6473 From uche.ogbuji@fourthought.com Sun Jan 28 15:46:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 28 Jan 2001 08:46:34 -0700 Subject: [XML-SIG] I am confused... In-Reply-To: Message from Roman Suzi of "Sun, 28 Jan 2001 16:05:26 +0300." Message-ID: <200101281546.IAA07482@localhost.localdomain> > (for some reason I have not received replies > from the list in my mailbox - but I'll try > to answer on reading Martin's reply from Web-page) > > On Sun, 28 Jan 2001, Roman Suzi wrote: > > >- are Python XML tools (and which of them?) up to the task of facilitating > >site-generation with bearable speed? > > I remember I was doing queries in the form > "/article/author/name" > - and it was so slow... (0.5 - 1 sec per query on Celeron 400) What size was the file? The time you mentioned is in line for using 4XPath on a 640KB file, as you can see in this demo: [uogbuji@borgia uogbuji]$ python Python 2.0 (#6, Oct 26 2000, 12:04:19) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> f = open("bigxml", "w") >>> f.write("
<article>\n") >>> for i in range(10000): ... f.write("<author><name>Uche Ogbuji</name><name>Roman Suzi</name></author> ") ... >>> f.write("</article>
\n") >>> f.close() >>> from Ft.Lib.cDomlette import RawExpatReader >>> reader = RawExpatReader() >>> doc = reader.fromUri("bigxml") >>> from xml.xpath import Evaluate >>> import time >>> start = time.time(); result = Evaluate("/article/author/name", contextNode=doc); end = time.time() >>> print end - start 1.24777603149 >>> len(result) 20000 >>> bigxml is 640K once generated. I don't think it's unreasonable for processing of that file that navigates through and extracts 20,000 nodes according to a path expression. If you cut the loop to generate only 100 author elements (6.4K file), the XPath only takes 0.018 seconds to execute. I'm curious to learn more about your data and the Python app you're using. You say not 4Suite so I assume you mean the old PyPath that used to come in PyXML. > In my application I need many such queries to fill > the template - that is why speed was unbearable. > > Please, tell me if I did it wrong: > > - parsed xml-file > - quered each variable in a template-file from the xml-file > - filled template with values found to produce web-page > (some variables go to other pages, for example, content page) > > I am trying to learn XML for 2 years already but am > still a newbie in practice. > > Anyway, before claiming XML tools for Python slow I need to recheck > with new versions - if there are no objections to the > above scheme. (And what is preferrable tool for queries? > XPath?) It depends on the nature of the queries. > Is there any on-line tutorial (?) or just example code > to learn how to work efficiently with XML from Python? > (Python is my favorite language while Java is not) > I read code from xml.* but it doesn't give me clues > for real usage. If you get 4Suite there are some examples in the demo directories. And you can always get help here. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun Jan 28 16:16:42 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 28 Jan 2001 09:16:42 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from Alexandre Fayolle of "Sun, 28 Jan 2001 16:22:58 +0100." Message-ID: <200101281616.JAA07586@localhost.localdomain> > I'm testing Narval with what is currently in the CVS for 4Suite and > PyXML. I noticed a weird behaviour in 4DOM which is probably > parser-related, so this is why I post here. > > If I build a DOM using the default non-validating parser, attributes that > have no namespace are available by specifying an empty string as the > namespace uri parameter to getAttributeNS(). > > Now, if I build a DOM using the default validating parser, using an empty > string won't do the trick. Instead, I have to use None as the namespace > uri. > > I think this is a problem with the sax2 driver for xmlproc, or maybe > xmlproc itself. I'll look into it and submit a patch if I can figure it > out. Hmm. I introduced this behavior while fixing another drv_pyexpat bug (default namespaces on unprefixes attributes were being returned as the namespace of the element). I thought None was an acceptable NSUri in Python SAX2. The docs certainly seem to think so. No big deal returning "" instead. I saw your patch. Have you checked this in, or should I? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Sun Jan 28 16:42:09 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Sun, 28 Jan 2001 17:42:09 +0100 (CET) Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: <200101281616.JAA07586@localhost.localdomain> Message-ID: On Sun, 28 Jan 2001, Uche Ogbuji wrote: > Hmm. I introduced this behavior while fixing another drv_pyexpat bug (default > namespaces on unprefixes attributes were being returned as the namespace of > the element). > > I thought None was an acceptable NSUri in Python SAX2. The docs certainly > seem to think so. No big deal returning "" instead. Well I don't mind having None instead of '', but I'm certainly in favour of consistency. As long as empty ns uri always show up the same, this is fine by me. I was assuming None was 'wrong' only because I had always seen '' before (and all our code uses ''). > I saw your patch. Have you checked this in, or should I? I don't think I have write access on the PyXML cvs, since I'm not registered as a developer on the project, but correct me if I'm wrong. Narval (including a couple of kludges to work around bug #128860) with todays cvs snapshot of 4Suite and PyXML, and this patch works fine, so I'd say it works fine, as long as noone else is expecting None as a ns-uri. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From rnd@onego.ru Sun Jan 28 19:32:06 2001 From: rnd@onego.ru (Roman Suzi) Date: Sun, 28 Jan 2001 22:32:06 +0300 (MSK) Subject: [XML-SIG] I am confused... In-Reply-To: <200101281546.IAA07482@localhost.localdomain> Message-ID: On Sun, 28 Jan 2001, Uche Ogbuji wrote: >> On Sun, 28 Jan 2001, Roman Suzi wrote: >> >> >- are Python XML tools (and which of them?) up to the task of facilitating >> >site-generation with bearable speed? 
>> >> I remember I was doing queries in the form >> "/article/author/name" >> - and it was so slow... (0.5 - 1 sec per query on Celeron 400) > >What size was the file? The time you mentioned is in line for using 4XPath on >a 640KB file, as you can see in this demo: >1.24777603149 >>>> len(result) >20000 On my AMD k6-200 this is more than 2 times longer, but still impressing: python1.5 big.py 2.75321102142 >bigxml is 640K once generated. I don't think it's unreasonable for processing >I'm curious to learn more about your data and the Python app you're using. >You say not 4Suite so I assume you mean the old PyPath that used to come in >PyXML. I do not remember exact name. >> In my application I need many such queries to fill >> the template - that is why speed was unbearable. >> >> Anyway, before claiming XML tools for Python slow I need to recheck >> with new versions - if there are no objections to the >> above scheme. (And what is preferrable tool for queries? >> XPath?) > >It depends on the nature of the queries. Mostly of the type shown above. Sometimes with conditions. >> Is there any on-line tutorial (?) or just example code >> to learn how to work efficiently with XML from Python? >> (Python is my favorite language while Java is not) >> I read code from xml.* but it doesn't give me clues >> for real usage. > >If you get 4Suite there are some examples in the demo directories. And you >can always get help here. Thank you! Your example shows good performance of 4Suite tools. 
Sincerely yours, Roman Suzi -- Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "Patience is a virtue, it's just not one of my better virtues" _/ From dieter@handshake.de Sun Jan 28 20:25:36 2001 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 28 Jan 2001 21:25:36 +0100 (CET) Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: <278066535@toto.iv> Message-ID: <14964.32832.866565.161806@lindm.dm> Uche Ogbuji writes: > Hmm. I introduced this behavior while fixing another drv_pyexpat bug (default > namespaces on unprefixes attributes were being returned as the namespace of > the element). Is this not correct? I interpreted the following phrase from the namespace spec in this direction: "Note that default namespaces do not apply directly to attributes." Dieter From Mike.Olson@fourthought.com Sun Jan 28 20:50:10 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 28 Jan 2001 13:50:10 -0700 Subject: [XML-SIG] I am confused... References: Message-ID: <3A748602.9935FEA2@FourThought.com> Roman Suzi wrote: > > > >- are Python XML tools (and which of them?) up to the task of facilitating > >site-generation with bearable speed? http://4suite.org is completely dynamic from XML. In fact there is one additional step: we go from a set of RDF statements --> XML and then render it with XSLT. This is running on a Celeron 400; sometimes it gets a bit slow, but usually it is acceptable. > > I remember I was doing queries in the form > "/article/author/name" > - and it was so slow... (0.5 - 1 sec per query on Celeron 400) If you didn't use 4Suite, then what did you use? I think there was an XPath implementation in PyXML but I know little about it. I know 4XPath performs pretty well. Going to the site again, there are hundreds of XPath expressions, but we still get reasonable times.
> > In my application I need many such queries to fill > the template - that is why speed was unbearable. What is your template? XSLT? If not, have you thought of using it? It sounds like it was designed to do exactly what you need. > > Please, tell me if I did it wrong: > > - parsed xml-file > - quered each variable in a template-file from the xml-file > - filled template with values found to produce web-page > (some variables go to other pages, for example, content page) Again, it sounds like you're doing a lot by hand that is not needed. You can do this in XSLT with a simple template like Article By <xsl:value-of select='author/name'/> The big advantage is that all of your XPath expressions can be relative to the current context. In the above example, the current context is already the article so you don't need to match on it again. > > I am trying to learn XML for 2 years already but am > still a newbie in practice. > > Anyway, before claiming XML tools for Python slow I need to recheck > with new versions - if there are no objections to the > above scheme. (And what is preferrable tool for queries? > XPath?) I'd definitely upgrade to the latest versions. I'd also consider XSLT. > > Is there any on-line tutorial (?) or just example code > to learn how to work efficiently with XML from Python? > (Python is my favorite language while Java is not) > I read code from xml.* but it doesn't give me clues > for real usage. We're working on them. There are some demos that come with the code, but no real beginner's tutorial.
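The "parse once, then query relative to the current context" idea can be tried without XSLT. What follows is a rough sketch, not 4XPath or 4XSLT: it assumes Python 3's xml.etree.ElementTree (which postdates this thread and supports only a small path subset), and the SAMPLE document, the PAGE template and the render function are invented for illustration.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical article, shaped like the /article/author/name queries
# discussed in this thread.
SAMPLE = """<article>
  <title>The Headline</title>
  <author><name>Uche Ogbuji</name></author>
  <author><name>Roman Suzi</name></author>
</article>"""

# Hypothetical page template; $title and $authors are the variables to fill.
PAGE = Template("<h1>$title</h1><p>Article by $authors</p>")


def render(xml_text):
    # Parse the document a single time and keep the tree around.
    article = ET.fromstring(xml_text)
    # Every path below is relative to the article element (the current
    # context), so nothing re-matches from the document root.
    title = article.findtext("title", default="")
    names = [node.text for node in article.findall("author/name")]
    return PAGE.substitute(title=title, authors=", ".join(names))


print(render(SAMPLE))
```

The design choice worth noting is that the tree is built once per article; the per-variable cost is then a short relative lookup rather than a fresh absolute query.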
Hope this helps, Mike > > Sincerely yours, Roman Suzi > -- > Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 > _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ > _/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/ > _/ "Patience is a virtue, it's just not one of my better virtues" _/ > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Sun Jan 28 20:57:02 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 21:57:02 +0100 Subject: [XML-SIG] PyXPath 1.3 Message-ID: <200101282057.f0SKv2p08814@mira.informatik.hu-berlin.de> A new release of PyXPath is now available on http://www.informatik.hu-berlin.de/~loewis/xml/PyXPath-1.3.tgz In this release, the IDL is updated according to Jeremy's suggestion, and to include XSLT pattern support. In addition, the function pyxpath.CompilePattern was added to support parsing pattern expressions. I have updated the grammar for use with Yapps 2. Even though this generator provides a number of improvements, PyXPath was changed just so it compiles with Yapps 2; future release will make use of the Kleene star and other features where appropriate. Like previous releases, this requires a 4Suite installation to represent the expression in objects roughly according to the API. Regards, Martin From uche.ogbuji@fourthought.com Sun Jan 28 21:07:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 28 Jan 2001 14:07:34 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from Dieter Maurer of "Sun, 28 Jan 2001 21:25:36 +0100." 
<14964.32832.866565.161806@lindm.dm> Message-ID: <200101282107.OAA08130@localhost.localdomain> > Uche Ogbuji writes: > > Hmm. I introduced this behavior while fixing another drv_pyexpat bug (default > > namespaces on unprefixes attributes were being returned as the namespace of > > the element). > Is this not correct? > > I interpreted the following phrase from the namespace spec > in this direction: > > "Note that default namespaces do not apply directly to attributes." Yes. And I fixed the driver to meet this. Prior to my fix, drv_xmlproc was returning the default namespace on unprefixed attributes in violation of XML Namespaces 1.0, and in particular, the portion you quoted. Now it returns None, or after I check in Alexandre's patch, "". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Sun Jan 28 22:05:11 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 23:05:11 +0100 Subject: [XML-SIG] I am confused... In-Reply-To: (message from Roman Suzi on Sun, 28 Jan 2001 16:05:26 +0300 (MSK)) References: Message-ID: <200101282205.f0SM5BB09225@mira.informatik.hu-berlin.de> > I remember I was doing queries in the form > "/article/author/name" > - and it was so slow... (0.5 - 1 sec per query on Celeron 400) What kind of API did you use? For simple queries like this, a SAX ContentHandler may be sufficient. 
Using Uche's bigxml file, you can try

    import time      # needed for the timing at the bottom
    import xml.sax

    class NameRetriever(xml.sax.ContentHandler):
        def __init__(self):
            self.authors = []
            self.in_author = self.in_name = 0

        def startElement(self, tag, attrs):
            if tag=="author":
                self.in_author = 1
            else:
                if self.in_author and tag == "name":
                    # start collecting the text of a name inside an author
                    self.in_name = 1
                    self.txt = ""

        def characters(self,str):
            if self.in_name:
                self.txt = self.txt+str

        def endElement(self,tag):
            if self.in_name and tag=="name":
                self.authors.append(self.txt)
                self.in_name=0
            elif self.in_author and tag=="author":
                self.in_author=0

    h = NameRetriever()
    start = time.time()
    xml.sax.parse("bigxml",handler=h)
    end = time.time()
    print end - start
    print len(h.authors)

To my own surprise, this is not as fast as the cDomlette; probably because the latter links directly with expat, and thus avoids a number of indirections. Still, it takes only three times as long (0.5s vs 1.4s on my machine), and it will work on any Python 2.0 installation. > Please, tell me if I did it wrong: > > - parsed xml-file > - quered each variable in a template-file from the xml-file > - filled template with values found to produce web-page > (some variables go to other pages, for example, content page) In general, that is ok - except that the description is imprecise. How did you parse? How did you query? How did you fill the template? > Anyway, before claiming XML tools for Python slow I need to recheck > with new versions - if there are no objections to the above > scheme. (And what is preferrable tool for queries? XPath?) It depends. A SAX ContentHandler may do in many cases - although it is apparently not necessarily faster than XPath over a fast DOM implementation. > Is there any on-line tutorial (?) or just example code > to learn how to work efficiently with XML from Python? To learn PyXML, there is an online tutorial on the PyXML topic guide. Working efficiently is probably not something that can be taught in a tutorial - that is very much a matter of experience.
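For anyone trying the handler on a current interpreter, here is a sketch of the same approach for Python 3. It is an adaptation, not the code as posted: the in-memory document stands in for the bigxml file and its exact shape is an assumption, and the text is collected in a list (self.chunks) rather than by string concatenation.

```python
import time
import xml.sax


class NameRetriever(xml.sax.ContentHandler):
    """Collect the text of every <name> found inside an <author>."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self.in_author = False
        self.in_name = False
        self.chunks = []

    def startElement(self, tag, attrs):
        if tag == "author":
            self.in_author = True
        elif self.in_author and tag == "name":
            self.in_name = True
            self.chunks = []

    def characters(self, content):
        # characters() may be called several times per text node.
        if self.in_name:
            self.chunks.append(content)

    def endElement(self, tag):
        if self.in_name and tag == "name":
            self.authors.append("".join(self.chunks))
            self.in_name = False
        elif self.in_author and tag == "author":
            self.in_author = False


# Stand-in for the bigxml file; the element layout is an assumption.
record = "<author><name>Uche Ogbuji</name><name>Roman Suzi</name></author>"
data = ("<article>" + record * 100 + "</article>").encode("utf-8")

handler = NameRetriever()
start = time.perf_counter()
xml.sax.parseString(data, handler)
elapsed = time.perf_counter() - start
print(len(handler.authors), "names in", elapsed, "seconds")
```

Timings from such a run are only meaningful relative to other approaches measured on the same machine and data.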
Regards, Martin From martin@mira.cs.tu-berlin.de Sun Jan 28 22:23:24 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 23:23:24 +0100 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: <200101281616.JAA07586@localhost.localdomain> (message from Uche Ogbuji on Sun, 28 Jan 2001 09:16:42 -0700) References: <200101281616.JAA07586@localhost.localdomain> Message-ID: <200101282223.f0SMNO009516@mira.informatik.hu-berlin.de> > I thought None was an acceptable NSUri in Python SAX2. The docs > certainly seem to think so. What part of the docs specifically do you refer to, here? I think the None vs "" business is sufficiently confusing so it needs to be spelled out explicitly in all places. I do not think that applications should need to behave polymorphically, accepting either None or "". For SAX, the only explicit statement I could find is in the Java SAX spec: uri - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed. (http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/ContentHandler.html) So unless you found documentation that Python has to use None here, I'd say we have to clarify the SAX API that a missing namespace is represented as "". Unfortunately, the DOM specification has that different: # Note that because the DOM does no lexical checking, the empty # string will be treated as a real namespace URI in DOM Level 2 # methods. Applications must use the value null as the namespaceURI # parameter for methods if they wish to have no namespace. (1.1.8 of DOM 2 Core) This clearly means that a node without namespace has a null namespaceURI, according to http://python.sourceforge.net/devel-docs/lib/dom-type-mapping.html, this maps to None in Python. If everybody agrees that this is how it should be, we should document it as such where appropriate, and fix existing implementations accordingly. 
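To see what a given SAX driver actually reports for "no namespace", a small probe along these lines may help. It is a sketch only: it exercises the expat-based driver in the standard library of current Python 3, which is not the drv_xmlproc or drv_pyexpat code under discussion, and the AttrNames class and the sample element are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class AttrNames(ContentHandler):
    """Record the (uri, localname) pairs the driver hands to the handler."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        # In namespace mode, element and attribute names are
        # (namespace URI, local name) pairs.
        self.seen.append(name)
        self.seen.extend(attrs.getNames())


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = AttrNames()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(
    b'<item id="a1" xmlns:x="urn:example" x:kind="demo"/>'))

for pair in handler.seen:
    print(pair)
```

Running the same probe against each driver of interest shows directly whether it uses None or the empty string for names without a namespace.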
Regards,
Martin

From martin@mira.cs.tu-berlin.de Sun Jan 28 22:41:16 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 23:41:16 +0100
Subject: [XML-SIG] XSLT parser interface
Message-ID: <200101282241.f0SMfGn09737@mira.informatik.hu-berlin.de>

[This was sent to python-dev by mistake; my apologies - MvL]

Based on my previous IDL interface for XPath parsers, I've defined an
API for a parser that parses XSLT pattern expressions. It is an
extension to the XPath API, so I attach only the additional
functions.

Any comments are appreciated.

Martin

module XPath{
  // XSLT exprType values
  const unsigned short PATTERN = 17;
  const unsigned short LOCATION_PATTERN = 18;
  const unsigned short RELATIVE_PATH_PATTERN = 19;
  const unsigned short STEP_PATTERN = 20;

  interface Pattern;
  interface LocationPathPattern;
  interface RelativePathPattern;
  interface StepPattern;

  interface PatternFactory:ExprFactory{
    Pattern createPattern(in LocationPathPattern first);
    // idkey may be null, represents IdKeyPattern
    // if parent is true, it is '/', else '//'
    // rel may be null
    LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
                                                  in boolean parent,
                                                  in RelativePathPattern rel);
    // if parent is true, it is /, else //
    RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
                                                  in boolean parent,
                                                  in StepPattern step);
    StepPattern createStepPattern(in AxisSpecifier axis,
                                  in NodeTest test,
                                  in PredicateList predicates);
  };

  typedef sequence<LocationPathPattern> LocationPathPatterns;
  interface Pattern:Expr{
    readonly attribute LocationPathPatterns patterns;
    void append(in LocationPathPattern pattern);
  };

  interface LocationPathPattern:Expr{
    readonly attribute FunctionCall idkey;
    readonly attribute boolean parent;
    readonly attribute RelativePathPattern relative_pattern;
  };

  interface RelativePathPattern:Expr{
    readonly attribute RelativePathPattern relative;
    readonly attribute boolean parent;
    readonly attribute StepPattern step;
  };

  interface StepPattern:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  interface XSLTParser:Parser{
    Pattern parsePattern(in DOMString pattern);
  };
};

From Alexandre.Fayolle@logilab.fr Mon Jan 29 08:55:08 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 29 Jan 2001 09:55:08 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <200101282223.f0SMNO009516@mira.informatik.hu-berlin.de>
Message-ID:

On Sun, 28 Jan 2001, Martin v. Loewis wrote:

> spelled out explicitly in all places. I do not think that applications
> should need to behave polymorphically, accepting either None or "".

I could not agree more.

> If everybody agrees that this is how it should be, we should document
> it as such where appropriate, and fix existing implementations
> accordingly.

So to sum things up, this means that:

 * the patch to drv_xmlproc should be correct. I believe drv_expat should
   be already fine;
 * 4DOM/minidom/etc. should be updated to use None for the namespace uri;
 * applications using these implementations should be updated.

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From larsga@garshol.priv.no Mon Jan 29 09:48:14 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Jan 2001 10:48:14 +0100
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <200101262239.PAA28063@localhost.localdomain>
References: <200101262239.PAA28063@localhost.localdomain>
Message-ID:

* Uche Ogbuji
|
| One question: the PyPointers link goes to
|
| http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/xptr.html
|
| Which gives 404. Lars, is this still something you still want
| listed? If so, where do I point to?

Just remove it. That module implements a now obsolete XPointer syntax
that is totally different from the current XPath-based one, and so
really is useless.

--Lars M.
From larsga@garshol.priv.no Mon Jan 29 09:58:37 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Jan 2001 10:58:37 +0100 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: References: Message-ID: * Alexandre Fayolle | | If I build a DOM using the default non-validating parser, attributes | that have no namespace are available by specifying an empty string | as the namespace uri parameter to getAttributeNS(). Actually, I think this is something that is underspecified in both SAX and the DOM. We need to decide how to represent no namespace URI both in SAX and the DOM. At the moment I think both different SAX drivers and 4DOM/minidom disagree here. 4DOM/minidom also disagree in other parts of their Attributes implementations. I have, unfortunately, not had time to dig sufficiently into this to know the exact state of things, but please don't start changing the code until we have agreed what is the correct behaviour. My opinion is that names that have no namespace URI should be represented using None rather than "". --Lars M. From Alexandre.Fayolle@logilab.fr Mon Jan 29 10:20:10 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 29 Jan 2001 11:20:10 +0100 (CET) Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message-ID: On 29 Jan 2001, Lars Marius Garshol wrote: > I have, unfortunately, not had time to dig sufficiently into this to > know the exact state of things, but please don't start changing the > code until we have agreed what is the correct behaviour. Do not worry about that: I just submitted a very quick patch for review, which enables Narval to work with the cvs HEAD code of 4Suite and PyXML and not too many kludges in the code to handle both conventions. It is now up to the PyXML developers to decide whether it will be applied or not. I agree that some agreement has to be reached first. 
And if the agreement is to use None, I'll change the code in Narval to match this, it's as simple as that (and a strong 'requires' statement on the download page, for this decision can break existing code). Martin pointed out some very interesting parts of the various specs, in another mail on this thread, which seem to clarify this point very much. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From rnd@onego.ru Mon Jan 29 10:59:29 2001 From: rnd@onego.ru (Roman Suzi) Date: Mon, 29 Jan 2001 13:59:29 +0300 (MSK) Subject: [XML-SIG] I am confused... In-Reply-To: <200101282205.f0SM5BB09225@mira.informatik.hu-berlin.de> Message-ID: On Sun, 28 Jan 2001, Martin v. Loewis wrote: I do not remember if this was what I used for measuring, but this was my another effort to create query-mechanisms (It doesnt work anymore due to lack of xml.dom.utils) -------------------- #!/usr/bin/python1.5 print "1. simple" from xml.dom.utils import FileReader from xml.dom.core import createDocument from string import split, index ELEMENT = 1 ATTRIBUTE = 2 TEXT = 3 CDATA_SECTION = 4 ENTITY_REFERENCE = 5 ENTITY = 6 PROCESSING_INSTRUCTION = 7 COMMENT = 8 DOCUMENT = 9 DOCUMENT_TYPE = 10 DOCUMENT_FRAGMENT = 11 NOTATION = 12 d = FileReader() dom = d.readFile('104.xml') def portr(node): typ = node.get_nodeType() value = node.get_nodeValue() name = node.get_nodeName() atts = node.get_attributes() par = node.get_parentNode() print "t ", typ, "v ",value, "n ",name, "a ", atts, "p ", par class strstream: def __init__(self, str): self.str = str # print "strstream init" def read(self, n): tmp = self.str[:n] self.str = self.str[n:] return tmp def readline(self): return self.str def _normalize_tokens(tl): """ rules: $,word,$ --> $word$ """ rules2 = { ("/","/") : "//", (".","/") : "./", ("!","=") : "$ne$", ("<","=") : "$le$", (">","=") : "$ge$", ("=","~") : "$match$", ("!","~") : "$no_match$", (";",";") : ";", } 
rules1 = { "=" : "$eq$", "!" : "$lt$", "<" : "$lt$", ">" : "$gt$", } ntl = [] i = 0 while i < len(tl)-1: if rules2.has_key( tuple(tl[i:i+2]) ): toapp = rules2[tuple(tl[i:i+2])] i = i+2 else: if tl[i] == "$": if i+2 < len(tl): toapp = tl[i] + tl[i+1] + tl[i+2] i = i+3 else: raise "Query error !!!" + `tl` else: toapp = tl[i] i = i+1 if rules1.has_key( toapp ): toapp = rules1[toapp] ntl.append( toapp ) return ntl def _parse_query(q): from shlex import shlex # i1 = index(q, "/") lexer = shlex(strstream(q)) tokens = [] tt = lexer.get_token() while tt: tokens.append(tt) tt = lexer.get_token() return _normalize_tokens(tokens) def find_all_descendants(node, cond): return None # XXX !!! stub def find_all_children(node, cond): lst = [] exec(cond) ### must define condition !!! for n in node.get_childNodes(): if condition(n): lst.append(n) return lst class PYQL: def __init__(self, file): d = FileReader() self.dom = d.readFile(file) if self.dom.get_nodeType() == DOCUMENT: self.docel = self.dom.get_documentElement() def query(self, q): # return self._query(self.docel, q) # return _parse_query(q) qr = self._query(self.docel, _parse_query(q), self.dom ) # ??? 
qel = self.dom.createElement("xql:result") if qr: qel.appendChild(qr) qel.setAttribute("orig", str(q)) return qel def _query(self, node, subq, qrdoc): # print subq print find_all_children(node, """def condition(n): return n.get_nodeName() == "fig" """) if subq[0] == "//": self._query(node, subq[1:], qrdoc) elif subq[0] == "/": if subq[1] == node.get_nodeName(): if len(subq) > 2: if subq[2] == "/": qel = qrdoc.createElement(node.get_nodeName()) for a in node.get_attributes().keys(): qel.setAttribute(a, node.get_attributes()[a].get_nodeValue()) for node1 in node.get_childNodes(): q2 = self._query(node1, subq[2:], qrdoc) # print "q2: ", q2 if q2: qel.appendChild(q2) if len(qel.get_childNodes())==0: del qel return None else: return qel else: return node else: return node else: return None a = PYQL('104.xml') # a.query('$or$ != 1.23E-4 /article/text/topic$') # print a.query('/article/text/topic.').toxml() print a.query('/article/text/figures/fig.').toxml() # print a.query('//fig.').toxml() ----------- It was naive attempt to write XQL for Python... >> I remember I was doing queries in the form >> "/article/author/name" >> - and it was so slow... (0.5 - 1 sec per query on Celeron 400) > >What kind of API did you use? For simple queries like this, a SAX >ContentHandler may be sufficient. 
Using Uche's bigxml file, you can >try >import xml.sax >class NameRetriever(xml.sax.ContentHandler): > def __init__(self): > self.authors = [] > self.in_author = self.in_name = 0 > > def startElement(self, tag, attrs): > if tag=="author": > self.in_author = 1 > else: > if self.in_author and tag == "name": > self.in_name = 1 > self.txt = "" > > def characters(self,str): > if self.in_name: > self.txt = self.txt+str > > def endElement(self,tag): > if self.in_name and tag=="name": > self.authors.append(self.txt) > self.in_name=0 > elif self.in_author and tag=="author": > self.in_author=0 > >h = NameRetriever() >start=time.time();xml.sax.parse("bigxml",handler=h);end = time.time() >print end - start >print len(h.authors) The above code is what I avoid to do. I want my application to be completely data-driven, so even "/article/author/name" must not appear in the program! >To my own surprise, this is not as fast as the cDomlette; probably >because the latter links directly with expat, and thus avoids a number >of indirections. Still, it takes only three times as long (0.5s vs >1.4s on my machine), and it will work on any Python 2.0 installation. > >> Please, tell me if I did it wrong: >> >> - parsed xml-file >> - quered each variable in a template-file from the xml-file >> - filled template with values found to produce web-page >> (some variables go to other pages, for example, content page) > >In general, that is ok - except that the description is unprecise. How >did you parse? How did you query? How did you fill the template? My code above answer these questions. >> Anyway, before claiming XML tools for Python slow I need to recheck >> with new versions - if there are no objections to the above >> scheme. (And what is preferrable tool for queries? XPath?) > >It depends. A SAX ContentHandler may do in many cases - although it is >apparently not necessarily faster than XPath over a fast DOM >implementation. >> Is there any on-line tutorial (?) 
or just example code >> to learn how to work efficiently with XML from Python? > >To learn PyXML, there is a an online tutorial on the PyXML topic >guide. To learn working efficiently is probably not something that can >be taught in a tutorial - that is much a matter of experience. Thanks! I shall look there too. >Regards, >Martin Sincerely yours, Roman Suzi -- Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Monday, January 29, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "The tuna doesn't taste the same without the dolphin." _/ From rnd@onego.ru Mon Jan 29 13:33:26 2001 From: rnd@onego.ru (Roman Suzi) Date: Mon, 29 Jan 2001 16:33:26 +0300 (MSK) Subject: One more ques Re: [XML-SIG] I am confused... In-Reply-To: <3A748602.9935FEA2@FourThought.com> Message-ID: On Sun, 28 Jan 2001, Mike Olson wrote: And one more problem: my texts are far from plain ASCII. Do I need to convert them to utf8 or unicode before working with XML+XSLT+XPath? Do I need Python-2 to implement non US-ASCII site (and not latin-1)? >Roman Suzi wrote: I must admit I never had clearer answers for my questions as in this list! Even though I formulated my problem poorly, I received well-targeted answers which will help me tailor solution to my problem. >> In my application I need many such queries to fill >> the template - that is why speed was unbearable. > >What is you template? XSLT? If not have you thought of using it. It >sounds like it was designed to do exactly what you need. My templates are just fiels with %(var)s -style things inside. And thank you mentioning XSLT with referring to working site - I will see if this fit in my case. 
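The %(var)s-style templates mentioned above can be filled directly from an XML document by giving the % operator a mapping that resolves each key as a path. The following is only a sketch under stated assumptions: it uses Python 3 and xml.etree.ElementTree, neither of which existed when this thread was written, and XmlLookup, the sample article, and the template text are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

class XmlLookup:
    """Mapping whose keys are paths relative to a root element."""

    def __init__(self, root):
        self.root = root

    def __getitem__(self, path):
        # findtext returns None when the path matches nothing.
        text = self.root.findtext(path)
        if text is None:
            raise KeyError(path)
        return text

# Hypothetical sample document and template.
root = ET.fromstring(
    "<article><title>Hi</title>"
    "<author><name>Roman</name></author></article>"
)
template = "%(title)s by %(author/name)s"

# The % operator calls __getitem__ once per %(key)s placeholder,
# so the paths live in the template, not in the program.
page = template % XmlLookup(root)
print(page)
```

With this arrangement the document is parsed once and each placeholder costs one path lookup, which keeps the queries in the template data rather than in the code.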
>> Please, tell me if I did it wrong: >> >> - parsed xml-file >> - quered each variable in a template-file from the xml-file >> - filled template with values found to produce web-page >> (some variables go to other pages, for example, content page) > >Again, it sounds like your doing a lot by hand that is not needed. You >can do this in XSLT with a simple template like > > Article By <xsl:value-of >select='author/name'/> > Wow! If it works as advertized - this is what I need. Can I also embed some python sentences there to handle hard cases? >The big advantage is that all of your XPath expressions can be relative >to the current context. In the above example, the current context is >already the article so you don't need to match on it again. >> >> I am trying to learn XML for 2 years already but am >> still a newbie in practice. >> >> Anyway, before claiming XML tools for Python slow I need to recheck >> with new versions - if there are no objections to the >> above scheme. (And what is preferrable tool for queries? >> XPath?) > >I'd definitly upgrade to latest versions. I did it already. >I'd also consider XSLT. >From what you have shown - sure. >> Is there any on-line tutorial (?) or just example code >> to learn how to work efficiently with XML from Python? >> (Python is my favorite language while Java is not) >> I read code from xml.* but it doesn't give me clues >> for real usage. > >Were working on them. There are some demos that come with the code, but >no real beginners tutorial. Demos are sometimes more valuable than tutorials. In fact, I feel a need to reread overviews on XML (XSLT, XPath, AFs etc) to have better idea what they do before looking at demos. >Hope this helps, > >Mike Sincerely yours, Roman Suzi -- Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Monday, January 29, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "The tuna doesn't taste the same without the dolphin." 
_/ From tpassin@home.com Mon Jan 29 14:32:45 2001 From: tpassin@home.com (Thomas B. Passin) Date: Mon, 29 Jan 2001 09:32:45 -0500 Subject: [XML-SIG] problem with empty namespace uri References: Message-ID: <002f01c08a00$58f295a0$7cac1218@reston1.va.home.com> Lars Marius Garshol wrote - > > My opinion is that names that have no namespace URI should be > represented using None rather than "". > I completely agree with this. If there is ***no*** namespace, the ns value should be None. The empty string should indicate that there is a namespace, but its value happens to be empty. Illustrations seem to be like this - someone help me out here, please. 1) No namespace is declared or used in the whole document, but SAX2 is in use. (ns='') 2) SAX 1 is in use. (ns=None) 3) Namespaces are used in the document, but not in some particular element. (ns='' for that element) 4) Namespaces are used in the document, but some particular element is a child of an element that declares a default namespace. (ns=default ns for that element). This leaves open the ns for an attribute in an element that declare a default ns - the old question that comes up over and over. I don't know the answer. Maybe tests like this: if ns: # Do your namespace stuff wouldn't add that much to the processing time. They would act the same on None and '' ns values. Of course, you could say, then why make a distinction. Maybe we don't need to. I thought this had been hashed out and resolved on the list a while ago, although I don't remember the details. This would be a perfect subject for one of those PEP-like pages I proposed a while ago. I'd like to resurrect that suggestion, and have this topic be the subject of the first one. What do you say? Cheers, Tom P From martin@mira.cs.tu-berlin.de Mon Jan 29 15:28:25 2001 From: martin@mira.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Mon, 29 Jan 2001 16:28:25 +0100
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: (message from Lars Marius Garshol on 29 Jan 2001 10:58:37 +0100)
References:
Message-ID: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de>

> My opinion is that names that have no namespace URI should be
> represented using None rather than "".

That would be fine if agreed-upon, especially as it is consistent. I
just point out that this would be another deviation from the Java API,
which then should be explicitly documented as such.

I agree on your point to agree first, and change the code then :-)
I'd go further to change the documentation before changing the code.

Regards,
Martin

From martin@mira.cs.tu-berlin.de Mon Jan 29 15:41:43 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 16:41:43 +0100
Subject: [XML-SIG] I am confused...
In-Reply-To: (message from Roman Suzi on Mon, 29 Jan 2001 13:59:29 +0300 (MSK))
References:
Message-ID: <200101291541.f0TFfhC00861@mira.informatik.hu-berlin.de>

> The above code is what I avoid to do. I want my application to be
> completely data-driven, so even "/article/author/name" must not appear in
> the program!

I'll look into your code separately, but I'd like to make two points
here:

a) There is often a trade-off between data-driven and fast
   algorithms. Somebody will probably shoot me for that statement, but
   you should be willing to accept some performance degradation if you
   need it very general.

b) In Python, it is often possible to transform a data-driven approach
   into one with explicitly coded decisions, due to the dynamic nature
   of the language. If all else fails, you could generate a program
   from the data.

c) I very much doubt that your *application* really needs to be
   completely data-driven; in any specific installation, there will be
   only a small set of queries. So that seems rather like a "nice to
   have" than a "must have" requirement.
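As an illustration of how the data-driven requirement and a SAX handler can be combined, here is a minimal sketch in which the query path is ordinary data passed to the handler instead of being hard-coded in its methods. It assumes Python 3 and handles only plain absolute paths of element names (no wildcards, predicates, or "//"); PathCollector, query, and the sample document are hypothetical names, not part of PyXML or 4Suite.

```python
import xml.sax

class PathCollector(xml.sax.ContentHandler):
    """Collect the text of every element found at the given absolute path."""

    def __init__(self, path):
        super().__init__()
        self.target = [step for step in path.split("/") if step]
        self.stack = []     # names of the currently open elements
        self.buffer = None  # text chunks while inside a matching element
        self.results = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.target:
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        # Flush only when the matching element itself is closed.
        if self.buffer is not None and self.stack == self.target:
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()

def query(xml_bytes, path):
    handler = PathCollector(path)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results

# Hypothetical sample; the name under <title> must not match.
SAMPLE = (
    b"<article>"
    b"<author><name>Roman</name></author>"
    b"<author><name>Martin</name></author>"
    b"<title><name>ignored</name></title>"
    b"</article>"
)
names = query(SAMPLE, "/article/author/name")
print(names)
```

The path string could equally come from a template or a configuration file; the handler never names "author" or "name" itself.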
Well, that's three points :-) Regards, Martin From martin@mira.cs.tu-berlin.de Mon Jan 29 15:25:30 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 29 Jan 2001 16:25:30 +0100 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: (message from Alexandre Fayolle on Mon, 29 Jan 2001 09:55:08 +0100 (CET)) References: Message-ID: <200101291525.f0TFPUe00830@mira.informatik.hu-berlin.de> > So to sum things up, this means that: > > * the patch to drv_xmlproc should be correct. I believe drv_expat should > be already fine; > * 4DOM/minidom/etc. should be updated to use None for the namespace uri; > * applications using these implementation should be updated. Right. Martin From larsga@garshol.priv.no Mon Jan 29 16:15:53 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Jan 2001 17:15:53 +0100 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de> References: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de> Message-ID: * Lars Marius Garshol | | My opinion is that names that have no namespace URI should be | represented using None rather than "". * Martin v. Loewis | | That would be fine if agreed-upon, especially as it is consistent. Yup. Tom Passin has said that he agrees; it would be nice if more people could post their opinions so that we have some idea of who agrees and who does not. I'd hate it if we changed this later on. | I just point out that this would be another deviation from the Java, | which then should be explicitly documented as such. I am aware of this, and agree that it should be documented as a deviation. | I agree on your point to agree first, and change the code then :-) | I'd go further to change the documentation before changing the code. I agree. I'll also change my book before we change the code. :-) --Lars M. From fdrake@acm.org Mon Jan 29 16:15:35 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Mon, 29 Jan 2001 11:15:35 -0500 (EST) Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: References: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de> Message-ID: <14965.38695.931277.109716@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > Yup. Tom Passin has said that he agrees; it would be nice if more > people could post their opinions so that we have some idea of who > agrees and who does not. I'll support the move to use None, and can make the changes to the documentation in the Python Library Reference. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From ken@bitsko.slc.ut.us Mon Jan 29 16:45:58 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jan 2001 10:45:58 -0600 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Lars Marius Garshol's message of "29 Jan 2001 17:15:53 +0100" References: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de> Message-ID: Lars Marius Garshol writes: > * Lars Marius Garshol > | > | My opinion is that names that have no namespace URI should be > | represented using None rather than "". > > * Martin v. Loewis > | > | That would be fine if agreed-upon, especially as it is consistent. > > Yup. Tom Passin has said that he agrees; it would be nice if more > people could post their opinions so that we have some idea of who > agrees and who does not. +1 on None. -- Ken From martin@mira.cs.tu-berlin.de Mon Jan 29 16:34:20 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 29 Jan 2001 17:34:20 +0100 Subject: [XML-SIG] I am confused... In-Reply-To: (message from Roman Suzi on Mon, 29 Jan 2001 13:59:29 +0300 (MSK)) References: Message-ID: <200101291634.f0TGYK401051@mira.informatik.hu-berlin.de> > I do not remember if this was what I used for measuring, but > this was my another effort to create query-mechanisms > (It doesnt work anymore due to lack of xml.dom.utils) Thanks. I've ported it to minidom, see the code below. 
Fortunately, the DOM implementations follow the official API quite
closely these days, so it is easy to move from one implementation to
another.

Using Uche's 640k document, I get the following timings:

minidom:   6.4s
4DOM:      45s
pDomlette: 8.9s

cDomlette fails since it does not support createElement (pDomlette
only has create*NS operations, so I added None as the namespace
everywhere).

Remember, this is the same machine where Uche's cDomlette/XPath query
took 0.5s. So it *does* matter how exactly you approach a certain task
(you can easily get a factor of 90 between solutions). However, if I
had had to guess in advance what the approximate outcome would be for
each of the solutions, I would have been totally wrong.

Regards,
Martin

#!/usr/local/bin/python

print "1. simple"

from xml.dom import minidom
from string import split, index

def portr(node):
    typ = node.nodeType
    value = node.nodeValue
    name = node.nodeName
    atts = node.attributes
    par = node.parentNode
    print "t ", typ, "v ",value, "n ",name, "a ", atts, "p ", par

class strstream:
    def __init__(self, str):
        self.str = str
        # print "strstream init"
    def read(self, n):
        tmp = self.str[:n]
        self.str = self.str[n:]
        return tmp
    def readline(self):
        return self.str

def _normalize_tokens(tl):
    """
    rules:
      $,word,$ --> $word$
    """
    rules2 = {
        ("/","/") : "//",
        (".","/") : "./",
        ("!","=") : "$ne$",
        ("<","=") : "$le$",
        (">","=") : "$ge$",
        ("=","~") : "$match$",
        ("!","~") : "$no_match$",
        (";",";") : ";",
    }
    rules1 = {
        "=" : "$eq$",
        "!" : "$lt$",
        "<" : "$lt$",
        ">" : "$gt$",
    }
    ntl = []
    i = 0
    while i < len(tl)-1:
        if rules2.has_key( tuple(tl[i:i+2]) ):
            toapp = rules2[tuple(tl[i:i+2])]
            i = i+2
        else:
            if tl[i] == "$":
                if i+2 < len(tl):
                    toapp = tl[i] + tl[i+1] + tl[i+2]
                    i = i+3
                else:
                    raise "Query error !!!" + `tl`
            else:
                toapp = tl[i]
                i = i+1
        if rules1.has_key( toapp ):
            toapp = rules1[toapp]
        ntl.append( toapp )
    return ntl

def _parse_query(q):
    from shlex import shlex
    # i1 = index(q, "/")
    lexer = shlex(strstream(q))
    tokens = []
    tt = lexer.get_token()
    while tt:
        tokens.append(tt)
        tt = lexer.get_token()
    return _normalize_tokens(tokens)

def find_all_descendants(node, cond):
    return None # XXX !!! stub

def find_all_children(node, cond):
    lst = []
    exec(cond) ### must define condition !!!
    for n in node.childNodes:
        if condition(n):
            lst.append(n)
    return lst

class PYQL:
    def __init__(self, file):
        self.dom = minidom.parse(file)
        self.docel = self.dom.documentElement

    def query(self, q):
        qr = self._query(self.docel, _parse_query(q), self.dom)
        qel = self.dom.createElement("xql:result")
        if qr:
            qel.appendChild(qr)
        qel.setAttribute("orig", str(q))
        return qel

    def _query(self, node, subq, qrdoc):
        #print subq
        #print find_all_children(node,
        #"""def condition(n): return n.nodeName == "fig" """)
        if subq[0] == "//":
            self._query(node, subq[1:], qrdoc)
        elif subq[0] == "/":
            if subq[1] == node.nodeName:
                if len(subq) > 2:
                    if subq[2] == "/":
                        qel = qrdoc.createElement(node.nodeName)
                        for a in node.attributes.keys():
                            qel.setAttribute(a, node.attributes[a].nodeValue)
                        for node1 in node.childNodes:
                            q2 = self._query(node1, subq[2:], qrdoc)
                            # print "q2: ", q2
                            if q2:
                                qel.appendChild(q2)
                        if len(qel.childNodes)==0:
                            del qel
                            return None
                        else:
                            return qel
                    else:
                        return node
                else:
                    return node
            else:
                return None

a = PYQL('bigxml')
# a.query('$or$ != 1.23E-4 /article/text/topic$')
# print a.query('/article/text/topic.').toxml()
import time;start=time.time()
res=a.query('/article/author/name.').toxml()
print time.time()-start
print len(res)
# print a.query('//fig.').toxml()

From Mike.Olson@fourthought.com Mon Jan 29 18:43:07 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 29 Jan 2001 11:43:07 -0700
Subject: One more ques Re: [XML-SIG] I am confused...
References: Message-ID: <3A75B9BB.7EEAC2F6@FourThought.com> Roman Suzi wrote: > > On Sun, 28 Jan 2001, Mike Olson wrote: > > And one more problem: my texts are far from plain ASCII. > Do I need to convert them to utf8 or unicode before > working with XML+XSLT+XPath? > Do I need Python-2 to implement non US-ASCII site (and not latin-1)? It would certainly make life easier, but you should be able to use 1.5.2 > > My templates are just fiels with %(var)s -style things inside. > And thank you mentioning XSLT with referring to > working site - I will see if this fit in my case. It sounds like it will. I think it will help performance as well. You can precompile your stylesheets so there is almost no overhead for loading them. > > >> Please, tell me if I did it wrong: > >> > >> - parsed xml-file > >> - quered each variable in a template-file from the xml-file > >> - filled template with values found to produce web-page > >> (some variables go to other pages, for example, content page) > > > >Again, it sounds like your doing a lot by hand that is not needed. You > >can do this in XSLT with a simple template like > > > > > Article By <xsl:value-of > >select='author/name'/> > > > > Wow! If it works as advertized - this is what I need. > > Can I also embed some python sentences there to handle > hard cases? What kind of hard cases? XSLT is a lot more powerful then what I showed, there are for loops, variables, if statements. If you do reach the extent of what XSLT can do, then you can write extension functions and extension elements in Python. Cheers, Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rnd@onego.ru Mon Jan 29 19:10:09 2001 From: rnd@onego.ru (Roman Suzi) Date: Mon, 29 Jan 2001 22:10:09 +0300 (MSK) Subject: [XML-SIG] I am confused... 
In-Reply-To: <200101291541.f0TFfhC00861@mira.informatik.hu-berlin.de> Message-ID: On Mon, 29 Jan 2001, Martin v. Loewis wrote: >> The above code is what I avoid to do. I want my application to be >> completely data-driven, so even "/article/author/name" must not appear in >> the program! > >I'll look into your code separately, but I'd like to make two points >here: > >a) There is often a trade-off between data-driven and fast > algorithms. Somebody will probably shoot me for that statement, but > you should be willing to accept some performance degrading if you > need it very general. In C - yes, but in Python - I doubt. Data-driven programs are shorter, contain less errors and (IMHO) are faster. >b) In Python, it is often possible to transform a data-driven approach > in one with explicitly coded decisions, due to the dynamic nature > of the language. If all else fails, you could generate the a program > from the data. This is true. But This add more complexity. >c) I very much doubt that your *application* really needs to be > completely data-driven; in any specific installation, there will be > only a small set of queries. So that seems rather like a "nice to > have" but a "must have" requirement. I agree. I have this working now - but am not satisfied, because do like to make changes in one place instead of hunting them thruout many places. My points are (they are drived by laziness ;-) a) Software solution must be as general, as possible (I think its a myth that less general solutions are harder, longer to implement or are much less efficient: 2+2 is not easier than x+y, why hardcode x+x ?;-) b) One parameter change requires one change in the code ("write everything once") (if some nontrivial constant repeats in the code in the same role - its a variable ;-) c) Count total time of solution: time of programming + time of execution. (Not forgetting time of reprogramming!) 
(In my case I better wait 3 more seconds than make hell from supporting my solution) Now I am turning toward XML & co. because it happen to be a common data model to store such data I have for web-site. Anything else is reinventing the wheel. However, I want to apply the same design principles (expressed above) while dealing with XML. >Well, that's three points :-) I think this branch of discussion is kinda offtopic. Probably one day I will write a test for programmers where there will be questions like: #. What do you prefer more: a) if a == "1": b = "5" elif a == "4": b = "20" # ... else: b = "5000" b) b = {"1":"5", "4":"20", ..., "1000":"5000"}[a] c) b = str(int(a)*5) d) try: b = str(int(a)*5) except: b = "5000" :-) For now my answer is (d) but there are cases where (c or d) are not possible - then it will be (b). >Regards, >Martin Sincerely yours, Roman Suzi -- _/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Monday, January 29, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "The tuna doesn't taste the same without the dolphin." _/ From uche.ogbuji@fourthought.com Mon Jan 29 19:49:02 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 12:49:02 -0700 Subject: [XML-SIG] New articles up: XML Messaging Message-ID: <3A75C92E.CE803DE4@fourthought.com> [Sorry if you get multiple copies of this] I've started a series of tutorials on IBM developerWorks that will cover XML messaging. Python is the implementation language. The first two parts of it are up. Neither uses Python yet. The first is a background article http://www-106.ibm.com/developerworks/library/co-tutintro.html And then comes the first tutorial: on IDL (which is used to specify/document the XML messaging interfaces) http://www-105.ibm.com/developerworks/education.nsf/components-onlinecourse-bytitle/19CEA37A7099DFFC862569D50063163C?OpenDocument The actual tutorials require free registration at dW. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jeremy.kloth@fourthought.com Mon Jan 29 20:06:06 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Mon, 29 Jan 2001 13:06:06 -0700 Subject: [XML-SIG] problem with empty namespace uri References: Message-ID: <3A75CD2E.8BE10051@fourthought.com> Alexandre Fayolle wrote: > > On Sun, 28 Jan 2001, Martin v. Loewis wrote: > > > spelled out explicitly in all places. I do not think that applications > > should need to behave polymorphically, accepting either None or "". > > I could not agree more. > > > > > If everybody agrees that this is how it should be, we should document > > it as such where appropriate, and fix existing implementations > > accordingly. > > So to sum things up, this means that: > > * the patch to drv_xmlproc should be correct. I believe drv_expat should > be already fine; > * 4DOM/minidom/etc. should be updated to use None for the namespace uri; > * applications using these implementation should be updated. > Actually, the DOM spec says that objects created with the non-NS methods have the null namespaceURI, localName and prefix. So I would say that if the parser is running in NS mode, everything is created with the NS methods. That would mean that unprefixed attributes would have an '' for the namespaceURI and prefix. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. 
http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From dieter@handshake.de Mon Jan 29 17:53:09 2001 From: dieter@handshake.de (Dieter Maurer) Date: Mon, 29 Jan 2001 18:53:09 +0100 (CET) Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: <200101282107.OAA08130@localhost.localdomain> References: <200101282107.OAA08130@localhost.localdomain> Message-ID: <14965.44549.716772.938879@lindm.dm> Uche Ogbuji writes: > > Uche Ogbuji writes: > > > Hmm. I introduced this behavior while fixing another drv_pyexpat bug (default > > > namespaces on unprefixed attributes were being returned as the namespace of > > > the element). > > Is this not correct? > > > > I interpreted the following phrase from the namespace spec > > in this direction: > > > > "Note that default namespaces do not apply directly to attributes." > > Yes. And I fixed the driver to meet this. Prior to my fix, drv_xmlproc was > returning the default namespace on unprefixed attributes in violation of XML > Namespaces 1.0, and in particular, the portion you quoted. Now it returns > None, or after I check in Alexandre's patch, "". I interpret this part differently: Default namespaces do not apply directly to attributes but indirectly via the element they belong to. If I have: then (at least semantically), "attr" belongs to the same namespace as "elem" (the namespace associated with "ns"). I am not sure whether the application or the parser should make this namespace association for attributes. Dieter From uche.ogbuji@fourthought.com Mon Jan 29 20:39:48 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 13:39:48 -0700 Subject: [XML-SIG] I am confused... In-Reply-To: Message from "Martin v. Loewis" of "Sun, 28 Jan 2001 23:05:11 +0100."
<200101282205.f0SM5BB09225@mira.informatik.hu-berlin.de> Message-ID: <200101292039.NAA11439@localhost.localdomain>

> > I remember I was doing queries in the form
> > "/article/author/name"
> > - and it was so slow... (0.5 - 1 sec per query on Celeron 400)
>
> What kind of API did you use? For simple queries like this, a SAX
> ContentHandler may be sufficient. Using Uche's bigxml file, you can
> try
>
> import time
> import xml.sax
> class NameRetriever(xml.sax.ContentHandler):
>     def __init__(self):
>         self.authors = []
>         self.in_author = self.in_name = 0
>
>     def startElement(self, tag, attrs):
>         if tag=="author":
>             self.in_author = 1
>         else:
>             if self.in_author and tag == "name":
>                 self.in_name = 1
>                 self.txt = ""
>
>     def characters(self,str):
>         if self.in_name:
>             self.txt = self.txt+str
>
>     def endElement(self,tag):
>         if self.in_name and tag=="name":
>             self.authors.append(self.txt)
>             self.in_name=0
>         elif self.in_author and tag=="author":
>             self.in_author=0
>
> h = NameRetriever()
> start=time.time();xml.sax.parse("bigxml",handler=h);end = time.time()
> print end - start
> print len(h.authors)

This one needs to go into the XML HOWTO as an example. We now have an XPath and SAX approach. It would be easy to add a DOM approach. I'll try to do it with the extra 3 hours the Devil offered me today in exchange for the pinkie fingernail of my soul.

> To my own surprise, this is not as fast as the cDomlette; probably
> because the latter links directly with expat, and thus avoids a number
> of indirections. Still, it takes only three times as long (0.5s vs
> 1.4s on my machine), and it will work on any Python 2.0 installation.

Cool! I must confess that I would have guessed that SAX was close to cDomlette. Yes, PySAX does add quite a bit of overhead (which was one of the motivations for the PyExpat reader and cDomlette), but I would have thought that the integration of the processing with the parsing would make up the advantage.
Looks as if we might want to consider expanding cDomlette into a full-blown mutable DOM, though Mike and I are still discussing the best internal data structures. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 20:47:04 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 13:47:04 -0700 Subject: [XML-SIG] XSLT parser interface In-Reply-To: Message from "Martin v. Loewis" of "Sun, 28 Jan 2001 23:41:16 +0100." <200101282241.f0SMfGn09737@mira.informatik.hu-berlin.de> Message-ID: <200101292047.NAA11452@localhost.localdomain> > [This was sent to python-dev by mistake; my apologies - MvL] > > Based on my previous IDL interface for XPath parsers, I've defined an > API for a parser that parsers XSLT pattern expressions. It is an > extension to the XPath API, so I attach only the additional functions. > > Any comments are appreciated. > > Martin > > module XPath{ > // XSLT exprType values > const unsigned short PATTERN = 17; > const unsigned short LOCATION_PATTERN = 18; > const unsigned short RELATIVE_PATH_PATTERN = 19; > const unsigned short STEP_PATTERN = 20; I think we might want to space out these module-level constants a bit to allow for user extension. Or should all extensions use numbers above a certain ceiling? > interface Pattern; > interface LocationPathPattern; > interface RelativePathPattern; > interface StepPattern; > > interface PatternFactory:ExprFactory{ > Pattern createPattern(in LocationPathPattern first); > // idkey may be null, represents IdKeyPattern Minor nit, but it puzzled me for a few seconds. 
The comma above should be a colon, or just rephrase to "If idkey is non-null, this is an IdKeyPattern".

> // if parent is true, it is '/', else '//'
> // rel may be null
> LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
>                                               boolean parent,
>                                               in RelativePathPattern rel);
> // if parent is true, it is /, else //
> RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
>                                               boolean parent,
>                                               in StepPattern step);
> StepPattern createStepPattern(in AxisSpecifier axis,
>                               in NodeTest test,
>                               in PredicateList predicates);
> };

Some of these take an approach that's a bit cute (for instance, the boolean parent idea), but since it's really a developer-only interface, this should be fine.

> typedef sequence<LocationPathPattern> LocationPathPatterns;
> interface Pattern:Expr{
>   readonly attribute LocationPathPatterns patterns;
>   void append(in LocationPathPattern pattern);
> };
>
> interface LocationPathPattern:Expr{
>   readonly attribute FunctionCall idkey;
>   readonly attribute boolean parent;
>   readonly attribute RelativePathPattern relative_pattern;
> };

I forgot whether Expr defines a pprint method. If not, I think it should. This is a *very* handy debugging aid (and required by 4XDebug).

> interface RelativePathPattern:Expr{
>   readonly attribute RelativePathPattern relative;
>   readonly attribute boolean parent;
>   readonly attribute StepPattern step;
> };
>
> interface StepPattern:Expr{
>   readonly attribute AxisSpecifier axis;
>   readonly attribute NodeTest test;
>   readonly attribute PredicateList predicates;
> };
>
> interface XSLTParser:Parser{
>   Pattern parsePattern(in DOMString pattern);
> };
> };

Other than that, looks great. Jeremy? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 20:48:07 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 13:48:07 -0700 Subject: [XML-SIG] Update Python XML topic In-Reply-To: Message from Lars Marius Garshol of "29 Jan 2001 10:48:14 +0100." Message-ID: <200101292048.NAA11470@localhost.localdomain> > > * Uche Ogbuji > | > | One question: the PyPointers link goes to > | > | http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/xptr.html > | > | Which gives 404. Lars, is this still something you stillwant > | listed? If so, where do I point to? > > Just remove it. That module implements a now obsolete XPointer > syntax that is totally different from the current XPath-based one, and > so really is useless. K. There were some bugs in the docs I added anyway, so I have some more work to do there. And I get to test Martin's doupdate fixes. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 20:50:42 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 13:50:42 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from Lars Marius Garshol of "29 Jan 2001 10:58:37 +0100." Message-ID: <200101292050.NAA11485@localhost.localdomain> > > * Alexandre Fayolle > | > | If I build a DOM using the default non-validating parser, attributes > | that have no namespace are available by specifying an empty string > | as the namespace uri parameter to getAttributeNS(). > > Actually, I think this is something that is underspecified in both SAX > and the DOM. We need to decide how to represent no namespace URI both > in SAX and the DOM. 
At the moment I think both different SAX drivers > and 4DOM/minidom disagree here. 4DOM/minidom also disagree in other > parts of their Attributes implementations. > > I have, unfortunately, not had time to dig sufficiently into this to > know the exact state of things, but please don't start changing the > code until we have agreed what is the correct behaviour. Will hold off. Too bad we don't have a dictator to Pronounce (if we were voting for one, I'd probably vote for Martin), but perhaps we're better off that way. If the tide continues in favor of None in the next few days, we'll consider it a Group Pronouncement. > My opinion is that names that have no namespace URI should be > represented using None rather than "". +1 for None -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 20:53:08 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 13:53:08 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from "Martin v. Loewis" of "Sun, 28 Jan 2001 23:23:24 +0100." <200101282223.f0SMNO009516@mira.informatik.hu-berlin.de> Message-ID: <200101292053.NAA11496@localhost.localdomain> > > I thought None was an acceptable NSUri in Python SAX2. The docs > > certainly seem to think so. > > What part of the docs specifically do you refer to, here? I think the > None vs "" business is sufficiently confusing so it needs to be > spelled out explicitly in all places. I do not think that applications > should need to behave polymorphically, accepting either None or "". > > For SAX, the only explicit statement I could find is in the Java SAX > spec: > > uri - The Namespace URI, or the empty string if the element has no > Namespace URI or if Namespace processing is not being performed. 
> (http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/ContentHandler.html) IIRC, NULLs are more of a hazard in Java, so perhaps we needn't worry about this divergence. > So unless you found documentation that Python has to use None here, > I'd say we have to clarify the SAX API that a missing namespace is > represented as "". > > Unfortunately, the DOM specification has that different: > > # Note that because the DOM does no lexical checking, the empty > # string will be treated as a real namespace URI in DOM Level 2 > # methods. Applications must use the value null as the namespaceURI > # parameter for methods if they wish to have no namespace. > (1.1.8 of DOM 2 Core) > > This clearly means that a node without namespace has a null > namespaceURI, according to > http://python.sourceforge.net/devel-docs/lib/dom-type-mapping.html, > this maps to None in Python. Yes. The DOM used to be very confused, allowing both empty string and null, but they cleaned this up, and 4DOM has followed suit. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 21:07:10 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 14:07:10 -0700 Subject: One more ques Re: [XML-SIG] I am confused... In-Reply-To: Message from Roman Suzi of "Mon, 29 Jan 2001 16:33:26 +0300." Message-ID: <200101292107.OAA11558@localhost.localdomain> > On Sun, 28 Jan 2001, Mike Olson wrote: > > And one more problem: my texts are far from plain ASCII. > Do I need to convert them to utf8 or unicode before > working with XML+XSLT+XPath? > Do I need Python-2 to implement non US-ASCII site (and not latin-1)? If you're using anything besides US-ASCII, I *strongly* suggest Python 2.0.
> >> In my application I need many such queries to fill > >> the template - that is why speed was unbearable. > > > >What is you template? XSLT? If not have you thought of using it. It > >sounds like it was designed to do exactly what you need. Pretty much. > >Again, it sounds like your doing a lot by hand that is not needed. You > >can do this in XSLT with a simple template like > > > > > Article By <xsl:value-of > >select='author/name'/> > > > > Wow! If it works as advertized - this is what I need. > > Can I also embed some python sentences there to handle > hard cases? The easiest way to do this is through what's known as extension functions and extension elements. But you might be surprised at how much you can do without straying from XSLT. I would look in these places for inspiration http://www.ibiblio.org/xml/books/bible/updates/14.html http://www.zvon.org/xxl/XSLTutorial/Books/Book1/index.html http://www.jenitennison.com/xslt/index.html http://www.dpawson.co.uk/xsl/xslfaq.html http://www.w3schools.com/xsl/ > Demos are sometimes more valuable than tutorials. > In fact, I feel a need to reread overviews on XML (XSLT, XPath, AFs etc) > to have better idea what they do before looking at > demos. There's some of this in the demo directory of the 4Suite documentation, but also see the above links for examples all of which should work with 4XSLT. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 21:11:40 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 14:11:40 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from "Thomas B. Passin" of "Mon, 29 Jan 2001 09:32:45 EST." 
<002f01c08a00$58f295a0$7cac1218@reston1.va.home.com> Message-ID: <200101292111.OAA11580@localhost.localdomain> > Lars Marius Garshol wrote - > > > > My opinion is that names that have no namespace URI should be > > represented using None rather than "". > > > I completely agree with this. If there is ***no*** namespace, the ns value > should be None. The empty string should indicate that there is a namespace, > but its value happens to be empty. > > Illustrations seem to be like this - someone help me out here, please. > > 1) No namespace is declared or used in the whole document, but SAX2 is in use. > (ns='') Hmm. According to XMLNS 1.0, we shouldn't be differentiating this case. I'd say (ns=None) > 2) SAX 1 is in use. (ns=None) It's not really applicable to SAX1: no ns-aware interfaces. > 3) Namespaces are used in the document, but not in some particular element. > (ns='' for that element) OK. Now I'm confused. I guess you actually propose (ns='') to mean "no namespace on this name). > This leaves open the ns for an attribute in an element that declare a default > ns - the old question that comes up over and over. I don't know the answer. Pretty clear. The processor should report no namespace. It is up to the application to interpret differently, if it chooses to. > I thought this had been hashed out and resolved on the list a while ago, > although I don't remember the details. This would be a perfect subject for > one of those PEP-like pages I proposed a while ago. I'd like to resurrect > that suggestion, and have this topic be the subject of the first one. What do > you say? I think it's a great idea. For instance, the XPath API work could have been proposed and worked on in PEP fashion. The problem is getting someone to set it up. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 21:15:03 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 14:15:03 -0700 Subject: [XML-SIG] I am confused... In-Reply-To: Message from "Martin v. Loewis" of "Mon, 29 Jan 2001 17:34:20 +0100." <200101291634.f0TGYK401051@mira.informatik.hu-berlin.de> Message-ID: <200101292115.OAA11617@localhost.localdomain> > > I do not remember if this was what I used for measuring, but > > this was my another effort to create query-mechanisms > > (It doesnt work anymore due to lack of xml.dom.utils) > > Thanks. I've ported it to minidom, see the code below. Fortunately, > the DOM implementations follow the official API quite closely these > days, so it is easy to move from one implementation to another. Ain't standardization coool? > Using Uche's 640k document, I get the following timings: > > minidom: 6.4s > 4DOM: 45s > pDomlette: 8.9s That chunky 4DOM. Who wrote that anyway? > cDomlette fails since it does not support createElement (pDomlette > only has create*NS operations, so I added None as the namespace > everywhere). Yeah. We're still debating adding mutation to cDomlette. This thread makes me inclined to do so. > Remember, this is the same machine where Uche's cDomlette/XPath query > took 0.5s. So it *does* matter how exactly you approach a certain task > (you can easily get a factor of 90 between solutions). However, if I > had to guess in advance what the approximate outcome would have been > in each of the solutions, I had been totally wrong. So would I. My guess would have been cDomlette = 1 SAX = 1.5 pDomlette (pyexpat reader) = 2 4DOM = 10 minidom = 2 As you can see, I was way off as well. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 21:18:03 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 14:18:03 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from Jeremy Kloth of "Mon, 29 Jan 2001 13:06:06 MST." <3A75CD2E.8BE10051@fourthought.com> Message-ID: <200101292118.OAA11636@localhost.localdomain> > Actually, the DOM spec says that objects created with the non-NS methods > have the null namespaceURI, localName and prefix. So I would say that > if the parser is running in NS mode, everything is created with the NS > methods. > > That would mean that unprefixed attributes would have an '' for the > namespaceURI and prefix. Hmm. Regardless of what the DOM says (I thought they'd unconfused themselves. I guess I was wrong), that we should keep the interface consistent between the element and attribute no-ns indicators in *SAX2*. The readers can conform to the DOM with some trivial extra effort. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Jan 29 21:22:38 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 29 Jan 2001 14:22:38 -0700 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Message from Dieter Maurer of "Mon, 29 Jan 2001 18:53:09 +0100." <14965.44549.716772.938879@lindm.dm> Message-ID: <200101292122.OAA11651@localhost.localdomain> > Uche Ogbuji writes: > > > Uche Ogbuji writes: > > > > Hmm. I introduced this behavior while fixing another drv_pyexpat bug (default > > > > namespaces on unprefixes attributes were being returned as the namespace of > > > > the element). > > > Is this not correct? 
> > > > > > I interpreted the following phrase from the namespace spec > > > in this direction: > > > > > > "Note that default namespaces do not apply directly to attributes." > > > > Yes. And I fixed the driver to meet this. Prior to my fix, drv_xmlproc was > > returning the default namespace on unprefixed attributes in violation of XML > > Namespaces 1.0, and in particular, the portion you quoted. Now it returns > > None, or after I check in Alexandre's patch, "". > I interpret this part differently: > > Default namespaces do not apply directly to attributes but > indirectly via the element they belong to. > > If I have: > > > > then (at least semantically), "attr" belongs to the same > namespace as "elem" (the namespace associated with "ns"). No, I think this much is pretty clear from authoritative discussion, even though the XMLNS 1.0 spec is stupidly vague on the matter. Based on my understanding of Tim Bray, James Tauber, etc, unprefixed attributes are *syntactically* in no namespace. It is up to the application to decide that an attribute *semantically* shares the namespace of its owner element, and this determination is easy enough to make even though it differs from the strict syntax. Basically, the XMLNS 1.0 processor should return a null namespace for attr in your example, but the application is free to say "it's an attribute of elem, so I'll treat it as being in the {ns} namespace". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ken@bitsko.slc.ut.us Mon Jan 29 22:10:26 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 29 Jan 2001 16:10:26 -0600 Subject: [XML-SIG] problem with empty namespace uri In-Reply-To: Uche Ogbuji's message of "Mon, 29 Jan 2001 13:50:42 -0700" References: <200101292050.NAA11485@localhost.localdomain> Message-ID: Uche Ogbuji writes: > If the tide continues in favor of None in the next few days, we'll > consider it a Group Pronouncement. > > > My opinion is that names that have no namespace URI should be > > represented using None rather than "". > > +1 for None Another data point -- the XML Infoset says: For Element Information Items[1]: [namespace name] The namespace name, if any, of the element type. If the element does not belong to a namespace, this property is null. For Attribute Information Items[2]: [namespace name] The namespace name, if any, of the attribute. Otherwise, this property is null. -- Ken [1] [2] From martin@mira.cs.tu-berlin.de Mon Jan 29 22:16:08 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 29 Jan 2001 23:16:08 +0100 Subject: [XML-SIG] XSLT parser interface In-Reply-To: <200101292047.NAA11452@localhost.localdomain> (message from Uche Ogbuji on Mon, 29 Jan 2001 13:47:04 -0700) References: <200101292047.NAA11452@localhost.localdomain> Message-ID: <200101292216.f0TMG8n00920@mira.informatik.hu-berlin.de> > > module XPath{ > > // XSLT exprType values > > const unsigned short PATTERN = 17; > > const unsigned short LOCATION_PATTERN = 18; > > const unsigned short RELATIVE_PATH_PATTERN = 19; > > const unsigned short STEP_PATTERN = 20; > I think we might want to space out these module-level constants a > bit to allow for user extension. We might want to do so for future revisions of XPath itself, so this is a good idea. > Or should all extensions use numbers above a certain ceiling? 
This is the general problem with a numeric type identification: you need UUIDs or otherwise non-conflicting strings (like the IDL repository IDs). However, this kind of identification appears to be W3C tradition. So requesting that user extensions use another range seems reasonable. > Minor nit, but it puzzled me for a few seconds. The comma above > should be a colon, or just rephrase to > > "If idkey is non-null, this is an IdKeyPattern" Ok. > Some of these take an approach that's a bit cute (for instance, the > boolean parent idea), but since it's really a developer-only > interface, this should be fine. No, please suggest a more natural interface - I'm no XSLT expert at all. The XPath tradition seems to be that everything with // is called "abbreviated", so it would be

/* rel/step */
RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
                                              in StepPattern step);
/* rel//step */
AbbreviatedRelativePathPattern createAbbreviatedRelativePathPattern
        (in RelativePathPattern rel, in StepPattern step);

but that does not sound much better. I don't mind revising my implementation, I did so a number of times when coming up with the interface initially. BTW, I find the grammar part of XSLT worded much worse than the one in XPath. E.g. there is no apparent concern for lexical issues, like when 'id' should be considered as an NCName and when it should be the pseudo-keyword of an IdKeyPattern. > I forgot whether Expr defines a pprint method. If not, I think it > should. It currently does not have anything except for the bare data model/abstract syntax. Adding methods would be the next step; I just added DOMString pprint(); to Expr. Evaluation needs more thought - at least for me. > Other than that, looks great. Thanks! Martin From tpassin@home.com Tue Jan 30 02:26:20 2001 From: tpassin@home.com (Thomas B.
Passin) Date: Mon, 29 Jan 2001 21:26:20 -0500 Subject: [XML-SIG] problem with empty namespace uri References: <200101292050.NAA11485@localhost.localdomain> Message-ID: <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> We translate "this property is null" to "this Python property is None", right? Cheers, Tom P Ken MacLeod wrote - > > +1 for None > > Another data point -- the XML Infoset says: > > For Element Information Items[1]: > [namespace name] The namespace name, if any, of the element type. If > the element does not belong to a namespace, this property is null. > > For Attribute Information Items[2]: > [namespace name] The namespace name, if any, of the attribute. > Otherwise, this property is null. > From tpassin@home.com Tue Jan 30 06:43:57 2001 From: tpassin@home.com (Thomas B. Passin) Date: Tue, 30 Jan 2001 01:43:57 -0500 Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs References: <200101292050.NAA11485@localhost.localdomain> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> Message-ID: <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> Here is a first draft of a PEP about the value for the namespace uri when it is "empty". I modeled the PEP after the Python PEP guidelines. Of course, the main Python PEPs are written in ascii and there is a python script to convert them to html. But for fun, I created an xml format. A stylesheet will follow as I get time. Other XML PEPs could be written in ascii if the author wants. Please comment - all useful material will find its way into the PEP. Especially, would someone please give the main arguments for using the empty string instead of None, and also if there are cases where one or the other should be used, please identify them. We need to have a home for these things - someone want to start a cvs branch, or should we just have it in the files section of the SF pyxml pages? Until there is a home, we can just keep including it in emails, I think.
If anyone wants to take this one over, feel free, and add your name to the author list. Let's try to keep the discussion so that it can fit into the PEP (as extended, of course). The idea is that when stabilized, the PEP will be a permanent record of whatever pyxml has decided and why. We also need to figure out typical copyright statements (the Python PEP guidelines call for copyright statements). Finally, I invite everyone to suggest more topics for other PEPs we may find helpful. Cheers, Tom P

=======================================================================

xmlpep-1 Values for Null Or Empty Namespace URIs 0.10 Draft Standards Track 29-Jan-2001

This PEP specifies the proper values of the Namespace URI property when its value might appear to be either "null", "None", or the empty string. The XMLPEP, when approved, will apply to all namespace-aware software maintained by the pyxml interest group.

When no namespace has been declared whose scope applies to a particular element or attribute, the application MUST report the URI of the namespace of the element or attribute as None. When a namespace applies but its URI value is empty or null or None, the application MUST report the URI of the namespace value as None. This requirement does not apply for applications that are not namespace-aware.

Applies to all XML processing software maintained by the pyxml interest group.

This PEP is needed because of continued uncertainty among various pyxml developers as to the proper values to use, and because of inconsistency among various pyxml products. Differences between Python, IDL, and Java make it difficult to interpret existing W3C Recommendations unambiguously in this regard. A definitive and consistent treatment is needed so that all the pyxml software may be brought into agreement.

Most references in the Recommendations to the cases in question refer to "null" values. Python offers a data object well adapted to indicate such values. It is the None object.
The None object can be tested for exactly as for an empty string:

if uri:
    doYourThing()

Alternatively, None can be tested for explicitly, as in:

if uri is not None:
    doYourThing()

Thus, None is flexible enough to be useful in this application. Should there be some situation in which the use of an empty string would be logical or advantageous, it would be clearly distinguishable from the normal case where the value is None. Future versions of this PEP should specify clearly in what situations, if any, an empty string should be used in lieu of the None object. [Should there be a reference here to one particular processor, such as xmlproc?] This PEP may be used by anyone. From Alexandre.Fayolle@logilab.fr Tue Jan 30 08:06:06 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 30 Jan 2001 09:06:06 +0100 (CET) Subject: [XML-SIG] dom implementations In-Reply-To: <200101292115.OAA11617@localhost.localdomain> Message-ID: On Mon, 29 Jan 2001, Uche Ogbuji wrote: > > Using Uche's 640k document, I get the following timings: > > > > minidom: 6.4s > > 4DOM: 45s > > pDomlette: 8.9s > > That chunky 4DOM. Who wrote that anyway? One thing you have to keep in mind is that 4DOM includes features not available in other implementations, such as DOM L2 Events: each time you manipulate nodes, events get propagated up the DOM tree. This is a huge overhead, but it is so useful when displaying a DOM in a gui... > > cDomlette fails since it does not support createElement (pDomlette > > only has create*NS operations, so I added None as the namespace > > everywhere). > > Yeah. We're still debating adding mutation to cDomlette. This thread makes > me inclined to do so. This we would consider a good thing. We are considering switching from 4DOM to pDomlette for the kernel of Narval (after 1.0 is released), but cDomlette would be even better. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France).
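For reference, a minimal sketch of the behaviour argued for in the draft PEP and in the "empty namespace uri" thread above. It assumes present-day Python 3 and the standard library's xml.sax with its default expat driver; the 2001 drivers discussed here (drv_xmlproc, drv_pyexpat) may have behaved differently. NamespaceRecorder, describe and record are hypothetical names used only for illustration, and the document and namespace URI are made up. It shows that "if uri:" cannot tell None from "", and what a namespace-aware parse reports for an unprefixed attribute on an element that declares a default namespace.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceRecorder(ContentHandler):
    """Hypothetical handler: records the (uri, localname) pairs SAX reports."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.attributes = []

    def startElementNS(self, name, qname, attrs):
        # In namespace mode, names arrive as (uri, localname) tuples.
        self.elements.append(name)
        self.attributes.extend(attrs.keys())


def describe(uri):
    # None and "" are both false, so a plain "if uri:" treats them alike;
    # only the explicit identity test singles out "no namespace".
    if uri is None:
        return "no namespace"
    if uri == "":
        return "empty-string namespace"
    return "namespace " + uri


def record(document):
    """Parse a bytes document with namespace processing switched on."""
    handler = NamespaceRecorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return handler


# The element declares a default namespace; the attribute has no prefix.
recorded = record(b'<elem xmlns="urn:example:ns" attr="1"/>')
for uri, local in recorded.elements + recorded.attributes:
    print(local, "->", describe(uri))
```

Under these assumptions the element is reported in the default namespace while the unprefixed attribute is reported with None, which matches the reading that an application, not the parser, decides whether the attribute semantically shares its element's namespace.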
From martin@mira.cs.tu-berlin.de Mon Jan 29 22:50:00 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 29 Jan 2001 23:50:00 +0100 Subject: One more ques Re: [XML-SIG] I am confused... In-Reply-To: <3A75B9BB.7EEAC2F6@FourThought.com> (message from Mike Olson on Mon, 29 Jan 2001 11:43:07 -0700) References: <3A75B9BB.7EEAC2F6@FourThought.com> Message-ID: <200101292250.f0TMo0r01083@mira.informatik.hu-berlin.de> > > And one more problem: my texts are far from plain ASCII. > > Do I need to convert them to utf8 or unicode before > > working with XML+XSLT+XPath? > > Do I need Python-2 to implement non US-ASCII site (and not latin-1)? > > It would certainly make life easier, but you should be able to use 1.5.2 Depending on the exact software package you are going to use, and the exact encoding that your documents have, it may or may not work. For example, expat only knows about Latin-1 and UTF-8. In Python 2, it will have access to the Python codecs, but they are not present in 1.5.2. If you use drv_xmllib, and later when you produce output, the list of supported encodings (from xml.unicode) is somewhat longer, but still limited. E.g. ISO-8859-5 is supported, KOI-8R is not; that would be easy to add, though. Since they perform to-utf8 conversion anyway, it is probably best to recode to UTF-8 for 1.5.2 before parsing. Make sure that the recoding drops or changes any encoding= attribute in the xml header, though. Maybe you want to make an entire UTF-8 site :-? Many browsers display that fine these days, in my experience. Regards, Martin From rnd@onego.ru Tue Jan 30 08:13:57 2001 From: rnd@onego.ru (Roman Suzi) Date: Tue, 30 Jan 2001 11:13:57 +0300 (MSK) Subject: [XML-SIG] I am confused... In-Reply-To: <200101291634.f0TGYK401051@mira.informatik.hu-berlin.de> Message-ID: On Mon, 29 Jan 2001, Martin v. 
Loewis wrote: >Using Uche's 640k document, I get the following timings: > >minidom: 6.4s >4DOM: 45s >pDomlette: 8.9s My computer has only 64M of RAM - so I was not able to measure anything because the system just dig into swap... (top showed 33M of memory used by Python... :-( >cDomlette fails since it does not support createElement (pDomlette >only has create*NS operations, so I added None as the namespace >everywhere). > >Remember, this is the same machine where Uche's cDomlette/XPath query >took 0.5s. So it *does* matter how exactly you approach a certain task >(you can easily get a factor of 90 between solutions). However, if I >had to guess in advance what the approximate outcome would have been >in each of the solutions, I had been totally wrong. > >Regards, >Martin > >#!/usr/local/bin/python > >print "1. simple" > >from xml.dom import minidom >from string import split, index > >def portr(node): > typ = node.nodeType > value = node.nodeValue > name = node.nodeName > atts = node.attributes > par = node.parentNode > print "t ", typ, "v ",value, "n ",name, "a ", atts, "p ", par > >class strstream: > def __init__(self, str): > self.str = str ># print "strstream init" > > def read(self, n): > tmp = self.str[:n] > self.str = self.str[n:] > return tmp > > def readline(self): > return self.str > >def _normalize_tokens(tl): > """ rules: > $,word,$ --> $word$ > """ > rules2 = { > ("/","/") : "//", > (".","/") : "./", > ("!","=") : "$ne$", > ("<","=") : "$le$", > (">","=") : "$ge$", > ("=","~") : "$match$", > ("!","~") : "$no_match$", > (";",";") : ";", > } > > rules1 = { > "=" : "$eq$", > "!" : "$lt$", > "<" : "$lt$", > ">" : "$gt$", > } > > ntl = [] > i = 0 > while i < len(tl)-1: > if rules2.has_key( tuple(tl[i:i+2]) ): > toapp = rules2[tuple(tl[i:i+2])] > i = i+2 > else: > if tl[i] == "$": > if i+2 < len(tl): > toapp = tl[i] + tl[i+1] + tl[i+2] > i = i+3 > else: > raise "Query error !!!" 
+ `tl` > else: > toapp = tl[i] > i = i+1 > if rules1.has_key( toapp ): > toapp = rules1[toapp] > ntl.append( toapp ) > return ntl > >def _parse_query(q): > from shlex import shlex > # i1 = index(q, "/") > lexer = shlex(strstream(q)) > tokens = [] > tt = lexer.get_token() > while tt: > tokens.append(tt) > tt = lexer.get_token() > return _normalize_tokens(tokens) > >def find_all_descendants(node, cond): > return None # XXX !!! stub > >def find_all_children(node, cond): > lst = [] > exec(cond) ### must define condition !!! > for n in node.childNodes: > if condition(n): > lst.append(n) > return lst > >class PYQL: > def __init__(self, file): > self.dom = minidom.parse(file) > self.docel = self.dom.documentElement > > def query(self, q): > qr = self._query(self.docel, _parse_query(q), self.dom) > qel = self.dom.createElement("xql:result") > if qr: > qel.appendChild(qr) > qel.setAttribute("orig", str(q)) > return qel > > def _query(self, node, subq, qrdoc): > #print subq > #print find_all_children(node, > #"""def condition(n): return n.nodeName == "fig" """) > if subq[0] == "//": > self._query(node, subq[1:], qrdoc) > elif subq[0] == "/": > if subq[1] == node.nodeName: > if len(subq) > 2: > if subq[2] == "/": > qel = qrdoc.createElement(node.nodeName) > for a in node.attributes.keys(): > qel.setAttribute(a, node.attributes[a].nodeValue) > for node1 in node.childNodes: > q2 = self._query(node1, subq[2:], qrdoc) ># print "q2: ", q2 > if q2: > qel.appendChild(q2) > if len(qel.childNodes)==0: > del qel > return None > else: > return qel > else: > return node > else: > return node > else: > return None > > >a = PYQL('bigxml') ># a.query('$or$ != 1.23E-4 /article/text/topic$') ># print a.query('/article/text/topic.').toxml() >import time;start=time.time() >res=a.query('/article/author/name.').toxml() >print time.time()-start >print len(res) ># print a.query('//fig.').toxml() > Sincerely yours, Roman Suzi -- Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018 _/ 
Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/ _/ Tuesday, January 30, 2001 _/ Powered by Linux RedHat 6.2 _/ _/ "Give instruction to a wise man and he will be yet wiser." _/ From uche.ogbuji@fourthought.com Tue Jan 30 08:32:55 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 01:32:55 -0700 Subject: [XML-SIG] Will gettext do? Message-ID: <200101300832.BAA13572@localhost.localdomain> OK. I've hacked away at it with a vengeance, and I nearly have gettext working in 4Suite on my computer, but I'm beginning to wonder whether gettext is not too brittle a solution. Basically, I changed the en_US.py files to MessageSource.py files, used the following globally:

    try:
        import gettext
        gettext.install('4Suite')
    except:
        def _(msg): return msg

And wrapped all the strings in "_()". That was all the easy part. Then came the issue of building this thing. I ended up checking pygettext.py into Ft/admin, and importing the right objects (TokenEater, Options, etc.). After a lot of hacking, I got a usable distutils module that could prepare 4Suite.pot files and put them in the corresponding location in site-packages/Ft or whatever. I verified that all the messages were extracted and all that. Victory, right? Hell no! It turns out that the .pot files are useless. Even Python's gettext module requires GNU gettext and the binary .mo files. So first of all, this seems a non-starter in Windows. So I wandered off to find out how to make these .mo files. Never mind that I can't bloody get the GNU gettext command-line processor to do anything regardless of how many options and environment variables I throw at it. It looks as if even if I get it to work, I'm going to need full access to /usr/share/locale on the machine. So it's also a non-starter unless one can get root to install it. I assume I'm missing a great deal here, because if not, I don't see how pygettext is usable as a general i18n solution. And I've read and re-read the Python 2.0 gettext docs. 
What I can follow of it isn't very promising. Help? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Tue Jan 30 08:38:49 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 30 Jan 2001 09:38:49 +0100 Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs In-Reply-To: <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200101292050.NAA11485@localhost.localdomain> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> Message-ID: <200101300838.f0U8cn701268@mira.informatik.hu-berlin.de> > Here is a first draft of a PEP about the value for the namespace uri > when it is "empty". One more comment: The discussion started with a specific patch for a SAX driver, and it circled around how things are done in SAX and DOM. So I think this PEP should explicitly elaborate what specific parameters in the SAX and DOM APIs are treated in what way. Regards, Martin From martin@mira.cs.tu-berlin.de Tue Jan 30 08:36:23 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 30 Jan 2001 09:36:23 +0100 Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs In-Reply-To: <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200101292050.NAA11485@localhost.localdomain> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> Message-ID: <200101300836.f0U8aN501266@mira.informatik.hu-berlin.de> > When a namespace applies but its URI value is empty or null or None, > the application MUST report the URI of the namespace value as None. I'm not sure what this means. 
In section 2 of REC-xml-names-19990114, they write # If the attribute name matches PrefixedAttName, then the NCName gives # the namespace prefix, used to associate element and attribute names # with the namespace name in the attribute value in the scope of the # element to which the declaration is attached. In such declarations, # the namespace name may not be empty. In section 4, they say # The namespace prefix, unless it is xml or xmlns, must have been # declared in a namespace declaration attribute in either the # start-tag of the element where the prefix is used or in an an # ancestor element (i.e. an element in whose content the prefixed # markup occurs). So how could it ever happen that "a namespace applies but its URI value is empty or null or None"? > This requirement does not apply for applications that are not > namespace-aware. What exactly does that mean? The XMLNS recommendation specifies what it means that documents conform to it. Regards, Martin From martin@mira.cs.tu-berlin.de Tue Jan 30 08:44:09 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 30 Jan 2001 09:44:09 +0100 Subject: [XML-SIG] I am confused... In-Reply-To: (message from Roman Suzi on Tue, 30 Jan 2001 11:13:57 +0300 (MSK)) References: Message-ID: <200101300844.f0U8i9r01342@mira.informatik.hu-berlin.de> > My computer has only 64M of RAM - so I was not able to measure anything > because the system just dig into swap... That is another good reason to use SAX-based processing: In a DOM-based approach, you typically need to build an internal representation of the entire document first. It would still be possible to work out a data-driven algorithm, but it would be more limited (e.g. you couldn't go backwards in the document, or perform multiple subsequent transformations). 
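To make the memory argument concrete, here is a minimal sketch of the streaming approach, assuming a current Python with the standard xml.sax module. The class and function names are hypothetical, and the element path article/author/name is borrowed from the query in Roman's script; only the open-element path and the text of the current match are kept in memory, never the whole document.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every article/author/name element while streaming."""

    TARGET = ["article", "author", "name"]

    def __init__(self):
        super().__init__()
        self.path = []      # element names from the root down to the current node
        self.names = []     # text of each matching element, in document order
        self._buf = None    # text buffer, active only while inside a match

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == self.TARGET:
            self._buf = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if self._buf is not None and self.path == self.TARGET:
            self.names.append("".join(self._buf))
            self._buf = None
        self.path.pop()


def collect_names(xml_bytes):
    """Parse xml_bytes with SAX and return the collected name strings."""
    handler = NameCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names
```

The price is the one named above: the handler sees each event once, in document order, so it cannot look back at earlier parts of the document.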
Regards, Martin From larsga@garshol.priv.no Tue Jan 30 09:02:50 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jan 2001 10:02:50 +0100 Subject: [XML-SIG] dom implementations In-Reply-To: References: Message-ID: * Alexandre Fayolle | | One thing you have to keep in mind is that 4DOM include features not | available in other implementations, such as DOM L2 Events: each time | you manipulate nodes, events get propagated up the DOM tree. This is | a huge overhead, but it is so useful when displaying a DOM in a | gui... It is tempting to look into ways of not having to pay this huge penalty when you don't use that feature. I've come across similar problems many times when doing Python programming and wish there were a general solution. Features from Aspect-Oriented Programming, or CLOS, would be nice. --Lars M. From uche.ogbuji@fourthought.com Tue Jan 30 09:06:09 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 02:06:09 -0700 Subject: [XML-SIG] XSLT parser interface In-Reply-To: Message from "Martin v. Loewis" of "Mon, 29 Jan 2001 23:16:08 +0100." <200101292216.f0TMG8n00920@mira.informatik.hu-berlin.de> Message-ID: <200101300906.CAA13618@localhost.localdomain> > > > module XPath{ > > > // XSLT exprType values > > > const unsigned short PATTERN = 17; > > > const unsigned short LOCATION_PATTERN = 18; > > > const unsigned short RELATIVE_PATH_PATTERN = 19; > > > const unsigned short STEP_PATTERN = 20; > > > I think we might want to space out these module-level constants a > > bit to allow for user extension. > > We might want to do so for future revisions of XPath itself, so this > is a good idea. > > > Or should all extensions use numbers above a certain ceiling? > > This is the general problem with a numeric type identification: you > need UUIDs or otherwise not-conflicting strings (like the IDL > repository IDs). However, this kind of identification appears to be > W3C tradition. 
So requesting that user extensions use another range > seems reasonable. How about users get numbers above 1000? > > Some of these take an approach that's a bit cute (for instance, the > > boolean parent idea), but since it's really a developer-only > > interface, this should be fine. > > No, please suggest a more natural interface - I'm no XSLT expert at > all. The XPath tradition seems to be that everything with // is called > "abbreviated", so it would be Not quite. Abbreviated is any abbreviation, "//" just being one (abbr for '/descendant-or-self::node()/'). I think what might be throwing you off is the inconsistent modularization in 4XPath. The various Parsed* classes were pretty much thrown into modules at random without much consistency, and it makes things like "Abbreviated" take on significance that they shouldn't have. I've wanted to clean this up for a while, but we've always been short on time. I think your confusion and the changes we're making are good enough reason to finally neaten things up. Here are my suggested mods to your interface:

    interface PatternFactory:ExprFactory{
      Pattern createPattern(in LocationPathPattern first);
      // idkey or step must be null
      // if left is null, it's an absolute pattern
      LocationPathPattern locationPathPattern(in LocationPathPattern left,
                                              in LocationPathPattern right,
                                              in StepPattern step,
                                              in FunctionCall idkey);
      StepPattern createStepPattern(in AxisSpecifier axis,
                                    in NodeTest test,
                                    in PredicateList predicates);
    };

I'm not even sure of this. I'll want to talk some things over with Mike and Jeremy tomorrow. For one thing, I wonder whether we don't have too many "Parsed*" classes. Some things look as if they could be parameterized in combined classes. For instance, the separation of "Absolute*". Also, I wonder whether in the general case, the parser should expand abbreviations, or whether they should be reported as is to the engine. My first inclination is to make the parser do the expansion. 
> but that does not sound much better. I don't mind revising my > implementation, I did so a number of times when coming up with the > interface initially. Yes. One thing about the messy XPath BNF is that it doesn't suggest a model very straightforwardly. > BTW, I find the grammar part of XSLT worded much worse than the one in > XPath. E.g. there is no apparent concern for lexical issues, like when > 'id' should be considered as an NCName and when it should be the > pseudo-keyword of an IdKeyPattern. We've often wondered about this part ourselves. It looks as if they are trying to make a distinction between "id" and "key" and other function calls at a syntactic layer, when IMO it should be at the semantic layer. This wouldn't be so bad if the XPath parser state machine wasn't already so chaotic. > > I forgot whether Expr defines a pprint method. If not, I think it > > should. > > It currently does not have anything except for the bare data > model/abstract syntax. Adding methods would be the next step; > I just added > > DOMString pprint(); > > to Expr. Evaluation needs more thought - at least for me. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Tue Jan 30 09:04:46 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 30 Jan 2001 10:04:46 +0100 Subject: [XML-SIG] Will gettext do? In-Reply-To: <200101300832.BAA13572@localhost.localdomain> (message from Uche Ogbuji on Tue, 30 Jan 2001 01:32:55 -0700) References: <200101300832.BAA13572@localhost.localdomain> Message-ID: <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> > It turns out that the .pot files are useless. Even Python's gettext > module requires GNU gettext and the binary .mo files. Sure, it requires .mo files - but why GNU gettext? 
> So I wandered off to find out how to make these .mo files. Never > mind that I can't bloody get the GNU gettext command-line processor > to do anything regardless of how many options and environment > variables I throw at it. Do you already have .mo files? The tool to create them is msgfmt, not gettext (that utility reads .mo files). Of course, there is not much fun in generating a binary catalog if it has no translations. So you'd first produce 4Suite.de.po, send it to me, and I send it back to you filled with German translations. Then you use msgfmt. > So first of all, this seems a non-starter in Windows. Why, again, is that? To format the catalog? Please have a look at Tools/i18n/msgfmt.py. > It looks as if even if I get it to work, I'm going to need full > access to /usr/share/locale on the machine. gettext will look in /usr/share/locale for catalogs by default, yes - unless you've called bindtextdomain before that. Of course, if you know the specific catalog to use, you can also instantiate gettext.GNUTranslations directly. > I assume I'm missing a great deal here That is my theory also :-) > And I've read and re-read the Python 2.0 gettext docs. Which documentation specifically? And what specific passages made you despair (sp?). Regards, Martin From uche.ogbuji@fourthought.com Tue Jan 30 09:10:30 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 02:10:30 -0700 Subject: [XML-SIG] I am confused... In-Reply-To: Message from "Martin v. Loewis" of "Tue, 30 Jan 2001 09:44:09 +0100." <200101300844.f0U8i9r01342@mira.informatik.hu-berlin.de> Message-ID: <200101300910.CAA13661@localhost.localdomain> > > My computer has only 64M of RAM - so I was not able to measure anything > > because the system just dig into swap... > > That is another good reason to use SAX-based processing: In a > DOM-based approach, you typically need to build an internal > representation of the entire document first. Not in 4Suite 0.10.2 you won't. 
DbDOM is undergoing some serious surgery. It will have pretty nifty swapping of nodes in and out of memory courtesy of 4ODS. Of course, it will be slower than even 4DOM, but as ever, that's the trade-off. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Tue Jan 30 09:13:40 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 30 Jan 2001 10:13:40 +0100 (CET) Subject: [XML-SIG] dom implementations In-Reply-To: Message-ID: On 30 Jan 2001, Lars Marius Garshol wrote: > * Alexandre Fayolle > | > | One thing you have to keep in mind is that 4DOM include features not > | available in other implementations, such as DOM L2 Events: each time > | you manipulate nodes, events get propagated up the DOM tree. This is > | a huge overhead, but it is so useful when displaying a DOM in a > | gui... > > It is tempting to look into ways of not having to pay this huge > penalty when you don't use that feature. I've come across similar > problems many times when doing Python programming and wish there were > a general solution. I thought about this when I added Events to 4DOM, and finally did not implement a way to disable them, because I was a bit in a hurry at that time. I now see two possible solutions: * have the Document instance be aware of existing listeners, and let the propagation methods query the document before actually propagating anything (this would completely disable Event propagation if no one is listening) * use the hasFeature method of the DOM implementation to see if we want DOM L3 events (a bit as is done in the 4DOM test suite where namespaces are disabled at some point), and let the propagation know if it is expected to propagate anything. 
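The first of the two options above can be sketched roughly as follows. This is only an illustration under stated assumptions: Document, Node, add_listener and append_child are hypothetical stand-ins written for modern Python, not the 4DOM classes or the DOM L2 Events API. The point is just that the document keeps a listener count and the propagation code consults it before walking up the tree.

```python
class Document:
    """Hypothetical owner document that knows how many listeners exist."""

    def __init__(self):
        self.listener_count = 0


class Node:
    """Hypothetical tree node; not the 4DOM Node class."""

    def __init__(self, document):
        self.document = document
        self.parent = None
        self.children = []
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)
        self.document.listener_count += 1

    def append_child(self, child):
        child.parent = self
        self.children.append(child)
        self._propagate("child-added", child)

    def _propagate(self, event_type, target):
        # Cheap check first: if nobody listens anywhere in the document,
        # skip the walk up the ancestor chain entirely.
        if self.document.listener_count == 0:
            return
        node = self
        while node is not None:
            for callback in node.listeners:
                callback(event_type, target)
            node = node.parent
```

With no listener registered, each mutation costs one integer comparison instead of a walk to the root; removing a listener would have to decrement the count, which is left out here.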
Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From uche.ogbuji@fourthought.com Tue Jan 30 09:16:35 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 02:16:35 -0700 Subject: [XML-SIG] Will gettext do? References: <200101300832.BAA13572@localhost.localdomain> <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> Message-ID: <3A768673.16ABCE06@fourthought.com> "Martin v. Loewis" wrote: > Which documentation specifically? And what specific passages made you > despair (sp?). I've got to go to bed (up at bloody 6:00 a.m. when the nipper wakes up), but I wanted to first point out the culprit that seems to have led me so far astray. See http://python.sourceforge.net/devel-docs/lib/node160.html which seems to suggest that you need GNU and makes no mention of msgfmt.py. I read the whole gettext section and I don't think I ever saw msgfmt.py mentioned. I even tried checking out http://www.iro.umontreal.ca/contrib/po-utils/HTML but the key pages kept timing out. It does look as if you have answers, so I'll be back at it tomorrow. Thanks and good night. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Tue Jan 30 09:23:17 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 30 Jan 2001 10:23:17 +0100 (CET) Subject: [XML-SIG] Will gettext do? In-Reply-To: <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> Message-ID: On Tue, 30 Jan 2001, Martin v. Loewis wrote: > Do you already got .mo files? The tool to create them is msgfmt, not > gettext (that utility reads .mo files). Of course, there is not much > fun to generate a binary catalog if it has no translations. 
So you'd > first produce 4Suite.de.po, send it to me, and I send it back to you > filled with German translations. Then you use msgfmt. If you need french translators, I guess you know where to look for them... Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From larsga@garshol.priv.no Tue Jan 30 10:26:25 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jan 2001 11:26:25 +0100 Subject: [XML-SIG] Will gettext do? In-Reply-To: References: Message-ID: * Alexandre Fayolle | | [to Uche] | If you need french translators, I guess you know where to look for | them... BTW: xmlproc supports localization of its error messages, using a home-spun mechanism, which is far less powerful than gettext, but seems to do the job. Currently it has error messages in English, Norwegian and Swedish. Contributions of translations to any other language would be most welcome. All you need to make a translation can be found in xml/parsers/xmlproc/errors.py Just in case anyone is interested. --Lars M. From Alexandre.Fayolle@logilab.fr Tue Jan 30 11:49:40 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 30 Jan 2001 12:49:40 +0100 (CET) Subject: [XML-SIG] Will gettext do? In-Reply-To: Message-ID: This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---1463794431-682694632-980855380=:26847 Content-Type: TEXT/PLAIN; charset=US-ASCII On 30 Jan 2001, Lars Marius Garshol wrote: > Currently it has error messages in English, Norwegian and Swedish. > Contributions of translations to any other language would be most > welcome. > > All you need to make a translation can be found in > > xml/parsers/xmlproc/errors.py > > > Just in case anyone is interested. 
You'll find a french translation of the message table attached to this mail. Comments are welcome. Just to be sure : msg 4003 : PE == Parsed Entity ? Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). ---1463794431-682694632-980855380=:26847 Content-Type: APPLICATION/x-gzip; name="xmlproc_errors_french.py.gz" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="xmlproc_errors_french.py.gz" [base64-encoded gzip attachment omitted] ---1463794431-682694632-980855380=:26847-- From larsga@garshol.priv.no Tue Jan 30 11:55:12 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jan 2001 12:55:12 +0100 Subject: [XML-SIG] Will gettext do? 
In-Reply-To: References: Message-ID: * Alexandre Fayolle | | You'll find a french translation of the message table attached to this | mail. Comments are welcome. Great! Thank you! I am not capable of providing comments as I do not speak French, but this goes into the CVS tree immediately. | Just to be sure : | msg 4003 : PE = Parsed Entity ? That is correct. --Lars M. From Nicolas.Chauvat@logilab.fr Tue Jan 30 11:50:16 2001 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Tue, 30 Jan 2001 12:50:16 +0100 (CET) Subject: [XML-SIG] dom implementations In-Reply-To: Message-ID: On 30 Jan 2001, Lars Marius Garshol wrote: > [...] > problems many times when doing Python programming and wish there were > a general solution. > > Features from Aspect-Oriented Programming, or CLOS, would be nice. Any pointers ? -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From larsga@garshol.priv.no Tue Jan 30 12:15:26 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 30 Jan 2001 13:15:26 +0100 Subject: [XML-SIG] dom implementations In-Reply-To: References: Message-ID: * Lars Marius Garshol | | Features from Aspect-Oriented Programming, or CLOS, would be nice. * Nicolas Chauvat | | Any pointers ? AOP: As far as I can tell this is little more than a cut-down version of CLOS. That is still interesting, though. CLOS is the Common Lisp Object System, which is basically the object-oriented part of Common Lisp. Paul Graham's 'ANSI Common Lisp' is by far the best introduction to Common Lisp I have ever read, and it also covers CLOS. IMHO it is also the best 'learn how to program in this language'-book ever. --Lars M. From tpassin@home.com Tue Jan 30 13:59:36 2001 From: tpassin@home.com (Thomas B. 
Passin) Date: Tue, 30 Jan 2001 08:59:36 -0500 Subject: [XML-SIG] dom implementations References: Message-ID: <005601c08ac4$e1b78de0$7cac1218@reston1.va.home.com> Nicolas Chauvat asked - On 30 Jan 2001, Lars Marius Garshol wrote: > [...] > problems many times when doing Python programming and wish there were > a general solution. > > Features from Aspect-Oriented Programming, or CLOS, would be nice. > Any pointers ? See the Aspect Oriented Programming site at http://www.parc.xerox.com/csl/projects/aop/ Cheers, Tom P From uche.ogbuji@fourthought.com Tue Jan 30 14:08:24 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 07:08:24 -0700 Subject: [XML-SIG] dom implementations In-Reply-To: Message from Alexandre Fayolle of "Tue, 30 Jan 2001 10:13:40 +0100." Message-ID: <200101301408.HAA14224@localhost.localdomain> > > It is tempting to look into ways of not having to pay this huge > > penalty when you don't use that feature. I've come across similar > > problems many times when doing Python programming and wish there were > > a general solution. > > I thought about this when I added Events to 4DOM, and finally did not > implement a way to disable them, because I was a bit in a hurry at that > time. > > I now see two solutions possible solutions: > > * have the Document instance be aware of existing listeners, and let the > propagation methods query the document before actually propagating > anything (this would completely disable Event propagation if noone is > listening) I like this idea. Users needn't do anything by default to avoid the events slow-down. Given the listeners, we should also consider having the readers use Element constructors directly rather than the factory functions, as long as we can accommodate subclasses, as we did for cloneNode. 
> * use the the hasFeature method of the DOM implementation to see if we > want DOM L3 events (a bit as is done in the 4DOM test suite where > namespaces are disabled at some point), and let the propagatoin know if it > is expected to propagate anything. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Jan 30 14:11:54 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 07:11:54 -0700 Subject: [XML-SIG] Will gettext do? In-Reply-To: Message from Alexandre Fayolle of "Tue, 30 Jan 2001 12:49:40 +0100." Message-ID: <200101301411.HAA14240@localhost.localdomain> Lars wonders if anyone is interested in more xmlproc message file translations at 10:26:25 GMT Alexandre Fayolle submits French translation at 11:49:40 GMT Is this a great group or what? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Tue Jan 30 14:57:57 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 30 Jan 2001 15:57:57 +0100 (CET) Subject: [XML-SIG] dom implementations In-Reply-To: <200101301408.HAA14224@localhost.localdomain> Message-ID: On Tue, 30 Jan 2001, Uche Ogbuji wrote: > Given the listeners, we should also consider having the readers use Element > constructors directly rather than the factory functions, as long as we can > accommodate subclasses, as we did for cloneNode. I'd be curious to hear about the implementation you have in mind. 
We overloaded the factory functions in a custom document, so that they check the tag and ns of an element (for instance) and instantiate the right class depending on this, so I'd say we actually need these factory methods. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From uche.ogbuji@fourthought.com Tue Jan 30 16:28:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 09:28:47 -0700 Subject: [XML-SIG] Will gettext do? In-Reply-To: Message from "Martin v. Loewis" of "Tue, 30 Jan 2001 10:04:46 +0100." <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> Message-ID: <200101301628.JAA15183@localhost.localdomain> OK. With the recharge given by Martin, some more digging through the bowels of msgfmt.py and gettext.py, and a lot more hacking at distutils, I think I'm ready to declare partial victory. I have gettext working on my machine. The reason the victory is only partial is that I can't see how to generalize the procedure when more languages are added. Here is the procedure I follow in distutils, say for 4DOM. 1. run pygettext to create build/[platform]/_xmlplus/4Suite.po 2. run msgfmt to create the build/[platform]/_xmlplus/4Suite.mo 3. create build/[platform]/_xmlplus/en_US/LC_MESSAGES and move 4Suite.mo there 4. Make distutils copy build/[platform]/_xmlplus/en_US/LC_MESSAGES/4Suite.mo to the equivalent directory in the Python lib The problem is step 3. I can't see a way (and I read all the way through msgfmt.py) to automatically mark the locales whose directories I should create. I basically hard-code the creation of "en_US", and I'd have to hard-code "de_DE" and all that when I get the translations. Maybe this is how it's supposed to be, but it seems odd. I'll troll about a bit more in Tools/i18n/, but I thought maybe Martin or someone has the snap answer.
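The layout that steps 3 and 4 aim for can be sketched roughly as follows. This is only an illustration, assuming a current Python whose standard gettext module provides gettext.find(); the temporary directory and the empty placeholder file are hypothetical stand-ins and are not part of the 4Suite build.

```python
import gettext
import os
import tempfile

# Hypothetical stand-in for build/[platform]/_xmlplus; not 4Suite's real tree.
localedir = tempfile.mkdtemp()

# Step 3 of the procedure: <localedir>/<locale>/LC_MESSAGES/<domain>.mo
catalog_dir = os.path.join(localedir, "en_US", "LC_MESSAGES")
os.makedirs(catalog_dir)
catalog_path = os.path.join(catalog_dir, "4Suite.mo")
# An empty placeholder file; a real catalog would be produced by msgfmt.
open(catalog_path, "wb").close()

# gettext.find() only checks whether a file exists at the conventional path.
found = gettext.find("4Suite", localedir=localedir, languages=["en_US"])
missing = gettext.find("4Suite", localedir=localedir, languages=["de_DE"])
```

Here found is the path of the placeholder catalog and missing is None, since no German directory was created. The lookup also tries the bare language code (de for de_DE), so a directory named after the language alone would be found as well.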
Anyway, if anyone wants to have an early look, I've synced the anonymous CVS with my internal version. You can now check out 4Suite and see the updates. setup.py and admin/DistExt.py have the distutils extensions for gettext and Dom, Lib, Rdf, and Xslt have changed __init__ and en_US.py -> "MessageSource.py". Also, I've checked in the .po files for interested translators to work on, though I'd wait for a bit because I'm about to post a message on gettext translation maintenance. Anonymous CVS procedures here: http://lists.fourthought.com/pipermail/4suite/2001-January/001165.html -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Jan 30 16:41:39 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 09:41:39 -0700 Subject: [XML-SIG] on gettext maintenance Message-ID: <200101301641.JAA15264@localhost.localdomain> Again I read through all the Python gettext docs, and I might just be completely missing something, but I don't see how .po files are to be cleanly maintained. Martin said "So you'd first produce 4Suite.de.po, send it to me, and I send it back to you filled with German translations." BTW, the "4Suite.de.po" part confuses me. Based on this and the msgid/msgstr combos in the code, I'm guessing each language has a .po. Fine, but again, how does this feed into msgfmt.py? Is a single .mo file created, or one for each language? I see no fields that specify the localization for each .po file. Anyway, so what happens when I change or add messages and all that. Do I simply send brand new .po files to each translation, maybe sending a diff as well to make the changes clear? This seems cumbersome. Of course, I'm not sure what scheme would be smoother. 
I must say again that the Python gettext docs are pretty hard to follow, and they leave quite a few holes unexplained, besides their occasional bad advice ("Once you've used pygettext to create your .pot files, you can use the standard GNU gettext tools to generate your machine-readable .mo files"). BTW, what's the difference between a .po and .pot file? If none, why does msgfmt.py insist on ".po" when the docs just talk about ".pot"? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Tue Jan 30 16:54:55 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 30 Jan 2001 17:54:55 +0100 (CET) Subject: [XML-SIG] on gettext maintenance In-Reply-To: <200101301641.JAA15264@localhost.localdomain> Message-ID: On Tue, 30 Jan 2001, Uche Ogbuji wrote: > Anyway, so what happens when I change or add messages and all that. Do I > simply send brand new .po files to each translation, maybe sending a diff as > well to make the changes clear? This seems cumbersome. Of course, I'm not > sure what scheme would be smoother. Well, a diff is a much friendlier way to present these kind of things than a brand new file. > BTW, what's the difference between a .po and .pot file? I think the latter is supposed to be smoked. As for the former, well... I won't go into the details, and let you decide for yourself. ;o) Alexandre 'sorry, I could not resist this one' Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France).
From Mike.Olson@fourthought.com Tue Jan 30 17:45:37 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 30 Jan 2001 10:45:37 -0700 Subject: [XML-SIG] dom implementations References: <200101301408.HAA14224@localhost.localdomain> Message-ID: <3A76FDC1.C2591589@FourThought.com> Uche Ogbuji wrote: > > > > It is tempting to look into ways of not having to pay this huge > > > penalty when you don't use that feature. I've come across similar > > > problems many times when doing Python programming and wish there were > > > a general solution. > > > > I thought about this when I added Events to 4DOM, and finally did not > > implement a way to disable them, because I was a bit in a hurry at that > > time. > > > > I now see two solutions possible solutions: > > > > * have the Document instance be aware of existing listeners, and let the > > propagation methods query the document before actually propagating > > anything (this would completely disable Event propagation if noone is > > listening) > Or make the document the hub, all events go through the document and either it propagates or it doesn't. I don't think there is much overhead in the actual sending of an event. Another thought I had was to be able to turn events on and off at runtime. Ex, when you read in a document you don't want all of the events, but after it is read you may... > > * use the the hasFeature method of the DOM implementation to see if we > > want DOM L3 events (a bit as is done in the 4DOM test suite where > > namespaces are disabled at some point), and let the propagatoin know if it > > is expected to propagate anything. Yes, except instead of having the code spread all over have a _4dom_propogate on the document that is a noop if the feature is not enabled. I'd like the same type of setup for Ranges as well, but I'll wait until we agree on something before I implement....
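The "document as hub" idea, with a propagate call that does nothing when events are switched off or nobody is listening, might look something like the sketch below. All names here (Document, add_listener, propagate, events_enabled) are hypothetical illustrations and are not 4DOM's actual API.

```python
class Document:
    """Hypothetical hub: every mutation event is routed through here."""

    def __init__(self):
        self._listeners = []        # callables registered by the user
        self.events_enabled = True  # can be switched off, e.g. while parsing

    def add_listener(self, callback):
        self._listeners.append(callback)

    def propagate(self, event):
        # Cheap early exit: nothing to do when disabled or nobody listens.
        if not self.events_enabled or not self._listeners:
            return 0
        for callback in self._listeners:
            callback(event)
        return len(self._listeners)


doc = Document()
seen = []
quiet = doc.propagate("node-inserted")      # no listeners yet: no-op
doc.add_listener(seen.append)
doc.events_enabled = False
loading = doc.propagate("node-inserted")    # disabled during a load: no-op
doc.events_enabled = True
delivered = doc.propagate("node-inserted")  # now reaches the listener
```

Keeping the check in one method means node classes would not need their own feature tests; they would call the document and let it decide.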
Mike > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From teg@redhat.com Tue Jan 30 19:05:10 2001 From: teg@redhat.com (Trond Eivind Glomsrød) Date: 30 Jan 2001 14:05:10 -0500 Subject: [XML-SIG] on gettext maintenance In-Reply-To: <200101301641.JAA15264@localhost.localdomain> References: <200101301641.JAA15264@localhost.localdomain> Message-ID: Uche Ogbuji writes: > BTW, the "4Suite.de.po" part confuses me. Based on this and the msgid/msgstr > combos in the code, I'm guessing each language has a .po. Yes... but it's usually just called "de.po" (in this case) > Fine, but again, how does this feed into msgfmt.py? Never used that... only the standard gettext modules > Is a single .mo file created, or one for > each language? One for each language... the way it usually works, is that the source package has a file like "de.po". From that, it creates a .mo file which eventually is installed as "/usr/share/locale/de/LC_MESSAGES/4Suite.mo" > Anyway, so what happens when I change or add messages and all that. Do I > simply send brand new .po files to each translation, Take a look at the makefile for e.g. kbdconfig - it's simple, and it handles this. -- Trond Eivind Glomsrød Red Hat, Inc. From dieter@handshake.de Tue Jan 30 19:28:07 2001 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 30 Jan 2001 20:28:07 +0100 (CET) Subject: [XML-SIG] on gettext maintenance In-Reply-To: <4590926@toto.iv> Message-ID: <14967.5575.121696.89293@lindm.dm> Uche Ogbuji writes: > Anyway, so what happens when I change or add messages and all that. Do I > simply send brand new .po files to each translation, maybe sending a diff as > well to make the changes clear? This seems cumbersome.
Of course, I'm not > sure what scheme would be smoother. The extraction routine is smart enough to merge in new string keys to be translated and mark slightly changed keys as fuzzy. The best thing probably is to extract the new and fuzzy keys and send them. Dieter From fdrake@acm.org Tue Jan 30 20:46:13 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 30 Jan 2001 15:46:13 -0500 (EST) Subject: [XML-SIG] Will gettext do? In-Reply-To: <3A768673.16ABCE06@fourthought.com> References: <200101300832.BAA13572@localhost.localdomain> <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> <3A768673.16ABCE06@fourthought.com> Message-ID: <14967.10261.74642.924466@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > http://python.sourceforge.net/devel-docs/lib/node160.html > > Which seems to suggest that you need GNU and makes no mention of > msgfmt.py > > I read the whole gettext section and I don't think I ever say msgfmt.py > mentioned. I've forwarded your comments on this to Barry Warsaw, so that he can update that portion of the documentation. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Tue Jan 30 20:58:30 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 13:58:30 -0700 Subject: [XML-SIG] Will gettext do? In-Reply-To: Message from "Fred L. Drake, Jr." of "Tue, 30 Jan 2001 15:46:13 EST." <14967.10261.74642.924466@cj42289-a.reston1.va.home.com> Message-ID: <200101302058.NAA15883@localhost.localdomain> > Uche Ogbuji writes: > > http://python.sourceforge.net/devel-docs/lib/node160.html > > > > Which seems to suggest that you need GNU and makes no mention of > > msgfmt.py > > > > I read the whole gettext section and I don't think I ever say msgfmt.py > > mentioned. > > I've forwarded your comments on this to Barry Warsaw, so that he can > update that portion of the documentation. Thanks! Oh dear. 
I've kvetched enough about the Python docs lately that someone's going to challenge me to actually do something productive about it one of these days. I should never forget to say that in general, they are very good. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Tue Jan 30 21:40:53 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 30 Jan 2001 16:40:53 -0500 (EST) Subject: [XML-SIG] Will gettext do? In-Reply-To: <200101302058.NAA15883@localhost.localdomain> References: <14967.10261.74642.924466@cj42289-a.reston1.va.home.com> <200101302058.NAA15883@localhost.localdomain> Message-ID: <14967.13541.550582.65144@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > Oh dear. I've kvetched enough about the Python docs lately that someone's > going to challenge me to actually do something productive about it one of > these days. Chances are it will be me. You're always free to submit patches and bug reports. ;-) > I should never forget to say that in general, they are very good. Thank you! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Tue Jan 30 23:27:00 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 16:27:00 -0700 Subject: [XML-SIG] po files ready for translation Message-ID: <200101302327.QAA16754@localhost.localdomain> OK. I think I have it all working. The real-world test is to get some translations in and see if I get my nice German/French/etc messages. I've synced up CVS so that you can get the latest if you like. The po files for Lib, Dom, Xslt and Rdf are at ftp://ftp.fourthought.com/pub/etc/4Suite-po.zip If anyone translates them, just send them back (attachment to private e-mail will do) and I'll check the translations back in. Thanks. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@mira.cs.tu-berlin.de Tue Jan 30 23:30:17 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 31 Jan 2001 00:30:17 +0100 Subject: [XML-SIG] Will gettext do? In-Reply-To: <200101301628.JAA15183@localhost.localdomain> (message from Uche Ogbuji on Tue, 30 Jan 2001 09:28:47 -0700) References: <200101301628.JAA15183@localhost.localdomain> Message-ID: <200101302330.f0UNUHA00961@mira.informatik.hu-berlin.de> > Here is the procedure I follow in distutils, say for 4DOM. > > 1. run pygettext to create build/[platform]/_xmlplus/4Suite.po > 2. run msgfmt to create the build/[platform]/_xmlplus/4Suite.mo > 3. create build/[platform]/_xmlplus/en_US/LC_MESSAGES and move 4Suite.mo there > 4. Make distutils copy build/[platform]/_xmlplus/en_US/LC_MESSAGES/4Suite.mo > to the equivalent directory in the Python lib > > The problem is step 3. I can't see a way (and I read all the way through > msgfmt.py) to automatically mark the locales whose directories I should > create. I basically hard-code the creation of "en_US", and I'd have to > hard-code "de_DE" and all that when I get the translations. > > Maybe this is how it's supposed to be, but it seems odd. > > I'll troll about a bit more in Tools/i18n/, but I thought maybe Martin or > someone has the snap answer. It's my turn to go to bed now :-) but as a snap answer: the common tradition is to have 4Suite.<lang>.po in the source distribution, where <lang> is typically fr, de, en (and *not* fr_FR, de_DE, en_US - unless the German translation for Germany really differs from the one for, say, Austria). With that, it should not be too difficult to generate the directories in a loop.
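Such a loop could be sketched as below, assuming per-language source files named like 4Suite.de.po and 4Suite.fr.po sit together in one directory. The function name plan_catalogs and the throwaway directories are hypothetical, and the actual compilation of each .po into its .mo (with msgfmt or similar) is deliberately left out.

```python
import glob
import os
import tempfile


def plan_catalogs(po_dir, domain, build_dir):
    """Map each <domain>.<lang>.po in po_dir to its target .mo path."""
    plan = {}
    pattern = os.path.join(po_dir, domain + ".*.po")
    for po_path in sorted(glob.glob(pattern)):
        # "4Suite.de.po" -> "de"
        name = os.path.basename(po_path)
        lang = name[len(domain) + 1:-len(".po")]
        target_dir = os.path.join(build_dir, lang, "LC_MESSAGES")
        os.makedirs(target_dir, exist_ok=True)
        plan[lang] = os.path.join(target_dir, domain + ".mo")
    return plan


# Illustrative run against throwaway directories with empty .po files.
po_dir = tempfile.mkdtemp()
build_dir = tempfile.mkdtemp()
for lang in ("de", "fr"):
    open(os.path.join(po_dir, "4Suite.%s.po" % lang), "w").close()

plan = plan_catalogs(po_dir, "4Suite", build_dir)
```

The set of locales then comes from the file names themselves, so adding a translation means dropping in one more .po file rather than editing the build script.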
Furthermore, I feel that any /LC_MESSAGES/catalog.po approach is bad (even though it's gettext tradition), so I'd promote the idea of having ..mo instead, and keeping the mo files all in a single directory. Unfortunately, I'm not sure whether gettext.py supports such a scheme - essentially, you need the module to tell you what languages to consider in what order, but you may want to override the resulting file naming scheme. So *if* you can install into /share/locale, that is probably best if that also is the platform convention, otherwise, any scheme that just works should do - even if it creates many extra directories. I'd support proposals to enhance gettext.py for easier distribution of catalogs - in particular on non-Unix platforms, as well as proposals to enhance distutils to support message catalogs. Regards, Martin From uche.ogbuji@fourthought.com Tue Jan 30 23:45:54 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 30 Jan 2001 16:45:54 -0700 Subject: [XML-SIG] Will gettext do? In-Reply-To: Message from "Martin v. Loewis" of "Wed, 31 Jan 2001 00:30:17 +0100." <200101302330.f0UNUHA00961@mira.informatik.hu-berlin.de> Message-ID: <200101302345.QAA16810@localhost.localdomain> > It's my turn to go to bed now :-) but as a snap answer: the common > tradition is to have 4Suite..po in the source distribution, > where is typically fr, de, en (and *not* fr_FR, de_DE, en_US - > unless the German translation for Germany really differs from the one > for, say, Austria). With that, it should not be too difficult to > generate the directories in a loop. Well, I do know that en_US could differ from en_GR, but probably not so it's inconceivable to combine the two. > Furthermore, I feel that any /LC_MESSAGES/catalog.po > approach is bad (even though it's gettext tradition), so I'd promote > the idea of having ..mo instead, and keeping the mo > files all in a single directory. 
Unfortunately, I'm not sure whether > gettext.py supports such a scheme - essentially, you need the module > to tell you what languages to consider in what order, but you may want > to override the resulting file naming scheme. Given that I have reasonable dictionary for now, I'll leave it as is, and we can go about improving it when we've shaken out all the cases. > So *if* you can install into /share/locale, that is > probably best if that also is the platform convention, otherwise, any > scheme that just works should do - even if it creates many extra > directories. I'll investigate /share/locale For now they,re in each module's directory itself, which is easy to find (__file__), and I know can be written to on install. > I'd support proposals to enhance gettext.py for easier distribution of > catalogs - in particular on non-Unix platforms, as well as proposals > to enhance distutils to support message catalogs. I think I already have the distutils part down. It generates a "default" po in a "generate" phase, and creates and installs the mo in a "build" phase. See Ft/admin/DistExt.py -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From barry@digicool.com Tue Jan 30 23:53:25 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 18:53:25 -0500 Subject: [XML-SIG] Will gettext do? References: <200101300832.BAA13572@localhost.localdomain> <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> <3A768673.16ABCE06@fourthought.com> Message-ID: <14967.21493.166564.515469@anthem.wooz.org> >> Which documentation specifically? And what specific passages >> made you despair (sp?). 
>>>>> "UO" == Uche Ogbuji writes: UO> I've got to go to bed (up at bloody 6:00 a.m.when the nipper UO> wakes up), but I wanted to first point out the culprit that UO> seems to have led me so far astray UO> See UO> http://python.sourceforge.net/devel-docs/lib/node160.html UO> Which seems to suggest that you need GNU and makes no mention UO> of msgfmt.py Fred Drake's brought this to my attention, since I'm not on the xml-sig. I think msgfmt.py was added after the gettext module's documentation was written, and the docos were never updated when we added Martin's tool. I'll go ahead and add some text to the page. Cheers, -Barry From martin@mira.cs.tu-berlin.de Tue Jan 30 23:41:53 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 31 Jan 2001 00:41:53 +0100 Subject: [XML-SIG] on gettext maintenance In-Reply-To: <200101301641.JAA15264@localhost.localdomain> (message from Uche Ogbuji on Tue, 30 Jan 2001 09:41:39 -0700) References: <200101301641.JAA15264@localhost.localdomain> Message-ID: <200101302341.f0UNfru01025@mira.informatik.hu-berlin.de> > "So you'd first produce 4Suite.de.po, send it to me, and I send it > back to you filled with German translations." > > BTW, the "4Suite.de.po" part confuses me. Based on this and the > msgid/msgstr combos in the code, I'm guessing each language has a > .po. Fine, but again, how does this feed into msgfmt.py? Is a > single .mo file created, or one for each language? I see no fields > that specify the localization for each .po file. msgfmt.py will transform .po into .mo, as does GNU msgfmt. I suggest that you download the sources of, say, GNU fileutils, and have a look at the directory structure. There is a lot of automake magic as well which you probably want to ignore - just consider the 'po' directory. > Anyway, so what happens when I change or add messages and all that. > Do I simply send brand new .po files to each translation, maybe > sending a diff as well to make the changes clear? This seems > cumbersome. 
Of course, I'm not sure what scheme would be smoother. For that, GNU gettext offers the "msgmerge" utility. It will find messages that didn't change and keep the translation, find messages that changed slightly and mark the translations as "fuzzy", find new messages and put empty translation into them, and find messages that disappeared and put their translations as "obsolete" into comments. Emacs po-mode then offers to navigate between fuzzy and untranslated messages. It *is* common to clearly label a version of the message catalog (e.g. 0.10.1a, 0.10.1b, etc), so translators can use diff to find differences - a good xgettext utility will spit out the msgids in the same order each time. Unfortunately, there is no msgfmt.py, yet - so you have to use the GNU tool, or off-load merging with previous revisions to the translators. Contribution of such a tool would be welcome, of course (I know we are deep in i18n-sig stuff now). > BTW, what's the difference between a .po and .pot file? If none, why does > msgfmt.py insist on ".po" when the docs just talk about ".pot"? There always is a file with just the msgids, and no translations - that is called .pot, and no .mo file is created from it. So what you extract is .pot (or, .po template), what translators produce is .po. Regards, Martin From martin@mira.cs.tu-berlin.de Tue Jan 30 23:52:43 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 31 Jan 2001 00:52:43 +0100 Subject: [XML-SIG] on gettext maintenance In-Reply-To: (teg@redhat.com) References: <200101301641.JAA15264@localhost.localdomain> Message-ID: <200101302352.f0UNqhL01051@mira.informatik.hu-berlin.de> > > BTW, the "4Suite.de.po" part confuses me. Based on this and the msgid/msgstr > > combos in the code, I'm guessing each language has a .po. > > Yes... but it's usially just called "de.po" (in this case) You are right. 
Although, as a translator, I always get files named, say, grep-2.4a.de.po, so I forgot that they are renamed to de.po in the grep distribution. Depending on the exact installation procedure, 4Suite.de.po would work just fine, wouldn't it? Regards, Martin From martin@mira.cs.tu-berlin.de Tue Jan 30 23:55:56 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 31 Jan 2001 00:55:56 +0100 Subject: [XML-SIG] on gettext maintenance In-Reply-To: <14967.5575.121696.89293@lindm.dm> (message from Dieter Maurer on Tue, 30 Jan 2001 20:28:07 +0100 (CET)) References: <14967.5575.121696.89293@lindm.dm> Message-ID: <200101302355.f0UNtuR01054@mira.informatik.hu-berlin.de> > The extraction routine is smart enough to merge in new > string keys to be translated and mark slightly changed keys > as fuzzy. > > The best thing probably is to extract the new and fuzzy keys > and send them. I'm not so sure about this advice. First, it's msgmerge, not xgettext, that does the fuzzy-marking. Then, in the GNU translation project, full files are always sent, which allows the translator to review old translations, and put an updated PO-Revision-Date header into the catalog. Regards, Martin From martin@mira.cs.tu-berlin.de Tue Jan 30 23:59:43 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 31 Jan 2001 00:59:43 +0100 Subject: [XML-SIG] Will gettext do? In-Reply-To: <14967.10261.74642.924466@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <200101300832.BAA13572@localhost.localdomain> <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> <3A768673.16ABCE06@fourthought.com> <14967.10261.74642.924466@cj42289-a.reston1.va.home.com> Message-ID: <200101302359.f0UNxhC01056@mira.informatik.hu-berlin.de> > > I read the whole gettext section and I don't think I ever say msgfmt.py > > mentioned. > > I've forwarded your comments on this to Barry Warsaw, so that he can > update that portion of the documentation. Thanks! 
I'd like to point out that I wrote msgfmt.py just barely before the 2.0 release, and I'm guilty of providing no documentation whatsoever :-( (is that a trademark now?) Regards, Martin From teg@redhat.com Wed Jan 31 00:04:05 2001 From: teg@redhat.com (Trond Eivind Glomsrød) Date: 30 Jan 2001 19:04:05 -0500 Subject: [XML-SIG] on gettext maintenance In-Reply-To: <200101302352.f0UNqhL01051@mira.informatik.hu-berlin.de> References: <200101301641.JAA15264@localhost.localdomain> <200101302352.f0UNqhL01051@mira.informatik.hu-berlin.de> Message-ID: "Martin v. Loewis" writes: > > > BTW, the "4Suite.de.po" part confuses me. Based on this and the msgid/msgstr > > > combos in the code, I'm guessing each language has a .po. > > > > Yes... but it's usually just called "de.po" (in this case) > > You are right. Although, as a translator, I always get files named, > say, grep-2.4a.de.po, so I forgot that they are renamed to de.po in > the grep distribution. Depending on the exact installation procedure, > 4Suite.de.po would work just fine, wouldn't it? Yes, but it would be .... weird... and nonstandard. I suggest taking a look at some simple packages (like kbdconfig, mouseconfig etc) and their makefiles (unless you want to go the entire autoconf way). -- Trond Eivind Glomsrød Red Hat, Inc. From fdrake@acm.org Wed Jan 31 06:06:04 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 31 Jan 2001 01:06:04 -0500 (EST) Subject: [XML-SIG] Will gettext do? In-Reply-To: <200101302359.f0UNxhC01056@mira.informatik.hu-berlin.de> References: <200101300832.BAA13572@localhost.localdomain> <200101300904.f0U94k201466@mira.informatik.hu-berlin.de> <3A768673.16ABCE06@fourthought.com> <14967.10261.74642.924466@cj42289-a.reston1.va.home.com> <200101302359.f0UNxhC01056@mira.informatik.hu-berlin.de> Message-ID: <14967.43852.565918.648240@cj42289-a.reston1.va.home.com> Martin v.
Loewis writes: > I'd like to point out that I wrote msgfmt.py just barely before the > 2.0 release, and I'm guilty of providing no documentation whatsoever :-( > (is that a trademark now?) No, but it may justify a plane ticket to Germany so I can hunt you down and berate you in person. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@mira.cs.tu-berlin.de Wed Jan 31 20:59:14 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 31 Jan 2001 21:59:14 +0100 Subject: [XML-SIG] Will gettext do? In-Reply-To: (message from Lars Marius Garshol on 30 Jan 2001 11:26:25 +0100) References: Message-ID: <200101312059.f0VKxEZ01572@mira.informatik.hu-berlin.de> > BTW: xmlproc supports localization of its error messages, using a > home-spun mechanism, which is far less powerful than gettext, but > seems to do the job. Have you considered moving to gettext as well? Regards, Martin