From uche.ogbuji@fourthought.com Tue May 1 18:13:29 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 01 May 2001 11:13:29 -0600
Subject: [XML-SIG] Re: [4suite] PyChecker could help
References: <3aeee9f93d2396cb@amyris.wanadoo.fr> (added by amyris.wanadoo.fr)
Message-ID: <3AEEEEB9.29880F20@fourthought.com>

Sebastien Pierre wrote:

> Here are some errors that PyChecker has found with 4Suite 0.11:
>
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:124 No attribute (documentElement) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:180 No attribute (documentElement) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:211 No attribute (implementation) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:242 No attribute (documentElement) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:251 No attribute (childNodes) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:299 No attribute (childNodes) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:299 No attribute (doctype) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/Document.py:299 No attribute (documentElement) found
>
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/FtNode.py:135 No global (XML_NAMESPACE) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/FtNode.py:271 No attribute (firstChild) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/FtNode.py:345 self is not first method argument
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/FtNode.py:346 No global (self) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/FtNode.py:362 No attribute (ownerDocument) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/FtNode.py:372 No attribute (ownerDocument) found
>
>
> Using this tool could help you find out some bugs in the 4Suite.
> PyChecker is available at .
> Cheers!

Thanks, but note that 4DOM is no longer part of 4Suite. I'll try to look into this before the PyXML 0.6.6 release.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From rsalz@zolera.com Wed May 2 18:16:51 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 13:16:51 -0400
Subject: [XML-SIG] Proposing a web services SIG
Message-ID: <3AF04103.A7FA3F01@zolera.com>

I'd like to propose a new SIG, Web Services. Web services uses XML and related standards (schema, wsdl, soap, uddi) to provide a distributed computing infrastructure.

There is a great deal of Python activity starting up here -- several SOAP implementations, interop work, WSDL parsing, etc. Much of the information exchange has been late-night point-to-point email, and it's time to provide a visible focal point for this activity.

Our feeling (a few of us have chatted about this) is that the web services community generally takes Sax, DOM, etc., "for granted" and that it makes more sense to create a new SIG rather than be part of XML-SIG. XML Schema is a likely area of overlap, and we'll work together to handle that.

In terms of code, web pages, etc., we'd follow the (high) standards of the XML Sig.

Comments, next steps?
/r$

From Nicolas.Chauvat@logilab.fr Wed May 2 18:36:06 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Wed, 2 May 2001 19:36:06 +0200 (CEST)
Subject: [XML-SIG] Proposing a web services SIG
In-Reply-To: <3AF04103.A7FA3F01@zolera.com>
Message-ID:

> Comments, next steps?

+1 for web-services-sig (and RDF tools in PyXML ;-)

--
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?"
- LOGILAB, Paris (France)

From Mike.Olson@fourthought.com Wed May 2 19:22:12 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 02 May 2001 12:22:12 -0600
Subject: [XML-SIG] Proposing a web services SIG
References:
Message-ID: <3AF05054.E803903D@FourThought.com>

Nicolas Chauvat wrote:
>
> > Comments, next steps?
>
> +1 for web-services-sig (and RDF tools in PyXML ;-)

+1 for me as well

Mike

>
> --
> Nicolas Chauvat
>
> http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig

--
Mike Olson                                Principal Consultant
mike.olson@fourthought.com                (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From Cayce@actzero.com Wed May 2 19:26:40 2001
From: Cayce@actzero.com (Cayce Ullman)
Date: Wed, 2 May 2001 11:26:40 -0700
Subject: [XML-SIG] Proposing a web services SIG
Message-ID:

>> Comments, next steps?

>+1 for web-services-sig (and RDF tools in PyXML ;-)

I would like to second this motion as well. I'm aware of 5 implementations of SOAP in Python (2 of which were created in the month of April, one of which was mine), so there is clearly some interest in Python+WS. Plus I think some open collaboration could go a long way towards making Python a language of choice for web services work.

Cayce

From uche.ogbuji@fourthought.com Wed May 2 19:40:06 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 12:40:06 -0600
Subject: [XML-SIG] Proposing a web services SIG
In-Reply-To: Message from Cayce Ullman of "Wed, 02 May 2001 11:26:40 PDT."
Message-ID: <200105021840.f42Ie6D21877@localhost.local>

> >> Comments, next steps?
> >+1 for web-services-sig (and RDF tools in PyXML ;-)
>
> I would like to second this motion as well.
> I'm aware of 5 implementations of SOAP in Python (2 of which were created in the month of April, one of which was mine), so there is clearly some interest in Python+WS. Plus I think some open collaboration could go a long way towards making Python a language of choice for web services work.

Well, all very well, and I can go either way on new SIG vs. just use XML-SIG, but does anyone know how to expeditiously go about creating a Python SIG? I suppose it involves some magic incantations on the meta-SIG, but I don't know the current state-of-the-SIGS.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From guido@digicool.com Wed May 2 20:41:48 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:41:48 -0500
Subject: [XML-SIG] Re: [meta-sig] Proposing a web services SIG
In-Reply-To: Your message of "Wed, 02 May 2001 13:16:51 -0400." <3AF04103.A7FA3F01@zolera.com>
References: <3AF04103.A7FA3F01@zolera.com>
Message-ID: <200105021941.OAA03587@cj20424-a.reston1.va.home.com>

> I'd like to propose a new SIG, Web Services. Web services uses XML and
> related standards (schema, wsdl, soap, uddi) to provide a distributed
> computing infrastructure.
>
> There is a great deal of Python activity starting up here -- several
> SOAP implementations, interop work, WSDL parsing, etc. Much of the
> information exchange has been late-night point-to-point email, and it's
> time to provide a visible focal point for this activity.
>
> Our feeling (a few of us have chatted about this) is that the web
> services community generally takes Sax, DOM, etc., "for granted" and
> that it makes more sense to create a new SIG rather than be part of
> XML-SIG. XML Schema is a likely area of overlap, and we'll work
> together to handle that.
>
> In terms of code, web pages, etc., we'd follow the (high) standards of
> the XML Sig.
>
> Comments, next steps?

Read http://www.python.org/sigs/guidelines.html (all of it!).

Basically, you need to appoint a volunteer, write a mission statement, and circulate the draft mission statement on the meta-sig.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From rsalz@zolera.com Wed May 2 20:23:15 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 15:23:15 -0400
Subject: [XML-SIG] Re: [meta-sig] Proposing a web services SIG
References: <3AF04103.A7FA3F01@zolera.com> <200105021941.OAA03587@cj20424-a.reston1.va.home.com>
Message-ID: <3AF05EA3.71F8B4F1@zolera.com>

> Read http://www.python.org/sigs/guidelines.html (all of it!).

I did. The instructions at the end were fairly casual, and I thought my note was good enough, sorry. Let me try again...

> Basically, you need to appoint a volunteer, write a mission statement,
> and circulate the draft mission statement on the meta-sig.

I'm volunteering to coordinate webservices-sig.

Short blurb: make it easy for python programmers to provide and use web services.

Longer blurb: Web services uses SOAP, WSDL, UDDI, other standards to provide a distributed component infrastructure. The webservices-sig is focused on providing implementations of these standards so that Python programmers can easily write and use web services (i.e., both clients and servers -- the latter includes HTTPServer, but also other servers such as Apache, Zope, etc.)

The initial goal of the SIG will be to develop freely-usable implementations of SOAP, WSDL, and probably UDDI. Some coordination with XML Sig will be necessary, for example, because WSDL uses XML Schema. We will develop a framework for supporting multiple implementations.

Thanks.
/r$

From Juergen Hermann" Message-ID:

On Wed, 2 May 2001 11:26:40 -0700, Cayce Ullman wrote:

>I would like to second this motion as well.
>I'm aware of 5 implementations
>of SOAP in Python (2 of which were created in the month of April, one of
>which was mine),

Could you list those, together with a homepage URL? ;)

Ciao, Jürgen

From noreply@sourceforge.net Wed May 2 23:31:23 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 02 May 2001 15:31:23 -0700
Subject: [XML-SIG] [ pyxml-Bugs-420882 ] no xpath, xslt install from CVS checkout
Message-ID:

Bugs item #420882, was updated on 2001-05-02 15:31
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=420882&group_id=6473

Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: no xpath, xslt install from CVS checkout

Initial Comment:
I installed a CVS checkout from an hour or so ago into a test directory with setup.py:

python setup.py build
python setup.py install --prefix=[dir]

This didn't copy the xpath or xslt dirs into the /lib/python1.5/site-packages/xml subdirectory of my install dir. Once I copied them manually xpath worked.

I expected setup.py to use everything that was built; am I doing something weird?

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=420882&group_id=6473

From Cayce@actzero.com Thu May 3 01:17:02 2001
From: Cayce@actzero.com (Cayce Ullman)
Date: Wed, 2 May 2001 17:17:02 -0700
Subject: [XML-SIG] RE: SOAP for Python
Message-ID:

> >I would like to second this motion as well. I'm aware of 5 implementations
> >of SOAP in Python (2 of which were created in the month of April, one of
> >which was mine),
>
> Could you list those, together with a homepage URL?
;)
>

SOAP.py (mine) : http://www.actzero.com
the leader in terms of interoperability and features (as far as I know)

SOAP.py (part of Scarab) : http://www.casbah.org
hasn't moved for over a year, at a glance looks fairly unusable.

soaplib.py : http://www.pythonware.com
by Fredrik Lundh, much in the style of xmlrpclib

SOAPy : http://soapy.sourceforge.net
by Adam Elman, new client implementation supports WSDL

FT : http://www.fourthought.com
It was my understanding that Fourthought also is working on an impl, correct me if I'm wrong Mike.

From rsalz@zolera.com Thu May 3 01:39:34 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 20:39:34 -0400
Subject: [XML-SIG] RE: SOAP for Python
References:
Message-ID: <3AF0A8C6.BD3C239F@zolera.com>

> FT : http://www.fourthought.com It was my understanding that Fourthought
> also is working on an impl, correct me if I'm wrong Mike.

I think he's at the same stage as I am -- discussion.

From uche.ogbuji@fourthought.com Thu May 3 03:24:34 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 20:24:34 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz of "Wed, 02 May 2001 20:39:34 EDT." <3AF0A8C6.BD3C239F@zolera.com>
Message-ID: <200105030224.f432OYX01370@localhost.local>

> > FT : http://www.fourthought.com It was my understanding that Fourthought
> > also is working on an impl, correct me if I'm wrong Mike.
>
> I think he's at the same stage as I am -- discussion.

Nope. Way past discussion. 4Suite Server 0.11 (alpha) features a SOAP server. Examples here

http://www-106.ibm.com/developerworks/webservices/library/ws-pyth3/

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From Mike.Olson@fourthought.com Thu May 3 03:21:08 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 02 May 2001 20:21:08 -0600
Subject: [XML-SIG] RE: SOAP for Python
References:
Message-ID: <3AF0C094.3946A1AF@FourThought.com>

Cayce Ullman wrote:
>
> FT : http://www.fourthought.com It was my understanding that Fourthought
> also is working on an impl, correct me if I'm wrong Mike.

We have parts of an implementation but are looking to expand on it a lot in the next month or so.

Mike

>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig

--
Mike Olson                                Principal Consultant
mike.olson@fourthought.com                (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Thu May 3 04:11:25 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 21:11:25 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Mike Olson of "Wed, 02 May 2001 20:21:08 MDT." <3AF0C094.3946A1AF@FourThought.com>
Message-ID: <200105030311.f433BPV01587@localhost.local>

> Cayce Ullman wrote:
> >
> > FT : http://www.fourthought.com It was my understanding that Fourthought
> > also is working on an impl, correct me if I'm wrong Mike.
>
> We have parts of an implementation but are looking to expand on it a lot
> in the next month or so.

Ah. Mike's more cautious than I. I'll be explicit though: the only part we're "missing" is the SOAP serialization. But as far as I'm concerned, we're not missing anything in that case.

The SOAP serialization, frankly, stinks. I've already spat my venom at whoever didn't rip section 5 out of the SOAP spec after a second reading, but we'll see how that works out.

Until then, I rely on the fact that section 5 is explicitly optional.
There is no requirement for a SOAP implementation to use the SOAP serialization.

I'm actually more interested in writing an RDF serialization, and with some support, it's not inconceivable that such a thing would oust section 5 before XML Protocol emerges.

So, I disagree that 4SS has just parts of an implementation. We have a SOAP server according to the spec.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From rsalz@zolera.com Thu May 3 04:56:51 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 23:56:51 -0400
Subject: [XML-SIG] RE: SOAP for Python
References: <200105030311.f433BPV01587@localhost.local>
Message-ID: <3AF0D703.29636208@zolera.com>

> Until then, I rely on the fact that section 5 is explicitly optional. There
> is no requirement for a SOAP implementation to use the SOAP serialization.

Technically right, but it would be *very* surprising and upsetting to folks who naively used the 4SS implementation to talk to other web services. It might even cause them to spit venom at you.

> I'm actually more interested in writing an RDF serialization, and with some
> support, it's not inconceivable that such a thing would oust section 5 before
> XML Protocol emerges.

It's about as likely as someone accepting my DER encoding.
/r$

From tpassin@home.com Thu May 3 05:04:47 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 3 May 2001 00:04:47 -0400
Subject: [XML-SIG] Proposing a web services SIG
References: <3AF04103.A7FA3F01@zolera.com>
Message-ID: <003301c0d386$319fab80$7cac1218@reston1.va.home.com>

[Rich Salz]

> I'd like to propose a new SIG, Web Services. Web services uses XML and
> related standards (schema, wsdl, soap, uddi) to provide a distributed
> computing infrastructure.
>

I'd go +1 on this.
Cheers,

Tom P

From uche.ogbuji@fourthought.com Thu May 3 05:23:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 22:23:42 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz of "Wed, 02 May 2001 23:56:51 EDT." <3AF0D703.29636208@zolera.com>
Message-ID: <200105030423.f434Nga02058@localhost.local>

> > Until then, I rely on the fact that section 5 is explicitly optional. There
> > is no requirement for a SOAP implementation to use the SOAP serialization.
>
> Technically right, but it would be *very* surprising and upsetting to
> folks who naively used the 4SS implementation to talk to other web
> services. It might even cause them to spit venom at you.

Usually that's when things become fun. However, you'll have to explain yourself better. What is this naivete you're talking about? If they're using a "conformant" SOAP client, there should be little such "surprise". And they certainly should not be upset. Even Dave Reed of Microsoft at XML DevCon was very careful to point out that the success of SOAP interop would come with proper handling of SOAP's flexibility. Check your assumptions at the door or prepare to crash and burn. If the major champion of SOAP can say so, especially after cooking up five of their own SOAP implementations and having to (admittedly) force-feed themselves interop, I don't see how I can credit your idea that anyone should be surprised or upset working with a system that doesn't implement section 5.

> > I'm actually more interested in writing an RDF serialization, and with some
> > support, it's not inconceivable that such a thing would oust section 5 before
> > XML Protocol emerges.
>
> It's about as likely as someone accepting my DER encoding.

If you think you know the shape of what will come from XP, I think you have another thought coming.
The politics that are massed within this group are probably even more massed than those of XML Schema, and indeed the XP WG is larger than the Schema WG. I can lay a solid bet that you won't recognize a significant amount of XP from what you see in SOAP. But then again, anyone who followed XML-RPC -> SOAP should realize this isn't much of a prediction.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Thu May 3 05:40:28 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 22:40:28 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz of "Wed, 02 May 2001 23:56:51 EDT." <3AF0D703.29636208@zolera.com>
Message-ID: <200105030440.f434eSb02098@localhost.local>

> > Until then, I rely on the fact that section 5 is explicitly optional. There
> > is no requirement for a SOAP implementation to use the SOAP serialization.
>
> Technically right, but it would be *very* surprising and upsetting to
> folks who naively used the 4SS implementation to talk to other web
> services. It might even cause them to spit venom at you.

After I sent my last message another thought struck me. You use the term "Web services" above. Probably I have to understand what you mean by that before I understand why you think it would be surprising and upsetting to have SOAP systems that don't implement section 5.

The only reason everyone would want to "just stick to section 5" is for "transparent" API-type calls. RPC all over again. Basically CORBA with SOAP/HTTP over the wire rather than CDR/IIOP. But what on earth is the use of such a thing? Why not just use CORBA or DCOM or RMI, all of which are vastly more efficient than SOAP and can claim more pedigree and interop?
The answer is simple: because such tightly-coupled systems do not survive the boundary from one business technology and process to another. Crossing such a boundary requires loosely-coupled systems, and that is the only reason there is any relevance to the buzzword "Web services".

Successful Web services will be message-oriented, loosely coupled systems with a great deal of flexibility that is handled through metadata management. Whether you're in the ebXML camp or the UDDI camp, you had better be taking those tModels, WSDL bindings and CPPs seriously, because if you just blindly write code that assumes that, say, everyone uses SOAP serialization, you will be doing commerce with only a fraction of your brave new market.

This is why it was utter silliness for section 5 not to have been broken out of SOAP transport into a separate spec. It encourages people to wrongly assume that SOAP implies section 5, and thereby condemn themselves to reinventing the RPC wheel all over again.

And I'll note that I'm not alone in this sentiment. In past SOAP debates on XML-DEV, no lesser figures than Tim Bray and David Megginson have expressed similar annoyance at the conflation of transport and content model that mars SOAP.

So do I think it's realistic that section 5 will be put in its place before XP emerges? Absolutely. And in the unlikely event that this doesn't happen, Web services will pretty much drown in its own unfulfilled promises.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From martin@loewis.home.cs.tu-berlin.de Wed May 2 23:53:07 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v.
 Loewis)
Date: Thu, 3 May 2001 00:53:07 +0200
Subject: [XML-SIG] Proposing a web services SIG
In-Reply-To: <200105021840.f42Ie6D21877@localhost.local> (message from Uche Ogbuji on Wed, 02 May 2001 12:40:06 -0600)
References: <200105021840.f42Ie6D21877@localhost.local>
Message-ID: <200105022253.f42Mr7B01762@mira.informatik.hu-berlin.de>

> Well, all very well, and I can go either way on new SIG vs. just use
> XML-SIG, but does anyone know how to expeditiously go about creating
> a Python SIG? I suppose it involves some magic incantations on the
> meta-SIG, but I don't know the current state-of-the-SIGS.

I just asked to close three of them, so it is probably time to fill the empty space :-)

In any case, I think Rich's proposal is still missing an expiration/review date for the SIG. Traditionally, SIGs used to expire after one (?) year (after which they could be extended), but with the little review they get after that time, reviewing them every two years is probably just as fine. In any case, this is all meta-sig business.

Regards,
Martin

P.S. There is also the issue of the SIG web pages. I'm still looking for comments on whether they ought to live in the Python CVS, or in a separate SF project (with check-in permissions for all SIG coordinators).

From noreply@sourceforge.net Thu May 3 09:58:41 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 03 May 2001 01:58:41 -0700
Subject: [XML-SIG] [ pyxml-Bugs-420977 ] 4XSLT traceback
Message-ID:

Bugs item #420977, was updated on 2001-05-03 01:58
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=420977&group_id=6473

Category: SAX
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: 4XSLT traceback

Initial Comment:
Hi,

I get a traceback when trying to process an XSLT generated by schematron. The XSLT is attached to this bug report. It could be a problem with the schematron itself.
The document on which the XSLT is applied is '

The traceback is:

alf@lapinot:~/schematron$ 4xslt test.xml recipe.xsl
Traceback (innermost last):
  File "/usr/bin/4xslt", line 5, in ?
    _4xslt.Run(sys.argv)
  File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 113, in Run
    topLevelParams=top_level_params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 150, in runUri
    writer, uri, outputStream)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 250, in execute
    self.applyTemplates(context, None)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 267, in applyTemplates
    found = sty.applyTemplates(context, mode, self, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 430, in applyTemplates
    patternInfo[PatternInfo.TEMPLATE].instantiate(context, processor, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py", line 114, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/ApplyTemplatesElement.py", line 93, in instantiate
    processor.applyTemplates(context, mode, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 271, in applyTemplates
    self.applyBuiltins(context, mode)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 284, in applyBuiltins
    self.applyTemplates(context, mode)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 267, in applyTemplates
    found = sty.applyTemplates(context, mode, self, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 430, in applyTemplates
    patternInfo[PatternInfo.TEMPLATE].instantiate(context, processor, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py", line 112, in instantiate
    new_level)
  File "/usr/lib/python1.5/site-packages/xml/xslt/ChooseElement.py", line 61, in instantiate
    context, chosen, rec_tpl_params = child.instantiate(context, processor, new_level)
  File "/usr/lib/python1.5/site-packages/xml/xslt/WhenElement.py", line 43, in instantiate
    result = self._expr.evaluate(context)
  File "/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py", line 369, in evaluate
    rt = Conversions.BooleanEvaluate(self._right, context)
  File "/usr/lib/python1.5/site-packages/xml/xpath/Conversions.py", line 33, in BooleanEvaluate
    rt = exp.evaluate(context)
  File "/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py", line 408, in evaluate
    lrt = self._left.evaluate(context)
  File "/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py", line 180, in evaluate
    return self._func(context, arg0)
  File "/usr/lib/python1.5/site-packages/xml/xpath/CoreFunctions.py", line 300, in Floor
    if int(number) == number:
TypeError: object can't be converted to int

This is with 4Suite-0.11a2.

Cheers

Alexandre

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=420977&group_id=6473

From noreply@sourceforge.net Thu May 3 12:29:30 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 03 May 2001 04:29:30 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421001 ] Undefined symbol XML_SetEntityDeclHandle
Message-ID:

Bugs item #421001, was updated on 2001-05-03 04:29
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421001&group_id=6473

Category: expat
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Undefined symbol XML_SetEntityDeclHandle

Initial Comment:
On FreeBSD 4.2 i386, with Python 2.0, PyXML 0.6.5, 4Suite 0.11a2 and 4SS 0.11a2 I get the following error:

  File "/usr/local/bin/4ss", line 3, in ?
    from FtServer.Console import CommandLine
  File "/usr/local/lib/python2.0/site-packages/FtServer/Console/CommandLine.py", line 3, in ?
    from Commands import g_commands
  File "/usr/local/lib/python2.0/site-packages/FtServer/Console/Commands/__init__.py", line 2, in ?
    import Init
  File "/usr/local/lib/python2.0/site-packages/FtServer/Console/Commands/Init.py", line 15, in ?
    from FtServer.Core.Lib import ConfigFile
  File "/usr/local/lib/python2.0/site-packages/FtServer/Core/Lib/ConfigFile.py", line 2, in ?
    from Ft.Rdf.Serializers.Dom import Serializer
  File "/usr/local/lib/python2.0/site-packages/Ft/Rdf/Serializers/Dom.py", line 27, in ?
    from Ft.Lib import pDomlette
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/pDomlette.py", line 668, in ?
    from pDomletteReader import *
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/pDomletteReader.py", line 27, in ?
    from xml.parsers import expat
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/parsers/expat.py", line 4, in ?
    from pyexpat import *
ImportError: /usr/local/lib/python2.0/site-packages/_xmlplus/parsers/pyexpat.so: Undefined symbol "XML_SetEntityDeclHandler"

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421001&group_id=6473

From rsalz@zolera.com Thu May 3 14:58:45 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 03 May 2001 09:58:45 -0400
Subject: [XML-SIG] RE: SOAP for Python
References: <200105030423.f434Nga02058@localhost.local>
Message-ID: <3AF16415.A4BB8CD1@zolera.com>

> What is this naivete you're
> talking about?

If you asked 100 people who were building SOAP applications
    Did you know we could both be compliant but use different data
    transfers and therefore be unable to interoperate?
I'll bet more than half would be surprised, and more than 80% would say "yes, but doesn't everyone at least support the common scheme."

I agree WSDL is way important, which is one of the motivators for a web-services SIG. I disagree that Sec5's inefficiencies doom it to failure, and its installed base will be enough to ensure its viability.
But that's a simple bet whose answer we'll know in a couple of years. Not worth arguing over.
/r$

From uche.ogbuji@fourthought.com Thu May 3 15:27:02 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 03 May 2001 08:27:02 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz of "Thu, 03 May 2001 09:58:45 EDT." <3AF16415.A4BB8CD1@zolera.com>
Message-ID: <200105031427.f43ER2004499@localhost.local>

> > What is this naivete you're
> > talking about?
>
> If you asked 100 people who were building SOAP applications
>     Did you know we could both be compliant but use different data
>     transfers and therefore be unable to interoperate?
> I'll bet more than half would be surprised, and more than 80% would say
> "yes, but doesn't everyone at least support the common scheme."

As you said, time will tell, but you were talking about Web services, not applications. I thought this is what the entire thread was about.

I can assure you that I have spoken to/worked with quite a few in the nascent Web services space, and that most have learned not to take anything for granted, as long as it is conformant.

You'd be surprised how much SOAP work is proceeding without Section 5.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From rsalz@zolera.com Thu May 3 15:31:17 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 03 May 2001 10:31:17 -0400
Subject: [XML-SIG] RE: SOAP for Python
References: <200105031427.f43ER2004499@localhost.local>
Message-ID: <3AF16BB5.117B7D45@zolera.com>

> You'd be surprised how much SOAP work is proceeding without Section 5.

Life surprises me.
:)
	/r$

From stuff4gary@hotmail.com Thu May 3 20:48:34 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Thu, 03 May 2001 19:48:34
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
Message-ID:

I want to add some text onto the end of an XML file just before the closing
tag, but I don't want to read the whole file into memory as it is quite a
large file. I am trying to do the following:

1. delete 14 characters off the end of the file (the closing tag)
2. add some new data text from a cgi script onto this
   ie - file.append(cgi_resxml)
3. then add back on the closing tag (14 character '')
   ie - file.append('')

I can manage (2.) & (3.) with no problems by opening the file handle with
append access ('a'), but I can't work out how to do (1.) as well... does this
append function have a reverse function that I can use, or should I be doing
this a different way?

Kind Regards
Gary
_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.

From uche.ogbuji@fourthought.com Thu May 3 21:14:46 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 3 May 2001 14:14:46 -0600
Subject: [XML-SIG] ANN: 4Suite and 4Suite Server 0.11
Message-ID: <200105032014.f43KEkv08659@localhost.local>

Fourthought, Inc. (http://Fourthought.com) announces the release of

4Suite 0.11 and 4Suite Server 0.11
----------------------------
Open source XML processing tools and an XML data server
http://4Suite.org
http://Fourthought.com/4SuiteServer

4Suite Server News
------------------

Basically re-written from the ground up. CORBA is no longer required and
is now just another way to access the server (along with HTTP, SOAP, WebDAV,
Python API, etc).
Many usability, documentation, performance and architectural improvements.

4Suite News
-----------

* Release 0.11.0 (Tag R20010501)
* pDomlette: XInclude implemented directly into parse for efficiency
* pDomlette: better modularized
* cDomlette: memory leaks squashed
* RDF: add command line
* RDF: major serialization and deserialization fixes
* RDF: Work access-control directly into RDF model
* RDF: API tweaks: use user flags for query flexibility
* XSLT: Many speedups
* XSLT: xsl:variable and xsl:param conformance fixes
* ODS: Many bug fixes in the DbmAdapter
* Lib: Many bug fixes in the DbmDriver
* Many misc optimizations and bug-fixes

4Suite is a collection of Python tools for XML processing and object
database management. It provides support for XML parsing, several
transient and persistent DOM implementations, XPath expressions,
XPointer, XSLT transforms, XLink, RDF and ODMG object databases.

4Suite Server is a platform for XML processing. It features an XML data
repository, metadata management, a rules-based engine, XSLT transforms,
XPath and RDF-based indexing and query, XLink resolution and many other
XML services. It also provides transactions and access control features.
Along with basic console and command-line management, it supports remote,
cross-platform and cross-language access through CORBA, WebDAV, HTTP and
other request protocols.

4Suite Server is not meant to be a full-blown application server. It
provides highly-specialized services for XML processing that can be used
with other application servers.

All the software is open-source and free to download. Priority support
and customization is available from Fourthought, Inc.
For more information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900

More info and Obtaining 4Suite and 4Suite Server
------------------------------------------------

Please see http://4Suite.org and http://Fourthought.com/4SuiteServer,
from where you can download source, Windows and Linux binaries.

4Suite is distributed under a license similar to that of the Apache Web
Server.

From akuchlin@mems-exchange.org Thu May 3 21:19:19 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Thu, 3 May 2001 16:19:19 -0400
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
In-Reply-To: ; from stuff4gary@hotmail.com on Thu, May 03, 2001 at 07:48:34PM +0000
References:
Message-ID: <20010503161919.A3785@ute.cnri.reston.va.us>

On Thu, May 03, 2001 at 07:48:34PM +0000, gary cor wrote:
>I want to add some text onto the end of an XML file just before the closing
>tag but I don't want to read the whole file into memory as it is quite a
>large file. I am trying to do the following:
>
>1. delete 14 characters off the end of the file (the closing tag)
...

This is fragile; what if there is trailing whitespace at the end of
the file? What if the closing tag is written strangely, as '< /
closing >' or something like that?

Now, what's the best way to do this? You could write a simple SAX
handler where startElement() and characters() print their input to a
file or to standard output, and then have an endElement() that outputs
a closing tag, first checking if it's the root element and inserting
the extra content. Is there a better way?
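
The SAX approach described above can be sketched roughly as follows. This is only an illustration written for present-day Python 3, not the Python 2.0 of this thread; the class name AppendBeforeRootEnd, the helper append_to_root and the sample element names are hypothetical, and the sketch assumes the output is a text stream.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class AppendBeforeRootEnd(XMLGenerator):
    """Copy the input through, writing extra markup just before the root end tag."""

    def __init__(self, out, extra_xml):
        # 'out' is assumed to be a text stream, so XMLGenerator writes to it directly.
        super().__init__(out, encoding="utf-8")
        self._out_stream = out
        self._extra_xml = extra_xml
        self._depth = 0

    def startElement(self, name, attrs):
        self._depth += 1
        super().startElement(name, attrs)

    def endElement(self, name):
        self._depth -= 1
        if self._depth == 0:
            # We are about to close the root element: emit the new content first.
            self._out_stream.write(self._extra_xml)
        super().endElement(name)


def append_to_root(source_xml, extra_xml):
    """Return source_xml re-serialized with extra_xml inserted before the root end tag."""
    out = io.StringIO()
    xml.sax.parseString(source_xml.encode("utf-8"), AppendBeforeRootEnd(out, extra_xml))
    return out.getvalue()


result = append_to_root("<log><entry>old</entry></log>", "<entry>new</entry>")
```

For a large file one would call xml.sax.parse() with a file name and an output file instead of strings, so that the document is streamed rather than held in memory. The output is a rewritten copy, so this costs a full pass over the file, which is the trade-off against the in-place approach discussed elsewhere in the thread.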
--amk From Mike.Olson@fourthought.com Thu May 3 21:31:10 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 03 May 2001 14:31:10 -0600 Subject: [XML-SIG] Deleting and appending of a file, without reading into memory References: <20010503161919.A3785@ute.cnri.reston.va.us> Message-ID: <3AF1C00E.83423203@FourThought.com> Andrew Kuchling wrote: > > On Thu, May 03, 2001 at 07:48:34PM +0000, gary cor wrote: > >I want to add some text onto the end of an XML file just before the closing > >tag but I don't want to read the whole file into memory as it is quite a > >large file. I am trying to do the following: > > > >1. delete 14 characters off the end of the file (the closing tag) > ... > > This is fragile; what if there is trailing whitespace at the end of > the file? What if the closing tag is written strangely, as '< / > closing >' or something like that? > > Now, what's the best way to do this? You could write a simple SAX > handler where startElement() and characters() printed their input to a > file or to standard output, and then have an endElement() that outputs > a closing tag, first checking if it's the root element and inserting > the extra content. Is there a better way? If the doc is that big, what about breaking it into smaller docs and using XInclude? Then to add a new section, load the "hub" document (which will be pretty small now) and add a new include tag. Then write the new content to the referenced file. Mike > > --amk > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Thu May 3 23:04:22 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Fri, 4 May 2001 00:04:22 +0200
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
In-Reply-To: <20010503161919.A3785@ute.cnri.reston.va.us> (message from Andrew Kuchling on Thu, 3 May 2001 16:19:19 -0400)
References: <20010503161919.A3785@ute.cnri.reston.va.us>
Message-ID: <200105032204.f43M4M401839@mira.informatik.hu-berlin.de>

> This is fragile; what if there is trailing whitespace at the end of
> the file? What if the closing tag is written strangely, as '< /
> closing >' or something like that?

If this CGI script is the only application that ever modifies the
document, the approach seems fine to me - although it is certainly
questionable why one would use XML in the first place here.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Thu May 3 23:03:22 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 4 May 2001 00:03:22 +0200
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
In-Reply-To: (stuff4gary@hotmail.com)
References:
Message-ID: <200105032203.f43M3Mi01837@mira.informatik.hu-berlin.de>

> 1. delete 14 characters off the end of the file (the closing tag)
> 2. add some new data text from a cgi script onto this
>    ie - file.append(cgi_resxml)
> 3. then add back on the closing tag (14 character '')
>    ie - file.append('')
>
> I can manage (2.) & (3.) with no problems by opening the file handle with
> append access ('a'), but I can't work out how to do (1.) as well... does this
> append function have a reverse function that I can use, or should I be doing
> this a different way?

What kind of file object do you have that has an append function?

I'd use f.seek to go 14 characters before the end, and start writing
there. Some operating systems don't even support truncation to a
certain size; they all support positioning to a given offset, though.
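
A minimal sketch of the seek-and-overwrite idea, in present-day Python 3. The function name append_before_closing_tag and the sample tag are hypothetical (the real closing tag is not shown in the thread), and the sketch assumes the file ends with exactly that tag, with no trailing whitespace.

```python
import os


def append_before_closing_tag(path, new_data, closing_tag):
    """Overwrite the trailing closing tag with new_data followed by the tag again."""
    tag = closing_tag.encode("utf-8")
    with open(path, "r+b") as f:
        # Position just before the closing tag and check it really is there.
        f.seek(-len(tag), os.SEEK_END)
        if f.read(len(tag)) != tag:
            raise ValueError("file does not end with the expected closing tag")
        # Go back to the same offset and write over the tag; the file only grows,
        # so no truncation is needed.
        f.seek(-len(tag), os.SEEK_END)
        f.write(new_data.encode("utf-8") + tag)
```

The check before writing covers the fragility raised earlier in the thread: if some other program has left trailing whitespace or written the tag differently, the function raises instead of corrupting the file.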
Regards, Martin From uche.ogbuji@fourthought.com Fri May 4 02:42:31 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 03 May 2001 19:42:31 -0600 Subject: [XML-SIG] A bit o' challenge Message-ID: <200105040142.f441gVD10270@localhost.local> OK, so the conventional wisdom lately has been that the Java processors such as Xalan and Saxon cream 4XSLT for performance across the board. Alexandre Fayolle said that he thought they were "orders of magnitude" faster. Well, I know that one always does better in his own benchmarking, but I have been working with 4XSLT quite heavily in the time leading up to the 0.11 release, and I'm having trouble crediting this impression. 4XSLT is to my observations (and measurements using the time command-line timer) a good 25% faster than Saxon and faster by an even greater proportion than Xalan for most small to medium tasks. I have indeed noticed that on huge documents, such as the "Cemetary" benchmark (3MB source), Saxon 6.0.2 is up to 4 times faster than 4XSLT (similar for Xalan), but this is still not "orders of magnitude" faster, and this only seems to be true for the size and type of document I'd only expect to process in benchmarks. Now one note: I *always* use cDomlette. It is much faster than pDomlette, and that is why I've declared that I'll be working to make it the default in 4Suite as of 0.11.1. Once again, I encourage everyone to help shake out any remnant bugs in cDomlette. See this posting for more info: http://lists.fourthought.com/pipermail/4suite/2001-April/001780.html So here's the bit o' challenge. I'm looking for regular-sized, real-world transforms in which Saxon or Xalan smoke 4XSLT. If you have such test cases, and can reliably reproduce 4XSLT's lassitude using cDomlette, please send it my way so I can have a look (and maybe find the performance bugs that I'm too close to see). I'm also interested, of course, in hearing positive reports about 4XSLT's performance. 
So I say 4XSLT is competitive, and as far as I can tell, is usually faster than the Java processors (though we can't touch MSXML yet). P.S. What got me starting to ponder was DataPower's benchmark that showed 4XSLT some 20 times slower than the group of Java processors. The nonsense behind this I was able to grasp with one glance at their tortured "driver" for 4XSLT. I've made my complaints about their incompetence, but here's your chance to show I'm all wet. Thanks, all. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Fri May 4 03:04:21 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 03 May 2001 19:04:21 -0700 Subject: [XML-SIG] [ pyxml-Patches-421217 ] ImportError shoudl be AttributeError Message-ID: Patches item #421217, was updated on 2001-05-03 19:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=421217&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Karl Anderson (karlanderson) Assigned to: Nobody/Anonymous (nobody) Summary: ImportError shoudl be AttributeError Initial Comment: I'm running a recent CVS checkout of PyXML, and have 4Suite 0.11a2 installed (I haven't installed 4Suite 0.11, but it doesn't look to be different in this regard). I'm running Python 1.5.2 under Redhat 6.2. I get an attribute error when I try to import xslt. StylesheetReader.py seems to be catching ImportError when it should catch AttributeError for me. The intended import seems to be Ft.Lib.Error.XML_PARSE_ERROR, not Ft.Lib.XML_PARSE_ERROR. 
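
The distinction in this report can be illustrated with a small sketch in present-day Python 3. It is not the submitted patch; the helper name lookup_constant is hypothetical. A missing module raises ImportError, while a module that imports fine but lacks the name raises AttributeError, so a fallback has to catch both.

```python
import importlib


def lookup_constant(module_name, attr_name, default):
    """Return module_name.attr_name, or default if the module or the name is missing."""
    try:
        module = importlib.import_module(module_name)  # may raise ImportError
        return getattr(module, attr_name)              # may raise AttributeError
    except (ImportError, AttributeError):
        return default
```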
This happens before my patch: >>> import sys >>> sys.path.insert(0, '/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages') # where I installed from CVS >>> import xml >>> from xml.xslt import Processor Traceback (innermost last): File "", line 1, in ? File "/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/Processor.py", line 24, in ? from xml.xslt import StylesheetReader, ReleaseNode File "/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/StylesheetReader.py", line 67, in ? XML_PARSE_ERROR = Ft.Lib.XML_PARSE_ERROR AttributeError: XML_PARSE_ERROR ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=421217&group_id=6473 From cperez@zulunet.net Fri May 4 18:21:51 2001 From: cperez@zulunet.net (Carlos Perez) Date: Fri, 4 May 2001 13:21:51 -0400 Subject: [XML-SIG] Looking for XML to Python sequence code. In-Reply-To: Message-ID: <001b01c0d4be$b5c2a140$fd0aa8c0@CPEREZ> I'm looking for some Python code that convert XML to a Python native sequence object. Does anyone know where to get it? Thanks in advance... From dieter@handshake.de Fri May 4 19:19:41 2001 From: dieter@handshake.de (Dieter Maurer) Date: Fri, 4 May 2001 20:19:41 +0200 (CEST) Subject: [XML-SIG] Re: [4suite] A bit o' challenge In-Reply-To: <782485507@toto.iv> Message-ID: <15090.62141.831609.704073@lindm.dm> Uche Ogbuji writes: > ... > Well, I know that one always does better in his own benchmarking, but I have > been working with 4XSLT quite heavily in the time leading up to the 0.11 > release, and I'm having trouble crediting this impression. 4XSLT is to my > observations (and measurements using the time command-line timer) a good 25% > faster than Saxon and faster by an even greater proportion than Xalan for most > small to medium tasks. When I used 4XSLT for the last time, it was version 0.9. 
I transformed a 240 kb DocBook/XML file into HTML using Norman Walsh's
DocBook stylesheets.

4XSLT needed about 50 MB memory and about 30 min CPU time (slow
Pentium 100 MHz with 64 MB main memory).

A colleague of mine used Saxon for his DocBook/XML documentation,
also with Norman Walsh's stylesheets. Runtime was on the order
of a minute. I should say, it was a very different machine (Sun E450
with 256MB memory).
But nevertheless, I expect that after normalization Saxon
was several times faster than 4XSLT.

I was especially horrified by the high memory requirements.
The mentioned document is one out of eight chapters of a book.
In the final production, the complete book must be processed
together (to get correct links, table of contents, indexes,...).
I fear, I would need 200 MB memory and several hours of processing
time ....

> ....
> So here's the bit o' challenge. I'm looking for regular-sized, real-world
> transforms in which Saxon or Xalan smoke 4XSLT. If you have such test cases,
> and can reliably reproduce 4XSLT's lassitude using cDomlette, please send it
> my way so I can have a look (and maybe find the performance bugs that I'm too
> close to see).

I will give it a try, when 0.11 is released and report back.

Dieter

From rsalz@zolera.com Fri May 4 20:36:27 2001
From: rsalz@zolera.com (Rich Salz)
Date: Fri, 04 May 2001 15:36:27 -0400
Subject: [XML-SIG] xmlproc bug?
Message-ID: <3AF304BB.D5ECB468@zolera.com>

If you feed() a unicode string into an xmlproc parser, Python barfs at
line 234
    # ignore unusal byte orders 2143 and 3412
    elif new_data[:2] == '\xfe\xff':
        enc = "utf-16-be" # with BOM

because apparently it is trying to convert the string to unicode and
it's got 8bit characters.

Not sure what the right thing to do is.
Here's a three-line script that shows the fault:

    from xml.parsers.xmlproc import xmlproc
    z = xmlproc.XMLProcessor()
    z.feed(u'')

	/r$

From larsga@garshol.priv.no Fri May 4 21:22:07 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 May 2001 22:22:07 +0200
Subject: [XML-SIG] xmlproc bug?
In-Reply-To: <3AF304BB.D5ECB468@zolera.com>
References: <3AF304BB.D5ECB468@zolera.com>
Message-ID:

* Rich Salz
|
| If you feed() a unicode string into an xmlproc parser, Python barfs at
| line 234
|     # ignore unusal byte orders 2143 and 3412
|     elif new_data[:2] == '\xfe\xff':
|         enc = "utf-16-be" # with BOM
|
| because apparently it is trying to convert the string to unicode and
| it's got 8bit characters.

The problem here is that we are trying to autodetect the encoding of a
Unicode string, but a Unicode string is already in Unicode and so
needs no decoding. You can solve this by setting the decoded parameter
to feed to 1, but it would be better if you did not have to.

Fixed it by doing the following:

Index: xml/parsers/xmlproc/xmlutils.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/parsers/xmlproc/xmlutils.py,v
retrieving revision 1.16
diff -c -r1.16 xmlutils.py
***************
*** 285,290 ****
--- 285,295 ----
              new_data = new_data+self.encoded_data
              self.encoded_data = ""

+
+         if not decoded and using_unicode and \
+            type(new_data) == types.UnicodeType:
+             decoded = 1
+
          if not decoded and not self.charset_converter:
              self.autodetect_encoding(new_data)
              # If this returns with no auto-detected encoding, i.e. if

I need to check it first before committing it, but this should solve
the problem. (Am waiting for glibc to download, so that I can compile
Python 2.1, so that I can actually test this. The download is going
slowly, so I am posting before the commit.)

--Lars M.
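
The idea behind the fix above (skip encoding detection when the data is already Unicode) can be sketched in present-day Python 3 as follows. This is an illustration only, not xmlproc's code; the function name to_text and the fallback_encoding parameter are hypothetical, and only three byte-order marks are handled.

```python
import codecs


def to_text(chunk, fallback_encoding="utf-8"):
    """Return chunk as text; only byte strings go through encoding detection."""
    if isinstance(chunk, str):
        # Already decoded: looking for a byte-order mark here would be meaningless.
        return chunk
    if chunk.startswith(codecs.BOM_UTF16_BE):
        return chunk[2:].decode("utf-16-be")
    if chunk.startswith(codecs.BOM_UTF16_LE):
        return chunk[2:].decode("utf-16-le")
    if chunk.startswith(codecs.BOM_UTF8):
        return chunk[3:].decode("utf-8")
    return chunk.decode(fallback_encoding)
```

A parser that accepts both kinds of input would also have to decide what happens when byte strings and Unicode strings are mixed across successive calls; this sketch treats each chunk independently and does not address that.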
From noreply@sourceforge.net Fri May 4 21:36:15 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 04 May 2001 13:36:15 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421488 ] xslt processor stylesheet reader error Message-ID: Bugs item #421488, was updated on 2001-05-04 13:36 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421488&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Karl Anderson (karlanderson) Assigned to: Nobody/Anonymous (nobody) Summary: xslt processor stylesheet reader error Initial Comment: Can't append stylesheet. Stylesheet reader wants to call initParser(). I'm not giving the processor a reader, it's using the default. When I run without Ft installed, the reader is MinidomReader, which doesn't define this. When I run with Ft installed from 4Suite 0.11, the reader is DomletteReader, which also gives this error. initParser is defined on the pDomletteReader ReaderMixin class, but not defined anywhere on a reader that the processor gets by default, AFAICT. >>> p = Processor.Processor() p.appendStylesheetString(sheet_4) >>> Traceback (innermost last): File "", line 1, in ? 
File "/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/Processor.py", line 106, in appendStylesheetString sty = self._styReader.fromString(text, baseUri) File "/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/minisupport.py", line 62, in fromString return self.fromStream(st, baseUri, ownerDoc, stripElements) File "/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/StylesheetReader.py", line 305, in fromStream self.initParser() AttributeError: initParser ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421488&group_id=6473 From uche.ogbuji@fourthought.com Fri May 4 21:52:25 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 04 May 2001 14:52:25 -0600 Subject: [XML-SIG] Re: [4suite] A bit o' challenge References: <15090.62141.831609.704073@lindm.dm> Message-ID: <3AF31689.A8895445@fourthought.com> Dieter Maurer wrote: > > Uche Ogbuji writes: > > ... > > Well, I know that one always does better in his own benchmarking, but I have > > been working with 4XSLT quite heavily in the time leading up to the 0.11 > > release, and I'm having trouble crediting this impression. 4XSLT is to my > > observations (and measurements using the time command-line timer) a good 25% > > faster than Saxon and faster by an even greater proportion than Xalan for most > > small to medium tasks. > When I used 4XSLT for the last time, it was version 0.9. > > I transformed a 240 kb DocBook/XML file into HTML using Norman Walsh's > DocBook stylesheets. > > 4XSLT needed about 50 MB memory and about 30 min CPU time (slow > Pentium 100 MHZ with 64 MB main memory). I did specifically mention working with cDomlette. Is that what you were using? > A colleague of mine used Saxon for his DocBook/XML documentation, > also with Normal Walsh's stylesheets. Runtime was in the order > of a minute. 
I should say, it was a very different machine (Sun E450 > with 256MB memory). > But nevertheless, I expect that after normalization Saxon > was several times faster than 4XSLT. > > I was especially horrified by the high memory requirements. > The mentioned document is one out of eight chapters of a book. > In the final production, the complete book must be processed > together (to get correct links, table of contents, indexes,...). > I fear, I would need 200 MB memory and several hours of processing > time .... cDomlette takes up about half the memory as pDomlette. In some cases (since it uses string pooling) this might be more or less the proportion. When I checked with the 3MB cemetary demo, 4XSLT+cDom 0.11a2 took up 42MB and Saxon 6.0.2 took up 33MB of RAM. > > .... > > So here's the bit o' challenge. I'm looking for regular-sized, real-world > > transforms in which Saxon or Xalan smoke 4XSLT. If you have such test cases, > > and can reliably reproduce 4XSLT's lassitude using cDomlette, please send it > > my way so I can have a look (and maybe find the performance bugs that I'm too > > close to see). > I will give it a try, when 0.11 is released and report back. 0.11 was released yesterday. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Fri May 4 22:39:33 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 4 May 2001 23:39:33 +0200 Subject: [XML-SIG] xmlproc bug? 
In-Reply-To: <3AF304BB.D5ECB468@zolera.com> (message from Rich Salz on Fri, 04 May 2001 15:36:27 -0400) References: <3AF304BB.D5ECB468@zolera.com> Message-ID: <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> > If you feed() a unicode string into an xmlproc parser, Python barfs at > line 234 > # ignore unusal byte orders 2143 and 3412 > elif new_data[:2] == '\xfe\xff': > enc = "utf-16-be" # with BOM > > because apparently it is trying to convert the string to unicode and > it's got 8bit characters. > > Not sure what the right thing to do is. My intuition is that feeding Unicode objects is an error, but that may be debatable. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri May 4 23:28:08 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 5 May 2001 00:28:08 +0200 Subject: [XML-SIG] Looking for XML to Python sequence code. In-Reply-To: <001b01c0d4be$b5c2a140$fd0aa8c0@CPEREZ> References: <001b01c0d4be$b5c2a140$fd0aa8c0@CPEREZ> Message-ID: <200105042228.f44MS8m02386@mira.informatik.hu-berlin.de> > I'm looking for some Python code that convert XML to a Python native > sequence object. > Does anyone know where to get it? Are you looking for a specific structure of the sequence? 
If not, try seq = open(filename).read() seq will be a Python native sequence object representing the XML document :-) Regards, Martin From noreply@sourceforge.net Sat May 5 02:06:10 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 04 May 2001 18:06:10 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI Message-ID: Bugs item #421553, was updated on 2001-05-04 18:06 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Karl Anderson (karlanderson) Assigned to: Nobody/Anonymous (nobody) Summary: stylesheet node reader requires '' NSURI Initial Comment: I'm unable to use ParsedXML's DOM as a stylesheet node, and I think it's because of a bug in StylesheetReader.py. The problem is at StylesheetReader.py line 186: if not sheet.getAttributeNS('', 'version'): raise XsltException(Error.STYLESHEET_MISSING_VERSION) ...where the NamespaceURI given to getAttributeNS is ''. This is supposed to find the namespace-free version attribute of the stylesheet documentElement, such as """ """. ParsedXML's DOM builder gives this attribute a NamespaceURI of None when we parse. I don't think that you can use the DOM methods to create a node with a NamespaceURI of "", since the NamespaceURI is supposed to be a URI reference. Is the empty string a valid URI reference? Well, maybe - the DOM level 2 rec says: """ Note that because the DOM does no lexical checking, the empty string will be treated as a real namespace URI in DOM Level 2 methods. Applications must use the value null as the namespaceURI parameter for methods if they wish to have no namespace. """ But anyway, this indicates that when using DOM creation methods, a None should be used as the NamespaceURI for namespaceless nodes such as "version", and I think that the stylesheet reader should accept that. 
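
One way a reader could accept both conventions is sketched below in present-day Python 3 with the standard library's xml.dom.minidom, which stores namespace-free attributes under a namespace URI of None. This is an illustration of the suggestion, not the StylesheetReader code; the helper name stylesheet_version is hypothetical.

```python
from xml.dom import minidom


def stylesheet_version(sheet):
    """Return the namespace-free 'version' attribute of sheet, or '' if it is absent."""
    # Try None first (the DOM Level 2 way to say "no namespace"),
    # then fall back to '' for DOMs that store it that way.
    return sheet.getAttributeNS(None, "version") or sheet.getAttributeNS("", "version")


doc = minidom.parseString(
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"/>'
)
version = stylesheet_version(doc.documentElement)
```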
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473 From larsga@garshol.priv.no Sat May 5 10:26:45 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 05 May 2001 11:26:45 +0200 Subject: [XML-SIG] xmlproc bug? In-Reply-To: <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | My intuition is that feeding Unicode objects is an error, but that may | be debatable. I see no reason why it should be. If the application is converting to Unicode itself, or if it got the data from somewhere as Unicode, there is no reason why it should not be allowed to parse those data. --Lars M. From martin@loewis.home.cs.tu-berlin.de Sat May 5 14:12:01 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 5 May 2001 15:12:01 +0200 Subject: [XML-SIG] xmlproc bug? In-Reply-To: (message from Lars Marius Garshol on 05 May 2001 11:26:45 +0200) References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> Message-ID: <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> > I see no reason why it should be. If the application is converting to > Unicode itself, or if it got the data from somewhere as Unicode, there > is no reason why it should not be allowed to parse those data. I agree in principle. However, just allowing to call feed with a Unicode object is too permissive: What if you had previously called it with a string? So if this is allowed, care should be taken that a sensible thing happens when somebody mixes byte and unicode strings (signalling a fatal error might be sensible). Regards, Martin From tpassin@home.com Sat May 5 17:11:46 2001 From: tpassin@home.com (Thomas B. 
Passin)
Date: Sat, 5 May 2001 12:11:46 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de>
Message-ID: <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com>

I've been able to get 4SuiteServer working on Windows98/Me, but it doesn't
quite work right out of the box as downloaded. Here's what I needed to do
to get it working.

I did all the steps in the installation and quickstart guide. Once it's
installed using the Windows installer, you set up your environmental
variables, then you are told to run

    4ss init

1) 4ss.bat is in the python\scripts directory, so you have to add it to your
path or have to be running in that directory.

2) The init command fails because the file "core.odl" is not installed into
the "generated" directory (or anywhere else) by the installer. I downloaded
the source distribution, found the file, and copied it into the generated
directory.

Now init works.

3) init works, but when it asks if you want to wipe out the old data, it
wants you to answer "yes" or "no". Most Windows users are used to being
able to answer 'y' or 'n' to those questions. I did, and didn't even notice
that I hadn't literally done what the prompt said. Very Unix-like. Very
unforgiving. This code should be changed to allow "y" and "n" as well.

4) The quick start guide has you run the script populate.py in the
python\docs\4SuiteServer-0.11\demo directory. But it fails, looking for a
unix file, something like /etc/mime.types. The script has a test for this
file and an except branch to run in case it doesn't exist (which it doesn't
on a Windows machine). But the except branch incorrectly has a "raise"
statement which terminates the script.

Get rid of this line, which is line 66 of populate.py. Now the script runs.

5) At this point, populate installed its downloaded files but failed when it
tried to modify "docdefs". It turns out you have to be running as superuser
to change docdefs. The guide doesn't tell you, but implies that you should
have run as the new user it just had you create. Otherwise, why create that
user just before running populate.py?

I deleted the whole "gems" container and went through the steps again as
superuser.

6) Then I tried to install and run the guestbook. You have to run the
"bootstrap.py" script in the demo\GuestBook directory. This failed. It
turned out that you have to change to the GuestBook directory and run from
that, otherwise the script can't find the files it needs.

7) The Guestbook works until you try to submit the form for your first
guest. Then it fails, but in a strange way. With IE, I got an error
message saying it couldn't find the server or there was a DNS error. This
must be an incorrect message since the form uses a relative path, but anyway
something isn't working that I haven't tracked down.

8) The docs give examples of looking at various properties by their path, as
in

    4ss show acl /localhost/index.html

None of these commands have worked for me. I had to remove the /localhost/
part. My server was running at the time.

I think there was one more change I made to get init to work - there is a
path with a unix-style "/" hardcoded somewhere - but unfortunately I forget
just where and can't find it right now. If this strikes, you should be able
to find it from the error message.

It runs now - I have it on port 8090 to avoid colliding with Zope on 8080.

Good luck!
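
For point 3 above, a more forgiving prompt could look something like this sketch in present-day Python 3. It is not the 4ss code; the function name parse_yes_no is hypothetical.

```python
def parse_yes_no(answer):
    """Map 'yes'/'y' to True and 'no'/'n' to False, ignoring case and spaces; else None."""
    normalized = answer.strip().lower()
    if normalized in ("y", "yes"):
        return True
    if normalized in ("n", "no"):
        return False
    # Unrecognized input: the caller can repeat the question instead of guessing.
    return None
```

Returning None for anything unrecognized lets the caller ask again rather than silently treating a typo as "no", which also makes the prompt more informative.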
Cheers, Tom P From Mike.Olson@fourthought.com Sat May 5 23:43:44 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sat, 05 May 2001 16:43:44 -0600 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> Message-ID: <3AF48220.B9E97B23@FourThought.com> "Thomas B. Passin" wrote: > Thomas, thanks for all of the work, I'm working today on getting these straigtened out. > > 4ss init > > 1) 4ss.bat is in the python\scripts directory, so you have to add it to your > path or have to be running in that directory. This is in the Windows Installation guide. see http://4suite.org/4Suite.org/documents/guides/4SuiteServer/Windows_Installation Towards the end of the Installing 4SuiteServer section. > > 2) The init command fails because the file "core.odl" is not installed into > the "generated" directory (or anywhere else) by the installer. I downloaded > the source distribution, found the file, and copied into the generated > directory. This was a packaging bug. We will be putting out new Windows packages today. > > Now init works. > > 3) init works, but when it asks if you want to wipe out the old data, it > wants you to answer "yes" or "no". Most Windows users are used to being > able to answer 'y' or 'n' to those questions. I did, and didn't even notice > that I hadn't literally done what the prompt said. Very Unix-like. Very > unforgiving. This code should be changed to allow "y" and "n" as well. I'll make this more forgiving, and more informative. > > 4) The quick start guide has you run the script populate.py in the > python\docs\4SuiteServer-0.11\demo directory. But it fails, looking for a > unix file, something like /etc/mime.types. 
The script has a test for this > file and an except branch to run in case it doesn't exist (which it doesn't > on a Windows machine). But the except branch incorrectly has a "raise" > statement which terminates the script. > > Get rid of this line, which is line 66 of populate.py. Now the script runs. Fixed in CVS, thanks. > > 5) At this point, populate installed its downloaded files but failed when it > tried to modify "docdefs". It turns out you have to be running as superuser > to change docdefs. The guide doesn't tell you, but implies that you should > have run as the new user it just had you create. Otherwise, why create that > user just before running populate.py? I updated the docs to say that populate needs to be run as super user. I might change it so that any user can create a document definition though. > > 6) Then I tried to install and run the guestbook. You have to run the > "bootstrap.py" script in the demo\GuestBook directory. This failed. It > turned out that you have to change to the GuestBook directory and run from > that, otherwise the script can't find the files it needs. I updated the README > > 7) The Guestbook works until you try to submit the form for your first > guest. Then it fails, but in a strange way. With IE, I got an error > message saying it couldn't find the server or there was a DNS error. This > must be an incorrect message since the form uses a relative path, but anyway > something isn't working that I haven't tracked down. I'll have to look into this one a bit closer.... > > 8) The docs give examples of looking at various properties by their path, as > in > > 4ss show acl /localhost/index.html Did you get an error of: Uri /localhost/index.html, is unknown > > None of these commands have worked for me. I had to remove the /localhost/ > part. My server was running at the time. Did it work when you removed the localhost part? Then you probably have a document in your root called index.html. Probably from the Guestbook example. 
That shouldn't put things in your root. Thanks again for your help. Hopefully it is getting easier to install. Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Sun May 6 05:02:00 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sun, 6 May 2001 00:02:00 -0400 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> Message-ID: <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com> [Tom] > > > > 8) The docs give examples of looking at various properties by their path, as > > in > > > > 4ss show acl /localhost/index.html > [Mike Olson] > Did you get an error of: > Uri /localhost/index.html, is unknown > Yes [Tom] > > > > None of these commands have worked for me. I had to remove the /localhost/ > > part. My server was running at the time. [Mike] > Did it work when you removed the localhost part? Then you probably have > a document in your root called index.html. Probably from the Guestbook > example. That shouldn't put things in your root. > No, same message with or without /localhost.
Here are two screen captures: D:>4ss show acl /localhost/gems/ "d:\program files\python\python" -c "from FtServer.Console import CommandLine; C ommandLine.Run()" show acl /localhost/gems/ 4SS User Name: dba Uri /localhost/gems, is unknown D:>4ss show acl gems/ "d:\program files\python\python" -c "from FtServer.Console import CommandLine; C ommandLine.Run()" show acl gems/ 4SS User Name: dba Resource: gems/ ---------- Read ACL: ['dba'] Write ACL: ['admin'] You can read this object You can modify this object As for an index.html in the root: D:>4ss fetch document /localhost/index.html "d:\program files\python\python" -c "from FtServer.Console import CommandLine; C ommandLine.Run()" fetch document /localhost/index.html 4SS User Name: dba Uri /localhost/index.html, is unknown I just noticed that, at http://localhost:8090/ ( my 4ss site), there is a "folder" called localhost/. Is that how it's supposed to be? If so, I'd suggest changing the name because it could get confused (by a user - me for example!) with the "localhost" alias for 127.0.0.1. Thanks for your help. Tom P From Mike.Olson@fourthought.com Sun May 6 05:13:24 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sat, 05 May 2001 22:13:24 -0600 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com> Message-ID: <3AF4CF64.A0B32045@FourThought.com> "Thomas B. Passin" wrote: > > > > I just noticed that, at http://localhost:8090/ ( my 4ss site), there is a > "folder" called localhost/. Is that how it's supposed to be? If so, I'd > suggest changing the name because it could get confused (by a user - me for > example!) with the "localhost" alias for 127.0.0.1. 
Yes, we need to change the default name of the SystemHost directory. It was less confusing when we put http in front of all of the URIs; now it is just confusing. I was thinking of calling it "etc" but I think Windows folks might not like that. Thoughts? Mike > > Thanks for your help. > > Tom P > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Sun May 6 05:26:01 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sun, 6 May 2001 00:26:01 -0400 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> Message-ID: <001001c0d5e4$a896f8a0$7cac1218@reston1.va.home.com> Another problem: the 4ss test_suite fails with this message: D:>test.py 4SS User Name: dba ==== D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite === ==== D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\Core === Traceback (innermost last): File "D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\test.py", line 29, in ? test(tester) File "D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\test.py", line 18, in test m.test(tester) File "D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\test.py", line 14, in test os.chdir(dir) OSError: [Errno 2] No such file or directory: 'Core' I ran this script from the test_suite directory. Note that I added an extra print statement to see what directory it couldn't find.
It looks as if the test.py script calls itself the second time rather than calling the test.py located in the test_suite\Core directory. I'm sure this wasn't intended. This would be a good time for me to put in a plug to make scripts that depend on knowing where other files are relative to themselves, detect their own location. You may have to make the script a module to do this reliably (I'm not fully up on all the ins and outs, but if you do it right then __file__ gives you the full path to the script). Cheers, Tom P From tpassin@home.com Sun May 6 05:29:28 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sun, 6 May 2001 00:29:28 -0400 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com> <3AF4CF64.A0B32045@FourThought.com> Message-ID: <001701c0d5e5$23fa30c0$7cac1218@reston1.va.home.com> [Mike Olson]" > "Thomas B. Passin" wrote: > > > > > > > > I just noticed that, at http://localhost:8090/ ( my 4ss site), there is a > > "folder" called localhost/. Is that how it's supposed to be? If so, I'd > > suggest changing the name because it could get confused (by a user - me for > > example!) with the "localhost" alias for 127.0.0.1. > > Yes we need to change the default name of the SystemHost directory. It > was less confusing when we put http infront of all of the URIs, now it > is just confusing. I was thinking of callint it "etc" but I think > windows folks might not like that. > [Tom] Depends on what you intend it to be for. It should have an evocative name. I see the docdefs and acl stuff in mine. How about sscfg? 
Cheers, Tom P From Mike.Olson@fourthought.com Sun May 6 09:11:06 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 06 May 2001 02:11:06 -0600 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com> <3AF4CF64.A0B32045@FourThought.com> <001701c0d5e5$23fa30c0$7cac1218@reston1.va.home.com> Message-ID: <3AF5071A.7AE1CA56@FourThought.com> "Thomas B. Passin" wrote: > > > > Yes we need to change the default name of the SystemHost directory. It > > was less confusing when we put http infront of all of the URIs, now it > > is just confusing. I was thinking of callint it "etc" but I think > > windows folks might not like that. > > > > [Tom] > Depends on what you intend it to be for. It should have an evocative name. > I see the docdefs and acl stuff in mine. How about sscfg? Maybe just system Mike > > Cheers, > > Tom P > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun May 6 14:50:42 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 06 May 2001 07:50:42 -0600 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows In-Reply-To: Message from Mike Olson of "Sun, 06 May 2001 02:11:06 MDT." <3AF5071A.7AE1CA56@FourThought.com> Message-ID: <200105061350.f46Dog503966@localhost.local> > "Thomas B. Passin" wrote: > > > > > > Yes we need to change the default name of the SystemHost directory. 
It > > > was less confusing when we put http infront of all of the URIs, now it > > > is just confusing. I was thinking of callint it "etc" but I think > > > windows folks might not like that. > > > > > > > [Tom] > > Depends on what you intend it to be for. It should have an evocative name. > > I see the docdefs and acl stuff in mine. How about sscfg? > > Maybe just system Well, I still think it should be configurable (as it used to be, if only through the host-name). Don't forget our non-english speaking friends, and others who might want a user folder called "system" For a default, I favor "4sssystem", "sys4ss" or something like that which is unlikely to clash with user need. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Sun May 6 15:17:36 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sun, 6 May 2001 10:17:36 -0400 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <200105061350.f46Dog503966@localhost.local> Message-ID: <000d01c0d637$4cfd7c00$7cac1218@reston1.va.home.com> [Uche Ogbuji] > Well, I still think it should be configurable (as it used to be, if only > through the host-name). Don't forget our non-english speaking friends, and > others who might want a user folder called "system" > > For a default, I favor "4sssystem", "sys4ss" or something like that which is > unlikely to clash with user need. > Right on. Tom P From Mike.Olson@fourthought.com Sun May 6 19:24:22 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 06 May 2001 12:24:22 -0600 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <200105061350.f46Dog503966@localhost.local> Message-ID: <3AF596D6.5AACFD17@FourThought.com> Uche Ogbuji wrote: > > > "Thomas B. 
Passin" wrote: > > > > > > > > Yes we need to change the default name of the SystemHost directory. It > > > > was less confusing when we put http infront of all of the URIs, now it > > > > is just confusing. I was thinking of callint it "etc" but I think > > > > windows folks might not like that. > > > > > > > > > > [Tom] > > > Depends on what you intend it to be for. It should have an evocative name. > > > I see the docdefs and acl stuff in mine. How about sscfg? > > > > Maybe just system > > Well, I still think it should be configurable (as it used to be, if only > through the host-name). Don't forget our non-english speaking friends, and > others who might want a user folder called "system" It would still be configurable through the SystemHost parameters. Maybe this should be renamed to the SystemContainer parameter. > > For a default, I favor "4sssystem", "sys4ss" or something like that which is > unlikely to clash with user need. i like sys4ss then. Mike > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Sun May 6 20:05:01 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sun, 6 May 2001 15:05:01 -0400 Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows References: <200105061350.f46Dog503966@localhost.local> <3AF596D6.5AACFD17@FourThought.com> Message-ID: <002901c0d65f$73fbe8a0$7cac1218@reston1.va.home.com> [Mike Olson] > > For a default, I favor "4sssystem", "sys4ss" or something like that which is > > unlikely to clash with user need. > > i like sys4ss then. > Suits me (or is that "4suites me"???) 
Tom P From noreply@sourceforge.net Mon May 7 10:46:17 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 07 May 2001 02:46:17 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421978 ] pDomlette reader bug Message-ID: Bugs item #421978, was updated on 2001-05-07 02:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421978&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: pDomlette reader bug Initial Comment: Hi, I'm trying to build a pDomlette with a custom Sax parser, and it looks like the provided handler expects the parser to implement a SetBase method. I could not find it in the Sax documentation. Providing an empty SetBase() method leads to errors when accessing to parseFile() (instead of parse()), and further errors in the except clause. Did I miss something or is this a bug? Alexandre Fayolle ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421978&group_id=6473 From larsga@garshol.priv.no Mon May 7 13:27:47 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 07 May 2001 14:27:47 +0200 Subject: [XML-SIG] xmlproc bug? In-Reply-To: <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> Message-ID: * Lars Marius Garshol | | I see no reason why it should be. If the application is converting | to Unicode itself, or if it got the data from somewhere as Unicode, | there is no reason why it should not be allowed to parse those data. * Martin v. Loewis | | I agree in principle. However, just allowing to call feed with a | Unicode object is too permissive: What if you had previously called it | with a string? Good point. 
One should have to stick to either Unicode or byte strings throughout a single parse. Looking at the code I think it makes sense to require client code to also be consistent in its use of the 'decoded' flag. That is, decoded should always have the same value throughout an entire parse. | So if this is allowed, It is allowed now, since I've committed my change. | care should be taken that a sensible thing happens when somebody | mixes byte and unicode strings (signalling a fatal error might be | sensible). I agree. I am working on the modification now and will commit it shortly. --Lars M. From akrug@mps.de Mon May 7 17:31:53 2001 From: akrug@mps.de (Arne Krug) Date: Mon, 7 May 2001 18:31:53 +0200 Subject: [XML-SIG] sample code - msxml Message-ID: <3AF6EA19.6350.2FF8E8@localhost> Does anyone have sample code for using the SAXXMLReader of the Microsoft parser MSXML in Python? arne --- Arne Krug: --- --- ufcx@rz.uni-karlsruhe.de --- --- akrug@mps.de --- From uche.ogbuji@fourthought.com Tue May 8 00:16:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 07 May 2001 17:16:34 -0600 Subject: [XML-SIG] Curiouser and curiouser Message-ID: <200105072316.f47NGYU31415@localhost.local> Quote from anonymous source in http://xmlhack.com/read.php?item=1203 "The charter of the XML Protocols WG isn't to invent anything new." I don't know how solid this particular source is, but this comment would seem to support Rich Salz's position in our debate from last week. However, one of the secret weapons I had in that debate was that I'd happened to attend the W3C Web Services workshop last month, and I can certainly say that the above quote completely contradicts every sense I got from that meeting. I think the politics of XML protocols and Web services will be white hot. It might even be a bloodier battlefield than the notorious XML Schemas.
The camps appear to be roughly: * Just use SOAP as-is and rubber-stamp WSDL and UDDI to boot (the camp that seems to be represented in the above quote) * Take the good parts of SOAP, mix in a bit of "transactions" here, a dash of PKI there, a smidgen of EAI voodoo, and... * This is EDI + Internet transport + XML payload + semantic Web, folks: quit reinventing wheels (the camp I occupy) I think I can say from first hand that all camps have powerful adherents. Don't ask me what the hell this means for Python efforts... -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rsalz@zolera.com Tue May 8 04:59:00 2001 From: rsalz@zolera.com (Rich Salz) Date: Mon, 07 May 2001 23:59:00 -0400 Subject: [XML-SIG] Curiouser and curiouser References: <200105072316.f47NGYU31415@localhost.local> Message-ID: <3AF76F04.E38CC24D@zolera.com> > "The charter of the XML Protocols WG isn't to invent anything new." According to the XP charter (http://www.w3.org/2000/09/XML-Protocol-Charter), "The Working group shall start by developing a requirements document, and then evaluate the technical solutions proposed in the SOAP/1.1 submission against these requirements. If in this process the Working Group finds solutions that are agreed to be improvements over solutions suggested by SOAP 1.1, those improved solutions should be used." Now I find that phrase "agreed to be improvements" rife with all sorts of potential. Certainly one could make a case that a new preferred encoding that is non-interoperable with the deployed base of Sec 5 encodings is NOT an improvement, overall. :) I knew you were at the WS workshop, and that I was basing my opinions solely on the public record, but that's okay. 
I've served my time in standards activities and consortia, and I can hazard a guess as to what will happen. The same thing that always happens: folks want holes put in so they can plug in their own "embrace and extend" or "optimized" version of the current protocol. Well, since the encodings are specified by namespace, the holes are already there. :) So XP will tighten up the wording, remove ambiguity, and not break interop. > I think the politics of XML protocols and Web services will be white hot. I don't disagree. > The camps appear to be roughly: Interesting analysis, thanks! > * Just use SOAP as-is and rubber-stamp WSDL and UDDI to boot ... > * Take the good parts of SOAP, mix in a bit of "transactions" here, a dash of PKI there, a smidgen of EAI voodoo, and... These aren't mutually exclusive, since #2 is presumably a subset of #1. As a security expert, I question the need for signed soap, especially in the presence of actors. I think applications will want to do their own signing/encryption. > * This is EDI + Internet transport + XML payload + semantic Web, folks: quit reinventing wheels (the camp I occupy) I got a bit lost in your sentence syntax. Can you explain what you mean here? Tnx. > Don't ask me what the hell this means for Python efforts... Quoting an old colleague "with freedom comes choices, and with choices comes more lines of code." :) /r$ From laurent_fontanel@globalcrossing.com Tue May 8 17:22:06 2001 From: laurent_fontanel@globalcrossing.com (Laurent Fontanel) Date: Tue, 08 May 2001 12:22:06 -0400 Subject: [XML-SIG] Re: sample code - msxml References: Message-ID: <3AF81D2E.9EF58A50@globalcrossing.com> This is a multi-part message in MIME format. --------------5BD27586D3B9A79D098870E9 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Arne, I've never used MSXML to process XML SAX-style, but I've used it to apply XSL stylesheets. 
It's really easy with the win32com interface:

    import win32com.client

    def xml2htm(xmlFile):
        # Load the source document synchronously. "async" is a reserved
        # word in Python 3.7+, so the property is set through setattr().
        source = win32com.client.Dispatch("Microsoft.xmldom")
        setattr(source, "async", False)
        source.load(xmlFile)
        # Load the stylesheet the same way.
        style = win32com.client.Dispatch("Microsoft.xmldom")
        setattr(style, "async", False)
        style.load("mystylesheet.xsl")
        # transformNode() returns the result of the transformation as a string.
        return source.transformNode(style)

    if __name__ == '__main__':
        print(xml2htm("myfile.xml"))

Note also that after source.load(), you can manipulate the whole document tree using DOM calls, which is pretty neat. Hope this helps, Laurent. --------------5BD27586D3B9A79D098870E9 Content-Type: text/x-vcard; charset=us-ascii; name="laurent_fontanel.vcf" Content-Transfer-Encoding: 7bit Content-Description: Card for Laurent Fontanel Content-Disposition: attachment; filename="laurent_fontanel.vcf" begin:vcard n:Fontanel;Laurent tel;work:(716) 777-2752 x-mozilla-html:TRUE org:;Systems and Product Development adr:;;180 S. Clinton Ave.;Rochester;NY;14646; version:2.1 email;internet:laurent_fontanel@globalcrossing.com fn:Laurent Fontanel end:vcard --------------5BD27586D3B9A79D098870E9-- From karl@digicool.com Tue May 8 21:12:23 2001 From: karl@digicool.com (Karl Anderson) Date: 08 May 2001 13:12:23 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: noreply@sourceforge.net's message of "Fri, 04 May 2001 18:06:10 -0700" References: Message-ID: Can anyone shine some light on which DOM implementation is right here? After parsing an attribute with no namespace prefix, what namespace URIs should it be possible to retrieve that attribute with? For example, after parsing "" in a namespace aware way, which should return "1.0": getAttributeNS(None, 'version') getAttributeNS('', 'version') Only the URI of '' works for Domlette. Only the URI of None works for ParsedXML.
I think that ParsedXML's restriction is morally better because of this line from the DOM rec: > Note that because the DOM does no lexical checking, the > empty string > will be treated as a real namespace URI in DOM Level 2 > methods. > Applications must use the value null as the > namespaceURI parameter for > methods if they wish to have no namespace. OTOH, I've lost arguments when it was pointed out that you don't have to use DOM methods when you're parsing, and in fact can't parse everything if you're restricted to them. OTOH again, using None would make parsing consistent with setting namespaceless names using DOM methods. ParsedXML doesn't work for the XSLT modules in the current PyXML checkout because they use '' as the NSURI to use to retrieve NSless attributes. Should ParsedXML allow names parsed without a NS to be retrieved with a NSURI of '' as well as None? Should Domlette allow None? Should None be used in getAttributeNS calls like these, regardless? noreply@sourceforge.net writes: > Bugs item #421553, was updated on 2001-05-04 18:06 > You can respond by visiting: > http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473 > > Category: 4Suite > Group: None > Status: Open > Resolution: None > Priority: 5 > Submitted By: Karl Anderson (karlanderson) > Assigned to: Nobody/Anonymous (nobody) > Summary: stylesheet node reader requires '' NSURI > > Initial Comment: > > I'm unable to use ParsedXML's DOM as a stylesheet node, > and I think > it's because of a bug in StylesheetReader.py. > > The problem is at StylesheetReader.py line 186: > > if not sheet.getAttributeNS('', 'version'): > raise > XsltException(Error.STYLESHEET_MISSING_VERSION) > > ...where the NamespaceURI given to getAttributeNS is > ''. This is > supposed to find the namespace-free version attribute > of the > stylesheet documentElement, such as > """ > xmlns:xsl="http://www.w3.org/1999/XSL/Transform" > version="1.0"> > """. 
> > ParsedXML's DOM builder gives this attribute a > NamespaceURI of None > when we parse. > > I don't think that you can use the DOM methods to > create a node with a > NamespaceURI of "", since the NamespaceURI is supposed > to be a URI > reference. Is the empty string a valid URI reference? > Well, maybe - > the DOM level 2 rec says: > """ > Note that because the DOM does no lexical checking, the > empty string > will be treated as a real namespace URI in DOM Level 2 > methods. > Applications must use the value null as the > namespaceURI parameter for > methods if they wish to have no namespace. > """ > But anyway, this indicates that when using DOM creation > methods, a > None should be used as the NamespaceURI for > namespaceless nodes such > as "version", and I think that the stylesheet reader > should accept > that. > > > ---------------------------------------------------------------------- > > You can respond by visiting: > http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473 > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Karl Anderson karl@digicool.com From fdrake@acm.org Tue May 8 21:15:29 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 8 May 2001 16:15:29 -0400 (EDT) Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: References: Message-ID: <15096.21473.817904.541038@cj42289-a.reston1.va.home.com> Karl Anderson writes: > Can anyone shine some light on which DOM implementation is right here? > After parsing an attribute with no namespace prefix, what namespace > URIs should it be possible to retrieve that attribute with? > > For example, after parsing "" in a namespace > aware way, which should return "1.0": > > getAttributeNS(None, 'version') > getAttributeNS('', 'version') The former is correct according to past discussions in this mailing list. 
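The None convention can be sketched with xml.dom.minidom from the standard library. This is only a stand-in (it is neither Domlette nor ParsedXML), and the one-element stylesheet is made up for the illustration:

```python
from xml.dom import minidom

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# A made-up, minimal stylesheet element with an unprefixed "version" attribute.
doc = minidom.parseString(
    '<xsl:stylesheet xmlns:xsl="%s" version="1.0"/>' % XSLT_NS
)
sheet = doc.documentElement

# The unprefixed attribute is in no namespace; None is how the Python
# DOM binding spells "no namespace".
version = sheet.getAttributeNS(None, "version")
print(version)
```

With minidom the attribute node itself also reports a namespaceURI of None, which matches what ParsedXML is described as doing above.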
> Only the URI of '' works for Domlette. Only the URI of None works for > ParsedXML. I think that ParsedXML's restriction is morally better > because of this line from the DOM rec: Domlette is broke! > OTOH, I've lost arguments when it was pointed out that you don't have > to use DOM methods when you're parsing, and in fact can't parse > everything if you're restricted to them. OTOH again, using None would > make parsing consistent with setting namespaceless names using DOM > methods. Using None would be the right thing because that's the Python DOM binding. > ParsedXML doesn't work for the XSLT modules in the current PyXML > checkout because they use '' as the NSURI to use to retrieve NSless > attributes. That stinks! > Should ParsedXML allow names parsed without a NS to be retrieved > with a NSURI of '' as well as None? Should Domlette allow None? > Should None be used in getAttributeNS calls like these, regardless? Only None needs to be supported as an indication of "no namespace"; "" is different. (And probably broken.) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Tue May 8 21:53:02 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 08 May 2001 14:53:02 -0600 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: Message from "Fred L. Drake, Jr." of "Tue, 08 May 2001 16:15:29 EDT." <15096.21473.817904.541038@cj42289-a.reston1.va.home.com> Message-ID: <200105082053.f48Kr2K10012@localhost.local> > > Karl Anderson writes: > > Can anyone shine some light on which DOM implementation is right here? > > After parsing an attribute with no namespace prefix, what namespace > > URIs should it be possible to retrieve that attribute with? 
> > > > For example, after parsing "" in a namespace > > aware way, which should return "1.0": > > > > getAttributeNS(None, 'version') > > getAttributeNS('', 'version') > > The former is correct according to past discussions in this mailing > list. Yes, and I was a proponent of the former as well, but we just haven't had a chance to go throughout XSLT and make the needed changes. It's on our to-do list, but any contributed patches can make this happen more quickly. To be clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath and 4XSLT that will eat up the sweat equity. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From dieter@handshake.de Tue May 8 22:21:06 2001 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 8 May 2001 23:21:06 +0200 (CEST) Subject: [XML-SIG] 4xslt: bug and patch: variable import order Message-ID: <15096.25410.753829.204197@lindm.dm> --Multipart_Tue_May__8_23:21:06_2001-1 Content-Type: text/plain; charset=US-ASCII The XSLT spec specifies that definitions and template rules in an importing stylesheet take precedence over those from an imported stylesheet. This is essential for easy customization of imported stylesheets. "4xslt" implements this feature only partially: Top level variables in an importing stylesheet do not take precedence over imported ones. The attached patch hopefully fixes the problem. It ensures that variables in importing style sheets take precedence over those defined in imported style sheets and that all style sheets use the same top level variables. 
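The intended precedence can be sketched with plain dictionaries. This is an illustration only, with a made-up function name and made-up variable bindings; it is not the actual 4XSLT code:

```python
def merge_top_level_vars(main_bindings, imported_bindings_list):
    """Merge top-level variable bindings so the importing stylesheet wins."""
    merged = {}
    # Imported stylesheets go in first, so they have the lowest precedence;
    # later entries in the list override earlier ones.
    for bindings in imported_bindings_list:
        merged.update(bindings)
    # The importing (main) stylesheet is applied last and overrides them all.
    merged.update(main_bindings)
    return merged


# Hypothetical bindings: the main stylesheet customizes "title" only.
imported = [{"title": "Imported default", "lang": "en"}]
main = {"title": "Customized title"}
merged = merge_top_level_vars(main, imported)
print(merged["title"])
```

Sharing the single merged dictionary between the main and imported stylesheets is what makes all of them see the same top-level variables.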
Dieter --Multipart_Tue_May__8_23:21:06_2001-1 Content-Type: application/octet-stream Content-Disposition: attachment; filename="var_import_order.pat" Content-Transfer-Encoding: 7bit --- :Stylesheet.py Thu May 3 01:29:05 2001 +++ Stylesheet.py Tue May 8 23:19:29 2001 @@ -398,8 +398,16 @@ self._primedContext = context #Note: key expressions can't have var refs, so we needn't worry about imports self._updateKeys(contextNode, processor) + # DM: imported variables have lower precedence than that from + # the main style sheet. + d= {} for imp in self._imports: - self._primedContext.varBindings.update(imp.stylesheet._primedContext.varBindings) + d.update(imp.stylesheet._primedContext.varBindings) + d.update(self._primedContext.varBindings) + self._primedContext.varBindings= d + # DM: all use the same set of top level variables + for imp in self._imports: + imp.stylesheet._primedContext.varBindings= d return topLevelParams --Multipart_Tue_May__8_23:21:06_2001-1-- From stuff4gary@hotmail.com Tue May 8 23:52:28 2001 From: stuff4gary@hotmail.com (gary cor) Date: Tue, 08 May 2001 22:52:28 Subject: [XML-SIG] What are the limits of soap and python? Message-ID: I am pretty confused by the SOAP discussion (it hasn't any connection with the operas browser movement has it!!). I imagine it is like OCX, Dynamic Data Exchange, Windows Script, applescript !! Can it push buttons on the system and fill out text fields, with text and automate through applications? Can I set it up hot folders with it to send pictures through photoshop, OCR and databases?? is that even possible in python? Many thanks for any simple explanations! Gary _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. 
From karl@digicool.com Wed May 9 00:10:21 2001 From: karl@digicool.com (Karl Anderson) Date: 08 May 2001 16:10:21 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: Uche Ogbuji's message of "Tue, 08 May 2001 14:53:02 -0600" References: <200105082053.f48Kr2K10012@localhost.local> Message-ID: Uche Ogbuji writes: [*NS('', ...)] > Yes, and I was a proponent of the former as well, but we just haven't had a > chance to go throughout XSLT and make the needed changes. It's on our to-do > list, but any contributed patches can make this happen more quickly. To be > clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath > and 4XSLT that will eat up the sweat equity. Well, a quick grep-find shows that they're all in XSLT, and they're all getAttributeNS or setAttributeNS calls with actual empty strings, nothing fancy. Are there tests for 4XSLT? My install from a PyXML checkout didn't install any, and I'm an XSLT newbie, so my testing is pretty limited. I could supply patches, would they be useful without real testing at this stage of 4XSLT development? I'd love for this to be usable with our DOM. -- Karl Anderson karl@digicool.com From uche.ogbuji@fourthought.com Wed May 9 00:18:30 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 08 May 2001 17:18:30 -0600 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: Message from Karl Anderson of "08 May 2001 16:10:21 PDT." Message-ID: <200105082318.f48NIVb19293@localhost.local> > Uche Ogbuji writes: > > [*NS('', ...)] > > > Yes, and I was a proponent of the former as well, but we just haven't had a > > chance to go throughout XSLT and make the needed changes. It's on our to-do > > list, but any contributed patches can make this happen more quickly. To be > > clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath > > and 4XSLT that will eat up the sweat equity. 
> > Well, a quick grep-find shows that they're all in XSLT, and they're > all getAttributeNS or setAttributeNS calls with actual empty strings, > nothing fancy. > > Are there tests for 4XSLT? My install from a PyXML checkout didn't > install any, and I'm an XSLT newbie, so my testing is pretty limited. The test suite is in the documentation directory (e.g. /usr/doc/4Suite-0.11.1a0/test_suite/4XSLT on my machine) There are 160 test scripts, many of which have multiple test each. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Wed May 9 03:08:53 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 08 May 2001 19:08:53 -0700 Subject: [XML-SIG] [ pyxml-Bugs-422528 ] can't import xpath Message-ID: Bugs item #422528, was updated on 2001-05-08 19:08 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=422528&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: can't import xpath Initial Comment: Installed PyXML 0.6.5 and 4Suite checkout with setup.py install. Can't import xpath, or run the test suites: >>> import xml.xpath Traceback (innermost last): File "", line 1, in ? File "/usr/lib/python1.5/site-packages/xml/xpath/__init__.py", line 107, in ? import Context, XPathParser File "/usr/lib/python1.5/site-packages/xml/xpath/Context.py", line 16, in ? import CoreFunctions File "/usr/lib/python1.5/site-packages/xml/xpath/CoreFunctions.py", line 18, in ? 
from xml.xpath import ExpandedNameWrapper ImportError: cannot import name ExpandedNameWrapper ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=422528&group_id=6473 From karl@digicool.com Wed May 9 03:27:56 2001 From: karl@digicool.com (Karl Anderson) Date: 08 May 2001 19:27:56 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: Uche Ogbuji's message of "Tue, 08 May 2001 17:18:30 -0600" References: <200105082318.f48NIVb19293@localhost.local> Message-ID: Uche Ogbuji writes: > > Are there tests for 4XSLT? My install from a PyXML checkout didn't > > install any, and I'm an XSLT newbie, so my testing is pretty limited. > > The test suite is in the documentation directory (e.g. > /usr/doc/4Suite-0.11.1a0/test_suite/4XSLT on my machine) Oh, there's my problem, I was using a PyXML checkout. I admit that I'm unclear about the relationship between 4Suite and PyXML - I thought that once a module was added to PyXML, that checking out PyXML would give you sufficiently bleeding edge code to develop with and prod for bugs. I'm also looking for the most vanilla version that I can tell users to install and use with our code, when appropriate. Once a module from 4Suite is added to PyXML, is the PyXML version a checkout from the 4Suite CVS tree? Or is development moved to the PyXML tree? Why aren't the test suites part of PyXML? Do they rely on more of 4Suite? 
-- Karl Anderson karl@digicool.com From Mike.Olson@fourthought.com Wed May 9 05:31:58 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 08 May 2001 22:31:58 -0600 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI References: <200105082053.f48Kr2K10012@localhost.local> Message-ID: <3AF8C83E.35A48C05@FourThought.com> Karl Anderson wrote: > > Uche Ogbuji writes: > > [*NS('', ...)] > > > Yes, and I was a proponent of the former as well, but we just haven't had a > > chance to go throughout XSLT and make the needed changes. It's on our to-do > > list, but any contributed patches can make this happen more quickly. To be > > clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath > > and 4XSLT that will eat up the sweat equity. > > Well, a quick grep-find shows that they're all in XSLT, and they're > all getAttributeNS or setAttributeNS calls with actual empty strings, > nothing fancy. Nope. XPath uses these in the ParsedAxisSpecified, and decent hand full of functions. You would need to fix these as well. > > Are there tests for 4XSLT? My install from a PyXML checkout didn't > install any, and I'm an XSLT newbie, so my testing is pretty limited. You might need to install 4Suite to get the tests but I'm not sure. > > -- > Karl Anderson karl@digicool.com > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Wed May 9 07:59:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Wed, 9 May 2001 08:59:16 +0200 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: (message from Karl Anderson on 08 May 2001 16:10:21 -0700) References: <200105082053.f48Kr2K10012@localhost.local> Message-ID: <200105090659.f496xGN00940@mira.informatik.hu-berlin.de> > Well, a quick grep-find shows that they're all in XSLT, and they're > all getAttributeNS or setAttributeNS calls with actual empty strings, > nothing fancy. > > Are there tests for 4XSLT? My install from a PyXML checkout didn't > install any, and I'm an XSLT newbie, so my testing is pretty limited. I think a major source of confusion is the assumption that the xpath/xslt directories, as checked out from PyXML CVS at the moment, are good for any purpose. This is not the case: they don't work, and we know it. If you want to *use* 4XSLT, you should install 4Suite, and not install the xpath/xslt directories from PyXML (indeed, unless you modify setup.py, they won't be installed). > I could supply patches, would they be useful without real testing at > this stage of 4XSLT development? That said, if you want to contribute patches to make the xpath/xslt packages useful, they are always appreciated. Of course, since you are new to these packages, you might first want to look at how they are supposed to function in 4Suite before fixing them in PyXML. As for test suites: 4Suite does include test suites for these packages. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed May 9 08:08:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Wed, 9 May 2001 09:08:00 +0200 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: (message from Karl Anderson on 08 May 2001 19:27:56 -0700) References: <200105082318.f48NIVb19293@localhost.local> Message-ID: <200105090708.f49780n00963@mira.informatik.hu-berlin.de> > I admit that I'm unclear about the relationship between 4Suite and > PyXML - I thought that once a module was added to PyXML, that > checking out PyXML would give you sufficiently bleeding edge code to > develop with and prod for bugs. It is absolutely bleeding edge, yes, and bug reports are welcome. However, until PyXML is released with these packages, you should not assume that they actually work. Indeed, one possible scenario is that the next PyXML release does *not* included these subdirectories. > Once a module from 4Suite is added to PyXML, is the PyXML version a > checkout from the 4Suite CVS tree? Or is development moved to the > PyXML tree? Neither, nor. 4XSLT uses a different XPath expression parser than the copy in PyXML; the 4XSLT one is based on BisonGen/SWIG/bison/flex; the PyXML one (dubbed PyXPath) uses Yapps/(s)re. The port to the other parser, as well as other DOM implementations, is not complete. > Why aren't the test suites part of PyXML? A number of reasons. First of all, Fourthought has not contributed them (although they might if asked). Then, the tests do require a 4Suite installation at the moment. Finally, the tests don't pass without modifications; I'd like to minimize the necessary changes before incorporating tests. 
Regards, Martin From noreply@sourceforge.net Wed May 9 14:41:02 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 09 May 2001 06:41:02 -0700 Subject: [XML-SIG] [ pyxml-Patches-422641 ] NameError in RilParserImp.py Message-ID: Patches item #422641, was updated on 2001-05-09 06:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422641&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Alexandre Fayolle (afayolle) Assigned to: Nobody/Anonymous (nobody) Summary: NameError in RilParserImp.py Initial Comment: The parser uses undefined constants to report errors. The attached patch adds definitions of these constants (and uses them properly). Cheers Alexandre Fayolle ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422641&group_id=6473 From uche.ogbuji@fourthought.com Wed May 9 15:02:52 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 09 May 2001 08:02:52 -0600 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: Message from "Martin v. Loewis" of "Wed, 09 May 2001 09:08:00 +0200." <200105090708.f49780n00963@mira.informatik.hu-berlin.de> Message-ID: <200105091402.f49E2qG06632@localhost.local> > > Why aren't the test suites part of PyXML? > > A number of reasons. First of all, Fourthought has not contributed > them (although they might if asked). Of course. > Then, the tests do require a 4Suite installation at the moment. We have discussed making them use PyUnit, but this, like all other such good intentions, is obstructed by time limitations. > Finally, the tests don't pass > without modifications; I'd like to minimize the necessary changes > before incorporating tests. We're hammering at the test suites all the while to fix and tweak them into submission.
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Wed May 9 15:13:42 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 09 May 2001 08:13:42 -0600 Subject: [XML-SIG] Curiouser and curiouser In-Reply-To: Message from Rich Salz of "Mon, 07 May 2001 23:59:00 EDT." <3AF76F04.E38CC24D@zolera.com> Message-ID: <200105091413.f49EDgM06672@localhost.local> > > The camps appear to be roughly: > > Interesting analysis, thanks! > > > * Just use SOAP as-is and rubber-stamp WSDL and UDDI to boot ... > > * Take the good parts of SOAP, mix in a bit of "transactions" here, a dash of PKI there, a smidgen of EAI voodoo, and... > > These aren't mutually exclusive, since #2 is presumably a subset of #1. No. Some want to change SOAP, which is different from #1. Also while the #1 folks want to call it a day when WSDL and UDDI are stabilized, the #2 folk want much more. > > * This is EDI + Internet transport + XML payload + semantic Web, folks: quit reinventing wheels (the camp I occupy) > > I got a bit lost in your sentence syntax. Can you explain what you mean > here? Tnx. Basically: * take the business process, internationalization and authority-of-record work hammered out in EDI. * Use Internet transport (HTTP/SMTP) rather than VAN/BBS, Use XML as the payload for human readability, inexpensive app integration and extensibility * Use a unified structured meta-data model for description and modeling. I think this is the most attainable and viable approach to XML-based business transactions, mostly because it avoids reinventing wheels as much as possible. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rsalz@zolera.com Wed May 9 15:39:47 2001 From: rsalz@zolera.com (Rich Salz) Date: Wed, 09 May 2001 10:39:47 -0400 Subject: [XML-SIG] Curiouser and curiouser References: <200105091413.f49EDgM06672@localhost.local> Message-ID: <3AF956B3.1146CC7C@zolera.com> > * take the business process, internationalization and authority-of-record work > hammered out in EDI. > * Use Internet transport (HTTP/SMTP) rather than VAN/BBS, Use XML as the > payload for human readability, inexpensive app integration and extensibility > * Use a unified structured meta-data model for description and modeling. So you must be a big fan of ebXML. Me, too. :) From uche.ogbuji@fourthought.com Wed May 9 16:18:38 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 09 May 2001 09:18:38 -0600 Subject: [XML-SIG] Curiouser and curiouser In-Reply-To: Message from Rich Salz of "Wed, 09 May 2001 10:39:47 EDT." <3AF956B3.1146CC7C@zolera.com> Message-ID: <200105091518.f49FIcg07346@localhost.local> > > * take the business process, internationalization and authority-of-record work > > hammered out in EDI. > > * Use Internet transport (HTTP/SMTP) rather than VAN/BBS, Use XML as the > > payload for human readability, inexpensive app integration and extensibility > > * Use a unified structured meta-data model for description and modeling. > > So you must be a big fan of ebXML. > > Me, too. :) You got it. I should rather clarify that I prefer ebXML to the WSDL/UDDI camp, because they are standing on the shoulders of the EDI giants. I have a *great* deal of respect for EDI in general, and I think that the main problem with it was the unfortunate power that the main VANs such as GEIS, Sterling and Harbinger acquired, which strangled innovation and evolution.
I think that the UDDI camp's insistence on reinventing it all is horrid form, and at the WSWS it looked to me as if the impetus behind reinventing it all was for each vendor to make as much of a land grab as possible on B2B-next-generation. I have no problem with the profit motive, but hypocrisy sucks. BTW, any luck on setting up that Web services SIG? We're pretty close to off-topic in this discussion, but I'd like it to continue, especially with regard to coordinating Python efforts in Web services. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rsalz@zolera.com Wed May 9 16:44:05 2001 From: rsalz@zolera.com (Rich Salz) Date: Wed, 09 May 2001 11:44:05 -0400 Subject: [XML-SIG] Curiouser and curiouser References: <200105091518.f49FIcg07346@localhost.local> Message-ID: <3AF965C5.73BF4B@zolera.com> > BTW, any luck on setting up that Web services SIG? We're pretty close to > off-topic in this discussion, but I'd like it to continue, especially with > regard to coordinating Python efforts in Web services. Sending a "can we create it now" note to the meta-sig was on my todo list. I'll send it now. /r$ From Alexandre.Fayolle@logilab.fr Wed May 9 16:56:58 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 9 May 2001 17:56:58 +0200 (CEST) Subject: [XML-SIG] clarification request about Sax/Sax2 mappings Message-ID: Hello, I would appreciate it if someone could provide information about the Sax/Sax2 interface in pyxml (or provide a pointer to some documentation). Specifically, my understanding is that the prototype of the startElement method of the ContentHandler interface in Sax2 is supposed to take 4 arguments (nsUri, localName, qName, attributes).
However, in xml.sax.handler, ContentHandler's startElement method has the same prototype as xml.sax.saxlib's DocumentHandler (which should be used with a SAX 1 parser), i.e. name, attributes. I'm trying to write a parser for a non-xml document, that should behave as a sax parser for the external world, especially to the various DOM reader classes available around here. Some of these seem to be expecting calls to startElementNS (I'm thinking specifically of FT's pDomletteReader), with a signature similar to Java's SAX2 ContentHandler.startElement method. Any help appreciated. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From larsga@garshol.priv.no Wed May 9 17:23:57 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 09 May 2001 18:23:57 +0200 Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: References: Message-ID: * Alexandre Fayolle | | Specifically, my understanding is that the prototype of the startElement | method of the ContentHandler interface in Sax2 is supposed to take 4 | arguments (nsUri, localName, qName, attributes). This is not correct. In SAX 2.0 there are two startElement methods: startElement(name, attributes) startElementNS(name, qname, attributes) In the latter, name is a (nsuri, localname) tuple. | However, in xml.sax.handler, ContentHandler's startElement method | has the same prototype as xml.sax.saxlib's DocumentHandler (which | should be used with a SAX 1 parser), i.e. name, attributes. That is correct. This is used when the XML processor is not in namespace mode. | I'm trying to write a parser for a non-xml document, that should behave as | a sax parser for the external world, especially to the various DOM reader | classes available around here. 
Some of these seem to be expecting calls to | startElementNS (I'm thinking specifically of FT's pDomletteReader), with | a signature similar to Java's SAX2 ContentHandler.startElement method. A good DOM builder should accept calls to both startElement and startElementNS. It should also require applications to be consistent and only use one or the other throughout a single document. I hope this helps. --Lars M. From noreply@sourceforge.net Wed May 9 17:29:00 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 09 May 2001 09:29:00 -0700 Subject: [XML-SIG] [ pyxml-Patches-422689 ] RIL parser fixes Message-ID: Patches item #422689, was updated on 2001-05-09 09:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422689&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Alexandre Fayolle (afayolle) Assigned to: Nobody/Anonymous (nobody) Summary: RIL parser fixes Initial Comment: The RilParser class makes some strange calls to construct new Predicate classes, some of which do not exist. Here's an attempt to fix this. Not much tested. Please examine carefully before applying. (the diff is against a version patched with the patch I submitted earlier today). Cheers Alexandre Fayolle ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422689&group_id=6473 From fdrake@acm.org Wed May 9 17:35:23 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 9 May 2001 12:35:23 -0400 (EDT) Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: References: Message-ID: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > A good DOM builder should accept calls to both startElement and > startElementNS. It should also require applications to be consistent > and only use one or the other throughout a single document. 
This is not clear; does the DOM specification indicate that only one or the other can be used? I think it seems very careful to indicate that both can be used, as long as expectations are limited. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From Alexandre.Fayolle@logilab.fr Wed May 9 17:47:38 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 9 May 2001 18:47:38 +0200 (CEST) Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: Message-ID: On 9 May 2001, Lars Marius Garshol wrote: > > * Alexandre Fayolle > | > | Specifically, my understanding is that the prototype of the startElement > | method of the ContentHandler interface in Sax2 is supposed to take 4 > | arguments (nsUri, localName, qName, attributes). > > This is not correct. In SAX 2.0 there are two startElement methods: > > startElement(name, attributes) > startElementNS(name, qname, attributes) > > In the latter, name is a (nsuri, localname) tuple. I use http://www.megginson.com/SAX/Java/Javadoc/ as a reference. In this documentation, the ContentHandler interface has no startElementNS method, only startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, Attributes atts). If I'm not using the right reference, could someone please give me a pointer to the right one. Otherwise, I do not understand where the startElementNS method comes from. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From fdrake@acm.org Wed May 9 17:59:03 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 9 May 2001 12:59:03 -0400 (EDT) Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: References: Message-ID: <15097.30551.770652.554305@cj42289-a.reston1.va.home.com> Alexandre Fayolle writes: > If I'm not using the right reference, could someone please give me a > pointer to the right one. 
Otherwise, I do not understand where the > startElementNS method comes from. Documentation for the Python SAX2 bindings is given in the Python Library Reference: http://www.python.org/doc/current/lib/markup.html -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From Alexandre.Fayolle@logilab.fr Wed May 9 18:17:18 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 9 May 2001 19:17:18 +0200 (CEST) Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: <15097.30551.770652.554305@cj42289-a.reston1.va.home.com> Message-ID: On Wed, 9 May 2001, Fred L. Drake, Jr. wrote: > Documentation for the Python SAX2 bindings is given in the Python > Library Reference: > > http://www.python.org/doc/current/lib/markup.html Thanks. This is what I was looking for. (I'm still stuck with python 1.52, and this is not part of the Python doc I'm used to read daily). Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From larsga@garshol.priv.no Wed May 9 18:19:00 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 09 May 2001 19:19:00 +0200 Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com> References: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com> Message-ID: * Lars Marius Garshol | | A good DOM builder should accept calls to both startElement and | startElementNS. It should also require applications to be consistent | and only use one or the other throughout a single document. * Fred L. Drake, Jr. | | This is not clear; does the DOM specification indicate that only one | or the other can be used? I think it seems very careful to indicate | that both can be used, as long as expectations are limited. 
You are right about that, but SAX makes it clear that you must consistently use either startElement or startElementNS, so this is a SAX 2.0 issue more than a DOM issue. I wouldn't get too upset if the DOM doesn't check this, though. --Lars M. From karl@digicool.com Wed May 9 19:23:32 2001 From: karl@digicool.com (Karl Anderson) Date: 09 May 2001 11:23:32 -0700 Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI In-Reply-To: "Martin v. Loewis"'s message of "Wed, 9 May 2001 09:08:00 +0200" References: <200105082318.f48NIVb19293@localhost.local> <200105090708.f49780n00963@mira.informatik.hu-berlin.de> Message-ID: Martin v. Loewis writes: > However, until PyXML is released with these packages, you should not > assume that they actually work. Indeed, one possible scenario is that > the next PyXML release does *not* included these subdirectories. Yeah, CVS checkouts and all, I know :) FYI, my motivation is estimating the chances of a stable release in the near future that works with ParsedXML. -- Karl Anderson karl@digicool.com
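[Editor's note on the Sax/Sax2 thread above: the difference between startElement(name, attributes) and startElementNS(name, qname, attributes) can be seen with the standard library's xml.sax package. This is a minimal sketch, assuming a current Python 3 with the bundled expat driver. The Recorder class, the parse_events helper and the sample document are made up for illustration and do not come from PyXML or 4Suite.]

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class Recorder(ContentHandler):
    """Hypothetical handler that records which start callback fired."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off; name is the raw tag name.
        self.events.append(("startElement", name))

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on; name is a
        # (namespace URI, local name) tuple.
        self.events.append(("startElementNS", name))


def parse_events(text, namespaces):
    """Parse text with namespace processing switched on or off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    handler = Recorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return handler.events


# Made-up sample document for illustration only.
SAMPLE = '<d:doc xmlns:d="urn:example"><d:item/></d:doc>'

plain_events = parse_events(SAMPLE, namespaces=False)
ns_events = parse_events(SAMPLE, namespaces=True)
```

A given parse delivers only one of the two callbacks, which matches the point made in the thread that a single document should be reported consistently through either startElement or startElementNS.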
From noreply@sourceforge.net Wed May 9 23:41:27 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 09 May 2001 15:41:27 -0700 Subject: [XML-SIG] [ pyxml-Patches-422801 ] CoreFunctions misusing ExpandedName Message-ID: Patches item #422801, was updated on 2001-05-09 15:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422801&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Karl Anderson (karlanderson) Assigned to: Nobody/Anonymous (nobody) Summary: CoreFunctions misusing ExpandedName Initial Comment: Using 4Suite cvs checkout. xpath.CoreFunctions.py was looking at the wrong attrs of ExpandedName.ExpandedName, causing tests to fail. This patch uses the attrs in ExpandedName. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422801&group_id=6473 From fdrake@acm.org Thu May 10 03:03:52 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 9 May 2001 22:03:52 -0400 (EDT) Subject: [XML-SIG] clarification request about Sax/Sax2 mappings In-Reply-To: References: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com> Message-ID: <15097.63240.342240.594303@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > You are right about that, but SAX makes it clear that you must > consistently use either startElement or startElementNS, so this is a > SAX 2.0 issue more than a DOM issue. I wouldn't get too upset if the > DOM doesn't check this, though. I'm not going to worry about it, either. I think there are two problems here: that the Namespaces in XML specification is poorly written and does not cover everything it should (interaction with DTDs being a major issue in my book, though part of that may be a lack of clarity in the text rather than the issues not having been approached), and the conflation of NS and non-NS documents in the DOM.
But neither of those issues is directly related to the Python bindings for the APIs, so I guess we've strayed a little. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Thu May 10 06:56:45 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 10 May 2001 07:56:45 +0200 Subject: [XML-SIG] What are the limits of soap and python? In-Reply-To: (stuff4gary@hotmail.com) References: Message-ID: <200105100556.f4A5ujB01569@mira.informatik.hu-berlin.de> > I am pretty confused by the SOAP discussion (it hasn't any connection with > the operas browser movement has it!!). I don't know what the operas browswer movement is, but I guess it has no connection to SOAP, no. > I imagine it is like OCX, Dynamic Data Exchange, Windows Script, applescript > !! Not really. SOAP messages are typically received by Web servers; I don't think anybody uses them to control applications on the same machine. > Can it push buttons on the system and fill out text fields, with > text and automate through applications? SOAP, on its own, is just a protocol for access to objects. Whether the objects, when accessed, fill out text fields - that is up to the objects being accessed. You cannot push buttons using SOAP. > Can I set it up hot folders with it to send pictures through > photoshop, OCR and databases?? I'm not sure what a hot folder is, but I guess photoshop would not react to or emit SOAP messages; nor do I know any database system that supports SOAP directly (although many web servers may give indirect access to a database through SOAP). > is that even possible in python? Doing all these things is possible in Python, I believe - but you'ld have to do them without SOAP. 
Regards,
Martin

From mike@pdc.kth.se Thu May 10 13:25:41 2001
From: mike@pdc.kth.se (Mike Hammill)
Date: Thu, 10 May 2001 14:25:41 +0200
Subject: [XML-SIG] Help with removeChild()
Message-ID: <200105101225.f4ACPN8181353@ratatosk.pdc.kth.se>

Dear xml-sig,

I hope someone with a bit more experience can help me. I'm trying to use
xml.dom.minidom to clean up an XML file. In brief, how does one walk through
the DOM tree and remove certain children using recursion? My attempt walks
the tree, but some children are skipped. I believe this is because when
children are removed, it is not reflected in the calling program's list of
children.

Here is a simplified version of the problem. XML file:

I would like to get rid of any element that has no attributes and whose
text element is just whitespace, tabs, or linefeeds. I wrote a little tree
walker that reduces the above to:

So far, so good. When I apply the following code, however, the result is:

That is, only elements a, c, and e are eliminated. The code is:

def trim_dom_more(node):
    if node.hasChildNodes():
        for child in node.childNodes:
            trim_dom_more(child)
    else:
        if node.nodeType == node.ELEMENT_NODE:
            if (not node.hasAttributes()) and (not node.hasChildNodes()):
                node.parentNode.removeChild(node)

I think I understand that the problem is that node.childNodes gets evaluated
and put on the stack, but then after the removeChild, this stacked list is not
re-evaluated so not all children are iterated through. But how to solve that?

Any advice welcome!

Thanks
Mike

From tpassin@home.com Thu May 10 13:59:19 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 10 May 2001 08:59:19 -0400
Subject: [XML-SIG] Help with removeChild()
References: <200105101225.f4ACPN8181353@ratatosk.pdc.kth.se>
Message-ID: <000a01c0d951$076f7fe0$7cac1218@reston1.va.home.com>

I think your problem is the inverse - the childNodes list ***is*** getting
updated by the DOM after each removal.

[Mike Hammill]

> That is only elements a, c, and e are eliminated. The code is:
>
> def trim_dom_more(node):
>     if node.hasChildNodes():
>         for child in node.childNodes:
>             trim_dom_more(child)
>     else:
>         if node.nodeType == node.ELEMENT_NODE:
>             if (not node.hasAttributes()) and (not node.hasChildNodes()):
>                 node.parentNode.removeChild(node)
>
> I think I understand that the problem is that node.childNodes gets evaluated
> and put on the stack, but then after the removeChild, this stacked list is not
> re-evaluated so not all children are iterated through. But how to solve that?

Try this:

def trim_dom_more(node):
    if node.hasChildNodes():
        children=node.childNodes[:]
        for child in children:
            trim_dom_more(child)

Now you are iterating through a static copy of the list. It wouldn't work
if the child nodes could get changed by another thread, but I don't suppose
that's going to happen here.

Or you could do

    while node.hasChildNodes():
        trim_dom_more(node.childNodes[0])

That would execute slower, though. But it wouldn't get fooled by any other
activity in the DOM.

Cheers,

Tom P

From noreply@sourceforge.net Thu May 10 14:53:47 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 10 May 2001 06:53:47 -0700
Subject: [XML-SIG] [ pyxml-Bugs-423027 ] startElementNS bug in pDomletteReader
Message-ID:

Bugs item #423027, was updated on 2001-05-10 06:53
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423027&group_id=6473

Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Logilab (ornicar)
Assigned to: Nobody/Anonymous (nobody)
Summary: startElementNS bug in pDomletteReader

Initial Comment:
Hi,

I've been trying to make a custom Sax parser work using the
startElementNS() method... No way, this function needs some updates,
and I don't exactly know how to fix it. In fact endElementNS() tries
to pop elements from internal stacks which have not been pushed in
before, especially namespaces...

Cheers,

Bruno Van Frachem, Logilab.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423027&group_id=6473

From mike@pdc.kth.se Thu May 10 15:03:40 2001
From: mike@pdc.kth.se (Michael Hammill)
Date: Thu, 10 May 2001 16:03:40 +0200
Subject: [XML-SIG] Help with removeChild()
In-Reply-To: <000a01c0d951$076f7fe0$7cac1218@reston1.va.home.com>
References: <200105101225.f4ACPN8181353@ratatosk.pdc.kth.se>
Message-ID: <5.1.0.14.2.20010510155431.02d383d0@localhost>

Dear Thomas,

Your solution below works great!

I have discovered something else quite instructive (at least to me). When I
first saw your solution, I thought "oh, I've tried that already". Silly of
me. What I had tried was not exactly the same, but seemingly close. I had
set children = node.childNodes *without* the final '[:]'. In testing the
solution below, I found that if the [:] is left out, the result is the same
as I got before (an incorrect trimming); however, with the [:] it works fine.

I'm sorry if this is a newbie kind of confusion. I had always thought "list"
was equivalent to "list[:]", but apparently not.

Thank you again!
Mike

[...]
> Try this:
>
> def trim_dom_more(node):
>     if node.hasChildNodes():
>         children=node.childNodes[:]
>         for child in children:
>             trim_dom_more(child)
>
>Now you are iterating through a static copy of the list. It wouldn't work
>if the child nodes could get changed by another thread, but I don't suppose
>that's going to happen here.
[...]
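The difference between "list" and "list[:]" discussed in this thread can be reproduced with a small, self-contained sketch. It assumes the standard library's xml.dom.minidom; the sample document, the function names trim_live and trim_snapshot, and the helper remaining_tags are made up for illustration, since the XML from the original post was not preserved. Plain assignment only creates a second name for the same live childNodes list, while a slice (or list(...)) makes an independent shallow copy that removeChild() cannot disturb.

```python
from xml.dom import minidom

# Hypothetical sample document; the XML from the original post was not preserved.
SAMPLE = "<root><a/><b/><c/><d/><e/><f keep='yes'/></root>"


def trim_live(node):
    """Buggy variant: iterates the live childNodes list while removing from it."""
    if node.hasChildNodes():
        for child in node.childNodes:
            trim_live(child)
    elif node.nodeType == node.ELEMENT_NODE and not node.hasAttributes():
        node.parentNode.removeChild(node)


def trim_snapshot(node):
    """Fixed variant: iterates a shallow copy, so removals cannot shift the loop."""
    if node.hasChildNodes():
        for child in list(node.childNodes):  # same effect as node.childNodes[:]
            trim_snapshot(child)
    elif node.nodeType == node.ELEMENT_NODE and not node.hasAttributes():
        node.parentNode.removeChild(node)


def remaining_tags(trim):
    """Parse a fresh copy of SAMPLE, apply `trim`, and list the surviving tags."""
    doc = minidom.parseString(SAMPLE)
    trim(doc.documentElement)
    return [child.tagName for child in doc.documentElement.childNodes]


live_result = remaining_tags(trim_live)          # every other empty element survives
snapshot_result = remaining_tags(trim_snapshot)  # only the element with an attribute survives
print(live_result)
print(snapshot_result)
```

With the live list, each removal shifts the remaining children one position to the left, so the loop's next index skips an element; that is why only a, c, and e disappear. Note also that the while-loop variant suggested earlier in the thread only terminates if every first child actually gets removed, so the snapshot approach is the safer of the two when some children are meant to stay.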
From noreply@sourceforge.net Thu May 10 17:44:33 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 10 May 2001 09:44:33 -0700 Subject: [XML-SIG] [ pyxml-Bugs-423086 ] xml.xpath cannot be imported Message-ID: Bugs item #423086, was updated on 2001-05-10 09:44 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423086&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 9 Submitted By: Lars Marius Garshol (larsga) Assigned to: Nobody/Anonymous (nobody) Summary: xml.xpath cannot be imported Initial Comment: When importing xml.xpath, xml.xpath.Conversions gets sucked in, and that attempts to import xml.utils.boolean, which does not exist. The result is that any attempt to import xml.xpath fails. Did someone forget to commit something? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423086&group_id=6473 From martin@loewis.home.cs.tu-berlin.de Thu May 10 18:08:35 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 10 May 2001 19:08:35 +0200 Subject: [XML-SIG] [ pyxml-Bugs-423086 ] xml.xpath cannot be imported In-Reply-To: (noreply@sourceforge.net) References: Message-ID: <200105101708.f4AH8Zu01871@mira.informatik.hu-berlin.de> > When importing xml.xpath, xml.xpath.Conversions gets sucked in, and > that attempts to import xml.utils.boolean, which does not exist. The > result is that any attempt to import xml.xpath fails. Did someone > forget to commit something? xml.utils.boolean should be compiled from extensions/boolean.c, and installed in xml/utils. Did you perform a 'setup.py install', and it still did not work? 
Regards, Martin From larsga@garshol.priv.no Thu May 10 18:20:45 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 10 May 2001 19:20:45 +0200 Subject: [XML-SIG] [ pyxml-Bugs-423086 ] xml.xpath cannot be imported In-Reply-To: <200105101708.f4AH8Zu01871@mira.informatik.hu-berlin.de> References: <200105101708.f4AH8Zu01871@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | xml.utils.boolean should be compiled from extensions/boolean.c, and | installed in xml/utils. Did you perform a 'setup.py install', and it | still did not work? Arrrghh! No, I was so thick-headed I didn't even think of that. Sorry. Will close the bug now. --Lars M. From noreply@sourceforge.net Thu May 10 19:53:02 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 10 May 2001 11:53:02 -0700 Subject: [XML-SIG] [ pyxml-Patches-423122 ] xml.sax.writer places chardata in tags Message-ID: Patches item #423122, was updated on 2001-05-10 11:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=423122&group_id=6473 Category: sax Group: None Status: Open Resolution: None Priority: 5 Submitted By: Lars Marius Garshol (larsga) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: xml.sax.writer places chardata in tags Initial Comment: writer produces output of the form where the element name was 'doc' and the character data 'content'. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=423122&group_id=6473 From stuff4gary@hotmail.com Fri May 11 00:13:44 2001 From: stuff4gary@hotmail.com (gary cor) Date: Thu, 10 May 2001 23:13:44 Subject: [XML-SIG] XForms and SVG support in python? 
Message-ID: Dear All, I am working part-time at a publishers where we do magazines in DTP packages - Quark 5.0, Illustrator 9.0 and Photoshop 6.0 can now export as SVG for the web (replacing EPS format which we currently use and has never been supported on the web!!). Soon they are adding Xform fields for SVG as well! I am wondering whether anyone could forsee any problems or opportunities using the Fieldstorage() cgi from Python to process Xform data? or in changing any parts of SVG on the fly, eg like boxes in our advert sections? Gary PS I found some good tutorials for XML etc. and programming at http://www.w3schools.com - I am a bit disappointed :-( they had nothing on python, is python obscure? _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. From michael.clark@ntlworld.com Sat May 12 00:06:20 2001 From: michael.clark@ntlworld.com (michael.clark) Date: Sat, 12 May 2001 00:06:20 +0100 Subject: [XML-SIG] FREE SMS Messaging Web Service Message-ID: <000001c0da6e$fe533ea0$ec07ff3e@clarks> For those who are interested in SOAP web services, I've just located this new web service. It seems to be the first commercial one I've come across that's a) working & b) actually useful. You can send SMS messages to supposedly any mobile phone in the world, free of charge! We've tried it and so far we've sent messages to people in USA, UK and ASIA, pretty neat we thought! http://www.salcentral.com/help/smsreg.htm Mark From greg.simmons@ntlworld.com Sun May 13 09:53:15 2001 From: greg.simmons@ntlworld.com (Greg Simmons) Date: Sun, 13 May 2001 09:53:15 +0100 Subject: [XML-SIG] SOAP SMS Messaging Web Services for FREE Message-ID: <001d01c0db8a$27d1bb00$e08f69d5@clarks> For those who are interested in SOAP web services, I've just located this new web service. It seems to be the first commercial one I've come across that's a) working & b) actually useful. 
You can send SMS messages to supposedly any mobile phone in the world, free of charge! We've tried it and so far we've sent messages to people in USA, UK and ASIA, pretty neat we thought! http://www.salcentral.com/help/smsreg.htm Greg. From uche.ogbuji@fourthought.com Sun May 13 14:36:42 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 07:36:42 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Message-ID: <3AFE8DEA.F92054CB@fourthought.com> Just a note: I'm guessing you intended to x-post this to xml-sig, not python-dev. I've changed the headers. "Martin v. Loewis" wrote: > > Currently, 4XSLT has a dependency on the DOM implementation in terms > of memory management (among other dependencies). I'd like to reduce > this dependency, by providing a centralized function that knows how to > release nodes. > > In PyXML, I currently use > > # Define ReleaseNode in a DOM-independent way > import xml.dom.ext > import xml.dom.minidom > def _releasenode(n): > if isinstance(n, xml.dom.minidom.Node): > n.unlink() > else: > xml.dom.ext.ReleaseNode(n) > > try: > from Ft.Lib import pDomlette > def ReleaseNode(n): > if isinstance(n, pDomlette.Node): > pDomlette.ReleaseNode(n) > else: > _releasenode(n) > _XsltElementBase = pDomlette.Element > except ImportError: > ReleaseNode = _releasenode > from minisupport import _XsltElementBase Wouldn't it be better to make up a Reader class for minidom which implements a releaseNode method similar to what you have above? The idea behind the reader architecture is to manage such things. There might be some places in 4XSLT that don't properly call releaseNode on the reader instance itself, but I'd rather fix them to do so. What's "minisupport" and "_XsltElementBase"? > This code knows how to release minidom, 4DOM, and pDomlette nodes, and > supports installations without 4Suite (i.e. without pDomlette). 
I've > put this into xslt/__init__.py, so that all callers of > Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. > If desired, I could produce a patch against the public Ft CVS. > > As a slightly independent question, such a function also ought to > support DOM implementations not known to it; I'm thinking in > particular of the Zope DOMs. I'd like to hear proposals on how such an > interface should work; I see three options: > > a) it is an operation on the document node (or any node), as in minidom. > b) it is an operation on the DOM implementation (almost as in 4Suite; > you'd need to navigate from the node to the implementation, then > you'd need a well-known operation on the implementation) > c) the code assumes that no release activity is necessary for unknown > DOMs, effectively believing in reference counting, garbage collection, > acquisition, and other black art. Maybe we need a general Reader class for unknown DOM classes. This would require the unification of DOM factories we were discusing a few months ago, but the releaseNode method could just be a NOP, i.e. your (c) option. > Any comments appreciated, in particular > 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and > 2. from authors of other DOMs on a general memory management API for > Python DOM. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sun May 13 15:41:25 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 13 May 2001 16:41:25 +0200 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFE8DEA.F92054CB@fourthought.com> (message from Uche Ogbuji on Sun, 13 May 2001 07:36:42 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> Message-ID: <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> [yes, I indeed meant to cross-post to xml-sig] > Wouldn't it be better to make up a Reader class for minidom which > implements a releaseNode method similar to what you have above? The > idea behind the reader architecture is to manage such things. How would that work? Assume there was a reader class for minidom, and the XSLT runtime had a node object. How can you release the node? Or do you need to know the reader class which originally created that node as well? That would be not so good: the node might not have been created by a reader at all, as it might have come directly from the DOM implementation. > There might be some places in 4XSLT that don't properly call releaseNode > on the reader instance itself, but I'd rather fix them to do so. There is a number of those. Grepping for ReleaseNode in the public CVS gives Processor.py: pDomlette.ReleaseNode(rtfRoot) Processor.py: xml.dom.ext.ReleaseNode(rtfRoot) Processor.py: pDomlette.ReleaseNode(self._dummyDoc) Stylesheet.py: pDomlette.ReleaseNode(imp.stylesheet.ownerDocument) StylesheetReader.py: pDomlette.ReleaseNode(inc) StylesheetReader.py: pDomlette.ReleaseNode(sheet.ownerDocument) StylesheetReader.py: pDomlette.ReleaseNode(inc) XsltContext.py: pDomlette.ReleaseNode(doc) XsltContext.py: pDomlette.ReleaseNode(rtf) XsltContext.py: xml.dom.ext.ReleaseNode(rtf) > What's "minisupport" and "_XsltElementBase"? minisupport is an emulation of pDomlette equivalents as used by 4XSLT, implemented using pDomlette. There are various pieces that I found necessary: readers, ReaderBase, and Element. 
The latter is there to support pickling, and to support the __init__ signature expected from XsltElement. > Maybe we need a general Reader class for unknown DOM classes. This > would require the unification of DOM factories we were discusing a few > months ago, but the releaseNode method could just be a NOP, i.e. your > (c) option. I don't recall that discussion. Your comment seems to imply a relationship between a DOM implementation and a Reader class, which I can't find in the 4Suite code. What do I miss? Regards, Martin From uche.ogbuji@fourthought.com Sun May 13 18:48:28 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 11:48:28 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> Message-ID: <3AFEC8EC.D6CFC2F2@fourthought.com> I see what you mean. I was thinking about running 4XSLT on non-domlette source nodes. I'm guessing you've been working on code to allow XsltElements and result-tree fragments to use minidom, so you're talking about calls to releaseNode that handle these things. Well, I think the best solution to this, rather than making a universal ReleaseNode function, is to generalize the Reader architecture into a general factory that can read, initialize and dispose of nodes. This could be a Python DOM standard binding extension to DOMImplementation. The earlier conversation I alluded to is the DOMImplementationFactory discussion. If the DOMImplementation gets some standard add-ons, then this can be used to determine the destruction mechanism in the general case. http://mail.python.org/pipermail/xml-sig/2001-February/004508.html -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Sun May 13 19:19:06 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 12:19:06 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> Message-ID: <3AFED01A.3FF0E6F9@FourThought.com> Uche Ogbuji wrote: > > Wouldn't it be better to make up a Reader class for minidom which > implements a releaseNode method similar to what you have above? The > idea behind the reader architecture is to manage such things. The thing I don't like about the reader, is that you need to pass it around or store it in order to call the correct release. We could get around this by having each node store a reference to its reader when it is created. node.reader.releaseNode(node) > > There might be some places in 4XSLT that don't properly call releaseNode > on the reader instance itself, but I'd rather fix them to do so. Stylesheet nodes are the big ones (off head) because we don't keep track of what reader the stylesheet was created with so we always call pDomlette.releaseNode -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Sun May 13 19:27:00 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 12:27:00 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> Message-ID: <3AFED1F4.C11668EF@FourThought.com> "Martin v. 
Loewis" wrote:
>
> [yes, I indeed meant to cross-post to xml-sig]
>
> > Wouldn't it be better to make up a Reader class for minidom which
> > implements a releaseNode method similar to what you have above?  The
> > idea behind the reader architecture is to manage such things.
>
> How would that work? Assume there was a reader class for minidom, and
> the XSLT runtime had a node object. How can you release the node?
>
> Or do you need to know the reader class which originally created that
> node as well? That would be not so good: the node might not have been
> created by a reader at all, as it might have come directly from the
> DOM implementation.

This is why I vote for putting the releaseNode function either on the
implementation or on the node itself.  Readers are great as an abstract
way of creating a DOM (at least until we all support level III), but
without a relationship between a node instance and its reader they don't
work very well for releasing nodes.  I also had not thought of your point
about nodes created without a reader, but it is a good one.

--
Mike Olson                              Principal Consultant
mike.olson@fourthought.com              (303)583-9900 x 102
Fourthought, Inc.                       http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:02:20 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:02:20 +0200
Subject: [XML-SIG] Disentangling StylesheetReader from Ft.Lib
Message-ID: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de>

I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
only to discover that the StylesheetReader class is now much more
strongly connected to Ft.Lib than before, in particular to classes from
pDomletteReader, and their specific instance attributes.

I took the approach of providing alternative base classes to the ones
provided by pDomlette, but that soon became a disaster since none of
the minidom/pulldom classes bear any relationship to how the
PyExpatReader and Handler classes work.

I'd still like to pursue my attempt of integrating 4XSLT to work without
Ft.Lib, and pDomlette in particular, but I'd need some advice here. I
feel that I miss some grand picture in all these classes, and how they
are connected. It seems that the authors of the code lose track, too,
with code duplication all over the place.

So my question is: Is all this complexity really necessary? Would it
be possible to simplify things by breaking down processing into multiple
processing steps? It seems to me that all StylesheetReader does is to
create a DOM tree, except that it creates StylesheetElement nodes
where a normal DOM build would create Element nodes. If this is really
all it does, I could propose some dramatic code reduction.

Any proposals are welcome.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:04:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:04:39 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFEC8EC.D6CFC2F2@fourthought.com> (message from Uche Ogbuji
	on Sun, 13 May 2001 11:48:28 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
	<3AFE8DEA.F92054CB@fourthought.com>
	<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
	<3AFEC8EC.D6CFC2F2@fourthought.com>
Message-ID: <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de>

> Well, I think the best solution to this, rather than making a universal
> ReleaseNode function, is to generalize the Reader architecture into a
> general factory that can read, initialize and dispose of nodes.  This
> could be a Python DOM standard binding extension to DOMImplementation.
That is a solution that I could easily accept; it would take some time until all relevant implementations support the method, though, and we'd need a name for it. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:12:39 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 21:12:39 +0200 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFED01A.3FF0E6F9@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 12:19:06 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> Message-ID: <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de> > The thing I don't like about the reader, is that you need to pass it > around or store it in order to call the correct release. We could get > around this by having each node store a reference to its reader when it > is created. With regard to the reader, I'd also like to point you to the level 3 load-store interfaces, http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010419/load-save.html where they have a DOMBuilder interface. So while your Reader interface is fine as Ft-provided API, I think the DOMBuilder interface has a higher chance of getting accepted widely. Regards, Martin From uche.ogbuji@fourthought.com Sun May 13 20:31:18 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 13:31:18 -0600 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> Message-ID: <3AFEE106.4C99F9FD@fourthought.com> "Martin v. Loewis" wrote: > > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base, > only to discover that the StyleseetReader class is now much stronger > connected to Ft.Lib than before, in particular to classes from > pDomletteReader, and their specific instance attributes. 
This is to provide shared code, which, oddly enough, you advocate below. Some of the routines could indeed be moved into a generic handler that goes into xml.utils. > I took the approach of providing alternative base classes to the ones > provided by pDomlette, but that soon became a desaster since none of > the minidom/pulldom classes bear any relationship to how the > PyExpatReader and Handler classes work. This could all be helped by using mix-in classes in xml.utils. Note that I mean *real* mix-in classes, that is, classes that provide implementation but not interface (a disturbing chunk of the Python community seems to think that mixing in is just plain old inheritance). > I'd still like pursue my attempt of integrating 4XSLT to work without > Ft.Lib, and pDomlette in particular, but I'd need some advise here. I > feel that I miss some grand picture in all these classes, and how they > are connected. It seems that the authors of the code lose track, too, > with code duplication all over the place. Of course: the code is not all polished, but I must note that what you complained above in your first para was actually a step that eliminated a *great* deal of duplicated code from StylesheetReader. The solution is to move the common code somewhere accessible from PyXML. > So my question is: Is all this complexity really necessary? Would it > be possible to simplify things by breaking down processing in multiple > processing steps? It seems to me that all StylesheetReader does is to > create a DOM tree, except that it creates StylesheetElement nodes > where a normal DOM build would create Element nodes. Wow. I'd count this a huge oversimplification. The Stylesheet reader does a great deal that most readers needn't worry about, as I'd think would be obvious from a glance at te code. > If this is really > all it does, I could propose some dramatic code reduction. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun May 13 20:33:26 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 13:33:26 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> Message-ID: <3AFEE186.99AFB1EF@fourthought.com> "Martin v. Loewis" wrote: > > > Well, I think the best solution to this, rather than making a universal > > ReleaseNode function, is to generalize the Reader architecture into a > > general factory that can read, initialize and dispose of nodes. This > > could be a Python DOM standard binding extension to DOMImplementation. > > That is a solution that I could easily accept; it would take some time > until all relevant implementations support the method, though, and > we'd need a name for it. I'd favor cleanUp(). And I'm not worried that implementations would need to catch up. The desire for 4XSLT interop will accelerate this work. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun May 13 20:34:08 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 13:34:08 -0600 Subject: [XML-SIG] [Fwd: [4suite] ReleaseNode interface in 4XSLT] Message-ID: <3AFEE1B0.5461D63D@fourthought.com> This is a multi-part message in MIME format. 
--------------28F27A9C4716E5DCE6516942
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

--------------28F27A9C4716E5DCE6516942
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Message-Id: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>
From: "Martin v. Loewis"
To: Mike.Olson@fourthought.com
CC: 4suite@fourthought.com, python-dev@python.org
In-reply-to: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on
	Sun, 13 May 2001 12:15:46 -0600)
Subject: Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
	<3AFECF52.FF7E9B26@FourThought.com>
Content-Type: text/plain; charset=US-ASCII
List-Id: Users and support for 4Suite tools <4suite.lists.fourthought.com>
List-Archive: http://lists.fourthought.com/pipermail/4suite/
Date: Sun, 13 May 2001 21:08:47 +0200

> What if we put these on the implementation, that or came up with a
> standard interface on the node.  Then, every DOM imp that wants to be
> compatible with xpath/xslt needs to support this interface?
>
>
> node.ownerDocument.implementation.releaseNode(node)
>
> or
>
> node.py_unlink()

releaseNode sounds good to me; it is unlikely that W3C would give an
operation that name but a different meaning. Any objections?

Regards,
Martin

_______________________________________________
4suite mailing list
4suite@lists.fourthought.com
http://lists.fourthought.com/mailman/listinfo/4suite

--------------28F27A9C4716E5DCE6516942--

From uche.ogbuji@fourthought.com Sun May 13 20:36:40 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 13:36:40 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
	<3AFE8DEA.F92054CB@fourthought.com>
	<3AFED01A.3FF0E6F9@FourThought.com>
	<200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de>
Message-ID: <3AFEE248.CA8C2BC4@fourthought.com>

"Martin v. Loewis" wrote:
>
> > The thing I don't like about the reader, is that you need to pass it
> > around or store it in order to call the correct release.  We could get
> > around this by having each node store a reference to its reader when it
> > is created.
>
> With regard to the reader, I'd also like to point you to the level 3
> load-store interfaces,
>
> http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010419/load-save.html
>
> where they have a DOMBuilder interface. So while your Reader interface
> is fine as Ft-provided API, I think the DOMBuilder interface has a
> higher chance of getting accepted widely.

I'm quite familiar with DOM Level 3, but the Reader architecture
predates this, and there is no immediate prospect of time to move to the
Level 3 interfaces.  Perhaps in a month or two.  Of course, this could
be accelerated by contributions.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From martin@loewis.home.cs.tu-berlin.de Sun May 13 21:17:22 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 13 May 2001 22:17:22 +0200 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFEE186.99AFB1EF@fourthought.com> (message from Uche Ogbuji on Sun, 13 May 2001 13:33:26 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> Message-ID: <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> > I'd favor cleanUp(). On the node, or on the DOM implementation? Martin From rsalz@zolera.com Mon May 14 01:39:48 2001 From: rsalz@zolera.com (Rich Salz) Date: Sun, 13 May 2001 20:39:48 -0400 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> Message-ID: <3AFF2954.32ACAD38@zolera.com> > The thing I don't like about the reader, is that you need to pass it > around or store it in order to call the correct release. We could get > around this by having each node store a reference to its reader when it > is created. I'm in favor of this for exactly this reason. Since Python doesn't allow tilde in method names ~Node is out, so I'd go along with releaseNode() as suggested elsewhere. 
:) /r$ From Mike.Olson@fourthought.com Mon May 14 02:05:48 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 19:05:48 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> Message-ID: <3AFF2F6C.B1350B6D@FourThought.com> "Martin v. Loewis" wrote: > > > I'd favor cleanUp(). > > On the node, or on the DOM implementation? I'm in favor of putting it on the node. It would be a lot easier to access. If it were on the implementation, you would need more logic to release an arbitrary node, as only the document has the implementation reference (and documents don't have an owner document). Mike > > Martin > _______________________________________________ > 4suite mailing list > 4suite@lists.fourthought.com > http://lists.fourthought.com/mailman/listinfo/4suite -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Mon May 14 02:14:17 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 19:14:17 -0600 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> Message-ID: <3AFF3169.29F2B6C8@FourThought.com> "Martin v. Loewis" wrote: > > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base, > only to discover that the StylesheetReader class is now much more strongly > connected to Ft.Lib than before, in particular to classes from > pDomletteReader, and their specific instance attributes. 
I was just in there as well and was quite surprised at how complex the code has become. I thought of doing some work on it but figured, it ain't broke..... My thought was that the implementation should be able to handle it; then there would be one reader. All of the code in the Stylesheet Reader would be handled in StylesheetDocument.createElement, or at least the majority of it. I haven't looked too closely to see if this is 100% feasible, though. > > I took the approach of providing alternative base classes to the ones > provided by pDomlette, but that soon became a disaster since none of > the minidom/pulldom classes bear any relationship to how the > PyExpatReader and Handler classes work. Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette into xml.utils? Better yet, let's merge pDomlette and minidom so there is only one domlette. pDomlette has greatly outgrown its original purpose, so I have no problem with moving it into XML-Sig. > > I'd still like to pursue my attempt of integrating 4XSLT to work without > Ft.Lib, and pDomlette in particular, but I'd need some advice here. I > feel that I miss some grand picture in all these classes, and how they > are connected. It seems that the authors of the code lose track, too, > with code duplication all over the place. I agree. There was a lot of redundant code when I looked into it last. I think there should be one xml-sig "reader" that works off a DOMImplementation to create actual instances. One thing to note is that this would slow things down. One big speed increase that pDomlette gives us by having its own reader is that it can create elements directly and does not have to use the createElementNS interface. The problem with the interface is that we have to do a "prefix + ':' + localName" just to satisfy the interface (and then the function itself does a string.split(qname, ':')). Not really a time-consuming process, but when you call it 10000 times it adds up. 
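For illustration, the join-then-split round trip described above can be sketched as follows. This is a minimal sketch only: the helper names make_qname and split_qname are hypothetical and are not part of 4DOM or pDomlette, and modern str methods stand in for the string module calls of the time.

```python
def make_qname(prefix, local_name):
    """Join prefix and local name, as a reader must before calling createElementNS."""
    if prefix:
        return prefix + ':' + local_name
    return local_name


def split_qname(qname):
    """Split a qualified name back apart, as the factory then does internally."""
    if ':' in qname:
        prefix, local_name = qname.split(':', 1)
        return prefix, local_name
    return None, qname


# A reader that already holds both pieces still pays for the join and the
# split on every element it creates through the factory interface.
qname = make_qname('xsl', 'template')
print(qname, split_qname(qname))
```

A reader that constructs element objects directly can hand over the prefix and local name as they come from the parser and skip both steps.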
> > So my question is: Is all this complexity really necessary? Would it > be possible to simplify things by breaking down processing in multiple > processing steps? It seems to me that all StylesheetReader does is to > create a DOM tree, except that it creates StylesheetElement nodes > where a normal DOM build would create Element nodes. If this is really > all it does, I could propose some dramatic code reduction. It also does validation, processing of include and import elements, namespace aliasing, extension element processing, and more. Though like I said, I think this could be handled in a createElementNS of a StylesheetDocument class. Mike > > Any proposals are welcome. > > Regards, > Martin > > _______________________________________________ > 4suite mailing list > 4suite@lists.fourthought.com > http://lists.fourthought.com/mailman/listinfo/4suite -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Mon May 14 02:20:48 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 19:20:48 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de> Message-ID: <3AFF32F0.8AAAED0C@FourThought.com> "Martin v. Loewis" wrote: > > > The thing I don't like about the reader, is that you need to pass it > > around or store it in order to call the correct release. We could get > > around this by having each node store a reference to its reader when it > > is created. > > With regard to the reader, I'd also like to point you to the level 3 > load-store interfaces, > > http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010419/load-save.html > > where they have a DOMBuilder interface. 
So while your Reader interface > is fine as Ft-provided API, I think the DOMBuilder interface has a > higher chance of getting accepted widely. Agreed, same with xml.dom.ext.Print. In fact, all of the stuff in xml.dom.ext was originally put there as "stuff the w3c will add eventually", mainly the reader and printer interfaces. Back when it was only level I, there were functions in the ext directory to get a node's namespace URI, prefix, and local name. We moved to level II and those were no longer needed. I think the same should happen with the printers and readers. However, are we ready to move to level III? Is level III ready to be moved to? I don't think anyone here (at FT) will have much time to work on it for a month or two. We are really trying to get 1.0 out. 4Suite has been in beta for 3 years as of June 1 :) This isn't to say that someone else can't do it, and we'll help when and where we can. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon May 14 02:57:53 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 19:57:53 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> Message-ID: <3AFF3BA1.DB51A55A@fourthought.com> "Martin v. Loewis" wrote: > > > I'd favor cleanUp(). > > On the node, or on the DOM implementation? DOMImplementation. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon May 14 02:59:44 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 19:59:44 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> <3AFF2F6C.B1350B6D@FourThought.com> Message-ID: <3AFF3C10.32E79FDA@fourthought.com> Mike Olson wrote: > > "Martin v. Loewis" wrote: > > > > > I'd favor cleanUp(). > > > > On the node, or on the DOM implementation? > > I'm infavor of on the node. It would be a lot easier to access. If it > was on the implementation, you would need more logic to release an > arbitrary node as only the document has the implementation reference > (and document's don't have an owner document) Fine with me. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon May 14 03:10:09 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 20:10:09 -0600 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> Message-ID: <3AFF3E81.7473BD6C@fourthought.com> Mike Olson wrote: > > "Martin v. 
Loewis" wrote: > > > > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base, > > only to discover that the StylesheetReader class is now much stronger > > connected to Ft.Lib than before, in particular to classes from > > pDomletteReader, and their specific instance attributes. > > I was just in there as well and quite surprised how complex the code has > become. I thought of doing some work on it but figured, it ain't > broke..... This is a false impression. The code is actually considerably simpler than it was before. In the past, we had the code for mapping prefixes to NSUris repeated in pDomlette/PyExpat, pDomlette/SAX and StylesheetReader. Now it's in a single place. There are many other places where code is now shared where before it was duplicated. It certainly needs a lot of polish still: the main problem is that all the reader systems have evolved separately, and mix-in based implementation merging is probably the best solution. > My thoughts were that the implementation should be able to handle it, > then there would be one reader. all of the code in the Stylesheet Reader > would be handled in StylesheetDocument.createElement, or at least the > majority of it. I haven't looked too closely to see if this is 100% > feasible though. I don't favor this. I think tight coupling with the parse mechanism is important for efficiency. It would be better to have a separate fall-back Stylesheet Reader that did things through the DOM interface only (although I'm not sure what this would buy us, since the same amount of work would then need to be done in the DOM implementation). > > I took the approach of providing alternative base classes to the ones > > provided by pDomlette, but that soon became a disaster since none of > > the minidom/pulldom classes bear any relationship to how the > > PyExpatReader and Handler classes work. > > Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette > into xml.utils? 
Better yet, let's merge pDomlette and minidom so there > is only one domlette. pDomlette has greatly outgrown its original > purpose so I have no problems with moving it into XML-Sig. I disagree with the idea of merging pDomlette and minidom, but I have no problem moving pDomlette to xml.utils. > > I'd still like pursue my attempt of integrating 4XSLT to work without > > Ft.Lib, and pDomlette in particular, but I'd need some advice here. I > > feel that I miss some grand picture in all these classes, and how they > > are connected. It seems that the authors of the code lose track, too, > > with code duplication all over the place. > > I agree. There was a lot of redundant code when I looked into it last. > I think there should be one xml-sig "reader" that works off a > DOMImplementation to create actual instances. Disagree. See above. Things can be parameterized more using DOMImplementation, but not at the parser interface level. > Some things to note are that this would slow things down. One big speed > increase the pDomlette gives us by having its own reader is that it can create > elements directly and not have to use the createElementNS interface. The problem > with the interface is that we have to do a "prefix + ':' + localName" > just to satisfy the interface (and then the function itself does a > string.split(qname,':'). Not really a time consuming process, but when > you call it 10000 it adds up. There's more to it than just this. There is a lot about the DOM factory interfaces that is very inefficient. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uogbuji@fourthought.com Mon May 14 04:31:14 2001 From: uogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 13 May 2001 21:31:14 -0600 Subject: [XML-SIG] [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT (fwd) Message-ID: <200105140331.f4E3VEt12406@localhost.local> ------- Forwarded Message Return-Path: Received: from mail.fourthought.com [204.144.146.185] by localhost with IMAP (fetchmail-5.6.8) for uogbuji@localhost (single-drop); Sun, 13 May 2001 20:10:58 -0600 (MDT) Received: from mail.python.org (mail.python.org [63.102.49.29]) by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E18N706668 for ; Sun, 13 May 2001 19:08:23 -0600 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 3.21 #1) id 14z6qB-0004Y8-00; Sun, 13 May 2001 21:08:03 -0400 Received: from [204.144.146.185] (helo=yen.fourthought.com) by mail.python.org with esmtp (Exim 3.21 #1) id 14z6q5-0004Wh-00 for python-dev@python.org; Sun, 13 May 2001 21:07:57 -0400 Received: from FourThought.com (IDENT:molson@usrtcc1-pool2-38.prolynx.com [63.122.17.102]) by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E17k706656; Sun, 13 May 2001 19:07:46 -0600 Message-ID: <3AFF2E8B.31B9ED97@FourThought.com> From: Mike Olson Organization: FourThought, Inc X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Martin v. Loewis" CC: 4suite@fourthought.com, python-dev@python.org References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik. 
hu-berlin.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT Sender: python-dev-admin@python.org Errors-To: python-dev-admin@python.org X-BeenThere: python-dev@python.org X-Mailman-Version: 2.0.5 (101270) Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Python core developers List-Unsubscribe: , List-Archive: Date: Sun, 13 May 2001 19:02:03 -0600 "Martin v. Loewis" wrote: > > > What if we put these on the implementation, that or came up with a > > standard interface on the node. Then, every DOM imp that wants to be > > compatible with xpath/xslt needs to support this interface? > > > > > > node.ownerDocument.implementation.releaseNode(node) > > > > or > > > > node.py_unlink() > > releaseNode sounds good to me; it is unlikely that W3C would give an > operation that name but a different meaning. Any objections? Should we standardize all of the python xml extensions with a py prefix? pyReleaseNode or py_releaseNode? Then we will never have to worry about a name clash. Mike > > Regards, > Martin - -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev ------- End of Forwarded Message From martin@loewis.home.cs.tu-berlin.de Mon May 14 06:42:58 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 14 May 2001 07:42:58 +0200 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFF32F0.8AAAED0C@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 19:20:48 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de> <3AFF32F0.8AAAED0C@FourThought.com> Message-ID: <200105140542.f4E5gwX01307@mira.informatik.hu-berlin.de> > However, are we ready to move to level III? Is level III ready to be > moved too? No, and no. I would not actively change or drop existing code until DOM Level 3 is almost finished (proposed recommendation, or some such). It's just a thing to take into consideration when designing new code. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon May 14 06:39:34 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 07:39:34 +0200 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFF2F6C.B1350B6D@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 19:05:48 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> <3AFF2F6C.B1350B6D@FourThought.com> Message-ID: <200105140539.f4E5dYx01305@mira.informatik.hu-berlin.de> > > > > > I'd favor cleanUp(). > > > > On the node, or on the DOM implementation? > > I'm infavor of on the node. It would be a lot easier to access. 
If it > was on the implementation, you would need more logic to release an > arbitrary node as only the document has the implementation reference > (and document's don't have an owner document) In that case, I'd prefer unlink, since this is what is already documented for minidom. Regards, Martin From uche.ogbuji@fourthought.com Mon May 14 08:06:00 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 14 May 2001 01:06:00 -0600 Subject: [XML-SIG] [Fwd: [4suite] ReleaseNode interface in 4XSLT] Message-ID: <3AFF83D8.AFF83E34@fourthought.com> This is a multi-part message in MIME format. --------------8D5F80E05CA0787A819F3271 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python --------------8D5F80E05CA0787A819F3271 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Return-Path: <4suite-admin@dollar.fourthought.com> Received: from dollar.fourthought.com ([204.144.146.184]) by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E5vl722886; Sun, 13 May 2001 23:57:47 -0600 Received: from dollar.fourthought.com (localhost.localdomain [127.0.0.1]) by dollar.fourthought.com (8.9.3/8.9.3) with ESMTP id XAA24241; Sun, 13 May 2001 23:52:18 -0600 Received: from yen.fourthought.com (bastion.fourthought.com [204.144.146.185]) by dollar.fourthought.com (8.9.3/8.9.3) with ESMTP id XAA24066 for <4suite@dollar.fourthought.com>; Sun, 13 May 2001 23:50:08 -0600 Received: from mail.cs.tu-berlin.de (root@mail.cs.tu-berlin.de [130.149.17.13]) by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E5t7722581; Sun, 13 May 2001 23:55:07 -0600 Received: from mira.informatik.hu-berlin.de (loewis.home.cs.tu-berlin.de [130.149.147.34]) by mail.cs.tu-berlin.de (8.9.3/8.9.3) with 
ESMTP id HAA28334; Mon, 14 May 2001 07:54:00 +0200 (MET DST) Received: (from martin@localhost) by mira.informatik.hu-berlin.de (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) id f4E5cOb01301; Mon, 14 May 2001 07:38:24 +0200 Message-Id: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de> From: "Martin v. Loewis" To: Mike.Olson@fourthought.com CC: 4suite@fourthought.com, python-dev@python.org In-reply-to: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 19:02:03 -0600) Subject: Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com> User-Agent: REMI/1.14.2 (=?ISO-8859-4?Q?Hokuhoku-=D2shima?=) Chao/1.14.1 (=?ISO-8859-4?Q?Rokujiz=F2?=) APEL/10.2 Emacs/20.7 (i386-suse-linux) MULE/4.0 (HANANOEN) MIME-Version: 1.0 (generated by REMI 1.14.2 - =?ISO-8859-4?Q?=22Hokuhoku-=D2?= =?ISO-8859-4?Q?shima=22?=) Content-Type: text/plain; charset=US-ASCII Sender: 4suite-admin@dollar.fourthought.com Errors-To: 4suite-admin@dollar.fourthought.com X-BeenThere: 4suite@lists.fourthought.com X-Mailman-Version: 2.0beta6 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Users and support for 4Suite tools <4suite.lists.fourthought.com> List-Unsubscribe: , List-Archive: http://lists.fourthought.com/pipermail/4suite/ Date: Mon, 14 May 2001 07:38:24 +0200 > Should we standardize all of the python xml extensions with a py > prefix? pyReleaseNode or py_releaseNode? Then we will never have to > worry about a name clash. IMO, no. The entire interface together is the Python DOM mapping. In the unlikely event of a name clash, we could still decide to rename the DOM function, or find some other magic (e.g. overloading on the argument count). 
Regards, Martin _______________________________________________ 4suite mailing list 4suite@lists.fourthought.com http://lists.fourthought.com/mailman/listinfo/4suite --------------8D5F80E05CA0787A819F3271-- From martin@loewis.home.cs.tu-berlin.de Mon May 14 08:26:46 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 09:26:46 +0200 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib In-Reply-To: <3AFEE106.4C99F9FD@fourthought.com> (message from Uche Ogbuji on Sun, 13 May 2001 13:31:18 -0600) References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFEE106.4C99F9FD@fourthought.com> Message-ID: <200105140726.f4E7QkI01878@mira.informatik.hu-berlin.de> >> It seems to me that all StylesheetReader does is to >> create a DOM tree, except that it creates StylesheetElement nodes >> where a normal DOM build would create Element nodes. > Wow. I'd count this a huge oversimplification. The Stylesheet reader > does a great deal that most readers needn't worry about, as I'd think > would be obvious from a glance at the code. I'd like to discuss specific aspects, then. Looking at the current public CVS, I see:

fromStream: duplicates ReaderMixin.fromStream, then adds call to sheet.setup(), and some error handling
initParser: duplicates PyExpatReader.initParser. It uses Utf8OnlyHandler sometimes, but I could not find that class.
_completeTextNode: creates LiteralText instead of Text nodes. Also does not deal with top_node, but I'm not sure whether this is on purpose
_initializeSheet: has no equivalent elsewhere
_handleExtUris: has no equivalent elsewhere
processingInstruction: Does *not* create PI nodes
comment: Likewise
startElement: great similarities with Handler.startElement. The significant differences seem to be:
  - creates element nodes based on g_mappings[nsuri][localname], extension tables, or creates LiteralElement
  - processes xsl:include somehow (?)
  - passes attributes through _handleExtUris for xsl:stylesheet
endElement: great overlap with Handler.endElement; I could not tell whether differences are on purpose or by mistake
characters: does not deal with _includeDepth and force8Bit (again, this might be by mistake)

Did I miss aspects of the functionality relevant to proper operation of the StylesheetReader? So all in all, it still seems to me that the essential difference is what nodes are created; the control logic and parsing data structures seem to be duplicates of the code found in the handler. That, in turn, suggests that using a standard DOM builder with a different DOM implementation would achieve the same effect. Regards, Martin From fdrake@acm.org Mon May 14 15:08:44 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 14 May 2001 10:08:44 -0400 (EDT) Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFEE248.CA8C2BC4@fourthought.com> References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de> <3AFEE248.CA8C2BC4@fourthought.com> Message-ID: <15103.59116.344325.572131@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > I'm quite familiar with DOM Level 3, but the Reader architecture > predates this, and there is no immediate prospect of time to move to the > Level 3 interfaces. Perhaps in a month or two. Of course, this could > be accelerated by contributions. Parsed XML is already starting to support the Level 3 interfaces, most interestingly, the Load portion of the Load/Save "feature". (I just haven't had time to spend on the Save portion.) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Mon May 14 19:11:01 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Mon, 14 May 2001 14:11:01 -0400 (EDT) Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFF32F0.8AAAED0C@FourThought.com> References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de> <3AFF32F0.8AAAED0C@FourThought.com> Message-ID: <15104.8117.108130.195638@cj42289-a.reston1.va.home.com> Mike Olson writes: > However, are we ready to move to level III? Is level III ready to be > moved too? I agree with Martin on this: it's not ready. The "Load" specification is pretty reasonable, but it's still fairly preliminary as well. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Mon May 14 19:24:17 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 14 May 2001 14:24:17 -0400 (EDT) Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFED1F4.C11668EF@FourThought.com> References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFED1F4.C11668EF@FourThought.com> Message-ID: <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> Mike Olson writes: > This is why I vote for either the implementation has the releaseNode > function, or the node itself. Putting such a method on the node makes the most sense, if the method makes sense at all. This allows different classes within an implementation to do the right thing without the dispatching overhead, and makes the most sense for implementations which can be subclassed. I am a little concerned about the method, however, because I see two different possibilities. One is the "I don't need you anymore; don't bother me" option (equivalent to DECREF), and the other is "Break all your internal links and die", equivalent to the minidom .unlink() method. 
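The second option can be observed with the standard library's minidom, whose unlink() is documented as breaking a node's internal references. A minimal sketch, assuming a made-up two-element document:

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only to have a small tree to work with.
doc = parseString('<root><child/></root>')
root = doc.documentElement
child = root.firstChild

# unlink() recursively severs parent/child links so the tree can be reclaimed.
doc.unlink()

print(root.parentNode, child.parentNode, len(doc.childNodes))
```

After the call the tree can no longer be navigated, so this behavior only suits trees that are discarded once the caller is done with them.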
From the discussion so far, I'm getting the sense that the latter is what is being discussed, and this is not always appropriate. To build DOM trees to use with the XPath/XSLT engines, would I need to provide an empty .releaseNode(), since the DOM trees are persistent and have lifetimes far beyond the individual use for them with a specific transformation? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jmurray@agyinc.com Mon May 14 19:22:05 2001 From: jmurray@agyinc.com (Joe Murray) Date: Mon, 14 May 2001 11:22:05 -0700 Subject: [XML-SIG] building XML docs using ? Message-ID: <3B00224D.AFB2057D@agyinc.com> Dear All, I am converting many large "legacy" text files to XML. Some of the original text files are upwards of 100 MB. What is the most efficient way, in terms of speed and memory, to convert these text files to XML? Currently, I parse through the text files and create a DOM Document representation. However, the time and memory expenditure for conversion is huge, using either xml.dom.minidom or xml.dom. Here's an example of what I do:
----------

# import stuff
from xml.dom.minidom import Document

# create doc and documentElement node
doc = Document()
docelement = doc.appendChild(...)
f = open(...)
..
while 1:

    # get data from file
    line = f.readline()
    if not line:
        break
    line = line.strip()
    data = line.split(...)

    # create a new element node using data from file
    node = doc.createElement(...)
    node.setAttribute(...)
    node.appendChild(...)
    docelement.appendChild(node)
...

----------
Should I forgo the ease of using the DOM objects by simply outputting "hand-generated" markup? I was doing this previously; it's efficient, but definitely not as nice/clean as it could be... So basically, is there a lightweight XML module which provides for (as a graphics programmer would say) "immediate mode" output, with as nice an interface as the DOM modules? Oh, and BTW, can XML solve all my problems??? 
;-) Thanks much, joe -- Joseph Murray Bioinformatics Specialist, AGY Therapeutics 290 Utah Avenue, South San Francisco, CA 94080 (650) 228-1146 From fdrake@acm.org Mon May 14 20:23:57 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 14 May 2001 15:23:57 -0400 (EDT) Subject: [XML-SIG] building XML docs using ? In-Reply-To: <3B00224D.AFB2057D@agyinc.com> References: <3B00224D.AFB2057D@agyinc.com> Message-ID: <15104.12493.360699.521399@cj42289-a.reston1.va.home.com> Joe Murray writes: > Currently, I parse through the text files and create a DOM Document > representation. However, the time and memory expenditure for conversion > is huge, using either xml.dom.minidom or xml.dom. Here's an example of > what I do: Instead of building a DOM tree, send events to a SAX output generator. This avoids keeping your entire document in memory. The xml.sax.writer module provides this, and there may be others. (Be sure to get the xml.sax.writer from CVS though; I just fixed a really stupid bug...)
> ----------
>
> # import stuff
> from xml.dom.minidom import Document
>
> # create doc and documentElement node
> doc = Document()
> docelement = doc.appendChild(...)
> f = open(...)
> ..
> while 1:
>
>     # get data from file
>     line = f.readline()
>     if not line:
>         break
>     line = line.strip()
>     data = line.split(...)
>
>     # create a new element node using data from file
>     node = doc.createElement(...)
>     node.setAttribute(...)
>     node.appendChild(...)
>     docelement.appendChild(node)
This would end up looking more like:

    import xml.sax.writer

    writer = xml.sax.writer.XmlWriter(f)
    while 1:
        # get data from file
        ...
        # write new element to output:
        writer.startElement("item", {"attr": value})
        writer.characters(data)
        writer.endElement("item")
        writer.characters("\n")  # record separator, unless you're
                                 # using the PrettyPrinter version
    f.close()

> So basically, is there a lightweight XML module which provides for (as a > graphics programmer would say) "immediate mode" output, with as nice an > interface as the DOM modules? Oh, and BTW, can XML solve all my > problems??? ;-) XML is an acronym, and as everyone knows, acronyms solve problems. All of them. So, yes, life will be perfect with your new-found TLA. ;) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:19:42 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 22:19:42 +0200 Subject: [XML-SIG] building XML docs using ? In-Reply-To: <3B00224D.AFB2057D@agyinc.com> (message from Joe Murray on Mon, 14 May 2001 11:22:05 -0700) References: <3B00224D.AFB2057D@agyinc.com> Message-ID: <200105142019.f4EKJgR05670@mira.informatik.hu-berlin.de> > I am converting many large "legacy" text files to XML. Some of the > original text files are upwards of 100 MB. What is the most efficient, > using the speed/memory metrics, way to convert these text files to XML? The less markup, the less the memory overhead, and the faster the processing. So if you have a plain text file with contents XXX, the most efficient XML document you could get (from the viewpoint of parsing speed) is <plaintext>XXX</plaintext> Provided there is no markup in XXX, this is also the smallest XML document storing all bytes of XXX :-) > Currently, I parse through the text files and create a DOM Document > representation. Ah, so you are apparently bound by some DTD. In that case, it very much depends on how complex the transformation is.
> node = doc.createElement(...)
> node.setAttribute(...)
> node.appendChild(...)
> docelement.appendChild(node)
So you create one element per line, in a single pass over the file? That is quite a simple conversion procedure. > Should I forgo the ease of using the DOM objects by simply > outputting "hand-generated" markup? Yes, definitely. > I was doing this previously, it's efficient, but definitely not as > nice/clean as it could be... Why is that? If you create the right template for a single line, e.g.

    template = '<elem attr1="%d" attr2="%s">%s</elem>'

then a simple print statement would suffice to fill out this template. This also makes a nice separation of structure and content. > So basically, is there a lightweight XML module which provides for (as a > graphics programmer would say) "immediate mode" output, with as nice an > interface as the DOM modules? You could use the SAX interfaces, essentially implementing a Reader class, and using an xml.sax.saxutils.XMLGenerator as the content handler. Then, you'd do proper startElement and endElement calls; the XMLGenerator will do immediate output. > Oh, and BTW, can XML solve all my problems??? ;-) Almost. To get rich quick, you still need to write chain letters :-) Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:21:31 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 22:21:31 +0200 Subject: [XML-SIG] building XML docs using ? In-Reply-To: <15104.12493.360699.521399@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <3B00224D.AFB2057D@agyinc.com> <15104.12493.360699.521399@cj42289-a.reston1.va.home.com> Message-ID: <200105142021.f4EKLVb05674@mira.informatik.hu-berlin.de> > This would end up looking more like: > > writer = xml.sax.writer.XmlWriter(f) That's a SAX1 class, right? The SAX2 class is xml.sax.saxutils.XMLGenerator. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:09:50 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 14 May 2001 22:09:50 +0200 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> Message-ID: <200105142009.f4EK9oP05647@mira.informatik.hu-berlin.de> > Putting such a method on the node makes the most sense, if the > method makes sense at all. This allows different classes within an > implementation to do the right thing without the dispatching overhead, > and makes the most sense for implementations which can be subclassed. I agree. Making it a non-method is a suggestion you might get from a C++ programmer; the C++ equivalent, "delete this;", is bad style since you might run a method of the object that is being destroyed. Of course, in Python, this is not a problem. > I am a little concerned about the method, however, because I see two > different possibilities. One is the "I don't need you anymore; don't > bother me" option (equivalent to DECREF), and the other is "Break all > your internal links and die", equivalent to the minidom .unlink() > method. I can't understand the value of the first option. If you don't need an Element or a document anymore which somebody else might be holding onto, you can just drop it, right? > From the discussion so far, I'm getting the sense that the > latter is what is being discussed, and this is not always > appropriate. To build DOM trees to use with the XPath/XSLT engines, > would I need to provide an empty .releaseNode(), since the DOM trees > are persistent and have lifetimes far beyond the individual use for > them with a specific transformation? Not necessarily. Currently, 4XSLT uses ReleaseNode e.g. 
to release a style sheet, in a data flow: - read the style sheet using the StylesheetReader from an XML document (i.e. a byte stream) - process the style sheet - release it Another application is with result tree fragments: when instantiating an element, nodes get cloned over and over, and temporary results need to be released. There may be also cases where 4XSLT releases elements it did not create; I'd consider that a bug. I don't think we should introduce explicit reference counters for documents or some such; we should strive for less memory management, not more. Regards, Martin From fdrake@acm.org Mon May 14 21:26:24 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 14 May 2001 16:26:24 -0400 (EDT) Subject: [XML-SIG] building XML docs using ? In-Reply-To: <200105142021.f4EKLVb05674@mira.informatik.hu-berlin.de> References: <3B00224D.AFB2057D@agyinc.com> <15104.12493.360699.521399@cj42289-a.reston1.va.home.com> <200105142021.f4EKLVb05674@mira.informatik.hu-berlin.de> Message-ID: <15104.16240.140204.456352@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > That's a SAX1 class, right? The SAX2 class is > xml.sax.saxutils.XMLGenerator. That's right. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From fdrake@acm.org Mon May 14 21:37:09 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 14 May 2001 16:37:09 -0400 (EDT) Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <200105142009.f4EK9oP05647@mira.informatik.hu-berlin.de> References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> <200105142009.f4EK9oP05647@mira.informatik.hu-berlin.de> Message-ID: <15104.16885.755115.164847@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > I can't understand the value of the first option. 
If you don't need an > Element or a document anymore which somebody else might be holding > onto, you can just drop it, right? You can't do that in minidom without requiring cyclic GC, and that's not available for all projects thanks to users of legacy Python versions. I'm really learning to dislike Python 1.5.2. ;-( > Not necessarily. Currently, 4XSLT uses ReleaseNode e.g. to release a > style sheet, in a data flow: > - read the style sheet using the StylesheetReader from an XML document > (i.e. a byte stream) > - process the style sheet > - release it > > Another application is with result tree fragments: when instantiating > an element, nodes get cloned over and over, and temporary results need > to be released. OK, this makes sense. As long as it only releases nodes that it creates and does not use as part of the result, that's fine. As long as I can create a stylesheet and store it as a persistent object, create and store a bunch of documents, and then process them over & over without damaging them, and make the results persistent and usable in the same fashion, I'm happy. ;-) > There may be also cases where 4XSLT releases elements it did not > create; I'd consider that a bug. Agreed! > I don't think we should introduce explicit reference counters for > documents or some such; we should strive for less memory management, > not more. Agreed as well. If we can rely on GC, then I'm all for it. I just wanted to be sure that we were clear on the semantics of .releaseNode(), since it has a large potential for disaster. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From larsga@garshol.priv.no Mon May 14 22:57:13 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 14 May 2001 23:57:13 +0200 Subject: [XML-SIG] building XML docs using ? 
In-Reply-To: <3B00224D.AFB2057D@agyinc.com> References: <3B00224D.AFB2057D@agyinc.com> Message-ID: <m37kzjsifa.fsf@lambda.garshol.priv.no> * Joe Murray | | So basically, is there a lightweight XML module which provides for | (as a graphics programmer would say) "immediate mode" output, with | as nice an interface as the DOM modules? As Martin says SAX has the advantage that it does not store the entire document in memory and so can be used to write applications that operate with a fixed amount of memory (more or less). Unless your document structure is too complex I would go for this. minidom also has mechanisms that can be used to build only parts of the tree at a time and throw them away afterwards. This may or may not work for your processing. These mechanisms are not documented, either, so it may be tricky to get them to work. Pyxie also has support for building partial trees and discarding them as you go. As an additional benefit it has an API that, IMHO, is far nicer than the DOM API. It's unlikely to be very fast, though. | Oh, and BTW, can XML solve all my problems??? ;-) I'm afraid not. You'll need topic maps for that... :-) --Lars M. From tpassin@home.com Mon May 14 23:38:20 2001 From: tpassin@home.com (Thomas B. Passin) Date: Mon, 14 May 2001 18:38:20 -0400 Subject: [XML-SIG] building XML docs using ? References: <3B00224D.AFB2057D@agyinc.com> <m37kzjsifa.fsf@lambda.garshol.priv.no> Message-ID: <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> [Lars Marius Garshol] > > * Joe Murray > ... > > Oh, and BTW, can XML solve all my problems??? ;-) > > I'm afraid not. You'll need topic maps for that... :-) > Hey, the man needs speed here .... 
:-) Tom P From Mike.Olson@fourthought.com Tue May 15 05:44:51 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 14 May 2001 22:44:51 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> Message-ID: <3B00B443.18219553@FourThought.com> "Fred L. Drake, Jr." wrote: > > Mike Olson writes: > > This is why I vote for either the implementation has the releaseNode > > function, or the node itself. > > I am a little concerned about the method, however, because I see two > different possibilities. One is the "I don't need you anymore; don't > bother me" option (equivalent to DECREF), and the other is "Break all > your internal links and die", equivalent to the minidom .unlink() > method. From the discussion so far, I'm getting the sense that the > latter is what is being discussed, and this is not always > appropriate. To build DOM trees to use with the XPath/XSLT engines, > would I need to provide an empty .releaseNode(), since the DOM trees > are persistent and have lifetimes far beyond the individual use for > them with a specific transformation? It depends on the interface into the XSLT/XPath engine. The way 4XSLT/4XPath is set up, if you pass us a DOM node to process, we won't touch it. It is your DOM node, your job to release it. However, if you call appendStylesheetUri (as an example) we create a DOM node, and we will release it when processing is done. Currently, you can call "setDocumentReader" on the 4XSLT processor to use anything that conforms to the Reader interface when fromUri, fromString, fromStream are called. We then call the corresponding releaseNode on the document reader to free the DOM tree when we are done with it. 
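The ownership rule described above (whoever creates the tree releases it; a node handed in by the caller is left alone) can be sketched roughly as follows. This is only an illustration built on xml.dom.minidom, not 4XSLT code: MinidomReader, count_elements, process_text and process_node are hypothetical names, and releaseNode here is assumed to do nothing more than call minidom's unlink().

```python
from xml.dom import minidom


class MinidomReader:
    """Hypothetical reader: builds minidom trees and knows how to free them."""

    def fromString(self, text):
        return minidom.parseString(text)

    def releaseNode(self, node):
        # For minidom, "release" means breaking the internal links.
        node.unlink()


def count_elements(node):
    """Stand-in for real processing: count element nodes below `node`."""
    total = 1 if node.nodeType == node.ELEMENT_NODE else 0
    for child in node.childNodes:
        total += count_elements(child)
    return total


def process_text(reader, text):
    """We build the tree ourselves, so we release it when done."""
    doc = reader.fromString(text)
    try:
        return count_elements(doc)
    finally:
        reader.releaseNode(doc)


def process_node(node):
    """The caller owns `node`; it is read but never released here."""
    return count_elements(node)
```

With this split, a caller can keep a long-lived document, pass it to process_node() as often as needed, and decide for itself when (or whether) to unlink it.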
So, I guess I still see plenty of cases where "unlink" makes sense. When would you want to use the DECREF equiv.? Mike > > -Fred > > -- > Fred L. Drake, Jr. <fdrake at acm.org> > PythonLabs at Digital Creations -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From larsga@garshol.priv.no Tue May 15 08:17:10 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 15 May 2001 09:17:10 +0200 Subject: [XML-SIG] building XML docs using ? In-Reply-To: <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> References: <3B00224D.AFB2057D@agyinc.com> <m37kzjsifa.fsf@lambda.garshol.priv.no> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> Message-ID: <m34run3wuh.fsf@lambda.garshol.priv.no> * Lars Marius Garshol | | I'm afraid not. You'll need topic maps for that... :-) * Thomas B. Passin | | Hey, the man needs speed here .... :-) SMOO. :-) --Lars M. From rsalz@zolera.com Tue May 15 15:02:21 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 15 May 2001 10:02:21 -0400 Subject: [XML-SIG] Parsing namespace attributes (e.g., xml.dom.ext.GetAllNs) Message-ID: <3B0136ED.EC1EE700@zolera.com> According to my reading of the namespace spec, "xmlns" is not a namespace identifier, but is instead just lexically significant. Yet xml.dom (cf Document.py and ext/__init__.py) treats it as if it were a namespace, and uses it to find namespace nodes. Is that just an implementation technique? Where is the "xmlns" defined in a W3 recommendation? For example, in dom/__init__.py: XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/" I can't find that value in W3C docs -- what am I missing? I'm asking for a couple of reasons. First, I might be missing something in the specs. Second, I need to add this to xml/ns.py if it's really there, and third, it seems that if I'm right, then there's a (minor/obscure) bug. 
<tns:foo xmlns:tns="uri:zolera.com"
         xmlns="uri.zolera.com"
         xmlns:foo="http://www.w3.org/2000/xmlns/">
  <bar foo:tns="uri:example.com">
    <tns:testit>value</tns:testit>
  </bar>
</tns:foo>
What namespace is "testit" really in? I believe uri:zolera.com /r$ From fdrake@acm.org Tue May 15 15:06:01 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 15 May 2001 10:06:01 -0400 (EDT) Subject: [XML-SIG] Parsing namespace attributes (e.g., xml.dom.ext.GetAllNs) In-Reply-To: <3B0136ED.EC1EE700@zolera.com> References: <3B0136ED.EC1EE700@zolera.com> Message-ID: <15105.14281.196876.100997@cj42289-a.reston1.va.home.com> Rich Salz writes: > Where is the "xmlns" defined in a W3 recommendation? For example, in > dom/__init__.py: > XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/" > I can't find that value in W3C docs -- what am I missing? AFAICR, this is noted in the DOM Level 2 specification, with a note that it was an oversight in the Namespaces in XML recommendation that the W3C intends to correct in some future version. I haven't checked the errata for the Namespaces recommendation, however. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From rsalz@zolera.com Tue May 15 15:35:34 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 15 May 2001 10:35:34 -0400 Subject: [XML-SIG] Parsing namespace attributes (e.g., xml.dom.ext.GetAllNs) References: <3B0136ED.EC1EE700@zolera.com> <15105.14281.196876.100997@cj42289-a.reston1.va.home.com> Message-ID: <3B013EB6.A9EE8032@zolera.com> > AFAICR, this is noted in the DOM Level 2 specification Aha, found it. "Note: In the DOM, all namespace declaration attributes are by definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/". These are the attributes whose namespace prefix or qualified name is "xmlns". Although, at the time of writing, this is not part of the XML Namespaces specification [Namespaces], it is planned to be incorporated in a future revision." 
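The behaviour that note describes can be observed from a namespace-aware DOM. The following is a minimal sketch, assuming xml.dom.minidom (whose parser handles namespaces by default) and a simplified document of my own, based on the example earlier in the thread but without the xmlns:foo binding. It collects the namespace URI of each declaration attribute on the root and looks up the namespaces of "testit" and "bar".

```python
from xml.dom import minidom, XMLNS_NAMESPACE

# Simplified test document (an assumption for illustration only).
SOURCE = (
    '<tns:foo xmlns:tns="uri:zolera.com" xmlns="uri.zolera.com">'
    '<bar><tns:testit>value</tns:testit></bar>'
    '</tns:foo>'
)

doc = minidom.parseString(SOURCE)
root = doc.documentElement

# Map each attribute's qualified name to the namespace URI the DOM reports.
attrs = root.attributes
declarations = {}
for i in range(attrs.length):
    attr = attrs.item(i)
    declarations[attr.name] = attr.namespaceURI

# Elements are resolved through the declarations in scope.
testit = doc.getElementsByTagNameNS("uri:zolera.com", "testit")[0]
bar = doc.getElementsByTagName("bar")[0]

print(declarations)
print(testit.namespaceURI)
print(bar.namespaceURI)
```

In this sketch both declaration attributes should report xml.dom.XMLNS_NAMESPACE as their namespace URI, while "testit" resolves through the tns prefix and "bar" through the default namespace.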
I won't hold my breath waiting for a revision of the XML Namespace spec, which seems pretty clear that xmlns is lexical, so I'd anticipate a fight. :) Thanks. /r$ From fdrake@acm.org Tue May 15 16:40:26 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 15 May 2001 11:40:26 -0400 (EDT) Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3B00B443.18219553@FourThought.com> References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> <3B00B443.18219553@FourThought.com> Message-ID: <15105.19946.344121.580203@cj42289-a.reston1.va.home.com> Mike Olson writes: > It depends on the interface into the XSLT/XPath engine. They way > 4XSLT/4XPath is set up, if you pass us a DOM node to process, we won't > touch it. It is your DOM node, you job to release it. However, if you > call appendStylesheetUri (as an example) we create a DOM node, and we > will release it when processing is done. Currently, you can call > "setDocumentReader" on the 4XSLT processor to use anything that conforms > to the Reader interface when fromUri, fromString, fromStream are > called. We then call the coresponding releaseNode on the documetn > reader to free the DOM tree when we are done with it. This sounds pretty reasonable to me. > So, I guess I still see plenty of cases where "unlink" makes sense. > When would you want to use the DECREF equiv.? If you're using something that isn't GC friendly, such as minidom, you need explicit incref/decref machinery to be able to discard the document when it is no longer being used. This is less of an issue with the cycle detector introduced in "modern" Python releases, but is still a real problem with Python 1.5.2. And there are still a fair number of users of the older version, for a variety of reasons. -Fred -- Fred L. Drake, Jr. 
<fdrake at acm.org> PythonLabs at Digital Creations From Mike.Olson@fourthought.com Tue May 15 16:48:22 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 15 May 2001 09:48:22 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com> <3B00B443.18219553@FourThought.com> <15105.19946.344121.580203@cj42289-a.reston1.va.home.com> Message-ID: <3B014FC6.FCE8CE4F@FourThought.com> "Fred L. Drake, Jr." wrote: > > > If you're using something that isn't GC friendly, such as minidom, > you need explicit incref/decref machinery to be able to discard the > document when it is no longer being used. This is less of an issue > with the cycle detector introduced in "modern" Python releases, but is > still a real problem with Python 1.5.2. And there are still a fair > number of users of the older version, for a variety of reasons. So you're saying a smarter unlink: either flag that I am no longer using this document, or completely destroy it if I was the last external reference to the document. I think I see what you're saying. Mike > > -Fred > > -- > Fred L. Drake, Jr. <fdrake at acm.org> > PythonLabs at Digital Creations -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. 
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Tue May 15 17:24:17 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 15 May 2001 09:24:17 -0700 Subject: [XML-SIG] [ pyxml-Bugs-424260 ] error importing Xhtml2HtmlPrinter Message-ID: <E14zhcP-0001Ak-00@usw-sf-web2.sourceforge.net> Bugs item #424260, was updated on 2001-05-15 09:24 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=424260&group_id=6473 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: error importing Xhtml2HtmlPrinter Initial Comment:
>>> import xml.dom.ext.XHtml2HtmlPrinter
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/XHtml2HtmlPrinter.py", line 3, in ?
    from xml.dom.html import HTML_FORBIDDEN_END, XHTML_NAMESPACE
ImportError: cannot import name XHTML_NAMESPACE

Patch for the bug is:

--- XHtml2HtmlPrinter.py Tue Apr 24 20:31:42 2001
+++ /home/alf/XHtml2HtmlPrinter.py Tue May 15 18:18:18 2001
@@ -1,6 +1,7 @@
 import string
 import Printer
-from xml.dom.html import HTML_FORBIDDEN_END, XHTML_NAMESPACE
+from xml.dom.html import HTML_FORBIDDEN_END
+from xml.dom import XHTML_NAMESPACE
 
 class HtmlDocType:
     name = 'HTML'

Cheers Alexandre Fayolle (I could not log in, because it seems there's some problem with SF and their ssl server) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=424260&group_id=6473 From Alexandre.Fayolle@logilab.fr Tue May 15 17:41:33 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 15 May 2001 18:41:33 +0200 (CEST) Subject: [XML-SIG] Python newbie question Message-ID: <Pine.LNX.4.21.0105151838070.9347-100000@orion.logilab.fr> Hi there, I really feel dumb for 
asking this... Well, here it comes anyway. In xml.dom.ext.Xhtml2HtmlPrinter, there's the following statement: import Printer There's also a file called Printer in xml/dom/ext, but xml/dom/ext is not, as far as I know, in my PYTHONPATH. So how does this work (a pointer to the right page of TFM is fine by me)? TIA Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From fdrake@acm.org Tue May 15 18:06:36 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 15 May 2001 13:06:36 -0400 (EDT) Subject: [XML-SIG] pyexpat interface issue Message-ID: <15105.25116.646987.317835@cj42289-a.reston1.va.home.com> The pyexpat module defines two wrappers for handlers which are expected to return integers (NotStandaloneHandler and ExternalEntityRefHandler). What stands out about these handlers is that Expat is expecting a return value (the others have void returns). The wrappers will propagate an exception if one is raised by the Python handler implementation, but then assume that the return value is actually an integer. They use PyInt_AsLong() to convert the return value to an integer, but don't check the return value: if PyInt_AsLong() returns -1 and PyErr_Occurred() is non-NULL, a TypeError was raised by PyInt_AsLong() because the value passed to it was not an integer object. The -1 will be passed to Expat, which will happily continue parsing since it expects a false value to tell it to stop parsing. This has been this way for a while. Should the documentation for these interfaces be modified to reflect this (strange) behavior, with some code cleanup to avoid having unused exception state lying around (which *can* show up later in unrelated code), or should the implementation be fixed to propagate the exception, or something else? I'm concerned that changing the actual behavior will adversely affect existing code that uses pyexpat. -Fred -- Fred L. Drake, Jr. 
<fdrake at acm.org> PythonLabs at Digital Creations From jmurray@agyinc.com Tue May 15 18:17:41 2001 From: jmurray@agyinc.com (Joe Murray) Date: Tue, 15 May 2001 10:17:41 -0700 Subject: [XML-SIG] building XML docs using ? References: <3B00224D.AFB2057D@agyinc.com> <m37kzjsifa.fsf@lambda.garshol.priv.no> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> Message-ID: <3B0164B5.BFB9EB46@agyinc.com> Thanks to everyone for their helpful responses. And to probe even further, into this technology that will "solve all my problems"... "Thomas B. Passin" wrote: > > [Lars Marius Garshol] > > > > * Joe Murray > > ... > > > Oh, and BTW, can XML solve all my problems??? ;-) > > > > I'm afraid not. You'll need topic maps for that... :-) > > > Hey, the man needs speed here .... :-) So, with regard to speed, is there an XSLT processor (python or not) which takes a SAX-like event-driven approach to transforming XML? I know this doesn't deal fully with the dynamicity of an XSL doc, but it would be useful. I checked some old xml-dev, xml-sig... I can't vouch for the people who were discussing such a processor and given the fact that most of the posts were circa 1999... I couldn't find a straightforward answer. Does Sablotron support this? It seems as if the Oracle XML parser packages do... but after some surfin', I ain't certain... "Martin v. Loewis" wrote: > > Should I forgo the ease of using the DOM objects by simply > > outputting "hand-generated" markup? > > Yes, definitely. > > > I was doing this previously, it's efficient, but definitely not as > > nice/clean as it could be... > > Why is that? If you create the right template for a single line, e.g. > > template = '<elem attr1="%d" attr2="%s">%s</elem>' > > then a simple print statement would suffice to fill out this template. > This also makes a nice separation of structure and content. Indeed, this is the route I have gone. I'm using xml.sax.saxutils.escape, a handy function, in lieu of the SAX writer interfaces. 
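The template-plus-escape route might look something like the sketch below. It is only an illustration under stated assumptions: the tab-separated record layout, the element and attribute names, and the helpers line_to_xml and convert are all hypothetical. The two library calls, escape and quoteattr, come from xml.sax.saxutils.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical record layout: "id<TAB>name<TAB>free text".
# quoteattr() supplies the quotes, so the template has none of its own.
TEMPLATE = "<elem attr1=%s attr2=%s>%s</elem>"


def line_to_xml(line):
    """Turn one tab-separated input line into one element."""
    ident, name, text = line.rstrip("\n").split("\t", 2)
    # quoteattr() quotes and escapes attribute values;
    # escape() handles &, < and > in character data.
    return TEMPLATE % (quoteattr(ident), quoteattr(name), escape(text))


def convert(lines, out):
    """Write one element per input line inside a single root element."""
    out.write("<doc>\n")
    for line in lines:
        out.write(line_to_xml(line) + "\n")
    out.write("</doc>\n")
```

Since convert() only needs an iterable of lines and an object with write(), an open input file and an open output file can be passed directly, and memory use stays flat however large the input is.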
All you guys are a helpful bunch! Regards, joe -- Joseph Murray Bioinformatics Specialist, AGY Therapeutics 290 Utah Avenue, South San Francisco, CA 94080 (650) 228-1146 From larsga@garshol.priv.no Tue May 15 18:35:29 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 15 May 2001 19:35:29 +0200 Subject: [XML-SIG] building XML docs using ? In-Reply-To: <3B0164B5.BFB9EB46@agyinc.com> References: <3B00224D.AFB2057D@agyinc.com> <m37kzjsifa.fsf@lambda.garshol.priv.no> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> <3B0164B5.BFB9EB46@agyinc.com> Message-ID: <m38zjya526.fsf@lambda.garshol.priv.no> * Joe Murray | | So, with regard to speed, is there an XSLT processor (python or not) | which take a SAX-like event-driven approach to transforming XML? Currently there is not, and part of the reason for that is that some parts of XSLT require the entire document to be available to the processor at the same time. If you use only a subset of XSLT one can use an event-based approach, but currently nobody has implemented anything like this. However, SAXON has some extensions that can enable you to build only parts of the tree at a time. This puts some constraints on what you are able to do, but you may be able to live with it. | Does Sablotron support this? It does not. | It seems as if the Oracle XML parsers packages do... but after some | surfin', I ain't certain... I don't think they do, though there is a chance that I might be wrong. You should in any case distinguish carefully between XML parsers (nearly all of which have event-based interfaces) and XSLT engines. --Lars M. From martin@loewis.home.cs.tu-berlin.de Tue May 15 19:43:04 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 15 May 2001 20:43:04 +0200 Subject: [XML-SIG] Python newbie question In-Reply-To: <Pine.LNX.4.21.0105151838070.9347-100000@orion.logilab.fr> (message from Alexandre Fayolle on Tue, 15 May 2001 18:41:33 +0200 (CEST)) References: <Pine.LNX.4.21.0105151838070.9347-100000@orion.logilab.fr> Message-ID: <200105151843.f4FIh4Z01461@mira.informatik.hu-berlin.de> > There's also a file called Printer in xml/dom/ext, but xml/dom/ext is not, > as far as I know, in my PYTHONPATH. So how does this work (a pointer to > the right page of TFM is fine by me)? I don't think the package import procedure is documented anywhere; the best you can get is http://www.python.org/doc/essays/packages.html For your specific question, see Intra-package References. Regards, Martin From tpassin@home.com Wed May 16 00:55:15 2001 From: tpassin@home.com (Thomas B. Passin) Date: Tue, 15 May 2001 19:55:15 -0400 Subject: [XML-SIG] building XML docs using ? References: <3B00224D.AFB2057D@agyinc.com> <m37kzjsifa.fsf@lambda.garshol.priv.no> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> <3B0164B5.BFB9EB46@agyinc.com> Message-ID: <002801c0dd9a$7d011960$7cac1218@reston1.va.home.com> [Joe Murray] > So, with regard to speed, is there an XSLT processor (python or not) > which take a SAX-like event-driven approach to transforming XML? I know > this doesn't deal fully with the dynamicity of an XSL doc, but it would > be useful. I checked some old xml-dev, xml-sig... I can't vouch for the > people who were discussing such a processor and given the fact that most > of the posts were circa 1999... I couldn't find a straightforward > answer. Does Sablotron support this? It seems as if the Oracle XML > parsers packages do... but after some surfin', I ain't certain... > > Some processors can do lazy evaluation and thereby avoid computing branches that aren't used in a particular transformation. I'm pretty sure Xalan does this, and I think Saxon can be asked to. 
Of course, if your source document and transform need to pull together nodes from all parts of the document, this won't help. Otherwise, some processors can ingest the xml via SAX as well as the/a DOM, but they then build their own DOM model. Cheers, Tom P From Alexandre.Fayolle@logilab.fr Wed May 16 10:06:03 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 16 May 2001 11:06:03 +0200 (CEST) Subject: [XML-SIG] Python newbie question In-Reply-To: <200105151843.f4FIh4Z01461@mira.informatik.hu-berlin.de> Message-ID: <Pine.LNX.4.21.0105161103550.10884-100000@orion.logilab.fr> On Tue, 15 May 2001, Martin v. Loewis wrote: > > There's also a file called Printer in xml/dom/ext, but xml/dom/ext is not, > > as far as I know, in my PYTHONPATH. So how does this work (a pointer to > > the right page of TFM is fine by me)? > > I don't think the package import procedure is documented anywhere; the > best you can get is > > http://www.python.org/doc/essays/packages.html Thanks for this very interesting pointer. It clarifies a number of notions for which I only had an intuitive grasp. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From uche.ogbuji@fourthought.com Thu May 17 07:59:15 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 17 May 2001 00:59:15 -0600 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> of "Mon, 14 May 2001 09:26:46 +0200." <200105140726.f4E7QkI01878@mira.informatik.hu-berlin.de> Message-ID: <200105170659.f4H6xFF13604@localhost.local> > >> It seems to me that all StylesheetReader does is to > >> create a DOM tree, except that it creates StylesheetElement nodes > >> where a normal DOM build would create Element nodes. > > > Wow. I'd count this a huge oversimplification. 
The Stylesheet reader > > does a great deal that most readers needn't worry about, as I'd think > > would be obvious from a glance at te code. > > I'd like to discuss specific aspects, then. Looking at the current > public CVS, I see: > > fromStream: duplicates ReaderMixin.fromStream, then adds call to > sheet.setup(), and some error handling > > initParser: duplicates PyExpatReader.initParser. It uses > Utf8OnlyHandler sometimes, but I could not find that class. > > _completeTextNode: creates LiteralText instead of Text nodes. Also does > not deal with top_node, but I'm not sure whether this is on > purpose > > _initializeSheet: has no equivalent elsewhere > _handleExtUris: has no equivalent elsewhere > > processingInstruction: Does *not* create PI nodes > comment: Likewise > > startElement: great similarities with Handler.startElement. The significant > differences seem to be: > - creates element nodes based on g_mappings[nsuri][localname], > extension tables, or creates LiteralElement > - processes xsl:include somehow (?) > - passes attributes through _handleExtUris for xsl:stylesheet > > endElement: great overload with Handler.endElement; I could not tell > whether differences are on purpose or by mistake > > characters: does not deal with _includeDepth and force8Bit (again, this > might be by mistake) > > Did I miss aspects of the functionality relevant to proper operation > of the StylesheetReader? > > So all in all, it still seems to me that the essential difference is > what nodes are created; the control logic and parsing data structures > seem to be duplicates of the code found in the handler. > > That, in turn, suggests that using a standard DOM builder with a > different DOM implementation would achieve the same effect. There is a lot of state that the StylesheetReader manages that other readers don't. This would be very cumbersome to shoe-horn into a standard DOM reader. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu May 17 08:03:07 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 17 May 2001 01:03:07 -0600 Subject: [XML-SIG] building XML docs using ? In-Reply-To: Message from Joe Murray <jmurray@agyinc.com> of "Mon, 14 May 2001 11:22:05 PDT." <3B00224D.AFB2057D@agyinc.com> Message-ID: <200105170703.f4H738F13626@localhost.local> > Dear All, > > I am converting many large "legacy" text files to XML. Some of the > original text files are upwards of 100 MB. What is the most efficient, > using the speed/memory metrics, way to convert these text files to XML? 1) Using SAX 2) Cutting the output docs to reasonable size I can guarantee you you want nothing to do with XML files in the hundreds of MB. You don't even want them in the MB, period. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu May 17 08:04:48 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 17 May 2001 01:04:48 -0600 Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org> of "Mon, 14 May 2001 16:37:09 EDT." <15104.16885.755115.164847@cj42289-a.reston1.va.home.com> Message-ID: <200105170704.f4H74mr13635@localhost.local> > > There may be also cases where 4XSLT releases elements it did not > > create; I'd consider that a bug. > > Agreed! I'm not aware of any such case. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tony.mcdonald@ncl.ac.uk Thu May 17 08:18:29 2001 From: tony.mcdonald@ncl.ac.uk (Tony McDonald) Date: Thu, 17 May 2001 08:18:29 +0100 Subject: [XML-SIG] Advice needed: RTF->XML conversions Message-ID: <B72939D4.81BA%tony.mcdonald@ncl.ac.uk> Hi all, I'm currently using Omnimark to convert RTF files into a usable form of XML, ready for uploading into our SQL database. Omnimark is no longer free, so this means I can't pass on our software to other HE institutions in the UK. Can anyone suggest some (preferably python based) tools I can use to get from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) to an XML form? If someone has written something that takes that (dreadful) 'XML' output that Word 2001 outputs and cleans it up into valid XML that would be a great start for me. Many thanks Tone. -- Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/ The Medical School, Newcastle University Tel: +44 191 243 6140 A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope From Alexandre.Fayolle@logilab.fr Thu May 17 08:49:08 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 17 May 2001 09:49:08 +0200 (CEST) Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <B72939D4.81BA%tony.mcdonald@ncl.ac.uk> Message-ID: <Pine.LNX.4.21.0105170945050.11584-100000@leo.logilab.fr> On Thu, 17 May 2001, Tony McDonald wrote: > Can anyone suggest some (preferably python based) tools I can use to get > from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) > to an XML form? > > If someone has written something that takes that (dreadful) 'XML' output > that Word 2001 outputs and cleans it up into valid XML that would be a great > start for me. 
I don't have a coded solution, but if I were to do such thing, I'd use the Automation interface of Word together with python's COM interface on windows to have Word parse the document for me using the various iterators available in the Word Document interface and building my own XML. This can be very simple if your document only uses the basic styles in word (title 1, text body, toc... [I don't know the english names, only guessing here]), or dreadful if your document features images, tables, floating text sections, etc. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From tony.mcdonald@ncl.ac.uk Thu May 17 09:14:34 2001 From: tony.mcdonald@ncl.ac.uk (Tony McDonald) Date: Thu, 17 May 2001 09:14:34 +0100 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <Pine.LNX.4.21.0105170945050.11584-100000@leo.logilab.fr> Message-ID: <B72946F8.81CA%tony.mcdonald@ncl.ac.uk> On 17/5/01 8:49 am, "Alexandre Fayolle" <Alexandre.Fayolle@logilab.fr> wrote: > On Thu, 17 May 2001, Tony McDonald wrote: > >> Can anyone suggest some (preferably python based) tools I can use to get >> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) >> to an XML form? >> >> If someone has written something that takes that (dreadful) 'XML' output >> that Word 2001 outputs and cleans it up into valid XML that would be a great >> start for me. > > I don't have a coded solution, but if I were to do such thing, I'd use the > Automation interface of Word together with python's COM interface on > windows to have Word parse the document for me using the various iterators > available in the Word Document interface and building my own XML. > We have very little experience of doing things this way - we're a Unix and Zope shop and try not to get too involved with the inner workings of Microsoft software (if at all possible). 
> This can be very simple if your document only uses the basic styles in > word (title 1, text body, toc... [I don't know the english names, only > guessing here]), or dreadful if your document features images, tables, > floating text sections, etc. > > Alexandre Fayolle Thanks for the advice Alexandre, but it's the latter case I'm afraid :( Our documents have tables, images, superscripts/subscripts, greek characters (ie simple formulas), page breaks and more besides. Cheers Tone. -- Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/ The Medical School, Newcastle University Tel: +44 191 243 6140 A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope From Mike.Olson@fourthought.com Thu May 17 09:24:37 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 17 May 2001 02:24:37 -0600 Subject: [XML-SIG] Advice needed: RTF->XML conversions References: <B72946F8.81CA%tony.mcdonald@ncl.ac.uk> Message-ID: <3B038AC5.B205328F@FourThought.com> Tony McDonald wrote: > > On 17/5/01 8:49 am, "Alexandre Fayolle" <Alexandre.Fayolle@logilab.fr> > wrote: > > > On Thu, 17 May 2001, Tony McDonald wrote: > > > >> Can anyone suggest some (preferably python based) tools I can use to get > >> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) > >> to an XML form? Can you send me a sample of the word XML output, and the format your looking for. You can probably do it with a stylesheet as long as what word spits out really is XML. Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From larsga@garshol.priv.no Thu May 17 09:45:05 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 May 2001 10:45:05 +0200 Subject: [XML-SIG] building XML docs using ? 
In-Reply-To: <200105170703.f4H738F13626@localhost.local> References: <200105170703.f4H738F13626@localhost.local> Message-ID: <m3eltotlda.fsf@lambda.garshol.priv.no> * Uche Ogbuji | | I can guarantee you you want nothing to do with XML files in the | hundreds of MB. You don't even want them in the MB, period. Why ever not? I've worked with lots of XML files of that size over the last years and see nothing wrong with that. If the amount of data you need to move around or work with is large, then your XML documents will be large. I see no reason why this should be considered somehow suspect or wrong. If you use SAX there is really no reason why you shouldn't be able to handle such documents. --Lars M. From larsga@garshol.priv.no Thu May 17 09:48:20 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 May 2001 10:48:20 +0200 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <B72939D4.81BA%tony.mcdonald@ncl.ac.uk> References: <B72939D4.81BA%tony.mcdonald@ncl.ac.uk> Message-ID: <m3d798tl7v.fsf@lambda.garshol.priv.no> * Tony McDonald | | Can anyone suggest some (preferably python based) tools I can use to get | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) | to an XML form? These are the only ones I know of: <URL: http://www.garshol.priv.no/download/xmltools/prod/RTF2XML.html > <URL: http://www.garshol.priv.no/download/xmltools/prod/Majix.html > --Lars M. From mal@lemburg.com Thu May 17 11:12:40 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 12:12:40 +0200 Subject: [XML-SIG] Advice needed: RTF->XML conversions References: <B72939D4.81BA%tony.mcdonald@ncl.ac.uk> <m3d798tl7v.fsf@lambda.garshol.priv.no> Message-ID: <3B03A418.5871B67@lemburg.com> Lars Marius Garshol wrote: > > * Tony McDonald > | > | Can anyone suggest some (preferably python based) tools I can use to get > | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) > | to an XML form? 
> > These are the only ones I know of: > <URL: http://www.garshol.priv.no/download/xmltools/prod/RTF2XML.html > > <URL: http://www.garshol.priv.no/download/xmltools/prod/Majix.html > If you want to invest some time, you may want to look at the RTF.py example in mxTextTools (see Python Software link below) and extend it to whatever you need as basis for generating XML from the RTF input. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From DShriyash@pun.cognizant.com Thu May 17 11:35:30 2001 From: DShriyash@pun.cognizant.com (Shriyash, Divekar (CTS)) Date: Thu, 17 May 2001 16:05:30 +0530 Subject: [XML-SIG] Small problem in XML parsing Message-ID: <49532EE860A3D411812A00508B690B29FBC264@ctsinpunsxua> This is a multi-part message in MIME format. --------------InterScan_NT_MIME_Boundary Content-Type: text/plain; charset="iso-8859-1"
Hi Folks,
I have a small problem with XML parsing. I wish to append a new element to my XML file without re-creating the elements that are already there. The general methodology is to first remove all the available tags and then create the required elements again with 'document.createElement'. My requirement is to point to an already available element and append a new child to it, e.g.
<security-role-assignment>
<role-name> New Role</role-name>
<principal-name> abc </principal-name>
<principal-name> def </principal-name>
---------
---------
</security-role-assignment>
Here, <principal-name> <value> </principal-name> entries will keep being added. I wish to point to the already available '<role-name> New Role</role-name>' element and then append new principals to it. XML does not differentiate between <role-name> and <principal-name> tags. This may look like a very simple problem, but it is costing us quite a bit of effort. We would be very happy if anybody could throw some light on it.
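A minimal sketch of one way to approach this, assuming the standard-library xml.dom.minidom (other DOM implementations that offer getElementsByTagName and appendChild should behave similarly): locate the existing <security-role-assignment> by the text of its <role-name> and append a new <principal-name> to it, leaving the existing children untouched. The helper name add_principal and the sample principal 'ghi' are made up for illustration.

```python
from xml.dom import minidom

SAMPLE = """<security-role-assignment>
  <role-name>New Role</role-name>
  <principal-name>abc</principal-name>
  <principal-name>def</principal-name>
</security-role-assignment>"""


def add_principal(doc, role, principal):
    """Append a <principal-name> to the assignment whose <role-name> matches.

    Returns True if a matching assignment was found, False otherwise.
    (Hypothetical helper, not part of any library.)
    """
    for assignment in doc.getElementsByTagName("security-role-assignment"):
        role_nodes = assignment.getElementsByTagName("role-name")
        if not role_nodes:
            continue
        # Collect the text content of the existing <role-name> element.
        text = "".join(
            child.data
            for child in role_nodes[0].childNodes
            if child.nodeType == child.TEXT_NODE
        ).strip()
        if text == role:
            # Only the new child is created; existing nodes stay in place.
            new_node = doc.createElement("principal-name")
            new_node.appendChild(doc.createTextNode(principal))
            assignment.appendChild(new_node)
            return True
    return False


doc = minidom.parseString(SAMPLE)
add_principal(doc, "New Role", "ghi")
names = [n.firstChild.data for n in doc.getElementsByTagName("principal-name")]
print(names)
```

The new element still has to be created once, but nothing that already exists in the tree is removed or rebuilt.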
Thanx in advance Regards Shri --------------InterScan_NT_MIME_Boundary-- From tony.mcdonald@ncl.ac.uk Thu May 17 12:05:39 2001 From: tony.mcdonald@ncl.ac.uk (Tony McDonald) Date: Thu, 17 May 2001 12:05:39 +0100 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <m3d798tl7v.fsf@lambda.garshol.priv.no> Message-ID: <B7296CAA.81FE%tony.mcdonald@ncl.ac.uk> On 17/5/01 9:48 am, "Lars Marius Garshol" <larsga@garshol.priv.no> wrote: > > * Tony McDonald > | > | Can anyone suggest some (preferably python based) tools I can use to get > | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML pages) > | to an XML form? > > These are the only ones I know of: > <URL: http://www.garshol.priv.no/download/xmltools/prod/RTF2XML.html > > <URL: http://www.garshol.priv.no/download/xmltools/prod/Majix.html > > > --Lars M. > Thanks for that Lars, However, the first program is based on Omnimark (it's actually what I'm using now), and the second is a Java based program, and I *think* the java program I've mentioned in my other post (wh2fo) does a good enough job to get initially to XML. I still need to do my other machinations on the resultant XML however. Thanks Tone.
-- Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/ The Medical School, Newcastle University Tel: +44 191 243 6140 A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope From tony.mcdonald@ncl.ac.uk Thu May 17 12:05:39 2001 From: tony.mcdonald@ncl.ac.uk (Tony McDonald) Date: Thu, 17 May 2001 12:05:39 +0100 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <3B038AC5.B205328F@FourThought.com> Message-ID: <B7296E06.81FE%tony.mcdonald@ncl.ac.uk> On 17/5/01 9:24 am, "Mike Olson" <Mike.Olson@fourthought.com> wrote: > Tony McDonald wrote: >> >> On 17/5/01 8:49 am, "Alexandre Fayolle" <Alexandre.Fayolle@logilab.fr> >> wrote: >> >>> On Thu, 17 May 2001, Tony McDonald wrote: >>> >>>> Can anyone suggest some (preferably python based) tools I can use to get >>>> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML >>>> pages) >>>> to an XML form? > > Can you send me a sample of the word XML output, and the format your > looking for. You can probably do it with a stylesheet as long as what > word spits out really is XML. > > Mike > Thanks for the offer Mike - I *was* under the impression that what word spat out was not real XML, but I found this (sorry, Java) based program; http://www-uk.hpl.hp.com/people/fabgia/wh2fo/wh2fo.html which generates XML and XSL from the html files that word 2000 generates. 
Frankly, I'm amazed, as I thought that constructs such as
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta name=Title content="TRAUMA &amp; BURNS">
<meta name=Keywords content="">
<meta http-equiv=Content-Type content="text/html; charset=macintosh">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="hepatology%20resource%20day_f_files/filelist.xml">
<link rel=Edit-Time-Data href="hepatology%20resource%20day_f_files/editdata.mso">
<link rel=OLE-Object-Data href="hepatology%20resource%20day_f_files/oledata.mso">
where attributes aren't quoted or, if they are, they're quoted with " or ' inconsistently, were very bad XML. I guess I was wrong. I still need to do some work with the XML that the above program uses and would like to use Python for that as I'm *far* more comfortable with it than Java. If I've read you right, are you saying I could apply a stylesheet to this XML to get to my output XML which is then ok for my (finally!) python based sgmlop processor that makes SQL? If so, I'll be very happy indeed! Essentially I need to 'stack' the headings in the original document so that this;
Heading 1 "Title"
heading 2 "Overview"
heading 3 "Core Content"
heading 2 "Theme 1"
Goes to
<topic type="heading1" content="Title">
<topic type="heading2" content="Overview">
<topic type="heading3" content="Core Content">
</topic>
</topic>
<topic type="heading2" content="Theme 1">
</topic>
</topic>
If you're saying that I can use XSL stylesheets to get this to work, then I need to do some reading! Thanks for the comments, Tone.
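The heading 'stacking' described above can also be sketched directly in Python with a stack of open topics, as an alternative to a stylesheet. This is only an illustration under assumptions: the headings are taken to be already extracted as (level, title) pairs, the function name stack_headings and the wrapper element <topics> are hypothetical, and xml.dom.minidom from the standard library stands in for whatever DOM is actually in use.

```python
from xml.dom import minidom


def stack_headings(headings):
    """Nest (level, title) pairs into <topic> elements by heading level."""
    doc = minidom.Document()
    root = doc.createElement("topics")  # hypothetical wrapper element
    doc.appendChild(root)
    stack = [(0, root)]  # (level, element) for every topic still open
    for level, title in headings:
        # A new heading closes every open topic at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        topic = doc.createElement("topic")
        topic.setAttribute("type", "heading%d" % level)
        topic.setAttribute("content", title)
        stack[-1][1].appendChild(topic)
        stack.append((level, topic))
    return doc


headings = [(1, "Title"), (2, "Overview"), (3, "Core Content"), (2, "Theme 1")]
doc = stack_headings(headings)
print(doc.documentElement.toprettyxml(indent="  "))
```

The stack always holds the chain of currently open topics, so each heading is attached to the nearest open topic with a smaller level.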
-- Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/ The Medical School, Newcastle University Tel: +44 191 243 6140 A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope From tony.mcdonald@ncl.ac.uk Thu May 17 12:05:40 2001 From: tony.mcdonald@ncl.ac.uk (Tony McDonald) Date: Thu, 17 May 2001 12:05:40 +0100 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <3B03A418.5871B67@lemburg.com> Message-ID: <B7296E65.8200%tony.mcdonald@ncl.ac.uk> On 17/5/01 11:12 am, "M.-A. Lemburg" <mal@lemburg.com> wrote: > Lars Marius Garshol wrote: >> >> * Tony McDonald >> | >> | Can anyone suggest some (preferably python based) tools I can use to get >> | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as it's HTML >> pages) >> | to an XML form? >> >> These are the only ones I know of: >> <URL: http://www.garshol.priv.no/download/xmltools/prod/RTF2XML.html > >> <URL: http://www.garshol.priv.no/download/xmltools/prod/Majix.html > > > If you want to invest some time, you may want to look at the > RTF.py example in mxTextTools (see Python Software link below) > and extend it to whatever you need as basis for generating XML > from the RTF input. Thanks for the pointer Marc, I did look at the RTF.py files a while back, but at the time I was ok with Omnimark and the code was a bit over my head, so I had to put it on the back burner. Cheers Tone. 
-- Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/ The Medical School, Newcastle University Tel: +44 191 243 6140 A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope From larsga@garshol.priv.no Thu May 17 12:18:21 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 May 2001 13:18:21 +0200 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <B7296CAA.81FE%tony.mcdonald@ncl.ac.uk> References: <B7296CAA.81FE%tony.mcdonald@ncl.ac.uk> Message-ID: <m3wv7grzpe.fsf@lambda.garshol.priv.no> * Tony McDonald | | However, the first program is based on Omnimark (it's actually what I'm | using now), Uh - sorry, should have seen that. | and the second is a Java based program, and I *think* the java | program I've mentioned in my other post (wh2fo) does a good enough | job to get initially to XML. Thanks for that pointer. I've put it in the inbox to my site. | I still need to do my other machinations on the resultant XML however. Well, that's an ordinary XML processing job, so Python should have all the tools you need for that task. --Lars M. From tpassin@home.com Thu May 17 14:51:39 2001 From: tpassin@home.com (Thomas B. Passin) Date: Thu, 17 May 2001 09:51:39 -0400 Subject: [XML-SIG] Advice needed: RTF->XML conversions References: <B72939D4.81BA%tony.mcdonald@ncl.ac.uk> Message-ID: <002801c0ded8$810ba4a0$7cac1218@reston1.va.home.com> [Tony McDonald] > > If someone has written something that takes that (dreadful) 'XML' output > that Word 2001 outputs and cleans it up into valid XML that would be a great > start for me. > HTML-tidy has an option to clean up Word 2000 xml. You can get it from the W3C site, or in a GUI editor, as part of HTML-kit (free), from www.chami.org. 
Cheers, Tom P From rsalz@zolera.com Thu May 17 14:57:08 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 17 May 2001 09:57:08 -0400 Subject: [XML-SIG] XML Canonicalization Message-ID: <3B03D8B4.9108432D@zolera.com> This is a multi-part message in MIME format. --------------4C8E83122B2EF8C15F82E15C Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Someone had asked for code to do XML C14N (canonicalization) a couple of weeks ago. I finally got around to cleaning up my code; it's attached. I would be more than happy to add this to PyXML if there's interest. Since it operates on DOM nodes, perhaps xml.dom.utils ? I'd probably also need to upgrade the documentation -- the docstrings in the code should tell you all you need. Hope this helps -- looking forward to feedback. /r$ --------------4C8E83122B2EF8C15F82E15C Content-Type: text/plain; charset=us-ascii; name="c14n.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="c14n.py"

#! /usr/bin/env python
'''XML C14N

Perform XML Canonicalization.  Not fully conformant to the spec in a couple
of ways (mostly minor):
    Comments are always stripped
    Whitespace preservation/stripping not totally correct
    Processing Instruction nodes aren't handled
    The nodeset must start with an element and includes all descendants
Fixing the last one would be non-trivial.
'''

_copyright = '''Copyright 2001, Zolera Systems Inc.  All Rights Reserved.
Distributed under the terms of the Python 2.0 Copyright.'''

from xml.dom import Node
import re
import StringIO

_attrs = lambda E: E._get_attributes() or []
_children = lambda E: E._get_childNodes() or []
_sorter = lambda n1, n2: cmp(n1._get_nodeName(), n2._get_nodeName())

xmlns_base = "http://www.w3.org/2000/xmlns/"

class _implementation:
    # Handlers for each node, by node type.
    handlers = {}

    # pattern/replacement list for whitespace stripping.
    repats = (
        ( re.compile(r'^[ \t]+', re.MULTILINE), '' ),
        ( re.compile(r'[ \t]+$', re.MULTILINE), '' ),
        ( re.compile(r'[\r\n]+'), '\n' ),
    )

    def __init__(self, node, write, nsdict={}, stripspace=0):
        '''Create and run the implementation.'''
        if node._get_nodeType() != Node.ELEMENT_NODE:
            raise TypeError, 'Non-element node'
        self.write, self.ns_stack, self.stripspace = \
                write, [nsdict], stripspace
        self._do_element(node)
        self.ns_stack.pop()

    def _do_text(self, node):
        'Output a text node in canonical form.'
        # Escape '&' first so the '&' of '&#xD;' is not escaped again.
        s = node._get_data() \
                .replace("&", "&amp;") \
                .replace("<", "&lt;") \
                .replace(">", "&gt;") \
                .replace("\015", "&#xD;")
        if self.stripspace:
            for pat,repl in _implementation.repats:
                s = re.sub(pat, repl, s)
        if s: self.write(s)
    handlers[Node.TEXT_NODE] =_do_text
    handlers[Node.CDATA_SECTION_NODE] =_do_text

    def _do_pi(self, node):
        'Output a processing instruction in canonical form.'
        pass # XXX
    handlers[Node.PROCESSING_INSTRUCTION_NODE] =_do_pi

    def _do_comment(self, node):
        'Output a comment node in canonical form.'
        pass # XXX
    handlers[Node.COMMENT_NODE] =_do_comment

    def _do_attr(self, n, value):
        'Output an attribute in canonical form.'
        W = self.write
        W(' ')
        W(n)
        W('="')
        # Tab, LF and CR become hexadecimal character references.
        s = value \
                .replace("&", "&amp;") \
                .replace("<", "&lt;") \
                .replace('"', '&quot;') \
                .replace('\011', '&#x9;') \
                .replace('\012', '&#xA;') \
                .replace('\015', '&#xD;')
        W(s)
        W('"')

    def _do_element(self, node):
        'Output an element (and its children) in canonical form.'
        name = node._get_nodeName()
        parent_ns = self.ns_stack[-1]
        my_ns = { 'xmlns': parent_ns.get('xmlns', '') }
        W = self.write
        W('<')
        W(name)

        # Divide attributes to NS definitions and others.
        nsnodes, others = [], []
        for a in _attrs(node):
            if a._get_namespaceURI() == xmlns_base:
                nsnodes.append(a)
            else:
                others.append(a)

        # Namespace attributes: update dictionary; if not already
        # in parent, output it.
        nsnodes.sort(_sorter)
        for a in nsnodes:
            n = a._get_nodeName()
            if n == "xmlns:":
                key, n = "", "xmlns"
            else:
                key = a._get_localName()
            v = my_ns[key] = a._get_nodeValue()
            pval = parent_ns.get(key, None)
            if v != pval: self._do_attr(n, v)

        # Other attributes: sort and output.
        others.sort(_sorter)
        for a in others: self._do_attr(a._get_nodeName(), a._get_value())
        W('>')

        self.ns_stack.append(my_ns)
        for c in _children(node):
            handler = _implementation.handlers.get(c._get_nodeType(), None)
            if handler: handler(self, c)
        self.ns_stack.pop()
        W('</%s>' % (name,))
    handlers[Node.ELEMENT_NODE] =_do_element

def XMLC14N(node, output=None, **kw):
    '''Canonicalize a DOM element node and everything underneath it.
    Return the text; if output is specified then output.write will
    be called to output the text and the return value will be None.
    Keyword parameters:
        stripspace -- remove extra (almost all) whitespace from text nodes
        nsdict -- a dictionary of prefix/uri namespace entries assumed
            to exist in the surrounding context.
    '''
    if output:
        s = None
    else:
        output = s = StringIO.StringIO()
    _implementation(node, output.write,
        stripspace=kw.get('stripspace', 0),
        nsdict=kw.get('nsdict', {})
    )
    if s: return (s.getvalue(), s.close())[0]
    return None
    if s == None: return None
    ret = s.getvalue()
    s.close()
    return ret

if __name__ == '__main__':
    text = '''<SOAP-ENV:Envelope
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:spare='foo'
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <SOAP-ENV:Body xmlns='test-uri'><?MYPI spenser?>
        <Price xsi:type='xsd:integer'>34</Price> <!-- 0 -->
        <SOAP-ENC:byte>44</SOAP-ENC:byte> <!-- 1 -->
        <Name>This is the name</Name> <!-- 2 -->
        <n2><![CDATA[<greeting>Hello</greeting>]]></n2> <!-- 3 -->
        <n3 href='#zzz' xsi:type='SOAP-ENC:string'/> <!-- 4 -->
        <n64>a GVsbG8=</n64> <!-- 5 -->
        <SOAP-ENC:string>Red</SOAP-ENC:string> <!-- 6 -->
        <a2 href='#tri2'/> <!-- 7 -->
        <a2 xmlns:f='z' xmlns:aa='zz'><i xmlns:f='z'>12</i><t>rich salz</t></a2> <!-- 8 -->
        <xsd:hexBinary>3F2041</xsd:hexBinary> <!-- 9 -->
        <nullint xsi:nil='1'/> <!-- 10 -->
    </SOAP-ENV:Body>
    <z xmlns='myns' id='zzz'>The value of n3</z>
    <zz xmlns:spare='foo' xmlns='myns2' id='tri2'><inner>content</inner></zz>
</SOAP-ENV:Envelope>'''

    print _copyright
    from xml.dom.ext.reader import PyExpat
    reader = PyExpat.Reader()
    dom = reader.fromString(text)
    for e in _children(dom):
        if e._get_nodeType() != Node.ELEMENT_NODE: continue
        for ee in _children(e):
            if ee._get_nodeType() != Node.ELEMENT_NODE: continue
            print '\n', '=' * 60
            print XMLC14N(ee, nsdict={'spare':'foo'}, stripspace=1)
            print '-' * 60
            print XMLC14N(ee, stripspace=0)
            print '=' * 60

--------------4C8E83122B2EF8C15F82E15C-- From rsalz@zolera.com Thu May 17 15:13:48 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 17 May 2001 10:13:48 -0400 Subject: [XML-SIG] XML Canonicalization
References: <3B03D8B4.9108432D@zolera.com> Message-ID: <3B03DC9C.D12A6B91@zolera.com> Oops. I didn't save-file in the other window before I sent...
> def XMLC14N(node, output=None, **kw):
...
> if s: return (s.getvalue(), s.close())[0]
> return None
> if s == None: return None **
> ret = s.getvalue() **
> s.close() **
> return ret **
Obviously those last four lines can be deleted. From uche.ogbuji@fourthought.com Thu May 17 18:09:28 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 17 May 2001 11:09:28 -0600 Subject: [XML-SIG] building XML docs using ? In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no> of "17 May 2001 10:45:05 +0200." <m3eltotlda.fsf@lambda.garshol.priv.no> Message-ID: <200105171709.f4HH9SX17328@localhost.local> > > * Uche Ogbuji > | > | I can guarantee you you want nothing to do with XML files in the > | hundreds of MB. You don't even want them in the MB, period. > > Why ever not? I've worked with lots of XML files of that size over the > last years and see nothing wrong with that. If the amount of data you > need to move around or work with is large, then your XML documents > will be large. > > I see no reason why this should be considered somehow suspect or wrong. > If you use SAX there is really no reason why you shouldn't be able to > handle such documents. Why not? Because most XML handling tools are not very scalable, XSLT being the foremost example. Also because XML eliminates the need, which I think quite unnecessary, of storing mountains of data in a single file. Inclusion, transclusion, other linking mechanisms, and many tools are available for breaking XML into manageable packets. So, in my opinion, it's suspect *and* wrong to be dealing with 100MB XML files. Opinions of others might vary, of course. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu May 17 18:13:06 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 17 May 2001 11:13:06 -0600 Subject: [XML-SIG] XML Canonicalization In-Reply-To: Message from Rich Salz <rsalz@zolera.com> of "Thu, 17 May 2001 09:57:08 EDT." <3B03D8B4.9108432D@zolera.com> Message-ID: <200105171713.f4HHD6Z17352@localhost.local> > Someone had asked for code to do XML C14N (canonicalization) a couple of > weeks ago. I finally got around to cleaning up my code; it's attached. > > I would be more than happy to add this to PyXML if there's interest. > Since it operates on DOM nodes, perhaps xml.dom.utils ? I'd probably > also need to upgrade the documentation -- the docstrings in the code > should tell you all you need. Brilliant! I heartily vote for its inclusion in PyXML. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Thu May 17 19:15:13 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 20:15:13 +0200 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <3B038AC5.B205328F@FourThought.com> (message from Mike Olson on Thu, 17 May 2001 02:24:37 -0600) References: <B72946F8.81CA%tony.mcdonald@ncl.ac.uk> <3B038AC5.B205328F@FourThought.com> Message-ID: <200105171815.f4HIFDF01101@mira.informatik.hu-berlin.de> > Can you send me a sample of the word XML output, and the format your > looking for. You can probably do it with a stylesheet as long as what > word spits out really is XML. It isn't. Most notably, attribute values are not enclosed in quotes. I found that sgmlop can parse what word produces, though. 
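As a rough illustration of why a tolerant SGML/HTML-style parser copes with Word's output where an XML parser will not, here is a minimal sketch. It does not use sgmlop (whose API is not shown in this thread); instead it assumes the html.parser module from the current Python standard library as a stand-in, and the class name MetaCollector is made up for the example. The input lines are taken from the Word sample quoted earlier in the thread.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags, quoted or not."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs:
                self.meta[attrs["name"]] = attrs.get("content")


word_html = """<head>
<meta name=Title content="TRAUMA &amp; BURNS">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
</head>"""

collector = MetaCollector()
collector.feed(word_html)
collector.close()
print(collector.meta)
```

An XML parser would reject name=Title outright because the attribute value is not quoted; the tolerant parser simply reports it.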
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu May 17 20:06:53 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 21:06:53 +0200 Subject: [XML-SIG] XML Canonicalization In-Reply-To: <200105171713.f4HHD6Z17352@localhost.local> (message from Uche Ogbuji on Thu, 17 May 2001 11:13:06 -0600) References: <200105171713.f4HHD6Z17352@localhost.local> Message-ID: <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> > Brilliant! I heartily vote for its inclusion in PyXML. It's fine with me, too. Rich, could you please check it in? Thanks, Martin From rsalz@zolera.com Thu May 17 20:20:05 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 17 May 2001 15:20:05 -0400 Subject: [XML-SIG] XML Canonicalization References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> Message-ID: <3B042465.1DCA826D@zolera.com> > Rich, could you please check it in? Gladly. Just tell me where (xml.dom.utils?) and where are the docs that I should update. /r$ From uche.ogbuji@fourthought.com Thu May 17 20:27:31 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 17 May 2001 13:27:31 -0600 Subject: [XML-SIG] XML Canonicalization References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> Message-ID: <3B042623.157DD7F1@fourthought.com> "Martin v. Loewis" wrote: > > > Brilliant! I heartily vote for its inclusion in PyXML. > > It's fine with me, too. Rich, could you please check it in? Rich did ask about the best place to put it. He suggested xml.dom.utils, but I wonder if there's any prospect of generalizing it so that it would work with SAX streams. Based on his DOM ops, I guess probably not. So maybe xml.dom.ext.c14n I think this will be handy for RDF (parseType="literal", ya know). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From chris@hddesign.com Thu May 17 21:01:47 2001 From: chris@hddesign.com (Chris Meyers) Date: Thu, 17 May 2001 15:01:47 -0500 Subject: [XML-SIG] newbie question Message-ID: <20010517150147.A5471@hddesign.com> Ok I have been looking at PyXML for a couple of days now, and I still can't really find a good example of the basic stuff I need to do. I want to read in an XML file, traverse the tree and pull out information. For example I would like to go through this xml: <?xml version="1.0" encoding="UTF-8"?> <report> <data> <rec> <fld id="1">123</fld> <fld id="2">John></fld> <fld id="3">Smith></fld> </rec> </data> </report> >From this xml I would like to pull out the id attributes and the values from the <fld> elements. I can do this in jython with jdom easily enough, but I need to use python for my current application If someone could point me in the right direction as to where to look to find an example similar to what I am trying to do, I would really appreciate it. Thanks, Chris From martin@loewis.home.cs.tu-berlin.de Thu May 17 21:12:36 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 22:12:36 +0200 Subject: [XML-SIG] XML Canonicalization In-Reply-To: <3B042623.157DD7F1@fourthought.com> (message from Uche Ogbuji on Thu, 17 May 2001 13:27:31 -0600) References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com> Message-ID: <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de> > He suggested xml.dom.utils, but I wonder if there's any prospect of > generalizing it so that it would work with SAX streams. Based on his > DOM ops, I guess probably not. > > So maybe xml.dom.ext.c14n xml.dom.ext sounds better than xml.dom.utils, since I dislike packages with only a single module, and because it is also an extension. 
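For readers unsure what canonicalization buys in practice, here is a minimal sketch. It assumes a current Python (3.8 or later) and uses the standard library's `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0; it is not the `XMLC14N` code posted in this thread, and `same_document` is a hypothetical helper name.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

# Two spellings of the same document: attribute order, quote style
# and empty-element syntax all differ.
DOC_A = '<doc b="2" a="1"><empty/></doc>'
DOC_B = "<doc a='1' b='2'><empty></empty></doc>"


def same_document(left, right):
    """Compare two XML strings by their canonical form (hypothetical helper)."""
    return canonicalize(left) == canonicalize(right)
```

The two strings differ byte for byte, yet `same_document(DOC_A, DOC_B)` is true, which is the property that signing or comparing XML relies on.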
I'm not sure whether people can make sense out of c14n - I certainly
couldn't, although it is a cute name. 'normalize' would not be
appropriate, would it?

Regards,
Martin

From Mike.Olson@fourthought.com Thu May 17 23:18:52 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Thu, 17 May 2001 16:18:52 -0600
Subject: [XML-SIG] newbie question
References: <20010517150147.A5471@hddesign.com>
Message-ID: <3B044E4C.37A2F38C@FourThought.com>

Chris Meyers wrote:
>
> Ok I have been looking at PyXML for a couple of days now, and I still can't really find a good example of the basic stuff I need to do. I want to read in an XML file, traverse the tree and pull out information. For example I would like to go through this xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <report>
>   <data>
>     <rec>
>       <fld id="1">123</fld>
>       <fld id="2">John></fld>
>       <fld id="3">Smith></fld>
>     </rec>
>   </data>
> </report>

There are a couple of ways:

1. Use DOM

# XML_SRC holds the document text as a string
from xml.dom.ext.reader import PyExpat
reader = PyExpat.Reader()

dom = reader.fromString(XML_SRC)

flds = dom.documentElement.getElementsByTagName('fld')

for f in flds:
    print f.getAttribute('id')
    print f.firstChild.data

2. Use XPath

from xml import xpath
from xml.dom.ext.reader import PyExpat
reader = PyExpat.Reader()

dom = reader.fromString(XML_SRC)

flds = xpath.Evaluate('//fld', contextNode=dom)

for f in flds:
    print f.getAttribute('id')
    print f.firstChild.data

Mike

>
> From this xml I would like to pull out the id attributes and the values from the <fld> elements. I can do this in jython with jdom easily enough, but I need to use python for my current application
>
> If someone could point me in the right direction as to where to look to find an example similar to what I am trying to do, I would really appreciate it.
> > Thanks, > Chris > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From chris@hddesign.com Thu May 17 23:47:42 2001 From: chris@hddesign.com (Chris Meyers) Date: Thu, 17 May 2001 17:47:42 -0500 Subject: [XML-SIG] newbie question In-Reply-To: <3B044E4C.37A2F38C@FourThought.com>; from Mike.Olson@fourthought.com on Thu, May 17, 2001 at 04:18:52PM -0600 References: <20010517150147.A5471@hddesign.com> <3B044E4C.37A2F38C@FourThought.com> Message-ID: <20010517174742.A5790@hddesign.com> Thanks a lot, that did the trick. Chris On Thu, May 17, 2001 at 04:18:52PM -0600, Mike Olson wrote: > Chris Meyers wrote: > > > > Ok I have been looking at PyXML for a couple of days now, and I still can't really find a good example of the basic stuff I need to do. I want to read in an XML file, traverse the tree and pull out information. For example I would like to go through this xml: > > > > <?xml version="1.0" encoding="UTF-8"?> > > <report> > > <data> > > <rec> > > <fld id="1">123</fld> > > <fld id="2">John></fld> > > <fld id="3">Smith></fld> > > </rec> > > </data> > > </report> > > There are a couple of ways: > > 1. Use DOM > > from xml.dom.ext.reader import PyExpat > reader = PyExpat.Reader() > > dom = reader.fromString(XML_SRC) > > flds = dom.documentElement.getElementsByTagName('fld') > > for f in flds: > print fld.getAttribute('id') > print fld.firstChild.data > > > 2. 
Use XPath > > from xml import xpath > from xml.dom.ext.reader import PyExpat > reader = PyExpat.Reader() > > dom = reader.fromString(XML_SRC) > > flds = xpath.Evaluate('//fld',contextNode = dom) > > for f in flds: > print fld.getAttribute('id') > print fld.firstChild.data > > > Mike > > > > > > >From this xml I would like to pull out the id attributes and the values from the <fld> elements. I can do this in jython with jdom easily enough, but I need to use python for my current application > > > > If someone could point me in the right direction as to where to look to find an example similar to what I am trying to do, I would really appreciate it. > > > > Thanks, > > Chris > > > > _______________________________________________ > > XML-SIG maillist - XML-SIG@python.org > > http://mail.python.org/mailman/listinfo/xml-sig > > -- > Mike Olson Principal Consultant > mike.olson@fourthought.com (303)583-9900 x 102 > Fourthought, Inc. http://Fourthought.com > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Chris Meyers 7941 Tree Lane Suite 200 Madison WI 53717 From jsydik@virtualparadigm.com Fri May 18 00:14:30 2001 From: jsydik@virtualparadigm.com (Jeremy J. Sydik) Date: Thu, 17 May 2001 18:14:30 -0500 Subject: [XML-SIG] Advice needed: RTF->XML conversions In-Reply-To: <200105171815.f4HIFDF01101@mira.informatik.hu-berlin.de> Message-ID: <MMEHLOIJDENFKMFKBPHEOEKECDAA.jsydik@virtualparadigm.com> --------------------------------------------------------------------------- Martin is right. The Office/Word 'XML' can be a difficult thing to work with. It's been a while since i've thought about it, but you will probably need to account for the following: * Not all attributes are quoted * Singleton tags aren't closed (This can be dealt with fairly easily, however. 
It's simply the 'standard' singleton html tags that occur this way (br, img, etc). * There are a few microsoft namespaces to deal with, as well as special tags. The documentation for these is found in: http://msdn.microsoft.com/library/officedev/ofxml2k/ofhtml9.exe The primary ones you'll probably encounter are o: and w: * Also described in this document are <!--[if condition]>...<[endif]--> and <![if condition]>...<![endif]> pairs. These break most SGML and XML implementations. (It would be good to think of a regex solution, since you'll probably need one to properly enclose the attributes anyway). Once those issues are addressed, you SHOULD have valid XML. If you don't, chances are you haven't hit everything in this list :) Good Luck, Jeremy -----Original Message----- From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On Behalf Of Martin v. Loewis Sent: Thursday, May 17, 2001 1:15 PM To: Mike.Olson@fourthought.com Cc: tony.mcdonald@ncl.ac.uk; Alexandre.Fayolle@logilab.fr; xml-sig@python.org Subject: Re: [XML-SIG] Advice needed: RTF->XML conversions > Can you send me a sample of the word XML output, and the format your > looking for. You can probably do it with a stylesheet as long as what > word spits out really is XML. It isn't. Most notably, attribute values are not enclosed in quotes. I found that sgmlop can parse what word produces, though. Regards, Martin _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig From rsalz@zolera.com Fri May 18 01:09:59 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 17 May 2001 20:09:59 -0400 Subject: [XML-SIG] newbie question References: <20010517150147.A5471@hddesign.com> Message-ID: <3B046857.2D18B6B4@zolera.com> Mike's already posted a solution. I've found the code in dom.ext useful for examples. 
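The DOM approach from this thread can also be sketched with the standard library alone. This is a minimal example under stated assumptions: it uses `xml.dom.minidom` rather than the PyXML `PyExpat` reader shown earlier, `read_fields` is a hypothetical helper name, and the sample data drops the XML declaration and the stray `>` after the names.

```python
from xml.dom import minidom

# Sample document modelled on the one in the question (tidied up).
XML_SRC = """<report><data><rec>
  <fld id="1">123</fld>
  <fld id="2">John</fld>
  <fld id="3">Smith</fld>
</rec></data></report>"""


def read_fields(xml_text):
    """Return (id, text) pairs for every <fld> element (hypothetical helper)."""
    dom = minidom.parseString(xml_text)
    pairs = []
    for fld in dom.documentElement.getElementsByTagName('fld'):
        # firstChild is the text node holding the element's value
        pairs.append((fld.getAttribute('id'), fld.firstChild.data))
    return pairs
```

Calling `read_fields(XML_SRC)` returns one pair per `<fld>`, in document order. An empty `<fld/>` has no `firstChild`, so real data would need a check for that case.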
/r$ From rsalz@zolera.com Fri May 18 01:40:34 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 17 May 2001 20:40:34 -0400 Subject: [XML-SIG] XML Canonicalization References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com> <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de> Message-ID: <3B046F82.3F306701@zolera.com> > xml.dom.ext sounds better than xml.dom.utils, since I dislike packages > with only a single module Me too. > and because it is also an extension. I think it's a matter of very detailed use of English. :) I view it as a utility. But it doesn't matter. > I'm not whether people can make sense out of c14n - I certainly > couldn't, although it is a cute name. 'normalize' would not be > appropriate, would it? No, the proper term really is canonicalization. I agree, the name is somewhat cute, but within the community C14N is as well-known as I18N. How about from xml.dom.ext import Canonicalize and in ext/__init__.py I add from c14n import Canonicalize So the filename is c14n.py, but the exported name is more use-friendly. From martin@loewis.home.cs.tu-berlin.de Thu May 17 22:36:38 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 23:36:38 +0200 Subject: [XML-SIG] newbie question In-Reply-To: <20010517150147.A5471@hddesign.com> (message from Chris Meyers on Thu, 17 May 2001 15:01:47 -0500) References: <20010517150147.A5471@hddesign.com> Message-ID: <200105172136.f4HLacH02948@mira.informatik.hu-berlin.de> > From this xml I would like to pull out the id attributes and the > values from the <fld> elements. I can do this in jython with jdom > easily enough, but I need to use python for my current application In PyXML, it works mostly the same way. The only different thing is how to obtain a DOM Document; you use xml.dom.ext.reader.Sax2.FromXml* for this. Once you have a DOM tree, you proceed just as with jython, i.e. 
using getElementsByTagName, etc. You probably need to be aware of the Python DOM mapping, see http://www.python.org/doc/current/lib/module-xml.dom.html Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri May 18 05:08:54 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 18 May 2001 06:08:54 +0200 Subject: [XML-SIG] XML Canonicalization In-Reply-To: <3B046F82.3F306701@zolera.com> (message from Rich Salz on Thu, 17 May 2001 20:40:34 -0400) References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com> <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de> <3B046F82.3F306701@zolera.com> Message-ID: <200105180408.f4I48s000954@mira.informatik.hu-berlin.de> > How about > from xml.dom.ext import Canonicalize > and in ext/__init__.py I add > from c14n import Canonicalize > > So the filename is c14n.py, but the exported name is more use-friendly. That sounds good. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri May 18 05:17:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 18 May 2001 06:17:00 +0200 Subject: [XML-SIG] XML Canonicalization In-Reply-To: <3B042465.1DCA826D@zolera.com> (message from Rich Salz on Thu, 17 May 2001 15:20:05 -0400) References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042465.1DCA826D@zolera.com> Message-ID: <200105180417.f4I4H0p00981@mira.informatik.hu-berlin.de> > Gladly. Just tell me where (xml.dom.utils?) and where are the docs that > I should update. As for the docs, it would be IMO best to put a \section{xml.dom.ext.c14n} into doc/xml-ref.tex. You'll notice that much of the content of that file is outdated. Since updating the documentation consists of removing most of the stuff first, adding new sections contributes to that update process. 
Regards, Martin From rsalz@zolera.com Fri May 18 13:42:53 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 18 May 2001 08:42:53 -0400 Subject: [XML-SIG] newbie question References: <20010517150147.A5471@hddesign.com> <200105172136.f4HLacH02948@mira.informatik.hu-berlin.de> Message-ID: <3B0518CD.79A4D3D9@zolera.com> > You probably need to be aware of the Python DOM mapping, see > > http://www.python.org/doc/current/lib/module-xml.dom.html That brings up a question I meant to ask last week. What's better, the "raw" mapping documented above, or the Corba-style mapping? That is, self.nodeType or self._get_nodeType() ? I am mainly interested to know which is most portable across Python DOM implementations, but I also care a bit about efficiency. Since Python has documented its own DOM interface, having an official Corba->Python mapping doesn't matter all that much to me, although it is convenient to be able to read Corba IDL and write Python without any intermediate docs. /r$ From fdrake@acm.org Fri May 18 15:22:23 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 18 May 2001 10:22:23 -0400 (EDT) Subject: [XML-SIG] XML Canonicalization In-Reply-To: <3B046F82.3F306701@zolera.com> References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com> <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de> <3B046F82.3F306701@zolera.com> Message-ID: <15109.12319.311051.900182@cj42289-a.reston1.va.home.com> Rich Salz writes: > How about > from xml.dom.ext import Canonicalize > and in ext/__init__.py I add > from c14n import Canonicalize How about calling the module "canon": from xml.dom.ext import canon def main(): ... = canon.Canonicalize(...) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Fri May 18 21:52:46 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 18 May 2001 22:52:46 +0200 Subject: [XML-SIG] newbie question In-Reply-To: <3B0518CD.79A4D3D9@zolera.com> (message from Rich Salz on Fri, 18 May 2001 08:42:53 -0400) References: <20010517150147.A5471@hddesign.com> <200105172136.f4HLacH02948@mira.informatik.hu-berlin.de> <3B0518CD.79A4D3D9@zolera.com> Message-ID: <200105182052.f4IKqkF01843@mira.informatik.hu-berlin.de> > What's better, the "raw" mapping documented above, or the Corba-style > mapping? That is, self.nodeType or self._get_nodeType() ? > > I am mainly interested to know which is most portable across Python DOM > implementations, but I also care a bit about efficiency. It's mainly a matter of personal taste. Some people believe in accessor functions, some in attributes. If you want to care about portability and speed, you should use attributes. Whether you go through __getattr__ or not varies depending on DOM implementation and attribute; most attributes will be directly available, though. Regards, Martin From tony.mcdonald@ncl.ac.uk Sun May 20 10:17:20 2001 From: tony.mcdonald@ncl.ac.uk (Tony McDonald) Date: Sun, 20 May 2001 10:17:20 +0100 Subject: [XML-SIG] Problems with 'multiple definitions' Message-ID: <B72D4A2F.8440%tony.mcdonald@ncl.ac.uk> Hi all, This isn't strictly an XML thing, but as the packages I really want to use are the XML ones, I thought the group might be able to help. I'm working with python2.1 and MacOS X and compiling up packages such as PyXML and 4Suite (although this happens with packages such as MySQLdb too). 
I use the standard procedure to build and install these packages, ie % python2.1 setup.py install But, when I test out 4Suite (for example), ie % cd /usr/local/doc/4Suite-0.11/test_suite/4XSLT % python2.1 basic_test.py I get this; dyld: python2.1 multiple definitions of symbol _XML_DefaultCurrent python2.1 definition of _XML_DefaultCurrent /usr/local/lib/python2.1/site-packages/Ft/Lib/cDomlettec.so definition of _XML_DefaultCurrent I get similar errors with other packages such as PyXML and MySQLdb. I've managed to install MySQLdb by stripping out an offending symbol from libmysqlclient.a, but surely there's a cleaner way of doing this? Is there some compiler flag I can set that gets around this? The python is a pre-compiled version from http://tony.lownds.com/macosx/ any help would be appreciated, this effectively stops me using any compiled modules under MacOS X (which is, in almost all other respects, excellent!). TIA tone -- Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/ The Medical School, Newcastle University Tel: +44 191 243 6140 A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope From karl@digicool.com Tue May 22 00:38:55 2001 From: karl@digicool.com (Karl Anderson) Date: 21 May 2001 16:38:55 -0700 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib In-Reply-To: Mike Olson's message of "Sun, 13 May 2001 19:14:17 -0600" References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> Message-ID: <m1r8xixofk.fsf@localhost.localdomain> Mike Olson <Mike.Olson@fourthought.com> writes: > "Martin v. Loewis" wrote: > > > > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base, > > only to discover that the StyleseetReader class is now much stronger > > connected to Ft.Lib than before, in particular to classes from > > pDomletteReader, and their specific instance attributes. 
> > I was just in there as well and quite suprised how complex the code has > become. I thought of doing some work on it but figured, it ain't > broke..... > Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette > into xml.utils? Better yet, let's merge pDomlette and minidom so there > is only one domlette. pDomlette has greatly out grown its original > purpose so I have not problems with moving it into XML-Sig. If you're suggesting that the DOMs should be consolidated so that tools like PyXML's XSLT could support only that DOM, I hope you'll reconsider. I'd like Zope's DOM to be usable by PyXML's XSLT and XPath implementations. There are some hurdles to this, though. The tests are only usable with 4Suite, which makes it harder to find inconsistencies. Submitting patches to 4Suite's implementations wouldn't be helpful for my goals, because 4Suite's XSLT and XPath processors have become more reliant on its particular DOM since these modules were forked to PyXML. -- Karl Anderson karl@digicool.com From Mike.Olson@fourthought.com Tue May 22 00:57:53 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 21 May 2001 17:57:53 -0600 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <m1r8xixofk.fsf@localhost.localdomain> Message-ID: <3B09AB81.910B27F2@FourThought.com> Karl Anderson wrote: > > Mike Olson <Mike.Olson@fourthought.com> writes: > > > "Martin v. Loewis" wrote: > > > > > > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base, > > > only to discover that the StyleseetReader class is now much stronger > > > connected to Ft.Lib than before, in particular to classes from > > > pDomletteReader, and their specific instance attributes. > > > > I was just in there as well and quite suprised how complex the code has > > become. I thought of doing some work on it but figured, it ain't > > broke..... 
>
> > Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette
> > into xml.utils? Better yet, let's merge pDomlette and minidom so there
> > is only one domlette. pDomlette has greatly out grown its original
> > purpose so I have not problems with moving it into XML-Sig.
>
> If you're suggesting that the DOMs should be consolidated so that
> tools like PyXML's XSLT could support only that DOM, I hope you'll
> reconsider. I'd like Zope's DOM to be usable by PyXML's XSLT and
> XPath implementations.

Not at all. I was suggesting that both miniDOM and pDomlette are
lightweight Python DOM implementations and I don't think we need two of
them. If Zope's DOM supports the Python DOM interface, then it should
work in xslt/xpath. If not, it is a bug in xslt/xpath. However, I don't
know if this will always be the case. 4XSLT is about to get a _big_
rewrite and we might not support a "runNode" interface anymore. If we
do, it will probably not be the most efficient way to use 4xslt as we
will have to translate from DOM into the internal data structure.

>
> There are some hurdles to this, though. The tests are only usable
> with 4Suite, which makes it harder to find inconsistencies.
> Submitting patches to 4Suite's implementations wouldn't be helpful for
> my goals, because 4Suite's XSLT and XPath processors have become more
> reliant on its particular DOM since these modules were forked to
> PyXML.

Actually, the tests would be easy to fix to use another DOM (though I'm
not sure how you would do it in Zope, as I ran into many hurdles
executing ZDOM outside of the Zope environment). However, to do this,
edit the file test_harness.py. It is used by every 4XSLT test script.
Either add a test for ParsedXML, or replace all of the existing tests
with a ParsedXML test. Then just run test.py and all of the 4XSLT tests
will use ParsedXML.

I don't understand the more reliant part. How have we become more
reliant?
Are you talking about the fact that Martin did a lot of work
when he first moved 4XSLT into PyXML to disentangle 4XSLT from Ft.Lib?
Then it's not really more reliant, just not ported yet.

FYI to all, we will be syncing 4XSLT with Martin's changes in the near
future.

>
> --
> Karl Anderson                          karl@digicool.com

--
Mike Olson                               Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                        http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From karl@digicool.com Tue May 22 03:07:25 2001
From: karl@digicool.com (Karl Anderson)
Date: 21 May 2001 19:07:25 -0700
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: Mike Olson's message of "Mon, 21 May 2001 17:57:53 -0600"
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <m1r8xixofk.fsf@localhost.localdomain> <3B09AB81.910B27F2@FourThought.com>
Message-ID: <m166euxhk2.fsf@localhost.localdomain>

Mike Olson <Mike.Olson@fourthought.com> writes:

> Karl Anderson wrote:
> >
> > There are some hurdles to this, though. The tests are only usable
> > with 4Suite, which makes it harder to find inconsistencies.
> > Submitting patches to 4Suite's implementations wouldn't be helpful for
> > my goals, because 4Suite's XSLT and XPath processors have become more
> > reliant on its particular DOM since these modules were forked to
> > PyXML.
>
> Actually, the tests would be easy to fix to use another DOM, (though I'm
> not sure how you would do it in Zope as I ran into many hurdles
> executing ZDOM outside of the Zope environment).

I don't know that ZDOM is a good measure of usefulness with other DOMs
- I haven't really looked at it, much less tested it. Right now I'm
concentrating on ParsedXML's DOM.
For a simple example of using PyXML's XPath with ParsedXML: http://www.zope.org/Wikis/DevSite/Projects/ParsedXML/ParsedXMLWith4XPath You do need a Zope installation with ParsedXML, although you don't need to actually run Zope :) If you want to use ParsedXML to test usability with other DOM implementations, I'd be glad to help. > Hoever, to do this, > edit the file test_harness.py. It is used by every 4XSLT test script. > Either add a test for ParsedXML, or replace all of the existing tests > with a parsedXML test. Then just run test.py and all of the 4XSLT tests > will use Parsed XML. Thanks, I'll look into this when I can. > I don't understand the more reliant part. How have we become more > reliant. Are you talking about the fact that MArtin did a lot of work > when he first moved 4XSLT into PyXML to disentangle 4XSLT from Ft.Lib? > Then its not really more reliant, just not ported yet. Perhaps I misread the CVS histories. I was looking into how PyXML and 4Suite depended on the included DOM implementations, and I thought that 4XPath was copied over to PyXML, and that after that updates to 4Suite's tree made it dependent on its DOM. But looking again (I was running into trouble with XPath/Conversions.py), there seems to have been some syncing and stuff going on, I'd have to do some work to convince myself that I was correct. -- Karl Anderson karl@digicool.com From martin@loewis.home.cs.tu-berlin.de Tue May 22 06:15:11 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 22 May 2001 07:15:11 +0200 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib In-Reply-To: <m166euxhk2.fsf@localhost.localdomain> (message from Karl Anderson on 21 May 2001 19:07:25 -0700) References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <m1r8xixofk.fsf@localhost.localdomain> <3B09AB81.910B27F2@FourThought.com> <m166euxhk2.fsf@localhost.localdomain> Message-ID: <200105220515.f4M5FBi00961@mira.informatik.hu-berlin.de> > Perhaps I misread the CVS histories. I was looking into how PyXML and > 4Suite depended on the included DOM implementations, and I thought > that 4XPath was copied over to PyXML, and that after that updates to > 4Suite's tree made it dependent on its DOM. But looking again (I was > running into trouble with XPath/Conversions.py), there seems to have > been some syncing and stuff going on, I'd have to do some work to > convince myself that I was correct. Before I first checked 4XPath/4XSLT into PyXML, I had already significantly modified it; see README.4XPath for an outline of the changes. Some of these changes have been integrated into 4Suite. To continue to keep the two branches similar, I've now integrated the changes of 4Suite 0.11 into PyXML. I have not yet modified them to work stand-alone, yet, since I got stuck updating the Stylesheet reader. I think I will write a new stylesheet reader from scratch which only uses a SAX DOM builder and a DOM implementation, but I haven't started with that, yet. 
Regards, Martin From sam@webslingerZ.com Tue May 22 14:55:13 2001 From: sam@webslingerZ.com (Sam Brauer) Date: Tue, 22 May 2001 09:55:13 -0400 (EDT) Subject: [XML-SIG] ANN: new release of maki In-Reply-To: <E150mi0-0006yC-00@mail.python.org> Message-ID: <Pine.LNX.4.31.0105220933220.1926-100000@localhost.localdomain> I've released a new version of maki at http://maki.sourceforge.net maki is a mod_python handler which uses various 4Suite components to serve XML with Apache. It allows a web developer to specify processing rules based on path-matching regular expressions. Each rule describes a pipeline with any number of XSLT steps and/or custom processing steps. A processor that evaluates embedded Python source to dynamically modify the document is included. maki also supports time-based caching of output. Also included are two example "logicsheets": one that adds HTTP request data to the document and another that executes SQL queries and creates elements from the results. The overall functionality is similar (though intentionally not identical) to Cocoon. For more info, please take a look at the online documentation at http://maki.sourceforge.net/manual/ Thank you, Sam ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sam Brauer : sbrauer@users.sourceforge.net From karl@digicool.com Tue May 22 20:05:36 2001 From: karl@digicool.com (Karl Anderson) Date: 22 May 2001 12:05:36 -0700 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib In-Reply-To: "Martin v. Loewis"'s message of "Tue, 22 May 2001 07:15:11 +0200" References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <m1r8xixofk.fsf@localhost.localdomain> <3B09AB81.910B27F2@FourThought.com> <m166euxhk2.fsf@localhost.localdomain> <200105220515.f4M5FBi00961@mira.informatik.hu-berlin.de> Message-ID: <m1ofslw6f3.fsf@localhost.localdomain> Martin v. Loewis <martin@loewis.home.cs.tu-berlin.de> writes: > > Perhaps I misread the CVS histories. 
I was looking into how PyXML and > > 4Suite depended on the included DOM implementations, and I thought > > that 4XPath was copied over to PyXML, and that after that updates to > > 4Suite's tree made it dependent on its DOM. But looking again (I was > > running into trouble with XPath/Conversions.py), there seems to have > > been some syncing and stuff going on, I'd have to do some work to > > convince myself that I was correct. > > Before I first checked 4XPath/4XSLT into PyXML, I had already > significantly modified it; see README.4XPath for an outline of the > changes. Thanks for clearing that up. > Some of these changes have been integrated into 4Suite. To continue to > keep the two branches similar, I've now integrated the changes of > 4Suite 0.11 into PyXML. I have not yet modified them to work > stand-alone, yet, since I got stuck updating the Stylesheet reader. I > think I will write a new stylesheet reader from scratch which only > uses a SAX DOM builder and a DOM implementation, but I haven't started > with that, yet. Just to be clear, is PyXML's XSLT intended to work with already created DOM trees as well? -- Karl Anderson karl@digicool.com From Mike.Olson@fourthought.com Tue May 22 21:43:33 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 22 May 2001 14:43:33 -0600 Subject: [XML-SIG] ANN: 4Suite and 4SuiteServer 0.11.1 release canidate 1 Message-ID: <3B0ACF75.9694866B@FourThought.com> All, Here is the first release canidate for our 0.11.1 release. A handful of new features in this release and many bug fixes. Please give it a try as we try to work out the documentation and packaging bugs for the 0.11.1 final release (expected later this week). Please see http://4Suite.org/download.html for the packages. 
4Suite new features:

    pure python parser for Xslt, XPath, and XPointer
    Support for unicode in the C based XSLT, XPath and XPointer parsers
    ODS dictionaries and type definitions
    ODS bug fixes and optimizations

4Suite Server new features:

    FTP server
    text indexing using swish
    more CORBA support
    Backup and Restore command line tools
    Better security
    True access control lists

Mike

--
Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Tue May 22 21:54:49 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 22 May 2001 14:54:49 -0600
Subject: [XML-SIG] Re: [4suite] ANN: 4Suite and 4SuiteServer 0.11.1 release canidate 1
References: <3B0ACF75.9694866B@FourThought.com>
Message-ID: <3B0AD219.CB493E17@fourthought.com>

Mike Olson wrote:
> 4Suite Server new features
>
> FTP server
> text indexing using swish
> more CORBA support
> Backup and Restore command line tools
> Better security
> True access control lists

One note. The new 4SS requires a re-initialization of all databases. We specifically made backup and restore facilities a priority for this release so that in future we can provide a smooth migration path whenever new releases break data. Hopefully no one has accumulated irreplaceable data in 4SS yet, and if you have, let us know and we should be able to help with the migration. Migration testing will be a standard part of every 4SS release from now on, so you needn't fear for your data in future.

I apologize for any inconvenience.

--
Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mnot@mnot.net Tue May 22 23:06:41 2001 From: mnot@mnot.net (Mark Nottingham) Date: Tue, 22 May 2001 15:06:41 -0700 Subject: [XML-SIG] XML and Unicode Message-ID: <20010522150638.C22396@mnot.net> How does one detect the charset used in an XML document from a SAX2 parser (PyXML 0.6.5)? Also, if I have an XML document encoded ISO-8851-1 (and properly identified), should I have a reasonable expectation that the output of a SAX processor, post- .encode('utf-8'), should be correct if viewed in a Web browser with UTF-8 selected as a character encoding? In other words, is the post-parse unicode string a neutral representation of the 8851-x string, which can then be encoded as utf-8? Or, is it in the charset of the original XML document (my testing seems to indicate the latter - what was a 8851 character in the original text does not successfully come out the other side)? (Sorry if this is obtuse - just getting into i18n, and Python docs are thin on the ground) -- Mark Nottingham http://www.mnot.net/ From mal@lemburg.com Tue May 22 23:38:34 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 23 May 2001 00:38:34 +0200 Subject: [XML-SIG] XML and Unicode References: <20010522150638.C22396@mnot.net> Message-ID: <3B0AEA6A.9CCD2A1F@lemburg.com> Mark Nottingham wrote: > > How does one detect the charset used in an XML document from a SAX2 > parser (PyXML 0.6.5)? > > Also, if I have an XML document encoded ISO-8851-1 (and properly > identified), should I have a reasonable expectation that the output > of a SAX processor, post- .encode('utf-8'), should be correct if > viewed in a Web browser with UTF-8 selected as a character encoding? This should work... > In other words, is the post-parse unicode string a neutral > representation of the 8851-x string, which can then be encoded as > utf-8? 
Unicode is encoding neutral in the sense that it provides space for the characters of most scripts. If the parser returns Unicode, then you can encode it as UTF-8 and have the original contents of the attribute/element represented as UTF-8 string. > Or, is it in the charset of the original XML document (my > testing seems to indicate the latter - what was a 8851 character in > the original text does not successfully come out the other side)? > > (Sorry if this is obtuse - just getting into i18n, and Python docs > are thin on the ground) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jeremy.kloth@fourthought.com Tue May 22 23:53:45 2001 From: jeremy.kloth@fourthought.com (Jeremy J Kloth) Date: Tue, 22 May 2001 16:53:45 -0600 Subject: [XML-SIG] New parsers in 4XPath and 4XSLT Message-ID: <003101c0e312$0e4519e0$f803a8c0@dhcp.fourthought.comfourthought.com> The new generated parsers in XPath and XSLT are now created in a more factory-ish method. The parsers are now referenced from: xml.(xpath|xslt).parser This allows for the changing of parsers easily. To create a runtime parser, call parser.new(). And to parse expressions simply use the parse() method on the created object. Hopefully this change will help ease the integration into PyXML. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. 
http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mnot@mnot.net Wed May 23 03:33:18 2001 From: mnot@mnot.net (Mark Nottingham) Date: Tue, 22 May 2001 19:33:18 -0700 Subject: [XML-SIG] XML and Unicode In-Reply-To: <3B0AEA6A.9CCD2A1F@lemburg.com>; from mal@lemburg.com on Wed, May 23, 2001 at 12:38:34AM +0200 References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> Message-ID: <20010522193314.E22396@mnot.net> --jI8keyz6grp/JLjh Content-Type: text/plain; charset=us-ascii Content-Disposition: inline OK, so I'm not getting something then. The attached test script (and data file) is the problem pared down - if u'string' is a neutral encoding, and .encode('utf-8') generates a utf-8 encoded string of that encoding, then the utf-8.html output file should display correctly; however, it doesn't, while the latin-1 output does (because the input is latin-1). It seems like the XML parser isn't converting the ISO-8859-1 to Unicode; does this make sense? Thanks, On Wed, May 23, 2001 at 12:38:34AM +0200, M.-A. Lemburg wrote: > Mark Nottingham wrote: > > > > How does one detect the charset used in an XML document from a SAX2 > > parser (PyXML 0.6.5)? > > > > Also, if I have an XML document encoded ISO-8851-1 (and properly > > identified), should I have a reasonable expectation that the output > > of a SAX processor, post- .encode('utf-8'), should be correct if > > viewed in a Web browser with UTF-8 selected as a character encoding? > > This should work... > > > In other words, is the post-parse unicode string a neutral > > representation of the 8851-x string, which can then be encoded as > > utf-8? > > Unicode is encoding neutral in the sense that it provides > space for the characters of most scripts. If the parser returns > Unicode, then you can encode it as UTF-8 and have the original > contents of the attribute/element represented as UTF-8 string. 
>
> > Or, is it in the charset of the original XML document (my
> > testing seems to indicate the latter - what was a 8851 character in
> > the original text does not successfully come out the other side)?
> >
> > (Sorry if this is obtuse - just getting into i18n, and Python docs
> > are thin on the ground)
>
> --
> Marc-Andre Lemburg
> CEO eGenix.com Software GmbH
> ______________________________________________________________________
> Company & Consulting: http://www.egenix.com/
> Python Software: http://www.lemburg.com/python/

--
Mark Nottingham
http://www.mnot.net/

--jI8keyz6grp/JLjh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="testuni.py"

#!/usr/bin/env python2.0

from xml import sax
import string

def run(i, e):
    dh = Parser()
    p = sax.sax2exts.make_parser()
    p.setContentHandler(dh)
    p.setFeature(sax.handler.feature_namespaces, 1)
    p.parse(i + '.xml')
    content = dh.content.encode(e)
    file = open(e + ".html", 'w')
    file.write(template % (e, content))
    file.close()

class Parser(sax.handler.ContentHandler):
    def __init__(self):
        self._tmp_buf = ''
        self.content = None

    def startElementNS(self, name, qname, attrs):
        pass

    def endElementNS(self, name, qname):
        if name[1] == 'content':
            self.content = string.strip(self._tmp_buf)

    def characters(self, content):
        self._tmp_buf = self._tmp_buf + content

template = """\
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=%s">
</head>
<body>
<p>%s</p>
</body>
</html>
"""

if __name__ == '__main__':
    run('ISO-8859-1', 'UTF-8')
    run('ISO-8859-1', 'Latin-1')

--jI8keyz6grp/JLjh
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: attachment; filename="ISO-8859-1.xml"
Content-Transfer-Encoding: 8bit

<?xml version="1.0" encoding="ISO-8859-1" ?>
<content>Net 21 – The Survivors</content>

--jI8keyz6grp/JLjh--

From mal@lemburg.com Wed May 23 08:38:14 2001
From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 23 May 2001 09:38:14 +0200 Subject: [XML-SIG] XML and Unicode References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> Message-ID: <3B0B68E6.9CBF7689@lemburg.com> Mark Nottingham wrote: > > OK, so I'm not getting something then. The attached test script (and > data file) is the problem pared down - if u'string' is a neutral > encoding, and .encode('utf-8') generates a utf-8 encoded string of > that encoding, then the utf-8.html output file should display > correctly; however, it doesn't, while the latin-1 output does > (because the input is latin-1). > > It seems like the XML parser isn't converting the ISO-8859-1 to > Unicode; does this make sense? That's a possibility (even though I don't see any funny characters in your example XML file); looking through the pyexpat.c code it seems as if the parser assumes that the XML file is encoded as UTF-8 -- at least all Unicode conversions are done using UTF-8. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From hansv@net4all.be Wed May 23 08:44:20 2001 From: hansv@net4all.be (Hans verschooten) Date: Wed, 23 May 2001 09:44:20 +0200 Subject: [XML-SIG] HTML parsing on Python 2.1 Message-ID: <B73136F3.6A02%hansv@net4all.be> > This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. 
Hi,

I am using a freshly installed MacPython 2.1 and would like to know what I should install extra to use the following script:

[uogbuji@borgia one-offs]$ cat html-to-xhtml-converter.py
import sys
from xml.dom.ext.reader import HtmlLib
import xml.dom.ext

#set up a re-usable reader object
reader = HtmlLib.Reader()

#parse HTML from file or URI given on command line. Return the DOM document
doc = reader.fromUri(sys.argv[1])

#Just for kicks, write it out as XHTML, i.e. all lowercase, XML syntax for
#empty tags, all attributes with given value, etc.
xml.dom.ext.XHtmlPrettyPrint(doc)

Could anybody point me in the right direction? I tried installing PyXML but keep getting end-of-line errors. After trying to correct these I keep running into errors like: ReleaseNode not found; HtmlLib has no module named Reader.

Any help as to how and what should be installed on MacPython 2.1 would be greatly appreciated.

Hans

From Alexandre.Fayolle@logilab.fr Wed May 23 10:57:19 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 23 May 2001 11:57:19 +0200 (CEST)
Subject: [XML-SIG] ANN: Narval 1.0
Message-ID: <Pine.LNX.4.21.0105231156470.1970-100000@orion.logilab.fr>

Logilab (www.logilab.com) announces the release of Narval 1.0

GPL'd Intelligent Personal Assistant Framework
http://www.logilab.org/narval

News
----

The engine is now stable as it has been working nicely for the past three months. It's also much faster. The Horn GUI features lots of usability improvements. The infopal application (available separately) is now usable.

Description
-----------

Narval is a framework (language + interpreter + GUI/IDE) dedicated to the setting up of intelligent personal assistants (IPAs). An Intelligent Personal Assistant is a companion that will help you in your daily work in the information world. It runs on your machine or on a remote server, and you can communicate with it via all standard means (email, web, telnet, phone, specific GUI, etc).
It executes recipes (sequences of actions) you wrote, to perform a wide range of tasks, such as prepare your morning newspaper, help you surf the web by filtering out junk ads, keep searching the web day after day for things you want, participate in on-line auctions, learn your interests and bring you back valuable information, take care of repetitive chores, answer e-mail, negotiate the date and time of a meeting, and much more... It is easy to extend the built-in action library by writing new actions in Python.

Infopal, your information pal, is a Narval application that implements part of the above, but Narval makes it easy for you to set up new assistants. Other applications will soon be available from Logilab.

Logilab S.A. is a French company that specializes in the fields of artificial intelligence, knowledge management, data analysis and natural language processing.

More info
---------

Please see
http://www.logilab.org/narval
http://www.logilab.com
http://www.logilab.fr
or contact contact@logilab.fr

--
Alexandre Fayolle http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From stuff4gary@hotmail.com Wed May 23 14:40:41 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Wed, 23 May 2001 13:40:41
Subject: [XML-SIG] XLST - Can't show JPEG image from XML abstraction to rendition
Message-ID: <F231IWOZmurKcDXUjRr00001ff4@hotmail.com>

Hi,

I hope someone can help! I have set up some XSL files which use XSLT methods to produce tables of information about images, which works great, just using the MSXML 4.0 parser with explorer 5.5. However, I can't get the cells which are supposed to show my image thumbnails to display any images at all (the transformations for the tables won't work when they have my x:link for them in the XML).

**** IN XSL *****
<xsl:template match="image">
<xsl:value-of select="picture"/>
etc.
</xsl:template> **** IN XML ***** <picture xlink:form="simple" href="imageLibrary/Sky.jpg" show"embed" actuate="auto"> I would be greatful if anyone has any suggestions on how I should go about including theses images . Kind Regards Gary C PS Also in a few months I would like to include SVG images for some illustrations that I have... I am under the understanding that I will have to use the svg.htc for explorer 5.5 and the <object> tag, does anyone know wether it is possible to use the same method for both images and svg... is it easier for me to do with a python parser than the micorosoft one? _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. From mnot@mnot.net Wed May 23 16:46:25 2001 From: mnot@mnot.net (Mark Nottingham) Date: Wed, 23 May 2001 08:46:25 -0700 Subject: [XML-SIG] XML and Unicode In-Reply-To: <3B0B68E6.9CBF7689@lemburg.com>; from mal@lemburg.com on Wed, May 23, 2001 at 09:38:14AM +0200 References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> <3B0B68E6.9CBF7689@lemburg.com> Message-ID: <20010523084622.A25059@mnot.net> It's the em dash in the middle. If true, this behaviour would be a bug, no? Is there any kind of workaround possible (such as detecting the encoding of the XML file outside of the parser and .encode()ing to suit)? Thanks again, On Wed, May 23, 2001 at 09:38:14AM +0200, M.-A. Lemburg wrote: > Mark Nottingham wrote: > > > > OK, so I'm not getting something then. The attached test script (and > > data file) is the problem pared down - if u'string' is a neutral > > encoding, and .encode('utf-8') generates a utf-8 encoded string of > > that encoding, then the utf-8.html output file should display > > correctly; however, it doesn't, while the latin-1 output does > > (because the input is latin-1). 
> > > > It seems like the XML parser isn't converting the ISO-8859-1 to > > Unicode; does this make sense? > > That's a possibility (even though I don't see any funny characters > in your example XML file); looking through the pyexpat.c code > it seems as if the parser assumes that the XML file is encoded > as UTF-8 -- at least all Unicode conversions are done using UTF-8. > > -- > Marc-Andre Lemburg > CEO eGenix.com Software GmbH > ______________________________________________________________________ > Company & Consulting: http://www.egenix.com/ > Python Software: http://www.lemburg.com/python/ -- Mark Nottingham http://www.mnot.net/ From rsalz@zolera.com Wed May 23 17:10:39 2001 From: rsalz@zolera.com (Rich Salz) Date: Wed, 23 May 2001 12:10:39 -0400 Subject: [XML-SIG] Web services non-SIG Message-ID: <3B0BE0FF.56CAB4FC@zolera.com> Guido is unconvinced of the longer-term viability of a separate Web Services SIG, and since we have no desire to add to his administrivia, for now we're going to use a SourceForge project. In particular, the "pywebsvcs-talk" mailing list is intended for discussion of Python and Web Sevices. To join, visit http://lists.sourceforge.net/lists/listinfo/pywebsvcs-talk The pywebsvcs project also has a developer's mailing list, and hopefully will soon have a CVS tree with <gasp>sources. If nobody objects, I'll add a link to the pywebsvcs project in the pyxml web page (htdocs/links.h it seems). /r$ From martin@loewis.home.cs.tu-berlin.de Wed May 23 19:28:21 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Wed, 23 May 2001 20:28:21 +0200 Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib In-Reply-To: <m1ofslw6f3.fsf@localhost.localdomain> (message from Karl Anderson on 22 May 2001 12:05:36 -0700) References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <m1r8xixofk.fsf@localhost.localdomain> <3B09AB81.910B27F2@FourThought.com> <m166euxhk2.fsf@localhost.localdomain> <200105220515.f4M5FBi00961@mira.informatik.hu-berlin.de> <m1ofslw6f3.fsf@localhost.localdomain> Message-ID: <200105231828.f4NISLP01544@mira.informatik.hu-berlin.de> > Just to be clear, is PyXML's XSLT intended to work with already > created DOM trees as well? My immediate target is to make it work with minidom, without pDomlette. That will initially be tested by reading both the stylesheet and the document through a parser, but I can't see anything preventing usage of pre-loaded trees for the document or the stylesheet. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed May 23 21:01:50 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 23 May 2001 22:01:50 +0200 Subject: [XML-SIG] XML and Unicode In-Reply-To: <20010522150638.C22396@mnot.net> (message from Mark Nottingham on Tue, 22 May 2001 15:06:41 -0700) References: <20010522150638.C22396@mnot.net> Message-ID: <200105232001.f4NK1ot02120@mira.informatik.hu-berlin.de> > How does one detect the charset used in an XML document from a SAX2 > parser (PyXML 0.6.5)? That is not supported in SAX. The underlying parser may expose this information; but that is of course parser dependent. > Also, if I have an XML document encoded ISO-8851-1 (and properly > identified), should I have a reasonable expectation that the output > of a SAX processor, post- .encode('utf-8'), should be correct if > viewed in a Web browser with UTF-8 selected as a character encoding? Not necessarily. 
If the document was a HTML document, and if it has a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> line, then the browser has to decide whether it leaves the XML header or the Content-Type. It would normally use the content type, which would be incorrect. If there is no incorrect character set information in the output document, then a receiver should display it properly. Of course, whether a Web browser can "correctly" display arbitrary XML documents is a different question. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed May 23 21:04:11 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 23 May 2001 22:04:11 +0200 Subject: [XML-SIG] XML and Unicode In-Reply-To: <20010522193314.E22396@mnot.net> (message from Mark Nottingham on Tue, 22 May 2001 19:33:18 -0700) References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> Message-ID: <200105232004.f4NK4Bo02122@mira.informatik.hu-berlin.de> > It seems like the XML parser isn't converting the ISO-8859-1 to > Unicode; does this make sense? As others have explained, your document is really Windows CP 1252, not ISO 8859 1 encoded. If you consider the document as ISO-8859-1, then the parser *will* convert it correctly. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed May 23 21:15:06 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Wed, 23 May 2001 22:15:06 +0200 Subject: [XML-SIG] XML and Unicode In-Reply-To: <20010523084622.A25059@mnot.net> (message from Mark Nottingham on Wed, 23 May 2001 08:46:25 -0700) References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> <3B0B68E6.9CBF7689@lemburg.com> <20010523084622.A25059@mnot.net> Message-ID: <200105232015.f4NKF6I02205@mira.informatik.hu-berlin.de> > > That's a possibility (even though I don't see any funny characters > > in your example XML file); looking through the pyexpat.c code > > it seems as if the parser assumes that the XML file is encoded > > as UTF-8 -- at least all Unicode conversions are done using UTF-8. > > > It's the em dash in the middle. If true, this behaviour would be a > bug, no? It would be a bug, but pyexpat works correctly. expat indeed does guarantee that all text is UTF-8, because it converts the file from any input encoding to UTF-8 before passing it to the application. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed May 23 21:29:06 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 23 May 2001 22:29:06 +0200 Subject: [XML-SIG] New parsers in 4XPath and 4XSLT In-Reply-To: <003101c0e312$0e4519e0$f803a8c0@dhcp.fourthought.comfourthought.com> (jeremy.kloth@fourthought.com) References: <003101c0e312$0e4519e0$f803a8c0@dhcp.fourthought.comfourthought.com> Message-ID: <200105232029.f4NKT6O02302@mira.informatik.hu-berlin.de> > The new generated parsers in XPath and XSLT are now created in a more > factory-ish method. The parsers are now referenced from: > xml.(xpath|xslt).parser This allows for the changing of parsers easily. > To create a runtime parser, call parser.new(). And to parse expressions > simply use the parse() method on the created object. Great! I hope I can look into that shortly. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed May 23 21:28:32 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Wed, 23 May 2001 22:28:32 +0200 Subject: [XML-SIG] HTML parsing on Python 2.1 In-Reply-To: <B73136F3.6A02%hansv@net4all.be> (message from Hans verschooten on Wed, 23 May 2001 09:44:20 +0200) References: <B73136F3.6A02%hansv@net4all.be> Message-ID: <200105232028.f4NKSWW02300@mira.informatik.hu-berlin.de> > I am using a freshly installed MacPython 2.1 and would like to know > what I should install extra to use the following script: It works fine for me with Python 2.1 on Linux, using PyXML 0.6.5(+). > If anybody could point me in the right direction, If tried > installing PyXML but keep getting end-of line errors. After trying > to correct these I keep running into errors like, ReleaseNode not > found; HtmlLib has no module named Reader. That is quite unspecific: what exactly did you try, and what exactly happened? Regards, Martin From mnot@mnot.net Wed May 23 21:44:23 2001 From: mnot@mnot.net (Mark Nottingham) Date: Wed, 23 May 2001 13:44:23 -0700 Subject: [XML-SIG] XML and Unicode In-Reply-To: <200105232015.f4NKF6I02205@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Wed, May 23, 2001 at 10:15:06PM +0200 References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> <3B0B68E6.9CBF7689@lemburg.com> <20010523084622.A25059@mnot.net> <200105232015.f4NKF6I02205@mira.informatik.hu-berlin.de> Message-ID: <20010523134419.A4434@mnot.net> Martin, Thanks. If that's the case, what's happening here (see test script)? The source text, when written directly to HTML and identified as ISO-8859-1, correctly displays. when parsed by pyexpat, the resulting unicode string, .encode('UTF-8') and included in HTML identified as UTF-8 does not display correctly. 
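A minimal sketch of the symptom described above, written for modern Python 3 rather than the Python 2.0 and PyXML used in this thread. It assumes the dash in the attached ISO-8859-1.xml is stored as the single byte 0x96; the helper name reencode_as_utf8 is hypothetical and exists only for illustration. It compares what ends up in the UTF-8 output when the same bytes are decoded as ISO-8859-1 and as Windows CP 1252:

```python
# Sketch in modern Python 3 (the thread itself uses Python 2.0 and PyXML).
# Assumption: the dash in the sample document is the single byte 0x96.
raw = b"Net 21 \x96 The Survivors"


def reencode_as_utf8(data, declared_encoding):
    """Decode bytes using the declared encoding, then encode the text as UTF-8."""
    return data.decode(declared_encoding).encode("utf-8")


# Declared as ISO-8859-1: byte 0x96 maps to the control character U+0096.
as_latin1 = reencode_as_utf8(raw, "iso-8859-1")

# Declared as CP 1252: byte 0x96 maps to U+2013 EN DASH.
as_cp1252 = reencode_as_utf8(raw, "cp1252")

print(as_latin1)
print(as_cp1252)
```

If the first result is what the UTF-8 page contains, a browser has only a control character to render at that position, which would be consistent with the dash not showing up.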
I'm not sure I understand your previous message - noone has suggested that it's Windows CP 1252 (although I may have missed messages), and I'm not sure what you mean by 'consider the document as ISO-8859-1'; I'm feeding a document into an XML parser with encoding="ISO-8859-1", and getting unicode strings out of it. What mechanism do I have to consider it as having a particular encoding, beyond the XML declaration? I've been given the impression that unicode strings are encoding-neutral. Cheers & thanks, On Wed, May 23, 2001 at 10:15:06PM +0200, Martin v. Loewis wrote: > > > That's a possibility (even though I don't see any funny > > > characters in your example XML file); looking through the > > > pyexpat.c code it seems as if the parser assumes that the XML > > > file is encoded as UTF-8 -- at least all Unicode conversions > > > are done using UTF-8. > > > > > It's the em dash in the middle. If true, this behaviour would be > > a bug, no? > > It would be a bug, but pyexpat works correctly. expat indeed does > guarantee that all text is UTF-8, because it converts the file from > any input encoding to UTF-8 before passing it to the application. > > Regards, > Martin > On Wed, May 23, 2001 at 10:04:11PM +0200, Martin v. Loewis wrote: > > It seems like the XML parser isn't converting the ISO-8859-1 to > > Unicode; does this make sense? > > As others have explained, your document is really Windows CP 1252, > not ISO 8859 1 encoded. > > If you consider the document as ISO-8859-1, then the parser *will* > convert it correctly. > > Regards, > Martin -- Mark Nottingham http://www.mnot.net/ From martin@loewis.home.cs.tu-berlin.de Wed May 23 23:37:03 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Thu, 24 May 2001 00:37:03 +0200
Subject: [XML-SIG] XML and Unicode
In-Reply-To: <20010523134419.A4434@mnot.net> (message from Mark Nottingham on Wed, 23 May 2001 13:44:23 -0700)
References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> <3B0B68E6.9CBF7689@lemburg.com> <20010523084622.A25059@mnot.net> <200105232015.f4NKF6I02205@mira.informatik.hu-berlin.de> <20010523134419.A4434@mnot.net>
Message-ID: <200105232237.f4NMb3p03391@mira.informatik.hu-berlin.de>

> I'm not sure I understand your previous message - noone has suggested
> that it's Windows CP 1252 (although I may have missed messages), and
> I'm not sure what you mean by 'consider the document as ISO-8859-1';
> I'm feeding a document into an XML parser with encoding="ISO-8859-1",
> and getting unicode strings out of it.

There simply is no em-dash in ISO-8859-1; this is a Microsoft invention. Microsoft organizes character sets in code pages (an idea taken from IBM). For Code Page 1252, we have the character assignments

<-N>  /x96  <U2013>  EN DASH
<-M>  /x97  <U2014>  EM DASH

So the characters '\x96' and '\x97', when interpreted as CP 1252, identify EN DASH and EM DASH, respectively. In ISO 8859-1, these characters have the meanings

<SG>  /x96  <U0096>  START OF GUARDED AREA (SPA)
<EG>  /x97  <U0097>  END OF GUARDED AREA (EPA)

As you can see, they are considered control characters in ISO-8859-1.

So if you want the character to be treated as EM DASH, you should identify the character set as CP 1252, not ISO-8859-1. Doing so, in turn, will result in the Unicode characters U+2013 and U+2014 being used, instead of the Unicode characters U+0096 and U+0097 (which identify control characters).

Now, assuming that you correctly identify your character set, XML parsers may refuse your document in case they don't know what cp-1252 is.
Even if that succeeds, converting the resulting Unicode strings to ISO-8859-1 will fail, as EM DASH has no representation in that character set. Of course, conversion into UTF-8 will succeed in any case - all Unicode characters are representable in UTF-8 > What mechanism do I have to consider it as having a particular > encoding, beyond the XML declaration? Sorry, I cannot understand this question; please rephrase. > I've been given the impression that unicode strings are > encoding-neutral. That impression is correct. Unfortunately, byte-oriented files are not encoding-neutral, so when you read or write from/to a byte stream, you have to know its encoding. Regards, Martin P.S. If you have a browser that displays '\x96' as EN DASH even if the encoding is ISO-8859-1, this browser is broken - it should treat the character as START OF GUARDED AREA. I could not figure out what the exact meaning of this character is, something along the lines: text between SPA and EPA is "guarded", i.e. it cannot be edited or cleared. I doubt any browser implements that. From mnot@mnot.net Wed May 23 23:55:02 2001 From: mnot@mnot.net (Mark Nottingham) Date: Wed, 23 May 2001 15:55:02 -0700 Subject: [XML-SIG] XML and Unicode In-Reply-To: <200105232237.f4NMb3p03391@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Thu, May 24, 2001 at 12:37:03AM +0200 References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net> <3B0B68E6.9CBF7689@lemburg.com> <20010523084622.A25059@mnot.net> <200105232015.f4NKF6I02205@mira.informatik.hu-berlin.de> <20010523134419.A4434@mnot.net> <200105232237.f4NMb3p03391@mira.informatik.hu-berlin.de> Message-ID: <20010523155458.C4434@mnot.net> On Thu, May 24, 2001 at 12:37:03AM +0200, Martin v. Loewis wrote: > There simply is no em-dash in ISO-8859-1; this is a Microsoft > invention. Microsoft organizes character sets in code pages (an idea > taken from IBM). 
For Code Page 1252, we have the character assignments [...] > P.S. If you have a browser that displays '\x96' as EN DASH even if the > encoding is ISO-8859-1, this browser is broken - it should treat the > character as START OF GUARDED AREA. Ah! That explains it. Thank you very much. Both IE and Mozilla display this character as an em dash when the encoding is set to ISO-8859-1 (and a few others). Very confusing. Thanks again, -- Mark Nottingham http://www.mnot.net/ From DKGunter@lbl.gov Thu May 24 02:46:54 2001 From: DKGunter@lbl.gov (Dan Gunter) Date: Wed, 23 May 2001 18:46:54 -0700 Subject: [XML-SIG] PythonWorks SOAP Message-ID: <3B0C680E.54971ECB@lbl.gov> I have been using PythonWorks' soaplib.py in a project, and although it is not bad I was hoping that some of the rough edges would get polished in the next release. But the next (0.9) version does not seem forthcoming. My question is: does anyone know when this might happen and/or what SOAP library is being more actively worked on? Thanks in advance, -- # # Dan Gunter # http://www-didc.lbl.gov/~dang/ # From sallyd@internationalexhibits.com Thu May 24 15:19:37 2001 From: sallyd@internationalexhibits.com (Sally Daugherty) Date: Thu, 24 May 2001 07:19:37 -0700 Subject: [XML-SIG] European display solutions that can reduce the impact of the rising gasoline prices. Message-ID: <01C0E424.808F3B80@tc03-20-204.tscnet.net> International Exhibits Inc. is performing a beta test on an e-mail = marketing campaign to promote our product lines. The intent is to = provide an unobtrusive, cost-effective and environmentally friendly = marketing campaign (compared to bulk mailings and fax-grams that kill = trees). We would appreciate your thoughts on our approach. If you want = to be removed from our data base please reply with the word "remove" in = the subject line. Our hope is that you will visit our web site at = http://www.internationalexhibits.com. 
Note: International Exhibits manufactures 7 product lines and is a distributor for another 20 product lines. I added an attachment on several new European Product lines that will be shortly introduced on our web site. We believe that these items will be a cost-effective solution to the rising gasoline prices. If you project any display needs please feel free to contact me at www.internatinalexhibits.com or by telephone at (360)769-9726. Warm Regards, Sally Daugherty General Manager International Exhibits, Inc. From doc@sympatico.ca Thu May 24 18:25:21 2001 From: doc@sympatico.ca (DOC) Date: Thu, 24 May 2001 10:25:21 -0700 Subject: [XML-SIG] PythonWorks SOAP References: <3B0C680E.54971ECB@lbl.gov> Message-ID: <004501c0e476$9c7597c0$20afd1d8@c5y3j01> What do you need? Perhaps I can help. I have been playing around with XML recently and am looking for something more substantial to work on. DOC ----- Original Message ----- From: "Dan Gunter" <dkgunter@lbl.gov> To: <xml-sig@python.org> Sent: Wednesday, May 23, 2001 6:46 PM Subject: [XML-SIG] PythonWorks SOAP > I have been using PythonWorks' soaplib.py in a project, and although > it is not bad I was hoping that some of the rough edges would get > polished in the next release. But the next (0.9) version does not seem > forthcoming. My question is: does anyone know when this might happen > and/or what SOAP library is being more actively worked on? Thanks in > advance, > > -- > # > # Dan Gunter > # http://www-didc.lbl.gov/~dang/ > # > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig From eliot@isogen.com Thu May 24 15:22:06 2001 From: eliot@isogen.com (W.
Eliot Kimber) Date: Thu, 24 May 2001 09:22:06 -0500 Subject: [XML-SIG] Messengers in DOM and XSLT Processors Message-ID: <3B0D190E.A4FD12F1@isogen.com> Using the framework provided with James Clark's SP parser, GroveMinder (a commercial grove and HyTime implementation sold by Epremis (www.epremis.com)) provides a very handy messenger facility where you pass in a callback that takes a structured message as input. Using this an application can collect messages and do something cool with them. We use GroveMinder in our distributed link management system and use its messenger facility. We also use the Python DOM and 4Suite XSLT processor to do server-side processing and we need to be able to capture messages and return them to the client. We already have a general messenger framework in our client and server code. I need to add support for messengers to the Python DOM and XSLT processors. Before I dive into this--is this something that's already there and I just haven't noticed it (I can't claim to have studied every line of code in detail) or can anyone offer any tips on how to proceed or things to avoid? We would, of course, be contributing any messenger support we added back to the project (and I'm still working on completing my DOM fixes/enhancements and packing those up as patches--I should be able to get that together by the end of next week as it's finally become a priority here). Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . 
c o m From rsalz@zolera.com Thu May 24 16:07:47 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 24 May 2001 11:07:47 -0400 Subject: [XML-SIG] PythonWorks SOAP References: <3B0C680E.54971ECB@lbl.gov> <004501c0e476$9c7597c0$20afd1d8@c5y3j01> Message-ID: <3B0D23C3.A8310894@zolera.com> > I have been using PythonWorks' soaplib.py in a project, and although > it is not bad I was hoping that some of the rough edges would get > polished in the next release. But the next (0.9) version does not seem > forthcoming. My question is: does anyone know when this might happen > and/or what SOAP library is being more actively worked on? Thanks in > advance, There are a couple of Python SOAP projects that have more active development right now. Look at SOAP.py (www.actzero.com), SOAPy (soapy.sf.net), and shortly after the weekend ZSI (no web location yet). 4thought has a SOAP implementation used with their RDF stuff, I recall. You might also want to go to pywebsvcs.sf.net, the home of a just-starting group of folks in the Python and web services area. /r$ From uche.ogbuji@fourthought.com Thu May 24 19:02:54 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 24 May 2001 12:02:54 -0600 Subject: [XML-SIG] Messengers in DOM and XSLT Processors In-Reply-To: Message from "W. Eliot Kimber" <eliot@isogen.com> of "Thu, 24 May 2001 09:22:06 CDT." <3B0D190E.A4FD12F1@isogen.com> Message-ID: <200105241802.f4OI2sM06596@localhost.local> > Using the framework provided with James Clark's SP parser, GroveMinder > (a commercial grove and HyTime implementation sold by Epremis > (www.epremis.com)) provides a very handy messenger facility where you > pass in a callback that takes a structured message as input. Using this > an application can collect messages and do something cool with them. We > use GroveMinder in our distributed link management system and use its > messenger facility.
We also use the Python DOM and 4Suite XSLT processor > to do server-side processing and we need to be able to capture messages > and return them to the client. We already have a general messenger > framework in our client and server code. I need to add support for > messengers to the Python DOM and XSLT processors. > > Before I dive into this--is this something that's already there and I > just haven't noticed it (I can't claim to have studied every line of > code in detail) or can anyone offer any tips on how to proceed or things > to avoid? I think you'd be a pioneer on this one, but I do appreciate your interest in taking a few arrows. I think that decoupled access to DOM and 4XSLT would be very useful in general. > We would, of course, be contributing any messenger support we added back > to the project (and I'm still working on completing my DOM > fixes/enhancements and packing those up as patches--I should be able to > get that together by the end of next week as it's finally become a > priority here). Which DOM are you using? 4DOM? minidom? other? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From eliot@isogen.com Thu May 24 19:09:39 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Thu, 24 May 2001 13:09:39 -0500 Subject: [XML-SIG] Messengers in DOM and XSLT Processors References: <200105241802.f4OI2sM06596@localhost.local> Message-ID: <3B0D4E63.991D3699@isogen.com> Uche Ogbuji wrote: > > We would, of course, be contributing any messenger support we added back > > to the project (and I'm still working on completing my DOM > > fixes/enhancements and packing those up as patches--I should be able to > > get that together by the end of next week as it's finally become a > > priority here). > > Which DOM are you using? 4DOM? minidom? other? 
4DOM, as far as I know. Cheers, E. -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From amorgan@mitre.org Thu May 24 23:39:21 2001 From: amorgan@mitre.org (Alex Morgan) Date: Thu, 24 May 2001 18:39:21 -0400 Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements Message-ID: <3B0D8D99.B94E7542@mitre.org> When an xml.parsers.expat parser handles CDATA with an '&lt;' in it, it turns this into a '<' when it processes it. How can I stop this behavior? I apologize if this was discussed recently in the mailing list or is in the documentation. I have looked in both areas, but may have missed it. Thanks, -- -Alex Morgan Homepage: http://pubpages.unh.edu/~amorgan AIM login: HomeySage Phone: (781) 271-6306 Office: 3K-136, 202 Burlington Rd, Bedford, MA From martin@loewis.home.cs.tu-berlin.de Fri May 25 01:06:35 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 02:06:35 +0200 Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <3B0D8D99.B94E7542@mitre.org> (message from Alex Morgan on Thu, 24 May 2001 18:39:21 -0400) References: <3B0D8D99.B94E7542@mitre.org> Message-ID: <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> > When an xml.parsers.expat parser handles CDATA with an '&lt;' in it, it > turns this into a '<' when it processes it. It does not do this for me, using PyXML 0.6.5 on Linux. Can you give a specific example where markup in a CDATA section is interpreted? 
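For reference, a minimal sketch of how expat reports character data in the two cases that are easy to confuse. It assumes modern Python 3 and the standard `xml.parsers.expat` module rather than the PyXML 0.6.5 setup mentioned here; `collect_text` is a hypothetical helper name, not part of any library.

```python
import xml.parsers.expat

def collect_text(document):
    """Return all character data that expat reports for `document`."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)

# Ordinary character data: the entity reference is expanded by the parser.
print(collect_text("<t>PCDATA &lt;</t>"))

# CDATA section: '&lt;' is literal text and is passed through unchanged.
print(collect_text("<t><![CDATA[CDATA &lt;]]></t>"))
```

Only the first call shows the expansion described in the question; inside a CDATA section no markup is interpreted.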
Regards, Martin From Juergen Hermann" <jh@web.de Fri May 25 02:17:26 2001 From: Juergen Hermann" <jh@web.de (Juergen Hermann) Date: Fri, 25 May 2001 03:17:26 +0200 Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> Message-ID: <m1536Cs-007Yb7C@smtp.web.de> On Fri, 25 May 2001 02:06:35 +0200, Martin v. Loewis wrote: >> When an xml.parsers.expat parser handles CDATA with an '&lt;' in it, it >> turns this into a '<' when it processes it. > >It does not do this for me, using PyXML 0.6.5 on Linux. Can you give a >specific example where markup in a CDATA section is interpreted? Also, is Alex talking about a CDATA section, or is he mixing up PCDATA with CDATA? <t>PCDATA &lt;</t> <t><![CDATA[CDATA <]]></t> Ciao, Jürgen From fdrake@acm.org Fri May 25 05:13:46 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 00:13:46 -0400 (EDT) Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <m1536Cs-007Yb7C@smtp.web.de> References: <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> <m1536Cs-007Yb7C@smtp.web.de> Message-ID: <15117.56314.662015.891593@cj42289-a.reston1.va.home.com> Juergen Hermann writes: > On Fri, 25 May 2001 02:06:35 +0200, Martin v. Loewis wrote: > >It does not do this for me, using PyXML 0.6.5 on Linux. Can you give a > >specific example where markup in a CDATA section is interpreted? > > Also, is Alex talking about a CDATA section, or is he mixing up PCDATA with > CDATA? > > <t>PCDATA &lt;</t> > > <t><![CDATA[CDATA <]]></t> Or a CDATA attribute value? (My first guess.) <t attr="The &lt; symbol is cool!" /> -Fred -- Fred L. Drake, Jr.
<fdrake at acm.org> PythonLabs at Digital Creations From Alexandre.Fayolle@logilab.fr Fri May 25 08:34:38 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 25 May 2001 09:34:38 +0200 (CEST) Subject: [XML-SIG] external entities and CDATA sections Message-ID: <Pine.LNX.4.21.0105250930240.6432-100000@orion.logilab.fr> Hello, While writing some documentation, I wanted to include some python code in a docbook document. My first thought was using an external entity referencing the source file. However, the code has some interger comparison code, and features a couple of '<' characters, so it should be set in a CDATA section for proper handling. This in turn prevents the resolution of the external entity. How would the XML experts on the list tackle this? TIA Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From rsalz@zolera.com Fri May 25 14:00:19 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 25 May 2001 09:00:19 -0400 Subject: [XML-SIG] cStringIO Message-ID: <3B0E5763.FC2ED68E@zolera.com> Are there any guidelines as to when it isn't safe to use cStringIO? As long as everyone sticks to UTF-8 it should work, right? I know that incoming XML might have some other encoding, but if I'm using SAX or DOM, it will have been converted, right? /r$ From fdrake@acm.org Fri May 25 15:27:58 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 10:27:58 -0400 (EDT) Subject: [XML-SIG] cStringIO In-Reply-To: <3B0E5763.FC2ED68E@zolera.com> References: <3B0E5763.FC2ED68E@zolera.com> Message-ID: <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> Rich Salz writes: > Are there any guidelines as to when it isn't safe to use cStringIO? As > long as everyone sticks to UTF-8 it should work, right? I know that > incoming XML might have some other encoding, but if I'm using SAX or > DOM, it will have been converted, right? 
cStringIO works with 8-bit strings, regardless of the encoding. It does not work with non-ASCII Unicode strings. Fixing that is on my plate, but I don't have time allotted for it yet. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Fri May 25 16:02:09 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 25 May 2001 09:02:09 -0600 Subject: [XML-SIG] external entities and CDATA sections In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> of "Fri, 25 May 2001 09:34:38 +0200." <Pine.LNX.4.21.0105250930240.6432-100000@orion.logilab.fr> Message-ID: <200105251502.f4PF29313415@localhost.local> > Hello, > > While writing some documentation, I wanted to include some python code in > a docbook document. My first thought was using an external entity > referencing the source file. However, the code has some interger > comparison code, and features a couple of '<' characters, so it should be > set in a CDATA section for proper handling. This in turn prevents the > resolution of the external entity. > > How would the XML experts on the list tackle this? Unfortunately there is no easy solution. You'd have to wrap all the entities in ]]>&entity;<![CDATA[ None of the other potential solutions, XInclude, XLink of embed type, etc., would help here. Of course you can always not use CDATA sections and just &lt; escape what you need to. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Fri May 25 16:09:56 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 25 May 2001 17:09:56 +0200 (CEST) Subject: [XML-SIG] external entities and CDATA sections In-Reply-To: <200105251502.f4PF29313415@localhost.local> Message-ID: <Pine.LNX.4.21.0105251707560.8076-100000@leo.logilab.fr> On Fri, 25 May 2001, Uche Ogbuji wrote: > Of course you can always not use CDATA sections and just &lt; escape what you > need to. The idea was using the code 'as is' in the documentation to avoid maintaining both the escaped and runnable version. I'll go one using a small script to escape the examples when generating the documentation. Thanks. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From fdrake@acm.org Fri May 25 16:55:36 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 11:55:36 -0400 (EDT) Subject: [XML-SIG] cStringIO In-Reply-To: <3B0E7D16.3FA06CE8@zolera.com> References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> <3B0E7D16.3FA06CE8@zolera.com> Message-ID: <15118.32888.681428.716667@cj42289-a.reston1.va.home.com> Rich Salz writes: > Sorry, I don't understand. What's a non-ASCII unicode string? > Something with the high-bit on? If so, then doesn't httplib.py > have a problem using cStringIO ? Yes; any Unicode string that contains non-ASCII characters can't be converted to an 8-bit string correctly since the ASCII encoding is used by default (and there's no way to tell cStringIO to use a different encoding). Why would httplib have a problem with cStringIO? Pulling data over a socket always yields 8-bit strings, which work just fine with cStringIO regardless of the high bit. 
The problems with the cStringIO and Unicode are based entirely on the implicit conversion of the Unicode to ASCII, not the 8th bit per se. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From rsalz@zolera.com Fri May 25 16:41:10 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 25 May 2001 11:41:10 -0400 Subject: [XML-SIG] cStringIO References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> Message-ID: <3B0E7D16.3FA06CE8@zolera.com> > cStringIO works with 8-bit strings, regardless of the encoding. It > does not work with non-ASCII Unicode strings. Fixing that is on my > plate, but I don't have time allotted for it yet. Sorry, I don't understand. What's a non-ASCII unicode string? Something with the high-bit on? If so, then doesn't httplib.py have a problem using cStringIO ? tnx. /r$ PS: #define UNLESS ... ? Someone has a PERL sense of humor. :) /r$ From amorgan@mitre.org Fri May 25 17:19:30 2001 From: amorgan@mitre.org (Alex Morgan) Date: Fri, 25 May 2001 12:19:30 -0400 Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements References: <3B0D8D99.B94E7542@mitre.org> <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> Message-ID: <3B0E8612.45BAF00D@mitre.org> An example of the behavior I am talking about is input that includes the following: '<reference> Morse &amp; Feshbach </reference>' With a CDATA handler: 'def char_data(data): print data' Will return 'Morse & Feshbach', when I would like it to return the original string, as is. "Martin v. Loewis" wrote: > > > When an xml.parsers.expat parser handles CDATA with an '&lt;' in it, it > > turns this into a '<' when it processes it. > > It does not do this for me, using PyXML 0.6.5 on Linux. Can you give a > specific example where markup in a CDATA section is interpreted? 
> > Regards, > Martin -- -Alex Morgan Homepage: http://pubpages.unh.edu/~amorgan AIM login: HomeySage Phone: (781) 271-6306 Office: 3K-136, 202 Burlington Rd, Bedford, MA From rsalz@zolera.com Fri May 25 17:24:08 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 25 May 2001 12:24:08 -0400 Subject: [XML-SIG] cStringIO References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> <3B0E7D16.3FA06CE8@zolera.com> <15118.32888.681428.716667@cj42289-a.reston1.va.home.com> Message-ID: <3B0E8728.365E67B@zolera.com> It looks to me (from skimming the code in cStringIO.c), that the code is 8-bit transparent. I thought UTF-8 made all multi-byte values have the 8th bit on. So, if I'm using cStringIO I should be okay, if I'm just using cStringIO to transport data, or maybe do readline or similar. Once I need to look at individual characters, I'm hosed. But if I want to collect the value of a bunch of TEXT_NODE elements and output them, won't that work? /r$ From fdrake@acm.org Fri May 25 17:29:31 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 12:29:31 -0400 (EDT) Subject: [XML-SIG] cStringIO In-Reply-To: <3B0E8728.365E67B@zolera.com> References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> <3B0E7D16.3FA06CE8@zolera.com> <15118.32888.681428.716667@cj42289-a.reston1.va.home.com> <3B0E8728.365E67B@zolera.com> Message-ID: <15118.34923.55835.44275@cj42289-a.reston1.va.home.com> Rich Salz writes: > It looks to me (from skimming the code in cStringIO.c), that the code > is 8-bit transparent. I thought UTF-8 made all multi-byte values have > the 8th bit on. So, if I'm using cStringIO I should be okay, if I'm > just using cStringIO to transport data, or maybe do readline or That's correct. > similar. Once I need to look at individual characters, I'm hosed. But > if I want to collect the value of a bunch of TEXT_NODE elements and > output them, won't that work?
The *only* problem involves Unicode objects, not Unicode data encoded in 8-bit strings. So if your TEXT_NODE objects actually contain 8-bit strings, it'll work just fine. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From fdrake@acm.org Fri May 25 17:32:50 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 12:32:50 -0400 (EDT) Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <3B0E8612.45BAF00D@mitre.org> References: <3B0D8D99.B94E7542@mitre.org> <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> <3B0E8612.45BAF00D@mitre.org> Message-ID: <15118.35122.442238.574572@cj42289-a.reston1.va.home.com> Alex Morgan writes: > Will return 'Morse & Feshbach', when I would like it to return the > original string, as is. Set the DefaultHandler attribute of the parser object; it will be called with the unexpanded entity reference as a string '&amp;'. Depending on what other handlers you set, it may get other things as well, but always in the marked-up form rather than the interpreted form. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Fri May 25 21:15:42 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 22:15:42 +0200 Subject: [XML-SIG] cStringIO In-Reply-To: <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> Message-ID: <200105252015.f4PKFga01183@mira.informatik.hu-berlin.de> > cStringIO works with 8-bit strings, regardless of the encoding. It > does not work with non-ASCII Unicode strings. Fixing that is on my > plate, but I don't have time allotted for it yet. One issue: reading UTF-8 in chunks, whether from cStringIO or elsewhere, might break result strings inside a character (i.e. not at a character boundary).
So be careful with applying unicode() or .decode on such a string - you may have to save some bytes for the next .read() call. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri May 25 21:27:28 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 22:27:28 +0200 Subject: [XML-SIG] cStringIO In-Reply-To: <3B0E8728.365E67B@zolera.com> (message from Rich Salz on Fri, 25 May 2001 12:24:08 -0400) References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> <3B0E7D16.3FA06CE8@zolera.com> <15118.32888.681428.716667@cj42289-a.reston1.va.home.com> <3B0E8728.365E67B@zolera.com> Message-ID: <200105252027.f4PKRS601186@mira.informatik.hu-berlin.de> > It looks to me (from skimming the code in cStringIO.c), that the code > is 8-bit transparent. I thought UTF-8 made all multi-byte values have > the 8th bit on. So, if I'm using cStringIO I should be okay, if I'm > just using cStringIO to transport data, or maybe do readline or > similar. Once I need to look at individual characters, I'm hosed. But > if I want to collect the value of a bunch of TEXT_NODE elements and > output them, won't that work? Depends on how exactly you do that. If you just write the text.data attribute to the cStringIO, it might fail, if text.data is a Unicode object (please note that a string object that is UTF-8-encoded is *not* a Unicode object, it is a byte string). To see the problem, do
    import cStringIO
    o = cStringIO.StringIO()
    o.write(u"My 0.02\N{EURO SIGN}")
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri May 25 21:38:43 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 25 May 2001 22:38:43 +0200 Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <3B0E8612.45BAF00D@mitre.org> (message from Alex Morgan on Fri, 25 May 2001 12:19:30 -0400) References: <3B0D8D99.B94E7542@mitre.org> <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> <3B0E8612.45BAF00D@mitre.org> Message-ID: <200105252038.f4PKchI01340@mira.informatik.hu-berlin.de> > An example of the behavior I am talking about is input that includes the > following: > > '<reference> Morse &amp; Feshbach </reference>' > > With a CDATA handler: > > 'def char_data(data): > print data' > > Will return 'Morse & Feshbach', when I would like it to return the > original string, as is. Fred already mentioned the default handler, but I'd like you to reconsider your request: &amp; and & are really the same thing; one is marked-up, the other is not. If you have a need to output the contents again as XML, you may find xml.sax.saxutils.escape useful. Regards, Martin From fdrake@acm.org Fri May 25 21:39:52 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 16:39:52 -0400 (EDT) Subject: [XML-SIG] cStringIO In-Reply-To: <200105252015.f4PKFga01183@mira.informatik.hu-berlin.de> References: <3B0E5763.FC2ED68E@zolera.com> <15118.27630.468805.814729@cj42289-a.reston1.va.home.com> <200105252015.f4PKFga01183@mira.informatik.hu-berlin.de> Message-ID: <15118.49944.451281.919103@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > One issue of reading UTF-8, whether from cStringIO or elsewhere, might > break result strings inside a character (i.e. between character > boundaries). So be careful with applying unicode() or .decode on such > a string - you may have to save some bytes for the next .read() call. Correct -- the cStringIO object is just a stream of bytes, like a file object. 
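The boundary problem quoted above can be sketched as follows. This assumes modern Python 3, with `io.BytesIO` standing in for cStringIO and the standard `codecs` incremental decoder doing the buffering; the two-byte chunk size is chosen only to force a split inside a character.

```python
import codecs
import io

# Assumption: io.BytesIO plays the role of cStringIO (a plain byte stream).
stream = io.BytesIO("0.02\u20ac".encode("utf-8"))  # EURO SIGN takes 3 bytes

decoder = codecs.getincrementaldecoder("utf-8")()
pieces = []
while True:
    chunk = stream.read(2)      # a chunk may end in the middle of a character
    if not chunk:
        break
    # The decoder keeps any incomplete trailing bytes for the next call.
    pieces.append(decoder.decode(chunk))
pieces.append(decoder.decode(b"", final=True))
text = "".join(pieces)
print(text)
```

Decoding each chunk on its own with `chunk.decode("utf-8")` would instead raise `UnicodeDecodeError` on the chunk that holds only the first two bytes of the euro sign.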
To read characters, you'll need to wrap it with a decoder using the codecs module, or pass the bytes to a parser that can handle them properly (like Expat). -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From fdrake@acm.org Fri May 25 21:50:59 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 16:50:59 -0400 (EDT) Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <200105252038.f4PKchI01340@mira.informatik.hu-berlin.de> References: <3B0D8D99.B94E7542@mitre.org> <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> <3B0E8612.45BAF00D@mitre.org> <200105252038.f4PKchI01340@mira.informatik.hu-berlin.de> Message-ID: <15118.50611.898283.174028@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > Fred already mentioned the default handler, but I'd like you to > reconsider your request: &amp; and & are really the same thing; one is > marked-up, the other is not. I only wish it were that easy! In cases where you want to preserve the input as much as possible, it can be important to distinguish between an internal entity reference and the expansion: <!DOCTYPE doc [ <!ENTITY MyEmployer "Digital Creations"> ]> <doc>&MyEmployer;</doc> Now, if I want to load the document into a DOM, modify a few things, and dump it back out for further human editing, I want the entity references intact. With Expat, the only way I've found to do this is to use the DefaultHandler to capture this information. Whether or not the text is expanded directly or made a child of an entity reference node should be determined by the application. The DOM Level 3 Load/Save working draft has knobs to control this behavior. (If anyone knows a way to determine whether a document contains &lt;, &#60;, &#x3c;, or &#x3C;, I'd love to hear about it!) -Fred -- Fred L. Drake, Jr. 
<fdrake at acm.org> PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Fri May 25 22:00:13 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 23:00:13 +0200 Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <15118.50611.898283.174028@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <3B0D8D99.B94E7542@mitre.org> <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> <3B0E8612.45BAF00D@mitre.org> <200105252038.f4PKchI01340@mira.informatik.hu-berlin.de> <15118.50611.898283.174028@cj42289-a.reston1.va.home.com> Message-ID: <200105252100.f4PL0Dv01673@mira.informatik.hu-berlin.de> > Now, if I want to load the document into a DOM, modify a few things, > and dump it back out for further human editing, I want the entity > references intact. With Expat, the only way I've found to do this is > to use the DefaultHandler to capture this information. Of course, those of us contributing to Expat should have no problems to make it not expand internal entity references :-) Regards, Martin From fdrake@acm.org Fri May 25 22:15:12 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 May 2001 17:15:12 -0400 (EDT) Subject: [XML-SIG] xml.parsers.expat not converting aliased CDATA elements In-Reply-To: <200105252100.f4PL0Dv01673@mira.informatik.hu-berlin.de> References: <3B0D8D99.B94E7542@mitre.org> <200105250006.f4P06ZX01302@mira.informatik.hu-berlin.de> <3B0E8612.45BAF00D@mitre.org> <200105252038.f4PKchI01340@mira.informatik.hu-berlin.de> <15118.50611.898283.174028@cj42289-a.reston1.va.home.com> <200105252100.f4PL0Dv01673@mira.informatik.hu-berlin.de> Message-ID: <15118.52064.181731.967296@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > Of course, those of us contributing to Expat should have no problems > to make it not expand internal entity references :-) Of course not. ;-) Now, is there anyone *actually* contributing? -Fred -- Fred L. Drake, Jr. 
<fdrake at acm.org> PythonLabs at Digital Creations From haering_python@gmx.de Sun May 27 02:29:47 2001 From: haering_python@gmx.de (Gerhard Häring) Date: Sun, 27 May 2001 03:29:47 +0200 Subject: [XML-SIG] SRPMs Message-ID: <20010527032946.A12024@lilith.hqd-internal> Sorry if this is not the right place to ask. From the SourceForge page, I can download a Windows installer and RPMs for PyXML. There isn't a source RPM available, however. Could this possibly be fixed? I don't want to write yet another SPEC file myself if I can avoid it. Gerhard -- mail: gerhard <at> bigfoot <dot> de registered Linux user #64239 web: http://highqualdev.com public key at homepage public key fingerprint: DEC1 1D02 5743 1159 CD20 A4B6 7B22 6575 86AB 43C0 reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b'))) From teg@redhat.com Sun May 27 04:09:56 2001 From: teg@redhat.com (Trond Eivind Glomsrød) Date: 26 May 2001 23:09:56 -0400 Subject: [XML-SIG] SRPMs In-Reply-To: <20010527032946.A12024@lilith.hqd-internal> References: <20010527032946.A12024@lilith.hqd-internal> Message-ID: <xuy1ypb32t7.fsf@halden.devel.redhat.com> Gerhard Häring <haering_python@gmx.de> writes: > Sorry if this is not the right place to ask. > > From the SourceForge page, I can download a Windows installer and RPMs for > PyXML. There isn't a source RPM available, however. Could this possibly be > fixed? > > I don't want to write yet another SPEC file myself if I can avoid it. Just download the tar file - it contains all you need. That said, you can find handmade rpms in rawhide. -- Trond Eivind Glomsrød Red Hat, Inc. From martin@loewis.home.cs.tu-berlin.de Sun May 27 09:56:01 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 27 May 2001 10:56:01 +0200 Subject: [XML-SIG] SRPMs In-Reply-To: <20010527032946.A12024@lilith.hqd-internal> (message from Gerhard =?ISO-8859-1?Q?H=E4ring?= on Sun, 27 May 2001 03:29:47 +0200) References: <20010527032946.A12024@lilith.hqd-internal> Message-ID: <200105270856.f4R8u1q01134@mira.informatik.hu-berlin.de> > Sorry if this is not the right place to ask. Hi Gerhard, This is certainly the right place to ask. > >From the SourceForge page, I can download a Windows installer and RPMs for > PyXML. There isn't a source RPM available, however. Could this possibly be > fixed? No. The source RPM does not add any additional information, so I won't upload it. > I don't want to write yet another SPEC file myself if I can avoid it. You don't have to. To build a source RPM, just unpack the sources, and invoke python setup.py bdist_rpm Of course, if you merely want to install the package, doing python setup.py install is good enough. Hope this helps, Martin From haering_python@gmx.de Sun May 27 17:12:06 2001 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Sun, 27 May 2001 18:12:06 +0200 Subject: [XML-SIG] SRPMs In-Reply-To: <200105270856.f4R8u1q01134@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sun, May 27, 2001 at 10:56:01AM +0200 References: <20010527032946.A12024@lilith.hqd-internal> <200105270856.f4R8u1q01134@mira.informatik.hu-berlin.de> Message-ID: <20010527181206.A1304@lilith.hqd-internal> On Sun, May 27, 2001 at 10:56:01AM +0200, Martin v. Loewis wrote: > > I don't want to write yet another SPEC file myself if I can avoid it. > > You don't have to. To build a source RPM, just unpack the sources, and > invoke > > python setup.py bdist_rpm Oh, I forgot about that feature of distutils. Works very nicely. 
Thanks, Gerhard -- mail: gerhard <at> bigfoot <dot> de registered Linux user #64239 web: http://highqualdev.com public key at homepage public key fingerprint: DEC1 1D02 5743 1159 CD20 A4B6 7B22 6575 86AB 43C0 reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b'))) From linudom@hotmail.com Mon May 28 19:17:50 2001 From: linudom@hotmail.com (Dom Linu) Date: Mon, 28 May 2001 18:17:50 -0000 Subject: [XML-SIG] getAttribute?? Message-ID: <F153weDc7kJIZAqscCQ000101a3@hotmail.com> I have tried this many different ways, but it never seems to work and I always abandon PyXML in favor of something else... so I'll ask here, why does this fail: >>> from xml.dom.ext.reader.Sax2 import FromXml >>> doc = FromXml("<mydoc id='123'>text here</mydoc>") >>> elem = doc.documentElement >>> attr = elem.getAttribute("id") >>> print attr >>> type(attr) <type 'string'> I've tried other document, other platforms (both Unix and Win32), and other techniques, but I just can't seem to get an attribute. Any enlightenment would be illuminating. thx. ---------------------------------------------------------------------- Get your FREE download of MSN Explorer at http://explorer.msn.com From Alexandre.Fayolle@logilab.fr Mon May 28 19:51:32 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 28 May 2001 20:51:32 +0200 (CEST) Subject: [XML-SIG] getAttribute?? In-Reply-To: <F153weDc7kJIZAqscCQ000101a3@hotmail.com> Message-ID: <Pine.LNX.4.21.0105282050510.27654-100000@leo.logilab.fr> On Mon, 28 May 2001, Dom Linu wrote: > I have tried this many different ways, but it never seems to work and I > always abandon PyXML in favor of something else... 
so I'll ask here, why > does this fail: >   > >>> from xml.dom.ext.reader.Sax2 import FromXml > >>> doc = FromXml("<mydoc id='123'>text here</mydoc>") > >>> elem = doc.documentElement > >>> attr = elem.getAttribute("id") Try this: attr = elem.getAttributeNS('','id') Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From dag@orion.no Mon May 28 20:19:24 2001 From: dag@orion.no (Dag Sunde) Date: Mon, 28 May 2001 21:19:24 +0200 Subject: [XML-SIG] getAttribute?? References: <Pine.LNX.4.21.0105282050510.27654-100000@leo.logilab.fr> Message-ID: <055901c0e7ab$1af82db0$43145c3e@orion.no> Ah! I got interested in Dom Linu's problem, and was able to get the attribute with: >>> atr = elem.attributes['','id'].value but couldn't for my life understand the first param... It's the NameSpace! :-) But why isn't getAttribute('id') working when the NS is an empty string? Does getAttribute('id') work if the NS somehow is None? Dag. ----- Original Message ----- From: "Alexandre Fayolle" <Alexandre.Fayolle@logilab.fr> To: "Dom Linu" <linudom@hotmail.com> Cc: <xml-sig@python.org> Sent: Monday, May 28, 2001 8:51 PM Subject: Re: [XML-SIG] getAttribute?? > On Mon, 28 May 2001, Dom Linu wrote: > > > I have tried this many different ways, but it never seems to work and I > > always abandon PyXML in favor of something else... so I'll ask here, why > > does this fail: > > > > >>> from xml.dom.ext.reader.Sax2 import FromXml > > >>> doc = FromXml("<mydoc id='123'>text here</mydoc>") > > >>> elem = doc.documentElement > > >>> attr = elem.getAttribute("id") > > Try this: > > attr = elem.getAttributeNS('','id') > > Alexandre Fayolle > -- > http://www.logilab.com > Narval is the first software agent available as free software (GPL). > LOGILAB, Paris (France). 
> > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses. Admin Orion Energy Consulting AS ********************************************************************** From linudom@hotmail.com Mon May 28 20:54:50 2001 From: linudom@hotmail.com (Dom Linu) Date: Mon, 28 May 2001 19:54:50 -0000 Subject: [XML-SIG] getAttribute?? Message-ID: <F52EFk2H3b98KtdhQYZ000100d0@hotmail.com> As always, the SIG rules. Bravo! Still perplexing is the interesting "feature" of getAttribute(<attrname>)... Thanks again! >From: Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> >To: Dom Linu <linudom@hotmail.com> >CC: xml-sig@python.org >Subject: Re: [XML-SIG] getAttribute?? >Date: Mon, 28 May 2001 20:51:32 +0200 (CEST) > >On Mon, 28 May 2001, Dom Linu wrote: > > > I have tried this many different ways, but it never seems to work and I > > always abandon PyXML in favor of something else... 
so I'll ask here, why > > does this fail: > > > > >>> from xml.dom.ext.reader.Sax2 import FromXml > > >>> doc = FromXml("<mydoc id='123'>text here</mydoc>") > > >>> elem = doc.documentElement > > >>> attr = elem.getAttribute("id") > >Try this: > >attr = elem.getAttributeNS('','id') > >Alexandre Fayolle >-- >http://www.logilab.com >Narval is the first software agent available as free software (GPL). >LOGILAB, Paris (France). > > >_______________________________________________ >XML-SIG maillist - XML-SIG@python.org >http://mail.python.org/mailman/listinfo/xml-sig From Mike.Olson@fourthought.com Mon May 28 21:48:52 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 28 May 2001 14:48:52 -0600 Subject: [XML-SIG] getAttribute?? References: <F153weDc7kJIZAqscCQ000101a3@hotmail.com> Message-ID: <3B12B9B4.A5C38A8E@FourThought.com> Dom Linu wrote: > > I have tried this many different ways, but it never seems to work and > I always abandon PyXML in favor of something else... 
so I'll ask here, > why does this fail: > > >>> from xml.dom.ext.reader.Sax2 import FromXml > >>> doc = FromXml("<mydoc id='123'>text here</mydoc>") > >>> elem = doc.documentElement > >>> attr = elem.getAttribute("id") > >>> print attr > > >>> type(attr) > <type 'string'> Because the Sax2 reader is namespace aware so you need to use the DOM level II interface of getAttributeNS('','id') > > I've tried other document, other platforms (both Unix and Win32), and > other techniques, but I just can't seem to get an attribute. Any > enlightenment would be illuminating. > > thx. > > > ---------------------------------------------------------------------- > Get your FREE download of MSN Explorer at http://explorer.msn.com > > _______________________________________________ XML-SIG maillist - > XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From linudom@hotmail.com Mon May 28 22:11:07 2001 From: linudom@hotmail.com (Dom Linu) Date: Mon, 28 May 2001 21:11:07 -0000 Subject: [XML-SIG] getAttribute?? 
Message-ID: <F61AxRYG6jj69EHaz8k00010246@hotmail.com> Wow -- very informative. Thank you. I was working on the assumption that if namespaces weren't in use, that you use non-namespace functions. That seems to have worked for everything else that I'm doing, but to be honest I can't remember if I've always been using the Sax2 reader-- I would have to dig. I mean, with the Sax2 reader (implied by using FromXml) getElementsByTagName works, without using getElementsByTagNameNS I'm pretty sure... is this inconsistent, or am I missing something? (the latter probably being true!) dl. >From: Mike Olson <Mike.Olson@fourthought.com> >To: Dom Linu <linudom@hotmail.com> >CC: xml-sig@python.org >Subject: Re: [XML-SIG] getAttribute?? >Date: Mon, 28 May 2001 14:48:52 -0600 > >Dom Linu wrote: > > > > I have tried this many different ways, but it never seems to work and > > I always abandon PyXML in favor of something else... 
so I'll ask here, > > why does this fail: > > > > >>> from xml.dom.ext.reader.Sax2 import FromXml > > >>> doc = FromXml("<mydoc id='123'>text here</mydoc>") > > >>> elem = doc.documentElement > > >>> attr = elem.getAttribute("id") > > >>> print attr > > > > >>> type(attr) > > <type 'string'> > >Because the Sax2 reader is namespace aware so you need to use the DOM >level II interface of getAttributeNS('','id') > > > > > > I've tried other document, other platforms (both Unix and Win32), and > > other techniques, but I just can't seem to get an attribute. Any > > enlightenment would be illuminating. > > > > thx. > > > > > > ---------------------------------------------------------------------- > > Get your FREE download of MSN Explorer at http://explorer.msn.com > > > > _______________________________________________ XML-SIG maillist - > > XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig > >-- >Mike Olson Principal Consultant >mike.olson@fourthought.com (303)583-9900 x 102 >Fourthought, Inc. 
http://Fourthought.com >Software-engineering, knowledge-management, XML, CORBA, Linux, Python > >_______________________________________________ >XML-SIG maillist - XML-SIG@python.org >http://mail.python.org/mailman/listinfo/xml-sig From eliot@isogen.com Tue May 29 00:24:29 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Mon, 28 May 2001 18:24:29 -0500 Subject: [XML-SIG] getAttribute?? References: <F61AxRYG6jj69EHaz8k00010246@hotmail.com> Message-ID: <3B12DE2D.6B57DF91@isogen.com> Dom Linu wrote: > > Wow -- very informative. Thank you. I was working on the assumption > that if namespaces weren't in use, that you use non-namespace > functions. That seems to have worked for everything else that I'm > doing, but to be honest I can't remember if I've always been using the > Sax2 reader-- I would have to dig. I mean, with the Sax2 reader > (implied by using FromXml) getElementsByTagName works, without using > getElementsByTagNameNS I'm pretty sure... is this inconsistent, or am > I missing something? (the latter probably being true!) I considered the current behavior a bug (that non-namespace functions require a null namespace value) and fixed it in my local copy of the code. Unfortunately, I haven't had a chance to package up these fixes and submit them back to the SIG yet. The problem I found was that the dictionaries where things were indexed all assumed a tuple key with a possibly null namespace value. The fix was easy: just synthesize the tuple for the non-namespace lookup methods. Cheers, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . 
c o m From larsga@garshol.priv.no Tue May 29 08:14:45 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 May 2001 09:14:45 +0200 Subject: [XML-SIG] building XML docs using ? In-Reply-To: <200105171709.f4HH9SX17328@localhost.local> References: <200105171709.f4HH9SX17328@localhost.local> Message-ID: <m34ru4y6ca.fsf@lambda.garshol.priv.no> (I've been at XML Europe, and so didn't see your response until now.) * Uche Ogbuji | | Why not? Because most XML handling tools are not very scalable, | XSLT being the foremost example. That is true, but it still doesn't mean that there is something wrong with documents that are 100MB in size, just that there is something wrong with part of the tool set. The other part of the tool set will handle this just fine. I've been working with things like encyclopedias needing to be imported into CMSs as well as turning the Open Directory Project data into a topic map, and in these cases the documents naturally become very big. Processing these documents using SAX was no problem at all, although it admittedly took a while. In fact, an event-based representation was quite natural for these applications, though I admit that this will not apply to all applications. | Also because XML eliminates the need, which I think quite | unneccesary, of storing mountains of data in a single file. | Inclusion, transclusion, other linking mechanisms, and many tools | are available for breaking XML into manageable packets. Packets of 100MB are quite manageable with the right tools. | Opinion of others might vary, of course. It does. :-) --Lars M. From larsga@garshol.priv.no Tue May 29 08:16:58 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 May 2001 09:16:58 +0200 Subject: [XML-SIG] XML Canonicalization In-Reply-To: <3B03D8B4.9108432D@zolera.com> References: <3B03D8B4.9108432D@zolera.com> Message-ID: <m33d9oy68l.fsf@lambda.garshol.priv.no> * Rich Salz | | I would be more than happy to add this to PyXML if there's interest. 
| Since it operates on DOM nodes, perhaps xml.dom.utils ? I know this is a little late now, but anyway: why did we do this based on the DOM? Isn't SAX far more natural for something as simple as this? It's faster, it works for DOM representations as well, and it scales much better. --Lars M. From rsalz@zolera.com Tue May 29 11:48:17 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 29 May 2001 06:48:17 -0400 Subject: [XML-SIG] XML Canonicalization References: <3B03D8B4.9108432D@zolera.com> <m33d9oy68l.fsf@lambda.garshol.priv.no> Message-ID: <3B137E71.C76FC3C7@zolera.com> > I know this is a little late now, but anyway: why did we do this based > on the DOM? Isn't SAX far more natural for something as simple as this? > It's faster, it works for DOM representations as well, and it scales > much better. I met my needs at the time and I thought the community would appreciate it. Hopefully someone will get useful ideas from my code and do a SAX one. /r$ From larsga@garshol.priv.no Tue May 29 13:27:00 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 May 2001 14:27:00 +0200 Subject: [XML-SIG] XML Canonicalization In-Reply-To: <3B137E71.C76FC3C7@zolera.com> References: <3B03D8B4.9108432D@zolera.com> <m33d9oy68l.fsf@lambda.garshol.priv.no> <3B137E71.C76FC3C7@zolera.com> Message-ID: <m3ofsc1gtn.fsf@lambda.garshol.priv.no> * Rich Salz | | I met my needs at the time and I thought the community would appreciate | it. Sorry, Rich, I didn't mean to be ungrateful, and I do appreciate this. It is a useful piece of code, and we've seen already that there are people interested in this. | Hopefully someone will get useful ideas from my code and do a SAX | one. Indeed, that was the intention behind my posting, even if it may not have been very clear. Sorry about that. --Lars M. 
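[Editorial sketch, not part of the original thread.] The idea discussed in this exchange, normalizing a document from SAX events instead of walking a DOM tree, might look roughly like the following. This is only an illustration under stated assumptions: it uses the standard library's xml.sax rather than PyXML, the names SortedAttrWriter and normalize are hypothetical, and it is neither Rich Salz's code nor a conforming Canonical XML implementation (it ignores namespaces, comments, processing instructions and the attribute-value escaping rules of the specification). It only shows that sorting attributes and re-escaping text can be done one event at a time, without holding the whole document in memory.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class SortedAttrWriter(xml.sax.ContentHandler):
    """Hypothetical handler: re-serializes a document event by event,
    writing each element's attributes in sorted name order."""

    def __init__(self, out):
        super().__init__()
        self.out = out  # any object with a write() method

    def startElement(self, name, attrs):
        self.out.write("<" + name)
        # Sorting attribute names makes the output independent of
        # the order in which they appeared in the input.
        for key in sorted(attrs.getNames()):
            self.out.write(" %s=%s" % (key, quoteattr(attrs.getValue(key))))
        self.out.write(">")

    def endElement(self, name):
        # Empty elements are always written as a start/end tag pair.
        self.out.write("</%s>" % name)

    def characters(self, content):
        # May be called several times for one text node; escaping each
        # chunk separately gives the same result as escaping the whole.
        self.out.write(escape(content))


def normalize(xml_text):
    """Parse xml_text (a str) and return the re-serialized form."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), SortedAttrWriter(out))
    return out.getvalue()
```

Calling normalize() on two documents that differ only in attribute order should yield identical strings, which is the property a signature or comparison tool would rely on. A real implementation would need to follow the Canonical XML rules in full.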

From larsga@garshol.priv.no Tue May 29 13:36:25 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 May 2001 14:36:25 +0200
Subject: [XML-SIG] external entities and CDATA sections
In-Reply-To: <Pine.LNX.4.21.0105250930240.6432-100000@orion.logilab.fr>
References: <Pine.LNX.4.21.0105250930240.6432-100000@orion.logilab.fr>
Message-ID: <m3n17w1gdy.fsf@lambda.garshol.priv.no>

* Alexandre Fayolle
|
| While writing some documentation, I wanted to include some python
| code in a docbook document.

Some ideas:

 - reference it using unparsed entities (you must then pull in the
   code yourself)

 - reference the code using XInclude, with the type attribute set to
   'text' and write a simple SAX parser filter that does the
   inclusions for you (I have demo code that does this, email me if
   interested)

 - preprocess the source code and use entity references to the
   processed code

I hope this helps and isn't too late.

--Lars M.

From rsalz@zolera.com Tue May 29 15:22:10 2001
From: rsalz@zolera.com (Rich Salz)
Date: Tue, 29 May 2001 10:22:10 -0400
Subject: [XML-SIG] XML Canonicalization
References: <3B03D8B4.9108432D@zolera.com> <m33d9oy68l.fsf@lambda.garshol.priv.no> <3B137E71.C76FC3C7@zolera.com> <m3ofsc1gtn.fsf@lambda.garshol.priv.no>
Message-ID: <3B13B092.7DC96963@zolera.com>

> Indeed, that was the intention behind my posting, even if it may not
> have been very clear. Sorry about that.

It was totally clear, and I'm only wasting bandwidth on this list
because you apologized twice in the same message. :)

No problem at all.
        /r$

From cipher@redback.com Tue May 29 18:31:29 2001
From: cipher@redback.com (J B Bell)
Date: Tue, 29 May 2001 10:31:29 -0700
Subject: [XML-SIG] "simple" config file parser problems
Message-ID: <20010529103128.A5656@login002.redback.com>

I'm having the very devil of a time trying to do something that I
assume would be simple (if I knew what I was doing) with xml.sax under
Python 2.0 & 2.1.
I'd go into the structure I'm looking to get from the XML, but at this
point, the event-handling methods I have don't come into play before
something deep inside xml.expat explodes. Likely the object I'm using
lacks a needed trait (it appears to be something to do with name,
though that seems to be there), but I'm not sure what.

Without further ado, too much code, followed by a stack trace. Any
help at all is greatly appreciated. If this isn't the appropriate
list, please accept my copious apologies, and if you are kindly
disposed, a pointer to the right place to get assistance would be a
bonus.

--JB

# Note, I have tried with both saxlib.HandlerBase and the presumably
# more generic ContentHandler. Both give the exact same error.

from xml.sax import make_parser
from xml.sax import saxlib
from xml.sax.handler import feature_namespaces
from xml.sax import ContentHandler

class Config:
    """A base class for all types of configuration information, whether
    to be found in plain files, xml, or databases.
    Subclass as appropriate."""

    def parseConfig(self, args):
        """Override this in your subclassed Config"""
        pass

    def __init__(self, *args):
        newConfig = self.parseConfig(args)
        return newConfig

#class RsyncConfigHandler(ContentHandler):
class RsyncConfigHandler(saxlib.HandlerBase):
    """Read in & return a config file for rsync jobs"""

    # Errors should be signaled, so we'll output a message and raise
    # the exception to stop processing
    def fatalError(self, exception):
        sys.stderr.write('ERROR: '+ str(exception)+'\n')
        sys.exit(1)

    error = fatalError
    warning = fatalError

    def startDocument(self):
        self.jobList = []

    def startElement(self, name, attrs):
        methodName = "start" + str(name).capitalize()
        try:
            method = getattr(self, methodName)
        except:
            raise "Unknown element name '<%s>'" % name
        self.attrs = attrs
        if DEBUG:
            print "Invoking %s with attrs %s" % (methodName, str(attrs))
        apply(method, attrs)

    def endElement(self, name):
        methodName = "start" + str(name).capitalize()
        try:
            method = getattr(self, methodName)
        except:
            raise "Unknown element name '</%s>'" % name
        if DEBUG:
            print "Invoking %s with attrs %s" % (methodName, str(attrs))
        apply(method, attrs)

    def startConfig(self, attrs):
        """<config> just starts the whole shebang, no need to do anything."""
        pass

    def endConfig(self):
        pass

    def startQueue(self, attrs):
        pass

    def endQueue(self):
        pass

    def startJob(self, attrs):
        pass

    def endJob(self):
        pass

class RsyncConfig(Config):
    """Return an rsync configuration object"""

    def parseConfig(self, args):
        parser = make_parser()
        parser.setFeature(feature_namespaces, 0)
        dh = RsyncConfigHandler() # Might want arguments here one day
        parser.setContentHandler(dh)
        configFile = "/home/cipher/cvs/itdoc/servers/rsync_config.xml"
        parser.parse(configFile)

[And now the stack trace:]

Python 2.0 (#1, Nov 3 2000, 12:11:00)
[GCC egcs-2.91.66 19990314 (egcs-1.1.2 release)] on netbsd1
Type "copyright", "credits" or "license" for more information.
>>> from rsynct import RsyncConfig
>>> foo = RsyncConfig()
Invoking startConfig with attrs <xml.sax.xmlreader.AttributesImpl instance at 0x8309bcc>
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "rsynct.py", line 65, in __init__
    newConfig = self.parseConfig(args)
  File "rsynct.py", line 129, in parseConfig
    parser.parse(configFile)
  File "/usr/pkg/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/pkg/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 121, in parse
    self.feed(buffer)
  File "/usr/pkg/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/pkg/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 155, in start_element
    self._cont_handler.startElement(name, AttributesImpl(attrs))
  File "rsynct.py", line 90, in startElement
    apply(method, attrs)
  File "/usr/pkg/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 314, in __getitem__
    return self._attrs[name]
KeyError: 0

From uche.ogbuji@fourthought.com Tue May 29 20:20:55 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 29 May 2001 13:20:55 -0600
Subject: [XML-SIG] getAttribute??
In-Reply-To: Message from "W. Eliot Kimber" <eliot@isogen.com> of "Mon, 28 May 2001 18:24:29 CDT." <3B12DE2D.6B57DF91@isogen.com>
Message-ID: <200105291920.f4TJKtL05250@localhost.local>

> Dom Linu wrote:
> >
> > Wow -- very informative. Thank you. I was working on the assumption
> > that if namespaces weren't in use, that you use non-namespace
> > functions. That seems to have worked for everything else that I'm
> > doing, but to be honest I can't remember if I've always been using the
> > Sax2 reader-- I would have to dig. I mean, with the Sax2 reader
> > (implied by using FromXml) getElementsByTagName works, without using
> > getElementsByTagNameNS I'm pretty sure... is this inconsistent, or am
> > I missing something?
> > (the latter probably being true!)
>
> I considered the current behavior a bug (that non-namespace functions
> require a null namespace value) and fixed it in my local copy of the
> code. Unfortunately, I haven't had a chance to package up these fixes
> and submit them back to the SIG yet.

I consider this a bug in DOM, not the implementation. Certainly, the
current behavior of the Python DOMs is fully conformant. We've been
through this dance before. Basically, as the DOM itself sternly warns:
you don't mix NS and non-NS DOM usage unless you want trouble.

I'm not sure that I'd be willing to support any "fixes" that basically
hack around this DOM confusion.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From eliot@isogen.com Tue May 29 20:27:07 2001
From: eliot@isogen.com (W. Eliot Kimber)
Date: Tue, 29 May 2001 14:27:07 -0500
Subject: [XML-SIG] getAttribute??
References: <200105291920.f4TJKtL05250@localhost.local>
Message-ID: <3B13F80B.C224C543@isogen.com>

Uche Ogbuji wrote:

> I consider this a bug in DOM, not the implementation. Certainly, the current
> behavior of the Python DOMs is fully conformant. We've been through this
> dance before. Basically, as the DOM itself sternly warns: you don't mix NS
> and non-NS DOM usage unless you want trouble.
>
> I'm not sure that I'd be willing to support any "fixes" that basically hack
> around this DOM confusion.

I'll have to look at the code again, but it looked like a bug to me:
the API of the DOM-1 calls was not changed but they started failing
when using the DOM-2 code, and there was no reason for them to fail.
They failed because the DOM implementation code was not accounting for
the null namespace qualifier in the dictionary.
But it's possible I've misunderstood how the code is supposed to work
and inappropriately fixed it.

Cheers,

E.
--
. . . . . . . . . . . . . . . . . . . . . . . .

W. Eliot Kimber | Lead Brain

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.656.4139 | F 512.419.1860 | eliot@isogen.com

w w w . d a t a c h a n n e l . c o m

From Joern.Schrader@R-KOM.de Wed May 30 15:44:11 2001
From: Joern.Schrader@R-KOM.de (Jörn Schrader)
Date: Wed, 30 May 2001 16:44:11 +0200
Subject: [XML-SIG] PyExpat and german umlaute
Message-ID: <09DBFC8BDA15D411A41D0090275130BC205CB8@ffserver>

I try to use pyexpat, an ISO-8859-1 encoded xml-file. but if there are
any german umlaute, PyExpat raises an exception:

UnicodeError: ASCII encoding error: ordinal not in range(128).

PyExpat has got version 2.4.

What is wrong with it.

From noreply@sourceforge.net Wed May 30 17:19:45 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 30 May 2001 09:19:45 -0700
Subject: [XML-SIG] [ pyxml-Bugs-428712 ] Installer problems: missing features
Message-ID: <E1558hF-0000Xs-00@usw-sf-web1.sourceforge.net>

Bugs item #428712, was updated on 2001-05-30 09:19
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=428712&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mats Wichmann (mwichmann)
Assigned to: Nobody/Anonymous (nobody)
Summary: Installer problems: missing features

Initial Comment:
This is the result of operator error, but nonetheless...

I accidentally launched an install of PyXML on a w2k system where it
was already installed. I know the instructions say to remove old
installations first (and this was not even an old installation) as I
said, Operator Error.
However, at this point
(a) the existing installation is not detected with a bailout option
(b) there's no way to abort the installation once it starts
(c) you are prompted for EACH file as to whether to replace or not;
    there is no "yes to all" (or "no to all") so one would potentially
    have to click "yes" or "no" hundreds of times to complete.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=428712&group_id=6473

From martin@loewis.home.cs.tu-berlin.de Wed May 30 18:17:38 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 30 May 2001 19:17:38 +0200
Subject: [XML-SIG] PyExpat and german umlaute
In-Reply-To: <09DBFC8BDA15D411A41D0090275130BC205CB8@ffserver> (message from Jörn Schrader on Wed, 30 May 2001 16:44:11 +0200)
References: <09DBFC8BDA15D411A41D0090275130BC205CB8@ffserver>
Message-ID: <200105301717.f4UHHcM01025@mira.informatik.hu-berlin.de>

> I try to use pyexpat, an ISO-8859-1 encoded xml-file. but if there
> are any german umlaute, PyExpat raises an exception: UnicodeError:
> ASCII encoding error: ordinal not in range(128). PyExpat has got
> version 2.4.
>
> What is wrong with it.

It is an error in your code. You should not try to write Unicode
objects directly into byte-oriented files; instead, you should invoke
an appropriate .encode method first.
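
A minimal sketch of that advice, assuming a present-day Python 3 (where
str plays the role of the Unicode object and the stream is opened in
binary mode). The helper name write_text and the choice of ISO-8859-1
as the output encoding are assumptions for illustration, not taken from
the original code, which was not posted:

```python
import io


def write_text(stream, text, encoding="iso-8859-1"):
    """Encode text explicitly before writing it to a byte-oriented stream."""
    stream.write(text.encode(encoding))


buf = io.BytesIO()            # stands in for a file opened with mode "wb"
write_text(buf, "J\u00f6rn")  # the o-umlaut is outside the ASCII range

# Falling back to ASCII is what fails with "ordinal not in range(128)".
try:
    "J\u00f6rn".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The parser hands back Unicode text whatever the encoding of the input
file, so the output encoding has to be chosen explicitly at the point
of writing.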

Regards,
Martin

From noreply@sourceforge.net Thu May 31 19:44:44 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 31 May 2001 11:44:44 -0700
Subject: [XML-SIG] [ pyxml-Patches-429102 ] Node.appendChild: raise if ancestor
Message-ID: <E155XR6-0004oD-00@usw-sf-web1.sourceforge.net>

Patches item #429102, was updated on 2001-05-31 11:44
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=429102&group_id=6473

Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: Node.appendChild: raise if ancestor

Initial Comment:
This patch raises a HierarchyRequestErr on an attempt to appendChild
with self or an ancestor. This is required behavior, and besides, such
attempts were causing hangs during the _4dom_fireMutationEvent call.

Found when running my PyXML checkout through the Zope ParsedXML DOM
test suite. With this patch, PyXML completes the suite without
hanging.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=429102&group_id=6473
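
The check described in the patch can be sketched as follows. This is a
stand-alone illustration, not the 4DOM patch itself, which is not
reproduced here: the Node class is a hypothetical toy with only
parentNode and childNodes, and the only real library name used is
xml.dom.HierarchyRequestErr from the Python standard library. It
assumes a current Python 3.

```python
from xml.dom import HierarchyRequestErr


class Node:
    """Toy tree node, just enough to show the ancestor check."""

    def __init__(self, name):
        self.name = name
        self.parentNode = None
        self.childNodes = []

    def appendChild(self, newChild):
        # Walk from this node up to the root; if newChild turns up on
        # the way, appending it would create a cycle in the tree.
        ancestor = self
        while ancestor is not None:
            if ancestor is newChild:
                raise HierarchyRequestErr(
                    "cannot append a node to itself or to a descendant")
            ancestor = ancestor.parentNode
        # A node has a single parent, so detach it from the old one first.
        if newChild.parentNode is not None:
            newChild.parentNode.childNodes.remove(newChild)
        newChild.parentNode = self
        self.childNodes.append(newChild)
        return newChild
```

Without such a check, any code that walks from a node to the root (as a
mutation-event dispatcher would) can loop forever on the resulting
cycle, which would be consistent with the hangs mentioned above.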