Frobnostication

From "Juergen Hermann"

Hi!

"tests/oasis" has the following problems:

- "japanese/spec.dtd" is missing

- lmg-long-comment has the wrong type

Index: xmlconf.xml
===================================================================
RCS file: /cvsroot/pyxml/test/oasis/xmlconf.xml,v
retrieving revision 1.2
diff -u -r1.2 xmlconf.xml
--- xmlconf.xml 2001/08/31 22:33:20 1.2
+++ xmlconf.xml 2001/09/01 01:31:13
@@ -42,7 +42,7 @@
- Parsers must handle long comments.

- "myown/long-comment.xml" cannot be parsed by a validating parser,
  and has "--" in the comment, which is illegal

Index: myown/long-comment.xml
===================================================================
RCS file: /cvsroot/pyxml/test/oasis/myown/long-comment.xml,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 long-comment.xml
--- myown/long-comment.xml 2001/03/29 22:01:57 1.1.1.1
+++ myown/long-comment.xml 2001/09/01 01:31:24
@@ -1,3 +1,4 @@
+]>
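The point that "--" is illegal inside an XML comment can be checked with a small sketch. It assumes present-day Python 3 and the standard library's expat-based `xml.dom.minidom`, not the 2001 PyXML test harness; the helper name `is_well_formed` and the two sample documents are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if the expat-based minidom parser accepts the document."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


# A comment may contain single hyphens, but not the sequence "--".
good = "<doc><!-- a long comment --></doc>"
bad = "<doc><!-- a long -- comment --></doc>"

print(is_well_formed(good), is_well_formed(bad))
```

A non-validating parser is enough here, since the "--" rule is a well-formedness constraint rather than a validity constraint.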

xml-sig@python.org


From aleida@libero.it Thu Sep 6 10:02:01 2001
From: aleida@libero.it (Carla Attianese)
Date: Thu, 06 Sep 2001 09:02:01 GMT
Subject: [XML-SIG] from cnr - Italy
Message-ID: <20010906.9020100@vhfnt.cib.na.cnr.it>

>>>>>>>>>>>>>>>>>> Original message <<<<<<<<<<<<<<<<<<

On 05/09/01, 19.23.49, Alexandre Fayolle wrote on the subject
Re: [XML-SIG] from cnr - Italy:

> On 5 Sep 2001, Alvaro López Ortega wrote:
> > On Wed, 2001-09-05 at 17:13, Carla Attianese wrote:
> >
> > Do you have installed dist-utils package?
> > Look at this:
> > http://packages.debian.org/unstable/interpreters/python-distutils.html
> The package you mention is for python 1.5.2. The Distutils are supposed to
> be included in the python 2.0 standard library.
> Alexandre Fayolle
> --
> LOGILAB, Paris (France).
> http://www.logilab.com http://www.logilab.fr http://www.logilab.org
> Narval, the first software agent available as free software (GPL).

As I told you, I installed Python 2.0, which is supposed to include
distutils. I can try to install distutils by themselves, but do you think
is it possible there's another reason why the file 'setup.py' doesn't see
the distutils utility? Do you have some idea about?

Best Regards
Carla Attianese

From Alexandre.Fayolle@logilab.fr Thu Sep 6 11:12:59 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 6 Sep 2001 12:12:59 +0200 (CEST)
Subject: [XML-SIG] from cnr - Italy
In-Reply-To: <20010906.9020100@vhfnt.cib.na.cnr.it>

On Thu, 6 Sep 2001, Carla Attianese wrote:

> As I told you, I installed Python 2.0, which is supposed to include
> distutils. I can try to install
> distutils by themselves, but do you think is it possible there's another
> reason why the file 'setup.py' doesn't see the distutils utility? Do you
> have some idea about?

I fully agree with you, and have no idea as to why the process fails for you.
Maybe asking the question on the distutils SIG could provide you with some
suggestions.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From Alexandre.Fayolle@logilab.fr Thu Sep 6 11:13:44 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 6 Sep 2001 12:13:44 +0200 (CEST)
Subject: [XML-SIG] xmlproc bug ?

On Wed, 5 Sep 2001, Roman Suzi wrote:

> On Wed, 5 Sep 2001, Alexandre Fayolle wrote:
>
> >Is the following behaviour a well known feature or a bug (or me deeply
> >misunderstanding SAX)? It looks like xmlproc's Sax2 driver won't produce
> >UTF-8 encoded text when parsing a iso-8859-1 encoded file.
> >
> >The attached file demonstrates this. I tested it on python 1.5.2 and
> >2.1.1, using PyXML 0.6.6.
> >
> >I'll register this in the bugtracker if it turns out to be a bug.
>
> I thought make_parser needs a list, not string:
>
> p = make_parser(["xml.sax.drivers2.drv_xmlproc"])

Well, both appear to be working. The code I posted did not cause an
exception and even got me the right parser.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From Alexandre.Fayolle@logilab.fr Thu Sep 6 11:16:02 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 6 Sep 2001 12:16:02 +0200 (CEST)
Subject: [XML-SIG] xmlproc bug ?
In-Reply-To: <15ejAD-01oBxAC@fwd06.sul.t-online.com>

On Wed, 5 Sep 2001, Juergen Hermann wrote:

> On Wed, 5 Sep 2001 18:53:10 +0200 (CEST), Alexandre Fayolle wrote:
>
> >Is the following behaviour a well known feature or a bug (or me deeply
> >misunderstanding SAX)? It looks like xmlproc's Sax2 driver won't produce
> >UTF-8 encoded text when parsing a iso-8859-1 encoded file.
> >
> >The attached file demonstrates this.
> >I tested it on python 1.5.2 and
> >2.1.1, using PyXML 0.6.6.
> >
> >I'll register this in the bugtracker if it turns out to be a bug.
>
> It's not related to xmlproc at all, but to "print" which uses "str" which in turn
> uses the default encoding "USASCII". BTW, you also need to set the namespace
> feature.

The code you posted causes an exception because the string I get is not
ASCII, but iso-latin-1 :

  File "/usr/lib/python2.1/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py", line 336, in handle_start_tag
    AttributesNSImpl(attrs, rawnames))
  File "xmlproctest.py", line 15, in startElementNS
    write(qname)
  File "xmlproctest.py", line 9, in write
    sys.stdout.write((x or "").encode("ISO-8859-1") + "\n")
UnicodeError: ASCII decoding error: ordinal not in range(128)

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From larsga@garshol.priv.no Thu Sep 6 11:19:25 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 06 Sep 2001 12:19:25 +0200
Subject: [XML-SIG] xmlproc bug ?

* Roman Suzi
|
| I thought make_parser needs a list, not string:

* Alexandre Fayolle
|
| Well, both appear to be working. The code I posted did not cause an
| exception and even got me the right parser.

The code checks whether it gets a string, and if so creates a
single-element list.

--Lars M.

From "Juergen Hermann"

On Thu, 6 Sep 2001 12:16:02 +0200 (CEST), Alexandre Fayolle wrote:

>The code you posted causes an exception because the string I get is not
>ASCII, but iso-latin-1 :

Tested it again, and indeed it fails... with Python 2.0. With 2.1, runs
like a champ. Go figure. Possible cause is that I have PyXML 0.7
installed in 2.1, which emits unicode strings.
This works in both (for me):

    def write(x):
        s = x or ""
        if type(s) == type(u''):
            s = s.encode("ISO-8859-1")
        sys.stdout.write(s + "\n")

From Alexandre.Fayolle@logilab.fr Thu Sep 6 13:05:06 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 6 Sep 2001 14:05:06 +0200 (CEST)
Subject: [XML-SIG] xmlproc bug ?

On Thu, 6 Sep 2001, Juergen Hermann wrote:

> On Thu, 6 Sep 2001 12:16:02 +0200 (CEST), Alexandre Fayolle wrote:
>
> >The code you posted causes an exception because the string I get is not
> >ASCII, but iso-latin-1 :
>
> Tested it again, and indeed it fails... with Python 2.0. With 2.1, runs
> like a champ. Go figure. Possible cause is that I have PyXML 0.7
> installed in 2.1, which emits unicode strings.

So I guess the answer to my original question is 'fixed in CVS'.

Thanks for the information.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From martin@loewis.home.cs.tu-berlin.de Thu Sep 6 14:18:35 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 6 Sep 2001 15:18:35 +0200
Subject: [XML-SIG] xmlproc bug ?
In-Reply-To: (message from Roman Suzi on Wed, 5 Sep 2001 23:06:08 +0400 (MSD))
Message-ID: <200109061318.f86DIZP03592@mira.informatik.hu-berlin.de>

> I thought make_parser needs a list, not string:
>
> p = make_parser(["xml.sax.drivers2.drv_xmlproc"])

As Lars Marius explains, either a string or a list is fine. Whether
accepting strings was for convenience or for backwards compatibility,
I forgot.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Thu Sep 6 14:21:54 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 6 Sep 2001 15:21:54 +0200
Subject: [XML-SIG] xmlproc bug ?
In-Reply-To: (message from Alexandre Fayolle on Thu, 6 Sep 2001 14:05:06 +0200 (CEST))
Message-ID: <200109061321.f86DLsd03593@mira.informatik.hu-berlin.de>

> > Tested it again, and indeed it fails... with Python 2.0. With 2.1, runs
> > like a champ. Go figure. Possible cause is that I have PyXML 0.7
> > installed in 2.1, which emits unicode strings.
>
> So I guess the answer to my original question is 'fixed in CVS'.

I think this is not the answer. Your original question was about
1.5.2, and the CVS xmlproc will generate Unicode objects only for
Python 1.6 and later. In 1.5.2, it is supposed to use the integrated
code set converters in xml.parsers.xmlproc.charconv. I don't have
1.5.2 anymore, so I couldn't (yet) investigate why that fails.

Regards,
Martin

From alvaro@godsmaze.org Thu Sep 6 15:34:37 2001
From: alvaro@godsmaze.org (Alvaro López Ortega)
Date: 06 Sep 2001 16:34:37 +0200
Subject: [XML-SIG] from cnr - Italy
In-Reply-To: <20010906.9020100@vhfnt.cib.na.cnr.it>
References: <20010906.9020100@vhfnt.cib.na.cnr.it>
Message-ID: <999786888.19322.5.camel@servidor>

On Thu, 2001-09-06 at 11:02, Carla Attianese wrote:

Look for the distutils module (dir).. may be with something like this:

    from sys import path
    from os import system

    for p in path:
        system("find %s -name distutils" % (p))

> >>>>>>>>>>>>>>>>>> Original message <<<<<<<<<<<<<<<<<<
>
> On 05/09/01, 19.23.49, Alexandre Fayolle wrote on the subject
> Re: [XML-SIG] from cnr - Italy:
>
>
> > On 5 Sep 2001, Alvaro López Ortega wrote:
>
> > > On Wed, 2001-09-05 at 17:13, Carla Attianese wrote:
> > >
> > > Do you have installed dist-utils package?
> > > Look at this:
> > > http://packages.debian.org/unstable/interpreters/python-distutils.html
>
> > The package you mention is for python 1.5.2. The Distutils are supposed
> to
> > be included in the python 2.0 standard library.
>
> > Alexandre Fayolle
> > --
> > LOGILAB, Paris (France).
> > http://www.logilab.com http://www.logilab.fr http://www.logilab.org
> > Narval, the first software agent available as free software (GPL).
>
> As I told you, I installed Python 2.0, which is supposed to include
> distutils. I can try to install
> distutils by themselves, but do you think is it possible there's another
> reason why the file 'setup.py' doesn't see the distutils utility? Do you
> have some idea about?
> Best Regards
> Carla Attianese
>
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--
Greetings, alo.

From Alexandre.Fayolle@logilab.fr Thu Sep 6 14:59:09 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 6 Sep 2001 15:59:09 +0200 (CEST)
Subject: [XML-SIG] xmlproc bug ?
In-Reply-To: <200109061321.f86DLsd03593@mira.informatik.hu-berlin.de>

On Thu, 6 Sep 2001, Martin v. Loewis wrote:

> > So I guess the answer to my original question is 'fixed in CVS'.
>
> I think this is not the answer. Your original question was about
> 1.5.2, and the CVS xmlproc will generate Unicode objects only for
> Python 1.6 and later. In 1.5.2, it is supposed to use the integrated
> code set converters in xml.parsers.xmlproc.charconv. I don't have
> 1.5.2 anymore, so I couldn't (yet) investigate why that fails.

In my original mail I said I was able to reproduce the problem using both
python 1.5.2 and python 2.1.1, both having PyXML 0.6.6 installed. I guess
the reason why Jürgen could reproduce it with python 2.0 and not Python
2.1 is that his python 2.1 installation has PyXML 0.7 installed, whereas
the 2.0 install has 0.6.6 (or earlier).

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).
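The workaround posted earlier in this thread (encode a Unicode string explicitly before writing it to a byte stream) can be sketched as below. This is only an illustration under stated assumptions: it is written for present-day Python 3, where `str` is always Unicode, rather than the Python 1.5.2/2.x interpreters discussed above, and the name `write_line` and the `io.BytesIO` stand-in for the output stream are hypothetical.

```python
import io


def write_line(stream, text, encoding="iso-8859-1"):
    """Encode text explicitly, then write it plus a newline to a byte stream."""
    # None is treated as an empty string, as in the helper from the thread.
    data = (text or "").encode(encoding)
    stream.write(data + b"\n")
    return data


# BytesIO stands in for a real byte stream such as sys.stdout.buffer.
buf = io.BytesIO()
write_line(buf, "J\u00fcrgen")
write_line(buf, None)
```

Choosing the encoding at the point of output keeps the result independent of whatever default encoding the interpreter happens to use.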
From "Juergen Hermann"

On Thu, 6 Sep 2001 15:59:09 +0200 (CEST), Alexandre Fayolle wrote:

>In my original mail I said I was able to reproduce the problem using both
>python 1.5.2 and python 2.1.1, both having PyXML 0.6.6 installed. I guess
>the reason why Jürgen could reproduce it with python 2.0 and not Python
>2.1 is that his python 2.1 installation has PyXML 0.7 installed, whereas
>the 2.0 install has 0.6.6 (or earlier).

Note that your ORIGINAL source fails with 2.1/0.7, too. Is there a way to
register a default encoding with str()? I just tried setlocale(), and that
did not change anything.

With Python 2.1:

>>> x="öäüÖÄÜß"
>>> u=unicode(x, "iso-8859-1")
>>> print x
öäüÖÄÜß
>>> print u
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

From martin@loewis.home.cs.tu-berlin.de Thu Sep 6 15:46:34 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 6 Sep 2001 16:46:34 +0200
Subject: [XML-SIG] xmlproc bug ?
In-Reply-To: (message from Alexandre Fayolle on Wed, 5 Sep 2001 18:53:10 +0200 (CEST))
Message-ID: <200109061446.f86EkYL04058@mira.informatik.hu-berlin.de>

> Is the following behaviour a well known feature or a bug (or me deeply
> misunderstanding SAX)? It looks like xmlproc's Sax2 driver won't produce
> UTF-8 encoded text when parsing a iso-8859-1 encoded file.
>
> The attached file demonstrates this. I tested it on python 1.5.2 and
> 2.1.1, using PyXML 0.6.6.
>
> I'll register this in the bugtracker if it turns out to be a bug.

I think I got the full story now. xmlproc, in principle, was capable
of performing charset conversions itself. However,

1. To do so, one had to call set_data_charset on the parser. The SAX
   driver doesn't, you'll have to add

   parser.set_data_charset("utf-8")

   into drv_xmlproc.XmlprocDriver.parse.

2. Once you do so, xmlproc will try to perform conversions.
   In 0.6.6, this fails, because of the comment in xmlproc.charconv:

   # UTF-8 stuff disabled due to total lack of speed

   If you enable the lines below that comment, the parser will attempt
   charset conversion, but it will indeed slow down significantly.

3. The parser instantiates the conversion function only after it has
   seen the encoding= attribute. In your example, it has already
   converted the first chunk of data using the old converter (identity
   conversion), and it fails to convert rest of this chunk using the
   new converter. As a result, it will still pass those data as
   Latin-1 to the application. So in short, it doesn't work at all.

4. In the CVS, the Unicode API is used where available. For that, care
   was taken to convert the rest of the data once the encoding has
   been detected. That's why it 'works' with Py 2.1 and the CVS
   xmlproc.

5. In the process of integrating Unicode, it was considered pointless
   to allow accented characters in elements if Unicode is not
   available. Those accented characters would have been good only if
   the input is Latin-1, and only if conversion to UTF-8 is not
   performed. They also constitute only a small subset of the legal
   NCName characters. As a result, names are now restricted to ASCII
   letters in CVS PyXML. In turn, your document is rejected with the
   CVS PyXML on Py 1.5.2.

Hope this clarifies it. If you think anything should be done about it,
I can assist in drafting a patch - although I won't attempt to create
one myself.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Thu Sep 6 15:51:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 6 Sep 2001 16:51:15 +0200
Subject: [XML-SIG] xmlproc bug ?
In-Reply-To: (jh@web.de)
Message-ID: <200109061451.f86EpFX04131@mira.informatik.hu-berlin.de>

> Note that your ORIGINAL source fails with 2.1/0.7, too. Is there a way to
> register a default encoding with str()? I just tried setlocale(), and that
> did not change anything.

No.
You can set it in site.py, but that is discouraged, since it means
that the same code will run differently on different machines.

Explicit is better than implicit, you should always use .encode if you
want to output a Unicode object onto a byte stream (such as
sys.stdout). If you just want to print it for debugging purposes,
using repr() is best.

Regards,
Martin

From marklists@mceahern.com Thu Sep 6 17:13:25 2001
From: marklists@mceahern.com (Mark McEahern)
Date: Thu, 6 Sep 2001 09:13:25 -0700
Subject: [XML-SIG] enumerate the attributes?
In-Reply-To: <200109061451.f86EpFX04131@mira.informatik.hu-berlin.de>

Hi, is there a way to enumerate the attributes of a node?

    from xml.dom import minidom

    s = ""
    doc = minidom.parseString(s)

    foo = doc.documentElement

    # something like this:
    for a in foo.attributes:
        print "%s = %s" % (a.name, a.value)

    # ideally, attributes would be a dictionary too:
    a = "id"
    print "%s = %s" % (a, foo.attributes[a])

Thanks,

// mark

From marklists@mceahern.com Thu Sep 6 17:45:05 2001
From: marklists@mceahern.com (Mark McEahern)
Date: Thu, 6 Sep 2001 09:45:05 -0700
Subject: [XML-SIG] enumerate the attributes?

Well, there is:

    foo._attrs

// m

> -----Original Message-----
> From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On
> Behalf Of Mark McEahern
> Sent: Thursday, September 06, 2001 9:13 AM
> To: xml-sig@python.org
> Subject: [XML-SIG] enumerate the attributes?
>
>
> Hi, is there a way to enumerate the attributes of a node?
>
> from xml.dom import minidom
>
> s = ""
> doc = minidom.parseString(s)
>
> foo = doc.documentElement
>
> # something like this:
> for a in foo.attributes:
>     print "%s = %s" % (a.name, a.value)
>
> # ideally, attributes would be a dictionary too:
> a = "id"
> print "%s = %s" % (a, foo.attributes[a])
>
> Thanks,
>
> // mark

From martin@loewis.home.cs.tu-berlin.de Thu Sep 6 21:16:34 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 6 Sep 2001 22:16:34 +0200
Subject: [XML-SIG] enumerate the attributes?
Message-ID: <200109062016.f86KGY200877@mira.informatik.hu-berlin.de>

> Well, there is:
>
> foo._attrs

That is undocumented, non-standard, and should not be used.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Thu Sep 6 21:39:46 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 6 Sep 2001 22:39:46 +0200
Subject: [XML-SIG] enumerate the attributes?
Message-ID: <200109062039.f86Kdk400941@mira.informatik.hu-berlin.de>

> Hi, is there a way to enumerate the attributes of a node?

Certainly. The .attributes property of a Node is of type NamedNodeMap,
see

http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-84CF096

The NamedNodeMap, in turn, has a .length attribute, and a .item()
method, see

http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-1780488922

So with pure DOM only, you get

    # something like this:
    for aindex in range(foo.attributes.length):
        a = foo.attributes.item(aindex)
        print "%s = %s" % (a.name, a.value)

> # ideally, attributes would be a dictionary too:
> a = "id"
> print "%s = %s" % (a, foo.attributes[a])

As documented in

http://www.python.org/doc/current/lib/dom-attributelist-objects.html

there is an experimental API for NamedNodeMaps to treat them as
dictionaries. This works in many cases, like the one you mention.
It also allows you to write

    for a in foo.attributes.values():
        print "%s = %s" % (a.name, a.value)

> # something like this:
> for a in foo.attributes:
>     print "%s = %s" % (a.name, a.value)

Iterating over a NamedNodeMap itself is an extension that is only
available in 4DOM. The tricky part is that foo.attributes['id'] ought
to operate dictionary-like, whereas foo.attributes[0] ought to operate
sequence-like. Guido once commented to that API

    Aaaaaaargh!!!!!!! You mean somebody wrote a subclass of UserDict
    that attempts to behave like a sequence by checking the type of
    the __getitem__ argument?????????? Yuck!!!!!!!!!!!!!!! You XML
    weenies are sickos. :-)

I guess you won't see this extension in minidom :-)

Regards,
Martin

From marklists@mceahern.com Thu Sep 6 21:56:50 2001
From: marklists@mceahern.com (Mark McEahern)
Date: Thu, 6 Sep 2001 13:56:50 -0700
Subject: [XML-SIG] enumerate the attributes?
In-Reply-To: <200109062016.f86KGY200877@mira.informatik.hu-berlin.de>

Is there anything documented, standard, and should be used?

Thanks,

// mark

> -----Original Message-----
> From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On
> Behalf Of Martin v. Loewis
> Sent: Thursday, September 06, 2001 1:17 PM
> To: marklists@mceahern.com
> Cc: marklists@mceahern.com; xml-sig@python.org
> Subject: Re: [XML-SIG] enumerate the attributes?
>
>
> > Well, there is:
> >
> > foo._attrs
>
> That is undocumented, non-standard, and should not be used.
>
> Regards,
> Martin

From noreply@sourceforge.net Fri Sep 7 00:05:27 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 06 Sep 2001 16:05:27 -0700
Subject: [XML-SIG] [ pyxml-Bugs-459351 ] OASIS XML Conformance Test Failures

Bugs item #459351, was opened at 2001-09-06 16:05
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=459351&group_id=6473

Category: xmlproc
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Lars Marius Garshol (larsga)
Summary: OASIS XML Conformance Test Failures

Initial Comment:
Check out
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/pyxml/test/xmlconf/results/xmlproc-2001-09-07-win32-py21.html

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=459351&group_id=6473

From noreply@sourceforge.net Fri Sep 7 00:07:33 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 06 Sep 2001 16:07:33 -0700
Subject: [XML-SIG] [ pyxml-Bugs-459353 ] Error message with unreplaced '%s'

Bugs item #459353, was opened at 2001-09-06 16:07
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=459353&group_id=6473

Category: xmlproc
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Lars Marius Garshol (larsga)
Summary: Error message with unreplaced '%s'

Initial Comment:
xmlproc emits "encoding '%s' conflicts with autodetected encoding"
literally

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=459353&group_id=6473

From kajiyama@grad.sccs.chukyo-u.ac.jp Fri Sep 7 03:52:40 2001
From: kajiyama@grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA)
Date: Fri, 7 Sep 2001 11:52:40 +0900
Subject: [XML-SIG] SAX driver names
Message-ID: <200109070252.LAA13839@dhcp219.grad.sccs.chukyo-u.ac.jp>

Hi.

I have a question about SAX driver names.

I noticed that saxexts.make_parser() in PyXML 0.6.6 accepts both
"xmlproc" and "xml.sax.drivers.drv_xmlproc", and that the two
generated parsers seem to have some differences. For example,
the parser generated by the former name passes a unicode string
to saxlib.HandlerBase.characters(), while the parser generated
by the latter name passes an ordinal string to that method. I
wonder if there are other differences.

Is a driver name like "xml.sax.drivers.drv_xmlproc" a valid one?

Thanks,

--
KAJIYAMA, Tamito

From kajiyama@grad.sccs.chukyo-u.ac.jp Fri Sep 7 05:22:09 2001
From: kajiyama@grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA)
Date: Fri, 7 Sep 2001 13:22:09 +0900
Subject: [XML-SIG] SAX driver names
In-Reply-To: <200109070252.LAA13839@dhcp219.grad.sccs.chukyo-u.ac.jp> (message from Tamito KAJIYAMA on Fri, 7 Sep 2001 11:52:40 +0900)
References: <200109062118.f86LILD76097@freefall.freebsd.org>
Message-ID: <200109070422.NAA14312@dhcp219.grad.sccs.chukyo-u.ac.jp>

I wrote:
|
| I noticed that saxexts.make_parser() in PyXML 0.6.6 accepts both
| "xmlproc" and "xml.sax.drivers.drv_xmlproc", and that the two
| generated parsers seem to have some differences.

I had a mistake. The driver name "xmlproc" was an invalid one and
the pyexpat driver was used instead silently.

| For example,
| the parser generated by the former name passes a unicode string
| to saxlib.HandlerBase.characters(), while the parser generated
| by the latter name passes an ordinal string to that method. I
| wonder if there are other differences.

The difference mentioned above seems to be the difference between
pyexpat and xmlproc. Is that a known (intended) difference? Are
there other differences between the two major parsers?

Thanks,

--
KAJIYAMA, Tamito

From Alexandre.Fayolle@logilab.fr Fri Sep 7 13:08:39 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Fri, 7 Sep 2001 14:08:39 +0200 (CEST)
Subject: [XML-SIG] [ANN] xmldiff 0.5

Logilab has released XmlDiff 0.5.

XmlDiff is a python tool that figures out the differences between two
similar XML files, in the same way the diff utility does it for text
files. It was developed for the Narval project and should also be used
as a library. It can work either with XML files or DOM trees.

XmlDiff is released under the Gnu Public Licence.

What's new in 0.5?

 * a new algorithm is used, which is a couple of orders of magnitude
   faster, which makes xmldiff usable on big documents.
 * some Unicode issues have been fixed

For more information, please check http://www.logilab.org/xmldiff/

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From Alexandre.Fayolle@logilab.fr Fri Sep 7 15:39:55 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Fri, 7 Sep 2001 16:39:55 +0200 (CEST)
Subject: [XML-SIG] Fwd: [ANN] XSLTDoc 0.2 (fwd)

Hello,

Here's an announcement for an interesting tool that I saw on comp.text.xml. I
thought it might interest a few people here. It looks 100% pure XSLT, so
it should be fairly portable ;o)

Alexandre Fayolle

---------- Forwarded message ----------
From: "Fabrice DESRE - FT.BD/FTRD/DMI/GRI"
Newsgroups: comp.text.xml
Subject: [ANN] XSLTDoc 0.2
Date: Fri, 07 Sep 2001 15:48:24 +0200

Announcing XSLTDoc : an XSLT stylesheet documentation generator.

This tool is itself an XSLT stylesheet that analyzes another
stylesheet and builds a clean documentation on it. It also makes
some sanity checks.

You'll find it at http://grillade.griotte.com/xml/

New features in v0.2 are :
 - some bugs in the stylesheet corrected (thanks to Wolfgang Bogacz).
   It should now run on Saxon without errors.
 - some semantic checks were added : stylesheet version, correct
   mode usage when applying templates.
 - Duplicate variable bindings in a template are detected and reported.
 - Usage of anything else than xsl:with-param in xsl:call-template
   is reported.

If you think that such a tool could be useful in synergy with an
xslt editor (be it a simple text editor), please let me know. I'm
interested in such a project but don't have enough time to achieve
it alone.

Enjoy,

Fabrice

--
Fabrice Desré - France Telecom R&D/DMI/GRI
Tel: +(33) 2 96 05 31 43
Fax: +(33) 2 96 05 32 86

From aleida@libero.it Fri Sep 7 16:07:57 2001
From: aleida@libero.it (Carla Attianese)
Date: Fri, 07 Sep 2001 15:07:57 GMT
Subject: [XML-SIG] from CNr - Italy
Message-ID: <20010907.15075700@vhfnt.cib.na.cnr.it>

The installation has worked!!!
But please, could someone tell me where has the setup program installed
Py XML?
I've installed PyXML as a prerequisite for 4Suite, but 4Suite doesn't
work.
During the 4Suite installation I received a warning, and I think that the
problem can be the place where Python and PyXML are installed (I have no
more space in /usr/local and I performed all the installation in my home
directory).

Thank you
Carla Attianese

From Alexandre.Fayolle@logilab.fr Fri Sep 7 17:20:20 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Fri, 7 Sep 2001 18:20:20 +0200 (CEST)
Subject: [XML-SIG] from CNr - Italy
In-Reply-To: <20010907.15075700@vhfnt.cib.na.cnr.it>

On Fri, 7 Sep 2001, Carla Attianese wrote:

> The installation has worked!!!
> But please, could someone tell me where has the setup program installed
> Py XML?
> I've installed PyXML as a prerequisite for 4Suite, but 4Suite doesn't
> work.
>
> During the 4Suite installation I received a warning, and I think that the
> problem can be the place where Python and PyXML are installed (I have no
> more space in /usr/local and I performed all the installation in my home
> directory).

Did you add $HOME/lib/python to your PYTHONPATH environment variable? This
is the place where setup.py would have put it if you used the --home
command line option.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From rnd@onego.ru Fri Sep 7 18:14:50 2001
From: rnd@onego.ru (Roman Suzi)
Date: Fri, 7 Sep 2001 21:14:50 +0400 (MSD)
Subject: [XML-SIG] namespaces and sax questions

Hello!

I am trying to master XML and I can't understand what is
"qualified name" as understood by the sax.* modules of
standard Python 2.1.1:

Here are my program, XML example and result:

--- run.py ---
import xml.sax, xml.sax.handler
from xml.sax.xmlreader import InputSource

class ContentHandler(xml.sax.handler.ContentHandler):
    def startElementNS(self, name, qname, attrs):
        print "name=", name, "qname=", qname
        print "names:", attrs.getNames(),
        print "qnames:", attrs.getQNames()

#    def endElementNS(self, name, qname):
#        print name, qname

    def startPrefixMapping(self, prefix, URI):
        print "START", prefix, URI

    def endPrefixMapping(self, prefix):
        print "END", prefix

input_source = InputSource()
input_source.setByteStream(open("W3CExample.xml", "r"))
xml_reader = xml.sax.make_parser()
xml_reader.setContentHandler(ContentHandler())
# while docs tell it is ON by default, it is not:
xml_reader.setFeature(xml.sax.handler.feature_namespaces, 1)
xml_reader.parse(input_source)
---

--- W3CExample.xml ---
Frobnostication

Moved to here.

---

And the result:

---
START None http://www.w3.org/TR/REC-html40
name= (u'http://www.w3.org/TR/REC-html40', u'html') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'head') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'title') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'body') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'p') qname= None
names: [] qnames: []
name= (u'http://www.w3.org/TR/REC-html40', u'a') qname= None
names: [(None, u'href')] qnames: []
END None
---

I do not see any "html:title", "html:head", ... in qnames, while
http://www.w3.org/TR/REC-xml-names says what a qname is:

Qualified Name
  QName     ::= (Prefix ':')? LocalPart
  Prefix    ::= NCName
  LocalPart ::= NCName

Also, most of the features aren't supported by default xmlparser
(pyexpat), while Python docs do not tell so.

The same thing happens if I add "html:" to the tags explicitly.

What is the problem? How could these observations be explained?

Thanks!

Sincerely yours, Roman Suzi
--
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Friday, September 07, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Dreams are free, but you get soaked on the connect time." _/

From martin@loewis.home.cs.tu-berlin.de Fri Sep 7 21:22:52 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 7 Sep 2001 22:22:52 +0200
Subject: [XML-SIG] from CNr - Italy
In-Reply-To: <20010907.15075700@vhfnt.cib.na.cnr.it> (message from Carla Attianese on Fri, 07 Sep 2001 15:07:57 GMT)
References: <20010907.15075700@vhfnt.cib.na.cnr.it>
Message-ID: <200109072022.f87KMqQ00987@mira.informatik.hu-berlin.de>

> But please, could someone tell me where has the setup program
> installed Py XML? ... I have no more space in /usr/local and I performed
> all the installation in my home directory).

It depends on how you've invoked setup.py. What commands did you give
to setup.py, and in what order?
How did you build Python before that (again, give exact commands in exact
order, please)?

Regards,
Martin

From larsga@garshol.priv.no Mon Sep 10 09:53:46 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 10 Sep 2001 10:53:46 +0200
Subject: [XML-SIG] namespaces and sax questions
In-Reply-To:
References:
Message-ID:

* Roman Suzi
|
| I am trying to master XML and I can't understand what a "qualified
| name" is, as understood by the sax.* modules of standard Python 2.1.1:

When you use namespaces in XML, an element name like "xhtml:p"
appearing in the document is really a shorthand for the namespace URI
"http://www.w3.org/1999/xhtml" and the local name "p". In this case,
"xhtml:p" is a qualified name (in SAX terminology), because it has a
namespace prefix that qualifies the local name "p".

| And the result:
|
| ---
| START None http://www.w3.org/TR/REC-html40
| name= (u'http://www.w3.org/TR/REC-html40', u'html') qname= None

Apparently you are using pyexpat, which doesn't reveal the original
qualified name. If you try using xmlproc you should get the qname.

| Also, most of the features aren't supported by default xmlparser
| (pyexpat), while Python docs do not tell so.

This has been corrected to some extent in the current CVS tree, so the
next release should be better.

If there is anything specific that has bothered you it would be
interesting to hear what it was, and we might be able to fix either
the code or the documentation.

--Lars M.

From Alexandre.Fayolle@logilab.fr Tue Sep 11 10:04:31 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 11 Sep 2001 11:04:31 +0200 (CEST)
Subject: [XML-SIG] string interning
Message-ID:

Hello,

I was browsing in the source code, and came across this in
xml.sax.handler:

feature_string_interning = "http://xml.org/sax/features/string-interning"
# true: All element names, prefixes, attribute names, Namespace URIs, and
# local names are interned using the built-in intern function.
# false: Names are not necessarily interned, although they may be (default).
# access: (parsing) read-only; (not parsing) read/write

What does 'interning' mean in this context?

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From Alexandre.Fayolle@logilab.fr Tue Sep 11 10:58:09 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 11 Sep 2001 11:58:09 +0200 (CEST)
Subject: [XML-SIG] string interning
In-Reply-To:
Message-ID:

On Tue, 11 Sep 2001, Alexandre Fayolle wrote:

> Hello,
>
> I was browsing in the source code, and came across this in
> xml.sax.handler:
>
> feature_string_interning = "http://xml.org/sax/features/string-interning"
> # true: All element names, prefixes, attribute names, Namespace URIs, and
> # local names are interned using the built-in intern function.
> # false: Names are not necessarily interned, although they may be
> (default).
> # access: (parsing) read-only; (not parsing) read/write
>
> What does 'interning' mean in this context?

Keeping in mind that SAX was designed with Java in mind, I had a look at
the Java API documentation for class String, which defines an intern()
method:

http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html#intern()

What intern() does is look for a string object being equal() to the
current String in a pool of unique Strings. One can then discard the
original String (to be garbage collected later) and use the returned one.

This can save memory (because fewer objects are produced by the
parser) and time (because one can use the == operator to test if
element names are equal (equivalent of Python's 'is' test in this
context)).

This, I think, clarifies the issue. Please correct me if I'm wrong.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From larsga@garshol.priv.no Tue Sep 11 12:33:22 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 11 Sep 2001 13:33:22 +0200
Subject: [XML-SIG] string interning
In-Reply-To:
References:
Message-ID:

* Alexandre Fayolle
|
| What intern() does is look for a string object being equal() to the
| current String

A string object being == to the current String, yes. This is the same
as '==' versus 'is' in Python.

| This can save memory (because fewer objects are produced by the
| parser) and time (because one can use the == operator to test if
| element names are equal (equivalent of Python's 'is' test in this
| context)).

Exactly.

I think the actual benefit in Python is small, but we kept this
property anyway. I found that xmlproc was a little bit faster when I
called intern() on all names, so I used it.

--Lars M.

From Alexandre.Fayolle@logilab.fr Tue Sep 11 13:32:17 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 11 Sep 2001 14:32:17 +0200 (CEST)
Subject: [XML-SIG] string interning
In-Reply-To:
Message-ID:

On 11 Sep 2001, Lars Marius Garshol wrote:

> I think the actual benefit in Python is small, but we kept this
> property anyway. I found that xmlproc was a little bit faster when I
> called intern() on all names, so I used it.

Oh. Silly me. I had completely missed the built-in intern() function in
Python (http://www.python.org/doc/current/lib/built-in-funcs.html#l2h-180).

I'll keep this in mind for future developments.

Thanks.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).
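
For readers following the interning thread: here is a minimal sketch of the
'==' versus 'is' point made above. It assumes Python 3, where the built-in
intern() mentioned in the thread is available as sys.intern(). The helper
make_name() and the sample name are made up for illustration; they are not
part of any parser API.

```python
import sys


def make_name(*parts):
    """Hypothetical helper: build a name at run time, as a parser would."""
    return "".join(parts)


# Two strings with the same characters, built independently.
a = make_name("slide", "show")
b = make_name("slide", "show")

# '==' compares the characters one by one.
equal_by_value = (a == b)

# sys.intern() returns the single shared copy kept in the intern table.
ia = sys.intern(a)
ib = sys.intern(b)

# After interning, an identity test is enough to compare the names.
same_object = (ia is ib)

print(equal_by_value, same_object)
```

Once names are interned, a handler could compare them with 'is' instead of
'==', which is the time saving described above. As Lars notes, the gain in
Python tends to be small.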
From rnd@onego.ru Tue Sep 11 19:27:48 2001
From: rnd@onego.ru (Roman Suzi)
Date: Tue, 11 Sep 2001 22:27:48 +0400 (MSD)
Subject: [XML-SIG] namespaces and sax questions
In-Reply-To:
Message-ID:

On 10 Sep 2001, Lars Marius Garshol wrote:

>
>* Roman Suzi
>|
>| I am trying to master XML and I can't understand what a "qualified
>| name" is, as understood by the sax.* modules of standard Python 2.1.1:
>
>When you use namespaces in XML, an element name like "xhtml:p"
>appearing in the document is really a shorthand for the namespace URI
>"http://www.w3.org/1999/xhtml" and the local name "p". In this case,
>"xhtml:p" is a qualified name (in SAX terminology), because it has a
>namespace prefix that qualifies the local name "p".

Yes, this I understand more or less.

>| And the result:
>|
>| ---
>| START None http://www.w3.org/TR/REC-html40
>| name= (u'http://www.w3.org/TR/REC-html40', u'html') qname= None
>
>Apparently you are using pyexpat, which doesn't reveal the original
>qualified name. If you try using xmlproc you should get the qname.

Does it mean that my code is right and the problem is indeed in
expat? If so, are pyexpat + namespaces usable?

>| Also, most of the features aren't supported by default xmlparser
>| (pyexpat), while Python docs do not tell so.
>
>This has been corrected to some extent in the current CVS tree, so the
>next release should be better.
>
>If there is anything specific that has bothered you it would be
>interesting to hear what it was, and we might be able to fix either
>the code or the documentation.

Thanks for answering!

Sincerely yours, Roman Suzi
--
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Tuesday, September 11, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "I was the next door kid's imaginary friend." _/

From larsga@garshol.priv.no Tue Sep 11 20:35:19 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 11 Sep 2001 21:35:19 +0200
Subject: [XML-SIG] namespaces and sax questions
In-Reply-To:
References:
Message-ID:

* Lars Marius Garshol
|
| Apparently you are using pyexpat, which doesn't reveal the original
| qualified name. If you try using xmlproc you should get the qname.

* Roman Suzi
|
| Does it mean that my code is right and the problem is indeed in
| expat? If so, are pyexpat + namespaces usable?

Your code is right, but I'm not sure there is a problem anywhere. If
you want to use namespaces you don't need the qualified name, since
the URI + localname is what you should be using. If you're not
interested in the URI + localname, but only want the 'raw' name from
the XML document, you shouldn't be doing namespace processing at all.

It would be better if expat reported the qualified name, but it does
not, and there is nothing the SAX driver can do about this. I don't
think this is much of a problem. If you really want this you can use
xmlproc.

--Lars M.

From fdrake@acm.org Tue Sep 11 20:36:00 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 11 Sep 2001 15:36:00 -0400
Subject: [XML-SIG] namespaces and sax questions
In-Reply-To:
References:
Message-ID: <15262.26528.582906.915989@grendel.digicool.com>

Lars Marius Garshol writes:
> It would be better if expat reported the qualified name, but it does
> not, and there is nothing the SAX driver can do about this. I don't
> think this is much of a problem. If you really want this you can use
> xmlproc.

There is currently a patch filed against Expat (and assigned to me)
that should allow it to report everything necessary for PyExpat to make
qualified names available. I'll try and make sure the patch makes it
into Expat 1.95.3 and that additional controls are available to allow
this information to be reported by PyExpat.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From larsga@garshol.priv.no Tue Sep 11 20:51:02 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 11 Sep 2001 21:51:02 +0200
Subject: [XML-SIG] namespaces and sax questions
In-Reply-To: <15262.26528.582906.915989@grendel.digicool.com>
References: <15262.26528.582906.915989@grendel.digicool.com>
Message-ID:

* Fred L. Drake, Jr.
|
| There is currently a patch filed against Expat (and is assigned to
| me) that should allow it to report everything necessary for PyExpat
| to make qualified names available. I'll try and make sure the patch
| makes it into Expat 1.95.3 and that additional controls are
| available to allow this information to be reported by PyExpat.

Very good. Once that is done I can update the expat driver to provide
this information.

--Lars M.

From zhusm@neusoft.com Wed Sep 12 08:25:43 2001
From: zhusm@neusoft.com (=?gb2312?B?16PLs8Px?=)
Date: Wed, 12 Sep 2001 15:25:43 +0800
Subject: [XML-SIG] any one help me please
Message-ID: <000801c13b5c$23b63be0$4201010a@zhushunmin>

Help me please: why can't I parse the XML on Linux?
My OS is Slackware Linux 7.0,
but the program below runs fine on Windows 2000.

>>> import xml.dom.minidom
>>>
>>> document = """\
... <slideshow>
... <title>Demo slideshow</title>
... <slide><title>Slide title</title>
... <point>This is a demo</point>
... <point>Of a program for processing slides</point>
... </slide>
...
... <slide><title>Another demo slide</title>
... <point>It is important</point>
... <point>To have more than</point>
... <point>one slide</point>
... </slide>
... </slideshow>
... """
>>>
>>> dom = xml.dom.minidom.parseString(document)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.0/xml/dom/minidom.py", line 475, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "/usr/lib/python2.0/xml/dom/minidom.py", line 464, in _doparse
    events = apply(func, args, kwargs)
  File "/usr/lib/python2.0/xml/dom/pulldom.py", line 237, in parseString
    parser = xml.sax.make_parser()
  File "/usr/lib/python2.0/xml/sax/__init__.py", line 76, in make_parser
    return _create_parser(parser_name)
  File "/usr/lib/python2.0/xml/sax/__init__.py", line 101, in _create_parser
    return drv_module.create_parser()
AttributeError: create_parser
>>>
>>> space = " "
>>> def getText(nodelist):
...     rc = ""
...     for node in nodelist:
...         if node.nodeType == node.TEXT_NODE:
...             rc = rc + node.data
...     return rc
...
>>> def handleSlideshow(slideshow):
...     print "<html>"
...     handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
...     slides = slideshow.getElementsByTagName("slide")
...     handleToc(slides)
...     handleSlides(slides)
...     print "</html>"
...
>>> def handleSlides(slides):
...     for slide in slides:
...         handleSlide(slide)
...
>>> def handleSlide(slide):
...     handleSlideTitle(slide.getElementsByTagName("title")[0])
...     handlePoints(slide.getElementsByTagName("point"))
...
>>> def handleSlideshowTitle(title):
...     print "<title>%s</title>" % getText(title.childNodes)
...
>>> def handleSlideTitle(title):
...     print "<h2>%s</h2>" % getText(title.childNodes)
...
>>> def handlePoints(points):
...     print "<ul>"
...     for point in points:
...         handlePoint(point)
...     print "</ul>"
...
>>> def handlePoint(point):
...     print "<li>%s</li>" % getText(point.childNodes)
...
>>> def handleToc(slides):
...     for slide in slides:
...         title = slide.getElementsByTagName("title")[0]
...         print "<p>%s</p>" % getText(title.childNodes)
...
>>> handleSlideshow(dom)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: There is no variable named 'dom'

From adw27@cam.ac.uk Wed Sep 12 10:02:02 2001
From: adw27@cam.ac.uk (adw27@cam.ac.uk)
Date: Wed, 12 Sep 2001 10:02:02 +0100
Subject: [XML-SIG] any one help me please
In-Reply-To: <000801c13b5c$23b63be0$4201010a@zhushunmin>
Message-ID: <3856510278.1000288922@ital16.ucles-red.cam.ac.uk>

--On 12 September 2001 15:25 +0800 祝顺民 wrote:

> Help me please: why can't I parse the XML on Linux?
> My OS is Slackware Linux 7.0,
> but the program below runs fine on Windows 2000.

[program 1]

>>>> dom = xml.dom.minidom.parseString(document)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.0/xml/dom/minidom.py", line 475, in parseString
>     return _doparse(pulldom.parseString, args, kwargs)
>   File "/usr/lib/python2.0/xml/dom/minidom.py", line 464, in _doparse
>     events = apply(func, args, kwargs)
>   File "/usr/lib/python2.0/xml/dom/pulldom.py", line 237, in parseString
>     parser = xml.sax.make_parser()
>   File "/usr/lib/python2.0/xml/sax/__init__.py", line 76, in make_parser
>     return _create_parser(parser_name)
>   File "/usr/lib/python2.0/xml/sax/__init__.py", line 101, in
> _create_parser return drv_module.create_parser()
> AttributeError: create_parser
>>>>

[program 2]

...
>>>> handleSlideshow(dom)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> NameError: There is no variable named 'dom'

It appears that the XML libraries have been omitted (or are not built by
default) in the Python 2.1.1 source tarball. (They appear to be bundled
with the Windows installer.) I also had this problem.

Installing PyXML from http://pyxml.sf.net fixed this for me; I have no idea
whether the XML libraries have been omitted by accident or by design in the
2.1.1 Unix release.
Andrew

From rnd@onego.ru Wed Sep 12 20:21:37 2001
From: rnd@onego.ru (Roman Suzi)
Date: Wed, 12 Sep 2001 23:21:37 +0400 (MSD)
Subject: [XML-SIG] namespaces and sax questions
In-Reply-To:
Message-ID:

On 11 Sep 2001, Lars Marius Garshol wrote:

>
>* Fred L. Drake, Jr.
>|
>| There is currently a patch filed against Expat (and is assigned to
>| me) that should allow it to report everything necessary for PyExpat
>| to make qualified names available. I'll try and make sure the patch
>| makes it into Expat 1.95.3 and that additional controls are
>| available to allow this information to be reported by PyExpat.
>
>Very good. Once that is done I can update the expat driver to provide
>this information.

Thank you!

The reason I asked why there are no qualified names is that
the docs do not say so. It doesn't cause any problems, but it can be
confusing.

Sincerely yours, Roman Suzi
--
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Wednesday, September 12, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "If there's one thing I can't stand, it's intolerance." _/

From zhusm@neusoft.com Fri Sep 14 04:41:32 2001
From: zhusm@neusoft.com (=?gb2312?B?16PLs8Px?=)
Date: Fri, 14 Sep 2001 11:41:32 +0800
Subject: [XML-SIG] what's wrong with me ?
Message-ID: <000801c13ccf$25dad600$4201010a@zhushunmin>

God help me, please. See below.

#include <stdio.h>
#include <string.h>
#include "Python.h"


int main(int argc,char **argv)
{

        PyObject * item;
        PyObject * POB0 = NULL, *POB1 = NULL;

        Py_SetProgramName(argv[0]); // pass the program's name to Python
        Py_Initialize();            // initialize Python

        PyRun_SimpleString("print hello");
        PyRun_SimpleString("from xml.dom.minidom import parse");

        printf("are you ok\n");
        Py_Finalize();
        return 1;
}

It compiles with gcc.

root@zhusm:/home/zhusm/pythontest# ./test
Traceback (most recent call last):
  File "<string>", line 1, in ?
NameError: There is no variable named 'hello'
Traceback (most recent call last):
  File "<string>", line 1, in ?
  File "/usr/lib/python2.0/xml/dom/minidom.py", line 19, in ?
    from StringIO import StringIO
  File "/usr/lib/python2.0/StringIO.py", line 32, in ?
    import errno
ImportError: /usr/lib/python2.0/errnomodule.so: undefined symbol: PyInt_FromLong
are you ok

From uche.ogbuji@fourthought.com Fri Sep 14 06:08:59 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 13 Sep 2001 23:08:59 -0600
Subject: [XML-SIG] ANN: 4Suite and 4Suite Server 0.11.1
Message-ID: <200109140508.f8E58wm30389@localhost.local>

Fourthought, Inc. (http://Fourthought.com) announces the release of

4Suite 0.11.1 and 4Suite Server 0.11.1
----------------------------
Open source XML processing tools and an XML data server
http://4Suite.org
http://Fourthought.com/4SuiteServer

4Suite Server News
------------------

* RDF: Many improvements to the serializer
* RDF: Add utilities to simplify API
* RDF: Implement rdfs:isDefinedBy
* ODS: Full support for type definitions
* ODS: Full support for constant expressions
* ODS: Full support for the date/time types
* ODS: Added support for char and octet types
* pDomlette: Reader modularization
* pDomlette: SAX reader fixes
* pDomlette: support cached docIndex
* pDomlette: update XInclude support
* cDomlette: store baseUri/refUri on document node
* XSLT: EXSLT now built in
* XSLT: add xinclude suppression command line option
* XSLT: OutputHandler fixes
* XSLT: Support stylesheet include/import "path"
* XSLT: implement ft:assign, ft:generate-uuid, ft:replace
* XSLT: fixes to import precedence
* XSLT: Oliver Graf's output and other fixes
* XSLT: Top-level params can now be lists of strings which become node sets
* XPath: Sort out exceptions
* XUpdate: Implement command line
* All: Normalization of namespaces
* Many misc optimizations and bug-fixes

4Suite News
-----------

* Implement RDF add and remove XSLT extensions
* Implement extension to get repo user
* Add and update demos
* Fix backup/restore
* DocDefs now accept ns mappings from the serialized XML's own namespaces
* Protocols: HTTP handler fixes
* Protocols: FTP handler fixes
* Add 4ss rdf deserialize command
* Set up share file for user modules
* Normalize namespaces
* Many general bug fixes

4Suite is a
collection of Python tools for XML processing and object database
management. It provides support for XML parsing, several transient and
persistent DOM implementations, XPath expressions, XPointer, XSLT
transforms, XLink, RDF, XInclude, XUpdate and ODMG object databases.

4Suite Server is a platform for XML processing. It features an XML data
repository, metadata management, a rules-based engine, XSLT transforms,
XPath and RDF-based indexing and query, XLink resolution and many other
XML services. It also provides transactions and access control features.
Along with basic console and command-line management, it supports remote,
cross-platform and cross-language access through CORBA, WebDAV, HTTP, FTP
and other request protocols.

All the software is open-source and free to download. Priority support
and customization are available from Fourthought, Inc. For more information
on this, see http://FourThought.com, or contact Fourthought at
info@fourthought.com or +1 303 583 9900.

More info and Obtaining 4Suite and 4Suite Server
------------------------------------------------

Please see

http://4Suite.org
http://Fourthought.com/4SuiteServer

from where you can download source, Windows and Linux binaries.

4Suite is distributed under a license similar to that of the Apache Web
Server.

From Benjamin.Schollnick@usa.xerox.com Fri Sep 14 13:45:16 2001
From: Benjamin.Schollnick@usa.xerox.com (Schollnick, Benjamin)
Date: Fri, 14 Sep 2001 08:45:16 -0400
Subject: [XML-SIG] XML Error? (Workaround found)
Message-ID:

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

--Boundary_(ID_PeaDEhdqTKZWDvW4Z+rOEA)
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT

Folks,

I'm having some problems here with some XML code...

I'm using Python v2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 Bit (Intel)] on a
Win NT 4 platform...

This XML causes an xml.sax._exceptions.SAXParseException: (:26:38: not
well-formed)

Traceback (most recent call last):
  File "C:\develope\docushare_db\test.py", line 85, in ?
    decoded_data = ds_workspace.decode_ds_propfind (xml_data)
  File "..\docushare\ds_workspace.py", line 402, in decode_ds_propfind
    Docushare_object.transfer_xml_to_ds_workspace ( data )
  File "..\docushare\ds_workspace.py", line 147, in transfer_xml_to_ds_workspace
    xml_obj = XML_Objectify ( tempfile_name )
  File "E:\ds_reminder\xml_objectify.py", line 218, in __init__
    self._dom = minidom.parseString(self._fh.read())
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 475, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 465, in _doparse
    toktype, rootNode = events.getEvent()
  File "c:\progra~1\python20\lib\xml\dom\pulldom.py", line 187, in getEvent
    self.parser.feed(buf)
  File "c:\program files\python20\lib\xml\sax\expatreader.py", line 85, in feed
    self._err_handler.fatalError(exc)
  File "c:\program files\python20\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :26:38: not well-formed

Any idea why? I've included the dump of the XML in an attached file (error.1,
standard ASCII) because it appears to have some "high-ASCII" characters that
are not being reproduced in the cut 'n' paste that I've included here...

Actually, I just found it... The high-ASCII characters did indeed break the
XML parsing: $92, $93 and $94 specifically.

xml_data = string.replace (xml_data, chr(146), "")
xml_data = string.replace (xml_data, chr(147), "")
xml_data = string.replace (xml_data, chr(148), "")

Are there any workarounds beyond this, or plans to fix this implementation?
(The characters are contained in the CDATA portion.)

- Benjamin

http://xww.psg-techservices.world.xerox.com/docushare/File-1194 Past AI's 1999_10_26 AIs

Action items from 10/26 staff meeting File Melissa Gydesenmgydesen 1999-10-26T21:13:21Z Fri, 05 Jan 2001 19:15:36 GMT Margo Forsythemforsythe Past AI's actionitems102699.doc application/msword mgydesen 1 29184 HTTP/1.1 200 OK <>

--Boundary_(ID_PeaDEhdqTKZWDvW4Z+rOEA)
Content-type: application/octet-stream; name=error.1
Content-disposition: attachment; filename=error.1
Content-transfer-encoding: quoted-printable

logged in 330 0 - File-1194 http://xww.psg-techservices.world.xerox.com/docushare/File-1194 Past AI's 1999_10_26 AIs

Action items from 10/26 staff meeting File Melissa = Gydesenmgydesen= 1999-10-26T21:13:21Z Fri, 05 Jan 2001 19:15:36 GMT Margo = Forsythemforsythe Past = AI's actionitems102699.doc application/msword mgydesen 1 29184 HTTP/1.1 200 OK *Error* Elapsed Time in Seconds : -8.64300000668 Elapsed Time in Minutes : -0.144050000111 --Boundary_(ID_PeaDEhdqTKZWDvW4Z+rOEA)-- From Benjamin.Schollnick@usa.xerox.com Fri Sep 14 13:56:34 2001 From: Benjamin.Schollnick@usa.xerox.com (Schollnick, Benjamin) Date: Fri, 14 Sep 2001 08:56:34 -0400 Subject: [XML-SIG] XML Error? (Workaround found) Message-ID: Here's a follow up... I suspect, but have not conclusively proven, that with my configuration any character higher than Chr(127), may cause the xml parser to fail, with an exception. Right now, I'm forced to build a santization string from chr(128) - chr(255), and am attempting to use RE.sub to filter those characters out. Is there an simplier way? Why isn't the XML parser dealing with the high-ascii codes, especially since they are in the CDATA string? - Benjamin -----Original Message----- From: Schollnick, Benjamin [mailto:Benjamin.Schollnick@usa.xerox.com] Sent: Friday, September 14, 2001 8:45 AM To: 'xml-sig@python.org' Subject: [XML-SIG] XML Error? (Workaround found) Folks, I'm having some problems here with some XML code... I'm using Python v2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 Bit (Intel)] on a Win NT 4 platform... This XML, causes a xml.sax._exceptions.SAXParseException: (:26:38: not well-formed) Traceback (most recent call last): File "C:\develope\docushare_db\test.py", line 85, in ? 
    decoded_data = ds_workspace.decode_ds_propfind (xml_data)
  File "..\docushare\ds_workspace.py", line 402, in decode_ds_propfind
    Docushare_object.transfer_xml_to_ds_workspace ( data )
  File "..\docushare\ds_workspace.py", line 147, in transfer_xml_to_ds_workspace
    xml_obj = XML_Objectify ( tempfile_name )
  File "E:\ds_reminder\xml_objectify.py", line 218, in __init__
    self._dom = minidom.parseString(self._fh.read())
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 475, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 465, in _doparse
    toktype, rootNode = events.getEvent()
  File "c:\progra~1\python20\lib\xml\dom\pulldom.py", line 187, in getEvent
    self.parser.feed(buf)
  File "c:\program files\python20\lib\xml\sax\expatreader.py", line 85, in feed
    self._err_handler.fatalError(exc)
  File "c:\program files\python20\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :26:38: not well-formed

Any idea why? I've included the dump of the XML in an attached file (error.1, standard ASCII) because it appears to have some "high-ASCII" characters that are not being reproduced in the CUT 'N paste that I've included here...

Actually, I just found it... The high-ASCII characters did indeed break the XML parsing: $92, $93 and $94 specifically.

xml_data = string.replace (xml_data, chr(146), "")
xml_data = string.replace (xml_data, chr(147), "")
xml_data = string.replace (xml_data, chr(148), "")

Any workarounds beyond this, or plans to fix this implementation? (They are contained in the CDATA portion.)

- Benjamin

http://xww.psg-techservices.world.xerox.com/docushare/File-1194 Past AI's 1999_10_26 AIs
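[Editorial sketch] The workaround above throws away the three offending bytes. A less lossy alternative might look like the following, assuming the payload really is windows-1252 (where bytes 146-148 are curly quotes) and carries no encoding declaration of its own. It is written for present-day Python 3 rather than the Python 2.0 used in the message, and the name `parse_cp1252` is made up for illustration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_cp1252(raw: bytes) -> xml.dom.minidom.Document:
    """Parse XML bytes assumed to be windows-1252 with no encoding declaration."""
    # Decode with the encoding the data is assumed to use...
    text = raw.decode("cp1252")
    # ...then hand the parser UTF-8, which is what it assumes by default.
    return xml.dom.minidom.parseString(text.encode("utf-8"))


# Hypothetical sample: bytes 0x93/0x94/0x92 are curly quotes in windows-1252.
raw = b"<doc>\x93quoted\x94 it\x92s</doc>"

try:
    xml.dom.minidom.parseString(raw)
    parsed_raw = True
except ExpatError:
    # On their own, these bytes are not valid UTF-8, so the parse fails.
    parsed_raw = False

doc = parse_cp1252(raw)
text = doc.documentElement.firstChild.data
```

Here the quote characters survive as Unicode text instead of being deleted. If the data turns out to be in some other encoding, the `decode` call is the only line that needs to change.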

Action items from 10/26 staff meeting File Melissa Gydesenmgydesen 1999-10-26T21:13:21Z Fri, 05 Jan 2001 19:15:36 GMT Margo Forsythemforsythe Past AI's actionitems102699.doc application/msword mgydesen 1 29184 HTTP/1.1 200 OK <> From larsga@garshol.priv.no Fri Sep 14 15:16:14 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 14 Sep 2001 16:16:14 +0200 Subject: [XML-SIG] XML Error? (Workaround found) In-Reply-To: References: Message-ID: * Benjamin Schollnick | | I'm having some problems here with some XML code... You are running into the most common XML problem there is. Your document is encoded in ISO 8859-1, but you fail to declare that it does so, causing the parser to assume that you are using UTF-8 and then choking when it discovers illegal UTF-8 byte sequences. Change your XML declaration to | (:26:38: not well-formed) If it is at all possible we should produce a better error message than this one. | xml_data = string.replace (xml_data, chr(146), "") | xml_data = string.replace (xml_data, chr(147), "") | xml_data = string.replace (xml_data, chr(148), "") If you are using characters in the range 128-159 you are almost certainly using windows-1252, and not ISO 8859-1, and should change your encoding declaration accordingly. --Lars M. From rsalz@zolera.com Fri Sep 14 15:35:34 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 14 Sep 2001 10:35:34 -0400 Subject: [XML-SIG] ZSI 1.1 released References: Message-ID: <3BA215B6.EA56D0C7@zolera.com> I'm pleased to announce the 1.1 release of ZSI, the Zolera SOAP infrastructure. This is a pure-Python SOAP implementation. It's open source with an "acknowledgement required" license. Major changes for this release include the "fit and finish" for script-like dynamic parsing, simple CGI and server wrappers, support for PyXML 0.6, the Apache "Map" datatype, HTTP and HTTP-like basic authentication, a handful of bugfixes, and more documentation (the PDF file is 30+ pages). 
For more information, see http://www.zolera.com/resources/opensrc/zsi (The formal Python/Parnassus announcement should show up soon.) /r$ -- Zolera Systems, Your Key to Online Integrity Securing Web services: XML, SOAP, Dig-sig, Encryption http://www.zolera.com From Benjamin.Schollnick@usa.xerox.com Fri Sep 14 16:05:43 2001 From: Benjamin.Schollnick@usa.xerox.com (Schollnick, Benjamin) Date: Fri, 14 Sep 2001 11:05:43 -0400 Subject: [XML-SIG] XML Error? (Workaround found) Message-ID: Thanks... I've escalated this problem to the developers.... At least I now know it's not a problem with *MY* code... >g< Thanks Again! - Benjamin -----Original Message----- From: Lars Marius Garshol [mailto:larsga@garshol.priv.no] Sent: Friday, September 14, 2001 10:16 AM To: 'xml-sig@python.org' Subject: Re: [XML-SIG] XML Error? (Workaround found) * Benjamin Schollnick | | I'm having some problems here with some XML code... You are running into the most common XML problem there is. Your document is encoded in ISO 8859-1, but you fail to declare that it does so, causing the parser to assume that you are using UTF-8 and then choking when it discovers illegal UTF-8 byte sequences. Change your XML declaration to | (:26:38: not well-formed) If it is at all possible we should produce a better error message than this one. | xml_data = string.replace (xml_data, chr(146), "") | xml_data = string.replace (xml_data, chr(147), "") | xml_data = string.replace (xml_data, chr(148), "") If you are using characters in the range 128-159 you are almost certainly using windows-1252, and not ISO 8859-1, and should change your encoding declaration accordingly. --Lars M. _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig From tpassin@home.com Sat Sep 15 06:07:15 2001 From: tpassin@home.com (Thomas B. 
Passin)
Date: Sat, 15 Sep 2001 01:07:15 -0400
Subject: [XML-SIG] 4XSLT Performance Problems with Large Files
References: <200109140508.f8E58wm30389@localhost.local>
Message-ID: <000301c13da4$4a211ee0$7cac1218@reston1.va.home.com>

I just downloaded the latest version of 4Suite (0.11.1), along with pyxml 0.66, for both Python 1.5.2 and 2.1.1, on Windows. I want to report on performance transforming a 4 MB xml source file, and especially terrible performance when using the 4xslt.bat batch file that gets installed in the Scripts directory.

The computer: Win98 SE, 450 MHz Pentium 3, 256 MB RAM. At the start of the tests, I had from 194 to 210 MB free.

I ran Saxon 5.5.1, msxsl (the Microsoft command line wrapper around msxml3), a python script 4xslt.py that I wrote some time ago, and the 4xslt.bat file as supplied with 4Suite. My script uses cDomlette; I don't know what 4xslt.bat uses, though I suspect it's not cDomlette for reasons that will be apparent.

Source file size: 3.94 MB
Result file size: 667 KB

Main stylesheet imports two others and builds keys to index into lookup tables in a separate file.

Results:

processor    time to transform, sec         remarks
msxsl        7
saxon        10
4xslt.py     167                            Python 1.5.2
4xslt.py     201                            Python 2.1.1
4xslt.bat    gave up at 253                 Py 1.5.2 (see below)
4xslt.bat    gave up when memory ran out    Py 2.1.1 (see below)

I gave up on 4xslt.bat because it used up all 194 MB of free memory then went to virtual memory, which it kept on using more of until I quit (previously I had waited much longer with no completion, but did not get an accurate timing).

Here is the memory used by the various processors during processing.

processor    decrease in free memory, MB
msxsl        17
saxon        21
4xslt.py     32 (Py 1.5.2)
4xslt.py     45 (Py 2.1.1)
4xslt.bat    > 194

The 167 seconds using my script is not acceptable for my particular application, but the behavior when the transformation is launched by 4xslt.bat is impossible.
Why should the very same transformation take ten times the memory that msxsl or Saxon use? You can't have an application run down your memory like this. And I don't even know how much virtual memory was used on top of the 194 MB. These results have been reasonably repeatable tonight. I hope something can be done to improve the performance and memory usage for large files. How about it, Uche and Mike? Any thoughts about what is happening here? I'll be happy to send my files for testing if anyone likes. The source file is pretty horrid ( I don't have any control over that, I'm afraid). It has very long paths, and the element names are extremely long, the result of machine translation of some CORBA IDL. I wonder if that has something to do with the results. However, what the stylesheet does is not very complex, it just has to do it 3.94 MB worth. Saxon and MS do get through it reasonably quickly. Cheers, Tom P From 520091860693-0001@t-online.de Sat Sep 15 12:51:32 2001 From: 520091860693-0001@t-online.de (Arno Waschk) Date: Sat, 15 Sep 2001 13:51:32 +0200 Subject: [XML-SIG] PYXML-0.6.5 build problem on Cygwin 133 Message-ID: <3BA340C4.4050307@arnowaschk.de> Dear people, while trying to build your nice Python XML v0.6.5 I get the following error which I cannot solve: [...] 
not copying xml/utils/iso8601.py (output up-to-date)
not copying xml/utils/qp_xml.py (output up-to-date)
running build_ext
skipping '_xmlplus.parsers.pyexpat' extension (up-to-date)
skipping '_xmlplus.parsers.sgmlop' extension (up-to-date)
building '_xmlplus.utils.boolean' extension
skipping extensions/boolean.c (build/temp.cygwin-1.3.3-i686-2.1/boolean.o up-to-date)
gcc -shared -Wl,--enable-auto-image-base build/temp.cygwin-1.3.3-i686-2.1/boolean.o -L/usr/lib/python2.1/config -lpython2.1 -o build/lib.cygwin-1.3.3-i686-2.1/_xmlplus/utils/boolean.dll
Cannot export PyBoolean_Type: symbol not found
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

I am on a fresh Cygwin 1.3.3 installation, following your README instructions ("python setup.py build" etc.)

Am I doing something wrong? Do you need more information?

Thank you in advance

Arno Waschk

From tpassin@home.com Sat Sep 15 21:08:48 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 15 Sep 2001 16:08:48 -0400
Subject: [XML-SIG] 4XSLT Performance Problems with Large Files
References: <002e01c125b7$8c053940$445d4540@Dell2>
Message-ID: <001c01c13e22$3baba780$7cac1218@reston1.va.home.com>

[Brian Quinlan]

> Are there any features of 4xslt that you need that aren't available in
> other XSLT processors? Or is it just the convenience of doing XSLT
> transformations directly in Python that leads you to want to use 4xslt?
>

Sort of. This transformation is actually being done by Saxon, so I'm not directly affected so far. The same system runs Zope to interact with users, and I do use 4xslt via a Zope external method, although the data isn't so massive. I still need good performance there, but it should be easier to get a reasonable response time.

Mainly, I don't want to start a separate process each time a user runs a transform.
> If it is the later, Python buildings for Xalan, the Apache groups XSLT > processor, are available (the Windows release has the most convenient > installation process, if you have a choice of platforms to experiment > with): > http://sourceforge.net/project/showfiles.php?group_id=28142&release_id=4 > 7388 > > I haven't used Xalan for a while and have been meaning to get the latest versoin to try on this. The Python binding might be good to try. Thanks. Cheers, Tom P From martin@loewis.home.cs.tu-berlin.de Sat Sep 15 21:01:04 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 15 Sep 2001 22:01:04 +0200 Subject: [XML-SIG] PYXML-0.6.5 build problem on Cygwin 133 In-Reply-To: <3BA340C4.4050307@arnowaschk.de> (520091860693-0001@t-online.de) References: <3BA340C4.4050307@arnowaschk.de> Message-ID: <200109152001.f8FK14003425@mira.informatik.hu-berlin.de> > Am I doing something wrong? You mean, besides trying to use non-MS tools on MS Windows ?-) In short, PyXML does not currently support Cygwin. You may try the patch at http://sourceforge.net/tracker/index.php?func=detail&aid=445405&group_id=6473&atid=306473 or use the contributed binaries mentioned in http://sourceforge.net/forum/forum.php?forum_id=102334 Regards, Martin From dieter@handshake.de Sat Sep 15 19:35:27 2001 From: dieter@handshake.de (Dieter Maurer) Date: Sat, 15 Sep 2001 20:35:27 +0200 (CEST) Subject: [XML-SIG] 4XSLT Performance Problems with Large Files In-Reply-To: <000301c13da4$4a211ee0$7cac1218@reston1.va.home.com> References: <200109140508.f8E58wm30389@localhost.local> <000301c13da4$4a211ee0$7cac1218@reston1.va.home.com> Message-ID: <15267.40815.991038.215312@lindm.dm> Thomas B. Passin writes: > ... 
> Results: > > processor time to transform, sec remarks > msxsl 7 > saxon 10 > 4xslt.py 167 Python 1.5.2 > 4xslt.py 201 Python 2.1.1 > 4xslt.bat gave up at 253 Py 1.5.2 (see below) > 4xslt.bat gave up when memory ran out Py 2.1.1 (see below) Your results are compatible with my experience. I used 4xslt to process DocBook/XML files. It has been far too slow (and contained lots of bugs). I switched to saxon and got speed improvements of more than an order of magnitude (less bugs, too). My observation seems to indicate that "saxon" needs most of its time with startup (reading the DocBook stylesheets). The document size seems to be almost irrelevant. It takes about the same time to transform a 10 kB document than a several hundred kB document. For 4xslt (with "pDomlette", to be fair, as "cDomlette" did not support entity references at that time), processing time apparently grew more than linearly with document size. Dieter From tpassin@home.com Sat Sep 15 23:08:14 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sat, 15 Sep 2001 18:08:14 -0400 Subject: [XML-SIG] 4XSLT Performance Problems with Large Files References: <002f01c125ce$914a7070$445d4540@Dell2> Message-ID: <003301c13e32$eab98c00$7cac1218@reston1.va.home.com> [Brian Quinlan] > > Mainly, I don't want to start a separate process each time a user runs > a > > transform. > > Yep. Most transformation engines have pretty bad startup times. Though > if your transformations are taking >10 seconds it probably doesn't > matter. Xalan has a bit slower transformation speed than Saxon but has a > horrible startup time (probably because it is so huge). Running it > in-process solves that problem, of course. > > > I haven't used Xalan for a while and have been meaning to get the > latest > > versoin to try on this. The Python binding might be good to try. > Thanks. > > The Python bindings are for Xalan 1.1. The CVS version works correctly > with Xalan 1.2 but hasn't been extensively tested. 
I can send you a > binary if you are interested. > By all means. Thanks. Tom P From tpassin@home.com Sun Sep 16 00:22:34 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sat, 15 Sep 2001 19:22:34 -0400 Subject: [XML-SIG] 4XSLT Performance Problems with Large Files References: <003001c125de$c3227fb0$445d4540@Dell2> Message-ID: <003f01c13e3d$4d6b7660$7cac1218@reston1.va.home.com> [Brian Quinlan] > Here are some same usages: > > string = Pyana.transformToString(xmlString, xslString) > Pyana.transformToFile(xmlString, xslString, file=r'c:\...') > Pyana.transformToFile( Pyana.URI('file:///...'), > Pyana.URI('file:///...'), > file= r'c:\...') > > Let me know if you need any other help. > Thank you, Brian Tom From tpassin@home.com Sun Sep 16 02:33:38 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sat, 15 Sep 2001 21:33:38 -0400 Subject: [XML-SIG] 4XSLT Performance Problems with Large Files References: <200109140508.f8E58wm30389@localhost.local> <000301c13da4$4a211ee0$7cac1218@reston1.va.home.com> Message-ID: <000601c13e4f$9c716460$7cac1218@reston1.va.home.com> [I wrote] My script uses cDomlette; I don't know what > 4xslt.bat uses, though I suspect it's not cDomlette for reasons that will be > apparent. > I've now verified that the uncontrolled appropriation of memory for my large transformation is definitely associated with the use of pDomlette. I changed my script to use p- instead of c-Domlette, and got the same unrestrained appetite I had seen in the 4xslt.bat driver. Changing my script back to use cDomlette restored the limited use of memory I had originally seen. 
Here is the code fragment I've been changing (I took it from some 4suite code quite a while ago):

from Ft.Lib import pDomlette
BETA_DOMLETTE = 1#os.environ.get("BETA_DOMLETTE")
if BETA_DOMLETTE and not validate_flag:
    from Ft.Lib import cDomlette
    g_readerClass = cDomlette.RawExpatReader
    reader = cDomlette.RawExpatReader()
elif validate_flag:
    reader = pDomlette.SaxReader(validate=1)
else:
    reader = pDomlette.PyExpatReader()

I just changed the 1 on the second line to a zero to use pDomlette.

Cheers,

Tom P

From larsga@garshol.priv.no Sun Sep 16 10:29:21 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 16 Sep 2001 11:29:21 +0200
Subject: [XML-SIG] 4XSLT Performance Problems with Large Files
In-Reply-To: <15267.40815.991038.215312@lindm.dm>
References: <200109140508.f8E58wm30389@localhost.local> <000301c13da4$4a211ee0$7cac1218@reston1.va.home.com> <15267.40815.991038.215312@lindm.dm>
Message-ID:

* Dieter Maurer
|
| My observation seems to indicate that "saxon" needs most of
| its time with startup (reading the DocBook stylesheets).
| The document size seems to be almost irrelevant. It takes
| about the same time to transform a 10 kB document than a
| several hundred kB document.

Do you have '


From noreply@sourceforge.net Sun Sep 16 20:03:37 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 16 Sep 2001 12:03:37 -0700 Subject: [XML-SIG] [ pyxml-Bugs-462085 ] ExternalEntityParserCreate fails Message-ID: Bugs item #462085, was opened at 2001-09-16 12:03 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=462085&group_id=6473 Category: expat Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: ExternalEntityParserCreate fails Initial Comment: Trying to invoke ExternalEntityParserCreate with the provided context argument when encountering a external DTD fails because the context is None. I rewrote my test-program in C, and it worked. The desired argument has to be NULL there. So the solution was to patch pyexpat.c to accept the type "z" instead of "s". Now everything works for me. Diez ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=462085&group_id=6473 From dieter@handshake.de Sun Sep 16 18:54:56 2001 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 16 Sep 2001 19:54:56 +0200 (CEST) Subject: [XML-SIG] 4XSLT Performance Problems with Large Files In-Reply-To: <001c01c13e22$3baba780$7cac1218@reston1.va.home.com> References: <002e01c125b7$8c053940$445d4540@Dell2> <001c01c13e22$3baba780$7cac1218@reston1.va.home.com> Message-ID: <15268.59248.377509.882778@lindm.dm> Thomas B. Passin writes: > Mainly, I don't want to start a separate process each time a user runs a > transform. We use "saxon" as an intranet service (servlet) to transform files on demand. This way, we avoid starting a new process each time. We speak HTTP with the servlet, but XML-RPC would probably be possible, too. In our case, the servlet owns both the stylesheets and the source files (taken from a database). 
Not sure, whether this approach is faster than starting a separate process, when stylesheet and xml file are owned by the client. Other alternative would be JPE, the Java Python Environment. Dieter From dieter@handshake.de Sun Sep 16 19:00:37 2001 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 16 Sep 2001 20:00:37 +0200 (CEST) Subject: [XML-SIG] 4XSLT Performance Problems with Large Files In-Reply-To: References: <200109140508.f8E58wm30389@localhost.local> <000301c13da4$4a211ee0$7cac1218@reston1.va.home.com> <15267.40815.991038.215312@lindm.dm> Message-ID: <15268.59589.651426.55823@lindm.dm> Lars Marius Garshol writes: > > * Dieter Maurer > | > | My observation seems to indicate that "saxon" needs most of > | its time with startup (reading the DocBook stylesheets). > | The document size seems to be almost irrelevant. It takes > | about the same time to transform a 10 kB document than a > | several hundred kB document. > > Do you have ' may be part of the problem. The DocBook DTDs are huge, and take quite > a while to parse. Of course, so are the stylesheets, but removing the > DTD reference may help somewhat, at least. > > I've had good experiences with using xt instead, which uses xp, which > doesn't read the DTD at all. > > I've had even better experience with writing my own stylesheets and > not using DTD references at all. :-) Thank you for your comment. Yes, they have " <001c01c13e22$3baba780$7cac1218@reston1.va.home.com> <15268.59248.377509.882778@lindm.dm> Message-ID: <011a01c13f4d$616c1730$4201010a@zhushunmin> i use the xml.dom.minidom to parse a xml file.and get the content like this:but before every string there is a u.what's the u char. thanks . 
[2, u'accept', u'10.1.1.0', u'10.1.1.255', u'12:1E:EF:1D:6F:65', u'10.1.1.92', u'in', u'1', u'7', [u'1:30', u'3:30', u'6:30', u'8:30'], [u'2:30', u'4:30', u'7:30', u'9:30'], u'deny', u'10.1.1.0', u'10.1.1.255', u'12:1E:EF:1D:6F:65', u'10.1.1.92', u'in', u'1', u'7', [u'1:30', u'3:30', u'6:30', u'8:30'], [u'2:30', u'4:30', u'7:30', u'9:30']] From larsga@garshol.priv.no Mon Sep 17 09:11:30 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Sep 2001 10:11:30 +0200 Subject: [XML-SIG] what's this. In-Reply-To: <011a01c13f4d$616c1730$4201010a@zhushunmin> References: <002e01c125b7$8c053940$445d4540@Dell2> <001c01c13e22$3baba780$7cac1218@reston1.va.home.com> <15268.59248.377509.882778@lindm.dm> <011a01c13f4d$616c1730$4201010a@zhushunmin> Message-ID: * zhusm@neusoft.com | | i use the xml.dom.minidom to parse a xml file.and get the content | like this:but before every string there is a u.what's the u char. In Python 2.x Unicode strings are displayed that way. So it means that it's a Unicode string, which generally should behave like the old byte strings. --Lars M. From Juergen Hermann" This is the first release of PIRXX, a wrapper of Xerces and Xalan for Python. Currently, it contains a working SAX2 driver for the Xerces parser. What's missing is the locator mechanism, and the DTDHandler and EntityResolver interfaces. Homepage: http://pirxx.sourceforge.net/ Download: http://sourceforge.net/project/showfiles.php?group_id=25711 Mailing list: http://mail.python.org/mailman/listinfo/xml-sig From zhusm@neusoft.com Wed Sep 19 09:46:41 2001 From: zhusm@neusoft.com (=?gb2312?B?16PLs8Px?=) Date: Wed, 19 Sep 2001 16:46:41 +0800 Subject: [XML-SIG] something about the socket. 
Message-ID: <002d01c140e7$9acc7c40$4201010a@zhushunmin>

unsigned long
ip2val(char* ip)
{
    struct in_addr inaddr;

    inet_aton(ip, &inaddr);
    return (ntohl(inaddr.s_addr));
}

I know that Python also has those two functions, so I wrote the function like this, but it doesn't work:

def ip2val(ip_string):
    return ntohl(inet_aton(ip_string));

What's wrong with that? Can you tell me how to fix it?
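[Editorial sketch] The Python attempt above fails because `socket.inet_aton` returns the address as four packed bytes in network byte order, not as an integer, while `socket.ntohl` expects an integer. One way to get the same number the C function returns is sketched below for present-day Python 3. Using `struct` here is my own choice, not something suggested in the thread.

```python
import socket
import struct


def ip2val(ip_string: str) -> int:
    """Return a dotted-quad IPv4 address as an unsigned 32-bit integer."""
    # inet_aton gives 4 packed bytes in network (big-endian) order,
    # not an integer, so ntohl cannot be applied to it directly.
    packed = socket.inet_aton(ip_string)
    # "!L" reads those bytes as one unsigned 32-bit big-endian value,
    # which matches ntohl(inaddr.s_addr) in the C version.
    (value,) = struct.unpack("!L", packed)
    return value


print(ip2val("127.0.0.1"))  # prints 2130706433
```

Because the unpack format fixes the byte order explicitly, no separate `ntohl` call is needed, and the result does not depend on the host's endianness.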

From =?EUC-KR?B?w9aw5sO2?="

While learn python, that bought a 'XML processing with python' book to study by XML so see in IE though make Soseuteul how be

From pyxml@xhaus.com Thu Sep 20 14:10:39 2001
From: pyxml@xhaus.com (Xhaus Main Account)
Date: Thu, 20 Sep 2001 14:10:39 +0100
Subject: [XML-SIG] Using xpath/xslt on proprietary object structures.
Message-ID: <3BA9EACF.BB0394F5@xhaus.com>

Greetings all,

I'm using python/pyxml/4suite to process a collection of XML files to generate a web site (www.paratuberculosis.org), and I'm finding that I have problems with memory usage. I have several hundred XML files which cross-reference each other, and I have to load them all into memory to process them (they are all xlink'ed together).

I'm thinking that one solution would be to eliminate the use of the DOM, and store the information in a hierarchy of python objects which I build up myself from the XML files, using SAX.

I know that it should be possible to achieve this by making my object hierarchy present a full DOM interface, i.e. implement the Node, Element, Attribute, etc, interfaces.

However, there would be a large amount of effort involved in implementing all of the DOM methods. So I'm looking for possible shortcuts, if there are any.

A few months ago, someone posted on this list (I think it was Eliot Kimber) that there were small modifications that could be made that would make xpath expressions usable against proprietary object hierarchies/models. But there was no indication given of how to achieve that.

Has anyone got any pointers as to what those modifications might be, or how I would go about implementing such a system?
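[Editorial sketch] The idea described above (building plain Python objects from SAX events instead of holding a DOM for every file) could look roughly like this in present-day Python 3. The element names `member`, `name`, `country` and `publication`, the `Member` class and `load_members` are all hypothetical, since the thread does not show the actual file layout. The sketch does not attempt to run XPath over the resulting objects.

```python
import xml.sax


class Member:
    """Lightweight record; __slots__ avoids a per-instance dict."""
    __slots__ = ("name", "country", "publications")

    def __init__(self):
        self.name = ""
        self.country = ""
        self.publications = []


class MemberHandler(xml.sax.ContentHandler):
    """Collect Member objects from a stream of SAX events."""

    def __init__(self):
        super().__init__()
        self.members = []
        self._current = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "member":
            self._current = Member()
        self._text = []  # start collecting text for this element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        self._text = []
        if self._current is None:
            return
        if name == "name":
            self._current.name = text
        elif name == "country":
            self._current.country = text
        elif name == "publication":
            self._current.publications.append(text)
        elif name == "member":
            self.members.append(self._current)
            self._current = None


def load_members(xml_bytes: bytes) -> list:
    """Parse one document and return only the extracted records."""
    handler = MemberHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.members
```

Only the fields the application needs are kept, so memory use grows with the extracted data rather than with the full node tree of each document.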
Failing that, some pointers to documentation on the architecture of 4XPath and 4Xslt would be most useful.

Thanks in advance for any help rendered.

BTW, to the authors of the excellent Python XML tools: Please don't take any of this as criticism. I continue to be amazed by the flexibility and power of Python and the available collection of XML processing tools. A huge Thank You to all involved in the Python XML effort!

Regards,

Alan Kennedy.

P.S. Out of interest, I have approximately 1335 XML files, taking 2.1meg of space. Not all of these files need to be processed at the same time, although things would be much faster if they could be. I'm finding that once I've loaded in say, 250 members (each of which is represented by two or more XML files), and constructed a membership directory XML file (listing all of the members), I'm hitting about 130Meg of memory usage(!), and my poor little PII-233/160 Meg RAM/NT4 machine starts to thrash. I fully realise that my design is not the most efficient, but the enormous memory requirements of DOM don't help. Hence why I'm trying to find an alternative memory representation for the documents. I could do a total rewrite, but the web site I do it for is an unpaid voluntary effort, and I can't spare it that much time.

From Alexandre.Fayolle@logilab.fr Thu Sep 20 14:48:13 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 20 Sep 2001 15:48:13 +0200 (CEST)
Subject: [XML-SIG] Using xpath/xslt on proprietary object structures.
In-Reply-To: <3BA9EACF.BB0394F5@xhaus.com>
Message-ID:

Hello,

From the description of your situation, I think that using 4SuiteServer could be a great help. I believe 4SS supports XLink. Depending on the use you make of XPath, this can be taken care of by the RDF-based indexing feature of 4XSLT.

You don't give many details on what your application is made of.
However, if you're using 4DOM, try switching to pDomlette (or even to cDomlette), which are both more lightweight and faster implementations of DOM (though not as compliant as 4DOM) Alexandre Fayolle On Thu, 20 Sep 2001, Xhaus Main Account wrote: > Greetings all, > > I'm using python/pyxml/4suite to process a collection of XML files to > generate a web site (www.paratuberculosis.org), and I'm finding that I > have problems with memory usage. I have several hundred XML files which > cross-reference each other, and I have to load them all into memory to > process them (they are all xlink'ed together) > > I'm thinking that one solution would be to eliminate the use of the DOM, > and store the information in a hierarchy of python objects which I build > up myself from the XML files, using SAX. > > I know that it should be possible to achieve this by making my object > hierarchy present a full DOM interface, i.e. implement the Node, > Element, Attribute, etc, interfaces. > > However, there would be a large amount of effort involved in > implementing all of the DOM methods. So I'm looking for possible > shortcuts, if there are any. > > A few months ago, someone posted on this list (I think it was Eliot > Kimber) that there were small modifications that could made that would > make xpath expressions usable against proprietary object > hierarchies/models. But there was no indication given of how to achieve > that. > > Has anyone got any pointers as to what those modifications might be, or > how I would go about implementing such a system. > > Failing that, some pointers to documentation on the architecture of > 4XPath and 4Xslt would be most useful. > > Thanks in advance for any help rendered. > > BTW, to the authors of the excellent Python XML tools: Please don't take > any of this as criticism. I continue to be amazed by the flexibility and > power of Python and available collection of XML processing tools. A huge > Thank You to all involved in the Python XML effort! 
> > Regards, > > Alan Kennedy. > > P.S. Out of interest, I have approximately 1335 XML files, taking 2.1meg > of space. Not all of these files need to processed at the same time, > although things would be much faster if they could be. I'm finding that > once the I've loaded in say, 250 members (each of which is represented > by two or more XML files), and constructed a membership directory XML > file (listing all of the members), I'm hitting about 130Meg of memory > usage(!), and my poor little PII-233/160 Meg RAM/NT4 machine starts to > thrash. I fully realise that my design is not the most efficient, but > the enormous memory requirements of DOM don't help. Hence why I'm trying > to find an alternative memory representation foir the documents. I could > do a total rewrite, but the web site I do for is an unpaid voluntary > effort, and I can't spare it that much time. > > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig > Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From tpassin@home.com Thu Sep 20 16:19:00 2001 From: tpassin@home.com (Thomas B. Passin) Date: Thu, 20 Sep 2001 11:19:00 -0400 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: Message-ID: <000801c141e7$93f45a40$7cac1218@reston1.va.home.com> [Alexandre Fayolle] > > You don't give much details on what your application is made of. However, > if you're using 4DOM, try switching to pDomlette (or even to cDomlette), > which are both more lightweight and faster implementations of DOM (though > not as compliant as 4DOM) > Considering the memory usage issues with pDomlette that I posted about last week, I'd say you would want to try cDomlette instead, based on what Alan Kennedy said about his system. 
Cheers, Tom P From Alexandre.Fayolle@logilab.fr Thu Sep 20 16:42:08 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 20 Sep 2001 17:42:08 +0200 (CEST) Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. In-Reply-To: <000801c141e7$93f45a40$7cac1218@reston1.va.home.com> Message-ID: On Thu, 20 Sep 2001, Thomas B. Passin wrote: > [Alexandre Fayolle] > > > > You don't give much details on what your application is made of. However, > > if you're using 4DOM, try switching to pDomlette (or even to cDomlette), > > which are both more lightweight and faster implementations of DOM (though > > not as compliant as 4DOM) > > > Considering the memory usage issues with pDomlette that I posted about last > week, I'd say you would want to try cDomlette instead, based on what Alan > Kennedy said about his system. Sure, but, AFAIK, cDomlette is still read-only, so if he needs to change things in the DOM during processing, cDomlette won't let him. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From tpassin@home.com Thu Sep 20 17:22:21 2001 From: tpassin@home.com (Thomas B. Passin) Date: Thu, 20 Sep 2001 12:22:21 -0400 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: Message-ID: <001401c141f0$6d66e1a0$7cac1218@reston1.va.home.com> [Alexandre Fayolle] > On Thu, 20 Sep 2001, Thomas B. Passin wrote: > > > [Alexandre Fayolle] > > > > > > You don't give much details on what your application is made of. 
However, > > > if you're using 4DOM, try switching to pDomlette (or even to cDomlette), > > > which are both more lightweight and faster implementations of DOM (though > > > not as compliant as 4DOM) > > > > > Considering the memory usage issues with pDomlette that I posted about last > > week, I'd say you would want to try cDomlette instead, based on what Alan > > Kennedy said about his system. > > Sure, but, AFAIK, cDomlette is still read-only, so if he needs to change > things in the DOM during processing, cDomlette won't let him. > Maybe he can read his hundreds of files into cDOmlette, and anything he has to create he can build a regular DOM for. The best of both worlds??? Cheers Tom P From Mike.Olson@fourthought.com Thu Sep 20 17:25:59 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 20 Sep 2001 10:25:59 -0600 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: Message-ID: <3BAA1897.6665AB36@fourthought.com> Alexandre Fayolle wrote: > > > Sure, but, AFAIK, cDomlette is still read-only, so if he needs to change > things in the DOM during processing, cDomlette won't let him. FYI, by next release it should be r/w Mike > > Alexandre Fayolle > -- > LOGILAB, Paris (France). > http://www.logilab.com http://www.logilab.fr http://www.logilab.org > Narval, the first software agent available as free software (GPL). > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From larsga@garshol.priv.no Thu Sep 20 23:49:57 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Sep 2001 00:49:57 +0200 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. 
In-Reply-To: <3BA9EACF.BB0394F5@xhaus.com> References: <3BA9EACF.BB0394F5@xhaus.com> Message-ID: * Xhaus Main Account | | A few months ago, someone posted on this list (I think it was Eliot | Kimber) that there were small modifications that could made that | would make xpath expressions usable against proprietary object | hierarchies/models. But there was no indication given of how to | achieve that. I don't think this will solve your problem, but you can find Eliot's paper about this at --Lars M. From pyxml@xhaus.com Fri Sep 21 09:01:19 2001 From: pyxml@xhaus.com (Xhaus Main Account) Date: Fri, 21 Sep 2001 09:01:19 +0100 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: Message-ID: <3BAAF3CF.83F2A749@xhaus.com> Thanks for the tips Alexander. However, from what I can see, pDomlette and cDomlette are not suitable for my needs due to incomplete DOM support (I'll briefly explain my app below). There is no support for "getElementsByTagName" and "getElementsByTagNameNS", which I use. That's not a major problem, since I can use xpath expressions to replace them. However, I do need support for "importNode". I can see that pDomlette does not have it, and I think cDomlette doesn't have it either (I checked by doing "strings cDomlettec.pyd | grep importNode". Am I wrong here?) Brief summary of the app: I have ~250 members of a scientific association, all of whom have contact details, and lists of publications. I have to generate a home page for each member, "directory lists" for all members by name, by country (of which there are 24) and by "Research speciality", of which there are 9. Each member has at least two files. 1. A "wrapper" file 2. A data file

Dr.

Michael T. Collins President University of Wisconsin

School of Veterinary Medicine Wisconsin 53706-1102 USA

(608) 262-8457 (608) 265-6463 mcollin5@facstaff.wisc.edu http://www.vetmed.wisc.edu/pbs/johnes/mtc.html
3. And possibly some files for publications lists

Chiodini RJ Collins MT Bassey EOE 1995 Proceedings of the Fourth International Colloquium on Paratuberculosis.397 pp.

When the member wrapper file is loaded, the xlinks are processed, and any "transclude" xlinks (non-standard, I know) are replaced with the DOM tree of the file they refer to. The resulting DOM is then processed with XSLT to generate a "member home page", like this http://www.paratuberculosis.org/members/collins.htm

The publications xlinks point to files like this

Proc. 6th Intl. Coll. Paratuberculosis: Manning EJB, Collins MT (eds) Modification of a bovine ELISA to detect camelid antibodies to Mycobacterium paratuberculosis. 1999 Kramsky JA1 Miller DS2 Hope A3 Collins MT1 Australia Mycobacterium paratuberculosis infection, or Johne's disease, has a reportedly low prevalence in South American camelid populations. ..............

When it appears on the member home page, only the summary details of the abstract are extracted. When the entire abstract is to be used to generate an "abstract page", like this one http://www.paratuberculosis.org/proc6/abst4_30.htm I use another wrapper file such as

And so on and so on.

I also generate a lot of files from scratch, by creating DOMs and then populating them with the data from the members' data files. The resulting DOMs are then processed with XSLT to generate "directory listings", such as http://www.paratuberculosis.org/member/byname.htm http://www.paratuberculosis.org/country/us.htm http://www.paratuberculosis.org/research/crohns.htm

When I started this years ago, before I discovered Python, I used to use a command line XSLT processor (XT) and character entities to manage the whole network of files. However, there was too much manual maintenance of data files, so when I started using Python, I automated the maintenance using DOM. It worked fine until the membership of the association grew, and now I'm finding that I have to break the operations up and do them separately in separate runs, because otherwise I run out of memory.

To summarise: I'm doing a lot of DOM manipulations. 
I cut branches off DOMs, tack new branches onto DOMs, and generate a lot of DOMs from scratch, and populate them with the data from other DOMs. I'll have a closer look at the code for pDomlette, and see if I can hack some support for importNode into it, which would solve my problem. Cheers, Alan. Alexandre Fayolle wrote: > Hello, > > >From the description of your situation, I think that using 4SuiteServer > could be a great help. I believe 4SS supports XLink. Depending on the use > you make of XPath, this can be taken care of by the RDF-based indexing > feature of 4XSLT. > > You don't give much details on what your application is made of. However, > if you're using 4DOM, try switching to pDomlette (or even to cDomlette), > which are both more lightweight and faster implementations of DOM (though > not as compliant as 4DOM) From pyxml@xhaus.com Fri Sep 21 09:04:03 2001 From: pyxml@xhaus.com (Xhaus Main Account) Date: Fri, 21 Sep 2001 09:04:03 +0100 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: <000801c141e7$93f45a40$7cac1218@reston1.va.home.com> Message-ID: <3BAAF473.FAC810@xhaus.com> Folks, Thanks for all the helpful tips about different DOMs to use. However, I'd still like to pose the original question as a separate thread. Does anyone know if it is possible to use xpath expressions and xslt stylesheets to process proprietary object structures, i.e. hierarchies of arbitrary python objects? Many thanks, Alan. "Thomas B. Passin" wrote: > [Alexandre Fayolle] > > > > You don't give much details on what your application is made of. However, > > if you're using 4DOM, try switching to pDomlette (or even to cDomlette), > > which are both more lightweight and faster implementations of DOM (though > > not as compliant as 4DOM) > > > Considering the memory usage issues with pDomlette that I posted about last > week, I'd say you would want to try cDomlette instead, based on what Alan > Kennedy said about his system. 
> > Cheers, > > Tom P > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig From Alexandre.Fayolle@logilab.fr Fri Sep 21 10:12:14 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 21 Sep 2001 11:12:14 +0200 (CEST) Subject: [XML-SIG] pDomlette and importNode In-Reply-To: <3BAAF3CF.83F2A749@xhaus.com> Message-ID: On Fri, 21 Sep 2001, Xhaus Main Account wrote: > Thanks for the tips Alexander. > > However, from what I can see, pDomlette and cDomlette are not suitable for my > needs due to incomplete DOM support (I'll briefly explain my app below). > > However, I do need support for "importNode". I can see that pDomlette does not > have it, and I think cDomlette doesn't have it either (I checked by doing > "strings cDomlettec.pyd | grep importNode". Am I wrong here?) cDomlette is for now readonly, so it won't support importNode. pDomlette does not. We needed this in Narval, as well as removeAttributeNS, so what we do is add the methods dynamically to the class at start up. Now, I'm pretty sure that if this is asked to the Fourthought people, they could consider adding the method to pDomlette in the next release (and to cDomlette which, I am told, should be read-write in the near future). 
In the meantime, here's a snippet which adds importNode and removeAttributeNS to pDomlette nodes:

-----------------8<-----------------------------
from Ft.Lib.pDomlette import Element,Document

def myremoveAttributeNS(self,nsURI,lname):
    node = self.getAttributeNodeNS(nsURI,lname)
    if node:
        self.removeAttributeNode(node)
    return

Element.removeAttributeNS = myremoveAttributeNS

def _changeowner(node,owner):
    node.ownerDocument = owner
    for a in node.attributes or []:
        a.ownerDocument = owner
    [_changeowner(c,owner) for c in node.childNodes or []]

def myimportNode(self,importedNode,deep=0):
    try:
        new_node = importedNode.cloneNode(deep)
        _changeowner(new_node,self)
        return new_node
    except Exception,e:
        print importedNode.__class__
        raise e

Document.importNode = myimportNode
---------------------8<---------------------------

Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL).

From tpassin@home.com Fri Sep 21 13:55:25 2001 From: tpassin@home.com (Thomas B. Passin) Date: Fri, 21 Sep 2001 08:55:25 -0400 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: <3BAAF3CF.83F2A749@xhaus.com> Message-ID: <000d01c1429c$af3e6d80$7cac1218@cj64132b> [Xhaus Main Account] > When I started this years ago, before I discovered Python, I used to use a > command line XSLT processsor (XT) and character entities to manage the whole > network of files. However, there was too much manual maintenance of data > files, so when I started using Python, I automated the maintenance using DOM. I'd reconsider using xslt. There could be another approach that would make it work out. I have a project where I produce a number of xml data files from a relational database, combine them, merge them into a template, merge that with hand-crafted text that is different for each different version I want to produce, then transform to html. 
All this runs from one batch file, using four separate stylesheets. All the customized information is kept in a single resource file (xml, of course), and that is the only thing that has to get changed for any customization, so maintenance is easy. Maybe xslt would really do the job for you, too. Cheers, Tom P From Mike.Olson@fourthought.com Fri Sep 21 16:32:35 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 21 Sep 2001 09:32:35 -0600 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: <000801c141e7$93f45a40$7cac1218@reston1.va.home.com> <3BAAF473.FAC810@xhaus.com> Message-ID: <3BAB5D93.7D9F0DF8@fourthought.com> Xhaus Main Account wrote: > > Folks, > > Thanks for all the helpful tips about different DOMs to use. > > However, I'd still like to pose the original question as a separate thread. > > Does anyone know if it is possible to use xpath expressions and xslt > stylesheets to process proprietary object structures, i.e. hierarchies of > arbitrary python objects? Using 4XPath? As long as those objects conform to the DOM Level II interfaces :) If you write your own XPath parser, or use another, then anything is possible. Mike > > Many thanks, > > Alan. > > "Thomas B. Passin" wrote: > > > [Alexandre Fayolle] > > > > > > You don't give much details on what your application is made of. However, > > > if you're using 4DOM, try switching to pDomlette (or even to cDomlette), > > > which are both more lightweight and faster implementations of DOM (though > > > not as compliant as 4DOM) > > > > > Considering the memory usage issues with pDomlette that I posted about last > > week, I'd say you would want to try cDomlette instead, based on what Alan > > Kennedy said about his system. 
> > > > Cheers, > > > > Tom P > > > > _______________________________________________ > > XML-SIG maillist - XML-SIG@python.org > > http://mail.python.org/mailman/listinfo/xml-sig > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From dallingham@users.sourceforge.net Sat Sep 22 22:35:29 2001 From: dallingham@users.sourceforge.net (Don Allingham) Date: 22 Sep 2001 15:35:29 -0600 Subject: [XML-SIG] How do I write an efficient parser? Message-ID: <1001194530.887.27.camel@wallace> My project has been using the SAX parser under 1.5.2 and 2.X. The XML file contains genealogy information. When I have about 2000 people in the database, the expat parser reads it in a reasonable amount of time - a few seconds, not too long for the user. However, when I starts reaching the 6000-7000 entries, it can take up to a minute or longer. At 50000, it takes several minutes, which is just unacceptable. All the parser is doing is building an in-memory structure. The best I can tell is that python's function call overhead is killing performance. Is there a way to write a more efficient parser without having to resort to C? -- Don Allingham dallingham@users.sourceforge.net GPG/PGP Public Key at http://members.home.net/donaldallingham/dallingham.key From martin@loewis.home.cs.tu-berlin.de Sun Sep 23 10:57:07 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 23 Sep 2001 11:57:07 +0200 Subject: [XML-SIG] How do I write an efficient parser? 
In-Reply-To: <1001194530.887.27.camel@wallace> (message from Don Allingham on 22 Sep 2001 15:35:29 -0600) References: <1001194530.887.27.camel@wallace> Message-ID: <200109230957.f8N9v7q01112@mira.informatik.hu-berlin.de> > My project has been using the SAX parser under 1.5.2 and 2.X. The XML > file contains genealogy information. When I have about 2000 people in > the database, the expat parser reads it in a reasonable amount of time - > a few seconds, not too long for the user. > > However, when I starts reaching the 6000-7000 entries, it can take up to > a minute or longer. At 50000, it takes several minutes, which is just > unacceptable. This suggests linear complexity, right? This is how it should be, parsers do have linear complexity by nature. > Is there a way to write a more efficient parser without having to resort > to C? You can't do better than linear complexity. However, you can try to speed up the processing of every event. I don't know your application; most likely, you can gain speed by tweaking it - I recommend to run the application in a profiler and find out what it really is that consumes the time. If you want to reduce the processing done in the XML libraries, you can use the raw expat API instead of using SAX. That saves you the indirections done in expatreader, and it saves you reports of events on which your handlers won't act. If that still is too slow, you can try using sgmlop instead of expat. sgmlop has a number of limitations, but they may not show up in your application. BTW, which Python version did you use for the timings? Regards, Martin From tpassin@home.com Sun Sep 23 15:39:00 2001 From: tpassin@home.com (Thomas B. Passin) Date: Sun, 23 Sep 2001 10:39:00 -0400 Subject: [XML-SIG] How do I write an efficient parser? References: <1001194530.887.27.camel@wallace> <200109230957.f8N9v7q01112@mira.informatik.hu-berlin.de> Message-ID: <000c01c1443d$7c0c76d0$7cac1218@cj64132b> [Martin v. 
Loewis] [Don Allingham] > > My project has been using the SAX parser under 1.5.2 and 2.X. The XML > > file contains genealogy information. When I have about 2000 people in > > the database, the expat parser reads it in a reasonable amount of time - > > a few seconds, not too long for the user. > > > > However, when I starts reaching the 6000-7000 entries, it can take up to > > a minute or longer. At 50000, it takes several minutes, which is just > > unacceptable. > > This suggests linear complexity, right? This is how it should be, > parsers do have linear complexity by nature. > > > Is there a way to write a more efficient parser without having to resort > > to C? >

It doesn't sound linear to me, from the sketchy information that Don provided. If it were linear it would have taken about a minute for 50,000, not several minutes. Still, a minute might be too long for him.

One thing that has been known to slow things down non-linearly is if you build up a really long string for output (like tens of thousands of bytes) by concatenating little pieces. This is about Python strings, not XML processing per se. If that is what Don's program does, an easy way to get a dramatic improvement would be to append each little text fragment to a list, then do string.join() on the list at the end. I've seen this turn a dog of a program into a snappy performer. Let's hope that's Don's situation.

Cheers, Tom P

From pyxml@xhaus.com Mon Sep 24 08:26:10 2001 From: pyxml@xhaus.com (Alan Kernnedy) Date: Mon, 24 Sep 2001 08:26:10 +0100 Subject: [XML-SIG] pDomlette and importNode References: Message-ID: <3BAEE012.8FBB5F88@xhaus.com> Alexandre, Thanks for the elegant and Pythonic solution to my "importNode" problem. I must learn to adopt the Pythonic style more myself, i.e. extend a class through manipulation of its definition rather than going and hacking the original source file. I've now got my little app running with pDomlette, and the change is fantastic. 
The memory footprint is hugely smaller, and the speed has increased by about 8 to 10 times. Many thanks, Alan. Alexandre Fayolle wrote: > On Fri, 21 Sep 2001, Xhaus Main Account wrote: > > > Thanks for the tips Alexander. > > > > However, from what I can see, pDomlette and cDomlette are not suitable for my > > needs due to incomplete DOM support (I'll briefly explain my app below). > > > > However, I do need support for "importNode". I can see that pDomlette does not > > have it, and I think cDomlette doesn't have it either (I checked by doing > > "strings cDomlettec.pyd | grep importNode". Am I wrong here?) > > cDomlette is for now readonly, so it won't support importNode. pDomlette > does not. We needed this in Narval, as well as removeAttributeNS, so what > we do is add the methods dynamically to the class at start up. Now, I'm > pretty sure that if this is asked to the Fourthought people, they could > consider adding the method to pDomlette in the next release (and to > cDomlette which, I am told, should be read-write in the near future). > > In the meantime, here's a snippet which adds importNode and > removeAttributeNS to pDomlette nodes: > > -----------------8<----------------------------- > from Ft.Lib.pDomlette import Element,Document > > def myremoveAttributeNS(self,nsURI,lname): > node = self.getAttributeNodeNS(nsURI,lname) > if node: > self.removeAttributeNode(node) > return > > Element.removeAttributeNS = myremoveAttributeNS > > def _changeowner(node,owner): > node.ownerDocument = owner > for a in node.attributes or []: > a.ownerDocument = owner > [_changeowner(c,owner) for c in node.childNodes or []] > > def myimportNode(self,importedNode,deep=0): > try: > new_node = importedNode.cloneNode(deep) > _changeowner(new_node,self) > return new_node > except Exception,e: > print importedNode.__class__ > raise e > > Document.importNode = myimportNode > ---------------------8<--------------------------- > > Alexandre Fayolle > -- > LOGILAB, Paris (France). 
> http://www.logilab.com http://www.logilab.fr http://www.logilab.org > Narval, the first software agent available as free software (GPL). From pyxml@xhaus.com Mon Sep 24 08:43:19 2001 From: pyxml@xhaus.com (Alan Kernnedy) Date: Mon, 24 Sep 2001 08:43:19 +0100 Subject: [XML-SIG] Using xpath/xslt on proprietary object structures. References: <3BAAF3CF.83F2A749@xhaus.com> <000d01c1429c$af3e6d80$7cac1218@cj64132b> Message-ID: <3BAEE417.E1C4797D@xhaus.com> "Thomas B. Passin" wrote: > I'd reconsider using xslt. There could be another approach that would make > it work out. I have a project where I produce a number of xml data files > from a relational database, combine them, merge them into a template, merge > that with hand-crafted text that is different for each different version I > want to produce, then transform to html. All this runs from one batch file, > using four separate stylesheets. Tom, Thanks for the tips. I fully recognise that the "design" of my app is less than optimal, and that one of these days I need to rewrite it. I took my current path because I needed to load lots of files several times. For example, the information in a "Paper abstract" file needed to be included in a member's home page, AND used to construct "Chapter" and "Book" style tables of contents, as well as be processed into its own standalone page. I reasoned that loading each file once and always working off the same copy would speed up the parsing load in the app. Historical: I started using python before 4Suite was fully developed, and in fact used JPython, Xerces and Xalan to do all the processing. Now that was a memory hog! I switched to CPython and 4Suite when the memory usage of JPython and the Apache went through the roof. Taking my current approach of changing in "baby steps", my next step would be to use XPATHs document() function to be transferring nodesets between DOMs at HTML page generation time, rather than cutting and splicing chunks of DOM trees. 
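As an illustration of the load-once reasoning in Alan's message above, here is a minimal Python sketch, not his actual code. It assumes the standard library's xml.dom.minidom rather than 4DOM or pDomlette, and the names load_once and splice_copy are made up for the example. Each file is parsed at most once, and every page that needs its content receives a deep copy via importNode, so the cached tree itself is never cut up.

```python
from xml.dom import minidom

# Parsed documents keyed by file path (hypothetical cache, not from the thread).
_parsed = {}

def load_once(path, parse=minidom.parse):
    """Return the DOM for path, parsing the file only on the first request."""
    if path not in _parsed:
        _parsed[path] = parse(path)
    return _parsed[path]

def splice_copy(target_doc, source_doc):
    """Append a deep copy of source_doc's root element to target_doc's root.

    importNode clones the subtree into target_doc, so source_doc stays
    intact and can be reused for other pages.
    """
    copy = target_doc.importNode(source_doc.documentElement, True)
    target_doc.documentElement.appendChild(copy)
    return copy
```

The parse argument is there only so the reader can be swapped for another DOM implementation's. The trade-off is memory: every cached tree stays resident, which is the very pressure described earlier in the thread, so a real version would probably need to evict entries.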
In the medium term, I do really need to get it into a database. I do need multiple views of data, and database queries are the way to get them, not writing a combination of Xpath and Python code.

In the longer term, I should really have a look at the more "exotic" features in 4Suite, such as the XLink, DBDom and RDF support. I'm sure that they would enable more elegant, compact and efficient solutions to my problems. It's just a case of finding the time to investigate the capabilities of these modules, to see what they can do.

Many thanks to all who offered tips and advice. Cheers, Alan.

From pyxml@xhaus.com Mon Sep 24 08:47:37 2001 From: pyxml@xhaus.com (Alan Kernnedy) Date: Mon, 24 Sep 2001 08:47:37 +0100 Subject: [XML-SIG] pDomlette bug/non standard method? References: Message-ID: <3BAEE519.FC9853F1@xhaus.com> Folks, In switching to pDomlette from 4DOM, I was surprised by one aspect of text nodes. According to the DOM spec, I should be able to access the character data in a text node through the attribute "nodeValue", which is what my code used successfully with 4DOM, e.g.

myData = myTextNode.nodeValue

Now, having examined the code, it appears that I have to use the attribute "data", i.e. I access the contents of a text node using

myData = myTextNode.data

Am I missing something here? Regards, Alan.

From Alexandre.Fayolle@logilab.fr Mon Sep 24 11:32:14 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 24 Sep 2001 12:32:14 +0200 (CEST) Subject: [XML-SIG] pDomlette bug/non standard method? In-Reply-To: <3BAEE519.FC9853F1@xhaus.com> Message-ID: On Mon, 24 Sep 2001, Alan Kernnedy wrote: > According to the DOM spec, I should be able to acess the character > data in a text node through the attribute "nodeValue", which is what > my code used successfully with 4DOM, e.g > > myData = myTextNode.nodeValue > > Now, having examined the code, it appears that I have to use the > attribute "data", i.e. 
I access the contents of a text node using > > myData = myTextNode.data > > Am I missing something here?

Well, I'd say that this is part of the price to pay for the memory footprint decrease, and the speed improvement you get from switching from 4DOM to pDomlette (and it also explains the 'ette' at the end of pDomlette).

The idea is to get a DOM-like object structure, but not with all the features mentioned in the spec (which make the DOM very heavyweight).

To be more specific to the case at hand, nodeValue is an attribute of the Node interface, and data is an attribute of the CharacterData interface (the Text interface inherits from this interface, which in turn inherits from Node). pDomlette does not have a Node interface, and uses a pythonic Implicit Interface pattern instead for classes that should implement it. The consequence of this is that redundant methods/attributes are not implemented (you'd have to update both data and nodeValue when one of them is changed, or use __getattr__, and both of these have an impact on performance or memory usage).

Using data in 4DOM is safe (if you want to preserve compatibility with this DOM implementation), so I'd suggest you use data to get the contents of a Text node.

Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL).

From Juergen Hermann" Hi! Now that we have a 4Suite 0.11.1 release, what is missing for release 0.7 of PyXML?

From Mike.Olson@fourthought.com Mon Sep 24 16:27:33 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 24 Sep 2001 09:27:33 -0600 Subject: [XML-SIG] pDomlette bug/non standard method? 
References: Message-ID: <3BAF50E5.68604AE7@fourthought.com> Alexandre Fayolle wrote: > > On Mon, 24 Sep 2001, Alan Kernnedy wrote: > > > According to the DOM spec, I should be able to acess the character > > data in a text node through the attribute "nodeValue", which is what > > my code used successfully with 4DOM, e.g > > > > myData = myTextNode.nodeValue > > > > Now, having examined the code, it appears that I have to use the > > attribute "data", i.e. I access the contents of a text node using > > > > myData = myTextNode.data > > > > Am I missing something here? > > Well, I'd say that this is part of the price to pay for the memory > footprint decrease, and the speed improvement you get from switching from > 4DOM to pDomlette (and it also explains the 'ette' at the end of > pDomlette). I agree with you Alexandre, but this specific use case should work. Alan, what version of 4Suite are you using? > > The idea is to get a DOM-like object structure, but not with all the > features mentionned in the spec (which make the DOM very heavyweight). > > To be more specific to the case at hand, nodeValue is an attribute of the > Node interface, and data is an attribute of the CharacterData interface > (the Text interface inherits from this interface, which in turn inherits > from Node). pDomlette does not have a Node interface, and uses a pythonic > Implicit Interface pattern instead for classes that should implement it. > The consequence of this is that redundant methods/attributes are not > implemented (you'd have to update both data and nodeValue when one of them > is changed, or use __getattr__, and both of these have an impact on > performance or memory usage. > > Using data in 4DOM is safe (if you want to preserve compatibility with > this DOM implementation), so I'd suggest you use data to get the contents > of a Text node. > > Alexandre Fayolle > -- > LOGILAB, Paris (France). 
> http://www.logilab.com http://www.logilab.fr http://www.logilab.org > Narval, the first software agent available as free software (GPL). > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From mal@lemburg.com Mon Sep 24 17:36:44 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 24 Sep 2001 18:36:44 +0200 Subject: [XML-SIG] ANN: eGenix.com mx EXPERIMENTAL Package 0.5.0 Message-ID: <3BAF611C.C4AEA3D1@lemburg.com> mxTidy should be of interest for this list... ________________________________________________________________________ ANNOUNCING: eGenix.com mx EXPERIMENTAL Extension Package for Python Version 0.5.0 Experimental Python extensions providing important and useful services for Python programmers. ________________________________________________________________________ WHAT IS IT ?: The eGenix.com mx EXPERIMENTAL Extensions for Python are a collection of alpha and beta quality software tools for Python which will be integrated into the other mx Extension Packages after they have matured to professional quality tools. Python is an object-oriented Open Source programming language which runs on all modern platforms (http://www.python.org/). By integrating ease-of-use, clarity in coding, enterprise application connectivity and rapid application design, Python establishes an ideal programming platform for todays IT challenges. ________________________________________________________________________ WHAT'S NEW ? This release fixes a serious bug in the mxTidy package which made it unusable for most non-ASCII content. In addition to this bug some minor tweaks were done, mostly to the distutils setup. 
________________________________________________________________________ EGENIX.COM MX EXPERIMENTAL PACKAGE OVERVIEW: mxNumber - Python Interface to GNU MP Number Types mxNumber provides direct access to the high performance numeric types available in the GNU Multi-Precision Lib (GMP). This library is licensed under the LGPL and runs on practically all Unix platforms. eGenix.com has ported the GMP lib to Windows, to also provide our Windows users with the added benefit of being able to do arbitrary precision calculations. The package currently provide these numerical types: 1. Integer(value) -- arbitrary precision integers much like Python longs only faster 2. Rational(nom,denom) -- rational numbers with Integers as numerator and denominator 3. Float(value[,prec]) -- floating point number with at least prec bits precision 4. FareyRational(value, maxden) -- calculate the best rational represenation n/d of value such that d < maxden mxTidy - Interface to HTML Tidy (HTML/XML cleanup tool) mxTidy provides a Python interface to a thread-safe, library version of the HTML Tidy. command line tool. HTML Tidy helps you to cleanup coding errors in HTML and XML files and produce well-formed HTML, XHTML or XML as output. This allows you to preprocess web-page for inclusion in XML repositories, prepare broken XML files for validation and also makes it possible to write converters from well-known word processing applications such as MS Word to other structured data representations by using XML as intermediate format. ________________________________________________________________________ WHERE CAN I GET IT ? The download archives and instructions for installing the packages can be found at: http://www.lemburg.com/files/python/ ________________________________________________________________________ WHAT DOES IT COST ? The EXPERIMENTAL packages uses different licenses in its subpackages. Please refer to the subpackage documentation for details. 
Some of them may be integrated into the BASE package, others will be
integrated into the COMMERCIAL package.

The package comes with full source code.

________________________________________________________________________

WHERE CAN I GET SUPPORT ?

There currently is no support for these packages, since they are still
in alpha or beta. Feedback is welcome, though, so don't hesitate to
write us about the quirks you find.

________________________________________________________________________

REFERENCE:

eGenix.com mx EXPERIMENTAL Extension Package 0.5.0 - eGenix.com mx EXPERIMENTAL Extension Package 0.5.0 with precompiled binaries for Windows and Linux. (24-Sep-2001) ________________________________________________________________________ Enjoy, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Mon Sep 24 22:45:53 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 24 Sep 2001 23:45:53 +0200 Subject: [XML-SIG] Release In-Reply-To: (jh@web.de) References: Message-ID: <200109242145.f8OLjrT01216@mira.informatik.hu-berlin.de> > Now that we have a 4Suite 0.11.1 release, what is missing for release > 0.7 of PyXML? Mainly, the xpath and xslt modules need to be integrated into PyXML once more, and need to be made to work even without any 4Suite. Regards, Martin From rsalz@zolera.com Tue Sep 25 02:24:57 2001 From: rsalz@zolera.com (Rich Salz) Date: Mon, 24 Sep 2001 21:24:57 -0400 Subject: [XML-SIG] Release References: <200109242145.f8OLjrT01216@mira.informatik.hu-berlin.de> Message-ID: <3BAFDCE9.DE2133EB@zolera.com> The XPath that's in PyXML doesn't need 4Suite, so it would be a matter of scanning diffs and double-checking. I volunteer to re-integrate xpath. /r$ -- Zolera Systems, Securing web services (XML, SOAP, Signatures, Encryption) http://www.zolera.com From uche.ogbuji@fourthought.com Tue Sep 25 02:39:40 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 24 Sep 2001 19:39:40 -0600 Subject: [4suite] Re: [XML-SIG] Release References: <200109242145.f8OLjrT01216@mira.informatik.hu-berlin.de> Message-ID: <3BAFE05C.7FFC7FC2@fourthought.com> "Martin v. Loewis" wrote: > > > Now that we have a 4Suite 0.11.1 release, what is missing for release > > 0.7 of PyXML? 
> > Mainly, the xpath and xslt modules need to be integrated into PyXML > once more, and need to be made to work even without any 4Suite. Good news and bad news here. Bad news: we've reorganized 4Suite to make it easier for 4Suite or 4SS users to install. This will probably complicate a migration of the latest code to PyXML. Good news: we're on a truly accelerated schedule for 4Suite 1.0. Therefore it will almost certainly be less than a couple of months before the fork is mended. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From martin@loewis.home.cs.tu-berlin.de Tue Sep 25 07:42:04 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 25 Sep 2001 08:42:04 +0200 Subject: [XML-SIG] Release In-Reply-To: <3BAFDCE9.DE2133EB@zolera.com> (message from Rich Salz on Mon, 24 Sep 2001 21:24:57 -0400) References: <200109242145.f8OLjrT01216@mira.informatik.hu-berlin.de> <3BAFDCE9.DE2133EB@zolera.com> Message-ID: <200109250642.f8P6g4H00864@mira.informatik.hu-berlin.de> > The XPath that's in PyXML doesn't need 4Suite, so it would be a matter > of scanning diffs and double-checking. > > I volunteer to re-integrate xpath. Please go ahead. Please have a look at README.4XPath, and update it where necessary: it should record all files that PyXML changes with respect to 4Suite. I think there still is the issue that the 4Suite authors want us to integrate XPathParserBase and XPathParser unmodified, whereas PyXML currently modifies them to use the YAPPS parser. It would be possible to integrate it unmodified (or leave it out altogether) if the API in xml.xpath (Evaluate, Compile) was used throughout. 
Regards, Martin From Alexandre.Fayolle@logilab.fr Tue Sep 25 08:17:11 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 25 Sep 2001 09:17:11 +0200 (CEST) Subject: [XML-SIG] pDomlette bug/non standard method? In-Reply-To: <3BAF50E5.68604AE7@fourthought.com> Message-ID: On Mon, 24 Sep 2001, Mike Olson wrote: > > Well, I'd say that this is part of the price to pay for the memory > > footprint decrease, and the speed improvement you get from switching from > > 4DOM to pDomlette (and it also explains the 'ette' at the end of > > pDomlette). > > I agree with you Alexandre, but this specific use case should work. Oh, great. Is this new in 4Suite 0.11.1? It used not to work when we ported Narval to pDomlette. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From pyxml@xhaus.com Mon Sep 24 21:57:06 2001 From: pyxml@xhaus.com (Alan Kernnedy) Date: Mon, 24 Sep 2001 21:57:06 +0100 Subject: [XML-SIG] pDomlette bug/non standard method? References: <3BAF50E5.68604AE7@fourthought.com> Message-ID: <3BAF9E22.94451316@xhaus.com> Mike Olson wrote: > Alan, what version of 4Suite are you using? Mike, I'm using version 0.11.0, on both Python 2.0 and 2.1, and getting these results. I ran this piece of code on both interpreters. The output follows. 
<--code------------------------------------------------> import sys print "Python version is:", sys.version print "PYTHONPATH is:", sys.path import Ft.Lib.pDomlette print "4Version is:", Ft.__version__ doc = "Some text" domTree = Ft.Lib.pDomlette.PyExpatReader().fromString(doc) textNode = domTree.documentElement.childNodes[0] print "textNode.data = \"%s\"" % textNode.data print "textNode.nodeValue = \"%s\"" % textNode.nodeValue <--code------------------------------------------------> Python 2.0 output --------------------------------------> Python version is: 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] PYTHONPATH is: ['', 'c:\\temp', 'c:\\python20\\dlls', 'c:\\python20\\lib', 'c:\\python20\\lib\\plat-win', 'c:\\python20\\lib\\lib-tk', 'c:\\python20'] 4Version is: 0.11.0 textNode.data = "Some text" textNode.nodeValue = "" Python 2.0 output --------------------------------------> Python 2.1 output --------------------------------------> Python version is: 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] PYTHONPATH is: ['', 'C:\\TEMP', 'C:\\python21\\DLLs', 'C:\\python21\\lib', 'C:\\python21\\lib\\plat-win', 'C:\\python21\\lib\\lib-tk', 'C:\\python21'] 4Version is: 0.11.0 textNode.data = "Some text" textNode.nodeValue = "" Python 2.1 output --------------------------------------> The reason why I checked paths is because the Python 2.0 installation had an old 4Suite 0.10.2 documentation directory. Although the doc directory remained behind, I'm sure that the 0.11.0 installation wrote over the old 0.10.2 Ft.Lib directory. I checked the two "pDomlette.py" files between the two different 0.11.0 installations, and there was no difference. I was a little confused to see in the pDomlette.py file that there are only two instances of the name "nodeValue", once setting it to an empty string in the Node __init__ method, and once using it to index self.__dict__ in the __setattr__ method of the Attribute class. 
With my limited knowledge of Python class and object behind-the-scenes manipulation (and the performance costs thereof), I expected to see use of the __getattr__ technique on the Node base class, so that it would be inherited by every derived dom class. Regards, Alan. From Alexandre.Fayolle@logilab.fr Tue Sep 25 09:43:41 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 25 Sep 2001 10:43:41 +0200 (CEST) Subject: [XML-SIG] pDomlette bug/non standard method? In-Reply-To: <3BAF9E22.94451316@xhaus.com> Message-ID: On Mon, 24 Sep 2001, Alan Kernnedy wrote: > Mike Olson wrote: > > > Alan, what version of 4Suite are you using? > > Mike, > > I'm using version 0.11.0, on both Python 2.0 and 2.1, and getting these results. > I ran this piece of code on both interpreters. The output follows. You should upgrade to 0.11.1 which has support for both accessors : >>> import sys >>> print "Python version is:", sys.version Python version is: 2.1 (#1, Jul 16 2001, 19:00:33) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] >>> print "PYTHONPATH is:", sys.path PYTHONPATH is: ['', '/home/alf/Narval', '/usr/lib/python2.1', '/usr/lib/python2.1/plat-linux2', '/usr/lib/python2.1/lib-tk', '/usr/lib/python2.1/lib-dynload', '/usr/lib/python2.1/site-packages'] >>> import Ft.Lib.pDomlette >>> print "4Version is:", Ft.__version__ 4Version is: 0.11.1 >>> doc = "Some text" >>> domTree = Ft.Lib.pDomlette.PyExpatReader().fromString(doc) >>> textNode = domTree.documentElement.childNodes[0] >>> print "textNode.data = \"%s\"" % textNode.data textNode.data = "Some text" >>> print "textNode.nodeValue = \"%s\"" % textNode.nodeValue textNode.nodeValue = "Some text" Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). 
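
For readers without 4Suite at hand, the equivalence of the two accessors can also be checked against the standard library's xml.dom.minidom instead of pDomlette. This is only a sketch in present-day Python 3, not the 4Suite code discussed above; the <doc> wrapper element is an assumption, since the markup of the original test document did not survive in the archive.

```python
from xml.dom import minidom

# Assumed test document: the original markup was lost in the archive,
# so <doc> is a stand-in element name.
dom_tree = minidom.parseString("<doc>Some text</doc>")
text_node = dom_tree.documentElement.childNodes[0]

# For text nodes the DOM defines nodeValue as the character data,
# so both accessors should report the same string.
print('textNode.data = "%s"' % text_node.data)
print('textNode.nodeValue = "%s"' % text_node.nodeValue)
```

With a DOM that follows the recommendation for text nodes, both lines should show the same content, which is the behaviour reported for 4Suite 0.11.1 above.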
From noreply@sourceforge.net Tue Sep 25 14:31:06 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 25 Sep 2001 06:31:06 -0700 Subject: [XML-SIG] [ pyxml-Bugs-464796 ] can't import xpath, windows python 2.1 Message-ID: Bugs item #464796, was opened at 2001-09-25 06:31 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=464796&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: can't import xpath, windows python 2.1 Initial Comment: I installed 4Suite-0_11_1_win32-py2_1.exe, and (presumably since there's no __init__.py in the _xmlplus directory), I can't import xml.xpath. Not being sure what tricky stuff the package does with the import mechanism, I'm stuck. >>> import _xmlplus Traceback (most recent call last): File "", line 1, in ? ImportError: No module named _xmlplus >>> import xml.xpath Traceback (most recent call last): File "", line 1, in ? ImportError: No module named xpath >>> John ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=464796&group_id=6473 From Mike.Olson@fourthought.com Tue Sep 25 17:03:36 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 25 Sep 2001 10:03:36 -0600 Subject: [XML-SIG] pDomlette bug/non standard method? References: Message-ID: <3BB0AAD8.6C44165B@fourthought.com> Alexandre Fayolle wrote: > > On Mon, 24 Sep 2001, Mike Olson wrote: > > > > Well, I'd say that this is part of the price to pay for the memory > > > footprint decrease, and the speed improvement you get from switching from > > > 4DOM to pDomlette (and it also explains the 'ette' at the end of > > > pDomlette). > > > > I agree with you Alexandre, but this specific use case should work. > > Oh, great. Is this new in 4Suite 0.11.1? 
It used not to work when we
> ported Narval to pDomlette.

Not sure when it was added. I think it was in 0.11.0, though.

Sadly, pDomlette is slowly losing its "ette" status.

Mike

>
> Alexandre Fayolle
> --
> LOGILAB, Paris (France).
> http://www.logilab.com http://www.logilab.fr http://www.logilab.org
> Narval, the first software agent available as free software (GPL).

--
Mike Olson                               Principal Consultant
mike.olson@fourthought.com               +1 303 583 9900 x 102
Fourthought, Inc.                        http://Fourthought.com
4735 East Walnut St,                     http://4Suite.org
Boulder, CO 80301-2537, USA
XML strategy, XML tools, knowledge management

From paul@boddie.net Tue Sep 25 18:28:43 2001
From: paul@boddie.net (paul@boddie.net)
Date: 25 Sep 2001 17:28:43 -0000
Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation
Message-ID: <20010925172843.27327.qmail@www2.nameplanet.com>

Hello,

I have been reading up about default namespaces in XML and it seems
that the handling of default namespaces and their lack of any
relationship with attributes can prove to be somewhat tiresome to deal
with.

For example, if I have a file like this...

...the "title" attribute has no namespace associated with it. However,
if I then have a file like this...

...the "title" attribute has the URI of the "special" namespace
associated with it.

Now I suppose I could have omitted the "special" prefix for "title",
but that would make the "ownership" of "title" application-specific,
wouldn't it? (Meaning that since no namespace is defined, it is up to
the application to find out what the attribute means.)

If we assume that "title" belongs to the "special" namespace as noted
in the second example, how can we conveniently note this in the first?
Is the "special" namespace implicitly set up (due to the nature of the
name of the element within which the default namespace is declared)?
Now, none of this is specifically relevant to 4DOM, but what I'm trying to do is to find a convenient way of handling both of the above styles (or conformant variants of them) in order to be able to detect permissible attributes in a document using the getAttributeNS method on an Element, without having to find the element's own namespace in cases where the attribute doesn't have one. How do people deal with these cases with 4DOM? Could we have more demonstrations in the examples directory which is distributed with it? The traceNS.py program seems to deliberately ignore the issue - some lines in the program address it but are commented out, at least in PyXML 0.6.5. Should anyone be thinking of extending the DOM to make cases like those above more easy to detect and work with? Or am I doing strange things with XML that would shock its inventors to the very core of their collective being? ;-) Regards, Paul -- Get your firstname@lastname email for FREE at http://Nameplanet.com/?su From Mike.Olson@fourthought.com Wed Sep 26 17:06:50 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 26 Sep 2001 10:06:50 -0600 Subject: [XML-SIG] Re: [4suite] from CNR - Italy References: <3BB1D74B.94492765@libero.it> Message-ID: <3BB1FD1A.55784830@fourthought.com> Carla Attianese wrote: > > As I told you, I installed all the 4SuiteServer prerequisites, and among > them I installed PyXML 0.6.6 too. > Anyway, I send you a file containing the entire error message I receive > when I type "4ss --help". I cc'ed this to the XML-SIG because it looks like a PyXml problem. What OS are you on? It looks like pyexpat did not get compilied/installed properly. Mike > Thank you > Carla Attianese > > ------------------------------------------------------------------------ > ceback (most recent call last): > File "/home/carla/Pythonexe/bin/4ss", line 3, in ? 
> from FtServer.Console import Commands > File "/home/carla/Pythonexe/lib/python2.0/site-packages/FtServer/Console/Commands/__init__.py", line 11, in ? > import Add > File "/home/carla/Pythonexe/lib/python2.0/site-packages/FtServer/Console/Commands/Add/__init__.py", line 4, in ? > import Member > File "/home/carla/Pythonexe/lib/python2.0/site-packages/FtServer/Console/Commands/Add/Member.py", line 17, in ? > from FtServer.Console import CommandUtil > File "/home/carla/Pythonexe/lib/python2.0/site-packages/FtServer/Console/CommandUtil.py", line 4, in ? > from FtServer.Core.Lib import ConfigFile > File "/home/carla/Pythonexe/lib/python2.0/site-packages/FtServer/Core/Lib/ConfigFile.py", line 2, in ? > from Ft.Rdf.Serializers.Dom import Serializer > File "/home/carla/Pythonexe/lib/python2.0/site-packages/Ft/Rdf/Serializers/Dom.py", line 27, in ? > from Ft.Lib import pDomlette > File "/home/carla/Pythonexe/lib/python2.0/site-packages/Ft/Lib/pDomlette.py", line 718, in ? > from pDomletteReader import * > File "/home/carla/Pythonexe/lib/python2.0/site-packages/Ft/Lib/pDomletteReader > .py", line 27, in ? > from xml.parsers import expat > File "/home/carla/Pythonexe/lib/python2.0/site-packages/_xmlplus/parsers/expat.py", line 4, in ? > from pyexpat import * > ImportError: libexpat.so.0: cannot open shared object file: File o directory inesistente -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From sam@webslingerZ.com Wed Sep 26 17:49:53 2001 From: sam@webslingerZ.com (Sam Brauer) Date: Wed, 26 Sep 2001 12:49:53 -0400 (EDT) Subject: [XML-SIG] sax2 parsing from a string In-Reply-To: Message-ID: Can someone give me a brief example showing how to create a namespace-aware sax2 parser and use it to parse a string containing an XML document? I'm having no luck so far... 
Here's the sort of thing I'm trying now: import cStringIO import xml.sax import xml.sax.handler import xml.sax.saxutils import xml.sax.xmlreader xmlstring = """ Hello World """ myhandler = xml.sax.saxutils.XMLGenerator() parser = xml.sax.make_parser() parser.setFeature(xml.sax.handler.feature_namespaces, 1) parser.setContentHandler(myhandler) inputsource = xml.sax.xmlreader.InputSource() inbuffer = cStringIO.StringIO() inbuffer.write(xmlstring) inbuffer.seek(0) inputsource.setByteStream(inbuffer) parser.parse(inputsource) parser.close() Here's the output I get: Traceback (most recent call last): File "./saxtest.py", line 25, in ? parser.parse(inputsource) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse xmlreader.IncrementalParser.parse(self, source) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 121, in parse self.feed(buffer) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed self._parser.Parse(data, isFinal) File "extensions/pyexpat.c", line 522, in CharacterData TypeError: not enough arguments; expected 4, got 2 If I replace the line: inputsource.setByteStream(inbuffer) with: inputsource.setCharacterStream(inbuffer) I get: Traceback (most recent call last): File "./saxtest.py", line 23, in ? parser.parse(inputsource) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 38, in parse source = saxutils.prepare_input_source(source) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/saxutils.py", line 369, in prepare_input_source if os.path.isfile(sysid): File "/usr/local/lib/python2.0/posixpath.py", line 192, in isfile st = os.stat(path) TypeError: stat, argument 1: expected string, None found I'm using PyXML-0.6.6 and Python 2.0. 
Also (on a tangent), I think in xml.sax.saxutils.XMLGenerator and xml.sax.saxutils.XMLFilterBase that the characters() and ignorableWhitespace() methods need to have 4 arguments instead of 2... For example: def characters(self, content, start, length): self._out.write(escape(content[start:start+length])) instead of: def characters(self, content): self._out.write(escape(content)) But I may be wrong... Thanks for any help, Sam From paul@boddie.net Wed Sep 26 17:48:29 2001 From: paul@boddie.net (paul@boddie.net) Date: 26 Sep 2001 16:48:29 -0000 Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation Message-ID: <20010926164829.4967.qmail@www2.nameplanet.com> Sorry about replying to my own posting but, after sending it just before leaving work yesterday, I realised that I should have found a more general forum and done some research there. It might, however, be interesting to others to post what I found, but I apologise for being somewhat off-topic. I managed to search some archives of the xml-dev@lists.xml.org mailing list (having thought that I might find some information on the W3C site, but didn't) and found a discussion which focuses on some examples which are almost identical in nature to the ones I provided. Here's a link to it: http://lists.xml.org/archives/xml-dev/200007/msg00505.html One of the more informative explanations appears in the following message: http://lists.xml.org/archives/xml-dev/200007/msg00638.html But, back to vague relevance for this list, does anyone have any hands-on experience with namespace processing along these lines? Might there be any plans in the PyXML world to possibly provide "helper methods" to find the meaning of attributes where such a meaning is inferred from the namespace of the parent element? Finally, is it reasonable to infer "meaning" in such a way? Are there any other likely interpretations for attributes without namespace prefixes? 
The reason for all these questions is that I really want to find out the definitive method of handling such issues in my XMLForms project: http://www.paul.boddie.net/Python/XMLForms/index.html Right now, I have to mandate that the XMLForms namespace be the default namespace in a document, because the above issue made it unclear how one should properly handle cases such as those I gave in my examples. It would be great, for a number of reasons, to remove this restriction. Paul -- Get your firstname@lastname email for FREE at http://Nameplanet.com/?su From Martin.v.Loewis@t-online.de Wed Sep 26 00:11:47 2001 From: Martin.v.Loewis@t-online.de (Martin v. Loewis) Date: Wed, 26 Sep 2001 01:11:47 +0200 Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation In-Reply-To: <20010925172843.27327.qmail@www2.nameplanet.com> (paul@boddie.net) References: <20010925172843.27327.qmail@www2.nameplanet.com> Message-ID: <200109252311.f8PNBlv03013@mira.informatik.hu-berlin.de> > Now I suppose I could have omitted the "special" prefix for "title", > but that would make the "ownership" of "title" application-specific, > wouldn't it? (Meaning that since no namespace is defined, it is up > to the application to find out what the attribute means.) Correct. > If we assume that "title" belongs to the "special" namespace as > noted in the second example, how can we conveniently note this in > the first? By means of a default namespace? You cannot. > Is the "special" namespace implicitly set up (due to the nature of > the name of the element within which the default namespace is > declared)? No, it isn't. Any such usage would be up to the application. Of course, since all attribute evaluation is up the application, this is not an issue. Very few attributes are defined that have a meaning independent of the element in which they occur (e.g. xlink introduces some of those); for such attributes, you always _must_ qualify the attribute with a namespace prefix. 
To put it the other way 'round, any attribute that does not live in a
well-understood namespace implicitly ought to be understood in the
context of the element only. If the attribute makes no sense in the
element (considering the namespace of the attribute if given), the
application ought to complain.

> Now, none of this is specifically relevant to 4DOM, but what I'm
> trying to do is to find a convenient way of handling both of the
> above styles (or conformant variants of them) in order to be able to
> detect permissible attributes in a document using the getAttributeNS
> method on an Element, without having to find the element's own
> namespace in cases where the attribute doesn't have one.

You need to know in advance whether you are expecting an attribute to
live in a namespace or not. Both foo and prefix:foo may be present,
and may mean different things.

> How do people deal with these cases with 4DOM? Could we have more
> demonstrations in the examples directory which is distributed with
> it? The traceNS.py program seems to deliberately ignore the issue -
> some lines in the program address it but are commented out, at least
> in PyXML 0.6.5. Should anyone be thinking of extending the DOM to
> make cases like those above more easy to detect and work with?

Thinking of extending DOM is probably worthwhile, especially as a
significant share of users thinks that DOM is broken as-is (they'd
phrase it as "broken beyond repair", but I'm more optimistic).

However, any thoughts on extending the DOM ought to be brought up to
the W3C editors of the DOM specification, with DOM being a W3C
recommendation (rather than a 4Suite invention).

> Or am I doing strange things with XML that would shock its inventors
> to the very core of their collective being? ;-)

Not necessarily. It rather seems that you are overly complicating
things, and that true XML gurus would say "Why don't you just ...".
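
The point that foo and prefix:foo are distinct attributes, and that a default namespace never applies to an unprefixed attribute, can be tried out with a short script. What follows is a sketch using the standard library's xml.dom.minidom in present-day Python 3 rather than 4DOM; the namespace URI, the element name "book" and the helper find_attribute are hypothetical and stand in for the examples that were lost from the original posting. The fallback order inside find_attribute is one possible application-specific policy, not something the Namespaces recommendation prescribes.

```python
from xml.dom import minidom

# Hypothetical namespace URI and element name, chosen for illustration only.
SPECIAL_NS = "http://example.org/special"

# Style 1: default namespace on the element, unprefixed attribute.
DEFAULT_NS_DOC = '<book xmlns="%s" title="Default"/>' % SPECIAL_NS
# Style 2: explicit prefix on both the element and the attribute.
PREFIXED_DOC = '<s:book xmlns:s="%s" s:title="Prefixed"/>' % SPECIAL_NS


def find_attribute(element, local_name):
    """Hypothetical helper implementing one application-specific policy:
    prefer an attribute qualified with the element's own namespace,
    then fall back to the unqualified attribute of the same local name."""
    if element.hasAttributeNS(element.namespaceURI, local_name):
        return element.getAttributeNS(element.namespaceURI, local_name)
    if element.hasAttributeNS(None, local_name):
        return element.getAttributeNS(None, local_name)
    return None


for source in (DEFAULT_NS_DOC, PREFIXED_DOC):
    root = minidom.parseString(source).documentElement
    # getAttributeNS returns an empty string when the attribute is absent.
    unqualified = root.getAttributeNS(None, "title")
    qualified = root.getAttributeNS(SPECIAL_NS, "title")
    print(repr(unqualified), repr(qualified), find_attribute(root, "title"))
```

In both documents the element itself lives in the "special" namespace, but only the second one carries a namespace-qualified title attribute; the helper hides that difference from the rest of the application, which is the kind of decision the application has to make for itself.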
Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Thu Sep 27 06:55:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 27 Sep 2001 07:55:24 +0200
Subject: [XML-SIG] sax2 parsing from a string
In-Reply-To: (message from Sam Brauer on Wed, 26 Sep 2001 12:49:53 -0400 (EDT))
References:
Message-ID: <200109270555.f8R5tOW01542@mira.informatik.hu-berlin.de>

> Can someone give me a brief example showing how to create a
> namespace-aware sax2 parser and use it to parse a string containing an
> XML document?

I see some confusing information in your message; perhaps you can help
make sense of it.

> parser = xml.sax.make_parser()
> parser.setFeature(xml.sax.handler.feature_namespaces, 1)
> parser.setContentHandler(myhandler)
> inputsource = xml.sax.xmlreader.InputSource()
> inbuffer = cStringIO.StringIO()
> inbuffer.write(xmlstring)
> inbuffer.seek(0)
> inputsource.setByteStream(inbuffer)
> parser.parse(inputsource)
> parser.close()

You don't need to close the parser if you use the .parse method; this
is only for use as an IncrementalParser (i.e. through feed).

> self._parser.Parse(data, isFinal)
> File "extensions/pyexpat.c", line 522, in CharacterData
> TypeError: not enough arguments; expected 4, got 2

I cannot reproduce this problem. Can you please find out what content
handler exactly you gave to the expat reader? It appears that you
somehow put in a character data handler that expects 4 arguments,
whereas pyexpat will only pass 2 of them. To find this out, please
print myhandler, and perhaps myhandler.characters.

> If I replace the line:
> inputsource.setByteStream(inbuffer)
>
> with:
> inputsource.setCharacterStream(inbuffer)
>
>
> I get:
> Traceback (most recent call last):

This is not so surprising: the character stream interface is inherited
from Java, but it doesn't work in Python (yet?).
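
As a brief illustration of the request above, here is a sketch of namespace-aware SAX2 parsing from a string. It assumes present-day Python 3 and the standard library's xml.sax rather than PyXML 0.6.6 on Python 2.0, so it is not a statement about how that release behaved. The handler class NSCollector, the function parse_string and the sample document are hypothetical names introduced for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.xmlreader import InputSource


class NSCollector(ContentHandler):
    """Hypothetical SAX2 handler that records element names and text."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElementNS(self, name, qname, attrs):
        # In namespace mode, name is a (namespace URI, local name) tuple.
        self.elements.append(name)

    def characters(self, content):
        # SAX2 signature: a single content argument, no start/length.
        self.text.append(content)


def parse_string(xml_text):
    """Parse an XML document held in a string with namespaces enabled."""
    handler = NSCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)

    source = InputSource()
    # Feed the parser bytes; the document is assumed to be UTF-8 encodable.
    source.setByteStream(io.BytesIO(xml_text.encode("utf-8")))
    parser.parse(source)  # no explicit close() needed after parse()
    return handler


result = parse_string(
    '<greeting xmlns="http://example.org/ns">Hello World</greeting>'
)
print(result.elements, "".join(result.text))
```

The handler deliberately overrides startElementNS rather than startElement, because with the namespaces feature switched on the reader reports elements through the NS variants of the callbacks.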
> Also (on a tangent), I think in xml.sax.saxutils.XMLGenerator and > xml.sax.saxutils.XMLFilterBase that the characters() and > ignorableWhitespace() methods need to have 4 arguments instead of 2... > > For example: > def characters(self, content, start, length): > self._out.write(escape(content[start:start+length])) No, they don't. A SAX2 characters handler has only a single content argument; it was SAX1 where you had start and length arguments. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Sep 27 07:08:39 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 27 Sep 2001 08:08:39 +0200 Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation In-Reply-To: <20010926164829.4967.qmail@www2.nameplanet.com> (paul@boddie.net) References: <20010926164829.4967.qmail@www2.nameplanet.com> Message-ID: <200109270608.f8R68d601647@mira.informatik.hu-berlin.de> > But, back to vague relevance for this list, does anyone have any > hands-on experience with namespace processing along these lines? > Might there be any plans in the PyXML world to possibly provide > "helper methods" to find the meaning of attributes where such a > meaning is inferred from the namespace of the parent element? > Finally, is it reasonable to infer "meaning" in such a way? It is certainly possible to infer the meaning of an attribute without namespace from the parent element. However, this cannot be done in a general way; it is always application specific. Please try to think of XML applications as processors: They want to achieve some goal (e.g. produce some output). For that, they need to gather information from the input document. If this information happens to live in attributes, they need to know which attributes they expect the information in; this includes the namespace of the attributes (if any), and the elements of which they are attributes. It may be that the processor also needs to establish the "correctness" of the input somehow. 
In that case, it would be reasonable to require that it "understands" all pieces of information. Then, it ought to complain about attributes it doesn't know in context of their element; a namespace on the attribute may help to ignore the context of the element. > Right now, I have to mandate that the XMLForms namespace be the > default namespace in a document, because the above issue made it > unclear how one should properly handle cases such as those I gave in > my examples. It would be great, for a number of reasons, to remove > this restriction. If you consider this desirable, you may want to reconsider whether namespaces are a good thing in the first place. In XML namespaces, you really have to have a namespace on each element. Whether this happens by means of a default namespace, or by means of tagging each element with a namespace, are your only options. If inconvenience is a reason: it is common to pick a single-letter prefix, e.g. "f" if you have to type it frequently. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Sep 27 06:40:18 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 27 Sep 2001 07:40:18 +0200 Subject: [XML-SIG] Re: [4suite] from CNR - Italy In-Reply-To: <3BB1FD1A.55784830@fourthought.com> (message from Mike Olson on Wed, 26 Sep 2001 10:06:50 -0600) References: <3BB1D74B.94492765@libero.it> <3BB1FD1A.55784830@fourthought.com> Message-ID: <200109270540.f8R5eIx01445@mira.informatik.hu-berlin.de> > What OS are you on? It looks like pyexpat did not get > compilied/installed properly. [...] > > ImportError: libexpat.so.0: cannot open shared object file: File o directory inesistente Do you have a libexpat.so.0 on your system? If so, you need to set your LD_LIBRARY_PATH to point to the directory containing it. Of course, it is confusing that pyexpat would link with a shared libexpat that cannot be found. This is most likely an error. To investigate it, we need to know in detail what you did to install PyXML. I.e. 
please report - what system you are using - what file you've downloaded to install PyXML (exact URL please) - what commands you've used in what order to install PyXML If you could give further clues (e.g. why it might think to use libexpat.so.0), don't hesitate to communicate them as well. Regards, Martin From paul@boddie.net Thu Sep 27 10:20:58 2001 From: paul@boddie.net (paul@boddie.net) Date: 27 Sep 2001 09:20:58 -0000 Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation Message-ID: <20010927092058.17444.qmail@www3.nameplanet.com> On Thu, 27 Sep 2001 08:08:39 +0200 "Martin v. Loewis" wrote: > >It is certainly possible to infer the meaning of an attribute without >namespace from the parent element. However, this cannot be done in a >general way; it is always application specific. I can see the occasional need for "annotating" elements with attributes which are not associated with the namespace of the elements in question. However, the specification seems to imply that all unprefixed attributes are always going to be application specific in some way. Personally, if I wanted to use attributes in this way, I would at least associate them with a namespace to indicate that they mean something well-defined. Of course, by not using unprefixed attributes at all, it does mean that one can know what an attribute means by a casual inspection of a document, but if default namespaces are in use then one cannot do the same for elements. I'm just not convinced by the need for this lack of consistency. [...] >> Right now, I have to mandate that the XMLForms namespace be the >> default namespace in a document, because the above issue made it >> unclear how one should properly handle cases such as those I gave in >> my examples. It would be great, for a number of reasons, to remove >> this restriction. > >If you consider this desirable, you may want to reconsider whether >namespaces are a good thing in the first place. 
In XML namespaces, you >really have to have a namespace on each element. Whether this happens >by means of a default namespace, or by means of tagging each element >with a namespace, are your only options. Namespaces *are* a good thing in my particular application since there would be no decent way of using elements from different document types otherwise. My confusion arose when considering the behaviour or meaning of default namespaces - since the default namespace applies to all unprefixed elements, and as a result an element cannot (or should not) occur without being associated with a particular namespace, it seems bizarre to the casual observer that a loophole exists which permits attributes to escape such constraints. >If inconvenience is a reason: it is common to pick a single-letter >prefix, e.g. "f" if you have to type it frequently. I'm not as concerned with convenience as with the need for the relationships between attributes and namespaces to be clear to me and to anyone who might want to use my application. It seems perverse that despite a sizeable specification (aren't all the W3C's specifications huge?) one only finds decent answers to fundamental questions in some mailing list archives after doing a Web search. Anyway, thanks for the feedback! Perhaps I should go and rant in more official circles about this. They have probably heard this a thousand times before, although I can't say I would be surprised about that. Paul -- Get your firstname@lastname email for FREE at http://Nameplanet.com/?su From jh@web.de Thu Sep 27 11:38:30 2001 From: jh@web.de (=?iso-8859-1?Q? J=FCrgen=20Hermann ?=) Date: Thu, 27 Sep 2001 12:38:30 +0200 Subject: [XML-SIG] sax2 parsing from a string Message-ID: <200109271038.f8RAcUh13970@mailgate5.cinetic.de> "Martin v. 
Loewis" schrieb am 27.09.01: > > self._parser.Parse(data, isFinal) > > File "extensions/pyexpat.c", line 522, in CharacterData > > TypeError: not enough arguments; expected 4, got 2 > > I cannot reproduce this problem. Can you please find out what content > handler exactly you gave to the expat reader? It appears that you > somehow put in a character data handler that expects 4 arguments, > whereas pyexpat will only pass 2 of them. That is a diff between a SAX and a SAX2 handler, i.e. he uses a SAX handler with the SAX2 parser. > No, they don't. A SAX2 characters handler has only a single content > argument; it was SAX1 where you had start and length arguments. ... as you already have mentioned. :) _______________________________________________________________________ 1.000.000 DM gewinnen - kostenlos tippen - http://millionenklick.web.de IhrName@web.de, 8MB Speicher, Verschluesselung - http://freemail.web.de From uche.ogbuji@fourthought.com Thu Sep 27 15:03:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 27 Sep 2001 08:03:34 -0600 Subject: [XML-SIG] Re: [4suite] from CNR - Italy References: <3BB1D74B.94492765@libero.it> <3BB1FD1A.55784830@fourthought.com> <200109270540.f8R5eIx01445@mira.informatik.hu-berlin.de> Message-ID: <3BB331B6.B4AB285C@fourthought.com> "Martin v. Loewis" wrote: > > > What OS are you on? It looks like pyexpat did not get > > compilied/installed properly. > [...] > > > ImportError: libexpat.so.0: cannot open shared object file: File o directory inesistente > > Do you have a libexpat.so.0 on your system? If so, you need to set > your LD_LIBRARY_PATH to point to the directory containing it. > > Of course, it is confusing that pyexpat would link with a shared > libexpat that cannot be found. This is most likely an error. To > investigate it, we need to know in detail what you did to install > PyXML. I.e. 
please report > > - what system you are using > - what file you've downloaded to install PyXML (exact URL please) > - what commands you've used in what order to install PyXML > > If you could give further clues (e.g. why it might think to use > libexpat.so.0), don't hesitate to communicate them as well. The funny thing is that on my Red Hat 7.1 system, it isn't "libpyexpat.so.0" that is built by PyXML, but rather /usr/local/lib/python2.1/site-packages/_xmlplus/parsers/pyexpat.so Looking at the code, I don't see why on earth it would be looking for "libpyexpat.so.0" on UNIX. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From sam@webslingerZ.com Thu Sep 27 15:49:32 2001 From: sam@webslingerZ.com (Sam Brauer) Date: Thu, 27 Sep 2001 10:49:32 -0400 (EDT) Subject: [XML-SIG] sax2 parsing from a string In-Reply-To: <200109270555.f8R5tOW01542@mira.informatik.hu-berlin.de> Message-ID: Thanks! This set me on the right path. I'd only done sax1 so far with PyXML, and did not realize that the characters() method had a different number of arguments. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sam Brauer : Systems Programmer : sam@webslingerZ.com On Thu, 27 Sep 2001, Martin v. Loewis wrote: > > Can someone give me a brief example showing how to create a > > namespace-aware sax2 parser and use it to parse a string containing an > > XML document? > > I see a number of confusing information in your message, perhaps you > can help making sense out of it. 
>
> > parser = xml.sax.make_parser()
> > parser.setFeature(xml.sax.handler.feature_namespaces, 1)
> > parser.setContentHandler(myhandler)
> > inputsource = xml.sax.xmlreader.InputSource()
> > inbuffer = cStringIO.StringIO()
> > inbuffer.write(xmlstring)
> > inbuffer.seek(0)
> > inputsource.setByteStream(inbuffer)
> > parser.parse(inputsource)
> > parser.close()
>
> You don't need to close the parser if you use the .parse method; this
> is only for use as an IncrementalParser (i.e. through feed).
>
> > self._parser.Parse(data, isFinal)
> > File "extensions/pyexpat.c", line 522, in CharacterData
> > TypeError: not enough arguments; expected 4, got 2
>
> I cannot reproduce this problem. Can you please find out what content
> handler exactly you gave to the expat reader? It appears that you
> somehow put in a character data handler that expects 4 arguments,
> whereas pyexpat will only pass 2 of them.
>
> To find this out, please print myhandler, and perhaps
> myhandler.characters.
>
> > If I replace the line:
> > inputsource.setByteStream(inbuffer)
> >
> > with:
> > inputsource.setCharacterStream(inbuffer)
> >
> >
> > I get:
> > Traceback (most recent call last):
>
> This is not so surprising: the character stream interface is inherited
> from Java, but it doesn't work in Python (yet?).
>
> > Also (on a tangent), I think in xml.sax.saxutils.XMLGenerator and
> > xml.sax.saxutils.XMLFilterBase that the characters() and
> > ignorableWhitespace() methods need to have 4 arguments instead of 2...
> >
> > For example:
> > def characters(self, content, start, length):
> >     self._out.write(escape(content[start:start+length]))
>
> No, they don't. A SAX2 characters handler has only a single content
> argument; it was SAX1 where you had start and length arguments.
>
> Regards,
> Martin
>

From Alexandre.Fayolle@logilab.fr Thu Sep 27 16:12:02 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 27 Sep 2001 17:12:02 +0200 (CEST)
Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation
In-Reply-To: <20010926164829.4967.qmail@www2.nameplanet.com>
Message-ID:

On 26 Sep 2001 paul@boddie.net wrote:

> But, back to vague relevance for this list, does anyone have any hands-on
> experience with namespace processing along these lines? Might there be any
> plans in the PyXML world to possibly provide "helper methods" to find the
> meaning of attributes where such a meaning is inferred from the namespace of
> the parent element? Finally, is it reasonable to infer "meaning" in such a way?
> Are there any other likely interpretations for attributes without namespace
> prefixes?

This will probably not fully answer your question, but what we use in
Narval's development branch to cope with this is code looking like:

attr_ns = attributeNode.namespaceURI or \
          attributeNode.ownerElement.namespaceURI

In other words, we use the element's nsuri if none is given for the
attribute. The major drawback to this approach is that you need to access
the attribute node itself.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).
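The fallback described in the message above can be tried out with a short sketch. This is only an illustration under stated assumptions: it uses the standard-library `xml.dom.minidom` on a current Python rather than 4DOM, and the helper name `effective_namespace`, the sample document, and the `example.org` namespace URIs are all made up for the example.

```python
from xml.dom import minidom

# Hypothetical sample document: a default namespace on the elements,
# one unprefixed attribute and one explicitly prefixed attribute.
XML = (
    '<form xmlns="http://example.org/forms" '
    'xmlns:x="http://example.org/other">'
    '<field name="age" x:hint="years"/>'
    '</form>'
)


def effective_namespace(attr):
    """Return the attribute's own namespace URI, else its owner element's.

    Hypothetical helper; this is an application-level convention,
    not something the Namespaces recommendation defines.
    """
    return attr.namespaceURI or attr.ownerElement.namespaceURI


doc = minidom.parseString(XML)
field = doc.getElementsByTagName("field")[0]
name_attr = field.getAttributeNode("name")
hint_attr = field.getAttributeNode("x:hint")

print(name_attr.namespaceURI)          # unprefixed attribute: no namespace
print(effective_namespace(name_attr))  # falls back to the element's namespace
print(effective_namespace(hint_attr))  # prefixed attribute keeps its own
```

As the message notes, the helper needs the attribute node itself, so code that only reads attribute values by name would have to fetch the node first.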
From paul@boddie.net Thu Sep 27 16:31:17 2001 From: paul@boddie.net (paul@boddie.net) Date: 27 Sep 2001 15:31:17 -0000 Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3Crecommendation Message-ID: <20010927153117.10075.qmail@www3.nameplanet.com> On Thu, 27 Sep 2001 17:12:02 +0200 (CEST) Alexandre Fayolle wrote: > >This will probably not fully answer your question, but what we use in >Narval's development branch to cope with this is code looking like: > >attr_ns = attributeNode.namespaceURI or \ > attributeNode.ownerElement.namespaceURI > >In other words, we use the element's nsuri if none is given for the >attribute. The major drawback to this approach is that you need to access >the attribute node itself. Yes, this is more or less what I supposed I would need to do. Indeed, I wouldn't need to use the ownerElement attribute, given that most of the time I would reference (or obtain a reference to) the attribute using the owner element's DOM object. Thanks, Paul -- Get your firstname@lastname email for FREE at http://Nameplanet.com/?su From jeremy.kloth@fourthought.com Thu Sep 27 17:05:11 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Thu, 27 Sep 2001 10:05:11 -0600 Subject: [XML-SIG] Re: [4suite] from CNR - Italy References: <3BB1D74B.94492765@libero.it> <3BB1FD1A.55784830@fourthought.com> <200109270540.f8R5eIx01445@mira.informatik.hu-berlin.de> <3BB331B6.B4AB285C@fourthought.com> Message-ID: <005401c1476e$320be6e0$7f01a8c0@thekloths.net> > "Martin v. Loewis" wrote: > > > > > What OS are you on? It looks like pyexpat did not get > > > compilied/installed properly. > > [...] > > > > ImportError: libexpat.so.0: cannot open shared object file: File o directory inesistente > > > > Do you have a libexpat.so.0 on your system? If so, you need to set > > your LD_LIBRARY_PATH to point to the directory containing it. > > > > Of course, it is confusing that pyexpat would link with a shared > > libexpat that cannot be found. 
This is most likely an error. To > > investigate it, we need to know in detail what you did to install > > PyXML. I.e. please report > > > > - what system you are using > > - what file you've downloaded to install PyXML (exact URL please) > > - what commands you've used in what order to install PyXML > > > > If you could give further clues (e.g. why it might think to use > > libexpat.so.0), don't hesitate to communicate them as well. > > The funny thing is that on my Red Hat 7.1 system, it isn't > "libpyexpat.so.0" that is built by PyXML, but rather > > /usr/local/lib/python2.1/site-packages/_xmlplus/parsers/pyexpat.so > > Looking at the code, I don't see why on earth it would be looking for > "libpyexpat.so.0" on UNIX. > Now, PyXML only builds pyexpat if the version of pyexpat included with Python is less than 2.39. And if I remember correctly, pyexpat from Python links against the expat libary, hence the failed libexpat.so.0. I don't have a copy of Python 2.0.1 around to check the version of pyexpat, however, in 2.1 it is 2.45. If they rolled changes in pyexpat into the bugfix release, this would account for this odd error. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com +1 303 583 9900 x 105 Fourthought, Inc. http://fourthought.com 4735 East Walnut St, Boulder, CO 80301, USA XML strategy, XML tools (http://4suite.org), knowledge management From uche.ogbuji@fourthought.com Thu Sep 27 17:36:57 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 27 Sep 2001 10:36:57 -0600 Subject: [XML-SIG] Re: [4suite] from CNR - Italy References: <3BB1D74B.94492765@libero.it> <3BB1FD1A.55784830@fourthought.com> <200109270540.f8R5eIx01445@mira.informatik.hu-berlin.de> <3BB331B6.B4AB285C@fourthought.com> <005401c1476e$320be6e0$7f01a8c0@thekloths.net> Message-ID: <3BB355A9.C6329570@fourthought.com> Jeremy Kloth wrote: > Now, PyXML only builds pyexpat if the version of pyexpat included with > Python is less than 2.39. 
And if I remember correctly, pyexpat from Python
> links against the expat library, hence the failed libexpat.so.0.
>
> I don't have a copy of Python 2.0.1 around to check the version of pyexpat,
> however, in 2.1 it is 2.45. If they rolled changes in pyexpat into the
> bugfix release, this would account for this odd error.

Thanks, Jeremy.

Carla,

Try adding the line "return 1" after line 48 in setup.py of the PyXML
package. You should then have:

def should_build_pyexpat():
    return 1

Then re-install PyXML and let us know whether this fixes the problem.

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Boulder, CO 80301-2537, USA
XML strategy, XML tools (http://4Suite.org), knowledge management

From Juergen Hermann

NEW: Locator Support for ContentHandler (tested), DTDHandler (untested)

EntityResolver and LexicalHandler are almost finished.

Ciao, Jürgen

From Juergen Hermann

Hi!

The following patch changes test_sax.py so that it tests ANY SAX
driver (i.e. the current driver according to PY_SAX_PARSER etc.).
The only tests that remain expat-specific are those for
incremental parsing. expat passes the changed tests.

Comments? If anyone objects to the changes in place, I'll add
the patched module as "test_sax2.py".

Index: test_sax.py =================================================================== RCS file: /cvsroot/pyxml/xml/test/test_sax.py,v retrieving revision 1.6 diff -u -r1.6 test_sax.py --- test_sax.py 2001/08/07 19:31:24 1.6 +++ test_sax.py 2001/09/27 20:51:26 @@ -1,15 +1,16 @@ # regression test for SAX 2.0 # $Id: test_sax.py,v 1.6 2001/08/07 19:31:24 fdrake Exp $ -from xml.sax import make_parser, ContentHandler, \ +from xml.sax import handler, make_parser, ContentHandler, \ SAXException, SAXReaderNotAvailable, SAXParseException try: make_parser() except SAXReaderNotAvailable: # don't try to test this module if we cannot create a parser raise ImportError("no XML parsers available") -from xml.sax.saxutils import XMLGenerator, escape, quoteattr, XMLFilterBase -from xml.sax.expatreader import create_parser +from xml.sax.saxutils import XMLGenerator, escape, quoteattr, XMLFilterBase, Location +from xml.sax import expatreader +from xml.sax.sax2exts import make_parser from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl from cStringIO import StringIO from test.test_support import verbose, TestFailed, findfile @@ -223,7 +224,7 @@ # ===== XMLReader support def test_expat_file(): - parser = create_parser() + parser = make_parser() result = StringIO() xmlgen = XMLGenerator(result) @@ -247,16 +248,15 @@ self._entities.append((name, publicId, systemId, ndata)) def test_expat_dtdhandler(): - parser = create_parser() + parser = make_parser() handler = TestDTDHandler() parser.setDTDHandler(handler) - parser.feed('\n') - parser.feed(' \n') - parser.feed(']>\n') - parser.feed('') - parser.close() + parser.parse(StringIO(''' + +]> +''')) return handler._notations == [("GIF", "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN", None)] and \ handler._entities == [("img", None, "expat.gif", "GIF")] @@ -271,16 +271,15 @@ return inpsrc def test_expat_entityresolver(): - parser = create_parser() + parser = make_parser() 
parser.setEntityResolver(TestEntityResolver()) result = StringIO() parser.setContentHandler(XMLGenerator(result)) - parser.feed('\n') - parser.feed(']>\n') - parser.feed('&test;') - parser.close() + parser.parse(StringIO(''' +]> +&test;''')) return result.getvalue() == start + "" @@ -295,42 +294,40 @@ self._attrs = attrs def test_expat_attrs_empty(): - parser = create_parser() + parser = make_parser() gather = AttrGatherer() parser.setContentHandler(gather) - parser.feed("") - parser.close() + parser.parse(StringIO("")) return verify_empty_attrs(gather._attrs) def test_expat_attrs_wattr(): - parser = create_parser() + parser = make_parser() gather = AttrGatherer() parser.setContentHandler(gather) - parser.feed("") - parser.close() + parser.parse(StringIO("")) return verify_attrs_wattr(gather._attrs) def test_expat_nsattrs_empty(): - parser = create_parser(1) + parser = make_parser() + parser.setFeature(handler.feature_namespaces, 1) gather = AttrGatherer() parser.setContentHandler(gather) - parser.feed("") - parser.close() + parser.parse(StringIO("")) return verify_empty_nsattrs(gather._attrs) def test_expat_nsattrs_wattr(): - parser = create_parser(1) + parser = make_parser() + parser.setFeature(handler.feature_namespaces, 1) gather = AttrGatherer() parser.setContentHandler(gather) - parser.feed("" % ns_uri) - parser.close() + parser.parse(StringIO("" % ns_uri)) attrs = gather._attrs @@ -352,7 +349,7 @@ xml_test_out = open(findfile("test.xml.out")).read() def test_expat_inpsource_filename(): - parser = create_parser() + parser = make_parser() result = StringIO() xmlgen = XMLGenerator(result) @@ -362,7 +359,7 @@ return result.getvalue() == xml_test_out def test_expat_inpsource_sysid(): - parser = create_parser() + parser = make_parser() result = StringIO() xmlgen = XMLGenerator(result) @@ -372,7 +369,7 @@ return result.getvalue() == xml_test_out def test_expat_inpsource_stream(): - parser = create_parser() + parser = make_parser() result = StringIO() xmlgen = 
XMLGenerator(result) @@ -388,7 +385,7 @@ def test_expat_incremental(): result = StringIO() xmlgen = XMLGenerator(result) - parser = create_parser() + parser = expatreader.create_parser() parser.setContentHandler(xmlgen) parser.feed("") @@ -400,7 +397,7 @@ def test_expat_incremental_reset(): result = StringIO() xmlgen = XMLGenerator(result) - parser = create_parser() + parser = expatreader.create_parser() parser.setContentHandler(xmlgen) parser.feed("") @@ -420,29 +417,36 @@ # ===== Locator support +class LocatorTest(XMLGenerator): + def __init__(self, out=None, encoding="iso-8859-1"): + XMLGenerator.__init__(self, out, encoding) + self.location = None + + def endDocument(self): + XMLGenerator.endDocument(self) + self.location = Location(self._locator) + def test_expat_locator_noinfo(): result = StringIO() - xmlgen = XMLGenerator(result) - parser = create_parser() + xmlgen = LocatorTest(result) + parser = make_parser() parser.setContentHandler(xmlgen) - parser.feed("") - parser.feed("") - parser.close() + parser.parse(StringIO("")) - return parser.getSystemId() is None and \ - parser.getPublicId() is None and \ - parser.getLineNumber() == 1 + return xmlgen.location.getSystemId() is None and \ + xmlgen.location.getPublicId() is None and \ + xmlgen.location.getLineNumber() == 1 def test_expat_locator_withinfo(): result = StringIO() - xmlgen = XMLGenerator(result) - parser = create_parser() + xmlgen = LocatorTest(result) + parser = make_parser() parser.setContentHandler(xmlgen) parser.parse(findfile("test.xml")) - return parser.getSystemId() == findfile("test.xml") and \ - parser.getPublicId() is None + return xmlgen.location.getSystemId() == findfile("test.xml") and \ + xmlgen.location.getPublicId() is None # =========================================================================== @@ -452,7 +456,7 @@ # =========================================================================== def test_expat_inpsource_location(): - parser = create_parser() + parser = make_parser() 
parser.setContentHandler(ContentHandler()) # do nothing source = InputSource() source.setByteStream(StringIO("")) #ill-formed @@ -464,7 +468,7 @@ return e.getSystemId() == name def test_expat_incomplete(): - parser = create_parser() + parser = make_parser() parser.setContentHandler(ContentHandler()) # do nothing try: parser.parse(StringIO("")) @@ -620,7 +624,7 @@ # ===== Main program def make_test_output(): - parser = create_parser() + parser = make_parser() result = StringIO() xmlgen = XMLGenerator(result) From martin@loewis.home.cs.tu-berlin.de Fri Sep 28 06:02:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 28 Sep 2001 07:02:49 +0200 Subject: [XML-SIG] Re: [4suite] from CNR - Italy In-Reply-To: <3BB331B6.B4AB285C@fourthought.com> (message from Uche Ogbuji on Thu, 27 Sep 2001 08:03:34 -0600) References: <3BB1D74B.94492765@libero.it> <3BB1FD1A.55784830@fourthought.com> <200109270540.f8R5eIx01445@mira.informatik.hu-berlin.de> <3BB331B6.B4AB285C@fourthought.com> Message-ID: <200109280502.f8S52nu01133@mira.informatik.hu-berlin.de> > > > > ImportError: libexpat.so.0: cannot open shared object file: File o directory inesistente [...] > The funny thing is that on my Red Hat 7.1 system, it isn't > "libpyexpat.so.0" that is built by PyXML Notice that it wasn't libpyexpat.so.0 that was missing, either, it was libexpat.so.0 - which might have been the expat library proper. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Sep 28 05:59:08 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Fri, 28 Sep 2001 06:59:08 +0200
Subject: [XML-SIG] Re: [4suite] from CNR - Italy
In-Reply-To: <005401c1476e$320be6e0$7f01a8c0@thekloths.net> (jeremy.kloth@fourthought.com)
References: <3BB1D74B.94492765@libero.it> <3BB1FD1A.55784830@fourthought.com> <200109270540.f8R5eIx01445@mira.informatik.hu-berlin.de> <3BB331B6.B4AB285C@fourthought.com> <005401c1476e$320be6e0$7f01a8c0@thekloths.net>
Message-ID: <200109280459.f8S4x8t01101@mira.informatik.hu-berlin.de>

> Now, PyXML only builds pyexpat if the version of pyexpat included with
> Python is less than 2.39. And if I remember correctly, pyexpat from Python
> links against the expat library, hence the failed libexpat.so.0.
>
> I don't have a copy of Python 2.0.1 around to check the version of pyexpat,
> however, in 2.1 it is 2.45. If they rolled changes in pyexpat into the
> bugfix release, this would account for this odd error.

Not really. If Python's pyexpat was unimportable due to the missing
libexpat.so, importing it would have failed when the PyXML setup was
run. In that case, pyexpat would have been built in PyXML anyway.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Fri Sep 28 06:34:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 28 Sep 2001 07:34:24 +0200
Subject: [XML-SIG] Patch: Generic test_sax.py
In-Reply-To: (jh@web.de)
References:
Message-ID: <200109280534.f8S5YOf01263@mira.informatik.hu-berlin.de>

> The following patch changes test_sax.py so that it tests ANY SAX
> driver (i.e. the current driver according to PY_SAX_PARSER etc.).
> The only thing remaining expat-specific are the tests for
> incremental parsing. expat passes the changed tests.
>
> Comments? If anyone objects to the changes in place, I'll add
> the patched module as "test_sax2.py".

test_sax.py is a file shared with Python (more or less); please see
test.test_sax. It appears that your patch would prohibit
synchronization with Python. If you can fix that (i.e.
test_sax continues to operate in a Python-without-PyXML installation),
feel free to apply the patch (and I'll forward it into Python). If
not, you'd need to create a second test case.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Fri Sep 28 06:29:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 28 Sep 2001 07:29:49 +0200
Subject: [XML-SIG] Default namespaces, attributes, 4DOM and the W3C recommendation
In-Reply-To: <20010927092058.17444.qmail@www3.nameplanet.com> (paul@boddie.net)
References: <20010927092058.17444.qmail@www3.nameplanet.com>
Message-ID: <200109280529.f8S5TnE01226@mira.informatik.hu-berlin.de>

> I can see the occasional need for "annotating" elements with
> attributes which are not associated with the namespace of the
> elements in question.

Attributes without a namespace are the rule in a namespace
application, not the exception. As explained in A.2, attribute names
are either global, or in a per-element partition. The per-element
partition is by far more typical than the global one; most attributes
make sense only in the context of their element. For an example,
notice how XHTML *only* has attributes in the per-element partitions,
and no global attributes.

> My confusion arose when considering the behaviour or meaning of
> default namespaces - since the default namespace applies to all
> unprefixed elements, and as a result an element cannot (or should
> not) occur without being associated with a particular namespace, it
> seems bizarre to the casual observer that a loophole exists which
> permits attributes to escape such constraints.

That is on purpose. E.g. what use would an xhtml:border attribute be?
It is meaningful for tables, but not, say, for anchors.

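The distinction drawn above can be observed directly with a namespace-aware SAX2 parser reading from an in-memory document. The following is a minimal sketch, assuming a current Python with the standard-library `xml.sax` (not PyXML); the class name `NameCollector` and the tiny sample document are invented for the illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Sample document: the default namespace applies to both elements,
# while the unprefixed "border" attribute stays outside any namespace.
DOC = b'<html xmlns="http://www.w3.org/1999/xhtml"><table border="1"/></html>'


class NameCollector(ContentHandler):
    """Hypothetical handler that records element and attribute names."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.attributes = []

    def startElementNS(self, name, qname, attrs):
        # In namespace mode, names arrive as (namespace URI, local name).
        self.elements.append(name)
        self.attributes.extend(attrs.keys())


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
collector = NameCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOC))

print(collector.elements)    # both elements carry the XHTML namespace URI
print(collector.attributes)  # the attribute's namespace part is None
```

An application that wants the per-element reading has to combine the attribute's local name with the element name it arrived on; the parser does not do that for it.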
Regards, Martin From Peter.Frey@haufe.de Fri Sep 28 06:42:32 2001 From: Peter.Frey@haufe.de (Peter.Frey@haufe.de) Date: Fri, 28 Sep 2001 07:42:32 +0200 Subject: [XML-SIG] PyXML-0.6.6 Installation and Python 2.1 Message-ID: <08231599A3E6E64CAE8FA66619243E4001993D9A@VG100EXCH> This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. --------------InterScan_NT_MIME_Boundary Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C147E0.5E482F80" ------_=_NextPart_001_01C147E0.5E482F80 Content-Type: text/plain; charset="iso-8859-1" Hi, I just downloaded PyXML from sourceforge.net this morning, namely PyXML-0.6.6.win32-py2.1.exe as I am using ActivePython 2.1, build 211 In the Setup Dialog that pops up when starting installation it states: XML Parsers and API for Python This version of PyXML was tested with Python 2.0 and 1.5.2. Python 2.0??? I need the version for Python 2.1! Where can I get it from? Or is PyXML 0.6.6 not yet tested/released for Python 2.1? Peter Frey Dipl. Informatik-Ing. ETH Rudolf Haufe Verlag Dept. Electronic Publishing phone: +49 761/3683-576 fax +49 761/3683-820-576 peter.frey@haufe.de ------_=_NextPart_001_01C147E0.5E482F80 Content-Type: text/html; charset="iso-8859-1" PyXML-0.6.6 Installation and Python 2.1

------_=_NextPart_001_01C147E0.5E482F80--
--------------InterScan_NT_MIME_Boundary--

From Alexandre.Fayolle@logilab.fr Fri Sep 28 08:09:31 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Fri, 28 Sep 2001 09:09:31 +0200 (CEST)
Subject: [XML-SIG] PyXML-0.6.6 Installation and Python 2.1
In-Reply-To: <08231599A3E6E64CAE8FA66619243E4001993D9A@VG100EXCH>
Message-ID:

On Fri, 28 Sep 2001 Peter.Frey@haufe.de wrote:

> Python 2.0??? I need the version for Python 2.1!
> Where can I get it from?
> Or is PyXML 0.6.6 not yet tested/released for Python 2.1?

It is released for Python 2.1 (the name of the package is a strong
indication, even if the message is misleading), and it works perfectly
well for me on win2k + python2.1. Now remember that this is free
software, which comes with no warranty.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From martin@loewis.home.cs.tu-berlin.de Fri Sep 28 08:14:02 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 28 Sep 2001 09:14:02 +0200
Subject: [XML-SIG] PyXML-0.6.6 Installation and Python 2.1
In-Reply-To: <08231599A3E6E64CAE8FA66619243E4001993D9A@VG100EXCH> (Peter.Frey@haufe.de)
References: <08231599A3E6E64CAE8FA66619243E4001993D9A@VG100EXCH>
Message-ID: <200109280714.f8S7E2604983@mira.informatik.hu-berlin.de>

> I just downloaded PyXML from sourceforge.net this morning, namely
> PyXML-0.6.6.win32-py2.1.exe
> as I am using ActivePython 2.1, build 211
>
> In the Setup Dialog that pops up when starting installation it states:
>
> XML Parsers and API for Python
> This version of PyXML was tested with Python 2.0 and 1.5.2.
>
> Python 2.0??? I need the version for Python 2.1!

Sorry for the confusion. You got the version for Python 2.1.

> Or is PyXML 0.6.6 not yet tested/released for Python 2.1?

It is released, and it even has been tested.
I just forgot to update that particular string.

Regards,
Martin

From Alexandre.Fayolle@logilab.fr Fri Sep 28 09:14:46 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Fri, 28 Sep 2001 10:14:46 +0200 (CEST)
Subject: [XML-SIG] [ANN] xmltools 1.3.4
Message-ID:

Logilab has released xmltools 1.3.4.

Python XmlTools is a set of high-level tools to help with using XML in
Python. It relies heavily on PyXML and 4Suite to access XML resources.
Right now it features two pyGTK widgets: XmlTree and XmlEditor, which
can respectively display and edit an XML document in a graphical
fashion. Both widgets are used in the NARVAL project.

The main focus of this release is fixing an encoding handling bug that
would cause the application to crash on Windows machines.

You can find information about xmltools on
http://www.logilab.org/xmltools/

The download site is ftp://ftp.logilab.org/pub/xmltools/

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From Juergen Hermann

Hi!

Now that the SAX driver passes the tests, I made some ad-hoc
measurements (feeding hamlet.xml to xml.sax.saxutils.DefaultHandler).

PIRXX is roughly 4.5 times faster than xmlproc, and 6 times faster in
validating mode. Compared to expat, times are nearly equal
(non-validating only, of course).

Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/
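An ad-hoc measurement of the kind described in the last message might look roughly like the sketch below. It rests on several assumptions: it runs on a current Python with the standard-library `xml.sax` (whose default driver is expat), a no-op `ContentHandler` stands in for PyXML's `DefaultHandler`, and a synthetic in-memory document stands in for hamlet.xml. The function name `time_parse` is invented, and the sketch makes no claim about the figures quoted above.

```python
import time
import xml.sax
from xml.sax.handler import ContentHandler


def time_parse(data, repeats=5):
    """Return the best wall-clock time in seconds over several parses.

    Hypothetical helper: parses `data` with a do-nothing handler so that
    only the driver's own work is measured.
    """
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        xml.sax.parseString(data, ContentHandler())
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


# Synthetic stand-in for a larger test document.
sample = b"<play>" + b"<line>To be, or not to be</line>" * 1000 + b"</play>"
best = time_parse(sample)
print("best of 5 runs: %.6f s" % best)
```

Taking the best of several runs reduces noise from other activity on the machine; comparing drivers would mean repeating the measurement with a parser obtained from `xml.sax.make_parser` and a driver list.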