From Mike.Olson@fourthought.com Sun Oct 1 21:06:44 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 01 Oct 2000 14:06:44 -0600 Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests References: <200009300152.TAA12572@localhost.localdomain> Message-ID: <39D79954.342F7478@FourThought.com> uche.ogbuji@fourthought.com wrote: > > > On Fri, Sep 29, 2000 at 11:22:54PM +0200, Martin v. Loewis wrote: > > >The missing piece for a 0.6.1 release is the test suite. Specifically, > > >the dom tests are not executed, as xml.dom.core is not available > > >anymore. Should these tests be ported to 4DOM? > > > > I vaguely recall that someone at FourThought once asked me if that > > would be OK, but don't know if anyone actually did it. It would be a > > good idea to port them, since they made some attempt at being > > exhaustive (trying the various error cases, etc.). > > Yes, we did check in the test suites, but as Martin points out, they're broken > because they use the now defunct TraceOut module (that's the "Ft" stuff). Mike > has agreed to tackle some of the task of removing traceouts tonight so I'll > ask him if he can start with the 4DOM test suites. If so, we'll check in the > fix tomorrow. All of the traceout stuff has been removed. There still is the problem of our traceout library. I suppose we can install it to 2 locations so that xml.dom is not dependent on Ft. I'll work on that today so it should be in the next snapshot. Mike > > I'll also move the demos as discussed. > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From clarence@netlojix.com Sun Oct 1 21:52:42 2000 From: clarence@netlojix.com (Clarence Gardner) Date: Sun, 1 Oct 2000 13:52:42 -0700 Subject: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought Message-ID: <20001001135242.B29839@liberty.sba2.netlojix.net> Context: I'm using PyXML-0.5.5.1, and interestingly, I never compiled any of the C code; I just use the .py and it seems to work fine. So I'm storing some arbitrary textual data under an arbitrarily-named element node. My test code created the xml and dumped it to a file, where it looked like this: This document is read and updated, and I noticed that each time I added a new username (i.e., read the xml source, inserted a new username node via DOM, and wrote back to the file), the previous ones changed from CDATA to TEXT. This seemed like a bug to me. I thought I would see what would happen if I added a username of "". The first time, it appeared in the file as ]]> as expected, then after one more addition, it was now <markup test> . But now I see that, if I read that document, the username has not one TEXT child, but three ('<', 'markup test', and '>'). Does all this seem right to people? That last implies, of course, that in order to get what I expect to be the text value of a node, I actually have to get all of the text children and concatenate their values. Which would seem to be a problem if (I haven't tried this) I originally stored two separate text children of the username node, because this would cause them to be merged into one. 
-- Clarence Gardner Software Engineer NetLojix Communications clarence@netlojix.com From martin@loewis.home.cs.tu-berlin.de Sun Oct 1 23:05:50 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 2 Oct 2000 00:05:50 +0200 Subject: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought In-Reply-To: <20001001135242.B29839@liberty.sba2.netlojix.net> (message from Clarence Gardner on Sun, 1 Oct 2000 13:52:42 -0700) References: <20001001135242.B29839@liberty.sba2.netlojix.net> Message-ID: <200010012205.AAA00753@loewis.home.cs.tu-berlin.de> > Does all this seem right to people? I can't see a problem here. AFAIK, ]]> is really equivalent to <markup test> from an XML point of view. You did not say exactly how this reading or writing was achieved (SAX, DOM, something else). Whatever procedure was used to write this, I guess it should be possible to change it so that Text nodes (in the DOM sense) come out as CDATA sections - if that is really a requirement. Regards, Martin From clarence@netlojix.com Sun Oct 1 23:33:03 2000 From: clarence@netlojix.com (Clarence Gardner) Date: Sun, 1 Oct 2000 15:33:03 -0700 Subject: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought] Message-ID: <20001001153303.E29839@liberty.sba2.netlojix.net> From: "Martin v. Loewis" >I can't see a problem here. AFAIK, ]]> is really >equivalent to <markup test> from an XML point of view. Sure. I did that test to make sure it didn't change it to a text node containing the literal "". But what really got me was that [text] got changed to []. On further reflection, I can see that my previous concern about two original TEXT children of was nonsensical (if they were really distinct, they should be elements), but nonetheless, the lesson about having to concatenate all TEXT children to get the original text value seems to be true. 
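Clarence's conclusion, that the text value of an element is the concatenation of all its character-data children, can be sketched with Python's standard xml.dom.minidom. This is a modern stand-in, not the PyXML 0.5.5.1 API used in the thread. The helper joins Text and CDATA children, so the CDATA form and the escaped form yield the same string, which is Martin's point about their equivalence. The element name `username` is illustrative, since the original markup did not survive in this archive.

```python
from xml.dom import minidom


def text_value(element):
    """Return the concatenated character data of element's direct children.

    A parser may deliver one run of text as several adjacent nodes, and a
    CDATA section is a separate node type, so reading only firstChild.data
    can return just a fragment.
    """
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Hypothetical element name; the markup from the original post was lost.
as_cdata = minidom.parseString("<username><![CDATA[<markup test>]]></username>")
as_escaped = minidom.parseString("<username>&lt;markup test&gt;</username>")

print(text_value(as_cdata.documentElement))    # <markup test>
print(text_value(as_escaped.documentElement))  # <markup test>
```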
> >You did not say exactly how this reading or writing was achieved (SAX, >DOM, something else). Whatever procedure was used to write this, I >guess it should be possible to change it so that Text nodes (in the >DOM sense) come out as CDATA sections - if that is really a >requirement. It's not that I have a love affair with CDATA, I just wanted to be sure that arbitrary text wouldn't cause a problem. I'm doing what is presumably the vanilla use of the package: p = saxexts.make_parser() dh = SaxBuilder() p.setDocumentHandler(dh) p.parseFile(f) ... dh.doctoxml() -- Clarence Gardner Software Engineer NetLojix Communications clarence@netlojix.com From m.favas@per.dem.csiro.au Mon Oct 2 00:20:52 2000 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Mon, 02 Oct 2000 07:20:52 +0800 Subject: [XML-SIG] distutils bug with PyXML 0.6.1, Python 2.0b2 (CVS) Message-ID: <39D7C6D4.FBF17094@per.dem.csiro.au> With the current (Oct 2) CVS versions of PyXML and Python (2), running "python setup.py install" produces the following glitch: copies all the relevant files to /usr/local/lib/python2.0/site-packages/_xmlplus and then tries to compile them all. Unfortunately, it tries to byte-compile sgmlop.so, leading to the traceback below. Is this a PyXML mis-setup of setup.py or a distutils (Python core version) bug? byte-compiling /usr/local/lib/python2.0/site-packages/_xmlplus/parsers/xmlproc/xmlutils.py to xmlutils.pyc byte-compiling /usr/local/lib/python2.0/site-packages/_xmlplus/parsers/xmlproc/xmlval.py to xmlval.pyc Traceback (most recent call last): File "setup.py", line 94, in ?
ext_modules = ext_modules File "/usr/local/lib/python2.0/distutils/core.py", line 138, in setup dist.run_commands() File "/usr/local/lib/python2.0/distutils/dist.py", line 829, in run_commands self.run_command(cmd) File "/usr/local/lib/python2.0/distutils/dist.py", line 849, in run_command cmd_obj.run() File "/usr/local/lib/python2.0/distutils/command/install.py", line 470, in run self.run_command(cmd_name) File "/usr/local/lib/python2.0/distutils/cmd.py", line 328, in run_command self.distribution.run_command(command) File "/usr/local/lib/python2.0/distutils/dist.py", line 849, in run_command cmd_obj.run() File "/usr/local/lib/python2.0/distutils/command/install_lib.py", line 61, in run self.bytecompile(outfiles) File "/usr/local/lib/python2.0/distutils/command/install_lib.py", line 88, in bytecompile verbose=self.verbose, dry_run=self.dry_run) File "/usr/local/lib/python2.0/distutils/util.py", line 381, in byte_compile raise ValueError, \ ValueError: invalid filename: '/usr/local/lib/python2.0/site-packages/_xmlplus/parsers/sgmlop.so' doesn't end with '.py' -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From martin@loewis.home.cs.tu-berlin.de Mon Oct 2 00:33:12 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 2 Oct 2000 01:33:12 +0200 Subject: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought] In-Reply-To: <20001001153303.E29839@liberty.sba2.netlojix.net> (message from Clarence Gardner on Sun, 1 Oct 2000 15:33:03 -0700) References: <20001001153303.E29839@liberty.sba2.netlojix.net> Message-ID: <200010012333.BAA00983@loewis.home.cs.tu-berlin.de> > On further reflection, I can see that my previous concern about two > original TEXT children of was nonsensical (if they were > really distinct, they should be elements), but nonetheless, the > lesson about having to concatenate all TEXT children to get the > original text value seems to be true. I think you have a point on splitting a text fragment into multiple Text nodes; the DOM spec says about the interface Text: # If there is no markup inside an element's content, the text is # contained in a single object implementing the Text interface that is # the only child of the element. If there is markup, it is parsed into # a list of elements and Text nodes that form the list of children of # the element. # When a document is first made available via the DOM, there is only # one Text node for each block of text. Users may create adjacent Text # nodes that represent the contents of a given element without any # intervening markup, but should be aware that there is no way to # represent the separations between these nodes in XML or HTML, so # they will not (in general) persist between DOM editing sessions. The # normalize() method on Element [p.38] merges any such adjacent Text # objects into a single node for each block of text; this is # recommended before employing operations that depend on a particular # document structure, such as navigation with XPointers. 
[from REC-DOM-Level-1-19981001] I'm not sure what that means for parsing &lt;hallo&gt; - is it permitted that these are split into three Text nodes, is it required that they are split, or is it disallowed? Section 2.4 of XML 1.0 [REC-xml-19980210] says that an entity reference is markup; 4.1 says that &gt; is an entity reference (*not* a character reference) - so it appears permitted that multiple Text nodes are created. You *should* be able to merge them by calling normalize() on the tree; I'm not sure whether that worked in 0.5.5.1, it does work with 4DOM in PyXML 0.6. Please note that normalize won't merge CDATA sections. Regards, Martin From gward@python.net Mon Oct 2 00:50:19 2000 From: gward@python.net (Greg Ward) Date: Sun, 1 Oct 2000 19:50:19 -0400 Subject: [XML-SIG] Re: distutils bug with PyXML 0.6.1, Python 2.0b2 (CVS) In-Reply-To: <39D7C6D4.FBF17094@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Mon, Oct 02, 2000 at 07:20:52AM +0800 References: <39D7C6D4.FBF17094@per.dem.csiro.au> Message-ID: <20001001195018.A11937@beelzebub> On 02 October 2000, Mark Favas said: > With the current (Oct 2) CVS versions of PyXML and Python (2), running > "python setup.py install" produces the following glitch: copies all the > relevant files to /usr/local/lib/python2.0/site-packages/_xmlplus and > then tries to compile them all. Unfortunately, it tries to byte-compile > sgmlop.so, leading to the traceback below. Is this a PyXML mis-setup of > setup.py or a distutils (Python core version) bug? D'ohh! That's a Distutils bug, introduced last night. Just checked in a fix -- thanks for the quick report! BTW, it doesn't matter if you follow the Python CVS or the Distutils CVS, you'll get my latest code either way.
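Mark's traceback comes from handing an extension module (sgmlop.so) to a step that only accepts Python source. Below is a minimal sketch of the guard involved, written against today's standard `py_compile` module rather than the 2000-era Distutils internals. It is not Greg's actual patch: the installed files are simply filtered down to `.py` before compiling. The function name `byte_compile_installed` is hypothetical.

```python
import os
import py_compile
import tempfile


def byte_compile_installed(outfiles):
    """Byte-compile the pure-Python files in outfiles and skip everything else.

    Installed packages can mix .py sources with compiled extension modules
    (.so, .pyd); only the sources can be handed to the compiler.
    Returns the paths of the .pyc files that were written.
    """
    compiled = []
    for path in outfiles:
        if not path.endswith(".py"):
            continue  # e.g. sgmlop.so: nothing to byte-compile
        compiled.append(py_compile.compile(path, doraise=True))
    return compiled


# Demonstration in a scratch directory with one source and one fake extension.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "xmlval.py")
    extension = os.path.join(tmp, "sgmlop.so")
    with open(source, "w") as fh:
        fh.write("VALUE = 1\n")
    with open(extension, "wb") as fh:
        fh.write(b"not python")
    results = byte_compile_installed([source, extension])
    print(len(results))  # 1
```

Filtering by suffix keeps the compile step from ever seeing binary files, which is exactly the condition the `ValueError` above was complaining about.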
Greg -- Greg Ward gward@python.net http://starship.python.net/~gward/ From clarence@netlojix.com Mon Oct 2 03:17:54 2000 From: clarence@netlojix.com (Clarence Gardner) Date: Sun, 1 Oct 2000 19:17:54 -0700 Subject: [martin@loewis.home.cs.tu-berlin.de: Re: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]] Message-ID: <20001001191754.F29839@liberty.sba2.netlojix.net> From: "Martin v. Loewis" >I think you have a point on splitting a text fragment into multiple >Text nodes; the DOM spec says about the interface Text: > ># If there is no markup inside an element's content, the text is ># contained in a single object implementing the Text interface that is ># the only child of the element. If there is markup, it is parsed into ># a list of elements and Text nodes that form the list of children of ># the element. > >[from REC-DOM-Level-1-19981001] > >I'm not sure what that means for parsing &lt;hallo&gt; - is it >permitted that these are split into three Text nodes, is it required >that they are split, or is it disallowed? > >Section 2.4 of XML 1.0 [REC-xml-19980210] says that an >entity reference is markup; 4.1 says that &gt; is an entity reference >(*not* a character reference) - so it appears permitted that multiple >Text nodes are created. Thanks, Martin. (And please accept my apologies for posting from a state of abysmal ignorance regarding XML. Being a person who actually enjoys reading standards documents, I'm going to read through the document you referenced.) >You *should* be able to merge them by calling normalize() on the tree; >I'm not sure whether that worked in 0.5.5.1, it does work with 4DOM in >PyXML 0.6. Please note that normalize won't merge CDATA sections. It does work, at least on my test data.
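Martin's description of normalize() can be checked with Python's xml.dom.minidom, whose `Node.normalize()` follows the same DOM rule. This is a modern stand-in, not 4DOM itself. The sketch builds the three adjacent Text nodes a parser is permitted to produce for `&lt;hallo&gt;` and merges them. It also shows that a CDATA section sitting between Text nodes is left alone. The element names are illustrative.

```python
from xml.dom import minidom

doc = minidom.Document()
username = doc.createElement("username")  # illustrative element name
doc.appendChild(username)

# Simulate a parser that split "&lt;hallo&gt;" at its entity references.
for piece in ("<", "hallo", ">"):
    username.appendChild(doc.createTextNode(piece))
print(len(username.childNodes))   # 3

username.normalize()              # merges adjacent Text nodes into one
print(len(username.childNodes))   # 1
print(username.firstChild.data)   # <hallo>

# normalize() merges only Text nodes, so a CDATA section stays separate.
mixed = doc.createElement("mixed")
mixed.appendChild(doc.createTextNode("a"))
mixed.appendChild(doc.createCDATASection("b"))
mixed.appendChild(doc.createTextNode("c"))
mixed.normalize()
print(len(mixed.childNodes))      # 3
```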
-- Clarence Gardner Software Engineer NetLojix Communications clarence@netlojix.com From uche.ogbuji@fourthought.com Mon Oct 2 19:05:35 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 02 Oct 2000 12:05:35 -0600 Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests In-Reply-To: Message from "Martin v. Loewis" of "Sat, 30 Sep 2000 09:34:43 +0200." <200009300734.JAA00694@loewis.home.cs.tu-berlin.de> Message-ID: <200010021805.MAA11294@localhost.localdomain> I still get masses of errors working with the pyxml CVS. I won't clutter the list, but I've placed a (long) transcript of my last commit effort, errors and all at ftp://ftp.fourthought.com/pub/etc/pyxml-cvs-errors.txt Some excerpts: cvs diff: [10:57:45] waiting for uche's lock in /cvsroot/pyxml/xml/test/dom Mailed xml-checkins@python.org Traceback (innermost last): File "/cvsroot/pyxml/CVSROOT/syncmail", line 321, in ? blast_mail(mailcmd, specs[1:]) File "/cvsroot/pyxml/CVSROOT/syncmail", line 198, in blast_mail fp = os.popen(cmd, 'w') os.error: (11, 'Resource temporarily unavailable') Mailed xml-checkins@python.org Traceback (innermost last): File "/cvsroot/pyxml/CVSROOT/syncmail", line 321, in ? blast_mail(mailcmd, specs[1:]) File "/cvsroot/pyxml/CVSROOT/syncmail", line 193, in blast_mail if not os.fork(): os.error: (11, 'Resource temporarily unavailable') Mailed xml-checkins@python.org Traceback (innermost last): File "/cvsroot/pyxml/CVSROOT/syncmail", line 321, in ? blast_mail(mailcmd, specs[1:]) File "/cvsroot/pyxml/CVSROOT/syncmail", line 203, in blast_mail fp.write(calculate_diff(file)) IOError: (32, 'Broken pipe') cvs [diff aborted]: no such tag NONE -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Oct 2 19:06:22 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 02 Oct 2000 12:06:22 -0600 Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests In-Reply-To: Message from "Martin v. Loewis" of "Sat, 30 Sep 2000 09:34:43 +0200." <200009300734.JAA00694@loewis.home.cs.tu-berlin.de> Message-ID: <200010021806.MAA11305@localhost.localdomain> I've updated the structure to eliminate duplicate demos and test suites and I've updated the DOM code-base. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Mon Oct 2 22:35:05 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 2 Oct 2000 23:35:05 +0200 Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests In-Reply-To: <200010021805.MAA11294@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200010021805.MAA11294@localhost.localdomain> Message-ID: <200010022135.XAA01128@loewis.home.cs.tu-berlin.de> > I still get masses of errors working with the pyxml CVS. I won't > clutter the list, but I've placed a (long) transcript of my last > commit effort, errors and all at It seems that your check-ins made it through, anyway - Thanks! You may want to check whether there were changes that did not get committed (just update, and see whether any conflicts are reported; cvs will store the original files with your modifications in a .# name in that case). > ftp://ftp.fourthought.com/pub/etc/pyxml-cvs-errors.txt # fp = os.popen(diffcmd) # os.error: (11, 'Resource temporarily unavailable') fork(2) will give error 11 if no process could be started anymore.
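Error 11 in these tracebacks is EAGAIN on Linux. The numeric value differs on other systems, so code should compare against `errno.EAGAIN` rather than the literal 11. A helper script that has to survive a briefly full process table could retry with a short back-off. This is a hypothetical sketch, not something syncmail is known to do. `retry_on_eagain` and its parameters are illustrative.

```python
import errno
import time


def retry_on_eagain(func, attempts=5, delay=0.5):
    """Call func(); if it raises OSError with errno EAGAIN, wait and retry.

    Any other error, or EAGAIN on the final attempt, is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return func()
        except OSError as exc:
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            time.sleep(delay * (2 ** attempt))  # exponential back-off


# os.fork() and os.popen() can both fail with EAGAIN when no further
# process can be created, so either call could be wrapped in a lambda
# and passed to retry_on_eagain.
```

Only EAGAIN is retried. Errors such as the later `Broken pipe` (errno 32) indicate a different failure and are passed through unchanged.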
It seems that the machine was running out of processes, so the problem is hopefully indeed temporary. In any case - this was just an attempt to send a commit email message; the main operation was not affected. To see the code of the syncmail script, just do 'cvs co CVSROOT' in the xml top-level directory. If the problem persists, SF would need to kill some processes, or reboot the machine - I believe none of us could actually log into it. Regards, Martin From dwallace@udel.edu Wed Oct 4 03:26:43 2000 From: dwallace@udel.edu (Dave) Date: Tue, 03 Oct 2000 22:26:43 -0400 Subject: [XML-SIG] Can't make more than one parser Message-ID: <39DA9563.1000602@delanet.com> Hello, I apologize if it is premature to report this on your development tree but I have been seeing it for several days now and wanted you to be aware. Using the latest PyXML CVS checkout, the following code throws an exception at the second make_parser call. This happens in Python2.0 (also latest CVS) and Python1.6. from xml.sax.saxexts import make_parser p = make_parser() if p: print "* got one *" q = make_parser() if q: print "* got the other *" The exception thrown is: Traceback (most recent call last): File "test_xml.py", line 8, in ? q = make_parser() File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", line 158, in make_parser return XMLParserFactory.make_parser(parser_list) File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", line 63, in make_parser return self._create_parser(parser_name) File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", line 43, in _create_parser return drv_module.create_parser() AttributeError: create_parser Dave. From martin@loewis.home.cs.tu-berlin.de Wed Oct 4 07:41:06 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Wed, 4 Oct 2000 08:41:06 +0200 Subject: [XML-SIG] Can't make more than one parser In-Reply-To: <39DA9563.1000602@delanet.com> (message from Dave on Tue, 03 Oct 2000 22:26:43 -0400) References: <39DA9563.1000602@delanet.com> Message-ID: <200010040641.IAA00909@loewis.home.cs.tu-berlin.de> > File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", > line 43, in _create_parser > return drv_module.create_parser() > AttributeError: create_parser Apparently, it attempts to load a parser module which does not implement a create_parser function. Before line 43, could you please insert a line "print drv_module", and report what it prints? (of course, you can also run it in the IDLE debugger to see what happens) Regards, Martin From bkc@murkworks.com Wed Oct 4 15:45:50 2000 From: bkc@murkworks.com (Brad Clements) Date: Wed, 4 Oct 2000 10:45:50 -0400 Subject: [XML-SIG] Build problem on Win2k Message-ID: <39DB0A5C.7184.8D6529D@localhost> I have Python 1.5.2, trying to install XML v0.5.2 I have dist-utils installed (perhaps it's too old). python setup.py build produces this traceback. I'll start looking to see what's happening, but here's the info in case someone knows what's up.
creating build\lib.win32\xml\utils copying xml\utils\iso8601.py -> build\lib.win32\xml\utils copying xml\utils\qp_xml.py -> build\lib.win32\xml\utils copying xml\utils\__init__.py -> build\lib.win32\xml\utils running build_ext warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules for extension 'sgmlop'-- please convert to Extension instance warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules for extension 'xml.unicode.wstrop'-- please convert to Extension instance warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules for extension 'xml.parsers.pyexpat'-- please convert to Extension instance building 'sgmlop' extension creating build\temp.win32 creating build\temp.win32\Release creating build\temp.win32\Release\extensions D:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 -Id:\progra~1\python\Include /Tcextensions/sgmlop.c /Fobuild\temp.win32\Release\extensions/sgmlop.obj sgmlop.c Traceback (innermost last): File "setup.py", line 53, in ?
ext_modules = wstr_modules + File "d:\Program Files\Python\Lib\distutils\core.py", line 112, in setup dist.run_commands () File "d:\Program Files\Python\Lib\distutils\dist.py", line 776, in run_commands self.run_command (cmd) File "d:\Program Files\Python\Lib\distutils\dist.py", line 797, in run_command cmd_obj.run () File "d:\Program Files\Python\Lib\distutils\command\build.py", line 117, in run self.run_command ('build_ext') File "d:\Program Files\Python\Lib\distutils\cmd.py", line 310, in run_command self.distribution.run_command (command) File "d:\Program Files\Python\Lib\distutils\dist.py", line 797, in run_command cmd_obj.run () File "d:\Program Files\Python\Lib\distutils\command\build_ext.py", line 224, in run self.build_extensions () File "d:\Program Files\Python\Lib\distutils\command\build_ext.py", line 428, in build_extensions libraries=self.get_libraries(ext), File "d:\Program Files\Python\Lib\distutils\command\build_ext.py", line 571, in get_libraries return ext.libraries + [pythonlib] TypeError: bad operand type(s) for + Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax netmeeting: ils://ils.murkworks.com AOL-IM: BKClements From bkc@murkworks.com Wed Oct 4 15:53:18 2000 From: bkc@murkworks.com (Brad Clements) Date: Wed, 4 Oct 2000 10:53:18 -0400 Subject: [XML-SIG] Cancel that build error on win2k Message-ID: <39DB0C1B.11449.8DD27BB@localhost> Apparently I had an out-of-date distutils. Upgrading to 1.0 fixed the problem. Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax netmeeting: ils://ils.murkworks.com AOL-IM: BKClements From brian@watchmark.com Wed Oct 4 17:17:52 2000 From: brian@watchmark.com (Brian Fritz) Date: Wed, 04 Oct 2000 09:17:52 -0700 Subject: [XML-SIG] Q: Any Solaris users that have successfully installed PyXML?
Message-ID: <39DB5830.FA2A3507@watchmark.com> Hi, I reviewed the archives for the last couple of months and didn't notice any posts that seemed relevant to installing PyXML on a Sun SparcStation running Solaris. My apologies if I stopped looking too soon. I ftp'd the PyXML-0.5.5.1 source yesterday and quickly discovered that I apparently also needed to install the Distutils-1.0. Reading through the install instructions for the Distutils I noticed that it said: > To use the Distutils under Unix, you must have a *complete* Python > installation, including the Makefile and config.h used to build Python. Do I have to build Python from source to install the Distutils to install the PyXML modules? Has anyone else been down this path for Solaris and can warn me in advance of any more "traps", or suggest shortcuts? TIA! Brian From akuchlin@mems-exchange.org Wed Oct 4 17:39:43 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 4 Oct 2000 12:39:43 -0400 Subject: [XML-SIG] Q: Any Solaris users that have successfully installed PyXML? In-Reply-To: <39DB5830.FA2A3507@watchmark.com>; from brian@watchmark.com on Wed, Oct 04, 2000 at 09:17:52AM -0700 References: <39DB5830.FA2A3507@watchmark.com> Message-ID: <20001004123908.A5080@kronos.cnri.reston.va.us> On Wed, Oct 04, 2000 at 09:17:52AM -0700, Brian Fritz wrote: >Do I have to build Python from source to install the Distutils to install >the PyXML modules? If whoever installed Python for you did the job correctly, no, since the Makefile and Setup should have been installed at that time. You don't need to have a complete Python source tree lying around. Distutils needs /usr/local/lib/python1.5/config/Makefile (/python1.5/config/Makefile, to be more general), so check if it's there in your Python installation. If it is, the Distutils should work fine. >Has anyone else been down this path for Solaris and can warn me in advance >of any more "traps", or suggest shortcuts?
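Andrew's advice boils down to checking whether the installed Python ships the Makefile it was built with. On a current Python 3, the standard `sysconfig` module reports where that file is expected, so a quick check can be sketched as below. The helper name is hypothetical. The config directory naming has also changed since the `python1.5/config` layout described here, which is why the sketch asks `sysconfig` rather than hard-coding a path.

```python
import os
import sysconfig


def build_makefile_status():
    """Return (path, exists) for the Makefile this Python was built with."""
    path = sysconfig.get_makefile_filename()
    return path, os.path.isfile(path)


path, present = build_makefile_status()
print(path, "found" if present else "missing")
```

On Windows builds no Makefile is installed, so "missing" is expected there and does not by itself indicate a broken installation.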
I develop on Solaris part of the time, and there aren't any special problems that I'm aware of. --amk From tgagne@efinnet.com Wed Oct 4 21:04:39 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 16:04:39 -0400 Subject: [XML-SIG] Accessing DOM nodes in Python Message-ID: <39DB8D57.F59A80D8@ix.netcom.com> I was just going through the XML howto about creating DOMs. I have a buffer that looks like: And I want to get the value of the "status" attribute. My subroutine looks like: def isResultValue(buffer): print buffer parser = saxexts.make_parser() dh = SaxBuilder() parser.setDocumentHandler(dh) fh = StringIO.StringIO(buffer) parser.parseFile(fh) print dh.get_parentNode() parser.close() fh.close() Now, the problem is, I don't know how to get the first node from dh. I usually try to print variables to see what they can do, but I'm not seeing anything when I try "print dh". I've tried printing dh.parentNode and dh.get_parentNode() without success. I think if someone could just point me in the right direction I'd be zooming right along. -- .tom From akuchlin@mems-exchange.org Wed Oct 4 21:11:38 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 4 Oct 2000 16:11:38 -0400 Subject: [XML-SIG] Accessing DOM nodes in Python In-Reply-To: <39DB8D57.F59A80D8@ix.netcom.com>; from tgagne@ix.netcom.com on Wed, Oct 04, 2000 at 04:04:39PM -0400 References: <39DB8D57.F59A80D8@ix.netcom.com> Message-ID: <20001004161138.A7962@kronos.cnri.reston.va.us> On Wed, Oct 04, 2000 at 04:04:39PM -0400, Thomas Gagne wrote: >anything when I try "print dh". I've tried printing dh.parentNode and >dh.get_parentNode() without success. I think if someone could just point me >in the right direction I'd be zooming right along. dh is a SAX document handler, not a DOM tree, so I wouldn't expect get_parentNode() to work. Instead say "doc = dh.document"; doc is then a DOM tree, so you can call doc.getElementsByTagName() or whatever.
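Andrew's distinction, that the DOM tree is the document object and not the SAX handler, looks like this in today's xml.dom.minidom, which parses straight to a Document. This is a stand-in for the PyXML `SaxBuilder` route used in the thread. Thomas's real buffer was lost from the archive, so the sample below is hypothetical. It borrows the `isResult`/`isResultInfo` element names and the `status` attribute mentioned in the thread.

```python
from xml.dom import minidom

# Hypothetical buffer: the original response document did not survive.
SAMPLE = '<isResult status="ok"><isResultInfo/></isResult>'


def is_result_value(buffer):
    """Return the root's status attribute and the number of isResultInfo nodes."""
    doc = minidom.parseString(buffer)        # a Document, i.e. the DOM tree
    root = doc.documentElement               # the outermost element
    status = root.getAttribute("status")     # "" if the attribute is absent
    count = len(doc.getElementsByTagName("isResultInfo"))
    doc.unlink()                             # release the tree's internal links
    return status, count


print(is_result_value(SAMPLE))  # ('ok', 1)
```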
--amk From tgagne@efinnet.com Wed Oct 4 21:20:59 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 16:20:59 -0400 Subject: [XML-SIG] Accessing DOM nodes in Python References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> Message-ID: <39DB912B.DC401BDA@ix.netcom.com> Andrew Kuchling wrote: > On Wed, Oct 04, 2000 at 04:04:39PM -0400, Thomas Gagne wrote: > >anything when I try "print dh". I've tried printing dh.parentNode and > >dh.get_parentNode() without success. I think if someone could just point me > >in the right direction I'd be zooming right along. > > dh is a SAX document handler, not a DOM tree, so I wouldn't expect > get_parentNode() to work. Instead say "doc = dh.document"; doc is > then a DOM tree, so you can call doc.getElementsByTagName() or > whatever. That sounded good so I tried it. The code now looks like: def isResultValue(buffer): print buffer parser = saxexts.make_parser() dh = SaxBuilder() parser.setDocumentHandler(dh) fh = StringIO.StringIO(buffer) parser.parseFile(fh) doc = dh.document print doc.getElementsByTagName("isResultsInfo") parser.close() fh.close() and when I run it, I get: Now, looking at the buffer I can see there's a node, but it *never* seems to show up. It's driving me nuts!!! > > > --amk -- .tom From tgagne@efinnet.com Wed Oct 4 21:24:49 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 16:24:49 -0400 Subject: [XML-SIG] Accessing DOM nodes in Python References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> Message-ID: <39DB9211.6ED019E0@ix.netcom.com> One last comment, after creating the doc, and I try: "print doc.toxml()" I get: Showing the child of isResult, isResultInfo, is missing completely. Where did it go???
-- .tom From tgagne@efinnet.com Wed Oct 4 21:40:33 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 16:40:33 -0400 Subject: [XML-SIG] Accessing DOM nodes in Python References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB9211.6ED019E0@ix.netcom.com> Message-ID: <39DB95C1.F0C1F5F5@ix.netcom.com> Here's something curious: If I try the same thing on I get the output: Which shows the childnode. Is it possible we have a problem with newline characters in the buffer? Is this a parser problem or a StringIO problem? -- .tom From dag@orion.no Wed Oct 4 21:50:07 2000 From: dag@orion.no (Dag Sunde) Date: Wed, 4 Oct 2000 22:50:07 +0200 Subject: [XML-SIG] Accessing DOM nodes in Python References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB912B.DC401BDA@ix.netcom.com> Message-ID: <009b01c02e44$aee1a090$43145c3e@orion.no> Check Your argument to "getElementsByTagName"... You ask for "isResultsInfo" (plural) (s)... but your tag is defined as "" (singular) Remove the "s" in: doc.getElementsByTagName("isResult_s_Info") and you should be ok... :-) Dag. ----- Original Message ----- From: "Thomas Gagne" Cc: "Python XML-SIG" Sent: 4. oktober 2000 22:20 Subject: Re: [XML-SIG] Accessing DOM nodes in Python > Andrew Kuchling wrote: > > > On Wed, Oct 04, 2000 at 04:04:39PM -0400, Thomas Gagne wrote: > > >anything when I try "print dh". I've tried printing dh.parentNode and > > >dh.get_parentNode() without success. I think if someone could just point me > > >in the right direction I'd be zooming right along. > > > > dh is a SAX document handler, not a DOM tree, so I wouldn't expect > > get_parentNode() to work. Instead say "doc = dh.document"; doc is > > then a DOM tree, so you can call doc.getElementsByTagName() or > > whatever. > > That sounded good so I tried it.
The code now looks like: > > def isResultValue(buffer): > print buffer > parser = saxexts.make_parser() > > dh = SaxBuilder() > > parser.setDocumentHandler(dh) > > fh = StringIO.StringIO(buffer) > parser.parseFile(fh) > > doc = dh.document > print doc.getElementsByTagName("isResultsInfo") > > parser.close() > fh.close() > > and when I run it, I get: > > > > > > > > Now, looking at the buffer I can see there's a node, but it > *never* seems to show up. It's driving me nuts!!! > > > > > > > --amk > > -- > .tom ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses. Admin Orion System AS ********************************************************************** From tgagne@efinnet.com Wed Oct 4 21:59:26 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 16:59:26 -0400 Subject: [XML-SIG] Re: The typo doesn't seem to change things... References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB912B.DC401BDA@ix.netcom.com> <009b01c02e44$aee1a090$43145c3e@orion.no> Message-ID: <39DB9A2E.D3103629@ix.netcom.com> Yes, it was a typo, but the node still seems to disappear after parsing. -- .tom From martin@loewis.home.cs.tu-berlin.de Thu Oct 5 02:41:11 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 5 Oct 2000 03:41:11 +0200 Subject: [XML-SIG] Accessing DOM nodes in Python In-Reply-To: <39DB95C1.F0C1F5F5@ix.netcom.com> (message from Thomas Gagne on Wed, 04 Oct 2000 16:40:33 -0400) References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB9211.6ED019E0@ix.netcom.com> <39DB95C1.F0C1F5F5@ix.netcom.com> Message-ID: <200010050141.DAA00782@loewis.home.cs.tu-berlin.de> > Which shows the childnode. Is it possible we have a problem with newline > characters in the buffer? Is this a parser problem or a StringIO problem? It's likely not a StringIO problem. Can you find out what parser you are using? I.e. print parser. Regards, Martin From tgagne@efinnet.com Thu Oct 5 03:31:12 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 22:31:12 -0400 Subject: [XML-SIG] Accessing DOM nodes in Python References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB9211.6ED019E0@ix.netcom.com> <39DB95C1.F0C1F5F5@ix.netcom.com> <200010050141.DAA00782@loewis.home.cs.tu-berlin.de> Message-ID: <39DBE7F0.A35A303B@ix.netcom.com> Can you tell which parser you're using? print parser When I check /usr/lib/python1.5/site-packages/xml/dom, it appears to be PyXML-0.5.5.1. -- .tom From tgagne@efinnet.com Thu Oct 5 04:18:09 2000 From: tgagne@efinnet.com (Thomas Gagne) Date: Wed, 04 Oct 2000 23:18:09 -0400 Subject: [XML-SIG] FIXED: Accessing DOM nodes in Python References: <39DB8D57.F59A80D8@ix.netcom.com> Message-ID: <39DBF2F0.30675B1A@ix.netcom.com> The buffer I was getting back was from a middleware routine retrieving each line one at a time. Since the middleware's API is C-based, strings are returned with a trailing NULL byte. Since the API doesn't care whether the data is text or binary it dutifully returns the trailing NULL byte to the Python interface which perturbs string processing--especially when one string is appended to another, the NULL bytes remain.
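A minimal Python 3 sketch of the two pitfalls in this thread, using the standard xml.dom.minidom module rather than the PyXML 0.5.5.1 SaxBuilder from the original code. The sample lines are hypothetical stand-ins for the middleware output (the real buffer is not shown), and the isResultInfo tag name is assumed from Dag's reply. Each line carries a trailing NUL byte as described above; the sketch strips it before joining and parsing, then shows that getElementsByTagName matches tag names exactly.

```python
from xml.dom import minidom


def join_c_lines(lines):
    """Join byte strings from a C-based API, dropping each line's trailing NUL bytes."""
    # rstrip(b"\x00") removes only trailing NULs; the rest of each line is untouched.
    return b"".join(line.rstrip(b"\x00") for line in lines)


# Hypothetical middleware output: one NUL-terminated line per call.
raw_lines = [
    b"<results>\n\x00",
    b"  <isResultInfo code='0'/>\n\x00",
    b"</results>\n\x00",
]

doc = minidom.parseString(join_c_lines(raw_lines))

# getElementsByTagName matches names exactly: the singular tag is found,
# the misspelled plural from the original code is not.
print(len(doc.getElementsByTagName("isResultInfo")))   # 1
print(len(doc.getElementsByTagName("isResultsInfo")))  # 0
```

Stripping on the Python side is only a workaround; Martin's reply below suggests fixing the string length where the C API creates the Python objects.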
I have to figure out where the appropriate place is to trim the NULL byte from the end of each line and things should be cool. I'd like to thank everyone for their help. -- .tom From martin@loewis.home.cs.tu-berlin.de Thu Oct 5 08:32:26 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 5 Oct 2000 09:32:26 +0200 Subject: [XML-SIG] FIXED: Accessing DOM nodes in Python In-Reply-To: <39DBF2F0.30675B1A@ix.netcom.com> (message from Thomas Gagne on Wed, 04 Oct 2000 23:18:09 -0400) References: <39DB8D57.F59A80D8@ix.netcom.com> <39DBF2F0.30675B1A@ix.netcom.com> Message-ID: <200010050732.JAA00717@loewis.home.cs.tu-berlin.de> > I have to figure out where the appropriate place is to trim the NULL > byte from the end of each line and things should be cool. I'd say it's in the C API, when it creates Python objects. The terminating 0 should not be counted towards the size of the string if you are using PyString_FromStringAndSize; you'd better use PyString_FromString in this case. Regards, Martin From christian@ellguth.de Thu Oct 5 10:26:46 2000 From: christian@ellguth.de (Christian Ellguth) Date: Thu, 5 Oct 2000 11:26:46 +0200 Subject: [XML-SIG] German Umlauts Message-ID: <00100511264600.11550@cellguth> Hi, I have some trouble using named entities like &auml; . Every time the parser encounters an entity like this it stops parsing with the message "unknown entity at ... " . If I use the numerical representation like &#228; for ä everything works fine. Is this a bug in the python sax-parser, or did I omit to tell the documenthandler what to do with named entities ? Thank you for your reply, Christian -- Universitaetsbibliothek Braunschweig Christian Ellguth Pockelsstr. 13 38106 Braunschweig From Juergen Hermann" Message-ID: On Thu, 5 Oct 2000 11:26:46 +0200, Christian Ellguth wrote: >Is this a bug in the python sax-parser, or did I omit to tell the >documenthandler what to do with named entities ? No, you omitted to read the XML specs thoroughly.
;) XML knows exactly FIVE entities by default, anything else has to be defined. You have 3 options: 1) use encoding="iso-8859-1" and then literal äöüÄÖÜ chars 2) define the entities yourself (or better, include the latin1.ent file, which is part of XHTML for example (see the w3c site)) 3) use the numerical representation Option 1 is what is normally used, option 2 is when you want to re-use "old" HTML that is converted to XHTML, option 3 is for quick hacks. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From jeremy@beopen.com Thu Oct 5 17:32:34 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Thu, 5 Oct 2000 12:32:34 -0400 (EDT) Subject: [XML-SIG] SAX exceptions are odd Message-ID: <14812.44322.692362.12640@bitdiddle.concentric.net> I am just learning how to use SAX and am a bit puzzled by a few of the exceptions that get raised or not raised. If I call on parse on an empty file, I get no exception. Is this desirable? I assume it means that "" is well-formed XML, but that doesn't seem like a very helpful definition. Is this right? If I get almost any other exception I get an error message that says something like: "not well-formed at None:1:7" Why is None being printed? It gave me the initial impression that my error was no setting up parse call correctly. I assumed that the None was the cause of the exception and that under normal circumstances it would have said something like "not well-formed at foo.xml:1:7". What is a system identifier and why should it be reported in an exception when it is None? I also think the format is odd. There are three different pieces of information separated by colons. I am accustomed to the notation filename:line number, but not another colon for the cursor position. It would have been clearer, I think, if the message were more verbose and explained what each field was.
Jeremy From larsga@garshol.priv.no Thu Oct 5 17:51:56 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 05 Oct 2000 18:51:56 +0200 Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <14812.44322.692362.12640@bitdiddle.concentric.net> References: <14812.44322.692362.12640@bitdiddle.concentric.net> Message-ID: * Jeremy Hylton | | If I call on parse on an empty file, I get no exception. Is this | desirable? I assume it means that "" is well-formed XML, but that | doesn't seem like a very helpful definition. Is this right? No, it's not right. You should get an error telling you that the document element is required. | If I get almost any other exception I get an error message that says | something like: "not well-formed at None:1:7" Expat is not very good at providing informative error messages, so I don't think you can expect much more. If you want better error messages you should probably use xmlproc or xmllib. As for the None that should imply that you just gave the parser a string to parse and didn't provide it with a system identifier (ie: URL or file name). | Why is None being printed? It gave me the initial impression that my | error was no setting up parse call correctly. I assumed that the None | was the cause of the exception and that under normal circumstances it | would have said something like "not well-formed at foo.xml:1:7". If you told it that you were parsing from foo.xml it should definitely return that information in the error message. Can you show us the exact call to parse? | What is a system identifier and why should it be reported in an | exception when it is None? The system identifier is SGML-speak (and XML-speak) for the location of the document being parsed. I guess we could leave it out in the cases where it is None, if people prefer that. (I personally have no opinion on that.) | I also think the format is odd. There are three different pieces of | information separated by colons. 
I am accustomed to the notation | filename:line number, but not another colon for the cursor position. | It would have been clearer, I think, if the message were more | verbose and explained what each field was. How about this: "Not well-formed in foo.xml at line %d, column %d." If you prefer that I'd be happy to change both that and the lost system identifier (if that is indeed the problem). --Lars M. From jeremy@beopen.com Thu Oct 5 21:59:08 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Thu, 5 Oct 2000 16:59:08 -0400 (EDT) Subject: [XML-SIG] Re: SAX exceptions are odd Message-ID: <14812.60316.775448.910249@bitdiddle.concentric.net> [Lars M. writes:] >* Jeremy Hylton >| >| If I call on parse on an empty file, I get no exception. Is this >| desirable? I assume it means that "" is well-formed XML, but that >| doesn't seem like a very helpful definition. Is this right? > >No, it's not right. You should get an error telling you that the >document element is required. Ok. Then consider it a bug report :-). Can you fix this and add a test case to the test suite? > >| If I get almost any other exception I get an error message that says >| something like: "not well-formed at None:1:7" > >Expat is not very good at providing informative error messages, so I >don't think you can expect much more. If you want better error >messages you should probably use xmlproc or xmllib. I think the explanation part of the error message is okay, could be better but not terrible. The part that's confusing is the formatting. >As for the None that should imply that you just gave the parser a >string to parse and didn't provide it with a system identifier (ie: >URL or file name). How does it know when I pass it a string and when I pass it a system identifier? In Python, system identifiers are strings?!? What if I have a file called "" will it open that file or attempt to parse it as a string? >| Why is None being printed? 
It gave me the initial impression that my >| error was no setting up parse call correctly. I assumed that the None >| was the cause of the exception and that under normal circumstances it >| would have said something like "not well-formed at foo.xml:1:7". > >If you told it that you were parsing from foo.xml it should definitely >return that information in the error message. Can you show us the >exact call to parse? I have a file foo in my current directory. I fire up Python: > ls -l foo -rw-rw-r-- 1 jeremy admin 0 Oct 5 16:57 foo c> python Python 2.0b2 (#18, Oct 5 2000, 09:53:11) [GCC 2.95.2 19991024 (release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from xml.sax import parse, ContentHandler >>> parse("foo", ContentHandler()) >>> >| What is a system identifier and why should it be reported in an >| exception when it is None? > >The system identifier is SGML-speak (and XML-speak) for the location >of the document being parsed. I guess we could leave it out in the >cases where it is None, if people prefer that. (I personally have no >opinion on that.) I personally prefer that. > >| I also think the format is odd. There are three different pieces of >| information separated by colons. I am accustomed to the notation >| filename:line number, but not another colon for the cursor position. >| It would have been clearer, I think, if the message were more >| verbose and explained what each field was. > >How about this: > > "Not well-formed in foo.xml at line %d, column %d." > >If you prefer that I'd be happy to change both that and the lost >system identifier (if that is indeed the problem). I would like this a lot better. It will be appreciated by novice programmers and whiners like me. Jeremy From fdrake@beopen.com Fri Oct 6 03:04:07 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 5 Oct 2000 22:04:07 -0400 (EDT) Subject: [XML-SIG] xml/dom/ext/reader/test_suite/ ? 
Message-ID: <14813.13079.166524.629646@cj42289-a.reston1.va.home.com> Will this test remain in its current location? This seems like the wrong place for it, and it isn't a package. I suspect the test/ directory would provide a better home, but I don't want to move it without the 4Thought team having a chance to object. ;) -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From Juergen Hermann" On Thu, 5 Oct 2000 21:18:27 +0200, Martin v. Loewis wrote: >Actually, there is a fourth one which I believe is the officially >preferred one: Encode your text as UTF-8 (i.e. no encoding= >attribute). That will remove the need to have any character entities, >except for the five predefined ones. That is a variation of 1), which you can use when you have a UTF-enabled editor (or the files are machine-generated anyway). Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From martin@loewis.home.cs.tu-berlin.de Thu Oct 5 22:36:18 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 5 Oct 2000 23:36:18 +0200 Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <14812.44322.692362.12640@bitdiddle.concentric.net> (message from Jeremy Hylton on Thu, 5 Oct 2000 12:32:34 -0400 (EDT)) References: <14812.44322.692362.12640@bitdiddle.concentric.net> Message-ID: <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> > If I call on parse on an empty file, I get no exception. Is this > desirable? I assume it means that "" is well-formed XML, but that > doesn't seem like a very helpful definition. Is this right? No, that looks like a bug in the expat parser. The xmlproc parser (in PyXML) properly reports FATAL ERROR in /tmp/foo:1:0: Premature document end, no root element (when foo is an empty file) > If I get almost any other exception I get an error message that says > something like: "not well-formed at None:1:7" > > Why is None being printed?
It gave me the initial impression that my > error was no setting up parse call correctly. I assumed that the None > was the cause of the exception and that under normal circumstances it > would have said something like "not well-formed at foo.xml:1:7". If the InputSource object has a proper system identifier, it should print it. It may be useful to print something different if it is None, e.g. "not well-formed at <unknown>:1:7" If you did provide a file name, and it got lost somewhere - then that is a bug. > What is a system identifier and why should it be reported in an > exception when it is None? I believe it is the SGML term for "file name". In SGML, documents may have "public identifiers", in which case a globally well-known string refers to the name of the document, and a system identifier - whose meaning is understood only on the local computer system. I also believe XML more specifically thinks of system identifiers as URLs - although it is common to allow strings which are not URLs (according to the RFC). > There are three different pieces of > information separated by colons. I am accustomed to the notation > filename:line number, but not another colon for the cursor position. That's a matter of taste - you can write your own ErrorHandler if you don't like the output. I personally understood immediately that notation, as this is what Emacs supports as file locations. > It would have been clearer, I think, if the message were more > verbose and explained what each field was. For reproducability, it is probably best if it is terse - we would probably have a long debate on what it should look like if it had to change.
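Martin's suggestion of writing your own ErrorHandler can be sketched in Python 3 roughly as follows. This is an illustration, not the PyXML code under discussion. VerboseError, VerboseErrorHandler and check are hypothetical names, and the wording loosely follows Lars's proposed "Not well-formed in foo.xml at line %d, column %d." style. The handler only uses the standard SAXParseException accessors (getMessage, getSystemId, getLineNumber, getColumnNumber). The system identifier comes from an InputSource, which is what a file name passed to parse() turns into.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler
from xml.sax.xmlreader import InputSource


class VerboseError(Exception):
    """A readable parse error that still carries the original SAXParseException."""

    def __init__(self, message, sax_exc):
        super().__init__(message)
        self.sax_exc = sax_exc


class VerboseErrorHandler(ErrorHandler):
    """Hypothetical handler that spells out the file, line and column fields."""

    def fatalError(self, exc):
        # A missing system identifier (e.g. a here-document) is shown as <unknown>, not None.
        sysid = exc.getSystemId() or "<unknown>"
        message = "%s in %s at line %d, column %d" % (
            exc.getMessage(), sysid, exc.getLineNumber(), exc.getColumnNumber())
        raise VerboseError(message, exc)

    # Treat recoverable errors like fatal ones in this sketch.
    error = fatalError


def check(data, system_id=None):
    """Parse XML bytes; return None if well-formed, otherwise the verbose message."""
    source = InputSource(system_id)         # system_id plays the role of a file name
    source.setByteStream(io.BytesIO(data))  # the bytes themselves are read from memory
    try:
        xml.sax.parse(source, ContentHandler(), VerboseErrorHandler())
    except VerboseError as err:
        return str(err)
    return None


print(check(b"<a/>"))                   # None
print(check(b"<a><b></a>", "foo.xml"))  # a mismatched-tag message naming foo.xml
print(check(b""))                       # an empty document is reported as not well-formed
```

Keeping the original exception on sax_exc means programs can read the line and column directly instead of parsing the message text, so the human-readable wording stays free to change.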
Regards, Martin From jeremy@beopen.com Fri Oct 6 16:26:37 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Fri, 6 Oct 2000 11:26:37 -0400 (EDT) Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> References: <14812.44322.692362.12640@bitdiddle.concentric.net> <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> Message-ID: <14813.61229.663642.454479@bitdiddle.concentric.net> >>>>> "MvL" == Martin v Loewis writes: >> If I get almost any other exception I get an error message that >> says something like: "not well-formed at None:1:7" >> >> Why is None being printed? It gave me the initial impression >> that my error was no setting up parse call correctly. I assumed >> that the None was the cause of the exception and that under >> normal circumstances it would have said something like "not >> well-formed at foo.xml:1:7". MvL> If the InputSource object has a proper system identifier, it MvL> should print it. It may be useful to print something different MvL> if it is None, e.g. MvL> "not well-formed at :1:7" MvL> If you did provide a file name, and it got lost somewhere - MvL> then that is a bug. (It looks like you may have missed my second message on this subject.) I did pass a filename that was lost. >> There are three different pieces of information separated by >> colons. I am accustomed to the notation filename:line number, >> but not another colon for the cursor position. MvL> That's a matter of taste - you can write your own ErrorHandler MvL> if you don't like the output. I personally understood MvL> immediately that notation, as this is what Emacs supports as MvL> file locations. It is a matter of taste. We have been trying to improve the quality and verbosity of error messages raised by Python code so that novices have a better chance of understanding them. It is no help to tell a beginner: "The error messages produced by the xml packages are a tad obscure. 
Just write a subclass that makes the errors clearer." >> It would have been clearer, I think, if the message were more >> verbose and explained what each field was. MvL> For reproducability, it is probably best if it is terse - we MvL> would probably have a long debate on what it should look like MvL> if it had to change. I don't understand what you mean by reproducability. The ability to reproduce an error message has nothing to do with whether it is terse or verbose. I liked the suggested error message that Lars proposed a *lot* better. Jeremy From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 16:26:38 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 6 Oct 2000 17:26:38 +0200 Subject: [XML-SIG] Re: SAX exceptions are odd In-Reply-To: <14812.60316.775448.910249@bitdiddle.concentric.net> (message from Jeremy Hylton on Thu, 5 Oct 2000 16:59:08 -0400 (EDT)) References: <14812.60316.775448.910249@bitdiddle.concentric.net> Message-ID: <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> > How does it know when I pass it a string and when I pass it a system > identifier? In Python, system identifiers are strings?!? What if I > have a file called "" will it open that file or attempt to parse > it as a string? If you invoke xml.sax.parse, it will always be understood as a system identifier - you should invoke parseString if you have a "here" document. These are convenience functions - the full API has the notion of InputSource objects, which are the primary means to tell a parser what to process. There is some magic telling file names apart from file objects, but that can't also tell apart system identifiers and here documents - hence the two functions. > >How about this: > > > > "Not well-formed in foo.xml at line %d, column %d." > > > >If you prefer that I'd be happy to change both that and the lost > >system identifier (if that is indeed the problem). > > I would like this a lot better. 
It will be appreciated by novice > programmers and whiners like me. I'd like to caution again: No matter what string is taken now, it will have to stay forever. Other tools will expect that a certain Python application formats its XML error messages in a certain way, and they will whine if that is ever changed. If that consequence is accepted, then it's fine with me to change that string... Regards, Martin From jeremy@beopen.com Fri Oct 6 16:41:49 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Fri, 6 Oct 2000 11:41:49 -0400 (EDT) Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> References: <14812.44322.692362.12640@bitdiddle.concentric.net> <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> Message-ID: <14813.62141.233397.345264@bitdiddle.concentric.net> Here is another potential problem with xml exceptions. There may not be anything to do about it, because the sax package is, by design, very clever about imports. >>> from xml import sax >>> sax.parse("", sax.ContentHandler()) Traceback (most recent call last): File "", line 1, in ? File "./../Lib/xml/sax/__init__.py", line 29, in parse parser = make_parser() File "./../Lib/xml/sax/__init__.py", line 79, in make_parser raise SAXException("No parsers found", None) xml.sax._exceptions.SAXException: No parsers found This puzzled me for quite a while, because I was sure I had a parser. [continuing same session:] >>> import pyexpat >>> I start poking around in the internals of the sax implementation. I see that I ought to be able to import xml.sax.expatreader. So I fire up a new session and try it: >>> import xml.sax.expatreader Traceback (most recent call last): File "", line 1, in ? File "/home/jeremy/src/python/dist/src/Lib/xml/sax/expatreader.py", line 10, in ? from xml.sax import xmlreader, saxutils, handler File "/home/jeremy/src/python/dist/src/Lib/xml/sax/saxutils.py", line 6, in ? 
import os, urlparse, urllib, types File "/home/jeremy/src/python/dist/src/Lib/urllib.py", line 26, in ? import socket File "/home/jeremy/src/python/dist/src/Lib/socket.py", line 41, in ? from _socket import * ImportError: libssl.so.0: cannot open shared object file: No such file or directory So the problem is a bogus local configuration for shared libraries. On the one hand, I haven't installed Python properly, so I shouldn't expect things to work. On the other hand, it would be helpful if unexpected exceptions could be reported. Is there any way to provide an informative error message in this case? Jeremy From jeremy@beopen.com Fri Oct 6 16:44:31 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Fri, 6 Oct 2000 11:44:31 -0400 (EDT) Subject: [XML-SIG] Re: SAX exceptions are odd In-Reply-To: <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> References: <14812.60316.775448.910249@bitdiddle.concentric.net> <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> Message-ID: <14813.62303.243577.289385@bitdiddle.concentric.net> >>>>> "MvL" == Martin v Loewis writes: MvL> I'd like to caution again: No matter what string is taken now, MvL> it will have to stay forever. Other tools will expect that a MvL> certain Python application formats its XML error messages in a MvL> certain way, and they will whine if that is ever changed. MvL> If that consequence is accepted, then it's fine with me to MvL> change that string... What tools are there that depend on the string representation of the exception object raised by a Python library? The proposal is not to change the API of the exception object, just what gets printed when the program exits. It would be bad form for an external program to depend in some way on the error messages Python prints. There is definitely no guarantee that they will remain unchanged from version to version. 
Jeremy From noreply@sourceforge.net Fri Oct 6 16:47:06 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 6 Oct 2000 08:47:06 -0700 Subject: [XML-SIG] [Bug #116246] 2.0b2: the Windows installer installs empty .py files Message-ID: <200010061547.IAA23110@bush.i.sourceforge.net> Bug #116246, was updated on 2000-Oct-06 08:47 Here is a current snapshot of the bug. Project: Python/XML Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Summary: 2.0b2: the Windows installer installs empty .py files Details: Running PyXML-0.6.0.win32.exe (on NT4 SP6, Python 2.0b2 installation) appears to run correctly but all of the whatever.py files it places in _xmlplus appear to be empty (zero-length). For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=116246&group_id=6473 From akuchlin@mems-exchange.org Fri Oct 6 16:50:01 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 6 Oct 2000 11:50:01 -0400 Subject: [XML-SIG] Re: SAX exceptions are odd In-Reply-To: <200010061526.RAA00826@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Fri, Oct 06, 2000 at 05:26:38PM +0200 References: <14812.60316.775448.910249@bitdiddle.concentric.net> <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> Message-ID: <20001006115001.A27789@kronos.cnri.reston.va.us> On Fri, Oct 06, 2000 at 05:26:38PM +0200, Martin v. Loewis wrote: >have to stay forever. Other tools will expect that a certain Python >application formats its XML error messages in a certain way, and they >will whine if that is ever changed. Shouldn't the exception class have attributes for .filename, .line, .column, though, which is all an application needs to be concerned with? In fact I thought SAX exceptions already had this, but perhaps I'm misremembering. --amk From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 16:55:10 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 6 Oct 2000 17:55:10 +0200 Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <14813.61229.663642.454479@bitdiddle.concentric.net> (message from Jeremy Hylton on Fri, 6 Oct 2000 11:26:37 -0400 (EDT)) References: <14812.44322.692362.12640@bitdiddle.concentric.net> <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> <14813.61229.663642.454479@bitdiddle.concentric.net> Message-ID: <200010061555.RAA01020@loewis.home.cs.tu-berlin.de> > MvL> If you did provide a file name, and it got lost somewhere - > MvL> then that is a bug. > > (It looks like you may have missed my second message on this subject.) > I did pass a filename that was lost. Actually, part of Germany was cut off from part of the US for the last day, so some messages got sent later. > I don't understand what you mean by reproducability. The ability to > reproduce an error message has nothing to do with whether it is terse > or verbose. If people write test suites, then they expect a certain output to determine that a test failed. Any changes to the format of the output will break the test case. The more verbose the text is, the more likely people are to change it from release to release - not considering that they may break things by changing some error message formats. Likewise, I expect that programs will parse the output of Python programs, and expect a certain formatting. Such programs won't work if people change strings. > I liked the suggested error message that Lars proposed a *lot* better.
Reviewing the formats that Emacs' compilation-error-regexp-alist supports, I found that :::error message is quite common (and GNU standard), and that it would also recognize an additional 'program name', so it correctly parses xml.sax._exceptions.SAXParseException: a.c:10:16:not well-formed (it does not parse the current text, as the error message precedes the line information) Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 17:03:06 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 6 Oct 2000 18:03:06 +0200 Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <14813.62141.233397.345264@bitdiddle.concentric.net> (message from Jeremy Hylton on Fri, 6 Oct 2000 11:41:49 -0400 (EDT)) References: <14812.44322.692362.12640@bitdiddle.concentric.net> <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> <14813.62141.233397.345264@bitdiddle.concentric.net> Message-ID: <200010061603.SAA01153@loewis.home.cs.tu-berlin.de> > So the problem is a bogus local configuration for shared libraries. > On the one hand, I haven't installed Python properly, so I shouldn't > expect things to work. On the other hand, it would be helpful if > unexpected exceptions could be reported. Is there any way to provide > an informative error message in this case? One idea is that drivers should specifically distinguish between "expected" import errors and unexpected ones, e.g. class MissingFeature(ImportError): pass Then, drivers should catch ImportError when they expect a failure, and a plain (unexpected) ImportError would get through (*). I can try to come up with a patch for that, as this is repeatedly causing problems. Regards, Martin (*) Actually, we have to separate the case that the driver module itself does not exist, and that processing it caused an ImportError. That can be done by looking at sys.modules. From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 17:07:42 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 6 Oct 2000 18:07:42 +0200 Subject: [XML-SIG] Re: SAX exceptions are odd In-Reply-To: <14813.62303.243577.289385@bitdiddle.concentric.net> (message from Jeremy Hylton on Fri, 6 Oct 2000 11:44:31 -0400 (EDT)) References: <14812.60316.775448.910249@bitdiddle.concentric.net> <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> <14813.62303.243577.289385@bitdiddle.concentric.net> Message-ID: <200010061607.SAA01157@loewis.home.cs.tu-berlin.de> > What tools are there that depend on the string representation of the > exception object raised by a Python library? None at the moment (although I wish Emacs would recognize these strings - it would with a slight change). When Python 2.1 is released, it is quite possible that tools might rely on that - at which time it won't be possible anymore to change the string. At the moment, it still is. > It would be bad form for an external program to depend in some way on > the error messages Python prints. There is definitely no guarantee > that they will remain unchanged from version to version. The external programs may have no other option - stdin/stdout/stderr is the typical way of communicating with a program. Nobody would try to parse a genuine Python traceback, as that often is a bug in the script. However, SAX exceptions are raised for errors in the XML, so this is different. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 17:45:13 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 6 Oct 2000 18:45:13 +0200 Subject: [XML-SIG] Re: SAX exceptions are odd In-Reply-To: <20001006115001.A27789@kronos.cnri.reston.va.us> (message from Andrew Kuchling on Fri, 6 Oct 2000 11:50:01 -0400) References: <14812.60316.775448.910249@bitdiddle.concentric.net> <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> <20001006115001.A27789@kronos.cnri.reston.va.us> Message-ID: <200010061645.SAA01323@loewis.home.cs.tu-berlin.de> > Shouldn't the exception class have attributes for .filename, .line, > .column, though, which is all an application needs to be concerned > with? In fact I thought SAX exceptions already had this, but perhaps > I'm misremembering. They certainly do. Discussion is what __str__ should return for them. Regards, Martin From alf@logilab.com Fri Oct 6 22:17:39 2000 From: alf@logilab.com (Alexandre Fayolle) Date: Fri, 6 Oct 2000 23:17:39 +0200 (CEST) Subject: [XML-SIG] Problem parsing the xhtml dtd Message-ID: Hi, I'm trying to parse the XHTML dtd () with xmlproc (as of PyXml 0.5.5.1). The python code I use is the following: from xml.parsers.xmlproc.dtdparser import DTDParser from xml.parsers.xmlproc.xmldtd import CompleteDTD parser = DTDParser() dtd = CompleteDTD(parser) parser.set_dtd_consumer(dtd) parser.set_dtd_object(dtd) parser.parse_resource('xhtml1-strict.dtd') parser.deref() I get the following message: ERROR: xml:space must have exactly the values 'default' and 'preserve' at xhtml1-strict.dtd:315:47 TEXT: '> The problem occurs on the following block : The correction involved modifying the last line of the block: xml:space (default|preserve) #FIXED "preserve" > Is this a bug in xmlproc or in the W3C DTD ? -- Alexandre Fayolle http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France). From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 22:43:08 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 6 Oct 2000 23:43:08 +0200 Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <14813.62141.233397.345264@bitdiddle.concentric.net> (message from Jeremy Hylton on Fri, 6 Oct 2000 11:41:49 -0400 (EDT)) References: <14812.44322.692362.12640@bitdiddle.concentric.net> <200010052136.XAA01520@loewis.home.cs.tu-berlin.de> <14813.62141.233397.345264@bitdiddle.concentric.net> Message-ID: <200010062143.XAA27398@loewis.home.cs.tu-berlin.de> > So the problem is a bogus local configuration for shared libraries. > On the one hand, I haven't installed Python properly, so I shouldn't > expect things to work. On the other hand, it would be helpful if > unexpected exceptions could be reported. Is there any way to provide > an informative error message in this case? I've just committed a change that will give you an ImportError in this case - only a failure to import xml.parsers.expat will be ignored. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 22:44:49 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 6 Oct 2000 23:44:49 +0200 Subject: [XML-SIG] SAX exceptions are odd In-Reply-To: <14812.44322.692362.12640@bitdiddle.concentric.net> (message from Jeremy Hylton on Thu, 5 Oct 2000 12:32:34 -0400 (EDT)) References: <14812.44322.692362.12640@bitdiddle.concentric.net> Message-ID: <200010062144.XAA27414@loewis.home.cs.tu-berlin.de> > If I call on parse on an empty file, I get no exception. Is this > desirable? I assume it means that "" is well-formed XML, but that > doesn't seem like a very helpful definition. Is this right? I have installed a patch to fix this. > If I get almost any other exception I get an error message that says > something like: "not well-formed at None:1:7" > > Why is None being printed? I have also installed a patch to fix that. > I also think the format is odd. I did not (and will not) change that - somebody else might go ahead, though.
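Martin's earlier proposal to separate "expected" from "unexpected" import failures during driver lookup might look roughly like this in Python 3. It is a hypothetical illustration, not the change he committed. MissingFeature, find_driver and the injectable importer argument are invented names. Where his footnote suggests inspecting sys.modules, this sketch instead uses the name attribute of ModuleNotFoundError (Python 3.6+) to tell a driver module that does not exist apart from a driver whose own imports fail. Any other ImportError, such as the libssl.so.0 failure Jeremy hit, propagates instead of being hidden behind "No parsers found".

```python
import importlib


class MissingFeature(ImportError):
    """Raised by a driver that exists but whose optional backend is unavailable."""


def find_driver(candidates, importer=importlib.import_module):
    """Return the first importable driver module, skipping only expected failures.

    Any other ImportError (e.g. a shared library that fails to load) propagates.
    """
    candidates = list(candidates)
    for name in candidates:
        try:
            return importer(name)
        except MissingFeature:
            continue  # expected: the driver says its backend is not installed
        except ModuleNotFoundError as exc:
            # Expected only when the driver module itself (or its package) is absent.
            if exc.name is not None and (name == exc.name or name.startswith(exc.name + ".")):
                continue
            raise  # a module the driver depends on is missing: report it
    raise LookupError("No parsers found among %r" % (candidates,))


driver = find_driver(["no_such_sax_driver", "xml.sax.expatreader"])
print(driver.__name__)  # xml.sax.expatreader
```

The real xml.sax signals this case with SAXReaderNotAvailable; LookupError is used here only to keep the sketch independent of the library's internals.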
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 23:53:17 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 7 Oct 2000 00:53:17 +0200 Subject: [XML-SIG] Problem parsing the xhtml dtd In-Reply-To: (message from Alexandre Fayolle on Fri, 6 Oct 2000 23:17:39 +0200 (CEST)) References: Message-ID: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de> > Is this a bug in xmlproc or in the W3C DTD ? XML 1.0 says # A special attribute named xml:space may be attached to an element to # signal an intention that in that element, white space should be # preserved by applications. In valid documents, this attribute, like # any other, must be declared if it is used. When declared, it must be # given as an enumerated type whose only possible values are "default" # and "preserve". As a non-native speaker of English, that sentence sounds ambiguous to me: Does it mean that xml:space must have no more, no less than "default" and "preserve" as possible values, or does it mean it may have less than these values? Regards, Martin From tpassin@home.com Sat Oct 7 04:29:12 2000 From: tpassin@home.com (tpassin@home.com) Date: Fri, 6 Oct 2000 23:29:12 -0400 Subject: [XML-SIG] Problem parsing the xhtml dtd References: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de> Message-ID: <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com> Martin v. Loewis asks - > XML 1.0 says > > # A special attribute named xml:space may be attached to an element to > # signal an intention that in that element, white space should be > # preserved by applications. In valid documents, this attribute, like > # any other, must be declared if it is used. When declared, it must be > # given as an enumerated type whose only possible values are "default" > # and "preserve". 
> > As a non-native speaker of English, that sentence sounds ambiguous to > me: Does it mean that xml:space must have no more, no less than > "default" and "preserve" as possible values, or does it mean it may > have less than these values? > There is an illustration in the Rec, right after the section that Martin quoted: To this native speaker of English, the text seems to mean exactly the same thing as the example does. To arrive at this conclusion, I must take the text in a very literal (or 'formal') way. A colloquial reading could give the impression that one of the two values could be omitted. Cheers, Tom Passin From martin@loewis.home.cs.tu-berlin.de Sat Oct 7 07:42:11 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 7 Oct 2000 08:42:11 +0200 Subject: [XML-SIG] Problem parsing the xhtml dtd In-Reply-To: <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de> <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com> Message-ID: <200010070642.IAA00667@loewis.home.cs.tu-berlin.de> > > > To this native speaker of English, the text seems to mean exactly the same > thing as the example does. To arrive at this conclusion, I must take the > text in a very literal (or 'formal') way. A colloquial reading could give > the impression that one of the two values could be omitted. Very interesting. That means that the W3C XHTML DTD is ill-formed, and that xmlproc properly detected that error. Regards, Martin From alf@logilab.com Sat Oct 7 09:59:56 2000 From: alf@logilab.com (Alexandre Fayolle) Date: Sat, 7 Oct 2000 10:59:56 +0200 (CEST) Subject: [XML-SIG] Problem parsing the xhtml dtd In-Reply-To: <200010070642.IAA00667@loewis.home.cs.tu-berlin.de> Message-ID: On Sat, 7 Oct 2000, Martin v. Loewis wrote: > > > > > > To this native speaker of English, the text seems to mean exactly the same > > thing as the example does. 
To arrive at this conclusion, I must take the > > text in a very literal (or 'formal') way. A colloquial reading could give > > the impression that one of the two values could be omitted. > > Very interesting. That means that the W3C XHTML DTD is ill-formed, and > that xmlproc properly detected that error. Has anyone contacted the person at the W3C who is responsible for this DTD to tell them about the problem yet? I guess some people here have better contacts with W3C members than me ;o) -- Alexandre Fayolle http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France). From larsga@garshol.priv.no Sat Oct 7 13:51:16 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 07 Oct 2000 14:51:16 +0200 Subject: [XML-SIG] Problem parsing the xhtml dtd In-Reply-To: References: Message-ID: * Alexandre Fayolle | | Is this a bug in xmlproc or in the W3C DTD ? It's a bug in xmlproc. This is one of several cases where I guessed wrong about ambiguities in the XML 1.0 spec, according to the second edition of that spec. * Martin v. Loewis | | XML 1.0 says | | # A special attribute named xml:space may be attached to an element to | # signal an intention that in that element, white space should be | # preserved by applications. In valid documents, this attribute, like | # any other, must be declared if it is used. When declared, it must be | # given as an enumerated type whose only possible values are "default" | # and "preserve". | | As a non-native speaker of English, that sentence sounds ambiguous to | me: Does it mean that xml:space must have no more, no less than | "default" and "preserve" as possible values, or does it mean it may | have less than these values? I had exactly the same problem as you with this part of the spec when I implemented this.
However, the second edition of the XML specification has improved this section and is now crystal clear: # A special attribute named xml:space may be attached to an element to # signal an intention that in that element, white space should be # preserved by applications. In valid documents, this attribute, like # any other, must be declared if it is used. When declared, it must be # given as an enumerated type whose values are one or both of "default" # and "preserve". ^^^^^^^^^^^ Once Python 2.0 is out I'm planning to improve xmlproc by - writing a full-featured SAX 2.0 driver with lots of features and properties - updating it to conform to the XML 1.0 2nd edition spec - adding full Unicode support The order and timing of these releases is still unclear. I've fixed this particular problem now in my private CVS tree. --Lars M. From MichaelDyck@home.com Sat Oct 7 22:23:53 2000 From: MichaelDyck@home.com (Michael Dyck) Date: Sat, 07 Oct 2000 14:23:53 -0700 Subject: [XML-SIG] Problem parsing the xhtml dtd References: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de> <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com> Message-ID: <39DF9469.76A822ED@home.com> Martin v. Loewis wrote: > > XML 1.0 says > > # A special attribute named xml:space may be attached to an element to > # signal an intention that in that element, white space should be > # preserved by applications. In valid documents, this attribute, like > # any other, must be declared if it is used. When declared, it must be > # given as an enumerated type whose only possible values are "default" > # and "preserve". > > As a non-native speaker of English, that sentence sounds ambiguous to > me: Does it mean that xml:space must have no more, no less than > "default" and "preserve" as possible values, or does it mean it may > have less than these values? Apparently, it sounded ambiguous to others as well. 
The errata for XML 1.0 (http://www.w3.org/XML/xml-19980210-errata#E81) rewords it: --------------------------- Section 2.10 In the third paragraph, replace the sentence: When declared, it must be given as an enumerated type whose only possible values are "default" and "preserve". with: When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve". Add an example after the existing one (in the same table): Rationale The wording in the spec was ambigous on whether the value of the xml:space attribute could be limited to one of the two possible values. ---------------------------- The change has been incorporated into the 2nd edition of XML 1.0 (see http://www.w3.org/TR/2000/REC-xml-20001006#sec-white-space). -Michael Dyck From liu@netease.com Sat Oct 7 22:08:00 2000 From: liu@netease.com (liu) Date: Sun, 08 Oct 2000 05:08:00 +0800 Subject: [XML-SIG] Prediction Message-ID: <20001007160504.7E7AA1C745A4B@mx1.netease.com> This is a Multipart MIME message.
------=_ST3201_0001_00DF2B82_01BE5704 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 7bit What really determines a person's fate in life? There is no single answer. A person's name, life, and working environment all directly affect his or her fate. The same house affects the different owners who live in it in different ways; it depends on what that person's "field" is like. An environment has a field of its own, so how can the two fields be matched? This is not something ordinary people can know. Forecasting arts such as the Zhou Yi (I Ching) rely on calculation, but some people have innate paranormal abilities and can see things that ordinary people cannot, and their guidance is often very useful to ordinary people. Many years ago, in Hebei Province in our country, there was a little boy who suddenly fell seriously ill one day. After he recovered, people often heard someone talking inside his belly, and from then on people knew that the boy's belly could talk. Later, people also discovered that the boy had the paranormal ability to see things others cannot, and that he could also heal people. Today the boy has grown up and now lives in Guangdong, where he offers companies and individuals predictions on naming, feng shui, stocks, trademarks, illnesses, and so on. In recent years he has made predictions for many people with a very high accuracy rate. He belongs to no school or sect and relies entirely on his own paranormal abilities. If you want to invest accurately, improve an unfavorable situation, and so on, please contact us; we will give you the most accurate advice. Contact phone: 0757-2252618 Contact: Ms. Liu or Ling Ge Please contact us by phone, not by e-mail. ------=_ST3201_0001_00DF2B82_01BE5704-- From martin@loewis.home.cs.tu-berlin.de Sun Oct 8 09:33:35 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 8 Oct 2000 10:33:35 +0200 Subject: [XML-SIG] 4DOM bugs Message-ID: <200010080833.KAA00815@loewis.home.cs.tu-berlin.de> While porting PyXML's test_dom to 4DOM, I noticed a number of problems, which I believe are bugs in 4DOM. Consider:

    from xml.dom import implementation

    doc = implementation.createDocument(None,None,None)

    n1 = doc.createElement('n1') ; n2 = doc.createElement('n2')
    pi = doc.createProcessingInstruction("Processing", "Instruction")
    doc.appendChild(pi)
    doc.appendChild(n1)

    #doc.appendChild(n1) # fails, but shouldn't
    doc.replaceChild(n2, n1)
    doc.replaceChild(pi, n2)
    print doc.documentElement

The line "doc.appendChild(n1)" raises a hierarchy exception, as n1 is already in the tree. However, this is incorrect: it should first remove n1, then reinsert it. The second fragment does not cause an exception. However, in the end, the "documentElement" of the document is a processing instruction. That is very strange - it should always be an element. I've been using the 4DOM version that is currently in the PyXML CVS.
Regards, Martin From loewis@informatik.hu-berlin.de Sun Oct 8 15:57:35 2000 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 8 Oct 2000 16:57:35 +0200 (MET DST) Subject: [XML-SIG] PyXML 0.6.1 release Message-ID: <200010081457.QAA09543@pandora.informatik.hu-berlin.de> Version 0.6.1 of the Python/XML distribution is now available. It should be considered a beta release, and can be downloaded from the following URLs: http://download.sourceforge.net/pyxml/PyXML-0.6.1.tar.gz http://download.sourceforge.net/pyxml/PyXML-0.6.1.win32-py1.5.exe http://download.sourceforge.net/pyxml/PyXML-0.6.1.win32-py2.0.exe http://download.sourceforge.net/pyxml/PyXML-0.6.1-1.5.2.i386.rpm http://download.sourceforge.net/pyxml/PyXML-0.6.1-2.0b2.i386.rpm Changes in this version, compared to 0.6.0: * Support for Python 1.5.2 was restored, as long as no character set recoding is required. * The 4DOM package was updated. * Most of the test suite now passes again. * The tutorial was updated. Changes in version 0.6.0, compared to 0.5.x: * The 4DOM package has been integrated into PyXML. * The package now supports SAX2 interfaces in addition to the SAX1 interfaces. Currently, pyexpat and xmlproc can serve as SAX2 drivers. * The proprietary Unicode type has been removed. Instead, PyXML now relies on the standard Python Unicode type. As a result, PyXML 0.6.0 will not work with Python 1.5. It has been tested with 2.0b1. * PyXML now operates on top of the XML package coming in Python 2. The Python/XML distribution contains the basic tools required for processing XML data using the Python programming language, assembled into one easy-to-install package. The distribution includes parsers and standard interfaces such as SAX and DOM, along with various other useful modules. The package currently contains: * XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius Garshol), xmllib.py (Sjoerd Mullender) using the sgmlop.c accelerator module (Fredrik Lundh).
* SAX interface (Lars Marius Garshol) * DOM interface (Stefane Fermigier, A.M. Kuchling) * 4DOM interface from Fourthought (Uche Ogbuji, Mike Olson) * xmlarch.py, for architectural forms processing (Geir Ove Grønmo) * Various utility modules and functions (various people) * Documentation and example programs (various people) The code is being developed bazaar-style by contributors from the Python XML Special Interest Group, so please send comments, questions, or bug reports to . For more information about Python and XML, see: http://www.python.org/topics/xml/ -- Martin v. Löwis http://www.informatik.hu-berlin.de/~loewis From el@buch.biblio.etc.tu-bs.de Mon Oct 9 11:33:46 2000 From: el@buch.biblio.etc.tu-bs.de (Christian Ellguth) Date: Mon, 9 Oct 2000 12:33:46 +0200 Subject: [XML-SIG] Parsers and their behaviours Message-ID: <00100912334600.27248@cellguth> Is there any documentation on the various XML parsers and their capabilities? If so, where can I find it? I'm using the drv_xmlproc.SAX_XPParser parser. I am a newbie to the Python/XML library and have tried to understand the symple_appl.py script from Simon Pepping. If the script encounters the numeric character reference for a German umlaut, it displays the umlaut correctly, but the parser appends an additional \n to the character, although it continues parsing the XML file. If I use the drv_xmlproc.SAX_XPParser in my script, the parser continues to parse the file, but all strings containing numeric character references are cut off after the reference, and the rest of the string is lost. Thank you for your replies Christian -- Universitaetsbibliothek Braunschweig Christian Ellguth Pockelsstr.
13 38106 Braunschweig From larsga@garshol.priv.no Mon Oct 9 11:43:14 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 09 Oct 2000 12:43:14 +0200 Subject: [XML-SIG] Parsers and their behaviours In-Reply-To: <00100912334600.27248@cellguth> References: <00100912334600.27248@cellguth> Message-ID: * Christian Ellguth | | Is there any documentation on the various XML-parsers and their | capabilities ? No, not really. It would be nice to produce this as part of the SAX 2.0 effort, but I'm afraid that will take some time. | If I use the drv_xmlproc.SAX_XPParser in my script the parser | continues to parse the file but all strings containing numerical | representations of entities are shortened after the entity and the | rest of the string is lost. Most likely this is a bug in your script. SAX allows parsers to call the characters() method more than once for a single block of character data and character references (and entity references) are just the sort of thing that will cause a parser to call it more than once. So most likely your script does not handle this case correctly. --Lars M. From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 20:06:00 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 9 Oct 2000 21:06:00 +0200 Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests In-Reply-To: <20000929180212.C20008@kronos.cnri.reston.va.us> (message from Andrew Kuchling on Fri, 29 Sep 2000 18:02:12 -0400) References: <200009292122.XAA01777@loewis.home.cs.tu-berlin.de> <20000929180212.C20008@kronos.cnri.reston.va.us> Message-ID: <200010091906.VAA00791@loewis.home.cs.tu-berlin.de> > I vaguely recall that someone at FourThought once asked me if that > would be OK, but don't know if anyone actually did it. It would be > a good idea to port them, since they made some attempt at being > exhaustive (trying the various error cases, etc.). I've started doing that (test_dom.py), but found that 4DOM simply won't pass the tests. 
I'd appreciate it if somebody could look at the current failures in the code, and tell me whether the test case is overly strict (or simply broken), or whether there are genuine bugs in 4DOM. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 20:11:46 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 9 Oct 2000 21:11:46 +0200 Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests In-Reply-To: <39D79954.342F7478@FourThought.com> (message from Mike Olson on Sun, 01 Oct 2000 14:06:44 -0600) References: <200009300152.TAA12572@localhost.localdomain> <39D79954.342F7478@FourThought.com> Message-ID: <200010091911.VAA00840@loewis.home.cs.tu-berlin.de> > All of the traceout stuff has been removed. There still is the problem > of our traceout library. I suppose we can install it to 2 locations so > that xml.dom is not dependent on Ft. I'll work on that today so it > should be in the next snapshot. Thanks! I hope I've properly updated the MANIFEST.in so that everything that should get shipped actually is - I'd appreciate it if you could verify this based on the PyXML 0.6.1 tar file. Also, what is the status of xml/dom/html/test_suite? Running test.py in that directory gives

    Traceback (most recent call last):
      File "test.py", line 76, in ?
        test(fileList)
      File "test.py", line 69, in test
        _mod.test();
      File "test_element.py", line 20, in test
        e._set_ID('1');
      File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/Node.py", line 84, in __getattr__
        return getattr(Node, name)
    AttributeError: _set_ID

Is that a known problem? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 21:50:49 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 9 Oct 2000 22:50:49 +0200 Subject: [XML-SIG] documentation In-Reply-To: <14791.56264.4267.959626@cj42289-a.reston1.va.home.com> (fdrake@beopen.com) References: <14791.56264.4267.959626@cj42289-a.reston1.va.home.com> Message-ID: <200010092050.WAA01029@loewis.home.cs.tu-berlin.de> > I'm starting documentation for the xml.sax package. We already have > the material that's part of the PyXML package, and I'm currently > working on the xml.sax package module itself. If anyone else would > like to take a portion of the documentation to work on, I'd certainly > appreciate some help! Hi Fred, I'm currently working on the major body of the SAX interfaces, mostly by extracting the doc strings into TeX. Regards, Martin From fdrake@beopen.com Mon Oct 9 21:55:58 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Mon, 9 Oct 2000 16:55:58 -0400 (EDT) Subject: [XML-SIG] documentation In-Reply-To: <200010092050.WAA01029@loewis.home.cs.tu-berlin.de> References: <14791.56264.4267.959626@cj42289-a.reston1.va.home.com> <200010092050.WAA01029@loewis.home.cs.tu-berlin.de> Message-ID: <14818.12510.721242.680124@cj42289-a.reston1.va.home.com> [CC'd to the Doc-SIG as well.] Martin v. Loewis writes: > I'm currently working on the major body of the SAX interfaces, mostly > by extracting the doc strings into TeX. Great, since I was planning to work on them this week! I'll merge some of your xml.dom documentation with some that Paul sent directly to me, and get that in later this week. (Hopefully by Wed.) I've told Paul that I'd really like to have everything that will be in the final release in by Thursday, and will have a doc freeze on Friday. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Mon Oct 9 22:18:30 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) 
Date: Mon, 9 Oct 2000 17:18:30 -0400 (EDT) Subject: [XML-SIG] test cases Message-ID: <14818.13862.787442.476097@cj42289-a.reston1.va.home.com> I just commented to Jeremy that I'd like to see more test cases for the XML package. I don't know how much time there will be for that this week, but if anyone has time to create some good tests, I'd certainly be interested in getting more tests into the regression test. I've definitely not had enough time to write test cases; there are a lot I'd like to add and extend across the standard library as a whole. We won't be able to get more tests into 2.0, but we can extend the tests in PyXML and Python 2.1. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 23:31:33 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 10 Oct 2000 00:31:33 +0200 Subject: [XML-SIG] test cases In-Reply-To: <14818.13862.787442.476097@cj42289-a.reston1.va.home.com> (fdrake@beopen.com) References: <14818.13862.787442.476097@cj42289-a.reston1.va.home.com> Message-ID: <200010092231.AAA05930@loewis.home.cs.tu-berlin.de> > I just commented to Jeremy that I'd like to see more test cases for > the XML package. I don't know how much time there will be for that > this week I'll be out of town for the rest of the week, so I won't be able to do anything more... I don't feel that things are too bad - most of the problems we've seen are "border cases" (e.g. reporting errors on empty files, properly setting all attributes in minidom which is a DOM subset anyway); I'd hope that the core functionality is working. Still, volunteers for writing new test cases would certainly be welcome - even reporting things that you feel are not right would help; others might then fix the problem and submit a test case.
Regards, Martin From clarence@netlojix.com Tue Oct 10 22:44:55 2000 From: clarence@netlojix.com (Clarence Gardner) Date: Tue, 10 Oct 2000 14:44:55 -0700 Subject: [XML-SIG] Moving DOM node hierarchies Message-ID: <20001010144455.B12546@liberty.sba2.netlojix.net> I have a program in which a bunch of unrelated functions return a node hierarchy to a central function, which then wants to package them all up into a document and send it on its way. Unfortunately, I'm getting WRONG_DOCUMENT errors when I do this, because each of the trees was generated using a throw-away document. It would be obnoxious to have to create the result document first and then provide it everywhere that a subtree is created. Does this seem like a strange way to be going about things? It seemed quite reasonable to me. In fact, I don't understand the reasoning behind the only-append-in-the-creating-document restriction. Can anyone shed light on this? Thanks. -- Clarence Gardner Software Engineer NetLojix Communications clarence@netlojix.com From jsydik@BINARY.NET Wed Oct 11 00:39:17 2000 From: jsydik@BINARY.NET (Jeremy J. Sydik) Date: Tue, 10 Oct 2000 18:39:17 -0500 Subject: [XML-SIG] Moving DOM node hierarchies References: <20001010144455.B12546@liberty.sba2.netlojix.net> Message-ID: <39E3A8A5.F58BD069@BINARY.NET> I had a similar problem a while back, but that was before the XML-SIG/4DOM integration. It would be helpful to see the code that is failing or a test case that shows the same error. 
In the meantime, here are the sig archive messages related to my problems: http://www.python.org/pipermail/xml-sig/2000-March/003656.html http://www.python.org/pipermail/xml-sig/2000-March/003668.html http://www.python.org/pipermail/xml-sig/2000-April/003747.html http://www.python.org/pipermail/xml-sig/2000-April/003748.html Clarence Gardner wrote: > > I have a program in which a bunch of unrelated functions return a > node hierarchy to a central function, which then wants to package them > all up into a document and send it on its way. Unfortunately, I'm > getting WRONG_DOCUMENT errors when I do this, because each of the trees > was generated using a throw-away document. It would be obnoxious to have > to create the result document first and then provide it everywhere that > a subtree is created. > > Does this seem like a strange way to be going about things? It seemed > quite reasonable to me. In fact, I don't understand the reasoning behind > the only-append-in-the-creating-document restriction. Can anyone shed > light on this? > > Thanks. > > -- > Clarence Gardner > Software Engineer > NetLojix Communications > clarence@netlojix.com > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig From clarence@netlojix.com Wed Oct 11 02:19:53 2000 From: clarence@netlojix.com (Clarence Gardner) Date: Tue, 10 Oct 2000 18:19:53 -0700 Subject: [XML-SIG] Moving DOM node hierarchies In-Reply-To: <39E3A8A5.F58BD069@BINARY.NET> References: <20001010144455.B12546@liberty.sba2.netlojix.net> <39E3A8A5.F58BD069@BINARY.NET> Message-ID: <20001010181953.C12546@liberty.sba2.netlojix.net> Thanks. I read the references, but my posting was more oriented toward why the insertion was disallowed in the first place. Of course, the people writing the specs put a lot more thought into these things than I do, but I would think this would come up quite often and they might have rationalized it. 
I was particularly amazed, after coming across this, that DocumentFragments couldn't even be inserted. I see in the spec for Level 2 references to how default attributes are handled between the two documents, which is certainly an issue. I guess maybe this was in fact just overlooked the first time. Oh well. On Tue, Oct 10, 2000 at 06:39:17PM -0500, Jeremy J. Sydik wrote: > I had a similar problem a while back, but that was before the XML-SIG/4DOM > integration. It would be helpful to see the code that is failing or a test > case that shows the same error. In the meantime, here are the sig archive > messages related to my problems: > > http://www.python.org/pipermail/xml-sig/2000-March/003656.html > http://www.python.org/pipermail/xml-sig/2000-March/003668.html > > http://www.python.org/pipermail/xml-sig/2000-April/003747.html > http://www.python.org/pipermail/xml-sig/2000-April/003748.html > > Clarence Gardner wrote: > > > > I have a program in which a bunch of unrelated functions return a > > node hierarchy to a central function, which then wants to package them > > all up into a document and send it on its way. Unfortunately, I'm > > getting WRONG_DOCUMENT errors when I do this, because each of the trees > > was generated using a throw-away document. It would be obnoxious to have > > to create the result document first and then provide it everywhere that > > a subtree is created. > > > > Does this seem like a strange way to be going about things? It seemed > > quite reasonable to me. In fact, I don't understand the reasoning behind > > the only-append-in-the-creating-document restriction. Can anyone shed > > light on this? > > > > Thanks. 
> > > > -- > > Clarence Gardner > > Software Engineer > > NetLojix Communications > > clarence@netlojix.com > > > > _______________________________________________ > > XML-SIG maillist - XML-SIG@python.org > > http://www.python.org/mailman/listinfo/xml-sig -- Clarence Gardner Software Engineer NetLojix Communications clarence@netlojix.com From alf@logilab.com Wed Oct 11 11:38:15 2000 From: alf@logilab.com (Alexandre Fayolle) Date: Wed, 11 Oct 2000 12:38:15 +0200 (CEST) Subject: [XML-SIG] Moving DOM node hierarchies In-Reply-To: <20001010181953.C12546@liberty.sba2.netlojix.net> Message-ID: On Tue, 10 Oct 2000, Clarence Gardner wrote: > > Thanks. I read the references, but my posting was more oriented toward > why the insertion was disallowed in the first place. Of course, the > people writing the specs put a lot more thought into these things than > I do, but I would think this would come up quite often and they might > have rationalized it. I was particularly amazed, after coming across > this, that DocumentFragments couldn't even be inserted. They can be inserted. The DOM core spec says: insertBefore : Inserts the node newChild before the existing child node refChild. If refChild is null, insert newChild at the end of the list of children. If newChild is a DocumentFragment object, all of its children are inserted, in the same order, before refChild. If the newChild is already in the tree, it is first removed. DOM3 should provide facilities for moving nodes across documents. In the meantime, you have to use Document.importNode(node,deep_copy=true) before inserting the new copy in the tree. If you really want to mimic DOM2 behaviour, you also have to manually remove the original node from the first document. So this gives something like
So this gives something like def appendFromOtherDoc(node1,node2): imported = node2.ownerDocument.importNode(node1,1) node2.appendChild(imported) node1.parent.removeNode(node1) # optionnally with 4DOM, you may want to remove circular references from xml.dom.ext import ReleaseNode ReleaseNode(node1) Notice that this does not answer your question on why this is done the way it is done. I'm sometimes as baffled as you are. My biggest grudge is Why non qualified attributes do not inherit the element default namespace ? -- Alexandre Fayolle http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France). From pblanchette@pixelsystems.com Wed Oct 11 16:02:07 2000 From: pblanchette@pixelsystems.com (Patrick Blanchette) Date: Wed, 11 Oct 2000 11:02:07 -0400 Subject: [XML-SIG] Generating XML documents Message-ID: <39E480EF.5C56215E@pixelsystems.com> This is a multi-part message in MIME format. --------------3EDB20303C3E34468D29EB7A Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, I'm a newbie in xml. I want to generate new xml documents using python code. In the HOWTO doc, there is a "xml.dom.builder" class but this class did not seem to be part of the PyXML 0.6.1. Where can I found a python base class for generating xml documents? 
--------------3EDB20303C3E34468D29EB7A Content-Type: text/x-vcard; charset=us-ascii; name="pblanchette.vcf" Content-Transfer-Encoding: 7bit Content-Description: Card for Patrick Blanchette Content-Disposition: attachment; filename="pblanchette.vcf" begin:vcard n:Blanchette;Patrick x-mozilla-html:FALSE org:Pixel Systems Inc.;Display team adr:;;;;;; version:2.1 email;internet:pblanchette@pixelsystems.com title:Software developper fn:Patrick Blanchette end:vcard --------------3EDB20303C3E34468D29EB7A-- From jeremy.kloth@fourthought.com Wed Oct 11 18:40:41 2000 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Wed, 11 Oct 2000 11:40:41 -0600 Subject: [XML-SIG] Re: [4suite] 4DOM bugs References: <200010080833.KAA00815@loewis.home.cs.tu-berlin.de> Message-ID: <39E4A619.281CBE77@fourthought.com> "Martin v. Loewis" wrote: > > While porting PyXML's test_dom to 4DOM, I noticed a number of > problems, which I believe are bugs in 4DOM. Consider > > from xml.dom import implementation > > doc = implementation.createDocument(None,None,None) > > n1 = doc.createElement('n1') ; n2 = doc.createElement('n2') > pi = doc.createProcessingInstruction("Processing", "Instruction") > doc.appendChild(pi) > doc.appendChild(n1) > > #doc.appendChild(n1) # fails, but shouldn't > doc.replaceChild(n2, n1) > doc.replaceChild(pi, n2) > print doc.documentElement > > The line "doc.appendChild(n1)" raises a hierarchy exception, as n1 is > already in the tree. However, this is incorrect: it should first > remove n1, then reinsert it. > > The second fragment does not cause an exception. However, in the end, > the "documentElement" of the document is a processing > instruction. That is very strange - it should always be an element. > > I've been using the 4DOM version that is currently in the PyXML CVS. > We do remove the child first for regular elements, but apparently didn't propagate the change into the code for modifying elements in the Document.
We'll get this fixed up and checked into the PyXML CVS as soon as possible.

-- 
Jeremy Kloth Consultant
jeremy.kloth@fourthought.com (303)583-9900 x 105
Fourthought, Inc. http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From bwiegert@learningbyte.com Wed Oct 11 19:55:15 2000 From: bwiegert@learningbyte.com (Ben Wiegert) Date: Wed, 11 Oct 2000 13:55:15 -0500 Subject: [XML-SIG] Getting DOCTYPE information using SAX Message-ID: <10227D9D561DD31181E100A0C9655237012E4E11@gofastc3h.gofast.net>

I am a Python newbie. I have gotten my code to read in and parse XML using SAXLIB from PyXML. I can also manipulate what I read in and output it as XML. The only thing that I cannot seem to grab is the DOCTYPE line (or the XML header line, but I am mainly concerned with the DOCTYPE), and I need to specify my DTD in the outbound XML file. Is there an event in SAX that allows me to get that info?

Any help appreciated
Ben

From BestFriend@twcny.rr.com Wed Oct 11 13:06:39 2000 From: BestFriend@twcny.rr.com (BestFriend@twcny.rr.com) Date: Wed, 11 Oct 2000 12:06:39 Subject: [XML-SIG] WHAT CAN YOU GET FOR $20??? Message-ID: <701.490608.793281@twcny.rr.com> What can you get for $20.00? A pizza A tank of gas A haircut Lunch with a friend A parking place How About FINANCIAL INDEPENDENCE!!!! Looking for that extra something, to help your life have that little extra comfort? Do you work to cover the bills? Fed up with paying out and not receiving the rewards you wish for? Then have an open mind And read all of this, before you make a decision- it will be worth your while. _______________________________________________ Subject: MUST READ! ! ! ... TV Advertised! ! ! ... Fun-Lucrative Fellow Entrepreneur If you wish to learn about an exceptional opportunity in the Home Business arena...Read On.
"Your living is determined not so much by what life brings to you as by the attitude you bring to life; not so much by what happens to you as by the way your mind looks at what happens." This is going to be a great New Year for you! Please read all of this! EARN $100,000 PER YEAR SENDING E-MAIL!!! **************************************************************** You can earn $50,000 or more in the next 90 days sending e-mail, seem impossible? Read on for details (no, there is no 'catch')... ---------------------------------------------------------------- "AS SEEN ON NATIONAL TV" Thank you for your time and Interest. This is the letter you've been hearing about in the news lately. Due to the popularity of this letter on the internet, a major nightly news program recently devoted an entire show to the investigation of the program, described below, to see if it really can make people money. The show also investigated whether or not the program was legal. Their findings proved once and for all that there are, absolutely no laws prohibiting the participation in the program. This has helped to show people that this is a simple, harmless and fun way to make some extra money at home. The results of this show have been truly remarkable. Since so many people are participating now, those involved are doing much better than ever before. Everyone makes more as more people try it out. It is very, very exciting to be a part of this plan. You will understand once you experience it. "HERE IT IS, BELOW" ================================================ ================================================ *** Print This Now For Future Reference *** The following income opportunity is one you may be interested in taking a look at. It can be started with VERY LITTLE investment and the income return is TREMENDOUS!!! $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ If you would like to make at least $50,000 in less than 90 days! Please read the enclosed program...THEN READ IT AGAIN!!! 
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ THIS IS A LEGITIMATE, LEGAL, MONEYMAKING OPPORTUNITY. It does not require you to come into contact with people, do any hard work and best of all, you never have to leave the house except to get the mail. If you believe that someday you'll get that big break that you've been waiting for, THIS IS IT! Simply follow the instructions, and your dreams will come true. This e-mail marketing program works perfectly...100%, EVERY TIME. E-mail is the sales tool of the future. Take advantage of this non- commercialized method of advertising NOW!!! The longer you wait, the more people will be doing business using e-mail. Get your piece of this program now! MULTI-LEVEL MARKETING (MLM) has finally gained respectability. It is being taught in the Harvard Business School, both Stanford Research and the Wall Street Journal have stated that between 50% and 65% of all goods and services will be sold through multi-level methods by the late 1990's. This is a Multi-Billion Dollar industry and of the 500,000 millionaires in the U.S., 20% (100,000) made their fortune in the last few years in MLM. Moreover, statistics show 45 people become millionaires everyday through Multi-Level Marketing. You may have heard this story before, but over the summer Donald Trump made an appearance on the David Letterman Show. Dave asked him what he would do if he lost everything and had to start over from scratch. Without hesitating, Trump said he would find a good network marketing company and get to work. The audience started to hoot and boo him. He looked out at the audience and dead-panned his response - "That's why I'm sitting up here and you are all sitting out there!" With network marketing you have two sources of income. Direct commissions from sales you make yourself and commissions from sales made by people you introduce to the business. Residual income is the secret of the wealthy. 
It means investing time or money once and getting paid again and again and again. In network marketing, it also means getting paid for the work of others. The enclosed information is something I almost let slip through my fingers. Fortunately, sometime later I re-read everything and gave some thought and study to it. My name is Ellie Gilbert. Two years ago, the corporation I worked for, the past twelve years, down-sized and my position was eliminated. After many unproductive job interviews, I decided to open my own business. Over the past year, I incurred many unforeseen financial problems. I owed my family, friends and creditors over $40,000... I just couldn't seem to make ends meet. I had to refinance and borrow against my home to support my family and struggling business. AT THAT MOMENT something significant happened in my life and I am writing to share the experience in hopes that this will change your life, FINANCIALLY, FOREVER!!! In mid December, I received this program via e-mail. Six month's prior to receiving this program I had been sending away for information on various business opportunities. All of the programs I received, in my opinion, were not cost effective. They were either too difficult for me to comprehend or the initial investment was too much for me to risk to see if they would work or not. One claimed that I would make a million dollars in one year...it didn't tell me I'd have to write a best selling book to make it! But, as I was saying, in December of 1997 I received this program. I didn't send for it, or ask for it, they just got my name off a mailing list. THANK GOODNESS FOR THAT! After reading it several times, to make sure I was reading it correctly, I couldn't believe my eyes. Here was a MONEY MAKING PHENOMENON. I could invest as much as I wanted to start, without putting me further into debt. After I got a pencil and paper and figured it out, I would at least get my money back. 
But like most of you I was still a little skeptical and a little worried about the legal aspects of it all. So I checked it out with the U.S. Post Office (1-800-725-2161 24-hrs) and they confirmed that it is indeed legal! After determining the program was LEGAL and NOT A CHAIN LETTER, I decided "WHY NOT." Initially I sent out 10,000 e-mails. The great thing about e- mail is that I don't need any money for printing to send out the program, and because all of my orders are fulfilled via e-mail, the only expense is my time. I'm telling you as it is, I hope it doesn't turn you off, but I promised myself that I would not "rip-off" anyone, no matter how much money it cost me. In less than one week, I was starting to receive orders for REPORT #1. By January 13, I had received 26 orders for REPORT #1. Your goal is to "RECEIVE at least 20 ORDERS FOR REPORT #1 WITHIN 2 WEEKS. If you don't, SEND OUT MORE PROGRAMS UNTIL YOU DO!" My first step in making $50,000 in 90 days was done. By January 30, I had received 196 orders for REPORT #2. Your goal is to "RECEIVE AT LEAST 100+ ORDERS FOR REPORT #2 WITHIN 2 WEEKS. IF NOT, SEND OUT MORE PROGRAMS UNTIL YOU DO. ONCE YOU HAVE 100 ORDERS, THE REST IS EASY, RELAX, YOU WILL MAKE YOUR $50,000 GOAL." Well, I had 196 orders for REPORT #2, 96 more than I needed. So I sat back and relaxed. By March 1, of my e- mailing of 10,000, I received $58,000 with more coming in every day. I paid off ALL my debts and bought a much needed new car. Please take time to read the attached program, IT WILL CHANGE YOUR LIFE FOREVER! Remember, it won't work if you don't try it. This program does work, but you must follow it EXACTLY! Especially the rules of not trying to place your name in a different place. It won't work, you'll lose out on a lot of money! In order for this program to work, you must meet your goal of 20+ orders for REPORT #1, and 100+ orders for REPORT #2 and you will make $50,000 or more in 90 days. I AM LIVING PROOF THAT IT WORKS! 
If you choose not to participate in this program, I am sorry. It really is a great opportunity with little cost or risk to you. If you choose to participate, follow the program and you will be on your way to financial security. If you are a business owner and in financial trouble, as I was, or you want to start your own business, consider this a good luck sign. I DID! Sincerely, Ellie Gilbert P.S. Do you have any idea what $58,000 looks like piled up on a kitchen table? IT'S AWESOME! A PERSONAL NOTE FROM THE ORIGINATOR OF THIS PROGRAM: By the time you have read the enclosed program and reports you should have concluded that such a program, one that is legal, could not have been created by an amateur. Let me tell you a little about myself. I had a profitable business for 10 years. Then in 1979 my business began falling off. I was doing the same things that were previously successful for me, but it wasn't working. Finally, I figured it out. It wasn't me, it was the economy. Inflation and recession had replaced the stable economy that had been with us since 1945. I don't have to tell you what happened to the unemployment rate... because many of you know from first hand experience. There were more failures and bankruptcies than ever before. The middle class was vanishing. Those who knew what they were doing invested wisely and moved up. Those who did not, including those who never had anything to save or invest, were moving down into the ranks of the poor. As the saying goes, "THE RICH GET RICHER AND THE POOR GET POORER." The traditional methods of making money will never allow you to "move up" or "get rich". You have just received information that can give you financial freedom for the rest of your life, with "NO RISK" and "JUST A LITTLE BIT OF EFFORT." You can make more money in the next few months than you have ever imagined. I should also point out that I will not see a penny of this money, nor anyone else who has provided a testimonial for this program. 
I have already made over 4 MILLION DOLLARS! I have retired from the program after sending out over 16,000 programs. Follow the program EXACTLY AS INSTRUCTED. Do not change it in any way. It works exceedingly well as it is now. Remember to e- mail a copy of this exciting report to everyone you can think of. One of the people you send this to may send out 50,000...and your name will be on every one of them! Remember though, the more you send out the more potential customers you will reach. So my friend, I have given you the ideas, information, materials and opportunity to become financially independent, IT IS NOW UP TO YOU! "THINK ABOUT IT" Before you delete this program from your mailbox, as I almost did, take a little time to read it and REALLY THINK ABOUT IT. Get a pencil and figure out what could happen when YOU participate. Figure out the worst possible response and no matter how you calculate it, you will still make a lot of money! You will definitely get back what you invested. Any doubts you have will vanish when your first orders come in. IT WORKS! Jody Jacobs, Richmond, VA HERE'S HOW THIS AMAZING PROGRAM WILL MAKE YOU THOUSANDS OF DOLLARS INSTRUCTIONS: This method of raising capital REALLY WORKS 100 %, EVERY TIME. I am sure that you could use up to $50,000 or more in the next 90 days. Before you say "BULL... ", please read this program carefully. This is not a chain letter, but a perfectly legal money making opportunity. Basically, this is what you do: As with all multi- level businesses, we build our business by recruiting new partners and selling our products. Every state in the USA allows you to recruit new multi-level business partners, and we offer a product for EVERY dollar sent. YOUR ORDERS COME BY MAIL AND ARE FILLED BY E-MAIL, so you are not involved in personal selling. You do it privately in your own home, store or office. This is the GREATEST Multi-Level Mail Order Marketing anywhere: This is what you MUST do: 1. 
Order all 4 reports shown on the list below (you can't sell them if you don't order them). * For each report, send $5.00 (£5) CASH, the NAME & NUMBER OF THE REPORT YOU ARE ORDERING, YOUR E-MAIL ADDRESS, and YOUR NAME & RETURN ADDRESS (in case of a problem) to the person whose name appears on the list next to the report. MAKE SURE YOUR RETURN ADDRESS IS ON YOUR ENVELOPE IN CASE OF ANY MAIL PROBLEMS! * When you place your order, make sure you order each of the four reports. You will need all four reports so that you can save them on your computer and resell them. * Within a few days you will receive, via e-mail, each of the four reports. Save them on your computer so they will be accessible for you to send to the 1,000's of people who will order them from you. 2. IMPORTANT-- DO NOT alter the names of the people who are listed next to each report, or their sequence on the list, in any way other than is instructed below in steps "a" through "f" or you will lose out on the majority of your profits. Once you understand the way this works, you'll also see how it doesn't work if you change it. Remember, this method has been tested, and if you alter it, it will not work. a. Look below for the listing of available reports. b. After you've ordered the four reports, take this letter and remove the name and address under REPORT #4. This person has made it through the cycle and is no doubt counting their $50,000! c. Move the name and address under REPORT #3 down to REPORT #4. d. Move the name and address under REPORT #2 down to REPORT #3. e. Move the name and address under REPORT #1 down to REPORT #2. f. Insert your name/address in the REPORT #1 position. Please make sure you copy every name and address ACCURATELY! 3. Take this entire letter, including the modified list of names, and save it to your computer. Make NO changes to the instruction portion of this letter. 4. Now you're ready to start an advertising campaign on the WORLD WIDE WEB! 
SEND OUT THIS LETTER (with your name added) TO AS MANY PEOPLE AS YOU CAN, EVEN FRIENDS AND FAMILY. Advertising on the WEB can be very, very inexpensive, and there are HUNDREDS of FREE places to advertise. Another avenue which you could use for advertising is e-mail lists. You can buy these lists for under $20/20,000 addresses or you can pay someone to take care of it for you. BE SURE TO START YOUR AD CAMPAIGN IMMEDIATELY! 5. For every $5.00(£5) you receive, all you must do is e-mail them the report they ordered. THAT'S IT! ALWAYS PROVIDE SAME-DAY SERVICE ON ALL ORDERS! This will help guarantee that the e-mail THEY send out, with YOUR name and address on it, will be prompt because they can't advertise until they receive the report! To grow fast be prompt and courteous. ------------------------------------------ AVAILABLE REPORTS ------------------------------------------ ***Order Each REPORT by NUMBER and NAME*** Notes: * - ALWAYS SEND $5(£5) CASH FOR EACH REPORT * - ALWAYS SEND YOUR ORDER VIA THE QUICKEST DELIVERY * - Make sure the cash is concealed by wrapping it in at least two sheets of paper * - On one of those sheets of paper, include: (a) the number & name of the report you are ordering, (b) your e-mail address, and (c) your postal address. ___________________________________________________________ REPORT #1 "HOW TO MAKE $250,000 THROUGH MULTI-LEVEL SALES" ORDER REPORT #1 FROM: K. 
Winchell (will accept your currency) PO Box 283 Sandy Creek, NY USA 13145 _______________________________________________________ REPORT #2 "MAJOR CORPORATIONS AND MULTI-LEVEL SALES" ORDER REPORT #2 FROM: E.Mills (will accept your currency) PO Box 2 Mowbray Heights Launceston,Tasmania Australia 7248 ________________________________________________ REPORT #3 "SOURCES FOR THE BEST MAILING LISTS" Jim Wright 38 Pentyla Baglan Rd Port Talbot West Glamorgan SA12 8AA Wales UK ________________________________________________ REPORT #4 "EVALUATING MULTI-LEVEL SALES PLANS" ORDER REPORT #4 FROM: Conrad Fry 1 Avon Gardens West Bridgford Nottingham England NG2 6BP ---------------------------------------------------------------- ----- HERE'S HOW THIS AMAZING PLAN WILL MAKE YOU $MONEY$ ---------------------------------------------------------------- ----- Let's say you decide to start small just to see how well it works. Assume your goal is to get 10 people to participate on your first level. (Placing a lot of FREE ads on the Internet will EASILY get a larger response.) Also assume that everyone else in YOUR ORGANIZATION gets ONLY 10 downline members. Follow this example to achieve the STAGGERING results below. 1st level--your 10 members with $5.......................$50 2nd level--10 members from those 10 ($5 x 100)........$500 3rd level--10 members from those 100 ($5 x 1,000)...$5,000 4th level--10 members from those 1,000 ($5x10,000).$50,000 THIS TOTALS ------ $55,550 Remember, this assumes that the people who participate only recruit 10 people each. Think for a moment what would happen if they got 20 people to participate! Lots of people get 100s of participants! THINK ABOUT IT! Your cost to participate in this is practically nothing (surely you can afford $20). You obviously already have an Internet connection and e-mail is FREE! REPORT #3 shows you the most productive methods for bulk e-mailing and purchasing e-mail lists. Some list & bulk e-mail vendors even work on trade! 
Over 50,000, new people, get on the Internet EVERYDAY (CBS NEWS)! *******TIPS FOR SUCCESS******* * TREAT THIS AS YOUR BUSINESS! Be prompt, professional, and follow the directions accurately. * Send for the four reports IMMEDIATELY so you will have them when the orders start coming in because: When you receive a $5 order, you MUST send out the requested product (report) to comply with the U.S. Postal & Lottery Laws, Title 18, Sections 1302 and 1341 or Title 18, Section 3005 in the U.S. Code, also Code of Federal Regs. vol. 16, Sections 255 and 436, which state that "a product or service must be exchanged for money received." * ALWAYS PROVIDE SAME-DAY SERVICE ON THE ORDERS YOU RECEIVE. * Be patient and persistent with this program. If you follow the instructions exactly, the results WILL undoubtedly be SUCCESSFUL! * ABOVE ALL, HAVE FAITH IN YOURSELF AND KNOW YOU WILL SUCCEED! *******YOUR SUCCESS GUIDELINE******* Follow these guidelines to help assure your success: If you don't receive 10 to 20 orders for REPORT #1 within two weeks, continue advertising until you do. Then, a couple of weeks later you should receive at least 100 orders for REPORT #2. If you don't, continue advertising until you do. Once you have received 100 or more orders for REPORT #2, YOU CAN RELAX, because the system is already working for you, and the cash can continue to roll in! THIS IS IMPORTANT TO REMEMBER: Every time your name is moved down on the list, you are placed in front of a DIFFERENT report. You can KEEP TRACK of your PROGRESS by watching which report people are ordering from you. If you want to generate more income, send another batch of e- mails and start the whole process again! There is no limit to the income you will generate from this business! 
PLEASE NOTE: If you need help with starting a business, registering a business name, learning how income tax is handled, etc., contact your local office of the Small Business Administration (a Federal agency) 1-(800)827-5722 for free help and answers to questions. Also, the Internal Revenue Service offers free help via telephone and free seminars about business tax requirements. Your earnings and results are highly dependent on your activities and advertising. This letter constitutes no guarantees stated nor implied. In the event that it is determined that this letter constitutes a guarantee of any kind, that guarantee is now void. Any testimonials or amounts of earnings listed in this letter may be factual or fictitious. If you have any question of the legality of this letter contact the Office of Associate Director for Marketing Practices Federal Trade Commission Bureau of Consumer Protection in Washington DC. *******T E S T I M O N I A L S******* This program does work, but you must follow it EXACTLY! Especially the rule of not trying to place your name in a different position, it won't work and you'll lose a lot of potential income. I'm living proof that it works. It really is a great opportunity to make relatively easy money, with little cost to you. If you do choose to participate, follow the program exactly, and you'll be on your way to financial security. Sean McLaughlin, Jackson, MS My name is Frank. My wife, Doris, and I live in Bel-Air, MD. I am a cost accountant with a major U.S. Corporation and I make pretty good money. When I received the program I grumbled to Doris about receiving "junk mail." I made fun of the whole thing, spouting my knowledge of the population and percentages involved. I "knew" it wouldn't work. Doris totally ignored my supposed intelligence and jumped in with both feet. I made merciless fun of her, and was ready to lay the old "I told you so" on her when the thing didn't work... well, the laugh was on me! 
Within two weeks she had received over 50 responses. Within 45 days she had received over $147,200 in $5 bills! I was shocked! I was sure that I had it all figured and that it wouldn't work. I AM a believer now. I have joined Doris in her "hobby." I did have seven more years until retirement, but I think of the "rat race" and it's not for me. We owe it all to MLM. Frank T., Bel-Air, MD I just want to pass along my best wishes and encouragement to you. Any doubts you have will vanish when your first orders come in. I even checked with the U.S. Post Office to verify that the plan was legal. It definitely is! IT WORKS! Paul Johnson, Raleigh, NC The main reason for this letter is to convince you that this system is honest, lawful, extremely profitable, and is a way to get a large amount of money in a short time. I was approached several times before I checked this out. I joined just to see what one could expect in return for the minimal effort and money required. To my astonishment, I received $36,470.00 in the first 14 weeks, with money still coming in. Phillip A. Brown, Esq. Not being the gambling type, it took me several weeks to make up my mind to participate in this plan. But conservative that I am, I decided that the initial investment was so little that there was just no way that I wouldn't get enough orders to at least get my money back. Boy, was I surprised when I found my medium- size post office box crammed with orders! For a while, it got so overloaded that I had to start picking up my mail at the window. I'll make more money this year than any 10 years of my life before. The nice thing about this plan is that it doesn't matter where in the U.S. people live. There simply isn't a better investment with a faster return. Mary Rockland, Lansing, MI I had received this program before. I deleted it, but later I wondered if I shouldn't have given it a try. 
Of course, I had no idea who to contact to get another copy, so I had to wait until I was e-mailed another program...11 months passed then it came...I didn't delete this one!...I made more than $41,000 on the first try!! D. Wilburn, Muncie, IN This is my third time to participate in this plan. We have quit our jobs, and will soon buy a home on the beach and live off the interest on our money. The only way on earth that this plan will work for you is if you do it. For your sake, and for your family's sake don't pass up this golden opportunity. Good luck and happy spending! Charles Fairchild, Spokane, WA ORDER YOUR REPORTS TODAY AND GET STARTED ON YOUR ROAD TO FINANCIAL FREEDOM! NOW IS THE HOUR! DECISIVE ACTION YIELDS POWERFUL RESULTS ! ********************************************************* Your request to be removed will be processed within 24 hours. DISCLAIMER: Under Bill s.1618 TITLE III passed by the 105th US Congress this letter Cannot be considered Spam as long as the sender includes contact information & a method of removal.To be removed from future mailings just reply with REMOVE in the subject line.Thank you for your kind consideration. From larsga@garshol.priv.no Wed Oct 11 23:20:33 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Oct 2000 00:20:33 +0200 Subject: [XML-SIG] Getting DOCTYPE information using SAX In-Reply-To: <10227D9D561DD31181E100A0C9655237012E4E11@gofastc3h.gofast.net> References: <10227D9D561DD31181E100A0C9655237012E4E11@gofastc3h.gofast.net> Message-ID: * Ben Wiegert | | I am a Python newbie. I have gotten my code to read in and parse | XML using SAXLIB from PyXML. I can also manipulate what I read in | and output it to XML. The only thing that I can not seem to grab is | the DOCTYPE line (or the XML header Line, but I am mainly concerned | with the DOCTYPE). I need to specify my DTD in the outbound XML | file? Is there an event in SAX that allows me to get that info? 
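As a present-day aside on the LexicalHandler extension: the expat-based SAX driver in today's Python standard library (`xml.sax.make_parser()`), not the 2000-era PyXML or xmlproc drivers, accepts a lexical handler through the `property_lexical_handler` property and reports the DOCTYPE through `startDTD(name, public_id, system_id)`. The sketch below assumes that driver; `DoctypeRecorder` and `read_doctype` are hypothetical names. The recorder defines all five lexical-handler methods because the expat driver also wires up the comment and CDATA callbacks, and it turns off external general entities so the DTD file itself is never fetched.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             property_lexical_handler)


class DoctypeRecorder:
    """Hypothetical lexical handler that only remembers the DOCTYPE."""

    def __init__(self):
        self.doctype = None  # (name, public_id, system_id) once seen

    def startDTD(self, name, public_id, system_id):
        self.doctype = (name, public_id, system_id)

    # The remaining lexical callbacks are required but ignored here.
    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def comment(self, content):
        pass


def read_doctype(xml_bytes):
    """Return (name, public_id, system_id) of the DOCTYPE, or None."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    parser.setFeature(feature_external_ges, False)  # do not load the external DTD
    recorder = DoctypeRecorder()
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.doctype


if __name__ == "__main__":
    sample = b'<?xml version="1.0"?>\n<!DOCTYPE note SYSTEM "note.dtd">\n<note>hi</note>'
    print(read_doctype(sample))  # ('note', None, 'note.dtd')
```

The recorded system identifier (and public identifier, when present) is what would be written back into a `<!DOCTYPE ...>` line of the outbound file.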
In SAX 2.0 as it exists in Python 2.0 and PyXML there is not. There is an extension handler (in SAX 2.0 ext) known as LexicalHandler that does have an event for this. xmlproc will support this, once Python 2.0 is out the door and I have time to sit down and write a SAX driver for it. (There is one written already, but it's for an older form of SAX.) --Lars M. From BestFriend@twcny.rr.com Wed Oct 11 18:56:16 2000 From: BestFriend@twcny.rr.com (BestFriend@twcny.rr.com) Date: Wed, 11 Oct 2000 17:56:16 Subject: [XML-SIG] WHAT CAN YOU GET FOR $20??? Message-ID: <316.413246.418392@twcny.rr.com> What can you get for $20.00? A pizza A tank of gas A haircut Lunch with a friend A parking place How About FINANCIAL INDEPENDENCE!!!! Looking for that extra something, to help your life have that little extra comfort? Do you work to cover the bills? Fed up with paying out and not receiving the rewards you wish for? Then have an open mind And read all of this, before you make a decision- it will be worth your while. _______________________________________________ Subject: MUST READ! ! ! ... TV Advertised! ! ! ... Fun-Lucrative Fellow Entrepreneur If you wish to learn about an exceptional opportunity in the Home Business arena...Read On. "Your living is determined not so much by what life brings to you as by the attitude you bring to life; not so much by what happens to you as by the way your mind looks at what happens." This is going to be a great New Year for you! Please read all of this! EARN $100,000 PER YEAR SENDING E-MAIL!!! **************************************************************** You can earn $50,000 or more in the next 90 days sending e-mail, seem impossible? Read on for details (no, there is no 'catch')... ---------------------------------------------------------------- "AS SEEN ON NATIONAL TV" Thank you for your time and Interest. This is the letter you've been hearing about in the news lately. 
Due to the popularity of this letter on the internet, a major nightly news program recently devoted an entire show to the investigation of the program, described below, to see if it really can make people money. The show also investigated whether or not the program was legal. Their findings proved once and for all that there are, absolutely no laws prohibiting the participation in the program. This has helped to show people that this is a simple, harmless and fun way to make some extra money at home. The results of this show have been truly remarkable. Since so many people are participating now, those involved are doing much better than ever before. Everyone makes more as more people try it out. It is very, very exciting to be a part of this plan. You will understand once you experience it. "HERE IT IS, BELOW" ================================================ ================================================ *** Print This Now For Future Reference *** The following income opportunity is one you may be interested in taking a look at. It can be started with VERY LITTLE investment and the income return is TREMENDOUS!!! $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ If you would like to make at least $50,000 in less than 90 days! Please read the enclosed program...THEN READ IT AGAIN!!! $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ THIS IS A LEGITIMATE, LEGAL, MONEYMAKING OPPORTUNITY. It does not require you to come into contact with people, do any hard work and best of all, you never have to leave the house except to get the mail. If you believe that someday you'll get that big break that you've been waiting for, THIS IS IT! Simply follow the instructions, and your dreams will come true. This e-mail marketing program works perfectly...100%, EVERY TIME. E-mail is the sales tool of the future. Take advantage of this non- commercialized method of advertising NOW!!! The longer you wait, the more people will be doing business using e-mail. 
Get your piece of this program now! MULTI-LEVEL MARKETING (MLM) has finally gained respectability. It is being taught in the Harvard Business School, both Stanford Research and the Wall Street Journal have stated that between 50% and 65% of all goods and services will be sold through multi-level methods by the late 1990's. This is a Multi-Billion Dollar industry and of the 500,000 millionaires in the U.S., 20% (100,000) made their fortune in the last few years in MLM. Moreover, statistics show 45 people become millionaires everyday through Multi-Level Marketing. You may have heard this story before, but over the summer Donald Trump made an appearance on the David Letterman Show. Dave asked him what he would do if he lost everything and had to start over from scratch. Without hesitating, Trump said he would find a good network marketing company and get to work. The audience started to hoot and boo him. He looked out at the audience and dead-panned his response - "That's why I'm sitting up here and you are all sitting out there!" With network marketing you have two sources of income. Direct commissions from sales you make yourself and commissions from sales made by people you introduce to the business. Residual income is the secret of the wealthy. It means investing time or money once and getting paid again and again and again. In network marketing, it also means getting paid for the work of others. The enclosed information is something I almost let slip through my fingers. Fortunately, sometime later I re-read everything and gave some thought and study to it. My name is Ellie Gilbert. Two years ago, the corporation I worked for, the past twelve years, down-sized and my position was eliminated. After many unproductive job interviews, I decided to open my own business. Over the past year, I incurred many unforeseen financial problems. I owed my family, friends and creditors over $40,000... I just couldn't seem to make ends meet. 
I had to refinance and borrow against my home to support my family and struggling business. AT THAT MOMENT something significant happened in my life and I am writing to share the experience in hopes that this will change your life, FINANCIALLY, FOREVER!!! In mid December, I received this program via e-mail. Six month's prior to receiving this program I had been sending away for information on various business opportunities. All of the programs I received, in my opinion, were not cost effective. They were either too difficult for me to comprehend or the initial investment was too much for me to risk to see if they would work or not. One claimed that I would make a million dollars in one year...it didn't tell me I'd have to write a best selling book to make it! But, as I was saying, in December of 1997 I received this program. I didn't send for it, or ask for it, they just got my name off a mailing list. THANK GOODNESS FOR THAT! After reading it several times, to make sure I was reading it correctly, I couldn't believe my eyes. Here was a MONEY MAKING PHENOMENON. I could invest as much as I wanted to start, without putting me further into debt. After I got a pencil and paper and figured it out, I would at least get my money back. But like most of you I was still a little skeptical and a little worried about the legal aspects of it all. So I checked it out with the U.S. Post Office (1-800-725-2161 24-hrs) and they confirmed that it is indeed legal! After determining the program was LEGAL and NOT A CHAIN LETTER, I decided "WHY NOT." Initially I sent out 10,000 e-mails. The great thing about e- mail is that I don't need any money for printing to send out the program, and because all of my orders are fulfilled via e-mail, the only expense is my time. I'm telling you as it is, I hope it doesn't turn you off, but I promised myself that I would not "rip-off" anyone, no matter how much money it cost me. In less than one week, I was starting to receive orders for REPORT #1. 
By January 13, I had received 26 orders for REPORT #1. Your goal is to "RECEIVE at least 20 ORDERS FOR REPORT #1 WITHIN 2 WEEKS. If you don't, SEND OUT MORE PROGRAMS UNTIL YOU DO!" My first step in making $50,000 in 90 days was done. By January 30, I had received 196 orders for REPORT #2. Your goal is to "RECEIVE AT LEAST 100+ ORDERS FOR REPORT #2 WITHIN 2 WEEKS. IF NOT, SEND OUT MORE PROGRAMS UNTIL YOU DO. ONCE YOU HAVE 100 ORDERS, THE REST IS EASY, RELAX, YOU WILL MAKE YOUR $50,000 GOAL." Well, I had 196 orders for REPORT #2, 96 more than I needed. So I sat back and relaxed. By March 1, of my e- mailing of 10,000, I received $58,000 with more coming in every day. I paid off ALL my debts and bought a much needed new car. Please take time to read the attached program, IT WILL CHANGE YOUR LIFE FOREVER! Remember, it won't work if you don't try it. This program does work, but you must follow it EXACTLY! Especially the rules of not trying to place your name in a different place. It won't work, you'll lose out on a lot of money! In order for this program to work, you must meet your goal of 20+ orders for REPORT #1, and 100+ orders for REPORT #2 and you will make $50,000 or more in 90 days. I AM LIVING PROOF THAT IT WORKS! If you choose not to participate in this program, I am sorry. It really is a great opportunity with little cost or risk to you. If you choose to participate, follow the program and you will be on your way to financial security. If you are a business owner and in financial trouble, as I was, or you want to start your own business, consider this a good luck sign. I DID! Sincerely, Ellie Gilbert P.S. Do you have any idea what $58,000 looks like piled up on a kitchen table? IT'S AWESOME! A PERSONAL NOTE FROM THE ORIGINATOR OF THIS PROGRAM: By the time you have read the enclosed program and reports you should have concluded that such a program, one that is legal, could not have been created by an amateur. Let me tell you a little about myself. 
I had a profitable business for 10 years. Then in 1979 my business began falling off. I was doing the same things that were previously successful for me, but it wasn't working. Finally, I figured it out. It wasn't me, it was the economy. Inflation and recession had replaced the stable economy that had been with us since 1945. I don't have to tell you what happened to the unemployment rate... because many of you know from first hand experience. There were more failures and bankruptcies than ever before. The middle class was vanishing. Those who knew what they were doing invested wisely and moved up. Those who did not, including those who never had anything to save or invest, were moving down into the ranks of the poor. As the saying goes, "THE RICH GET RICHER AND THE POOR GET POORER." The traditional methods of making money will never allow you to "move up" or "get rich". You have just received information that can give you financial freedom for the rest of your life, with "NO RISK" and "JUST A LITTLE BIT OF EFFORT." You can make more money in the next few months than you have ever imagined. I should also point out that I will not see a penny of this money, nor anyone else who has provided a testimonial for this program. I have already made over 4 MILLION DOLLARS! I have retired from the program after sending out over 16,000 programs. Follow the program EXACTLY AS INSTRUCTED. Do not change it in any way. It works exceedingly well as it is now. Remember to e- mail a copy of this exciting report to everyone you can think of. One of the people you send this to may send out 50,000...and your name will be on every one of them! Remember though, the more you send out the more potential customers you will reach. So my friend, I have given you the ideas, information, materials and opportunity to become financially independent, IT IS NOW UP TO YOU! 
"THINK ABOUT IT" Before you delete this program from your mailbox, as I almost did, take a little time to read it and REALLY THINK ABOUT IT. Get a pencil and figure out what could happen when YOU participate. Figure out the worst possible response and no matter how you calculate it, you will still make a lot of money! You will definitely get back what you invested. Any doubts you have will vanish when your first orders come in. IT WORKS! Jody Jacobs, Richmond, VA HERE'S HOW THIS AMAZING PROGRAM WILL MAKE YOU THOUSANDS OF DOLLARS INSTRUCTIONS: This method of raising capital REALLY WORKS 100 %, EVERY TIME. I am sure that you could use up to $50,000 or more in the next 90 days. Before you say "BULL... ", please read this program carefully. This is not a chain letter, but a perfectly legal money making opportunity. Basically, this is what you do: As with all multi- level businesses, we build our business by recruiting new partners and selling our products. Every state in the USA allows you to recruit new multi-level business partners, and we offer a product for EVERY dollar sent. YOUR ORDERS COME BY MAIL AND ARE FILLED BY E-MAIL, so you are not involved in personal selling. You do it privately in your own home, store or office. This is the GREATEST Multi-Level Mail Order Marketing anywhere: This is what you MUST do: 1. Order all 4 reports shown on the list below (you can't sell them if you don't order them). * For each report, send $5.00 (£5) CASH, the NAME & NUMBER OF THE REPORT YOU ARE ORDERING, YOUR E-MAIL ADDRESS, and YOUR NAME & RETURN ADDRESS (in case of a problem) to the person whose name appears on the list next to the report. MAKE SURE YOUR RETURN ADDRESS IS ON YOUR ENVELOPE IN CASE OF ANY MAIL PROBLEMS! * When you place your order, make sure you order each of the four reports. You will need all four reports so that you can save them on your computer and resell them. * Within a few days you will receive, via e-mail, each of the four reports. 
Save them on your computer so they will be accessible for you to send to the 1,000's of people who will order them from you. 2. IMPORTANT-- DO NOT alter the names of the people who are listed next to each report, or their sequence on the list, in any way other than is instructed below in steps "a" through "f" or you will lose out on the majority of your profits. Once you understand the way this works, you'll also see how it doesn't work if you change it. Remember, this method has been tested, and if you alter it, it will not work. a. Look below for the listing of available reports. b. After you've ordered the four reports, take this letter and remove the name and address under REPORT #4. This person has made it through the cycle and is no doubt counting their $50,000! c. Move the name and address under REPORT #3 down to REPORT #4. d. Move the name and address under REPORT #2 down to REPORT #3. e. Move the name and address under REPORT #1 down to REPORT #2. f. Insert your name/address in the REPORT #1 position. Please make sure you copy every name and address ACCURATELY! 3. Take this entire letter, including the modified list of names, and save it to your computer. Make NO changes to the instruction portion of this letter. 4. Now you're ready to start an advertising campaign on the WORLD WIDE WEB! SEND OUT THIS LETTER (with your name added) TO AS MANY PEOPLE AS YOU CAN, EVEN FRIENDS AND FAMILY. Advertising on the WEB can be very, very inexpensive, and there are HUNDREDS of FREE places to advertise. Another avenue which you could use for advertising is e-mail lists. You can buy these lists for under $20/20,000 addresses or you can pay someone to take care of it for you. BE SURE TO START YOUR AD CAMPAIGN IMMEDIATELY! 5. For every $5.00(£5) you receive, all you must do is e-mail them the report they ordered. THAT'S IT! ALWAYS PROVIDE SAME-DAY SERVICE ON ALL ORDERS! 
This will help guarantee that the e-mail THEY send out, with YOUR name and address on it, will be prompt because they can't advertise until they receive the report! To grow fast be prompt and courteous. ------------------------------------------ AVAILABLE REPORTS ------------------------------------------ ***Order Each REPORT by NUMBER and NAME*** Notes: * - ALWAYS SEND $5(£5) CASH FOR EACH REPORT * - ALWAYS SEND YOUR ORDER VIA THE QUICKEST DELIVERY * - Make sure the cash is concealed by wrapping it in at least two sheets of paper * - On one of those sheets of paper, include: (a) the number & name of the report you are ordering, (b) your e-mail address, and (c) your postal address. ___________________________________________________________ REPORT #1 "HOW TO MAKE $250,000 THROUGH MULTI-LEVEL SALES" ORDER REPORT #1 FROM: K. Winchell (will accept your currency) PO Box 283 Sandy Creek, NY USA 13145 _______________________________________________________ REPORT #2 "MAJOR CORPORATIONS AND MULTI-LEVEL SALES" ORDER REPORT #2 FROM: E.Mills (will accept your currency) PO Box 2 Mowbray Heights Launceston,Tasmania Australia 7248 ________________________________________________ REPORT #3 "SOURCES FOR THE BEST MAILING LISTS" Jim Wright 38 Pentyla Baglan Rd Port Talbot West Glamorgan SA12 8AA Wales UK ________________________________________________ REPORT #4 "EVALUATING MULTI-LEVEL SALES PLANS" ORDER REPORT #4 FROM: Conrad Fry 1 Avon Gardens West Bridgford Nottingham England NG2 6BP ---------------------------------------------------------------- ----- HERE'S HOW THIS AMAZING PLAN WILL MAKE YOU $MONEY$ ---------------------------------------------------------------- ----- Let's say you decide to start small just to see how well it works. Assume your goal is to get 10 people to participate on your first level. (Placing a lot of FREE ads on the Internet will EASILY get a larger response.) Also assume that everyone else in YOUR ORGANIZATION gets ONLY 10 downline members. 
Follow this example to achieve the STAGGERING results below. 1st level--your 10 members with $5.......................$50 2nd level--10 members from those 10 ($5 x 100)........$500 3rd level--10 members from those 100 ($5 x 1,000)...$5,000 4th level--10 members from those 1,000 ($5x10,000).$50,000 THIS TOTALS ------ $55,550 Remember, this assumes that the people who participate only recruit 10 people each. Think for a moment what would happen if they got 20 people to participate! Lots of people get 100s of participants! THINK ABOUT IT! Your cost to participate in this is practically nothing (surely you can afford $20). You obviously already have an Internet connection and e-mail is FREE! REPORT #3 shows you the most productive methods for bulk e-mailing and purchasing e-mail lists. Some list & bulk e-mail vendors even work on trade! Over 50,000, new people, get on the Internet EVERYDAY (CBS NEWS)! *******TIPS FOR SUCCESS******* * TREAT THIS AS YOUR BUSINESS! Be prompt, professional, and follow the directions accurately. * Send for the four reports IMMEDIATELY so you will have them when the orders start coming in because: When you receive a $5 order, you MUST send out the requested product (report) to comply with the U.S. Postal & Lottery Laws, Title 18, Sections 1302 and 1341 or Title 18, Section 3005 in the U.S. Code, also Code of Federal Regs. vol. 16, Sections 255 and 436, which state that "a product or service must be exchanged for money received." * ALWAYS PROVIDE SAME-DAY SERVICE ON THE ORDERS YOU RECEIVE. * Be patient and persistent with this program. If you follow the instructions exactly, the results WILL undoubtedly be SUCCESSFUL! * ABOVE ALL, HAVE FAITH IN YOURSELF AND KNOW YOU WILL SUCCEED! *******YOUR SUCCESS GUIDELINE******* Follow these guidelines to help assure your success: If you don't receive 10 to 20 orders for REPORT #1 within two weeks, continue advertising until you do. 
Then, a couple of weeks later you should receive at least 100 orders for REPORT #2. If you don't, continue advertising until you do. Once you have received 100 or more orders for REPORT #2, YOU CAN RELAX, because the system is already working for you, and the cash can continue to roll in! THIS IS IMPORTANT TO REMEMBER: Every time your name is moved down on the list, you are placed in front of a DIFFERENT report. You can KEEP TRACK of your PROGRESS by watching which report people are ordering from you. If you want to generate more income, send another batch of e- mails and start the whole process again! There is no limit to the income you will generate from this business! PLEASE NOTE: If you need help with starting a business, registering a business name, learning how income tax is handled, etc., contact your local office of the Small Business Administration (a Federal agency) 1-(800)827-5722 for free help and answers to questions. Also, the Internal Revenue Service offers free help via telephone and free seminars about business tax requirements. Your earnings and results are highly dependent on your activities and advertising. This letter constitutes no guarantees stated nor implied. In the event that it is determined that this letter constitutes a guarantee of any kind, that guarantee is now void. Any testimonials or amounts of earnings listed in this letter may be factual or fictitious. If you have any question of the legality of this letter contact the Office of Associate Director for Marketing Practices Federal Trade Commission Bureau of Consumer Protection in Washington DC. *******T E S T I M O N I A L S******* This program does work, but you must follow it EXACTLY! Especially the rule of not trying to place your name in a different position, it won't work and you'll lose a lot of potential income. I'm living proof that it works. It really is a great opportunity to make relatively easy money, with little cost to you. 
If you do choose to participate, follow the program exactly, and you'll be on your way to financial security. Sean McLaughlin, Jackson, MS My name is Frank. My wife, Doris, and I live in Bel-Air, MD. I am a cost accountant with a major U.S. Corporation and I make pretty good money. When I received the program I grumbled to Doris about receiving "junk mail." I made fun of the whole thing, spouting my knowledge of the population and percentages involved. I "knew" it wouldn't work. Doris totally ignored my supposed intelligence and jumped in with both feet. I made merciless fun of her, and was ready to lay the old "I told you so" on her when the thing didn't work... well, the laugh was on me! Within two weeks she had received over 50 responses. Within 45 days she had received over $147,200 in $5 bills! I was shocked! I was sure that I had it all figured and that it wouldn't work. I AM a believer now. I have joined Doris in her "hobby." I did have seven more years until retirement, but I think of the "rat race" and it's not for me. We owe it all to MLM. Frank T., Bel-Air, MD I just want to pass along my best wishes and encouragement to you. Any doubts you have will vanish when your first orders come in. I even checked with the U.S. Post Office to verify that the plan was legal. It definitely is! IT WORKS! Paul Johnson, Raleigh, NC The main reason for this letter is to convince you that this system is honest, lawful, extremely profitable, and is a way to get a large amount of money in a short time. I was approached several times before I checked this out. I joined just to see what one could expect in return for the minimal effort and money required. To my astonishment, I received $36,470.00 in the first 14 weeks, with money still coming in. Phillip A. Brown, Esq. Not being the gambling type, it took me several weeks to make up my mind to participate in this plan. 
But conservative that I am, I decided that the initial investment was so little that there was just no way that I wouldn't get enough orders to at least get my money back. Boy, was I surprised when I found my medium- size post office box crammed with orders! For a while, it got so overloaded that I had to start picking up my mail at the window. I'll make more money this year than any 10 years of my life before. The nice thing about this plan is that it doesn't matter where in the U.S. people live. There simply isn't a better investment with a faster return. Mary Rockland, Lansing, MI I had received this program before. I deleted it, but later I wondered if I shouldn't have given it a try. Of course, I had no idea who to contact to get another copy, so I had to wait until I was e-mailed another program...11 months passed then it came...I didn't delete this one!...I made more than $41,000 on the first try!! D. Wilburn, Muncie, IN This is my third time to participate in this plan. We have quit our jobs, and will soon buy a home on the beach and live off the interest on our money. The only way on earth that this plan will work for you is if you do it. For your sake, and for your family's sake don't pass up this golden opportunity. Good luck and happy spending! Charles Fairchild, Spokane, WA ORDER YOUR REPORTS TODAY AND GET STARTED ON YOUR ROAD TO FINANCIAL FREEDOM! NOW IS THE HOUR! DECISIVE ACTION YIELDS POWERFUL RESULTS ! ********************************************************* Your request to be removed will be processed within 24 hours. DISCLAIMER: Under Bill s.1618 TITLE III passed by the 105th US Congress this letter Cannot be considered Spam as long as the sender includes contact information & a method of removal.To be removed from future mailings just reply with REMOVE in the subject line.Thank you for your kind consideration. From akuchlin@mems-exchange.org Thu Oct 12 03:46:15 2000 From: akuchlin@mems-exchange.org (A.M. 
Kuchling) Date: Wed, 11 Oct 2000 22:46:15 -0400 Subject: [XML-SIG] What's New section on XML Message-ID: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>

Here's draft text for a section that briefly discusses the new XML support in Python 2.0. Criticisms and comments, please... --amk

13 XML Modules

Python 1.5.2 included a simple XML parser in the form of the xmllib module, contributed by Sjoerd Mullender. Since 1.5.2's release, two different interfaces for processing XML have become common: SAX2 (version 2 of the Simple API for XML) provides an event-driven interface with some similarities to xmllib, and the DOM (Document Object Model) provides a tree-based interface, transforming an XML document into a tree of nodes that can be traversed and modified. Python 2.0 includes a SAX2 interface and a stripped-down DOM interface as part of the xml package. Here we will give a brief overview of these new interfaces; consult the Python documentation or the source code for complete details. The Python XML SIG is also working on improved documentation.

13.1 SAX2 Support

SAX defines an event-driven interface for parsing XML. To use SAX, you must write a SAX handler class. Handler classes inherit from various classes provided by SAX, and override various methods that will then be called by the XML parser. For example, the startElement() and endElement() methods are called for every start and end tag encountered by the parser, the characters() method is called for every chunk of character data, and so forth.

The advantage of the event-driven approach is that the whole document doesn't have to be resident in memory at any one time, which matters if you are processing really huge documents. However, writing the SAX handler class can get very complicated if you're trying to modify the document structure in some elaborate way.
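The order in which the parser calls these methods is easier to see with a handler that records each event instead of printing it. This is a minimal sketch written for current Python 3 (print() is a function there, unlike in the Python 2.0 examples in this draft). The class name EventRecorder and the sample document are made up for illustration, and it parses an in-memory string with xml.sax.parseString() so no hamlet.xml file is needed.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records SAX callbacks in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is a read-only mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A single run of text may arrive in several chunks,
        # so each chunk is recorded separately here.
        self.events.append(("chars", content))


# Made-up sample document; any well-formed XML would do.
SAMPLE = b'<PLAY><PERSONA role="king">CLAUDIUS</PERSONA></PLAY>'

handler = EventRecorder()
xml.sax.parseString(SAMPLE, handler)
for event in handler.events:
    print(event)
```

Because characters() may be called more than once for the same text, a handler that needs an element's full text should join the chunks itself rather than assume a single call.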
For example, the following little program defines a handler that prints a message for every starting and ending tag, and then parses the file hamlet.xml using it:

    from xml import sax

    class SimpleHandler(sax.ContentHandler):
        def startElement(self, name, attrs):
            print 'Start of element:', name, attrs.keys()

        def endElement(self, name):
            print 'End of element:', name

    # Create a parser object
    parser = sax.make_parser()

    # Tell it what handler to use
    handler = SimpleHandler()
    parser.setContentHandler( handler )

    # Parse a file!
    parser.parse( 'hamlet.xml' )

For more information, consult the Python documentation, or the XML HOWTO at http://www.python.org/doc/howto/xml/.

13.2 DOM Support

The Document Object Model is a tree-based representation for an XML document. A top-level Document instance is the root of the tree, and has a single child which is the top-level Element instance. This Element has child nodes representing character data and any sub-elements, which may have further children of their own, and so forth. Using the DOM you can traverse the resulting tree any way you like, access element and attribute values, insert and delete nodes, and convert the tree back into XML.

The DOM is useful for modifying XML documents, because you can create a DOM tree, modify it by adding new nodes or rearranging subtrees, and then produce a new XML document as output. You can also construct a DOM tree manually and convert it to XML, which can be a more flexible way of producing XML output than simply writing ... to a file.

The DOM implementation included with Python lives in the xml.dom.minidom module. It's a lightweight implementation of the Level 1 DOM with support for XML namespaces. The parse() and parseString() convenience functions are provided for generating a DOM tree:

    from xml.dom import minidom
    doc = minidom.parse('hamlet.xml')

doc is a Document instance. Document, like all the other DOM classes such as Element and Text, is a subclass of the Node base class.
All the nodes in a DOM tree therefore support certain common methods, such as toxml(), which returns a string containing the XML representation of the node and its children. Each class also has special methods of its own; for example, Element and Document instances have a method to find all descendant elements with a given tag name. Continuing from the previous 2-line example:

    perslist = doc.getElementsByTagName( 'PERSONA' )
    print perslist[0].toxml()
    print perslist[1].toxml()

For the Hamlet XML file, the above few lines output:

    <PERSONA>CLAUDIUS, king of Denmark.</PERSONA>
    <PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>

The root element of the document is available as doc.documentElement, and its children can be easily modified by removing, appending, or inserting nodes:

    root = doc.documentElement

    # Remove the first child
    root.removeChild( root.childNodes[0] )

    # Move the new first child to the end
    root.appendChild( root.childNodes[0] )

    # Insert the new first child (originally,
    # the third child) before the child at index 20.
    root.insertBefore( root.childNodes[0], root.childNodes[20] )

Again, I will refer you to the Python documentation for a complete listing of the different Node classes and their various methods.

13.3 Relationship to PyXML

The XML Special Interest Group has been working on XML-related Python code for a while. Its code distribution, called PyXML, is available from the SIG's Web pages at http://www.python.org/sigs/xml-sig/. The PyXML distribution also used the package name "xml". If you've written programs that used PyXML, you're probably wondering about its compatibility with the 2.0 xml package. The answer is that Python 2.0's xml package isn't compatible with PyXML, but can be made compatible by installing a recent version of PyXML. Many applications can get by with the XML support that is included with Python 2.0, but more complicated applications will require that the full PyXML package be installed.
When installed, PyXML versions 0.6.0 or greater will replace the xml package shipped with Python, and will be a strict superset of the standard package, adding a bunch of additional features. Some of the additional features in PyXML include:

* 4DOM, a full DOM implementation from Fourthought, Inc.
* The xmlproc validating parser, written by Lars Marius Garshol.
* The sgmlop parser accelerator module, written by Fredrik Lundh.

From uche.ogbuji@fourthought.com Thu Oct 12 08:30:17 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 12 Oct 2000 01:30:17 -0600 Subject: [XML-SIG] ANN: 4Suite 0.9.1 Message-ID: <200010120730.BAA18082@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of 4Suite 0.9.1
---------------------------
Open source tools for standards-based XML, DOM, XPath, XSLT, RDF, XPointer and object-database development in Python

4Suite is a collection of Python tools for XML processing and object database management. It is an integrated packaging of several formerly separately-distributed components (4DOM, 4XPath and 4XSLT, 4RDF, 4ODS) and features the new 4XPointer.

More info and Obtaining 4Suite
------------------------------
Please see http://Fourthought.com/4Suite
Or you can download the 4Suite source from ftp://Fourthought.com/pub/4Suite
Windows packages, a Linux RPM and a Linux binary are also available at ftp://Fourthought.com/pub/4Suite

4Suite is distributed under a license similar to that of Python.

-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +01 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From alf@logilab.com Thu Oct 12 09:25:37 2000 From: alf@logilab.com (Alexandre Fayolle) Date: Thu, 12 Oct 2000 10:25:37 +0200 (CEST) Subject: [XML-SIG] Generating XML documents In-Reply-To: <39E480EF.5C56215E@pixelsystems.com> Message-ID:

On Wed, 11 Oct 2000, Patrick Blanchette wrote:
> Hi,
> I'm a newbie in xml. I want to generate new xml documents using
> python code. In the HOWTO doc, there is a "xml.dom.builder" class but
> this class did not seem to be part of the PyXML 0.6.1.
> Where can I find a python base class for generating xml documents?

Actually, generating an XML document is pretty easy: you just have to build a string which begins with the XML declaration, add your tags to the string, and flush it to disk. This works well if you know in advance what you want in your XML document.

OTOH, if you want to build your document incrementally, you can use DOM. You've got to import the DOM implementation first:

>>> from xml.dom import implementation

and build a document from the implementation:

>>> docType = implementation.createDocumentType('','','')
>>> doc = implementation.createDocument('',None,docType)

Then you can set your root element:

>>> root = doc.createElementNS('','docRoot')
>>> doc.appendChild(root)

And then you can use the createXXX methods of your document to create new nodes and use appendChild to add them to other nodes. If you want to set attributes, you can use the setAttributeNS method in Element.

Beware: if you're using Python 1.5.2, you cannot use non-ASCII characters as arguments to createTextNode() or setAttributeNS() (at least not if you intend to save your file to disk), since these expect UTF-8 strings, and not iso-8859-1 strings.
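For comparison, here is the same build-it-incrementally approach sketched with the standard-library xml.dom.minidom module as found in current Python 3 releases. This is minidom, not the 4DOM API described in this message: it serializes with toxml() rather than xml.dom.ext.PrettyPrint. Because Python 3 text is Unicode, the UTF-8 caveat for Python 1.5.2 does not apply. The element name "item" and its attribute are hypothetical.

```python
from xml.dom import minidom

# Get minidom's DOMImplementation and create a Document whose root
# element is <docRoot>; no namespace URI and no doctype are used.
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "docRoot", None)
root = doc.documentElement

# Build the tree incrementally with the Document's create*() factories.
item = doc.createElement("item")          # hypothetical element name
item.setAttribute("id", "1")              # plain (non-namespaced) attribute
item.appendChild(doc.createTextNode("caf\u00e9"))  # non-ASCII text is an ordinary str
root.appendChild(item)

# Serialize: toxml() returns a str beginning with an XML declaration.
xml_text = doc.toxml()
print(xml_text)

# Round trip: parse the UTF-8 bytes back and read the text node again.
reparsed = minidom.parseString(xml_text.encode("utf-8"))
print(reparsed.getElementsByTagName("item")[0].firstChild.data)
```

If you need bytes for a file opened in binary mode, doc.toxml(encoding="utf-8") returns bytes and includes the encoding in the XML declaration.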
When you're done creating your document, use Print or PrettyPrint to save it to disk:

>>> from xml.dom.ext import PrettyPrint
>>> f=open('/tmp/test.xml','w')
>>> PrettyPrint(doc,f)
>>> f.close()

If you want to reload your file, use the readers:

>>> from xml.dom.ext.reader import Sax2
>>> doc = Sax2.FromXmlFile('/tmp/test.xml')

If you saved it with PrettyPrint, the new DOM tree will have text nodes full of whitespace. You can use the StripXml extension to clean the tree:

>>> from xml.dom.ext import StripXml
>>> StripXml(doc)

I'd suggest that you check the W3C DOM spec (at least core DOM), since the DOM implementation in PyXML 0.6 is a good implementation of the spec. I'll be glad to help you if you have further questions with DOM.

-- Alexandre Fayolle http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France).

From alf@logilab.com Thu Oct 12 09:42:10 2000 From: alf@logilab.com (Alexandre Fayolle) Date: Thu, 12 Oct 2000 10:42:10 +0200 (CEST) Subject: [XML-SIG] What's New section on XML In-Reply-To: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> Message-ID:

On Wed, 11 Oct 2000, A.M. Kuchling wrote:
> Here's draft text for a section that briefly discusses the new XML
> support in Python 2.0. Criticisms and comments, please...
> The DOM implementation included with Python lives in the
> xml.dom.minidom module. It's a lightweight implementation of the Level
> 1 DOM with support for XML namespaces.

Is minidom a *strict* implementation of DOM L1, with some extensions that would point towards DOM L2 (namespaces) ?

-- Alexandre Fayolle http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France).

From alf@logilab.com Thu Oct 12 11:37:41 2000 From: alf@logilab.com (Alexandre Fayolle) Date: Thu, 12 Oct 2000 12:37:41 +0200 (CEST) Subject: [XML-SIG] packaging issue Message-ID:

Isn't there a packaging issue with 4DOM being possibly provided by two means (PyXML and 4Suite) ?
I don't know about RPMs, but I understand that this will cause major headaches for .deb maintainers.

-- Alexandre Fayolle http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France).

From akuchlin@mems-exchange.org Thu Oct 12 15:37:48 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 12 Oct 2000 10:37:48 -0400 Subject: [XML-SIG] What's New section on XML In-Reply-To: ; from alf@logilab.com on Thu, Oct 12, 2000 at 10:42:10AM +0200 References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> Message-ID: <20001012103748.B8959@kronos.cnri.reston.va.us>

On Thu, Oct 12, 2000 at 10:42:10AM +0200, Alexandre Fayolle wrote:
>Is minidom a *strict* implementation of DOM L1, with some extensions that
>would point towards DOM L2 (namespaces) ?

Beats me. It doesn't seem to be a strict L1 implementation, since last night I found some non-compliances. See bugs #116677 and #116678 on SourceForge. I've assigned them to Paul for fixing, but if someone else wants to tackle them, feel free; I may attempt to write a patch myself. --amk

From larsga@garshol.priv.no Thu Oct 12 16:48:02 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Oct 2000 17:48:02 +0200 Subject: [XML-SIG] What's New section on XML In-Reply-To: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> Message-ID:

* A. M. Kuchling
|
| Here's draft text for a section that briefly discusses the new XML
| support in Python 2.0. Criticisms and comments, please...

I like it! However, I think it would be worth mentioning that expat underlies SAX and the DOM, comes with 2.0 and supports Unicode. Adding that both SAX and the DOM are parser-independent may also be worthwhile. --Lars M.

From jeremy@beopen.com Thu Oct 12 17:11:53 2000 From: jeremy@beopen.com (Jeremy Hylton) Date: Thu, 12 Oct 2000 12:11:53 -0400 (EDT) Subject: [XML-SIG] test_minidom non-failure failure?
(take 2) Message-ID: <14821.58057.358947.271778@bitdiddle.concentric.net> Sorry about the previous message; a mail munger somewhere between my display and python.org choked on a very long line... I am getting an occasional, hard-to-reproduce error in test_minidom. When I run the test, it displays about a thousand lines of garbage, but the test suite does not report test_minidom as failed or skipped. The output I see during the test run is this: test_minidom garbage: [{'nodeValue': u'Obsolete but implemented...', 'nextSibling': , 'childNodes': None, 'attributes': None, 'parentNode': None, 'data': u'Obsolete but implemented...', 'previousSibling': None}, , {'nodeValue': u'\012', 'nextSibling': None, 'childNodes': None, 'a [... many hundreds of lines omitted] At the end of the test, I get a pretty normal result: 95 tests OK. 13 tests skipped: test_al test_cd test_cl test_dbm test_dl test_gl test_imgfile test_largefile test_nis test_sunaudiodev test_timing test_winreg test_winsound So two questions: Why is test_minidom producing all this output? And why is it only happening intermittently? Why does regrtest.py think that test_minidom is working correctly when it produces all this output? Jeremy From brian@watchmark.com Thu Oct 12 17:15:16 2000 From: brian@watchmark.com (Brian Fritz) Date: Thu, 12 Oct 2000 09:15:16 -0700 Subject: [XML-SIG] Q: Post install testing errors Message-ID: <39E5E394.84051CDA@watchmark.com> I just installed the PyXML-0.5.5.1 software on my SparcStation here at work. 
I ran the PyXML-0.5.5.1/test/testxml.py script and got the following results: > blackriver /export/home/PyXML-0.5.5.1/test> python testxml.py > test_dom > test_dom2 > Warning: can't open ./output/test_dom2 > > > test_domu > test_howto > test_htmlb > test_marshal > test_pyexpat > test test_pyexpat failed -- Writing: 'Summary of XML parser upcalls:', expected: > 'Parser returned 1\012Summary of X' > test_sax > test_unicode > test_utils > test_xmllib > test test_xmllib skipped -- an optional feature could not be imported > 9 tests OK. > 1 test failed: test_pyexpat > 1 test skipped: test_xmllib > blackriver /export/home/PyXML-0.5.5.1/test> I then ran the test_xmllib.py script and received the following error: > blackriver /export/home/PyXML-0.5.5.1/test> python test_xmllib.py > Traceback (innermost last): > File "test_xmllib.py", line 25, in ? > from xml.parsers import xmllib > ImportError: cannot import name xmllib > blackriver /export/home/PyXML-0.5.5.1/test> Are these errors worth worrying about, or should I just get on with using (learning) Python and XML? Thanks in Advance! Brian From nas@arctrix.com Thu Oct 12 10:31:34 2000 From: nas@arctrix.com (Neil Schemenauer) Date: Thu, 12 Oct 2000 02:31:34 -0700 Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: <14821.58057.358947.271778@bitdiddle.concentric.net>; from jeremy@beopen.com on Thu, Oct 12, 2000 at 12:11:53PM -0400 References: <14821.58057.358947.271778@bitdiddle.concentric.net> Message-ID: <20001012023134.A18254@glacier.fnational.com> On Thu, Oct 12, 2000 at 12:11:53PM -0400, Jeremy Hylton wrote: > I am getting an occasional, hard-to-reproduce error in test_minidom. > When I run the test, it displays about a thousand lines of garbage, > but the test suite does not report test_minidom as failed or skipped.
> > The output I see during the test run is this: > > test_minidom > garbage: [{'nodeValue': u'Obsolete but implemented...', 'nextSibling': This is most likely the garbage collector. regrtest.py contains the following code:

    if findleaks:
        gc.collect()
        if gc.garbage:
            print "garbage:", repr(gc.garbage)
            found_garbage.extend(gc.garbage)
            del gc.garbage[:]

findleaks is true if the -l option is specified (TESTOPS in the makefile includes it). Something is producing cyclic garbage. Neil From guido@python.org Thu Oct 12 18:39:40 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 12 Oct 2000 12:39:40 -0500 Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: Your message of "Thu, 12 Oct 2000 02:31:34 MST." <20001012023134.A18254@glacier.fnational.com> References: <14821.58057.358947.271778@bitdiddle.concentric.net> <20001012023134.A18254@glacier.fnational.com> Message-ID: <200010121739.MAA07968@cj20424-a.reston1.va.home.com> > On Thu, Oct 12, 2000 at 12:11:53PM -0400, Jeremy Hylton wrote: > > I am getting an occasional, hard-to-reproduce error in test_minidom. > > When I run the test, it displays about a thousand lines of garbage, > > but the test suite does not report test_minidom as failed or skipped. > > > > The output I see during the test run is this: > > > > test_minidom > > garbage: [{'nodeValue': u'Obsolete but implemented...', 'nextSibling': [Neil] > This is most likely the garbage collector. regrtest.py contains > the following code:
>
>     if findleaks:
>         gc.collect()
>         if gc.garbage:
>             print "garbage:", repr(gc.garbage)
>             found_garbage.extend(gc.garbage)
>             del gc.garbage[:]
>
> findleaks is true if the -l option is specified (TESTOPS in the > makefile includes it). Something is producing cyclic garbage. Of course something is producing cyclic garbage! The DOM tree is full of parent and child links. Does this output mean that the GC works correctly?
Or does it mean that there is a reason why this garbage cannot be disposed of? In the latter case, could that be because there are __del__ methods? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@beopen.com Thu Oct 12 17:55:19 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 12 Oct 2000 12:55:19 -0400 (EDT) Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: <20001012023134.A18254@glacier.fnational.com> References: <14821.58057.358947.271778@bitdiddle.concentric.net> <20001012023134.A18254@glacier.fnational.com> Message-ID: <14821.60663.213246.179325@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > This is most likely the garbage collector. regrtest.py contains > the following code: ... > findleaks is true if the -l option is specified (TESTOPS in the > makefile includes it). Something is producing cyclic garbage. This is definitely the problem. Lars, Paul: This looks like a problem in the unlink() method of the DOM. Could you please check that the unlink() method is updated to handle the latest version of the other changes? Thanks! -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From fdrake@beopen.com Thu Oct 12 17:59:33 2000 From: fdrake@beopen.com (Fred L. Drake, Jr.) Date: Thu, 12 Oct 2000 12:59:33 -0400 (EDT) Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: <14821.58057.358947.271778@bitdiddle.concentric.net> References: <14821.58057.358947.271778@bitdiddle.concentric.net> Message-ID: <14821.60917.429141.652655@cj42289-a.reston1.va.home.com> Jeremy Hylton writes: > Why is test_minidom producing all this output? And why is it only > happening intermittently? It isn't. See Neil's excellent explanation. > Why does regrtest.py think that test_minidom is working correctly when > it produces all this output? The test is passing just fine, and is complete before the test for garbage is performed.
The unlink() method on DOM objects is the culprit; it is updating the Node.allnodes dictionary correctly, but not the Node instances. I've already asked Paul & Lars to fix this; it should work just fine with or without GC once they've seen the report. -Fred -- Fred L. Drake, Jr. BeOpen PythonLabs Team Member From uche.ogbuji@fourthought.com Thu Oct 12 18:16:14 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 12 Oct 2000 11:16:14 -0600 Subject: [XML-SIG] Re: [4suite] ANN: 4Suite 0.9.1 References: Message-ID: <39E5F1DE.14311B@fourthought.com> Alexandre Fayolle wrote: > > Congratulations. > > It works great here. Phew! No one uncovers bugs as diligently as you do, so that's good to hear. Of course, it's only been a few hours, eh? > > 4Suite is a collection of Python tools for XML processing and object > > database management. An integrated packaging of several formerly > > separately-distributed components: 4DOM, 4XPath and 4XSLT, 4RDF, 4ODS > > and featuring the new 4XPointer. > > Why does 4XPointer live in Ft/ and not in xml/ ? Do you plan to move xpath > and xslt to Ft/ too ? The Python XML-SIG really "owns" the "xml" Python package namespace and we don't intrude on it without its permission. DOM moved there for obvious reasons (it's now the official full DOM of the SIG). Earlier this year the decision was made to move xslt and xpath there as part of initial moves towards incorporating them into the xml-sig package but I think concerns over their cross-platform compatibility are holding up outright adoption. We haven't discussed 4RDF or 4XPointer (or the coming 4XLink). Is it time to reopen this discussion? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From nas@arctrix.com Thu Oct 12 11:24:48 2000 From: nas@arctrix.com (Neil Schemenauer) Date: Thu, 12 Oct 2000 03:24:48 -0700 Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: <200010121739.MAA07968@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Oct 12, 2000 at 12:39:40PM -0500 References: <14821.58057.358947.271778@bitdiddle.concentric.net> <20001012023134.A18254@glacier.fnational.com> <200010121739.MAA07968@cj20424-a.reston1.va.home.com> Message-ID: <20001012032448.A18407@glacier.fnational.com> On Thu, Oct 12, 2000 at 12:39:40PM -0500, Guido van Rossum wrote: > Of course something is producing cyclic garbage! > > The DOM tree is full of parent and child links. > > Does this output mean that the GC works correctly? Or does it > mean that there is a reason why this garbage cannot be disposed > of? In the latter case, could that be because there are > __del__ methods? The -l option tries to find any cyclic garbage produced by the tests. I don't think that that option should be enabled by default. The output means that the GC is working and is finding stuff that would not be freed by reference counting alone. I can't tell if the GC would free this garbage. The -l option sets the DEBUG_SAVEALL option which causes all garbage found to end up in gc.garbage, not just garbage that can't be cleaned up. I don't have pyexpat installed here so I can't test it. If you want to find out if test_minidom is creating garbage the collector can't free you should comment out the: gc.set_debug(gc.DEBUG_SAVEALL) line in regrtest.py and run: regrtest.py -l test_minidom If that does what I think it does and you still get the "garbage: " line then the test is creating evil things.
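Neil's description of DEBUG_SAVEALL and Fred's diagnosis that unlink() was leaving cycles behind can be reproduced with a small sketch. It assumes a current Python 3 rather than the 2.0 interpreter in this thread, and `Node`, `build_tree` and `count_cyclic_nodes` are hypothetical stand-ins, not minidom's real classes; `clear_siblings=False` imitates an unlink() that, as Lars reports later in the thread, did not reset sibling references.

```python
import gc


class Node:
    """Hypothetical DOM-like node: parent, child and sibling links form cycles."""

    def __init__(self, name):
        self.name = name
        self.parentNode = None
        self.childNodes = []
        self.previousSibling = None
        self.nextSibling = None

    def appendChild(self, child):
        if self.childNodes:
            last = self.childNodes[-1]
            last.nextSibling = child
            child.previousSibling = last
        child.parentNode = self
        self.childNodes.append(child)
        return child

    def unlink(self, clear_siblings=True):
        """Break the links so that reference counting alone can free the tree."""
        for child in self.childNodes:
            child.unlink(clear_siblings)
        self.childNodes = []
        self.parentNode = None
        if clear_siblings:
            # Skipping this step leaves a <-> b <-> c sibling cycles behind.
            self.previousSibling = None
            self.nextSibling = None


def build_tree():
    root = Node("doc")
    for name in ("a", "b", "c"):
        root.appendChild(Node(name))
    return root


def count_cyclic_nodes(unlink_first, clear_siblings=True):
    """Drop a tree and count the Nodes that only the cyclic collector found."""
    old_flags = gc.get_debug()
    gc.collect()
    # DEBUG_SAVEALL: unreachable objects are appended to gc.garbage, not freed.
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        tree = build_tree()
        if unlink_first:
            tree.unlink(clear_siblings)
        del tree
        gc.collect()
        return sum(isinstance(obj, Node) for obj in gc.garbage)
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]


print(count_cyclic_nodes(unlink_first=False))                       # 4
print(count_cyclic_nodes(unlink_first=True, clear_siblings=False))  # 3
print(count_cyclic_nodes(unlink_first=True))                        # 0
```

Under these assumptions, the three counts correspond to no unlink() at all, an unlink() that only breaks parent/child links, and one that also clears siblings. On Guido's question: since Python 3.4 (PEP 442), cycles containing objects with `__del__` methods can be collected too, so that concern mainly applies to older interpreters.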
:) Neil From jumpytom@yahoo.com Fri Oct 13 16:20:24 2000 From: jumpytom@yahoo.com (Jack Greene) Date: Fri, 13 Oct 2000 08:20:24 -0700 (PDT) Subject: [XML-SIG] sax import error on WinNT and Py-1.5.2 Message-ID: <20001013152024.35406.qmail@web9704.mail.yahoo.com> I use sax (L.M. Garshol's saxlib 1.0) in an application written in Python 1.5.2 (on WinNT4 SP5). sax is imported using the standard "from xml.sax import ...". That used to work fine until this week when, after a hard drive failure, I had to reinstall everything on this machine. Now I get a "ImportError: No module named xml.sax" sax is installed in /Lib which is where, I think, I had it last time. The problem is, I can't remember what I did when I installed sax originally to get it to work. What am I doing wrong (besides not writing down important stuff like that)? Jack __________________________________________________ Do You Yahoo!? Get Yahoo! Mail - Free email you can access from anywhere! http://mail.yahoo.com/ From larsga@garshol.priv.no Fri Oct 13 16:26:59 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 13 Oct 2000 17:26:59 +0200 Subject: [XML-SIG] sax import error on WinNT and Py-1.5.2 In-Reply-To: <20001013152024.35406.qmail@web9704.mail.yahoo.com> References: <20001013152024.35406.qmail@web9704.mail.yahoo.com> Message-ID: * Jack Greene | | That used to work fine until this week when, after a hard drive | failure, I had to reinstall everything on this machine. Now I get a | "ImportError: No module named xml.sax" Hmmm. One thing I would check for is whether both xml/ and xml/sax/ contain the needed __init__.py files. One thing you might also try is to import xml and print xml.__file__. I can see no obvious mistakes or things that you should have done that you haven't. --Lars M. 
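Lars's two checks (do both xml/ and xml/sax/ contain an __init__.py, and which file does `import xml` actually load?) are easy to script. The sketch below targets a current Python 3, not the 1.5.2 interpreter in Jack's setup, and `loaded_from` and `dirs_missing_init` are hypothetical helper names. One caveat: Python 3.3 and later can import a directory without __init__.py as a namespace package, whereas 1.5.2 required the file, so the second check matters most on old interpreters.

```python
import importlib
import os


def loaded_from(module_name):
    """Import a module and return the file it was loaded from (like printing xml.__file__)."""
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", None)


def dirs_missing_init(package_dir, subpackages=("sax",)):
    """List package directories that lack an __init__.py.

    package_dir is e.g. the .../Lib/xml directory; each name in subpackages
    is checked as a child of it, so "sax" means .../Lib/xml/sax.
    """
    candidates = [package_dir] + [os.path.join(package_dir, sub) for sub in subpackages]
    return [d for d in candidates
            if not os.path.isfile(os.path.join(d, "__init__.py"))]


if __name__ == "__main__":
    xml_init = loaded_from("xml")
    print("xml loaded from:", xml_init)
    print("missing __init__.py:", dirs_missing_init(os.path.dirname(xml_init)))
```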
From nuno.simoes@ruido-visual.pt Fri Oct 13 16:32:35 2000 From: nuno.simoes@ruido-visual.pt (Nuno Simoes) Date: Fri, 13 Oct 2000 16:32:35 +0100 Subject: [XML-SIG] sax import error on WinNT and Py-1.5.2 References: <20001013152024.35406.qmail@web9704.mail.yahoo.com> Message-ID: <39E72B13.92A8B307@ruido-visual.pt> Jack Greene wrote: Hi. [...] > What am I doing wrong (besides not writing down > important stuff like that)? You should have a directory tree like this: ~/lib/xml ~/lib/xml/dom ~/lib/xml/marshal ~/lib/xml/parsers ~/lib/xml/sax ~/lib/xml/unicode ~/lib/xml/utils Remember to do a "python setup.py build" to build the pyc files. Another thing, in Win32, you must/should copy the files under ~/windows to ~/lib/xml/parsers/ . Nuno Simões, RVTI From larsga@garshol.priv.no Fri Oct 13 21:08:05 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 13 Oct 2000 22:08:05 +0200 Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: <14821.60663.213246.179325@cj42289-a.reston1.va.home.com> References: <14821.58057.358947.271778@bitdiddle.concentric.net> <20001012023134.A18254@glacier.fnational.com> <14821.60663.213246.179325@cj42289-a.reston1.va.home.com> Message-ID: * Fred L. Drake, Jr. | | This looks like a problem in the unlink() method of the DOM. Could | you please check that the unlink() method is updated to handle the | latest version of the other changes? It seems that the current unlink() does not remove sibling cycles. Patch #101897 adds a line to set sibling references to None, which seems to make regrtest.py -l happy. --Lars M. From prescod@prescod.net Fri Oct 13 21:12:38 2000 From: prescod@prescod.net (Paul) Date: Fri, 13 Oct 2000 15:12:38 -0500 (CDT) Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2) In-Reply-To: Message-ID: Right, I just checked in the fix to that. Paul Prescod On 13 Oct 2000, Lars Marius Garshol wrote: > > * Fred L. Drake, Jr. 
> | > | This looks like a problem in the unlink() method of the DOM. Could > | you please check that the unlink() method is updated to handle the > | latest version of the other changes? > > It seems that the current unlink() does not remove sibling cycles. > Patch #101897 adds a line to set sibling references to None, which > seems to make regrtest.py -l happy. > > --Lars M. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev > From uche.ogbuji@fourthought.com Sun Oct 15 06:25:08 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sat, 14 Oct 2000 23:25:08 -0600 Subject: [XML-SIG] 4RDF on XML.com Message-ID: <200010150525.XAA24601@localhost.localdomain> I just wanted to note that my write-up on 4RDF is this week's feature article on XML.com. It's been an interesting time since we offered up the first public glimpse of 4RDF. A lot of quite excited response. Dan Brickley, editor of the RDF Schema spec, in an e-mail exchange with me, mentioned he thought the feature set was good enough that he'd consider learning Python. Indeed I went on to take a look at the other RDF systems out there and I think 4RDF is way ahead of the pack. It avoids the horrid contortions of the SIRPAC API and provides full support from models, containers, full serialization/deserialization, reification, etc. through schemas. It provides extra amenities such as pluggable back ends and inference through RDF Inference Language. At any rate, there's much more to read at http://www.xml.com/pub/2000/10/11/rdf/index.html and until next Thursday, simply http://www.xml.com will suffice. Comments are welcome, especially from this group. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From pingl_l@yahoo.com Mon Oct 16 00:27:07 2000 From: pingl_l@yahoo.com (Ping Li) Date: Sun, 15 Oct 2000 18:27:07 -0500 Subject: [XML-SIG] Web Site Translation for Ligand-Protein Docking Message-ID: <20001015232702.YBDN2291.mtiwmhc21.worldnet.att.net@ea> From: Ping Li To: Ligand-Protein Docking Dear Web Manager, I visited your Web site at http://www.scripps.edu/pub/olson-web/people/gmm and would like to let you know that your Web site could also be presented in other languages for broader recognition. If you feel that my suggestion has no value, please kindly ignore this message and accept my apology. According to research done by International Data Corporation, non-English speaking users will make up over 50% of the total online population by 2002, and 70% by 2004. Business Web users are three times more likely to buy when addressed in their own languages (survey by Forrester Research). We specialize in Web Site Translation and Global URL Submission in 11 languages - English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese and Korean. Our customized service package includes: 1. Web page translation and Web programming localization -- We not only convert HTML pages and graphics, but also ensure that the translated Web sites fully function in the target language environment. 2. URL submission and re-submission to leading search engines and business directories in the target languages -- Our Web promotion specialists optimize the keywords and descriptions of the translated Web sites for ideal search engine rankings. 3. Company profile translation and a free listing in GlobalListing, a multi-language business directory. The translation is conducted by our professional translators who are native speakers and have years of Web translation experience. 
We are not hired for the ability to take a word in one language and convert it into an equivalent word in another language. Instead, we get to the heart of communication and express the true meaning of your message, because we are aware that improper translation may cause unrecoverable damages to your company's image. At GlobalListing, we are in the business of ensuring that you are 100% satisfied with our quality services. Thank you very much for your time. Should you be interested in our services, please contact me for a free estimate or any further information. Best regards, Ping Lee Director, Web Translation GlobalListing Phone: (604) 324-4638 Fax: (413) 431-2597 Office Hour: 9:00AM - 5:00PM (Pacific Time) From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:22:26 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 16 Oct 2000 09:22:26 +0200 Subject: [XML-SIG] What's New section on XML In-Reply-To: (message from Alexandre Fayolle on Thu, 12 Oct 2000 10:42:10 +0200 (CEST)) References: Message-ID: <200010160722.JAA00853@loewis.home.cs.tu-berlin.de> > Is minidom a *strict* implementation of DOM L1, with some extensions that > would point towards DOM L2 (namespaces) ? No, minidom does not support all of DOM L1. See my patch on minidom documentation on SF for a detailed list of things it does and doesn't. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:19:50 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 16 Oct 2000 09:19:50 +0200 Subject: [XML-SIG] Q: Post install testing errors In-Reply-To: <39E5E394.84051CDA@watchmark.com> (message from Brian Fritz on Thu, 12 Oct 2000 09:15:16 -0700) References: <39E5E394.84051CDA@watchmark.com> Message-ID: <200010160719.JAA00836@loewis.home.cs.tu-berlin.de> > Are these errors worth worrying about, or should I just get on with using > (learning) Python and XML? These errors are nothing to worry about. 
Please note that parts of the package have seen some changes recently (in particular, the DOM implementation); if you want to live on the "cutting edge", you should install PyXML 0.6.1 (from http://sourceforge.net/projects/pyxml) instead. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:31:24 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 16 Oct 2000 09:31:24 +0200 Subject: [XML-SIG] What's New section on XML In-Reply-To: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> (amk@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com) References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> Message-ID: <200010160731.JAA00943@loewis.home.cs.tu-berlin.de> > parser.setContentHandler( handler ) I've always wondered where this style of spacing originates. The BDFL says he hates it when he sees a space in this place (http://www.python.org/doc/essays/styleguide.html); so do I. Otherwise, it looks fine to me. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:36:57 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 16 Oct 2000 09:36:57 +0200 Subject: [XML-SIG] packaging issue In-Reply-To: (message from Alexandre Fayolle on Thu, 12 Oct 2000 12:37:41 +0200 (CEST)) References: Message-ID: <200010160736.JAA00987@loewis.home.cs.tu-berlin.de> > Isn't there a packaging issue with 4DOM being possibly provided by to > means (PyXml and 4Suite) ? I don't know about RPMs, but I understand that > this will cause major headaches to .deb maintainers. I think the problem is beyond what those package systems are designed to do. It's not that the distributions coincidently include the same files - they have the same file names on purpose. An intelligent decision of the system administrator is required to decide which one to use - the packaging system can't (and shouldn't) make this decision. 
Distributors can make a decision for the maintainer; I can see a number of intelligent decisions: Include the 4DOM modules only in one of them; produce an independent 4DOM package; provide a single package containing both PyXML and 4Suite. In any case: this is a packaging problem; I don't think such a problem should change what the XML-SIG or Fourthought maintains in their source code repositories. Patches to setup.py of PyXML are certainly welcome. Regards, Martin From larsga@garshol.priv.no Mon Oct 16 09:35:59 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Oct 2000 10:35:59 +0200 Subject: [XML-SIG] How to proceed Message-ID: Now that SAX 2.0 is more-or-less done I am ready to start working on other Python-XML projects. I'd like to hear the opinion of those of you here on how to proceed. Below is a list of projects that I am thinking about (I can't actually promise that I will get round to all of them). The question is, where do you think development of these packages ought to happen? As part of the XML-SIG work, as a separate SourceForge project or privately, the way I've done so far. xmlproc This needs to be updated to XML 1.0 2nd ed, extended with Unicode support and a SAX 2.0 driver (I have 95% of one ready) and also improved in various ways. dtddoc This has not been taken very far yet, but could become a useful package if more thought and effort were put into it. saxlib I plan for this package to contain lots of SAX 2.0-related utilities, like DOM2SAX walkers, XInclude and XBase filters, more advanced parser instantiation tools, more drivers etc. RSS-kit This is a toolkit for working with RSS documents that I have had lying around for more than a year. It's now getting much closer to being useful; the question is where I should develop it further. --Lars M. From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 09:39:43 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 16 Oct 2000 10:39:43 +0200 Subject: [XML-SIG] Re: [4suite] ANN: 4Suite 0.9.1 In-Reply-To: <39E5F1DE.14311B@fourthought.com> (message from Uche Ogbuji on Thu, 12 Oct 2000 11:16:14 -0600) References: <39E5F1DE.14311B@fourthought.com> Message-ID: <200010160839.KAA01656@loewis.home.cs.tu-berlin.de> > Is it time to reopen this discussion? Certainly, yes. > The Python XML-SIG really "owns" the "xml" Python package namespace and > we don't intrude on it without its permission. DOM moved there for > obvious reasons (it's now the official full DOM of the SIG). Earlier > this year the decision was made to move xslt and xpath there as part of > initial moves towards incorporating them into the xml-sig package but I > think concerns over their cross-platform compatability are holding up > outright adoption. Could somebody please summarize what these concerns were? I understand that to fully build 4XPath from source, you need BisonGen, bison, flex, SWIG, ... what else? This is indeed an impressive list of prerequisites, but I can't see anything inherently platform-dependent in it. Since 4Suite 0.9.1 comes with these files prebuilt, it can't be too difficult to adjust the PyXML build procedure to also assume they are generated. Would patches to the build process be accepted? It seems that it should be easy to get at least SWIG out of the picture, by properly changing BisonGen. In the long run, I wish Python had a standard parser generator so the dependency on Bison could be removed; that's beyond reach at the moment. Does anybody think it is funny that you need EBNF parsers in XML tools?-) Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 09:47:01 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 16 Oct 2000 10:47:01 +0200 Subject: [XML-SIG] What's New section on XML In-Reply-To: <20001012103748.B8959@kronos.cnri.reston.va.us> (message from Andrew Kuchling on Thu, 12 Oct 2000 10:37:48 -0400) References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> <20001012103748.B8959@kronos.cnri.reston.va.us> Message-ID: <200010160847.KAA01704@loewis.home.cs.tu-berlin.de> > Beats me. It doesn't seem to be a strict L1 implementation, since > last night I found some non-compliances. See bugs #116677 and #116678 > on SourceForge. I've assigned them to Paul for fixing, but if someone > else wants to tackle them, feel free; I may attempt to write a patch > myself. minidom is quite a limited implementation of the DOM, with many details missing. It seems the general rule is not to provide "convenience functions", i.e. if something can be achieved by other means, then don't provide this function. It is sufficient for building a tree from a document and analyzing it; I probably wouldn't attempt heavy structural manipulations on the tree (*). We'll have to see how well users accept that approach. It may be desirable to fully conform to DOM Core (of some level) for a later Python release, even if that means that minidom will grow in size and perhaps even slow down. Regards, Martin (*) It would be interesting to survey what people *do* use the DOM for; it's not all that clear to me that all features of the DOM are really in use. From loewis@informatik.hu-berlin.de Mon Oct 16 12:18:16 2000 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 16 Oct 2000 13:18:16 +0200 (MET DST) Subject: [XML-SIG] PyXML home page on SF Message-ID: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> I've added a new page on http://pyxml.sourceforge.net/, and made this the project home page. Sometimes, people had run into the page, and got what still is in http://pyxml.sourceforge.net/index.php. Please let me know what you think.
Patches to the page are welcome; comments that having that page is a stupid idea will be considered :-) Regards, Martin From akuchlin@mems-exchange.org Mon Oct 16 15:08:37 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 16 Oct 2000 10:08:37 -0400 Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <200010161118.NAA10370@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Mon, Oct 16, 2000 at 01:18:16PM +0200 References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> Message-ID: <20001016100837.B9235@kronos.cnri.reston.va.us> On Mon, Oct 16, 2000 at 01:18:16PM +0200, Martin von Loewis wrote: >I've added a new page on http://pyxml.sourceforge.net/, and made this >the project home page. Sometimes, people had ran into the page, and >got what still is in http://pyxml.sourceforge.net/index.php. Should the XML topic guide be moved to a set of pages on pyxml.sourceforge.net? I'm really the only person left who can update the topic guide, and having the Web pages accessible through CVS would mean more people could keep them up to date. This would require 2 steps: 1) check the pages into CVS, along with the required scripts, and 2) set up a redirect from www.python.org/topics/xml/ to pyxml.sourceforge.net. --amk From uche.ogbuji@fourthought.com Mon Oct 16 15:08:55 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 16 Oct 2000 08:08:55 -0600 Subject: [XML-SIG] How to proceed In-Reply-To: Message from Lars Marius Garshol of "16 Oct 2000 10:35:59 +0200." Message-ID: <200010161408.IAA01441@localhost.localdomain> > > Now that SAX 2.0 is more-or-less done I am ready to start working on > other Python-XML projects. I'd like to hear the opinion of those of > you here on how to proceed. Below is a list of projects that I am > thinking about (I can't actually promise that I will get round to all > of them). > > The question is, where do you think development of these packages > ought to happen? 
As part of the XML-SIG work, as a separate > SourceForge project or privately, the way I've done so far. > > xmlproc > This needs to be updated to XML 1.0 2nd ed, extended with Unicode > support and a SAX 2.0 driver (I have 95% of one ready) and also > improved in various ways. This has my vote, easily. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Oct 16 15:16:35 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 16 Oct 2000 08:16:35 -0600 Subject: [XML-SIG] What's New section on XML In-Reply-To: Message from "A.M. Kuchling" of "Wed, 11 Oct 2000 22:46:15 EDT." <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> Message-ID: <200010161416.IAA01501@localhost.localdomain> > * 4DOM, a full DOM implementation from FourThought LLC. Fourthought, Inc., actually. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From larsga@garshol.priv.no Mon Oct 16 15:22:51 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Oct 2000 16:22:51 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: <200010161408.IAA01441@localhost.localdomain> References: <200010161408.IAA01441@localhost.localdomain> Message-ID: * uche ogbuji | | This has my vote, easily. What has? I was asking where you (and the others) think development should happen, in the XML-SIG, as separate projects on SourceForge or privately (as has been done so far). All your email told me was that you have some opinion about xmlproc, but I haven't a clue what it was. :-) --Lars M. 
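Lars's saxlib plans (XInclude and XBase filters) rest on the SAX 2.0 filter idea: an object that sits between the parser and the application's ContentHandler and rewrites the event stream. A minimal sketch, assuming the `XMLFilterBase` and `XMLGenerator` classes that today's Python 3 standard library ships in `xml.sax.saxutils`; the element-dropping behaviour and the names `DropElementFilter` and `drop_elements` are illustrative only, not part of saxlib. Since `make_parser()` leaves namespace processing off by default, plain `startElement`/`endElement` events are what arrive.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElementFilter(XMLFilterBase):
    """Hypothetical SAX 2.0 filter: removes every <name> element with the elements and text inside it."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._depth = 0  # > 0 while inside an element being dropped

    def startElement(self, name, attrs):
        if self._depth or name == self._name:
            self._depth += 1
            return
        super().startElement(name, attrs)  # forward to the downstream handler

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
            return
        super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def drop_elements(xml_bytes, name):
    """Parse xml_bytes, filter out <name> elements and return the re-serialized text."""
    out = io.StringIO()
    # Events flow: expat parser -> filter -> XMLGenerator (serializer).
    flt = DropElementFilter(xml.sax.make_parser(), name)
    flt.setContentHandler(XMLGenerator(out))
    flt.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

Usage: `drop_elements(b"<doc><keep>a</keep><secret>b</secret></doc>", "secret")` should yield the XML declaration followed by `<doc><keep>a</keep></doc>`. Because the filter is itself an XMLReader, further filters can be chained in front of it the same way.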
From larsga@garshol.priv.no Mon Oct 16 15:23:29 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Oct 2000 16:23:29 +0200 Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <20001016100837.B9235@kronos.cnri.reston.va.us> References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> <20001016100837.B9235@kronos.cnri.reston.va.us> Message-ID: * Andrew Kuchling | | Should the XML topic guide be moved to a set of pages on | pyxml.sourceforge.net? I'm really the only person left who can | update the topic guide, and having the Web pages accessible through | CVS would mean more people could keep them up to date. I'm definitely for this. There are many times when I know I would have updated things on the pages if I'd had access. --Lars M. From chris@rpgarchive.com Mon Oct 16 15:58:29 2000 From: chris@rpgarchive.com (chris davis) Date: Mon, 16 Oct 2000 09:58:29 -0500 Subject: [XML-SIG] the faster way to get a dom. Message-ID: <39EB1795.395C41D@rpgarchive.com> I'm wondering what the fastest way (as in speed of processing) to get a DOM is. Below is the way I've been doing it, but lately I've had to deal with very large XML documents and I'm wondering if there is a way to improve speed.

from xml.dom import ext               # provides StripXml()
from xml.dom.ext.reader import Sax

def parseXml(s, ownerDocument=None):
    "parse and return doc"
    doc = Sax.FromXml(s, ownerDocument)
    ext.StripXml(doc)                 # remove whitespace-only text nodes
    return doc

From uche.ogbuji@fourthought.com Mon Oct 16 16:19:13 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 16 Oct 2000 09:19:13 -0600 Subject: [XML-SIG] How to proceed In-Reply-To: Message from Lars Marius Garshol of "16 Oct 2000 16:22:51 +0200." Message-ID: <200010161519.JAA02069@localhost.localdomain> > * uche ogbuji > | > | This has my vote, easily. > > What has? I was asking where you (and the others) think development > should happen, in the XML-SIG, as separate projects on SourceForge or > privately (as has been done so far). Ah, I hadn't enough sleep when I responded.
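For comparison with Chris's `parseXml` (4DOM's `Sax.FromXml` followed by `ext.StripXml`), here is a sketch of the same two steps with the lighter `xml.dom.minidom` that ships with Python: parse, then remove whitespace-only text nodes. `strip_whitespace` and `parse_xml` are hypothetical helpers that only approximate what StripXml does; whether this is faster than 4DOM for a given large document is untested here, and the idea of trading DOM completeness for speed is the one Uche raises with cDomlette.

```python
from xml.dom import minidom


def strip_whitespace(node):
    """Remove whitespace-only text nodes below `node` (an approximation of StripXml)."""
    # Iterate over a copy: removing children while looping over childNodes would skip some.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            strip_whitespace(child)


def parse_xml(source):
    """Parse an XML string (or bytes) with minidom and strip ignorable whitespace."""
    doc = minidom.parseString(source)
    strip_whitespace(doc)
    return doc
```

Call `doc.unlink()` when finished with a large tree; as the test_minidom thread shows, DOM trees are full of reference cycles.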
> All your email told me was that you have some opinion about xmlproc, > but I haven't a clue what it was. :-) I meant that I would much prefer to see development on xmlproc. In answer to your real question, though, I think they might as well all go on Sourceforge since it will give others a chance to pitch in. (I would have considered "others pitching in" a remote contingency until recently when Martin stepped in and pretty much saved the XML-SIG). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Oct 16 16:24:38 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 16 Oct 2000 09:24:38 -0600 Subject: [XML-SIG] the faster way to get a dom. In-Reply-To: Message from chris davis of "Mon, 16 Oct 2000 09:58:29 CDT." <39EB1795.395C41D@rpgarchive.com> Message-ID: <200010161524.JAA02091@localhost.localdomain> > I'm wondering what is the fastest (as in speed of processing) way to get a > DOM. Below is the way I've been doing it, but lately I've had to deal > with very large XML documents and I'm wondering if there is a way to improve > speed. > > from xml.dom import ext > from xml.dom.ext.reader import Sax > > def parseXml(s,ownerDocument=None): > "parse and return doc" > doc = Sax.FromXml(s,ownerDocument) > ext.StripXml(doc) > return doc There is a lot of overhead in 4DOM's SAX reader. We've cut some out and we wonder whether we'll soon be reaching the point of diminishing returns optimizing that. Maybe it's time for a C-level DOM builder. We have one for cDomlette, a tiny DOM written entirely in C (with Python interface, of course) which comes with 4Suite. It would take quite some effort to scale it up to the full 4DOM, though. Are the large documents such that a subset of the DOM would suffice for your use?
If so, have a look at cDomlette in 4Suite. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Mon Oct 16 17:54:47 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 16 Oct 2000 10:54:47 -0600 Subject: [RIL] Re: [XML-SIG] Re: [4suite] ANN: 4Suite 0.9.1 References: <39E5F1DE.14311B@fourthought.com> <200010160839.KAA01656@loewis.home.cs.tu-berlin.de> Message-ID: <39EB32D7.10BB8074@FourThought.com> "Martin v. Loewis" wrote: > > > Is it time to reopen this discussion? > > Certainly, yes. > > > The Python XML-SIG really "owns" the "xml" Python package namespace and > > we don't intrude on it without its permission. DOM moved there for > > obvious reasons (it's now the official full DOM of the SIG). Earlier > > this year the decision was made to move xslt and xpath there as part of > > initial moves towards incorporating them into the xml-sig package but I > > think concerns over their cross-platform compatibility are holding up > > outright adoption. > > Could somebody please summarize what these concerns were? I > understand to fully build 4XPath from source, you need BisonGen, > bison, flex, SWIG, ... what else? This is indeed an impressive list of > prerequisites, but I can't see anything inherently platform-dependent > in it. I think we have addressed most of these concerns in the latest releases of 4Suite. We now check in all of the generated files so all you need is PyXML and a C compiler. I think the only concern left is that not all of 4Suite should be included in the "xml" package. 4ODS for sure does not belong there... The rest, RDF, XPointer, XLink, I think all fit.
Mike > > Does anybody think it is funny that you need EBNF parsers in XML tools?-) > > Regards, > Martin > _______________________________________________ > RIL mailing list > RIL@lists.fourthought.com > http://lists.fourthought.com/mailman/listinfo/ril -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Nicolas.Chauvat@logilab.fr Mon Oct 16 17:59:18 2000 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Mon, 16 Oct 2000 18:59:18 +0200 (CEST) Subject: [XML-SIG] PyXML home page on SF In-Reply-To: Message-ID: On 16 Oct 2000, Lars Marius Garshol wrote: > * Andrew Kuchling > | > | Should the XML topic guide be moved to a set of pages on > | pyxml.sourceforge.net? I'm really the only person left who can > | update the topic guide, and having the Web pages accessible through > | CVS would mean more people could keep them up to date. > > I'm definitely for this. There are many times when I know I would > have updated things on the pages if I'd had access. Lots of python projects would benefit from sourceforge-like tools. Sourceforge itself is open source. What about having python.org or pythonlabs/beopen host a sourceforge like system as does www.bioinformatics.org for projects related to bioinformatics ? That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-) Opinions ? -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From rsalz@caveosystems.com Mon Oct 16 19:19:54 2000 From: rsalz@caveosystems.com (Rich Salz) Date: Mon, 16 Oct 2000 14:19:54 -0400 Subject: [XML-SIG] PyXML home page on SF References: Message-ID: <39EB46CA.9D27CC70@caveosystems.com> There's a lot more to running a service than just compiling the software. > Lots of python projects would benefit from sourceforge-like tools. > Sourceforge itself is open source. What about having python.org or
What about having python.org or > pythonlabs/beopen host a sourceforge like system as does > www.bioinformatics.org for projects related to bioinformatics ? > > That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-) I suggest just use the free SF service. From gstein@lyra.org Mon Oct 16 20:08:27 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 16 Oct 2000 12:08:27 -0700 Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <20001016100837.B9235@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Oct 16, 2000 at 10:08:37AM -0400 References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> <20001016100837.B9235@kronos.cnri.reston.va.us> Message-ID: <20001016120826.A347@lyra.org> On Mon, Oct 16, 2000 at 10:08:37AM -0400, Andrew Kuchling wrote: > On Mon, Oct 16, 2000 at 01:18:16PM +0200, Martin von Loewis wrote: > >I've added a new page on http://pyxml.sourceforge.net/, and made this > >the project home page. Sometimes, people had ran into the page, and > >got what still is in http://pyxml.sourceforge.net/index.php. > > Should the XML topic guide be moved to a set of pages on > pyxml.sourceforge.net? I'm really the only person left who can update > the topic guide, and having the Web pages accessible through CVS would > mean more people could keep them up to date. > > This would require 2 steps: 1) check the pages into CVS, along with > the required scripts, and 2) set up a redirect from > www.python.org/topics/xml/ to pyxml.sourceforge.net. +1 !! 
-- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Oct 16 22:28:57 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 16 Oct 2000 14:28:57 -0700 Subject: [XML-SIG] How to proceed In-Reply-To: ; from larsga@garshol.priv.no on Mon, Oct 16, 2000 at 04:22:51PM +0200 References: <200010161408.IAA01441@localhost.localdomain> Message-ID: <20001016142857.F25097@lyra.org> On Mon, Oct 16, 2000 at 04:22:51PM +0200, Lars Marius Garshol wrote: > > * uche ogbuji > | > | This has my vote, easily. > > What has? I was asking where you (and the others) think development > should happen, in the XML-SIG, as separate projects on SourceForge or > privately (as has been done so far). It would be nice to have xmlproc bundled as part of PyXML, which means the source should be included with the rest (on SF, in the PyXML project). Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Oct 16 22:39:49 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 16 Oct 2000 14:39:49 -0700 Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <39EB46CA.9D27CC70@caveosystems.com>; from rsalz@caveosystems.com on Mon, Oct 16, 2000 at 02:19:54PM -0400 References: <39EB46CA.9D27CC70@caveosystems.com> Message-ID: <20001016143949.G25097@lyra.org> Agreed. There is bandwidth, administration, backups, etc. I see *very* little benefit to avoiding SourceForge and starting a new one. Cheers, -g On Mon, Oct 16, 2000 at 02:19:54PM -0400, Rich Salz wrote: > There's a lot more to running a service than just compiling the > software. > > > Lots of python projects would benefit from sourceforge-like tools. > > Sourceforge itself is open source. What about having python.org or > > pythonlabs/beopen host a sourceforge like system as does > > www.bioinformatics.org for projects related to bioinformatics ? > > > > That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-) > > I suggest just use the free SF service. 
> > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Greg Stein, http://www.lyra.org/ From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:00:29 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Oct 2000 01:00:29 +0200 Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <20001016100837.B9235@kronos.cnri.reston.va.us> (message from Andrew Kuchling on Mon, 16 Oct 2000 10:08:37 -0400) References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> <20001016100837.B9235@kronos.cnri.reston.va.us> Message-ID: <200010162300.BAA00783@loewis.home.cs.tu-berlin.de> > Should the XML topic guide be moved to a set of pages on > pyxml.sourceforge.net? I'm really the only person left who can > update the topic guide, and having the Web pages accessible through > CVS would mean more people could keep them up to date. If you'd be willing to perform the updates, I'd be fine if they stay on python.org. It is certainly the case that they are more accessible on SF. I'm not sure what the future of python.org is - if there is a chance that it gets as open as SF, it would probably be better if such stuff stays on python.org. I'm a bit worried how long python.org stays in its current state, though. > This would require 2 steps: 1) check the pages into CVS, along with > the required scripts, and 2) set up a redirect from > www.python.org/topics/xml/ to pyxml.sourceforge.net. I'm not really sure how to set up CVS-controlled pages on SF, but I guess others have done that, so I could find out. I certainly agree that having such a procedure is a prerequisite for moving any contents. The redirect could happen once the content has moved. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:08:06 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 17 Oct 2000 01:08:06 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: (message from Lars Marius Garshol on 16 Oct 2000 10:35:59 +0200) References: Message-ID: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de> > xmlproc > This needs to be updated to XML 1.0 2nd ed, extended with Unicode > support and a SAX 2.0 driver (I have 95% of one ready) and also > improved in various ways. It's probably in the interest of PyXML users to get updates to xmlproc together with PyXML updates, instead of collecting things from various sources - so yes, I'd like to see your continuing support for this parser in PyXML. > saxlib > I plan for this package to contain lots of SAX 2.0-related > utilities, like DOM2SAX walkers, XInclude and XBase filters, more > advanced parser instantiation tools, more drivers etc. It sounds like a good idea to offer more functionality in saxlib. However, we have to be careful that PyXML continues to be a strict superset of Python 2.0. To achieve that, I'd like to see the 2.0-provided functionality be split-off before adding more stuff. Then, it should be possible to use saxlib in a line-by-line identical form with 2.0 (e.g. by having xml.sax.saxlib20), and still provide extra functionality in xml.sax.saxlib. I'd also like to hear opinions on *where* this functionality should be located - if it is not clearly specific to SAX, xml.sax may not be the right place. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:18:17 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Oct 2000 01:18:17 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: <200010161519.JAA02069@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200010161519.JAA02069@localhost.localdomain> Message-ID: <200010162318.BAA00906@loewis.home.cs.tu-berlin.de> > I meant that I would much prefer to see development on xmlproc. 
In > answer to your real question, though, I think they might as well all > go on Sourceforge since it will give others a chance to pitch in. I'm all in favour of a bazaar-style operation (release early, release often). I'm willing to help as I can to allow Lars to release his stuff through PyXML. If he thinks parts of it are not ready for consumption by the typical PyXML user, then having different SF projects might provide the right balance between releasing early and committing to specific API too early (not that PyXML will guarantee stable API in all modules - Python 2.0 is there for stable documented API). Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:24:47 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Oct 2000 01:24:47 +0200 Subject: [XML-SIG] the faster way to get a dom. In-Reply-To: <200010161524.JAA02091@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200010161524.JAA02091@localhost.localdomain> Message-ID: <200010162324.BAA00966@loewis.home.cs.tu-berlin.de> > Are the large documents such that a subset of the DOM would suffice for your > use? On another note, does it make a difference to use a different parser? I don't know whether sgmlop is sophisticated and compatible enough for the ext.reader functions - I'd be interested to learn whether it makes any difference, though. It appears that you can't tell FromXml and friends what parser to use; if you have PyXML 0.6.1, you can influence choice of parser by setting the PY_SAX_PARSER environment variable (use xml.sax.saxexts.make_parser() to see whether it really gives you a sgmlop driver). Again, it'd be quite interesting to learn about your findings; if you think something should work but doesn't, we'd like to know as well.
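The "pick a driver, then check which one you actually got" idea carries over to the standard library of later Python versions. The sketch below is not PyXML 0.6.1 or 4DOM code; it assumes a modern Python where `xml.sax.make_parser()` accepts a list of driver module names (and, like saxexts, honours `PY_SAX_PARSER`, read once when `xml.sax` is first imported) and where `xml.dom.minidom.parseString()` accepts a SAX2 parser object. The helper name `build_dom` is hypothetical.

```python
import xml.sax
from xml.dom import minidom


def build_dom(text, parser_modules=("xml.sax.expatreader",)):
    """Hypothetical helper: build a minidom Document with an explicitly
    chosen SAX driver and report which driver was actually used."""
    # make_parser() tries each named driver module in order, then falls back
    # to the default driver list (which PY_SAX_PARSER can override).
    parser = xml.sax.make_parser(list(parser_modules))
    driver = type(parser).__module__
    # Passing a parser routes minidom through xml.dom.pulldom and that driver.
    doc = minidom.parseString(text, parser)
    return driver, doc


driver, doc = build_dom("<root><item>x</item></root>")
print(driver)                       # name of the SAX driver module selected
print(doc.documentElement.tagName)
```

Calling `minidom.parseString(text)` without a parser takes a different path: it builds the tree directly from expat callbacks instead of going through the SAX layer, which is typically the faster option for large documents, in line with the point about SAX-reader overhead above.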
Regards, Martin From Nicolas.Chauvat@logilab.fr Tue Oct 17 11:18:41 2000 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Tue, 17 Oct 2000 12:18:41 +0200 (CEST) Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <200010162335.BAA01015@loewis.home.cs.tu-berlin.de> Message-ID: [maintaining a pythonforge.org would take time and resources] Yes. But I understand a commercial company (BeOpen) has taken over python development. I think it is the kind of free services they could give back to the python community. > > That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-) > > I'd personally hope that python.org becomes accessible in that > way. You could probably have all of the current content there, and > then also have python.org/projects/pyxml; python.org/users/someone; > xml.python.org (and whatever other gimmicks they offer). > > One advantage of taking things off SF is that responsiveness of that > system was really bad. That seems to have improved recently; My concern is not about responsiveness as much as distribution (as in: the Internet is a distributed system). SourceForge is a great service. Good. Now are we to host every single open source project on SourceForge ? If we do so, the day SourceForge closes or changes its policy, or whatever, every single open source project will be halted or maybe discontinued. The people at SourceForge know their job: they provide a good service and the tools (code+doc) to implement that same service at other places. Why wouldn't a community as big and active as python's put up resources in common to offer such a useful service, but dedicated to python development ? There used to be a python.starship.net, maintained by volunteers, that is now hosted by BeOpen. Why not take the next step ? > if anybody is to host a similar server, they need to be aware that it > is probably hard to compete with SF in terms of provided services.
For > example, I trust that SF has a reasonable backup strategy - they > simply can't risk a disaster. Anybody hosting a server for just a few > projects would not get the same sort of trust from me. That's why I think we shouldn't look for someone able to host a server for a few projects but for some company(ies) able to put up the resources for "python projects" and volunteers that help them. That would also make it easier for people to look for python resources: code in development would be at something.python.org and 3rd party software at www.vex.net/parnassus. But I would agree that's a weak argument as long as www.python.org continues to be well-maintained with no broken links and stays the central hub for python information. I'm sure several people from PythonLabs are on this list. What is their opinion ? -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From larsga@garshol.priv.no Tue Oct 17 12:41:47 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Oct 2000 13:41:47 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: <200010162318.BAA00906@loewis.home.cs.tu-berlin.de> References: <200010161519.JAA02069@localhost.localdomain> <200010162318.BAA00906@loewis.home.cs.tu-berlin.de> Message-ID: * Martin v. Loewis | | I'm all in favour of a bazaar-style operation (release early, | release often). I'm willing to help as I can to allow Lars to | release his stuff through PyXML. If he thinks parts of it are not | ready for consumption by the typical PyXML user, then having | different SF projects might provide the right balance between | releasing early and committing to specific API too early (not that | PyXML will guarantee stable API in all modules - Python 2.0 is there | for stable documented API). In general, I think xmlproc fits in well with the XML-SIG stuff. saxlib (as I now envision it) would probably also fit in there.
dtddoc and rsskit are XML applications rather than XML infrastructure, and so are substantially different from the other two. They are also much less mature as ideas and I will probably not develop these as actively as the other two. The only reason I can see for not including xmlproc is that I would like to be able to basically develop it the way I want, and add whatever features I like and at my own speed. If you think that would work just as well under the XML-SIG umbrella then I think we can do that. --Lars M. From larsga@garshol.priv.no Tue Oct 17 12:46:01 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 17 Oct 2000 13:46:01 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de> References: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de> Message-ID: * Lars Marius Garshol | | saxlib | I plan for this package to contain lots of SAX 2.0-related | utilities, like DOM2SAX walkers, XInclude and XBase filters, more | advanced parser instantiation tools, more drivers etc. * Martin v. Loewis | | It sounds like a good idea to offer more functionality in | saxlib. However, we have to be careful that PyXML continues to be a | strict superset of Python 2.0. My idea for saxlib is that it should be a toolkit with SAX 2.0-related add-ons. I didn't really intend for it to contain SAX 2.0 itself, just useful drivers, filters and similar kinds of utilities. | To achieve that, I'd like to see the 2.0-provided functionality be | split-off before adding more stuff. Agreed. | Then, it should be possible to use saxlib in a line-by-line | identical form with 2.0 (e.g. by having xml.sax.saxlib20), and still | provide extra functionality in xml.sax.saxlib. Well, whether this belongs in the xml.sax package or not is unclear. It's not part of SAX as such, just utilities built on top of SAX. 
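A DOM2SAX walker, one of the SAX-based utilities listed for saxlib, is easy to picture with today's standard library. This is a minimal, hypothetical sketch rather than saxlib's or 4DOM's actual API: it assumes `xml.dom.minidom` for the tree and `xml.sax.saxutils.XMLGenerator` as the downstream SAX consumer, and the function name `dom_to_sax` is invented for illustration. It replays an already-built DOM tree as SAX 2.0 `ContentHandler` events, so any SAX handler (here, a serializer) can process it.

```python
import io
from xml.dom import Node, minidom
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Hypothetical DOM2SAX walker: replay a minidom (sub)tree as SAX events.

    It is recursive, so extremely deep trees could hit Python's recursion limit.
    """
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        # NamedNodeMap.items() yields (name, value) pairs for the attributes.
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        handler.processingInstruction(node.target, node.data)
    # Comments are skipped: the SAX 2.0 ContentHandler has no comment event.


doc = minidom.parseString('<doc a="1"><p>hi</p></doc>')
out = io.StringIO()
dom_to_sax(doc, XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```

Because the walker only talks to the `ContentHandler` interface, the same tree could just as well be fed into a SAX filter chain instead of a serializer.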
| I'd also like to hear opinions on *where* this functionality should | be located - if it is not clearly specific to SAX, xml.sax may not | be the right place. It would be clearly SAX-specific. However, much of it would also be usable with the DOM. For example, you might use the filters to transparently perform XInclude processing as the DOM tree is built. xml.saxlib is probably a better location for it. Or xmlplus.saxlib. Or whatever. --Lars M. From Juergen Hermann" Message-ID: On 17 Oct 2000 13:41:47 +0200, Lars Marius Garshol wrote: >dtddoc and rsskit are XML applications rather than XML infrastructure, >and so are substantially different from the other two. They are also >much less mature as ideas and I will probably not develop these as >actively as the other two. I think they should not necessarily go into the xml module, but I _DO_ think they fit under the PyXML umbrella. We currently have two top-level modules in the CVS tree, "html" and "xml". We could either open a top-level module for each tool/application, or create a "tools" module and locate them there. IMHO that is easier than to have a separate SF project for each of those smaller tools that are mostly maintained by the XML-SIG people anyway. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From GuyM@eurodatasystems.com Tue Oct 17 13:22:21 2000 From: GuyM@eurodatasystems.com (Guy Murphy) Date: Tue, 17 Oct 2000 13:22:21 +0100 Subject: [XML-SIG] Xalan and Xerces... Message-ID: Hiya. Been lurking for quite a while now. I understand that the fourthought offering might well be a good one, but I am a little confused as to what the fourthought offering might bring to Python that Xalan and Xerces would not, and why the fourthought offering has been blessed by the SIG for pyXML. [genuine pondering, not rhetorical] Windows developers are in the fortunate position of having a reference parser, MSXML, to develop for.
Any product aimed at the Windows platform can reasonably stipulate a requirement for MSXML. Xalan and Xerces, I would assert, are the most likely candidates to become the cross-platform (or indeed Unix focussed with Windows availability) reflection of MSXML. A stable, well-documented, widely distributed XML parser and XSL processing engine. Its use by Apache ensures a reasonable degree of development resource. Also given that C++ and Java versions of Xalan and Xerces are available, this would, to me at least, have seemed a perfect fit for Python and JPython both. Why has fourthought's offering been chosen over Xalan and Xerces? [again genuine question, not rhetorical] It seems to me that a precious opportunity to become *the* language of choice for cross-platform XML development is being lost by the Python community. Python's SAX support is good, but its DOM support to date is less than "industrial strength", and doesn't look as if it will be for some time yet. If Python had production quality XML/XSL support and a core Apache module (I realise there are two or more such modules existing, but again IMHO they are not well focused by the community, and of unverified / unproven strength) then Python could capitalise on a cross-platform web-development role. In an ideal world a Python DOM/XPath/XSLT wrapper that could mask either a Xalan/Xerces or MSXML core, with an automatic switch dependent upon platform and availability might start to qualify for the term "full XML support". ****Moving slightly but not wholly out of the list's scope**** My own personal view is that such a Web development niche focused upon ease of XML development is essential for Python's long-term viability as a development language (as opposed to a spare wrench in the toolbox). This niche is as much about developers' perception of Python as about its actual ability, and takes time to build up.
I would like to have developers think of Python and XML in the same way as they think of Perl and regular expressions. I've been playing with .NET and this evening will be playing with Python.NET (with the .NET Xml library), and if MS ever does manage to get the port they're after out of Corel then Python, like many others, risks having itself assimilated unless it has a really strong niche offering. Looking forward to my re-education as to why Python has moved this way, Best Regards, Guy J Murphy guym@eurodatasystems.com From rsalz@caveosystems.com Tue Oct 17 14:45:24 2000 From: rsalz@caveosystems.com (Rich Salz) Date: Tue, 17 Oct 2000 09:45:24 -0400 Subject: [XML-SIG] Xalan and Xerces... References: Message-ID: <39EC57F4.C0060601@caveosystems.com> > Also given that C++ and Java versions of Xalan and Xerces are available, > this would, to me at least, have seemed a perfect fit for Python and JPython > both. I don't know that I'd call them two different versions, but rather I'd say that the Apache group has two different implementations. The APIs look kinda similar, but they're not really that alike, even though they keep saying things like "port the Java serializer classes." :) And if there are two, why can't there be a third? It's WAAY too early to declare victory for one side or the other. The Xalan/Xerces folks have to spend a fair amount of time dealing with lots of issues like delete vs delete[], garbage collection (er, sorry, lazy evaluations when you call Terminate() :), wchar_t/XmlChar, threads and other platform issues that are really secondary. Because of that, I believe that PyXML and friends will soon catch up with and surpass the Apache C++ XML efforts. > Why has fourthought's offering been chosen over Xalan and Xerces? [again > genuine question, not rhetorical] I don't think that this has happened. Feel free to write extension code that wraps the C++ stuff into Python classes. (I'd recommend swig, www.swig.org).
You can probably get xml.xalan and xml.xerces as the package names without any problem. > In an ideal world a Python DOM/XPath/XSLT wrapper that could mask either a > Xalan/Xerces or MSXML core, with an automatic switch dependent upon platform Last I looked the APIs weren't all that similar -- different enough that some of those wrappers would be pretty hairy. > I would like to have developers think of Python and XML in the same way as > they think of Perl and regular expressions. Many of us already are. /r$ From uche.ogbuji@fourthought.com Tue Oct 17 16:18:28 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 17 Oct 2000 09:18:28 -0600 Subject: [XML-SIG] the faster way to get a dom. In-Reply-To: Message from "Martin v. Loewis" of "Tue, 17 Oct 2000 01:24:47 +0200." <200010162324.BAA00966@loewis.home.cs.tu-berlin.de> Message-ID: <200010171518.JAA05611@localhost.localdomain> > > Are the large documents such that a subset of the DOM would suffice for your > > use? > > On another note, does it make a difference to use a different parser? > I don't know whether sgmlop is sophisticated and compatible enough for > the ext.reader functions - I'd be interested to learn whether it makes > any difference, though. > > It appears that you can't tell FromXml and friends what parser to use; > if you have PyXML 0.6.1, you can influence choice of parser by setting > the PY_SAX_PARSER environment variable (use > xml.sax.saxexts.make_parser() to see whether it really gives you a > sgmlop driver). Right now you can only specify the parser by hacking FromXml or influencing saxlib's choice of parser as you say. It might be useful to write some low-level 4DOM readers that don't have to go through SAX. For instance, using sgmlop's low-level interface should be _many_ times faster than expat via saxlib. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Oct 17 16:21:01 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 17 Oct 2000 09:21:01 -0600 Subject: [XML-SIG] How to proceed In-Reply-To: Message from Lars Marius Garshol of "17 Oct 2000 13:41:47 +0200." Message-ID: <200010171521.JAA05626@localhost.localdomain> > The only reason I can see for not including xmlproc is that I would > like to be able to basically develop it the way I want, and add > whatever features I like and at my own speed. If you think that would > work just as well under the XML-SIG umbrella then I think we can do > that. You could probably have it both ways if you're not averse to having to merge in changes between your own work and contributions by other SIG members now and then. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Oct 17 16:40:50 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 17 Oct 2000 09:40:50 -0600 Subject: [XML-SIG] Xalan and Xerces... In-Reply-To: Message from Guy Murphy of "Tue, 17 Oct 2000 13:22:21 BST." Message-ID: <200010171540.JAA05660@localhost.localdomain> As a Fourthought principal, I've avoided comment thus far, but... > It seems to me that a precious opportunity to become *the* language of > choice for cross-platform XML development is being lost by the Python > community. Python's SAX support is good, but it's DOM support to date is > less than "industrial strength", and doesn't look as if it will be for some > time yet. Odd statement, that. 
As Lars discovered with his cross-platform DOM test-suite (and reported to the www-dom list), 4DOM is one of the most compliant DOM implementations available for any platform. Why do you think it's less than "industrial-strength"? > If Python had production quality XML/XSL support and a core Apache module (I > realise there are two or more such modules existing, but again IMHO they are > not well focused by the community, and of unverified / unproven strength) > then Python could capitalise on a cross-platform web-development role. For core Apache modules, see mod_snake. It's great stuff. We use it in the soon-to-be-released 4Suite Server. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 22:51:09 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Oct 2000 23:51:09 +0200 Subject: [XML-SIG] PyXML home page on SF In-Reply-To: (message from Nicolas Chauvat on Tue, 17 Oct 2000 12:18:41 +0200 (CEST)) References: Message-ID: <200010172151.XAA00871@loewis.home.cs.tu-berlin.de> > Yes. But I understand a commercial company (BeOpen) has taken over python > development. I think it is the kind of free services they could give > back to the python community. I don't feel in a position to demand anything like that... In fact, in free software, nobody normally has the right to demand anything. It happens that BeOpen/Pythonlabs makes Python available on roughly the same terms as it was always available - so I don't even see why they need to give anything else to anybody... > Now are we to host every single open source project on SourceForge ? > If we do so, the day SourceForge closes or changes its policy, or > whatever, every single open source project will be halted or maybe > discontinued. 
Not at all. In the worst case, people would have to change the host name in their CVS sandboxes; and you might lose your bug database - but that would already be a horror scenario. More likely, I'd think that there would be advance warning about policy changes, to give people enough time to migrate somewhere else should they feel the need. > Why wouldn't a community as big an active as python's put up ressources in > common to offer such a useful service, but dedicated to python > development ? There use to be a python.starship.net, maintained by > volunteers, that is now hosted by BeOpen. Why not take the next step ? Lack of volunteers, perhaps? > That's why I think we shouldn't look for someone able to host a > server for a few projects but for some company(ies) able to put up > the ressources for "python projects" and volunteers that help them. If you think you can demand such things from people, just go ahead and do so. I don't feel the need for that. There are plenty of alternatives already: Cygnus/RedHat operates sources.redhat.com (aka sourceware.cygnus.com). They are also willing to host projects. > That would also make it easier for people to look for python > ressources: code in development would be at something.python.org and > 3rd party software at www.vex.net/parnassus. That is something that is needed - something like CPAN. But we are the wrong SIG for that :-) Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 22:58:25 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Tue, 17 Oct 2000 23:58:25 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: (message from Lars Marius Garshol on 17 Oct 2000 13:41:47 +0200) References: <200010161519.JAA02069@localhost.localdomain> <200010162318.BAA00906@loewis.home.cs.tu-berlin.de> Message-ID: <200010172158.XAA00916@loewis.home.cs.tu-berlin.de> > The only reason I can see for not including xmlproc is that I would > like to be able to basically develop it the way I want, and add > whatever features I like and at my own speed. If you think that would > work just as well under the XML-SIG umbrella then I think we can do > that. Yes, certainly. As long as you continue to maintain it, I assume you will also respond to people who complain that something broke. With different people contributing experimental bleeding-edge code, I would expect many releases to have glitches here and there - if we all agree to work towards a "stable" release from time to time, so much the better. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 23:03:13 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Oct 2000 00:03:13 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: (message from Lars Marius Garshol on 17 Oct 2000 13:46:01 +0200) References: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de> Message-ID: <200010172203.AAA00965@loewis.home.cs.tu-berlin.de> > My idea for saxlib is that it should be a toolkit with SAX 2.0-related > add-ons. I didn't really intend for it to contain SAX 2.0 itself, just > useful drivers, filters and similar kinds of utilities. I see that I was confusing it with saxutils, sorry - there is no need to synchronize saxlib with Python 2.0 [as that has no xml.sax.saxlib]. For xml.sax.saxlib, there is then only the backwards compatibility concern with SAX1 - we probably have to support the SAX1 classes in saxlib as long as people have SAX1 applications.
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Oct 18 00:22:45 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Oct 2000 01:22:45 +0200 Subject: [XML-SIG] Xalan and Xerces... In-Reply-To: (message from Guy Murphy on Tue, 17 Oct 2000 13:22:21 +0100) References: Message-ID: <200010172322.BAA01705@loewis.home.cs.tu-berlin.de> > I understand that the fourthought offering might well be a good one, but I > am a little confused as to what the fourthought offering might bring to > Python that Xalan and Xerces would not, and why the fourthought offering has > been blessed by the SIG for pyXML. [genuine pondering, not rhetorical] I haven't used Xalan or Xerces - but how exactly would you integrate them into PyXML? I think that is technically not feasible, at least not in a 99% pure Python approach. In addition, PyXML supports a number of parsers - validating ones, fast ones, and super fast ones - I'd see no point in adding another parser, unless it provides features not found in any of the existing parsers. > Windows developers are in the fortunate position of having a > reference parser, MSXML to develop for. Any product aimed at the > Windows platform can reasonably stipulate a requirement for MSXML. Python, with Python 2, is also in the fortunate position of having a reference parser - xml.parsers.expat. With PyXML, you get xmlproc and sgmlop in addition to that. > Xalan and Xerces I would assert are the most likely candidates to become the > cross platform (or indeed Unix focussed with Windows availability) > reflection of MSXML. Nah, can't be :-) Python 2 (shipping today) already provides cross-platform XML parsing - for Python. There is nothing wrong with Xerces providing the same thing for Java - although I'd prefer a parser running in compiled code any time. > A stable, well documented, widely distributed XML parser and XSL > processing engine. For the PyXML parsers, I think pretty much the same can be said.
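To make the point about xml.parsers.expat being the reference parser shipped with Python concrete, here is a minimal sketch that drives expat directly through its low-level handler interface, without going through SAX. The function name count_with_expat and the sample document are illustrative assumptions, not part of PyXML.

```python
import xml.parsers.expat


def count_with_expat(data):
    """Count elements, attributes and characters with raw expat (no SAX layer)."""
    counts = {"elems": 0, "attrs": 0, "chars": 0}

    def start_element(name, attrs):
        # attrs is a dict mapping attribute name -> value
        counts["elems"] += 1
        counts["attrs"] += len(attrs)

    def char_data(text):
        # expat may split text into several callbacks; lengths still add up
        counts["chars"] += len(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return counts


if __name__ == "__main__":
    doc = b'<people><person id="a" name="Ann">hi</person><person id="b"/></people>'
    print(count_with_expat(doc))
```

Skipping the SAX wrapper trades portability across drivers for fewer layers of Python calls per event, which is the same trade-off discussed later in this thread for a low-level sgmlop reader.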
> Also given that C++ and Java versions of Xalan and Xerces are available, > this would have to me at least seemed a perfect fit for Python and JPython > both. I can't really comment on the quality of the C++ version of Xerces - I can't imagine it is completely compatible with the Java version, though. Even if it were, arranging the same *Python* interface to both might be a challenge. > Why has fourthought's offering been chosen over Xalan and Xerces? > [again genuine question, not rhetorical] Please understand that PyXML is *not* a Fourthought offering. They have provided the DOM implementation, and they will provide the XSLT implementation - the parsers come from many other sources. Being confronted with Xerces for the first time, I took the opportunity to port their SAXCount example to PyXML, which took me half an hour (plus or minus five minutes), including installing Xerces. On my system (AMD K6, 350MHz, JDK 1.3.0beta-b07) I got the following results:

Xerces with no options:
data/personal.xml: 903 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
Xerces with -w (i.e. parse the file once, then measure time for second run)
data/personal.xml: 85 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
PyXML 0.6.1, expat as the parser:
data/personal.xml: 0.0128449s (37 elems, 12 attrs,0 spaces, 268 chars)

First, you'll notice that Python beats Java by an order of magnitude even in the "fast" Java case. I'm not really surprised - expat is a fast parser, and it is written in C. Next, you'll notice that expat does not report ignorableWhitespace; instead, the spaces are reported as character data. I'm not sure which one is right here (or whether both are acceptable) - both parsers operate in a non-validating mode. Would somebody care to clarify? The difference in number of attributes apparently comes from Xerces passing the default value for an implied attribute from the DTD, whereas expat doesn't. The source of that ported example follows at the end of this message.
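Both differences in the numbers above (whitespace reported as character data, and the attribute count depending on DTD defaults) can be reproduced with Python's standard library alone. The sketch below is separate from the ported SAXCount program; the helper names and the tiny documents are assumptions made for illustration. It relies on two documented behaviours: the expat SAX driver delivers inter-element whitespace through characters(), and pyexpat's specified_attributes flag controls whether attributes defaulted from an internal DTD subset are reported. An external DTD that expat does not read (as with data/personal.xml) contributes no defaults at all.

```python
import xml.parsers.expat
import xml.sax
from xml.sax.handler import ContentHandler


class WhitespaceCounter(ContentHandler):
    """Tally how the expat SAX driver reports whitespace between elements."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chars = 0
        self.spaces = 0

    def characters(self, content):
        self.chars += len(content)

    def ignorableWhitespace(self, whitespace):
        self.spaces += len(whitespace)


def count_whitespace(data):
    """Return (characters() length, ignorableWhitespace() length)."""
    handler = WhitespaceCounter()
    xml.sax.parseString(data, handler)
    return handler.chars, handler.spaces


def attributes_seen(data, only_specified):
    """Return the attribute dicts expat hands to the start-element handler."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # When true, expat drops attributes that were only defaulted from the DTD.
    parser.specified_attributes = only_specified
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(data, True)
    return seen


if __name__ == "__main__":
    print(count_whitespace(b"<doc>\n  <a/>\n</doc>"))
    doc = b'<!DOCTYPE doc [<!ATTLIST doc lang CDATA "en">]><doc/>'
    print(attributes_seen(doc, False))
    print(attributes_seen(doc, True))
```

The whitespace between the tags ends up in the characters() total, and ignorableWhitespace() is never called. The defaulted lang attribute shows up only when specified_attributes is false.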
> If Python had production quality XML/XSL support and a core Apache > module (I realise there are two or more such modules existing, but > again IMHO they are not well focused by the community, and of > unverified / unproven strength) then Python could capitalise on a > cross-platform web-development role. I think Python does capitalise on a cross-platform web-development role. However, if you think more needs to be done - just go ahead and do it :-) > In an ideal world a Python DOM/XPath/XSLT wrapper that could mask > either a Xalan/Xerces or MSXML core, with an automatic switch > dependant upon platform and availability might start to qualify for > the term "full XML support". I can imagine using MSXML when that is available, and Xerces when it is available (i.e. in JPython). That should be as simple to support as adding SAX drivers. However, that is not strictly necessary - Python has "full XML support" right now. > My own personal view is that such a Web development niche focused upon ease > of XML development is essential for Pythons long term viability as a > development language (as opposed to a spare wrench in the toolbox). That is a little bit too much of marketing speak for me. I will continue to use Python as long as it is useful for me - regardless of others considering it viable for something or not. > I would like to have developers think of Python and XML in the same > way as they think of Perl and regular expressions. I don't think spreading FUD that Python does not currently support XML helps with that, though...
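Martin's remark that an MSXML or Xerces backend "should be as simple to support as adding SAX drivers" maps onto the preference list accepted by xml.sax.make_parser: each name is a driver module exposing create_parser(), unimportable modules are skipped, and the standard expat driver is used as the fallback. In this sketch both driver module names are hypothetical placeholders, and no such modules exist.

```python
import xml.sax

# Hypothetical driver modules: placeholders for, e.g., an MSXML- or
# Xerces-backed SAX driver that may or may not be installed.
PREFERRED_DRIVERS = ["hypothetical_msxml_driver", "hypothetical_xerces_driver"]


def best_available_parser():
    """Return a SAX parser from the first importable driver in the list,
    falling back to the default driver list (expat in the standard library)."""
    return xml.sax.make_parser(PREFERRED_DRIVERS)


if __name__ == "__main__":
    parser = best_available_parser()
    # With neither placeholder installed, this reports the expat driver module.
    print(type(parser).__module__)
```

Application code written against the SAX interfaces does not need to change when a preferred driver becomes available. Only the list passed to make_parser decides which backend is used.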
Regards, Martin

# Example adapted from Xerces' sax.SAXCount
from xml.sax import ContentHandler, make_parser
from xml.sax.handler import feature_namespaces
from time import time

setValidation = 0
setNameSpaces = 1
setSchemaSupport = 1
warmup = 0

class SAXCount(ContentHandler):
    # Counters are only touched outside the warm-up run.
    def startDocument(self):
        if warmup: return
        self.elems = 0
        self.attrs = 0
        self.chars = 0
        self.spaces = 0

    def startElementNS(self, name, qname, attrs):
        if warmup: return
        self.elems += 1
        self.attrs += len(attrs)

    def characters(self, chars):
        if warmup: return
        self.chars += len(chars)

    def ignorableWhitespace(self, chars):
        if warmup: return
        self.spaces += len(chars)

    def printResults(self, uri, elapsed):
        print "%s: %gs" % (uri, elapsed),
        print "(%(elems)d elems, %(attrs)d attrs, " \
              "%(spaces)d spaces, %(chars)d chars)" % vars(self)

def printit(uri):
    global warmup
    counter = SAXCount()
    parser = make_parser()
    parser.setContentHandler(counter)
    # not setting error handler
    # parser.setFeature(feature_validation, setValidation)
    parser.setFeature(feature_namespaces, setNameSpaces)
    # parser.setFeature(feature_schema, setSchema)
    parser.parse(uri)  # untimed parse
    if warmup:
        # -w: one more untimed run, then time a fresh parse below
        parser.parse(uri)
        parser.reset()
        warmup = 0
    start = time()
    parser.parse(uri)
    counter.printResults(uri, time() - start)

if __name__ == '__main__':
    # todo: argument processing
    import sys, getopt
    opts, args = getopt.getopt(sys.argv[1:], "w")
    for opt, val in opts:
        if opt == '-w':
            warmup = 1
    printit(args[0])

From martin@loewis.home.cs.tu-berlin.de Wed Oct 18 00:27:20 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Oct 2000 01:27:20 +0200 Subject: [XML-SIG] How to proceed In-Reply-To: <200010171521.JAA05626@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200010171521.JAA05626@localhost.localdomain> Message-ID: <200010172327.BAA01750@loewis.home.cs.tu-berlin.de> > You could probably have it both ways if you're not averse to having to merge > in changes between your own work and contributions by other SIG members now > and then.
Alternatively, we could make use of CVS branches if there are features that may take time to get in a usable shape, and that would break existing code even during that time. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Oct 18 00:26:01 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Oct 2000 01:26:01 +0200 Subject: [XML-SIG] the faster way to get a dom. In-Reply-To: <200010171518.JAA05611@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200010171518.JAA05611@localhost.localdomain> Message-ID: <200010172326.BAA01735@loewis.home.cs.tu-berlin.de> > Right now you can only specify parser by hacking FromXml or > influencing saxlib's choice of parser as you say. It might be > useful to write some low-level 4DOM readers that don't have to go > through SAX. For instance, using sgmlop's low-level interface > should be _many_ times faster than expat via saxlib. As a starting point, I'd try using sgmlop through the sgmlop SAX driver. To achieve that, it would be simplest if the From* methods took a parser= keyword argument which would allow the caller to specify a pre-fabricated parser object. This is a small effort and might reveal whether sgmlop is suitable for building DOM trees or not (and perhaps fix it if it is not). Regards, Martin From uche.ogbuji@fourthought.com Wed Oct 18 00:43:42 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 17 Oct 2000 17:43:42 -0600 Subject: [XML-SIG] Xalan and Xerces... In-Reply-To: Message from "Martin v. Loewis" of "Wed, 18 Oct 2000 01:22:45 +0200." <200010172322.BAA01705@loewis.home.cs.tu-berlin.de> Message-ID: <200010172343.RAA06974@localhost.localdomain> > Being confronted with Xerces for the first time, I took the > opportunity to port their SAXCount example to PyXML, which took me > half an hour (plus minus five minutes), including installing Xerces. 
> > On my system (AMD K6, 350MHz, JDK 1.3.0beta-b07) I got the following > results: > > Xerces with no options: > data/personal.xml: 903 ms (37 elems, 18 attrs, 26 spaces, 242 chars) > Xerces with -w (i.e. parse the file once, then measure time for second run) > data/personal.xml: 85 ms (37 elems, 18 attrs, 26 spaces, 242 chars) > PyXML 0.6.1, expat as the parser: > data/personal.xml: 0.0128449s (37 elems, 12 attrs,0 spaces, 268 chars) Good stats to have on hand. Thanks. > First, you'll notice that Python beats Java by an order of magnitude > even in the "fast" java case. I'm not really surprised - expat is a > fast parser, and it is written in C. > > Next, you'll notice that expat does not report ignorableWhitespace; > instead, the spaces are reported as character data. I'm not sure which > one is right here (or whether both are acceptable) - both parsers > operate in a non-validating mode. Somebody cares to clarify. There is really no such thing as ignorable whitespace in non-validating mode. According to XML 1.0, whitespace can only be ignored when it occurs where there is no corresponding #PCDATA in the content model from the DTD. Since the DTD is not used in non-validating mode, the parser _cannot_ assume that it's ignorable. So in this case expat is right and Xerces is wrong. > The difference in number of attributes apparently comes from Xerces > passing the default value for an implied attribute from the DTD, > whereas expat doesn't. Since expat is strictly non-validating, this is quite valid. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Wed Oct 18 00:48:46 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 17 Oct 2000 17:48:46 -0600 Subject: [XML-SIG] the faster way to get a dom.
In-Reply-To: Message from "Martin v. Loewis" of "Wed, 18 Oct 2000 01:26:01 +0200." <200010172326.BAA01735@loewis.home.cs.tu-berlin.de> Message-ID: <200010172348.RAA07014@localhost.localdomain> > > Right now you can only specify parser by hacking FromXml or > > influencing saxlib's choice of parser as you say. It might be > > useful to write some low-level 4DOM readers that don't have to go > > through SAX. For instance, using sgmlop's low-level interface > > should be _many_ times faster than expat via saxlib. > > As a starting point, I'd try using sgmlop through the sgmlop SAX > driver. To achieve that, it would be simplest if the From* methods > took a parser= keyword argument which would allow the caller to > specify a pre-fabricated parser object. This is a small effort and > might reveal whether sgmlop is suitable for building DOM trees or not > (and perhaps fix it if it is not). Right-O. I'll add this and post some benchmarks so we can think it over. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Nicolas.Chauvat@logilab.fr Wed Oct 18 09:35:02 2000 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Wed, 18 Oct 2000 10:35:02 +0200 (CEST) Subject: [XML-SIG] PyXML home page on SF In-Reply-To: <200010172151.XAA00871@loewis.home.cs.tu-berlin.de> Message-ID: On Tue, 17 Oct 2000, Martin v. Loewis wrote: [about my suggestion concerning a SourceForge dedicated to python] That was only a suggestion, not a demand as someone stated. I would love to set up a pythonforge myself if I had the resources; unfortunately I don't... maybe in a few months? Time will tell. For now I'll just go back to hacking and keep quiet. Cheers, :) -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?"
- LOGILAB, Paris (France) From martin@loewis.home.cs.tu-berlin.de Thu Oct 19 07:06:43 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Oct 2000 08:06:43 +0200 Subject: [XML-SIG] XML topic guide in CVS Message-ID: <200010190606.IAA01575@loewis.home.cs.tu-berlin.de> I've extracted what I believe is the content of http://www.python.org as the www module of the PyXML CVS (i.e. a root of cvs.pyxml.sourceforge.net:/cvsroot/pyxml, and a repository of www). If you know any file that's missing, please let me know (or add it yourself). The ht2html-generated files are *not* part of the repository. Instead, I put ht2html into the www module as well, and intend to regenerate the .html files on every commit. Unfortunately, write access to the WWW pages is currently denied on SF, so I cannot test the procedure. If you want to take a look at what it should do, please look into the files commitprog and loginfo of the CVSROOT module. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Oct 19 07:22:08 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Oct 2000 08:22:08 +0200 Subject: [XML-SIG] XML topic guide in CVS In-Reply-To: <200010190606.IAA01575@loewis.home.cs.tu-berlin.de> (martin@loewis.home.cs.tu-berlin.de) References: <200010190606.IAA01575@loewis.home.cs.tu-berlin.de> Message-ID: <200010190622.IAA01737@loewis.home.cs.tu-berlin.de> > I've extracted what I believe is the content of http://www.python.org Oops: http://www.python.org/topics/xml Martin From ibarg@as.arizona.edu Thu Oct 19 18:23:57 2000 From: ibarg@as.arizona.edu (Irene Barg) Date: Thu, 19 Oct 2000 10:23:57 -0700 Subject: [XML-SIG] PxXML-0.6.1 html_builder? Message-ID: <39EF2E2D.E08BC9D7@as.arizona.edu> XML-SIG, I installed PyXML-0.6.1 on RedHat Linux 6.1, Python 1.5.2 and noticed that neither 'builder' nor 'html_builder' exists. The 'PyXML-0.6.1/xml/dom/ChangeLog' says that 'Builder is now deprecated ...'.
Yet, there is still an example in the dom/demo called 'html2html' that imports 'html_builder', and 'builder.py' imports 'builder'. Are there examples of 4DOM modules that replace these? Thanks, --irene ------------------------------------------------------------------ Irene Barg Email: ibarg@as.arizona.edu Steward Observatory Phone: 520-621-2602 933 N. Cherry Ave. University of Arizona FAX: 520-621-1891 Tucson, AZ 85721 http://nickel.as.arizona.edu/~barg ------------------------------------------------------------------ From uche.ogbuji@fourthought.com Fri Oct 20 03:56:15 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 19 Oct 2000 20:56:15 -0600 Subject: [XML-SIG] PxXML-0.6.1 html_builder? In-Reply-To: Message from Irene Barg of "Thu, 19 Oct 2000 10:23:57 PDT." <39EF2E2D.E08BC9D7@as.arizona.edu> Message-ID: <200010200256.UAA10726@localhost.localdomain> For an example of creating a DOM from an html file, see dom/demos/dom_from_html_file.py For an example of creating HTML dynamically, see dom/demos/generate_html1.py Good luck. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From per@sbc.su.se Fri Oct 20 10:02:08 2000 From: per@sbc.su.se (Per Kraulis) Date: Fri, 20 Oct 2000 11:02:08 +0200 Subject: [XML-SIG] error in test_sax.py, and fix Message-ID: <39F00A10.1C510662@sbc.su.se> Hi, Having installed PyXML-0.6.1 under Python 1.5.2 (Linux RedHat 6.2), I had some problems running the PyXML-0.6.1/test/regrtest.py script. I have isolated one particular problem, and fixed it (I think) by rearranging the order of some statements in the script, which seemed to be erroneous.
I attach the fixed script; do a comparison with the original to see what I did. Original error output: per@sandman $ python test_sax.py Traceback (innermost last): File "test_sax.py", line 283, in ? xml_test_out = open(findfile("test.xml.out")).read() IOError: [Errno 2] No such file or directory: 'test.xml.out' Cheerio, and thanks for the hard work, Per Kraulis -- Per J. Kraulis, Ph.D. per@sbc.su.se Stockholm Bioinformatics Center (SBC) http://www.sbc.su.se/~per Dept. Biochemistry, Stockholm University phone +46 (0)8 - 674 78 17 SE-106 91 Stockholm, SWEDEN fax +46 (0)8 - 15 80 57 # regression test for SAX 2.0 # $Id: test_sax.py,v 1.3 2000/10/07 18:30:11 loewis Exp $ from xml.sax import make_parser, ContentHandler, \ SAXException, SAXReaderNotAvailable, SAXParseException try: make_parser() except SAXReaderNotAvailable: # don't try to test this module if we cannot create a parser raise ImportError("no XML parsers available") from xml.sax.saxutils import XMLGenerator, escape, XMLFilterBase from xml.sax.expatreader import create_parser from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl from cStringIO import StringIO from test.test_support import verbose, TestFailed, findfile # ===== Utilities tests = 0 fails = 0 def confirm(outcome, name): global tests, fails tests = tests + 1 if outcome: print "Passed", name else: print "Failed", name fails = fails + 1 # =========================================================================== # # saxutils tests # # =========================================================================== # ===== escape def test_escape_basic(): return escape("Donald Duck & Co") == "Donald Duck & Co" def test_escape_all(): return escape("") == "<Donald Duck & Co>" def test_escape_extra(): return escape("Hei på deg", {"å" : "å"}) == "Hei på
deg" def test_make_parser(): try: # Creating a parser should succeed - it should fall back # to the expatreader p = make_parser(['xml.parsers.no_such_parser']) except: return 0 else: return p # ===== XMLGenerator start = '\n' def test_xmlgen_basic(): result = StringIO() gen = XMLGenerator(result) gen.startDocument() gen.startElement("doc", {}) gen.endElement("doc") gen.endDocument() return result.getvalue() == start + "" def test_xmlgen_content(): result = StringIO() gen = XMLGenerator(result) gen.startDocument() gen.startElement("doc", {}) gen.characters("huhei") gen.endElement("doc") gen.endDocument() return result.getvalue() == start + "huhei" def test_xmlgen_pi(): result = StringIO() gen = XMLGenerator(result) gen.startDocument() gen.processingInstruction("test", "data") gen.startElement("doc", {}) gen.endElement("doc") gen.endDocument() return result.getvalue() == start + "" def test_xmlgen_content_escape(): result = StringIO() gen = XMLGenerator(result) gen.startDocument() gen.startElement("doc", {}) gen.characters("<huhei&" def test_xmlgen_ignorable(): result = StringIO() gen = XMLGenerator(result) gen.startDocument() gen.startElement("doc", {}) gen.ignorableWhitespace(" ") gen.endElement("doc") gen.endDocument() return result.getvalue() == start + " " ns_uri = "http://www.python.org/xml-ns/saxtest/" def test_xmlgen_ns(): result = StringIO() gen = XMLGenerator(result) gen.startDocument() gen.startPrefixMapping("ns1", ns_uri) gen.startElementNS((ns_uri, "doc"), "ns1:doc", {}) # add an unqualified name gen.startElementNS((None, "udoc"), None, {}) gen.endElementNS((None, "udoc"), None) gen.endElementNS((ns_uri, "doc"), "ns1:doc") gen.endPrefixMapping("ns1") gen.endDocument() return result.getvalue() == start + \ ('' % ns_uri) # ===== XMLFilterBase def test_filter_basic(): result = StringIO() gen = XMLGenerator(result) filter = XMLFilterBase() filter.setContentHandler(gen) filter.startDocument() filter.startElement("doc", {}) filter.characters("content") 
filter.ignorableWhitespace(" ") filter.endElement("doc") filter.endDocument() return result.getvalue() == start + "content " # =========================================================================== # # expatreader tests # # =========================================================================== # ===== DTDHandler support class TestDTDHandler: def __init__(self): self._notations = [] self._entities = [] def notationDecl(self, name, publicId, systemId): self._notations.append((name, publicId, systemId)) def unparsedEntityDecl(self, name, publicId, systemId, ndata): self._entities.append((name, publicId, systemId, ndata)) def test_expat_dtdhandler(): parser = create_parser() handler = TestDTDHandler() parser.setDTDHandler(handler) parser.feed('\n') parser.feed(' \n') parser.feed(']>\n') parser.feed('') parser.close() return handler._notations == [("GIF", "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN", None)] and \ handler._entities == [("img", None, "expat.gif", "GIF")] # ===== EntityResolver support class TestEntityResolver: def resolveEntity(self, publicId, systemId): inpsrc = InputSource() inpsrc.setByteStream(StringIO("")) return inpsrc def test_expat_entityresolver(): parser = create_parser() parser.setEntityResolver(TestEntityResolver()) result = StringIO() parser.setContentHandler(XMLGenerator(result)) parser.feed('\n') parser.feed(']>\n') parser.feed('&test;') parser.close() return result.getvalue() == start + "" # ===== Attributes support class AttrGatherer(ContentHandler): def startElement(self, name, attrs): self._attrs = attrs def startElementNS(self, name, qname, attrs): self._attrs = attrs def test_expat_attrs_empty(): parser = create_parser() gather = AttrGatherer() parser.setContentHandler(gather) parser.feed("") parser.close() return verify_empty_attrs(gather._attrs) def test_expat_attrs_wattr(): parser = create_parser() gather = AttrGatherer() parser.setContentHandler(gather) parser.feed("") parser.close() return 
verify_attrs_wattr(gather._attrs) def test_expat_nsattrs_empty(): parser = create_parser(1) gather = AttrGatherer() parser.setContentHandler(gather) parser.feed("") parser.close() return verify_empty_nsattrs(gather._attrs) def test_expat_nsattrs_wattr(): parser = create_parser(1) gather = AttrGatherer() parser.setContentHandler(gather) parser.feed("" % ns_uri) parser.close() attrs = gather._attrs return attrs.getLength() == 1 and \ attrs.getNames() == [(ns_uri, "attr")] and \ attrs.getQNames() == [] and \ len(attrs) == 1 and \ attrs.has_key((ns_uri, "attr")) and \ attrs.keys() == [(ns_uri, "attr")] and \ attrs.get((ns_uri, "attr")) == "val" and \ attrs.get((ns_uri, "attr"), 25) == "val" and \ attrs.items() == [((ns_uri, "attr"), "val")] and \ attrs.values() == ["val"] and \ attrs.getValue((ns_uri, "attr")) == "val" and \ attrs[(ns_uri, "attr")] == "val" # ===== InputSource support def test_expat_inpsource_filename(): parser = create_parser() result = StringIO() xmlgen = XMLGenerator(result) parser.setContentHandler(xmlgen) parser.parse(findfile("test.xml")) return result.getvalue() == xml_test_out def test_expat_inpsource_sysid(): parser = create_parser() result = StringIO() xmlgen = XMLGenerator(result) parser.setContentHandler(xmlgen) parser.parse(InputSource(findfile("test.xml"))) return result.getvalue() == xml_test_out def test_expat_inpsource_stream(): parser = create_parser() result = StringIO() xmlgen = XMLGenerator(result) parser.setContentHandler(xmlgen) inpsrc = InputSource() inpsrc.setByteStream(open(findfile("test.xml"))) parser.parse(inpsrc) return result.getvalue() == xml_test_out # =========================================================================== # # error reporting # # =========================================================================== def test_expat_inpsource_location(): parser = create_parser() parser.setContentHandler(ContentHandler()) # do nothing source = InputSource() source.setByteStream(StringIO("")) #ill-formed name = "a 
file name" source.setSystemId(name) try: parser.parse(source) except SAXException, e: return e.getSystemId() == name def test_expat_incomplete(): parser = create_parser() parser.setContentHandler(ContentHandler()) # do nothing try: parser.parse(StringIO("")) except SAXParseException: return 1 # ok, error found else: return 0 # =========================================================================== # # xmlreader tests # # =========================================================================== # ===== AttributesImpl def verify_empty_attrs(attrs): try: attrs.getValue("attr") gvk = 0 except KeyError: gvk = 1 try: attrs.getValueByQName("attr") gvqk = 0 except KeyError: gvqk = 1 try: attrs.getNameByQName("attr") gnqk = 0 except KeyError: gnqk = 1 try: attrs.getQNameByName("attr") gqnk = 0 except KeyError: gqnk = 1 try: attrs["attr"] gik = 0 except KeyError: gik = 1 return attrs.getLength() == 0 and \ attrs.getNames() == [] and \ attrs.getQNames() == [] and \ len(attrs) == 0 and \ not attrs.has_key("attr") and \ attrs.keys() == [] and \ attrs.get("attrs") == None and \ attrs.get("attrs", 25) == 25 and \ attrs.items() == [] and \ attrs.values() == [] and \ gvk and gvqk and gnqk and gik and gqnk def verify_attrs_wattr(attrs): return attrs.getLength() == 1 and \ attrs.getNames() == ["attr"] and \ attrs.getQNames() == ["attr"] and \ len(attrs) == 1 and \ attrs.has_key("attr") and \ attrs.keys() == ["attr"] and \ attrs.get("attr") == "val" and \ attrs.get("attr", 25) == "val" and \ attrs.items() == [("attr", "val")] and \ attrs.values() == ["val"] and \ attrs.getValue("attr") == "val" and \ attrs.getValueByQName("attr") == "val" and \ attrs.getNameByQName("attr") == "attr" and \ attrs["attr"] == "val" and \ attrs.getQNameByName("attr") == "attr" def test_attrs_empty(): return verify_empty_attrs(AttributesImpl({})) def test_attrs_wattr(): return verify_attrs_wattr(AttributesImpl({"attr" : "val"})) # ===== AttributesImpl def verify_empty_nsattrs(attrs): try: 
        attrs.getValue((ns_uri, "attr"))
        gvk = 0
    except KeyError:
        gvk = 1

    try:
        attrs.getValueByQName("ns:attr")
        gvqk = 0
    except KeyError:
        gvqk = 1

    try:
        attrs.getNameByQName("ns:attr")
        gnqk = 0
    except KeyError:
        gnqk = 1

    try:
        attrs.getQNameByName((ns_uri, "attr"))
        gqnk = 0
    except KeyError:
        gqnk = 1

    try:
        attrs[(ns_uri, "attr")]
        gik = 0
    except KeyError:
        gik = 1

    return attrs.getLength() == 0 and \
           attrs.getNames() == [] and \
           attrs.getQNames() == [] and \
           len(attrs) == 0 and \
           not attrs.has_key((ns_uri, "attr")) and \
           attrs.keys() == [] and \
           attrs.get((ns_uri, "attr")) == None and \
           attrs.get((ns_uri, "attr"), 25) == 25 and \
           attrs.items() == [] and \
           attrs.values() == [] and \
           gvk and gvqk and gnqk and gik and gqnk

def test_nsattrs_empty():
    return verify_empty_nsattrs(AttributesNSImpl({}, {}))

def test_nsattrs_wattr():
    attrs = AttributesNSImpl({(ns_uri, "attr") : "val"},
                             {(ns_uri, "attr") : "ns:attr"})

    return attrs.getLength() == 1 and \
           attrs.getNames() == [(ns_uri, "attr")] and \
           attrs.getQNames() == ["ns:attr"] and \
           len(attrs) == 1 and \
           attrs.has_key((ns_uri, "attr")) and \
           attrs.keys() == [(ns_uri, "attr")] and \
           attrs.get((ns_uri, "attr")) == "val" and \
           attrs.get((ns_uri, "attr"), 25) == "val" and \
           attrs.items() == [((ns_uri, "attr"), "val")] and \
           attrs.values() == ["val"] and \
           attrs.getValue((ns_uri, "attr")) == "val" and \
           attrs.getValueByQName("ns:attr") == "val" and \
           attrs.getNameByQName("ns:attr") == (ns_uri, "attr") and \
           attrs[(ns_uri, "attr")] == "val" and \
           attrs.getQNameByName((ns_uri, "attr")) == "ns:attr"

# ===== Main program

def make_test_output():
    parser = create_parser()
    result = StringIO()
    xmlgen = XMLGenerator(result)

    parser.setContentHandler(xmlgen)
    parser.parse(findfile("test.xml"))

    outf = open(findfile("test.xml.out"), "w")
    outf.write(result.getvalue())
    outf.close()

make_test_output()

xml_test_out = open(findfile("test.xml.out")).read()

items = locals().items()
items.sort()
for (name, value) in items:
    if name[ : 5] == "test_":
        confirm(value(), name)

print "%d tests, %d failures" % (tests, fails)
if fails != 0:
    raise TestFailed, "%d of %d tests failed" % (fails, tests)

From andorxor@gmx.de Fri Oct 20 13:06:51 2000 From: andorxor@gmx.de (Stephan Tolksdorf) Date: Fri, 20 Oct 2000 14:06:51 +0200 Subject: [XML-SIG] normalize() for minidom Message-ID: <11011989820.20001020140651@email.com> Hello, I'd like to have the normalize() method of DOM2's node interface (DOM1's normalize() was in the element interface) in minidom included. You will find my attempt at an implementation at the end of this mail.

--- documentation ---
This is the description from the DOM2 Candidate Recommendation:

normalize (introduced in DOM Level 2)
Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes. This can be used to ensure that the DOM view of a document is the same as if it were saved and re-loaded, and is useful when operations (such as XPointer lookups) that depend on a particular document tree structure are to be used.
Note: In cases where the document contains CDATASections, the normalize operation alone may not be sufficient, since XPointers do not differentiate between Text nodes and CDATASection nodes.
No Parameters
No Return Value
No Exceptions

--- implementation ---
class Node:
    (...)
    def normalize(self):
        """Joins adjacent Text nodes and deletes empty Text nodes
        in the full depth of the sub-tree underneath this Node.
        """
        i = 0
        while i < len(self.childNodes):
            cn = self.childNodes[i]
            if cn.nodeType == Node.TEXT_NODE:
                i += 1
                # join adjacent Text nodes
                while i < len(self.childNodes) and self.childNodes[i].nodeType == Node.TEXT_NODE:
                    cn.nodeValue = cn.data = cn.data + self.childNodes[i].data
                    del(self.childNodes[i])
                # delete empty nodes
                if cn.nodeValue == "":
                    i -= 1
                    del(self.childNodes[i])
                # i already points past cn, so skip the increment below
                continue
            elif cn.nodeType == Node.ELEMENT_NODE:
                cn.normalize()
            i += 1

------ Best regards, Stephan Tolksdorf From martin@loewis.home.cs.tu-berlin.de Fri Oct 20 17:54:21 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 20 Oct 2000 18:54:21 +0200 Subject: [XML-SIG] error in test_sax.py, and fix In-Reply-To: <39F00A10.1C510662@sbc.su.se> (message from Per Kraulis on Fri, 20 Oct 2000 11:02:08 +0200) References: <39F00A10.1C510662@sbc.su.se> Message-ID: <200010201654.SAA00763@loewis.home.cs.tu-berlin.de> > Having installed PyXML-0.6.1 under Python 1.5.2 (Linux RedHat 6.2), I > had some problems running the PyXML-0.6.1/test/regrtest.py script. I > have isolated one particular problem, and fixed it (I think) by > rearranging the order of some statements in the script, which seemed to > be erroneous. I attach the fixed script; do a comparison with the > original to see what I did. Thanks for your report and your patch. Since test.xml.out is the expected output, creating it on each run reduces the strength of the test. Instead, that file should have been distributed as a source file. It is already in CVS; only MANIFEST.in failed to mention it. I've corrected that, so 0.6.2 should include that file. Thanks again, Martin From martin@loewis.home.cs.tu-berlin.de Fri Oct 20 18:21:29 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 20 Oct 2000 19:21:29 +0200 Subject: [XML-SIG] normalize() for minidom In-Reply-To: <11011989820.20001020140651@email.com> (message from Stephan Tolksdorf on Fri, 20 Oct 2000 14:06:51 +0200) References: <11011989820.20001020140651@email.com> Message-ID: <200010201721.TAA00922@loewis.home.cs.tu-berlin.de> > I'd like to have the normalize() method of DOM2's node interface > (DOM1's normalize() was in the element interface) in minidom > included. Looks fine to me. I've massaged it a little (mostly to restore 1.5.2 compatibility), and committed it into PyXML, to appear as part of 0.6.2. Regards, Martin From larsga@garshol.priv.no Tue Oct 24 12:50:21 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Oct 2000 13:50:21 +0200 Subject: [XML-SIG] Re: How to proceed Message-ID: * Lars Marius Garshol | | The only reason I can see for not including xmlproc is that I would | like to be able to basically develop it the way I want, and add | whatever features I like and at my own speed. If you think that | would work just as well under the XML-SIG umbrella then I think we | can do that. * Martin v. Loewis | | Yes, certainly. As long as you continue to maintain it, I assume you | will also respond to people who complain that something broke. Of course. | With different people contributing experimental bleeding-edge code, | I would expect many releases to have glitches here and there - if we | all agree to work towards a "stable" release from time to time, then | the better. Then I think we've settled that. Once I replace my monitor I will commit the xmlproc regression tests to the PyXML CVS tree and start developing it there. The question of rsskit and dtddoc still remains open, though. --Lars M. 
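For comparison, here is a minimal sketch of the normalize() semantics Stephan proposed, written against the xml.dom.minidom module of present-day Python 3, where Node.normalize() is available. The element name and text values are invented for illustration. Adjacent Text nodes are merged, empty ones are dropped, and an element child keeps the Text nodes on either side of it apart:

```python
from xml.dom.minidom import Document

def build_fragmented_paragraph():
    """Return a <p> element whose children are deliberately fragmented."""
    doc = Document()
    p = doc.createElement("p")
    doc.appendChild(p)
    p.appendChild(doc.createTextNode("a"))
    p.appendChild(doc.createTextNode("b"))   # adjacent to "a": will be merged
    p.appendChild(doc.createElement("br"))   # structure: separates Text nodes
    p.appendChild(doc.createTextNode(""))    # empty Text node: will be removed
    p.appendChild(doc.createTextNode("c"))
    return p

p = build_fragmented_paragraph()
children_before = len(p.childNodes)
p.normalize()
texts_after = [n.data for n in p.childNodes if n.nodeType == n.TEXT_NODE]
print(children_before, len(p.childNodes), texts_after, p.toxml())
# 5 3 ['ab', 'c'] <p>ab<br/>c</p>
```

After normalization, each run of text between structural nodes can be read from a single Text node instead of being concatenated by hand.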
From larsga@garshol.priv.no Tue Oct 24 12:50:55 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Oct 2000 13:50:55 +0200 Subject: [XML-SIG] Re: How to proceed Message-ID: * Lars Marius Garshol | | My idea for saxlib is that it should be a toolkit with SAX | 2.0-related add-ons. I didn't really intend for it to contain SAX | 2.0 itself, just useful drivers, filters and similar kinds of | utilities. * Martin v. Loewis | | And I see that I was just confusing it with saxutils, sorry - there | is no need to synchronize saxlib with Python 2.0 [as that has no | xml.sax.saxlib]. I think we should stop calling it saxlib, since that just seems to confuse people. I'm thinking that it should not be a part of SAX 2.0, but that it should be a set of useful utilities and also some add-ons. (The LexicalHandler and DeclHandler will be in there, for example.) My list of the contents, as I currently envision it, is:

 - extensions
   - LexicalHandler
   - DeclHandler
 - extra drivers
   - JPython driver
   - SP driver
   - xmllib/sgmlop driver
   - RXP driver
   (xmlproc driver will be part of xmlproc: xml.parsers.xmlproc.drv_xmlproc)
 - utilities
   - ErrorPrinter, ErrorRaiser, ErrorRecorder
   - EventTracer
   - XBaseFilter, XIncludeFilter
   - ParserManager
   - DOM2SAX event generator
   - CanonicalXMLGenerator
   - DTDValidatingFilter, SchemaValidatingFilter

So maybe saxkit or saxpack would be a better name. | For xml.sax.saxlib, there is then only the backwards compatibility | concern with SAX1 - we probably have to support the SAX1 classes in | saxlib as long as people have SAX1 applications. I think we should use a different module, possibly even a different package. Would xml.saxkit.* work? Or xml.saxexts.*? Or xml.saxpack.*? Comments would be very welcome, as I have some of this stuff already, and would like to be able to place it where it belongs. --Lars M.
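Lars's list includes an EventTracer among the planned utilities. As a rough illustration of what such a SAX 2.0 filter could look like, here is a sketch built on xml.sax.saxutils.XMLFilterBase from present-day Python 3. The EventTracer class and its log list are hypothetical names chosen for this example, not the saxtools implementation. The filter records each event and then forwards it unchanged to the downstream handler:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class EventTracer(XMLFilterBase):
    """Hypothetical filter: record SAX events, then forward them unchanged."""

    def __init__(self, parent, log):
        super().__init__(parent)
        self.log = log

    def startElement(self, name, attrs):
        self.log.append(("start", name))
        super().startElement(name, attrs)

    def endElement(self, name):
        self.log.append(("end", name))
        super().endElement(name)

    def characters(self, content):
        self.log.append(("chars", content))
        super().characters(content)

log = []
tracer = EventTracer(xml.sax.make_parser(), log)
out = io.StringIO()
tracer.setContentHandler(XMLGenerator(out))   # downstream handler re-serializes
tracer.parse(io.BytesIO(b"<a><b>x</b></a>"))
print(log)
# [('start', 'a'), ('start', 'b'), ('chars', 'x'), ('end', 'b'), ('end', 'a')]
```

Because the filter is itself an XMLReader, filters of this kind can be chained in front of any content handler without changing the application code.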
From Juergen Hermann" Message-ID: On 24 Oct 2000 13:50:21 +0200, Lars Marius Garshol wrote: >The question of rsskit and dtddoc still remains open, though. I prefer one of two options: * make both a part of the PyXML project, and put them into the repository as siblings(!) of the "xml" directory. * make them their own project(s) on SourceForge (PyXMLTools?). Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From fdrake@acm.org Tue Oct 24 14:43:13 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Oct 2000 09:43:13 -0400 (EDT) Subject: [XML-SIG] Re: How to proceed In-Reply-To: References: Message-ID: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > I think we should use a different module, possibly even a different > package. Would xml.saxkit.* work? Or xml.saxexts.*? Or xml.saxpack.*? How about xml.saxtools.*? -Fred -- Fred L. Drake, Jr. PythonLabs Team Member From larsga@garshol.priv.no Tue Oct 24 14:54:13 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Oct 2000 15:54:13 +0200 Subject: [XML-SIG] Re: How to proceed In-Reply-To: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com> References: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com> Message-ID: * Lars Marius Garshol | | I think we should use a different module, possibly even a different | package. Would xml.saxkit.* work? Or xml.saxexts.*? Or xml.saxpack.*? * Fred L. Drake, Jr. | | How about xml.saxtools.*? I like that better than my own suggestions, since it makes it clearer what the package actually is. Unless anyone protests or proposes a better name, that's the one I would like to use. Does this belong in the XML-SIG package? I think it does, but it would be nice to have feedback. --Lars M. From fdrake@acm.org Tue Oct 24 14:58:14 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 24 Oct 2000 09:58:14 -0400 (EDT) Subject: [XML-SIG] Re: How to proceed In-Reply-To: References: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com> Message-ID: <14837.38262.68627.415392@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > Does this belong in the XML-SIG package? I think it does, but it > would be nice to have feedback. I think so. I think of PyXML as a fairly substantial package that provides all sorts of general XML-related support that is useful for a wide range of applications, but might not be included in the "core" support packaged with Python. Andrew described it as an "omnibus" package in the early days of PyXML, and I think that's a good description. The only thing I'd like to see changed in that regard is that development may be better off within PyXML rather than using separate CVS repositories for components (xmlproc, 4DOM, etc.), so that changes get broader testing a little earlier. Now that the CVS is on SourceForge that makes more sense than when it was on Andrew's machine. -Fred -- Fred L. Drake, Jr. PythonLabs Team Member From larsga@garshol.priv.no Tue Oct 24 15:23:56 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Oct 2000 16:23:56 +0200 Subject: [XML-SIG] Re: How to proceed In-Reply-To: <14837.38262.68627.415392@cj42289-a.reston1.va.home.com> References: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com> <14837.38262.68627.415392@cj42289-a.reston1.va.home.com> Message-ID: * Lars Marius Garshol | | Does this belong in the XML-SIG package? I think it does, but it | would be nice to have feedback. * Fred L. Drake, Jr. | | I think so. I think of PyXML as a fairly substantial package that | provides all sorts of general XML-related support that is useful for | a wide range of applications, but might not be included in the | "core" support packaged with Python. Andrew described it as an | "omnibus" package in the early days of PyXML, and I think that's a | good description. 
I agree; this is how I think of it as well. So it looks like the SAX extras package will be called saxtools, have the package name xml.saxtools.* and live in the XML-SIG package. | The only thing I'd like to see changed in that regard is that | development may be better off within PyXML rather than using | separate CVS repositories for components (xmlproc, 4DOM, etc.), so | that changes get broader testing a little earlier. I agree. My plan for the future is to maintain xmlproc, javadom and saxtools in the PyXML CVS tree. However, this still leaves the question of the two XML applications open. Do rsskit and dtddoc belong in the XML-SIG package? I don't think they do, but there have been requests to me from people who would like to see them there. So I have to decide between using the PyXML package, separate SF projects or hosting development myself. --Lars M. From uche.ogbuji@fourthought.com Tue Oct 24 15:55:21 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 24 Oct 2000 08:55:21 -0600 Subject: [XML-SIG] Re: How to proceed In-Reply-To: Message from Lars Marius Garshol of "24 Oct 2000 16:23:56 +0200." Message-ID: <200010241455.IAA32150@localhost.localdomain> > | The only thing I'd like to see changed in that regard is that > | development may be better off within PyXML rather than using > | separate CVS repositories for components (xmlproc, 4DOM, etc.), so > | that changes get broader testing a little earlier. > > I agree. My plan for the future is to maintain xmlproc, javadom and > saxtools in the PyXML CVS tree. > > However, this still leaves the question of the two XML applications > open. Do rsskit and dtddoc belong in the XML-SIG package? I don't > think they do, but there have been requests to me from people who > would like to see them there. > > So I have to decide between using the PyXML package, separate SF > projects or hosting development myself.
I don't know enough about rsskit and dtddoc to specifically conclude where you should host them, but if there is general interest in the packages, I'd say "put them in PyXML." Anything from Lars is sure to be of high enough quality to avoid any question of shovelling poor code into the package. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Alexandre.Fayolle@logilab.fr Tue Oct 24 16:54:00 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 24 Oct 2000 17:54:00 +0200 (CEST) Subject: [XML-SIG] xmlproc DTD api bug Message-ID: I'm using PyXml 0.5.5.1 on Python 1.5.2. I know these are old versions (we are planning to upgrade real soon now), so maybe the problem has already been fixed. Please excuse me if this is the case. If I have an element with an ANY content model, and use elt.get_valid_elements(elt.get_start_state()), I get a Type Error, because the content model is None for this element: File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", line 300, in get_valid_elements return self.content_model[state].keys() TypeError: unsubscriptable object I think xmlproc should test for this case, and return the list of all the elements known in the DTD (using dtd.get_elements()). Cheers -- Alexandre Fayolle http://www.logilab.com LOGILAB, Paris (France). From larsga@garshol.priv.no Tue Oct 24 17:26:28 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Oct 2000 18:26:28 +0200 Subject: [XML-SIG] xmlproc DTD api bug In-Reply-To: References: Message-ID: * Alexandre Fayolle | | I'm using PyXml 0.5.5.1 on Python 1.5.2. I know these are old | versions (we are planning to upgrade real soon now), so maybe | the problem has already been fixed. Please excuse me if this is the | case.
It has been fixed, but not in the PyXML package. The fix will appear there when I move xmlproc development to the PyXML CVS tree. | If I have an element with an ANY content model, and use | elt.get_valid_elements(elt.get_start_state()), I get a Type Error, | because the content model is None for this element: | File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py", | line 300, in get_valid_elements | return self.content_model[state].keys() | TypeError: unsubscriptable object | | I think xmlproc should test for this case, I agree, and the version in my CVS tree does. It might be that version 0.70 also does; I'm not sure. | and return the list of all the elements known in the DTD (using | dtd.get_elements()) In principle I agree that this would be the correct solution. However, the element doesn't have a reference to the DTD, so it doesn't have this information. So my current code returns '[]' instead. The problem is that if the element is to have a reference to the DTD we have a cycle, and in 1.5.2 that means that we must have either a .unlink() method on the DTD or memory leaks (and quite often also both). If you want to fix the immediate problem, add this method to the ElementTypeAny class in xmldtd.py:

    def get_valid_elements(self, state):
        return []

Thank you for reporting this problem! --Lars M. From Alexandre.Fayolle@logilab.fr Tue Oct 24 17:48:25 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 24 Oct 2000 18:48:25 +0200 (CEST) Subject: [XML-SIG] xmlproc DTD api bug In-Reply-To: Message-ID: On 24 Oct 2000, Lars Marius Garshol wrote: > If you want to fix the immediate problem, add this method to the > ElementTypeAny class in xmldtd.py:
>
>     def get_valid_elements(self, state):
>         return []

Since the involved code is to be distributed, I prefer avoiding patching dependencies, so I set up a workaround in the calling code.
This workaround may last for some time since the caller knows of the dtd, and is able to call get_elements() by itself. Thanks for the quick answer. -- Alexandre Fayolle http://www.logilab.com LOGILAB, Paris (France). From Joakim.Hove@phys.ntnu.no Tue Oct 24 20:34:12 2000 From: Joakim.Hove@phys.ntnu.no (Joakim Hove) Date: 24 Oct 2000 21:34:12 +0200 Subject: [XML-SIG] Quotes example PyXML-0.6.1 seems to ignore DTD? Message-ID: /----------------------------------------------------------------- | Please excuse me if this mail has appeared on the list previously; | I had some problems sending it initially. \----------------------------------------------------------------- Hello, I have just installed the PyXML-0.6.1 distribution. This is actually an attempt to teach myself _both_ about Python _and_ XML - ideally one should probably concentrate on one new thing at a time. Anyway, I'm experimenting with the qtfmt.py program in demos/quotes/; this program is supplied with a sample XML file and an accompanying DTD file. bash% head -2 sample.xml As we can see, the sample.xml file should be validated with respect to (??) the DTD file "quotations.dtd". Now if I rename quotations.dtd to something else, I was expecting to trigger a run-time error of some kind, as the DTD file specified in the XML can no longer be found; however, no error occurs, and the output from the qtfmt.py program is unchanged. I am not able to tell whether the parser used in qtfmt.py is validating (the relevant part of the qtfmt.py file):

    # Enforce the use of the Expat parser, because the code needs to be
    # sure that the output will be UTF-8 encoded.
    p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")

but if this is indeed a non-validating parser, then it is somewhat misleading to ship the quotations.dtd file, which is actually not used. If anyone could clear up these misunderstandings I would be most grateful.
-- Joakim Hove -- === Joakim Hove www.phys.ntnu.no/~hove/ ====================== # Institutt for fysikk (735) 93637 / E3-166 | Skøyensgate 10D # # N - 7491 Trondheim hove@phys.ntnu.no | N - 7030 Trondheim # ================================================ 73 93 31 68 ======== From akuchlin@mems-exchange.org Tue Oct 24 20:52:51 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 24 Oct 2000 15:52:51 -0400 Subject: [XML-SIG] Quotes example PyXML-0.6.1 seems to ignore DTD? In-Reply-To: ; from Joakim.Hove@phys.ntnu.no on Tue, Oct 24, 2000 at 09:34:12PM +0200 References: Message-ID: <20001024155251.A15058@kronos.cnri.reston.va.us> On Tue, Oct 24, 2000 at 09:34:12PM +0200, Joakim Hove wrote: > # Enforce the use of the Expat parser, because the code needs to be > # sure that the output will be UTF-8 encoded. > p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat") >but if this is indeed a non-validating parser - then it is somewhat >misleading to ship the quotations.dtd file - which is actually not >used. Correct, Expat is a non-validating parser. But enforcing the use of Expat is a hack because at the time only Expat would provide UTF-8 output. Probably that hack is no longer necessary with Python 2.0, since it could just feed a Unicode string to xmlproc, which is a validating parser, and then convert to the desired output encoding. I'll add it to my stack of things to do. --amk From akuchlin@mems-exchange.org Wed Oct 25 03:40:38 2000 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Tue, 24 Oct 2000 22:40:38 -0400 Subject: [XML-SIG] Determining output encoding of a SAX parser Message-ID: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> Is there any way to determine the encoding of the output from a SAX1 parser driver?
It's clear if the callbacks are being passed Unicode strings, but with 8-bit strings you have no way of knowing if they're in Latin1 or UTF-8 or anything (unless you know what parser you're using). Given that SAX2 does seem to support this with XMLReader.{get,set}Encoding(), is this worth fixing in SAX1? --amk From fdrake@acm.org Wed Oct 25 03:33:07 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Oct 2000 22:33:07 -0400 (EDT) Subject: [XML-SIG] Determining output encoding of a SAX parser In-Reply-To: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> References: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> Message-ID: <14838.18019.793018.508809@cj42289-a.reston1.va.home.com> A.M. Kuchling writes: > Given that SAX2 does seem to support this with > XMLReader.{get,set}Encoding(), is this worth fixing in SAX1? I thought we'd decided to drop SAX 1 support. Perhaps I'm mis-remembering? -Fred -- Fred L. Drake, Jr. PythonLabs Team Member From martin@loewis.home.cs.tu-berlin.de Wed Oct 25 07:17:11 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 25 Oct 2000 08:17:11 +0200 Subject: [XML-SIG] Re: How to proceed In-Reply-To: (message from Lars Marius Garshol on 24 Oct 2000 13:50:21 +0200) References: Message-ID: <200010250617.IAA00842@loewis.home.cs.tu-berlin.de> > The question of rsskit and dtddoc still remains open, though. I don't know what this is, who would use it, or how good it works for the purposes it is designed for, so I can't really comment. All I can say that if you consider it free software (in the speech sense), and related to PyXML, and if its presence doesn't break anything, then I certainly won't object to ship it together with PyXML. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Oct 25 07:25:43 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Wed, 25 Oct 2000 08:25:43 +0200 Subject: [XML-SIG] Determining output encoding of a SAX parser In-Reply-To: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> (amk@mira.erols.com) References: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> Message-ID: <200010250625.IAA00904@loewis.home.cs.tu-berlin.de> > Is there any way to determine the encoding of the output from a SAX1 > parser driver? It's clear if the callbacks are being passed Unicode > strings, but with 8-bit strings you have no way of knowing if they're > in Latin1 or UTF-8 or anything (unless you know what parser you're > using). > > Given that SAX2 does seem to support this with > XMLReader.{get,set}Encoding(), is this worth fixing in SAX1? I don't think it is worth fixing anything in SAX1, unless documented functionality is clearly broken. From Python 1.6 on, I'd expect drivers to produce Unicode objects in most cases (although only expat currently does), in which case the encoding of the input would be irrelevant. Please note that {get,set}Encoding() is on the InputSource, not on the XMLReader. I don't know whether the reader is supposed to invoke setEncoding on the source once it sees an encoding= attribute. Regards, Martin From larsga@garshol.priv.no Wed Oct 25 10:39:18 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 25 Oct 2000 11:39:18 +0200 Subject: [XML-SIG] Determining output encoding of a SAX parser In-Reply-To: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> References: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com> Message-ID: * A. M. Kuchling | | Is there any way to determine the encoding of the output from a SAX1 | parser driver? No, there is not. You simply get 8-bit strings with no semantics attached. | Given that SAX2 does seem to support this with | XMLReader.{get,set}Encoding(), There is no XMLReader.{get,set}Encoding() in Python or Java SAX 2.0.
There are methods with these names on InputSource, but that is something completely different. | is this worth fixing in SAX1? No, I don't think it is. SAX 1.0 is obsolete now, and we should all move on to SAX 2.0. In SAX 2.0, the goal is to have all drivers (or at least as close to all as possible) emit Unicode strings. --Lars M. From Nicolas.Chauvat@logilab.fr Wed Oct 25 15:45:00 2000 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Wed, 25 Oct 2000 16:45:00 +0200 (CEST) Subject: [XML-SIG] Character encodings and expat Message-ID: Hi, It looks like expat refuses the alias "latin1" for the encoding "ISO-8859-1" as it returns a fatalError that raises a SaxException when using Sax2.FromXml('àéèù</try>') The XML spec says that parsers *may* recognize aliases defined by IANA, so I wouldn't call it a bug. Did I miss a parameter to set up somewhere to get expat to recognize "latin1" ? -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From martin@loewis.home.cs.tu-berlin.de Thu Oct 26 00:12:28 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 26 Oct 2000 01:12:28 +0200 Subject: [XML-SIG] Character encodings and expat In-Reply-To: (message from Nicolas Chauvat on Wed, 25 Oct 2000 16:45:00 +0200 (CEST)) References: Message-ID: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> > It looks like expat refuses the alias "latin1" for the encoding > "ISO-8859-1" as it returns a fatalError that raises a SaxException when > using > > Sax2.FromXml('àéèù') > > The XML spec says that parsers *may* recognize aliases defined by IANA, so > I wouldn't call it a bug. Did I miss a parameter to set up somewhere to > get expat to recognize "latin1" ? Once xmlproc is capable of producing Unicode, it will certainly understand all encodings that the Python 2.0 encoding machinery knows of; that includes "latin1".
We should also strive for teaching expat to use the Python encoding machinery, but that may be more difficult. Any volunteers? If you *just* want it to recognize "latin1", you should extend xmltok/xmltok.c:getEncodingIndex. Regards, Martin From larsga@garshol.priv.no Fri Oct 27 10:05:46 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 27 Oct 2000 11:05:46 +0200 Subject: [XML-SIG] Character encodings and expat In-Reply-To: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> Message-ID: * Martin v. Loewis | | Once xmlproc is capable of producing Unicode, it will certainly | understand all encodings that the Python 2.0 encoding machinery knows | of; that includes "latin1". Yup. I plan to teach xmlproc the IANA registry, so that this should not be a problem with xmlproc. However, it is a problem that Python does not support any of the Far East encodings yet. Does anyone know if there are any plans to change that? | We should also strive for teaching expat to use the Python encoding | machinery, but that may be more difficult. Any volunteers? I don't think it's really all that difficult. It should be possible to use the Python codec system to produce utf-16, and then you feed this to expat and fix the encoding as "utf-16" in the call to ParserCreate. The only possible stumbling block is when expat discovers an XML declaration that says something other than "utf-16"... --Lars M. From mal@lemburg.com Fri Oct 27 11:07:37 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 27 Oct 2000 12:07:37 +0200 Subject: [XML-SIG] Character encodings and expat References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> Message-ID: <39F953E9.493A0DE1@lemburg.com> Lars Marius Garshol wrote: > > * Martin v. Loewis > | > | Once xmlproc is capable of producing Unicode, it will certainly > | understand all encodings that the Python 2.0 encoding machinery knows > | of; that includes "latin1". > > Yup. 
I plan to teach xmlproc the IANA registry, so that this should > not be a problem with xmlproc. You might want to have a look at the code in encodings/aliases.py. It includes the aliasing "database" which the encodings package uses to map encoding names to codec names. If not all IANA names are included in this list, it would be a good idea to add them... > However, it is a problem that Python does not support any of the Far > East encodings yet. Does anyone know if there are any plans to change > that? Tamito KAJIYAMA has written a few Asian codecs. These are not high-performance, but fairly complete and also a great example of how a codecs package can be written. More about this on the i18n-sig mailing list. > | We should also strive for teaching expat to use the Python encoding > | machinery, but that may be more difficult. Any volunteers? > > I don't think it's really all that difficult. It should be possible > to use the Python codec system to produce utf-16, and then you feed > this to expat and fix the encoding as "utf-16" in the call to > ParserCreate. > > The only possible stumbling block is when expat discovers an XML > declaration that says something other than "utf-16"... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From larsga@garshol.priv.no Fri Oct 27 11:24:09 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 27 Oct 2000 12:24:09 +0200 Subject: [XML-SIG] Character encodings and expat In-Reply-To: <39F953E9.493A0DE1@lemburg.com> References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <39F953E9.493A0DE1@lemburg.com> Message-ID: * Lars Marius Garshol | | Yup. I plan to teach xmlproc the IANA registry, so that this should | not be a problem with xmlproc.
* mal@lemburg.com | | You might want to have a look at the code in encodings/aliases.py | It includes the aliasing "database" which the encodings package uses | to map encoding names to codec names. | | If not all IANA names are included in this list, it would be | a good idea to add them... It would indeed. :) I intended to submit a patch if I found any to be missing. * Lars Marius Garshol | | However, it is a problem that Python does not support any of the Far | East encodings yet. Does anyone know if there are any plans to change | that? * mal@lemburg.com | | Tamito KAJIYAMA has written a few Asian codecs. These are not | high-performance, but fairly complete and also a great example of | how a codecs package can be written. That's only Shift-JIS and EUC-JP, though. Is there any concerted effort afoot to make a more complete set? At the very least, ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented. | More about this on the i18n-sig mailing list. Well, if only a single response is required I would prefer to get that here. --Lars M. From andorxor@gmx.de Fri Oct 27 18:40:52 2000 From: andorxor@gmx.de (Stephan Tolksdorf) Date: Fri, 27 Oct 2000 19:40:52 +0200 Subject: [XML-SIG] Improvement pulldom and minidom Message-ID: <315158556.20001027194052@email.com> Hello, I would like to have the two methods hasAttribute and hasAttributeNS of DOM2's Element in minidom included. They are very easy to implement and rather useful:

----------
class Element:
    (...)
    def hasAttribute(self, name):
        return self._attrs.has_key(name)

    def hasAttributeNS(self, namespaceURI, localName):
        return self._attrsNS.has_key((namespaceURI, localName))
----------

Additionally, I propose to replace line 220 in pulldom.py (def parse):

    if type(stream_or_string) is type(""):

with:

    if type(stream_or_string) in [type(""), type(u"")]:

as is done in saxutils.prepare_input_source, just to make it possible to pass Unicode filenames directly to the function.
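Both methods Stephan proposes belong to the DOM Level 2 Element interface, and the xml.dom.minidom module of present-day Python 3 provides them. A short sketch of their behaviour follows; the document and the urn:example namespace are invented for illustration. hasAttribute() looks up the qualified name, while hasAttributeNS() looks up the (namespace URI, local name) pair:

```python
from xml.dom.minidom import parseString

# minidom.parseString builds the tree with namespace processing enabled,
# so the prefixed attribute gets a namespace URI and a local name.
doc = parseString('<item xmlns:ex="urn:example" id="42" ex:lang="en"/>')
item = doc.documentElement

print(item.hasAttribute("id"))                      # True
print(item.hasAttribute("missing"))                 # False
print(item.hasAttribute("ex:lang"))                 # True (qualified name)
print(item.hasAttributeNS("urn:example", "lang"))   # True
print(item.hasAttributeNS("urn:other", "lang"))     # False
```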
Best regards, Stephan Tolksdorf From fdrake@acm.org Fri Oct 27 18:39:57 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 27 Oct 2000 13:39:57 -0400 (EDT) Subject: [XML-SIG] Improvement pulldom and minidom In-Reply-To: <315158556.20001027194052@email.com> References: <315158556.20001027194052@email.com> Message-ID: <14841.48621.686427.513012@cj42289-a.reston1.va.home.com> Stephan Tolksdorf writes: > I would like to have the two methods hasAttribute and hasAttributeNS > of DOM2's Element in minidom included. These are good suggestions. I'd like Paul Prescod to take a look at this to make a final determination since he wrote that code. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs Team Member From Nicolas.Chauvat@logilab.fr Fri Oct 27 21:36:58 2000 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Fri, 27 Oct 2000 22:36:58 +0200 (CEST) Subject: [XML-SIG] (4DOM) Misleading error message in StyleSheetReader Message-ID: Hi there, As stated in the subject, there is a misleading error message in xml.xslt.StyleSheetReader.py/StyleSheetGenerator.__initialize(): if the xmlns:xsl="URI" of an xsl:transform node is not equal to the XSL_NAMESPACE defined in xml.xslt.__init__.py, it will raise an Error.STYLESHEET_MISSING_VERSION error and complain about a missing version attribute. I'd say it's a wrong diagnostic. I suppose it should check first that the URI referred to by xmlns:xsl is the same as XSL_NAMESPACE, but I'm not sure how things work in there and I'm too tired to submit a patch today. -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From martin@loewis.home.cs.tu-berlin.de Fri Oct 27 22:24:56 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 27 Oct 2000 23:24:56 +0200 Subject: [XML-SIG] Character encodings and expat In-Reply-To: (message from Lars Marius Garshol on 27 Oct 2000 11:05:46 +0200) References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> Message-ID: <200010272124.XAA00854@loewis.home.cs.tu-berlin.de> > Yup. I plan to teach xmlproc the IANA registry, so that this should > not be a problem with xmlproc. With due respect, I hope this is not the way it is done. Instead, I think codecs.lookup should know the IANA registry. It may be that this information comes with PyXML only for now, but it should be available to all Python applications. E.g. xml/__init__.py could do codecs.register(iana_lookup) where iana_lookup simply maps encodings to the "normalized" form. I agree with MAL that this should eventually end up in Python proper. In any case, knowing the official aliases should not be restricted to xmlproc. > However, it is a problem that Python does not support any of the Far > East encodings yet. Does anyone know if there are any plans to change > that? Again, I'd see no problem including Tamito Kajiyama's code in PyXML, if he wants us to ship it - or we could recommend JapaneseCodecs as a valuable addition to PyXML; this package also uses the distutils, so it is quite easy to install. [using Python codecs in expat] > I don't think it's really all that difficult. [...] > The only possible stumbling block is when expat discovers an XML > declaration that says something other than "utf-16"... Wouldn't that be the normal case where encodings other than UTF-8 become interesting? I'd assume that most XML documents which don't use UTF-8 do declare the encoding in the XML declaration, instead of relying on some higher-level protocol to correctly transmit encoding information. So I'd rather see an approach where expat itself finds out eventually what the encoding is, and then goes to the application (i.e. the Python SAX driver) and asks to convert the input.
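A minimal Python 3 sketch of both ideas in this message, under stated assumptions. `iana_lookup` is registered with `codecs.register` as Martin suggests; its alias table is a hypothetical hand-picked excerpt of the IANA registry (a real table would be generated from the registry). `parse_text` is a hypothetical helper that does not implement the callback design preferred above: it sniffs the declared encoding itself, decodes with Python's codecs, and feeds UTF-8 to expat, relying on the documented behaviour that the `encoding` argument of `ParserCreate` overrides the document's declared encoding. It assumes an ASCII-compatible document with no byte-order mark.

```python
import codecs
import re
import xml.parsers.expat

# Hypothetical excerpt of IANA charset names, keyed by a normalized
# spelling and mapped to Python codec names.
_IANA_TO_PYTHON = {
    "extended_unix_code_packed_format_for_japanese": "euc_jp",
    "cseucpkdfmtjapanese": "euc_jp",
    "csshiftjis": "shift_jis",
    "ansi_x3.4_1968": "ascii",
}


def iana_lookup(name):
    """Codec search function: a CodecInfo for known IANA aliases, else None."""
    key = name.lower().replace("-", "_").replace(" ", "_")
    target = _IANA_TO_PYTHON.get(key)
    return codecs.lookup(target) if target else None


codecs.register(iana_lookup)

# encoding="..." inside an XML declaration (ASCII-compatible input only).
_XML_DECL = re.compile(
    rb"<\?xml[^>]*?\sencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def parse_text(data):
    """Parse bytes with expat, decoding via Python codecs; return the text."""
    match = _XML_DECL.match(data)
    declared = match.group(1).decode("ascii") if match else "utf-8"
    text = data.decode(declared)  # any codec Python (or iana_lookup) knows
    chunks = []
    # The encoding argument overrides the declaration, so expat sees UTF-8.
    parser = xml.parsers.expat.ParserCreate(encoding="UTF-8")
    parser.CharacterDataHandler = chunks.append
    parser.Parse(text.encode("utf-8"), True)
    return "".join(chunks)
```

For example, `parse_text(b'<?xml version="1.0" encoding="csEUCPkdFmtJapanese"?><a>\xa4\xa2</a>')` returns the single hiragana character U+3042, with the IANA alias resolved through the registered search function.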
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Oct 27 22:47:42 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 27 Oct 2000 23:47:42 +0200 Subject: [XML-SIG] Character encodings and expat In-Reply-To: (message from Lars Marius Garshol on 27 Oct 2000 12:24:09 +0200) References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <39F953E9.493A0DE1@lemburg.com> Message-ID: <200010272147.XAA00953@loewis.home.cs.tu-berlin.de> > That's only Shift-JIS and EUC-JP, though. Is there any concerted > effort afoot to make a more complete set? At the very least, > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented. I'd hope that somebody exposes the operating system's converters to Python. For example, on Linux and Solaris, the iconv library offers a wide variety of codecs (at least in its gconv form), which are also highly performant. On W2k, a huge set of converters is available, just waiting to be exposed to Python. I'm always concerned by the fact that every package seems to come with its own set of conversion tables, instead of relying on other people to do a good job (and report bugs if they don't). Tcl has such tables, Java does, X11 has some, ICU has more - I really can't see the reason to reimplement them all again in Python. > | More about this on the i18n-sig mailing list. > > Well, if only a single response is required I would prefer to get that > here. This is free software. You never get away with a single response only :-) Regards, Martin From mal@lemburg.com Sat Oct 28 14:54:19 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 28 Oct 2000 15:54:19 +0200 Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <200010272124.XAA00854@loewis.home.cs.tu-berlin.de> Message-ID: <39FADA8B.8D5FE731@lemburg.com> "Martin v. Loewis" wrote: > > > Yup.
I plan to teach xmlproc the IANA registry, so that this should > > not be a problem with xmlproc. > > With due respect, I hope this is not the way it is done. Instead, > I think codecs.lookup should know the IANA registry. It may be that > this information comes with PyXML only for now, but it should be > available to all Python applications. E.g. xml/__init__.py could > do > > codecs.register(iana_lookup) > > where iana_lookup simply maps encodings to the "normalized" form. That would be another option (this codec search function design turns out to be far more useful than originally thought ;-)... > I agree with MAL that this should eventually end up in Python proper. > In any case, knowing the official aliases should not be restricted to > xmlproc. Right. Python's encodings package should know at least about all common aliases used for the provided codecs. Do you have a pointer to a list of IANA aliases ? > > However, it is a problem that Python does not support any of the Far > > East encodings yet. Does anyone know if there are any plans to change > > that? > > Again, I'd see no problem including Tamito Kajiyama's code in PyXML, > if he wants us to ship it - or we could recommend JapaneseCodecs as a > valuable addition to PyXML; this package also uses the distutils, so > it is quite easy to install. I think it should be distributed as a separate package: the codecs are useful in a lot of contexts -- not only XML. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Oct 28 15:09:13 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 28 Oct 2000 16:09:13 +0200 Subject: [XML-SIG] Character encodings and expat References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <39F953E9.493A0DE1@lemburg.com> <200010272147.XAA00953@loewis.home.cs.tu-berlin.de> Message-ID: <39FADE09.D257A7DF@lemburg.com> "Martin v.
Loewis" wrote: > > > That's only Shift-JIS and EUC-JP, though. Is there any concerted > > effort afoot to make a more complete set? At the very least, > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented. > > I'd hope that somebody exposes the operating system's converters to > Python. For example, on Linux and Solaris, the iconv library offers a > wide variety of codecs (at least in its gconv form), which are also > highly performant. On W2k, a huge set of converters is available, > which just waits being exposed to Python. > > I'm always concerned by the fact that every package seems to come with > its own set of conversion tables, instead on relying on other people > to do a good job (and report bugs if they don't). Tcl has such tables, > Java does, X11 has some, ICU has more - I really can't see the reason > to reimplement them all again in Python. Sure would be nice... the only problem I see is that the different codecs for the Asian scripts will most probably behave differently, e.g. there are many issues with private code point areas in Unicode and the various Asian encodings. It would still be nice to have different codec packages around though -- even if they all implement the same converters, e.g. AsianCharmapCodecs, NativeWin32Codecs, NativeCLibCodecs, etc. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From frank63@ms5.hinet.net Sun Oct 29 00:46:50 2000 From: frank63@ms5.hinet.net (Frank J.S. Chen) Date: Sat, 28 Oct 2000 23:46:50 -0000 Subject: [XML-SIG] Character encodings and expat Message-ID: <200010281546.XAA15848@ms5.hinet.net> > > > > > That's only Shift-JIS and EUC-JP, though. Is there any concerted > > > effort afoot to make a more complete set? At the very least, > > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented. > > > > Sure would be nice... 
the only problem I see is that the > different codecs for the Asian scripts will most probably > behave differently, e.g. there are many issues with private > code point areas in Unicode and the various Asian encodings. For now, all CJK Unicode characters reside in Basic Multilingual Plane(Plane 0). It seems no need to consider surrogate area or private use area right now. What we need is indeed a transcoding interface to convert different locales to UTF-8/UTF-16 and then back. From martin@loewis.home.cs.tu-berlin.de Sat Oct 28 21:34:45 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Oct 2000 22:34:45 +0200 Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat In-Reply-To: <39FADA8B.8D5FE731@lemburg.com> (mal@lemburg.com) References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <200010272124.XAA00854@loewis.home.cs.tu-berlin.de> <39FADA8B.8D5FE731@lemburg.com> Message-ID: <200010282034.WAA00771@loewis.home.cs.tu-berlin.de> > Do you have a pointer to a list of IANA aliases ? It's at http://www.isi.edu/in-notes/iana/assignments/character-sets Regards, Martin From andy@reportlab.com Sun Oct 29 07:14:01 2000 From: andy@reportlab.com (Andy Robinson) Date: Sun, 29 Oct 2000 07:14:01 -0000 Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat In-Reply-To: <200010272147.XAA00953@loewis.home.cs.tu-berlin.de> Message-ID: > -----Original Message----- > From: i18n-sig-admin@python.org [mailto:i18n-sig-admin@python.org]On > Behalf Of Martin v. Loewis > Sent: 27 October 2000 22:48 > To: larsga@garshol.priv.no > Cc: i18n-sig@python.org; xml-sig@python.org > Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat > > > > That's only Shift-JIS and EUC-JP, though. Is there any concerted > > effort afoot to make a more complete set? At the very least, > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be > implemented. > That was the intention, but I admit we have run out of steam somewhat. 
Tamito Kajiyama is the only person to have made a really big contribution. I was hoping to, but that hope was on the basis of a large customer project needing this stuff which got cancelled, and running a startup is taking so much time that I won't manage much until ReportLab gets a customer who needs to reencode data. When that happens, we'll have to do it, and fast. As an aside, we're doing the work to allow use of Adobe's Asian Font Packs in reportlab at the moment, and they use the native encodings. So once that comes out, we'll be under a lot of pressure to do it. I am very hopeful of the first half of next year if no one else has done the work already. In the meantime, frankly, not enough people need it badly enough and nobody but Tamito has had a go. Volunteers welcome! >I'm always concerned by the fact that every package seems to come with >its own set of conversion tables, instead on relying on other people >to do a good job (and report bugs if they don't). Tcl has such tables, >Java does, X11 has some, ICU has more - I really can't see the reason >to reimplement them all again in Python. I don't use Tcl, Java or X11 and don't know what ICU is, but I do use Python on several platforms and would want to know that the encodings library worked identically on all platforms - i.e. if there are bugs in the codecs, they are consistent and can be fixed consistently. I think this issue was pretty much settled in MAL's original i18n proposal. However, no sane person retypes mapping tables; if we built something Pythonic we'd hopefully do it by extracting data from two different sources, building our own tables and checking they got identical results. With compression into a Zip file and careful use of diff-like techniques (all the obscure Asian codecs go like 'take this base encoding and add these extra code points'), I believe a good codec database could be quite small. 
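The "base encoding plus extra code points" idea can be illustrated with a single-byte analog: store only the entries where an extended table differs from its base, then rebuild the full table on demand. This is a minimal sketch that assumes Python's own `latin-1` and `cp1252` codecs as stand-ins for a base and an extended table; the function names are hypothetical, and real Asian tables are multi-byte, so neither that case nor compression is covered here.

```python
def decoding_table(encoding):
    """Decode every single byte; None marks bytes the codec leaves undefined."""
    table = []
    for byte in range(256):
        try:
            table.append(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            table.append(None)
    return table


def make_delta(base, extended):
    """Keep only the positions where the extended table differs from the base."""
    return {i: ext for i, (old, ext) in enumerate(zip(base, extended)) if old != ext}


def apply_delta(base, delta):
    """Rebuild the extended table from the base table plus the delta."""
    table = list(base)
    for position, char in delta.items():
        table[position] = char
    return table


base = decoding_table("latin-1")
extended = decoding_table("cp1252")
delta = make_delta(base, extended)

assert apply_delta(base, delta) == extended  # the delta loses nothing
print(len(delta))  # 32: every byte in 0x80-0x9F differs, nothing else does
```

The same comparison doubles as a cross-check: building `extended` from two independent sources and asserting they agree is the consistency test described above.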
- Andy Robinson From martin@loewis.home.cs.tu-berlin.de Sun Oct 29 16:46:06 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 29 Oct 2000 17:46:06 +0100 Subject: [XML-SIG] Improvement pulldom and minidom In-Reply-To: <315158556.20001027194052@email.com> (message from Stephan Tolksdorf on Fri, 27 Oct 2000 19:40:52 +0200) References: <315158556.20001027194052@email.com> Message-ID: <200010291646.RAA00933@loewis.home.cs.tu-berlin.de> > I would like to have the two methods hasAttribute and hasAttributeNS > of DOM2's Element in minidom included. [...] > Additonally I propose to replace These patches look good to me, so I have installed them. Please try to produce context or unified diffs the next time (e.g. like the ones I have attached below), either using diff or cvs diff. Regards, Martin Index: minidom.py =================================================================== RCS file: /cvsroot/pyxml/xml/xml/dom/minidom.py,v retrieving revision 1.6 diff -u -r1.6 minidom.py --- minidom.py 2000/10/20 17:19:59 1.6 +++ minidom.py 2000/10/29 16:33:45 @@ -346,6 +346,12 @@ node.unlink() del self._attrs[node.name] del self._attrsNS[(node.namespaceURI, node.localName)] + + def hasAttribute(self, name): + return self._attrs.has_key(name) + + def hasAttributeNS(self, namespaceURI, localName): + return self._attrsNS.has_key((namespaceURI, localName)) def getElementsByTagName(self, name): return _getElementsByTagNameHelper(self, name, []) Index: pulldom.py =================================================================== RCS file: /cvsroot/pyxml/xml/xml/dom/pulldom.py,v retrieving revision 1.5 diff -u -r1.5 pulldom.py --- pulldom.py 2000/10/20 16:59:42 1.5 +++ pulldom.py 2000/10/29 16:42:20 @@ -1,6 +1,12 @@ import minidom import xml.sax,xml.sax.handler +import types +try: + _StringTypes = [types.StringType, types.UnicodeType] +except AttributeError: + _StringTypes = [types.StringType] + START_ELEMENT = "START_ELEMENT" END_ELEMENT = "END_ELEMENT" COMMENT = 
"COMMENT" @@ -217,7 +223,7 @@ default_bufsize = (2 ** 14) - 20 def parse(stream_or_string, parser=None, bufsize=default_bufsize): - if type(stream_or_string) is type(""): + if type(stream_or_string) in _StringTypes: stream = open(stream_or_string) else: stream = stream_or_string From uche.ogbuji@fourthought.com Sun Oct 29 19:16:53 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 29 Oct 2000 12:16:53 -0700 Subject: [XML-SIG] Error in pyexpat docs Message-ID: <200010291916.MAA16420@localhost.localdomain> An excerpt from the Python 2.0 docs for pyexpat: """ ParseFile (file) Parse XML data reading from the object file. file only needs to provide the read(nbytes) method, returning the empty string when there's no more data. [snip] The following attributes contain values relating to the most recent error encountered by an xmlparser object, and will only have correct values once a call to Parse() or ParseFile() has raised a xml.parsers.expat.error exception. ErrorByteIndex Byte index at which an error occurred. [etc.] """ The wrong, "xml.parsers.expat" is the first indicator that there might be a problem, yet I took the docs at their word and wrapped the call to ParseFile in a blanket try/except, only to find that no exception of any sort is ever raised by ParseFile. It turns out that ParseFile actually returns 0 on error, returning 1 otherwise. The first matter is that the code and the docs need to be reconciled. However, I would _much_ rather prefer that things were as in the docs. I think ParseFile should raise an exception rather than return an error flag. Interestingly enough, this is the same argument I had with a colleague just last week. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mal@lemburg.com Mon Oct 30 08:52:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 30 Oct 2000 09:52:58 +0100 Subject: [XML-SIG] Character encodings and expat References: <200010281546.XAA15848@ms5.hinet.net> Message-ID: <39FD36EA.E307382B@lemburg.com> "Frank J.S. Chen" wrote: > > > > > > > > That's only Shift-JIS and EUC-JP, though. Is there any concerted > > > > effort afoot to make a more complete set? At the very least, > > > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented. > > > > > > > Sure would be nice... the only problem I see is that the > > different codecs for the Asian scripts will most probably > > behave differently, e.g. there are many issues with private > > code point areas in Unicode and the various Asian encodings. > > For now, all CJK Unicode characters reside in Basic Multilingual > Plane(Plane 0). > It seems no need to consider surrogate area or private use area right now. But there is a private use area in the BMP as well... and if you plan to write round-trip safe codecs for corporate character sets, then you'll have to use these to make the transfer safe. > What we need is indeed a transcoding interface to convert different locales > to UTF-8/UTF-16 and then back. I'm not sure I understand you here: there are quite a few codecs available in the std Python lib which are readily usable and the locale.py module has a database of many default encodings for the various locales. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Mon Oct 30 22:38:47 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 30 Oct 2000 23:38:47 +0100 Subject: [XML-SIG] Character encodings and expat In-Reply-To: <39FD36EA.E307382B@lemburg.com> (mal@lemburg.com) References: <200010281546.XAA15848@ms5.hinet.net> <39FD36EA.E307382B@lemburg.com> Message-ID: <200010302238.XAA00854@loewis.home.cs.tu-berlin.de> > But there is a private use area in the BMP as well... and if you > plan to write round-trip safe codecs for corporate character sets, > then you'll have to use these to make the transfer safe. Well, you can't make round-trip encoding safe for them - that is the very nature of the private use area. If you convert set A to Unicode, using the private map, then convert to set B, and back from there, you likely lose. If there are "official" mappings between some corporation's character set and Unicode, then I'd expect all converters that support the corporate character set also to treat the private use area in the same way. If there are no official mappings published by the corporation, then you are better off using the platform converters on the corporation's operating system. Those will definitely get the private use area right; the ones provided by Python in a cross-platform cross-vendor way might not. Regards, Martin From mal@lemburg.com Mon Oct 30 22:57:10 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 30 Oct 2000 23:57:10 +0100 Subject: [XML-SIG] Character encodings and expat References: <200010281546.XAA15848@ms5.hinet.net> <39FD36EA.E307382B@lemburg.com> <200010302238.XAA00854@loewis.home.cs.tu-berlin.de> Message-ID: <39FDFCC6.D9599CE2@lemburg.com> "Martin v. Loewis" wrote: > > > But there is a private use area in the BMP as well... and if you > > plan to write round-trip safe codecs for corporate character sets, > > then you'll have to use these to make the transfer safe. > > Well, you can't make round-trip encoding safe for them - that is the > very nature of the private use area.
If convert set A to Unicode, > using the private map, then convert to set B, and back from there, you > likely lose. True. With "round trip" I meant encoding A -> Unicode -> encoding A. This is often needed in order to do processing on the data and should be a 1-1 mapping if possible. > If there are "official" mappings between some corporate's character > set and Unicode, then I'd expect all converters that support the > corporate character set also to treat the private use area in the same > way. > > If there are no official mappings published by the corporation, then > you are better of using the platform converters on the corporation's > operating system. Those will definitely get the private use area > right; the ones provided by Python in a cross-platform cross-vendor > way might not. Right. Perhaps the codecs should warn about these conversions by applying error handling to them (raise exceptions, ignore, replace, etc.) ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From larsga@garshol.priv.no Tue Oct 31 10:56:25 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 31 Oct 2000 11:56:25 +0100 Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat In-Reply-To: <39FADA8B.8D5FE731@lemburg.com> References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <200010272124.XAA00854@loewis.home.cs.tu-berlin.de> <39FADA8B.8D5FE731@lemburg.com> Message-ID: * Martin von Loewis | | Again, I'd see no problem including Tamito Kajiyama's code in PyXML, | if he wants us to ship it - or we could recommend JapaneseCodecs as an | valuable addition to PyXML; this package also uses the distutils, so | it is quite easy to install. * mal@lemburg.com | | I think it should distributed as separate package: the codecs | are useful in a lot of contexts -- not only XML. Agreed. 
Anyone who wants the codecs at all will want them regardless of whether they want the XML package or not. --Lars M. From larsga@garshol.priv.no Tue Oct 31 11:01:24 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 31 Oct 2000 12:01:24 +0100 Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat In-Reply-To: References: Message-ID: * Lars Marius Garshol | | That's only Shift-JIS and EUC-JP, though. Is there any concerted | effort afoot to make a more complete set? At the very least, ISO | 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented. * Andy Robinson | | That was the intention, but I admit we have run out of steam | somewhat. Tamito Kajiyama is the only person to have made a really | big contribution. [...] Volunteers welcome! Then I may have a go at it if I can find the time. I've written codecs for all these in C++ over the past few weeks, so it should be a simple job to redo it for Python. (It was for a closed-source project, so it can unfortunately not be reused directly.) | However, no sane person retypes mapping tables; if we built | something Pythonic we'd hopefully do it by extracting data from two | different sources, building our own tables and checking they got | identical results. www.unicode.org provides mapping tables that are really easy to parse with a Python script in order to build tables. | With compression into a Zip file and careful use of diff-like | techniques (all the obscure Asian codecs go like 'take this base | encoding and add these extra code points'), I believe a good codec | database could be quite small. My binary collection of conversion tables for ISO 8859 1->15, Windows-12xx, koi8-r, VISCII, Shift-JIS, EUC-JP, ISO 2022-JP, Big5, EUC-KR and GB-2312 is about 90k. --Lars M. 
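As a minimal sketch of how little code such a table parser needs: the mapping files on www.unicode.org use lines such as `0x80<TAB>0x20AC<TAB>#EURO SIGN`, where `#` starts a comment and undefined bytes have an empty second column. The sample below is a hand-made excerpt in that style, and `parse_mapping` is a hypothetical name. The resulting dict is passed to `codecs.charmap_decode`, the helper that Python's generated charmap codecs use. This only handles single-byte tables; multi-byte files such as Shift-JIS produce multi-byte keys and would need a different decoder.

```python
import codecs

# Hand-made excerpt in the style of the unicode.org CP1252 mapping file.
SAMPLE = """\
#   Name:     cp1252 to Unicode table (excerpt for illustration)
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x80\t0x20AC\t#EURO SIGN
0x81\t      \t#UNDEFINED
0x9F\t0x0178\t#LATIN CAPITAL LETTER Y WITH DIAERESIS
"""


def parse_mapping(text):
    """Return {byte value: Unicode code point} from a mapping table."""
    decoding_map = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the comment, split columns
        if len(fields) < 2:
            continue  # blank line, comment-only line, or undefined byte
        decoding_map[int(fields[0], 16)] = int(fields[1], 16)
    return decoding_map


decoding_map = parse_mapping(SAMPLE)
decoded, consumed = codecs.charmap_decode(b"A\x80\x9f", "strict", decoding_map)
print(decoded, consumed)  # AۥŸ 3
```

Bytes missing from the map (here 0x81) raise `UnicodeDecodeError` under the `"strict"` error handler.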
From richard@iopen.co.nz Tue Oct 31 12:07:34 2000 From: richard@iopen.co.nz (richard@iopen.co.nz) Date: Wed, 1 Nov 2000 01:07:34 +1300 (NZDT) Subject: [XML-SIG] Multiple top nodes Message-ID: Greetings - Recently I installed PyXML 0.6.1, having previously been using 0.5.2. I had an XML parser written with 0.5.2 which took an XML document with multiple top nodes and created DOM from it (ie. it created a DocumentFragment that wasn't well formed XML). eg. a file of xml snippets - --- --- to do this in 0.5.2 I had a pretty simple fragment of code such as - --- parser=saxexts.make_parser() dh=SaxBuilder() dh.buildFragment() parser.setDocumentHandler(snippetFile) parser.parse(snippetfile) --- which worked great. I'm having exactly 0 luck getting the same sort of thing to work with 0.6.1. The key in this case was using a DocumentHandler that stuffed the DOM into a DocumentFragment, not a Document (which must be well formed). Anyway, to get to the point, could someone please give me a pointer to do the same with 0.6.1? I'm sure that it is as simple (or at least close) in 0.6.1, but I'm probably missing the obvious. Regards, Richard Waid Network/Software Engineer iOpen Technologies Ltd. From larsga@garshol.priv.no Tue Oct 31 12:34:14 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 31 Oct 2000 13:34:14 +0100 Subject: [XML-SIG] Multiple top nodes In-Reply-To: References: Message-ID: * richard@iopen.co.nz | | [...] which worked great. I'm having exactly 0 luck getting the same | sort of thing to work with 0.6.1. The key in this case was using a | DocumentHandler that stuffed the DOM into a DocumentFragment, not a | Document (which must be well formed). Could you provide us with the exact error message you get? I have a suspicion that your problem might be related to your trying to parse a non-wellformed XML document. :-) --Lars M. From martin@loewis.home.cs.tu-berlin.de Tue Oct 31 20:09:00 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 31 Oct 2000 21:09:00 +0100 Subject: [XML-SIG] Moving towards 0.6.2 Message-ID: <200010312009.VAA01413@loewis.home.cs.tu-berlin.de> I'm going to release PyXML 0.6.2 later this week or next week. If you have any patches that you'd like in that release, please let me know, or commit them yourself. Regards, Martin From akuchlin@mems-exchange.org Tue Oct 31 20:20:07 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 31 Oct 2000 15:20:07 -0500 Subject: [XML-SIG] Moving towards 0.6.2 In-Reply-To: <200010312009.VAA01413@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Oct 31, 2000 at 09:09:00PM +0100 References: <200010312009.VAA01413@loewis.home.cs.tu-berlin.de> Message-ID: <20001031152007.A12433@kronos.cnri.reston.va.us> On Tue, Oct 31, 2000 at 09:09:00PM +0100, Martin v. Loewis wrote: >I'm going to release PyXML 0.6.2 later this week or next week. If you >have any patches that you'd like in that release, please let me know, >or commit them yourself. Resyncing with the current version (or the CVS tree?) of 4DOM would be a good idea, since I think some bugs have been fixed that are still present. (For example, the incorrect XML declaration in xml/dom/ext/Printer.py, reported by Jennifer Wells a while back.) Uche, Mike: any suggestions about this? --amk From richard@iopen.co.nz Tue Oct 31 20:39:14 2000 From: richard@iopen.co.nz (richard@iopen.co.nz) Date: Wed, 1 Nov 2000 09:39:14 +1300 (NZDT) Subject: [XML-SIG] Multiple top nodes In-Reply-To: Message-ID: On 31 Oct 2000, Lars Marius Garshol wrote: > * richard@iopen.co.nz > | > | [...] which worked great. I'm having exactly 0 luck getting the same > | sort of thing to work with 0.6.1. The key in this case was using a > | DocumentHandler that stuffed the DOM into a DocumentFragment, not a > | Document (which must be well formed). > > Could you provide us with the exact error message you get? 
> > I have a suspicion that your problem might be related to your trying > to parse a non-wellformed XML document. :-) Thanks for the reply. A quick example - With a file called some.xml with contents - --- --- and using FromXmlFile to parse some.xml into a DOM - --- Python 2.0 (#1, Oct 16 2000, 18:10:03) [GCC 2.95.2 19991024 (release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from xml.dom.ext.reader.Sax2 import FromXmlFile >>> xml_dom=FromXmlFile('out.xml') Traceback (most recent call last): File "", line 1, in ? File "/usr/local/python-2.0/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 276, in FromXmlFile rv = FromXmlStream(fp, ownerDocument, validate, keepAllWs, catName, saxHandlerClass) File "/usr/local/python-2.0/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 256, in FromXmlStream parser.parseFile(stream) File "/usr/local/python-2.0/lib/python2.0/site-packages/_xmlplus/sax/drivers/drv_pyexpat.py", line 68, in parseFile if self.parser.Parse(buf, 0) != 1: xml.parsers.expat.error: junk after document element: line 6, column 0 >>> --- Which I would have expected, if it was trying to parse into a Document. My reading of - http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one-core.html#ID-B63ED1A3 suggests that if I was to parse into a DocumentFragment (which is what I did with SaxBuilder in my original example), the document should be able to have more than one top-level node. A little more detail might help -- I'm looking to have a number of different programs writing 'snippets' of XML into different files, which then get read by a different program, parsed into DocumentFragment, and then written back out to a file which will be Well Formed. 
I could just read the files in blindly, manually put root tags around the lot, and write them out again, but I may want to manipulate the DOM before I write it out, and I'd like to check that the fragments are individually well formed (well, as well formed as a document could be with multiple top nodes). Many thanks, Richard Waid Network/Software Engineer iOpen Technologies Ltd. From martin@loewis.home.cs.tu-berlin.de Tue Oct 31 21:17:10 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 31 Oct 2000 22:17:10 +0100 Subject: [XML-SIG] Multiple top nodes In-Reply-To: (richard@iopen.co.nz) References: Message-ID: <200010312117.WAA01998@loewis.home.cs.tu-berlin.de> > Which I would have expected, if it was trying to parse into a Document. Well, you are trying to parse a document. At least this is what FromXml* assumes. It *will* assume that you want to create a DocumentFragment if you pass it the ownerDocument parameter, e.g. from xml.dom import implementation from xml.dom.ext.reader import Sax2 Sax=Sax2 frag=""" """ doc = implementation.createDocument(None,None,None) print Sax.FromXml(frag,ownerDocument=doc) However, it still will use a SAX parser to parse the fragment, and SAX does not support parsing fragments. Specifically, the expat parser will complain about the ill-formedness of the document. If you absolutely need to make this work, then you can use the sgmlop driver, which performs less error checking. Unfortunately, FromXml* does not support an application-provided driver, so your only solution at the moment is to set the environment variable PY_SAX_PARSER to xml.sax.drivers.drv_sgmlop. With that setting, my program above creates a DocumentFragment. Please note that this is abusing defects in sgmlop, which should detect errors in the document more reliably. I'm surprised this worked with PyXML 0.5. 
Regards, Martin From richard@iopen.co.nz Tue Oct 31 21:43:52 2000 From: richard@iopen.co.nz (richard@iopen.co.nz) Date: Wed, 1 Nov 2000 10:43:52 +1300 (NZDT) Subject: [XML-SIG] Multiple top nodes In-Reply-To: <200010312117.WAA01998@loewis.home.cs.tu-berlin.de> Message-ID: On Tue, 31 Oct 2000, Martin v. Loewis wrote: > However, it still will use a SAX parser to parse the fragment, and SAX > does not support parsing fragments. Specifically, the expat parser > will complain about the ill-formedness of the document. > > If you absolutely need to make this work, then you can use the sgmlop > driver, which performs less error checking. Unfortunately, FromXml* > does not support an application-provided driver, so your only solution > at the moment is to set the environment variable PY_SAX_PARSER to > xml.sax.drivers.drv_sgmlop. With that setting, my program above > creates a DocumentFragment. > > Please note that this is abusing defects in sgmlop, which should > detect errors in the document more reliably. I'm surprised this worked > with PyXML 0.5. Thanks for the swift reply. Now that I follow the reasoning, I'm surprised it worked in 0.5.5 too. I'll rework the problem, perhaps using a modified file object to surround the fragment with an arbitrary root tag, then extract the fragment from the DOM. Not as tidy as I'd hoped, but I take heed from your warning that doing it the other way would be abusing sgmlop. I'd hate for someone to go and 'fix' it on me :) Many thanks, Richard Waid Network/Software Engineer iOpen Technologies Ltd.
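The workaround Richard settles on can be sketched with today's standard library instead of PyXML 0.6.1: wrap the snippets in a throwaway root element so the input is well-formed, parse it, then move the root's children into a `DocumentFragment`. This is a minimal sketch assuming Python 3's `xml.dom.minidom`; `parse_snippets` and the `snippet-root` wrapper name are hypothetical, and the snippets must not carry their own XML declarations or DOCTYPEs, which would have to be stripped before wrapping.

```python
from xml.dom import minidom


def parse_snippets(snippets, wrapper="snippet-root"):
    """Parse XML with several top-level elements into a DocumentFragment."""
    # Surround the snippets with a throwaway root so the input is well-formed.
    doc = minidom.parseString("<%s>%s</%s>" % (wrapper, snippets, wrapper))
    root = doc.documentElement
    fragment = doc.createDocumentFragment()
    # appendChild() detaches each node from the wrapper before appending it.
    while root.firstChild is not None:
        fragment.appendChild(root.firstChild)
    return doc, fragment


doc, fragment = parse_snippets('<user name="a"/>\n<user name="b"/>')
names = [n.getAttribute("name") for n in fragment.childNodes
         if n.nodeType == n.ELEMENT_NODE]
print(names)  # ['a', 'b']
```

Because the parse is still done by expat, each snippet is checked for well-formedness, which is what Richard wanted, without depending on lax behaviour in sgmlop.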