From akuchlin@cnri.reston.va.us Mon Jun 1 21:51:58 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Mon, 1 Jun 1998 16:51:58 -0400 (EDT) Subject: [XML-SIG] Re: Mr. Nitpicker looks at saxlib In-Reply-To: References: <01bd8a83$bce958c0$f29b12c2@panik.pythonware.com> Message-ID: <13683.5026.113395.967455@newcnri.cnri.reston.va.us> Lars Marius Garshol writes: >I'm thinking of extending the Parser interface with some more methods >that are not part of SAX 1.0 anyway, so perhaps we can do this in a >more controlled fashion. My plan was to keep saxlib pure, but to add a >number of optional methods in a subclass of saxlib.Parser in saxexts >and implement these in all parser drivers. IMHO we really want to stay compatible with Java SAX, both for ease of use with JPython, and to avoid confusion; renaming methods and changing their signatures should be right out. On the other hand, Python-specific convenience functions and helpful subclasses are good. Your suggested extensions look fine. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Dad died in the last big earthquake to hit Pacific. It was like, after that, Mom was scared of anything chaotic in our lives. Like everything was on the edge of collapsing, getting sucked into an earthquake of chaos. -- Michael talks about his parents, in ENIGMA #3: "The Good Boy" From akuchlin@cnri.reston.va.us Tue Jun 2 15:29:07 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Tue, 2 Jun 1998 10:29:07 -0400 (EDT) Subject: [XML-SIG] saxlib finalization In-Reply-To: References: Message-ID: <13684.2557.922567.301586@newcnri.cnri.reston.va.us> Lars Marius Garshol writes: >If there are no protests, that will be the final core SAX 1.0 >interface in Python. I see no problems with the interface as described. 
>As for the extensions, these are the new interfaces I think we should
>have:
>
> - ExtendedParser
>
> - DispatchDocHandler
>
>A final question: should these two last extensions be part of saxlib,
>or should we keep them separate?

I'd vote for "separate"; put them in saxexts.py or somewhere else, but not in saxlib. No further extensions are immediately apparent to me (assuming the simple things like a make_parser helper function have been added), but that's not very surprising; probably the only way to invent new extensions will be from actual use of saxlib.

A side issue on packages: what's the style of import usage that we want to encourage? That is, do we want to encourage:

    import xml.sax.saxlib
    class MyHandler(xml.sax.saxlib.HandlerBase):
        ...

or:

    from xml.sax import saxlib
    class MyHandler(saxlib.HandlerBase):

or even:

    from xml.sax import *   # Or 'from xml.sax.saxlib ...'
    class MyHandler(HandlerBase):
        ...

My preference is for the first, longest, form, but I'm weird. Most people will probably use the shorter forms #2 or #3.

>I say we make the final decision on Monday 8th of June (the deadline
>is so late because I'll be offline for the 4 days prior to that
>date), and then I should be able to have a fully documented package
>out pretty soon. Hopefully we'll start seeing interesting stuff built
>on top of SAX after that.

OK. I'll try to cut a new test release of the omnibus package tonight, with a working "make install", so we can see if it compiles on lots of platforms.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
Before a war military science seems a real science, like astronomy; but after a war it seems more like astrology.
-- Rebecca West

From hanwen@cs.uu.nl Tue Jun 9 09:23:24 1998
From: hanwen@cs.uu.nl (Han-Wen Nienhuys)
Date: Tue, 9 Jun 1998 10:23:24 +0200 (MDT)
Subject: [XML-SIG] Bug in XML stuff?
Message-ID: <13692.61692.935545.826897@vbassa.cs.uu.nl>

I sent this to Guido, and he told me to take it to you:

Hello all,

I think that I have found some kind of bug in Python:

    vbassa:~/aiostudie/bug> uname -a
    IRIX vbassa 5.3 02091401 IP22 mips
    vbassa:~/aiostudie/bug> gcc -v
    Reading specs from /packages/gcc-2.7.2/lib/gcc-lib/mips-sgi-irix5.3/2.7.2/specs
    gcc version 2.7.2
    vbassa:~/aiostudie/bug> python t3.py -r test.xml
    Segmentation fault (core dumped)

This is with Python 1.5.1 (compiled with termios) running on an Indy R5 (irix 5.3). I used gcc 2.7.2; for running the stuff in the attached tarball you'll need saxlib-1.0beta3 installed in /tmp/source/sax

If you need additional info, I could try to recompile and do a stacktrace.

The tarball is at http://www.cs.uu.nl/people/hanwen/bug.tar.gz

--
Han-Wen Nienhuys, hanwen@cs.uu.nl ** GNU LilyPond - The Music Typesetter
http://www.cs.uu.nl/people/hanwen/lilypond/index.html

From Jack.Jansen@cwi.nl Tue Jun 9 12:04:52 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Tue, 09 Jun 1998 13:04:52 +0200
Subject: [XML-SIG] Bug in XML stuff?
In-Reply-To: Message by Han-Wen Nienhuys , Tue, 9 Jun 1998 10:23:24 +0200 (MDT) , <13692.61692.935545.826897@vbassa.cs.uu.nl>
Message-ID:

> vbassa:~/aiostudie/bug> python t3.py -r test.xml
> Segmentation fault (core dumped)
>
> This is with Python 1.5.1 (compiled with termios) running on an Indy R5
> (irix 5.3). I used gcc 2.7.2; for running the stuff in the attached
> tarball you'll need saxlib-1.0beta3 installed in /tmp/source/sax

[I'm cc-ing Guido on this too; this appears to be more a string-format problem than an xml/sax problem. And for the benefit of xml-sig readers I'll keep it in English :-)]

The problem appears not to be compiler-dependent: it also crashes on my SGI O2 running 6.2, Python 1.5.1 compiled with the SGI compiler.
Dbx told me the following about the problem (only the last few stack frames shown):

  0 memmove(0x100d274c, 0xffffffff, 0xffffffff, 0x100d278c) ["bcopy.s":844, 0xfa6c0e8]
  1 PyString_Format(0xb, 0x1011d860, 0x0, 0x0) ["../../Objects/stringobject.c":999, 0x43c7dc]
  2 PyNumber_Remainder(0x100e9250, 0x1011d860, 0xffffffff, 0x100d278c) ["../../Objects/abstract.c":413, 0x461854]
  3 eval_code2(0x100e9290, 0x100d3190, 0x100e9250, 0x100d3190) ["../../Python/ceval.c"

So, apparently you're doing a % operation here. Trace tells me the following (only the last couple of lines shown):

  > printer.py:28 do_creation (135)
  > printer.py:23 root (29)
  > printer.py:17 path_to_root (24)
  > printer.py:17 path_to_root (20)
  < printer.py:21 path_to_root [0.0002]
  < printer.py:21 path_to_root [0.0013]
  < printer.py:24 root [0.0023]
  > printer.py:218 global_typeset (29)
  > grobs.py:106 add (219)
  > grobs.py:62 add (107)
  < grobs.py:66 add [0.0003]
  > grobs.py:32 __str__ (111)
  > grobs.py:26 __str__ (33)
  > interval.py:31 __str__ (27)
  < interval.py:32 __str__ [0.0003]
  < grobs.py:29 __str__ [0.0015]
  Bus error - core dumped

So, apparently we're printing a Graphical_object, and self.coordinate_info[0] has been formatted successfully. Before we get a chance to format self.coordinate_info[1] we've crashed. Combined with the dbx stacktrace we appear to be in the % operation in Graphical_object.__str__().

The only slightly suspect things I can see are (1) you're using recursive string-% (PyString_Format calls, internally) and (2) you're using floating point (half-serious: floating point is often less tested than plain integers).

Guido: is PyString_Format recursion-safe, or could there be situations where it isn't?
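
For readers following along, the call pattern described above (a __str__ that builds its result with %, whose arguments are objects with their own %-based __str__) can be sketched roughly as follows. This is only an illustration in present-day Python: the class names Interval and GraphicalObject and their fields are stand-ins chosen for the sketch, not the code from the tarball, and it shows the nesting rather than reproducing the crash.

```python
class Interval:
    """Hypothetical stand-in for an interval with floating-point bounds."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __str__(self):
        # Inner % operation, using floating-point conversions.
        return "[%.1f, %.1f]" % (self.low, self.high)


class GraphicalObject:
    """Hypothetical stand-in for an object that holds two intervals."""

    def __init__(self, x_extent, y_extent):
        self.coordinate_info = (x_extent, y_extent)

    def __str__(self):
        # Outer % operation: each %s calls str() on an Interval, so string
        # formatting is re-entered while the outer format is still running.
        return "Graphical_object(%s, %s)" % self.coordinate_info


if __name__ == "__main__":
    print(GraphicalObject(Interval(0.0, 1.5), Interval(-2.0, 2.0)))
```

In a current interpreter this nesting is unremarkable: the outer format simply receives the strings produced by the inner ones.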
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From akuchlin@cnri.reston.va.us Tue Jun 9 15:36:27 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 9 Jun 1998 10:36:27 -0400 (EDT)
Subject: [XML-SIG] New release of xml package
Message-ID: <13693.17897.363510.896792@newcnri.cnri.reston.va.us>

I've put up a new version (0.2) of xml-package.tgz; it's at http://www.python.org/sigs/xml-sig/files/xml-package.tgz

Major change: "make install" now actually works, at least for me on Linux. It may well break on your machine; please let me know how it goes for you. It creates an 'xml' subdirectory under site-packages and puts the .py files there; the *.so or *.sl files go into site-packages, but not into the xml package.

Other notes:

* What's the license on everything going to be? Stefane's DOM code is under LGPL; pyexpat is under Python's license; sgmlop is even less restrictive. Lars, what license are you using for your code? I'd argue for Python-style, but LGPL would also be tolerable, I think.

* Implementing "make install" was ugly; not difficult, but ugly, since the top-level Makefile cd's into the pyexpat and sgmlop subdirectories and re-runs make there. I did it this way to preserve the pyexpat and sgmlop subdirectories, making it easier to drop in new versions of them. But the added complexity may be a pain, and it might be simpler to pull the C extensions up to the top-level directory, requiring only a single Makefile.

* We're going to have to do a test suite at some point.

* Still to do: more documentation and sample code.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
It was easier to keep myself from becoming a success as an actor. Critics were careful not to outrage my modesty by their praise, and the public scrupulously refused to debauch me with applause.
I have thought about it a good deal, and my conclusion is that I was ahead of my time. Or behind it. Or something. -- Robertson Davies, "Shakespeare over the Port" From fredrik@pythonware.com Tue Jun 9 16:54:14 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 9 Jun 1998 16:54:14 +0100 Subject: [XML-SIG] New release of xml package Message-ID: <01bd93be$d9c04e10$f29b12c2@panik.pythonware.com> > * What's the license on everything going to be? Stefane's DOM >code is under LGPL; pyexpat is under Python's license; sgmlop is even >less restrictive. Lars, what license are you using for your code? >I'd argue for Python-style, but LGPL would also be tolerable, I think. IMO, all important Python extensions should use Python-style licenses. People had enough trouble using PIL 0.1 which used a slightly modified version of Python's license (it essentially said that you couldn't use PIL unless you also used Python, something that led to mails from company lawyers, etc.). And some people think that anything with a GPL in it is viral... Cheers /F fredrik@pythonware.com http://www.pythonware.com From guido@CNRI.Reston.Va.US Tue Jun 9 16:03:37 1998 From: guido@CNRI.Reston.Va.US (Guido van Rossum) Date: Tue, 09 Jun 1998 11:03:37 -0400 Subject: [XML-SIG] Bug in XML stuff? In-Reply-To: Your message of "Tue, 09 Jun 1998 13:04:52 +0200." References: Message-ID: <199806091503.LAA20828@eric.CNRI.Reston.Va.US> After Jack isolated the bug approximately, I've found the problem. There are two problems, really: (1) Han-Wen's code has a bug in his Dimension.__str__() method: it never returns a value; (2) Python's string formatting doesn't detect this, and ends up calling memcpy(res, buf, len) with len being -1 and buf being NULL. My apologies to Han-Wen. Here's a patch. 

Index: stringobject.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.47
diff -c -r2.47 stringobject.c
*** stringobject.c 1998/04/10 22:16:39 2.47
--- stringobject.c 1998/06/09 15:01:53
***************
*** 900,905 ****
--- 900,910 ----
        temp = PyObject_Str(v);
        if (temp == NULL)
            goto error;
+       if (!PyString_Check(temp)) {
+           PyErr_SetString(PyExc_TypeError,
+             "%s argument has non-string str()");
+           goto error;
+       }
        buf = PyString_AsString(temp);
        len = PyString_Size(temp);
        if (prec >= 0 && len > prec)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fleck@informatik.uni-bonn.de Tue Jun 9 21:14:05 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Tue, 09 Jun 1998 22:14:05 +0200
Subject: [XML-SIG] New release of xml package
References: <01bd93be$d9c04e10$f29b12c2@panik.pythonware.com>
Message-ID: <357D978D.A33@informatik.uni-bonn.de>

Fredrik Lundh wrote:
> IMO, all important Python extensions should use Python-style licenses.

This is what I thought too, and why I was shocked recently to learn that the new version of Fnorb is under a noncommercial-use-only license. Or did the DO-SIG also help create another "reference" implementation than Fnorb and ILU, under a more Python-style license? (Please reply to me privately, because this is obviously off-topic for the XML-SIG list.)

> And some people think that anything with a GPL in it is viral...

Is the DOM library separated well enough from the rest then? :-)

Yours, Markus.
-- //////////////////////////////////////////////////////////////////////////// Markus B Fleck - University of Bonn - CS Department IV - fleck@isoc.de UNIX Administrator - comp.lang.python.announce Moderator PINN Open Source Internet Groupware Project - http://cscw.net/pinn/ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ From larsga@ifi.uio.no Thu Jun 11 22:05:59 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 11 Jun 1998 23:05:59 +0200 Subject: [XML-SIG] New release of xml package In-Reply-To: <13693.17897.363510.896792@newcnri.cnri.reston.va.us> References: <13693.17897.363510.896792@newcnri.cnri.reston.va.us> Message-ID: * Andrew Kuchling | | Lars, what license are you using for your code? I'd argue for | Python-style, but LGPL would also be tolerable, I think. It will be Python-style, once I get round to actually writing a license (or adapting one, more likely). -- "These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there." -- Edwin Brock http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/ From larsga@ifi.uio.no Thu Jun 11 22:07:54 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 11 Jun 1998 23:07:54 +0200 Subject: [XML-SIG] saxlib finalization In-Reply-To: <13684.2557.922567.301586@newcnri.cnri.reston.va.us> References: <13684.2557.922567.301586@newcnri.cnri.reston.va.us> Message-ID: * Andrew Kuchling | | A side issue on packages: what's the style of import usage | that we want to encourage? That is, do we want to encourage: | | [...] | | from xml.sax import saxlib | class MyHandler(saxlib.HandlerBase): This is what I use, and IMHO it's the most readable style. -- "These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there." 
-- Edwin Brock http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/ From rob@io.com Sun Jun 14 18:00:50 1998 From: rob@io.com (Rob Tillotson) Date: 14 Jun 1998 12:00:50 -0500 Subject: [XML-SIG] Debian Linux version of xml-package (and a bug) Message-ID: <87d8cb2a0t.fsf@io.com> Greetings! In case anyone is interested, I have just built the xml-package (version 0.2) for Debian GNU/Linux. A preliminary i386 package is available at the following URL: http://www.concentric.net/~n9mtb/python-xml_0.2-0.1_i386.deb It appears to work just fine on my machine, although I did discover a bug in the compilation process -- the stuff in xml/pyexpat/expat is not built using the proper compiler flags for shared libraries, because those flags aren't passed along to its local Makefile. Comments and bug reports (about the packaging... I have nothing to do with the contents :) may be sent directly to me; don't use the Debian bug reporting system as I haven't submitted python-xml to the distribution yet. Enjoy, --Rob -- Rob Tillotson N9MTB Internet: rob@io.com From fredrik@pythonware.com Tue Jun 16 12:04:38 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 16 Jun 1998 12:04:38 +0100 Subject: [XML-SIG] xml-sig publicity Message-ID: <000b01bd9916$8ea85a50$f29b12c2@pythonware.com> just noticed that we've been mentioned by Dave Winer's Scripting News: http://www.scripting.com Cheers /F fredrik@pythonware.com http://www.pythonware.com From akuchlin@cnri.reston.va.us Tue Jun 16 15:22:39 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Tue, 16 Jun 1998 10:22:39 -0400 (EDT) Subject: [XML-SIG] XML testing Message-ID: <13702.28782.981364.277128@newcnri.cnri.reston.va.us> One of the things I really want to have in the first beta is some sort of test suite; I expect that having it will help in shaking out numerous bugs across different parsers and drivers. 
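
As a rough illustration of the kind of check such a suite could start from, here is a minimal sketch using the xml.sax module from today's standard library. That module is an assumption made for the sake of a runnable example (the drivers under discussion here are the saxlib 1.0 ones), and the names RecordingHandler, parse_events and is_well_formed are invented for the sketch.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Hypothetical handler that records what the parser reports."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.text.append(content)


def parse_events(document):
    """Parse a bytes document; return (element names, concatenated text)."""
    handler = RecordingHandler()
    xml.sax.parseString(document, handler)
    return handler.elements, "".join(handler.text)


def is_well_formed(document):
    """Return False if the parser reports a fatal error, True otherwise."""
    try:
        parse_events(document)
    except xml.sax.SAXParseException:
        return False
    return True


if __name__ == "__main__":
    print(parse_events(b"<doc><item>one</item><item>two</item></doc>"))
    print(is_well_formed(b"<doc><item></doc>"))
```

The same two helpers could be pointed at each parser driver in turn, so that differences between drivers show up as differing event lists.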
The simplest test would be to have a little well-formed XML document, and try parsing it via SAX; this would be as much of a test of the SAX implementation as of the parser. A more complete solution would be to run through James Clark's XML test suite and verify the output, but I don't think we can include the test suite in our distribution (it's also fairly large). Any thoughts on this? (Having finally completed some long-overdue revisions to my Web page, it's back to XML for me...) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Gryphon, you are old. Your flesh is meat, and the meat is decaying. Your bones are dry and brittle. Within you now, lion and eagle abandon their battle for dominance, and surrender to time and to the grave. -- The Furies kill the gryphon, in SANDMAN #64: "The Kindly Ones:8" From johnm@magnet.com Tue Jun 16 15:35:09 1998 From: johnm@magnet.com (John Mitchell) Date: Tue, 16 Jun 1998 10:35:09 -0400 (EDT) Subject: [XML-SIG] XML testing In-Reply-To: <13702.28782.981364.277128@newcnri.cnri.reston.va.us> Message-ID: On Tue, 16 Jun 1998, Andrew Kuchling wrote: > One of the things I really want to have in the first beta is > some sort of test suite; I expect that having it will help in shaking > out numerous bugs across different parsers and drivers. > > The simplest test would be to have a little well-formed XML document, > and try parsing it via SAX; [...] I'd rate as "essential" a test suite of negative cases. That is, a sequence that is known as *illegal*, and are flagged as such. Silently accepting bogus code leads to much hair loss... The easiest test suite would just do a 'diff' of generated and known-good output, I think this is what Python's suite does. Comments? - j From Fred L. Drake, Jr." 
References: <13702.28782.981364.277128@newcnri.cnri.reston.va.us> Message-ID: <13702.33833.335459.729411@weyr.cnri.reston.va.us> Andrew Kuchling writes: > One of the things I really want to have in the first beta is > some sort of test suite; I expect that having it will help in shaking > out numerous bugs across different parsers and drivers. ... > run through James Clark's XML test suite and verify the output, but I > don't think we can include the test suite in our distribution (it's I agree. There's no need to include the test data, just a script that takes the path to the test data as an argument (with an easily edited default ;) and runs the test. The README can include a pointer to the test data, just in case. Most users won't want to hack the internals or run the tests; just developers and the paranoid. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Sjoerd.Mullender@cwi.nl Tue Jun 16 16:16:36 1998 From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender) Date: Tue, 16 Jun 1998 17:16:36 +0200 Subject: [XML-SIG] XML testing In-Reply-To: Your message of Tue, 16 Jun 1998 10:22:39 -0400. <13702.28782.981364.277128@newcnri.cnri.reston.va.us> References: <13702.28782.981364.277128@newcnri.cnri.reston.va.us> Message-ID: On Tue, Jun 16 1998 Andrew Kuchling wrote: > One of the things I really want to have in the first beta is > some sort of test suite; I expect that having it will help in shaking > out numerous bugs across different parsers and drivers. > > The simplest test would be to have a little well-formed XML document, > and try parsing it via SAX; this would be as much of a test of the SAX > implementation as of the parser. A more complete solution would be to > run through James Clark's XML test suite and verify the output, but I > don't think we can include the test suite in our distribution (it's > also fairly large). > > Any thoughts on this? 
See http://www.jclark.com/xml/ and get the file referred to by the link "test cases".

-- Sjoerd Mullender

From larsga@ifi.uio.no Tue Jun 16 18:39:23 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 16 Jun 1998 19:39:23 +0200
Subject: [XML-SIG] XML testing
In-Reply-To: <13702.28782.981364.277128@newcnri.cnri.reston.va.us>
References: <13702.28782.981364.277128@newcnri.cnri.reston.va.us>
Message-ID:

* Andrew Kuchling
|
| Any thoughts on this?

We should definitely use James Clark's suite for testing (I already do), but I see no need to distribute it with the package. A URL will do fine.

I also think John Mitchell's comment makes a lot of sense. The testing should definitely produce canonical XML[1] and compare it against known good output. Possibly output from James Clark's XMLTest SAX application with a reliable parser (XP?). I do this for myself already on a small scale, but maybe we should do something a bit more organized. diff, BTW, won't work, since line breaks are replaced by in canonical XML.

BTW: It should be noted that the output from validating parsers will differ from the output from non-validating parsers, because of whitespace normalization in attribute values.

[1] Canonical XML is already produced by the saxdemo.py application and defined at

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there."
-- Edwin Brock
http://www.stud.ifi.uio.no/~larsga/
http://birk105.studby.uio.no/

From Fred L. Drake, Jr."
References: <13702.28782.981364.277128@newcnri.cnri.reston.va.us>
Message-ID: <13702.46049.231110.30708@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
> something a bit more organized. diff, BTW, won't work, since line
> breaks are replaced by in canonical XML.

Using cmp should be fine, but...

> BTW: It should be noted that the output from validating parsers will
> differ from the output from non-validating parsers, because of
> whitespace normalization in attribute values.

... this introduces a problem. Perhaps correct output from validating and non-validating parsers should be handled separately. This is more disk space, but not for "normal" users.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives
1895 Preston White Dr.
Reston, VA 20191

From akuchlin@cnri.reston.va.us Tue Jun 16 20:06:30 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 16 Jun 1998 15:06:30 -0400 (EDT)
Subject: [XML-SIG] XML testing
In-Reply-To: <13702.46049.231110.30708@weyr.cnri.reston.va.us>
References: <13702.28782.981364.277128@newcnri.cnri.reston.va.us> <13702.46049.231110.30708@weyr.cnri.reston.va.us>
Message-ID: <13702.47405.167622.106305@newcnri.cnri.reston.va.us>

Fred L. Drake writes:
> Using cmp should be fine, but...

Better to write Python code to do the comparison; that will make it easier to run the test suite on Windows and Mac machines.

>... this introduces a problem. Perhaps correct output from validating
>and non-validating parsers should be handled separately. This is more
>disk space, but not for "normal" users.

OK, but I certainly want to include at least one or two sample files with the base system, just so people have some reassurance that the compile worked. It looks like the best course is to have a testing script that walks over a directory tree (probably one arranged like the XML test suite) and processes all the files, checking that the output is what's expected.

Another thing: a while back I proposed pulling the sgmlop and pyexpat modules into the root of the distribution to simplify compilation, even though that will require a bit more care when upgrading to new versions of those modules. Speak now or I'm going to do that in the next test release.

-- A.M.
Kuchling http://starship.skyport.net/crew/amk/ Aye, Master Orpheus. Well, they do say that two heads are better than one. -- Lady Johanna Constantine, in SANDMAN #29: "Thermidor" From Jack.Jansen@cwi.nl Tue Jun 16 22:47:33 1998 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Tue, 16 Jun 1998 23:47:33 +0200 Subject: [XML-SIG] XML testing In-Reply-To: Message by Andrew Kuchling , Tue, 16 Jun 1998 15:06:30 -0400 (EDT) , <13702.47405.167622.106305@newcnri.cnri.reston.va.us> Message-ID: Recently, Andrew Kuchling said: > OK, but I certainly want to include at least one or two sample > files with the base system, just so people have some reassurance that > the compile worked. It looks like the best course is to have a > testing script that walks over a directory tree (probably one arranged > like the XML test suite) and processes all the files, checking that > the output is what's expected. How about the following: - Include a simple test file plus its canonical form - Include a test script that accepts two URLs, runs the first document through the parser and compares the output to the second - Include a master script that runs the test script on the simple file and on a number of web-based documents that are supposed to test the parser exhaustively. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From larsga@ifi.uio.no Wed Jun 17 23:24:27 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 18 Jun 1998 00:24:27 +0200 Subject: [XML-SIG] saxlib status report Message-ID: Since there were no complaints on the SAX extensions I've now started implementing and testing them. I've written a formal specification, which is available at (This is a newly installed Linux box where Apache has been running for about 4 hours now. Please notify me if there are any problems with accessing it.) 
Comments on this are very welcome. If there are none I'll release a SAX 1.0gamma once the *mllib-like interface has been implemented. I expect this to be on Monday, provided I am not eaten by bears or ghosts during the weekend. I've hacked a very simple SAX test system much like what has been discussed here lately and discovered that none of the XML parsers except xmlproc (and possibly Pyexpat, haven't been able to test it yet) normalize line breaks as they are supposed to do. (All sequences in input are to be passed on to the applications as .) Accounting for some known bugs in the parsers, the SAX drivers now pass this test system. Is the test code of interest? -- "These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there." -- Edwin Brock http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/ From akuchlin@cnri.reston.va.us Thu Jun 18 21:05:02 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Thu, 18 Jun 1998 16:05:02 -0400 (EDT) Subject: [XML-SIG] saxlib status report In-Reply-To: References: Message-ID: <13705.29229.522749.299385@newcnri.cnri.reston.va.us> Lars Marius Garshol writes: >Accounting for some known bugs in the parsers, the SAX drivers now >pass this test system. Is the test code of interest? Yes, I'd like to take a look at it; it might be a start to a test suite, or at least provide some ideas for things to check. Another method that might be useful for saxexts.py: whether or not the parser will read external entities. That would let you know that you'd have to process entity references yourself (if they're defined in the external subset). -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Watch me. 
This is me going to the door, just like I've done thousands and thousands of times in the past and none of those times was important, I can't even remember them as individual times, who remembers walking to the door...? And then I open the door.
-- Lyta Hall, in SANDMAN #59: "The Kindly Ones:3"

From Fred L. Drake, Jr."

Ok, I've finally had time to take a quick look at the "combined" xml package, and have a few comments.

The install target places all the Python modules directly in site-packages/xml/, which seems like a bug to me. (Solaris 2.6 patched, current Python, using the Python supplied installer script.) I've appended a patch to the Makefile.pre.in to fix this.

The C modules are added to the site-packages/ directory; is there any reason for this? It seems that they should go into appropriate places in the tree. I remember ni didn't handle DLL modules off the standard path, but the built-in packages seem to handle it quite fine (I've been using that feature for a while now, at least). pyexpat and sgmlop should be more appropriately installed, and __init__.py files added where needed.

Otherwise the build/install was very clean!

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives
1895 Preston White Dr.
Reston, VA 20191

*** Makefile.pre.in.orig Tue Jun 9 10:09:40 1998
--- Makefile.pre.in Fri Jun 19 17:45:56 1998
***************
*** 357,367 ****
        fi; done
    for i in `find *.py $(LIBSUBDIRS) -name '*.py' -print` ; do \
        if test -x $$i; then \
!           $(INSTALL_PROGRAM) $$i $(LIBXML); \
!           echo $(INSTALL_PROGRAM) $$i $(LIBXML); \
        else \
!           $(INSTALL_DATA) $$i $(LIBXML); \
!           echo $(INSTALL_DATA) $$i $(LIBXML); \
        fi; \
    done
    PYTHONPATH=$(LIBXML) \
--- 357,367 ----
        fi; done
    for i in `find *.py $(LIBSUBDIRS) -name '*.py' -print` ; do \
        if test -x $$i; then \
!           $(INSTALL_PROGRAM) $$i $(LIBXML)/$$i; \
!           echo $(INSTALL_PROGRAM) $$i $(LIBXML)/$$i; \
        else \
!           $(INSTALL_DATA) $$i $(LIBXML)/$$i; \
!           echo $(INSTALL_DATA) $$i $(LIBXML)/$$i; \
        fi; \
    done
    PYTHONPATH=$(LIBXML) \

From akuchlin@cnri.reston.va.us Sun Jun 21 23:17:57 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: 21 Jun 1998 22:17:57 -0000
Subject: [XML-SIG] xml-package comments
In-Reply-To: <13706.55919.406418.801851@weyr.cnri.reston.va.us>
Message-ID: <19980621221757.7369.qmail@m2.findmail.com>

Fred Drake wrote:
> The C modules are added to the site-packages/ directory; is there
> any reason for this? It seems that they should go into appropriate
> places in the tree. I remember ni didn't handle DLL modules off the

OK. pyexpat would reasonably go in xml.parsers; should sgmlop go there, too?

And where should the sgmlop-aware versions of sgmllib and xmllib go? Should they also go in xml.parsers (even though sgmllib isn't an *XML* parser)?

-- amk

----- Original Message: http://www.findmail.com/list/xml-sig/?start=189
Start a FREE e-mail list at http://www.FindMail.com/

From rob@io.com Mon Jun 22 13:47:38 1998
From: rob@io.com (Rob Tillotson)
Date: 22 Jun 1998 07:47:38 -0500
Subject: [XML-SIG] xml-package comments
In-Reply-To: "Andrew M. Kuchling"'s message of "21 Jun 1998 22:17:57 -0000"
References: <19980621221757.7369.qmail@m2.findmail.com>
Message-ID: <87yaupr4b9.fsf@io.com>

"Andrew M. Kuchling" writes:
> And where should the sgmlop-aware versions of sgmllib and xmllib go?
> Should they also go in xml.parsers (even though sgmllib isn't an *XML* parser)?

Personally, I think they ought to replace the standard ones, or some reasonable facsimile of same (i.e. arrange for them to be on sys.path before the standard library), so that they can be used by existing code without changes. In my Debian package I "divert" the original ones and write the new ones in their place, so that any program which does "import xmllib" or "import sgmllib" will get the new ones without changing anything. I suppose a .pth file or something similar would work, too.
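
What decides which copy "import xmllib" picks up is the order of the directories on sys.path. A minimal sketch of that behaviour in present-day Python follows; the module name xmlsig_shadow_demo, the helper names and the two temporary directories are all made up for the illustration, with one directory playing the role of the standard library and the other the replacement.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, source):
    """Create directory/name.py containing the given source text."""
    with open(os.path.join(directory, name + ".py"), "w") as f:
        f.write(source)


def which_copy_wins(module_name="xmlsig_shadow_demo"):
    """Import a same-named module twice and report which copy was found.

    First the replacement directory sits after the "standard" one on
    sys.path, then it is moved to the front.
    """
    with tempfile.TemporaryDirectory() as standard, \
            tempfile.TemporaryDirectory() as replacement:
        write_module(standard, module_name, "ORIGIN = 'standard'\n")
        write_module(replacement, module_name, "ORIGIN = 'replacement'\n")
        saved_path = list(sys.path)
        try:
            sys.path.append(standard)
            sys.path.append(replacement)      # searched after 'standard'
            importlib.invalidate_caches()
            appended = importlib.import_module(module_name).ORIGIN
            del sys.modules[module_name]

            sys.path.insert(0, replacement)   # now searched first
            importlib.invalidate_caches()
            prepended = importlib.import_module(module_name).ORIGIN
            del sys.modules[module_name]
        finally:
            sys.path[:] = saved_path          # leave sys.path as we found it
        return appended, prepended


if __name__ == "__main__":
    print(which_copy_wins())
```

A directory that is only appended to the search path never shadows a module found earlier, so any replacement scheme has to get its directory in ahead of the standard library.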

--Rob

--
Rob Tillotson N9MTB
Internet: rob@io.com

From Fred L. Drake, Jr."
References: <13706.55919.406418.801851@weyr.cnri.reston.va.us> <19980621221757.7369.qmail@m2.findmail.com>
Message-ID: <13710.23903.838070.155447@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
> OK. pyexpat would reasonably go in xml.parsers; should sgmlop go there, too?
>
> And where should the sgmlop-aware versions of sgmllib and xmllib go?
> Should they also go in xml.parsers (even though sgmllib isn't an *XML* parser)?

I think that's as reasonable a place as there is, unless we want to add an xml.misc package (probably not a good idea).

-Fred

--
Fred L. Drake, Jr. fdrake@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive
Reston, VA 20191

From akuchlin@cnri.reston.va.us Mon Jun 22 15:20:10 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Mon, 22 Jun 1998 10:20:10 -0400 (EDT)
Subject: [XML-SIG] xml-package comments
In-Reply-To: <87yaupr4b9.fsf@io.com>
References: <19980621221757.7369.qmail@m2.findmail.com> <87yaupr4b9.fsf@io.com>
Message-ID: <13710.26234.247982.372920@newcnri.cnri.reston.va.us>

Rob Tillotson writes:
>Personally, I think they ought to replace the standard ones, or some
>reasonable facsimile of same (i.e. arrange for them to be on sys.path
>before the standard library), so that they can be used by existing
>code without changes. In my Debian package I "divert" the original

I think dropping them into the standard Python library directory is evil; re-install it, and your XML changes have gone. Unfortunately, unless Guido's essay on 1.5 packages is out of date, the directories in .pth files are added to the end of sys.path, and therefore can't override the standard modules.

BTW, over the weekend I fixed the installation problems, started on a test/ directory (working similarly to Python's test suite, so we'll write test_saxlib or test_sgmlop or whatever), and split the HOWTO into reference and tutorial documents.
Before the next release, I want to start on a demo/ directory, include the new versions of saxlib and Expat, and complete the test suite. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ I must tell you that it will mean some change in your writing style. All four-letter words must be omitted and, in future, please no references to screwing, buggery or to any perverted acts. I admit that won't leave you much to write about, but that's the price of loyalty. -- Jack McClelland, in a letter to Mordecai Richler From fredrik@pythonware.com Mon Jun 22 16:44:45 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 22 Jun 1998 16:44:45 +0100 Subject: [XML-SIG] xml-package comments Message-ID: <052901bd9df4$ae6ad560$f29b12c2@pythonware.com> >Rob Tillotson writes: >>Personally, I think they ought to replace the standard ones, or some >>reasonable facsimile of same (i.e. arrange for them to be on sys.path >>before the standard library), so that they can be used by existing >>code without changes. In my Debian package I "divert" the original > > I think dropping them into the standard Python library >directory is evil; re-install it, and your XML changes have gone. Note that they were written with Python 1.5.2 (1.6, 2.0, whatever) in mind (if you don't have sgmlop, both sgmllib and xmllib work exactly as they did before). Dunno how to handle them before that release, though... Cheers /F From akuchlin@cnri.reston.va.us Mon Jun 22 15:45:38 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Mon, 22 Jun 1998 10:45:38 -0400 (EDT) Subject: [XML-SIG] xml-package comments In-Reply-To: <052901bd9df4$ae6ad560$f29b12c2@pythonware.com> References: <052901bd9df4$ae6ad560$f29b12c2@pythonware.com> Message-ID: <13710.28068.653791.329311@newcnri.cnri.reston.va.us> Fredrik Lundh writes: >Note that they were written with Python 1.5.2 (1.6, 2.0, whatever) >in mind (if you don't have sgmlop, both sgmllib and xmllib work exactly >as they did before). 
In which case, sgmlop can't go in xml.parsers, unless Guido consents to making those directories in Python 1.5.2, which is quite unlikely. sgmlop.so should therefore continue to live in site-packages, though pyexpat.so can be moved. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ No doubt, a scientist isn't necessarily penalized for being a complex, versatile, eccentric individual with lots of extra-scientific interests. But it certainly doesn't help him a bit. -- Stephen Toulmin From Fred L. Drake, Jr." References: <052901bd9df4$ae6ad560$f29b12c2@pythonware.com> Message-ID: <13710.29448.120652.356476@weyr.cnri.reston.va.us> Andrew Kuchling said: > I think dropping them into the standard Python library > directory is evil; re-install it, and your XML changes have gone. I agree with this. Fredrik Lundh writes: > Note that they were written with Python 1.5.2 (1.6, 2.0, whatever) > in mind (if you don't have sgmlop, both sgmllib and xmllib work exactly > as they did before). I'm not entirely convinced that they should be dropped in such that they override the modules in the standard library, but if they do become the standard modules, then that's probably the most livable solution. The right way to achieve this is something for which I have no clue. ;-( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From rob@io.com Mon Jun 22 16:36:02 1998 From: rob@io.com (Rob Tillotson) Date: 22 Jun 1998 10:36:02 -0500 Subject: [XML-SIG] xml-package comments In-Reply-To: Andrew Kuchling's message of "Mon, 22 Jun 1998 10:20:10 -0400 (EDT)" References: <19980621221757.7369.qmail@m2.findmail.com> <87yaupr4b9.fsf@io.com> <13710.26234.247982.372920@newcnri.cnri.reston.va.us> Message-ID: <87ogvlo3dp.fsf@io.com> Andrew Kuchling writes: > I think dropping them into the standard Python library > directory is evil; re-install it, and your XML changes have gone. 
Well, I did say "or a reasonable facsimile" *grin* (My Debian package will continue to put them in the main library too, but only because the packaging system deals with this problem.) There ought to be a way to do this without modifying the way sys.path is constructed and without touching the standard library, but since there apparently isn't, I guess we have to live with manually adding stuff to sys.path or importing "xml.parsers.xmllib" or something similar. > Unfortunately, unless Guido's essay on 1.5 packages is out of date, > the directories in .pth files are added to the end of sys.path, and > therefore can't override Looking at $(python_lib)/site.py, it does appear to still be the case. There ought to be a way to add stuff to the beginning of the path too, but using .pth files is already gross enough without making them more complicated :) --Rob -- Rob Tillotson N9MTB Internet: rob@io.com From akuchlin@cnri.reston.va.us Tue Jun 23 14:26:59 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Tue, 23 Jun 1998 09:26:59 -0400 (EDT) Subject: [XML-SIG] CFV for comp.text.xml Message-ID: <13711.43402.3419.319558@newcnri.cnri.reston.va.us> A small note that may be of interest to readers of this list: A Call For Votes for a comp.text.xml newsgroup has just been posted to comp.text.sgml, and various XML mailing lists. Forwarding CFVs is a violation of the rules regarding newsgroup creation, so if you read Usenet, you may wish to read the CFV on comp.text.sgml, and vote on it if you'd like to read the newsgroup. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ The spreadsheet matrix is a creative prison bound by A1 and Z1000. Walls. A psychological prison. Unlike the Black Death, nobody sees this malady. There will be no cure. Soon it will be too late. -- John C.
Dvorak From klm@python.org Tue Jun 23 17:08:27 1998 From: klm@python.org (Ken Manheimer) Date: Tue, 23 Jun 1998 12:08:27 -0400 (EDT) Subject: [XML-SIG] Lost maillist messages thursday eve (6/18) to friday noon (6/19) Message-ID: <199806231608.MAA13096@glyph.cnri.reston.va.us> Some of you may have lost postings posted to one of the following maillists between last thursday (6/18) evening and friday (6/19) noon. Mailman-Developers (1 msg) Matrix-SIG (8 msgs) DB-SIG (3 msgs) Doc-SIG (4 msgs) Pythonmac-SIG (3 msgs) XML-SIG (1 msg) Trove-Dev (6 msgs) This happened accompanying an upgrade of our maillist software, mailman, due to a bad interaction between a new mailman feature and an anti-spam (anti-relay) mechanism applied to python.org's sendmail configuration. This problem did not show during testing because our test recipients were all local, and not subject to the anti-relay constraints. If you sent something to any of these lists during that time frame and never saw it show, you may want to resend. Archiving was not affected, so you should be able to find the messages in the maillist archives. People receiving the lists in digest format were not affected, since the delivery problem was fixed before the digest delivery time. My apologies for the disruption! Ken Manheimer klm@python.org 703 620-8990 x268 (orporation for National Research |nitiatives # If you appreciate Python, consider joining the PSA! # # . # From larsga@ifi.uio.no Wed Jun 24 12:30:51 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Wed, 24 Jun 1998 13:30:51 +0200 (MET DST) Subject: [XML-SIG] saxlib 1.0 gamma released Message-ID: <199806241130.NAA20731@ifi.uio.no> I've just released the new version of saxlib (1.0gamma), which is a significant improvement over the beta release. A lot of bugs have been fixed and several new utilities and features have been introduced.
There are also new drivers for: - sgmlop (Fredrik Lundh's C module) - sgmllib - htmllib - Dan Connolly's XML parser - Pyexpat (the new version of the PyXMLTok module) Extensive new documentation has also been added to the home page. In short, everyone is recommended to upgrade to the new version. The URL is http://www.stud.ifi.uio.no/~larsga/download/python/xml/saxlib.html --Lars M. From akuchlin@cnri.reston.va.us Thu Jun 25 15:41:33 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Thu, 25 Jun 1998 10:41:33 -0400 (EDT) Subject: [XML-SIG] Release 3 of XML package Message-ID: <13714.23691.899977.664642@newcnri.cnri.reston.va.us> I've just made available the latest omnibus package of the XML software; this is the third alpha release. Get it from: http://www.python.org/sigs/xml-sig/files/xml-package-0.3.tgz Please try it out, since we're starting to get really close with this one. You should be able to unpack, compile, and install it without too much difficulty. Once installed, you should then be able to start using it for XML work. Please send me a note (privately) if you install it, even if it worked without problems, so I can gauge how many people have tried it. Changes: * Now contains saxlib-1.0gamma, and the latest Expat. * Pulled the C modules into the root directory; this simplified building a *lot*. * Split the documentation into a HOWTO and a reference manual. See: http://www.python.org/doc/howto/xml/ http://www.python.org/doc/howto/xml-ref/ * Added demo/ and test/ directories. The framework for a test suite is there; we only need to write test_*.py files (similar to Python's test suite). It doesn't use Clark's XML test suite yet; that's next on my list. You would unpack xmltest in the test/ directory, and a test_xml.py module would check for its presence and loop over the tree. The missing pieces are now mostly secondary ones: * Make more passes over the documentation and improve it.
I also have to produce plain ASCII versions and put them in the doc/ directory. * More demo programs. If you have a little XML processing program that would be of interest, let me know and I'll add it. * Flesh out the test suite; it should use Clark's test suite, and test all the available parsers. * Create a set of Web pages on www.python.org at /topics/xml/; these would be the major starting point for Python & XML. Open questions: * Licensing: Python-style? * Tar file name: I've been calling it xml-package, which is a bit long. xml-py-0.3.tgz? (Too cryptic?) * Is there anything else that should be added to the package? Lars has an experimental XPointer implementation; should it go in? What about a saxlib driver for some Java parsers? Anything else? Last call... Otherwise, we're almost on track for my earlier schedule. For the next release I'm working toward better documentation and test suite, and hopefully some demo programs will appear. The next release would then be 0.4; I'd wait a week to see if any problems get reported, and if none appear, then call 0.4 the first beta release. 0.4 would then be the first release that would get announced on comp.lang.python, since the code will finally be ready for general use and testing. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ If there is anything the nonconformist hates worse than a conformist it's another nonconformist who doesn't conform to the prevailing standard of nonconformity. -- Bill Vaughan From rob@io.com Thu Jun 25 23:08:30 1998 From: rob@io.com (Rob Tillotson) Date: 25 Jun 1998 17:08:30 -0500 Subject: [XML-SIG] Debian GNU/Linux version of XML package 0.3 Message-ID: <874sx9kuch.fsf@io.com> I've just finished Debianizing (for i386) the latest version of the XML package. Anyone interested can get it at: http://www.concentric.net/~n9mtb/python-xml_0.3-0.1_i386.deb Please report bugs to me directly. 
I hope to submit python-xml to the distribution as soon as it goes beta, if the licensing issue is resolved by then. Enjoy, --Rob -- Rob Tillotson N9MTB Internet: rob@io.com From pvelikho@cs.ucsd.edu Tue Jun 30 05:10:15 1998 From: pvelikho@cs.ucsd.edu (Pavel Velikhov) Date: Mon, 29 Jun 1998 21:10:15 -0700 (PDT) Subject: [XML-SIG] DTD processing Message-ID: <199806300410.VAA13340@tartarus.ucsd.edu> I have a question about DTDs and Python XML developments : I am working on an XML application that needs to browse and create DTDs for documents. As far as I understand SAX and DOM interfaces don't offer any DTD manipulation capabilities, so I will have to build some things on my own. So the question is - is there a way to build my own DTD manipulation module and reuse some of the code developed for the Python XML parser? (I want to avoid going to yacc at least). Thank you Pavel Velikhov pvelikho@cs.ucsd.edu From larsga@ifi.uio.no Tue Jun 30 12:13:10 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Tue, 30 Jun 1998 13:13:10 +0200 Subject: [XML-SIG] Re: DTD processing In-Reply-To: <199806300410.VAA13340@tartarus.ucsd.edu> Message-ID: <3.0.1.32.19980630131310.00710e4c@ifi.uio.no> * Pavel Velikhov > >I have a question about DTDs and Python XML developments : I am working on an >XML application that needs to browse and create DTDs for documents. As far >as I understand SAX and DOM interfaces don't offer any DTD manipulation capabilities, The DOM working draft has this, but PyDOM does not. (Probably because xmlproc is the only parser that offers DTD access, and this has been badly documented and sort of difficult to find.) I'm thinking of making a DOM-compliant DTD interface to xmlproc, and integrating it with PyDOM somehow, but haven't gotten round to it yet. Stephane, if you want to do this, that would probably be the best. >so I will have to build some things on my own.
So the question is - is there a >way to build my own DTD manipulation module and reuse some of the code developed for >the Python XML parser? xmlproc gives you both a low-level interface for parsing events and a higher-level one for querying parsed DTDs. The latter is a little incomplete, since I've never really used it for anything except the validation and since I've so far focused on other things. In other words: feedback on this interface will probably lead to it being changed to fit your needs. So, what you can do right now is to go to http://www.stud.ifi.uio.no/~larsga/download/python/xml/xmlproc.html and download xmlproc 0.40. In xmlapp.py you'll find the DTDConsumer interface that receives DTD parse events. The DTDParser itself is in xmlproc.py. The interfaces for parsed DTDs are in xmldtd.py. I've just documented the DTD interfaces and modified them slightly. The documentation will be available when I release 0.50, which should be RSN. --Lars M.
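As an illustration of receiving DTD parse events, here is a minimal sketch. It does not use xmlproc or its DTDConsumer interface, whose method names are not given above; it assumes instead the declaration handlers of the xml.parsers.expat module in today's Python standard library. The sample document and the collect_declarations helper are invented for the example:

```python
import xml.parsers.expat

# A small document with an internal DTD subset (made up for this sketch).
SAMPLE = """<!DOCTYPE note [
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id CDATA #REQUIRED>
]>
<note id="n1"><to>Pavel</to><body>Hello</body></note>
"""


def collect_declarations(document):
    """Return the element names and attribute declarations in the DTD."""
    elements = []
    attributes = []

    def on_element(name, model):
        # 'model' describes the content model; only the name is kept here.
        elements.append(name)

    def on_attlist(element, attribute, attr_type, default, required):
        attributes.append((element, attribute, attr_type, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(document, True)
    return elements, attributes


elements, attributes = collect_declarations(SAMPLE)
print(elements)
print(attributes)
```

This only covers browsing declarations as they are parsed; building or writing out DTDs would still need a separate data structure on top of the collected events.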