Introduction [...]
</Chapter>
&publ-today;
&techno-back;
&architecture;
&platform;
&evaluation;
</Book>

| would it be easier to write a Python script
| which rips off the extra DOCTYPE elements.

I don't think so. :)

--Lars M.

From mss@transas.com Sat May 15 20:01:58 1999
From: mss@transas.com (Michael Sobolev)
Date: Sat, 15 May 1999 23:01:58 +0400
Subject: [XML-SIG] a simple SGML question (off-topic)
In-Reply-To: <004f01be9f02$2ab46910$f29b12c2@pythonware.com>; from Fredrik Lundh on Sat, May 15, 1999 at 08:38:44PM +0200
References: <Pine.GSO.3.96.990514145940.21814A-100000@saga9.Stanford.EDU> <004f01be9f02$2ab46910$f29b12c2@pythonware.com>
Message-ID: <19990515230158.A8598@transas.com>

On Sat, May 15, 1999 at 08:38:44PM +0200, Fredrik Lundh wrote:
> #include "chapter1.sgm"
> #include "chapter2.sgm"

<!DOCTYPE ... [
<!ENTITY chapter1 SYSTEM "chapter1.sgm">
]>

...
&chapter1;
...

But I believe your files chapter* should not have any <!doctype>s.

--
Mike

From fredrik@pythonware.com Sun May 16 15:05:04 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Sun, 16 May 1999 16:05:04 +0200
Subject: [XML-SIG] a simple SGML question (off-topic)
References: <Pine.GSO.3.96.990514145940.21814A-100000@saga9.Stanford.EDU> <004f01be9f02$2ab46910$f29b12c2@pythonware.com> <19990515230158.A8598@transas.com>
Message-ID: <006301be9fa5$3713fe70$f29b12c2@pythonware.com>

Michael Sobolev <mss@transas.com> wrote:
> On Sat, May 15, 1999 at 08:38:44PM +0200, Fredrik Lundh wrote:
> > #include "chapter1.sgm"
> > #include "chapter2.sgm"
> <!DOCTYPE ... [
> <!ENTITY chapter1 SYSTEM "chapter1.sgm">
> ]>
>
> ...
> &chapter1;
> ...

just what I needed!

> But I believe your files chapter* should not have any <!doctype>s.

I was just about to answer that I need doctypes to be able to edit the individual chapters when I realized that my SGML editor did the right thing when I loaded the master document. extremely cool!
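Mike's external-entity approach carries over directly to XML, where Python's standard xml.sax module can follow such references. This is a minimal sketch under stated assumptions: it uses Python 3, it switches on feature_external_ges (external general entities are off by default in recent Python versions), and the file names master.xml and chapter1.xml, the Book/Chapter markup and the ElementLister and parse_book names are all invented for the example. As Mike says, the included chapter carries no DOCTYPE of its own.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Hypothetical master document: declares chapter1 as an external entity
# in the internal DTD subset and references it in the content.
MASTER = """<?xml version="1.0"?>
<!DOCTYPE Book [
<!ENTITY chapter1 SYSTEM "chapter1.xml">
]>
<Book>&chapter1;</Book>
"""

# The included chapter has no DOCTYPE of its own.
CHAPTER1 = "<Chapter><Title>Introduction</Title></Chapter>"


class ElementLister(ContentHandler):
    """Records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_book(directory):
    parser = xml.sax.make_parser()
    # Without this flag the &chapter1; reference would not be loaded.
    parser.setFeature(feature_external_ges, True)
    handler = ElementLister()
    parser.setContentHandler(handler)
    # Parsing by file name gives the parser a base for "chapter1.xml".
    parser.parse(os.path.join(directory, "master.xml"))
    return handler.names


with tempfile.TemporaryDirectory() as tmp:
    for filename, text in (("master.xml", MASTER), ("chapter1.xml", CHAPTER1)):
        with open(os.path.join(tmp, filename), "w") as f:
            f.write(text)
    names = parse_book(tmp)
```

Parsing by file name rather than from a string matters here: the parser resolves the relative system identifier "chapter1.xml" against the location of the master document.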
Thanks /F

From Shane.Burrell@metrostat.net Thu May 20 01:11:10 1999
From: Shane.Burrell@metrostat.net (Shane Burrell)
Date: Wed, 19 May 1999 20:11:10 -0400
Subject: [XML-SIG] Anyone doing any XML for real estate?
Message-ID: <000001bea255$438d82e0$1602a8c0@singer>

Shane Burrell
Software Engineer/Systems Administrator - Metrostat Technologies, Inc.

From wask@mcc.com Thu May 20 16:01:42 1999
From: wask@mcc.com (wask@mcc.com)
Date: Thu, 20 May 1999 10:01:42 -0500
Subject: [XML-SIG] Installing Python/XML on NT
Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB2B@brazil.mcc.com>

[Apologies for double posting - I sent this to the wrong reflector earlier.]

Could someone point me to instructions for installing XML/Python 0.5.1 on NT? (I looked - honest, I really did - but only found UNIX instructions.)

Thanks,
Fred

From larsga@ifi.uio.no Fri May 21 12:51:47 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 21 May 1999 13:51:47 +0200
Subject: [XML-SIG] easySAX
In-Reply-To: <14128.29972.689027.500572@weyr.cnri.reston.va.us>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us>
Message-ID: <wk675mzlm4.fsf@ifi.uio.no>

I posted an easySAX proposal here a while ago, in response to various requests for SAX extensions/changes, mainly from Paul Prescod. I've used it a little myself in the meantime and have found it to be a major improvement for direct programming compared to pure SAX.

However, before I do anything more about this it would be nice to know what the rest of the XML-SIG is thinking. Would anyone be unhappy if SAX were kept as it is, only updated with minor changes and extended to follow Java SAX 2.0, AND easySAX were provided as the easy-to-use alternative, built on top of SAX?

easySAX would, modulo any suggestions, be as proposed in

<URL: http://www.python.org/pipermail/xml-sig/1999-May/001199.html>

with start_*/end_*/pi_*/ppi_* methods.

* Fred L. Drake
|
| It might be a good idea to use a call to extract the element stack
| instead of providing direct access to the list object. This would
| allow different internal structures to be used without changing the
| interface.

This might be interesting in some cases. I didn't do this out of a worry about speed, but now I think we should do this. Anyone who is concerned about speed can just take the risk and access the underlying stack directly anyway. Any other opinions?

Also, do we need to do anything in particular to deal with namespaces here? Should we reserve a namespace-URI callback argument to slot them into when SAX 2.0 is in place?

As for packaging, I think this should be a separate package from SAX itself. Other convenient interfaces on top of SAX are both possible and desirable, and I certainly don't want to monopolize that space with easySAX or appear to do so.

If nobody protests I'll go ahead and do this, although I'd feel much easier about it if people actually voiced support for this. Andrew, do you think this belongs in the XML package?

--Lars M.

From fdrake@acm.org Fri May 21 16:51:05 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 21 May 1999 11:51:05 -0400 (EDT)
Subject: [XML-SIG] easySAX
In-Reply-To: <wk675mzlm4.fsf@ifi.uio.no>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no>
Message-ID: <14149.33001.688171.408570@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
> <URL: http://www.python.org/pipermail/xml-sig/1999-May/001199.html>
>
> with start_*/end_*/pi_*/ppi_* methods.

Lars,

Have you written a version with the dispatcher to drive these methods? I don't think there's a lot of code, but if you've already written it, it might be nice for people to have a chance to play with it. This would be especially good to play with for those of us with a lot of code based on the xmllib API, which I find I still use.
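Fred's two requests, a dispatcher that drives the start_*/end_*/pi_* methods and a call for reading the element stack instead of direct access to the list, can be sketched in modern Python on top of the standard xml.sax module. This is a hypothetical illustration of the adapter design, not Lars's code: EasySAXAdapter, element_stack(), ComicApp and the callback signatures are assumptions made for the example, and ppi_* is left out because its semantics are not spelled out in the thread.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EasySAXAdapter(ContentHandler):
    """Hypothetical adapter: forwards SAX events to start_<name>,
    end_<name> and pi_<target> methods on a separate application object."""

    def __init__(self, app):
        super().__init__()
        self._app = app
        self._stack = []  # names of currently open elements, outermost first

    def element_stack(self):
        # Hand out an immutable copy so the internal representation can
        # change later without breaking callers.
        return tuple(self._stack)

    def _dispatch(self, prefix, name, *args):
        # Names that are not Python identifiers (e.g. "xml-stylesheet")
        # simply never match a method, and the event is ignored.
        method = getattr(self._app, prefix + name, None)
        if method is not None:
            method(*args)

    def startElement(self, name, attrs):
        self._stack.append(name)
        self._dispatch("start_", name, attrs)

    def endElement(self, name):
        self._dispatch("end_", name)
        self._stack.pop()

    def processingInstruction(self, target, data):
        self._dispatch("pi_", target, data)


class ComicApp:
    """Example application object; defines only the callbacks it needs."""

    def __init__(self):
        self.adapter = None
        self.titles = []
        self.pis = []
        self.writer_path = None

    def start_COMIC(self, attrs):
        self.titles.append(attrs.get("TITLE"))

    def start_WRITER(self, attrs):
        self.writer_path = self.adapter.element_stack()

    def pi_render(self, data):
        self.pis.append(data)


app = ComicApp()
adapter = EasySAXAdapter(app)
app.adapter = adapter
xml.sax.parseString(
    b'<COLLECTION><?render fast?><COMIC TITLE="Sandman" NUMBER="62">'
    b'<WRITER>Neil Gaman</WRITER></COMIC></COLLECTION>',
    adapter,
)
```

Because element_stack() returns a tuple copy, the adapter could later switch to a different internal structure without changing what callers see, which is the point of Fred's suggestion.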
> Also, do we need to do anything in particular to deal with namespaces
> here? Should we reserve a namespace-URI callback argument to slot them
> into when SAX 2.0 is in place?

Sigh. I'm very undecided about namespaces. The concept is really good, but I've shied away from using them. Building all the support for documents that are likely to use several (known) namespaces that all need special processing is still a pain, especially using the event-based interfaces (SAX, xmllib). I'd be more likely to use the DOM if I care about namespaces (and I almost cared about them the other day!).

> As for packaging, I think this should be a separate package from SAX
> itself. Other convenient interfaces on top of SAX are both possible

How about xml.easysax?

> If nobody protests I'll go ahead and do this, although I'd feel much
> easier about it if people actually voiced support for this. Andrew, do
> you think this belongs in the XML package?

In general, I think it's a good idea. Perhaps the first cut can simply be a module that gets posted to the list; if it's well received, it can be added to the XML omnibus package. My name isn't Andrew, and chances are good it won't ever be, but *I* think it belongs there if people are likely to want to use it. I think we should avoid the Perl-XML problem, with lots of different packages that people need to update independently. I'd like to be able to say "this software requires the Python XML package: ftp://ftp.python.org/..." and be done with it.

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives

From larsga@ifi.uio.no Fri May 21 17:19:10 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 21 May 1999 18:19:10 +0200
Subject: [XML-SIG] easySAX
In-Reply-To: <14149.33001.688171.408570@weyr.cnri.reston.va.us>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us>
Message-ID: <wkpv3uxuo1.fsf@ifi.uio.no>

* Lars Marius Garshol
|
| <URL: http://www.python.org/pipermail/xml-sig/1999-May/001199.html>
|
| with start_*/end_*/pi_*/ppi_* methods.

* Fred L. Drake
|
| Have you written a version with the dispatcher to drive these
| methods?

Not yet. I was planning to do it if the proposal was accepted.

| I don't think there's a lot of code, but if you've already written
| it, it might be nice for people to have a chance to play with it.
| This would be especially good to play with for those of us with a
| lot of code based on the xmllib API, which I find I still use.

I'll post it the moment I complete it, provided I do complete it at all.

* Lars Marius Garshol
|
| Also, do we need to do anything in particular to deal with
| namespaces here? Should we reserve a namespace-URI callback argument
| to slot them into when SAX 2.0 is in place?

* Fred L. Drake
|
| Sigh. I'm very undecided about namespaces. The concept is really
| good, but I've shied away from using them.

This is more or less my reaction as well, although I'm no fan of the form the actual final Recommendation got, nor of the place it seems to occupy in people's minds.

| Building all the support for documents that are likely to use
| several (known) namespaces that all need special processing is still
| a pain, especially using the event-based interfaces (SAX, xmllib).

This might be true, although I haven't tried or even thought much about it. Some splitting filter that sends different namespaces to different handlers might help, but I feel the basic problem is that none of these have any provision for namespaces at all. And that is what I want to avoid if we do easySAX now, since we won't be able to insert this parameter later without breaking things. But maybe we should have a separate NamespaceAwareDocumentHandler instead?

One nice thing would be if the parse method automagically detected which kind of handler it received as a parameter and then either applied or did not apply namespace processing.

* Lars Marius Garshol
|
| As for packaging, I think this should be a separate package from SAX
| itself. Other convenient interfaces on top of SAX are both possible

* Fred L. Drake
|
| How about xml.easysax?

Sounds good to me, although I've already used the file name ezsax on my own disk. However, what I really meant was that I thought this should be a separate release, with its own home page, ZIP file and version history.

* Lars Marius Garshol
|
| If nobody protests I'll go ahead and do this, although I'd feel much
| easier about it if people actually voiced support for this. Andrew,
| do you think this belongs in the XML package?

* Fred L. Drake
|
| Perhaps the first cut can simply be a module that gets posted to the
| list; if it's well received, it can be added to the XML omnibus
| package.

That's certainly an alternative, and it's by no means incompatible. One reason I'd like it to be more than just something posted to the list is that then it becomes easier to document it, refer to it and also to discover it for new users.

Also, if this becomes widely used I suppose the more speed-conscious may want to bypass SAX entirely and write easySAX drivers on top of a parser. I think this is especially interesting in a JPython context, where it can be built on top of Java SAX. Opinions on this are welcome.

Personally, I like packages that are available separately as well as a part of a bigger lump.
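Lars's idea that parse() could inspect the handler it receives and switch namespace processing on or off accordingly maps onto the feature_namespaces flag of Python's xml.sax. This is a minimal sketch, assuming a marker base class named after Lars's NamespaceAwareDocumentHandler (here a ContentHandler subclass); the parse() helper, the two handler classes and the recipe namespace URI are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceAwareDocumentHandler(ContentHandler):
    """Hypothetical marker class: handlers deriving from it receive
    (namespace URI, local name) pairs via startElementNS/endElementNS."""


def parse(source, handler):
    """Turn namespace processing on only for namespace-aware handlers."""
    parser = xml.sax.make_parser()
    aware = isinstance(handler, NamespaceAwareDocumentHandler)
    parser.setFeature(feature_namespaces, aware)
    parser.setContentHandler(handler)
    parser.parse(source)


DOC = (b'<r:recipe xmlns:r="http://example.org/recipe">'
       b'<r:title>Soup</r:title></r:recipe>')


class PlainNames(ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)  # raw qualified name, e.g. "r:title"


class QualifiedNames(NamespaceAwareDocumentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        self.seen.append(name)  # (namespace URI, local name) tuple


plain, aware = PlainNames(), QualifiedNames()
parse(io.BytesIO(DOC), plain)
parse(io.BytesIO(DOC), aware)
```

With the flag on, the parser reports (namespace URI, local name) pairs through startElementNS(); with it off, the same document arrives through startElement() with raw prefixed names, so neither kind of handler has to change.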
| I think we should avoid the Perl-XML problem, with lots of different
| packages that people need to update independently. I'd like to be
| able to say "this software requires the Python XML package:
| ftp://ftp.python.org/..." and be done with it.

I certainly agree with this.

--Lars M.

From fdrake@acm.org Fri May 21 17:35:57 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 21 May 1999 12:35:57 -0400 (EDT)
Subject: [XML-SIG] easySAX
In-Reply-To: <wkpv3uxuo1.fsf@ifi.uio.no>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no>
Message-ID: <14149.35693.275585.44713@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
> I'll post it the moment I complete it, provided I do complete it at
> all.

If you decide not to spend time on it, send a note to the list so someone else can pick it up.

> This is more or less my reaction as well, although I'm no fan of the
> form the actual final Recommendation got, nor of the place it seems to
> occupy in people's minds.

Yes, the Rec was rather poor, both in the technical content and the writing.

> about it. Some splitting filter that sends different namespaces to
> different handlers might help, but I feel the basic problem is that

This is what I thought might be doable and possibly workable, but it's not entirely clear to me how to work with it still. There's still a lot of setup required for the application to make things work nicely.

> Sounds good to me, although I've already used the file name ezsax on
> my own disk. However, what I really meant was that I thought this
> should be a separate release, with its own home page, ZIP file and
> version history.

Hm. I think it should exist within the "xml" Python package, regardless of the external packaging. I'm not sure how multiple distributions should treat sharing of the Python package space. I do *not* like having a single module that uses different names based on the separate or omnibus distributions. This is probably something the distutils-sig should deal with.

> That's certainly an alternative, and it's by no means incompatible.
> One reason I'd like it to be more than just something posted to the
> list is that then it becomes easier to document it, refer to it and
> also to discover it for new users.

I was thinking of this as a temporary "do we really want this" approach; post it after writing, and package it if people are actually interested in it.

[Andrew: Are you the only person with write access to the CVS repository? It would be easier to add things for experimental periods if it was easier to add to the repository. Whether this would be useful depends on just what place you think the omnibus package has.]

-Fred

From paul@prescod.net Fri May 21 18:51:43 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 May 1999 12:51:43 -0500
Subject: [XML-SIG] easySAX
References: <wkvhe7le7x.fsf@ifi.uio.no>
Message-ID: <37459D2F.7178EC99@prescod.net>

Lars Marius Garshol wrote:
>
> What do people think? Is this better than adding the suggested
> improvements to the SAX core? This was just hacked together in 15
> minutes, so please don't hesitate to slaughter it if you don't like
> it.

I'm not thrilled with the fact that it requires an explicit adapter instead of a simple base class.

My counter-proposal is that easySax be a base class that defines startElement, endElement and characters.

easySax "clients" would define start_Foo, end_Foo,..., startUnknown, endUnknown processingInstruction and "text", where text is defined as a Python programmer would expect: as a simple string without the index junk.

What you do with captured text is highly context specific. What if we had TITLE_text, BODY_text, FOO_text and Unknowntext. Then if Unknowntext isn't defined we wouldn't be storing away little useless text snippets all of the time (e.g. if we were just looking for titles).

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling".
- http://www.legislature.state.al.us/ALISHome.html

From larsga@ifi.uio.no Fri May 21 19:50:52 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 21 May 1999 20:50:52 +0200
Subject: [XML-SIG] easySAX
In-Reply-To: <14149.35693.275585.44713@weyr.cnri.reston.va.us>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us>
Message-ID: <wkemkaxnn7.fsf@ifi.uio.no>

* Fred L. Drake
|
| Hm. I think it should exist within the "xml" Python package,
| regardless of the external packaging. I'm not sure how multiple
| distributions should treat sharing of the Python package space. I
| do *not* like having a single module that uses different names based
| on the separate or omnibus distributions.

That wouldn't happen in any case (just as it hasn't with saxlib and xmlproc, both of which use the same names whether inside or outside the omnibus package).

| I was thinking of this as a temporary "do we really want this"
| approach; post it after writing, and package it if people are
| actually interested in it.

OK, I'll go ahead and do that. If I give up I'll notify the list.

--Lars M.
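Paul's counter-proposal puts the dispatching in a base class that applications subclass, instead of an adapter wrapped around a separate object. This is a minimal sketch in modern Python, assuming xml.sax's ContentHandler as the foundation; EasySax and CountPencillers are hypothetical names, and text handling is left out to keep the example small.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EasySax(ContentHandler):
    """Hypothetical base class in the spirit of Paul's proposal:
    subclasses define start_<name>/end_<name> methods directly."""

    def startElement(self, name, attrs):
        method = getattr(self, "start_" + name, None)
        if method is not None:
            method(attrs)
        else:
            self.startUnknown(name, attrs)

    def endElement(self, name):
        method = getattr(self, "end_" + name, None)
        if method is not None:
            method()
        else:
            self.endUnknown(name)

    # Fallbacks for elements without a specific method; override as needed.
    def startUnknown(self, name, attrs):
        pass

    def endUnknown(self, name):
        pass


class CountPencillers(EasySax):
    def __init__(self):
        super().__init__()
        self.pencillers = 0
        self.unknown = []

    def start_PENCILLER(self, attrs):
        self.pencillers += 1

    def startUnknown(self, name, attrs):
        self.unknown.append(name)


handler = CountPencillers()
xml.sax.parseString(
    b'<COMIC><WRITER>Neil Gaman</WRITER>'
    b'<PENCILLER PAGES="1-9, 18-24">Glyn Dillon</PENCILLER>'
    b'<PENCILLER PAGES="10-17">Charles Vess</PENCILLER></COMIC>',
    handler,
)
```

Because EasySax inherits from ContentHandler, a subclass gets the default (no-op) processingInstruction() for free. The trade-off Lars raises is also visible here: a subclass that overrides startElement() without calling the base method silently disables the dispatch.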
From larsga@ifi.uio.no Fri May 21 19:51:43 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 21 May 1999 20:51:43 +0200
Subject: [XML-SIG] easySAX
In-Reply-To: <37459D2F.7178EC99@prescod.net>
References: <wkvhe7le7x.fsf@ifi.uio.no> <37459D2F.7178EC99@prescod.net>
Message-ID: <wkd7zuxnls.fsf@ifi.uio.no>

* Lars Marius Garshol
|
| What do people think? Is this better than adding the suggested
| improvements to the SAX core? This was just hacked together in 15
| minutes, so please don't hesitate to slaughter it if you don't like
| it.

* Paul Prescod
|
| I'm not thrilled with the fact that it requires an explicit adapter
| instead of a simple base class.

Hmmm. Why do you see this as a disadvantage? Part of the reason I did it as I did is that I want the user to be able to redefine startElement and endElement without messing up the framework. Knowing why you don't like the adapter would make the tradeoff easier.

| My counter-proposal is that easySax be a base class that defines
| startElement, endElement and characters.
|
| easySax "clients" would define start_Foo, end_Foo,..., startUnknown,
| endUnknown processingInstruction and "text", where text is defined
| as a Python programmer would expect: as a simple string without the
| index junk.

Hmmm. I feel uneasy about the *Unknown methods, but I suppose they will have their uses.

| What you do with captured text is highly context specific. What if
| we had TITLE_text, BODY_text, FOO_text and Unknowntext. Then if
| Unknowntext isn't defined we wouldn't be storing away little useless
| text snippets all of the time (e.g. if we were just looking for
| titles).

Paul, thank you! I think this is the idea I've been looking for ever since I started thinking about making something like easySAX. If we pass in attributes as well here it means that for the small leaf elements (which in data-oriented XML are usually the important ones) you have all the information you need in one callback.
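The per-element text callback Paul suggests, with Lars's preferred naming (text_TITLE, textUnknown) and his addition of passing the attributes along, can be sketched so that character data is buffered only for elements that actually have a text callback. This is a hypothetical illustration on top of xml.sax: TextDispatcher, Credits and the (text, attrs) signature are assumptions, and only text directly inside an element is collected, which suits the small leaf elements Lars mentions.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextDispatcher(ContentHandler):
    """Hypothetical sketch: calls text_<name>(text, attrs) once per element
    and buffers character data only when such a method exists, or when
    the subclass defines textUnknown(name, text, attrs)."""

    def __init__(self):
        super().__init__()
        # One entry per open element: (callback or None, attrs, buffer or None).
        self._open = []

    def _text_method(self, name):
        method = getattr(self, "text_" + name, None)
        if method is None:
            fallback = getattr(self, "textUnknown", None)
            if fallback is not None:
                # Bind the element name so textUnknown can tell elements apart.
                return lambda text, attrs: fallback(name, text, attrs)
        return method

    def startElement(self, name, attrs):
        method = self._text_method(name)
        buffer = [] if method is not None else None
        # Copy attrs so they stay usable until the element ends.
        self._open.append((method, dict(attrs), buffer))

    def characters(self, content):
        # Only the innermost open element collects text, so a buffer holds
        # the element's direct character data, not its children's.
        if self._open:
            buffer = self._open[-1][2]
            if buffer is not None:
                buffer.append(content)

    def endElement(self, name):
        method, attrs, buffer = self._open.pop()
        if method is not None:
            method("".join(buffer), attrs)


class Credits(TextDispatcher):
    def __init__(self):
        super().__init__()
        self.credits = []

    def text_PENCILLER(self, text, attrs):
        # Text and attributes arrive together in a single callback.
        self.credits.append((text, attrs["PAGES"]))


collector = Credits()
xml.sax.parseString(
    b'<COMIC TITLE="Sandman" NUMBER="62"><WRITER>Neil Gaman</WRITER>'
    b'<PENCILLER PAGES="1-9, 18-24">Glyn Dillon</PENCILLER>'
    b'<PENCILLER PAGES="10-17">Charles Vess</PENCILLER></COMIC>',
    collector,
)
```

Here WRITER has no text_WRITER method and Credits defines no textUnknown(), so its text is never buffered at all.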
It would also mean that passing unsliced strings to characters in the real SAX probably will pay off, since as you say we will now only slice the strings when we actually need them.

(Except I think I prefer text_TITLE and textUnknown.)

I'll give the interface another rotation and then post it again. (More comments are of course very welcome.)

--Lars M.

From wask@mcc.com Fri May 21 20:36:21 1999
From: wask@mcc.com (wask@mcc.com)
Date: Fri, 21 May 1999 14:36:21 -0500
Subject: [XML-SIG] JPython / xmllib issue ????
Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB30@brazil.mcc.com>

Hello,

Because I don't know how to load the NT version of the XML 0.5.1 package, I decided to "brute force it" and use xmllib - which is ok as I'm teaching myself the basics. However, I keep stumbling into a problem using xmllib from JPython, a problem I don't see using Python. Particulars are noted below for those interested.

Is this a cockpit error or a JPython issue? Any help would be most appreciated.

[The carrot: I work for a research firm for several large international corps. I'm trying to assess this technology's viability for incorporation into a large project.]

Much thanks in advance,
Fred

***** The simple XML file (should look awfully familiar) *****

<?xml version="1.0"?>
<COLLECTION>
  <COMIC TITLE="Sandman" NUMBER="62">
    <WRITER>Neil Gaman</WRITER>
    <PENCILLER PAGES="1-9, 18-24">Glyn Dillon</PENCILLER>
    <PENCILLER PAGES="10-17">Charles Vess</PENCILLER>
  </COMIC>
</COLLECTION>

***** Code snippet *****

import xmllib
import sys

try:
    # 'r' opens the file read-only; 'rw' is not a standard mode string
    testFile = open('Comics.xml', 'r')
except IOError, detail:
    print '***IO ERROR> ', detail
    sys.exit(1)    # without the file there is nothing to parse

parser = xmllib.XMLParser()
data = testFile.read()
testFile.close()
parser.feed(data)
parser.close()     # flush any buffered data through the parser

***** The problem *****

None if running Python.

If running from a JPython script ---

Traceback (innermost last):
  <snip my stuff>
  File "...\xmllib.py", line 149, in feed
  File "...\xmllib.py", line 240, in goahead
  File "...\xmllib.py", line 610, in parse_starttag
IndexError: group 7 is undefined

From fdrake@acm.org Fri May 21 20:40:14 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 21 May 1999 15:40:14 -0400 (EDT)
Subject: [XML-SIG] JPython / xmllib issue ????
In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB30@brazil.mcc.com>
References: <7836EC5266D2D211886400A0C94A7A9014BB30@brazil.mcc.com>
Message-ID: <14149.46750.170061.508059@weyr.cnri.reston.va.us>

wask@mcc.com writes:
> If running from a JPython script ---
>
> Traceback (innermost last):
>   <snip my stuff>
>   File "...\xmllib.py", line 149, in feed
>   File "...\xmllib.py", line 240, in goahead
>   File "...\xmllib.py", line 610, in parse_starttag
> IndexError: group 7 is undefined

I get a different error:

Traceback (innermost last):
  File "snippet.py", line 1, in ?
  File "/depot/java/share/JPython-1.0/Lib/xmllib.py", line 60, in ?
  File "/depot/java/share/JPython-1.0/Lib/string.py", line 13, in maketrans
NameError: maketrans not yet implemented in JPython

-Fred

From akuchlin@cnri.reston.va.us Fri May 21 20:48:51 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 21 May 1999 15:48:51 -0400 (EDT)
Subject: [XML-SIG] easySAX
In-Reply-To: <14149.35693.275585.44713@weyr.cnri.reston.va.us>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us>
Message-ID: <14149.46016.398382.370758@amarok.cnri.reston.va.us>

Fred L. Drake writes:
> [Andrew: Are you the only person with write access to the CVS
> repository?

The public CVS tree is mirrored from the CVS tree on my machine at home, so the answer is yes. :)

I certainly think easySAX would be a good addition to the XML package. It would be even better if easySAX was small enough to be added to the Python library. One problem is that you have to choose between using the XML package and just xmllib.py, particularly for applications that aren't aimed at XML-aware users, but simply use XML internally. For example, I'm starting work on a GUI editor for recipes using the DTD previously discussed here, and it's a difficult decision to require neophyte users to install the XML package; I may end up just using xmllib.py to parse input to avoid requiring the installation of another package. (On the other hand, the true fix for this is probably to finish the distutils work and make it much easier to install Python extensions.)

--
A.M. Kuchling    http://starship.python.net/crew/amk/
Considered in its entirety, psychoanalysis won't do. It is an end product, moreover, like a dinosaur or a zeppelin; no better theory can ever be erected on its ruins, which will remain for ever one of the saddest and strangest of all landmarks in the history of twentieth century thought.
    -- Sir Peter Medawar

From fdrake@acm.org Fri May 21 21:02:18 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 21 May 1999 16:02:18 -0400 (EDT)
Subject: [XML-SIG] easySAX
In-Reply-To: <14149.46016.398382.370758@amarok.cnri.reston.va.us>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us>
Message-ID: <14149.48074.473510.593070@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
> The public CVS tree is mirrored from the CVS tree on my
> machine at home, so the answer is yes. :)

Perhaps this can be moved to cvs.python.org or the starship, if you don't object?

> I certainly think easySAX would be a good addition to the XML
> package. It would be even better if easySAX was small enough to be

I understood easySAX to depend on the parsers from the xml package; if it is, then adding it to the standard library won't help unless it includes a driver for xmllib (at which point you may as well just use xmllib).

> added to the Python library. One problem is that you have to choose
> between using the XML package and just xmllib.py, particularly for
> applications that aren't aimed at XML-aware users, but simply use XML

This reminds me: I still want to look at xml.parsers.xmllib to make the interface match that of xmllib. While I may complain about namespaces and the interface for them, proliferating incompatible interfaces won't help the situation. Not sure when I'll have time; I really need to update t1python now that I've updated my Linux installation at home.

-Fred

From paul@prescod.net Fri May 21 22:07:45 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 May 1999 16:07:45 -0500
Subject: [XML-SIG] easySAX
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us>
Message-ID: <3745CB21.FC8A9668@prescod.net>

"Fred L. Drake" wrote:
>
> I understood easySAX to depend on the parsers from the xml package;
> if it is, then adding it to the standard library won't help unless it
> includes a driver for xmllib (at which point you may as well just use
> xmllib).

Why? easySax allows the person to move to another parser if they want. It's an abstraction over xmllib that gives them more freedom of choice. Since I don't believe xmllib is a complete, standards-conformant parser I think that is important.
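Paul's argument that easySAX decouples applications from the parser underneath, and Lars's earlier remark about writing easySAX drivers directly on top of a parser, can be illustrated by driving one application object from two backends. This is a minimal sketch, assuming a deliberately tiny callback set (start, end, text) invented for this example: one driver goes through Python's xml.sax, the other talks to xml.parsers.expat directly. EventRecorder and the two drive_with_* functions are hypothetical names.

```python
import xml.parsers.expat
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder:
    """Application object with a small, parser-independent callback set."""

    def __init__(self):
        self.events = []

    def start(self, name, attrs):
        self.events.append(("start", name, attrs))

    def end(self, name):
        self.events.append(("end", name))

    def text(self, data):
        self.events.append(("text", data))


def drive_with_sax(data, app):
    """Driver 1: go through the standard SAX layer."""

    class Bridge(ContentHandler):
        def startElement(self, name, attrs):
            app.start(name, dict(attrs))

        def endElement(self, name):
            app.end(name)

        def characters(self, content):
            app.text(content)

    xml.sax.parseString(data, Bridge())


def drive_with_expat(data, app):
    """Driver 2: bypass SAX and talk to pyexpat directly."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = app.start  # attributes arrive as a dict
    parser.EndElementHandler = app.end
    parser.CharacterDataHandler = app.text
    parser.Parse(data, True)


DOC = b'<COMIC TITLE="Sandman"><WRITER>Neil Gaman</WRITER></COMIC>'

sax_app, expat_app = EventRecorder(), EventRecorder()
drive_with_sax(DOC, sax_app)
drive_with_expat(DOC, expat_app)
```

The application object never sees which backend is underneath, so a faster driver can be swapped in without touching application code.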
In fact, I'd like to see easySax put on top of sgmlop and promoted as the "standard" Python/XML integration for Python 1.6. Maybe by Python 2 we would move to something larger like expat.

So how about that? easySax and sgmlop in Python 1.6. xmllib's interface is deprecated. Additional parsers and handlers can be downloaded as part of the xml sig distribution?

From fdrake@acm.org Fri May 21 22:55:32 1999
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 21 May 1999 17:55:32 -0400 (EDT)
Subject: [XML-SIG] easySAX
In-Reply-To: <3745CB21.FC8A9668@prescod.net>
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net>
Message-ID: <14149.54868.802029.691711@weyr.cnri.reston.va.us>

Paul Prescod writes:
> Why? easySax allows the person to move to another parser if they want.
> It's an abstraction over xmllib that gives them more freedom of choice.

I'm fine with this.

> In fact, I'd like to see easySax put on top of sgmlop and promoted as the
> "standard" Python/XML integration for Python 1.6. Maybe by Python 2 we

This presents a very real problem: xmllib is already standard and documented, and therefore "in use". Deprecating it is a problem because people will need to update their code for what will probably be a mostly minimal difference (for existing code). Updating what's currently xml.parsers.xmllib to the documented xmllib interface and using that as the standard xmllib would be a big improvement, esp. with sgmlop in the core.

That's not to say an additional API can't be added, but a second event-based interface is not necessarily a good idea. Perhaps a compromise API can be created which extends the xmllib interface with the pi_*(), ppi_*(), and text_*() methods? Extending the existing interface is not a problem as far as I can tell. It can still be highly efficient, especially if we allow handle_data() to be undefined.

> So how about that? easySax and sgmlop in Python 1.6. xmllib's interface is
> deprecated. Additional parsers and handlers can be downloaded as part of
> the xml sig distribution?

As long as the base easySAX can accept arbitrary backends I'm still happy. There's no reason not to allow the xmllib.XMLParser to support arbitrary backends as well, with the default being the current implementation (or something compatible).

-Fred

From gstein@lyra.org Fri May 21 23:41:51 1999
From: gstein@lyra.org (Greg Stein)
Date: Fri, 21 May 1999 15:41:51 -0700
Subject: [XML-SIG] easySAX
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us>
Message-ID: <3745E12F.69CE31E7@lyra.org>

Fred L. Drake wrote:
>...
> [Andrew: Are you the only person with write access to the CVS
> repository? It would be easier to add things for experimental periods
> if it was easier to add to the repository. Whether this would be
> useful depends on just what place you think the omnibus package has.]

I'm set up to provide a read/write CVS repository to multiple projects and people. This could be particularly handy for non-CNRI contributors since CNRI-based repositories have access restrictions.

Right now, I'm running the mod_dav project from my CVS system, but it has already been configured for more projects with per-person per-project access control. The system is also configured for sending email when checkins occur.

I will happily host any Python-related or WebDAV-related project on my CVS server (and other facilities). They can live under the lyra.org, webdav.org, or pythonpros.com domains.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From paul@prescod.net Fri May 21 23:36:02 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 May 1999 17:36:02 -0500
Subject: [XML-SIG] easySAX
References: <wkvhe7le7x.fsf@ifi.uio.no> <37459D2F.7178EC99@prescod.net> <wkd7zuxnls.fsf@ifi.uio.no>
Message-ID: <3745DFD2.C5D08C0E@prescod.net>

Lars Marius Garshol wrote:
>
> Hmmm. Why do you see this as a disadvantage? Part of the reason I did
> it as I did is that I want the user to be able to redefine
> startElement and endElement without messing up the framework. Knowing
> why you don't like the adapter would make the tradeoff easier.

 * a tiny bit of extra overhead,
 * some extra typing,
 * the fact that the user has to explicitly invoke a bridge between easy sax and "real sax",
 * I would prefer using easySax to be more like using regular sax and also more like using SAX in Java,
 * you can't get the default implementation for, e.g. PI
 * it makes easySax somewhat "second class"

Nothing major. Just a vague discomfort.

> (Except I think I prefer text_TITLE and textUnknown.)

Well, English isn't your native language. :) No, actually that's fine with me.

From paul@prescod.net Fri May 21 23:48:37 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 May 1999 17:48:37 -0500
Subject: [XML-SIG] easySAX
References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> <14149.54868.802029.691711@weyr.cnri.reston.va.us>
Message-ID: <3745E2C5.B55CC39D@prescod.net>

"Fred L. Drake" wrote:
>
> This presents a very real problem: xmllib is already standard and
> documented, and therefore "in use". Deprecating it is a problem
> because people will need to update their code for what will probably
> be a mostly minimal difference (for existing code).

I was thinking that deprecating it would just mean that new people would stop using it. As regex says:

"This module is obsolete as of Python version 1.5; it is still being maintained because much existing code still uses it."

> That's not to say an additional API can't be added, but a second
> event-based interface is not necessarily a good idea. Perhaps a
> compromise API can be created which extends the xmllib interface with
> the pi_*(), ppi_*(), and text_*() methods? Extending the existing
> interface is not a problem as far as I can tell.

I kind of think that the current interface is too large and complicated already. easySax was going to be something like 6 or 8 callbacks. xmllib is already something like 16 or 17. Another option would be to merge the interfaces but deprecate all but the 6 or 8 *methods*.
handle_charref, handle_entityref, handle_cdata and many others will never be triggered by a sax parser (even sgmlop, if it is talking to xmllib via sax). -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From Bharat" <bharatrs@vsnl.com Sun May 23 00:54:13 1999 From: Bharat" <bharatrs@vsnl.com (Bharat) Date: Sat, 22 May 1999 19:54:13 -0400 Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #297 - 2 msgs References: <199905220503.BAA28568@python.org> Message-ID: <000601bea4ba$60873aa0$1d38c5cb@bharatrs> subscribe ----- Original Message ----- From: <xml-sig-admin@python.org> To: <xml-sig@python.org> Sent: Saturday, May 22, 1999 1:03 AM Subject: XML-SIG digest, Vol 1 #297 - 2 msgs
From tismer@appliedbiometrics.com Sun May 23 17:53:19 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 23 May 1999 18:53:19 +0200 Subject: [XML-SIG] DOM toxml() method References: <Pine.GSO.3.96.990514145940.21814A-100000@saga9.Stanford.EDU> Message-ID: <3748327F.DB8CD080@appliedbiometrics.com> This is a multi-part message in MIME format. --------------71AC2DF7ECB30B2725A67297 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Jeffrey Chang wrote: > > How are people generating nicely-formatted XML from DOM trees? > > Has anyone written a function that will take a DOM tree and will insert > whitespace, where necessary, to generate a pretty XML document? It would > be almost the reverse of utils.strip_whitespace. Well, I did a little on this a while ago. But don't ask me about its current state, had a lot of other projects meanwhile... ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home --------------71AC2DF7ECB30B2725A67297 Content-Type: text/plain; charset=us-ascii; name="indenter.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="indenter.py"

# pretty printer for SAX
# CT990122
# based upon the saxutils.Canonizer code
# V.0.2 support for sgmlop which doesn't give ignorableWhitespace info

from xml.sax import saxexts, saxlib, saxutils
import string, sys

class Indenter(saxlib.HandlerBase):
    "A SAX document handler that produces indented XML output."

    def __init__(self,writer=sys.stdout, indent=2):
        self.elem_level=0
        self.writer=writer
        self.indent=indent
        self.last_level=-1
        self.buffer = "" # lazy buffer for whitespace stripping

    def processingInstruction (self,target, remainder):
        #if not target=="xml":
        self.writer.write("<?"+target+" "+remainder+"?>\n")

    def startElement(self,name,amap):
        if self.buffer:
            self.write_buffer()
        self.writer.write("\n"+self.indent*self.elem_level*" "+"<"+name)
        a_names=amap.keys()
        a_names.sort()
        for a_name in a_names:
            self.writer.write(" "+a_name+"=\"")
            self.write_data(amap[a_name], 1)
            self.writer.write("\"")
        self.writer.write(">")
        self.last_level = self.elem_level
        self.elem_level=self.elem_level+1

    def endElement(self,name):
        if self.buffer:
            self.write_buffer()
        self.elem_level=self.elem_level-1
        if self.last_level < self.elem_level:
            # the element had children: put the end tag on its own line
            self.writer.write("\n"+self.indent*self.elem_level*" "+"</"+name+">")
        else:
            self.writer.write("</"+name+">")
        self.last_level = -1

    def ignorableWhitespace(self,data,start_ix,length):
        # we drop white space here.
        # self.characters(data,start_ix,length)
        pass

    def characters(self,data,start_ix,length):
        if self.elem_level>0:
            self.put_buffer(data[start_ix:start_ix+length])

    def put_buffer(self, txt):
        self.buffer = self.buffer+txt

    def write_buffer(self):
        if self.buffer:
            self.write_data(string.strip(self.buffer))
            self.buffer = ""

    def write_data(self,data, quotes=0):
        "Writes datachars to writer."
        data=string.replace(data,"&","&amp;")
        data=string.replace(data,"<","&lt;")
        if quotes:
            data=string.replace(data,"\"","&quot;")
        data=string.replace(data,">","&gt;")
        self.writer.write(data)

    def endDocument(self):
        self.write_buffer()
        self.writer.write("\n")
        try:
            pass #self.writer.close()
        except AttributeError:
            pass # It's OK, if the method isn't there we probably don't need it

"""
Example to format a DOM:
>>> i=Indenter()
>>> p=saxexts.make_parser()
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(cStringIO.StringIO(dom.toxml()))

Example to format a file to a file, with sgmlop as parser:
>>> f=open(r'd:\tmp\test.xml',"w")
>>> i=Indenter(f)
>>> p=saxexts.make_parser("xml.sax.drivers.drv_sgmlop")
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(r"h:\pns\projekte\srz\roteli\birgit\sgml\praep.sgm.umgebrochen.xml")
>>> f.close()
"""

# speed comparison:
# a very minimalistic parser which just finds tags.

def indent(infile, outfile=sys.stdout, indent=2):
    split = string.split
    strip = string.strip
    join = string.join
    if type(infile)==type(""):
        txt = infile
    else:
        txt = infile.read()
    lis = split(txt, "<")
    level = 0
    lastl = -1
    try:
        txt = strip(lis[0])
        p = 1
        while 1:
            parts = split(lis[p], ">")
            if len(parts) > 2:
                # a ">" occurred inside the tag: rejoin all but the trailing text
                parts = [join(parts[:-1], ">"), parts[-1]]
            if parts[0][:1] != "/":
                # assume start tag
                outfile.write(strip(txt)+"\n"+indent*level*" "+"<"+parts[0]+">")
                txt = parts[1]
                lastl = level
                if parts[0][-1] not in "/?": # not an empty tag or PI?
                    level=level+1
            else:
                outfile.write(strip(txt))
                txt = parts[1]
                level=level-1
                if lastl < level:
                    outfile.write("\n"+indent*level*" "+"<"+parts[0]+">")
                else:
                    outfile.write("<"+parts[0]+">")
                lastl = -1
            p = p + 1
    except IndexError:
        pass
    outfile.write(txt)

--------------71AC2DF7ECB30B2725A67297-- From akuchlin@cnri.reston.va.us Mon May 24 02:33:58 1999 From: akuchlin@cnri.reston.va.us (A.M.
Kuchling) Date: Sun, 23 May 1999 21:33:58 -0400 Subject: [XML-SIG] easySAX In-Reply-To: <3745E12F.69CE31E7@lyra.org> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <3745E12F.69CE31E7@lyra.org> Message-ID: <199905240133.VAA28025@mira.erols.com> Greg Stein writes: > I'm set up to provide a read/write CVS repository to multiple projects > and people. This could be particularly handy for non-CNRI contributors > since CNRI-based repositories have access restrictions. Ah! Having more people with commit privileges would be excellent, and would probably speed development, since people could keep their own modules up to date. We can discuss the administrative details of setting this up in private e-mail. -- A.M. Kuchling http://starship.python.net/crew/amk/ Those who will not reason / Perish in the act: / Those who will not act / Perish for that reason. -- W.H. Auden, "Shorts" From Fred L. Drake, Jr." <fdrake@acm.org Mon May 24 18:00:21 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Mon, 24 May 1999 13:00:21 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <3745E2C5.B55CC39D@prescod.net> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> <14149.54868.802029.691711@weyr.cnri.reston.va.us> <3745E2C5.B55CC39D@prescod.net> Message-ID: <14153.34213.811521.224833@weyr.cnri.reston.va.us> Paul Prescod writes: > I was thinking that deprecating it would just mean that new people would > stop using it. 
As regex says: "This module is obsolete as of Python This could be done easily enough (technically; this would still have to pass through Guido, but I think he's open to the SIG's suggestions on this stuff). > I kind of think that the current interface is too large and complicated > already. easySax was going to be something like 6 or 8 callbacks. xmllib > is already something like 16 or 17. There are a lot, but I'm not sure it's a huge problem. I don't know why there's setnomoretags(), and I'd expect setliteral() and translate_references() should be internal and undocumented. I'd also think the default behavior of handle_cdata() should be to pass the data along to handle_data(), but that's a separate issue. > Another option would be to merge the interfaces but deprecate all but the > 6 or 8 *methods*. handle_charref, handle_entityref, handle_cdata and many I'd be interested in seeing a specific synopsis for your simplified interface; perhaps just a class declaration with docstrings? -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From jack@oratrix.nl Mon May 24 20:46:31 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 24 May 1999 21:46:31 +0200 Subject: [XML-SIG] easySAX In-Reply-To: Message by "Andrew M. Kuchling" <akuchlin@cnri.reston.va.us> , Fri, 21 May 1999 15:48:51 -0400 (EDT) , <14149.46016.398382.370758@amarok.cnri.reston.va.us> Message-ID: <19990524194636.E6BFCDDE08@oratrix.oratrix.nl> Recently, "Andrew M. Kuchling" <akuchlin@cnri.reston.va.us> said: > I certainly think easySAX would be a good addition to the XML > package. It would be even better if easySAX was small enough to be > added to the Python library. Great idea! 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Mon May 24 21:13:19 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 24 May 1999 22:13:19 +0200 Subject: [XML-SIG] easySAX In-Reply-To: Message by Jack Jansen <jack@oratrix.nl> , Mon, 24 May 1999 21:46:31 +0200 , <19990524194636.E6BFCDDE08@oratrix.oratrix.nl> Message-ID: <19990524201324.137B7DDE08@oratrix.oratrix.nl> Recently, Jack Jansen <jack@oratrix.nl> said: > Great idea! Hmm, that wasn't very informative. I think I need to learn to (a) read all messages on the subject and (b) then write meaningful messages. Let me try again:-) What I would like very much is a working easySAX interface in the core distribution, set up to use the existing xmllib (which, in turn, would be marked deprecated in the manual). Loading the whole xml suite would make the other parsers available to easySAX, thereby allowing an easy upgrade path to more functionality or faster parsers or whatever. And, of course, real power users could then switch from the easysax interface to the full interface.
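The easySAX idea discussed in this thread (per-element start_*/end_*/text_*/pi_* methods layered on an ordinary SAX handler) can be sketched on top of today's standard-library xml.sax module, whose ContentHandler uses startElement(name, attrs) and characters(content) rather than the 1999 characters(data, start, length) signature. This is a minimal sketch, not the proposed easySAX API: the class name EasyHandler, the exact method-name conventions and the textUnknown(name, content) fallback signature are assumptions modelled on names mentioned in the thread.

```python
import xml.sax
import xml.sax.handler


class EasyHandler(xml.sax.handler.ContentHandler):
    """Hypothetical easySAX-style base class.

    Routes generic SAX events to per-name methods (start_NAME, end_NAME,
    text_NAME, pi_TARGET) when a subclass defines them.
    """

    def __init__(self):
        super().__init__()
        self._open = []  # stack of open element names, used to route text

    def _call(self, prefix, name, *args):
        # getattr also finds names such as "my-tag" that cannot be written
        # as a def; returns True if a matching method was found and called.
        method = getattr(self, prefix + name, None)
        if method is None:
            return False
        method(*args)
        return True

    def startElement(self, name, attrs):
        self._open.append(name)
        self._call("start_", name, dict(attrs))

    def endElement(self, name):
        self._open.pop()
        self._call("end_", name)

    def characters(self, content):
        if not self._open:
            return
        current = self._open[-1]
        if not self._call("text_", current, content):
            self.textUnknown(current, content)

    def processingInstruction(self, target, data):
        self._call("pi_", target, data)

    def textUnknown(self, name, content):
        """Fallback for text inside elements without a text_NAME method."""
        pass


class TitlePrinter(EasyHandler):
    def __init__(self):
        super().__init__()
        self.titles = []

    def text_TITLE(self, text):
        # Text may arrive in several chunks, so collect rather than print.
        self.titles.append(text)


handler = TitlePrinter()
xml.sax.parseString(b"<DOC><TITLE>Sandman</TITLE><P>x</P></DOC>", handler)
print("".join(handler.titles))  # prints Sandman
```

The subclass only names the elements it cares about; everything else falls through to the defaults, which is the convenience the easySAX proposal was after.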
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From paul@prescod.net Tue May 25 14:23:08 1999 From: paul@prescod.net (Paul Prescod) Date: Tue, 25 May 1999 08:23:08 -0500 Subject: [XML-SIG] xmllib and easysax References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> <14149.54868.802029.691711@weyr.cnri.reston.va.us> <3745E2C5.B55CC39D@prescod.net> <14153.34213.811521.224833@weyr.cnri.reston.va.us> Message-ID: <374AA43C.D28C69D4@prescod.net> Fred asked me to outline an easysax compliant xmllib extension. Well, xmllib has only one export: xmllib.XMLParser. So my idea is that all we need to do is either build on that or deprecate it. Option 1: Build on it: We can extend xmllib.XMLParser (and sgmlop!) to be SAX-compliant parsers just by adding setDocumentHandler, etc. When XMLParser.parse() is called, it would behave just like a SAX parser. If setDocumentHandler is never called then the default document handler would be an undocumented helper class that would redirect the events BACK to the xmllib.XMLParser (because xmllib.XMLParser plays the roles of both parser and event handler). All of the non-SAX methods of xmllib.XMLParser would be deprecated. Option 2: Deprecate it: Maybe it is better to deprecate all of xmllib.XMLParser instead of deprecating individual methods. If we deprecated it we would replace it with an xmllib.Parser (note the shorter name) that was SAX compliant.
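Option 1 above (an xmllib-style parser that also accepts a SAX document handler and, when none is set, routes events back to itself through a helper class) can be sketched as a toy. This is only an illustration under stated assumptions: SaxAwareXMLParser and _RedirectToParser are hypothetical names, the callbacks unknown_starttag, unknown_endtag and handle_data are modelled on callbacks of the same names in the old xmllib.XMLParser, and the actual parsing is delegated to today's xml.sax.parseString so the sketch runs.

```python
import xml.sax
import xml.sax.handler


class _RedirectToParser(xml.sax.handler.ContentHandler):
    """Hypothetical undocumented default handler: forwards SAX events back
    to the parser object, which keeps xmllib's dual parser/handler role."""

    def __init__(self, parser):
        super().__init__()
        self._parser = parser

    def startElement(self, name, attrs):
        self._parser.unknown_starttag(name, dict(attrs))

    def endElement(self, name):
        self._parser.unknown_endtag(name)

    def characters(self, content):
        self._parser.handle_data(content)


class SaxAwareXMLParser:
    """Toy xmllib-style parser that also accepts a SAX document handler."""

    def __init__(self):
        # With no explicit handler, events come back to this object.
        self._handler = _RedirectToParser(self)

    def setDocumentHandler(self, handler):
        # SAX-style usage: events go to a separate handler object instead.
        self._handler = handler

    def parse(self, data):
        # data is a bytes object holding the whole document.
        xml.sax.parseString(data, self._handler)

    # xmllib-style callbacks; subclasses override the ones they need.
    def unknown_starttag(self, tag, attributes):
        pass

    def unknown_endtag(self, tag):
        pass

    def handle_data(self, data):
        pass


class TagLister(SaxAwareXMLParser):
    """Old-style usage: subclass the parser and override its callbacks."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def unknown_starttag(self, tag, attributes):
        self.tags.append(tag)


lister = TagLister()
lister.parse(b"<book><chapter/><chapter/></book>")
print(lister.tags)  # ['book', 'chapter', 'chapter']
```

Existing subclass-style code keeps working unchanged, while new code can call setDocumentHandler() and treat the same object as a plain SAX parser.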
Other stuff: Now that xmllib has a SAX-compliant parser (one way or the other), we can make a class called xmllib.handler which is a base class that implements all of the SAX methods and redirects start_FOO, text_FOO, pi_FOO, to a subclassed client (if it cares to override them) and also allows overriding of error, fatalerror, warning and so forth. I could live with the default behavior for errors and warnings being to throw an exception, I guess. We wouldn't really need to use the term "easysax" anymore. Easysax was never really an API in that we didn't expect multiple implementations for it. It was just a convenient handler base class (or adapter). I would also like the initialization of the XMLParser and handler classes to be integrated somehow. "Ordinary" sax takes too many steps in my opinion. We need to have a single line of user code that sets ALL of the sax handlers, creates the parser and parses. Perhaps

class handler:
    def Parse( self, streamOrFile, parser=None ):
        parser = parser or XMLParser()
        parser.setThis()
        parser.setThat()
        if isFile( streamOrFile ):
            parser.parse( open( streamOrFile, "rb" ) )
        else:
            parser.parse( streamOrFile )

This would be used like so:

class MyHandler( xmllib.handler ):
    def text_TITLE( self, text ):
        pass #blah

h=MyHandler()
h.Parse( "/myfile.xml" )

One neat thing about this is that we could change the Parse() implementation one day so that it used a parser that knew a lot about easysax and did not (for instance) report text and elements that we aren't going to work with *at all*. If you don't specifically ask for a parser you get the blazingly fast one. But if you want choice you've got it:

h=MyHandler()
h.Parse( "/myfile.xml", MyFavParser() )

-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No.
351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From wask@mcc.com Tue May 25 22:48:45 1999 From: wask@mcc.com (wask@mcc.com) Date: Tue, 25 May 1999 16:48:45 -0500 Subject: [XML-SIG] Error re: the impish "imp" ????????/ Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> I've been learning to use the XML Package from Python. However, when running from JPython, I keep getting errors regarding module imp (no module named, NameError). I've got my PATH set up correctly (I'm in Windows Wonderland). Anyone else experience this problem? --- Fred From l.szyster@ibm.net Wed May 26 10:42:03 1999 From: l.szyster@ibm.net (Laurent Szyster) Date: Wed, 26 May 1999 11:42:03 +0200 Subject: [XML-SIG] Python alternates to XSLT? References: <199905080503.BAA13877@python.org> <37342925.5880AB6A@webone.com.au> Message-ID: <374BC1EB.80F85F5F@ibm.net> Hi Stuart, Stuart Hungerford wrote: > > Maybe I'm getting old, but I've become very frustrated > with using XSL and now XSLT to transform XML documents > into HTML. What tools did you use? What problem did you encounter? I'm eager to learn all this, because I plan to use XSLT as the transformation language for a mapping engine. > I'd love to be able to "call out" to Python's > regular expressions and other abilities while in the > process of transforming the XML. Yes that would be nice. And that is exactly why I used Python to write my first mapper, Ema (an EDIFACT mapper). Laurent From Fred L. Drake, Jr." <fdrake@acm.org Wed May 26 14:54:41 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. 
Drake) Date: Wed, 26 May 1999 09:54:41 -0400 (EDT) Subject: [XML-SIG] Error re: the impish "imp" ????????/ In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> References: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> Message-ID: <14155.64801.383489.780326@weyr.cnri.reston.va.us> wask@mcc.com writes: > I've been learning to use the XML Package from Python. However, when running > from JPython, I keep getting errors regarding module imp (no module named, > NameError). I've got my PATH set up correctly (I'm in Windows Wonderland). Fred, This should go to the JPython list; this question doesn't appear specific to the XML packages. If this is being raised from the XML packages, perhaps you could post a traceback and a code snippet to reproduce the error. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From Fred L. Drake, Jr." <fdrake@acm.org Wed May 26 15:04:13 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 26 May 1999 10:04:13 -0400 (EDT) Subject: [XML-SIG] Location of Windows DLLs in the CVS tree Message-ID: <14155.65373.814138.260300@weyr.cnri.reston.va.us> Is there any reason not to place the Windows DLLs in the appropriate locations in the package tree for Windows? Windows users should then be able to simply unpack the distribution in the right place to use it; no copying or moving of the DLLs would be needed. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From larsga@ifi.uio.no Wed May 26 15:55:57 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 26 May 1999 16:55:57 +0200 Subject: [XML-SIG] Error re: the impish "imp" ????????/ In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> References: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> Message-ID: <wkpv3nj2wy.fsf@ifi.uio.no> * wask@mcc.com | | I've been learning to use the XML Package from Python.
However, when | running from JPython, I keep getting errors regarding module imp (no | module named, NameError). The problem is that imp is not implemented in JPython. I assume you're using SAX, and specifically the parser factories? I know they have this problem in JPython and I seem to recall that I have a patch submitted by Geir Ove Grønmo which does not use imp and thus works in JPython. I'll try to verify that this is so and see if I can make it available to you. I'll also try to think a little about whether we should put out another SAX version while waiting for SAX2. Got to run now, --Lars M. From HCSCCS@prudential.com.my Wed May 26 19:12:06 1999 From: HCSCCS@prudential.com.my (Paul Chung Chee Soong) Date: Wed, 26 May 1999 18:12:06 Subject: [XML-SIG] Python/XML HOWTO Message-ID: <199905261012-64975@prudential.com.my> Hi. I'm Paul from Malaysia and very keen in Python. I know that you're the author of the current Python/XML HOWTO. Thanks very much for coming up for this invaluable information. Now, let's discuss my problem. I have difficulty understanding certain part of the document. For example, I can't execute the "from xml.sax import saxlib, saxexts" coz I don't have the xml.sax module. But anyway, i manage to download those components separately. The earlier code become "import saxlib, saxexts". Did I solve the problem? FYI, I'm having Python 1.52 final release. Another problem that I came across is that in Section 3.1 Starting Out. I don't seem to run the example. This is what I do.. import saxexts if __name__ = '__main__': parser = saxexts.make_parser('drv_xmllib') # 1 dh = FindIssue ('Sandman', '62') # 2 parser.setDocumentHandler(dh) # 3 parser.parseFile('collection.xml') # 4 Now, in the # 1, I got error when I followed your example. Therefore, I include a parameter (a driver) Next in # 4, I got problem again. In your example, your parameter is a file. What is the 'file' represents? I thought it was a xml file but it isn't right??
I'll appreciate your advice in any format possible. Thanks a lot. Let's keep Python ALIVE!! Sincerely, Paul From larsga@ifi.uio.no Thu May 27 07:41:06 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 27 May 1999 08:41:06 +0200 Subject: [XML-SIG] Python/XML HOWTO In-Reply-To: <199905261012-64975@prudential.com.my> References: <199905261012-64975@prudential.com.my> Message-ID: <wkiu9fggl9.fsf@ifi.uio.no> * Paul Chung Chee Soong | | Now, let's discuss my problem. I have difficulty understanding | certain part of the document. For example, I can't execute the "from | xml.sax import saxlib, saxexts" coz I don't have the xml.sax | module. But anyway, i manage to download those components | separately. The earlier code become "import saxlib, saxexts". Did I | solve the problem? Probably not. :) This is something that seems to confuse many newbies, so I'll try to explain. If you do like this:

C:\Mine dokumenter>python
Python 1.5.2c1 (#0, Mar 12 1999, 10:55:39) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import sys
>>> print sys.path

Python will print a list of directories. This is the list of directories in which it looks for modules you import. When you unzip the XML-SIG package (or the saxlib.zip file) somewhere on your disk (let's say c:\foo) it creates an xml directory (c:\foo\xml), below that a sax directory and below that again a drivers directory. What you need to do is to ensure that c:\foo is in the sys.path list. If it isn't you'll need to either add c:\foo to the PYTHONPATH environment variable or add it to the HKEY_LOCAL_MACHINE\Software\Python\1.5.2\PythonPath registry key. I hope that explained it for you? (If it did I think I'll brush it up a little and put it on the SAX pages as installation instructions.) | Another problem that I came across is that in Section 3.1 Starting | Out. I don't seem to run the example. This is what I do..
| | import saxexts | if __name__ = '__main__': | parser = saxexts.make_parser('drv_xmllib') # 1 | dh = FindIssue ('Sandman', '62') # 2 | parser.setDocumentHandler(dh) # 3 | parser.parseFile('collection.xml') # 4 | | Now, in the # 1, I got error when I followed your example. This is very likely because the parser factory has a list of drivers that looks like 'xml.sax.drivers.drv_xmllib'. So if you solve the package problem above you shouldn't need to hard-code the driver package name. (Another thing is that other programs that use SAX won't work unless they can find SAX where they expect it.) | Next in # 4, I got problem again. In your example, your parameter is | a file. What is the 'file' represents? I thought it was a xml file | but it isn't right?? It's the name of an XML file. If you want to push an XML document as a string ('<root><title>My title

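Lars's answer distinguishes parsing from a file name from pushing a document held in a string. In today's standard library the same split exists: xml.sax.parse() accepts a file name or a binary stream, while xml.sax.parseString() accepts the document itself as bytes. The sketch below is illustrative only; TitleCollector and collect_titles are hypothetical names, and the toy documents are not the HOWTO's collection.xml.

```python
import io
import xml.sax
import xml.sax.handler


class TitleCollector(xml.sax.handler.ContentHandler):
    """Collects the text of every <title> element (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may be delivered in several chunks; collect them all.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(source):
    """source: document bytes, a file name, or a binary file object."""
    handler = TitleCollector()
    if isinstance(source, bytes):
        xml.sax.parseString(source, handler)  # document held in memory
    else:
        xml.sax.parse(source, handler)        # file name or open stream
    return handler.titles


print(collect_titles(b"<root><title>My title</title></root>"))
print(collect_titles(io.BytesIO(b"<root><title>From a stream</title></root>")))
```

Calling collect_titles("collection.xml") would take the file-name path, which corresponds to the parseFile('collection.xml') call in the question.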
From fredrik@pythonware.com Sat May 1 13:22:26 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 1 May 1999 14:22:26 +0200 Subject: [XML-SIG] Another SAX Suggestion References: <3.0.5.32.19990430085119.00ad0c50@corp> Message-ID: <011f01be93cd$4ddd2570$f29b12c2@pythonware.com> Walter Underwood wrote: > At 11:19 AM 4/28/99 -0500, Paul Prescod wrote: > >I would like to suggest the default error handlers do something useful: > > > > def error(self, exception): > > "Handle a recoverable error." > > sys.stderr.write( "Error: "+ exception ) > > Since we write servers, we consider output to stderr from a library > to be a defect. I strongly agree. libraries shouldn't print stuff unless you explicitly tell them to. use exceptions instead. > Anybody else remember "RANGE ERROR" from the > C math library? anybody else stumbled upon the "Enter username/ Enter password" stuff down in urllib.py? From paul@prescod.net Sat May 1 19:03:15 1999 From: paul@prescod.net (Paul Prescod) Date: Sat, 01 May 1999 13:03:15 -0500 Subject: [XML-SIG] Another SAX Suggestion References: <3.0.5.32.19990430085119.00ad0c50@corp> <011f01be93cd$4ddd2570$f29b12c2@pythonware.com> Message-ID: <372B41E3.E358ABF5@prescod.net> Fredrik Lundh wrote: > > I strongly agree. libraries shouldn't print stuff unless > you explicitly tell them to. use exceptions instead. Exceptions are maybe okay for fatal error messages, but what about non-fatal errors and warnings? > > Anybody else remember "RANGE ERROR" from the > > C math library? That was a very, very different situation. See below. > anybody else stumbled upon the "Enter username/ Enter password" stuff down in urllib.py? I think that halting execution pending user input is a little bit more severe than outputting text to stderr! Anyhow, in the days of the ill-fated Taligent they made a distinction between a "library", an "application" and an "application framework." Parsers are definitely libraries.
HandlerBase is an application framework: you use it *only* to get its default behavior. If you don't want the behavior it gives you then you either don't use it or override the appropriate methods to make it do what you want. How about if we give HandlerBase a single initialization parameter which is the stream to output to. If you are a server or GUI app writer all you need to do is pass in a null output stream to get the error handling you want. Anyhow, I find it hard to believe that either Infoseek or PythonWare is going to write a search engine or graphics renderer and forget to think about where error messages should go! More likely it is the hordes of simple filter writers who will forget -- so the default should be optimized for them. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Microsoft spokesman Ian Hatton admits that the Linux system would have performed better had it been tuned." "Future press releases on the issue will clearly state that the research was sponsored by Microsoft." http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp From shecter@darmstadt.gmd.de Mon May 3 10:43:12 1999 From: shecter@darmstadt.gmd.de (Robb Shecter) Date: Mon, 03 May 1999 11:43:12 +0200 Subject: [XML-SIG] Validation question Message-ID: <372D6FB0.BB258D24@darmstadt.gmd.de> Hi, I'd like to set up an xml validation service. I've done all my XML programming in Java so far, but would like to try Python out. I'd like to have an object with a method that takes two parameters; a dtd and an xml document, and returns true or false depending whether the document conforms to the dtd. For my purposes (making an XML-based middleware system), this could work with URLs pointing to the documents, if some caching is done. Typically, many documents would be checked against a single dtd. (I'm playing with using XML to specify server interfaces.) Thanks for any hints! 
- Robb From paul@prescod.net Mon May 3 14:31:54 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 03 May 1999 08:31:54 -0500 Subject: [XML-SIG] qp_api and the DOM Message-ID: <372DA54A.DCCD4F88@prescod.net> Greg's absence gave me a little bit of time to catch up on work (of course not enough :) I think that what Greg and I are trying to accomplish may be so divergent that trying to unify the efforts may be a waste of time. Here's how I interpret the differences: I think that Greg wants to make a *module* for his own use and the use of others with his concerns. I want to make an *API* which explicitly permits multiple, interoperable implementations -- including wrappers over existing C libraries like xml4c2 and Java libraries. I think that Greg's ordering of priorities is performance, simplicity, familiarity, completeness and interoperability. It seems to me that a module will meet this ordering of priorities better than a standardized API. My ordering is completeness, interoperability, familiarity, simplicity, and performance. It is because I rank interoperability so highly that I want to make an API instead of a module. I rank completeness highly because I think that my API should be applicable to as wide a range of software as possible. Greg doesn't mind restricting his API to just a "data-centric" subset. I think that Greg wants his library to be part of the Python XML-SIG's distribution, I'm okay with that. I think that at least one implementation of the Python DOM API should be part of that distribution also. I also think that some library conforming to the DOM API should eventually be part of the Python standard library -- perhaps partially coded in C. A DOM implementation is appropriate for the Python standard library because it is designed to be general and familiar. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Diplomatic term: "We had a frank exchange of views." 
Translation: Negotiations stopped just short of shouting and table-banging. (Brill's Content, Apr. 1999) From Fred L. Drake, Jr." Is anyone working on making xml.parsers.xmllib conform to the new xmllib interface? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From larsga@ifi.uio.no Tue May 4 11:32:11 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 04 May 1999 12:32:11 +0200 Subject: [XML-SIG] ElementType.content_model interpretation of '*' Message-ID: * Jeffrey Chang | | I am using xmlproc.dtdparser.DTDParser and xmlproc.xmldtd.CompleteDTD to | parse and store the contents of a DTD file (xmlproc v0.60). I have been working on the same (in order to provide DTD caching), and it would be interesting to see what you've done with this. I used pickle and achieved acceptable (although not really impressive) speedups, but ran into trouble with the internal subset handling. I'm still thinking about what to do with this. Some improvements to the current DTD handling have emerged, though, and it is possible that there will be more. | | | [...] | >>> d.elems['test'].content_model | { # I've reformatted this for readability | 'start': 1L, | 1L: [(6L, 'a')], | 4L: [(4L, 'b')], | 6L: [(4L, 'b')], | 'final': 4L | } | | According to this content model, 'test' must contain 1 'a' and at | least 1 'b' before reaching the final state. So it may seem, but it's not actually the case. This content model is created by converting a non-deterministic automaton into a deterministic one, so state 1 is state 1 in the original NDA, 4 is state 4, while state 6 is the combination of state 2 and state 4 in the original NDA. If you look at the contents of the final_state method you'll see that it does return self.content_model["final"] & state which means that after seeing one a and no b's it will do return 4L & 6L and so return 4, which evaluates as true. | I [...]
would have expected a content model more like: | 'start': 1L, | 1L: [(4L, 'a')], | 4L: [(4L, 'b')], | 'final': 4L I agree that this would have been more optimal. I should probably have a closer look at my automaton-generating code (which is sub-optimal in some other respects as well) and see if I can improve it. I don't expect this to happen any time soon, though, since the only things it affects are memory consumption and DTD loading time (from the not-yet-released cache). I hope this resolved your question. I'm sorry for the late reply, but I've been away for two weeks due to the PAJAVA and XML Europe conferences. --Lars M. From gstein@lyra.org Tue May 4 11:50:32 1999 From: gstein@lyra.org (Greg Stein) Date: Tue, 04 May 1999 03:50:32 -0700 Subject: [XML-SIG] qp_api and the DOM References: <372DA54A.DCCD4F88@prescod.net> Message-ID: <372ED0F8.39D612D5@lyra.org> I generally dislike responses like the one I'm about to make, but I'm leaving town again for a few days :-), so... --> yes, absolutely, right-on-the-money, excellent. I believe that I was trying to state some of this in the dichotomy I proposed, but (as you pointed out) that was flawed. IMO, you've analyzed it much better. More on Friday or so... thx! -g Paul Prescod wrote: > > Greg's absence gave me a little bit of time to catch up on work (of course > not enough :) > > I think that what Greg and I are trying to accomplish may be so divergent > that trying to unify the efforts may be a waste of time. Here's how I > interpret the differences: > > I think that Greg wants to make a *module* for his own use and the use of > others with his concerns. > > I want to make an *API* which explicitly permits multiple, interoperable > implementations -- including wrappers over existing C libraries like > xml4c2 and Java libraries. > > I think that Greg's ordering of priorities is performance, simplicity, > familiarity, completeness and interoperability. 
It seems to me that a > module will meet this ordering of priorities better than a standardized > API. > > My ordering is completeness, interoperability, familiarity, simplicity, > and performance. It is because I rank interoperability so highly that I > want to make an API instead of a module. > > I rank completeness highly because I think that my API should be > applicable to as wide a range of software as possible. Greg doesn't mind > restricting his API to just a "data-centric" subset. > > I think that Greg wants his library to be part of the Python XML-SIG's > distribution, I'm okay with that. I think that at least one implementation > of the Python DOM API should be part of that distribution also. > > I also think that some library conforming to the DOM API should eventually > be part of the Python standard library -- perhaps partially coded in C. A > DOM implementation is appropriate for the Python standard library because > it is designed to be general and familiar. -- Greg Stein, http://www.lyra.org/ From sean@digitome.com Tue May 4 14:19:39 1999 From: sean@digitome.com (Sean Mc Grath) Date: Tue, 04 May 1999 14:19:39 +0100 Subject: [XML-SIG] DOM Considered Harmful :-) In-Reply-To: <37219B0E.3123A201@lyra.org> Message-ID: <3.0.6.32.19990504141939.00d2e290@gpo.iol.ie> [Greg Stein] > >2) the close() method and parent/sibling relationships > >Adding parent/sibling relationships introduces loops unless you use >proxies or introduce a close() method (if there is another way, then I'd >like to learn it). In LumberJack, a collection of nodes (LJNode objects) are associated with a tree object known as an LJTree. 
When the reference count of an LJTree shrinks to zero LumberJack walks the associated, doubly linked LJnodes and breaks the double links:-

    def __del__(self):
        self.Home()  # Move current position to root of tree
        from LJUtils import DescendantsInclusive
        for item in DescendantsInclusive(self):  # iterate a node list
            item.node.North = None  # break all links
            item.node.South = None
            item.node.East = None
            item.node.West = None

From larsga@ifi.uio.no Tue May 4 15:02:44 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 04 May 1999 16:02:44 +0200 Subject: [XML-SIG] Validation question In-Reply-To: <372D6FB0.BB258D24@darmstadt.gmd.de> References: <372D6FB0.BB258D24@darmstadt.gmd.de> Message-ID: * Robb Shecter | | I'd like to set up an xml validation service. I've done all my XML | programming in Java so far, but would like to try Python out. Cool! You are certainly most welcome here. :) | I'd like to have an object with a method that takes two parameters; | a dtd and an xml document, and returns true or false depending | whether the document conforms to the dtd. Using xmlproc you can easily do this. However, what do you do if the document contains a DOCTYPE declaration that points to something else? xmlproc would let you ignore that and use the same DTD anyway, but depending on your application this might not be a very XML-like approach. | For my purposes (making an XML-based middleware system), this could | work with URLs pointing to the documents, if some caching is done. | Typically, many documents would be checked against a single dtd. I have nearly all the code needed to enable this with xmlproc, the only part missing being the handling of the internal subset. If you want to outlaw the use of internal subsets in the documents you validate I can easily release an xmlproc 0.62 which supports DTD caching in this case. (It could later be extended to support it in all cases, without changing the externally-visible APIs.)
It's relatively easy to support the internal subset as well, the problem is finding an approach that is both fast and clean and works equally well both for in-memory caching within a process and on-disk caching between processes. --Lars M. From wunder@infoseek.com Tue May 4 16:43:52 1999 From: wunder@infoseek.com (Walter Underwood) Date: Tue, 04 May 1999 08:43:52 -0700 Subject: [XML-SIG] Another SAX Suggestion In-Reply-To: <372B41E3.E358ABF5@prescod.net> References: <3.0.5.32.19990430085119.00ad0c50@corp> <011f01be93cd$4ddd2570$f29b12c2@pythonware.com> Message-ID: <3.0.5.32.19990504084352.00a90c90@corp> At 01:03 PM 5/1/99 -0500, Paul Prescod wrote: > >How about if we give HandlerBase a single initialization parameter which >is the stream to output to. If you are a server or GUI app writer all you >need to do is pass in a null output stream to get the error handling you >want. I would slightly prefer a base that did nothing, with a supplied subclass that wrote to stderr. The stream parameter wouldn't be useful in our code, since our log interface isn't a stream. >Anyhow, I find it hard to believe that either Infoseek or PythonWare is >going to write a search engine or graphics renderer and forget to think >about where error messages should go! We just don't like modifying the guts of libraries to get there. > [...] More likely it is the hordes of >simple filter writers who will forget -- so the default should be >optimized for them. We're on the same page here, but I'd rather see the stderr stuff used in the sample code. Most of those filters will be a thousand lines of code added to the sample code anyway. wunder -- Walter R. 
Underwood wunder@infoseek.com wunder@best.com (home) http://software.infoseek.com/cce/ (my product) http://www.best.com/~wunder/ 1-408-543-6946 From larsga@ifi.uio.no Tue May 4 19:19:22 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 04 May 1999 20:19:22 +0200 Subject: [XML-SIG] Another SAX Suggestion In-Reply-To: <3.0.5.32.19990430085119.00ad0c50@corp> References: <3.0.5.32.19990430085119.00ad0c50@corp> Message-ID: * Walter Underwood | | I wouldn't mind having a stderr error handler provided as part of | the module, with sample code that uses that error handler. We've got it already, except for the sample error code. | Also along this line, does the SAX adaptor for expat catch all | exceptions raised in a handler? So far, what it does is to call parse and if that returns an error code, the error message is resolved and passed to ErrorHandler.fatalError and parsing stops. If this is bad then please let me know and I'll fix it (or accept patches). --Lars M. From larsga@ifi.uio.no Tue May 4 19:20:53 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 04 May 1999 20:20:53 +0200 Subject: [XML-SIG] Another SAX Suggestion In-Reply-To: <3727350E.6B51E1ED@prescod.net> References: <3727350E.6B51E1ED@prescod.net> Message-ID: * Paul Prescod | | I would like to suggest the default error handlers do something useful: | | [...] | | Of course if that's not what a particular implementation wants, they | can override it, but I think that the current lack of behavior is | non-intuitive. Maybe I'm corrupted by working with SGML tools but I | expect the defaults to be as above. It's a good suggestion. Here's what I see as the pros and cons of this.
Pro:
- I've been bitten by forgetting to add an errorhandler for one-off scripts lots and lots of times, and I don't doubt that many others have as well
- in most cases this is very likely what the user wants anyway
- like you say, serious users will have to add something anyway, and ErrorRaiser is in saxutils ready for use
Con:
- it's a departure from standard Java-SAX behaviour
- ErrorPrinter and ErrorRaiser are in saxutils and ready to be plugged in
- forcing people to take a conscious stand on this issue is probably not the worst we could do to them
So far I've limited myself to mentioning ErrorRaiser and ErrorPrinter and telling people to always always always no-matter-what use one of them. So far I count Paul and AMK in favour and Fredrik and Walter against. Personally I don't have an opinion (yet), but if the discussion ends with a 2-2 score I'll consider it a draw and not do anything. However, it's worth pointing out that in any case, altering the behaviour is a question of a single line of Python code (not counting imports), so this is hardly the end of the world no matter which way we go. --Lars M. From larsga@ifi.uio.no Tue May 4 19:24:49 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 04 May 1999 20:24:49 +0200 Subject: [XML-SIG] Another SAX Suggestion In-Reply-To: <372B41E3.E358ABF5@prescod.net> References: <3.0.5.32.19990430085119.00ad0c50@corp> <011f01be93cd$4ddd2570$f29b12c2@pythonware.com> <372B41E3.E358ABF5@prescod.net> Message-ID: * Paul Prescod | | How about if we give HandlerBase a single initialization parameter | which is the stream to output to. If you are a server or GUI app | writer all you need to do is pass in a null output stream to get the | error handling you want. I don't like this approach. I think something like parser.setErrorHandler(saxutils.ErrorIgnorer()) is much to be preferred.
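For comparison, here is a minimal sketch of both options written against the modern standard-library xml.sax package rather than the 1999 PySAX saxutils module. QuietErrorHandler and StreamErrorHandler are hypothetical stand-ins for ErrorIgnorer and ErrorPrinter, and the numeric level scheme (0 = report everything, 1 = skip warnings, 2 = fatal errors only) is an assumption, not documented saxutils behaviour.

```python
import io
import sys
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class QuietErrorHandler(ErrorHandler):
    """Hypothetical ErrorIgnorer variant: drops warnings and recoverable
    errors, but still raises on fatal (well-formedness) errors."""

    def warning(self, exception):
        pass

    def error(self, exception):
        pass

    def fatalError(self, exception):
        raise exception


class StreamErrorHandler(ErrorHandler):
    """Hypothetical ErrorPrinter: writes reports to a caller-chosen stream.

    level 0 reports everything, 1 suppresses warnings, 2 reports only
    fatal errors, which are always re-raised after being written."""

    def __init__(self, out=None, level=0):
        # Resolve sys.stderr at call time so later redirection is honoured.
        self.out = out if out is not None else sys.stderr
        self.level = level

    def _report(self, kind, exception):
        self.out.write("%s: %s\n" % (kind, exception))

    def warning(self, exception):
        if self.level < 1:
            self._report("Warning", exception)

    def error(self, exception):
        if self.level < 2:
            self._report("Error", exception)

    def fatalError(self, exception):
        self._report("Fatal error", exception)
        raise exception


def report(document, level=0):
    """Parse a byte string, capturing error reports in a string buffer.

    Returns (ok, text); ok is False if a fatal error was raised."""
    buffer = io.StringIO()
    handler = StreamErrorHandler(buffer, level)
    try:
        xml.sax.parseString(document, ContentHandler(), handler)
        return True, buffer.getvalue()
    except xml.sax.SAXParseException:
        return False, buffer.getvalue()
```

Either handler is installed with a single call, e.g. parser.setErrorHandler(StreamErrorHandler(level=1)) on a reader obtained from xml.sax.make_parser(). For reference, today's xml.sax.handler.ErrorHandler base class itself raises on error() and fatalError() and prints warnings.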
We could then keep the current ErrorPrinter, which takes both an output stream _and_ a numeric level parameter so that warnings and/or errors can be ignored. | More likely it is the hordes of simple filter writers who will | forget -- so the default should be optimized for them. This should definitely be considered when we define the filter interface for SAX 2 (or layered on top of it). --Lars M. From Fred L. Drake, Jr." References: <3727350E.6B51E1ED@prescod.net> Message-ID: <14127.15227.362057.867378@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > So far I count Paul and AMK in favour and Fredrik and Walter against. > Personally I don't have an opinion (yet), but if the discussion ends > with a 2-2 score I'll consider it a draw and not do anything. Lars, Count me as wanting ErrorPrinter to be the default behavior. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein@lyra.org Tue May 4 21:08:28 1999 From: gstein@lyra.org (Greg Stein) Date: Tue, 04 May 1999 13:08:28 -0700 Subject: [XML-SIG] DOM Considered Harmful :-) References: <3.0.6.32.19990504141939.00d2e290@gpo.iol.ie> <372F5396.33F92E54@lyra.org> Message-ID: <372F53BC.521F1F9C@lyra.org> Crap. Hit Reply rather than Reply All... :-) Greg Stein wrote: > > Sean Mc Grath wrote: > > > > [Greg Stein] > > > > > >2) the close() method and parent/sibling relationships > > > > > >Adding parent/sibling relationships introduces loops unless you use > > >proxies or introduce a close() method (if there is another way, then I'd > > >like to learn it). > > > > In LumberJack, a collection of nodes (LJNode objects) are > > associated with a tree object known as an LJTree. When the > > reference count of an LJTree shrinks to zero LumberJack > > walks the associated, doubly linked LJnodes and > > breaks the double links:- > > How could the tree ever hit a refcount of zero? If the tree refers to > the Nodes and the Nodes to the tree, then you have a reference loop.
> > Or are you maintaining a global list of Node references? (and filtering > for those that refer to the tree) Oh. That wouldn't work, since that > presumes the Nodes are still referencing the tree, which is not possible > (since the tree has zero refs in this case) > > Regardless, while you may have solved the loops problem, this is a > rather complex solution. I had stated in my original note that close() > caused one of two problems: loops or complexity. It sounds like you're > hiding some of the complexity from users, but it is still there. I'd > like to believe that we can provide XML parsing/consumption without the > complexity. > > Cheers, > -g > > -- > Greg Stein, http://www.lyra.org/ -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Tue May 4 23:20:29 1999 From: paul@prescod.net (Paul Prescod) Date: Tue, 04 May 1999 17:20:29 -0500 Subject: [XML-SIG] Another SAX Suggestion References: <3.0.5.32.19990430085119.00ad0c50@corp> <011f01be93cd$4ddd2570$f29b12c2@pythonware.com> <372B41E3.E358ABF5@prescod.net> Message-ID: <372F72AD.9A0890B6@prescod.net> Lars Marius Garshol wrote: > > | More likely it is the hordes of simple filter writers who will > | forget -- so the default should be optimized for them. > > This should definitely be considered when we define the filter > interface for SAX 2 (or layered on top of it). I was speaking in the more generic sense of someone who takes XML data in and wants to produce something else out. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco The first three Noble Truths of Python: All that is not Python is suffering. The origin of suffering lies in the use of not-Python. The cessation of suffering can be achieved by not using not-Python.
http://www.pauahtun.org/4nobletruthsofpython.html From paul@prescod.net Tue May 4 23:21:47 1999 From: paul@prescod.net (Paul Prescod) Date: Tue, 04 May 1999 17:21:47 -0500 Subject: [XML-SIG] Another SAX Suggestion References: <3727350E.6B51E1ED@prescod.net> Message-ID: <372F72FB.25F3F91B@prescod.net> Lars Marius Garshol wrote: > > - forcing people to take a conscious stand on this issue is probably > not the worst we could do to them If the default behavior motion fails to carry then I would like the fallback position to be some way of forcing them to take a stand. For instance you could throw an ExceptionHandlerNotDefined exception. I think that the fact that this has bitten both you and me is important. I have a feeling that Walter and Fredrik are speaking more from a general design perspective than from experience in this particular case. Usually I would agree but HandlerBase is not just a library -- it is a helper framework. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco The first three Noble Truths of Python: All that is not Python is suffering. The origin of suffering lies in the use of not-Python. The cessation of suffering can be achieved by not using not-Python. http://www.pauahtun.org/4nobletruthsofpython.html From paul@prescod.net Tue May 4 23:23:28 1999 From: paul@prescod.net (Paul Prescod) Date: Tue, 04 May 1999 17:23:28 -0500 Subject: [XML-SIG] Another SAX Suggestion References: <3.0.5.32.19990430085119.00ad0c50@corp> <011f01be93cd$4ddd2570$f29b12c2@pythonware.com> <3.0.5.32.19990504084352.00a90c90@corp> Message-ID: <372F7360.A0464018@prescod.net> Walter Underwood wrote: > > I would slightly prefer a base that did nothing, with a supplied > subclass that wrote to stderr. The stream parameter wouldn't be > useful in our code, since our log interface isn't a stream. Let me go back in my time machine. Presto, the proposal is changed to meet your needs.
Here's how to use the base that does nothing:

    class MyHandler(EntityResolver, DTDHandler, DocumentHandler):
        def warning(self, exception):
            logit(exception)
        def error(self, exception):
            logit(exception)
        def fatalError(self, exception):
            logit(exception)

> We just don't like modifying the guts of libraries to get there. You don't have to modify anything. Just override a method. HandlerBase exists precisely so that you can override its methods. By default it is useless. > We're on the same page here, but I'd rather see the stderr stuff > used in the sample code. Most of those filters will be a thousand > lines of code added to the sample code anyway. A simple SAX app is only 5 lines of code because it depends on HandlerBase as the "reusable sample code." But you are right: if we don't change this then we definitely have to change the test code and demo programs because now they do the wrong thing. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco The first three Noble Truths of Python: All that is not Python is suffering. The origin of suffering lies in the use of not-Python. The cessation of suffering can be achieved by not using not-Python. http://www.pauahtun.org/4nobletruthsofpython.html From larsga@ifi.uio.no Wed May 5 09:29:33 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 05 May 1999 10:29:33 +0200 Subject: [XML-SIG] Py-ish PySax Suggestion #2 In-Reply-To: <3724D3FA.2E589DB6@prescod.net> References: <3724D3FA.2E589DB6@prescod.net> Message-ID: * Paul Prescod | | I would like to suggest that we copy the *mllib start_foo convention for | PySAX. And in other postings you suggest that the error handler do something by default, and that we add an easier-to-use alternative to the characters method. I like all these proposals, but on the other hand I would like to keep SAX as it is and to keep it in step with Java SAX. Also, I think your proposals are way too limited in terms of making SAX easier to use.
I would like to add things like:
- making an element stack (with attributes) available to the application
- making it easier to get the contents of an element
If we could solve all these we would have a SAX interface that would be a lot easier to use, and in fact I've been waiting for somebody to do this job ever since we first defined PySAX. So my suggestion is: why don't we sit down and define an easier-to-use SAX, which we layer on top of the existing one? I think we can just recycle your proposals and use them directly, then add the element stack (which is trivial anyway) and then see what we can do about element content. (I have some ideas, but they need to mature a little.) Does anybody else have something on their SAX wishlist that they'd like to see in this easySAX? --Lars M. From sean@digitome.com Tue May 4 21:41:56 1999 From: sean@digitome.com (Sean Mc Grath) Date: Tue, 04 May 1999 21:41:56 +0100 Subject: [XML-SIG] DOM Considered Harmful :-) In-Reply-To: <372F53BC.521F1F9C@lyra.org> References: <3.0.6.32.19990504141939.00d2e290@gpo.iol.ie> <372F5396.33F92E54@lyra.org> Message-ID: <3.0.6.32.19990504214156.0095ea00@gpo.iol.ie> >> Sean Mc Grath wrote: >> > >> > In LumberJack, a collection of nodes (LJNode objects) are >> > associated with a tree object known as an LJTree. When the >> > reference count of an LJTree shrinks to zero LumberJack >> > walks the associated, doubly linked LJnodes and >> > breaks the double links:- >> >Greg Stein wrote: >> How could the tree ever hit a refcount of zero? If the tree refers to >> the Nodes and the Nodes to the tree, then you have a reference loop. The tree references the root node. There is no link back from the root node to the tree and thus no circular reference. Users of LumberJack think in terms of tree objects - not nodes or node lists. The tree object serves to maintain a "current position" on behalf of the user. It is also the place where the tree-level methods hang - things like cut(), paste() and so on.
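A minimal pure-Python sketch of the arrangement Sean describes, assuming CPython reference counting (so `__del__` runs as soon as the last reference to the tree disappears). Node, Tree and descendants_inclusive are hypothetical names modelled on the North/South/East/West links in the LumberJack `__del__` code quoted earlier; they are not LumberJack's actual classes.

```python
class Node:
    """Doubly linked tree node (hypothetical). North = parent,
    South = first child, East/West = next/previous sibling."""

    def __init__(self, name):
        self.name = name
        self.North = self.South = self.East = self.West = None

    def append(self, child):
        """Attach child as the last child of this node and return it."""
        child.North = self
        if self.South is None:
            self.South = child
        else:
            last = self.South
            while last.East is not None:
                last = last.East
            last.East = child
            child.West = last
        return child


def descendants_inclusive(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    child = node.South
    while child is not None:
        yield from descendants_inclusive(child)
        child = child.East


class Tree:
    """Holds the root node. No node refers back to the Tree, so the Tree
    is never part of a cycle and its refcount can reach zero."""

    def __init__(self, root):
        self.root = root

    def __del__(self):
        # Collect the nodes first, then cut every link so the node-to-node
        # cycles can be reclaimed by plain reference counting.
        for node in list(descendants_inclusive(self.root)):
            node.North = node.South = node.East = node.West = None
```

The last step of a typical use also shows the trade-off Greg raises: code that still holds a node after its tree is gone ends up with a node whose links have been cut.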
One interesting side-effect of the tree/node combo is that with LumberJack you can easily manipulate inter-tree node-lists. A NodeListItem is a combination of tree object and node object so you can build lists of these things that contain bits of lots of different trees. I'm sure some day now I'll think of a really good application for that...:-) From jim@digicool.com Wed May 5 15:54:42 1999 From: jim@digicool.com (Jim Fulton) Date: Wed, 05 May 1999 10:54:42 -0400 Subject: [XML-SIG] DOM Considered Harmful :-) References: <37219B0E.3123A201@lyra.org> Message-ID: <37305BB2.52D4AABF@digicool.com> Greg Stein wrote: > > 2) the close() method and parent/sibling relationships Acquisition (http://www.zope.org/Documentation/Reference/Acquisition) should work very well for this. > Adding parent/sibling relationships introduces loops unless you use > proxies or introduce a close() method (if there is another way, then I'd > like to learn it). Proxies are out for efficiency reasons -- objects get > constructed every time you simply want to peek into the data structure. > While the complexity is (mostly) hidden from the client, it is still > there. You don't end up with simple data structures... instead, you get > a lot of "mechanism" in there to deal with intercepting accesses so that > you can create a proxy to bundle up the necessary data. I guess that acquisition corresponds to a "proxy" approach. We have optimized it heavily so performance is not an issue. Also, acquisition provides a very simple model that adds very little complexity. > A close() type method introduces other problems. If you aren't careful, > then it is easy to leak the entire parse tree. What happens if you pass > a subset of the tree to another subsystem? 
You will have one of two > problems: 1) the client avoids calling close() so the subsystem can use > parent references (this leaks the whole tree); or 2) the client calls > close() so the subsystem only retains its subtree, but now its > (expected) parent/sibling relationships no longer work. It has a set of > objects that don't fully respond to their published API. Acquisition avoids this issue because it eliminates the need for child references to parents. (snip) > I'm tremendously in favor of the model returned by qp_xml. (BTW, could you provide a link to this proposal? I must have deleted the message that contained it. I'd like to look at it.) Acquisition has worked very well in Zope. It provides a very nice way to share information in a containment hierarchy. Its only drawback is that it requires use of ExtensionClass, which is actually a bonus. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From larsga@ifi.uio.no Wed May 5 16:37:38 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 05 May 1999 17:37:38 +0200 Subject: [XML-SIG] easySAX Message-ID: Here's a first sketch of what easySAX might look like. It incorporates Paul's text proposal, Geir Ove's text-in-one-block, Paul's always-print-error-messages and should probably be extended to call start_foo for events provided that start_foo is defined (and otherwise use startElement) and ditto for end tags. One addition that might be nice: allow users to define pi_(data) and ppi_(attrs) methods. In the latter case the data would be parsed into an attributelist. What do people think?
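The start_foo/end_foo dispatch and element stack proposed here can be sketched roughly as follows, layered on today's xml.sax ContentHandler instead of the 1999 saxlib.DocumentHandler. EasyHandler is a hypothetical name, the unknown_starttag/unknown_endtag fallbacks borrow sgmllib's naming convention, and pi_/ppi_ dispatch is left out.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EasyHandler(ContentHandler):
    """Hypothetical easySAX-style layer: calls start_<name>/end_<name>
    when defined, keeps an element stack, and collects element text."""

    def __init__(self):
        super().__init__()
        self.element_stack = []  # (name, attrs) tuples, innermost last
        self.text = ""           # text seen since the last tag

    def startElement(self, name, attrs):
        self.text = ""
        attrs = dict(attrs.items())  # keep a private copy of the attributes
        self.element_stack.append((name, attrs))
        method = getattr(self, "start_" + name, None)
        if method is not None:
            method(attrs)
        else:
            self.unknown_starttag(name, attrs)

    def endElement(self, name):
        _, attrs = self.element_stack.pop()
        method = getattr(self, "end_" + name, None)
        if method is not None:
            method()
        else:
            self.unknown_endtag(name, attrs)
        self.text = ""

    def characters(self, content):
        # The parser may deliver one text run in several chunks.
        self.text += content

    def unknown_starttag(self, name, attrs):
        pass

    def unknown_endtag(self, name, attrs):
        pass
```

Copying attrs into a dict is a conservative answer to the "copy attrs??" question in Lars's code: the SAX contract (explicitly so in Java SAX) only guarantees the attributes object for the duration of the startElement call.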
Is this better than adding the suggested improvements to the SAX core? This was just hacked together in 15 minutes, so please don't hesitate to slaughter it if you don't like it. --Lars M.

    from xml.sax import saxlib, saxexts
    import sys

    class SAXAdapter(saxlib.DocumentHandler):

        def __init__(self, dh):
            self.dh = dh

        def startElement(self, name, attrs):
            self.dh.text = ""
            self.dh.startElement(name, attrs)
            self.dh.element_stack.append((name, attrs))  # copy attrs??

        def characters(self, data, start, len):
            self.dh.text = self.dh.text + data[start:start+len]

        def endElement(self, name):
            attrs = self.dh.element_stack[-1][1]
            del self.dh.element_stack[-1]
            self.dh.endElement(name, attrs)
            self.dh.text = ""

        def processingInstruction(self, target, data):
            self.dh.processingInstruction(target, data)

    class DocHandler:

        def __init__(self):
            self.element_stack = []  # stack of (name,attrs) tuples
            self.locator = None      # locator, if any
            self.text = ""           # text seen so far in current element
                                     # (reset whenever a tag is seen)

        def startElement(self, name, attrs):
            pass

        def endElement(self, name, attrs):
            pass

        def processingInstruction(self, target, data):
            pass

    # --- Main program

    dh = DocHandler()
    p = saxexts.make_parser()
    p.setDocumentHandler(SAXAdapter(dh))
    p.parse(sys.argv[1])

From Fred L. Drake, Jr." References: Message-ID: <14128.29972.689027.500572@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > always-print-error-messages and should probably be extended to call > start_foo for events provided that start_foo is defined (and > otherwise use startElement) and ditto for end tags. I like this addition; it should be added. > One addition that might be nice: allow users to define pi_(data) > and ppi_(attrs) methods. In the latter case the data would be > parsed into an attributelist. This would definitely be nice, and won't be a problem for "normal" data, where PIs aren't used often. > improvements to the SAX core?
This was just hacked together in 15 > minutes, so please don't hesitate to slaughter it if you don't like It might be a good idea to use a call to extract the element stack instead of providing direct access to the list object. This would allow different internal structures to be used without changing the interface. This might be interesting in some cases. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From dieter@handshake.de Tue May 4 22:08:25 1999 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 4 May 1999 21:08:25 +0000 (/etc/localtime) Subject: [XML-SIG] Another SAX Suggestion In-Reply-To: References: Message-ID: <14127.24729.741934.93363@lindm.dm> Lars Marius Garshol writes: > > * Paul Prescod > | > | I would like to suggest the default error handlers do something useful: > So far I count Paul and AMK in favour and Fredrik and Walter against. > Personally I don't have an opinion (yet), but if the discussion ends > with a 2-2 score I'll consider it a draw and not do anything. Count me in favour. I once spent about an hour debugging xmlproc to analyse a problem posted to this list. It turned out that an attribute value had not been quoted. Because no error handler had been installed, the result was apparently a wrong DOM tree without any indication of the real problem. - Dieter From gstein@lyra.org Wed May 5 21:30:37 1999 From: gstein@lyra.org (Greg Stein) Date: Wed, 5 May 1999 13:30:37 -0700 (PDT) Subject: [XML-SIG] DOM Considered Harmful :-) In-Reply-To: <37305BB2.52D4AABF@digicool.com> Message-ID: On Wed, 5 May 1999, Jim Fulton wrote: > > I'm tremendously in favor of the model returned by qp_xml. > > (BTW, could you provide a link to this proposal? > I must have deleted the message that contained it. I'd > like to look at it.) It's available from my page at: http://www.lyra.org/greg/ > Acquisition has worked very well in Zope. It provides a > very nice way to share information in a containment hierarchy.
> Its only drawback is that it requires use of ExtensionClass, > which is actually a bonus. ;) :-) Is that still a patch to the interpreter, or is it all Python code now? I'd be interested to see what kinds of changes would be needed to use the acquisition stuff. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim@digicool.com Wed May 5 22:11:18 1999 From: jim@digicool.com (Jim Fulton) Date: Wed, 05 May 1999 17:11:18 -0400 Subject: [XML-SIG] DOM Considered Harmful :-) References: Message-ID: <3730B3F6.69868D16@digicool.com> Greg Stein wrote: > > On Wed, 5 May 1999, Jim Fulton wrote: > > > I'm tremendously in favor of the model returned by qp_xml. > > > > (BTW, could you provide a link to this proposal? > > I must have deleted the message that contained it. I'd > > like to look at it.) > > It's available from my page at: http://www.lyra.org/greg/ Thanks, I'll take a look. > > Acquisition has worked very well in Zope. It provides a > > very nice way to share information in a containment hierarchy. > > Its only drawback is that it requires use of ExtensionClass, > > which is actually a bonus. ;) > > :-) > > Is that still a patch to the interpreter, or is it all Python code now? It never was a patch. It *is* an extension module. There's an ExtensionClass extension module and an Acquisition extension module, which requires the ExtensionClass extension to be around. > I'd be interested to see what kinds of changes would be needed to use the > acquisition stuff. Basically, you subclass one of the acquisition base classes (Explicit or Implicit) and then, when you want to refer to a parent in a child (that has been accessed through a parent), you refer to the parent via the attribute, aq_parent.
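The effect Jim describes can be approximated in plain Python to show why no cycles arise: the child object never stores a parent reference, and the link lives only in a short-lived wrapper created on access. This is a rough emulation, not the ExtensionClass-based Acquisition module; InContext, child_of and Element are hypothetical names, the aq_self/aq_parent attribute names are borrowed for readability, and only implicit attribute lookup is imitated.

```python
class InContext:
    """Wrap an object together with the parent it was reached through.

    The wrapped object itself stores no parent reference, so the
    underlying tree has no cycles; the parent link lives only here."""

    def __init__(self, obj, parent):
        self.aq_self = obj
        self.aq_parent = parent

    def __getattr__(self, name):
        # Only called when normal lookup fails: try the wrapped object,
        # then fall back to the parent (roughly implicit acquisition).
        try:
            return getattr(self.aq_self, name)
        except AttributeError:
            return getattr(self.aq_parent, name)


class Element:
    """A plain tree node: children only, no parent pointer."""

    def __init__(self, name, **attrs):
        self.name = name
        self.children = []
        self.__dict__.update(attrs)


def child_of(parent, index):
    """Return parent's index-th child wrapped in the context of parent.

    parent may itself be a wrapper, so contexts chain upward."""
    node = parent.aq_self if isinstance(parent, InContext) else parent
    return InContext(node.children[index], parent)
```

Because child_of keeps the wrapper (not the bare node) as the parent, a lookup on a grandchild can climb all the way to the root, much as Jim's `theChild.__of__(self)` returns the child "in the context of" its parent.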
Given that in DOM (if my vague recollection serves me :) children are not accessed as attributes, the access methods need to return the children in the context of the parent, as in:

import Acquisition

class MyClass(Acquisition.Explicit):

    def some_method_that_returns_a_child(self, ... whatever):
        ....
        return theChild.__of__(self)

The special method __of__, which comes from a mix-in class, returns one object "in the context of" another. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From akuchlin@cnri.reston.va.us Thu May 6 02:09:03 1999 From: akuchlin@cnri.reston.va.us (A.M. Kuchling) Date: Wed, 5 May 1999 21:09:03 -0400 Subject: [XML-SIG] unicode entity refs In-Reply-To: <85256763.004E13CB.00@li01.lm.ssc.siemens.com> References: <85256763.004E13CB.00@li01.lm.ssc.siemens.com> Message-ID: <199905060109.VAA02082@207-172-46-87.s87.tnt9.ann.va.dialup.rcn.com> Jeff.Johnson@icn.siemens.com writes: >Sorry to be a pest but I never got a response on the following email and was >hoping someone had an answer as to why unicode entity refs disappear in PyDom. I finally got around to looking at this tonight while cleaning out my mailbox. The HTML parser is actually choking on the character reference, but the error handler is, surprise surprise, not doing anything. The fix is to add an error handler, as in the patch below. However, this doesn't fix your problem, since the error handler raises a BadHTML exception.
I'd argue for this behaviour, since the HTML character set is ISO-whatever, not Unicode, and therefore this is illegal HTML; if it's got character references >255, it's not HTML but XML that looks like HTML. (Hmm... I may have written too soon; what's the status of HTML i18n? Can you declare a Unicode encoding for an HTML document?) On a side note, the Unicode issue seems to be heading for using /F's Unicode type. This would seem to be a good argument to drop MvL's Unicode type, which is currently in the XML tree, and replace it with /F's code. Opinions? -- A.M. Kuchling http://starship.python.net/crew/amk/ Surely where there's smoke there's fire? No, where there's so much smoke there's smoke. -- John A. Wheeler

Index: html_builder.py
===================================================================
RCS file: /home/cvsroot/xml/dom/html_builder.py,v
retrieving revision 1.9
diff -C2 -r1.9 html_builder.py
*** html_builder.py 1999/03/09 00:57:11 1.9
--- html_builder.py 1999/05/06 01:03:33
***************
*** 100,103 ****
--- 100,106 ----
              break

+     def unknown_charref(self, ref):
+         raise BadHTMLError, ('Unknown character reference: &#' + ref + ';')
+
      def handle_data(self, s):
          #print `s`

From larsga@ifi.uio.no Thu May 6 09:58:13 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 06 May 1999 10:58:13 +0200 Subject: [XML-SIG] unicode entity refs In-Reply-To: <199905060109.VAA02082@207-172-46-87.s87.tnt9.ann.va.dialup.rcn.com> References: <85256763.004E13CB.00@li01.lm.ssc.siemens.com> <199905060109.VAA02082@207-172-46-87.s87.tnt9.ann.va.dialup.rcn.com> Message-ID: * A. M. Kuchling | | (Hmm... I may have written too soon; what's the status of HTML i18n? | Can you declare a Unicode encoding for an HTML document?) HTML 4.0 declares the character set to be the first 17 planes of ISO 10646, meaning that references up to 1056768 are allowed, although there are some holes in between.
That is, something like this, but in the char ref handler, which I don't know what should look like:

def unknown_charref(self, ref):
    num = string.atoi(ref)
    if num<9 or num==11 or num==12 or (num>13 and num<32) or \
       (num>126 and num<160) or (num>55295 and num<57344):
        raise BadHTMLError, ('Illegal character reference: &#' + ref + ';')
    elif num<256:
        pass  # accept and insert
    else:
        raise BadHTMLError, ('Unsupported character reference: &#' + ref + ';')

I think we should look at the SGML declaration of HTML, outlaw the characters not supported by HTML 4.0 and raise | On a side note, the Unicode issue seems to be heading for using /F's | Unicode type. This would seem to be a good argument to drop MvL's | Unicode type, which is currently in the XML tree, and replace it | with /F's code. Opinions? Go for it! --Lars M. From Jeff.Johnson@icn.siemens.com Thu May 6 16:38:53 1999 From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com) Date: Thu, 6 May 1999 11:38:53 -0400 Subject: [XML-SIG] unicode entity refs Message-ID: <85256769.0055E293.00@li01.lm.ssc.siemens.com> A.M. Kuchling writes: >I finally got around to looking at this tonight while cleaning out my >mailbox. The HTML parser is actually choking on the character >reference, but the error handler is, surprise surprise, not doing >anything. The fix is to add an error handler, as in the patch below. Good timing, I put this on the back burner until yesterday when I made a few subclasses to preserve the charrefs...
I have no idea if this is the best way but here it is anyway :) (gLog is a global error logging class)

from xml.dom.builder import Builder
from xml.dom.html_builder import HtmlBuilder
from xml.dom.utils import FileReader

class MyHtmlBuilder(HtmlBuilder):
    def unknown_charref(self, ref):
        gLog.Warning('unknown_charref %s' % ref)
        Builder.entityref(self, '#' + ref)

    def unknown_entityref(self, ref):
        gLog.Error('unknown_entityref %s' % ref)

class MyFileReader(FileReader):
    def readHtml(self,stream):
        b = MyHtmlBuilder()
        b.feed(stream.read())
        b.close()
        return b.document

"A.M. Kuchling" on 05/05/99 09:09:03 PM Please respond to akuchlin@cnri.reston.va.us To: Jeff Johnson/Service/ICN cc: xml-sig@python.org Subject: [XML-SIG] unicode entity refs Jeff.Johnson@icn.siemens.com writes: >Sorry to be a pest but I never got a response on the following email and was >hoping someone had an answer as to why unicode entity refs dissapear in PyDom. I finally got around to looking at this tonight while cleaning out my mailbox. The HTML parser is actually choking on the character reference, but the error handler is, surprise surprise, not doing anything. The fix is to add an error handler, as in the patch below. However, this doesn't fix your problem, since the error handler raises a BadHTML exception. I'd argue for this behaviour, since the HTML character set is ISO-whatever, not Unicode, and therefore this is illegal HTML; if it's got character references >255, it's not HTML but XML that looks like HTML. (Hmm... I may have written too soon; what's the status of HTML i18n? Can you declare a Unicode encoding for an HTML document?) On a side note, the Unicode issue seems to be heading for using /F's Unicode type. This would seem to be a good argument to drop MvL's Unicode type, which is currently in the XML tree, and replace it with /F's code. Opinions? -- A.M. Kuchling http://starship.python.net/crew/amk/ Surely where there's smoke there's fire? No, where there's so much smoke there's smoke.
-- John A. Wheeler

Index: html_builder.py
===================================================================
RCS file: /home/cvsroot/xml/dom/html_builder.py,v
retrieving revision 1.9
diff -C2 -r1.9 html_builder.py
*** html_builder.py 1999/03/09 00:57:11 1.9
--- html_builder.py 1999/05/06 01:03:33
***************
*** 100,103 ****
--- 100,106 ----
              break

+     def unknown_charref(self, ref):
+         raise BadHTMLError, ('Unknown character reference: &#' + ref + ';')
+
      def handle_data(self, s):
          #print `s`

From dieter@handshake.de Thu May 6 20:25:26 1999 From: dieter@handshake.de (Dieter Maurer) Date: Thu, 6 May 1999 19:25:26 +0000 (/etc/localtime) Subject: [XML-SIG] unicode entity refs In-Reply-To: <199905060109.VAA02082@207-172-46-87.s87.tnt9.ann.va.dialup.rcn.com> References: <85256763.004E13CB.00@li01.lm.ssc.siemens.com> <199905060109.VAA02082@207-172-46-87.s87.tnt9.ann.va.dialup.rcn.com> Message-ID: <14129.59844.98081.179790@lindm.dm> A.M. Kuchling writes: > However, this doesn't fix your problem, since the error > handler raises a BadHTML exception. I'd argue for this behaviour, > since the HTML character set is ISO-whatever, not Unicode, and > therefore this is illegal HTML; if it's got character references >255, > it's not HTML but XML that looks like HTML. (Hmm... I may have > written too soon; what's the status of HTML i18n? Can you declare a > Unicode encoding for an HTML document?) We (Saarbrücker Zeitung) make extensive use of UTF-8 encoded HTML documents for European language documents (e.g. documents containing Swedish, French, German and Greek passages). Both Netscape and IE support it. We use the META tag for "encoding" with value "UTF-8". This becomes an HTTP header telling the browser that the document is encoded in "UTF-8".
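Earlier in this thread Lars describes HTML 4.0's character set as the first 17 planes of ISO 10646 with some holes, and his unknown_charref snippet spells those holes out. A small modern-Python sketch of the same check is shown below, kept separate from any parser. classify_charref and its three labels are made-up names. The arithmetic behind the upper bound is as follows: 17 planes of 65,536 positions give 1,114,112 positions, so the highest reference is 1114111. The 1056768 figure quoted in the thread equals 1114112 - 57344, which is the number of positions above the surrogate hole.

```python
# Ranges left unused by HTML 4.0, mirroring the tests in Lars's unknown_charref snippet.
UNUSED_RANGES = [(0, 8), (11, 12), (14, 31), (127, 159), (0xD800, 0xDFFF)]
MAX_REF = 17 * 0x10000 - 1  # last position in the first 17 planes: 1114111


def classify_charref(digits):
    """Classify the decimal digits of a reference such as '&#233;'.

    Returns 'illegal' (not an HTML 4.0 character), 'ok' (fits in Latin-1,
    which an 8-bit DOM builder can store) or 'unsupported' (legal HTML 4.0
    but above 255).
    """
    n = int(digits)
    if n > MAX_REF or any(lo <= n <= hi for lo, hi in UNUSED_RANGES):
        return "illegal"
    if n < 256:
        return "ok"
    return "unsupported"
```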
With the appropriate fonts installed, you can see all European characters on a single page (and would be able to include Japanese and Chinese characters as well, but we do not need that, so far). > On a side note, the Unicode issue seems to be heading for > using /F's Unicode type. This would seem to be a good argument to > drop MvL's Unicode type, which is currently in the XML tree, and > replace it with /F's code. Opinions? What's the difference between these Unicode types? - Dieter From shecter@darmstadt.gmd.de Fri May 7 15:29:52 1999 From: shecter@darmstadt.gmd.de (Robb Shecter) Date: Fri, 07 May 1999 16:29:52 +0200 Subject: [XML-SIG] Validation question References: <372D6FB0.BB258D24@darmstadt.gmd.de> Message-ID: <3732F8E0.6699FE76@darmstadt.gmd.de> Lars Marius Garshol wrote: > * Robb Shecter > | > | I'd like to set up an xml validation service. I've done all my XML > | programming in Java so far, but would like to try Python out. > > Cool! You are certainly most welcome here. :) > Great. I'm really interested in Python mostly because it's dynamically typed and has a decent amount of syntactic sugar - something Java is sorely missing. Comparing equivalent code for building and manipulating a DOM is really an eye-opener. You could probably make a catchy web page with this idea... "Would you rather do this, or -this-?" > > | I'd like to have an object with a method that takes two parameters; > | a dtd and an xml document, and returns true or false depending > | whether the document conforms to the dtd. > > Using xmlproc you can easily do this. However, what do you do if the > document contains a DOCTYPE declaration that points to something else? > xmlproc would let you ignore that and use the same DTD anyway, but > depending on your application this might not be a very XML-like > approach. > Excellent. That's what I'd do. I'm making a middle-tier, and I don't mind simply saying that the clients have got to get their XML right.
> I have nearly all the code needed to enable this with xmlproc, the > only part missing being the handling of the internal subset. I don't know what an internal subset is, so I guess I won't be needing it. :) I've give xmlproc a look. Thanks a lot, Robb From Fred L. Drake, Jr." The toxml() methods in PyDOM perform a huge amount of string copying; just about every operation is implemented as a string addition, which requires a malloc() and data copying. When a lot of this is required, string.join() can be a lot faster when joining many strings. I've modified xml.dom.core to use string.join(); the patch is below. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives Index: core.py =================================================================== RCS file: /projects/cvsroot/xml/dom/core.py,v retrieving revision 1.45 diff -c -c -r1.45 core.py *** core.py 1999/03/27 18:47:28 1.45 --- core.py 1999/05/07 20:33:59 *************** *** 743,755 **** return '' % (repr(self._node.name),) def toxml(self): ! s = "" for c in self._node.children: if c.type == TEXT_NODE: s = s + c.value elif c.type == ENTITY_REFERENCE_NODE: ! s = s + '&' + c.name + ';' ! return s def get_nodeName(self): return self._node.name --- 743,756 ---- return '' % (repr(self._node.name),) def toxml(self): ! L = [] for c in self._node.children: if c.type == TEXT_NODE: + L.append(c.value) s = s + c.value elif c.type == ENTITY_REFERENCE_NODE: ! L.extend(["&", c.name, ";"]) ! return string.join(L, "") def get_nodeName(self): return self._node.name *************** *** 798,822 **** return "" % (self._node.name) def toxml(self): ! s = "<" + self._node.name for attr, attrnode in self._node.attributes.items(): ! s = s + " %s='" % (attr,) for value in attrnode.children: if value.type == TEXT_NODE: ! s = s + escape(value.value) else: n = NODE_CLASS[ value.type ] (value, self._document) ! s = s + value.toxml() ! s = s + "'" if len(self._node.children) == 0: ! return s + " />" ! 
s = s + '>' for child in self._node.children: n = NODE_CLASS[ child.type ] (child, self._document) ! s = s + n.toxml() ! s = s + "' ! return s # Attributes --- 799,825 ---- return "" % (self._node.name) def toxml(self): ! L = ["<", self._node.name] for attr, attrnode in self._node.attributes.items(): ! L.append(" %s='" % (attr,)) for value in attrnode.children: if value.type == TEXT_NODE: ! L.append(escape(value.value) ) else: n = NODE_CLASS[ value.type ] (value, self._document) ! L.append(value.toxml()) ! s = s + ! L.append("'") if len(self._node.children) == 0: ! L.append("/>") ! return string.join(L, "") ! L.append(">") for child in self._node.children: n = NODE_CLASS[ child.type ] (child, self._document) ! L.append(n.toxml()) ! L.extend([""]) ! return string.join(L, "") # Attributes *************** *** 1109,1121 **** self._document = node def toxml(self): ! s = '\n' if self.documentType: ! s = s + self.documentType.toxml() for n in self._node.children: n = NODE_CLASS[ n.type ] (n, self._document) ! s = s + n.toxml() ! return s def __repr__(self): return '' % (repr(self.get_documentElement()),) --- 1112,1124 ---- self._document = node def toxml(self): ! L = ['\n'] if self.documentType: ! L.append(self.documentType.toxml()) for n in self._node.children: n = NODE_CLASS[ n.type ] (n, self._document) ! L.append(n.toxml()) ! return string.join(L, "") def __repr__(self): return '' % (repr(self.get_documentElement()),) *************** *** 1327,1337 **** return None def toxml(self): ! s = "" for child in self._node.children: n = NODE_CLASS[ child.type ] (child, self._document) ! s = s + n.toxml() ! return s # Dictionary mapping types to the corresponding class object --- 1330,1340 ---- return None def toxml(self): ! L = [] for child in self._node.children: n = NODE_CLASS[ child.type ] (child, self._document) ! L.append(n.toxml()) ! return string.join(L, "") # Dictionary mapping types to the corresponding class object From Fred L. Drake, Jr." 
Has anyone been working on XLink support from Python? I don't recall it being mentioned, but I thought I'd ask around before I decide whether to use XLink for the converted Python documentation. There are distinct advantages for XLink from an authoring perspective. Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From uche.ogbuji@fourthought.com Sat May 8 02:59:38 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 07 May 1999 19:59:38 -0600 Subject: [XML-SIG] XLink support from Python? In-Reply-To: Your message of "Fri, 07 May 1999 18:00:27 EDT." <14131.25211.333465.675299@weyr.cnri.reston.va.us> Message-ID: <199905080159.TAA02082@malatesta.local> > Has anyone been working on XLink support from Python? I don't > recall it being mentioned, but I thought I'd ask around before I > decide whether to use XLink for the converted Python documentation. > There are distinct advantages for XLink from an authoring > perspective. We have some basic XLink functionality in Python that we use in-house. We were planning to dust it up a bit and let people use it, but then the freight train (benign, in this case) known as the latest XSL working draft came along... Still, the basics that we have are enough to allow us write/manipulate XML with basic "embed" and "replace" links. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@ifi.uio.no Sat May 8 09:03:11 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 08 May 1999 10:03:11 +0200 Subject: [XML-SIG] XML parser benchmark Message-ID: This was posted by Clark Cooper to xml-dev yesterday. Performance for Pyexpat is not good, but it seems that his Python application can be improved. (I'm looking at it right now.) --Lars M. 
| There's an article of mine on: | | http://www.xml.com/ | | That compares the performance of 6 XML parsers using 4 languages: | | 1) James Clark's Expat (C) | 2) Richard Tobin's RXP (C) | 3) James Clark's XP (Java) | 4) IBM's XML4J (Java) | 5) My XML::Parser (Perl) | 6) Jack Jansen's Pyexpat (Python) | | The same program is implemented in each language and using each parser | and then compared with different test cases in the same environment. | | Please check it out and critique as necessary. | | Thanks, | Clark | | -- | Clark Cooper Software Engineer Home: coopercc@netheaven.com | Schenectady, NY USA Work: cccooper@ltionline.com From larsga@ifi.uio.no Sat May 8 10:59:47 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 08 May 1999 11:59:47 +0200 Subject: [XML-SIG] XML parser benchmark Message-ID: A quick investigation of the benchmark code shows that roughly 40 % of its time is spent doing Unicode string counting, using the re module. Here are some interesting numbers, using the big.xml document: Original: 108 seconds Using len to count: 62 seconds My optimized version: 92 seconds If we replace my naive pure-Python character counting with one of the Unicode C modules we might well have a version of statsexpat.py that is faster than the XML::Parser application. 
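Lars's timings point at character counting as the bottleneck. If the handler receives UTF-8 encoded byte strings (expat's output encoding), the count can be done without a Python-level loop. The sketch below is in modern Python 3 and is not Lars's own loop; utf8_char_count is a hypothetical name. In current Python, pyexpat passes str objects to the handlers, and len() already counts characters, so this matters only when working with raw UTF-8 bytes.

```python
# UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF).
CONTINUATION_BYTES = bytes(range(0x80, 0xC0))


def utf8_char_count(data):
    """Count characters in UTF-8 encoded bytes without decoding them.

    Each character has exactly one non-continuation (lead) byte, so deleting
    the continuation bytes with bytes.translate (which runs in C) leaves one
    byte per character. Because only lead bytes are counted, the counts for
    consecutive chunks add up correctly even when a chunk boundary splits a
    multi-byte character.
    """
    return len(data.translate(None, CONTINUATION_BYTES))
```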
In case anyone wants to attack this, here's my version: import sys,os,time,pyexpat class Elinfo: def __init__(self, name): self.name = name self.count = 0 self.minlev = 0 self.charcount = 0 self.empty = 1 self.ptab = {} self.ktab = {} self.atab = {} class Docinfo: def __init__(self): self.root = None self.eltab = {} self.elstack = [] self.seenorder = 0 def strt_handle(self, name, attrs): intern(name) try: inf = self.eltab[name] except KeyError: inf = Elinfo(name) inf.seen = self.seenorder self.seenorder = self.seenorder + 1 self.eltab[name] = inf inf.count = inf.count + 1 if self.elstack!=[]: parent = self.elstack[-1] try: inf.ptab[parent] = inf.ptab[parent] + 1 except KeyError: inf.ptab[parent] = 1 pinf = self.eltab[parent] pinf.empty = 0 try: pinf.ktab[name] = pinf.ktab[name] + 1 except KeyError: pinf.ktab[name] = 1 else: self.root = name #Attribute handling for i in range(0, len(attrs), 2): try: inf.atab[attrs[i]] = inf.atab[attrs[i]] + 1 except KeyError: inf.atab[attrs[i]] = 1 self.elstack.append(name) def end_handle(self, name): del self.elstack[-1] def char_handle(self, data): elname = self.elstack[-1] inf = self.eltab[elname] inf.empty = 0 cnt = len(data) max = len(data) ix=0 while ix level: newlev = level + 1 inf.minlev = level for kid in inf.ktab.keys(): self.set_minlev(kid, newlev) def showtab(label, tab, dosum): if not len(tab): return print '\n ', label + ':' sum = 0 names = tab.keys() names.sort() for name in names: cnt = tab[name] sum = sum + cnt print ' %-16s %5d' % (name, cnt) if dosum and len(names) > 1: print ' =====' print ' %5d' % sum def elcmp(a, b): cmpmin = a.minlev - b.minlev if cmpmin: return cmpmin return a.seen - b.seen doc = Docinfo() parser = pyexpat.ParserCreate() parser.StartElementHandler = doc.strt_handle parser.EndElementHandler = doc.end_handle parser.CharacterDataHandler = doc.char_handle docstream = open(sys.argv[1]) start=time.clock() while 1: buff = docstream.read(32000) if not len(buff): break status = parser.Parse(buff, 0) 
if status == 0: print parser.ErrorCode, ' at line ', parser.ErrorLineNumber,\ ', column ', parser.ErrorColumnNumber, ', byte ',\ parser.ErrorByteIndex sys.exit(-1) status = parser.Parse('', 1) pt=time.clock()-start if status == 0: print parser.ErrorCode, ' at line ', parser.ErrorLineNumber,\ ', column ', parser.ErrorColumnNumber, ', byte ',\ parser.ErrorByteIndex sys.exit(-1) print "TIME: "+`pt` sys.exit() doc.set_minlev(doc.root, 0) sortinf = doc.eltab.values() sortinf.sort(elcmp) for elinf in sortinf: print '\n================' print elinf.name + ':', elinf.count if elinf.charcount: print 'Had', elinf.charcount, 'bytes of character data' if elinf.empty: print 'Always empty' showtab('Parents', elinf.ptab, 0) showtab('Children', elinf.ktab, 1) showtab('Attributes', elinf.atab, 0) --Lars M. From stuart.hungerford@webone.com.au Sat May 8 13:08:05 1999 From: stuart.hungerford@webone.com.au (Stuart Hungerford) Date: Sat, 08 May 1999 22:08:05 +1000 Subject: [XML-SIG] Python alternates to XSLT? References: <199905080503.BAA13877@python.org> Message-ID: <37342925.5880AB6A@webone.com.au> Hi all, Maybe I'm getting old, but I've become very frustrated with using XSL and now XSLT to transform XML documents into HTML. It started me wondering if there wasn't a simpler, easier and more Pythonic way of doing XML -> HTML transformations. Could we treat an XML document as a "little language" and use well-tested Python parsing and other libraries to generate HTML based on a set of rules? I'd love to be able to "call out" to Python's regular expressions and other abilities while in the process of transforming the XML. Am I barking up the wrong node tree here? Has anyone done something similar? Stu From paul@prescod.net Mon May 10 08:08:22 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 10 May 1999 02:08:22 -0500 Subject: [XML-SIG] XLink support from Python? 
References: <199905080159.TAA02082@malatesta.local> Message-ID: <373685E6.D4DAB385@prescod.net> uche.ogbuji@fourthought.com wrote: > > We have some basic XLink functionality in Python that we use in-house. We > were planning to dust it up a bit and let people use it, but then the freight > train (benign, in this case) known as the latest XSL working draft came > along... Still, the basics that we have are enough to allow us > write/manipulate XML with basic "embed" and "replace" links. The "embed" and "replace" junk in XLink is incredibly underspecified and I think that it is pretty dangerous to interpret them. For instance, according to what justification do you treat a link to an XML element differently from a link to a PDF document (or do you have a model that also allows those to be embedded). It seems to me that "embed", "replace" et. al. are behavioral junk for browsers -- no more, no less. You should treat an "embed" of an XML element as you would an embed of a JPEG -- the XLink spec. does. You would be better off adopting a local convention that has exactly the right semantics you need. Call it UCHE:transclude. I started writing a W3C NOTE for such a convention (hope you don't mind me using your name) but I realized that I really needed a more robust data model than that provided by the XML family of standards so I started working on that data model instead. The tricky part that made me start thinking about the data model is how do you express hyperlinks from one part of the logical document to another? You need to extend the concept of URL to allow references to abstract objects that result from transformations (like the transclusion transformation). That means that you need a concept of abstract objects...which is what XML has been lacking since day 1 (despite my irritating nagging). 
-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco The first three Noble Truths of Python: All that is not Python is suffering. The origin of suffering lies in the use of not-Python. The cessation of suffering can be achieved by not using not-Python. http://www.pauahtun.org/4nobletruthsofpython.html From shecter@darmstadt.gmd.de Mon May 10 09:28:57 1999 From: shecter@darmstadt.gmd.de (Robb Shecter) Date: Mon, 10 May 1999 10:28:57 +0200 Subject: [XML-SIG] XML parser benchmark References: Message-ID: <373698C9.B17E3032@darmstadt.gmd.de> Lars Marius Garshol wrote: > This was posted by Clark Cooper to xml-dev yesterday. ... > | > | That compares the performance of 6 XML parsers using 4 languages: > | Hmm... I read the article and checked out the perl script: It looks to me like there's a serious problem with how the test was conducted. Maybe I don't understand what's going on, but this looks obvious: The tests were apparently done with the unix "time" command, by shelling out, and starting a new process for each document. This means that the interpreter-based languages get hit with two disadvantages: 1) They're penalized for VM startup and shutdown times. 2) After parsing a document, all loaded objects, references, cached whatevers and knowledge gained are thrown away, and can't be used for the next document. To me, this is a valid issue because the test environment is artificial: it doesn't represent real-world uses. It also isn't the execution style for which the systems (IMO) were designed. The test most closely models a CGI environment, which is a dying programming style. My guess is that if the test more closely modelled real-world use: a server that, in its lifetime, parses many documents, then the results may not have been so exaggerated.
- Robb From Tim Lavoie Tue May 11 16:52:54 1999 From: Tim Lavoie (Tim Lavoie) Date: Tue, 11 May 1999 10:52:54 -0500 Subject: [XML-SIG] Re: XML parser benchmarks In-Reply-To: <199905110501.BAA04098@python.org>; from xml-sig-admin@python.org on Tue, May 11, 1999 at 01:01:38AM -0400 References: <199905110501.BAA04098@python.org> Message-ID: <19990511105253.B15572@beyondtv.net> Robb Shecter wrote: > The tests were apparently done with the unix "time" command, by > shelling out, and starting a new process for each document. This > means that the interpreter-based languages get hit with two > disadvantages: 1) They're penalized for VM startup and shutdown > times. 2) After parsing a document, all loaded objects, references, > cached whatevers and knowedge gained are thrown away, and can't be > used for the next document. There are a few things to consider when claiming unfairness in the tests: - The larger files would have much less relative penalty in terms of VM startup. The test data had files of roughly 150K, 890K, 1.2MB, 3.4MB, and 5MB. For anything but the smallest, VM startup time should be a fairly small part of the total. - Several of the parsers rely on expat at their core. Naturally, their results will consist of whatever time expat needs for the job, plus all the overhead of the scripting-language wrapper. - Did the Java benchmarks use a just-in-time compiler? I suspect not, though there is one for Linux (tya) which might have chopped those times in half. - Finally, these are PARSING benchmarks. Not "parse the file and then ponder the results in Biblical detail" benchmarks. If someone wants to parse XML, twiddle the data in kaleidoscopic variety and then benchmark that, feel free. Time required to write usable code for a task in a given language is also another matter, and one at which languages like Perl and Python excel. That doesn't mean that the parsing benchmark itself is bad or unfair. 
From shecter@darmstadt.gmd.de Wed May 12 12:41:14 1999 From: shecter@darmstadt.gmd.de (Robb Shecter) Date: Wed, 12 May 1999 13:41:14 +0200 Subject: [XML-SIG] Re: XML parser benchmarks References: <199905110501.BAA04098@python.org> <19990511105253.B15572@beyondtv.net> Message-ID: <373968DA.93761D14@darmstadt.gmd.de> Tim Lavoie wrote: > There are a few things to consider when claiming unfairness in > the tests: > > - The larger files would have much less relative penalty in terms of > VM startup. Yes, you're right. > - Did the Java benchmarks use a just-in-time compiler? I suspect not, > though there is one for Linux (tya) which might have chopped those > times in half. > Yes, once you get into it, just testing parsing with Java alone can involve lots of variables. I think that Java has a relatively long startup time compared to the other languages. Maybe that was reflected in Java getting better wrt. perl and python for the larger files. - Robb From uche.ogbuji@fourthought.com Thu May 13 04:35:28 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 12 May 1999 21:35:28 -0600 Subject: [XML-SIG] XLink support from Python? In-Reply-To: Your message of "Mon, 10 May 1999 02:08:22 CDT." <373685E6.D4DAB385@prescod.net> Message-ID: <199905130335.VAA10622@malatesta.local> Uche Ogbuji: > > [...] Still, the basics that we have are enough to allow us > > write/manipulate XML with basic "embed" and "replace" links. Paul Prescod: > > The "embed" and "replace" junk in XLink is incredibly underspecified and I > think that it is pretty dangerous to interpret them. For instance, > according to what justification do you treat a link to an XML element > differently from a link to a PDF document (or do you have a model that > also allows those to be embedded). It seems to me that "embed", "replace" > et. al. are behavioral junk for browsers -- no more, no less. You should > treat an "embed" of an XML element as you would an embed of a JPEG -- the > XLink spec. 
does. And I thought _I_ was always on the W3C's case! But you do have a good point here, as you did with XSLT/XSLF. We really just lifted the XLink terminology and format and invented whatever behavior seemed reasonable to us. I did read your long missive on the topic to xlxp, xsl-list and XML-DEV. I'll be curious to follow the debate. I am familiar with the concept of transclusions from pute maths, but not in the SGML/XML context, so I am certainly curious. [...] Wot? No close tag? Tsk tsk. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@ifi.uio.no Fri May 14 08:26:49 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 14 May 1999 09:26:49 +0200 Subject: [XML-SIG] RSS and stuff Message-ID: I sat down yesterday and had a look at RSS, a format for news headlines which is used by Slashdot, mozilla.org and Scripting News, among others. It was very simple (a bit too simple, in fact), so I sat down and made a simple RSS library and client in Python. This client produces a web page when it is run. (I run it from cron.) This is the result: The client is available at: (if it becomes popular and I extend it I'll probably make a home page for it) The 'specification' and lists of providers can be found at: (warning: the RSS guide is not very accurate technically) (note that python.org provides an RSS summary) After that it took me 15 minutes to make my free XML tools list provide its updates list in RSS format and register it with the Userland registry: Finally, for those who may be interested, there is a Visual Basic tool that can show RSS summaries. --Lars M. 
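The client URLs in Lars's message did not survive, so his RSS library is not shown here. The idea can still be sketched in modern Python 3 with ElementTree: pull out each item's title and link, then write an HTML list. rss_items and items_to_html are hypothetical names. Elements are matched by local name, so the sketch does not depend on which RSS namespace, if any, a feed uses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def _local(tag):
    # '{namespace}item' -> 'item'; un-namespaced tags pass through unchanged.
    return tag.rsplit("}", 1)[-1]


def rss_items(xml_bytes):
    """Return (title, link) pairs for every <item> element in the feed."""
    root = ET.fromstring(xml_bytes)
    items = []
    for el in root.iter():
        if _local(el.tag) == "item":
            fields = {_local(child.tag): (child.text or "").strip() for child in el}
            items.append((fields.get("title", ""), fields.get("link", "")))
    return items


def items_to_html(items):
    """Render the headlines as an HTML list, escaping text and attributes."""
    rows = ["<li><a href=%s>%s</a></li>" % (quoteattr(link), escape(title))
            for title, link in items]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n"
```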
From jefftc@leland.Stanford.EDU Sat May 15 01:07:57 1999 From: jefftc@leland.Stanford.EDU (Jeffrey Chang) Date: Fri, 14 May 1999 17:07:57 -0700 (PDT) Subject: [XML-SIG] DOM toxml() method Message-ID: How are people generating nicely-formatted XML from DOM trees? Has anyone written a function that will take a DOM tree and will insert whitespace, where necessary, to generate a pretty XML document? It would be almost the reverse of utils.strip_whitespace. Jeff From fredrik@pythonware.com Sat May 15 19:38:44 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 15 May 1999 20:38:44 +0200 Subject: [XML-SIG] a simple SGML question (off-topic) References: Message-ID: <004f01be9f02$2ab46910$f29b12c2@pythonware.com> (a bit off topic, but here we go anyway) okay, here's my problem: I have a bunch of docbook CHAPTER's, each in a separate file. It would be very cool if I could write a "master document" which "includes" them all, something like: #include "chapter1.sgm" #include "chapter2.sgm" what strange SGML incantations do I need to do this? (I'm using Jade to process these files) would it be easier to write a Python script which rips off the extra DOCTYPE elements. From larsga@ifi.uio.no Sat May 15 19:52:29 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 15 May 1999 20:52:29 +0200 Subject: [XML-SIG] a simple SGML question (off-topic) In-Reply-To: <004f01be9f02$2ab46910$f29b12c2@pythonware.com> References: <004f01be9f02$2ab46910$f29b12c2@pythonware.com> Message-ID: * Fredrik Lundh | | I have a bunch of docbook CHAPTER's, each | in a separate file. It would be very cool if I | could write a "master document" which | "includes" them all, something like: | | #include "chapter1.sgm" | #include "chapter2.sgm" Here's what I use: ]> [...] Introduction [...] </Chapter> &publ-today; &techno-back; &architecture; &platform; &evaluation; </Book> | would it be easier to write a Python script | which rips off the extra DOCTYPE elements. I don't think so. :) --Lars M. 
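Jeffrey Chang's request, a kind of reverse of utils.strip_whitespace, fits well with the technique in Fred's toxml() patch: collect output fragments in a list and join them once at the end. The sketch below uses Python 3's ElementTree rather than PyDOM; pretty_xml is a hypothetical name. It strips and re-indents text, so it is only appropriate where whitespace is insignificant, and it ignores namespaces.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pretty_xml(elem, indent="  "):
    """Serialize an element tree with one element per line, indented."""
    parts = []  # collect fragments; join once at the end
    _emit(elem, parts, 0, indent)
    return "".join(parts)


def _emit(elem, parts, level, indent):
    pad = indent * level
    parts.append(pad + "<" + elem.tag)
    for name, value in elem.attrib.items():
        parts.append(" %s=%s" % (name, quoteattr(value)))
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not text:
        parts.append("/>\n")
        return
    parts.append(">")
    if children:
        parts.append("\n")
        if text:
            parts.append(pad + indent + escape(text) + "\n")
        for child in children:
            _emit(child, parts, level + 1, indent)
            tail = (child.tail or "").strip()
            if tail:
                parts.append(pad + indent + escape(tail) + "\n")
        parts.append(pad)
    else:
        parts.append(escape(text))  # text-only element stays on one line
    parts.append("</%s>\n" % elem.tag)
```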
From mss@transas.com Sat May 15 20:01:58 1999 From: mss@transas.com (Michael Sobolev) Date: Sat, 15 May 1999 23:01:58 +0400 Subject: [XML-SIG] a simple SGML question (off-topic) In-Reply-To: <004f01be9f02$2ab46910$f29b12c2@pythonware.com>; from Fredrik Lundh on Sat, May 15, 1999 at 08:38:44PM +0200 References: <Pine.GSO.3.96.990514145940.21814A-100000@saga9.Stanford.EDU> <004f01be9f02$2ab46910$f29b12c2@pythonware.com> Message-ID: <19990515230158.A8598@transas.com> On Sat, May 15, 1999 at 08:38:44PM +0200, Fredrik Lundh wrote: > #include "chapter1.sgm" > #include "chapter2.sgm" <!DOCTYPE ... [ <!ENTITY chapter1 SYSTEM "chapter1.sgm"> ]> ... &chapter1; ... But I believe your files chapter* should not have any <!doctype>s. -- Mike From fredrik@pythonware.com Sun May 16 15:05:04 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 16 May 1999 16:05:04 +0200 Subject: [XML-SIG] a simple SGML question (off-topic) References: <Pine.GSO.3.96.990514145940.21814A-100000@saga9.Stanford.EDU> <004f01be9f02$2ab46910$f29b12c2@pythonware.com> <19990515230158.A8598@transas.com> Message-ID: <006301be9fa5$3713fe70$f29b12c2@pythonware.com> Michael Sobolev <mss@transas.com> wrote: > On Sat, May 15, 1999 at 08:38:44PM +0200, Fredrik Lundh wrote: > > #include "chapter1.sgm" > > #include "chapter2.sgm" > <!DOCTYPE ... [ > <!ENTITY chapter1 SYSTEM "chapter1.sgm"> > ]> > > ... > &chapter1; > ... just what I needed! > > But I believe your files chapter* should not have any <!doctype>s. I was just about to answer that I need doctypes to be able to edit the individual chapters when I realized that my SGML editor did the right thing when I loaded the master document. extremely cool! Thanks /F From Shane.Burrell@metrostat.net Thu May 20 01:11:10 1999 From: Shane.Burrell@metrostat.net (Shane Burrell) Date: Wed, 19 May 1999 20:11:10 -0400 Subject: [XML-SIG] Anyone doing any XML for real estate? 
Message-ID: <000001bea255$438d82e0$1602a8c0@singer> Shane Burrell Software Engineer/Systems Administrator - Metrostat Technologies, Inc. From wask@mcc.com Thu May 20 16:01:42 1999 From: wask@mcc.com (wask@mcc.com) Date: Thu, 20 May 1999 10:01:42 -0500 Subject: [XML-SIG] Installing Python/XML on NT Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB2B@brazil.mcc.com> [Apologies for double posting - I sent this to the wrong reflector earlier.] Could someone point me to instructions for installing XML/Python 0.5.1 on NT? (I looked - honest, I really did - but only found UNIX instructions.) Thanks, Fred From larsga@ifi.uio.no Fri May 21 12:51:47 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 21 May 1999 13:51:47 +0200 Subject: [XML-SIG] easySAX In-Reply-To: <14128.29972.689027.500572@weyr.cnri.reston.va.us> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> Message-ID: <wk675mzlm4.fsf@ifi.uio.no> I posted an easySAX proposal here a while ago, in response to various requests for SAX extensions/changes, mainly from Paul Prescod. I've used it a little myself in the meantime and have found it to be a major improvement for direct programming compared to pure SAX. However, before I do anything more about this it would be nice to know what the rest of the XML-SIG is thinking. Would anyone be unhappy if SAX were kept as it is, only updated with minor changes and extended to follow Java SAX 2.0 AND easySAX were provided as the easy-to-use alternative, built on top of SAX? easySAX would, modulo any suggestions, be as proposed in <URL: http://www.python.org/pipermail/xml-sig/1999-May/001199.html> with start_*/end_*/pi_*/ppi_* methods. * Fred L. Drake | | It might be a good idea to use a call to extract the element stack | instead of providing direct access to the list object. This would | allow different internal structures to be used without changing the | interface. This might be interesting in some cases. 
I didn't do this out of a worry about speed, but now I think we should do this. Anyone who is concerned about speed can just take the risk and access the underlying stack directly anyway. Any other opinions? Also, do we need to do anything in particular to deal with namespaces here? Should we reserve a namespace-URI callback argument to slot them into when SAX 2.0 is in place? As for packaging, I think this should be a separate package from SAX itself. Other convenient interfaces on top of SAX are both possible and desirable, and I certainly don't want to monopolize that space with easySAX or appear to do so. If nobody protests I'll go ahead and do this, although I'd feel much easier about it if people actually voiced support for this. Andrew, do you think this belongs in the XML package? --Lars M. From Fred L. Drake, Jr." <fdrake@acm.org Fri May 21 16:51:05 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Fri, 21 May 1999 11:51:05 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <wk675mzlm4.fsf@ifi.uio.no> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> Message-ID: <14149.33001.688171.408570@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > <URL: http://www.python.org/pipermail/xml-sig/1999-May/001199.html> > > with start_*/end_*/pi_*/ppi_* methods. Lars, Have you written a version with the dispatcher to drive these methods? I don't think there's a lot of code, but if you've already written it, it might be nice for people to have a chance to play with it. This would be especially good to play with for those of us with a lot of code based on the xmllib API, which I find I still use. > Also, do we need to do anything in particular to deal with namespaces > here? Should we reserve a namespace-URI callback argument to slot them > into when SAX 2.0 is in place? Sigh. I'm very undecided about namespaces. The concept is really good, but I've shied away from using them. 
Building all the support for documents that are likely to use several (known) namespaces that all need special processing is still a pain, especially using the event-based interfaces (SAX, xmllib). I'd be more likely to use the DOM if I care about namespaces (and I almost cared about them the other day!). > As for packaging, I think this should be a separate package from SAX > itself. Other convenient interfaces on top of SAX are both possible How about xml.easysax? > If nobody protests I'll go ahead and do this, although I'd feel much > easier about it if people actually voiced support for this. Andrew, do > you think this belongs in the XML package? In general, I think it's a good idea. Perhaps the first cut can simply be a module that gets posted to the list; if it's well received, it can be added to the XML omnibus package. My name isn't Andrew, and chances are good it won't ever be, but *I* think it belongs there if people are likely to want to use it. I think we should avoid the Perl-XML problem, with lots of different packages that people need to update independently. I'd like to be able to say "this software requires the Python XML package: ftp://ftp.python.org/..." and be done with it. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From larsga@ifi.uio.no Fri May 21 17:19:10 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 21 May 1999 18:19:10 +0200 Subject: [XML-SIG] easySAX In-Reply-To: <14149.33001.688171.408570@weyr.cnri.reston.va.us> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> Message-ID: <wkpv3uxuo1.fsf@ifi.uio.no> * Lars Marius Garshol | | <URL: http://www.python.org/pipermail/xml-sig/1999-May/001199.html> | | with start_*/end_*/pi_*/ppi_* methods. * Fred L. Drake | | Have you written a version with the dispatcher to drive these | methods? Not yet. 
I was planning to do it if the proposal was accepted. | I don't think there's a lot of code, but if you've already written | it, it might be nice for people to have a chance to play with it. | This would be especially good to play with for those of us with a | lot of code based on the xmllib API, which I find I still use. I'll post it the moment I complete it, provided I do complete it at all. * Lars Marius Garshol | | Also, do we need to do anything in particular to deal with | namespaces here? Should we reserve a namespace-URI callback argument | to slot them into when SAX 2.0 is in place? * Fred L. Drake | | Sigh. I'm very undecided about namespaces. The concept is really | good, but I've shied away from using them. This is more or less my reaction as well, although I'm no fan of the form the actual final Recommendation got, nor of the place it seems to occupy in people's minds. | Building all the support for documents that are likely to use | several (known) namespaces that all need special processing is still | a pain, especially using the event-based interfaces (SAX, xmllib). This might be true, although I haven't tried or even thought much about it. Some splitting filter that sends different namespaces to different handlers might help, but I feel the basic problem is that none of these have any provision for namespaces at all. And that is what I want to avoid if we do easySAX now, since we won't be able to insert this parameter later without breaking things. But maybe we should have a separate NamespaceAwareDocumentHandler instead? One nice thing would be if the parse method automagically detected which kind of handler it received as a parameter and then either applied or did not apply namespace processing. * Lars Marius Garshol | | As for packaging, I think this should be a separate package from SAX | itself. Other convenient interfaces on top of SAX are both possible * Fred L. Drake | | How about xml.easysax? 
Sounds good to me, although I've already used the file name ezsax on my own disk. However, what I really meant was that I thought this should be a separate release, with its own home page, ZIP file and version history. * Lars Marius Garshol | | If nobody protests I'll go ahead and do this, although I'd feel much | easier about it if people actually voiced support for this. Andrew, | do you think this belongs in the XML package? * Fred L. Drake | | Perhaps the first cut can simply be a module that gets posted to the | list; if it's well received, it can be added to the XML omnibus | package. That's certainly an alternative, and it's by no means incompatible. One reason I'd like it to be more than just something posted to the list is that then it becomes easier to document it, refer to it and also to discover it for new users. Also, if this becomes widely used I suppose the more speed-conscious may want to bypass SAX entirely and write easySAX drivers on top of a parser. I think this is especially interesting in a JPython context, where it can be built on top of Java SAX. Opinions on this are welcome. Personally, I like packages that are available separately as well as a part of a bigger lump. | I think we should avoid the Perl-XML problem, with lots of different | packages that people need to update independently. I'd like to be | able to say "this software requires the Python XML package: | ftp://ftp.python.org/..." and be done with it. I certainly agree with this. --Lars M. From Fred L. Drake, Jr." <fdrake@acm.org Fri May 21 17:35:57 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. 
Drake) Date: Fri, 21 May 1999 12:35:57 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <wkpv3uxuo1.fsf@ifi.uio.no> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> Message-ID: <14149.35693.275585.44713@weyr.cnri.reston.va.us> Lars Marius Garshol writes: > I'll post it the moment I complete it, provided I do complete it at > all. If you decide not to spend time on it, send a note to the list so someone else can pick it up. > This is more or less my reaction as well, although I'm no fan of the > form the actual final Recommendation got, nor of the place it seems to > occupy in people's minds. Yes, the Rec was rather poor, both in the technical content and the writing. > about it. Some splitting filter that sends different namespaces to > different handlers might help, but I feel the basic problem is that This is what I thought might be doable and possibly workable, but it's still not entirely clear to me how to work with it. There's still a lot of setup required for the application to make things work nicely. > Sounds good to me, although I've already used the file name ezsax on > my own disk. However, what I really meant was that I thought this > should be a separate release, with its own home page, ZIP file and > version history. Hm. I think it should exist within the "xml" Python package, regardless of the external packaging. I'm not sure how multiple distributions should treat sharing of the Python package space. I do *not* like having a single module that uses different names based on the separate or omnibus distributions. This is probably something the distutils-sig should deal with. > That's certainly an alternative, and it's by no means incompatible.
> One reason I'd like it to be more than just something posted to the > list is that then it becomes easier to document it, refer to it and > also to discover it for new users. I was thinking of this as a temporary "do we really want this" approach; post it after writing, and package it if people are actually interested in it. [Andrew: Are you the only person with write access to the CVS repository? It would be easier to add things for experimental periods if it was easier to add to the repository. Whether this would be useful depends on just what place you think the omnibus package has.] -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From paul@prescod.net Fri May 21 18:51:43 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 21 May 1999 12:51:43 -0500 Subject: [XML-SIG] easySAX References: <wkvhe7le7x.fsf@ifi.uio.no> Message-ID: <37459D2F.7178EC99@prescod.net> Lars Marius Garshol wrote: > > What do people think? Is this better than adding the suggested > improvements to the SAX core? This was just hacked together in 15 > minutes, so please don't hesitate to slaughter it if you don't like > it. I'm not thrilled with the fact that it requires an explicit adapter instead of a simple base class. My counter-proposal is that easySax be a base class that defines startElement, endElement and characters. easySax "clients" would define start_Foo, end_Foo,..., startUnknown, endUnknown processingInstruction and "text", where text is defined as a Python programmer would expect: as a simple string without the index junk. What you do with captured text is highly context specific. What if we had TITLE_text, BODY_text, FOO_text and Unknowntext. Then if Unknowntext isn't defined we wouldn't be storing away little useless text snippets all of the time (e.g. if we were just looking for titles). 
-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From larsga@ifi.uio.no Fri May 21 19:50:52 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 21 May 1999 20:50:52 +0200 Subject: [XML-SIG] easySAX In-Reply-To: <14149.35693.275585.44713@weyr.cnri.reston.va.us> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> Message-ID: <wkemkaxnn7.fsf@ifi.uio.no> * Fred L. Drake | | Hm. I think it should exist within the "xml" Python package, | regardless of the external packaging. I'm not sure how multiple | distributions should treat sharing of the Python package space. I | do *not* like having a single module that uses different names based | on the separate or omnibus distributions. That wouldn't happen in any case (just as it hasn't with saxlib and xmlproc, both of which use the same names whether inside or outside the omnibus package). | I was thinking of this as a temporary "do we really want this" | approach; post it after writing, and package it if people are | actually interested in it. OK, I'll go ahead and do that. If I give up I'll notify the list. --Lars M.
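The base-class design Paul proposes (a class whose `startElement`, `endElement` and `characters` dispatch to per-element methods, buffering text only when a text method exists) can be sketched roughly as below. This is a hypothetical illustration, not Lars's easySAX code: it is written against modern Python's `xml.sax.handler.ContentHandler`, the class name `EasyHandler` and the `piUnknown` fallback are invented, and it uses the `text_TITLE`/`textUnknown` spelling rather than `TITLE_text`. Following Fred's suggestion, the element stack is exposed through a method that returns a copy rather than through the list itself.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EasyHandler(ContentHandler):
    """Hypothetical easySAX-style base class (a sketch, not the real easySAX).

    Subclasses define any of:
      start_NAME(attrs)        / startUnknown(name, attrs)
      end_NAME()               / endUnknown(name)
      text_NAME(text, attrs)   / textUnknown(name, text, attrs)
      pi_TARGET(data)          / piUnknown(target, data)
    """

    def __init__(self):
        super().__init__()
        self._stack = []   # names of the currently open elements
        self._attrs = []   # attribute dicts, parallel to _stack
        self._text = []    # text buffers, or None when nobody wants the text

    def get_stack(self):
        # Hand out a copy so the internal representation can change freely.
        return tuple(self._stack)

    def _dispatch(self, prefix, fallback, name, *args):
        method = getattr(self, prefix + name, None)
        if method is not None:
            return method(*args)
        method = getattr(self, fallback, None)
        if method is not None:
            return method(name, *args)
        return None

    def _wants_text(self, name):
        return hasattr(self, "text_" + name) or hasattr(self, "textUnknown")

    def startElement(self, name, attrs):
        attrs = dict(attrs.items())  # plain dict that outlives the callback
        self._stack.append(name)
        self._attrs.append(attrs)
        # Only buffer character data if some text_* method will consume it.
        self._text.append([] if self._wants_text(name) else None)
        self._dispatch("start_", "startUnknown", name, attrs)

    def characters(self, content):
        # Collects only the element's own character data, not its children's.
        if self._text and self._text[-1] is not None:
            self._text[-1].append(content)

    def endElement(self, name):
        buffer = self._text.pop()
        attrs = self._attrs.pop()
        if buffer is not None:
            # The element is still on the stack while its text callback runs.
            self._dispatch("text_", "textUnknown", name, "".join(buffer), attrs)
        self._stack.pop()
        self._dispatch("end_", "endUnknown", name)

    def processingInstruction(self, target, data):
        self._dispatch("pi_", "piUnknown", target, data)


class TitleCollector(EasyHandler):
    """Collects TITLE text; other elements' text is never buffered."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def text_TITLE(self, text, attrs):
        self.titles.append(text.strip())


handler = TitleCollector()
xml.sax.parseString(
    b"<DOC><TITLE>First</TITLE><BODY>skipped</BODY><TITLE>Second</TITLE></DOC>",
    handler,
)
print(handler.titles)  # ['First', 'Second']
```

Passing the attributes to the text callback, as Lars suggests, means a small leaf element can be handled entirely in one `text_*` method, and an application that only wants titles never pays for buffering the body text.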
From larsga@ifi.uio.no Fri May 21 19:51:43 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 21 May 1999 20:51:43 +0200 Subject: [XML-SIG] easySAX In-Reply-To: <37459D2F.7178EC99@prescod.net> References: <wkvhe7le7x.fsf@ifi.uio.no> <37459D2F.7178EC99@prescod.net> Message-ID: <wkd7zuxnls.fsf@ifi.uio.no> * Lars Marius Garshol | | What do people think? Is this better than adding the suggested | improvements to the SAX core? This was just hacked together in 15 | minutes, so please don't hesitate to slaughter it if you don't like | it. * Paul Prescod | | I'm not thrilled with the fact that it requires an explicit adapter | instead of a simple base class. Hmmm. Why do you see this as a disadvantage? Part of the reason I did it as I did is that I want the user to be able to redefine startElement and endElement without messing up the framework. Knowing why you don't like the adapter would make the tradeoff easier. | My counter-proposal is that easySax be a base class that defines | startElement, endElement and characters. | | easySax "clients" would define start_Foo, end_Foo,..., startUnknown, | endUnknown processingInstruction and "text", where text is defined | as a Python programmer would expect: as a simple string without the | index junk. Hmmm. I feel uneasy about the *Unknown methods, but I suppose they will have their uses. | What you do with captured text is highly context specific. What if | we had TITLE_text, BODY_text, FOO_text and Unknowntext. Then if | Unknowntext isn't defined we wouldn't be storing away little useless | text snippets all of the time (e.g. if we were just looking for | titles). Paul, thank you! I think this is the idea I've been looking for ever since I started thinking about making something like easySAX. If we pass in attributes as well here it means that for the small leaf elements (which in data-oriented XML are usually the important ones) you have all the information you need in one callback.
It would also mean that passing unsliced strings to characters in the real SAX will probably pay off, since as you say we will now only slice the strings when we actually need them. (Except I think I prefer text_TITLE and textUnknown.) I'll give the interface another rotation and then post it again. (More comments are of course very welcome.) --Lars M. From wask@mcc.com Fri May 21 20:36:21 1999 From: wask@mcc.com (wask@mcc.com) Date: Fri, 21 May 1999 14:36:21 -0500 Subject: [XML-SIG] JPython / xmllib issue ???? Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB30@brazil.mcc.com> Hello, Because I don't know how to load the NT version of the XML 0.5.1 package, I decided to "brute force it" and use xmllib - which is ok as I'm teaching myself the basics. However, I keep stumbling into a problem using xmllib from JPython, a problem I don't see using Python. Particulars are noted below for those interested. Is this a cockpit error or a JPython issue? Any help would be most appreciated. [The carrot: I work for a research firm for several large international corps. I'm trying to assess this technology's viability for incorporation into a large project.] Much thanks in advance, Fred
***** The simple XML file (should look awfully familiar) *****
<?xml version="1.0"?>
<COLLECTION>
  <COMIC TITLE="Sandman" NUMBER="62">
    <WRITER>Neil Gaman</WRITER>
    <PENCILLER PAGES="1-9, 18-24">Glyn Dillon</PENCILLER>
    <PENCILLER PAGES="10-17">Charles Vess</PENCILLER>
  </COMIC>
</COLLECTION>
***** Code snippet *****
try:
    testFile = open ('Comics.xml', 'rw')
except IOError, detail:
    print '***IO ERROR> ', detail
parser = xmllib.XMLParser ()
data = testFile.read()
parser.feed(data)
***** The problem *****
None if running Python. If running from a JPython script ---
Traceback (innermost last):
  <snip my stuff>
  File "...\xmllib.py", line 149, in feed
  File "...\xmllib.py", line 240, in goahead
  File "...\xmllib.py", line 610, in parse_starttag
IndexError: group 7 is undefined
From Fred L. Drake, Jr."
<fdrake@acm.org Fri May 21 20:40:14 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Fri, 21 May 1999 15:40:14 -0400 (EDT) Subject: [XML-SIG] JPython / xmllib issue ???? In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB30@brazil.mcc.com> References: <7836EC5266D2D211886400A0C94A7A9014BB30@brazil.mcc.com> Message-ID: <14149.46750.170061.508059@weyr.cnri.reston.va.us> wask@mcc.com writes: > If running from a JPython script --- > > Traceback (innermost last): > <snip my stuff> > File "...\xmllib.py", line 149, in feed > File "...\xmllib.py", line 240, in goahead > File "...\xmllib.py", line 610, in parse_starttag > IndexError: group 7 is undefined I get a different error: Traceback (innermost last): File "snippet.py", line 1, in ? File "/depot/java/share/JPython-1.0/Lib/xmllib.py", line 60, in ? File "/depot/java/share/JPython-1.0/Lib/string.py", line 13, in maketrans NameError: maketrans not yet implemented in JPython -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From akuchlin@cnri.reston.va.us Fri May 21 20:48:51 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 21 May 1999 15:48:51 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <14149.35693.275585.44713@weyr.cnri.reston.va.us> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> Message-ID: <14149.46016.398382.370758@amarok.cnri.reston.va.us> Fred L. Drake writes: > [Andrew: Are you the only person with write access to the CVS >repository? The public CVS tree is mirrored from the CVS tree on my machine at home, so the answer is yes. :) I certainly think easySAX would be a good addition to the XML package. It would be even better if easySAX was small enough to be added to the Python library. 
One problem is that you have to choose between using the XML package and just xmllib.py, particularly for applications that aren't aimed at XML-aware users, but simply use XML internally. For example, I'm starting work on a GUI editor for recipes using the DTD previously discussed here, and it's a difficult decision to require neophyte users to install the XML package; I may end up just using xmllib.py to parse input to avoid requiring the installation of another package. (On the other hand, the true fix for this is probably to finish the distutils work and make it much easier to install Python extensions.) -- A.M. Kuchling http://starship.python.net/crew/amk/ Considered in its entirety, psychoanalysis won't do. It is an end product, moreover, like a dinosaur or a zeppelin; no better theory can ever be erected on its ruins, which will remain for ever one of the saddest and strangest of all landmarks in the history of twentieth century thought. -- Sir Peter Medawar From Fred L. Drake, Jr." <fdrake@acm.org Fri May 21 21:02:18 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Fri, 21 May 1999 16:02:18 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <14149.46016.398382.370758@amarok.cnri.reston.va.us> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> Message-ID: <14149.48074.473510.593070@weyr.cnri.reston.va.us> Andrew M. Kuchling writes: > The public CVS tree is mirrored from the CVS tree on my > machine at home, so the answer is yes. :) Perhaps this can be moved to cvs.python.org or the starship, if you don't object? > I certainly think easySAX would be a good addition to the XML > package. 
It would be even better if easySAX was small enough to be I understood easySAX to depend on the parsers from the xml package; if it is, then adding it to the standard library won't help unless it includes a driver for xmllib (at which point you may as well just use xmllib). > added to the Python library. One problem is that you have to choose > between using the XML package and just xmllib.py, particularly for > applications that aren't aimed at XML-aware users, but simply use XML This reminds me: I still want to look at xml.parsers.xmllib to make the interface match that of xmllib. While I may complain about namespaces and the interface for them, proliferating incompatible interfaces won't help the situation. Not sure when I'll have time; I really need to update t1python now that I've updated my Linux installation at home. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From paul@prescod.net Fri May 21 22:07:45 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 21 May 1999 16:07:45 -0500 Subject: [XML-SIG] easySAX References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> Message-ID: <3745CB21.FC8A9668@prescod.net> "Fred L. Drake" wrote: > > I understood easySAX to depend on the parsers from the xml package; > if it is, then adding it to the standard library won't help unless it > includes a driver for xmllib (at which point you may as well just use > xmllib). Why? easySax allows the person to move to another parser if they want. It's an abstraction over xmllib that gives them more freedom of choice. Since I don't believe xmllib is a complete, standards-conformant parser I think that is important. 
In fact, I'd like to see easySax put on top of sgmlop and promoted as the "standard" Python/XML integration for Python 1.6. Maybe by Python 2 we would move to something larger like expat. So how about that? easySax and sgmlop in Python 1.6. xmllib's interface is deprecated. Additional parsers and handlers can be downloaded as part of the xml sig distribution? -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From Fred L. Drake, Jr." <fdrake@acm.org Fri May 21 22:55:32 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Fri, 21 May 1999 17:55:32 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <3745CB21.FC8A9668@prescod.net> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> Message-ID: <14149.54868.802029.691711@weyr.cnri.reston.va.us> Paul Prescod writes: > Why? easySax allows the person to move to another parser if they want. > It's an abstraction over xmllib that gives them more freedom of choice. I'm fine with this. > In fact, I'd like to see easySax put on top of sgmlop and promoted as the > "standard" Python/XML integration for Python 1.6. Maybe by Python 2 we This presents a very real problem: xmllib is already standard and documented, and therefore "in use". 
Deprecating it is a problem because people will need to update their code for what will probably be a mostly minimal difference (for existing code). Updating what's currently xml.parsers.xmllib to the documented xmllib interface and using that as the standard xmllib would be a big improvement, esp. with sgmlop in the core. That's not to say an additional API can't be added, but a second event-based interface is not necessarily a good idea. Perhaps a compromise API can be created which extends the xmllib interface with the pi_*(), ppi_*(), and text_*() methods? Extending the existing interface is not a problem as far as I can tell. It can still be highly efficient, especially if we allow handle_data() to be undefined. > So how about that? easySax and sgmlop in Python 1.6. xmllib's interface is > deprecated. Additional parsers and handlers can be downloaded as part of > the xml sig distribution? As long as the base easySAX can accept arbitrary backends I'm still happy. There's no reason not to allow the xmllib.XMLParser to support arbitrary backends as well, with the default being the current implementation (or something compatible). -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From gstein@lyra.org Fri May 21 23:41:51 1999 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 May 1999 15:41:51 -0700 Subject: [XML-SIG] easySAX References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> Message-ID: <3745E12F.69CE31E7@lyra.org> Fred L. Drake wrote: >... > [Andrew: Are you the only person with write access to the CVS > repository? It would be easier to add things for experimental periods > if it was easier to add to the repository. Whether this would be > useful depends on just what place you think the omnibus package has.] 
I'm set up to provide a read/write CVS repository to multiple projects and people. This could be particularly handy for non-CNRI contributors since CNRI-based repositories have access restrictions. Right now, I'm running the mod_dav project from my CVS system, but it has already been configured for more projects with per-person per-project access control. The system is also configured for sending email when checkins occur. I will happily host any Python-related or WebDAV-related project on my CVS server (and other facilities). They can live under the lyra.org, webdav.org, or pythonpros.com domains. Cheers, -g -- Greg Stein, http://www.lyra.org/ From paul@prescod.net Fri May 21 23:36:02 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 21 May 1999 17:36:02 -0500 Subject: [XML-SIG] easySAX References: <wkvhe7le7x.fsf@ifi.uio.no> <37459D2F.7178EC99@prescod.net> <wkd7zuxnls.fsf@ifi.uio.no> Message-ID: <3745DFD2.C5D08C0E@prescod.net> Lars Marius Garshol wrote: > > Hmmm. Why do you see this as a disadvantage? Part of the reason I did > it as I did is that I want the user to be able to redefine > startElement and endElement without messing up the framework. Knowing > why you don't like the adapater would make the tradeoff easier. * a tiny bit of extra overhead, * some extra typing, * the fact that the user has to explicitly invoke a bridge between easy sax and "real sax", * I would prefer using easySax to be more like using regular sax and also more like using SAX in Java, * you can't get the default implementation for, e.g. PI * it makes easySax somewhat "second class" Nothing major. Just a vague discomfort. > (Except I think I prefer text_TITLE and textUnknown.) Well English isn't your native language. :) No, actually that's fine with me. -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. 
Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From paul@prescod.net Fri May 21 23:48:37 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 21 May 1999 17:48:37 -0500 Subject: [XML-SIG] easySAX References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> <14149.54868.802029.691711@weyr.cnri.reston.va.us> Message-ID: <3745E2C5.B55CC39D@prescod.net> "Fred L. Drake" wrote: > > This presents a very real problem: xmllib is already standard and > documented, and therefore "in use". Deprecating it is a problem > because people will need to update their code for what will probably > be a mostly minimal difference (for existing code). I was thinking that deprecating it would just mean that new people would stop using it. As regex says: "This module is obsolete as of Python version 1.5; it is still being maintained because much existing code still uses it." > That's not to say an additional API can't be added, but a second > event-based interface is not necessarily a good idea. Perhaps a > compromise API can be created which extends the xmllib interface with > the pi_*(), ppi_*(), and text_*() methods? Extending the existing > interface is not a problem as far as I can tell. I kind of think that the current interface is too large and complicated already. easySax was going to be something like 6 or 8 callbacks. xmllib is already something like 16 or 17. Another option would be to merge the interfaces but deprecate all but the 6 or 8 *methods*. 
handle_charref, handle_entityref, handle_cdata and many others will never be triggered by a sax parser (even sgmlop, if it is talking to xmllib via sax). -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From Bharat" <bharatrs@vsnl.com Sun May 23 00:54:13 1999 From: Bharat" <bharatrs@vsnl.com (Bharat) Date: Sat, 22 May 1999 19:54:13 -0400 Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #297 - 2 msgs References: <199905220503.BAA28568@python.org> Message-ID: <000601bea4ba$60873aa0$1d38c5cb@bharatrs> subscribe From tismer@appliedbiometrics.com Sun May 23 17:53:19 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 23 May 1999 18:53:19 +0200 Subject: [XML-SIG] DOM toxml() method References: <Pine.GSO.3.96.990514145940.21814A-100000@saga9.Stanford.EDU> Message-ID: <3748327F.DB8CD080@appliedbiometrics.com> This is a multi-part message in MIME format. --------------71AC2DF7ECB30B2725A67297 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Jeffrey Chang wrote: > > How are people generating nicely-formatted XML from DOM trees? > > Has anyone written a function that will take a DOM tree and will insert > whitespace, where necessary, to generate a pretty XML document? It would > be almost the reverse of utils.strip_whitespace. Well, I did a little on this a while ago. But don't ask me about its current state, had a lot of other projects meanwhile... ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home --------------71AC2DF7ECB30B2725A67297 Content-Type: text/plain; charset=us-ascii; name="indenter.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="indenter.py"

# pretty printer for SAX
# CT990122
# based upon the saxutils.Canonizer code
# V.0.2 support for sgmlop which doesn't give ignorableWhitespace info

from xml.sax import saxexts, saxlib, saxutils
import string, sys

class Indenter(saxlib.HandlerBase):
    "A SAX document handler that produces indented XML output."

    def __init__(self,writer=sys.stdout, indent=2):
        self.elem_level=0
        self.writer=writer
        self.indent=indent
        self.last_level=-1
        self.buffer = "" # lazy buffer for whitespace stripping

    def processingInstruction (self,target, remainder):
        #if not target=="xml":
        self.writer.write("<?"+target+" "+remainder+"?>\n")

    def startElement(self,name,amap):
        if self.buffer: self.write_buffer()
        self.writer.write("\n"+self.indent*self.elem_level*" "+"<"+name)
        a_names=amap.keys()
        a_names.sort()
        for a_name in a_names:
            self.writer.write(" "+a_name+"=\"")
            self.write_data(amap[a_name], 1)
            self.writer.write("\"")
        self.writer.write(">")
        self.last_level = self.elem_level
        self.elem_level=self.elem_level+1

    def endElement(self,name):
        if self.buffer: self.write_buffer()
        self.elem_level=self.elem_level-1
        if self.last_level < self.elem_level:
            self.writer.write("\n"+self.indent*self.elem_level*" "+"</"+name+">")
        else:
            self.writer.write("</"+name+">")
        self.last_level = -1

    def ignorableWhitespace(self,data,start_ix,length):
        # we drop white space here.
        # self.characters(data,start_ix,length)
        pass

    def characters(self,data,start_ix,length):
        if self.elem_level>0:
            self.put_buffer(data[start_ix:start_ix+length])

    def put_buffer(self, txt):
        self.buffer = self.buffer+txt

    def write_buffer(self):
        if self.buffer:
            self.write_data(string.strip(self.buffer))
            self.buffer = ""

    def write_data(self,data, quotes=0):
        "Writes datachars to writer."
        # escape markup characters; "&" must be replaced first
        data=string.replace(data,"&","&amp;")
        data=string.replace(data,"<","&lt;")
        if quotes:
            data=string.replace(data,"\"","&quot;")
        data=string.replace(data,">","&gt;")
        self.writer.write(data)

    def endDocument(self):
        self.write_buffer()
        self.writer.write("\n")
        try:
            pass #self.writer.close()
        except NameError:
            pass # It's OK, if the method isn't there we probably don't need it

"""
Example to format a DOM:
>>> i=Indenter()
>>> p=saxexts.make_parser()
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(cStringIO.StringIO(dom.toxml()))

Example to format a file to a file, with sgmlop as parser:
>>> f=open(r'd:\tmp\test.xml',"w")
>>> i=Indenter(f)
>>> p=saxexts.make_parser("xml.sax.drivers.drv_sgmlop")
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parse(r"h:\pns\projekte\srz\roteli\birgit\sgml\praep.sgm.umgebrochen.xml")
>>> f.close()
"""

# speed comparison:
# a very minimalistic parser which just finds tags.

def indent(infile, outfile=sys.stdout, indent=2):
    split = string.split
    strip = string.strip
    join = string.join # used below, so it has to be bound here too
    if type(infile)==type(""):
        txt = infile
    else:
        txt = infile.read()
    lis = split(txt, "<")
    level = 0
    lastl = -1
    try:
        txt = strip(lis[0])
        p = 1
        while 1:
            parts = split(lis[p], ">")
            if len(parts) > 2:
                # a ">" inside the tag: glue the tag back into one item
                parts[:-1]=[join(parts[:-1], ">")]
            if parts[0][:1] != "/": # assume start tag
                outfile.write(strip(txt)+"\n"+indent*level*" "+"<"+parts[0]+">")
                txt = parts[1]
                lastl = level
                if parts[0][-1] not in "/?": # not an empty tag or PI?
                    level=level+1
            else:
                outfile.write(strip(txt))
                txt = parts[1]
                level=level-1
                if lastl < level:
                    outfile.write("\n"+indent*level*" "+"<"+parts[0]+">")
                else:
                    outfile.write("<"+parts[0]+">")
                lastl = -1
            p = p + 1
    except IndexError:
        pass
    outfile.write(txt)

--------------71AC2DF7ECB30B2725A67297-- From akuchlin@cnri.reston.va.us Mon May 24 02:33:58 1999 From: akuchlin@cnri.reston.va.us (A.M.
Kuchling) Date: Sun, 23 May 1999 21:33:58 -0400 Subject: [XML-SIG] easySAX In-Reply-To: <3745E12F.69CE31E7@lyra.org> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <3745E12F.69CE31E7@lyra.org> Message-ID: <199905240133.VAA28025@mira.erols.com> Greg Stein writes: > I'm set up to provide a read/write CVS repository to multiple projects > and people. This could be particularly handy for non-CNRI contributors > since CNRI-based repositories have access restrictions. Ah! Having more people with commit privileges would be excellent, and would probably speed development, since people could keep their own modules up to date. We can discuss the administrative details of setting this up in private e-mail. -- A.M. Kuchling http://starship.python.net/crew/amk/ Those who will not reason / Perish in the act: / Those who will not act / Perish for that reason. -- W.H. Auden, "Shorts" From Fred L. Drake, Jr." <fdrake@acm.org Mon May 24 18:00:21 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Mon, 24 May 1999 13:00:21 -0400 (EDT) Subject: [XML-SIG] easySAX In-Reply-To: <3745E2C5.B55CC39D@prescod.net> References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> <14149.54868.802029.691711@weyr.cnri.reston.va.us> <3745E2C5.B55CC39D@prescod.net> Message-ID: <14153.34213.811521.224833@weyr.cnri.reston.va.us> Paul Prescod writes: > I was thinking that deprecating it would just mean that new people would > stop using it. 
As regex says: "This module is obsolete as of Python This could be done easily enough (technically; this would still have to pass through Guido, but I think he's open to the SIG's suggestions on this stuff). > I kind of think that the current interface is too large and complicated > already. easySax was going to be something like 6 or 8 callbacks. xmllib > is already something like 16 or 17. There are a lot, but I'm not sure it's a huge problem. I don't know why there's setnomoretags(), and I'd expect setliteral() and translate_references() should be internal and undocumented. I'd also think the default behavior of handle_cdata() should be to pass the data along to handle_data(), but that's a separate issue. > Another option would be to merge the interfaces but deprecate all but the > 6 or 8 *methods*. handle_charref, handle_entityref, handle_cdata and many I'd be interested in seeing a specific synopsis for your simplified interface; perhaps just a class declaration with docstrings? -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From jack@oratrix.nl Mon May 24 20:46:31 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 24 May 1999 21:46:31 +0200 Subject: [XML-SIG] easySAX In-Reply-To: Message by "Andrew M. Kuchling" <akuchlin@cnri.reston.va.us> , Fri, 21 May 1999 15:48:51 -0400 (EDT) , <14149.46016.398382.370758@amarok.cnri.reston.va.us> Message-ID: <19990524194636.E6BFCDDE08@oratrix.oratrix.nl> Recently, "Andrew M. Kuchling" <akuchlin@cnri.reston.va.us> said: > I certainly think easySAX would be a good addition to the XML > package. It would be even better if easySAX was small enough to be > added to the Python library. Great idea! 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Mon May 24 21:13:19 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 24 May 1999 22:13:19 +0200 Subject: [XML-SIG] easySAX In-Reply-To: Message by Jack Jansen <jack@oratrix.nl> , Mon, 24 May 1999 21:46:31 +0200 , <19990524194636.E6BFCDDE08@oratrix.oratrix.nl> Message-ID: <19990524201324.137B7DDE08@oratrix.oratrix.nl> Recently, Jack Jansen <jack@oratrix.nl> said: > Great idea! Hmm, that wasn't very informative. I think I need to learn to (a) read all messages on the subject and (b) then write meaningful messages. Let me try again:-) What I would like very much is a working easySAX interface in the core distribution, set up to use the existing xmllib (which, in turn, would be marked deprecated in the manual). Loading the whole xml suite would make the other parsers available to easySAX, thereby allowing an easy upgrade path to more functionality or faster parsers or whatever. And, of course, real power users could then switch from the easysax interface to the full interface.
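The upgrade path described above (use a faster or fuller driver when one is installed, otherwise fall back to the parser that ships with Python) is what a SAX parser factory does. Here is a minimal sketch using today's standard-library `xml.sax` rather than the 1999 `saxexts` API, so treat it as an illustration under that assumption. `hypothetical_fast_driver` is a made-up module name: `xml.sax.make_parser()` skips driver modules that cannot be imported and falls back to its built-in expat reader.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Toy handler: counts start-element events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_bytes(data, handler, preferred=("hypothetical_fast_driver",)):
    """Parse XML bytes with the first usable driver.

    Each name in `preferred` should be a module providing create_parser();
    names that cannot be imported are skipped, so the standard library's
    expat driver is used as the fallback.
    """
    reader = xml.sax.make_parser(list(preferred))
    reader.setContentHandler(handler)
    reader.parse(io.BytesIO(data))
    return reader


counter = ElementCounter()
reader = parse_bytes(b"<doc><item/><item/></doc>", counter)
print(counter.count)            # 3
print(type(reader).__module__)  # xml.sax.expatreader
```

The handler code never names a parser, so installing a better driver changes which reader `make_parser()` returns without touching the handler.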
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From paul@prescod.net Tue May 25 14:23:08 1999 From: paul@prescod.net (Paul Prescod) Date: Tue, 25 May 1999 08:23:08 -0500 Subject: [XML-SIG] xmllib and easysax References: <wkvhe7le7x.fsf@ifi.uio.no> <14128.29972.689027.500572@weyr.cnri.reston.va.us> <wk675mzlm4.fsf@ifi.uio.no> <14149.33001.688171.408570@weyr.cnri.reston.va.us> <wkpv3uxuo1.fsf@ifi.uio.no> <14149.35693.275585.44713@weyr.cnri.reston.va.us> <14149.46016.398382.370758@amarok.cnri.reston.va.us> <14149.48074.473510.593070@weyr.cnri.reston.va.us> <3745CB21.FC8A9668@prescod.net> <14149.54868.802029.691711@weyr.cnri.reston.va.us> <3745E2C5.B55CC39D@prescod.net> <14153.34213.811521.224833@weyr.cnri.reston.va.us> Message-ID: <374AA43C.D28C69D4@prescod.net> Fred asked me to outline an easysax compliant xmllib extension. Well, xmllib has only one export: xmllib.XMLParser. So my idea is that all we need to do is either build on that or deprecate it. Option 1: Build on it: We can extend xmllib.XMLParser (and sgmlop!) to be SAX-compliant parsers just by adding setDocumentHandler, etc. When XMLParser.parse() is called, it would behave just like a SAX parser. If setDocumentHandler is never called then the default document handler would be an undocumented helper class that would redirect the events BACK to the xmllib.XMLParser (because xmllib.XMLParser plays the roles of both parser and event handler). All of the non-SAX methods of xmllib.XMLParser would be deprecated. Option 2: Deprecate it: Maybe it is better to deprecate all of xmllib.XMLParser instead of deprecating individual methods. If we deprecated it we would replace it with an xmllib.Parser (note the shorter name) that was SAX compliant.
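The helper class in Option 1, a handler that turns generic SAX events back into per-element method calls, can be sketched on top of today's standard-library `xml.sax`. The names `EasyHandler`, its `Parse()` method and `TitleCollector` are hypothetical, chosen only to illustrate the start_FOO/text_FOO/pi_FOO dispatch idea; they are not the easySAX or xmllib API.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EasyHandler(ContentHandler):
    """Hypothetical easySAX-style adapter (not a real library class).

    Dispatches SAX events to optional methods named start_<NAME>(attrs),
    end_<NAME>(), text_<NAME>(text) and pi_<TARGET>(data). Events with no
    matching method are silently ignored.
    """

    def __init__(self):
        super().__init__()
        self._open = []  # stack of currently open element names

    def _call(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if method is not None:
            method(*args)

    def startElement(self, name, attrs):
        self._open.append(name)
        self._call("start_", name, attrs)

    def endElement(self, name):
        self._call("end_", name)
        self._open.pop()

    def characters(self, content):
        # Text goes to the innermost open element; a parser may deliver
        # one run of text in several chunks.
        if self._open:
            self._call("text_", self._open[-1], content)

    def processingInstruction(self, target, data):
        self._call("pi_", target, data)

    def Parse(self, source):
        # One-call entry point: `source` may be a filename or a file object.
        xml.sax.parse(source, self)


class TitleCollector(EasyHandler):
    def __init__(self):
        super().__init__()
        self.titles = []

    def start_TITLE(self, attrs):
        self.titles.append("")

    def text_TITLE(self, text):
        self.titles[-1] += text


h = TitleCollector()
xml.sax.parseString(b"<doc><TITLE>Sandman</TITLE><p>x</p><TITLE>62</TITLE></doc>", h)
print(h.titles)  # ['Sandman', '62']
```

Because the dispatch lives in ordinary SAX callbacks, a subclass can still override startElement itself, at the cost of calling the base method to keep the dispatch working.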
Other stuff: Now that xmllib has a SAX-compliant parser (one way or the other), we can make a class called xmllib.handler which is a base class that implements all of the SAX methods and redirects start_FOO, text_FOO, pi_FOO, to a subclassed client (if it cares to override them) and also allows overriding of error, fatalerror, warning and so forth. I could live with the default behavior for errors and warnings being to throw an exception, I guess. We wouldn't really need to use the term "easysax" anymore. Easysax was never really an API in that we didn't expect multiple implementations for it. It was just a convenient handler base class (or adapter). I would also like the initialization of the XMLParser and handler classes to be integrated somehow. "Ordinary" sax takes too many steps in my opinion. We need to have a single line of user code that sets ALL of the sax handlers, creates the parser and parses. Perhaps

class handler:
    def Parse( self, streamOrFile, parser=None ):
        parser = parser or XMLParser()
        parser.setThis()    # i.e. setDocumentHandler( self ), setErrorHandler( self ), ...
        parser.setThat()
        if isFile( streamOrFile ):    # a filename rather than an open stream
            parser.parseFile( open( streamOrFile, "rb" ) )
        else:
            parser.parseFile( streamOrFile )

This would be used like so:

class MyHandler( xmllib.handler ):
    def text_TITLE( self, text ):
        pass  # blah

h=MyHandler()
h.Parse( "/myfile.xml" )

One neat thing about this is that we could change the Parse() implementation one day so that it used a parser that knew a lot about easysax and did not (for instance) report text and elements that we aren't going to work with *at all*. If you don't specifically ask for a parser you get the blazingly fast one. But if you want choice you've got it:

h=MyHandler()
h.Parse( "/myfile.xml", MyFavParser() )

-- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No.
351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling". - http://www.legislature.state.al.us/ALISHome.html From wask@mcc.com Tue May 25 22:48:45 1999 From: wask@mcc.com (wask@mcc.com) Date: Tue, 25 May 1999 16:48:45 -0500 Subject: [XML-SIG] Error re: the impish "imp" ????????/ Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> I've been learning to use the XML Package from Python. However, when running from JPython, I keep getting errors regarding module imp (no module named, NameError). I've got my PATH set up correctly (I'm in Windows Wonderland). Anyone else experience this problem? --- Fred From l.szyster@ibm.net Wed May 26 10:42:03 1999 From: l.szyster@ibm.net (Laurent Szyster) Date: Wed, 26 May 1999 11:42:03 +0200 Subject: [XML-SIG] Python alternates to XSLT? References: <199905080503.BAA13877@python.org> <37342925.5880AB6A@webone.com.au> Message-ID: <374BC1EB.80F85F5F@ibm.net> Hi Stuart, Stuart Hungerford wrote: > > Maybe I'm getting old, but I've become very frustrated > with using XSL and now XSLT to transform XML documents > into HTML. What tools did you use? What problem did you encounter? I'm eager to learn all this, because I plan to use XSLT as the transformation language for a mapping engine. > I'd love to be able to "call out" to Python's > regular expressions and other abilities while in the > process of transforming the XML. Yes that would be nice. And that is exactly why I used Python to write my first mapper, Ema (an EDIFACT mapper). Laurent From Fred L. Drake, Jr." <fdrake@acm.org Wed May 26 14:54:41 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. 
Drake) Date: Wed, 26 May 1999 09:54:41 -0400 (EDT) Subject: [XML-SIG] Error re: the impish "imp" ????????/ In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> References: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> Message-ID: <14155.64801.383489.780326@weyr.cnri.reston.va.us> wask@mcc.com writes: > I've been learning to use the XML Package from Python. However, when running > from JPython, I keep getting errors regarding module imp (no module named, > NameError). I've got my PATH set up correctly (I'm in Windows Wonderland). Fred, This should go to the JPython list; this question doesn't appear specific to the XML packages. If this is being raised from the XML packages, perhaps you could post a traceback and a code snippet to reproduce the error. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From Fred L. Drake, Jr." <fdrake@acm.org Wed May 26 15:04:13 1999 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 26 May 1999 10:04:13 -0400 (EDT) Subject: [XML-SIG] Location of Windows DLLs in the CVS tree Message-ID: <14155.65373.814138.260300@weyr.cnri.reston.va.us> Is there any reason not to place the Windows DLLs in the appropriate locations in the package tree for Windows? Windows users should then be able to simply unpack the distribution in the right place to use it; no copying or moving of the DLLs would be needed. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives From larsga@ifi.uio.no Wed May 26 15:55:57 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 26 May 1999 16:55:57 +0200 Subject: [XML-SIG] Error re: the impish "imp" ????????/ In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> References: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> Message-ID: <wkpv3nj2wy.fsf@ifi.uio.no> * wask@mcc.com | | I've been learning to use the XML Package from Python.
However, when | running from JPython, I keep getting errors regarding module imp (no | module named, NameError). The problem is that imp is not implemented in JPython. I assume you're using SAX, and specifically the parser factories? I know they have this problem in JPython and I seem to recall that I have a patch submitted by Geir Ove Grønmo which does not use imp and thus works in JPython. I'll try to verify that this is so and see if I can make it available to you. I'll also try to think a little about whether we should put out another SAX version while waiting for SAX2. Got to run now, --Lars M. From HCSCCS@prudential.com.my Wed May 26 19:12:06 1999 From: HCSCCS@prudential.com.my (Paul Chung Chee Soong) Date: Wed, 26 May 1999 18:12:06 Subject: [XML-SIG] Python/XML HOWTO Message-ID: <199905261012-64975@prudential.com.my> Hi. I'm Paul from Malaysia and very keen in Python. I know that you're the author of the current Python/XML HOWTO. Thanks very much for coming up for this invaluable information. Now, let's discuss my problem. I have difficulty understanding certain part of the document. For example, I can't execute the "from xml.sax import saxlib, saxexts" coz I don't have the xml.sax module. But anyway, i manage to download those components separately. The earlier code become "import saxlib, saxexts". Did I solve the problem? FYI, I'm having Python 1.52 final release. Another problem that I came across is that in Section 3.1 Starting Out. I don't seem to run the example. This is what I do..

import saxexts
if __name__ == '__main__':
    parser = saxexts.make_parser('drv_xmllib') # 1
    dh = FindIssue ('Sandman', '62') # 2
    parser.setDocumentHandler(dh) # 3
    parser.parseFile('collection.xml') # 4

Now, in the # 1, I got error when I followed your example. Therefore, I include a parameter (a driver) Next in # 4, I got problem again. In your example, your parameter is a file. What is the 'file' represents? I thought it was a xml file but it isn't right??
I'll appreciate your advice in any format possible. Thanks a lot. Let's keep Python ALIVE!! Sincerely, Paul From larsga@ifi.uio.no Thu May 27 07:41:06 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 27 May 1999 08:41:06 +0200 Subject: [XML-SIG] Python/XML HOWTO In-Reply-To: <199905261012-64975@prudential.com.my> References: <199905261012-64975@prudential.com.my> Message-ID: <wkiu9fggl9.fsf@ifi.uio.no> * Paul Chung Chee Soong | | Now, let's discuss my problem. I have difficulty understanding | certain part of the document. For example, I can't execute the "from | xml.sax import saxlib, saxexts" coz I don't have the xml.sax | module. But anyway, i manage to download those components | separately. The earlier code become "import saxlib, saxexts". Did I | solve the problem? Probably not. :) This is something that seems to confuse many newbies, so I'll try to explain. If you do like this:

C:\Mine dokumenter>python
Python 1.5.2c1 (#0, Mar 12 1999, 10:55:39) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import sys
>>> print sys.path

Python will print a list of directories. This is the list of directories in which it looks for modules you import. When you unzip the XML-SIG package (or the saxlib.zip file) somewhere on your disk (let's say c:\foo) it creates an xml directory (c:\foo\xml), below that a sax directory and below that again a drivers directory. What you need to do is to ensure that c:\foo is in the sys.path list. If it isn't you'll need to either add c:\foo to the PYTHONPATH environment variable or add it to the HKEY_LOCAL_MACHINE\Software\Python\1.5.2\PythonPath registry key. I hope that explained it for you? (If it did I think I'll brush it up a little and put it on the SAX pages as installation instructions.) | Another problem that I came across is that in Section 3.1 Starting | Out. I don't seem to run the example. This is what I do..
| 
| import saxexts
| if __name__ == '__main__':
|     parser = saxexts.make_parser('drv_xmllib') # 1
|     dh = FindIssue ('Sandman', '62') # 2
|     parser.setDocumentHandler(dh) # 3
|     parser.parseFile('collection.xml') # 4
| 
| Now, in the # 1, I got error when I followed your example. This is very likely because the parser factory has a list of drivers that looks like 'xml.sax.drivers.drv_xmllib'. So if you solve the package problem above you shouldn't need to hard-code the driver package name. (Another thing is that other programs that use SAX won't work unless they can find SAX where they expect it.) | Next in # 4, I got problem again. In your example, your parameter is | a file. What is the 'file' represents? I thought it was a xml file | but it isn't right?? It's the name of an XML file. If you want to push an XML document as a string ('My title ...') you can use the parser.reset(), parser.feed('...') and parser.close() methods. (See ) I hope this helped, --Lars M. From grove@infotek.no Thu May 27 11:32:17 1999 From: grove@infotek.no (Geir Ove Grønmo) Date: 27 May 1999 12:32:17 +0200 Subject: [XML-SIG] Error re: the impish "imp" ????????/ In-Reply-To: References: <7836EC5266D2D211886400A0C94A7A9014BB42@brazil.mcc.com> Message-ID: * Lars Marius Garshol | * wask@mcc.com | | | | I've been learning to use the XML Package from Python. However, when | | running from JPython, I keep getting errors regarding module imp (no | | module named, NameError). | | The problem is that imp is not implemented in JPython. | | I assume you're using SAX, and specifically the parser factories? I | know they have this problem in JPython and I seem to recall that I | have a patch submitted by Geir Ove Grønmo which does not use imp and | thus works in JPython. You can replace your existing saxexts.py with the one you'll find at the following location: http://www.infotek.no/~grove/saxexts.py This version adds support for JPython, using the org.python.core.imp Java class.
saxexts.py should work for both CPython and JPython.

All the best,
Geir O.

From larsga@ifi.uio.no Thu May 27 11:48:32 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 27 May 1999 12:48:32 +0200
Subject: [XML-SIG] Python/XML HOWTO
In-Reply-To:
References: <199905261012-64975@prudential.com.my>
Message-ID:

* Paul Chung Chee Soong
|
| parser.parseFile('collection.xml') # 4
|
| [...]
|
| Next in # 4, I got problem again. In your example, your parameter is
| a file. What is the 'file' represents? I thought it was a xml file
| but it isn't right??

* Lars Marius Garshol
|
| It's the name of an XML file. If you want to push an XML document as a
| string ('<root><title>My title ...') you can use the
| parser.reset(), parser.feed('...') and parser.close() methods.

Sorry, I was a bit quick here and got it wrong. The SAX parser class
has two parse methods:

 - parse(URL)
 - parseFile(file-object)

So parse takes the file name of an XML document (but it can be a URL)
and parseFile takes a file-like object (like what is returned by
urllib.urlopen and open).

Sorry about that.

--Lars M.

From hunt@cos.com Fri May 28 00:44:36 1999
From: hunt@cos.com (Huntington Williams)
Date: 27 May 99 19:44:36 -0400
Subject: [XML-SIG] NSF Grants Update
Message-ID: <199905280523.BAA0000025309@chablis.cos.com>

The National Science Foundation provided over $3.7 billion last year
for crucial research projects. Community of Science invites you to
search our FREE NSF Grants database, which lets you quickly find which
projects are being funded in areas of science and engineering that
matter to you.

The NSF Grants database is available at

  http://fundedresearch.cos.com/NSF

The database is updated weekly and contains comprehensive information
on NSF grants searchable to topic, researcher name, performing
institution, and geographical area.

I hope you find this research tool to be useful to you and your
colleagues. Please feel free to pass this e-mail along to a friend.

All best wishes,

Huntington Williams, III, D. Phil.
President, Community of Science
hunt@cos.com
http://www.cos.com

From Shane.Burrell@metrostat.net Fri May 28 06:20:05 1999
From: Shane.Burrell@metrostat.net (Shane Burrell)
Date: Fri, 28 May 1999 01:20:05 -0400
Subject: [XML-SIG] Anyone working with RETML or RETS Real Estate XML specs.
Message-ID: <000201bea8c9$becad7d0$1602a8c0@singer>

We are using python to parse both formats. If anyone is interested in
getting some transmissions going please let me know.

Thanks.

Shane Burrell
Software Engineer/Systems Administrator - Metrostat Technologies, Inc.

From Michelle Mills Strout Fri May 28 19:13:25 1999
From: Michelle Mills Strout (Michelle Mills Strout)
Date: Fri, 28 May 1999 11:13:25 -0700 (PDT)
Subject: [XML-SIG] handle_entityref not being called
Message-ID:

I am using the XMLParser class which is part of the xmllib. I would like
to define my own entity references of the form "&beta;". In the process
of doing this I found that the function handle_entityref isn't actually
called when an entity reference is called. I stepped through the
following function with the debugger.

myxmlparser.feed("&alpha;")

If handle_entityref were called I could override unknown_entityref and
do my own processing. However, now unless I put my entity references in
the entitydefs mapping I get a syntax error. I don't want to put entity
references in the entitydefs mapping because then I don't have any hooks
to do my own processing.

Upon doing a search through the xmllib.py source (version = '0.2') I
found that handle_entityref is in fact not called anywhere in that code.
Do I have the wrong version of xmllib? (I have the most recent Python
Mac version so hopefully this isn't the case).

Can anyone help me remedy this situation?
Thanks,
Michelle Strout

From wask@mcc.com Fri May 28 21:04:33 1999
From: wask@mcc.com (wask@mcc.com)
Date: Fri, 28 May 1999 15:04:33 -0500
Subject: [XML-SIG] Parser selection
Message-ID: <7836EC5266D2D211886400A0C94A7A9014BB58@brazil.mcc.com>

The XML package provides several drivers that map native parser interfaces
to SAX, thereby providing Python/JPython access to those parsers. What are
the differentiating factors between the parsers? In other words, what
features do they provide that would cause me to choose one over the other?

---
Fred

From larsga@ifi.uio.no Sat May 29 00:28:19 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 29 May 1999 01:28:19 +0200
Subject: [XML-SIG] Parser selection
In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB58@brazil.mcc.com>
References: <7836EC5266D2D211886400A0C94A7A9014BB58@brazil.mcc.com>
Message-ID:

* wask@mcc.com
|
| The XML package provides several drivers that map native parser
| interfaces to SAX, thereby providing Python/JPython access to those
| parsers. What are the differentiating factors between the parsers?
| In other words, what features do they provide that would cause me to
| choose one over the other?

Conformance, ease of use/installation, clarity of error messages,
feature set and speed are generally the deciding factors. (Not
necessarily in that order.)

If you use the parser factory in SAX you'll see that it has an ordering
of parsers more-or-less according to my personal guesses of what people
would actually prefer. If you disagree you can set your own preferences.

I'm not sure whether you're asking which parser is recommended or what
characteristics to choose a parser by. Given more information we might
be able to give more help.

--Lars M.

From akuchlin@cnri.reston.va.us Sun May 30 20:45:39 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 30 May 1999 15:45:39 -0400
Subject: [XML-SIG] Parser selection
In-Reply-To: <7836EC5266D2D211886400A0C94A7A9014BB58@brazil.mcc.com>
References: <7836EC5266D2D211886400A0C94A7A9014BB58@brazil.mcc.com>
Message-ID: <199905301945.PAA02185@mira.erols.com>

wask@mcc.com writes:
> The XML package provides several drivers that map native parser interfaces
> to SAX, thereby providing Python/JPython access to those parsers. What are
> the differentiating factors between the parsers? In other words, what
> features do they provide that would cause me to choose one over the other?

Some parsers are implemented in Python (xmlproc), others are in C
(PyExpat) and some are in a mixture of Python and C (xmllib+sgmlop).
Some parsers are fully validating (xmlproc being the only one currently)
and others are non-validating parsers (xmllib, PyExpat).

In practice, I think the choice comes down to PyExpat for applications
where validation isn't important, and xmlproc when you need validation
(though you could also use external software such as nsgmls to do
parsing and validation).

--
A.M. Kuchling          http://starship.python.net/crew/amk/
We write our names in the sand, and then the waves roll in and wash them away.
    -- The emperor Augustus in SANDMAN #30: "August"

From akuchlin@cnri.reston.va.us Sun May 30 20:56:15 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 30 May 1999 15:56:15 -0400
Subject: [XML-SIG] handle_entityref not being called
In-Reply-To:
References:
Message-ID: <199905301956.PAA02202@mira.erols.com>

Michelle Mills Strout writes:
> I am using the XMLParser class which is part of the xmllib. I would like
> to define my own entity references of the form "&beta;". In the process
> of doing this I found that the function handle_entityref isn't actually
> called when an entity reference is called. I stepped through the
> following function with the debugger.

The version of xmllib.py in Python 1.5.2 does seem to ignore
handle_entityref().
Entities are handled in the internal method
called goahead(), where, if an entity reference is found, the
self.entitydefs dictionary is checked, and, if no corresponding key is
found, the unknown_entityref() function is called. Perhaps the
handle_entityref() method is obsolete; Sjoerd?

The version of xmllib.py in the omnibus XML package is older,
and actually does call handle_entityref(), which is why I'm assuming
that the method was made vestigial (though not deleted) in version
0.2.

--
A.M. Kuchling          http://starship.python.net/crew/amk/
When John Ryder, for instance, writes "I utter valediction to the author of my
being", he means simply that he said goodbye to his mother.
    -- Robertson Davies, "How to Be a Collector"

From sjoerd@oratrix.nl Mon May 31 12:52:11 1999
From: sjoerd@oratrix.nl (Sjoerd Mullender)
Date: Mon, 31 May 1999 13:52:11 +0200
Subject: [XML-SIG] handle_entityref not being called
In-Reply-To: Your message of Sun, 30 May 1999 15:56:15 -0400. <199905301956.PAA02202@mira.erols.com>
References: <199905301956.PAA02202@mira.erols.com>
Message-ID: <19990531115212.836FC310440@bireme.oratrix.nl>

On Sun, May 30 1999 "A.M. Kuchling" wrote:

> Michelle Mills Strout writes:
> > I am using the XMLParser class which is part of the xmllib. I would like
> > to define my own entity references of the form "&beta;". In the process
> > of doing this I found that the function handle_entityref isn't actually
> > called when an entity reference is called. I stepped through the
> > following function with the debugger.
>
> The version of xmllib.py in Python 1.5.2 does seem to ignore
> handle_entityref(). Entities are handled in the internal method
> called goahead(), where, if an entity reference is found, the
> self.entitydefs dictionary is checked, and, if no corresponding key is
> found, the unknown_entityref() function is called. Perhaps the
> handle_entityref() method is obsolete; Sjoerd?

It looks like it is.
I made this change quite a long time ago, so I don't know exactly why
I made it. I think it has to do with having to rescan the result of
the substitution. This is needed to make definitions such as work.
Even if the ENTITY tag isn't recognized, you can define the lt entity
as having the value "&<" in self.entitydefs.

So the way to have your own entities is to add them to the
self.entitydefs dictionary. But watch out, the default value for that
dictionary is a class variable, so you may want to copy the old value
into an instance variable before you start modifying.

> The version of xmllib.py in the omnibus XML package is older,
> and actually does call handle_entityref(), which is why I'm assuming
> that the method was made vestigial (though not deleted) in version
> 0.2.

I deleted the definition from the TestXMLParser but forgot to delete
it from the main XMLParser.

--
Sjoerd Mullender
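Sjoerd's closing warning is an instance of a general Python pitfall: a
dictionary defined on the class and reached through self is shared by
every instance, so adding an entry in place changes it for all of them.
The sketch below illustrates the pitfall and the copy-first fix using two
hypothetical classes (SharedTableParser and CopiedTableParser) and an
illustrative entity table. Neither the class names nor the table values
come from xmllib, and this is not xmllib's own code.

```python
"""Why a class-level default dictionary should be copied before an
instance modifies it (hypothetical classes, not xmllib itself)."""

# Illustrative default entity table (hypothetical values).
DEFAULT_ENTITIES = {'lt': '<', 'gt': '>', 'amp': '&'}


class SharedTableParser:
    """Adds entities directly to the class-level table (the pitfall)."""

    entitydefs = dict(DEFAULT_ENTITIES)

    def add_entity(self, name, text):
        # self.entitydefs resolves to the class attribute here, so this
        # in-place update is visible to every instance of the class.
        self.entitydefs[name] = text


class CopiedTableParser:
    """Copies the class-level table into the instance first (the fix)."""

    entitydefs = dict(DEFAULT_ENTITIES)

    def __init__(self):
        # Bind a private copy on the instance; it shadows the class
        # attribute for this instance only.
        self.entitydefs = self.entitydefs.copy()

    def add_entity(self, name, text):
        self.entitydefs[name] = text


if __name__ == '__main__':
    a, b = SharedTableParser(), SharedTableParser()
    a.add_entity('beta', '\u03b2')
    print('beta' in b.entitydefs)  # True: the entry leaked into b

    c, d = CopiedTableParser(), CopiedTableParser()
    c.add_entity('beta', '\u03b2')
    print('beta' in d.entitydefs)  # False: d kept its own copy
```

Rebinding self.entitydefs in __init__ shadows the class attribute for
that one instance, so later additions stay private and the class-level
default remains untouched for every other instance.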