From michael@memra.com Sat Apr 4 18:09:50 1998
From: michael@memra.com (Michael Dillon)
Date: Sat, 4 Apr 1998 10:09:50 -0800 (PST)
Subject: [XML-SIG] Look what these folks are doing with XML
Message-ID:

Remote Procedure Calls using XML
http://www.scripting.com/98/04/stories/simpleCrossNetworkScript.html

No mention of Python here...
http://www.infoworld.com/cgi-bin/displayStory.pl?98043.whxml.htm

--
Michael Dillon - Internet & ISP Consulting
http://www.memra.com - E-mail: michael@memra.com

From akuchlin@cnri.reston.va.us Mon Apr 6 14:41:08 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Mon, 6 Apr 1998 09:41:08 -0400 (EDT)
Subject: [XML-SIG] Look what these folks are doing with XML
In-Reply-To:
References:
Message-ID: <199804061341.JAA09051@newcnri.cnri.reston.va.us>

Michael Dillon writes:
>Remote Procedure Calls using XML
>http://www.scripting.com/98/04/stories/simpleCrossNetworkScript.html

An informal spec of the XML encoding used is at . It doesn't seem very difficult to create an XML.pickle module that does the same thing as pickle, but using XML as the format, and then to implement XML.rpc (or whatever) on top of that.

>No mention of Python here...
>http://www.infoworld.com/cgi-bin/displayStory.pl?98043.whxml.htm

Because we haven't been sending out press releases. We should do something about that when the code is done.

So, what's everyone doing coding-wise? Things have been too quiet here lately.

A.M. Kuchling http://starship.skyport.net/crew/amk/
prompt. n. (Unix) A symbol on the screen indicating which shell is attacking you.
    -- Stan Kelly-Bootle, _The Computer Contradictionary_

From yrieck001@ntsource.com Mon Apr 6 16:53:54 1998
From: yrieck001@ntsource.com (Yale Rieck)
Date: Mon, 6 Apr 1998 10:53:54 -0500
Subject: [XML-SIG] unsubscribe]
Message-ID: <01bd6174$3368aac0$01010101@default.domain>

This is a multi-part message in MIME format.

------=_NextPart_000_0012_01BD614A.4A92A2C0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

please take me off this listserv

Thanks

------=_NextPart_000_0012_01BD614A.4A92A2C0--

From larsga@ifi.uio.no Tue Apr 7 00:16:55 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 07 Apr 1998 01:16:55 +0200
Subject: [XML-SIG] xmlproc version 0.30
In-Reply-To: <199804061341.JAA09051@newcnri.cnri.reston.va.us>
References: <199804061341.JAA09051@newcnri.cnri.reston.va.us>
Message-ID:

* Andrew Kuchling
|
| So, what's everyone doing coding-wise? Things have been too quiet
| here lately.

xmlproc version 0.30, with validation, is now released. This should be considered a beta, but it works and is reasonably complete. (It handles content model validation, all kinds of attributes except notation attributes, nested entities and even detects ambiguous content models.)

Once I get some feedback and manage to polish the code and APIs a bit more I should be able to release 0.31, at which time I hope to be ready to announce the parser on xml-dev and comp.text.sgml.

(xmlproc is at )

Nothing is happening with saxlib yet. I left the planned release for later, since a new SAX proposal will probably arrive quite soon now. However, I've been thinking of adding drivers for htmllib, sgmllib and Paul Prescod's ESIS-driven parser. I also hope to be able to make a driver for PyXMLTok.

I've got some ideas for tools for generating XML from Python, and especially from databases, but I don't think I'll be able to find the time to actually write anything for a while yet.

BTW: James Clark has now released expat, which is a newer version of XMLTok. Maybe the module should be updated?

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there."
-- Edwin Brock
http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/

From larsga@ifi.uio.no Tue Apr 7 00:18:34 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 07 Apr 1998 01:18:34 +0200
Subject: [XML-SIG] Re: Look what these folks are doing with XML
In-Reply-To:
References:
Message-ID:

* Michael Dillon
|
| Remote Procedure Calls using XML
| http://www.scripting.com/98/04/stories/simpleCrossNetworkScript.html

Maybe there's something about this I just didn't understand, but to me it seems that they could have achieved the exact same thing with CORBA much more easily and the result would probably also have been more efficient in terms of both bandwidth and speed.

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there."
-- Edwin Brock
http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/

From akuchlin@cnri.reston.va.us Tue Apr 7 14:15:18 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 7 Apr 1998 09:15:18 -0400 (EDT)
Subject: [XML-SIG] Re: Look what these folks are doing with XML
In-Reply-To:
References:
Message-ID: <199804071315.JAA03039@newcnri.cnri.reston.va.us>

Lars Marius Garshol writes:
>Maybe there's something about this I just didn't understand, but to me
>it seems that they could have achieved the exact same thing with CORBA
>much more easily and the result would probably also have been more
>efficient in terms of both bandwidth and speed.

If you already have an ORB, sure. Writing an ORB isn't trivial, though, and this provides RPC using two capabilities (XML and HTTP) that most languages already have available; you could probably implement this in a single medium-size Python module.

What I'm not sure of is the reason why a text-based format like XML would be preferable to a binary format such as Sun's RPC. Why would 1 be better than '\000\000\000\001'?

A.M.
Kuchling http://starship.skyport.net/crew/amk/
If I were meta-agnostic, I'd be confused over whether I'm agnostic or not---but I'm not quite sure if I feel *that* way; hence I must be meta-meta-agnostic (I guess).
    -- Douglas R. Hofstadter, _Gödel, Escher, Bach_

From ken@bitsko.slc.ut.us Tue Apr 7 14:21:16 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 07 Apr 1998 08:21:16 -0500
Subject: [XML-SIG] Re: Look what these folks are doing with XML
In-Reply-To: Lars Marius Garshol's message of 07 Apr 1998 01:18:34 +0200
References:
Message-ID:

Lars Marius Garshol writes:

> * Michael Dillon
> |
> | Remote Procedure Calls using XML
> | http://www.scripting.com/98/04/stories/simpleCrossNetworkScript.html
>
> Maybe there's something about this I just didn't understand, but to
> me it seems that they could have achieved the exact same thing with
> CORBA much more easily and the result would probably also have been
> more efficient in terms of both bandwidth and speed.

That's true in part, but CORBA is also a lot more strict, tightly coupled, and not directly intended for ``lightweight'' data. One _could_ use CORBA to build a foundation for something lighter, but it turns out that you can encode the higher level directly just as easily (like pickling).

That said, XML still isn't a very efficient encoding for this either; the overhead in parsing arbitrary XML is enormous. I think it's a good idea to start at the top level using XML, so any scripter or coder with an XML parser can participate. Then you formalize your structures a bit and allow clients and servers to negotiate a more efficient encoding.

With that said, note that the above URL is only talking about simple remote procedure calls with a small list of data types. They've yet to discuss any OO concepts, much less the semantic layering that something like Apple's Open Scripting Architecture allows for (Dave has mentioned elsewhere that he'd like to support OSA-style messaging).
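
As a rough illustration of the text-versus-binary tradeoff raised in this thread, the sketch below encodes the integer 1 both ways and compares sizes. It assumes a modern Python 3 and its standard xmlrpc.client module, which implements the later, standardized XML-RPC and not necessarily the 1998 draft under discussion; the method name examples.echo is hypothetical.

```python
import struct
import xmlrpc.client

# Text encoding: an XML-RPC request carrying the single integer 1.
# "examples.echo" is a made-up method name used only for illustration.
request = xmlrpc.client.dumps((1,), methodname="examples.echo")
xml_bytes = request.encode("utf-8")

# Binary encoding in the XDR style: a 4-byte big-endian signed integer.
xdr_bytes = struct.pack(">i", 1)

print(len(xml_bytes), "bytes as XML text")
print(len(xdr_bytes), "bytes as packed binary")

# Both forms decode back to the same value.
params, method = xmlrpc.client.loads(request)
decoded_binary = struct.unpack(">i", xdr_bytes)[0]
```

The packed form is exactly the four bytes '\000\000\000\001' quoted earlier in the thread. The XML form is many times larger, but it names the method and can be read without any tools, which is the usual argument for starting with a text encoding and negotiating something more compact later.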
--
Ken MacLeod
ken@bitsko.slc.ut.us

From dave@scripting.com Tue Apr 7 15:16:14 1998
From: dave@scripting.com (Dave Winer)
Date: Tue, 07 Apr 1998 07:16:14 -0700
Subject: [XML-SIG] Re: XML RPC proposal: Why XML?
In-Reply-To: <199804071359.JAA04235@newcnri.cnri.reston.va.us>
Message-ID: <3.0.5.32.19980407071614.00a9f340@scripting.com>

Andrew, thanks for getting in touch!

We'll have a spec soon. This stuff is moving quickly and since we're implementing as we go, there are brief outages as our focus moves around. Should be a week at most before we have more details on the site.

In my opinion, the most important reason to use XML instead of a binary format is readability. Our feeling was that the format would gain more acceptance quickly if you didn't need special tools to discover the method names, parameters and returned results.

We learned, in our experience with Macintosh interapplication communication, that developers lose interest in this stuff, or don't document their interfaces well enough. If the messages themselves are in ASCII we stand a better chance of understanding how the calls work.

It's a low-tech tradeoff, easy to understand, and you burn a few cycles on each call as the price for being understandable.

Further, once the bridges are working, we can optimize as necessary. We can build a compatible network using XML encoding, bridge all the OSes, environments and languages, and then optimize. But I strongly believe it has to start simple if it's going to gain traction in the various communities.

And finally, thank you again for getting in touch. Setting up the communication links between the communities is the next thing to do. The Python world is a strong one that we respect and want to work with.

Dave Winer

PS: I've cc'd this to our Frontier-XML list. Perhaps we could get one person from each of our lists to join the other so we can stay informed on progress each of us is making?

PPS: I also really want to talk about interfaces between scripting environments and scripting languages. We want to run Python in our world but don't want to do source integration.

At 09:59 AM 4/7/98 -0400, you wrote:
>Good day!
>
>I'm the owner of the Python XML Special Interest Group, and was
>interested in your RPC over HTTP proposal. It doesn't look very
>difficult to implement, once you've made the DTD, or at least a more
>complete informal spec available. For example, what are all the
>data types supported? How are errors signalled? Et cetera...
>
> However, one thing I'm wondering about: there's already a
>binary RPC encoding from Sun, XDR (External Data Representation),
>described in RFC 1832. This encoding would be more compact, and
>wouldn't require an XML parser to decode, since it would just be a
>matter of gluing bytes together. XML is obviously useful for
>long-term data representation and storage; I have more difficulty
>seeing why it's worth the processing time for transient messages.
>Simply because it gives you introspective information about the types
>of things, perhaps? (And most scripting languages are dynamically
>typed, so that's vital, but it could be added with XDR by sending type
>identifiers along with the data.)
>
> BTW, I may forward your response to xml-sig@python.org, where
>this is being bounced around. (Feel free to CC: your reply there, if
>you don't mind possibly getting entangled in a discussion.)
>
>
>A.M. Kuchling http://starship.skyport.net/crew/amk/
>It is easy---terribly easy---to shake a man's faith in himself. To take
>advantage of that to break a man's spirit is devil's work. Take care of what
>you are doing. Take care.
> -- G.B.
Shaw, _Candida_
>

From larsga@ifi.uio.no Sat Apr 11 13:20:01 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 11 Apr 1998 14:20:01 +0200
Subject: [XML-SIG] SAX update
Message-ID:

David Megginson, the maintainer of the SAX API, has posted a pre-release of the new interface at with the following caveat:

"**Please** do not release new versions of your software based on this: while I don't expect major changes, I would like to take a one- or two-week bug-fixing period before we release this to the world at large."

A similar pre-release of the Python translation is at with the same caveat.

In the translation, AttributeList has become a mix of list and hash, which I'm not entirely sure is a good thing, and I've left out some overloaded methods. Feedback would be most welcome if anyone has an opinion on this.

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there."
-- Edwin Brock
http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/

From Jack.Jansen@cwi.nl Tue Apr 14 10:09:30 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Tue, 14 Apr 1998 11:09:30 +0200
Subject: [XML-SIG] Re: Look what these folks are doing with XML
In-Reply-To: Message by Lars Marius Garshol , 07 Apr 1998 01:18:34 +0200 ,
Message-ID:

> * Michael Dillon
> |
> | Remote Procedure Calls using XML
> | http://www.scripting.com/98/04/stories/simpleCrossNetworkScript.html
>
> Maybe there's something about this I just didn't understand, but to me
> it seems that they could have achieved the exact same thing with CORBA
> much more easily and the result would probably also have been more
> efficient in terms of both bandwidth and speed.

What I like about this idea is that, if I understand it correctly, you can have a completely text-based encoding of your data structures, one that is moreover fully described in your DTD.
So, if we invent a python-data-DTD and they have a frontier-data-DTD we could use standard XML tools to turn the one format into the other.

Any volunteers to write the DTD and implement an xmlpickle?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From akuchlin@cnri.reston.va.us Tue Apr 14 17:56:41 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 14 Apr 1998 12:56:41 -0400 (EDT)
Subject: [XML-SIG] Re: Look what these folks are doing with XML
In-Reply-To:
References:
Message-ID: <13619.37998.190812.387569@newcnri.cnri.reston.va.us>

Jack Jansen writes:
>Any volunteers to write the DTD and implement an xmlpickle?

This brings to mind something I've been wondering about. What's the best way to create XML text from scratch? Obviously it's simplest to do print "", contents, "", but that's doing everything from scratch. Is it possible to create a DOM tree and then convert it to XML?

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
To hate is to study, to study is to understand, to understand is to appreciate, to appreciate is to love. So maybe I'll end up loving your theory.
    -- John A. Wheeler

From digitome@pop3.mail.demon.net Tue Apr 14 19:36:33 1998
From: digitome@pop3.mail.demon.net (Sean Mc grath)
Date: Tue, 14 Apr 1998 19:36:33 +0100
Subject: [XML-SIG] Re: Look what these folks are doing with XML
Message-ID: <1.5.4.32.19980414183633.008dedd0@gpo.iol.ie>

At 12:56 PM 4/14/98 -0400, you wrote:
>Jack Jansen writes:
>>Any volunteers to write the DTD and implement an xmlpickle?
>
> This brings to mind something I've been wondering about.
>What's the best way to create XML text from scratch? Obviously it's
>simplest to do print "", contents, "", but that's doing
>everything from scratch. Is it possible to create a DOM tree and then
>convert it to XML?

Yes.
DOM is read/write, so in theory you can build an entire XML document from scratch using the DOM interface and then serialise the structure with a simple "save" command of some sort.

In my flaky Lumberjack[1] SGML/XML processing toolkit you can do the same. The DumpXML() object will serialize any Lumberjack tree structure to XML. These trees can be created by parsing SGML/XML or created programmatically.

Hmmm, How about an "__xml__" method?...

[1]I'm a Lumberjack and I'm okay. -- Monty Python's Flying Circus

Sean Mc Grath
http://www.digitome.com
"there are two types of people in the world. Those who think the world consists of two types of people, and those who don't"

From Fred L. Drake, Jr."
References: <1.5.4.32.19980414183633.008dedd0@gpo.iol.ie>
Message-ID: <13619.50812.229055.719057@weyr.cnri.reston.va.us>

Sean Mc grath writes:
> How about an "__xml__" method?...

I presume you mean an __xml__() method that creates the corresponding XML for the object, and returns it as a string?

Frankly, I think that's the wrong approach. The Pickler/Unpickler classes should be able to do all the XML generation/processing, and use the existing pickling protocol. I think this would be very reasonable, and actually not too hard to do. Perhaps the entire pickling/unpickling process could be separated into a frontend/backend structure, so that the intermediate structure could be spat out in XML, the current pickle format, proprietary formats, etc.

Of course, the performance problem requires separate, dedicated implementations. ;-(

-Fred

--
Fred L. Drake, Jr.
fdrake@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive Reston, VA 20191

From akuchlin@cnri.reston.va.us Tue Apr 14 22:08:07 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 14 Apr 1998 17:08:07 -0400 (EDT)
Subject: [XML-SIG] Re: Look what these folks are doing with XML
In-Reply-To: <13619.50812.229055.719057@weyr.cnri.reston.va.us>
References: <1.5.4.32.19980414183633.008dedd0@gpo.iol.ie> <13619.50812.229055.719057@weyr.cnri.reston.va.us>
Message-ID: <13619.52737.97326.408307@newcnri.cnri.reston.va.us>

Fred L. Drake writes:
> Frankly, I think that's the wrong approach. The Pickler/Unpickler
>classes should be able to do all the XML generation/processing, and
>use the existing pickling protocol. I think this would be very
>reasonable, and actually not too hard to do. Perhaps the entire

Looking at the current implementation of the Pickler and Unpickler classes, you'd have to override a *lot* of methods for XMLPickler, because many methods refer to module-level string variables containing the type codes. I asked Guido about this, and he thought it would be better to write the required marshal or pickle functionality from scratch in a class of its own.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
The world will never starve for wonders; but only for want of wonder.
    -- G.K. Chesterton

From Fred L. Drake, Jr."
References: <1.5.4.32.19980414183633.008dedd0@gpo.iol.ie> <13619.50812.229055.719057@weyr.cnri.reston.va.us> <13619.52737.97326.408307@newcnri.cnri.reston.va.us>
Message-ID: <13619.53648.540621.146501@weyr.cnri.reston.va.us>

Andrew Kuchling writes:
> XMLPickler, because many methods refer to module-level string
> variables containing the type codes. I asked Guido about this, and he
> thought it would be better to write the required marshal or pickle
> functionality from scratch in a class of its own.

I think that's what I was suggesting in the last sentence of my note.
It would be as easy or easier to write a completely separate implementation that had the same interfaces.

-Fred

--
Fred L. Drake, Jr.
fdrake@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive Reston, VA 20191

From fermigie@math.jussieu.fr Wed Apr 15 13:41:46 1998
From: fermigie@math.jussieu.fr (Stefane Fermigier)
Date: Wed, 15 Apr 1998 14:41:46 +0200
Subject: [XML-SIG] new snapshot release of pydom.
In-Reply-To: <1.5.4.32.19980414183633.008dedd0@gpo.iol.ie>; from Sean Mc grath on Tue, Apr 14, 1998 at 07:36:33PM +0100
References: <1.5.4.32.19980414183633.008dedd0@gpo.iol.ie>
Message-ID: <19980415144146.43973@riemann.math.jussieu.fr>

Hi,

I've worked a little on pydom last weekend.

There is a new builder that parses HTML, built on top of sgmllib.

There is a new module, pydom.py, which lets you build DOM trees using Python syntax, e.g.:

doc = HTML(HEAD(TITLE('My Title')), BODY(P('First paragraph')))

The whole thing is at

There is also an attempt at an ASP-style writer (in writer.py) but the project was dropped midway.

Regards,

S.

--
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).

From gstein@exchange.microsoft.com Wed Apr 15 23:55:03 1998
From: gstein@exchange.microsoft.com (Greg Stein (Exchange))
Date: Wed, 15 Apr 1998 15:55:03 -0700
Subject: [XML-SIG] Re: XML RPC proposal: Why XML?
Message-ID: <69D8143E230DD111B1D40000F848584001E544AF@ED>

Dave's comment on readability is a big bonus. I designed (with a few others) and implemented a protocol for MUDs to talk to each other. The protocol has been in use for about three years now by hundreds and hundreds of MUDs over that time. One of the biggest reasons that it was adopted, implemented, and used was the fact that it was readable text rather than binary (we did allow for binary to be used in the future, but nobody has bothered).

Another reason for XML over other text formats is the prevalence of common, standardized tools for dealing with XML.
For example, every Win32 machine with IE4 on it has XML support built in (msxml.dll). It's hard to find another text processor that is widely installed.

-g

From fleck@xmailer.informatik.uni-bonn.de Fri Apr 17 04:03:40 1998
From: fleck@xmailer.informatik.uni-bonn.de (Markus Fleck)
Date: Fri, 17 Apr 1998 05:03:40 +0200 (MET DST)
Subject: [XML-SIG] Mozilla "Raptor" Project - New Embeddable Layout Engine
Message-ID: <199804170303.FAA13034@sokrates.informatik.uni-bonn.de>

FYI:

Mozilla (aka free Netscape Navigator) will get a new layout engine called "Raptor", with incremental rendering, CSS, DOM Level 1 and RDF support. It is also intended to be embeddable via a "WebWidget" API. Further info at:

It *might* be worthwhile to consider embedding "Raptor" in a Python GUI, rather than the other way around.

Something else: Netscape have posted their "module owners" information. James Clark is one of the few non-Netscape people who is mentioned there (for the XML module; yet, not as the primary "owner"/maintainer, but as "peer"). Info at .

Yours,
Markus.

--
///////////////////////////////////////////////////////////////////////////
Today's excuse: Repeated reboots of the system failed to solve problem
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

From akuchlin@cnri.reston.va.us Fri Apr 17 14:56:54 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Fri, 17 Apr 1998 09:56:54 -0400 (EDT)
Subject: [XML-SIG] Converting to saxlib.py
Message-ID: <13623.22755.564226.37118@newcnri.cnri.reston.va.us>

As a trial, I'm converting the xmllib-based script I use for processing my quotation collections to both SAX and DOM interfaces. I did the SAX translation using the last experimental release of saxlib.zip, and came across a few things:

* drv_xmllib.py doesn't override xmllib.py's unknown_entityref and unknown_charref functions, so errors like &asdfasdf; are quietly ignored.

* The parse() method takes a systemID and does urllib.urlopen() on it.
Shouldn't there also be a way to pass in a file-like object, and just have the parser read from it? This could be done either by adding an optional 'infile' parameter to parse() and only using urlopen() if it's omitted, or by adding a completely different method.

* I've never written a formal DTD for my quotations; currently, the files look like this:

One trouble with being efficient is that it makes everybody hate you so.
Bob Edwards
The Calgary Eyeopener, March 18, 1916
...

The new xmllib.py in Python 1.5.1, and the DOM package, both complain about having multiple top-level objects. This also made my existing script break, so I put everything inside a element. Anyway, saxlib.py doesn't complain about this; should it, or should that be left to the handleElement function written by the user?

Beyond that, the conversion from using xmllib.py to the SAX driver was fairly simple. Next step: writing a DOM version...

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
Thus the metric system did not really catch on in the States, unless you count the increasing popularity of the nine-millimeter bullet.
    -- Dave Barry

From fermigie@math.jussieu.fr Fri Apr 17 15:10:08 1998
From: fermigie@math.jussieu.fr (Stefane Fermigier)
Date: Fri, 17 Apr 1998 16:10:08 +0200
Subject: [XML-SIG] new snapshot release of pydom.
Message-ID: <19980417161008.45646@riemann.math.jussieu.fr>

I sent this message 2 days ago but didn't receive any confirmation.

----- Forwarded message from Stefane Fermigier -----

Message-ID: <19980415144146.43973@riemann.math.jussieu.fr>
Date: Wed, 15 Apr 1998 14:41:46 +0200
From: Stefane Fermigier
To: xml-sig@python.org

Hi,

I've worked a little on pydom last weekend.

There is a new builder that parses HTML, built on top of sgmllib.

There is a new module, pydom.py, which lets you build DOM trees using Python syntax, e.g.:

doc = HTML(HEAD(TITLE('My Title')), BODY(P('First paragraph')))

The whole thing is at

There is also an attempt at an ASP-style writer (in writer.py) but the project was dropped midway.

Regards,

S.

----- End forwarded message -----

--
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).
Mathematician, hacker, bassist. http://www.math.jussieu.fr/~fermigie/
"Deep Hack Mode -- that mysterious and frightening state of consciousness where Mortal Users fear to tread. Very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your computer being struck by lightning." Matt Welsh.

From larsga@ifi.uio.no Fri Apr 17 15:41:52 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1998 16:41:52 +0200
Subject: [XML-SIG] new snapshot release of pydom.
In-Reply-To: <19980417161008.45646@riemann.math.jussieu.fr>
References: <19980417161008.45646@riemann.math.jussieu.fr>
Message-ID:

* Stefane Fermigier
|
| I've worked a little on pydom last weekend.

I used PyDOM in an XML->HTML conversion script and when I saw that PyDOM 0.1 was out I tried replacing it. It worked without a hitch.

| There is a new builder that parses HTML, built on top of sgmllib.

Just so you know: I plan to add SAX drivers for sgmllib, htmllib and Paul Prescod's ESIS module once the new revision of SAX is finalized.

| There is a new module, pydom.py, which lets you build DOM trees using
| Python syntax, e.g.:
|
| doc = HTML(HEAD(TITLE('My Title')), BODY(P('First paragraph')))

Sounds really interesting. I'll look at it as soon as I can.

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there."
-- Edwin Brock
http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/

From larsga@ifi.uio.no Fri Apr 17 15:50:23 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 17 Apr 1998 16:50:23 +0200
Subject: [XML-SIG] Converting to saxlib.py
In-Reply-To: <13623.22755.564226.37118@newcnri.cnri.reston.va.us>
References: <13623.22755.564226.37118@newcnri.cnri.reston.va.us>
Message-ID:

* Andrew Kuchling
|
| As a trial, I'm converting the xmllib-based script I use for
| processing my quotation collections to both SAX and DOM interfaces.
| I did the SAX translation using the last experimental release of
| saxlib.zip,

Just so you know: SAX will change. Hopefully the new spec will be ready next week.

| * drv_xmllib.py doesn't override xmllib.py's unknown_entityref
| and unknown_charref functions, so errors like &asdfasdf; are quietly
| ignored.

Will fix that in the next version.

| * The parse() method takes a systemID and does
| urllib.urlopen() on it. Shouldn't there also be a way to pass in a
| file-like object, and just have the parser read from it?

It seems there will be in the new SAX revision, but there is some argument on how to handle the distinction between byte and character streams. I have an idea, but I'm not sure I'll have the time to sketch it out before the spec is frozen.

| This could be done either by adding an optional 'infile' parameter
| to parse() and only using urlopen() if it's omitted, or by adding a
| completely different method.

I think this is useful, too, and will probably add it once the spec is finalized.

| The new xmllib.py in Python 1.5.1, and the DOM package, both
| complain about having multiple top-level objects. This also made my
| existing script break, so I put everything inside a
| element. Anyway, saxlib.py doesn't complain about this; should it,
| or should that be left to the handleElement function written by the
| user?

It should definitely not be detected by the user, nor should this really be a matter for the driver.
This should be detected by the parser and the report just passed on from the driver. I remember being vaguely muddled about the error reporting, but never resolved it since it was just a quick sketch anyway.

Thanks for the report (I'm a bit terse here, but I'm in a hurry right now). I'll make sure these issues are taken care of when I do the final version.

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there."
-- Edwin Brock
http://www.stud.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/

From fredrik@pythonware.com Sat Apr 18 14:12:26 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Sat, 18 Apr 1998 15:12:26 +0200
Subject: [XML-SIG] A small SGML question
Message-ID: <01bd6acb$a22ff840$f29b12c2@panik.pythonware.com>

(Assuming that I'm about the only one here that doesn't have an SGML specification handy...)

I've put together an accelerator plugin for the standard sgmllib/xmllib module (sgmlop). It still has a few warts, but I hope to get around to shipping it before the end of next week.

However, I have a little problem: how to handle the following case:

NS, MSIE, and the current version of sgmllib all treat ">" as end of tag, even if it's quoted. The specs I've been looking at seem to hint that this is not the correct behaviour. Who's right?

Cheers /F

From papresco@technologist.com Sat Apr 18 15:11:53 1998
From: papresco@technologist.com (Paul Prescod)
Date: Sat, 18 Apr 1998 10:11:53 -0400
Subject: [XML-SIG] A small SGML question
References: <01bd6acb$a22ff840$f29b12c2@panik.pythonware.com>
Message-ID: <3538B4A8.1EDD783B@technologist.com>

In general, Netscape and MSIE never give correct behaviours.

Fredrik Lundh wrote:
...
> NS, MSIE, and the current version of sgmllib all treat ">" as end of tag,
> even if it's quoted. The specs I've been looking at seem to hint that this is
> not the correct behaviour. Who's right?
As in Python, string literals must have both a start and end quote character or else they "run-on" to the end of the document. The free way to check these things is to use nsgmls from http://www.jclark.com/sp

]>

>As in Python, string literals must have both a start and end quote
>character or else they "run-on" to the end of the document.

Thought so...

Now, should we fix sgmllib, or should I add a "compatibility mode" to the accelerator module... (I don't like modes, but I don't want to break existing apps just because they want to run faster... Another tweak in the accelerator module is that it currently throws away linefeeds just after and just before tags; sgmllib doesn't...)

Opinions? (I have a slight fever and a terrible headache, so I don't have any of my own today ;-)

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com

From akuchlin@cnri.reston.va.us Sun Apr 19 21:40:49 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Sun, 19 Apr 1998 16:40:49 -0400 (EDT)
Subject: [XML-SIG] DOM notes, and xml.marshal module
Message-ID: <199804192040.QAA01303@mira.erols.com>

I've written a first cut at a marshal module that converts a simple Python data structure to and from a simple XML representation, using the DOM implementation. The code's included below. Some notes:

* There's one problem with xml.marshal at the moment; you can't pickle multiple objects to the same stream because, when you read the data again, the parser doesn't read one data item and stop, but reads them all.

For example, None is converted to a tag; if you pickle None to the same file object twice, you get . But when you parse this, the parser builds a tree containing both tags. If an XML document must contain a single top-level element, then I think parsers should recognize when that top-level element has been completed and stop.

Any thoughts on this question? What's the correct behaviour?

* The Walker class's walk1() method isn't consistent in returning values.
walk() does "return self.walk1()", but walk1() never returns anything; this should probably be fixed. For xml.marshal, I therefore overrode the walk1() method, but I'm not sure that's how Walker is intended to be used.

On the other hand, unmarshalling using just startElement(), endElement(), and doText() would have been more complicated, so overriding was the easiest thing to do.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
Despair says little, and is patient.
    -- From SANDMAN: "Season of Mists", episode 0

# xml.marshal : Marshals simple Python data types into an XML-based
# format. The interface is the same as the built-in module of the
# same name, with four functions:
#   dump(value, file), load(file)
#   dumps(value), loads(string)

# XXX Should provide a DTD for the XML format here.

from xml.dom.builder import Builder
from xml.dom.writer import XmlWriter, XmlLineariser
from types import *

# Dictionary mapping some of the simple types to the corresponding tag
_mapping = {StringType:'string', IntType:'int', FloatType:'float'}

# Internal function; recursively marshals a simple Python data type,
# acting on a DOM Builder object.
def _marshal(value, tree):
    t = type(value)
    if _mapping.has_key( t ):
        name = _mapping[t]
        tree.startElement(name, {})
        tree.text( str(value) )
        tree.endElement(name)
    elif t == LongType:
        tree.startElement('long', {})
        tree.text( str(value)[:-1] )    # Chop off the trailing 'L'
        tree.endElement('long')
    elif t == TupleType:
        tree.startElement('tuple', {})
        for elem in value:
            _marshal(elem, tree)
        tree.endElement('tuple')
    elif t == ListType:
        tree.startElement('list', {})
        for elem in value:
            _marshal(elem, tree)
        tree.endElement('list')
    elif t == DictType:
        tree.startElement('dict', {})
        for key, v in value.items():
            _marshal(key, tree)
            _marshal(v, tree)
        tree.endElement('dict')
    elif t == NoneType:
        tree.startElement('none', {})
        tree.endElement('none')
    elif t == ComplexType:
        tree.startElement('complex', {})
        tree.startElement('real', {})
        tree.text( str(value.real) )
        tree.endElement('real')
        tree.startElement('imag', {})
        tree.text( str(value.imag) )
        tree.endElement('imag')
        tree.endElement('complex')
    elif t == CodeType:
        # The full information about code objects is only available
        # from the C level, so we'll use the built-in marshal module
        # to convert the code object into a string, and include it in
        # the XML.
        import marshal
        tree.startElement('code', {})
        tree.text( marshal.dumps(value) )
        tree.endElement('code')
    return tree

# The following class walks over a DOM tree, constructing the Python
# data objects for each node.
# XXX This was done by subclassing Walker and overriding the walk1()
# method; is this the way Walker is supposed to be used?
from xml.dom.walker import Walker
from xml.dom.core import *

class UnmarshallingWalker(Walker):
    def walk1(self, node):
        assert node.NodeType == ELEMENT
        n = node.tagName
        if n == 'tuple' or n=='list':
            L = []
            children = node.getChildren()
            children = filter(lambda x: x.NodeType == ELEMENT, children)
            for child in children:
                if child.NodeType == ELEMENT:
                    L.append( self.walk1(child) )
            if n == 'tuple': return tuple (L)
            else: return L
        elif n == 'dict':
            d = {}
            children = node.getChildren()
            children = filter(lambda x: x.NodeType == ELEMENT, children)
            assert (len(children) % 2) ==0
            for i in range(0, len(children), 2):
                key = self.walk1(children[i])
                value = self.walk1(children[i+1])
                d[key] = value
            return d
        elif n=='none':
            return None
        elif n=='complex':
            children = node.getChildren()
            children = filter(lambda x: x.NodeType == ELEMENT, children)
            assert len(children) == 2
            real = self.walk1(children[0])
            imag = self.walk1(children[1])
            return complex(real, imag)
        elif n == 'code':
            children = node.getChildren()
            assert len(children) == 1
            child = children[0]
            assert child.NodeType == TEXT
            data = child.data
            import marshal
            return marshal.loads(data)
        elif n == 'string':
            d = ""
            children = node.getChildren()
            for child in children:
                assert child.NodeType == TEXT
                d = d + child.data
            return d
        else:
            children = node.getChildren()
            assert len(children) == 1
            child = children[0]
            assert child.NodeType == TEXT
            data = child.data
            if n == 'int': return int(data)
            elif n == 'long': return long(data)
            elif n == 'float' or n=='real' or n=='imag': return float(data)

def dump(value, file):
    "Write the value on the open file"
    builder = _marshal(value, Builder() )
    w = XmlWriter( file )
    w.newline_after_start = ['list', 'tuple', 'dict']
    w.newline_after_end = ['list', 'tuple', 'dict', 'none', 'int']
    w.write(builder.document)

def load(file):
    "Read one value from the open file"
    import xml.sax.saxlib, xml.sax.drv_xmllib
    from xml.dom.sax_builder import SaxBuilder
    p = xml.sax.drv_xmllib.SAX_XLParser()
    dh = SaxBuilder()
    p.setDocumentHandler(dh)
    p.parse('', file)
    u = UnmarshallingWalker()
    return u.walk(dh.document)

def dumps(value):
    "Marshal value, returning the resulting string"
    builder = _marshal(value, Builder() )
    w = XmlLineariser( )
    w.newline_after_start = ['list', 'tuple', 'dict']
    w.newline_after_end = ['list', 'tuple', 'dict', 'none', 'int', 'long', 'float', 'complex', 'string']
    return w.linearise( builder.document )

def loads(string):
    "Read one value from the string"
    import StringIO
    file = StringIO.StringIO(string)
    return load(file)

if __name__ == '__main__':
    print "Testing XML marshalling..."
    L=[None, 1, pow(2,123L), 19.72, 1+5j, "here is a string ", (1,2,3), ['alpha', 'beta', 'gamma'], {'key':'value', 1:2}, dumps.func_code ]

    # Try all the above bits of data
    import StringIO
    print "The second and third numbers in each line should both be 1."
    for item in L + [ L ]:
        s = dumps(item)
        output = loads(s)

        # Try it from a file
        file = StringIO.StringIO()
        dump(item, file)
        file.seek(0)
        output2 = load(file)

        # Verify that the parser only reads as far as is required
        # XXX this test currently fails (see text of posting)
        ##file = StringIO.StringIO( 2 * dumps(item) )
        ##print file.getvalue()
        ##output3 = load( file )
        ##output4 = load( file )

        print repr(item), item==output, item==output2

From guido@CNRI.Reston.Va.US Mon Apr 20 17:29:57 1998
From: guido@CNRI.Reston.Va.US (Guido van Rossum)
Date: Mon, 20 Apr 1998 12:29:57 -0400
Subject: [XML-SIG] Re: DOM notes, and xml.marshal module
In-Reply-To: Your message of "Mon, 20 Apr 1998 12:00:15 EDT." <199804201600.MAA05441@python.org>
References: <199804201600.MAA05441@python.org>
Message-ID: <199804201629.MAA23445@eric.CNRI.Reston.Va.US>

> * There's one problem with xml.marshal at the moment; you
> can't pickle multiple objects to the same stream because, when you
> read the data again, the parser doesn't read one data item and stop,
> but reads them all.
> > For example, None is converted to a tag; if you pickle > None to the same file object twice, you get . But when > you parse this, the parser builds a tree containing both tags. If an > XML document must contain a single top-level element, then I think > parsers should recognize when that top-level element has been > completed and stop. > > Any thoughts on this question? What's the correct behaviour? The pickle module uses an explicit terminator symbol to end the pickle. The closest equivalent in XML is probably to define an extra tag pair to surround the outermost thing, so you'd get e.g. . If this doesn't work, you'll just have to be content that you can only pickle one item per file (this is how pickles are used 99% of the time anyway). --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@cnri.reston.va.us Mon Apr 20 17:31:44 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Mon, 20 Apr 1998 12:31:44 -0400 (EDT) Subject: [XML-SIG] Admin note: Two missing e-mails Message-ID: <13627.30174.603212.860139@newcnri.cnri.reston.va.us> An administrative note: two messages seem to have been received for the list, and added to the archive, but not actually mailed out. Both postings are visible in the archive at . One posting was a follow-up by Fredrik Lundh to Paul Prescod's last message. Fredrik asks, in regard to his question about > ending SGML tags in sgmllib.py, "Now, should we fix sgmllib, or should I add a "compatibility mode" to the accelerator module...?" The other posting was from me; over the weekend, I wrote an xml.marshal module, and posted the code, along with some notes on the DOM implementation. I'll repost them. Digest subscribers seem to have received my message in their most recent digest; I'm not sure about Fredrik's. If you're a non-digest subscriber and received either of these messages, please let me know; perhaps I've gotten confused about what happened. -- A.M. 
Kuchling http://starship.skyport.net/crew/amk/ "Rain Brain says your voice is like an old black telephone and Black Annis told me to tell you that you're the first man she hasn't wanted to castrate." "Tell her she'd be too late anyway." -- Crazy Jane and Cliff Steele in DOOM PATROL #20. From akuchlin@cnri.reston.va.us Mon Apr 20 17:35:19 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Mon, 20 Apr 1998 12:35:19 -0400 (EDT) Subject: [XML-SIG] Repost: DOM notes Message-ID: <13627.30833.844579.133818@newcnri.cnri.reston.va.us> I've written a first cut at a marshal module that converts a simple Python data structure to and from a simple XML representation, using the DOM implementation. The code's available in the SIG archive. Some notes: * There's one problem with xml.marshal at the moment; you can't pickle multiple objects to the same stream because, when you read the data again, the parser doesn't read one data item and stop, but reads them all. For example, None is converted to a tag; if you pickle None to the same file object twice, you get . But when you parse this, the parser builds a tree containing both tags. If an XML document must contain a single top-level element, then I think parsers should recognize when that top-level element has been completed and stop. Any thoughts on this question? What's the correct behaviour? * The Walker class's walk1() method isn't consistent in returning values. walk() does "return self.walk1()", but walk1() never returns anything; this should probably be fixed. For xml.marshal, I therefore overrode the walk1() method, but I'm not sure that's how Walker is intended to be used. On the other hand, unmarshalling using just startElement(), endElement(), and doText() would have been more complicated, so overriding was the easiest thing to do. -- A.M. 
Kuchling http://starship.skyport.net/crew/amk/
Our advanced and fashionable thinkers are, naturally, out on a wide swing of the pendulum, away from the previous swing of the pendulum. If you want to reach dead center, you will do well to avoid the most advanced thinkers.
    -- Anthony Standen

From fermigie@math.jussieu.fr Tue Apr 21 13:57:28 1998
From: fermigie@math.jussieu.fr (Stefane Fermigier)
Date: Tue, 21 Apr 1998 14:57:28 +0200
Subject: [XML-SIG] Repost: DOM notes
In-Reply-To: <13627.30833.844579.133818@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Mon, Apr 20, 1998 at 12:35:19PM -0400
References: <13627.30833.844579.133818@newcnri.cnri.reston.va.us>
Message-ID: <19980421145728.55264@riemann.math.jussieu.fr>

On Mon, Apr 20, 1998 at 12:35:19PM -0400, Andrew Kuchling wrote:
> I've written a first cut at a marshal module that converts a simple
> Python data structure to and from a simple XML representation, using
> the DOM implementation. The code's available in the SIG archive.
> Some notes:
>
> * There's one problem with xml.marshal at the moment; you
> can't pickle multiple objects to the same stream because, when you
> read the data again, the parser doesn't read one data item and stop,
> but reads them all.
>
> For example, None is converted to a tag; if you pickle
> None to the same file object twice, you get . But when
> you parse this, the parser builds a tree containing both tags. If an
> XML document must contain a single top-level element, then I think
> parsers should recognize when that top-level element has been
> completed and stop.
>
> Any thoughts on this question? What's the correct behaviour?

I believe the parser should either parse only one element, or raise an exception, since the standard says that there must be only one top-level element in a document.

> * The Walker class's walk1() method isn't consistent in
> returning values. walk() does "return self.walk1()", but walk1()
> never returns anything; this should probably be fixed.
> For xml.marshal, I therefore overrode the walk1() method, but I'm not sure
> that's how Walker is intended to be used.

You should probably have written your walker from scratch (see below).

> On the other hand, unmarshalling using just startElement(),
> endElement(), and doText() would have been more complicated, so
> overriding was the easiest thing to do.

You're right. walker.py was an attempt to write a generic walker class, but a walker can have several goals (the one in walker.py just dispatches events, but you could also call a function for each visited node, or for each visited node for which some condition holds, or modify the tree, or filter nodes,...) so this should be designed more carefully. Unfortunately, I'm not familiar enough with the walker design pattern to do that properly.

Cheers,

S.

--
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).
Mathematician, hacker, bassist. http://www.math.jussieu.fr/~fermigie/
"In its pure form, Pascal is a toy language, suitable for teaching but not for real programming." Brian Kernighan.

From akuchlin@cnri.reston.va.us Tue Apr 28 22:04:17 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 28 Apr 1998 17:04:17 -0400 (EDT)
Subject: [XML-SIG] State of the world
Message-ID: <13638.15528.211906.411994@newcnri.cnri.reston.va.us>

It's probably a good time to look at the current state of affairs in this SIG, and to raise a few issues worth considering.

* On the xml-dev mailing list, David Megginson's Java SAX implementation is now at 1.0beta, and the interface has been frozen except for bug fixes. Once it goes final, the Python SAX interface can be modified to match the frozen interface, and then SAX will be pretty much done. (Well, there will probably be a level 2 SAX interface someday, but that's no great concern at the moment.)

* I don't know when the W3C's DOM working group is planning to finalize the Level 1 DOM spec.
(Is anyone following the WG closely, and can tell us what the planned schedule is?) In any case, we should start carefully checking the DOM implementation against the current working draft , and try to move in compliance with it. * In the String-SIG, Martin von Loewis posted another patch that adds Unicode to the Python core. I've been meaning to take a look at it, but haven't got around to it yet, so work on Unicode is still progressing, though not very quickly. Open issues: * With one API frozen and the other solidifying, we can start thinking about how to distribute the Python code. My inclination is that individual authors such as Lars and Stefane will always distribute their code as single pieces, but there will also be an omnibus package that contains everything -- SAX, DOM, xmltok, JPython code, documentation, demo programs, and anything else we can think of. Most users will install this package. I'm willing to do that packaging job. * Also, we need a single factory function for instantiating XML parsers, that will use xmltok if it's available, the appropriate Java parser in JPython, and xmllib if there's nothing more specialized installed. * xmltok seems to have changed names, to expat. Probably the Python extension should follow suit. * We need to come to some resolution about handling multiple XML documents coming from a single input source. (This is the problem I ran into with xml.marshal, which prevents the code from marshalling two Python objects to the same file and then reading them in again.) Anything else we need to consider? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Your grandchildren will likely find it incredible---or even sinful---that you burned up a gallon of gasoline to fetch a pack of cigarettes! -- Dr. Paul MacCready Jr. 
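
A possible shape for the single factory function proposed above, sketched in present-day Python rather than the Python 1.5 of this thread. Everything named here is an assumption made for illustration: create_any_parser, DEFAULT_DRIVERS and the module name fast_parser_not_installed are hypothetical and are not part of any package discussed on the list. The only library facts relied on are importlib.import_module() and the create_parser() function that xml.sax.expatreader exposes.

```python
import importlib

# Hypothetical preference order. The first name is deliberately not a real
# module, to show the fallback; the second ships with the standard library.
DEFAULT_DRIVERS = ["fast_parser_not_installed", "xml.sax.expatreader"]


def create_any_parser(preferred=(), defaults=DEFAULT_DRIVERS):
    """Return a parser from the first driver module that can be imported.

    Each driver module is assumed to expose a create_parser() function,
    which is the convention xml.sax.expatreader follows.
    """
    for name in list(preferred) + list(defaults):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this driver is not installed; try the next one
        return module.create_parser()
    raise RuntimeError("no XML parser driver could be imported")


parser = create_any_parser()
print(type(parser).__name__)
```

A caller who wants a different order passes module names in preferred, which are tried before the defaults. The standard library's xml.sax.make_parser() takes a similar list of driver module names, so the sketch is mainly useful for seeing the fallback logic in isolation.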
From amos@aracnet.com Wed Apr 29 04:14:11 1998
From: amos@aracnet.com (Amos Latteier)
Date: Tue, 28 Apr 1998 20:14:11 -0700
Subject: [XML-SIG] Bobo and XML, a first attempt
Message-ID: <3.0.32.19980428201404.009b8eb0@mail.aracnet.com>

I don't know very much about XML, but I've been thinking that there should be some interesting ways to connect XML to Bobo. I am not aware of any attempts to use XML with Bobo so far, so I decided to strike out on my own.

I have made a first attempt at a sort of integration of XML and DTML, with the goal of being able to manipulate the representation of an XML document via DTML, which could then be published by Bobo.

I would love feedback from anyone who is interested in this type of thing, and especially from people who understand XML better than I do ;-)

You can get the code and find out more at:

http://starship.skyport.net/crew/amos/xml_hack.html

Thanks.

-Amos

------------------------------------------------------------
Amos Latteier Consulting
mailto:amos@aracnet.com
http://www.aracnet.com/~amos
tel:503.232.3814

From Jack.Jansen@cwi.nl Wed Apr 29 10:36:18 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Wed, 29 Apr 1998 11:36:18 +0200
Subject: [XML-SIG] State of the world
In-Reply-To: Message by Andrew Kuchling , Tue, 28 Apr 1998 17:04:17 -0400 (EDT) , <13638.15528.211906.411994@newcnri.cnri.reston.va.us>
Message-ID:

> * With one API frozen and the other solidifying, we can start
> thinking about how to distribute the Python code. My inclination is
> that individual authors such as Lars and Stefane will always
> distribute their code as single pieces, but there will also be an
> omnibus package that contains everything -- SAX, DOM, xmltok, JPython
> code, documentation, demo programs, and anything else we can think of.
> Most users will install this package. I'm willing to do that
> packaging job.
I'd like to make a plug for my versioncheck stuff here, which compares the version number of your locally installed package with the version number of the latest version of your package as obtained from a URL.

Especially for the xml stuff in its current state (multiple authors, probably reasonably fast changing) I think it would be a help to users.

--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From akuchlin@cnri.reston.va.us Wed Apr 29 14:42:10 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Wed, 29 Apr 1998 09:42:10 -0400 (EDT)
Subject: [XML-SIG] State of the world
In-Reply-To: <13638.15528.211906.411994@newcnri.cnri.reston.va.us>
Message-ID: <13639.11228.540925.896002@newcnri.cnri.reston.va.us>

Lars M. is away from his regular account, but here's a response from him that I'm forwarding to the list.

Andrew Kuchling wrote:
>
> * On the xml-dev mailing list, David Megginson's Java SAX
> implementation is now at 1.0beta, and the interface has been frozen
> except for bug fixes. Once it goes final, the Python SAX interface
> can be modified to match the frozen interface,

I'm already working on the Python version, so I expect it to be out quite soon.

> (Well, there will probably be a level 2 SAX
> interface someday, but that's no great concern at the moment.)

Agreed. I'll probably make some experimental extensions (clearly marked as such), but they will probably be released a little later.

> * In the String-SIG, Martin von Loewis posted another patch
> that adds Unicode to the Python core. I've been meaning to take a
> look at it, but haven't got around to it yet, so work on Unicode is
> still progressing, though not very quickly.

I had a quick look at your proposal and had a question I never got round to asking: will the chr function be able to create characters like chr(2472)?
IMHO it should, both for convenience and because it's a very natural thing to expect.

> My inclination is
> that individual authors such as Lars and Stefane will always
> distribute their code as single pieces, but there will also be an
> omnibus package that contains everything -- SAX, DOM, xmltok, JPython
> code, documentation, demo programs, and anything else we can think of.
> Most users will install this package. I'm willing to do that
> packaging job.

I think this is the way to go. I've already installed the different packages according to the dir structure we agreed on and it looks good.

> * Also, we need a single factory function for instantiating
> XML parsers, that will use xmltok if it's available, the appropriate
> Java parser in JPython, and xmllib if there's nothing more specialized
> installed.

I'm planning to add my parser factory proposal as an extension to SAX. The way I plan to do it, the parser creation method will first try expat, then xmlproc and then xmllib, but this order can be changed by the user.

> * xmltok seems to have changed names, to expat. Probably
> the Python extension should follow suit.

And be updated. :) A SAX driver for this is planned, although I may not be able to actually do it until I return to Norway in a couple of weeks.

> * We need to come to some resolution about handling multiple
> XML documents coming from a single input source. (This is the problem
> I ran into with xml.marshal, which prevents the code from marshalling
> two Python objects to the same file and then reading them in again.)

Actually, this may be problematic. See

http://www.xml.com/axml/notes/TrailingMisc.html

for Tim Bray's comment on this.

I think the way around it would be to have every document start with an XML declaration.
That will make conforming parsers throw an error when the new document starts, which can (maybe with a little extension to the parsers) be caught and used to trigger some code that makes the parser consider the rest of the stream a new document.

> Anything else we need to consider?

Not that I can think of. You've covered all my worries, at least. More software would be nice, but I guess that will appear in due time.

As for the Perl XML effort we are definitely ahead. What they have so far is an equivalent of PyXMLTok and a non-standard grove builder (i.e., a DOM equivalent).

--Lars M.

From akuchlin@cnri.reston.va.us Wed Apr 29 16:01:22 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Wed, 29 Apr 1998 11:01:22 -0400 (EDT)
Subject: [XML-SIG] State of the world
In-Reply-To: <13639.11228.540925.896002@newcnri.cnri.reston.va.us>
References: <13638.15528.211906.411994@newcnri.cnri.reston.va.us> <13639.11228.540925.896002@newcnri.cnri.reston.va.us>
Message-ID: <13639.14851.826278.560240@newcnri.cnri.reston.va.us>

Lars Marius Garshol writes:
>I had a quick look at your proposal and had a question I never got
>round to asking: will the chr function be able to create characters
>like chr(2472)?

If I recall correctly, yes it would. There are still outstanding issues with the implementation at the C level that will have to be resolved.

>I think the way around it would be to have every document start with
>an XML declaration. That will make conforming parsers throw an error
>when the new document starts, which can (maybe with a little
>extension to the parsers) be caught and used to trigger some code that
>makes the parser consider the rest of the stream a new document.

Hmm... wouldn't there then have to be a way to save that XML declaration, in order to pass it back for subsequent parsing of the second document?
If you're reading from a file-like object, you could just seek back to before the declaration, but that doesn't work in general, if you're reading from sys.stdin or a network connection. Another approach might be an end-of-XML document PI, that means the document is over, stop parsing now. Would that make sense? >Not that I can think of. You've covered all my worries, at least. More >software would be nice, but I guess that will appear in due time. Is there anything else that should go into the basic package? A module to parse DTDs, perhaps? That would pave the way for checking whether a document is conformant to a DTD. Code for specific XML DTDs, such as MathML or whatever, wouldn't be in the basic package, of course. BTW, I forgot one relevant spec: XSL, the Extensible Style Language. It doesn't seem to me as if we can do much about XSL at the moment; the only draft dates back to August 21, 1997, and says that they're planning to release the first working draft in July. I'm also not clear whether it'll be JavaScript-specific, or more language-independent; seems to be the former... -- A.M. Kuchling http://starship.skyport.net/crew/amk/ We can lick gravity, but sometimes the paperwork is overwhelming. -- Wernher Von Braun From akuchlin@cnri.reston.va.us Wed Apr 29 17:14:26 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Wed, 29 Apr 1998 12:14:26 -0400 (EDT) Subject: [XML-SIG] State of the world In-Reply-To: <13639.14851.826278.560240@newcnri.cnri.reston.va.us> Message-ID: <13639.20813.303237.521754@newcnri.cnri.reston.va.us> Another message from Lars. (One clarification: I wasn't suggesting that a DTD module be distributed separately. If we agree that it would be useful, it would be part of the basic package, and could be part of the SAX or DOM implementations, or unconnected to either of them.) Andrew Kuchling wrote: > > Hmm... 
> wouldn't there then have to be a way to save that XML
> declaration, in order to pass it back for subsequent parsing of the
> second document?

Not unless there is information in it that we need. :)

> Another approach might be an end-of-XML document PI, that means the
> document is over, stop parsing now. Would that make sense?

If we can't use the declaration: yes. Or we could just outlaw PIs and other junk outside the root element and stop parsing when the root element ends. That's possibly the cleanest solution of all since we don't need to use the XML declaration as a delimiter.

> Is there anything else that should go into the basic package? A
> module to parse DTDs, perhaps?

xmlproc has that and I'd planned to make it into an API accessible to clients, even down to tracking states in content models. (In fact it is in part accessible already, but it needs more cleaning up before anyone can rely on it.)

However, I'm a bit uncertain of how much sense it makes to distribute that part separately. I've always thought of that as a part of what a parser does, but maybe I have preconceived SGML notions about this that don't fit XML.

What do people think? I can be convinced to distribute the DTD services as a separate module.

> That would pave the way for checking whether a document is
> conformant to a DTD.

Hmmm. This makes me think of a validating parser again, but I'll agree that perhaps the DOM will need a separate DTD module. Stefane, what do you think about this?

> BTW, I forgot one relevant spec: XSL, the Extensible Style Language.
> It doesn't seem to me as if we can do much about XSL at the moment;

That's right. The people who have insight into what the XSL WG is doing keep repeating that the August proposal will be substantially revised and some even say that those who implemented the proposal will basically have wasted lots of effort.

> I'm also not clear whether it'll be
> JavaScript-specific, or more language-independent; seems to be the
> former...
Well, given that Netscape, Microsoft and Opera Software already have implemented JavaScript I can understand that. Also, I'm not sure that XSL can be made language-independent (like the DOM) in a sensible way, although implementing PyXSL should be possible and useful, I think. --Lars M. From fleck@informatik.uni-bonn.de Wed Apr 29 17:57:30 1998 From: fleck@informatik.uni-bonn.de (Markus Fleck) Date: Wed, 29 Apr 1998 18:57:30 +0200 Subject: [XML-SIG] State of the world References: <13639.20813.303237.521754@newcnri.cnri.reston.va.us> Message-ID: <35475BFA.6501@informatik.uni-bonn.de> Andrew Kuchling wrote: > Well, given that Netscape, Microsoft and Opera Software already have > implemented JavaScript I can understand that. Also, I'm not sure that > XSL can be made language-independent (like the DOM) in a sensible > way, although implementing PyXSL should be possible and useful, I > think. I think Lotus Notes 5 will also include JavaScript capability. And Netscape includes their JavaScript implementation with Mozilla source code. It *might* be usable to provide for full XSL conformance even with a Python-based library. But there is of course the question if we really *want* to achieve full compatibility with JavaScript. -:^) Yours, Markus. From Jack.Jansen@cwi.nl Wed Apr 29 20:56:33 1998 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Wed, 29 Apr 1998 21:56:33 +0200 Subject: [XML-SIG] State of the world In-Reply-To: Message by Andrew Kuchling , Wed, 29 Apr 1998 09:42:10 -0400 (EDT) , <13639.11228.540925.896002@newcnri.cnri.reston.va.us> Message-ID: Recently, Andrew Kuchling said: > > * We need to come to some resolution about handling multiple > > XML documents coming from a single input source. (This is the problem > > I ran into with xml.marshal, which prevents the code from marshalling > > two Python objects to the same file and then reading them in again.) > > Actually, this may be problematic. 
See > > http://www.xml.com/axml/notes/TrailingMisc.html > > for Tim Bray's comment on this. > > I think the way around it would be to have every document start with > an XML declaration. That will make conforming parsers throw an error > when the new document starts, which can (maybe with a little > extension to the parsers) be caught and used to trigger some code that > makes the parser consider the rest of the stream a new document. I think this is all a bad idea. The "xml way" of doing things is to have a single object per file, and I see no reason why we shouldn't conform to that. After all, if you need multiple objects you can easily extend your DTD to allow for this case. Moreover, standard xml tools will complain bitterly about our files if, say, the xmlpickle DTD describes a single pickled object and you pass a file with multiple xmlpickled objects to such a tool. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Jack.Jansen@cwi.nl Wed Apr 29 21:05:52 1998 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Wed, 29 Apr 1998 22:05:52 +0200 Subject: [XML-SIG] Javascript In-Reply-To: Message by Markus Fleck , Wed, 29 Apr 1998 18:57:30 +0200 , <35475BFA.6501@informatik.uni-bonn.de> Message-ID: Is anyone here familiar with Javascript? How difficult would a javascript interpreter in Python (or a javascript->python converter) be?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From papresco@technologist.com Wed Apr 29 21:52:28 1998 From: papresco@technologist.com (Paul Prescod) Date: Wed, 29 Apr 1998 16:52:28 -0400 Subject: [XML-SIG] Javascript References: Message-ID: <3547930C.3EC42633@technologist.com> I am fairly familiar with Javascript, and I have thought about a JavaScript to Python converter before (for the same reason you are: XSL). It would be fairly easy. JavaScript has few data types and a very simple object model. The grammar is very like Java/C++, so it isn't unfamiliar. Plus the subset that will be used in your typical XSL stylesheet is absolutely trivial: just function calls and simple arithmetic. Paul Prescod Jack Jansen wrote: > > Is anyone here familiar with Javascript? How difficult would a > javascript interpreter in Python (or a javascript->python converter) be? > -- > Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ > Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ > http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm > > ------------------------------------------------------ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Paul Prescod - http://itrc.uwaterloo.ca/~papresco "Perpetually obsolescing and thus losing all data and programs every 10 years (the current pattern) is no way to run an information economy or a civilization." 
- Stewart Brand, founder of the Whole Earth Catalog http://www.wired.com/news/news/culture/story/10124.html From akuchlin@cnri.reston.va.us Wed Apr 29 21:55:07 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Wed, 29 Apr 1998 16:55:07 -0400 (EDT) Subject: [XML-SIG] Javascript In-Reply-To: References: <35475BFA.6501@informatik.uni-bonn.de> Message-ID: <13639.37567.97319.96891@newcnri.cnri.reston.va.us> Jack Jansen writes: >Is anyone here familiar with Javascript? How difficult would a >javascript interpreter in Python (or a javascript->python converter) be? Someone here at CNRI once thought of doing this; however, I can't remember who, and everyone seems to be in a meeting at the moment. If I remember rightly, he planned to write a parser for the JavaScript, and compile it to Python bytecodes that could then be executed. I think he may have gotten the parser done, but I'm not sure; perhaps it was purely a blue-sky project. :) But that seems like a workable approach; JavaScript is fairly simple, and should be a subset of Python's capabilities, once you've written some adapter objects that mimic the behaviour of JS data types. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ You shouldn't trust the story-teller; only trust the story -- The grandfather in SANDMAN #38: "The Hunt" From papresco@technologist.com Wed Apr 29 21:57:44 1998 From: papresco@technologist.com (Paul Prescod) Date: Wed, 29 Apr 1998 16:57:44 -0400 Subject: [XML-SIG] [Fwd: [XML-SIG] State of the world] Message-ID: <35479448.8568F44@technologist.com> Markus Fleck wrote: > > And Netscape includes their JavaScript implementation with > Mozilla source code. It *might* be usable to provide for full > XSL conformance even with a Python-based library. But there is > of course the question if we really *want* to achieve full > compatibility with JavaScript. -:^) I think that JavaScript could be translated into python bytecodes fairly easily. 
The whole thing could be written in Python so that the Python core wouldn't actually have to have any JavaScript crap in it. XSL is designed to be interpreted a line at a time anyhow. Paul Prescod - http://itrc.uwaterloo.ca/~papresco "Perpetually obsolescing and thus losing all data and programs every 10 years (the current pattern) is no way to run an information economy or a civilization." - Stewart Brand, founder of the Whole Earth Catalog http://www.wired.com/news/news/culture/story/10124.html From Fred L. Drake, Jr." References: <35475BFA.6501@informatik.uni-bonn.de> Message-ID: <13639.37866.724496.319055@weyr.cnri.reston.va.us> Jack Jansen writes: > Is anyone here familiar with Javascript? How difficult would a > javascript interpreter in Python (or a javascript->python converter) be? I've thought about how to implement this in CPython; essentially, what's needed is to convert JavaScript source code to a Python structure accepted by the parser.sequence2ast() function; the resulting "ast object" can then be compiled to a Python code object and run in a JavaScript environment (implemented as an rexec.RExec subclass). This is more tedious than hard; I'll try to pursue it further. I have a tokenizer around here somewhere, but that's as far as implementation has gone. -Fred -- Fred L. Drake, Jr. fdrake@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive Reston, VA 20191 From papresco@technologist.com Wed Apr 29 22:01:15 1998 From: papresco@technologist.com (Paul Prescod) Date: Wed, 29 Apr 1998 17:01:15 -0400 Subject: [XML-SIG] State of the world References: <13639.11228.540925.896002@newcnri.cnri.reston.va.us> Message-ID: <3547951B.441CEB5A@technologist.com> > As for the Perl XML effort we are definitely ahead. What they have so > far is an equivalent of PyXMLTok and a non-standard grove builder (ie: > a DOM equivalent). Yes, but note that their grove builder is written in C++ and has been under development for quite a while. 
The fact that it is C++ could be important for large documents because I would expect Python objects with many attributes to be quite large compared to their C++ equivalents. Actually, Python on Windows can use SP as a grovebuilder and grove manager. If only COM worked on Unix! MS junk or not, COM is useful and I wish some Unix programmer would make GNUCOM (but somehow I'm not holding my breath). Paul Prescod - http://itrc.uwaterloo.ca/~papresco "Perpetually obsolescing and thus losing all data and programs every 10 years (the current pattern) is no way to run an information economy or a civilization." - Stewart Brand, founder of the Whole Earth Catalog http://www.wired.com/news/news/culture/story/10124.html From Fred L. Drake, Jr." References: <13639.11228.540925.896002@newcnri.cnri.reston.va.us> Message-ID: <13639.38069.425576.922526@weyr.cnri.reston.va.us> Andrew Kuchling said: > I think the way around it would be to have every document start with > an XML declaration. That will make conforming parsers throw an error > when the new document starts, which can (maybe with a little > extension to the parsers) be caught and used to trigger some code that > makes the parser consider the rest of the stream a new document. Jack Jansen writes: > I think this is all a bad idea. The "xml way" of doing things is to > have a single object per file, and I see no reason why we shouldn't > conform to that. After all, if you need multiple objects you can I think this is where the SGML distinction between an "entity" and a file makes a lot of sense. Each XML entity can be exactly one instance, but an entity manager can be used to access multiple entities in a single file. Each entity can be individually retrieved from the file using the entity manager. As expected, this is only useful (as far as allowing other tools to work directly on our files) if our entity manager behaves the same way as theirs does. Which means, at least for now, one file === one entity. I can live with that.
-Fred -- Fred L. Drake, Jr. fdrake@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive Reston, VA 20191 From akuchlin@cnri.reston.va.us Wed Apr 29 22:08:05 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Wed, 29 Apr 1998 17:08:05 -0400 (EDT) Subject: [XML-SIG] State of the world In-Reply-To: References: <13639.11228.540925.896002@newcnri.cnri.reston.va.us> Message-ID: <13639.37838.224110.177388@newcnri.cnri.reston.va.us> Jack Jansen writes: >I think this is all a bad idea. The "xml way" of doing things is to >have a single object per file, and I see no reason why we shouldn't >conform to that. After all, if you need multiple objects you can That means that if you're sending XML data over a pipe, the reader must read the whole XML document first (perhaps using a prefix giving its length), put it into a string or some file-like object, and then parse it. A bit clumsy, but if XML documents are commonly in files, or in one-shot pipes that get closed immediately afterwards, it's not something that will occur often. It certainly wouldn't be very difficult to write xml.marshal to use a marshalling object that provides .open() and .close() calls. What does everyone think? Is the case of multiple XML documents in the same input stream not worth worrying about? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Nobody can be exactly like me. Sometimes even I have trouble doing it. -- Tallulah Bankhead From fleck@informatik.uni-bonn.de Wed Apr 29 22:36:33 1998 From: fleck@informatik.uni-bonn.de (Markus Fleck) Date: Wed, 29 Apr 1998 23:36:33 +0200 Subject: [XML-SIG] State of the world References: <13639.11228.540925.896002@newcnri.cnri.reston.va.us> <13639.37838.224110.177388@newcnri.cnri.reston.va.us> Message-ID: <35479D61.1040@informatik.uni-bonn.de> Andrew Kuchling wrote: > What does everyone think? Is the case of multiple XML documents in > the same input stream not worth worrying about?
How about using MIME to encapsulate several text/xml documents inside a multipart/related document? I think this would be the "official" way to handle these cases, although it is also a bit clumsy. Yours, Markus. From Fred L. Drake, Jr." References: <13639.11228.540925.896002@newcnri.cnri.reston.va.us> <13639.37838.224110.177388@newcnri.cnri.reston.va.us> Message-ID: <13639.40349.710184.444377@weyr.cnri.reston.va.us> Andrew Kuchling writes: > What does everyone think? Is the case of multiple XML documents in > the same input stream not worth worrying about? No, not worth it. *Extracting* an XML entity from a data stream is very much *not* the responsibility of the XML parser. Give it the XML data and no more. The rest is entity management. -Fred -- Fred L. Drake, Jr. fdrake@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive Reston, VA 20191 From ht@cogsci.ed.ac.uk Thu Apr 30 09:40:26 1998 From: ht@cogsci.ed.ac.uk (Henry S. Thompson) Date: Thu, 30 Apr 1998 09:40:26 +0100 Subject: [XML-SIG] Javascript In-Reply-To: (message from Jack Jansen on Wed, 29 Apr 1998 22:05:52 +0200) Message-ID: <21275.199804300840@naomi.cogsci.ed.ac.uk> I've done Javascript -> Scheme with a pretty modest lex/yacc approach, for my implementation of XSL by translation into DSSSL, so I expect Javascript -> Python would be pretty straightforward. See http://www.ltg.ed.ac.uk/~ht/xslj.html for information, including pointers to the source. ht From the@software-ag.de Thu Apr 30 10:06:12 1998 From: the@software-ag.de (Thomas Herchenroeder) Date: Thu, 30 Apr 1998 11:06:12 +0200 Subject: [XML-SIG] State of the world References: <13639.11228.540925.896002@newcnri.cnri.reston.va.us> <3547951B.441CEB5A@technologist.com> Message-ID: <35483F04.45C2@software-ag.de> Paul Prescod wrote: > > Actually, Python on Windows can use SP as a grovebuilder and grove > manager. If only COM worked on Unix! 
MS junk or not, COM is useful and I > wish some Unix programmer would make GNUCOM (but somehow I'm not holding > my breath). Hi! I'm not sure this is exactly what you were looking for, but just to make you aware of it, look at: http://www.softwareag.com/corporat/solutions/entirex/dcom/default.htm I doubt you would call it "GNUish", but at least you get DCOM for free on Solaris 2.5 and DigitalUnix 4.0. Regards, --------------------------------------------------------------------- Thomas Herchenroeder Software AG Unix System Administration the@software-ag.de (e-mail) From akuchlin@cnri.reston.va.us Thu Apr 30 16:09:29 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Thu, 30 Apr 1998 11:09:29 -0400 (EDT) Subject: [XML-SIG] saxlib 1.0beta Message-ID: <13640.37704.767888.409900@newcnri.cnri.reston.va.us> [Lars is still away from his account, so I'm forwarding this message. The software is at .] I've made a saxlib 1.0 beta release to shadow the Java one. I'm releasing the beta mainly for feedback on the translation, where I've made the following changes: - added Parser.parseFile to read from a file-like object - InputSource is left out (until we have Unicode) - all drivers must have the function create_parser(), which is used by the ParserFactory - saxutils.py has a collection of utilities - saxexts.py contains extensions to SAX The full release will include: - update of web page :-) - better xmlproc driver with DTD support - xml-toolkit and xmltok drivers - probably some more extensions - documentation - a non-validating driver for xmlproc (since it's faster) The zip file follows the proposed directory structure and also has a version of xmlproc in the right place so that one can choose between it and xmllib and test the ParserFactory. The original SAX version has a parser factory that uses the Java property sax.parser to tell the factory which parser to instantiate. 
What do people think of making saxlib use the environment variable sax.parser for the same purpose? --Lars M. From larsga@step.de Thu Apr 30 16:21:32 1998 From: larsga@step.de (Lars Marius Garshol) Date: Thu, 30 Apr 1998 17:21:32 +0200 Subject: [XML-SIG] State of the world References: <13639.20813.303237.521754@newcnri.cnri.reston.va.us> Message-ID: <354896FC.EDBFE1C9@step.de> (Given the amount of activity on the list I've subscribed, even if it means I receive all messages twice.) Andrew Kuchling wrote: > > One clarification: I wasn't suggesting > that a DTD module be distributed separately. If we agree that it > would be useful, it would be part of the basic package, and could be > part of the SAX or DOM implementations, or unconnected to either of > them. I think the clarification needs further clarification. :-) The DTD module is part of xmlproc already, so it will have to be distributed separately in the sense that it's distributed separately from xmlproc. (Unless you plan to include xmlproc, which would be OK, but what you've said so far doesn't indicate that you do.) If people think a module that only parses DTDs, delivers DTD events and provides a DTD-related object structure would be useful, I could take this piece out of xmlproc and distribute it separately. --Lars M. From akuchlin@cnri.reston.va.us Thu Apr 30 19:40:41 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Thu, 30 Apr 1998 14:40:41 -0400 (EDT) Subject: [XML-SIG] State of the world In-Reply-To: <354896FC.EDBFE1C9@step.de> References: <13639.20813.303237.521754@newcnri.cnri.reston.va.us> <354896FC.EDBFE1C9@step.de> Message-ID: <13640.49635.141273.252166@newcnri.cnri.reston.va.us> Lars Marius Garshol writes: >I think the clarification needs further clarification. :-) Good point. >The DTD module is part of xmlproc already, so it will have to be >distributed separately in the sense that it's distributed separately >from xmlproc. 
(Unless you plan to include xmlproc, which would be OK, >but what you've said so far doesn't indicate that you do.) It hadn't occurred to me, actually, though I wouldn't be against it. We already have low-end and high-end solutions; xmllib.py is included with Python 1.5 for people who want a pure Python solution, and xmltok is for people who want speed and are willing/able to compile the extension module for it. What would xmlproc buy that the others don't--validation, probably? >If people think a module that only parses DTDs, delivers DTD events >and provides a DTD-related object structure would be useful, I could >take this piece out of xmlproc and distribute it separately. xmldtd.py seems fairly independent of xmlproc (at least from my superficial look at the code), so it would probably be a good idea. We could provide parser subclasses that assemble an object representing the DTD as they parse (if xmltok is willing). For the weekend, I'll try to work on the XML-HOWTO, hopefully adding enough content that it'll be worth posting as a rough draft. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Interestingly, according to modern astronomers, space is finite. This is a very comforting thought---particularly for people who can never remember where they have left things. -- Woody Allen From papresco@technologist.com Thu Apr 30 20:32:21 1998 From: papresco@technologist.com (Paul Prescod) Date: Thu, 30 Apr 1998 15:32:21 -0400 Subject: [XML-SIG] State of the world References: <13639.20813.303237.521754@newcnri.cnri.reston.va.us> <354896FC.EDBFE1C9@step.de> <13640.49635.141273.252166@newcnri.cnri.reston.va.us> Message-ID: <3548D1C5.1C8785D8@technologist.com> Andrew Kuchling wrote: > > It hadn't occurred to me, actually, though I wouldn't be > against it.
We already have low-end and high-end solutions; > xmllib.py is included with Python 1.5 for people who want a pure > Python solution, and xmltok is for people who want speed and are > willing/able to compile the extension module for it. What would > xmlproc buy that the others don't--validation, probably? We should probably do some benchmarks and interface comparisons. xmllib came first, and was a great contribution to the Python library. But it may or may not be the best package to take us into the future. I haven't looked at its native interface since the early days because I always use SAX (with whatever parser is around). Since native interfaces probably aren't that interesting, we should just figure out what gives the best SAX performance (or can be tweaked to). Paul Prescod - http://itrc.uwaterloo.ca/~papresco "Perpetually obsolescing and thus losing all data and programs every 10 years (the current pattern) is no way to run an information economy or a civilization." - Stewart Brand, founder of the Whole Earth Catalog http://www.wired.com/news/news/culture/story/10124.html
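One rough way to compare SAX performance along the lines suggested above is to push the same document through a handler that does almost no work, so that parser time dominates. The sketch below is only an illustration: it assumes the xml.sax package from the present-day Python standard library, which postdates this thread and is not the saxlib 1.0 beta API under discussion. The names CountingHandler, make_document and time_parse, and the test document itself, are hypothetical.

```python
import time
import xml.sax
from xml.sax.handler import ContentHandler


class CountingHandler(ContentHandler):
    """Hypothetical handler that only counts events, so parsing cost dominates."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elements += 1

    def characters(self, content):
        # Parsers may deliver text in several chunks; summing lengths is chunk-safe.
        self.chars += len(content)


def make_document(n_items):
    """Build an illustrative test document with n_items child elements."""
    body = "".join('<item id="%d">value %d</item>' % (i, i) for i in range(n_items))
    return ("<doc>" + body + "</doc>").encode("utf-8")


def time_parse(data, repeat=3):
    """Parse `data` `repeat` times; return (best wall-clock seconds, last handler)."""
    best = None
    handler = None
    for _ in range(repeat):
        handler = CountingHandler()
        start = time.perf_counter()
        xml.sax.parseString(data, handler)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best, handler


if __name__ == "__main__":
    data = make_document(10000)
    seconds, handler = time_parse(data)
    print("%d elements, %d chars in %.4f s"
          % (handler.elements, handler.chars, seconds))
```

To compare drivers rather than time a single one, the parseString call could be swapped for a parser obtained from xml.sax.make_parser() with a list of driver module names, assuming alternative drivers are installed. The timings only mean something relative to each other on the same machine and document.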