" . $entry{"NAME"} . "

From uche.ogbuji@fourthought.com Sun Apr 1 00:16:06 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 17:16:06 -0700
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Message from "Fred L. Drake, Jr." of "Sat, 31 Mar 2001 14:21:47 EST." <15046.11851.341533.770037@beowolf.pythonlabs.org>
Message-ID: <200104010016.RAA07490@localhost.localdomain>

> Uche Ogbuji writes:
> > Yes. I actually implemented an off-line merge earlier, but I think a
> > standardized merge indicator would be useful.
>
> To make this meaningful, do we need more discussion of what "merge"
> means, or should this be left entirely to clients? I'm inclined to
> think we need a good description of the expected range of application
> and motivation, and the rest can be left to specific applications.

OK. I agree as you say later that a merge can be at top level or folder
level. In either case, I'd use the following guidelines:

1. A merge element is of the form ". The source is an XBEL file in which
the merge element appears. The target is the XBEL file referenced by the
URI in the xinclude:href attribute of the merge element.

2. the current folder of the source is the folder in which the relevant
merge element appears. This can be the top-level "folder".

3. All bookmarks at the top level of the target are added to the source
as if directly specified at the location of the merge element in the
current folder.

4. If any bookmark element in the current folder of the source has an
identical href attribute to a bookmark in the target, the bookmark in the
target is ignored.

5. All folders at the top level of the target are added to the source as
if directly specified at the location of the merge element (whether
top-level or within a folder). This addition involves a recursive copying
of all the bookmarks and sub-folders contained in the target folder.

6.
If any folder in the current folder of the source has an identical title
child element to a folder in the target, the folder in the target is
merged into the folder in the source according to this process as if the
target folder and the matching source folder were both top-level xbel
elements.

7. The expanded bookmark file is the result of applying this process to
each merge element in document order of the merge source. All merge
elements in a merge target are first processed before incorporation into
the merge source.

> > That should instead be spelled
> >
> >
> >
> > Or such, so that processors that don't have first-class merge support can
> > still include the other file through xinclude.
>
> This syntax seems reasonable; I presume we'll want to include some
> way to mark multiple sources with priorities to determine
> "who wins" in the presence of multiple sources for a bookmark; some
> applications will present all versions of a bookmark and others will
> only want to present one but make the determination based on the
> bookmark data.

The process above implicitly specifies that the priority is according to
order of appearance of each merge element in the source, by document
order. Earlier merge elements take precedence.

> I presume this element should be allowed in both
> and elements. Do we want to do this in XBEL 1.1 or wait for
> more experience before adding it?

Probably a question for the browser implementors.

--
Uche Ogbuji                       Principal Consultant
uche.ogbuji@fourthought.com       +1 303 583 9900 x 101
Fourthought, Inc.                 http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Sun Apr 1 00:18:26 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 17:18:26 -0700
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Message from David Faure of "Sat, 31 Mar 2001 23:21:13 +0100."
 <200103312221.f2VMLEX02984@faure.worldonline.co.uk>
Message-ID: <200104010018.RAA07501@localhost.localdomain>

> On Saturday 31 March 2001 20:21, Fred L. Drake, Jr. wrote:
> > Uche Ogbuji writes:
> > > Yes. I actually implemented an off-line merge earlier, but I think a
> > > standardized merge indicator would be useful.
>
> What's off-line merge ?

Basically, merging two XBEL files on the command line into a single XBEL
file for re-import into browser format.

By on-line merge, I would imagine the browser starts up, and reads the
master XBEL file, and processes any merge elements right then, therefore
merging for the duration of the browser session. This would provide the
user max flexibility.

--
Uche Ogbuji                       Principal Consultant
uche.ogbuji@fourthought.com       +1 303 583 9900 x 101
Fourthought, Inc.                 http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From stuff4gary@hotmail.com Mon Apr 2 00:52:18 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Sun, 01 Apr 2001 23:52:18
Subject: [XML-SIG] I am confused!.. can I pick someones brains on parsers?
Message-ID:

I am just starting doing my first bits of XML application development in
python and for the large part am rushing around trying to get to grips
with the concepts involved to create applications in publishing for
editors.

It seems to me that there are loads of parsers in python for XML that do
the similar things, I wonder if someone could try and explain to me how
parsers differ from Dynamic Linking Extensions on a PC or a Macintosh
extension files and developing for these systems. Is it not possible to
have one parser for image files (like a image gear DLL), another for
dealing with text data, another for XHTML, another for SVG, another for
movies, etc... My point being can't parsers be more application specific?
And if they are application specific how do I find which ones I should
use where and the way each is to be used for its task within its
application?

Kind Regards

Gary C
_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.

From jtauber@bowstreet.com Mon Apr 2 01:49:51 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Sun, 1 Apr 2001 20:49:51 -0400
Subject: [XML-SIG] I am confused!.. can I pick someones brains on parsers?
Message-ID:

One of the characteristics of XML is that it provides a common syntax
*regardless* of application and so a certain level of processing can be
done that is *not* application specific.

An XML parser or XSLT engine are examples of this.

Typically, however, you want to build application specific objects from
your XML. This can be done in a layered approach with a generic
event-firing XML parser connected to an application specific object
builder.

If you look, for example, at the code for PyTREX (at
http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/pytrex/pytrex.py?rev=5.0&content-type=text/x-cvsweb-markup&cvsroot=pytrex)
you'll see a series of handlers that respond to events from the parser
expat by building up application specific objects.

Hope this helps.

James

> -----Original Message-----
> From: gary cor [mailto:stuff4gary@hotmail.com]
> Sent: Sunday, April 01, 2001 7:52 PM
> To: xml-sig@python.org
> Subject: [XML-SIG] I am confused!.. can I pick someones brains on
> parsers?
>
>
> I am just starting doing my first bits of XML application development
> in python and for the large part am rushing around trying to get to
> grips with the concepts involved to create applications in publishing
> for editors.
>
> It seems to me that there are loads of parsers in python for XML that
> do the similar things, I wonder if someone could try and explain to me
> how parsers differ from Dynamic Linking Extensions on a PC or a
> Macintosh extension files and developing for these systems. Is it not
> possible to have one parser for image files (like a image gear DLL),
> another for dealing with text data, another for XHTML, another for SVG,
> another for movies, etc... My point being can't parsers be more
> application specific? And if they are application specific how do I
> find which ones I should use where and the way each is to be used for
> its task within its application?
>
> Kind Regards
>
> Gary C
> _________________________________________________________________________
> Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.

_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig

From ken@bitsko.slc.ut.us Mon Apr 2 02:23:23 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 01 Apr 2001 20:23:23 -0500
Subject: [XML-SIG] I am confused!.. can I pick someones brains on parsers?
In-Reply-To: "gary cor"'s message of "Sun, 01 Apr 2001 23:52:18"
References:
Message-ID:

"gary cor" writes:

> It seems to me that there are loads of parsers in python for XML
> that do the similar things, I wonder if someone could try and
> explain to me how parsers differ from Dynamic Linking Extensions on
> a PC or a Macintosh extension files and developing for these
> systems. Is it not possible to have one parser for image files
> (like a image gear DLL), another for dealing with text data, another
> for XHTML, another for SVG, another for movies, etc... My point
> being can't parsers be more application specific?

Yes they could, and some (grovesy folks*) would argue that there should
be lots of them. That's not the current, common practice though.
> And if they are application specific how do I find which ones I
> should use where and the way each is to be used for its task within
> its application?

Like the mapping from mime-types to applications in browsers, there would
also be some kind of configuration that would map mime-types to their
respective parsers.

All this is just brainstorming, though, since no such system exists as
yet. There recently has been a new specification out called Resource
Directory Description Language (RDDL) that may be a great help in mapping
data types to resources that apply to those data types (parsers, style
sheets, documentation, validators, services, etc.). It's currently
focused on providing resources for XML Namespaces, but the parallel to
any type of data is very clear.

  -- Ken

* The term "groves" refers to an abstract way of modeling data as nodes
  and properties (using objects and attributes in Py), with the ability
  to do processing on any set of nodes in a common way. Resource
  Description Format (RDF) is a similar abstract model with roughly the
  same characteristics.

From ken@bitsko.slc.ut.us Mon Apr 2 02:51:48 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 01 Apr 2001 20:51:48 -0500
Subject: [XML-SIG] I am confused!.. can I pick someones brains on parsers?
In-Reply-To: James Tauber's message of "Sun, 1 Apr 2001 20:49:51 -0400"
References:
Message-ID:

James Tauber writes:

> One of the characteristics of XML is that it provides a common
> syntax *regardless* of application and so a certain level of
> processing can be done that is *not* application specific.
>
> An XML parser or XSLT engine are examples of this.
>
> Typically, however, you want to build application specific objects
> from your XML. This can be done in a layered approach with a generic
> event-firing XML parser connected to an application specific object
> builder.
Rephrasing what James is saying, tying it to what I just posted, Groves
and RDF provide a common data model, and again, "a certain level of
processing can be done that is *not* application specific", but also at a
higher level than XML elements and attributes.

The RDF IG folks, for example, have recently been talking about
RDFTransform (XSLT mapped to RDF), RDFPath (XPath for RDF), etc.

This is all mostly academic at this point, so I hope no one is bothered
by my rambling about it!

  -- Ken

From david@mandrakesoft.com Mon Apr 2 15:18:41 2001
From: david@mandrakesoft.com (David Faure)
Date: Mon, 2 Apr 2001 15:18:41 +0100
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <200104010016.RAA07490@localhost.localdomain>
References: <200104010016.RAA07490@localhost.localdomain>
Message-ID: <200104021418.f32EIgM08766@faure.worldonline.co.uk>

On Sunday 01 April 2001 01:16, Uche Ogbuji wrote:
> > Uche Ogbuji writes:
> > > Yes. I actually implemented an off-line merge earlier, but I think a
> > > standardized merge indicator would be useful.
> >
> > To make this meaningful, do we need more discussion of what "merge"
> > means, or should this be left entirely to clients? I'm inclined to
> > think we need a good description of the expected range of application
> > and motivation, and the rest can be left to specific applications.
>
> OK. I agree as you say later that a merge can be at top level or folder
> level. In either case, I'd use the following guidelines:
>
> 1. A merge element is of the form ". The
> source is an XBEL file in which the merge element appears. The target is the
> XBEL file referenced by the URI in the xinclude:href attribute of the merge
> element.
>
> 2. the current folder of the source is the folder in which the relevant merge
> element appears. This can be the top-level "folder".
>
> 3. All bookmarks at the top level of the target are added to the source as if
> directly specified at the location of the merge element in the current folder.
>
> 4.
> If any bookmark element in the current folder of the source has an
> identical href attribute to a bookmark in the target, the bookmark in the
> target is ignored.
>
> 5. All folders at the top level of the target are added to the source as if
> directly specified at the location of the merge element (whether top-level or
> within a folder). This addition involves a recursive copying of all the
> bookmarks and sub-folders contained in the target folder.
>
> 6. If any folder in the current folder of the source has an identical title
> child element to a folder in the target, the folder in the target is merged
> into the folder in the source according to this process as if the target folder
> and the matching source folder were both top-level xbel elements.
>
> 7. The expanded bookmark file is the result of applying this process to each
> merge element in document order of the merge source. All merge elements in a
> merge target are first processed before incorporation into the merge source.

Wow, that's a very precise definition of a #include mechanism, with all
details fleshed out.

I'm fine with this specification, I think it's a simple one to implement,
but at the same time it gives the user what we want.

Thanks for this detailed spec.
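
For illustration only (this is not code posted to the list): the seven
guidelines quoted above could be sketched in Python roughly as follows.
The element name "merge", the XInclude namespace URI and the load()
callable are assumptions, since the merge element's markup did not
survive in the archived message. Appending to the end of a matching
folder is also a guess; the guidelines leave that position open.

```python
import xml.etree.ElementTree as ET

# Assumptions: the merge element is called "merge" and carries its URI in
# an href attribute in this XInclude namespace. Neither is confirmed by
# the thread.
XINCLUDE_NS = "http://www.w3.org/2001/XInclude"
HREF = "{%s}href" % XINCLUDE_NS


def merge_into(source, target, position):
    """Merge the bookmarks and folders of `target` into `source`, starting
    at child index `position` (guidelines 3-6).  Other child elements of
    the target are ignored.  Returns the index just past the last
    inserted item."""
    seen_hrefs = {b.get("href") for b in source.findall("bookmark")}
    folders = {f.findtext("title"): f for f in source.findall("folder")}
    for child in list(target):
        if child.tag == "bookmark":
            if child.get("href") in seen_hrefs:       # guideline 4
                continue
            seen_hrefs.add(child.get("href"))
            source.insert(position, child)            # guideline 3
            position += 1
        elif child.tag == "folder":
            title = child.findtext("title")
            match = folders.get(title) if title is not None else None
            if match is not None:                     # guideline 6
                merge_into(match, child, len(match))  # append at the end
            else:                                     # guideline 5
                source.insert(position, child)
                folders[title] = child
                position += 1
    return position


def expand(container, load):
    """Process merge elements under `container` in document order
    (guideline 7).  `load(uri)` is a hypothetical callable returning the
    parsed root element of the target file.  Cyclic merges are not
    detected."""
    index = 0
    while index < len(container):
        child = container[index]
        if child.tag == "folder":
            expand(child, load)
            index += 1
        elif child.tag == "merge":
            target = load(child.get(HREF))
            expand(target, load)      # targets are fully expanded first
            container.remove(child)
            index = merge_into(container, target, index)
        else:
            index += 1
```

Because every call to merge_into() re-reads the hrefs already present in
the source, bookmarks brought in by an earlier merge element win over
those from a later one, which matches the precedence Uche describes.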
--
David FAURE, david@mandrakesoft.com, faure@kde.org
http://perso.mandrakesoft.com/~david/, http://www.konqueror.org/
KDE, Making The Future of Computing Available Today

From Alexandre.Fayolle@logilab.fr Mon Apr 2 18:12:34 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 2 Apr 2001 19:12:34 +0200 (CEST)
Subject: [XML-SIG] Question about xml.dom.ext.reader.Sax2
Message-ID:

Hello,

I've noticed the following lines of code in Sax2.py, class Reader (this
is line 288 to 293, from the file shipping with 4Suite 0.10.2, in method
Reader.__init__):

        self.parser.setProperty(handler.property_lexical_handler,
                                self.handler)
        try:
            self.parser.setProperty(handler.property_declaration_handler,
                                    self.handler)
        except:
            pass

Am I missing something obvious, or should the first line be deleted (I
mean, if there is a chance that the third line raises an exception, it
will presumably have been thrown the first time the call is made, or are
there some weird side effects that I'm not aware of ?)

Cheers,

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From Alexandre.Fayolle@logilab.fr Mon Apr 2 18:15:41 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 2 Apr 2001 19:15:41 +0200 (CEST)
Subject: [XML-SIG] Re: Question about xml.dom.ext.reader.Sax2
Message-ID:

OK, forget it, I'm obviously getting tired. The parameters are not the
same. Sorry for the silly question.

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From uche.ogbuji@fourthought.com Mon Apr 2 22:01:31 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 02 Apr 2001 15:01:31 -0600
Subject: [XML-SIG] Question about xml.dom.ext.reader.Sax2
In-Reply-To: Message from Alexandre Fayolle of "Mon, 02 Apr 2001 19:12:34 +0200."
Message-ID: <200104022101.PAA20945@localhost.localdomain>

> I've noticed the following lines of code in Sax2.py, class Reader (this
> is line 288 to 293, from the file shipping with 4Suite 0.10.2, in method
> Reader.__init__):
>
>         self.parser.setProperty(handler.property_lexical_handler,
>                                 self.handler)
>         try:
>             self.parser.setProperty(handler.property_declaration_handler,
>                                     self.handler)
>         except:
>             pass
>
>
> Am I missing something obvious, or should the first line be deleted (I
> mean, if there is a chance that the third line raises an exception, it will
> presumably have been thrown the first time the call is made, or are
> there some weird side effects that I'm not aware of ?)

It's a cut-n-paste bug.

--
Uche Ogbuji                       Principal Consultant
uche.ogbuji@fourthought.com       +1 303 583 9900 x 101
Fourthought, Inc.                 http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From swanand75@yahoo.com Mon Apr 2 22:42:02 2001
From: swanand75@yahoo.com (Swanand Bhorkar)
Date: Mon, 2 Apr 2001 14:42:02 -0700 (PDT)
Subject: [XML-SIG] help needed
Message-ID: <20010402214202.31623.qmail@web11304.mail.yahoo.com>

Hi,

I am trying to install the PyXML package. I have python 2.0. Whenever I
do either "python setup.py install" or "python setup.py build" on the
command prompt, it exits giving an error that "distutils.core not found".

I think the PyXML tar.zip file should have had it. But it's not there.
What should I do now ?

thanks and regards,
swanand

__________________________________________________
Do You Yahoo!?
Get email at your own domain with Yahoo! Mail.
http://personal.mail.yahoo.com/?.refer=text

From rsalz@zolera.com Tue Apr 3 03:42:16 2001
From: rsalz@zolera.com (Rich Salz)
Date: Mon, 2 Apr 2001 22:42:16 -0400
Subject: [XML-SIG] xpath (4xpath) and CDATA
Message-ID: <200104030242.WAA12192@os390.zolera.com>

xpath (which Martin has merged in from 4Xpath) doesn't recognize CDATA
nodes. According to the spec, http://www.w3.org/TR/xpath#NT-NodeType ,
"a CDATA section is treated as if the <![CDATA[ and ]]> were removed and
every occurrence of < and & were replaced by &lt; and &amp; respectively."

I think the way to fix this is to add CDATA_SECTION_NODE next to every
place that TEXT_NODE appears, but to do the above conversion whenever the
*value* of the node is needed.

Make sense? I just want to double-check before I (ab)use my new
status. :)
	/r$

From Nicolas.Chauvat@logilab.fr Tue Apr 3 09:27:25 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Tue, 3 Apr 2001 10:27:25 +0200 (CEST)
Subject: [XML-SIG] help needed
In-Reply-To: <20010402214202.31623.qmail@web11304.mail.yahoo.com>
Message-ID:

Hi,

> I am trying to install the PyXML package. I have python 2.0. Whenever I do
> either "python setup.py install" or "python setup.py build" on the
> command prompt, it exits giving an error that "distutils.core not
> found".
>
> I think the PyXML tar.zip file should have had it. But it's not
> there. What should I do now ?

I don't use python 2.0, but looking at the standard library doc for
python 2.0 would tend to prove distutil is not a standard package as it
is not there. But life is not that bad as you can easily download it from
http://www.python.org/sigs/distutils-sig/download.html

--
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)

From jfk@informaticon.dk Tue Apr 3 09:46:59 2001
From: jfk@informaticon.dk (Jørgen Frøjk Kjærsgaard)
Date: Tue, 03 Apr 2001 10:46:59 +0200
Subject: [XML-SIG] Python binding for Xalan C++ ?
Message-ID: <3AC98E03.104942BF@informaticon.dk>

Is there a Python binding for the Xalan C++ XSLT processor
(http://xml.python.org/xalan-c) or is anybody working on creating one?

If not, would it be worth doing, or should I instead use Jython with
Xalan-j? I'd like to use Xalan because 4Suite XSLT doesn't perform
adequately.

In other words, what I'm looking for are pros and cons of cPython+Xalan-c
and Jython+Xalan-j.

regards,
jfk

--
Jørgen Frøjk Kjærsgaard, Systemkonsulent (Systems Consultant)
Inform@ticon ApS * Web: www.informaticon.dk * Tlf: 8672 0093
Internet programmering * Systemudvikling på Linux, FreeBSD og PalmOS

From fdrake@acm.org Tue Apr 3 15:42:57 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 3 Apr 2001 10:42:57 -0400 (EDT)
Subject: [XML-SIG] help needed
In-Reply-To:
References: <20010402214202.31623.qmail@web11304.mail.yahoo.com>
Message-ID: <15049.57713.274634.4860@cj42289-a.reston1.va.home.com>

Nicolas Chauvat writes:
> I don't use python 2.0, but looking at the standard library doc for python
> 2.0 would tend to prove distutil is not a standard package as it is not
> there. But life is not that bad as you can easily download it from
> http://www.python.org/sigs/distutils-sig/download.html

Distutils has been a part of the standard library starting with release
1.6. The documentation for Distutils is in two separate documents that
are part of the standard documentation:

    "Installing Python Modules"
    http://www.python.org/doc/current/inst/inst.html

    "Distributing Python Modules"
    http://www.python.org/doc/current/dist/dist.html

The separate package distributed by the Distutils-SIG is needed for
Python 1.5.2 users.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From fdrake@acm.org Tue Apr 3 15:46:21 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 3 Apr 2001 10:46:21 -0400 (EDT)
Subject: [XML-SIG] xpath (4xpath) and CDATA
In-Reply-To: <200104030242.WAA12192@os390.zolera.com>
References: <200104030242.WAA12192@os390.zolera.com>
Message-ID: <15049.57917.459340.366669@cj42289-a.reston1.va.home.com>

Rich Salz writes:
> xpath (which Martin has merged in from 4Xpath) doesn't recognize CDATA
> nodes. According to the spec, http://www.w3.org/TR/xpath#NT-NodeType ,
> "a CDATA section is treated as if the <![CDATA[ and ]]> were removed and
> every occurrence of < and & were replaced by &lt; and &amp; respectively."
>
> I think the way to fix this is to add CDATA_SECTION_NODE next to every
> place that TEXT_NODE appears, but to do the above conversion whenever the
> *value* of the node is needed.

It's not at all clear what conversion is needed. Do you propose
converting it to a text node and replacing the '<' and '&' characters
with EntityReference nodes? It sounds to me like the XPath spec is
referring to the serialization format, which is just confusing when
talking about nodes.

I think adding the CDATA_SECTION_NODE to the nodeType check is
sufficient. So don't let me stop you! ;-)

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From noreply@sourceforge.net Wed Apr 4 16:42:10 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 04 Apr 2001 08:42:10 -0700
Subject: [XML-SIG] [ pyxml-Patches-413722 ] Sax2 reader for pDomlette
Message-ID:

Patches item #413722, was updated on 2001-04-04 08:42
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=413722&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: Sax2 reader for pDomlette

Initial Comment:
I need to build a pDomlette document from a SAX2 parser. Currently, only
SAX1 parsers are supported. The attached patch is a quick hack providing
support for SAX2 parsers.

Alexandre Fayolle

----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=413722&group_id=6473

From sam@webslingerZ.com Wed Apr 4 18:16:25 2001
From: sam@webslingerZ.com (Sam Brauer)
Date: Wed, 4 Apr 2001 13:16:25 -0400 (EDT)
Subject: [XML-SIG] Announcing maki
In-Reply-To:
Message-ID:

I have just made the first release of a project I have been working on
for the past couple of months.
The project's name is 'maki'. To quote from the manual:

"The purpose of maki is to serve XML files via the web. A web developer
can specify that the XML data be processed or transformed through any
number of steps. Each step is either a stylesheet transformation or a
custom process. A processor that evaluates embedded Python code is
included. The output of each step is passed as the input to the next step
(similar to a Unix pipe). Additionally, the output of each processor step
can be cached for a user-specified time period. All configuration is in a
central XML file that specifies rules based on matching paths against
regular expressions."

maki relies on Python2, Apache, 4Suite, and mod_python. It also
supports the use of Sablotron and Sab-pyth for XSLT (these are optional,
since maki uses 4xslt as the default XSLT transformer).

maki is currently in the alpha stage (fully functional, but in need of
community criticism and real-world testing). I would very much welcome
any feedback.

If you're interested, take a look at http://maki.sourceforge.net/

Thanks,
Sam

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sam Brauer : Systems Programmer : sam@webslingerZ.com

From sam@webslingerZ.com Wed Apr 4 19:36:54 2001
From: sam@webslingerZ.com (Sam Brauer)
Date: Wed, 4 Apr 2001 14:36:54 -0400 (EDT)
Subject: [XML-SIG] Announcing maki
In-Reply-To:
Message-ID:

It does caching already.
On my test system (a 400 MHz PII) static XML -> XSLT -> HTML is served
out of cache in about 5 milliseconds.
With every request, maki tests whether any of the input files have been
modified (thereby invalidating the cache). Each step is cached, so if
your XML goes through two XSLT passes and only the last stylesheet has
changed, the cached output of the first XSLT pass is used, instead of
starting over from scratch.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sam Brauer : Systems Programmer : sam@webslingerZ.com

On Wed, 4 Apr 2001, Clark C. Evans wrote:

>
> Looks neat. I'll dig into it in about 2-3 weeks.
> I'd like to see a simple caching mechanism as well,
> so that HTML pages aren't re-generated all the time
> for relatively static content.
>
> Clark
>
> On Wed, 4 Apr 2001, Sam Brauer wrote:
>
> > Date: Wed, 4 Apr 2001 13:16:25 -0400 (EDT)
> > From: Sam Brauer
> > To: xml-sig@python.org
> > Subject: [XML-SIG] Announcing maki
> >
> > I have just made the first release of a project I have been working on for
> > the past couple of months.
> > The project's name is 'maki'. To quote from the manual:
> >
> > "The purpose of maki is to serve XML files via the web. A web developer
> > can specify that the XML data be processed or transformed through any
> > number of steps. Each step is either a stylesheet transformation or a
> > custom process. A processor that evaluates embedded Python code is
> > included. The output of each step is passed as the input to the next step
> > (similar to a Unix pipe). Additionally, the output of each processor step
> > can be cached for a user-specified time period. All configuration is in a
> > central XML file that specifies rules based on matching paths against
> > regular expressions."
> >
> > maki relies on Python2, Apache, 4Suite, and mod_python. It also
> > supports the use of Sablotron and Sab-pyth for XSLT (these are optional,
> > since maki uses 4xslt as the default XSLT transformer).
> >
> > maki is currently in the alpha stage (fully functional, but in need of
> > community criticism and real-world testing). I would very much welcome
> > any feedback.
> >
> > If you're interested, take a look at http://maki.sourceforge.net/
> >
> > Thanks,
> > Sam
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Sam Brauer : Systems Programmer : sam@webslingerZ.com
> >
> >
> >
> > _______________________________________________
> > XML-SIG maillist - XML-SIG@python.org
> > http://mail.python.org/mailman/listinfo/xml-sig
> >
>

From Eugene.Leitl@lrz.uni-muenchen.de Wed Apr 4 21:43:22 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de)
Date: Wed, 04 Apr 2001 22:43:22 +0200
Subject: [XML-SIG] rendering XML trees to HTML
Message-ID: <3ACB876A.DEA7650B@lrz.uni-muenchen.de>

I'm writing a web app that renders trees of chemical reactions (spit out
as XML) as (.png's embedded in) HTML s. I don't know exactly how to
encode them yet, but it will be something like this (but minus length,
since this is phylogenetic tree):

Do you know of any code that renders XML as a HTML table? Maybe even
something that will do a fair liking of Tk tree widget, producing
http://archive.eso.org/~abrighto/tree/tree.html a server-side bitmap?

Being very lazy, I don't feel like reinventing the wheel, and XML +
trees is a rather frequent combination (30+ k hits on Google). Have
any of you ran across any such beast?

TIA,
Eugene Leitl

From Nicolas.Chauvat@logilab.fr Wed Apr 4 19:52:05 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Wed, 4 Apr 2001 20:52:05 +0200 (CEST)
Subject: [XML-SIG] rendering XML trees to HTML
In-Reply-To: <3ACB876A.DEA7650B@lrz.uni-muenchen.de>
Message-ID:

> Being very lazy, I don't feel like reinventing the wheel, and XML +
> trees is a rather frequent combination (30+ k hits on Google). Have
> any of you ran across any such beast?

It's no HTML rendering, but maybe http://www.logilab.org/xmltools/ and
http://www.logilab.org/xmltools/img/xmltree.png is something like what
you're looking for?

Hope this helps.

--
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)

From Eugene.Leitl@lrz.uni-muenchen.de Wed Apr 4 21:57:05 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de)
Date: Wed, 04 Apr 2001 22:57:05 +0200
Subject: [XML-SIG] rendering XML trees to HTML
References:
Message-ID: <3ACB8AA1.B03B82FC@lrz.uni-muenchen.de>

Nicolas Chauvat wrote:

> It's no HTML rendering, but maybe http://www.logilab.org/xmltools/ and
> http://www.logilab.org/xmltools/img/xmltree.png is something like what
> you're looking for?

Thanks. I've seen it. No, I'd rather not use gtk. And Tk tree widget
does essentially all I need (unfortunately it is a bit heavy on
bandwidth, and evaluating clicks on nodes is not quite trivial).

I guess I have to roll my own...

Thanks again!

-- Eugene

From gianni@postino.it Indirizzo primario gianni@postino.it Wed Apr 4 22:28:22 2001
From: gianni@postino.it Indirizzo primario gianni@postino.it (Gianni Rubagotti)
Date: Thu, 5 Apr 2001 1:58:22 +0430
Subject: [XML-SIG] XML for humanities
Message-ID: <20010404212822.RGIQ21859.sirio@[213.145.3.20]>

I'm glad to announce that two mailing lists are being launched, for now
on yahoogroups and soon to be moved to a university server (the
University of Milan, through Professor Degli Antoni, is preparing to do
it).

x-humanities (in English)
Mailing list about the utility of XML for Humanist applications
http://groups.yahoo.com/group/x-humanities
To subscribe, send a mail to x-humanities-subscribe@yahoogroups.com

x-umanisti (in Italian)
XML applications for literature, philosophy and the humanities
http://groups.yahoo.com/group/x-umanisti
To subscribe, send a mail to x-umanisti-subscribe@yahoogroups.com

Everyone interested in this topic is invited to join!

Best regards,
Gianni Rubagotti

From a.kellett@lancaster.ac.uk Wed Apr 4 23:30:02 2001
From: a.kellett@lancaster.ac.uk (Alexandertje)
Date: Wed, 4 Apr 2001 23:30:02 +0100
Subject: [XML-SIG] opera bookmark converter
Message-ID: <20010404233002.B7735@unix.lancs.ac.uk>

--G4iJoqBmSsgzjUCe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Included is a perl script to convert from opera bookmark files into
konqueror/xbel bookmarks.

btw, unlike the one to kde-cvs@ this version actually works ;)

thanks,
Alex

--G4iJoqBmSsgzjUCe
Content-Type: application/x-perl
Content-Disposition: attachment; filename="opera2xbel.pl"

#!/usr/bin/perl
# NOTE: the markup inside the print strings was lost from the archived
# copy of this attachment. The XBEL element names and the entity
# escaping below are a reconstruction (an assumption), not necessarily
# the author's exact output.
use HTML::Entities;

print "<xbel>\n";
while (<>) {
    if (/^\#/) {
        chomp;
        $type = $_;
        %entry = ();    # was "$entry=();", which never cleared the hash
        ENTRY: while (<>) {
            chomp;
            last ENTRY if /^$/;
            next ENTRY unless /^[ \t](URL|NAME)=(.*)/;
            $entry{$1} = $2;
        }
        if ($type eq "#URL") {
            # escape only the characters that are unsafe in XML
            print " <bookmark href=\"" . encode_entities($entry{"URL"}, '<>&"') . "\">\n";
            print "  <title>" . encode_entities($entry{"NAME"}, '<>&"') . "</title>\n";
            print " </bookmark>\n";
        } elsif ($type eq "#FOLDER") {
            print "<folder>\n";
            print " <title>" . encode_entities($entry{"NAME"}, '<>&"') . "</title>\n";
        }
    } elsif (/^\-$/) {
        # a lone "-" closes the current folder
        print "</folder>\n";
    }
}
print "</xbel>\n";

--G4iJoqBmSsgzjUCe--

From rsalz@zolera.com Thu Apr 5 14:51:33 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 5 Apr 2001 09:51:33 -0400
Subject: [XML-SIG] Documenting xpath
Message-ID: <200104051351.JAA20218@os390.zolera.com>

Next week I want to start documenting the xpath (nee 4XPath) classes.
I'll work bottom-up.

First, is there documentation on the Node class, other than the DOM and
Python mapping? If not, what do folks think should be documented?

General model:
    scanner (currently only pyxpath) uses a parser factory (currently
    only FtFactory) to build up its "compiled scanner" object. This
    compiled scanner must support an evaluate(context) method which
    returns a nodeset (list of nodes).

Parsing, pyxpath, parsed expressions
    pyxpath.Compile -- module-level method for compiling a query

Context class

Package-level:
    Errors: SyntaxException(pos,msg), GeneralException, Error class(?)
    def Evaluate(expr, contextNode=None, context=None):
        registers $EXTMODULES, creates context, compiles, evaluates

    def Compile(expr):
        calls "a" compiler (currently only pyxpath)

    def CreateContext(contextNode):

Writing extensions:
    if os.environ.has_key('EXTMODULES'):
    def RegisterExtensionModules(moduleNames):
        (a dictionary of (namespace-string localname-string):method)

Comments?

From noreply@sourceforge.net Thu Apr 5 15:00:45 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 05 Apr 2001 07:00:45 -0700
Subject: [XML-SIG] [ pyxml-Bugs-414001 ] _checkversion.py obsolete
Message-ID:

Bugs item #414001, was updated on 2001-04-05 07:00
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=414001&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Rich Salz (rsalz)
Assigned to: Martin v. Löwis (loewis)
Summary: _checkversion.py obsolete

Initial Comment:
Should _checkversion.py be removed?

; python _checkversion.py
Traceback (most recent call last):
  File "_checkversion.py", line 5, in ?
    import pyversioncheck
ImportError: No module named pyversioncheck
; python -V
Python 2.0

----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=414001&group_id=6473

From krussll@cc.UManitoba.CA Thu Apr 5 23:26:12 2001
From: krussll@cc.UManitoba.CA (Kevin Russell)
Date: Thu, 5 Apr 2001 17:26:12 -0500 (CDT)
Subject: [XML-SIG] How to leave character entities alone
Message-ID:

I'm using the DOM from PyXML 0.6.5 to manipulate documents with the Text
Encoding Initiative DTD (teixlite.dtd). But all of the character
entities, like – and ;, vanish into thin air when read into a DOM object.
I've tried with minidom and 4DOM; I've tried with validation under 4DOM (where it gags on the DTD itself, which will probably be the subject of my next frantic query) and without; I've tried all the standard readers (please tell me I don't have to build my own). I'm obviously thrashing around clueless. So, using any combination of stuff in PyXML, how can I achieve *any* of the following (in decreasing order of niceness for me): - leave all character entity references unexpanded, sitting in the tree as well-behaved little EntityReference objects. - expand them into raw text if necessary, but trick the Printer into turning them back into –, ;, etc., when it's time to output the mangled DOM back into XML. - expand them into raw text and leave them that way. Indeed, anything short of having them vanish into thin air would be tolerable. Sorry for such a braindead question, but several wee-hours of squinting at documentation and source-code have left me unable to see the answer that's undoubtedly sitting out there in plain sight. -- Kevin Russell From uche.ogbuji@fourthought.com Fri Apr 6 05:37:52 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 05 Apr 2001 22:37:52 -0600 Subject: [XML-SIG] "Borrowed" tests Message-ID: <200104060438.f364bqg16888@borgia.local> I just noticed that in the transition from 4Suite to PyXML, the "borrowed" test for 4DOM went missing. I've long had the practice of placing test cases based on bug reports or examples provided by others into a "test_suite/borrowed" directory. I think this is useful, and important to distinguish from the "canned" tests. If these are worth keeping, where should we put them? I can find the old "borrowed" test for 4DOM, but should I check them in to PyXML? If so, where? "test/dom/borrowed"? "test/dom/use_case"? These are especially important for when 4XPath/4XSLT migrate into PyXML, since some of the best testing of 4XSLT comes from examples culled from the twisted minds on xml-sig. Thoughts? 
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Fri Apr 6 06:15:48 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 05 Apr 2001 23:15:48 -0600
Subject: [XML-SIG] xpath (4xpath) and CDATA
In-Reply-To: Message from Rich Salz of "Mon, 02 Apr 2001 22:42:16 EDT." <200104030242.WAA12192@os390.zolera.com>
Message-ID: <200104060516.f365FmS16995@borgia.local>

> xpath (which Martin has merged in from 4Xpath) doesn't recognize CDATA
> nodes. According to the spec, http://www.w3.org/TR/xpath#NT-NodeType ,
> "a CDATA section is treated as if the <![CDATA[ and ]]> were removed and
> every occurrence of < and & were replaced by &lt; and &amp; respectively."

??? Do you mean when passing 4DOM nodes to 4XPath? The Domlettes
transparently turn CDATASections into contiguous text nodes.

If you do mean the situation with 4DOM, there is no easy fix for this.
There was some discussion on xsl-list about the problems of efficiently
mapping DOM to XPath. It seems some XPath implementors have simply declined
DOM support except through complete export to a specialized format.

> I think the way to fix this is to add CDATA_SECTION_NODE next to every
> place that TEXT_NODE appears, but to do the above conversion whenever the
> *value* of the node is needed.

This could be a performance nightmare (you'll be amazed how often
string-value() conversion is done in typical XSLT processing). Besides, it
just papers over the one problem, but what about adjacent text/CDATA nodes?
What about entity reference nodes?

> Make sense? I just want to double-check before I (ab)use my new status. :)

Nice to have you chipping in.

-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Apr 6 06:27:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 05 Apr 2001 23:27:47 -0600 Subject: [XML-SIG] Documenting xpath In-Reply-To: Message from Rich Salz of "Thu, 05 Apr 2001 09:51:33 EDT." <200104051351.JAA20218@os390.zolera.com> Message-ID: <200104060528.f365Rlr17027@borgia.local> > Next week I want to start documenting the xpath (nee 4XPath) classes. > I'll work bottom-up. Unfortunately, 4XPath is not fully merged in yet. It's still actively developed as part of 4Suite and I try to merge in at stable points. If you let me know when you want to start your work, I'll try to sync up for you, but I'd be careful. Normally it wouldn't be a big deal, but I'm currently whipping 4Suite into shape for the 0.11.0 release so changes are a-brewing. 4XSLT and 4XPath will be moving completely into PyXML immediately following the 4Suite 1.0 release ca. June 1. Note that they'll keep their names while in PyXML. > Package-level: > Errors: SyntaxException(pos,msg), GeneralException, Error class(?) I just added a RuntimeException class which deals with such things as undefined variable and function refs during eval. > def Evaluate(expr, contextNode=None, context=None): > registers $EXTMODULES, creates context, compiles, evaluates > > def Compile(expr): > calls "a" compiler (currently only pyxpath) No, same with 4XPath. > def CreateContext(contextNode): > > Writing extensions: > if os.environ.has_key('EXTMODULES'): > def RegisterExtensionModules(moduleNames): > (a dictionary of (namespace-string localname-string):method) Yes. The user can either set the environment or just directly call RegisterExtensionModules(['mymodulename']) You'll want to have a look at http://services.4Suite.org/documents/4Suite/4XPath-Api to get a start. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Apr 6 06:34:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 05 Apr 2001 23:34:47 -0600 Subject: [XML-SIG] How to leave character entities alone In-Reply-To: Message from Kevin Russell of "Thu, 05 Apr 2001 17:26:12 CDT." Message-ID: <200104060535.f365Ylg17064@borgia.local> > I'm using the DOM from PyXML 0.6.5 to manipulate documents with the Text > Encoding Initiative DTD (teixlite.dtd). But all of the character > entities, like – and ;, vanish into thin air when read into a > DOM object. I've tried with minidom and 4DOM; I've tried with validation > under 4DOM (where it gags on the DTD itself, which will probably be the > subject of my next frantic query) and without; I've tried all the standard > readers (please tell me I don't have to build my own). I'm obviously > thrashing around clueless. There was a bug in 4DOM where entities weren't being handled properly, but this should now be sorted out. If you can provide a test case, I'll double-check that this is the case. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From noreply@sourceforge.net Fri Apr 6 11:34:40 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 06 Apr 2001 03:34:40 -0700
Subject: [XML-SIG] [ pyxml-Bugs-414263 ] pDomlette: appending empty DocFrag
Message-ID:

Bugs item #414263, was updated on 2001-04-06 03:34
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=414263&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: pDomlette: appending empty DocFrag

Initial Comment:
When using appendChild with an empty DocumentFragment, the fragment
itself is appended to the node. Same thing with insertBefore.

Demo code:

>>> from Ft.Lib.pDomlette import PyExpatReader
>>> d = PyExpatReader().fromString('')
>>> df = d.createDocumentFragment()
>>> d.documentElement.appendChild(df)
>>> d.documentElement.childNodes # We expect to see only the child here
[, ]

The problem is that the new node is tested for both
DocumentFragmentitude and non-emptiness in the code, and the else
clause catches the empty doc frag and does the default processing with it.

Attached to this report is a patch to pDomlette.py that fixes the
problem. It was generated on top of the other patch I submitted this
week, but they do not overlap, so this should not be a problem.

Cheers

Alexandre Fayolle

----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=414263&group_id=6473

From rsalz@zolera.com Fri Apr 6 11:43:20 2001
From: rsalz@zolera.com (Rich Salz)
Date: Fri, 06 Apr 2001 06:43:20 -0400
Subject: [XML-SIG] "Borrowed" tests
References: <200104060438.f364bqg16888@borgia.local>
Message-ID: <3ACD9DC8.96D30286@zolera.com>

How about test/dom/regress -- regression tests seem like a reasonable
definition.
/r$ From uche.ogbuji@fourthought.com Fri Apr 6 14:46:03 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 06 Apr 2001 07:46:03 -0600 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: Message from Rich Salz of "Fri, 06 Apr 2001 06:43:20 EDT." <3ACD9DC8.96D30286@zolera.com> Message-ID: <200104061346.f36Dk3S20089@borgia.local> > How about test/dom/regress -- regression tests seems like a reasonable > definition. Well, that doesn't properly differentiate them from the "canned" test, since those are also intended to be regression tests. What about "real-world" tests? I also considered calling them "black-box" tests, except that again, some of the canned tests are black-box (and some are white-box). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From cce@clarkevans.com Fri Apr 6 15:18:24 2001 From: cce@clarkevans.com (Clark C. Evans) Date: Fri, 6 Apr 2001 09:18:24 -0500 (EST) Subject: [XML-SIG] "Borrowed" tests In-Reply-To: <200104060438.f364bqg16888@borgia.local> Message-ID: On Thu, 5 Apr 2001, Uche Ogbuji wrote: > I've long had the practice of placing test cases based on bug reports or > examples provided by others into a "test_suite/borrowed" directory. I think > this is useful, and important to distinguish from the "canned" tests. Your logic here escapes me. Why would this distinction be necessary? It just seems like extra work with no additional benifit. As I find bugs in my software I add it to my regression test, and usually name the test after the bug. 
You could opt for a naming convention, where "canned" tests start with a "c" and borrowed tests start with a "b" (for bug) Clark From rsalz@zolera.com Fri Apr 6 15:19:17 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 06 Apr 2001 10:19:17 -0400 Subject: [XML-SIG] Documenting xpath References: <200104060528.f365Rlr17027@borgia.local> Message-ID: <3ACDD065.907F3D8C@zolera.com> > Unfortunately, 4XPath is not fully merged in yet. It's still actively > developed as part of 4Suite and I try to merge in at stable points. > > If you let me know when you want to start your work, I'll try to sync up for > you, but I'd be careful. No prob, I'll be careful. And I'll watch the checkin messages. > Normally it wouldn't be a big deal, but I'm currently whipping 4Suite into > shape for the 0.11.0 release so changes are a-brewing. Good luck. > http://services.4Suite.org/documents/4Suite/4XPath-Api Yeah, somehow I missed that one. Hmm, perhaps it makes sense to work on something else first? Suggestions, anyone (fdrake?) /r$ From fdrake@acm.org Fri Apr 6 16:01:54 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Apr 2001 11:01:54 -0400 (EDT) Subject: [XML-SIG] Documenting xpath In-Reply-To: <3ACDD065.907F3D8C@zolera.com> References: <200104060528.f365Rlr17027@borgia.local> <3ACDD065.907F3D8C@zolera.com> Message-ID: <15053.55906.109080.617192@cj42289-a.reston1.va.home.com> Rich Salz writes: > Hmm, perhaps it makes sense to work on something else first? > Suggestions, anyone (fdrake?) Given that we expect Python 2.1 to be released in a week, I'd like someone to review and, if appropriate, elaborate on the DOM and SAX documentation in the Python Library Reference. It definately needs some work, but I don't know how much I'll be able to do over the next week -- between other projects and building up the documentation on testing facilities, I'm pretty swamped. (Look for real unittest docs to land in the Python CVS this weekend!) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From fdrake@acm.org Fri Apr 6 16:09:04 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Apr 2001 11:09:04 -0400 (EDT) Subject: [XML-SIG] "Borrowed" tests In-Reply-To: <200104061346.f36Dk3S20089@borgia.local> References: <3ACD9DC8.96D30286@zolera.com> <200104061346.f36Dk3S20089@borgia.local> Message-ID: <15053.56336.313068.591813@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > Well, that doesn't properly differentiate them from the "canned" test, since > those are also intended to be regression tests. I agree with the sentiment that there's no need to separate them from the rest of the tests -- they are needed to ensure coverage, or the bug would have been caught earlier anyway. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From rsalz@zolera.com Fri Apr 6 16:08:32 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 06 Apr 2001 11:08:32 -0400 Subject: [XML-SIG] xpath (4xpath) and CDATA References: <200104060516.f365FmS16995@borgia.local> Message-ID: <3ACDDBF0.E39905C7@zolera.com> > This could be a performance nightmare (you'll be amazed how often > string-value() conversion is done in typical XSLT processing). Besides, it > just papers over the one problem, but what about adjacent text/CDATA nodes? > What about entity reference nodes? I don't care about entity references right now. :) Fred and I disagree about the interepretation of the Xpath spec, so for now my changes just treat a CDATA identically to a text node. /r$ From uche.ogbuji@fourthought.com Fri Apr 6 16:39:36 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 06 Apr 2001 09:39:36 -0600 Subject: [XML-SIG] Documenting xpath In-Reply-To: Message from "Fred L. Drake, Jr." of "Fri, 06 Apr 2001 11:01:54 EDT." <15053.55906.109080.617192@cj42289-a.reston1.va.home.com> Message-ID: <200104061539.f36Fda721122@borgia.local.dhcp.fourthought.com> > > Rich Salz writes: > > Hmm, perhaps it makes sense to work on something else first? 
> > Suggestions, anyone (fdrake?) > > Given that we expect Python 2.1 to be released in a week, I'd like > someone to review and, if appropriate, elaborate on the DOM and SAX > documentation in the Python Library Reference. Yes, since this is probably source #1 of FAQs. I know Greg Wilson would be happy, since I haven't been able to help him with civilized DOM docs for Python. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Fri Apr 6 16:42:04 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Apr 2001 11:42:04 -0400 (EDT) Subject: [XML-SIG] Documenting xpath In-Reply-To: <200104061539.f36Fda721122@borgia.local.dhcp.fourthought.com> References: <15053.55906.109080.617192@cj42289-a.reston1.va.home.com> <200104061539.f36Fda721122@borgia.local.dhcp.fourthought.com> Message-ID: <15053.58316.100972.112192@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > I know Greg Wilson would be happy, since I haven't been able to help him with > civilized DOM docs for Python. Heck, I'm having a hard time imagining the DOM as civilized! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Fri Apr 6 16:45:22 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 06 Apr 2001 09:45:22 -0600 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: Message from "Clark C. Evans" of "Fri, 06 Apr 2001 09:18:24 CDT." Message-ID: <200104061545.f36FjMW21151@borgia.local.dhcp.fourthought.com> > On Thu, 5 Apr 2001, Uche Ogbuji wrote: > > I've long had the practice of placing test cases based on bug reports or > > examples provided by others into a "test_suite/borrowed" directory. I think > > this is useful, and important to distinguish from the "canned" tests. > > Your logic here escapes me. 
Why would this distinction > be necessary? It's not necessary. My main goal is to get the tests back in regardless of where they go. I was just suggesting a course based on what I'd done before. The reason why I made the distinction before is that the "canned" tests were grouped according to speficication fiat, so there'd be, say a test_variable.py which tests the various diktats of section 11.4 of the XSLT spec. The tests that come from bug-reports and borrowed code tend not to be so easy to neatly categorize. Therefore I placed them in a separate directory just to provide a separate axis of grouping. Not a big deal, except that this directory got lost for 4DOM as it moved to PyXML. > It just seems like extra work with no > additional benifit. There's no extra work whatsoever. Why do you think there is? > As I find bugs in my software I > add it to my regression test, and usually name the > test after the bug. You could opt for a naming > convention, where "canned" tests start with a "c" > and borrowed tests start with a "b" (for bug) This is exactly what I do. The *only* difference being that I differentiate by directory placement rather than prefix. I actually think the directory approach is less work than the prefix approach, but perhaps we're misunderstanding each other. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Apr 6 16:49:08 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 06 Apr 2001 09:49:08 -0600 Subject: [XML-SIG] xpath (4xpath) and CDATA In-Reply-To: Message from Rich Salz of "Fri, 06 Apr 2001 11:08:32 EDT." 
<3ACDDBF0.E39905C7@zolera.com> Message-ID: <200104061549.f36Fn8w21159@borgia.local.dhcp.fourthought.com> > > This could be a performance nightmare (you'll be amazed how often > > string-value() conversion is done in typical XSLT processing). Besides, it > > just papers over the one problem, but what about adjacent text/CDATA nodes? > > What about entity reference nodes? > > I don't care about entity references right now. :) > > Fred and I disagree about the interepretation of the Xpath spec, so for > now my changes just treat a CDATA identically to a text node. Well, we don't know who else is out there: we just got a query about TEI lite. Woe betide anyone who tries to use TEI as 4DOM in XSLT. But I guess going with the XP rule of solving the most tractable problem might be OK here. I'd once though about adding a node.naziNormalize() sort of method, which would savagely go through the DOM expanding entity refs, merging adjacent text and CDATA nodes, and converting CDATA nodes to text. I also considered recommending TreeWalker for using 4DOM with 4XPath. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Fri Apr 6 17:56:39 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Apr 2001 12:56:39 -0400 (EDT) Subject: [XML-SIG] XBEL 1.1 docs up for review Message-ID: <15053.62791.21353.257128@cj42289-a.reston1.va.home.com> I've posted the docs for XBEL 1.1 for review: http://starship.python.net/crew/fdrake/xbel-1.1/xbel.html Please send comments to this forum. Note that something has broken in the formatter's support for bibTeX; I'll fix that as soon as I can, but the references are seriously broken in the preview copy. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Mon Apr 9 02:21:28 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 9 Apr 2001 03:21:28 +0200 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: <200104060438.f364bqg16888@borgia.local> (message from Uche Ogbuji on Thu, 05 Apr 2001 22:37:52 -0600) References: <200104060438.f364bqg16888@borgia.local> Message-ID: <200104090121.f391LSj02038@mira.informatik.hu-berlin.de> > If these are worth keeping, where should we put them? I think it is always worth to keep test cases. The other question is whether it is worth distributing them with every PyXML/4Suite installation. On cvs.pyxml.sourceforge.net, a new test module was created to carry test cases which are not meant to be shipped - so that would be another option. One of the problems I have with your "borrowed" test cases is that it is not always easy to tell PASS from FAIL; they just produce a lot of output. To be useful for a user, a clear pass/fail indication is necessary. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Apr 9 01:50:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 9 Apr 2001 02:50:59 +0200 Subject: [XML-SIG] xpath (4xpath) and CDATA In-Reply-To: <3ACDDBF0.E39905C7@zolera.com> (message from Rich Salz on Fri, 06 Apr 2001 11:08:32 -0400) References: <200104060516.f365FmS16995@borgia.local> <3ACDDBF0.E39905C7@zolera.com> Message-ID: <200104090050.f390oxV01987@mira.informatik.hu-berlin.de> > Fred and I disagree about the interepretation of the Xpath spec, so for > now my changes just treat a CDATA identically to a text node. Does that really work? Shouldn't multiple text nodes be combined with any CDATA sections before doing any processing? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Apr 9 02:26:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 9 Apr 2001 03:26:16 +0200 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: (cce@clarkevans.com) References: Message-ID: <200104090126.f391QGZ02040@mira.informatik.hu-berlin.de> > Your logic here escapes me. Why would this distinction > be necessary? It just seems like extra work with no > additional benifit. As I find bugs in my software I > add it to my regression test, and usually name the > test after the bug. Outright doing so with bugs found by others might violate the copyright of the others; there should be an indication of authorship, and there must be, of course, a permission of the author of the test case that copying it is allowed. It is often hard to determine whether the author would agree to redistribution of a test case, so it is clearly better to keep them separate from the cases with clear licensing conditions. The GCC team could not ship its entire regression test suite with gcc 2.95 because authorship of many of the tests cannot be established. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Apr 9 10:23:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 9 Apr 2001 11:23:00 +0200 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: <200104090225.f392Pkr01687@borgia.local> (message from Uche Ogbuji on Sun, 08 Apr 2001 20:25:46 -0600) References: <200104090225.f392Pkr01687@borgia.local> Message-ID: <200104090923.f399N0500930@mira.informatik.hu-berlin.de> > > One of the problems I have with your "borrowed" test cases is that it > > is not always easy to tell PASS from FAIL; they just produce a lot of > > output. To be useful for a user, a clear pass/fail indication is > > necessary. > > Hmm. All the 4DOM and 4XSLT tests do indicate pass/fail using the "[OK]" or > "[FAILED]" notation we borrowed from Red Hat start-up scripts. Does that include the files which currently live in /xml/test/dom? I run that as 'python test.py', and see no such output. 
Some of the tests end with printing 'foo works', some only write the test name (such as ********** Node ********** ********** NodeList ********** ********** NamedNodeMap ********** ********** NodeIterator ********** ********** TreeWalker ********** ********** Attr ********** ... The entire test run ends with ********** HTML HTML_DOM_IMPLEMENTATION ********** testing source code syntax The Title Test Time - 1.782 secs That probably means 'pass', although I'm not certain whether I would recognize a failure. Regards, Martin From jeremy.kloth@fourthought.com Mon Apr 9 17:42:30 2001 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Mon, 09 Apr 2001 10:42:30 -0600 Subject: [XML-SIG] "Borrowed" tests References: <200104090225.f392Pkr01687@borgia.local> <200104090923.f399N0500930@mira.informatik.hu-berlin.de> Message-ID: <3AD1E676.1BD82F9B@fourthought.com> "Martin v. Loewis" wrote: > > > > One of the problems I have with your "borrowed" test cases is that it > > > is not always easy to tell PASS from FAIL; they just produce a lot of > > > output. To be useful for a user, a clear pass/fail indication is > > > necessary. > > > > Hmm. All the 4DOM and 4XSLT tests do indicate pass/fail using the "[OK]" or > > "[FAILED]" notation we borrowed from Red Hat start-up scripts. > > Does that include the files which currently live in /xml/test/dom? I > run that as 'python test.py', and see no such output. Some of the > tests end with printing 'foo works', some only write the test name > (such as > That was a bug in the TestSuite module which was checked into there. The tests now properly display their status for each test. One exception, the HTML tests were NOT updated to use this method. They do validate the results, however. -- Jeremy Kloth Consultant jeremy.kloth@fourthought.com (303)583-9900 x 105 Fourthought, Inc. 
http://www.fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From pyxml@xhaus.com Mon Apr 9 18:56:58 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Mon, 9 Apr 2001 18:56:58 +0100 Subject: [XML-SIG] Using character entities in external DTD without validating. Message-ID: Hi all, Firstly, thanks for the great XML software. I have a small problem. I have several hundred xml files, which contain data for the members of a scientific association, as well as research abstracts. I have used a variety of XML tools over the years to process them, including Java tools with Jpython. But now I want to use Cpython, for speed and memory reasons. The problem I have is this. Many of the files contain character entity references, which refer to ISO-8859-1 characters, as well as other characters such greek letters (alpha, mu, etc). The entity references are defined in one central DTD file, which every single XML refers to using a DOCTYPE declaration. But I do not have an actual structure for the XML files themselves, they are a pretty random structure that has grown over the years. I first started my Cpython/PyXML port by trying to use PyExpat. However, since PyExpat doesn't read the external subset, it dropped all my character entities. Then I tried to the Sax2.Reader from xml.dom.ext.reader. This reads the external subset, when the vaidate flag is turned on (i.e. the reader is instantiated like so "reader=Sax2.Reader(validate=1)". But now the Sax2.Reader is, understandably, insisting that my documents conform to a structure, which they don't, so I get errors such as "Element not declared". Can anyone suggest a way that I can keep the character entity definitions in an external file, AND read the documents without validating them? I considered converting all of the documents to ISO-8859-1 encoding, but doesn't solve the problem of the Greek letters in paper abstracts. 
I really don't want to have to define those character entities in the internal subset of all these documents. Thanks in advance for any help, Regards, Alan. From brunson@level3.net Mon Apr 9 22:29:24 2001 From: brunson@level3.net (Eric Brunson) Date: Mon, 9 Apr 2001 15:29:24 -0600 Subject: [XML-SIG] FileReader? Message-ID: <20010409152924.A8363@level3.net> I feel like this is a terribly stupid question, but I can't seen to figure out where I should be looking. Where is the xml.dom.util module? Is it supposed to be part of PyXML? I've checked the latest source distro and CVS, but I can't locate it. Most of the example code seems to depend on it, so if anyone could point me to the source, I'd appreciate it. Thanks, e. -- Eric Brunson - brunson@level3.net - page-eric@level3.net "When governments fear the people there is liberty. When the people fear the government there is tyranny." - Thomas Jefferson From martin@loewis.home.cs.tu-berlin.de Mon Apr 9 22:55:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 9 Apr 2001 23:55:00 +0200 Subject: [XML-SIG] Using character entities in external DTD without validating. In-Reply-To: References: Message-ID: <200104092155.f39Lt0u03361@mira.informatik.hu-berlin.de> > Can anyone suggest a way that I can keep the character entity definitions in > an external file, AND read the documents without validating them? > > I considered converting all of the documents to ISO-8859-1 encoding, but > doesn't solve the problem of the Greek letters in paper abstracts. I really > don't want to have to define those character entities in the internal subset > of all these documents. Did you consider using character references, instead of external entities? If that is also not feasible, I believe that none of the existing parsers will exactly fit your need. You cannot talk pyexpat into reading the external subset. 
With some efforts, you might manage to talk xmlproc (the validating parser) into not producing validation errors. The most promising approach might be to use sgmlop. There is currently no SAX2 sgmlop driver, but there is a SAX1 one; this does not support entity references, though. So here is a rough outline of what might succeed: - extend drv_sgmlop.py to also support entity references. To do that, you best inherit from xml.sax.drivers.drv_sgmlop.Parser and add a handle_entityref method. In your code, this method should magically know your DTD; off-hand, I don't see a way to have sgmlop actually parse the external subset as well. Whenever you see an entity reference, invoke self.doc_handler.characters(,0,len()) - Create an instance of your SAX driver. - Pass that to Sax.From*, as the parser= parameter. Hope this helps, Martin From swu@sybase.com Mon Apr 9 23:05:39 2001 From: swu@sybase.com (Sonny Wu) Date: Mon, 09 Apr 2001 15:05:39 -0700 Subject: [XML-SIG] XML howto update for Python 2.0? Message-ID: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> Hello, Is there an expected version of document for XML.SAX in the near future? Right now it seems to be insync with what is in 2.0. Thanks, Sonny From martin@loewis.home.cs.tu-berlin.de Tue Apr 10 06:23:13 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 10 Apr 2001 07:23:13 +0200 Subject: [XML-SIG] FileReader? In-Reply-To: <20010409152924.A8363@level3.net> (message from Eric Brunson on Mon, 9 Apr 2001 15:29:24 -0600) References: <20010409152924.A8363@level3.net> Message-ID: <200104100523.f3A5NDX00867@mira.informatik.hu-berlin.de> > I feel like this is a terribly stupid question, but I can't seen to > figure out where I should be looking. > > Where is the xml.dom.util module? Is it supposed to be part of PyXML? Not anymore, no. What made you think there should be such a module? 
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Apr 10 06:22:11 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 10 Apr 2001 07:22:11 +0200 Subject: [XML-SIG] XML howto update for Python 2.0? In-Reply-To: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> (message from Sonny Wu on Mon, 09 Apr 2001 15:05:39 -0700) References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> Message-ID: <200104100522.f3A5MBk00865@mira.informatik.hu-berlin.de> > Is there an expected version of document for XML.SAX in the near future? > Right now it seems to be insync with what is in 2.0. I'm not sure I understand the question. The answer seems to be "no", nobody plans to modify the XML howto. Since it is in sync with what is in 2.0, there is no need for that, either... Regards, Martin From walter@livinglogic.de Tue Apr 10 10:16:12 2001 From: walter@livinglogic.de ("Walter Dörwald") Date: Tue, 10 Apr 2001 11:16:12 +0200 Subject: [XML-SIG] Using character entities in external DTD without validating. In-Reply-To: <200104092155.f39Lt0u03361@mira.informatik.hu-berlin.de> References: <200104092155.f39Lt0u03361@mira.informatik.hu-berlin.de> Message-ID: <200104101116120984.0032FAC8@mail.livinglogic.de> On 09.04.01 at 23:55 Martin v. Loewis wrote: > > Can anyone suggest a way that I can keep the character entity definitions in > > an external file, AND read the documents without validating them? > > > > I considered converting all of the documents to ISO-8859-1 encoding, but > > doesn't solve the problem of the Greek letters in paper abstracts. I really > > don't want to have to define those character entities in the internal subset > > of all these documents. > > Did you consider using character references, instead of external > entities? > > If that is also not feasible, I believe that none of the existing > parsers will exactly fit your need. You cannot talk pyexpat into > reading the external subset.
With some efforts, you might manage to > talk xmlproc (the validating parser) into not producing validation > errors. > > The most promising approach might be to use sgmlop. There is currently > no SAX2 sgmlop driver, I do have a rough, untested SAX2 driver for sgmlop, which could be used as the base for a real SAX2 driver. If there is interest I can post it. > but there is a SAX1 one; this does not support > entity references, though. > > So here is a rough outline of what might succeed: > - extend drv_sgmlop.py to also support entity references. To do that, > you best inherit from xml.sax.drivers.drv_sgmlop.Parser and add a > handle_entityref method. In your code, this method should magically > know your DTD; off-hand, I don't see a way to have sgmlop actually > parse the external subset as well. Whenever you see an entity > reference, invoke > > self.doc_handler.characters(,0,len()) > > - Create an instance of your SAX driver. > > - Pass that to Sax.From*, as the parser= parameter. Alternatively, you could try using XIST (ftp://titan.bnbt.de/pub/livinglogic/xist/), which is based on sgmlop and does exactly what Martin suggested. It "automagically" knows the character entities, so you can type &Alpha; to get the character: greek capital letter alpha, U+0391 And if you need a new entity you can simply add one Python class to define it: class Spam(xsc.Entity): "the spam character, U+4242" codepoint = 0x4242 HTH Bye, Walter Dörwald -- Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de From swu@sybase.com Tue Apr 10 18:15:53 2001 From: swu@sybase.com (Sonny Wu) Date: Tue, 10 Apr 2001 10:15:53 -0700 Subject: [XML-SIG] XML howto update for Python 2.0?
In-Reply-To: <200104100522.f3A5MBk00865@mira.informatik.hu-berlin.de> References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> Message-ID: <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> Hi Martin, Sorry for an unclear question. My question is about the XML howto document, as someone has posted on the news group, appears to be outdated. (please refer to this thread: http://groups.google.com/groups?q=saxlib&hl=en&lr=&safe=off&rnum=1&seld=970271319&ic=1 The XML howto (http://py-howto.sourceforge.net/xml-howto/xml-howto.html) still documents the usage of xml.sax.saxlib. When I downloaded Python 2.0 and began to read the online document, I could not find any related info about this package saxlib. To that end, I am hoping to locate the new howto that utilizes xml.sax and xml.sax.xmlreader packages. The XML howto demonstrates the usage of At 07:22 AM 4/10/2001, Martin v. Loewis wrote: > > Is there an expected version of document for XML.SAX in the near future? > > Right now it seems to be insync with what is in 2.0. > >I'm not sure I understand the question. The answer seems to be "no", >nobody plans to modify the XML howto. Since it is in sync with what is >in 2.0, there is no need for that, either... > >Regards, >Martin Thanks, Sonny From fdrake@acm.org Tue Apr 10 18:36:49 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 10 Apr 2001 13:36:49 -0400 (EDT) Subject: [XML-SIG] XML howto update for Python 2.0? In-Reply-To: <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> Message-ID: <15059.17585.844156.112748@cj42289-a.reston1.va.home.com> Sonny Wu writes: > The XML howto (http://py-howto.sourceforge.net/xml-howto/xml-howto.html) > still documents the usage of xml.sax.saxlib. 
When I downloaded Python 2.0 > and began to read the online document, I could not find any related info > about this package saxlib. This document hasn't been updated for a while; I'm not sure if anyone is actively maintaining it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From akuchlin@mems-exchange.org Tue Apr 10 18:59:10 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 10 Apr 2001 13:59:10 -0400 Subject: [XML-SIG] XML howto update for Python 2.0? In-Reply-To: <200104100522.f3A5MBk00865@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Apr 10, 2001 at 07:22:11AM +0200 References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <200104100522.f3A5MBk00865@mira.informatik.hu-berlin.de> Message-ID: <20010410135910.C15085@ute.cnri.reston.va.us> On Tue, Apr 10, 2001 at 07:22:11AM +0200, Martin v. Loewis wrote: >I'm not sure I understand the question. The answer seems to be "no", >nobody plans to modify the XML howto. Since it is in sync with what is >in 2.0, there is no need for that, either... I'd like to revisit the HOWTO eventually, but have no idea when I'll be able to find the time. --amk From dieter@handshake.de Tue Apr 10 19:57:34 2001 From: dieter@handshake.de (Dieter Maurer) Date: Tue, 10 Apr 2001 20:57:34 +0200 (CEST) Subject: [XML-SIG] Using character entities in external DTD without validating. In-Reply-To: <215095255@toto.iv> Message-ID: <15059.22430.758810.339497@lindm.dm> Alan Kennedy writes: > how about defining the entity file in the internal subset > and including it there? 
Dieter From krussll@cc.UManitoba.CA Tue Apr 10 22:30:29 2001 From: krussll@cc.UManitoba.CA (Kevin Russell) Date: Tue, 10 Apr 2001 16:30:29 -0500 (CDT) Subject: [XML-SIG] Re: using character entities in external DTD without validating In-Reply-To: Message-ID: Instead of trying to coerce a single set of tools to work on your files without validation, it might be a better use of your time to come up with a maximally vacuous DTD for your documents. That way you'd still be able to use your character references, but because you now have validatable documents you wouldn't be tying yourself to any single toolset. I'm thinking of a trivial DTD that's little more than a reference to the external character sets plus the following for every element you've ever used. It probably wouldn't be too hard to write a Python program to read all your files using a non-validating parser and generate such a DTD based on the elements and attributes it encountered. -- Kevin Russell From brunson@level3.net Tue Apr 10 23:23:54 2001 From: brunson@level3.net (Eric Brunson) Date: Tue, 10 Apr 2001 16:23:54 -0600 Subject: [XML-SIG] FileReader? In-Reply-To: <200104100523.f3A5NDX00867@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Apr 10, 2001 at 07:23:13AM +0200 References: <20010409152924.A8363@level3.net> <200104100523.f3A5NDX00867@mira.informatik.hu-berlin.de> Message-ID: <20010410162353.A13906@level3.net> * Martin v. Loewis (martin@loewis.home.cs.tu-berlin.de) [010409 23:27]: > > I feel like this is a terribly stupid question, but I can't seen to > > figure out where I should be looking. > > > > Where is the xml.dom.util module? Is it supposed to be part of PyXML? > > Not anymore, no. What made you think there should be such a module? > Only that most of the example code I came across relied on the FileReader object, even code posted to this mailing list as recently as 29 Jan 2001. 
I see that all but one reference to FileReader has made it's way out of the PyXML demo code, except for demo/dom/benchmark.py. But you've definitely answered my question, utils.FileReader is no longer part of PyXML. Thanks, e. -- Eric Brunson - brunson@level3.net - page-eric@level3.net "When governments fear the people there is liberty. When the people fear the government there is tyranny." - Thomas Jefferson From martin@loewis.home.cs.tu-berlin.de Wed Apr 11 08:49:02 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 11 Apr 2001 09:49:02 +0200 Subject: [XML-SIG] XML howto update for Python 2.0? In-Reply-To: <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> (message from Sonny Wu on Tue, 10 Apr 2001 10:15:53 -0700) References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> Message-ID: <200104110749.f3B7n2w01620@mira.informatik.hu-berlin.de> > My question is about the XML howto document, as someone has posted on the > news group, > appears to be outdated. (please refer to this thread: > http://groups.google.com/groups?q=saxlib&hl=en&lr=&safe=off&rnum=1&seld=970271319&ic=1 The article claims that the howto mentions xml.sax.saxlib.HandlerBase, but I cannot find any such mentioning. Instead, it uses ContentHandler in most places, as it should. > The XML howto (http://py-howto.sourceforge.net/xml-howto/xml-howto.html) > still documents the usage of xml.sax.saxlib. When I downloaded Python 2.0 > and began to read the online document, I could not find any related info > about this package saxlib. Can you please give the exact URL of the page where saxlib is mentioned? I cannot find that, either. > To that end, I am hoping to locate the new howto that utilizes > xml.sax and xml.sax.xmlreader packages. 
How about http://py-howto.sourceforge.net/xml-howto/node7.html from xml.sax import make_parser from xml.sax.handler import feature_namespaces ... parser = make_parser() parser.setFeature(feature_namespaces, 0) ... The howto does not mention xmlreader, as you'd normally would not need to use that module. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Apr 11 08:57:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 11 Apr 2001 09:57:16 +0200 Subject: [XML-SIG] Re: using character entities in external DTD without validating In-Reply-To: (message from Kevin Russell on Tue, 10 Apr 2001 16:30:29 -0500 (CDT)) References: Message-ID: <200104110757.f3B7vGm01680@mira.informatik.hu-berlin.de> > Instead of trying to coerce a single set of tools to work on your files > without validation, it might be a better use of your time to come up with > a maximally vacuous DTD for your documents. That way you'd still be able > to use your character references, but because you now have validatable > documents you wouldn't be tying yourself to any single toolset. The key point to notice here is that he wanted to use CPython for speed. Now, if he had to produce a DTD and use a validating parser to get the external entities expanded, then the parser would be xmlproc. I'm not all that certain that this would be faster than using a Java parser from JPython, since xmlproc isn't the fastest thing on earth. OTOH, sgmlop *is* the fastest parser that PyXML can offer, so it might be the right tool. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Apr 11 08:51:43 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 11 Apr 2001 09:51:43 +0200 Subject: [XML-SIG] XML howto update for Python 2.0? 
In-Reply-To: <15059.17585.844156.112748@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> <15059.17585.844156.112748@cj42289-a.reston1.va.home.com> Message-ID: <200104110751.f3B7phJ01677@mira.informatik.hu-berlin.de> > > The XML howto (http://py-howto.sourceforge.net/xml-howto/xml-howto.html) > > still documents the usage of xml.sax.saxlib. When I downloaded Python 2.0 > > and began to read the online document, I could not find any related info > > about this package saxlib. > > This document hasn't been updated for a while; I'm not sure if > anyone is actively maintaining it. The *HOWTO* was updated together with PyXML 0.6.1. It does explain things that work only with PyXML (e.g. 4DOM), but the SAX part works almost unmodified with stock Python 2.0 (except for a single mentioning of DefaultHandler). The *reference manual* is the one that has not been updated for a while. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Apr 11 09:00:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 11 Apr 2001 10:00:59 +0200 Subject: [XML-SIG] FileReader? In-Reply-To: <20010410162353.A13906@level3.net> (message from Eric Brunson on Tue, 10 Apr 2001 16:23:54 -0600) References: <20010409152924.A8363@level3.net> <200104100523.f3A5NDX00867@mira.informatik.hu-berlin.de> <20010410162353.A13906@level3.net> Message-ID: <200104110800.f3B80xb01754@mira.informatik.hu-berlin.de> > Only that most of the example code I came across relied on the > FileReader object, even code posted to this mailing list as recently > as 29 Jan 2001. In case you've wondered what to use instead: xml.dom.ext.reader is the package that supports reading streams (strings, files, URLs) into DOM trees; that gives you a 4DOM tree. Alternatively, you can use xml.dom.minidom.parse to read a file into a minidom tree. 
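As a rough sketch of the minidom route mentioned above: this is written for a current Python 3 rather than the 2.0-era interpreter discussed in this thread, and the sample document and the helper name read_titles are made up for illustration. Only xml.dom.minidom.parse and the DOM calls themselves come from the standard library.

```python
import io
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
SAMPLE = b"<doc><title>The Title</title><para>glou</para></doc>"


def read_titles(source):
    """Parse a file name or file-like object; return the text of every <title>."""
    dom = minidom.parse(source)
    try:
        titles = []
        for node in dom.getElementsByTagName("title"):
            # Merge adjacent Text nodes before reading their data.
            node.normalize()
            titles.append(
                "".join(
                    child.data
                    for child in node.childNodes
                    if child.nodeType == child.TEXT_NODE
                )
            )
        return titles
    finally:
        # Break reference cycles in the tree once we are done with it.
        dom.unlink()


print(read_titles(io.BytesIO(SAMPLE)))
```

Passing a file name instead of the BytesIO object should work the same way, since minidom.parse accepts either.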
Eventually, the DOM level 3 interfaces should be implemented to allow "portable" access to a DOM parser. Regards, Martin From Mario.Ruggier@softplumbers.com Wed Apr 11 13:29:47 2001 From: Mario.Ruggier@softplumbers.com (Ruggier, Mario) Date: Wed, 11 Apr 2001 14:29:47 +0200 Subject: [XML-SIG] XML howto update for Python 2.0? Message-ID: <7FDE48022CEFA544878EA148687723DF0FD965@mqgenevaex01.myqube.com> Hi, if I may pipe in... I was looking for some convenient sample code to build a doc using DOM, using (preferably) minidom or the full DOM implementation. But did not find anything useful, e.g.: http://py-howto.sourceforge.net/xml-howto/node17.html Any pointers appreciated. Many thanks & Best Regards. Mario Ruggier -- SoftPlumbers SA 26, rue Maunoir CH-1207 Genève +41.22.849.1002 From fdrake@acm.org Wed Apr 11 13:46:16 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 11 Apr 2001 08:46:16 -0400 (EDT) Subject: [XML-SIG] XML howto update for Python 2.0? In-Reply-To: <7FDE48022CEFA544878EA148687723DF0FD965@mqgenevaex01.myqube.com> References: <7FDE48022CEFA544878EA148687723DF0FD965@mqgenevaex01.myqube.com> Message-ID: <15060.21016.341366.370395@cj42289-a.reston1.va.home.com> Ruggier, Mario writes: > if I may pipe in... i was looking for some convenient > sample code to build a doc using DOM, using (preferably) > minidom or the full DOM implementation. But did not find I don't know if this fits "convenient", but there is some minidom code in the Doc/tools/sgmlconv/docfixer.py file of the Python source distribution, and some SAX code (an alternate reader) in Doc/tools/sgmlconv/esistools.py. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From stuff4gary@hotmail.com Wed Apr 11 13:46:46 2001 From: stuff4gary@hotmail.com (gary cor) Date: Wed, 11 Apr 2001 12:46:46 Subject: [XML-SIG] Missing core! Message-ID: Hi I wonder if anyone can help with these problems? I am trying to learn how to parse XML in Python using SAX and DOM...
and want to end up doing this through a cgi. Problems are that (other than not being a very good programmer hehe!):- • I am having difficulties finding some good tutorials • The tutorials I do have don't seem to work and it keeps saying missing 'core' when I try anything with DOM. I will have to go through the Python/XML how to as well again later and try that again!.. I am using the Python 2.0 with pyPyXML-0.6.5.win32-py2.0.exe should I use any other downloads as well? Kind Regards Gary C From fdrake@acm.org Wed Apr 11 13:51:57 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 11 Apr 2001 08:51:57 -0400 (EDT) Subject: [XML-SIG] Missing core! In-Reply-To: References: Message-ID: <15060.21357.225082.295660@cj42289-a.reston1.va.home.com> gary cor writes: > • The tutorials I do have don't seem to work and I it keeps saying missing > 'core' when I try anything with DOM. > I will have to go through the Python/XML how to as well again later and try > that again!.. I am using the Python 2.0 with pyPyXML-0.6.5.win32-py2.0.exe > should I use any other downloads as well? This sounds like you're using old documentation -- the Python/XML Reference is out of date at this point. xml.dom.core is no longer part of PyXML. You can get information about the DOM interface in the development version of the Python documentation: http://python.sourceforge.net/devel-docs/lib/markup.html These documents will be released for Python 2.1 next Monday, at which point they will be published on python.org with a reliable URL. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Mon Apr 9 13:23:24 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 09 Apr 2001 06:23:24 -0600 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: Message from "Martin v.
Loewis" of "Mon, 09 Apr 2001 11:23:00 +0200." <200104090923.f399N0500930@mira.informatik.hu-berlin.de> Message-ID: <200104091223.f39CNOW03570@borgia.local> > > > One of the problems I have with your "borrowed" test cases is that it > > > is not always easy to tell PASS from FAIL; they just produce a lot of > > > output. To be useful for a user, a clear pass/fail indication is > > > necessary. > > > > Hmm. All the 4DOM and 4XSLT tests do indicate pass/fail using the "[OK]" or > > "[FAILED]" notation we borrowed from Red Hat start-up scripts. > > Does that include the files which currently live in /xml/test/dom? I > run that as 'python test.py', and see no such output. Some of the > tests end with printing 'foo works', some only write the test name > (such as > > ********** Node ********** > ********** NodeList ********** > ********** NamedNodeMap ********** > ********** NodeIterator ********** > ********** TreeWalker ********** > ********** Attr ********** > ... > > The entire test run ends with > > ********** HTML HTML_DOM_IMPLEMENTATION ********** > testing source code syntax > > > > The Title > > > > Test Time - 1.782 secs > > That probably means 'pass', although I'm not certain whether I would > recognize a failure. Odd. I wonder if the checked-in version is out of date. Jeremy, any ideas? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Apr 9 03:25:46 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 08 Apr 2001 20:25:46 -0600 Subject: [XML-SIG] "Borrowed" tests In-Reply-To: Message from "Martin v. Loewis" of "Mon, 09 Apr 2001 03:21:28 +0200." <200104090121.f391LSj02038@mira.informatik.hu-berlin.de> Message-ID: <200104090225.f392Pkr01687@borgia.local> > > If these are worth keeping, where should we put them? 
> > I think it is always worth to keep test cases. The other question is > whether it is worth distributing them with every PyXML/4Suite > installation. On cvs.pyxml.sourceforge.net, a new test module was > created to carry test cases which are not meant to be shipped - so > that would be another option. OK. I'll put them here to start. > One of the problems I have with your "borrowed" test cases is that it > is not always easy to tell PASS from FAIL; they just produce a lot of > output. To be useful for a user, a clear pass/fail indication is > necessary. Hmm. All the 4DOM and 4XSLT tests do indicate pass/fail using the "[OK]" or "[FAILED]" notation we borrowed from Red Hat start-up scripts. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Apr 9 03:30:11 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 08 Apr 2001 20:30:11 -0600 Subject: [XML-SIG] xpath (4xpath) and CDATA In-Reply-To: Message from "Martin v. Loewis" of "Mon, 09 Apr 2001 02:50:59 +0200." <200104090050.f390oxV01987@mira.informatik.hu-berlin.de> Message-ID: <200104090230.f392UBe01701@borgia.local> > > Fred and I disagree about the interpretation of the XPath spec, so for > > now my changes just treat a CDATA identically to a text node. > > Does that really work? Shouldn't multiple text nodes be combined with > any CDATA sections before doing any processing? It doesn't really work. Not only is there the problem you mention, but there's also the problem of expanding and merging EntityRefs. Rich says all he needs to solve is the CDATA-as-sole-child problem. I say that this problem is so non-general that I'd hate to see specialized code to solve it without at least some discussion of how to solve the general problem.
In other words, I'd not like to see code checked in to 4XPath that blindly checks for CDATASection nodes wherever it's looking for TEXT nodes. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Wed Apr 11 17:02:44 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 11 Apr 2001 12:02:44 -0400 (EDT) Subject: [XML-SIG] xpath (4xpath) and CDATA In-Reply-To: <200104090050.f390oxV01987@mira.informatik.hu-berlin.de> References: <200104060516.f365FmS16995@borgia.local> <3ACDDBF0.E39905C7@zolera.com> <200104090050.f390oxV01987@mira.informatik.hu-berlin.de> Message-ID: <15060.32804.290270.984415@beowolf.digicool.com> Rich Salz said: > Fred and I disagree about the interepretation of the Xpath spec, so for > now my changes just treat a CDATA identically to a text node. Martin v. Loewis responded: > Does that really work? Shouldn't multiple text nodes be combined with > any CDATA sections before doing any processing? I think it's a matter of "better than it was before", and I think better than the change that Rich described to me in email. I'm not sufficiently familiar with the code to the "right thing", and don't have any particular expectation that I'll have time to pursue this myself. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Wed Apr 11 15:01:02 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 11 Apr 2001 10:01:02 -0400 (EDT) Subject: [XML-SIG] XML howto update for Python 2.0? 
In-Reply-To: <200104110751.f3B7phJ01677@mira.informatik.hu-berlin.de> References: <5.0.2.1.0.20010409150403.01e80540@olympus.sybase.com> <5.0.2.1.0.20010410100424.01df22e0@olympus.sybase.com> <15059.17585.844156.112748@cj42289-a.reston1.va.home.com> <200104110751.f3B7phJ01677@mira.informatik.hu-berlin.de> Message-ID: <15060.25502.128525.25667@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > The *HOWTO* was updated together with PyXML 0.6.1. It does explain > things that work only with PyXML (e.g. 4DOM), but the SAX part works > almost unmodified with stock Python 2.0 (except for a single > mentioning of DefaultHandler). My bad: I hadn't realized that either had been updated. > The *reference manual* is the one that has not been updated for a while. As I mentioned about a week ago, I'm planning to work on isolating the Python bindings from the other reference documentation sometime after Python 2.1 is out. I'm not sure exactly when this will happen (since I have some Expat time scheduled, that really needs some work), but I expect I'll have preliminary documents available fairly quickly since they'll be based on the existing documentation. The intent is to avoid the relationship with Python's release cycle, and allow the bindings documents to be more meaningfully referenced independently. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From lucas7@home.com Wed Apr 11 22:47:26 2001 From: lucas7@home.com (Lucas Vogel) Date: Wed, 11 Apr 2001 14:47:26 -0700 Subject: [XML-SIG] web page broken? Message-ID: <004301c0c2d0$ffc328a0$01000001@cx229913e> -----BEGIN PGP SIGNED MESSAGE----- The XML-SIG status page and Resources page links on the mailman page(http://mail.python.org/mailman/listinfo/xml-sig) appear to be broken...maybe they just need to be updated? Thought I'd let someone know. 
Lucas Vogel -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 7.0.3 for non-commercial use iQCVAwUBOtTQ6SlA8LjISNKLAQFFpgP/eHAyucaN3ASej/LfOAj7iSdtP+FQx9DY +rSWgpQfVx/1zmSKnkNc2XIrwYwIasFsWVy9VABS2yEQ8UXTXvbvskUVywhV1NL2 4EVgNYv+/fjoaHjMx3qRyoDJIHg06edFUhqHpFcjH3r8re/e+obyV0bhLB1zGrPP O7SJAfoDKEY= =2JLb -----END PGP SIGNATURE----- From eliot@isogen.com Thu Apr 12 03:19:24 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Wed, 11 Apr 2001 21:19:24 -0500 Subject: [XML-SIG] One-Line Enhancement to catalog.py Message-ID: <3AD510AC.7782ED7A@isogen.com> Hi, In order to use the pyXML catalog stuff with SGML-specific catalogs, I had to enable the processing of SGMLDECL entries. This required adding this line to catalog.py: "SGMLDECL": ("s")} Making the new code: class CatalogParser(AbstrCatalogParser,xmlutils.EntityParser): "A parser for SGML Open catalog files." def __init__(self,error_lang=None): AbstrCatalogParser.__init__(self,error_lang) xmlutils.EntityParser.__init__(self) # p=pubid (or prefix) # s=sysid (to be resolved) # o=other self.entry_hash={ "PUBLIC": ("p","s"), "DELEGATE": ("p","s"), "CATALOG": ("s"), "DOCUMENT": ("s"), "BASE": ("o"), "SYSTEM": ("o","s"), "OVERRIDE": ("o"), "SGMLDECL": ("s")} Can someone make this change to the main distribution for me? Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From martin@loewis.home.cs.tu-berlin.de Thu Apr 12 10:20:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Thu, 12 Apr 2001 11:20:49 +0200 Subject: [XML-SIG] One-Line Enhancement to catalog.py In-Reply-To: <3AD510AC.7782ED7A@isogen.com> (eliot@isogen.com) References: <3AD510AC.7782ED7A@isogen.com> Message-ID: <200104120920.f3C9KnJ06117@mira.informatik.hu-berlin.de> > In order to use the pyXML catalog stuff with SGML-specific catalogs, I > had to enable the processing of SGMLDECL entries. This required adding > this line to catalog.py: > > "SGMLDECL": ("s")} It appears that just adding this line is an incomplete change. At a minimum, the application should be informed using, say, handle_sgmldecl. Can you come up with a patch that does that? If so, please submit the patch as a unified (-u) or context (-c) diff. Posting the diff to the list is fine, although submitting it to sourceforge is preferred. Thanks, Martin From noreply@sourceforge.net Thu Apr 12 12:54:53 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 12 Apr 2001 04:54:53 -0700 Subject: [XML-SIG] [ pyxml-Bugs-415640 ] pDomlette does not normalize Message-ID: Bugs item #415640, was updated on 2001-04-12 04:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=415640&group_id=6473 Category: None Group: None Status: Open Priority: 5 Submitted By: Alexandre Fayolle (afayolle) Assigned to: Nobody/Anonymous (nobody) Summary: pDomlette does not normalize Initial Comment: Affected version : 0.10.2 The header of the pDomlette.py file states that "Domlette is also automatically normalized". (it also incorrectly states that Domlette is read only). Therefore, the implementation of normalize in pDomlette is as follow: def normalize(self): pass However, pDomlette does not automatically normalize. 
Here's a demonstration: >>> from Ft.Lib.pDomlette import PyExpatReader >>> from xml.xpath import Evaluate >>> >>> d=PyExpatReader().fromString('glou') >>> print d.documentElement.childNodes [] >>> t = d.createTextNode('blaaaaa') >>> d.documentElement.appendChild(t) >>> print d.documentElement.childNodes [, ] >>> >>> Evaluate('/doc/text()',d) [, ] I'll be working on a patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=415640&group_id=6473 From Juergen Hermann" Hi! We consider doing a wrapper for Xerces and Xalan as an OS project. One question is where to host it: 1. Own SourceForge project 2. Part of PyXML 3. Part of Xerces / Xalan projects 3 is unlikely. Option 2 seems not too clever, since installing Xerces and Xalan are extra steps anyway, and I doubt anyone seriously wants to include them in a standard PyXML distribution. Working title for the project would be "pirxx" (Python InteRface to Xalan and Xerces, anyone who has read Stanislaw Lem should know where the name comes from). Before we really launch this, I also would like to know who's interested in such a wrapper (I've seen some requests in the past) and also who would be able and willing to contribute (developers, documenters, beta-testers, ...). Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From nuno.simoes@ruido-visual.pt Thu Apr 12 16:21:56 2001 From: nuno.simoes@ruido-visual.pt (Nuno Simoes) Date: Thu, 12 Apr 2001 16:21:56 +0100 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan References: Message-ID: <010901c0c364$4fc30260$8040a8c0@localdomain> From: "Juergen Hermann" Sent: Thursday, April 12, 2001 4:02 PM We consider doing a wrapper for Xerces and Xalan as an OS project. One question is where to host it: 1. Own SourceForge project 2. Part of PyXML 3. Part of Xerces / Xalan projects 3 is unlikely.
Option 2 seems not too clever, since installing Xerces and Xalan are extra steps anyway, and I doubt anyone seriously wants to include them in a standard PyXML distribution. --> For the startup, the first option will probably be the better one. Working title for the project would be "pirxx" (Python InteRface to Xalan and Xerces, anyone who has read Stanislaw Lem should know where the name comes from). Before we really launch this, I also would like to know who's interested into such a wrapper (I've seen some requests in the past) and also who would be able and willing to contribute (developers, documenters, beta-testers, ...). --> Well, I am, and I believe that more people are, because with a Python wrapper you can easily use Xalan + Xerces in Zope, for example. Yesterday I was trying to write a Python module that uses Xalan to process some XML, but I have no skills in C++, so I first have to look into C++ to do it. It's kinda simple, if you look into the Python Manual. About your project, I could do some beta-testing. :-) Nuno Simões, rvti From mal@lemburg.com Thu Apr 12 16:21:28 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 12 Apr 2001 17:21:28 +0200 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan References: Message-ID: <3AD5C7F8.A60C9180@lemburg.com> Juergen Hermann wrote: > > Hi! > > We consider doing a wrapper for Xerces and Xalan as an OS project. One > question is where to host it: > > 1. Own SourceForge project > 2. Part of PyXML > 3. Part of Xerces / Xalan projects > > 3 is unlikely. Option 2 seems not too clever, since installing Xerces > and Xalan are extra steps anyway, and I doubt anyone seriously wants to > include them in a standard PyXML distribution. I'd suggest 1 and using a distutils setup so that installation becomes a breeze. > Working title for the project would be "pirxx" (Python InteRface to > Xalan and Xerces, anyone who has read Stanislaw Lem should know where > the name comes from).
Isn't that name already taken by Christian Tismer ;-) > Before we really launch this, I also would like to know who's > interested into such a wrapper (I've seen some requests in the past) > and also who would be able and willing to contribute (developers, > documenters, beta-testers, ...). Could you provide pointers to Xerces and Xalan ? Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From Alexandre.Fayolle@logilab.fr Thu Apr 12 16:32:29 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 12 Apr 2001 17:32:29 +0200 (CEST) Subject: [XML-SIG] Python Wrapper for Xerces/Xalan In-Reply-To: <3AD5C7F8.A60C9180@lemburg.com> Message-ID: On Thu, 12 Apr 2001, M.-A. Lemburg wrote: > Could you provide pointers to Xerces and Xalan ? The starting point is http://xml.apache.org/ You'll get http://xml.apache.org/xerces-c/index.html and http://xml.apache.org/xalan-c/index.html quite rapidly. These are the C++ implementations. There are also Java implementations, that could be interesting for users of JPython, since I believe 4Suite is not available on this platform. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From nuno.simoes@ruido-visual.pt Thu Apr 12 16:32:53 2001 From: nuno.simoes@ruido-visual.pt (Nuno Simoes) Date: Thu, 12 Apr 2001 16:32:53 +0100 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan References: <3AD5C7F8.A60C9180@lemburg.com> Message-ID: <011601c0c365$d7bbdba0$8040a8c0@localdomain> From: "M.-A. Lemburg" Sent: Thursday, April 12, 2001 4:21 PM [...] > > Before we really launch this, I also would like to know who's > > interested into such a wrapper (I've seen some requests in the past) > > and also who would be able and willing to contribute (developers, > > documenters, beta-testers, ...). 
>
> Could you provide pointers to Xerces and Xalan ?

Well, I can answer that right now.. :-)

http://xml.apache.org/

Nuno Simões, rvti

From Nicolas.Chauvat@logilab.fr Thu Apr 12 16:49:38 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Thu, 12 Apr 2001 17:49:38 +0200 (CEST)
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
In-Reply-To:
Message-ID:

Hi Juergen,

> Before we really launch this, I also would like to know who's
> interested into such a wrapper (I've seen some requests in the past)
> and also who would be able and willing to contribute (developers,
> documenters, beta-testers, ...).

Count us in as beta-testers. 4xslt works fine for us, but we'll be glad to help make available another XSL processing engine for python.

--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)

From mal@lemburg.com Thu Apr 12 17:04:40 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 12 Apr 2001 18:04:40 +0200
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
References:
Message-ID: <3AD5D218.7B22866@lemburg.com>

Alexandre Fayolle wrote:
>
> On Thu, 12 Apr 2001, M.-A. Lemburg wrote:
>
> > Could you provide pointers to Xerces and Xalan ?
>
> The starting point is http://xml.apache.org/
>
> You'll get http://xml.apache.org/xerces-c/index.html and
> http://xml.apache.org/xalan-c/index.html quite rapidly. These are the C++
> implementations. There are also Java implementations, that could be
> interesting for users of JPython, since I believe 4Suite is not available
> on this platform.

Thanks. Those tools look very interesting, but doesn't FourThought offer pretty much the same or even a superset of these tools ?
BTW, the ICU OSS project (*) hosted by IBM seems an interesting target for Python as well -- now that we have Unicode-support in the core, I guess hooking up Python with ICU should be easy ;-) (*) http://oss.software.ibm.com/icu/ "International Components for Unicode" -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin@mems-exchange.org Thu Apr 12 17:07:28 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 12 Apr 2001 12:07:28 -0400 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan In-Reply-To: ; from jh@web.de on Thu, Apr 12, 2001 at 05:02:06PM +0200 References: Message-ID: <20010412120728.C6058@ute.cnri.reston.va.us> On Thu, Apr 12, 2001 at 05:02:06PM +0200, Juergen Hermann wrote: > 1. Own SourceForge project > 2. Part of PyXML > 3. Part of Xerces / Xalan projects >3 is unlikely. Option 2 seems not too clever, since installing Xerces Why is 3 unlikely? There are already Xerces/Perl bindings; why should Python be a second-class citizen in this respect? Initially it'll probably be easier to start a separate development effort, but once you've demonstrated some working code, adding it to xml.apache.org should be considered. --amk From brunson@level3.net Thu Apr 12 17:09:13 2001 From: brunson@level3.net (Eric Brunson) Date: Thu, 12 Apr 2001 10:09:13 -0600 Subject: [XML-SIG] FileReader? In-Reply-To: <200104110800.f3B80xb01754@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Wed, Apr 11, 2001 at 10:00:59AM +0200 References: <20010409152924.A8363@level3.net> <200104100523.f3A5NDX00867@mira.informatik.hu-berlin.de> <20010410162353.A13906@level3.net> <200104110800.f3B80xb01754@mira.informatik.hu-berlin.de> Message-ID: <20010412100912.A2684@level3.net> * Martin v. 
Loewis (martin@loewis.home.cs.tu-berlin.de) [010411 02:34]: > > Only that most of the example code I came across relied on the > > FileReader object, even code posted to this mailing list as recently > > as 29 Jan 2001. > > In case you've wondered what to use instead: xml.dom.ext.reader is the > package that supports reading streams (strings, files, URLs) into DOM > trees; that gives you a 4DOM tree. Alternatively, you can use > xml.dom.minidom.parse to read a file into a minidom tree. > > Eventually, the DOM level 3 interfaces should be implemented to allow > "portable" access to a DOM parser. Oh, excellent. Thanks for the pointer. :-) -- Eric Brunson - brunson@level3.net - page-eric@level3.net "When governments fear the people there is liberty. When the people fear the government there is tyranny." - Thomas Jefferson From rsalz@zolera.com Thu Apr 12 17:15:35 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 12 Apr 2001 12:15:35 -0400 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan References: <20010412120728.C6058@ute.cnri.reston.va.us> Message-ID: <3AD5D4A7.E28B9683@zolera.com> I agree with Andrew. In fact, I'd encourage you to participate in xalan/xerces and get the code into their sourcebase as early as possible -- perhaps as soon as you've got something that "works" /r$ From Juergen Hermann" Message-ID: On Thu, 12 Apr 2001 12:07:28 -0400, Andrew Kuchling wrote: >Why is 3 unlikely? OK, add an "initially" there. As the next step, I plan to ask the Xerces and Xalan people for their opinion on the project too, just like here. From Alexandre.Fayolle@logilab.fr Thu Apr 12 17:53:36 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 12 Apr 2001 18:53:36 +0200 (CEST) Subject: [XML-SIG] Python Wrapper for Xerces/Xalan In-Reply-To: <3AD5D218.7B22866@lemburg.com> Message-ID: On Thu, 12 Apr 2001, M.-A. Lemburg wrote: > Thanks. Those tools look very interesting, but doesn't FourThought > offer pretty much the same or even a superset of these tools ? 
Yes, of course. Some of the great strength of 4Suite is 4RDF and DbDom (and of course 4SuiteServer). And of course, writing XSLT extensions in python. However, having used Xalan as a simple command line XSLT engine, when it comes to pure speed, things are just not the same. Xalan is several orders of magnitude faster, so I guess that for a number of applications, having the choice is a Good Thing. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From martin@loewis.home.cs.tu-berlin.de Thu Apr 12 17:58:12 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 12 Apr 2001 18:58:12 +0200 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan In-Reply-To: (jh@web.de) References: Message-ID: <200104121658.f3CGwCX00922@mira.informatik.hu-berlin.de> > We consider doing a wrapper for Xerces and Xalan as an OS project. One > question is where to host it: > > 1. Own SourceForge project > 2. Part of PyXML > 3. Part of Xerces / Xalan projects > > 3 is unlikely. Option 2 seems not too clever, since installing Xerces > and Xalan are extra steps anyway, and I doubt anyone seriously wants to > include them in a standard PyXML distribution. I would not mind including it in PyXML, if setup.py could be talking into auto-detecting a Xerces installation, and building the wrapper only if Xerces is available. Of course, creating a new project is just fine. In any case, it would be desirable if the wrappers register themselves at the relevant SAX/DOM parser factories, so that users can make use of them without even knowing that they are there. That, of course, assumes that you will follow the Python SAX and DOM mappings. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Apr 12 18:08:10 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Thu, 12 Apr 2001 19:08:10 +0200 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan In-Reply-To: <3AD5D218.7B22866@lemburg.com> (mal@lemburg.com) References: <3AD5D218.7B22866@lemburg.com> Message-ID: <200104121708.f3CH8AO00945@mira.informatik.hu-berlin.de> > Thanks. Those tools look very interesting, but doesn't FourThought > offer pretty much the same or even a superset of these tools ? Yes, Xerces is pretty much the same functionality as PyXML, and Xalan is what 4XSLT does (unless I mix them up right now). However, the hope of people who want such integration typically is that: - it will be faster than the pure Python code. That should be true for the validating parser (i.e. I'd hope that Xerces beats xmlproc in parsing speed), and might be true for the DOM implementation (although I'd wait for completion of the project to see whether this is actually the case). - it will be more correct than PyXML, since there is more developer power behind Xerces - especially since that got supported by the Apache foundation. I'm not sure here; AFAIK, xmlproc is still one of the most complete parsers with regard to catalogs and such stuff, and 4DOM supports almost all of DOM Level 2 - which Xerces didn't, last I checked. > BTW, the ICU OSS project (*) hosted by IBM seems an interesting target > for Python as well -- now that we have Unicode-support in the > core, I guess hooking up Python with ICU should be easy ;-) I think one of the problems with ICU is that you have to use their string types, so quite some copying forth and back might go on when you try to integrate that into Python. Also, it will be tricky to make the integration seemless - users might have to use ICU functions, instead of getting transparent access to codeset converters and locale information. As for codeset converters, this, again, is an area where some speed advantage might be gained from using ICU - if you can avoid too much copying. 
Regards, Martin From eliot@isogen.com Thu Apr 12 18:14:58 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Thu, 12 Apr 2001 12:14:58 -0500 Subject: [XML-SIG] Python Wrapper for Xerces/Xalan References: <3AD5C7F8.A60C9180@lemburg.com> Message-ID: <3AD5E292.73F2ACA3@isogen.com> > Juergen Hermann wrote: > > Before we really launch this, I also would like to know who's > > interested into such a wrapper (I've seen some requests in the past) > > and also who would be able and willing to contribute (developers, > > documenters, beta-testers, ...). We (DataChannel Austin) would definitely be interested--we are building a Python-based content management system that, among other things, lets us apply XSLT styles to hyperdocuments (of any sort, not just XLink). Putting Xalan as a back end is something we were going to do (just be lauching a separate Java process), but being able to more tightly integrate it with our overall Python framework would be very handy. Also, while I'm thinking about it, we have modified the pyXML and the 4Suite XSLT and XPath stuff to allow us to process arbitrary groves and arbitrary Python objects with XSL. We would like to contribute our changes back but haven't had the bandwidth yet to properly package things up. But we will as soon as we can. Cheers, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From eliot@isogen.com Thu Apr 12 18:09:33 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Thu, 12 Apr 2001 12:09:33 -0500 Subject: [XML-SIG] One-Line Enhancement to catalog.py References: <3AD510AC.7782ED7A@isogen.com> <200104120920.f3C9KnJ06117@mira.informatik.hu-berlin.de> Message-ID: <3AD5E14D.B650FBA@isogen.com> "Martin v. Loewis" wrote: > > > In order to use the pyXML catalog stuff with SGML-specific catalogs, I > > had to enable the processing of SGMLDECL entries. 
This required adding
> > this line to catalog.py:
> >
> > "SGMLDECL": ("s")}
>
> It appears that just adding this line is an incomplete change. At a
> minimum, the application should be informed using, say,
> handle_sgmldecl. Can you come up with a patch that does that?

I can do that--the change I made was sufficient to allow my test cases to pass, but I can do the more complete change if more is required.

Cheers,

E.
--
. . . . . . . . . . . . . . . . . . . . . . . .
W. Eliot Kimber | Lead Brain
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.656.4139 | F 512.419.1860 | eliot@isogen.com
w w w . d a t a c h a n n e l . c o m

From Nicolas.Chauvat@logilab.fr Thu Apr 12 19:29:21 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Thu, 12 Apr 2001 20:29:21 +0200 (CEST)
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
In-Reply-To: <3AD5E292.73F2ACA3@isogen.com>
Message-ID:

> Also, while I'm thinking about it, we have modified the pyXML and the
> 4Suite XSLT and XPath stuff to allow us to process arbitrary groves and
> arbitrary Python objects with XSL. We would like to contribute our
> changes back but haven't had the bandwidth yet to properly package
> things up. But we will as soon as we can.

Are you talking about the same groves that jade, SGML and DSSSL use?

What do you mean by "XSL and XPath to process arbitrary python objects"?

If that's interesting enough, you may quickly find people that will help you test the merging of your changes with the main tree before it goes mainstream... :-)

--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)

From Juergen Hermann"
Message-ID:

On Thu, 12 Apr 2001 18:58:12 +0200, Martin v. Loewis wrote:
>In any case, it would be desirable if the wrappers register themselves
>at the relevant SAX/DOM parser factories, so that users can make use
>of them without even knowing that they are there.
That, of course,
>assumes that you will follow the Python SAX and DOM mappings.

That is one of the first goals. Also, we want to handle a C++ SAX stream with Python, and vice versa (feed a Python SAX stream into Xalan). Bi-SAXuality, in a sense. :)

Ciao, Jürgen

From martin@loewis.home.cs.tu-berlin.de Thu Apr 12 21:24:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 12 Apr 2001 22:24:49 +0200
Subject: [XML-SIG] XML howto update for Python 2.0?
In-Reply-To: <7FDE48022CEFA544878EA148687723DF0FD965@mqgenevaex01.myqube.com> (Mario.Ruggier@softplumbers.com)
References: <7FDE48022CEFA544878EA148687723DF0FD965@mqgenevaex01.myqube.com>
Message-ID: <200104122024.f3CKOnU01804@mira.informatik.hu-berlin.de>

> if I may pipe in... i was looking for some convenient
> sample code to build a doc using DOM, using (preferably)
> minidom or the full DOM implementation. But did not find
> anything useful, e.g.:
> http://py-howto.sourceforge.net/xml-howto/node17.html

Not sure what you are asking, probably: How can I create a document by building a DOM tree, and then linearising the tree into an XML document?

If so, there is an example for building DOM trees in test/test_minidom.py; the method to generate the document from the tree is .toxml. If you use 4DOM, the methods to build the tree are the same; linearisation then uses xml.dom.ext.PrettyPrint.

Regards, Martin

From info@webb2e.com Fri Apr 13 01:56:53 2001
From: info@webb2e.com (info@webb2e.com)
Date: Thu, 12 Apr 2001 17:56:53 -0700
Subject: [XML-SIG] Free register of online company's profile
Message-ID: <0d2285356000d41MAIL@mail3.chinainfoland.com>

How much are you paying to advertise your business to the world?

Expose Your service to the world with bold register of online business profile. Sign up today!
Introducing WebB2E.com -- your direct link to global information; source of business, products, education/research, social/culture, entertainment and travel...

Additionally you can BUY, SELL or PROMOTE your products and services.

At www.webb2e.com you'll get:

--Message center (open to the public).
--Employment center.
--Sponsorship center.
--Bulletin board (business and service issue).
--Flexible Online Office (Business Online Report).
--Economic news.
--View thousands of trade leads.
--Post business propositions.
--Merchandise marketing (Vast advertising at a low cost).
--World shopping center.

... and much more. Please visit www.webb2e.com

If you do not want to receive any more e-mails from WebB2E.com and wish to be removed from e-mail list please click here.

From frank63@ms5.hinet.net Fri Apr 13 01:15:40 2001
From: frank63@ms5.hinet.net (Frank Chen)
Date: Fri, 13 Apr 2001 08:15:40 +0800
Subject: [XML-SIG] Re:Python Wrapper for Xerces/Xalan
References:
Message-ID: <006e01c0c3b7$b6d1ad00$efa01ea3@MiTACUser>

> From: "Juergen Hermann"
> To: "Python XML SIG"
> Date: Thu, 12 Apr 2001 17:02:06 +0200
> Reply-To: "Juergen Hermann"
> Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
>
> We consider doing a wrapper for Xerces and Xalan as an OS project. One
> question is where to host it:
>
>
> Before we really launch this, I also would like to know who's
> interested into such a wrapper (I've seen some requests in the past)
> and also who would be able and willing to contribute (developers,
> documenters, beta-testers, ...).
>

If there is such a pirxx project, I would like to be a documenter and beta-tester.

Frank

From brian@sweetapp.com Fri Apr 13 02:27:12 2001
From: brian@sweetapp.com (Brian Quinlan)
Date: Thu, 12 Apr 2001 18:27:12 -0700
Subject: [XML-SIG] RE: Python Wrapper for Xerces/Xalan
In-Reply-To: <3AD62809.447F386C@ActiveState.com>
Message-ID: <002101c0c3b8$ddeed6f0$b503a8c0@activestate.ca>

I have actually already started a similar project. Currently, I provide a loose wrapping around Xalan i.e.

transform( [, ], [, ] ) => string containing transformed source file

I was planning on making the first run available soon.

-------- Original Message --------
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
Date: Thu, 12 Apr 2001 17:02:06 +0200
From: "Juergen Hermann"
Reply-To: "Juergen Hermann"
To: "Python XML SIG"

Hi!

We consider doing a wrapper for Xerces and Xalan as an OS project. One question is where to host it:

1. Own SourceForge project
2. Part of PyXML
3. Part of Xerces / Xalan projects

3 is unlikely.
Option 2 seems not too clever, since installing Xerces and Xalan are extra steps anyway, and I doubt anyone seriously wants to include them in a standard PyXML distribution.

Working title for the project would be "pirxx" (Python InteRface to Xalan and Xerces, anyone who has read Stanislaw Lem should know where the name comes from).

Before we really launch this, I also would like to know who's interested into such a wrapper (I've seen some requests in the past) and also who would be able and willing to contribute (developers, documenters, beta-testers, ...).

Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/

_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig

From eliot@isogen.com Fri Apr 13 03:02:12 2001
From: eliot@isogen.com (W. Eliot Kimber)
Date: Thu, 12 Apr 2001 21:02:12 -0500
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
References:
Message-ID: <3AD65E24.E9864477@isogen.com>

Nicolas Chauvat wrote:
>
> > Also, while I'm thinking about it, we have modified the pyXML and the
> > 4Suite XSLT and XPath stuff to allow us to process arbitrary groves and
> > arbitrary Python objects with XSL. We would like to contribute our
> > changes back but haven't had the bandwidth yet to properly package
> > things up. But we will as soon as we can.
>
> Are you talking about the same groves that jade, SGML and DSSSL use?

The very same. We are building a generic grove-based hyperdocument management system. Given a grove implementation, we can both hyperlink to it using a generic hyperdocument API (that looks a lot like HyTime but is not dependent on the use of HyTime syntax--it can be bound to any reasonable way of representing hyperdocuments, from HTML to Microsoft Project).

> What do you mean by "XSL and XPath to process arbitrary python objects"?
We discovered that with just a couple of lines of code, that we could get the 4Suite XSLT processor to happily apply XSLT templates and XPath expressions to arbitrary Python objects if we treat the object class as the "tag name" and all member variables and parameter-less methods as attributes. For example, our Hyperdocument object class has a "getBosMembers()" method that returns the list of groves from which the hyperdocument was constructed (the "bounded object set" of input documents, converted to groves). Given our hack, I can do this in an XSLT style sheet (note: I am not by any stretch an XSL expert, so please excuse any XSL errors or stupidity in the following example):

Documents in this hyperdocument:

Document of type:

Where the "ext:getHyDoc" extension method returns one of our Hyperdocument objects. The expression "@getBosMembers" is internally translated to a call to the getBosMembers() method of the Python object, which returns a node list of grove nodes. As defined in the grove standard, every grove node has a "ClassName" property that is the string value of the class name, as defined in the grove's property set. So, for example, say my hyperdoc consisted of one XML doc, one Word doc, and one Excel doc, my output would look like this:

Documents in this hyperdocument:

Document of type: SgmlDocument
Document of type: WordDocument
Document of type: ExcelDocument

Where "WordDocument" and ExcelDocument are names defined in the Word and Excel property sets that we have privately defined--I don't really expect Microsoft to publish formal grove property sets for their proprietary formats any time soon.

This hack required relatively little modification to the 4Suite code base, although we haven't fully filled out the XPath expressions for operating on groves.

I have submitted a paper on this work for the Extreme Markup conference in August.

Cheers,

Eliot
--
. . . . . . . . . . . . . . . . . . . . . . . .
W. Eliot Kimber | Lead Brain
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.656.4139 | F 512.419.1860 | eliot@isogen.com
w w w . d a t a c h a n n e l . c o m

From Nicolas.Chauvat@logilab.fr Fri Apr 13 14:27:26 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Fri, 13 Apr 2001 15:27:26 +0200 (CEST)
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
In-Reply-To: <3AD65E24.E9864477@isogen.com>
Message-ID:

> > Are you talking about the same groves that jade, SGML and DSSSL use?
>
> The very same. We are building a generic grove-based hyperdocument
> management system. Given a grove implementation, we can both hyperlink
> to it using a generic hyperdocument API (that looks a lot like HyTime
> but is not dependent on the use of HyTime syntax--it can be bound to any
> reasonable way of representing hyperdocuments, from HTML to Microsoft
> Project).

Cool. As XML is sometimes limited for what we're trying to achieve (that is hypertext view of the outer world for http://www.logilab.org/narval/), we've been thinking about implementing groves and HyTime-related things by ourselves, but haven't got the needed bandwidth yet.

Is that available as (pythonic) free software?

> > What do you mean by "XSL and XPath to process arbitrary python objects"?
>
> We discovered that with just a couple of lines of code, that we could
> ...

Nice too.
I suppose a lot of people could find uses for it if you were to contribute it back.

> Where "WordDocument" and ExcelDocument are names defined in the Word and
> Excel property sets that we have privately defined--I don't really
> expect Microsoft to publish formal grove property sets for their
> proprietary formats any time soon.

What makes you think so? ;-)

--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)

From uche.ogbuji@fourthought.com Fri Apr 13 18:39:57 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 13 Apr 2001 11:39:57 -0600
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
In-Reply-To: Message from "M.-A. Lemburg" of "Thu, 12 Apr 2001 18:04:40 +0200." <3AD5D218.7B22866@lemburg.com>
Message-ID: <200104131740.f3DHdvj08824@borgia.local.dhcp.fourthought.com>

> Alexandre Fayolle wrote:
> >
> > On Thu, 12 Apr 2001, M.-A. Lemburg wrote:
> >
> > > Could you provide pointers to Xerces and Xalan ?
> >
> > The starting point is http://xml.apache.org/
> >
> > You'll get http://xml.apache.org/xerces-c/index.html and
> > http://xml.apache.org/xalan-c/index.html quite rapidly. These are the C++
> > implementations. There are also Java implementations, that could be
> > interesting for users of JPython, since I believe 4Suite is not available
> > on this platform.
>
> Thanks. Those tools look very interesting, but doesn't FourThought
> offer pretty much the same or even a superset of these tools ?

Yes, but I'd be the first to say "the more, the merrier".

I should also note that probably later this year, there will be a new XSLT processor for Python, anyway. We have a proof-of-concept on a compile-to-python processor that simply screams, and we plan to work on it once we release 4Suite 1.0 and then take care of some business obligations.
4XSLT will move into PyXML as a respectable and very complete processor (if of mediocre performance), and the "next-gen" 4XSLT (to be renamed) will be developed in 4Suite 1.1.

--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From brian@sweetapp.com Sun Apr 15 02:26:27 2001
From: brian@sweetapp.com (Brian Quinlan)
Date: Sat, 14 Apr 2001 18:26:27 -0700
Subject: [XML-SIG] Python extension module for Xerces/Xalan
In-Reply-To: <200104131740.f3DHdvj08824@borgia.local.dhcp.fourthought.com>
Message-ID:

I've just made public my first run at a Python extension module providing XSLT processing capabilities using Xerces/Xalan.

The source and Windows binaries for Python 2.0 and Python 2.1 are available. Unfortunately, there is no Distutils setup file so you will have to manually place a few DLLs (or change sys.path).

Feedback would be much appreciated. The URL is http://www.sweetapp.com/Pyana/

It is pretty early stage but I hope to add significant improvements soon. Below is a sample usage:

>>> import Pyana
>>> params = { 'p1' : "'test'", 'p2' : 'count(//*)', 'p3' : '.' }
>>> sourceDoc = 'This is a test'
>>> styleDoc = Pyana.URI( 'http://www.sweetapp.com/Pyana/samples/params.xsl' )
>>> print Pyana.transform( sourceDoc, styleDoc, params )
test 1 This is a test p4 not provided p5 not provided

From eliot@isogen.com Sat Apr 14 16:46:48 2001
From: eliot@isogen.com (W. Eliot Kimber)
Date: Sat, 14 Apr 2001 10:46:48 -0500
Subject: [XML-SIG] Python Wrapper for Xerces/Xalan
References:
Message-ID: <3AD870E8.5EEAAE70@isogen.com>

Nicolas Chauvat wrote:
>
> Is that available as (pythonic) free software?

Unfortunately, at the moment we can only contribute back the changes we made to PyXML and 4Suite--our own stuff is still proprietary.
However I spoke with our CEO just yesterday about providing an open source version of what we've got, which is all entirely Python based. She said she would give it serious consideration. One additional problem is that a key component of our stuff is a system provided by another company and they have no plans to make *that* open source, unfortunately. This is the grove and HyTime engine implementation, which gives us a lot of functionality. The grove component can be replaced with PyGrove, but the HyTime engine would be non-trivial to replace (although not impossible--I've implemented a HyTime engine in VB, so I know it can be done, I'm just not keen to do it at the moment). It's frustrating to me because I would really like to make all the stuff we've done available to everyone, but current business realities (or perceived realities) disallow it for now. I'm trying to sell my employer, DataChannel, on the benefit of other integrators having easy access to our software so they can develop additional add ons. I think the 4Suite/Zope model is a good one and is appropriate for what we're building, but it can be a tough sell in the company wasn't founded with that model, which DataChannel was not (although we have contributed some stuff to the Apache project, so there is some hope). What we're building is the core components that are needed to build a completely link-based, versioning information management system, where the goal is to be able to manage arbitrarily complex versioned hyperdocuments. We have explicitly designed it with a "business module" plug-in framework that encourages the development of clearly-distinct (from the core components) managed plug-ins that extend the system and adapt it to specific use cases. We would like to see an aftermarket of plug-ins. 
Because of the way we've implemented the system, it provides a variety of configurations, from low-cost, low-scale, essentially all-freeware system to high-cost, high-scale configurations using enterprise-hardened stuff. I think that if we can make the low-cost versions of the components available for free, that we'll see people doing some really cool stuff with it for projects that need this kind of functionality but that don't have any money to spend (or that require open source, such as Doug Englebart's Open Hypermedia System project). Anyway, we're trying to do what we can to contribute back to the community. Cheers, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From larsga@garshol.priv.no Mon Apr 16 10:29:29 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Apr 2001 11:29:29 +0200 Subject: [XML-SIG] One-Line Enhancement to catalog.py In-Reply-To: <200104120920.f3C9KnJ06117@mira.informatik.hu-berlin.de> References: <3AD510AC.7782ED7A@isogen.com> <200104120920.f3C9KnJ06117@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | It appears that just adding this line is an incomplete change. At a | minimum, the application should be informed using, say, | handle_sgmldecl. Can you come up with a patch that does that? Well, the question is whether that is appropriate or not. This information is only of interest to an SGML system. On the other hand, we probably need to support all of the SGML-specific entries in the end, and then we might perhaps just as well make the catalog parser a full SGML catalog parser, since someone may find that useful. Hmmmmmm. Yeah, let's do it. I'll make the changes and commit them. --Lars M. 
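
The catalog change discussed in this thread (recognising an SGMLDECL entry and telling the application about it through something like handle_sgmldecl) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in PyXML's catalog.py: the ENTRY_ARGS table, the CatalogApp class and the parse_catalog function are hypothetical names invented here, and the tokenizing is delegated to the standard shlex module rather than a real SGML Open catalog lexer (so catalog comments are not handled).

```python
import shlex

# Hypothetical table: entry keyword -> number of arguments that follow it.
ENTRY_ARGS = {"PUBLIC": 2, "SYSTEM": 2, "DOCTYPE": 2, "SGMLDECL": 1, "CATALOG": 1}


class CatalogApp:
    """Hypothetical application object that is informed about entries."""

    def __init__(self):
        self.public = {}
        self.sgmldecl = None

    def handle_public(self, pubid, sysid):
        self.public[pubid] = sysid

    def handle_sgmldecl(self, sysid):
        self.sgmldecl = sysid


def parse_catalog(text, app):
    """Walk the catalog tokens and dispatch each known entry to app."""
    tokens = shlex.split(text)  # honours double-quoted literals
    i = 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        count = ENTRY_ARGS.get(keyword)
        if count is None:
            i += 1  # unknown token: skip it
            continue
        args = tokens[i + 1:i + 1 + count]
        handler = getattr(app, "handle_" + keyword.lower(), None)
        if handler is not None and len(args) == count:
            handler(*args)  # entries without a handler are silently ignored
        i += 1 + count
    return app


app = parse_catalog('SGMLDECL "xml.dcl"\nPUBLIC "-//X//DTD Y//EN" "y.dtd"\n',
                    CatalogApp())
print(app.sgmldecl, app.public)
```

Dispatching by handler name means an XML-only application can simply omit handle_sgmldecl, which fits the point that this entry only matters to SGML systems.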
From larsga@garshol.priv.no Mon Apr 16 11:47:22 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Apr 2001 12:47:22 +0200 Subject: [XML-SIG] Using character entities in external DTD without validating. In-Reply-To: <200104092155.f39Lt0u03361@mira.informatik.hu-berlin.de> References: <200104092155.f39Lt0u03361@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | With some efforts, you might manage to talk xmlproc (the validating | parser) into not producing validation errors. You can tell xmlproc to read the external subset even when it does not validate. Call parser.set_read_external_subset(1) to tell it to do so. The SAX 2.0 property 'http://xml.org/sax/features/external-parameter-entities' can be used to tell parsers to read the external subset. The only SAX driver supporting this is that of xmlproc; support for it was checked into CVS two minutes ago. --Lars M. From larsga@garshol.priv.no Mon Apr 16 12:25:36 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Apr 2001 13:25:36 +0200 Subject: [XML-SIG] PyTRaX? Message-ID: We now have two XSLT processors usable in Python: 4XSLT and Sablotron, and it seems that we soon will have another. Is it time to start thinking about a standardized API to these processors, something along the lines of TRaX? I can think of the following areas that might be supported: - embedding processors; support for running transformations - support for providing structured input (streams, event streams, document trees) - support for information exchange (error handlers, URI resolvers, location information provision etc) - support for writing extension elements and functions (difficult) --Lars M. From fdrake@acm.org Mon Apr 16 13:13:01 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 16 Apr 2001 08:13:01 -0400 (EDT) Subject: [XML-SIG] PyTRaX? 
In-Reply-To: References: Message-ID: <15066.57805.329884.23693@cj42289-a.reston1.va.home.com> Lars Marius Garshol writes: > We now have two XSLT processors usable in Python: 4XSLT and Sablotron, > and it seems that we soon will have another. Is it time to start > thinking about a standardized API to these processors, something along > the lines of TRaX? Can you provide a pointer for TRaX? Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From larsga@garshol.priv.no Mon Apr 16 13:47:02 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Apr 2001 14:47:02 +0200 Subject: [XML-SIG] PyTRaX? In-Reply-To: <15066.57805.329884.23693@cj42289-a.reston1.va.home.com> References: <15066.57805.329884.23693@cj42289-a.reston1.va.home.com> Message-ID: * Fred L. Drake, Jr. | | Can you provide a pointer for TRaX? It's in the JAXP specification from Sun, at There used to also be a separate TRaX site, but as far as I can tell that no longer exists. --Lars M. From larsga@garshol.priv.no Mon Apr 16 13:55:22 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 16 Apr 2001 14:55:22 +0200 Subject: [XML-SIG] How to leave character entities alone In-Reply-To: <200104060535.f365Ylg17064@borgia.local> References: <200104060535.f365Ylg17064@borgia.local> Message-ID: * Uche Ogbuji | | There was a bug in 4DOM where entities weren't being handled | properly, but this should now be sorted out. I'm not convinced that creating EntityReference nodes for entities is the correct way to handle them. The presence of such nodes is likely to be surprising to most newbies and to expose bugs in quite a few DOM applications. I think the default should be that these nodes are not produced, but that there may be a flag to for turning them on. A better fix would be to redesign the way the DOM represents these nodes, but that is obviously too late now. --Lars M. 
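To illustrate the default Lars argues for, here is a minimal sketch. It assumes the standard library's xml.dom.minidom rather than 4DOM (the implementation under discussion), and the sample document and the helper names child_node_types and root_text are made up for illustration. As far as I know, minidom hands the application the expanded replacement text of an internal entity, so no EntityReference nodes should appear among the element's children.

```python
from xml.dom import minidom, Node

# Hypothetical sample: one internal entity declared in the internal DTD subset.
SAMPLE = '<!DOCTYPE doc [<!ENTITY who "world">]><doc>hello &who;</doc>'


def child_node_types(xml_text):
    """Return the DOM node types of the root element's children."""
    doc = minidom.parseString(xml_text)
    try:
        return [child.nodeType for child in doc.documentElement.childNodes]
    finally:
        doc.unlink()  # release the tree once we are done with it


def root_text(xml_text):
    """Return the concatenated character data directly under the root element."""
    doc = minidom.parseString(xml_text)
    try:
        return "".join(
            child.data
            for child in doc.documentElement.childNodes
            if child.nodeType == Node.TEXT_NODE
        )
    finally:
        doc.unlink()


if __name__ == "__main__":
    print(Node.ENTITY_REFERENCE_NODE in child_node_types(SAMPLE))
    print(root_text(SAMPLE))
```

An application written against this behaviour only ever has to walk text nodes, which is the "no surprises for newbies" argument made above; a flag that turns EntityReference nodes on would change the first result and leave the second one to the caller.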
From MichaelDyck@home.com Tue Apr 17 06:10:56 2001 From: MichaelDyck@home.com (Michael Dyck) Date: Mon, 16 Apr 2001 22:10:56 -0700 Subject: [XML-SIG] licencing of PyXML contributions Message-ID: <3ADBD060.3004DD2B@home.com> I'm working on a Python implementation of XQuery, and I'm thinking of contributing it to PyXML (if you want it). Before I start showing it to anyone, I want to put it under an open-source licence. Are there any licences that would make it easier/harder to add to PyXML? -Michael Dyck From pyxml@xhaus.com Tue Apr 17 12:21:50 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Tue, 17 Apr 2001 12:21:50 +0100 Subject: [XML-SIG] "Selling" Open Source. In-Reply-To: <3AD870E8.5EEAAE70@isogen.com> Message-ID: W. Eliot Kimber wrote: > However I > spoke with our CEO just yesterday about providing an open > source version of what we've got, which is all entirely > Python based. She said she would give it serious > consideration. > I'm trying to sell my > employer, DataChannel, on the benefit of other integrators > having easy access to our software so they can develop > additional add ons. For interest, The Economist has a series of articles this month about Software and Web Services, including an article on Open Source. It might be a useful for persuading CEO's, CIO's and CFO's to know that such an economics/finance/free market oriented publication as the Economist is writing positively about Open Source, in language that C?O's understand. You can read the article at http://www.economist.com/printedition/displayStory.cfm?Story_ID=568269 Contents for the "Survey of Software" can be read at http://www.economist.com/printedition/index.cfm Regards, Alan. From rsalz@zolera.com Tue Apr 17 17:39:02 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 17 Apr 2001 12:39:02 -0400 Subject: [XML-SIG] bug in dom.ext.GetElementByID Message-ID: <3ADC71A6.236051BF@zolera.com> It looks like GetElementyID is pre-NS (at least). Does the following rewrite make sense? 
At least it works ...

    _id_key = ('', 'ID')

    def GetElementById(startNode, targetId):
        '''
        Return the element in the given tree with an ID attribute of the given
        value
        '''
        snit = startNode.ownerDocument.createNodeIterator(startNode, NodeFilter.SHOW_ELEMENT, None, 0)
        curr_node = snit.nextNode()
        while curr_node:
            attr = curr_node.attributes.get(_id_key, None)
            if attr and attr._get_nodeValue() == targetId:
                return curr_node
            curr_node = snit.nextNode()
        return None

Just double-checking before I check it in. /r$

From martin@loewis.home.cs.tu-berlin.de Tue Apr 17 21:12:36 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Apr 2001 22:12:36 +0200 Subject: [XML-SIG] licencing of PyXML contributions In-Reply-To: <3ADBD060.3004DD2B@home.com> (message from Michael Dyck on Mon, 16 Apr 2001 22:10:56 -0700) References: <3ADBD060.3004DD2B@home.com> Message-ID: <200104172012.f3HKCau01647@mira.informatik.hu-berlin.de> > Before I start showing it to anyone, I want to put it under an > open-source licence. Are there any licences that would make it > easier/harder to add to PyXML? I guess inclusion of GPL'ed software would be difficult unless we could reasonably claim that it is "mere packaging", since all derived code must then appear under the GPL as well, and it might be hard to get all contributors to agree to change their license to GPL. We have the habit of copying all license text into the file LICENCE. For that, it would simplify maintenance if the license text did not read, say, "MICHAEL DYCK DISCLAIMS ALL WARRANTIES", since that could easily change into "MICHAEL DYCK, LARS MARIUS GARSHOL, FRED DRAKE, AND MARTIN VON LOEWIS DISCLAIM ALL WARRANTIES". So you might choose a licensing text that does not mention the author's name.
In recent expat (I believe), the text Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. is used. In the copy in LICENSE, we don't copy "the above copyright", but instead mention the files where it can be found. This license essentially allows any use, as long as the copyright notice in the source files is maintained. Of course, this is a suggestion only - he who writes the code choses the license. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Apr 17 21:15:10 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Apr 2001 22:15:10 +0200 Subject: [XML-SIG] Using character entities in external DTD without validating. In-Reply-To: (message from Lars Marius Garshol on 16 Apr 2001 12:47:22 +0200) References: <200104092155.f39Lt0u03361@mira.informatik.hu-berlin.de> Message-ID: <200104172015.f3HKFAO01672@mira.informatik.hu-berlin.de> > The only SAX driver supporting this is that of xmlproc; support for > it was checked into CVS two minutes ago. 
:-) From doc@sympatico.ca Wed Apr 18 01:09:59 2001 From: doc@sympatico.ca (DOC) Date: Tue, 17 Apr 2001 17:09:59 -0700 Subject: [XML-SIG] PyXML-0.6.5 win download Message-ID: <000c01c0c79b$e945d4e0$13733dcf@b1wyft68> This is a multi-part message in MIME format. ------=_NextPart_000_0009_01C0C761.3BF45F80 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi: I did a download of this file but the self extracting process=20 stalls out after the second or third frame. PyXML-0.6.5.win32-py2.0.exe I tried it on two diff machines. One was running win 95 the other ME. DOC PyXML-0.6.5.win32-py2.0.exe ------=_NextPart_000_0009_01C0C761.3BF45F80 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

------=_NextPart_000_0009_01C0C761.3BF45F80-- From martin@loewis.home.cs.tu-berlin.de Tue Apr 17 22:36:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 17 Apr 2001 23:36:59 +0200 Subject: [XML-SIG] PyXML-0.6.5 win download In-Reply-To: <000c01c0c79b$e945d4e0$13733dcf@b1wyft68> (doc@sympatico.ca) References: <000c01c0c79b$e945d4e0$13733dcf@b1wyft68> Message-ID: <200104172136.f3HLax202355@mira.informatik.hu-berlin.de> > I did a download of this file but the self extracting process > stalls out after the second or third frame. > > PyXML-0.6.5.win32-py2.0.exe Can you please elaborate? Did you download the file completely, or did the download stall after the "second or third frame"? If the latter, what do you mean by "frame"? If the download completed, can you please report the exact order of commands executed, and buttons pressed and clicked? Thanks, Martin From Mike.Olson@fourthought.com Wed Apr 18 00:35:20 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Tue, 17 Apr 2001 17:35:20 -0600 Subject: [XML-SIG] PyTRaX? References: Message-ID: <3ADCD338.ACCF9907@FourThought.com> Lars Marius Garshol wrote: > > We now have two XSLT processors usable in Python: 4XSLT and Sablotron, > and it seems that we soon will have another. Is it time to start > thinking about a standardized API to these processors, something along > the lines of TRaX? > > I can think of the following areas that might be supported: > > - embedding processors; support for running transformations > > - support for providing structured input (streams, event streams, > document trees) > > - support for information exchange (error handlers, URI resolvers, > location information provision etc) > > - support for writing extension elements and functions (difficult) I was thinking along these lines as well. Something else I thought of as needed is a way to register and retrieve instances of processors. 
Something like xml.xslt.newProcessor('4XSLT') and xml.xslt.RegisterProcessor('Sablotron',Sablotron.Processor) Mike > > --Lars M. > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From jmurray@agyinc.com Wed Apr 18 02:21:04 2001 From: jmurray@agyinc.com (Joe Murray) Date: Tue, 17 Apr 2001 18:21:04 -0700 Subject: [XML-SIG] sax expatreader and unicode Message-ID: <3ADCEC00.3796F640@agyinc.com> What am I missing: the sax expatreader can't handle some unicode characters? I thought this was supported. I believe the xml.dom modules handle unicode characters just fine... From the text: "...LEX. IN NAïVE H4 AND CHO CELLS, PS1 CO-IMM..." Output:

    . . .
      File "analyzexml.py", line 68, in analyze_sax
        parser.parse(stream)
      File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse
        self.feed(buffer)
      File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed
        self._parser.Parse(data, isFinal)
    UnicodeError: UTF-8 decoding error: invalid data

jm -- Joseph Murray Bioinformatics Specialist, AGY Therapeutics 290 Utah Avenue, South San Francisco, CA 94080 (650) 228-1146

From martin@loewis.home.cs.tu-berlin.de Wed Apr 18 06:25:56 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Wed, 18 Apr 2001 07:25:56 +0200 Subject: [XML-SIG] sax expatreader and unicode In-Reply-To: <3ADCEC00.3796F640@agyinc.com> (message from Joe Murray on Tue, 17 Apr 2001 18:21:04 -0700) References: <3ADCEC00.3796F640@agyinc.com> Message-ID: <200104180525.f3I5Puf03550@mira.informatik.hu-berlin.de> > What am I missing: the sax expatreader can't handle some unicode > characters? Most likely, the error is in your data, not in Expat. > From the text: > > "...LEX. IN NAïVE H4 AND CHO CELLS, PS1 CO-IMM..." You did not give the complete document. Did it include an encoding declaration? > UnicodeError: UTF-8 decoding error: invalid data That error is properly reported: your data, at least as transmitted in your message, is not valid UTF-8. In this message, the character in question is the single byte \xef. If taken as Latin-1, it is the character LATIN SMALL LETTER I WITH DIAERESIS. You have to declare that the document is Latin-1, or else an XML processor will assume UTF-8. Regards, Martin From larsga@garshol.priv.no Wed Apr 18 17:03:10 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 18 Apr 2001 18:03:10 +0200 Subject: [XML-SIG] PyTRaX? In-Reply-To: <3ADCD338.ACCF9907@FourThought.com> References: <3ADCD338.ACCF9907@FourThought.com> Message-ID: * Mike Olson | | I was thinking along these lines as well. Excellent! So perhaps we should go out and do it. :-) | Something else I thought of as needed is a way to register and | retrieve instances of processors. Good point. There does need to be something along the lines of the make_parser function in SAX. --Lars M. From mrklaw@mindless.com Wed Apr 18 17:46:18 2001 From: mrklaw@mindless.com (Sloan Poe) Date: Wed, 18 Apr 2001 12:46:18 -0400 (EDT) Subject: [XML-SIG] question...
Message-ID: <200104181646.f3IGkJY04773@raven.warren-wilson.edu> --684802-1804289383-987612378=:4739 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-Disposition: INLINE --Please respond to me directly as I am not on the list-- I'm having a small problem getting some of the demos included with the python xml package to run. I've installed the latest version from sourceforge and confirmed that all the files are installed in /usr/lib/python1.5/site-packages/ but when I try do use the statement from xml.sax.saxutils import escape I get an error. I can "import xml" or even "from xml.sax import saxutils" I just can't import escape or any function for that matter. I'm trying to run one of the scripts in the xbel dir. I can get it to work on one computer but not on a computer with an almost identical installation. I'm sure this is some ignorance on my part. thanks in advance for your help. --684802-1804289383-987612378=:4739 Content-Type: TEXT/x-vcard; CHARSET=US-ASCII; NAME="vcard.vcf" Content-Disposition: INLINE; FILENAME="vcard.vcf" BEGIN:VCARD FN:Mr. R. Sloan Poe, Jr. TITLE:Web Programmer ORG:Warren Wilson College;Web Crew ADR;DOM;PARCEL;WORK:;6294 Warren Wilson College;701 Warren Wilson Road;Swannanoa;NC;28778; LABEL;POSTAL;WORK;ENCODING=QUOTED-PRINTABLE:6294 Warren Wilson College=0D=0A= 701 Warren Wilson Road=0D=0A= Swannanoa, NC 28778 ADR;DOM;POSTAL;WORK:;6294 Warren Wilson College;P.O. Box 9000;Asheville;NC;28815-9000; LABEL;POSTAL;WORK;ENCODING=QUOTED-PRINTABLE:6294 Warren Wilson College=0D=0A= P.0. Box 9000=0D=0A= Asheville, NC 28815-9000 TEL;Home;VOICE;MESG;PREF:1-828-771-5906 EMAIL;Internet:mrklaw@mindless.com URL:http://www.warren-wilson.edu/~rpoe/card.vcf UID:http://www.warren-wilson.edu/~rpoe/card.vcf TZ:-0500 BDAY:1978-03-16 REV:20010403T174502 VERSION:2.1 END:VCARD --684802-1804289383-987612378=:4739 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-Disposition: INLINE -- Sloan Poe mrklaw@mindless.com If I'm insane, who are you in? 
--684802-1804289383-987612378=:4739-- From martin@loewis.home.cs.tu-berlin.de Wed Apr 18 20:42:09 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Apr 2001 21:42:09 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: (message from Lars Marius Garshol on 18 Apr 2001 18:03:10 +0200) References: <3ADCD338.ACCF9907@FourThought.com> Message-ID: <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> > Good point. There does need to be something along the lines of the > make_parser function in SAX. My proposal would be to add two keyword arguments, properties= and features=. Each is a list of binary tuples, each tuple has name and value. Alternatively, dictionaries might be better. make_parser will the iterate over all known parser factories, invoking create_parser for each, then trying to set all the properties and features. It will return the first parser that supports all of them, and return a configured instance. There should also be a function xml.sax.register_parser, which accepts an object that has a create_parser function, or a string naming a module that has a create_parser function. What do you think? Regards, Martin From fdrake@acm.org Wed Apr 18 21:03:49 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 18 Apr 2001 16:03:49 -0400 (EDT) Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> Message-ID: <15069.62245.424602.782995@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > make_parser will the iterate over all known parser factories, invoking ... > There should also be a function xml.sax.register_parser, which accepts Sounds pretty reasonable to me. My biggest concern is the possible cost of creating all those parsers, but I'm not sure how important that is. 
I'm thinking of how I set up my DOMBuilder object, and it's actually pretty thin; the low-level parser doesn't get created until it's actually needed. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Wed Apr 18 20:57:40 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Apr 2001 21:57:40 +0200 Subject: [XML-SIG] question... In-Reply-To: <200104181646.f3IGkJY04773@raven.warren-wilson.edu> (message from Sloan Poe on Wed, 18 Apr 2001 12:46:18 -0400 (EDT)) References: <200104181646.f3IGkJY04773@raven.warren-wilson.edu> Message-ID: <200104181957.f3IJveV00955@mira.informatik.hu-berlin.de> > I'm sure this is some ignorance on my part. Hard to say, with as little information as you've provided. > I'm having a small problem getting some of the demos included with the > python xml package to run. I've installed the latest version from > sourceforge and confirmed that all the files are installed in > /usr/lib/python1.5/site-packages/ > but when I try do use the statement from xml.sax.saxutils import escape I > get an error. It would be helpful if you could report exactly what error you get, best with a full traceback as reported by python, copying that literally from the terminal where you have tried executing the code in question. > I can "import xml" or even "from xml.sax import saxutils" I just can't > import escape or any function for that matter. When you do >>> from xml.sax import saxutils >>> saxutils what do you get? If you look at the source file of the module (/usr/lib/python1.5/site-packages/xml/sax/saxutils.py in my case), can you spot the escape function? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Apr 18 21:26:43 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 18 Apr 2001 22:26:43 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) 
In-Reply-To: <15069.62245.424602.782995@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> <15069.62245.424602.782995@cj42289-a.reston1.va.home.com> Message-ID: <200104182026.f3IKQhm01211@mira.informatik.hu-berlin.de> > Sounds pretty reasonable to me. My biggest concern is the possible > cost of creating all those parsers, but I'm not sure how important > that is. I'm thinking of how I set up my DOMBuilder object, and it's > actually pretty thin; the low-level parser doesn't get created until > it's actually needed. I'd be happy to use a different procedure, if somebody can propose on. Providing lists of all supported properties and their possible values in advance does not seem attractive: there could be an infinite number of supported values. Regards, Martin From fdrake@acm.org Wed Apr 18 22:03:06 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 18 Apr 2001 17:03:06 -0400 (EDT) Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <200104182026.f3IKQhm01211@mira.informatik.hu-berlin.de> References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> <15069.62245.424602.782995@cj42289-a.reston1.va.home.com> <200104182026.f3IKQhm01211@mira.informatik.hu-berlin.de> Message-ID: <15070.266.458475.645549@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > I'd be happy to use a different procedure, if somebody can propose on. > Providing lists of all supported properties and their possible values > in advance does not seem attractive: there could be an infinite number > of supported values. Agreed, and I'd expect maintainability would be seriously compromised as well. If somethon has an implementation of your proposal, I'll certainly not stand in the way. ;-) My misgivings about instantiating the parsers are fairly minor, so please don't consider my comment as reason to worry about it. 
That (worrying about it) is a small enough job that I can handle it alone. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From rsalz@zolera.com Thu Apr 19 02:32:58 2001 From: rsalz@zolera.com (Rich Salz) Date: Wed, 18 Apr 2001 21:32:58 -0400 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> Message-ID: <3ADE404A.76EFC23B@zolera.com> I think this is pretty close. :) I think the keyword arguments properties and features should take dictionaries. > There should also be a function xml.sax.register_parser, which accepts > an object that has a create_parser function, or a string naming a > module that has a create_parser function. I think perhaps

    def can_create_parser(properties={}, features={}):
        'If you can create a parser with the desired properties and features, return non-None. If you cannot, return None.'

    def create_parser(properties={}, features={}):
        'Create a parser with the specified properties and features, return None (raise an exception?) if not possible.'

is a slightly cleaner set of callbacks.

From uche.ogbuji@fourthought.com Thu Apr 19 05:06:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 18 Apr 2001 22:06:34 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: Message from "Martin v. Loewis" of "Wed, 18 Apr 2001 21:42:09 +0200." <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> Message-ID: <200104190406.f3J46Ym01936@borgia.local> > > Good point. There does need to be something along the lines of the > > make_parser function in SAX. > > My proposal would be to add two keyword arguments, properties= and > features=. Each is a list of binary tuples, each tuple has name and > value. Alternatively, dictionaries might be better. > > make_parser will the iterate over all known parser factories, invoking > create_parser for each, then trying to set all the properties and > features.
It will return the first parser that supports all of them, > and return a configured instance. > > There should also be a function xml.sax.register_parser, which accepts > an object that has a create_parser function, or a string naming a > module that has a create_parser function. > > What do you think? I like. But I think we'll want more than just a parser factory: properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} features = ["http://factory.pyxml.org/properties/xpointer"] xml.factory.getXsltProcessor(properties, features) -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu Apr 19 05:46:09 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 18 Apr 2001 22:46:09 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: Message from Uche Ogbuji of "Wed, 18 Apr 2001 22:06:34 MDT." <200104190406.f3J46Ym01936@borgia.local> Message-ID: <200104190446.f3J4kAK02141@borgia.local> > I like. But I think we'll want more than just a parser factory: > > properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", > "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} > features = ["http://factory.pyxml.org/properties/xpointer"] > xml.factory.getXsltProcessor(properties, features) Or should that be urn:factory.pyxml.org:properties:encoding etc.? At any rate, looks as if the pyxml.org domain is just a-sitting there. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Thu Apr 19 06:01:38 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 19 Apr 2001 01:01:38 -0400 (EDT) Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <200104190446.f3J4kAK02141@borgia.local> References: <200104190406.f3J46Ym01936@borgia.local> <200104190446.f3J4kAK02141@borgia.local> Message-ID: <15070.28978.544846.154129@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > Or should that be > > urn:factory.pyxml.org:properties:encoding Gosh, I wish the W3C would be consistent! They seem to like URIs until the next time they write a spec, and then use short strings! (Look at the DOM: feature names are short strings, DOMBuilder/ DOMWriter properties are short strings... except they want everyone else to use Java-style reversed domain names. Sheesh! That's pathetic! At least the SAX crew didn't start introducing other types of names. > At any rate, looks as if the pyxml.org domain is just a-sitting there. Is there any reason we can't use the python.org domain for this? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Thu Apr 19 06:14:55 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 18 Apr 2001 23:14:55 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: Message from "Fred L. Drake, Jr." of "Thu, 19 Apr 2001 01:01:38 EDT." <15070.28978.544846.154129@cj42289-a.reston1.va.home.com> Message-ID: <200104190515.f3J5Ete02212@borgia.local> > Uche Ogbuji writes: > > Or should that be > > > > urn:factory.pyxml.org:properties:encoding > > Gosh, I wish the W3C would be consistent! W3C? Consistent? In a pig's een. Didntcha hear? Now they produce "standards", not "recommendations". > At least the SAX crew didn't start introducing other types of > names. Yeah. Of course SAX was my inspiration, until my mind wandered to URNs. 
> > At any rate, looks as if the pyxml.org domain is just a-sitting there. > > Is there any reason we can't use the python.org domain for this? Not really. http://xml.python.org/factory/property works for me as well. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 07:06:19 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 08:06:19 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <200104190446.f3J4kAK02141@borgia.local> (message from Uche Ogbuji on Wed, 18 Apr 2001 22:46:09 -0600) References: <200104190446.f3J4kAK02141@borgia.local> Message-ID: <200104190606.f3J66Jn01217@mira.informatik.hu-berlin.de> > > I like. But I think we'll want more than just a parser factory: > > > > properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", > > "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} > > features = ["http://factory.pyxml.org/properties/xpointer"] > > xml.factory.getXsltProcessor(properties, features) > > Or should that be > > urn:factory.pyxml.org:properties:encoding > > etc.? I'm not sure whether requiring URIs as configuration keys is such a smart idea, anyway. SAX uses that approach, DOM uses simple strings (like 'core' or 'mutationevents'). Works equally well, IMO. Even for SAX, most Python users probably don't write p.setFeature("http://xml.org/sax/features/namespaces", 1) but from xml.sax.handler import feature_namespaces ... p.setFeature(feature_namespaces, 1) Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 07:01:33 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 08:01:33 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) 
In-Reply-To: <200104190406.f3J46Ym01936@borgia.local> (message from Uche Ogbuji on Wed, 18 Apr 2001 22:06:34 -0600) References: <200104190406.f3J46Ym01936@borgia.local> Message-ID: <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> > I like. But I think we'll want more than just a parser factory: Certainly. This one was specifically about xml.sax.make_parser. > properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", > "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} > features = ["http://factory.pyxml.org/properties/xpointer"] > xml.factory.getXsltProcessor(properties, features) I'm not sure the same factory mechanism would work for all classes. E.g. to select and configure a SAX parser, you need to do setFeature/setProperty calls. To select a DOM implementation, you need to check hasFeature on it, which triggered the xml.dom.getDOMImplementation API. I'd rather obtain an XSLT processor by calling xml.xslt.getProcessor (or xml.xslt.make_processor). Regards, Martin From larsga@garshol.priv.no Thu Apr 19 08:35:01 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Apr 2001 09:35:01 +0200 Subject: [XML-SIG] Re: SAX parser factories (Was: PyTRaX?) In-Reply-To: <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | My proposal would be to add two keyword arguments, properties= and | features=. Each is a list of binary tuples, each tuple has name and | value. Alternatively, dictionaries might be better. I like this, and have been thinking about making something similar for SAX as an add-on outside the core API. I would much prefer dictionaries, BTW. | make_parser will the iterate over all known parser factories, invoking | create_parser for each, then trying to set all the properties and | features. 
It will return the first parser that supports all of them, | and return a configured instance. Yep. This was my idea as well. | There should also be a function xml.sax.register_parser, which accepts | an object that has a create_parser function, or a string naming a | module that has a create_parser function. Sounds good to me. But is this a proposal for a PyTRAX processor factory or for a SAX parser factory? If the latter, do you think this should go into core Python? --Lars M. From tpassin@home.com Thu Apr 19 14:02:37 2001 From: tpassin@home.com (Thomas B. Passin) Date: Thu, 19 Apr 2001 09:02:37 -0400 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) References: <200104190446.f3J4kAK02141@borgia.local> <200104190606.f3J66Jn01217@mira.informatik.hu-berlin.de> Message-ID: <004201c0c8d1$0294cc40$7cac1218@reston1.va.home.com> Martin v. Loewis said - > > I'm not sure whether requiring URIs as configuration keys is such a > smart idea, anyway. SAX uses that approach, DOM uses simple strings > (like 'core' or 'mutationevents'). Works equally well, IMO. Even for > SAX, most Python users probably don't write > I favor short (non-URI) feature names. Using URIs to get unique strings may work well when you have the possibility of many people working independently on many projects accidentally producing name collisions. I don't think that applies here. Even if, say, the RDF people duplicated some of our feature names, it wouldn't matter since they would be used in a different context. Let's keep them short. Cheers, Tom P From tpassin@home.com Thu Apr 19 14:10:43 2001 From: tpassin@home.com (Thomas B. Passin) Date: Thu, 19 Apr 2001 09:10:43 -0400 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?)
References: <200104190406.f3J46Ym01936@borgia.local> <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> Message-ID: <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> I don't like the make_parser() and create_parser() approach because the names are so similar as to lead to confusion. I do like the proposal to ask for a list of processors that support a feature list, which ability would be separate from actual parser creation. I say a list because if more than one had my features, I might want to choose one rather than another, not just take whichever one the system wanted to give me. Also, with the has_features() approach, it would be possible to keep a simple features catalog which would avoid the need to instantiate a processor so it could be asked if it had some feature. If each processor (or wrapper) could respond to a request for its features, a script could automatically query it when it was first registered and save the data in the catalog. Perhaps the catalog could be in xml, more likely a dictionary format. The next step in this evolution could be named feature sets! Cheers, Tom P From fdrake@acm.org Thu Apr 19 15:40:32 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 19 Apr 2001 10:40:32 -0400 (EDT) Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <004201c0c8d1$0294cc40$7cac1218@reston1.va.home.com> References: <200104190406.f3J46Ym01936@borgia.local> <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> <200104190446.f3J4kAK02141@borgia.local> <200104190606.f3J66Jn01217@mira.informatik.hu-berlin.de> <004201c0c8d1$0294cc40$7cac1218@reston1.va.home.com> Message-ID: <15070.63712.948366.946492@cj42289-a.reston1.va.home.com> Thomas B. Passin writes: > I favor short (non-URI) feature names. Using URIs to get unique strings may > work well when you have the possibility of many people working independently > on many projects accidentally producing name collisions.
I don't think that > applies here. Even if, say, the RDF people duplicated some of our feature > names, it wouldn't matter since they would be used in a different context. Why would it not apply here? I know the "Parsed XML" product for Zope supports some additional features that can be checked with hasFeature(); why would this not be the case for other APIs which can support customizable features? If we do go with short names, we should at least strongly recommend a way to formulate the names for additional features. I'd stick with what's recommended for the DOM in this case; for example: "org.zope.dom.persistence". Thomas B. Passin writes: > I don't like make_parser() and create_parser() approach because the names > are so similar as to lead to confusion. I do like the proposal to ask for a > list of processors that support a feature list, which ability would be > separate from actual parser creation. I say a list because if more than one > had my features, I might want to choose one rather than another, not just > take whichever one the system wanted to give me. So perhaps what we have is an interface like this:

  object SAXImplementation:
      def can_set_feature(feature, enabled):
          """Return true if the parsers returned by create()
          can support 'feature', and false if they don't."""

      def can_support_property(property, value):
          """Return true if the parsers returned by create()
          can support 'property' with the given value, and
          false if they can't."""

      def get_features_list():
          """Return a list of supported feature names.
          Inclusion of a name in the list does not imply that
          the feature is supported in both enabled and disabled
          forms."""

      def get_properties_list():
          """Return a list of supported property names.
          Inclusion of a name in the list does not imply that
          all values for that property are supported."""

      def create():
          """Return a new XMLReader instance."""

  def find_parsers(features={}, properties={}):
      """Return a list of SAXImplementation objects that support
      the given features and properties."""

  def create_parser(features={}, properties={}):
      """Return a configured parser object that supports the
      given feature and property settings.  If more than one
      SAXImplementation supports the given settings, one will be
      selected arbitrarily."""

  def register_parser(impl):
      """Add the SAXImplementation 'impl' to the set of parsers
      known to the factory."""

(I expect SAXImplementation objects will usually be modules.) > Also, with the has_features() approach, it would be possible to keep a > simple features catalog which would avoid the need to instantiate a > processor so it could be asked if it had some feature. If each processor > (or wrapper) could respond to a request for its features, a script could > automatically query it when it was first registered and save the data in the This would be fine with me. I think the SAXImplementation interface I outlined above would support this implementation of the factory functions. > catalog. Perhaps the catalog could be in xml, more likely a dictionary > format. I expect such a catalog would have to be a volatile data structure rather than something persistent -- the interaction with sys.path would be a nightmare for a persistent catalog! > The next step in this evolution could be named feature sets! That would be nice to have! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From ken@bitsko.slc.ut.us Thu Apr 19 16:14:55 2001 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 19 Apr 2001 10:14:55 -0500 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?)
In-Reply-To: Uche Ogbuji's message of "Wed, 18 Apr 2001 22:06:34 -0600" References: <200104190406.f3J46Ym01936@borgia.local> Message-ID: [please do not Cc me msgs posted to list, thx.] Uche Ogbuji writes: > I like. But I think we'll want more than just a parser factory: > > properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", > "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} > features = ["http://factory.pyxml.org/properties/xpointer"] > xml.factory.getXsltProcessor(properties, features) Note when extending this beyond just SAX, "properties" were added to SAX a short time after "features" were. David Megginson later stated that having both was probably redundant, that a property with a value of true or false was effectively the same as a feature. -- Ken From fdrake@acm.org Thu Apr 19 16:26:12 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 19 Apr 2001 11:26:12 -0400 (EDT) Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: References: <200104190406.f3J46Ym01936@borgia.local> Message-ID: <15071.916.14259.394048@cj42289-a.reston1.va.home.com> Ken MacLeod writes: > Note when extending this beyond just SAX, "properties" were added to > SAX a short time after "features" were. David Megginson later stated > that having both was probably redundant, that a property with a value > of true or false was effectively the same as a feature. I agree, but I also understand why Java programmers might like having them as distinct things (the old boolean vs. Object static typing thing). For the Python SAX binding, I think we should keep the same distinction that exists in SAX, but that APIs originating in Python should unify properties and features, and just call them properties. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Thu Apr 19 16:40:36 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 19 Apr 2001 09:40:36 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: Message from Ken MacLeod of "19 Apr 2001 10:14:55 CDT." Message-ID: <200104191540.f3JFeaf04858@borgia.local.dhcp.fourthought.com> > [please do not Cc me msgs posted to list, thx.] This is really something that should be fixed in the list config. Mailman does cc by default. Personally, I don't have a problem with it, but it's probably because of the way I filter my mail. > Uche Ogbuji writes: > > > I like. But I think we'll want more than just a parser factory: > > > > properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", > > "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} > > features = ["http://factory.pyxml.org/properties/xpointer"] > > xml.factory.getXsltProcessor(properties, features) > > Note when extending this beyond just SAX, "properties" were added to > SAX a short time after "features" were. David Megginson later stated > that having both was probably redundant, that a property with a value > of true or false was effectively the same as a feature. Agreed. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu Apr 19 16:49:07 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 19 Apr 2001 09:49:07 -0600 Subject: [XML-SIG] RDF Parser -> PyXML Message-ID: <200104191549.f3JFn7x04884@borgia.local.dhcp.fourthought.com> James Tauber and I met at XML DevCon and had a good chat. We agreed that it would be great to get Redfoot's RDF parser into PyXML as a basis for RDF support. 
Then both RedFoot and 4RDF could use this as the parser, and build their unique additional functions on top of it. I really like this idea, especially since I'm quite dissatisfied with the RDF parser in 4RDF and would like to have an alternative. The quickest way to get this would be if there were one in PyXML that we could use. So I'd like to wind up the discussion here so that we can maybe check in the Redfoot parser as early as this week, if James and co. are still agreeable. Any general comments before a discussion of technical issues? James? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Thu Apr 19 16:12:35 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 19 Apr 2001 09:12:35 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> Message-ID: <3ADF0063.4499C271@FourThought.com> "Martin v. Loewis" wrote: > > > Good point. There does need to be something along the lines of the > > make_parser function in SAX. > > My proposal would be to add two keyword arguments, properties= and > features=. Each is a list of binary tuples, each tuple has name and > value. Alternatively, dictionaries might be better. > > make_parser will then iterate over all known parser factories, invoking > create_parser for each, then trying to set all the properties and > features. It will return the first parser that supports all of them, > and return a configured instance. > > There should also be a function xml.sax.register_parser, which accepts > an object that has a create_parser function, or a string naming a > module that has a create_parser function. > > What do you think? Can you still specify a parser to avoid the iteration?
What about setting a default? Mike > > Regards, > Martin > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Thu Apr 19 16:19:35 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 19 Apr 2001 09:19:35 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) References: <200104190406.f3J46Ym01936@borgia.local> Message-ID: <3ADF0207.FB270CF6@FourThought.com> Uche Ogbuji wrote: > > > I like. But I think we'll want more than just a parser factory: > > properties = {"http://factory.pyxml.org/properties/encoding": "BIG5", > "http://uche.ogbuji.net/my-extended-properties/spam": "eggs"} > features = ["http://factory.pyxml.org/properties/xpointer"] > xml.factory.getXsltProcessor(properties, features) Would there be an idea of precedence, what if two parsers meet all of the criteria which one would we pick? Mike > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From rsalz@zolera.com Thu Apr 19 17:39:57 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 19 Apr 2001 12:39:57 -0400 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) 
References: <200104190406.f3J46Ym01936@borgia.local> <3ADF0207.FB270CF6@FourThought.com> Message-ID: <3ADF14DD.28DC8F68@zolera.com> > Would there be an idea of precedence, what if two parsers meet all of > the criteria which one would we pick? the one with the most lines of code? As in, "my python's bigger than yours" :) From Olivier.Cayrol@logilab.fr Thu Apr 19 19:44:05 2001 From: Olivier.Cayrol@logilab.fr (Olivier CAYROL (Logilab)) Date: Thu, 19 Apr 2001 20:44:05 +0200 (CEST) Subject: [XML-SIG] [4XSLT] bug report and patch for complex XML and XSL nesting Message-ID: This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---1463794431-39797830-987705845=:3976 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Hello, I found a vicious bug in 4XSLT (hidden very deeply in the code). Attached to this message you will find a tar.gz file containing a directory tree that exhibits the bug. It is a little application for managing Easter rabbits and eggs distribution (!). There is an XML file that contains the data: easter_mng.xml, an XSL Transformation file: xsl/transf.xsl and XML files containing data for localization: lib/common.xml, lib/en.xml, lib/fr.xml. The lib/common.xml file is imported in the XSLT stylesheets with the 'document()' function and is used to insert language-dependent tags in the output. This common.xml file imports other XML files (one per language) with the classic external ENTITY mechanism of XML. When trying to transform the data file from the main directory with the following command line: 4xslt -Dlang=en easter_mng.xml xsl/transf.xsl, I got this exception: ...
  File "/usr/lib/python1.5/site-packages/xml/xslt/XsltFunctions.py", line 63, in Document
    doc = context.stylesheet._docReader.fromUri(uri, baseUri=baseUri)
  File "/usr/lib/python1.5/site-packages/Ft/Lib/ReaderBase.py", line 67, in fromUri
    rt = self.fromStream(stream, baseUri, ownerDoc, stripElements)
  File "/usr/lib/python1.5/site-packages/Ft/Lib/pDomlette.py", line 578, in fromStream
    raise FtException(Error.XML_PARSE_ERROR, p.ErrorLineNumber, p.ErrorColumnNumber, expat.ErrorString(p.ErrorCode))
Ft.Lib.FtException: ('XML parse error at line 16, column 2: error in processing external entity reference', (16, 2, 'error in processing external entity reference'))

In fact, there is a problem when 4XSLT reads the XML document referenced in the 'document()' function: this XML file contains ENTITYs that import XML tree parts by giving local paths from the current document directory whereas in 4XSLT, the baseUri is always the URI of the initial XSLT. The XML reader is unable to find the external entities and the bug appears. Replacing line 67 of Ft.Lib.ReaderBase.py in DomletteReader.fromUri function:

    rt = self.fromStream(stream, baseUri, ownerDoc, stripElements)

with:

    newBaseUri = urllib.basejoin(baseUri, uri)
    rt = self.fromStream(stream, newBaseUri, ownerDoc, stripElements)

fixes the bug. I initially found the bug while trying to process Norman Walsh's XSLT stylesheets for turning docbook files in XSL formatting objects files (I am unfortunately not working for the Easter Rabbit). Regards, O. CAYROL. _________________________________________________________________________ Olivier CAYROL LOGILAB - Paris (France) http://www.logilab.com/ Change your millenium, try NARVAL the Intelligent Personal Assistant. Changez de millénaire, essayez NARVAL l'Assistant Personnel Intelligent.
_________________________________________________________________________ ---1463794431-39797830-987705845=:3976 Content-Type: APPLICATION/x-gzip; name="easter_mng.tar.gz" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: bug exhibitor Content-Disposition: attachment; filename="easter_mng.tar.gz" ---1463794431-39797830-987705845=:3976-- From larsga@garshol.priv.no Thu Apr 19 20:21:45 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 19 Apr 2001 21:21:45
+0200 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: <200104191549.f3JFn7x04884@borgia.local.dhcp.fourthought.com> References: <200104191549.f3JFn7x04884@borgia.local.dhcp.fourthought.com> Message-ID: * Uche Ogbuji | | James Tauber and I met at XML DevCon and had a good chat. We agreed | that it would be great to get Redfoot's RDF parser into PyXML as a | basis for RDF support. Then both RedFoot and 4RDF could use this as | the parser, and build their unique additional functions on top of | it. I would like to see RDF and other things that are not core XML functionality kept outside the PyXML package. A PyRDF package might be a better solution for both PyXML and all those interested in RDF support in Python. SourceForge is there, so it should be easy to set up. --Lars M. From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 22:06:58 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 23:06:58 +0200 Subject: [XML-SIG] Re: SAX parser factories (Was: PyTRaX?) In-Reply-To: (message from Lars Marius Garshol on 19 Apr 2001 09:35:01 +0200) References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> Message-ID: <200104192106.f3JL6ws00987@mira.informatik.hu-berlin.de> > But is this a proposal for a PyTRAX processor factory or for a SAX > parser factory? If the latter, do you think this should go into core > Python? The latter, so the API should be in core Python (even though it will only appear in 2.2). I think Python will then still support the expatreader only, unless additional parsers have registered themselves. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 22:20:30 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 23:20:30 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) 
In-Reply-To: <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200104190406.f3J46Ym01936@borgia.local> <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> Message-ID: <200104192120.f3JLKUo01190@mira.informatik.hu-berlin.de> > I don't like make_parser() and create_parser() approach because the names > are so similar as to lead to confusion. I do like the proposal to ask for a > list of processors that support a feature list, which ability wuld be > separate from actual parser creation. I say a list because if more than one > had my features, I might want to choose one rather than another, not just > take whichever one the system wanted to give me. So what would be your proposed API, given that xml.sax.make_parser already exists (but only allows to specify parser names)? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 22:28:55 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 23:28:55 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <15070.63712.948366.946492@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <200104190406.f3J46Ym01936@borgia.local> <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> <200104190446.f3J4kAK02141@borgia.local> <200104190606.f3J66Jn01217@mira.informatik.hu-berlin.de> <004201c0c8d1$0294cc40$7cac1218@reston1.va.home.com> <15070.63712.948366.946492@cj42289-a.reston1.va.home.com> Message-ID: <200104192128.f3JLStl01204@mira.informatik.hu-berlin.de> [...] > def create(): > """Return a new XMLReader instance.""" > > def create_parser(features={}, properties={}): > """Return a configured parser object that supports the > given feature and property settings. 
If more than one > SAXImplementation supports the given settings, one will be > selected arbitrarily.""" I'd prefer to extend the existing API, where the SAXImplementation currently has a create_parser, and xml.sax currently has make_parser. Apart from that, this proposal requires quite some API from an individual parser module, so I'd rather see this implemented for, say, expatreader, before changing the infrastructure to use the API. Even writing a Python PEP might be appropriate. Also, the question will be whether we provide that infrastructure only for SAX2 parsers (of which we have two at the moment), or also for the SAX1 drivers, which don't have the notion of features and properties at all at the moment. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 22:49:12 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 23:49:12 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <3ADF0063.4499C271@FourThought.com> (message from Mike Olson on Thu, 19 Apr 2001 09:12:35 -0600) References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> <3ADF0063.4499C271@FourThought.com> Message-ID: <200104192149.f3JLnCU01364@mira.informatik.hu-berlin.de> > Can you still specify a parser to avoid the iteration? Certainly. > What about setting a default? I think PY_SAX_PARSER needs continued support, if for no other reason than backwards compatibility. The question is how it would integrate with properties requested by the application. There is no issue if the application did not request any features, or if the PY_SAX_PARSER(s) support the requested features. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Apr 19 22:52:04 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 19 Apr 2001 23:52:04 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?)
In-Reply-To: <3ADF0207.FB270CF6@FourThought.com> (message from Mike Olson on Thu, 19 Apr 2001 09:19:35 -0600) References: <200104190406.f3J46Ym01936@borgia.local> <3ADF0207.FB270CF6@FourThought.com> Message-ID: <200104192152.f3JLq4l01366@mira.informatik.hu-berlin.de> > Would there be an idea of precedence, what if two parsers meet all of > the criteria which one would we pick? For SAX, I think the registry ought to have an ordered list. For the parsers known to PyXML, this list should be sorted roughly by expected resource consumption (both time and space). One proposal is to return the list of all matching parsers, which would off-load selection to the application (not that the application would be in any better position to make a choice - unless it can off-load the choice to an experienced user). Regards, Martin From uche.ogbuji@fourthought.com Fri Apr 20 03:59:39 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 19 Apr 2001 20:59:39 -0600 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: Message from Lars Marius Garshol of "19 Apr 2001 21:21:45 +0200." Message-ID: <200104200300.f3K2xdZ07783@borgia.local> > > * Uche Ogbuji > | > | James Tauber and I met at XML DevCon and had a good chat. We agreed > | that it would be great to get Redfoot's RDF parser into PyXML as a > | basis for RDF support. Then both RedFoot and 4RDF could use this as > | the parser, and build their unique additional functions on top of > | it. > > I would like to see RDF and other things that are not core XML > functionality kept outside the PyXML package. A PyRDF package might be > a better solution for both PyXML and all those interested in RDF > support in Python. SourceForge is there, so it should be easy to set up. Can you define "core XML"? My guess is that this is one of those "I know it when I see it" things. You don't use RDF, so it's probably harder for you to see it as core XML. For me, it's hard to see RDF as anything but core XML. 
I also think that your stated standard would tend to exclude large parts of the current contents of PyXML. I might be persuaded of your viewpoint, but I'd need to see an argument for your viewpoint. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jtauber@bowstreet.com Fri Apr 20 05:28:32 2001 From: jtauber@bowstreet.com (James Tauber) Date: Fri, 20 Apr 2001 00:28:32 -0400 Subject: [XML-SIG] RDF Parser -> PyXML Message-ID: I can certainly make it available. It was written from the start to be usable apart from Redfoot so it won't be any problem. There's no support for containers yet but that shouldn't hold things up. James > -----Original Message----- > From: Uche Ogbuji [mailto:uche.ogbuji@fourthought.com] > Sent: Thursday, April 19, 2001 11:49 AM > To: xml-sig@python.org > Subject: [XML-SIG] RDF Parser -> PyXML > > > James Tauber and I met at XML DevCon and had a good chat. We > agreed that it > would be great to get Redfoot's RDF parser into PyXML as a > basis for RDF > support. Then both RedFoot and 4RDF could use this as the > parser, and build > their unique additional functions on top of it. > > I really liek this idea, especially since I'm quite > dissatisfied with the RDF > parser in 4RDF and would like to have an alternative. the > quickest way to get > this would be if there were one in PyXML that we could use. > > So I'd like to wind up the discussion here so that we can > maybe check in the > Redfoot parser as early as this week, if James and co. are > still agreeable. > > Any general comments before a discussion of technical issues? > > James? > > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig > From Mike.Olson@fourthought.com Fri Apr 20 04:52:42 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 19 Apr 2001 21:52:42 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> <3ADF0063.4499C271@FourThought.com> <200104192149.f3JLnCU01364@mira.informatik.hu-berlin.de> Message-ID: <3ADFB28A.B3337C33@FourThought.com> "Martin v. Loewis" wrote: > > > Can you still specify a parser to avoid the iteration? > > Certainly. > > > What about setting a default? > > I think PY_SAX_PARSER needs continued support, if for no other reason > than backwards compatibility. The question is how it would integrate > with properties requested by the application. There is no issue if the > application did not request any features, or if the PY_SAX_PARSER(s) > support the requested features. I guess what I am worried about is making it too complex/slow for an application that knows what parser/processor it wants to use. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Fri Apr 20 06:06:17 2001 From: tpassin@home.com (Thomas B. Passin) Date: Fri, 20 Apr 2001 01:06:17 -0400 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?)
References: <200104190406.f3J46Ym01936@borgia.local><200104190601.f3J61XW01112@mira.informatik.hu-berlin.de><004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com><200104190446.f3J4kAK02141@borgia.local><200104190606.f3J66Jn01217@mira.informatik.hu-berlin.de><004201c0c8d1$0294cc40$7cac1218@reston1.va.home.com> <15070.63712.948366.946492@cj42289-a.reston1.va.home.com> Message-ID: <003201c0c957$a2181980$7cac1218@reston1.va.home.com> Fred L. Drake said - > > Thomas B. Passin writes: > > I favor short (non-URI) feature names. Using URIs to get unique strings may > > work well when you have the possibility of many people working independently > > on many projects accidentally producing name collisions. I don't think that > > applies here. Even if, say, the RDF people duplicated some of our feature > > names, it wouldn't matter since they would be used in a different context. > > Why would it not apply here? I know the "Parsed XML" product for > Zope supports some additional features that can be checked with > hasFeature(); why would this not be the case for other APIs which can > support customizable features? Well, XML namespaces envision a situation where different elements of the same name might get used in the same document. With our features, if I ask for feature X on an RDF processor, it can't be confused with feature X on an xslt processor or feature X on a validating XML parser. So there is no need to disambiguate, I thought. If there were likely to be a lot of uncoordinated efforts all adding features to python processors, I'd see it differently. But I'm not hard over about it. > If we do go with short names, we should at least strongly recommend > a way to formulate the names for additional features. I'd stick with > what's recommended for the DOM in this case; for example: > "org.zope.dom.persistence". > Why, do you see a unification with java processor APIs in the future? If we're going to go to dotted names, let's just use URIs and be done with it.
If we use the dotted name method, would it be tied to the current package structure? I don't favor that, because what should be done if the packages are refactored? I favor making up a URN-like prefix and notation even if we never register it anywhere. Otherwise, a (possibly fake) URL as others have suggested. If we use a URL, we could consider pointing it to a RDDL-like document that could contain machine- and human-readable information on the features.

Cheers,

Tom P

From tpassin@home.com Fri Apr 20 06:27:42 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Fri, 20 Apr 2001 01:27:42 -0400
Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?)
References: <200104190406.f3J46Ym01936@borgia.local> <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> <200104190446.f3J4kAK02141@borgia.local> <200104190606.f3J66Jn01217@mira.informatik.hu-berlin.de> <004201c0c8d1$0294cc40$7cac1218@reston1.va.home.com> <15070.63712.948366.946492@cj42289-a.reston1.va.home.com>
Message-ID: <003701c0c95a$9fae1700$7cac1218@reston1.va.home.com>

I was thinking of something like this (ignoring any distinction between properties and features):

    def get_processor_list(featurelist):
        '''Return a list of parsers (by name or registered ID, these would
        not be instances) that support the requested features. Each item
        of the list can be instantiated by a call to
        create_named_processor().'''

    def create_named_processor(parsername, featurelist):
        '''Instantiate a parser using a parser name as returned by
        get_processor_list().'''

These would be in addition to Fred's suggestions (copied below), except that neither of these functions should belong to a specific processor implementation (since they look across all of them). I'm not yet quite clear on what class they should be part of.
With this approach, you can take the first processor on the list if you don't care which one you use:

    processors = get_processor_list(featurelist)
    if processors:
        parser = create_named_processor(processors[0], featurelist)
    else:
        bye_bye_right_now()

Or if you know you want a particular one, use it by name if it shows up in the list.

This API should work nicely for other types of processors in the future, so it would give us a uniform approach to wrapping processors that may have various features.

Cheers,

Tom P

Fred L. Drake wrote -

>
> So perhaps what we have is an interface like this:
>
> object SAXImplementation:
>     def can_set_feature(feature, enabled):
>         """Return true if the parsers returned by create()
>         can support 'feature', and false if they don't."""
>
>     def can_support_property(property, value):
>         """Return true if the parsers returned by create()
>         can support 'property' with the given value, and
>         false if they can't."""
>
>     def get_features_list():
>         """Return a list of supported feature names.
>         Inclusion of a name in the list does not imply that
>         the feature is supported in both enabled and disabled
>         forms."""
>
>     def get_properties_list():
>         """Return a list of supported property names.
>         Inclusion of a name in the list does not imply that
>         all values for that property are supported."""
>
>     def create():
>         """Return a new XMLReader instance."""
>
> def find_parsers(features={}, properties={}):
>     """Return a list of SAXImplementation objects that support
>     the given features and properties."""
>
> def create_parser(features={}, properties={}):
>     """Return a configured parser object that supports the
>     given feature and property settings. If more than one
>     SAXImplementation supports the given settings, one will be
>     selected arbitrarily."""
>
> def register_parser(impl):
>     """Add the SAXImplementation 'impl' to the set of parsers
>     known to the factory."""
>
> (I expect SAXImplementation objects will usually be modules.)
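
[Editorial sketch] The interface quoted above is only an outline, so here is one way the pieces might fit together. This is a minimal sketch under stated assumptions, not the code anyone on the list wrote: `ToyImplementation`, the driver names "fast" and "strict", and the feature names "namespaces" and "validation" are all hypothetical, properties are left out, and `create()` returns a placeholder dict instead of an XMLReader. The registry is kept ordered and the first match is returned, which is one possible reading of "selected arbitrarily".

```python
class ToyImplementation:
    """Hypothetical stand-in for a SAXImplementation; not a PyXML class."""

    def __init__(self, name, features):
        # 'features' maps a feature name to the states (True/False) it supports.
        self.name = name
        self._features = dict(features)

    def get_features_list(self):
        # Listing a name does not imply both states are supported.
        return sorted(self._features)

    def can_set_feature(self, feature, enabled):
        # Unknown features are treated as unsupported in either state.
        return enabled in self._features.get(feature, ())

    def create(self):
        # A real implementation would return an XMLReader;
        # a dict keeps this sketch runnable on its own.
        return {"driver": self.name}


_registry = []  # ordered: earlier registrations are preferred


def register_parser(impl):
    """Add 'impl' to the set of implementations known to the factory."""
    _registry.append(impl)


def find_parsers(features=None):
    """Return every registered implementation supporting all requested settings."""
    wanted = features or {}
    return [impl for impl in _registry
            if all(impl.can_set_feature(name, state)
                   for name, state in wanted.items())]


def create_parser(features=None):
    """Create a parser from the first matching implementation."""
    matches = find_parsers(features)
    if not matches:
        raise LookupError("no registered parser supports %r" % (features,))
    return matches[0].create()


register_parser(ToyImplementation(
    "fast", {"namespaces": (True, False), "validation": (False,)}))
register_parser(ToyImplementation(
    "strict", {"namespaces": (True, False), "validation": (True, False)}))

print([impl.name for impl in find_parsers({"validation": True})])
print(create_parser({"namespaces": True}))
```

Because `find_parsers()` returns implementation objects and not instances, an application can inspect the candidates and pick one itself, which is the separation between lookup and creation that the thread asks for.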
> > > Also, with the has_features() approach, it would be possible to keep a > > simple features catalog which would avoid the need to instantiate a > > processor so it could be asked if it had some feature. If each processor > > (or wrapper) could respond to a request for its features, a script could > > automatically query it when it was first registered and save the data in the > > This would be fine with me. I think the SAXImplementation interface > I outlined above would support this implementation of the factory > functions. > > > catalog. Perhaps the catalog could be in xml, more likely a dictionary > > format. > > I expect such a catalog would have to be a volatile data structure > rather than something persistent -- the interaction with sys.path > would be a nightmare for a persistent catalog! > > > The next step in this evolution could be named feature sets! > > That would be nice to have! > > > -Fred > > -- > Fred L. Drake, Jr. > PythonLabs at Digital Creations > > From larsga@garshol.priv.no Fri Apr 20 09:47:27 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Apr 2001 10:47:27 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> References: <200104190406.f3J46Ym01936@borgia.local> <200104190601.f3J61XW01112@mira.informatik.hu-berlin.de> <004a01c0c8d2$241f6ae0$7cac1218@reston1.va.home.com> Message-ID: * Thomas B. Passin | | I don't like make_parser() and create_parser() approach because the names | are so similar as to lead to confusion. That's a good point, but only make_parser() is externally visible. The only people who need to care about create_parser() is those making SAX drivers that they want to make available to the make_parser() factory. So far that has only been the XML-SIG, and I think we could change that without too much pain for anyone, if we find it worthwhile. 
| I do like the proposal to ask for a list of processors that support | a feature list, which ability wuld be separate from actual parser | creation. I say a list because if more than one had my features, I | might want to choose one rather than another, not just take | whichever one the system wanted to give me. Also a good point. So perhaps this should be a different function, then. It also seems that we should have a parser name property to make it easier to see which parsers have been returned. | Also, with the has_features() approach, it would be possible to keep | a simple features catalog which would avoid the need to instantiate | a processor so it could be asked if it had some feature. I think it's worth the trouble to instantiate a processor, since that is the only reliable way to check if the processor actually has all the modules it needs to work. --Lars M. From larsga@garshol.priv.no Fri Apr 20 09:48:43 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Apr 2001 10:48:43 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: <3ADFB28A.B3337C33@FourThought.com> References: <3ADCD338.ACCF9907@FourThought.com> <200104181942.f3IJg9d00877@mira.informatik.hu-berlin.de> <3ADF0063.4499C271@FourThought.com> <200104192149.f3JLnCU01364@mira.informatik.hu-berlin.de> <3ADFB28A.B3337C33@FourThought.com> Message-ID: * Mike Olson | | I guess what I am worried about is making it to complex/slow for an | application that knows what parser/processor it wants to use. That application will simply give make_parser() the name of the parser and have it created and returned without further ado. That's how it works now and that needs to be kept for backwards compatibility. It also makes sense to keep it. --Lars M. From larsga@garshol.priv.no Fri Apr 20 09:49:29 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Apr 2001 10:49:29 +0200 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) 
In-Reply-To: <200104192152.f3JLq4l01366@mira.informatik.hu-berlin.de> References: <200104190406.f3J46Ym01936@borgia.local> <3ADF0207.FB270CF6@FourThought.com> <200104192152.f3JLq4l01366@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | For SAX, I think the registry ought to have an ordered list. For the | parsers known to PyXML, this list should be sorted roughly by expected | resource consumption (both time and space). Agreed. | One proposal is to return the list of all matching parsers, which | would off-load selection to the application (not that the application | would be in any better position to make a choice - unless it can | off-load the choice to an experienced user). I think this is worth considering, at least. --Lars M. From larsga@garshol.priv.no Fri Apr 20 09:56:37 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Apr 2001 10:56:37 +0200 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: <200104200300.f3K2xdZ07783@borgia.local> References: <200104200300.f3K2xdZ07783@borgia.local> Message-ID: * Uche Ogbuji | | Can you define "core XML"? Well, it would have to be things that - use or support the XML data model (RDF fails here) - are not XML applications themselves (whether RDF fails here can be argued) - help you build XML applications and software Catalogs, TREX, XSLT, XPath and so on all meet these criteria while RDF does not. RDF has its own parsers, its own data model, its own schema language, its own databases and its own everything. To me these are strong arguments for why a separate package of RDF functionality makes more sense than to provide parsers, object model implementations and validators for two different data models in the same package. | You don't use RDF, so it's probably harder for you to see it as core | XML. For me, it's hard to see RDF as anything but core XML. 
You are right that I don't use RDF and so may be looking at it the wrong way, but to be honest I don't consider either RDF or topic maps as core XML technologies or even XML technologies at all. Sure, they use an XML serialization format, but that is the only connection apart from that they all are used for information management.

What is, really, the connection between RDF and XML? (I am asking for enlightenment here. :)

| I also think that your stated standard would tend to exclude large
| parts of the current contents of PyXML.

Which ones?

--Lars M.

From Nicolas.Chauvat@logilab.fr Fri Apr 20 11:19:04 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Fri, 20 Apr 2001 12:19:04 +0200 (CEST)
Subject: [XML-SIG] RDF Parser -> PyXML
In-Reply-To:
Message-ID:

> You are right that I don't use RDF and so may be looking at it the
> wrong way, but to be honest I don't consider either RDF or topic maps
> as core XML technologies or even XML technologies at all. Sure, they
> use an XML serialization format, but that is the only connection apart
> from that they all are used for information management.
>
> What is, really, the connection between RDF and XML? (I am asking for
> enlightenment here. :)

I used to think the same and the more I use RDF, the more I consider it as XML's other half.

Ever used references to identifiers (IDs) in XML to materialize relations between objects? Where did you stuff them... in attributes or in text nodes? How would you tie two different source objects to the same target object using the same type of relation... with two different attributes (same name?) or element nodes, one for each source object?

If you've ever faced that kind of situation and considered it to be a problem, then RDF will be your friend.

I don't pretend to perfectly answer your question, just to hint at key RDF advantages.

BTW, I don't use topic maps, but what I read about it does not make me consider it a core XML technology... as opposed to RDF.
--
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)

From tpassin@home.com Fri Apr 20 13:52:49 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Fri, 20 Apr 2001 08:52:49 -0400
Subject: [XML-SIG] RDF Parser -> PyXML
References: <200104200300.f3K2xdZ07783@borgia.local>
Message-ID: <000e01c0c998$ce9a3b00$7cac1218@reston1.va.home.com>

One thing to consider is that a separate effort for pyRDF might have trouble gaining critical mass - enough developers to really keep it going - if it were not part of a larger activity like PyXML.

Here's what I suggest. By all means let's welcome the RDF contributions into the PyXML fold and CVS. Keep the RDF code in a separate "rdf" package, not the "xml" package. The RDF work should have its own co-captain, I would think. I'm sure Andrew has more than enough to keep him busy as it is!

Then in the future if the time should come to split off a separate PyRDF SIG, it will be easy to do so and the XML SIG will have become its proud parent.

In the meantime, thinking about RDF will be like getting a whole new group of use cases, which could lead to improvements in the mainline XML work too.

Cheers,

Tom P

Lars Marius Garshol wrote -

> - use or support the XML data model (RDF fails here)
> - are not XML applications themselves (whether RDF fails here can be argued)
> - help you build XML applications and software
>
> Catalogs, TREX, XSLT, XPath and so on all meet these criteria while
> RDF does not. RDF has its own parsers, its own data model, its own
> schema language, its own databases and its own everything.
>
> To me these are strong arguments for why a separate package of RDF
> functionality makes more sense than to provide parsers, object model
> implementations and validators for two different data models in the
> same package.
From uche.ogbuji@fourthought.com Fri Apr 20 14:32:22 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 20 Apr 2001 07:32:22 -0600 Subject: [XML-SIG] SAX parser factories (Was: PyTRaX?) In-Reply-To: Message from "Thomas B. Passin" of "Fri, 20 Apr 2001 01:06:17 EDT." <003201c0c957$a2181980$7cac1218@reston1.va.home.com> Message-ID: <200104201332.f3KDWMZ11051@borgia.local> > > If we do go with short names, we should at least strongly recommend > > a way to formulate the names for additional features. I'd stick with > > what's recommended for the DOM in this case; for example: > > "org.zope.dom.persistence". > > > > Why, do you see a unification with java processor APIs in the future? If > we're going to go to dotted names, let's just use URIs and be done with it. This is my inclination as well. > If we use the dotted name method, would it be tied to the current package > structure? I don;t favor that becaues what should be done if the packages > are refactored? I favor making up a URN-like prefix and notation even if we > never register it anywhere. Otherwise, a (possibly fake) url as others have > suggested. If we use a url, we could consider pointing it to a RDDL-like > document that could contain machine and human readible information on the > features. Possibly something like that, but RDDL itself would probably be a tad heavyweight, unless we come up with a scheme where the processor can look up a RDDL document at the base URI that covers a whole family of properties. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Apr 20 14:47:47 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 20 Apr 2001 07:47:47 -0600 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: Message from Lars Marius Garshol of "20 Apr 2001 10:56:37 +0200." Message-ID: <200104201348.f3KDlme11097@borgia.local> > > * Uche Ogbuji > | > | Can you define "core XML"? > > Well, it would have to be things that > > - use or support the XML data model (RDF fails here) > - are not XML applications themselves (whether RDF fails here can be argued) > - help you build XML applications and software I'm unclear on "are not XML applications themselves" I certainly think that RDF falls well into your third category. I think that the result of this discussion between you and me will be "agree to disagree", so I'd like to hear from others. But let me touch on your other questions. > What is, really, the connection between RDF and XML? (I am asking for > enlightenment here. :) I think RDF is the most natural tool available for managing XML meta-data. Indeed I don't often use XML without using RDF as well. In this regard RDF is as important to me as XSLT. * Rather than process large XML documents, I tend to break things down to small, cohesive mini-docs, and use RDF-encoded relationships to "glue" them together. * When managing large quantities of XML documents, even with varying structure, I use RDF for indexing and aggregation. * RDF itself is my most common XML vocabulary for documents that do a lot of description, and represent object relationships. See http://www-106.ibm.com/developerworks/library/ws-rdf/index.html?dwzone=ws For an example of this line of thought as I apply it to the normally XML-only WSDL. > | I also think that your stated standard would tend to exclude large > | parts of the current contents of PyXML. > > Which ones? 
Well, what I had in mind was formed without the benefit of your clarification above. I was thinking things like unicode xpath xslt marshal iso8601 By your criteria above, most of these do indeed belong in the package, but so then does RDF, IMO. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From akuchlin@mems-exchange.org Fri Apr 20 17:28:33 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 20 Apr 2001 12:28:33 -0400 Subject: [XML-SIG] Canonicalizing XML Message-ID: Has anyone written code for producing XML in Canonical XML format? (http://www.w3.org/TR/xml-c14n) --amk From rsalz@zolera.com Fri Apr 20 20:19:39 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 20 Apr 2001 15:19:39 -0400 Subject: [XML-SIG] Canonicalizing XML References: Message-ID: <3AE08BCB.5B8A5045@zolera.com> I've done some of it, but the code is horrible, and I want to re-do it. It uses xpath and 4dom; if someone wants the code to use as a source of ideas for a real version, let me know. /r$ From martin@loewis.home.cs.tu-berlin.de Fri Apr 20 21:10:07 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 20 Apr 2001 22:10:07 +0200 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: <200104201348.f3KDlme11097@borgia.local> (message from Uche Ogbuji on Fri, 20 Apr 2001 07:47:47 -0600) References: <200104201348.f3KDlme11097@borgia.local> Message-ID: <200104202010.f3KKA7E01355@mira.informatik.hu-berlin.de> > I think that the result of this discussion between you and me will be "agree > to disagree", so I'd like to hear from others. I know nothing about RDF, so I can't comment from an architectural point of view. However, I think I can represent or understand the following positions. 1. PyXML user who also uses RDF. 
I'd certainly appreciate if I had to install only a single package, not two - unless the release schedule for PyXML would compromise timely delivery of RDF code updates. 2. PyXML user who does not use RDF. I would not care if RDF code was included in PyXML, as long that does not consume an intolerable percentage of the entire package (which would be bad for download times and disk consumption). 3. PyXML maintainer and packager. I appreciate contributions from whoever is willing to contribute, as long as it is open source, and as long there is willingness to maintain the contribution for the months (and years) to come. My biggest concern is that the original author runs away and leaves me with all the bug reports. Regards, Martin From fdrake@acm.org Fri Apr 20 21:29:39 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 20 Apr 2001 16:29:39 -0400 (EDT) Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: <200104202010.f3KKA7E01355@mira.informatik.hu-berlin.de> References: <200104201348.f3KDlme11097@borgia.local> <200104202010.f3KKA7E01355@mira.informatik.hu-berlin.de> Message-ID: <15072.39987.36359.665251@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > 1. PyXML user who also uses RDF. I'd certainly appreciate if I had to > install only a single package, not two - unless the release > schedule for PyXML would compromise timely delivery of RDF code > updates. Which indicates that the RDF code should be separate, at least until it has stabilized and has good reception from users. I don't know where Redfoot is on the development curve. > 2. PyXML user who does not use RDF. I would not care if RDF code was > included in PyXML, as long that does not consume an intolerable > percentage of the entire package (which would be bad for download > times and disk consumption). This is Python code; it should never be too long! ;-) > 3. PyXML maintainer and packager. 
I appreciate contributions from > whoever is willing to contribute, as long as it is open source, and > as long there is willingness to maintain the contribution for the > months (and years) to come. My biggest concern is that the original > author runs away and leaves me with all the bug reports. Reinforces my comment on #1 -- don't include until it's stable. Perhaps the right thing to do would be to have a separate package, with an option to consider merging it in later. I think it would be good to keep development discussion here -- it *is* a general XML thing, whether or not all of us use it (we don't all use validation either, but it's in the package). The cross-pollination and expansion of the set of use cases is good. I don't mind sharing the CVS repository, either -- CVS supports multiple modules quite well. Why don't we set Redfoot up as the "redfoot" CVS module and reserve the package name xml.rdf for it? (As in, let it require PyXML and then install itself to ../site-package/_xmlplus/rdf/.) If a decision is made to keep it separate or merge it in later, there are no real disruptions for user code. Is this reasonable, or am I off my rocker again? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Fri Apr 20 22:01:56 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 20 Apr 2001 23:01:56 +0200 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: (message from Lars Marius Garshol on 20 Apr 2001 10:56:37 +0200) References: <200104200300.f3K2xdZ07783@borgia.local> Message-ID: <200104202101.f3KL1uY01684@mira.informatik.hu-berlin.de> > Catalogs, TREX, XSLT, XPath and so on all meet these criteria while > RDF does not. RDF has its own parsers, its own data model, its own > schema language, its own databases and its own everything. Maybe I'm missing something here, but ... Isn't any RDF description a well-formed XML document? and don't RDF 'parsers' typically operate on top of XML parsers? 
atleast 4RDF seems to operate that way. That you then perform some processing, further correctness checks and so on is expected for a specific XML application. Given that, I'm not so sure I understand what an RDF parser would do that a plain XML parser with a SAX interface wouldn't. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Apr 20 22:04:57 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 20 Apr 2001 23:04:57 +0200 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: <000e01c0c998$ce9a3b00$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <200104200300.f3K2xdZ07783@borgia.local> <000e01c0c998$ce9a3b00$7cac1218@reston1.va.home.com> Message-ID: <200104202104.f3KL4vG01741@mira.informatik.hu-berlin.de> > Then in the future if the time should come to split off a separate PyRDF > SIG, it will be easy to do so and the XML SIG will have become its proud > parent. I think nobody has questioned whether RDF discussion should take place on xml-sig@python.org; this is certainly the right mailing list (IMO). Discussion seems to be more about packaging. After all, PyXML <> xml-sig. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Apr 20 22:14:09 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 20 Apr 2001 23:14:09 +0200 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: <15072.39987.36359.665251@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <200104201348.f3KDlme11097@borgia.local> <200104202010.f3KKA7E01355@mira.informatik.hu-berlin.de> <15072.39987.36359.665251@cj42289-a.reston1.va.home.com> Message-ID: <200104202114.f3KLE9m01831@mira.informatik.hu-berlin.de> > Why don't we set Redfoot up as the "redfoot" CVS module and reserve > the package name xml.rdf for it? (As in, let it require PyXML and > then install itself to ../site-package/_xmlplus/rdf/.) If a decision > is made to keep it separate or merge it in later, there are no real > disruptions for user code. 
Is this reasonable, or am I off my rocker
> again?

Sounds good to me. There is plenty of experience of installing things into the _xmlplus tree (xml for you Python 1.5 users :-), as well as with multiple modules in CVS (the SF operation with typically a single module in each repository only is actually a strange setup).

Regards,
Martin

From chris.arndt@web.de Fri Apr 20 22:25:00 2001
From: chris.arndt@web.de (Christopher Arndt)
Date: Fri, 20 Apr 2001 23:25:00 +0200
Subject: [XML-SIG] xbel demo patch
Message-ID: <3AE0A92C.B2F4B5F1@web.de>

Hi xml'ers!

I have recently toyed with the XBEL demos included in the PyXML distribution and found they did not work for me because of problems with German umlauts (and ISO Latin characters in general).

I fixed the parsing scripts to convert non-ASCII characters into XML character references, but that posed problems in the XBEL parsing script, because the character data was not accumulated in the right way across event handler calls. So I fixed that too and now the scripts are working for me.

I made a patch against the most recent (0.6.5) distribution. Should I post it on this list (7k bzipped) or send it to someone else?

--
Christopher Arndt      [t] +49 6221-303918
Hildastr. 2            [c] +49 173-9542751
69115 Heidelberg       [e] chris.arndt@web.de

From fdrake@acm.org Fri Apr 20 22:31:00 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 20 Apr 2001 17:31:00 -0400 (EDT)
Subject: [XML-SIG] xbel demo patch
In-Reply-To: <3AE0A92C.B2F4B5F1@web.de>
References: <3AE0A92C.B2F4B5F1@web.de>
Message-ID: <15072.43668.443715.641248@cj42289-a.reston1.va.home.com>

Christopher Arndt writes:
> I made a patch against the most recent (0.6.5) distribution. Should I
> post it on this list (7k bzipped) or send it to someone else?

I'd have to dig around to see who wrote the modules in that directory, but I seem to be taking the lead on further development of XBEL (which I enjoy), so you can send them to me and I'll take a look at them.
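
[Editorial sketch] The accumulation problem described in the patch report comes from the fact that a SAX parser may deliver the text of one element in several `characters()` calls, so a handler has to buffer the pieces until the end tag arrives. The following is a minimal illustration of that idea, assuming a modern Python 3 `xml.sax`; it is not the patch itself, and `TitleCollector`, `to_charrefs` and `collect_titles` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Collect the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # May be called several times per element, so append, never overwrite.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def to_charrefs(text):
    """Replace non-ASCII characters with numeric character references.

    Markup characters such as '&' are not escaped here.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.titles


sample = ('<xbel><bookmark href="http://example.org/">'
          '<title>M\u00fcnchen &amp; K\u00f6ln</title>'
          '</bookmark></xbel>').encode("utf-8")
for title in collect_titles(sample):
    print(to_charrefs(title))
```

A handler that kept only the most recent `characters()` chunk could lose part of a title like the one in the sample, since parsers commonly split text at entity references.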
I wouldn't be surprised if the issue is that xmllib is being used for parsing rather than Expat. I'll try and poke around it this weekend. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jtauber@jtauber.com Sat Apr 21 00:15:59 2001 From: jtauber@jtauber.com (James Tauber) Date: Fri, 20 Apr 2001 19:15:59 -0400 Subject: [XML-SIG] RDF Parser -> PyXML Message-ID: <008501c0c9ef$df0dbc80$bd020a0a@EHUD> > Which indicates that the RDF code should be separate, at least until > it has stabilized and has good reception from users. I don't know > where Redfoot is on the development curve. Redfoot should be considered 1.0 beta at the moment but note: the RDF support we are talking about is only 5% of what Redfoot is. > Why don't we set Redfoot up as the "redfoot" CVS module and reserve > the package name xml.rdf for it? (As in, let it require PyXML and > then install itself to ../site-package/_xmlplus/rdf/.) If a decision > is made to keep it separate or merge it in later, there are no real > disruptions for user code. Is this reasonable, or am I off my rocker > again? Using the "redfoot" name for this would not be appropriate for the reason I give above: RDF parsing / serialization is only a tiny part of Redfoot. Other parts of Redfoot may be appropriate to donate later on but what we're talking about now is small. Personally, I don't care whether it is part of PyXML or not. I just want to: - make it easier for Python people to use RDF - to encourage RDF people to use Python James From jtauber@bowstreet.com Sat Apr 21 00:14:58 2001 From: jtauber@bowstreet.com (James Tauber) Date: Fri, 20 Apr 2001 19:14:58 -0400 Subject: [XML-SIG] RDF Parser -> PyXML Message-ID: > Maybe I'm missing something here, but ... Isn't any RDF description a > well-formed XML document? Yes > and don't RDF 'parsers' typically operate on > top of XML parsers? atleast 4RDF seems to operate that way. As does Redfoot's RDF parser. 
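
[Editorial sketch] To make "an RDF parser on top of an XML parser" concrete, here is a rough sketch of a SAX handler that turns a very small subset of RDF/XML into (subject, predicate, object) tuples. It is an assumption-laden illustration, not Redfoot's or 4RDF's parser: it only understands `rdf:Description` elements with an `rdf:about` attribute whose child elements carry either literal text or an `rdf:resource` attribute, and it ignores everything else in the RDF syntax (typed nodes, containers, nested descriptions and so on). `TripleHandler` and `parse_triples` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class TripleHandler(ContentHandler):
    """Emit triples for a small subset of RDF/XML."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None    # URI from the enclosing rdf:Description
        self._predicate = None  # URI of the property element being read
        self._text = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if (uri, local) == (RDF_NS, "RDF"):
            return
        if (uri, local) == (RDF_NS, "Description"):
            self._subject = attrs.get((RDF_NS, "about"))
        elif self._subject is not None:
            # Property element: predicate URI is namespace + local name.
            self._predicate = (uri or "") + local
            self._text = []
            resource = attrs.get((RDF_NS, "resource"))
            if resource is not None:
                self.triples.append((self._subject, self._predicate, resource))
                self._predicate = None

    def characters(self, content):
        if self._predicate is not None:
            self._text.append(content)

    def endElementNS(self, name, qname):
        uri, local = name
        if (uri, local) == (RDF_NS, "Description"):
            self._subject = None
        elif self._predicate is not None:
            # Literal-valued property: the accumulated text is the object.
            self.triples.append(
                (self._subject, self._predicate, "".join(self._text)))
            self._predicate = None


def parse_triples(xml_text):
    handler = TripleHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.triples


doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
    <dc:relation rdf:resource="http://example.org/other"/>
  </rdf:Description>
</rdf:RDF>"""

for triple in parse_triples(doc):
    print(triple)
```

The handler needs nothing beyond a namespace-aware SAX parser, which is the sense in which such code is an XML application layered over an ordinary parser.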
> That you then perform some processing, further correctness checks and > so on is expected for a specific XML application. > > Given that, I'm not so sure I understand what an RDF parser would do > that a plain XML parser with a SAX interface wouldn't. An RDF parser sits on top of an XML parser and converts RDF's various serializations into triples. James From uche.ogbuji@fourthought.com Sat Apr 21 00:18:17 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 20 Apr 2001 17:18:17 -0600 Subject: [XML-SIG] RDF Parser -> PyXML In-Reply-To: Message from "Fred L. Drake, Jr." of "Fri, 20 Apr 2001 16:29:39 EDT." <15072.39987.36359.665251@cj42289-a.reston1.va.home.com> Message-ID: <200104202318.f3KNIHJ14021@borgia.local.dhcp.fourthought.com> > Martin v. Loewis writes: > > 1. PyXML user who also uses RDF. I'd certainly appreciate if I had to > > install only a single package, not two - unless the release > > schedule for PyXML would compromise timely delivery of RDF code > > updates. > > Which indicates that the RDF code should be separate, at least until > it has stabilized and has good reception from users. I don't know > where Redfoot is on the development curve. Hmm. I don't know that we can claim much history of waiting until code is mature before we merge it in. > > 3. PyXML maintainer and packager. I appreciate contributions from > > whoever is willing to contribute, as long as it is open source, and > > as long there is willingness to maintain the contribution for the > > months (and years) to come. My biggest concern is that the original > > author runs away and leaves me with all the bug reports. > > Reinforces my comment on #1 -- don't include until it's stable. > Perhaps the right thing to do would be to have a separate package, > with an option to consider merging it in later. 
> I think it would be
> good to keep development discussion here -- it *is* a general XML
> thing, whether or not all of us use it (we don't all use validation
> either, but it's in the package). The cross-pollination and expansion
> of the set of use cases is good. I don't mind sharing the CVS
> repository, either -- CVS supports multiple modules quite well.
> Why don't we set Redfoot up as the "redfoot" CVS module and reserve
> the package name xml.rdf for it? (As in, let it require PyXML and
> then install itself to ../site-package/_xmlplus/rdf/.) If a decision
> is made to keep it separate or merge it in later, there are no real
> disruptions for user code. Is this reasonable, or am I off my rocker
> again?

Well this is all fair enough, but the only problem is that there is little
value to it if it doesn't get packaged in. If it doesn't, there aren't
people working with it, finding bugs, fixing them, etc., and it won't
magically become mature sitting in a dungeon.

I think it might help this discussion to note that James's code is really
just a SAX handler that emits RDF triples in simple Python form. It's by no
means a monster. And it's by no means a full-blown RDF system. 4RDF consists
of almost 100 Python files, only *one* of which implements a parser.

Checking in Redfoot's parser would provide a simple, common root from which
the (hopefully many) Python RDF implementations can flourish, packaged
separately. Sort of like an expat for which many different DOMs have grown.

Of course, my opinion is to go ahead and check it right into PyXML as an
"xml.rdf" package. After all, it only deals with the XML serialization of
RDF.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From jtauber@bowstreet.com Sat Apr 21 00:16:24 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Fri, 20 Apr 2001 19:16:24 -0400
Subject: [XML-SIG] RDF Parser -> PyXML
Message-ID:

> I think nobody has questioned whether RDF discussion should take place
> on xml-sig@python.org; this is certainly the right mailing list (IMO).
> Discussion seems to be more about packaging.

I raised the RDF question here because I thought this mailing list was the
right place, regardless of whether the RDF code is part of PyXML or not. So
I'm glad someone agrees that we can at least talk about RDF here.

James

From jtauber@jtauber.com Sat Apr 21 00:46:13 2001
From: jtauber@jtauber.com (James Tauber)
Date: Fri, 20 Apr 2001 19:46:13 -0400
Subject: [XML-SIG] RDF Parser -> PyXML
References: <200104202318.f3KNIHJ14021@borgia.local.dhcp.fourthought.com>
Message-ID: <00a801c0c9f4$1875cb30$bd020a0a@EHUD>

> I think it might help this discussion to note that James's code is really just
> a SAX handler that emits RDF triples in simple Python form.

Almost right. It actually uses expat. Which raises the question: should I
change it to use SAX? Any advantage?

I plan on a refactor tonight so the candidate code will be available this
weekend.

James

From uche.ogbuji@fourthought.com Sat Apr 21 02:46:38 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 20 Apr 2001 19:46:38 -0600
Subject: [XML-SIG] RDF Parser -> PyXML
In-Reply-To: Message from "James Tauber" of "Fri, 20 Apr 2001 19:15:59 EDT." <008501c0c9ef$df0dbc80$bd020a0a@EHUD>
Message-ID: <200104210146.f3L1kca15963@borgia.local>

> > Why don't we set Redfoot up as the "redfoot" CVS module and reserve
> > the package name xml.rdf for it? (As in, let it require PyXML and
> > then install itself to ../site-package/_xmlplus/rdf/.)
> > If a decision
> > is made to keep it separate or merge it in later, there are no real
> > disruptions for user code. Is this reasonable, or am I off my rocker
> > again?
>
> Using the "redfoot" name for this would not be appropriate for the reason I
> give above: RDF parsing / serialization is only a tiny part of Redfoot.
> Other parts of Redfoot may be appropriate to donate later on but what we're
> talking about now is small.

Agreed. That's why I had in mind xml.rdf.

> Personally, I don't care whether it is part of PyXML or not. I just want to:
> - make it easier for Python people to use RDF
> - to encourage RDF people to use Python

Yes.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From martin@loewis.home.cs.tu-berlin.de Sat Apr 21 06:42:16 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 21 Apr 2001 07:42:16 +0200
Subject: [XML-SIG] RDF Parser -> PyXML
In-Reply-To: <200104202318.f3KNIHJ14021@borgia.local.dhcp.fourthought.com> (message from Uche Ogbuji on Fri, 20 Apr 2001 17:18:17 -0600)
References: <200104202318.f3KNIHJ14021@borgia.local.dhcp.fourthought.com>
Message-ID: <200104210542.f3L5gG501042@mira.informatik.hu-berlin.de>

> Of course, my opinion is to go ahead and check it right into PyXML
> as an "xml.rdf" package. After all, it only deals with the XML
> serialization of RDF.

We never had such a long discussion about whether to check in a single
file before...

For a single file, I'm not even sure a package is appropriate; it could
be a module xml.rdf as well. But I guess you'd like to call it
xml.rdf.parser, or some such.

So I'd be now in favour of this addition, provided there would be some
indication (e.g. in xml.rdf.__init__) that this is *not* a complete
RDF library, just the parser.
Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Sat Apr 21 06:46:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 21 Apr 2001 07:46:31 +0200
Subject: [XML-SIG] RDF Parser -> PyXML
In-Reply-To: <00a801c0c9f4$1875cb30$bd020a0a@EHUD> (jtauber@jtauber.com)
References: <200104202318.f3KNIHJ14021@borgia.local.dhcp.fourthought.com> <00a801c0c9f4$1875cb30$bd020a0a@EHUD>
Message-ID: <200104210546.f3L5kVv01066@mira.informatik.hu-berlin.de>

> Almost right. It actually uses expat. Which raises the question: should I
> change it to use SAX? Any advantage?

I don't know whether RDF has the notion of validation, in which case
SAX would be an advantage: you could parse it through xmlproc. Also,
if speed matters, parsing through sgmlop might be a gain over using
expat - although this gain could be easily eaten by the SAX overhead.

IOW, if all validation/error checking happens in the RDF parser, then
using xml.parsers.expat is the right choice.

Regards,
Martin

From jtauber@jtauber.com Sat Apr 21 07:05:50 2001
From: jtauber@jtauber.com (James Tauber)
Date: Sat, 21 Apr 2001 02:05:50 -0400
Subject: [XML-SIG] RDF Parser -> PyXML
References: <200104202318.f3KNIHJ14021@borgia.local.dhcp.fourthought.com> <00a801c0c9f4$1875cb30$bd020a0a@EHUD> <200104210546.f3L5kVv01066@mira.informatik.hu-berlin.de>
Message-ID: <002001c0ca29$20ea4db0$c8020a0a@EHUD>

> IOW, if all validation/error checking happens in the RDF parser, then
> using xml.parsers.expat is the right choice.

This is true, so I'll stick with expat for now.

James

From mbennett@ideaeng.com Sun Apr 22 21:56:32 2001
From: mbennett@ideaeng.com (Mark Bennett)
Date: Sun, 22 Apr 2001 13:56:32 -0700
Subject: [XML-SIG] Can't install PyXML w/ Python 2.1 on NT
Message-ID:

Sorry to bother you all. I did check the archives.
I've done a fresh install of Python 2.1
Then I got the .exe for PyXML (I've tried 6.2 and 6.5) - I got the
one for Python 2.0

When the install runs it asks me which installed Python to add it to,
but has blanks and won't let me specify a path, no Browse button.

I see in the March archives that somebody else asked this. The advice
back then was to move up to 2.0 from 1.x.

I'd prefer not to move backwards to 2.0 if I can avoid it.

Any ideas?

Thanks,
Mark

From martin@loewis.home.cs.tu-berlin.de Mon Apr 23 04:03:30 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 23 Apr 2001 05:03:30 +0200
Subject: [XML-SIG] Can't install PyXML w/ Python 2.1 on NT
In-Reply-To: (message from Mark Bennett on Sun, 22 Apr 2001 13:56:32 -0700)
References:
Message-ID: <200104230303.f3N33U901641@mira.informatik.hu-berlin.de>

> I've done a fresh install of Python 2.1
> Then I got the .exe for PyXML (I've tried 6.2 and 6.5) - I got the
> one for Python 2.0
[...]
> Any ideas?

The binary releases are tied to a specific release of Python,
through associating the binary modules with a specific version of
pythonxy.dll. Therefore, you cannot use a binary release created for
Python 2.0 with Python 2.1. Instead, you need to compile PyXML
yourself, which requires Visual C++.

Regards,
Martin

From sjoerd.mullender@oratrix.com Mon Apr 23 16:52:35 2001
From: sjoerd.mullender@oratrix.com (Sjoerd Mullender)
Date: Mon, 23 Apr 2001 17:52:35 +0200
Subject: [XML-SIG] Canonicalizing XML
In-Reply-To: Your message of Fri, 20 Apr 2001 12:28:33 -0400.
References:
Message-ID: <20010423155236.71C71301CF7@bireme.oratrix.nl>

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <2357.988041096.1@bireme.oratrix.nl>

I've written a validating XML parser in Python that can produce
Canonical XML. I'll attach it.
Usage (for getting Canonical XML): python fxmllib.py -c file.xml

On Fri, Apr 20 2001 Andrew Kuchling wrote:

> Has anyone written code for producing XML in Canonical XML format?
> (http://www.w3.org/TR/xml-c14n)
>
> --amk
>
>
>
>

-- Sjoerd Mullender

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <2357.988041096.2@bireme.oratrix.nl>
Content-Description: Validating XML Parser
Content-Disposition: attachment; filename="fxmllib.py"
Content-Transfer-Encoding: quoted-printable

__version__ =3D "$Id: fxmllib.py,v 1.2 2001/04/20 15:12:49 sjoerd Exp $"

import re, string
import sys                              # need for CanonXMLParser

class Error(Exception):
    """Error class; raised when a syntax error is encountered.
       Instance variables are:
       lineno: line at which error was found;
       offset: offset into data where error was found;
       text: data in which error was found.
       If these values are unknown, they are set to None."""
    lineno =3D offset =3D text =3D filename =3D None
    def __init__(self, *args):
        self.args =3D args
        if len(args) > 1:
            self.lineno =3D args[1]
            if len(args) > 2:
                self.text =3D args[2]
                if len(args) > 3:
                    self.offset =3D args[3]
                    if len(args) > 4:
                        self.filename =3D args[4]
    def __str__(self):
        if self.filename:
            if self.lineno:
                msg =3D '"%s", line %d: ' % (self.filename, self.lineno)
            else:
                msg =3D '"%s": ' % self.filename
        elif self.lineno:
            msg =3D 'line %d: ' % self.lineno
        else:
            msg =3D ''
        return '%sSyntax error: %s' % (msg, self.args[0])

# The character sets below are taken directly from the XML spec.
_BaseChar =3D u'\u0041-\u005A\u0061-\u007A\u00C0-\u00D6\u00D8-\u00F6\u00F8= -\u00FF' \ u'\u0100-\u0131\u0134-\u013E\u0141-\u0148\u014A-\u017E' \ u'\u0180-\u01C3\u01CD-\u01F0\u01F4-\u01F5\u01FA-\u0217' \ u'\u0250-\u02A8\u02BB-\u02C1\u0386\u0388-\u038A\u038C' \ u'\u038E-\u03A1\u03A3-\u03CE\u03D0-\u03D6\u03DA\u03DC\u03DE' \= u'\u03E0\u03E2-\u03F3\u0401-\u040C\u040E-\u044F\u0451-\u045C' = \ u'\u045E-\u0481\u0490-\u04C4\u04C7-\u04C8\u04CB-\u04CC' \ u'\u04D0-\u04EB\u04EE-\u04F5\u04F8-\u04F9\u0531-\u0556\u0559' = \ u'\u0561-\u0586\u05D0-\u05EA\u05F0-\u05F2\u0621-\u063A' \ u'\u0641-\u064A\u0671-\u06B7\u06BA-\u06BE\u06C0-\u06CE' \ u'\u06D0-\u06D3\u06D5\u06E5-\u06E6\u0905-\u0939\u093D' \ u'\u0958-\u0961\u0985-\u098C\u098F-\u0990\u0993-\u09A8' \ u'\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09DC-\u09DD\u09DF-\u09E1' = \ u'\u09F0-\u09F1\u0A05-\u0A0A\u0A0F-\u0A10\u0A13-\u0A28' \ u'\u0A2A-\u0A30\u0A32-\u0A33\u0A35-\u0A36\u0A38-\u0A39' \ u'\u0A59-\u0A5C\u0A5E\u0A72-\u0A74\u0A85-\u0A8B\u0A8D' \ u'\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2-\u0AB3' \ u'\u0AB5-\u0AB9\u0ABD\u0AE0\u0B05-\u0B0C\u0B0F-\u0B10' \ u'\u0B13-\u0B28\u0B2A-\u0B30\u0B32-\u0B33\u0B36-\u0B39\u0B3D' = \ u'\u0B5C-\u0B5D\u0B5F-\u0B61\u0B85-\u0B8A\u0B8E-\u0B90' \ u'\u0B92-\u0B95\u0B99-\u0B9A\u0B9C\u0B9E-\u0B9F\u0BA3-\u0BA4' = \ u'\u0BA8-\u0BAA\u0BAE-\u0BB5\u0BB7-\u0BB9\u0C05-\u0C0C' \ u'\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C33\u0C35-\u0C39' \ u'\u0C60-\u0C61\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8' \ u'\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CDE\u0CE0-\u0CE1\u0D05-\u0D0C' = \ u'\u0D0E-\u0D10\u0D12-\u0D28\u0D2A-\u0D39\u0D60-\u0D61' \ u'\u0E01-\u0E2E\u0E30\u0E32-\u0E33\u0E40-\u0E45\u0E81-\u0E82' = \ u'\u0E84\u0E87-\u0E88\u0E8A\u0E8D\u0E94-\u0E97\u0E99-\u0E9F' \= u'\u0EA1-\u0EA3\u0EA5\u0EA7\u0EAA-\u0EAB\u0EAD-\u0EAE\u0EB0' \= u'\u0EB2-\u0EB3\u0EBD\u0EC0-\u0EC4\u0F40-\u0F47\u0F49-\u0F69' = \ u'\u10A0-\u10C5\u10D0-\u10F6\u1100\u1102-\u1103\u1105-\u1107' = \ u'\u1109\u110B-\u110C\u110E-\u1112\u113C\u113E\u1140\u114C' \ 
u'\u114E\u1150\u1154-\u1155\u1159\u115F-\u1161\u1163\u1165' \ u'\u1167\u1169\u116D-\u116E\u1172-\u1173\u1175\u119E\u11A8' \ u'\u11AB\u11AE-\u11AF\u11B7-\u11B8\u11BA\u11BC-\u11C2\u11EB' \= u'\u11F0\u11F9\u1E00-\u1E9B\u1EA0-\u1EF9\u1F00-\u1F15' \ u'\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59' = \ u'\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBE' \= u'\u1FC2-\u1FC4\u1FC6-\u1FCC\u1FD0-\u1FD3\u1FD6-\u1FDB' \ u'\u1FE0-\u1FEC\u1FF2-\u1FF4\u1FF6-\u1FFC\u2126\u212A-\u212B' = \ u'\u212E\u2180-\u2182\u3041-\u3094\u30A1-\u30FA\u3105-\u312C' = \ u'\uAC00-\uD7A3' _Ideographic =3D u'\u4E00-\u9FA5\u3007\u3021-\u3029' _CombiningChar =3D u'\u0300-\u0345\u0360-\u0361\u0483-\u0486\u0591-\u05A1\= u05A3-\u05B9' \ u'\u05BB-\u05BD\u05BF\u05C1-\u05C2\u05C4\u064B-\u0652\u06= 70' \ u'\u06D6-\u06DC\u06DD-\u06DF\u06E0-\u06E4\u06E7-\u06E8' \= u'\u06EA-\u06ED\u0901-\u0903\u093C\u093E-\u094C\u094D' \ u'\u0951-\u0954\u0962-\u0963\u0981-\u0983\u09BC\u09BE\u09= BF' \ u'\u09C0-\u09C4\u09C7-\u09C8\u09CB-\u09CD\u09D7\u09E2-\u0= 9E3' \ u'\u0A02\u0A3C\u0A3E\u0A3F\u0A40-\u0A42\u0A47-\u0A48' \ u'\u0A4B-\u0A4D\u0A70-\u0A71\u0A81-\u0A83\u0ABC\u0ABE-\u0= AC5' \ u'\u0AC7-\u0AC9\u0ACB-\u0ACD\u0B01-\u0B03\u0B3C\u0B3E-\u0= B43' \ u'\u0B47-\u0B48\u0B4B-\u0B4D\u0B56-\u0B57\u0B82-\u0B83' \= u'\u0BBE-\u0BC2\u0BC6-\u0BC8\u0BCA-\u0BCD\u0BD7\u0C01-\u0= C03' \ u'\u0C3E-\u0C44\u0C46-\u0C48\u0C4A-\u0C4D\u0C55-\u0C56' \= u'\u0C82-\u0C83\u0CBE-\u0CC4\u0CC6-\u0CC8\u0CCA-\u0CCD' \= u'\u0CD5-\u0CD6\u0D02-\u0D03\u0D3E-\u0D43\u0D46-\u0D48' \= u'\u0D4A-\u0D4D\u0D57\u0E31\u0E34-\u0E3A\u0E47-\u0E4E\u0E= B1' \ u'\u0EB4-\u0EB9\u0EBB-\u0EBC\u0EC8-\u0ECD\u0F18-\u0F19\u0= F35' \ u'\u0F37\u0F39\u0F3E\u0F3F\u0F71-\u0F84\u0F86-\u0F8B' \ u'\u0F90-\u0F95\u0F97\u0F99-\u0FAD\u0FB1-\u0FB7\u0FB9' \ u'\u20D0-\u20DC\u20E1\u302A-\u302F\u3099\u309A' _Digit =3D u'\u0030-\u0039\u0660-\u0669\u06F0-\u06F9\u0966-\u096F\u09E6-\u= 09EF' \ u'\u0A66-\u0A6F\u0AE6-\u0AEF\u0B66-\u0B6F\u0BE7-\u0BEF' \ 
u'\u0C66-\u0C6F\u0CE6-\u0CEF\u0D66-\u0D6F\u0E50-\u0E59' \ u'\u0ED0-\u0ED9\u0F20-\u0F29' _Extender =3D u'\u00B7\u02D0\u02D1\u0387\u0640\u0E46\u0EC6\u3005\u3031-\u3= 035' \ u'\u309D-\u309E\u30FC-\u30FE' _Letter =3D _BaseChar + _Ideographic _NameChar =3D '-' + _Letter + _Digit + '._:' + _CombiningChar + _Extender _S =3D '[ \t\r\n]+' # white space _opS =3D '[ \t\r\n]*' # optional white space _Name =3D '['+_Letter+'_:]['+_NameChar+']*' # XML Name _QStr =3D "(?:'[^']*'|\"[^\"]*\")" # quoted XML string _Char =3D u'\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD' # legal characters comment =3D re.compile('') space =3D re.compile(_S) interesting =3D re.compile('[&<]') amp =3D re.compile('&') name =3D re.compile('^'+_Name+'$') names =3D re.compile('^'+_Name+'(?:'+_S+_Name+')*$') ref =3D re.compile('&(?:(?P'+_Name+')|#(?P(?:[0-9]+|x[0-9a-fA-= F]+)));') entref =3D re.compile('(?:&#(?P(?:[0-9]+|x[0-9a-fA-F]+))|%(?P= '+_Name+'));') _attrre =3D _S+'(?P'+_Name+')'+_opS+'=3D'+_opS+'(?P'+= _QStr+')' attrfind =3D re.compile(_attrre) starttag =3D re.compile('<(?P'+_Name+')(?P(?:'+_attrre+')*= )'+_opS+'(?P/?)>') endtag =3D re.compile(''+_Name+')'+_opS+'>') illegal =3D re.compile(r'\]\]>') illegal1 =3D re.compile('[^'+_Char+']') cdata =3D re.compile('(?:[^]]|\\](?!\\]>)|\\]\\](?!= >))*)\\]\\]>') _SystemLiteral =3D '(?P'+_QStr+')' _PublicLiteral =3D '(?P"[-\'()+,./:=3D?;!*#@$_%% \n\ra-zA-Z0-9]*"|= ' \ "'[-()+,./:=3D?;!*#@$_%% \n\ra-zA-Z0-9]*')" _ExternalId =3D '(?:SYSTEM|PUBLIC'+_S+_PublicLiteral+')'+_S+_SystemLiteral= externalid =3D re.compile(_ExternalId) ndata =3D re.compile(_S+'NDATA'+_S+'(?P'+_Name+')') doctype =3D re.compile(''+_Name+')(?:'+_S+_Exte= rnalId+')?'+_opS+'(?:\\[(?P(?:'+_S+'|%'+_Name+';|'+comment.pattern+'= |<(?:![^-]|[^!])(?:[^\'">]|\'[^\']*\'|"[^"]*")*>)*)\\]'+_opS+')?>') xmldecl =3D re.compile('<\?xml'+ _S+'version'+_opS+'=3D'+_opS+'(?P'+_QStr+')'= + '(?:'+_S+'encoding'+_opS+'=3D'+_opS+ "(?P'[A-Za-z][-A-Za-z0-9._]*'|" '"[A-Za-z][-A-Za-z0-9._]*"))?' 
'(?:'+_S+'standalone'+_opS+'=3D'+_opS+ '(?P\'(?:yes|no)\'|"(?:yes|no)"))?'+ _opS+'\?>') textdecl =3D re.compile('<\?xml' '(?:'+_S+'version'+_opS+'=3D'+_opS+'(?P'+_Q= Str+'))?'+ '(?:'+_S+'encoding'+_opS+'=3D'+_opS+ "(?P'[A-Za-z][-A-Za-z0-9._]*'|" '"[A-Za-z][-A-Za-z0-9._]*"))?'+ _opS+'\?>') pidecl =3D re.compile('<\\?(?![xX][mM][lL][ \t\r\n?])(?P'+_Name+')(?= :'+_S+'(?P(?:[^?]|\\?(?!>))*))?\\?>') # XML NAMESPACES _NCName =3D '['+_Letter+'_]['+'-' + _Letter + _Digit + '._' + _CombiningCh= ar + _Extender+']*' # XML Name, minus the ":" ncname =3D re.compile(_NCName + '$') qname =3D re.compile('(?:(?P' + _NCName + '):)?' # optional prefix= '(?P' + _NCName + ')$') xmlns =3D re.compile('xmlns(?::(?P' + _NCName + '))?$') # DOCTYPE _Nmtoken =3D '['+_NameChar+']+' nmtoken =3D re.compile('^'+_Nmtoken+'$') nmtokens =3D re.compile('^'+_Nmtoken+'(?:'+_S+_Nmtoken+')*$') element =3D re.compile(''+_Name+')'+_S+r'(?PEMPTY|ANY|$)') dfaelem0 =3D re.compile(_opS+r'(?P\(|'+_Name+')') dfaelem1 =3D re.compile(_opS+r'(?P[)|,])') dfaelem2 =3D re.compile(r'(?P[+*?])') mixedre =3D re.compile(r'\('+_opS+'#PCDATA'+'(('+_opS+r'\|'+_opS+_Name+')*= '+_opS+r'$\*|'+_opS+r'\))') paren =3D re.compile('[()]') attdef =3D re.compile(_S+'(?P'+_Name+')'+_S+'(?PCDATA|ID(?= :REFS?)?|ENTIT(?:Y|IES)|NMTOKENS?|NOTATION'+_S+r'$(?P'+_opS+_Na= me+'(?:'+_opS+r'\|'+_opS+_Name+')*'+_opS+r')$|$'+_opS+_Nmtoken+'(?:'+_op= S+r'\|'+_opS+_Nmtoken+')*'+_opS+r'$)'+_S+'(?P#REQUIRED|#IMPLIED|= (?:#FIXED'+_S+')?(?P'+_QStr+'))') attlist =3D re.compile(''+_Name+')(?P(?:'= +attdef.pattern+')*)'+_opS+'>') _EntityVal =3D '"(?:[^"&%]|'+ref.pattern+'|%'+_Name+';)*"|' \ "'(?:[^'&%]|"+ref.pattern+"|%"+_Name+";)*'" entity =3D re.compile(''+_Name+')'+_S+'(= ?P'+_EntityVal+'|'+_ExternalId+')|(?P'+_Name+')'+_S+'(?P'+_EntityVal+'|'+_ExternalId+'(?:'+_S+'NDATA'+_S+_Name+')?))'+_opS+'>'= ) notation =3D re.compile(''+_Name+')'+_S+'(?PSYSTEM'+_S+_SystemLiteral+'|PUBLIC'+_S+_PublicLiteral+'(?:'+_S+_SystemLi= teral+')?)'+_opS+'>') peref =3D 
re.compile('%(?P'+_Name+');') ignore =3D re.compile(r'') bracket =3D re.compile('[<>\'"%]') conditional =3D re.compile(r'INCLUDE)|(?PIGNOR= E))'+_opS+r'\[') class XMLParser: """XMLParser([ xmlns ]) -> instance XML document parser. There is one optional argument: xmlns: understand XML Namespaces (default is 1).""" def __init__(self, xmlns =3D 1): self.__xmlns =3D xmlns # whether or not to parse namesp= aces self.reset() def reset(self): """reset() Reset parser to pristine state.""" self.docname =3D None # The outermost element in the d= ocument (according to the DTD) self.rawdata =3D [] self.entitydefs =3D { # & entities defined in DTD (plu= s the default ones) 'lt': '<', # < 'gt': '>', # > 'amp': '&', # & 'apos': ''', # ' 'quot': '"', # " } self.pentitydefs =3D {} # % entities defined in DTD self.elems =3D {} # elements and their content/att= rs self.baseurl =3D '.' # base URL for external DTD self.ids =3D {} # IDs encountered in document self.notation =3D {} # NOTATIONs self.doctype =3D None def feed(self, data): """feed(data) Feed data to parser.""" self.rawdata.append(data) def close(self): """close() End of data, finish up parsing.""" # Actually, this is where we start parsing. data =3D string.join(self.rawdata, '') self.rawdata =3D [] self.parse(data) def __parse_textdecl(self, data, document =3D 0): # Figure out the encoding of a file by looking at the first # few bytes and the tag that may come at the very # beginning of the file. # This will convert the data to unicode from whatever format # it was originally. 
i =3D 0 if data[:2] =3D=3D '\376\377': # UTF-16, big-endian enc =3D 'utf-16-be' i =3D 2 elif data[:2] =3D=3D '\377\376': # UTF-16, little-endian enc =3D 'utf-16-le' i =3D 2 elif data[:4] =3D=3D '\x00\x3C\x00\x3F': # UTF-16, big-endian enc =3D 'utf-16-be' elif data[:4] =3D=3D '\x3C\x00\x3F\x00': # UTF-16, little-endian enc =3D 'utf-16-le' else: enc =3D None # unknowns as yet if enc: try: data =3D unicode(data[i:], enc) except UnicodeError: self.__error("data cannot be converted to Unicode", data, = i, self.baseurl, fatal =3D 1) i =3D 0 # optional XMLDecl if document: res =3D xmldecl.match(data, i) else: res =3D textdecl.match(data, i) if res is not None: if document: version, encoding, standalone =3D res.group('version', 'encoding', 'standalone') else: version, encoding =3D res.group('version', 'encoding') standalone =3D None if version is not None and version[1:-1] !=3D '1.0': self.__error('only XML version 1.0 supported', data, res.s= tart('version'), self.baseurl, fatal =3D 1) if encoding: encoding =3D encoding[1:-1] if enc and enc !=3D encoding.lower() and \ enc[:6] !=3D encoding.lower(): self.__error("declared encoding doesn't match actual e= ncoding", data, res.start('encoding'), self.baseurl, fatal =3D 1) enc =3D encoding.lower() if standalone: standalone =3D standalone[1:-1] ## self.handle_xml(encoding, standalone) i =3D res.end(0) if enc is None: # default is UTF 8 enc =3D 'utf-8' if type(data) is not type(u'a'): try: data =3D unicode(data[i:], enc) except UnicodeError: self.__error("data cannot be converted to Unicode", data, = i, self.baseurl, fatal =3D 1) else: data =3D data[i:] return data def __normalize_linefeed(self, data): # normalize line endings: first \r\n -> \n, then \r -> \n return u'\n'.join(u'\n'.join(data.split(u'\r\n')).split(u'\r')) def __normalize_space(self, data): # normalize white space: tab, linefeed and carriage return -> spac= e data =3D ' '.join(data.split('\t')) data =3D ' '.join(data.split('\n')) data =3D u' 
'.join(data.split('\r')) return data def parse(self, data): """parse(data) Parse the data as an XML document.""" from time import time t0 =3D time() data =3D self.__parse_textdecl(data, 1) data =3D self.__normalize_linefeed(data) # (Comment | PI | S)* i =3D self.__parse_misc(data, 0) # doctypedecl? res =3D doctype.match(data, i) if res is not None and self.doctype is None: docname, publit, syslit, docdata =3D res.group('docname', 'pub= lit', 'syslit', 'data') self.docname =3D docname if publit: publit =3D string.join(string.split(publit[1:-1])) if syslit: syslit =3D syslit[1:-1] self.handle_doctype(docname, publit, syslit, docdata) i =3D res.end(0) elif self.doctype: # do as if there was a declaration self.handle_doctype(None, '', self.doctype, '') else: # self.doctype =3D=3D '' or no DOCTYPE # ignore DOCTYPE self.doctype =3D None t1 =3D time() # (Comment | PI | S)* i =3D self.__parse_misc(data, i) # the document itself res =3D starttag.match(data, i) if res is None: self.__error('no elements in document', data, i, self.baseurl,= fatal =3D 1) i =3D res.end(0) tagname, slash =3D res.group('tagname', 'slash') if self.docname and tagname !=3D self.docname: self.__error('starttag does not match DOCTYPE', data, res.star= t('tagname'), self.baseurl, fatal =3D 0) val =3D self.__parse_attrs(tagname, data, res.start('tagname'), re= s.span('attrs'), None) if val is None: return nstag, attrs, namespaces =3D val self.finish_starttag(nstag, attrs) if not slash: i =3D self.__parse_content(data, i, tagname, namespaces) if i is None: return if type(i) is type(res): res =3D i else: res =3D endtag.match(data, i) if res is None: self.__error('end tag missing', data, i, self.baseurl, fat= al =3D 0) elif res.group('tagname') !=3D tagname: self.__error("end tag doesn't match start tag", data, res.= start('tagname'), self.baseurl, fatal =3D 0) i =3D res.end(0) self.finish_endtag(nstag) i =3D self.__parse_misc(data, i) if i !=3D len(data): self.__error('garbage at end of document', data, 
i, self.baseu= rl, fatal =3D 0) t2 =3D time() return t0, t1, t2 def __parse_misc(self, data, i): # match any number of whitespace, processing instructions and comm= ents matched =3D 1 while matched: matched =3D 0 res =3D comment.match(data, i) if res is not None: matched =3D 1 c0, c1 =3D res.span('comment') ires =3D illegal1.search(data, c0, c1) if ires is not None: self.__error('illegal characters in comment', data, ir= es.start(0), self.baseurl, fatal =3D 0) self.handle_comment(data[c0:c1]) i =3D res.end(0) res =3D pidecl.match(data, i) if res is not None: matched =3D 1 c0, c1 =3D res.span('data') ires =3D illegal1.search(data, c0, c1) if ires is not None: self.__error('illegal characters in Processing Instruc= tion', data, ires.start(0), self.baseurl, fatal =3D 0) self.handle_proc(res.group('name'), res.group('data') or '= ') i =3D res.end(0) res =3D space.match(data, i) if res is not None: matched =3D 1 i =3D res.end(0) return i def __update_state(self, dfa, states, tagname): # update the list of states in the dfa. 
If tagname is None, # we're looking for the final state, so return a list of all # states reachable using epsilon transitions nstates =3D [] seenstates =3D {} while states: s =3D states[0] seenstates[s] =3D 1 del states[0] if tagname is not None and dfa[s].has_key(tagname): nstates =3D dfa[s][tagname][:] else: for s in dfa[s].get('', []): if not seenstates.has_key(s): states.append(s) if tagname is None: nstates =3D seenstates.keys() states[:] =3D nstates # change in-line def __check_dfa(self, dfa, initstate, tagname, data, i): states =3D [initstate] possibles =3D {} seenstates =3D {} while states: s =3D states[0] seenstates[s] =3D 1 del states[0] for tag in dfa[s].keys(): if tag and possibles.has_key(tag): self.__error("non-deterministic content model for `%s'= " % tagname, data, i, self.baseurl, fatal =3D 0) possibles[tag] =3D 1 for s in dfa[s].get('', []): if not seenstates.has_key(s): states.append(s) def __parse_content(self, data, i, ptagname, namespaces, states =3D No= ne): # parse the content of an element (i.e. 
the string between # start tag and end tag) datalen =3D len(data) if self.elems.has_key(ptagname): content, attributes, start, end =3D self.elems[ptagname][:4] #= content model if states =3D=3D None: states =3D [start] else: content =3D None # unknown content model while i < datalen: matched =3D 0 res =3D interesting.search(data, i) if res is None: j =3D datalen else: j =3D res.start(0) if j > i: res =3D illegal.search(data, i, j) if res is not None: self.__error("illegal data content in element `%s'" % = ptagname, data, i, self.baseurl, fatal =3D 0) skip =3D 0 complain =3D 0 if content is not None: res =3D space.match(data, i, j) isspace =3D res is not None and res.span(0) =3D=3D (i,= j) if content =3D=3D 'EMPTY': complain =3D 1 skip =3D 1 elif not isspace and type(content) is type([]) and co= ntent and type(content[0]) is type({}): complain =3D 1 if complain: self.__error("no character data allowed in element= `%s'" % ptagname, data, i, self.baseurl, fatal =3D 0) matched =3D 1 if not skip: self.handle_data(data[i:j]) i =3D j res =3D starttag.match(data, i) if res is not None: tagname, slash =3D res.group('tagname', 'slash') if content =3D=3D 'EMPTY' or content =3D=3D '#PCDATA': self.__error("empty element `%s' has content" % ptagna= me, data, res.start(0), self.baseurl, fatal =3D 0) elif content =3D=3D 'ANY': # always OK pass elif type(content) is type([]) and content and type(conten= t[0]) is not type({}): # mixed if tagname not in content: self.__error("illegal content in element `%s'" % p= tagname, data, res.start(0), self.baseurl, fatal =3D 0) elif content is not None: self.__update_state(content, states, tagname) if not states: self.__error("illegal content for element `%s'" % = ptagname, data, i, self.baseurl) val =3D self.__parse_attrs(tagname, data, res.start('tagna= me'), res.span('attrs'), namespaces) if val is None: return i =3D res.end(0) nstag, attrs, subnamespaces =3D val self.finish_starttag(nstag, attrs) if not slash: i =3D 
self.__parse_content(data, i, tagname, subnamesp= aces) if i is None: return if type(i) is type(res): res =3D i else: res =3D endtag.match(data, i) if res is None: self.__error('end tag missing', data, i, self.base= url, fatal =3D 0) elif res.group('tagname') !=3D tagname: self.__error("end tag doesn't match start tag", da= ta, res.start('tagname'), self.baseurl, fatal =3D 0) i =3D res.end(0) self.finish_endtag(nstag) matched =3D 1 res =3D endtag.match(data, i) if res is not None: if type(content) is type([]) and content and type(content[= 0]) is type({}): self.__update_state(content, states, None) if end not in states: self.__error("content of element `%s' doesn't matc= h content model" % ptagname, data, i, self.baseurl, fatal =3D 0) return res res =3D comment.match(data, i) if res is not None: c0, c1 =3D res.span('comment') ires =3D illegal1.search(data, c0, c1) if ires is not None: self.__error('illegal characters in comment', data, ir= es.start(0), self.baseurl, fatal =3D 0) self.handle_comment(data[c0:c1]) i =3D res.end(0) matched =3D 1 res =3D ref.match(data, i) if res is not None: name =3D res.group('name') if name: if self.entitydefs.has_key(name): sval =3D val =3D self.entitydefs[name] baseurl =3D self.baseurl if type(val) is type(()): if val[2] is not None: apply(self.handle_ndata, val) val =3D None else: val =3D self.__read_pentity(val[0], val[1]= ) if val is not None: del self.entitydefs[name] # to break recursion= n =3D self.__parse_content(val, 0, ptagname, n= amespaces, states) self.entitydefs[name] =3D sval # restore value= if val is not None: if n is None: self.baseurl =3D baseurl return if type(n) is type(res) or n !=3D len(val): if type(n) is type(res): n =3D res.start(0) self.__error('misformed entity value', dat= a, n, self.baseurl, fatal =3D 0) self.baseurl =3D baseurl else: if self.docname: self.__error("unknown entity reference `&%s;' = in element `%s'" % (name, ptagname), data, i, self.baseurl, fatal =3D 0) self.data =3D data self.offset 
=3D res.start('name') self.lineno =3D string.count(data, '\n', 0, self.o= ffset) self.unknown_entityref(name) else: str =3D self.__parse_charref(res.group('char'), data, = res.start(0)) if str is None: return self.handle_data(str) i =3D res.end(0) matched =3D 1 res =3D pidecl.match(data, i) if res is not None: matched =3D 1 c0, c1 =3D res.span('data') ires =3D illegal1.search(data, c0, c1) if ires is not None: self.__error('illegal characters in Processing Instruc= tion', data, ires.start(0), self.baseurl, fatal =3D 0) self.handle_proc(res.group('name'), res.group('data') or '= ') i =3D res.end(0) res =3D cdata.match(data, i) if res is not None: matched =3D 1 c0, c1 =3D res.span('cdata') ires =3D illegal1.search(data, c0, c1) if ires is not None: self.__error('illegal characters in CDATA section', da= ta, ires.start(0), self.baseurl, fatal =3D 0) self.handle_cdata(res.group('cdata')) i =3D res.end(0) if not matched: self.__error("no valid content in element `%s'" % ptagname= , data, i, self.baseurl) return return i def __check_attr(self, tagname, attrname, value, attributes, data, att= rstart): # check that the attribute attrname on element tagname is of # the correct type with a legal value # return the normalized value (i.e. 
white space collapsed if # appropriate) # XXX this method needs work to be complete attype, atvalue, atstring =3D attributes[attrname] if atvalue[:6] =3D=3D '#FIXED': if value !=3D atstring: self.__error("attribute `%s' in element `%s' does not have= correct value" % (attrname, tagname), data, attrstart, self.baseurl, fata= l =3D 0) if attype =3D=3D 'CDATA': return value # always OK and don't change value= if type(attype) is type([]): # enumeration if value not in attype: self.__error("attribute `%s' in element `%s' not valid" % = (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) return value if type(attype) is type(()): if value not in attype[1]: self.__error("attribute `%s' in element `%s' not valid" % = (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) return value if attype =3D=3D 'ID': if name.match(value) is None: self.__error("attribute `%s' in element `%s' is not an ID"= % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) if self.ids.has_key(value): self.__error("attrbute `%s' in element `%s' is not unique"= % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) self.ids[value] =3D 1 return value if attype =3D=3D 'IDREF': if name.match(value) is None: self.__error("attrbute `%s' in element `%s' is not an IDRE= F" % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) # XXX should check ID exists return value if attype =3D=3D 'IDREFS': if names.match(value) is None: self.__error("attrbute `%s' in element `%s' is not an IDRE= FS" % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) # XXX should check IDs exist return value if attype =3D=3D 'NMTOKEN': if nmtoken.match(value) is None: self.__error("attrbute `%s' in element `%s' is not a NMTOK= EN" % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) return value if attype =3D=3D 'NMTOKENS': if nmtokens.match(value) is None: self.__error("attrbute `%s' in element `%s' is not a NMTOK= ENS" % (attrname, tagname), data, 
attrstart, self.baseurl, fatal =3D 0) return value if attype =3D=3D 'ENTITY': if name.match(value) is None: self.__error("attrbute `%s' in element `%s' is not an ENTI= TY" % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) # XXX should check ENTITY exists return value if attype =3D=3D 'ENTITIES': if names.match(value) is None: self.__error("attrbute `%s' in element `%s' is not an ENTI= TIES" % (attrname, tagname), data, attrstart, self.baseurl, fatal =3D 0) # XXX should check ENTITIES exist return value # XXX other types? return value def __parse_attrs(self, tagname, data, tagstart, span, namespaces): # parse the string between the tag name and closing bracket # for attribute=3Dvalue pairs i, dataend =3D span attrlist =3D [] namespace =3D None reqattrs =3D {} # attributes that are #REQUIRED if self.elems.has_key(tagname): attributes =3D self.elems[tagname][1] for key, (attype, atvalue, atstring) in attributes.items(): if atvalue =3D=3D '#REQUIRED': reqattrs[key] =3D 1 attrseen =3D {} # attributes that we've seen else: attributes =3D None while i < dataend: res =3D attrfind.match(data, i, dataend) if res is None: # couldn't match any attributes, but there is more # string to parse: complain and ignore rest of string self.__error('bad attributes', data, i, self.baseurl, fata= l =3D 0) return name =3D res.group('attrname') if reqattrs.has_key(name): del reqattrs[name] # seen this #REQUIRED attribute if attributes is not None and attributes.has_key(name): attype =3D attributes[name][0] else: attype =3D None start, end =3D res.span('attrvalue') value =3D self.__parse_attrval(data, attype, span =3D (start+1= , end-1)) if value is None: # bad attribute value: ignore, but continue parsing i =3D res.end(0) continue attrstart =3D res.start('attrname') if attributes is not None: if attributes.has_key(name): attrseen[name] =3D 1 value =3D self.__check_attr(tagname, name, value, attr= ibutes, data, attrstart) else: self.__error("unknown attribute `%s' on element 
`%s'" = % (name, tagname), data, attrstart, self.baseurl, fatal =3D 0) i =3D res.end(0) if self.__xmlns: res =3D xmlns.match(name) if res is not None: # namespace declaration ncname =3D res.group('ncname') if namespace is None: namespace =3D {} namespace[ncname or ''] =3D value or None continue attrlist.append((name, value, attrstart)) if reqattrs: # there are #REQUIRED attributes that we haven't seen reqattrs =3D reqattrs.keys() reqattrs.sort() if len(reqattrs) > 1: s =3D 's' else: s =3D '' reqattrs =3D string.join(reqattrs, "', `") self.__error("required attribute%s `%s' of element `%s' missin= g" % (s, reqattrs, tagname), data, dataend, self.baseurl, fatal =3D 0) if attributes is not None: # fill in missing attributes that have a default value for key, (attype, atvalue, atstring) in attributes.items(): if atstring is not None and not attrseen.has_key(key): attrlist.append((key, atstring, dataend)) if namespace is not None: namespaces =3D (namespace, namespaces) if namespaces is not None: res =3D qname.match(tagname) if res is not None: prefix, nstag =3D res.group('prefix', 'local') if prefix is None: prefix =3D '' ns =3D None n =3D namespaces while n is not None: d, n =3D n if d.has_key(prefix): ns =3D d[prefix] break if ns is not None: tagname =3D ns + ' ' + nstag elif prefix !=3D '': self.__error("unknown namespace prefix `%s'" % prefix,= data, tagstart, self.baseurl, fatal =3D 0) else: self.__error("badly formed tag name `%s'" % tagname, data,= tagstart, self.baseurl, fatal =3D 0) attrdict =3D {} # collect attributes/values for attr, value, attrstart in attrlist: if namespaces is not None: res =3D qname.match(attr) if res is not None: prefix, nsattr =3D res.group('prefix', 'local') if prefix: ans =3D None n =3D namespaces while n is not None: d, n =3D n if d.has_key(prefix): ans =3D d[prefix] break if ans is not None: attr =3D ans + ' ' + nsattr elif prefix !=3D '': self.__error("unknown namespace prefix `%s'" %= prefix, data, attrstart, self.baseurl, fatal 
=3D 0) else: self.__error("badly formed attribute name `%s'" % attr= , data, attrstart, self.baseurl, fatal =3D 0) if attrdict.has_key(attr): self.__error("duplicate attribute name `%s'" % attr, data,= attrstart, self.baseurl, fatal =3D 0) attrdict[attr] =3D value return tagname, attrdict, namespaces def __parse_attrval(self, data, attype, span =3D None): # parse an attribute value, replacing entity and character # references with their values if span is None: i =3D 0 dataend =3D len(data) else: i, dataend =3D span res =3D illegal1.search(data, i, dataend) if res is not None: self.__error("illegal characters in attribute value", data, re= s.start(0), self.baseurl, fatal =3D 0) newval =3D [] while i < dataend: res =3D interesting.search(data, i, dataend) if res is None: str =3D data[i:dataend] if attype is None or attype =3D=3D 'CDATA': str =3D self.__normalize_space(str) newval.append(str) break j =3D res.start(0) if data[j] =3D=3D '<': self.__error("no `<' allowed in attribute value", data, j,= self.baseurl, fatal =3D 0) if j > i: str =3D data[i:j] if attype is None or attype =3D=3D 'CDATA': str =3D self.__normalize_space(str) newval.append(str) res =3D ref.match(data, j, dataend) if res is None: self.__error('illegal attribute value', data, j, self.base= url, fatal =3D 0) newval.append(data[j]) # the & i =3D j + 1 # continue searching after the &= continue i =3D res.end(0) name =3D res.group('name') if name: # entity reference (e.g. 
"<") if self.entitydefs.has_key(name): val =3D self.entitydefs[name] if type(val) is type(()): self.__error("no external parsed entity allowed in= attribute value", data, res.start(0), self.baseurl, fatal =3D 1) del self.entitydefs[name] nval =3D self.__parse_attrval(val, attype) self.entitydefs[name] =3D val if nval is None: return newval.append(nval) else: self.__error("reference to unknown entity `%s'" % name= , data, res.start(0), self.baseurl, fatal =3D 0) newval.append('&%s;' % name) else: val =3D self.__parse_charref(res.group('char'), data, res.= start(0)) if val is None: newval.append('&#%s;' % res.group('char')) continue newval.append(val) str =3D string.join(newval, '') if attype is not None and attype !=3D 'CDATA': str =3D string.join(string.split(str)) return str def __parse_charref(self, name, data, i): # parse a character reference (e.g. "%#38;") # the "name" arg is just part between # and ; if name[0] =3D=3D 'x': # e.g. & n =3D int(name[1:], 16) else: # e.g. & n =3D int(name) try: c =3D unichr(n) except ValueError: self.__error('bad character reference', data, i, self.baseurl,= fatal =3D 0) return if illegal1.search(c): self.__error('bad character reference', data, i, self.baseurl,= fatal =3D 0) return c def __read_pentity(self, publit, syslit): import urllib syslit =3D urllib.basejoin(self.baseurl, syslit) baseurl =3D self.baseurl self.baseurl =3D syslit val =3D self.read_external(publit, syslit) val =3D self.__parse_textdecl(val) return self.__normalize_linefeed(val) def parse_dtd(self, data, internal =3D 1): """parse_dtd(data[, internal ]) Parse the DTD. This method is called by the parse_doctype method and is provided so that parse_doctype can be overridden. Argument is a string containing the full DTD. 
Optional argument internal is true (default) if the DTD is internal.""" i =3D 0 matched =3D 1 ilevel =3D 0 # nesting level of ignored secti= ons while i < len(data) and matched: matched =3D 0 res =3D peref.match(data, i) if res is not None: matched =3D 1 name =3D res.group('name') if self.pentitydefs.has_key(name): val =3D self.pentitydefs[name] baseurl =3D self.baseurl if type(val) is type(()): val =3D self.__read_pentity(val[0], val[1]) self.parse_dtd(val, internal) self.baseurl =3D baseurl else: self.__error("unknown entity `%%%s;'" % name, data, i,= self.baseurl, fatal =3D 0) i =3D res.end(0) res =3D element.match(data, i) if res is not None: matched =3D 1 name, content =3D res.group('name', 'content') i =3D res.end(0) elemval =3D (None, {}, None, None, None) if self.elems.has_key(name): elemval =3D self.elems[name] if elemval[0] is not None: # XXX is this an error? self.__error('non-unique element name declaration'= , data, i, self.baseurl, fatal =3D 0) elif content =3D=3D 'EMPTY': # check for NOTATION on EMPTY element for atname, (attype, atvalue, atstring) in elemval= [1].items(): if type(attype) is type(()) and attype[0] =3D= =3D 'NOTATION': self.__error("NOTATION not allowed on EMPT= Y element", data, i, self.baseurl) if content[0] =3D=3D '(': i =3D res.start('content') j, content, start, end =3D self.__dfa(data, i) if type(content) is type([]) and content and type(cont= ent[0]) is type({}): self.__check_dfa(content, start, name, data, i) contentstr =3D data[i:j] i =3D j else: contentstr =3D content start =3D end =3D 0 self.elems[name] =3D (content, elemval[1], start, end, con= tentstr) res =3D space.match(data, i) if res is not None: i =3D res.end(0) if data[i:i+1] !=3D '>': self.__error('bad DOCTYPE', data, i, self.baseurl) return i =3D i+1 res =3D attlist.match(data, i) if res is not None: matched =3D 1 elname, atdef =3D res.group('elname', 'atdef') if not self.elems.has_key(elname): self.elems[elname] =3D (None, {}, None, None, None) ares =3D 
attdef.match(atdef) while ares is not None: atname, attype, atvalue, atstring =3D ares.group('atna= me', 'attype', 'atvalue', 'atstring') if attype[0] =3D=3D '(': attype =3D map(string.strip, string.split(attype[1= :-1], '|')) elif attype[:8] =3D=3D 'NOTATION': if self.elems[elname][0] =3D=3D 'EMPTY': self.__error("NOTATION not allowed on EMPTY el= ement", data, ares.start('attype'), self.baseurl) atnot =3D map(string.strip, string.split(ares.grou= p('notation'), '|')) attype =3D ('NOTATION', atnot) if atstring: atstring =3D atstring[1:-1] # remove quotes atstring =3D self.__parse_attrval(atstring, attype= ) if attype !=3D 'CDATA': atstring =3D string.join(string.split(atstring= )) else: atstring =3D string.join(string.split(atstring= , '\t'), ' ') if type(attype) is type([]): if atstring is not None and atstring not in attype= : self.__error("default value for attribute `%s'= on element `%s' not listed as possible value" % (atname, elname), data, i= , self.baseurl) elif type(attype) is type(()): if atstring is not None and atstring not in attype= [1]: self.__error("default value for attribute `%s'= on element `%s' not listed as possible value" % (atname, elname), data, i= , self.baseurl) if not self.elems[elname][1].has_key(atname): # first definition counts self.elems[elname][1][atname] =3D attype, atvalue,= atstring ares =3D attdef.match(atdef, ares.end(0)) i =3D res.end(0) res =3D entity.match(data, i) if res is not None: matched =3D 1 pname, name =3D res.group('pname', 'ename') if pname: pvalue =3D res.group('pvalue') if pvalue[0] in ('"',"'"): c0, c1 =3D res.span('pvalue') ires =3D illegal1.search(data, c0+1, c1-1) if ires is not None: self.__error("illegal characters in entity val= ue", data, ires.start(0), self.baseurl, fatal =3D 0) if self.pentitydefs.has_key(pname): # first definition counts pass elif pvalue[0] in ('"',"'"): pvalue =3D pvalue[1:-1] pvalue =3D self.__normalize_space(pvalue) cres =3D entref.search(pvalue) while cres is not None: chr, nm =3D 
cres.group('char', 'pname') if chr: repl =3D self.__parse_charref(cres.group('= char'), data, i) elif self.pentitydefs.has_key(nm): repl =3D self.pentitydefs[nm] else: self.__error("unknown entity `%s' referenc= ed" % nm, data, i, self.baseurl) repl =3D '%%%s;' % nm if type(repl) is type(()): baseurl =3D self.baseurl repl =3D self.__read_pentity(repl[0], repl= [1]) self.baseurl =3D baseurl pvalue =3D pvalue[:cres.start(0)] + repl + pva= lue[cres.end(0):] cres =3D entref.search(pvalue, cres.start(0)+l= en(repl)) self.pentitydefs[pname] =3D pvalue else: r =3D externalid.match(pvalue) publit, syslit =3D r.group('publit', 'syslit') if publit: publit =3D string.join(string.split(pub= lit[1:-1])) if syslit: syslit =3D syslit[1:-1] self.pentitydefs[pname] =3D publit, syslit else: value =3D res.group('value') if value[0] in ('"',"'"): c0, c1 =3D res.span('value') ires =3D illegal1.search(data, c0+1, c1-1) if ires is not None: self.__error("illegal characters in entity val= ue", data, ires.start(0), self.baseurl, fatal =3D 0) if self.entitydefs.has_key(name): # use first definition pass elif value[0] in ('"',"'"): value =3D value[1:-1] value =3D self.__normalize_space(value) cres =3D entref.search(value) while cres is not None: chr, nm =3D cres.group('char', 'pname') if chr: repl =3D self.__parse_charref(cres.group('= char'), data, i) elif self.pentitydefs.has_key(nm): repl =3D self.pentitydefs[nm] if type(repl) is type(()): baseurl =3D self.baseurl repl =3D self.__read_pentity(repl[0], = repl[1]) self.baseurl =3D baseurl else: self.__error("unknown entity `%s' referenc= ed" % nm, data, i, self.baseurl) repl =3D '%%%s;' % nm value =3D value[:cres.start(0)] + repl + value= [cres.end(0):] cres =3D entref.search(value, cres.start(0)+le= n(repl)) self.entitydefs[name] =3D value else: r =3D externalid.match(value) publit, syslit =3D r.group('publit', 'syslit') if publit: publit =3D string.join(string.split(pub= lit[1:-1])) if syslit: syslit =3D syslit[1:-1] r1 =3D 
ndata.match(value, r.end(0)) if r1 is not None: ndataname =3D r1.group('name') else: ndataname =3D None self.entitydefs[name] =3D publit, syslit, ndatanam= e i =3D res.end(0) res =3D notation.match(data, i) if res is not None: matched =3D 1 name, value =3D res.group('name', 'value') if not self.notation.has_key(name): self.notation[name] =3D value i =3D res.end(0) j =3D i # remember where we were i =3D self.__parse_misc(data, i) matched =3D matched or i > j # matched anything? if not internal: if data[i:i+1] =3D=3D '<': hlevel =3D 1 quote =3D None j =3D i+1 while hlevel > 0: res =3D bracket.search(data, j) if res is None: self.__error("unexpected EOF", data, i, self.b= aseurl, fatal =3D 1) j =3D res.end(0) c =3D data[res.start(0)] if c =3D=3D '<': hlevel =3D hlevel + 1 elif quote and c =3D=3D quote: quote =3D None elif c in ('"', "'"): quote =3D c elif c =3D=3D '>': hlevel =3D hlevel - 1 elif hlevel =3D=3D 1 and not quote: # only expand parsed entities at lowest level res =3D peref.match(data, res.start(0)) if res is not None: pname =3D res.group('name') if self.pentitydefs.has_key(pname): repl =3D self.pentitydefs[pname] if type(repl) is type(()): baseurl =3D self.baseurl repl =3D self.__read_pentity(repl[= 0], repl[1]) self.baseurl =3D baseurl data =3D data[:res.start(0)] + ' ' + r= epl + ' ' + data[res.end(0):] j =3D res.start(0) + len(repl) + 2 else: j =3D res.end(0) res =3D conditional.match(data, i) if res is not None: inc, ign =3D res.group('inc', 'ign') i =3D res.end(0) if ign: level =3D 1 while level > 0: res =3D ignore.search(data, i) if res.start(0) =3D=3D '<': level =3D level + 1 else: level =3D level - 1 i =3D res.end(0) elif inc: ilevel =3D ilevel + 1 if ilevel and data[i:i+3] =3D=3D ']]>': i =3D i+3 ilevel =3D ilevel - 1 if i < len(data): self.__error('error while parsing DOCTYPE', data, i, self.base= url) def __dfa(self, data, i): res =3D mixedre.match(data, i) if res is not None: mixed =3D res.group(0) if mixed[-1] =3D=3D '*': mixed =3D 
map(string.strip, string.split(mixed[1:-2], '|')= ) else: mixed =3D '#PCDATA' return res.end(0), mixed, 0, 0 dfa =3D [] i, start, end =3D self.__dfa1(data, i, dfa) return i, dfa, start, end def __dfa1(self, data, i, dfa): res =3D dfaelem0.match(data, i) if res is None: self.__error("syntax error in element content: `(' or Name exp= ecter", data, i, self.baseurl, fatal =3D 1) token =3D res.group('token') if token =3D=3D '(': i, start, end =3D self.__dfa1(data, res.end(0), dfa) res =3D dfaelem1.match(data, i) if res is None: self.__error("syntax error in element content: `)', `|', o= r `,' expected", data, i, self.baseurl, fatal =3D 1) token =3D res.group('token') sep =3D token while token in (',','|'): if sep !=3D token: self.__error("syntax error in element content: `%s' or= `)' expected" % sep, data, i, self.baseurl, fatal =3D 1) i, nstart, nend =3D self.__dfa1(data, res.end(0), dfa) res =3D dfaelem1.match(data, i) if res is None: self.__error("syntax error in element content: `%s' or= `)' expected" % sep, data, i, self.baseurl, fatal =3D 1) token =3D res.group('token') if sep =3D=3D ',': # concatenate DFAs e =3D dfa[end].get('', []) e.append(nstart) dfa[end][''] =3D e end =3D nend else: # make parallel s =3D len(dfa) dfa.append({'': [start, nstart]}) e =3D dfa[end].get('', []) e.append(len(dfa)) dfa[end][''] =3D e e =3D dfa[nend].get('', []) e.append(len(dfa)) dfa[nend][''] =3D e start =3D s end =3D len(dfa) dfa.append({}) # token =3D=3D ')' i =3D res.end(0) else: # it's a Name start =3D len(dfa) dfa.append({token: [start+1]}) end =3D len(dfa) dfa.append({}) i =3D res.end(0) res =3D dfaelem2.match(data, i) if res is not None: token =3D res.group('token') s =3D len(dfa) e =3D s+1 if token =3D=3D '+': dfa.append({'': [start]}) else: dfa.append({'': [start, e]}) dfa.append({}) l =3D dfa[end].get('', []) dfa[end][''] =3D l if token !=3D '?': l.append(start) l.append(e) start =3D s end =3D e i =3D res.end(0) return i, start, end def parse_doctype(self, tag, publit, 
syslit, data): """parse_doctype(tag, publit, syslit, data) Parse the DOCTYPE. This method is called by the handle_doctype callback method and is provided so that handle_doctype can be overridden. The arguments are: tag: the name of the outermost element of the document; publit: the Public Identifier of the DTD (or None); syslit: the System Literal of the DTD (or None); data: the internal subset of the DTD (or None).""" if data: self.parse_dtd(data) if syslit: import urllib syslit =3D urllib.basejoin(self.baseurl, syslit) baseurl =3D self.baseurl self.baseurl =3D syslit external =3D self.read_external(publit, syslit) external =3D self.__parse_textdecl(external) external =3D self.__normalize_linefeed(external) self.parse_dtd(external, 0) self.baseurl =3D baseurl def __error(self, message, data =3D None, i =3D None, filename =3D Non= e, fatal =3D 1): # called for all syntax errors # this either raises an exception (Error) or calls # self.syntax_error which may be overridden if data is not None and i is not None: self.lineno =3D lineno =3D string.count(data, '\n', 0, i) + 1 else: self.lineno =3D None self.data =3D data self.offset =3D i self.filename =3D filename if fatal: raise Error(message, lineno, data, i, filename) self.syntax_error(message) # Overridable -- handle xml processing instruction def handle_xml(self, encoding, standalone): pass # Overridable -- handle DOCTYPE def handle_doctype(self, tag, publit, syslit, data): if self.doctype is not None: syslit =3D self.doctype self.parse_doctype(tag, publit, syslit, data) # Example -- read external file referenced from DTD with a SystemLiter= al def read_external(self, publit, syslit): return '' # Example -- handle comment, could be overridden def handle_comment(self, data): pass # Example -- handle processing instructions, could be overridden def handle_proc(self, name, data): pass # Example -- handle data, should be overridden def handle_data(self, data): pass # Example -- handle cdata, should be overridden def 
handle_cdata(self, data):
        pass

    elements = {}                       # dict: tagname -> (startfunc, endfunc)

    def finish_starttag(self, tagname, attrs):
        method = self.elements.get(tagname, (None, None))[0]
        if method is None:
            self.unknown_starttag(tagname, attrs)
        else:
            self.handle_starttag(tagname, method, attrs)

    def finish_endtag(self, tagname):
        method = self.elements.get(tagname, (None, None))[1]
        if method is None:
            self.unknown_endtag(tagname)
        else:
            self.handle_endtag(tagname, method)

    # Overridable -- handle start tag
    def handle_starttag(self, tagname, method, attrs):
        method(tagname, attrs)

    # Overridable -- handle end tag
    def handle_endtag(self, tagname, method):
        method(tagname)

    # To be overridden -- handlers for unknown objects
    def unknown_starttag(self, tagname, attrs):
        pass

    def unknown_endtag(self, tagname):
        pass

    def unknown_entityref(self, name):
        self.__error('reference to unknown entity', self.data, self.offset, self.baseurl)

    # Example -- handle relatively harmless syntax errors, could be overridden
    def syntax_error(self, message):
        raise Error(message, self.lineno, self.data, self.offset, self.filename)

class TestXMLParser(XMLParser):

    def __init__(self, xmlns = 1):
        self.testdata = ""
        XMLParser.__init__(self, xmlns)

    def handle_xml(self, encoding, standalone):
        self.flush()
        print 'xml: encoding = %s standalone = %s' % (encoding, standalone)

    def read_external(self, publit, syslit):
        # report the system literal being fetched ("name" is not defined here)
        print 'reading %s' % syslit
        try:
            import urllib
            u = urllib.urlopen(syslit)
            data = u.read()
            u.close()
        except 'x':
            return ''
        return data

    def handle_doctype(self, tag, publit, syslit, data):
        self.flush()
        print 'DOCTYPE: %s %s' % (tag, `data`)
        XMLParser.handle_doctype(self, tag, publit, syslit, data)

    def handle_comment(self, data):
        self.flush()
        r = `data`
        if len(r) > 68:
            r = r[:32] + '...' \
                + r[-32:]
        print 'comment: %s' % r

    def handle_proc(self, name, data):
        self.flush()
        print 'processing: %s %s' % (name,`data`)

    def handle_data(self, data):
        self.testdata = self.testdata + data
        if len(`self.testdata`) >= 70:
            self.flush()

    def handle_cdata(self, data):
        self.flush()
        print 'cdata: %s' % `data`

    def flush(self):
        data = self.testdata
        if data:
            self.testdata = ""
            print 'data: %s ' % `data`

##    def syntax_error(self, message):
##        if self.lineno is not None:
##            print 'Syntax error at line %d: %s' % (self.lineno, message)
##        else:
##            print 'Syntax error: %s' % message

    def unknown_starttag(self, tag, attrs):
        self.flush()
        if not attrs:
            print 'start tag: <%s>' % tag
        else:
            print 'start tag: <%s' % tag,
            for name, value in attrs.items():
                print '%s = "%s"' % (name.encode('latin-1'), `value`),
            print '>'

    def unknown_endtag(self, tag):
        self.flush()
        print 'end tag: </%s>' % tag

    def unknown_entityref(self, name):
        self.flush()
        print '&%s;' % name

class CanonXMLParser(XMLParser):
    __cache = {}

    def read_external(self, publit, syslit):
        if publit and self.__cache.has_key(publit):
            return self.__cache[publit]
        try:
            import urllib
            u = urllib.urlopen(syslit)
            data = u.read()
            u.close()
        except 'x':
            return ''
        if publit:
            self.__cache[publit] = data
        return data

    def handle_data(self, data):
        sys.stdout.write(self.encode(data))

    def handle_cdata(self, data):
        sys.stdout.write(self.encode(data))

    def handle_proc(self, name, data):
        sys.stdout.write('<?%s %s?>' % (name.encode('utf-8'), data.strip().encode('utf-8')))

    def unknown_starttag(self, tag, attrs):
        sys.stdout.write('<%s' % tag.encode('utf-8'))
        attrlist = attrs.items()
        attrlist.sort()
        for name, value in attrlist:
            sys.stdout.write(' %s="%s"' % (name.encode('utf-8'), self.encode(value)))
        sys.stdout.write('>')

    def unknown_endtag(self, tag):
        sys.stdout.write('</%s>' % tag.encode('utf-8'))

    def unknown_entityref(self, name):
        print '&%s;' % name.encode('utf-8')

    def encode(self, data):
        for c, tr in [('&', '&amp;'), ('>', '&gt;'), ('<', '&lt;'), ('"', '&quot;'),
                      ('\t', '&#9;'), ('\n', '&#10;'), ('\r', '&#13;')]:
            data = tr.join(data.split(c))
        return data.encode('utf-8')

class CheckXMLParser(XMLParser):
    __cache = {}

    def read_external(self, publit, syslit):
        if publit and self.__cache.has_key(publit):
            return self.__cache[publit]
        try:
            import urllib
            u = urllib.urlopen(syslit)
            data = u.read()
            u.close()
        except 'x':
            return ''
        if publit:
            self.__cache[publit] = data
        return data

def test(args = None):
    import sys, getopt

    if not args:
        args = sys.argv[1:]

    opts, args = getopt.getopt(args, 'cstnvCd:')
    klass = TestXMLParser
    do_time = 0
    namespace = 1
    verbose = 0
    doctype = None
    for o, a in opts:
        if o == '-c':
            klass = CanonXMLParser
        elif o == '-C':
            klass = CheckXMLParser
        elif o == '-s':
            klass = XMLParser
        elif o == '-t':
            do_time = 1
        elif o == '-n':
            namespace = 0
        elif o == '-v':
            verbose = 1
        elif o == '-d':
            doctype = a

    if not args:
        args = ['test.xml']

    for file in args:
        if file == '-':
            f = sys.stdin
            url = '.'
        else:
            try:
                f = open(file, 'r')
            except IOError, msg:
                print file, ":", msg
                sys.exit(1)
            import urllib
            url = urllib.pathname2url(file)

        data = f.read()
        if f is not sys.stdin:
            f.close()

        x = klass(xmlns = namespace)
        x.baseurl = url
        x.doctype = doctype
        if verbose:
            print '==============',file
        try:
            t0, t1, t2 = x.parse(data)
        except Error, info:
            do_time = 0                 # can't print times now
            print str(info)
            if info.text is not None and info.offset is not None:
                i = string.rfind(info.text, '\n', 0, info.offset) + 1
                j = string.find(info.text, '\n', info.offset)
                if j == -1: j = len(info.text)
                try:
                    print info.text[i:j]
                except UnicodeError:
                    print `info.text[i:j]`
                else:
                    print ' '*(info.offset-i)+'^'
        if klass is CanonXMLParser and (verbose or len(args) > 1):
            sys.stdout.write('\n')
        if do_time:
            print 'total time: %g' % (t2-t0)
            print 'parse DTD: %g' % (t1-t0)
            print 'parse body: %g' %(t2-t1)

if __name__ == '__main__':
    test()

------- =_aaaaaaaaaa0--

From rsalz@zolera.com Mon Apr 23 17:17:48 2001
From: rsalz@zolera.com (Rich Salz)
Date: Mon, 23 Apr 2001 12:17:48 -0400
Subject: [XML-SIG] Canonicalizing XML
References: <20010423155236.71C71301CF7@bireme.oratrix.nl>
Message-ID: <3AE455AC.2A829E01@zolera.com>

> I've written a validating XML parser in Python that can produce
> Canonical XML.

As in the XML C14N recommendation? It seems to do something else. For
example, all namespace attributes are removed.

> I'll attach it. Usage (for getting Canonical XML):
>
> python fxmllib.py -c file.xml

Do I need a DTD?
/r$

From sjoerd.mullender@oratrix.com Mon Apr 23 19:02:16 2001
From: sjoerd.mullender@oratrix.com (Sjoerd Mullender)
Date: Mon, 23 Apr 2001 20:02:16 +0200
Subject: [XML-SIG] Canonicalizing XML
References: <20010423155236.71C71301CF7@bireme.oratrix.nl> <3AE455AC.2A829E01@zolera.com>
Message-ID: <3AE46E28.D062FA30@oratrix.com>

Ah, there is a recommendation. In other words, no, I didn't follow the
C14N recommendation.
I emulated the output of the XML test suite. I forget where that came
from, but I'm sure I can find out. The test suite doesn't deal with
namespaces since it's not for testing the XML Namespace rec.

No, you don't need a DTD.

Rich Salz wrote:
>
> > I've written a validating XML parser in Python that can produce
> > Canonical XML.
>
> As in the XML C14N recommendation? It seems to do something else. For
> example, all namespace attributes are removed.
>
> > I'll attach it. Usage (for getting Canonical XML):
> >
> > python fxmllib.py -c file.xml
>
> Do I need a DTD?
> /r$

From noreply@sourceforge.net Mon Apr 23 19:36:25 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 23 Apr 2001 11:36:25 -0700
Subject: [XML-SIG] [ pyxml-Bugs-418315 ] ODS problems in Python 2.0
Message-ID:

Bugs item #418315, was updated on 2001-04-23 11:36
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418315&group_id=6473

Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Olson (mikeolson)
Assigned to: Nobody/Anonymous (nobody)
Summary: ODS problems in Python 2.0

Initial Comment:
Email from alexander smishlajev

hello all!

i have run into some problems with 4Suite-0.10.2:

- the long long datatype formatter in Oracle and Postgres adapters:

    Constants.Types.SIGNED_LONG_LONG:lambda x:str(long(x))[:-1]

  does not work on python 2.0: str() in v2.0 does not append 'L'
  suffix, so the last digit gets trimmed. replacing str() with repr()
  makes types_test.py pass ok.

- i think that there is a bug in Lib/DbmDatabase.CreateDatabase(): the
  DATABASE_DIR is joined twice.
  the following patch fixes the problem:

--- DbmDatabase.py.orig  Mon Feb 19 02:31:18 2001
+++ DbmDatabase.py  Tue Feb 20 21:15:46 2001
@@ -179,7 +179,7 @@
     dir_util._path_created = {}
     dir_util.mkpath(dbpath)
-    return Database(dbpath)
+    return Database(dbName)
 def DropDatabase(dbName):
     CheckVersion()

- DbmDatabase assumes that each table is stored in file with the same
  name. in most cases, this is true because default anydbm backend
  module is dbhash, which works both on posix and windows platforms.
  but dbm and dumbdbm databases create a pair of files having
  additional extensions. IMHO whichdb module may be used instead of
  os.path.exists() to see if the database exists, but there still
  remain problems with GetAllDatabaseNames() method.

- there is a problem that seems to be a bug in win32 dbhash: if
  key+value length exceeds 400 characters, the database may become
  broken. posix system (FreeBSD 4.2) is not affected. the following
  interactive session demonstrates this:

Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import anydbm
>>> db =anydbm.open('test.db', 'c')
>>> db['2'] ='*' *1000
>>> db['2'] ='*'
>>> db.keys()
['22']
>>>

best wishes,
alex.
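
A minimal sketch of a formatter that does not depend on whether the
interpreter appends an 'L' suffix, as described in the first item of
the report. It is written for current Python 3 (where long() no longer
exists), and both function names are hypothetical, not part of 4Suite:

```python
def format_signed_long_long(value):
    """Return the decimal text of an integer, without any 'L' suffix."""
    text = str(int(value))
    # Per the report, older interpreters appended 'L' to a long and
    # Python 2.0's str() does not. Strip the suffix only when it is
    # really present, so that no digit is ever lost.
    if text.endswith("L"):
        text = text[:-1]
    return text


def buggy_format(value):
    """The unconditional slice from the report: drops the last character."""
    return str(int(value))[:-1]


print(format_signed_long_long(9223372036854775807))  # 9223372036854775807
print(buggy_format(9223372036854775807))             # 922337203685477580
```

The conditional strip keeps the formatter correct on either side of the
str() change, which the unconditional [:-1] slice cannot do.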

From noreply@sourceforge.net Mon Apr 23 19:42:05 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 23 Apr 2001 11:42:05 -0700
Subject: [XML-SIG] [ pyxml-Bugs-418317 ] CDATA in XPath
Message-ID:

Bugs item #418317, was updated on 2001-04-23 11:42
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418317&group_id=6473

Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Olson (mikeolson)
Assigned to: Nobody/Anonymous (nobody)
Summary: CDATA in XPath

Initial Comment:
Email from Kemani Driss

I have some elements whose data is enclosed in CDATA tags and I am
finding that the Evaluate method is returning None for elements whose
data is enclosed in CDATA tags. Works fine for other elements without
CDATA tags. My XPath statement given to the Evaluate method is
"child::Options/child::text()". Am I doing something wrong or is this
a known feature/bug of 4XPath?

thanks in advance.

From noreply@sourceforge.net Mon Apr 23 19:44:31 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 23 Apr 2001 11:44:31 -0700
Subject: [XML-SIG] [ pyxml-Bugs-418319 ] more ODS issues
Message-ID:

Bugs item #418319, was updated on 2001-04-23 11:44
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418319&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Olson (mikeolson)
Assigned to: Nobody/Anonymous (nobody)
Summary: more ODS issues

Initial Comment:
Email from alexander smishlajev

hello!

_4odb_odmsdump.py and _4odb_odmsload.py use sys.argv instead of argv
argument in Run()

DbAdapter.getAllObjectIds() in Dbm adapter seems to be broken.
following patch was locally applied:

--- Adapter.py.orig  Mon Feb 19 02:39:42 2001
+++ Adapter.py  Fri Mar 09 22:17:39 2001
@@ -208,8 +208,8 @@
         del objs[str(oid)]
-    def getAllObjectIds(self):
-        return map(lambda x:x[0],self.statements[CompiliedStatement.GET_OBJECT_IDS].query(db))
+    def getAllObjectIds(self,db):
+        return map(int, db['objects'].keys())

_4odb_dig.py and _4odb_metadig.py both exit with usage info if
len(argv) != 2 while corresponding .bat files always pass a list with
10 elements.

OifParser fails with long integers. following patch helps (i am not
sure if we can always return long integer):

--- OifParser.py.orig  Fri Feb 09 00:07:40 2001
+++ OifParser.py  Fri Mar 09 22:51:05 2001
@@ -104,7 +104,10 @@
             return float(literal[0][0])
         elif literal[0]['TYPE'] == 'INTEGER_LITERAL':
             # could be decimal, octal or hexadecimal
-            return eval(literal[0][0])
+            try:
+                return eval(literal[0][0])
+            except OverflowError:
+                return eval(literal[0][0] +'L')
         elif literal[0]['TYPE'] == 'OBJECT_TAG':
             return literal[0][0]
         else:

best wishes,
alex.

From noreply@sourceforge.net Mon Apr 23 19:52:38 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 23 Apr 2001 11:52:38 -0700
Subject: [XML-SIG] [ pyxml-Bugs-418323 ] RDF problem
Message-ID:

Bugs item #418323, was updated on 2001-04-23 11:52
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418323&group_id=6473

Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Olson (mikeolson)
Assigned to: Nobody/Anonymous (nobody)
Summary: RDF problem

Initial Comment:
Email from "Joern v.
Kattchee" Hello, there seems to be a bug in 4RDF (4Suite-0.10.2). Parsing the following RDF results in these statements (p,s,o): rdf:type, http://test#this, ns:item rdf:type, http://test#anon-1, ns:prop rdf:type, http://test#anon-2, ns:item ns:prop, http://test#anon-2, http://foo/ ns:item, http://test#anon-1, http://test#anon-2 ns:prop, http://test#this, http://test#anon-1 but correct would IMHO be something like this: rdf:type, http://test#this, ns:item rdf:type, http://test#anon-2, ns:item ns:prop, http://test#anon-2, http://foo/ ns:prop, http://test#this, http://test#anon-2 4RDF somehow interprets the second 'item' as propertyElt, but it is a typedNode. Regards, Joern v. Kattchee ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418323&group_id=6473 From noreply@sourceforge.net Mon Apr 23 19:56:44 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 23 Apr 2001 11:56:44 -0700 Subject: [XML-SIG] [ pyxml-Bugs-418324 ] [4XSLT] bug report and patch for complex Message-ID: Bugs item #418324, was updated on 2001-04-23 11:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418324&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mike Olson (mikeolson) Assigned to: Nobody/Anonymous (nobody) Summary: [4XSLT] bug report and patch for complex Initial Comment: Email from "Olivier CAYROL (Logilab)" Hello, I found a vicious bug in 4XSLT (hidden very deeply in the code). Attached to this message you will find a tar.gz file containing a directory tree that exhibits the bug. It is a little application for managing Easter rabbits and eggs distribution (!). There is an XML file that contains the data: easter_mng.xml, an XSL Transformation file: xsl/transf.xsl and XML files containing data for localization: lib/common.xml, lib/en.xml, lib/fr.xml. 
The lib/common.xml file is imported in the XSLT stylesheets with the 'document()' function and is used to insert language-dependent tags in the output. This common.xml file imports other XML files (one per language) with the classic external ENTITY mechanism of XML. When trying to transform the data file from the main directory with the following command line: 4xslt -Dlang=en easter_mng.xml xsl/transf.xsl , I got this exception:

...
  File "/usr/lib/python1.5/site-packages/xml/xslt/XsltFunctions.py", line 63, in Document
    doc = context.stylesheet._docReader.fromUri(uri, baseUri=baseUri)
  File "/usr/lib/python1.5/site-packages/Ft/Lib/ReaderBase.py", line 67, in fromUri
    rt = self.fromStream(stream, baseUri, ownerDoc, stripElements)
  File "/usr/lib/python1.5/site-packages/Ft/Lib/pDomlette.py", line 578, in fromStream
    raise FtException(Error.XML_PARSE_ERROR, p.ErrorLineNumber, p.ErrorColumnNumber, expat.ErrorString(p.ErrorCode))
Ft.Lib.FtException: ('XML parse error at line 16, column 2: error in processing external entity reference', (16, 2, 'error in processing external entity reference'))

In fact, there is a problem when 4XSLT reads the XML document referenced in the 'document()' function: this XML file contains ENTITYs that import XML tree parts by giving local paths from the current document directory whereas in 4XSLT, the baseUri is always the URI of the initial XSLT. The XML reader is unable to find the external entities and the bug appears. Replacing line 67 of Ft.Lib.ReaderBase.py in DomletteReader.fromUri function:

    rt = self.fromStream(stream, baseUri, ownerDoc, stripElements)

with:

    newBaseUri = urllib.basejoin(baseUri, uri)
    rt = self.fromStream(stream, newBaseUri, ownerDoc, stripElements)

fixes the bug. I initially found the bug while trying to process Norman Walsh's XSLT stylesheets for turning docbook files into XSL formatting objects files (I am unfortunately not working for the Easter Rabbit). Regards, O. CAYROL.
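A minimal sketch of the base-URI problem described in this report, written for present-day Python rather than the Python 1.5 of the traceback. It assumes `urllib.parse.urljoin` as the modern counterpart of the `urllib.basejoin` call in the proposed patch. The URLs and the entity reference `en.xml` are hypothetical; they only mirror the xsl/ and lib/ layout mentioned in the report.

```python
from urllib.parse import urljoin

# Hypothetical locations mirroring the directory layout in the report.
stylesheet_uri = "http://example.org/easter/xsl/transf.xsl"
document_ref = "../lib/common.xml"   # argument given to document()
entity_ref = "en.xml"                # external entity declared in common.xml

# The referenced document is resolved against the stylesheet's URI.
document_uri = urljoin(stylesheet_uri, document_ref)

# Reported behaviour: the entity is resolved against the stylesheet's base URI.
wrong_entity_uri = urljoin(stylesheet_uri, entity_ref)

# Proposed fix: the entity is resolved against the loaded document's own URI.
right_entity_uri = urljoin(document_uri, entity_ref)

print(document_uri)
print(wrong_entity_uri)
print(right_entity_uri)
```

With the stylesheet's URI as base, the entity is looked up in the xsl/ directory, where it does not exist; with the document's own URI as base, it is found next to common.xml in lib/.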
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418324&group_id=6473 From sskau@pchome.com.tw Tue Apr 24 01:51:03 2001 From: sskau@pchome.com.tw (sskau) Date: Tue, 24 Apr 2001 08:51:03 +0800 Subject: [XML-SIG] Persistence of HTMLDocument Message-ID: <20010424083936.C5B4.SSKAU@pchome.com.tw> Dear All: I used PyXML 0.6.5 to parse some html files and then tried to save the html document object to disk (using shelve). (The reason is that we could speed up the loading process without further parsing.) I got the following exception:

Traceback (innermost last):
  File "docper.py", line 18, in ?
    newdoc=docobj["doc"]
  File "e:\Python\Lib\shelve.py", line 65, in __getitem__
    return Unpickler(f).load()
TypeError: ('too many arguments; expected 1, got 2', , (None,))

The source code is:

from xml.dom.ext.reader import HtmlLib
from xml.dom import ext
import shelve
stream = open("c:\\temp\\new.html")
doc=HtmlLib.Reader().fromStream(stream)
# persistence of HtmlDocument
docobj = shelve.open("c:\\temp\\doc.obj")
docobj["doc"]=doc
docobj.close()
stream.close()
# reopen, load persistence document
docobj = shelve.open("c:\\temp\\doc.obj")
newdoc=docobj["doc"]
ext.PrettyPrint(newdoc)

Has anyone dealt with the persistence of a parsed HtmlDocument DOM tree?
Thanks a lot Shi-Shiuan Kao From jtauber@jtauber.com Tue Apr 24 03:45:00 2001 From: jtauber@jtauber.com (James Tauber) Date: Mon, 23 Apr 2001 22:45:00 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 Message-ID: <002601c0cc68$915a1940$20020a0a@EHUD> We just discovered that Redfoot (and its RDF parser) breaks in Python 2.1 because expat is now more restrictive in what can be a namespace_separator. Does anyone know why this change was made? James From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 07:07:55 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 24 Apr 2001 08:07:55 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <002601c0cc68$915a1940$20020a0a@EHUD> (jtauber@jtauber.com) References: <002601c0cc68$915a1940$20020a0a@EHUD> Message-ID: <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> > We just discovered that Redfoot (and its RDF parser) breaks in Python 2.1 > because expat is now more restrictive in what can be a namespace_separator. > Does anyone know why this change was made? I'm not aware of any change to that respect. pyexpat passes the namespace_separator as-is to Expat. What version of expat are you using? Regards, Martin From ssaky@kldp.org Tue Apr 24 07:17:28 2001 From: ssaky@kldp.org (ssaky@kldp.org) Date: Tue, 24 Apr 2001 15:17:28 +0900 Subject: [XML-SIG] about Python/XML HOWTO... Message-ID: <20010424151728.A21831@ssaky.jerimo.org> Hi. Sorry for my rough english. Could you mind if i translate your "Python/XML HOWTO" into korean, and post it to http://kldp.org (Korean Linux Document Project) and Python web site in Korean (http://www.python.or.kr) ? Thanks you.
-- God Bless You~ =) http://ssaky.jerimo.org http://glory.python.or.kr From eikeon@eikeon.com Tue Apr 24 08:08:30 2001 From: eikeon@eikeon.com (Daniel "eikeon" Krech) Date: Tue, 24 Apr 2001 03:08:30 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> Message-ID: <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> > > We just discovered that Redfoot (and its RDF parser) breaks in Python 2.1 > > because expat is now more restrictive in what can be a namespace_separator. > > Does anyone know why this change was made? > > I'm not aware of any change to that respect. pyexpat passes the > namespace_separator as-is to Expat. What version of expat are you > using? > We are using the one that came with the Python2.1 final (source) release. There is a test in Python-2.1/Lib/test/test_pyexpat.py at the end that is testing for 'too short'. We have been using a namespace separator of an empty string with no problems using Python2.0 and earlier. It does appear to be a restriction imposed by the underlying pyexpat library... I am curious what motivated this restriction as it seems like it should be a valid thing for an application to want to do. -eikeon From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 13:23:41 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 24 Apr 2001 14:23:41 +0200 Subject: [XML-SIG] about Python/XML HOWTO... In-Reply-To: <20010424151728.A21831@ssaky.jerimo.org> References: <20010424151728.A21831@ssaky.jerimo.org> Message-ID: <200104241223.f3OCNfv00849@mira.informatik.hu-berlin.de> > Could you mind if i translate your "Python/XML HOWTO" into korean, > and post it to http://kldp.org (Korean Linux Document Project) and > Python web site in Korean (http://www.python.or.kr) ? This would be very good. When you are done, please let us know so we can make appropriate links. 
Also, I recommend that you remember the CVS revision of the original document in your translation, so you can find out better what has been changed later. As you probably know, the original document is doc/xml-howto.tex. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 13:26:11 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 24 Apr 2001 14:26:11 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> (eikeon@eikeon.com) References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> Message-ID: <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> > > I'm not aware of any change to that respect. pyexpat passes the > > namespace_separator as-is to Expat. What version of expat are you > > using? > > We are using the one that came with the Python2.1 final (source) release. There is no copy of Expat included in the source release of Python 2.1, so you must have got it from somewhere else... > There is a test in Python-2.1/Lib/test/test_pyexpat.py at the end that is > testing for 'too short'. We have been using a namespace separator of an > empty string with no problems using Python2.0 and earlier. It does appear to > be a restriction imposed by the underlying pyexpat library... I am curious > what motivated this restriction as it seems like it should be a valid thing > for an application to want to do. So am I, but that is hard to answer until we find out what the exact version of Expat is that you are using (expat is the underlying library, not pyexpat), and until you can provide a test case that demonstrates the problem... Regards, Martin From fdrake@acm.org Tue Apr 24 15:01:24 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Tue, 24 Apr 2001 10:01:24 -0400 (EDT) Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> Message-ID: <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > There is no copy of Expat included in the source release of Python > 2.1, so you must have got it from somewhere else... This is an issue with the bindings; the Expat version doesn't enter into this. > So am I, but that is hard to answer until we find out what the exact > version of Expat is that you are using (expat is the underlying > library, not pyexpat), and until you can provide a test case that > demonstrates the problem... I added the check that a string passed as the namespace_separator would be either omitted, None, or of length 1. I don't understand why you would want it to be of length 0 -- were you expecting parsing without namespaces? Is there a reason to pass namespace_separator as an empty string rather than None, or just omit it? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From eikeon@eikeon.com Tue Apr 24 15:17:44 2001 From: eikeon@eikeon.com (Daniel "eikeon" Krech) Date: Tue, 24 Apr 2001 10:17:44 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD><200104240607.f3O67tR00949@mira.informatik.hu-berlin.de><02c501c0cc8d$5e0a6b10$0401a8c0@VAIO><200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> Message-ID: <002001c0ccc9$54e28cc0$0401a8c0@VAIO> > I added the check that a string passed as the namespace_separator > would be either omitted, None, or of length 1. 
I don't understand why > you would want it to be of length 0 -- were you expecting parsing > without namespaces? Is there a reason to pass namespace_separator as > an empty string rather than None, or just omit it? If we pass in None or omit it, it ends up parsing without namespaces, or at least the names we are getting are of the form prefix:localname. We do want namespaces, just with no separator between the namespace and the local name. [For in our application (of the parser) we would like to receive names in the form foobar. Passing in None yields p:bar] -eikeon From fdrake@acm.org Tue Apr 24 15:31:05 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Apr 2001 10:31:05 -0400 (EDT) Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <002001c0ccc9$54e28cc0$0401a8c0@VAIO> References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> Message-ID: <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> "Daniel \"eikeon\" Krech" writes: > If we pass in None or omit it ends up parsing without namespaces, or at > least the names we are getting are of the form prefix:localname. We do want > namespaces, just with no separator between the namespace and the local name. > [For in our application (of the parser) we would > like to receive names in the form foobar. Passing in None yields p:bar] This sounds like it would be ambiguous -- consider: would give you the elements batfoo and batfoo. Admittedly, this is a contrived example, and probably pretty unusual in practice. But it definitely shows that using '' as the separator is not workable in the general case -- you have to know that your input is reasonable.
I can easily imagine something that generated artificial namespaces coming up with XML that will be problematic for this. I'm not particularly opposed to opening up the bindings to support using '', though. If that's the SIG's consensus, I'll be glad to do it. Backward compatibility is a nice quality! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From eikeon@eikeon.com Tue Apr 24 16:41:53 2001 From: eikeon@eikeon.com (Daniel "eikeon" Krech) Date: Tue, 24 Apr 2001 11:41:53 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD><200104240607.f3O67tR00949@mira.informatik.hu-berlin.de><02c501c0cc8d$5e0a6b10$0401a8c0@VAIO><200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de><15077.34612.333942.402686@cj42289-a.reston1.va.home.com><002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> Message-ID: <001a01c0ccd5$f2f385c0$f3fc0a0a@bowstreet.com> > This sounds like it would be ambiguous -- consider: > > > > > > would give you the elements batfoo and batfoo. With no separator (aka '') there is some information loss, as you no longer know what part of the name was contributed by the namespace and what part by the local name. For our application of the parser we currently do not use this piece of information and so the information only becomes a burden on us. Even with the ambiguity, it seems fairly heavy-handed to enforce this at such a low level. Often when the degenerate boundary cases are handled in a consistent manner... it is a good thing. What are the benefits of being heavy-handed at the lower level? IMHO, it would be really nice if batfoo and batfoo were the same. Does anyone know if this is or is not the case? Or is it application specific as to whether or not they are?
-eikeon From loewis@informatik.hu-berlin.de Tue Apr 24 17:13:06 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 24 Apr 2001 18:13:06 +0200 (MEST) Subject: [XML-SIG] PyXML 0.6.5 for Python 2.1 Message-ID: <200104241613.SAA01273@pandora.informatik.hu-berlin.de> I have rebuilt PyXML 0.6.5 with the Windows installer for Python 2.1, and put that package on SourceForge for download at http://prdownloads.sourceforge.net/pyxml/PyXML-0.6.5.win32-py2.1.exe Regards, Martin P.S. It seems SF is constantly changing hostnames... From rsalz@zolera.com Tue Apr 24 17:38:08 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 24 Apr 2001 12:38:08 -0400 Subject: [XML-SIG] XML Schema Message-ID: <200104241638.MAA19329@os390.zolera.com> Is anyone doing anything with XML Schema? My interest is in WSDL and SOAP ("web services"), so full schema -- in particular the subtyping, restrictions, et al. -- isn't totally necessary. If anyone's given thought to what an internal parsed representation might look like (and no, a set of domNodes doesn't count), I'd like to chat. /r$ From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 17:41:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 24 Apr 2001 18:41:16 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> Message-ID: <200104241641.f3OGfGW01442@mira.informatik.hu-berlin.de> > This is an issue with the bindings; the Expat version doesn't enter > into this. Sorry for the confusion. I just missed the check you put in somehow. 
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 17:45:36 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 24 Apr 2001 18:45:36 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> Message-ID: <200104241645.f3OGjaS01465@mira.informatik.hu-berlin.de> > I'm not particularly opposed to opening up the bindings to support > using '', though. If that's the SIG's concensus, I'll be glad to do > it. Backward compatibility is a nice quality! Well, I'm in favour of backwards-compatibility, then. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 17:48:38 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 24 Apr 2001 18:48:38 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <001a01c0ccd5$f2f385c0$f3fc0a0a@bowstreet.com> (eikeon@eikeon.com) References: <002601c0cc68$915a1940$20020a0a@EHUD><200104240607.f3O67tR00949@mira.informatik.hu-berlin.de><02c501c0cc8d$5e0a6b10$0401a8c0@VAIO><200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de><15077.34612.333942.402686@cj42289-a.reston1.va.home.com><002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <001a01c0ccd5$f2f385c0$f3fc0a0a@bowstreet.com> Message-ID: <200104241648.f3OGmck01467@mira.informatik.hu-berlin.de> > > > > > > > > > > would give you the elements batfoo and batfoo. [...] > IMHO, it would be really nice if batfoo and batfoo where the same. Does > anyone know if this is or is not the case? 
Or is it application specific as > to wether or not they are? They are certainly not, in XML! One is the foo element in the bat namespace, the other is the tfoo element in the ba namespace. Of course, on the application level, you could always treat apples and oranges as the same elements, if you want to. Regards, Martin From larsga@garshol.priv.no Tue Apr 24 19:01:02 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 24 Apr 2001 20:01:02 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> Message-ID: * Fred L. Drake, Jr. | | I'm not particularly opposed to opening up the bindings to support | using '', though. I'm not very happy with the idea. This sounds very much like selling a gun with a bent muzzle. It may be nice for firing around corners, but it does seem to me to increase the chances of accidentally removing one or more feet. I see the potential problems with this. What I do not see is the potential benefits. Can someone suggest any? --Lars M. From martin@loewis.home.cs.tu-berlin.de Tue Apr 24 22:22:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 24 Apr 2001 23:22:49 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: (message from Lars Marius Garshol on 24 Apr 2001 20:01:02 +0200) References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> Message-ID: <200104242122.f3OLMnK11789@mira.informatik.hu-berlin.de> > I see the potential problems with this. What I do not see is the > potential benefits. Can someone suggest any? Backwards compatibility. Code that used to work still does work - even if you think it is stupid code. Happy developers, happy users. Regards, Martin From tpassin@home.com Tue Apr 24 23:07:11 2001 From: tpassin@home.com (Thomas B. Passin) Date: Tue, 24 Apr 2001 18:07:11 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD><200104240607.f3O67tR00949@mira.informatik.hu-berlin.de><02c501c0cc8d$5e0a6b10$0401a8c0@VAIO><200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de><15077.34612.333942.402686@cj42289-a.reston1.va.home.com><002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <001a01c0ccd5$f2f385c0$f3fc0a0a@bowstreet.com> Message-ID: <001d01c0cd0a$e9b823a0$7cac1218@reston1.va.home.com> Daniel "eikeon" Krech" asked - > > IMHO, it would be really nice if batfoo and batfoo where the same. Does > anyone know if this is or is not the case? Or is it application specific as > to wether or not they are? > Definitely completely different from an xml-namespaces point of view. 
I'm not sure how or why your code uses the concatenated names, but I'd suggest they might lead to problems down the road, if yu want to start using other tools or interfacing with other systems. You couldn't count on any random parser supporting your way, if you had an urge to start using some other parser. Cheers, Tom P From jtauber@jtauber.com Tue Apr 24 23:45:57 2001 From: jtauber@jtauber.com (James Tauber) Date: Tue, 24 Apr 2001 18:45:57 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> Message-ID: <003601c0cd10$57870090$ed020a0a@bowstreet.com> The key issue that eikeon forgot to mention is that this is for RDF where namespaceURI+localName are concatenated to form property URIs. Without the ability to use '' we have to do a split+join on every element coming back from expat. James > | I'm not particularly opposed to opening up the bindings to support > | using '', though. > > I'm not very happy with the idea. This sounds very much like selling a > gun with a bent muzzle. It may be nice for firing around corners, but > it does seem to me to increase the chances of accidentally removing > one or more feet. > > I see the potential problems with this. What I do not see is the > potential benefits. Can someone suggest any? From tpassin@home.com Wed Apr 25 00:04:52 2001 From: tpassin@home.com (Thomas B. 
Passin) Date: Tue, 24 Apr 2001 19:04:52 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> Message-ID: <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> James Tauber said - > The key issue that eikeon forgot to mention is that this is for RDF where > namespaceURI+localName are concatenated to form property URIs. Without the > ability to use '' we have to do a split+join on every element coming back > from expat. > Aren't those two parts usually joined with a "#" symbol? Ayway, if you did go to a split/join, I bet it wouldn't slow you down hardly at all, since the rest of the RDF parsing must take a lot more time. Cheers, Tom P From jtauber@jtauber.com Wed Apr 25 00:58:51 2001 From: jtauber@jtauber.com (James Tauber) Date: Tue, 24 Apr 2001 19:58:51 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> Message-ID: <001201c0cd1a$86429c50$6401a8c0@EHUD> > > The key issue that eikeon forgot to mention is that this is for RDF where > > namespaceURI+localName are concatenated to form property URIs. Without the > > ability to use '' we have to do a split+join on every element coming back > > from expat. 
> > > Aren't those two parts usually joined with a "#" symbol? Nope. The # ends up being at the end of the namespace, it isn't a separator. That is done sometimes so that the URI can refer to an actual element in an XML document (via its ID) but this don't not have to be the case. You can have a property http://foo.com/bar which could be serialized any of ... ... ... Now, I don't like this but it's the way it works. James From rsalz@zolera.com Wed Apr 25 01:56:27 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 24 Apr 2001 20:56:27 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> <001201c0cd1a$86429c50$6401a8c0@EHUD> Message-ID: <3AE620BB.9FB13F18@zolera.com> > ... > ... > ... Hunh? Can you point to a spec that shows this? 
/r$ From jtauber@jtauber.com Wed Apr 25 02:48:13 2001 From: jtauber@jtauber.com (James Tauber) Date: Tue, 24 Apr 2001 21:48:13 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> <001201c0cd1a$86429c50$6401a8c0@EHUD> <3AE620BB.9FB13F18@zolera.com> Message-ID: <002301c0cd29$caa4b900$6401a8c0@EHUD> You cut that part out of my email. It's RDF. In particular see http://www.w3.org/TR/REC-rdf-syntax/ in section 6: "This expansion is generated by concatenating the namespace name given in the namespace declaration with the LocalPart of the qualified name" ----- Original Message ----- From: "Rich Salz" To: "James Tauber" Cc: Sent: Tuesday, April 24, 2001 8:56 PM Subject: Re: [XML-SIG] expat namespace_separator in Python 2.1 > > ... > > ... > > ... > > Hunh? Can you point to a spec that shows this? 
> /r$ From rsalz@zolera.com Wed Apr 25 03:27:48 2001 From: rsalz@zolera.com (Rich Salz) Date: Tue, 24 Apr 2001 22:27:48 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> <001201c0cd1a$86429c50$6401a8c0@EHUD> <3AE620BB.9FB13F18@zolera.com> <002301c0cd29$caa4b900$6401a8c0@EHUD> Message-ID: <3AE63624.9C763305@zolera.com> > You cut that part out of my email. It's RDF. Oh, it wasn't clear to me from your note. Thanks. From jtauber@jtauber.com Wed Apr 25 03:29:00 2001 From: jtauber@jtauber.com (James Tauber) Date: Tue, 24 Apr 2001 22:29:00 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> <001201c0cd1a$86429c50$6401a8c0@EHUD> <3AE620BB.9FB13F18@zolera.com> <002301c0cd29$caa4b900$6401a8c0@EHUD> <3AE63624.9C763305@zolera.com> Message-ID: <003001c0cd2f$80210ae0$6401a8c0@EHUD> Sorry I wasn't more clear. It is only RDF that works this way. It is unusual but internally consistent. James > > You cut that part out of my email. It's RDF. > > Oh, it wasn't clear to me from your note. > > Thanks. 
From uche.ogbuji@fourthought.com Wed Apr 25 03:57:56 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 24 Apr 2001 20:57:56 -0600 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: Message from Rich Salz of "Tue, 24 Apr 2001 20:56:27 EDT." <3AE620BB.9FB13F18@zolera.com> Message-ID: <200104250258.f3P2vu814912@borgia.local> > > ... > > ... > > ... > > Hunh? Can you point to a spec that shows this? James is quite right. The offending spec is RDF. Dr. Jonathan Borden has been carrying on a spirited campaign to get this fixed, but as of now, that's the way it is, and the former permissiveness of expat NS separators will need to be restored for proper RDF support, which is important enough, IMO, to warrant the sale of guns with bent muzzles. ;-) -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From jtauber@jtauber.com Wed Apr 25 04:10:37 2001 From: jtauber@jtauber.com (James Tauber) Date: Tue, 24 Apr 2001 23:10:37 -0400 Subject: [XML-SIG] expat namespace_separator in Python 2.1 References: <200104250258.f3P2vu814912@borgia.local> Message-ID: <000d01c0cd35$5078f900$ed020a0a@bowstreet.com> Actually, the whole RDF serialization needs overhauling. Yet another reason for a standard RDF parser / serializer to hide people from this stuff :-) James > James is quite right. The offending spec is RDF. Dr. Jonathan Borden has > been carrying on a spirited campaign to get this fixed, but as of now, that's > the way it is, and the former permissiveness of expat NS separators will need > to be restored for proper RDF support, which is important enough, IMO, to > warrant the sale of guns with bent muzzles. 
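For readers who want to see the behaviour under discussion, here is a minimal sketch using the standard-library `xml.parsers.expat` bindings as they exist in current Python 3, not the Python 2.1 bindings this thread is about. It assumes, as current CPython does, that an empty `namespace_separator` is accepted. The sample documents are invented: one uses the `http://foo.com/` namespace name mentioned in the thread, the other two reproduce the "batfoo" ambiguity. The `glue` helper is a hypothetical name for the split+join workaround that a one-character separator would require.

```python
from xml.parsers import expat

def element_names(xml_text, separator):
    """Return the element names expat reports for a given namespace_separator."""
    names = []
    parser = expat.ParserCreate(namespace_separator=separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names

def glue(name, separator):
    """Split+join workaround: remove the first separator from a reported name."""
    namespace_name, found, local_part = name.partition(separator)
    return namespace_name + local_part if found else name

# Invented sample document using the namespace name from the thread.
DOC = '<p:bar xmlns:p="http://foo.com/"/>'

prefixed = element_names(DOC, None)   # no namespace processing
spaced = element_names(DOC, " ")      # URI, separator, local name
glued = element_names(DOC, "")        # URI and local name concatenated

# Invented documents for the ambiguity: "bat" + "foo" versus "ba" + "tfoo".
ambiguous_a = element_names('<x:foo xmlns:x="bat"/>', "")
ambiguous_b = element_names('<y:tfoo xmlns:y="ba"/>', "")

print(prefixed, spaced, glued)
print([glue(n, " ") for n in spaced] == glued)
print(ambiguous_a == ambiguous_b)
```

The empty separator gives RDF code the concatenated property URI directly, at the cost that two distinct qualified names can collapse into the same string, which is the trade-off argued over in the messages above.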
From noreply@sourceforge.net Wed Apr 25 13:26:27 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 25 Apr 2001 05:26:27 -0700 Subject: [XML-SIG] [ pyxml-Bugs-418808 ] [4XSLT] Bug in precise/general match Message-ID: Bugs item #418808, was updated on 2001-04-25 05:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418808&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Logilab (ornicar) Assigned to: Nobody/Anonymous (nobody) Summary: [4XSLT] Bug in precise/general match Initial Comment: Hello, I was writing an XSL stylesheet for processing docbook XML files when I discovered a very strange behaviour of 4XSLT that looks like a bug. Consider the following XML file (testbug.xml):

This is a paragraph. Title This is a formal paragraph.

And the following XSLT file (testbug1.xslt):

: Processing the XML file with the XSLT file gives the following result:

This is a paragraph.

Title: This is a formal paragraph

You will have noticed that the template for processing a para node differs depending on whether the node is a child of a formalpara node or not. 4XSLT chooses the template with the most precise matching condition that matches the current node (and this is what the XSLT specification prescribes). Now, consider the following XSLT file (testbug2.xslt):

: You have noticed that the only modification I did was to permute the "para" template and the "formalpara/para" template. Processing the same XML file with this new XSLT file outputs the following result:

This is a paragraph.

Title:

This is a formal paragraph.

Now, 4XSLT doesn't use the "formalpara" template for the para node that is a child of formalpara, but the "para" template (thus we have nested

nodes in the output). I feel that instead of choosing the template with the most precise matching condition that matches the current node, 4XSLT chooses the first one. Therefore, my two XSLT that should have the same behaviour, output two different results. Regards, O. CAYROL. PS: I tried xalan on these examples and I got the same result for each XSLT (just as I expected). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418808&group_id=6473 From larsga@garshol.priv.no Wed Apr 25 07:23:52 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 25 Apr 2001 08:23:52 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <003601c0cd10$57870090$ed020a0a@bowstreet.com> References: <002601c0cc68$915a1940$20020a0a@EHUD> <200104240607.f3O67tR00949@mira.informatik.hu-berlin.de> <02c501c0cc8d$5e0a6b10$0401a8c0@VAIO> <200104241226.f3OCQBw00851@mira.informatik.hu-berlin.de> <15077.34612.333942.402686@cj42289-a.reston1.va.home.com> <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> Message-ID: * James Tauber | | The key issue that eikeon forgot to mention is that this is for RDF | where namespaceURI+localName are concatenated to form property | URIs. Without the ability to use '' we have to do a split+join on | every element coming back from expat. Well, if expat used to support this and it's useful for RDF I guess we should do it. Sorry I was so slow in the uptake here. --Lars M. From uche.ogbuji@fourthought.com Wed Apr 25 20:52:05 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 25 Apr 2001 13:52:05 -0600 Subject: [XML-SIG] Checking to see whether the list is down again Message-ID: <200104251952.f3PJq5m20145@borgia.local.dhcp.fourthought.com> If everyone gets it, I guess it ain't. 
--Uche From fdrake@acm.org Wed Apr 25 20:59:28 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 15:59:28 -0400 (EDT) Subject: [XML-SIG] Checking to see whether the list is down again In-Reply-To: <200104251952.f3PJq5m20145@borgia.local.dhcp.fourthought.com> References: <200104251952.f3PJq5m20145@borgia.local.dhcp.fourthought.com> Message-ID: <15079.11424.15199.976834@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > If everyone gets it, I guess it ain't. There was an Exim configuration bug on python.org & zope.org; mail should be flooding through there about now... ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Wed Apr 25 14:50:40 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 09:50:40 -0400 (EDT) Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <200104250258.f3P2vu814912@borgia.local> References: <3AE620BB.9FB13F18@zolera.com> <200104250258.f3P2vu814912@borgia.local> Message-ID: <15078.54832.157396.84051@cj42289-a.reston1.va.home.com> Uche Ogbuji writes: > James is quite right. The offending spec is RDF. Dr. Jonathan > Borden has been carrying on a spirited campaign to get this fixed, > but as of now, that's the way it is, and the former permissiveness > of expat NS separators will need to be restored for proper RDF > support, which is important enough, IMO, to warrant the sale of > guns with bent muzzles. OK, I'll make the adjustment to pyexpat.c, but I think it's bad to perpetuate such vile brokenness. Whether or not it's bad enough to use guns with straight muzzles, well, I guess we'll all have to decide for ourselves. ;-) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From noreply@sourceforge.net Wed Apr 25 23:52:40 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 25 Apr 2001 15:52:40 -0700 Subject: [XML-SIG] [ pyxml-Bugs-418986 ] Cant find /usr/local/bin/python RedHat7 Message-ID: Bugs item #418986, was updated on 2001-04-25 15:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418986&group_id=6473 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Richard Hebert (hebertrich) Assigned to: Nobody/Anonymous (nobody) Summary: Cant find /usr/local/bin/python RedHat7 Initial Comment: Well, to sum up, I have a recent official RH 7 disk Linux install. The packages are all where RH installs them. Python was installed at the same time RH Linux was. It has been updated through the update agent and showed no bugs. I tried to install PyXML with GnomeRPM and it plain doesn't work. It says (re-triple-checking) that PyXML requires /usr/local/bin/python; of course, python is in /usr/bin. This has been reproduced on two machines other than mine. Is there a workaround or patch for this? Any help will be highly appreciated. Richard Hebert hebertrich@operamail You can e-mail me directly if you need more details. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=418986&group_id=6473 From mbennett@ideaeng.com Thu Apr 26 00:43:06 2001 From: mbennett@ideaeng.com (Mark Bennett) Date: Wed, 25 Apr 2001 16:43:06 -0700 Subject: [XML-SIG] Exception handling with xml.dom.ext.reader.HtmlLib Message-ID: I was thrilled to see sample python scripts for parsing HTML. The library seems to handle lots of common mistakes like unbalanced tags, etc, things that most XML parsers will reject by design. By its nature, HTML is rarely in proper XML format. 
But I've hit a couple snags with the library and I was wondering if anybody had any ideas? * There are some classes of common HTML mistakes that it doesn't handle, like unbalanced quotes. As in , the second form gives a stack dump. * When it does crash it doesn't give you any information about the source file, like what line it was looking at. Such info would be helpful. * Though I don't know the exact cause, it doesn't handle pages like http://www.cnn.com I'm not a parsing expert, but I'd be happy to contribute to any efforts to make the parser more robust. Processing existing (poorly formed) HTML is the 800 pound gorilla for lots of XML applications. This library does go a long way. Thanks, Mark From Mike.Olson@fourthought.com Thu Apr 26 03:52:06 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 25 Apr 2001 20:52:06 -0600 Subject: [XML-SIG] 4Suite and 4SuiteServer 0.11 alpha announcement Message-ID: <3AE78D56.869210D@FourThought.com> Well, friends, thanks for all your patience. Finally, we are ready to release 4Suite and 4Suite Server 0.11, and this is to announce an alpha package of the product. The alpha test is available from 4Suite.org: no registration required. See http://4suite.org/download.html So why the looooong delay for this release? We have taken all the lessons learned from developing, deploying and using 4SS so far, huddled up for an intensive few weeks, and emerged with a complete re-factoring of the architecture. The most important change is that now the heart of 4SS, which is simply a set of Python APIs for managing and processing XML in a server-oriented framework, is directly exposed, rather than hidden through CORBA. This means that all the heavy requirements and complex installation are a thing of the past. You can still use CORBA and all that, but it is just another integration protocol that sits beside HTTP, SOAP, WebDAV, etc. It is not required. PostgresQL or other RDBMS is not required. 
In fact, all that is required now is Python and 4Suite. It's quick to install, and there is a great boost in performance. The architecture of the repository is now much more "Pythonic" and easy to grasp. And we have also spent a good deal of time on docs. Starting at http://4Suite.org/topics/rdf/Top/Documentation/GeneralProgrammersGuide you will find a comprehensive, step-by-step introduction to 4SS 0.11. For the now very simple installation, see http://stage.4Suite.org/documents/guides/4SuiteServer/UNIX_Installation (should be applicable to Windows users as well but we'll get the Windows installation docs back on line before final release). Note that the 4Suite 0.11 alpha is a prerequisite. Pretty much all the features of 4SS 0.10.2 will be in the 0.11 final release, and all but a few are fully implemented in this alpha. See http://4Suite.org/topics/rdf/Top/Documentation/features.html For a listing of features in this release. Please help us shake out this Alpha for bugs, or give 4SS a try if you've been shy of doing so because of its complexity. We have set a pretty firm final release date of 30 April (next Monday), so quick bug reports and suggestions would be ideal. Note that we have been testing it a great deal internally, and all our own 4SS apps have already been ported to and tested on the new architecture, so you needn't feel like a guinea pig. Thanks. The Fourthought guys -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. 
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Thu Apr 26 03:58:25 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Wed, 25 Apr 2001 20:58:25 -0600 Subject: [XML-SIG] Re: [4suite] 4Suite and 4SuiteServer 0.11 alpha announcement References: <3AE78D56.869210D@FourThought.com> Message-ID: <3AE78ED1.CE25170A@FourThought.com> Mike Olson wrote: > > http://4Suite.org/topics/rdf/Top/Documentation/GeneralProgrammersGuide Sorry everyone, I sent a bad link in the first message, here is the correct one. http://4suite.org/documents/guides/4SuiteServer/GeneralProgrammersGuide Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Juergen Hermann" Message-ID: On Tue, 24 Apr 2001 20:57:56 -0600, Uche Ogbuji wrote: >the way it is, and the former permissiveness of expat NS separators will need >to be restored for proper RDF support, which is important enough, IMO, to >warrant the sale of guns with bent muzzles. Maybe the bend-o-matic could get a safety switch? Ciao, Jürgen From martin@loewis.home.cs.tu-berlin.de Thu Apr 26 09:42:48 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 26 Apr 2001 10:42:48 +0200 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: (jh@web.de) References: Message-ID: <200104260842.f3Q8gmv01500@mira.informatik.hu-berlin.de> > Maybe the bend-o-matic could get a safety switch? Very difficult. We could add a warning, using the Python 2.1 warning framework, but that would probably not satisfy the users relying on the feature. So the best we can do is to document that an empty separator is allowed but not recommended. 
Regards, Martin From pyxml@xhaus.com Thu Apr 26 10:04:45 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Thu, 26 Apr 2001 10:04:45 +0100 Subject: [XML-SIG] Exception handling with xml.dom.ext.reader.HtmlLib In-Reply-To: Message-ID: Mark, I'm currently working on an application that needs to parse HTML, and have looked at htmllib as a way to do it. However, htmllib seems to only parse HTML 2.0, and as you have pointed out, is not very tolerant of the structural errors that typify a lot of HTML pages. One avenue I'm currently investigating is to use Dave Raggett's TIDY program, which takes a messy HTML file and outputs a cleaned up version, i.e. tags rebalanced, attributes quoted, etc, etc. It also has some support for XML and XHTML. While this support is not complete, it is very good. You can find tidy at http://www.w3.org/People/Raggett/tidy/ This program is written in C, so it should be possible to use it directly from Python. The documentation for Tidy mentions that someone has done a SWIG interface for it. There is also a Java version, which could be used from Jython fairly easily. I had a look at the code to see if it might be feasible to insert some hooks into it to turn it into a generator of SAX events, but the code is quite messy, and the printing/output works at a character and buffer level, not an element and attribute level. Alan. From eliot@isogen.com Thu Apr 26 21:39:28 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Thu, 26 Apr 2001 15:39:28 -0500 Subject: [XML-SIG] Processing of External Unparsed Entities Message-ID: <3AE88780.8A08A119@isogen.com> The code for processing external unparsed entities appears to be broken in the current code, both in DOM1 and DOM2 (I am currently using DOM1 to do development). Has anyone exercised this or tested it? Should it be working? I'm looking at the latest code off the SourceForge CVS tree. 
I've tried to trace through and get it to work, but I got as far as the unparsedEntityDecl() method of XmlDomGenerator (in Sax.py). It was being called with too many arguments, and the code as written didn't have a "name" attribute, which an entity declaration obviously must have. Fixing that, I then discovered that the DOM implementation object was an HTMLDOMImplementation, which doesn't seem right (but maybe it's my error)? Anyway, at that point, the call to createEntity() failed because HTMLDOMImplementation has no such method. I have only a dim understanding of how the Python DOM code works, but I'm happy to take a stab at fixing it as I must be able to process external data entities in order to implement HyTime address resolution in a DOM processing context. Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From eliot@isogen.com Thu Apr 26 23:06:40 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Thu, 26 Apr 2001 17:06:40 -0500 Subject: [XML-SIG] Processing of External Unparsed Entities References: <3AE88780.8A08A119@isogen.com> Message-ID: <3AE89BF0.E4B1D86F@isogen.com> "W. Eliot Kimber" wrote: > > The code for processing external unparsed entities appears to be broken > in the current code, both in DOM1 and DOM2 (I am currently using DOM1 to > do development). Has anyone exercised this or tested it? Should it be > working? I'm looking at the latest code off the SourceForge CVS tree. Ok, I got the entity creation to work--there were several things that clearly had never been tested because they couldn't have possibly ever worked--for shame. Can someone point me to info on how to run the regression tests? Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. 
| Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From eliot@isogen.com Fri Apr 27 02:16:18 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Thu, 26 Apr 2001 20:16:18 -0500 Subject: [XML-SIG] Processing of External Unparsed Entities References: <3AE88780.8A08A119@isogen.com> Message-ID: <3AE8C862.41F12F26@isogen.com> "W. Eliot Kimber" wrote: > > The code for processing external unparsed entities appears to be broken > in the current code, both in DOM1 and DOM2 (I am currently using DOM1 to > do development). Has anyone exercised this or tested it? Should it be > working? I'm looking at the latest code off the SourceForge CVS tree. I had to do more work to get all the entity and notation processing for unparsed entities working correctly with the Sax2 parser and drivers. (I haven't tried to fix the Sax 1 stuff because it doesn't correctly handle resolving relative system ID paths, so I can't use it anyway.) I also had to fix attribute lookup using getNamedItem on Element. I've touched the following files: dom/DOMImplementation.py dom/Document.py dom/DocumentType.py dom/Element.py dom/Entity.py dom/ext/reader/Sax2.py The problems I found were: - Orphaned unparsed entity and notation declarations were not getting properly set with the owning document. - Added createEntity to XmlDomGenerator (it was being called from Sax driver but hadn't been implemented) - Unparsed entities were not being constructed with a useful nodeName value. - lookup of attributes on NamedNodeList failed when using getAttribute() because in Sax2 the keys are (Uri, name) but the code was only passing in name. I hacked it by passing in (None, name) from getAttribute on Element. I have no idea if any of these fixes are appropriate, I just hacked it until my document parsed and my particular application got the data it wanted. Please let me know how I should proceed with testing these changes and contributing them back. 
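The attribute lookup problem described in the list above can be pictured with a plain dictionary standing in for the attribute map. This is only a sketch of the key mismatch, not the 4DOM code; the names attrs and get_attribute, and the sample attribute values, are hypothetical.

```python
# Stand-in for an attribute map built by a namespace-aware SAX2 reader:
# keys are (namespaceURI, localName) pairs, not bare names.
attrs = {
    (None, "id"): "n1",
    ("http://www.w3.org/1999/xlink", "href"): "doc2.xml",
}

def get_attribute(attr_map, name):
    """Look up a non-namespaced attribute by name (hypothetical helper)."""
    # Passing only `name` would miss, because no key is a bare string;
    # the workaround described above is to pass (None, name) instead.
    return attr_map.get((None, name), "")

print("id" in attrs)               # bare-name membership test
print(get_attribute(attrs, "id"))  # lookup through the (None, name) key
```

Note that this workaround only reaches attributes with no namespace; a namespaced attribute such as the xlink one in the sketch would still need its full (URI, name) key.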
Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From eliot@isogen.com Fri Apr 27 16:16:46 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 10:16:46 -0500 Subject: [XML-SIG] Another Sax2 Enhancement: dataSource Message-ID: <3AE98D5E.477205D1@isogen.com> I need to be able to get from a DOM the filename or URI of the file it was constructed from (if it was constructed from a file). I didn't see anything in the code that preserved this data, so I've hacked my local copy of the code to add an optional "dataSource" parameter to the FromFile/URI/Stream methods that propogates to the Sax reader. This is set as the friend property _dataSource on the resulting DOM node. Is this an appropriate solution? I need this because uparsed entity resolution does not preserve the absolute path of the entities, only the system ID as specified in the declaration. As I am using relative paths, I must have the path of the declaring document in order to be able to construct new DOMs from referenced XML document entities. Unfortunately, the DOM spec does not provide a property on Entity to hold the fully-resolved system ID so I think my approach is the only solution. My business problem is processing a hyperdocument consisting of many subordinate documents, where each document is declared as an unparsed entity with a relative path (this hyperdocument is generated by an automatic process and the intent is for the resulting package of documents to be self contained so that it can be packaged and moved without the need to rework any external identifiers or catalog files). Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . 
c o m From eliot@isogen.com Fri Apr 27 16:35:58 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 10:35:58 -0500 Subject: [XML-SIG] Question About PCDATA Content Message-ID: <3AE991DE.6C2B9F87@isogen.com> I have a question which may be a basic XML question, but I think is an implementation question: I have this data in my document: myname However, the text node that is the first child of the Element node constructed from the name element returns ' myname' (note the leading newline) as the value of the data property of the text node. Why is there that leading newline? It's not in the data anywhere and there are certainly no rules for adding newlines to element content or moving them around as there are in SGML. Is this an implementation bug or have I forgotten something essential about XML? Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From Alexandre.Fayolle@logilab.fr Fri Apr 27 16:49:25 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 27 Apr 2001 17:49:25 +0200 (CEST) Subject: [XML-SIG] Question About PCDATA Content In-Reply-To: <3AE991DE.6C2B9F87@isogen.com> Message-ID: On Fri, 27 Apr 2001, W. Eliot Kimber wrote: > I have a question which may be a basic XML question, but I think is an > implementation question: I cannot reproduce your problem using 4Suite 0.11 (and the version of 4DOM that goes with it) with python 1.52 on a linux machine: >>> s= 'myname' >>> from xml.dom.ext.reader import Sax2 >>> d = Sax2.FromXml(s) >>> d.documentElement.firstChild.data 'myname' Could you send a code sample and tell us what your platform is? Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). 
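For readers without 4DOM installed, the same check can be sketched with xml.dom.minidom from the standard library. It is a stand-in for the Sax2 reader used above, so results for 4DOM itself may differ. The element name "name" is taken from the question; the second document, with a line break inside the element, is an assumption added to show that a parser keeps whitespace that really is present in the source.

```python
from xml.dom import minidom

def first_text(xml_text):
    """Return the data of the first child of the document element."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.firstChild.data

# No whitespace in the source, so none should appear in the text node.
print(repr(first_text("<name>myname</name>")))

# A line break after the start tag is character data and is preserved.
print(repr(first_text("<name>\nmyname</name>")))
```

If the text node reports a leading newline, comparing it against the raw bytes of the source file is a quick way to tell a parser bug from whitespace that was in the document all along.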
From larsga@garshol.priv.no Fri Apr 27 16:50:29 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 27 Apr 2001 17:50:29 +0200 Subject: [XML-SIG] Question About PCDATA Content In-Reply-To: <3AE991DE.6C2B9F87@isogen.com> References: <3AE991DE.6C2B9F87@isogen.com> Message-ID: * W. Eliot Kimber | | I have a question which may be a basic XML question, but I think is an | implementation question: | | I have this data in my document: | | myname | | However, the text node that is the first child of the Element node | constructed from the name element returns | | ' | myname' This is a bug. There's nothing in XML that allows this to happen. --Lars M. From eliot@isogen.com Fri Apr 27 17:04:49 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 11:04:49 -0500 Subject: [XML-SIG] Question About PCDATA Content References: Message-ID: <3AE998A1.2B9941C5@isogen.com> Alexandre Fayolle wrote: > > On Fri, 27 Apr 2001, W. Eliot Kimber wrote: > > > I have a question which may be a basic XML question, but I think is an > > implementation question: > > I cannot reproduce your problem using 4Suite 0.11 (and the version of > 4DOM that goes with it) with python 1.52 on a linux machine: I'm running under Win2K, using the latest code from CVS as of yesterday noon CDT with Linux 1.5.2 Thanks, E. -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From Alexandre.Fayolle@logilab.fr Fri Apr 27 17:25:00 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 27 Apr 2001 18:25:00 +0200 (CEST) Subject: [XML-SIG] Question About PCDATA Content (fwd) Message-ID: I'm forwarding this to the list, since there may be people there who are interested. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). 
---------- Forwarded message ---------- Date: Fri, 27 Apr 2001 11:08:11 -0500 From: W. Eliot Kimber To: Alexandre Fayolle Subject: Re: [XML-SIG] Question About PCDATA Content Alexandre Fayolle wrote: > > Could you send a code sample and tell us what your platform is? Here is the document I'm processing: ]> {Not Available in CORBA API} #DEFAULT {Not Available in CORBA API} I get ' #DEFAULT' from the name element. Thanks, Eliot -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From martin@loewis.home.cs.tu-berlin.de Fri Apr 27 18:10:31 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 27 Apr 2001 19:10:31 +0200 Subject: [XML-SIG] Processing of External Unparsed Entities In-Reply-To: <3AE8C862.41F12F26@isogen.com> (eliot@isogen.com) References: <3AE88780.8A08A119@isogen.com> <3AE8C862.41F12F26@isogen.com> Message-ID: <200104271710.f3RHAV801062@mira.informatik.hu-berlin.de> > I have no idea if any of these fixes are appropriate, I just hacked it > until my document parsed and my particular application got the data it > wanted. > > Please let me know how I should proceed with testing these changes and > contributing them back. Hi Eliot, To run the test suite, run regrtest.py in PyXML/test, and test.py in test/dom. Ideally, you'd run them with and without your patches, since not all tests may pass on your system (due to bugs in the test suite, mostly). Please submit your patches to sourceforge.net/projects/pyxml. Use unified (-u) or context (-c) diffs, and copy your rationale for these changes into the initial comment. Thanks for contributing, Martin From martin@loewis.home.cs.tu-berlin.de Fri Apr 27 18:18:40 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 27 Apr 2001 19:18:40 +0200 Subject: [XML-SIG] Processing of External Unparsed Entities In-Reply-To: <3AE88780.8A08A119@isogen.com> (eliot@isogen.com) References: <3AE88780.8A08A119@isogen.com> Message-ID: <200104271718.f3RHIeG01213@mira.informatik.hu-berlin.de> > The code for processing external unparsed entities appears to be broken > in the current code, both in DOM1 and DOM2 (I am currently using DOM1 to > do development). Has anyone exercised this or tested it? Should it be > working? It somewhat depends on the parser that you use, whether the SAX readers report the proper events. I think it is fair to say that this has seen little or no testing. So, even if you cannot contribute code that fixes all the problems, contributing test cases is worthwhile. > Fixing that, I then discovered that the DOM implementation object was an > HTMLDOMImplementation, which doesn't seem right (but maybe it's my > error)? That is OK. In the DOM, an HTMLDOMImplementation is a DOMImplementation that also supports HTML. > Anyway, at that point, the call to createEntity() failed because > HTMLDOMImplementation has no such method. This is not surprising: There is no operation "createEntity" in the DOM... Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Apr 27 18:34:40 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 27 Apr 2001 19:34:40 +0200 Subject: [XML-SIG] Another Sax2 Enhancement: dataSource In-Reply-To: <3AE98D5E.477205D1@isogen.com> (eliot@isogen.com) References: <3AE98D5E.477205D1@isogen.com> Message-ID: <200104271734.f3RHYei01291@mira.informatik.hu-berlin.de> > I need to be able to get from a DOM the filename or URI of the file it > was constructed from (if it was constructed from a file). I didn't see > anything in the code that preserved this data, so I've hacked my local > copy of the code to add an optional "dataSource" parameter to the > FromFile/URI/Stream methods that propogates to the Sax reader. 
This is > set as the friend property _dataSource on the resulting DOM node. > > Is this an appropriate solution? I would not think so. Are you sure you need this on every node? That seems to be quite expensive for a rarely-used extension. If you really only put the parameter to FromFile into the tree, isn't putting it into the document sufficient? But then, aren't you interested in the data sources of the elements originating from an external parsed entity? > I need this because uparsed entity resolution does not preserve the > absolute path of the entities, only the system ID as specified in the > declaration. As I am using relative paths, I must have the path of the > declaring document in order to be able to construct new DOMs from > referenced XML document entities. I'm not sure I understand. Are you saying that the DOM requires to store the relative path in Entity::systemId? Where does the spec say so? > My business problem is processing a hyperdocument consisting of many > subordinate documents, where each document is declared as an unparsed > entity with a relative path (this hyperdocument is generated by an > automatic process and the intent is for the resulting package of > documents to be self contained so that it can be packaged and moved > without the need to rework any external identifiers or catalog files). To solve this problem, isn't it sufficient to carry the document's system ID along with the document, instead of putting it *into* the document? Regards, Martin From eliot@isogen.com Fri Apr 27 19:28:01 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 13:28:01 -0500 Subject: [XML-SIG] Another Sax2 Enhancement: dataSource References: <3AE98D5E.477205D1@isogen.com> <200104271734.f3RHYei01291@mira.informatik.hu-berlin.de> Message-ID: <3AE9BA31.201BF14A@isogen.com> "Martin v. Loewis" wrote: > > > I need to be able to get from a DOM the filename or URI of the file it > > was constructed from (if it was constructed from a file). 
I didn't see > > anything in the code that preserved this data, so I've hacked my local > > copy of the code to add an optional "dataSource" parameter to the > > FromFile/URI/Stream methods that propogates to the Sax reader. This is > > set as the friend property _dataSource on the resulting DOM node. > > > > Is this an appropriate solution? > > I would not think so. Are you sure you need this on every node? > That seems to be quite expensive for a rarely-used extension. It's only set on the document node, not every node in the DOM. I think my comment above was misleading. > If you really only put the parameter to FromFile into the tree, isn't > putting it into the document sufficient? > > But then, aren't you interested in the data sources of the elements > originating from an external parsed entity? Since I don't recognize the useful existence of external parsed entities, I would never need to know anything about them :-) Possibly, but since an external parsed entity would be relative to the document that declares it, having the document's path and the entity's relative path would be sufficient (as it is for unparsed entities). > > I need this because uparsed entity resolution does not preserve the > > absolute path of the entities, only the system ID as specified in the > > declaration. As I am using relative paths, I must have the path of the > > declaring document in order to be able to construct new DOMs from > > referenced XML document entities. > > I'm not sure I understand. Are you saying that the DOM requires to > store the relative path in Entity::systemId? Where does the spec say > so? I don't think it explicitly requires it, but I would be very upset if a DOM parser changed the value of the original system ID with no way to get it back. 
I would consider that to be agregious destruction of data--what if the system ID is a URN that I need to be able to interrogate after the entity is resolved or what if I want to rewrite the entity declarations as originally specified? Thus I would not, personally suggest that the system ID value be the fully-resolved location of the file. > > My business problem is processing a hyperdocument consisting of many > > subordinate documents, where each document is declared as an unparsed > > entity with a relative path (this hyperdocument is generated by an > > automatic process and the intent is for the resulting package of > > documents to be self contained so that it can be packaged and moved > > without the need to rework any external identifiers or catalog files). > > To solve this problem, isn't it sufficient to carry the document's > system ID along with the document, instead of putting it *into* the > document? I could, but then I'm requiring processing applications to do this, at some significant cost in complexity. For example, in my code, I may be receiving a node from any document in the hyperdocument. Thus I would have to maintain a global mapping from document nodes to system IDs. Doable, but why force all applications to do this when I should be able to just ask the DOM "where did you come from?" Given that the info is only stored on the document node, I think it's a relatively small cost. It was certainly easier for me to patch the DOM implementation then to implement my own dictionary (although not that much easier). But now nobody else has to think about it. Cheers, E. -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From martin@loewis.home.cs.tu-berlin.de Fri Apr 27 19:38:38 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 27 Apr 2001 20:38:38 +0200 Subject: [XML-SIG] Another Sax2 Enhancement: dataSource In-Reply-To: <3AE9BA31.201BF14A@isogen.com> (eliot@isogen.com) References: <3AE98D5E.477205D1@isogen.com> <200104271734.f3RHYei01291@mira.informatik.hu-berlin.de> <3AE9BA31.201BF14A@isogen.com> Message-ID: <200104271838.f3RIccF01642@mira.informatik.hu-berlin.de> > Given that the info is only stored on the document node, I think it's a > relatively small cost. It was certainly easier for me to patch the DOM > implementation then to implement my own dictionary (although not that > much easier). But now nobody else has to think about it. Even easier would be to put the attribute into the Document object without patching the DOM implementation: root = FromXmlStream(foo) root.dataSource = bar This is Python; you can modify any instance's attributes (unless the instance takes countermeasures). IMO this is as clean, but simpler, than extending the DOM implementation. Regards, Martin From eliot@isogen.com Fri Apr 27 19:53:26 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 13:53:26 -0500 Subject: [XML-SIG] Another Sax2 Enhancement: dataSource References: <3AE98D5E.477205D1@isogen.com> <200104271734.f3RHYei01291@mira.informatik.hu-berlin.de> <3AE9BA31.201BF14A@isogen.com> <200104271838.f3RIccF01642@mira.informatik.hu-berlin.de> Message-ID: <3AE9C026.B7286C09@isogen.com> "Martin v. Loewis" wrote: > > > Given that the info is only stored on the document node, I think it's a > > relatively small cost. It was certainly easier for me to patch the DOM > > implementation then to implement my own dictionary (although not that > > much easier). But now nobody else has to think about it. > > Even easier would be to put the attribute into the Document object > without patching the DOM implementation: > > root = FromXmlStream(foo) > root.dataSource = bar > > This is Python; you can modify any instance's attributes (unless the > instance takes countermeasures). 
You're right. I've been so brainwashed by our use of CORBA on our project that I forget that you can do that kind of thing. That would certainly satisfy the requirement I have. But I think that this is probably a requirement against the DOM spec itself--that is, the notion of preserving the data source for DOMs is, I think, a general requirement that all DOM implementations should satisfy. Cheers, E. -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From linudom@hotmail.com Fri Apr 27 21:20:34 2001 From: linudom@hotmail.com (Dom Linu) Date: Fri, 27 Apr 2001 20:20:34 -0000 Subject: [XML-SIG] Validating a DOM w/xmlproc? Message-ID: Hi, Quick question: I have been using xmlproc for validation with its XMLValidator parse_resource method and a filename to validate XML. Now I find myself holding a DOM, and need to validate a DOM I have in memory (as opposed to a file). What is the easiest way to have the DOM validated? Pointers to past postings, TFM that covers this, or really obnoxious flames are all welcome! 
Thanks, dl _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com From gstein@lyra.org Fri Apr 27 21:38:58 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 27 Apr 2001 13:38:58 -0700 Subject: [XML-SIG] expat namespace_separator in Python 2.1 In-Reply-To: <003001c0cd2f$80210ae0$6401a8c0@EHUD>; from jtauber@jtauber.com on Tue, Apr 24, 2001 at 10:29:00PM -0400 References: <002001c0ccc9$54e28cc0$0401a8c0@VAIO> <15077.36393.731033.157812@cj42289-a.reston1.va.home.com> <003601c0cd10$57870090$ed020a0a@bowstreet.com> <004801c0cd12$f87cae80$7cac1218@reston1.va.home.com> <001201c0cd1a$86429c50$6401a8c0@EHUD> <3AE620BB.9FB13F18@zolera.com> <002301c0cd29$caa4b900$6401a8c0@EHUD> <3AE63624.9C763305@zolera.com> <003001c0cd2f$80210ae0$6401a8c0@EHUD> Message-ID: <20010427133858.T1374@lyra.org> Not just RDF, but WebDAV (RFC 2518) also works that way. I *very* much agree with the concatenation mechanism. We also use it for constructing property URIs in WebDAV. So... we have two specs (an RFC and a REC) that use concatenation. IMO, that means that pyexpat should support it. Plus the backwards compat thing. Cheers, -g On Tue, Apr 24, 2001 at 10:29:00PM -0400, James Tauber wrote: > > Sorry I wasn't more clear. It is only RDF that works this way. It is unusual > but internally consistent. > > James > > > > You cut that part out of my email. It's RDF. > > > > Oh, it wasn't clear to me from your note. > > > > Thanks. > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Greg Stein, http://www.lyra.org/ From eliot@isogen.com Fri Apr 27 21:37:58 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 15:37:58 -0500 Subject: [XML-SIG] Optimizing getElementsByTagName() Message-ID: <3AE9D8A6.4635B154@isogen.com> In my processing I'm using getElementsByTagName() a lot.
I notice that in the current implementation it recalculates the subelement list every time you call it. An obvious optimization would be to cache lists by tagName once calculated. Is this an appropriate optimization? It would have some memory cost. Using getElementsByTagName() is very convenient in my business logic, but I could replace it with my own iteration over child nodes. But normally I would presume that these sorts of queries would be optimized by the underlying implementation. Here they are not. Any other thoughts on optimization for this method? I'll implement the cache approach and see how it works. Thanks, E. -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l . c o m From eliot@isogen.com Fri Apr 27 21:51:55 2001 From: eliot@isogen.com (W. Eliot Kimber) Date: Fri, 27 Apr 2001 15:51:55 -0500 Subject: [XML-SIG] Optimizing getElementsByTagName() References: <3AE9D8A6.4635B154@isogen.com> Message-ID: <3AE9DBEB.E3467B0D@isogen.com> "W. Eliot Kimber" wrote: > > In my processing I'm using getElementsByTagName() a lot. I notice that > in the current implementation it recalculates the subelement list every > time you call it. > > An obvious optimization would be to cache lists by tagName once > calculated. > > Is this an appropriate optimization? It would have some memory cost. I thought about it some more and realized that it's probably not worth the cost, especially when you factor in tracking mutations of the DOM, which would invalidate the caches. Even in my own code, I was constructing new lists much more than I was using cached lists. Sorry for the bother. Cheers, E. -- . . . . . . . . . . . . . . . . . . . . . . . . W. Eliot Kimber | Lead Brain 1016 La Posada Dr. | Suite 240 | Austin TX 78752 T 512.656.4139 | F 512.419.1860 | eliot@isogen.com w w w . d a t a c h a n n e l .
c o m From martin@loewis.home.cs.tu-berlin.de Fri Apr 27 22:22:03 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 27 Apr 2001 23:22:03 +0200 Subject: [XML-SIG] Another Sax2 Enhancement: dataSource In-Reply-To: <3AE9C026.B7286C09@isogen.com> (eliot@isogen.com) References: <3AE98D5E.477205D1@isogen.com> <200104271734.f3RHYei01291@mira.informatik.hu-berlin.de> <3AE9BA31.201BF14A@isogen.com> <200104271838.f3RIccF01642@mira.informatik.hu-berlin.de> <3AE9C026.B7286C09@isogen.com> Message-ID: <200104272122.f3RLM3602240@mira.informatik.hu-berlin.de> > But I think that this is probably a requirement against the DOM spec > itself--that is, the notion of preserving the data source for DOMs is, I > think, a general requirement that all DOM implementations should > satisfy. That may well be the case. So you might report an issue to www-dom@w3.org (I hope I got the address right). If such an attribute becomes official, a patch adding it to the Python DOM implementations would be certainly appreciated (somebody will probably implement it when nobody else does). Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Apr 27 22:33:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 27 Apr 2001 23:33:16 +0200 Subject: [XML-SIG] Validating a DOM w/xmlproc? In-Reply-To: (linudom@hotmail.com) References: Message-ID: <200104272133.f3RLXGC02360@mira.informatik.hu-berlin.de> > Quick question: I have been using xmlproc for validation with its > XMLValidator parse_resource method and a filename to validate XML. > Now I find myself holding a DOM, and need to validate a DOM I have > in memory (as opposed to a file). What is the easiest way to have > the DOM validated? I guess the easiest way is to linearise it, then parse it again. If it is 4DOM, you can use xml.dom.ext.Printer for the linearisation. 
Regards, Martin From uche.ogbuji@fourthought.com Sat Apr 28 00:57:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 27 Apr 2001 17:57:34 -0600 Subject: [XML-SIG] Possible bug in sgmlop (character entityrefs in attributes) Message-ID: <200104272357.f3RNvYv31305@borgia.local.dhcp.fourthought.com> Seems that sgmlop.c does not handle character entity refs in attributes. If the following HTML is parsed using 4DOM's HtmlLib reader, which uses Sgmlop.py, which uses sgmlop.c from PyXML, the handle_entityref call-back is not being invoked for the "&Auml;" in the attribute, although it is working for the one in regular CDATA. Character entities in attributes
Ä I don't know enough about sgmlop to easily sort this out myself. Any ideas? Thanks -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From stuff4gary@hotmail.com Sat Apr 28 01:20:18 2001 From: stuff4gary@hotmail.com (gary cor) Date: Sat, 28 Apr 2001 00:20:18 Subject: [XML-SIG] Fool-proof XML examples, additional tutorials? Message-ID: Hello all, I am very impressed with the SAX2 search utility which I found on the latest python howto, I am searching and building XML again and again!!! However, I seem to have hit a wall doing anything else but searching for XML bits in python and the DOM howto part is far too unambitious as well!!! Listening to you guys go on about projects sounds very promising but is also difficult to follow at times. Sure, if you make python XML foolproof then only fools would use it, but I think there is the danger of some intellectual snobbery as well!! A few more people need to contribute to some good tutorials? Play safe as there will always be the need for a wide variety of simple scripting methods for XML and python could rule there! Possibly, full XML applications at their limits are better supported in Java anyway? Gary _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. From rsalz@zolera.com Sat Apr 28 02:51:51 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 27 Apr 2001 21:51:51 -0400 Subject: [XML-SIG] Validating a DOM w/xmlproc? References: <200104272133.f3RLXGC02360@mira.informatik.hu-berlin.de> Message-ID: <3AEA2237.DA5F01BF@zolera.com> > I guess the easiest way is to linearise it, then parse it again. If it > is 4DOM, you can use xml.dom.ext.Printer for the linearisation.
interesting question: can you invoke DOM methods such that the resultant structure is invalid? Off the top of my head, I don't know. From rsalz@zolera.com Sat Apr 28 03:32:45 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 27 Apr 2001 22:32:45 -0400 Subject: [XML-SIG] Proposal for "xml.namespaces" Message-ID: <3AEA2BCD.D2742E18@zolera.com> Mike Olson suggested I bring this up. I'd like to create "namespaces.py" that has the URI strings for various namespaces. Something like class SOAP: ENV = "http://schemas.xmlsoap.org/soap/envelope/" ENC = "http://schemas.xmlsoap.org/soap/encoding/" class XSD: v1999 = "http://www.w3.org/1999/XMLSchema" v2000 = "http://www.w3.org/2000/10/XMLSchema" v2001 = "http://www.w3.org/2001/XMLSchema" ALL = [ v1999, v2000, v2001 ] etc. Should it be "ns.py"? That filename is somewhat terse, but being able to write things like xml.ns.XSD.v2001, xml.ns.DSIG.RSA, etc., seems like a good thing. /r$ From uche.ogbuji@fourthought.com Sat Apr 28 04:20:01 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 27 Apr 2001 21:20:01 -0600 Subject: [XML-SIG] Optimizing getElementsByTagName() In-Reply-To: Message from "W. Eliot Kimber" of "Fri, 27 Apr 2001 15:37:58 CDT." <3AE9D8A6.4635B154@isogen.com> Message-ID: <200104280320.f3S3K1F32244@borgia.local> > In my processing I'm using getElementsByTagName() a lot. I notice that > in the current implementation is recalculates the subelement list every > time you call it. This is just a naive way to satisfy at least a portion of the (silly, IMHO) liveness requirements of the DOM. > An obvious optimization would be to cache lists by tagName once > calculated. > > Is this an appropriate optimization? It would have some memory cost. I pretty much don't know how to *spell* getElementsbyTagName anymore. My optimization for this is to use XPath. I know it might not be suitable for your needs, but just a thought... 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 07:48:52 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Apr 2001 08:48:52 +0200 Subject: [XML-SIG] Removing the Wise directory Message-ID: <200104280648.f3S6mqx00941@mira.informatik.hu-berlin.de> Is anybody still interested in the Wise directory of PyXML? If not, I'd like to remove it from the package. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 07:33:15 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Apr 2001 08:33:15 +0200 Subject: [XML-SIG] Fool-proof XML examples, additional tutorials? In-Reply-To: (stuff4gary@hotmail.com) References: Message-ID: <200104280633.f3S6XF500890@mira.informatik.hu-berlin.de> > A few more people need to contribute to some good tutorials? Contributions of documentation is certainly welcome! Please be aware that this is a volunteer's project, though: People contribute what they have fun contributing; which is not necessarily what users want most. It turns out that only few people have fun writing documentation; this is why getting contributions of documentation is so hard, and happens rarely. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 07:47:13 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Apr 2001 08:47:13 +0200 Subject: [XML-SIG] Proposal for "xml.namespaces" In-Reply-To: <3AEA2BCD.D2742E18@zolera.com> (message from Rich Salz on Fri, 27 Apr 2001 22:32:45 -0400) References: <3AEA2BCD.D2742E18@zolera.com> Message-ID: <200104280647.f3S6lDt00938@mira.informatik.hu-berlin.de> > Should it be "ns.py"? 
That filename is somewhat terse, but being > able to write things like xml.ns.XSD.v2001, xml.ns.DSIG.RSA, etc., seems > like a good thing. I don't care much about that. What is wrong with defining symbolic names for these in your application? However, *if* there is an addition of such a module, I would require that a module documentation goes with it - or else nobody but the author of the module will ever make use of it. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 07:45:16 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Apr 2001 08:45:16 +0200 Subject: [XML-SIG] Validating a DOM w/xmlproc? In-Reply-To: <3AEA2237.DA5F01BF@zolera.com> (message from Rich Salz on Fri, 27 Apr 2001 21:51:51 -0400) References: <200104272133.f3RLXGC02360@mira.informatik.hu-berlin.de> <3AEA2237.DA5F01BF@zolera.com> Message-ID: <200104280645.f3S6jGf00936@mira.informatik.hu-berlin.de> > interesting question: can you invoke DOM methods such that the > resultant structure is invalid? Off the top of my head, I don't > know. Certainly. Suppose you have the DTD Still, it is certainly possible to do >>> d=imp.createDocument(None,"foo",None) >>> e=d.createElement("bar") >>> d.documentElement.appendChild(e) >>> d.toxml() '\n' How could the DOM implementation possibly know what DTD I had in mind? I could have provided a DocumentType node also, but which of these operations should then have failed, and how? Answer: the DOM does not care about validity. During modification operations, it might well happen that the document becomes invalid, e.g. when removing one node and inserting a different one elsewhere. If the DOM rejected such modifications, it would be useless. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 08:02:42 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 28 Apr 2001 09:02:42 +0200 Subject: [XML-SIG] Unicode literals in PyXML Message-ID: <200104280702.f3S72g401019@mira.informatik.hu-berlin.de> There are currently two files that use Unicode literals in PyXML; xml.utils.characters, and xml.schema.trex. Unfortunately, this means that distutils installation of PyXML fails under Python 1.5 - even though the rest of the package works fine with 1.5. There may be distutils tricks to avoid these problems, but I'm also looking for a way to change the sources so that they will install with Python 1.5 - getting an error if anybody uses them is acceptable. For xml.utils.characters, wrapping each literal with a unicode() call is probably safe; I'll best use UTF-16BE for these strings. As for xml.schema.trex, I'm not sure what the possibilities are. It *seems* like one could replace all Unicode literals with plain string literals, but I may be wrong. Regards, Martin From linudom@hotmail.com Sat Apr 28 08:32:55 2001 From: linudom@hotmail.com (Dom Linu) Date: Sat, 28 Apr 2001 07:32:55 -0000 Subject: [XML-SIG] Validating a DOM w/xmlproc? Message-ID:

Yep, in this case the DOM is assembled by parsing a flat file and creating Nodes using 4DOM. It is then handed off to another process that wishes to validate it with xmlproc-- I guess it will have to "reconstitute" it to a form xmlproc can deal with-- I've played with using the InputSourceFactory interface and the parser's set_inputsource_factory in xmlproc to return a custom file-like object-- but then I'd have to wrap the DOM up like a file-- probably about as much work as turning it back into text.

Thanks for the input!!

>From: "Martin v. Loewis"
>To: rsalz@zolera.com
>CC: linudom@hotmail.com, xml-sig@python.org
>Subject: Re: [XML-SIG] Validating a DOM w/xmlproc?
>Date: Sat, 28 Apr 2001 08:45:16 +0200
>
> > interesting question: can you invoke DOM methods such that the
> > resultant structure is invalid? Off the top of my head, I don't
> > know.
>
>Certainly. Suppose you have the DTD
>
>Still, it is certainly possible to do
>
> >>> d=imp.createDocument(None,"foo",None)
> >>> e=d.createElement("bar")
> >>> d.documentElement.appendChild(e)
> >>> d.toxml()
>'\n'
>
>How could the DOM implementation possibly know what DTD I had in mind?
>I could have provided a DocumentType node also, but which of these
>operations should then have failed, and how? Answer: the DOM does not
>care about validity.
>
>During modification operations, it might well happen that the document
>becomes invalid, eg. when removing one node and inserting a different
>one elsewhere. If the DOM would reject such modifications, it would be
>useless.
>
>Regards,
>Martin
>
>_______________________________________________
>XML-SIG maillist - XML-SIG@python.org
>http://mail.python.org/mailman/listinfo/xml-sig
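The "linearise it, then parse it again" suggestion from this thread can be sketched as follows. This is only a minimal illustration in current Python using the standard library's xml.dom.minidom, which is an assumption (the thread is about 4DOM and xmlproc). Note that minidom's parser checks well-formedness only; it does not validate against a DTD, so a validating parser such as xmlproc would have to be fed the serialized text instead.

```python
from xml.dom.minidom import getDOMImplementation, parseString

# Build a small document through DOM calls. Nothing here consults a DTD,
# so the DOM cannot know whether <bar> is allowed inside <foo>.
impl = getDOMImplementation()
doc = impl.createDocument(None, "foo", None)
doc.documentElement.appendChild(doc.createElement("bar"))

# "Linearise" the in-memory tree back to text ...
text = doc.toxml()

# ... and hand the text to a parser again. A validating parser would
# take the same string (or a file-like wrapper around it) at this point.
reparsed = parseString(text)
```

The design point is the one made above: validity is a property checked by a parser against a DTD, not something the DOM enforces while the tree is being modified.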


From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 08:43:25 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Apr 2001 09:43:25 +0200 Subject: [XML-SIG] Yet another 0.6.x release Message-ID: <200104280743.f3S7hPS01431@mira.informatik.hu-berlin.de> It appears that we have to make another release of PyXML 0.6, to provide patches that are required by 4Suite 0.11. To prepare this release, I have created a CVS branch "o6maint", starting from v065, which should include bug fixes only. IOW, all the files that have been added to PyXML since 0.6.5 will *not* appear in 0.6.6. Regards, Martin From rnd@onego.ru Sat Apr 28 10:52:37 2001 From: rnd@onego.ru (Roman Suzi) Date: Sat, 28 Apr 2001 13:52:37 +0400 (MSD) Subject: [XML-SIG] Fool-proof XML examples, additional tutorials? Message-ID: >Hello all, >I am very impressed with the SAX2 search utility which I found on the latest >python howto, I am searching and building XML again and again!!! However, I >seem to have hit a wall doing anything else but searching for XML bits in >python and the DOM howto part is far too unambitious as well!!! > >Listening to you guys go on about projects sounds very promising but is >also difficult to follow at times. Sure, if you make python XML foolproof >then only fools would use it I am not experienced in XML and have not used it for more than play, but I feel I need it in some places. XML is a data model, as strings are for unmarked text. But XML has more dimensions than string and more complex structure, allowing it to model much more complex things. (while XML code could be represented by strings). And look, string type has a - built in syntactic support - after decades of development, string operations are more or less determined (find, sub, strip, +, ...). After a while, XML will be no more complex to handle than strings or lists or arrays, when it will be clear what is the right way.
(Look at Numeric Python for examples of how versatile multidimensional arrays can be). For now, operations on XML are organized in several ways and that is fixed with all those thingies: DOM, SAX, XSLT, XPath, XQL, ... And XML work is (IMHO) more complex than SQL work (due to a less regular (richer) internal structure than that of relational tables.) So, for now we must experiment, and wait until X* becomes as foolproof as relational databases are today ;-) Do you remember pre-SQL DBs? FoxBase, for example? Roman. P.S. I am no specialist in XML, so probably I have some points wrong. But it's my opinion for now. From rsalz@zolera.com Sat Apr 28 13:15:09 2001 From: rsalz@zolera.com (Rich Salz) Date: Sat, 28 Apr 2001 08:15:09 -0400 Subject: [XML-SIG] Proposal for "xml.namespaces" References: <3AEA2BCD.D2742E18@zolera.com> <200104280647.f3S6lDt00938@mira.informatik.hu-berlin.de> Message-ID: <3AEAB44D.B84CD471@zolera.com> > I don't care much about that. What is wrong with defining symbolic > names for these in your application? For the same reason people put #define's in header files. > However, *if* there is an addition of such a module, I would require > that a module documentation goes with it Sure. What format would you like? /r$ From fdrake@acm.org Sat Apr 28 13:54:34 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 28 Apr 2001 08:54:34 -0400 (EDT) Subject: [XML-SIG] Proposal for "xml.namespaces" In-Reply-To: <3AEAB44D.B84CD471@zolera.com> References: <3AEA2BCD.D2742E18@zolera.com> <200104280647.f3S6lDt00938@mira.informatik.hu-berlin.de> <3AEAB44D.B84CD471@zolera.com> Message-ID: <15082.48522.843043.135280@cj42289-a.reston1.va.home.com> Rich Salz writes: > For the same reason people put #define's in header files. I think the idea is pretty good, but am unsure if it's better to put the namespace constants in a separate module or if the constants should be defined along with the API for the relevant classes. > Sure. What format would you like?
If you were to provide docstrings that include links to the specifications that define the namespaces, I'll be glad to create the formal documentation in LaTeX. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Sat Apr 28 15:38:31 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 28 Apr 2001 16:38:31 +0200 Subject: [XML-SIG] Proposal for "xml.namespaces" In-Reply-To: <3AEAB44D.B84CD471@zolera.com> (message from Rich Salz on Sat, 28 Apr 2001 08:15:09 -0400) References: <3AEA2BCD.D2742E18@zolera.com> <200104280647.f3S6lDt00938@mira.informatik.hu-berlin.de> <3AEAB44D.B84CD471@zolera.com> Message-ID: <200104281438.f3SEcVG03306@mira.informatik.hu-berlin.de> > > However, *if* there is an addition of such a module, I would require > > that a module documentation goes with it > > Sure. What format would you like? A Python module documentation file, in the format of the Python library documentation (i.e. TeX): http://python.sourceforge.net/maint-docs/doc/doc.html Regards, Martin From Mike.Olson@fourthought.com Sat Apr 28 20:18:31 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sat, 28 Apr 2001 13:18:31 -0600 Subject: [XML-SIG] 4Suite and 4Suite Server second alpha release Message-ID: <3AEB1787.4A21CB5A@FourThought.com> All, One more alpha release before we call it 0.11 final. Thank you all for the bug reports and comments. One note, we have removed PyXML from the 4Suite package. PyXML 0.6.5 is now required. Improvements since the last alpha: * Tested 4ODS with Dbm Driver * General Windows fixes for 4SuiteServer * Documentation improvements * Nasty bug in 4ss agent * More thorough test harnesses of 4Suite Server * Removed PyXML from 4Suite package * Small bug fixes here and there You can download the new packages at: http://4Suite.org/download.html Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc.
http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Sun Apr 29 02:45:20 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 28 Apr 2001 21:45:20 -0400 (EDT) Subject: [XML-SIG] Validating a DOM w/xmlproc? In-Reply-To: <3AEA2237.DA5F01BF@zolera.com> References: <200104272133.f3RLXGC02360@mira.informatik.hu-berlin.de> <3AEA2237.DA5F01BF@zolera.com> Message-ID: <15083.29232.579127.635176@cj42289-a.reston1.va.home.com> Rich Salz writes: > interesting question: can you invoke DOM methods such that the > resultant structure is invalid? Off the top of my head, I don't know. Absolutely! Whether or not any given change will do so depends on the change and the schema. Now, if we're talking about well-formedness, I think that's guaranteed to hold if the DOM implementation isn't seriously buggy. But a non-well-formed DOM tree just doesn't make conceptual sense -- that the document is a tree is just about all well-formedness requires! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From larsga@garshol.priv.no Sun Apr 29 11:47:51 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Apr 2001 12:47:51 +0200 Subject: [XML-SIG] Validating a DOM w/xmlproc? In-Reply-To: References: Message-ID: * Dom Linu | | Quick question: I have been using xmlproc for validation with its | XMLValidator parse_resource method and a filename to validate XML. | Now I find myself holding a DOM, and need to validate a DOM I have | in memory (as opposed to a file). What is the easiest way to have | the DOM validated? I've been wanting to modify xmlproc so that the validator can operate without the XML parser itself. They are already separated to some extent, but the validator is not yet up to only validating an event stream. Once that is done I would like to make a validating SAX filter. What you could then do would be to use a DOM2SAX event generator and pipe the events through the validator. 
This is not yet done, but it's something I hope that either I or someone else can do in the not too distant future. --Lars M. From rsalz@zolera.com Sun Apr 29 17:11:31 2001 From: rsalz@zolera.com (Rich Salz) Date: Sun, 29 Apr 2001 12:11:31 -0400 Subject: [XML-SIG] Validating a DOM w/xmlproc? References: <200104272133.f3RLXGC02360@mira.informatik.hu-berlin.de> <3AEA2237.DA5F01BF@zolera.com> <15083.29232.579127.635176@cj42289-a.reston1.va.home.com> Message-ID: <3AEC3D33.113AAA7D@zolera.com> > Absolutely! Whether or not any given change will do so depends on > the change and the schema. > Now, if we're talking about well-formedness Gaargh, that's what I meant. Sorry for the confusion. /r$ From donaldallingham@home.com Mon Apr 30 02:47:26 2001 From: donaldallingham@home.com (Don Allingham) Date: Sun, 29 Apr 2001 19:47:26 -0600 Subject: [XML-SIG] problems reading iso-8859-1 data Message-ID: <3AECC42E.1040802@home.com> I hope this is the correct spot to post this question. I have an XML file with iso-8859-1 encoding. The sax parser (expat) seems to be translating characters above 128 to two separate characters. For example "é" in the xml file is being interpreted as "Ã©" by the parser. (I'm running python 1.5.2 with PyXML 0.6.5) Am I missing something obvious? -- Don Allingham donaldallingham@home.com From fdrake@acm.org Mon Apr 30 03:05:59 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 29 Apr 2001 22:05:59 -0400 (EDT) Subject: [XML-SIG] problems reading iso-8859-1 data In-Reply-To: <3AECC42E.1040802@home.com> References: <3AECC42E.1040802@home.com> Message-ID: <15084.51335.966370.432959@cj42289-a.reston1.va.home.com> Don Allingham writes: > I have an XML file with iso-8859-1 encoding. The sax parser (expat) > seems to be translating characters above 128 to two separate characters. > For example "é" in the xml file is being interpreted as "Ã©" by the > parser. (I'm running python 1.5.2 with PyXML 0.6.5) > > Am I missing something obvious?
I'm not sure how obvious it is, but what you are missing is that the expat output under Python 1.5.2 will always be UTF-8 encoded. (Under more recent versions of Python, Unicode strings are provided by default, but UTF-8 can be requested if desired.) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From pyxml@xhaus.com Mon Apr 30 09:24:12 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Mon, 30 Apr 2001 09:24:12 +0100 Subject: [XML-SIG] problems reading iso-8859-1 data In-Reply-To: <3AECC42E.1040802@home.com> Message-ID: Don, Just a quick suggestion. > I have an XML file with iso-8859-1 encoding. The sax > parser (expat) seems to be translating characters above 128 > to two separate characters. > For example "é" in the xml file is being interpreted as > "Ã©" by the parser. > (I'm running python 1.5.2 with PyXML 0.6.5) > > Am I missing something obvious? Have you placed an encoding declaration at the top of your XML file, i.e. something along the lines of <?xml version="1.0" encoding="iso-8859-1"?> I'm parsing iso-8859-1 files containing such characters without problem. Just a suggestion. Alan. From jfk@informaticon.dk Mon Apr 30 10:41:50 2001 From: jfk@informaticon.dk (Jørgen Frøjk Kjærsgaard) Date: Mon, 30 Apr 2001 11:41:50 +0200 Subject: [XML-SIG] problems reading iso-8859-1 data References: Message-ID: <3AED335E.AE1EBE13@informaticon.dk> Alan Kennedy wrote: > > Don, > > Just a quick suggestion. > > > I have an XML file with iso-8859-1 encoding. The sax > > parser (expat) seems to be translating characters above 128 > > to two separate characters. > > For example "é" in the xml file is being interpreted as > > "Ã©" by the parser. > > (I'm running python 1.5.2 with PyXML 0.6.5) > > > > Am I missing something obvious? Expat always translates the parsed input to UTF-8 encoding. Python 2.0 handles this correctly but I'm not sure about Python 1.5.x as I've never used it for XML processing.
> Have you placed an encoding declaration at the top of your XML file, i.e. > something along the lines of > > <?xml version="1.0" encoding="iso-8859-1"?> This does not change the fact that Expat outputs UTF-8. However, if the Expat parser hasn't been told to use iso-8859-1 as default encoding, it will assume UTF-8 input unless you state the encoding in the input as above. /jfk -- Jørgen Frøjk Kjærsgaard, Systemkonsulent (Systems Consultant) Informaticon ApS * Web: www.informaticon.dk * Tlf: +45 8672 0093 Internet programmering * Systemudvikling på Linux, FreeBSD og PalmOS From donaldallingham@home.com Mon Apr 30 14:16:34 2001 From: donaldallingham@home.com (Don Allingham) Date: 30 Apr 2001 07:16:34 -0600 Subject: [XML-SIG] problems reading iso-8859-1 data In-Reply-To: <3AED335E.AE1EBE13@informaticon.dk> References: <3AED335E.AE1EBE13@informaticon.dk> Message-ID: <988636595.8711.5.camel@wallace> On 30 Apr 2001 11:41:50 +0200, Jørgen Frøjk Kjærsgaard wrote: > Alan Kennedy wrote: > > > > <?xml version="1.0" encoding="iso-8859-1"?> > > This does not change the fact that Expat outputs UTF-8. However, if the > Expat parser hasn't been told to use iso-8859-1 as default encoding, it > will assume UTF-8 input unless you state the encoding in the input as > above. > Yes, I have the encoding="iso-8859-1" set in the file. It sounds as if expat is converting the iso-8859-1 to UTF-8. Is there a way to convert back to iso-8859-1? Unfortunately, I'm stuck with 1.5.2. -- Don Allingham donaldallingham@home.com

On 30 Apr 2001 11:41:50 +0200, Jørgen Frøjk Kjærsgaard wrote:
> Alan Kennedy wrote:
> > 
> > <?xml version="1.0" encoding="iso-8859-1"?>
> 
> This does not change the fact that Expat outputs UTF-8. However, if the
> Expat parser hasn't been told to use iso-8859-1 as default encoding, it
> will assume UTF-8 input unless you state the encoding in the input as
> above.
>

Yes, I have encoding="iso-8859-1" set in the file. It sounds as if expat is converting the iso-8859-1 to UTF-8. Is there a way to convert back to iso-8859-1? Unfortunately, I'm stuck with 1.5.2.

--
Don Allingham
donaldallingham@home.com

From Alexandre.Fayolle@logilab.fr Mon Apr 30 14:52:38 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 30 Apr 2001 15:52:38 +0200 (CEST)
Subject: [XML-SIG] problems reading iso-8859-1 data
In-Reply-To: <988636595.8711.5.camel@wallace>
Message-ID:

On 30 Apr 2001, Don Allingham wrote:

> Yes, I have encoding="iso-8859-1" set in the file. It sounds as if
> expat is converting the iso-8859-1 to UTF-8. Is there a way to convert
> back to iso-8859-1? Unfortunately, I'm stuck with 1.5.2.

Try this :

---------------------------8<---------------------------------
from xml.unicode.utf8_iso import utf8_to_code, code_to_utf8
import cStringIO

def utf8_to_latin(s):
    # Convert a UTF-8 string to Latin-1, one character at a time.
    buff = cStringIO.StringIO()
    while s:
        try:
            head,s = utf8_to_code(1,s)
        except Exception,e:
            # On failure, print the traceback, skip one byte and carry on.
            from traceback import print_exc
            print_exc()
            head = ''
            s = s[1:]
        buff.write(head)
    ans = buff.getvalue()
    buff.close()
    return ans

def latin_to_utf8(s):
    # Convert a Latin-1 string to UTF-8, one character at a time.
    buff = cStringIO.StringIO()
    for c in s:
        try:
            cv = code_to_utf8(1,c)
        except Exception,e:
            # On failure, print the traceback and drop the character.
            from traceback import print_exc
            print_exc()
            cv = ''
        buff.write(cv)
    ans = buff.getvalue()
    buff.close()
    return ans
---------------------------------8<--------------------------

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From Alexandre.Fayolle@logilab.fr Mon Apr 30 15:04:56 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 30 Apr 2001 16:04:56 +0200 (CEST)
Subject: [XML-SIG] One more remark on latin/UTF-8 and XBEL
Message-ID:

I think there's a problem in the XBEL demos. With Python 1.5.2, the sgmlop
parser which is used for parsing netscape bookmarks will output latin1
strings which are passed 'as is' into

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

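The symptom reported in this thread (a single ISO-8859-1 byte for "é" coming back as two bytes) can be reproduced with a short sketch. It assumes a current Python 3 with the standard `xml.parsers.expat` module, not the Python 1.5.2 and PyXML 0.6.5 setup the posters were using, and the sample document is made up for illustration. In Python 3 the parser hands back Unicode strings, so the two-byte form only appears once the text is encoded as UTF-8.

```python
import xml.parsers.expat

# Hypothetical sample document: Latin-1 bytes with an explicit encoding declaration.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><name>caf\u00e9</name>'.encode("iso-8859-1")

chunks = []
parser = xml.parsers.expat.ParserCreate()
# Character data may arrive in several pieces, so collect and join them.
parser.CharacterDataHandler = chunks.append
parser.Parse(doc, True)

text = "".join(chunks)                      # Unicode str: 'café'
utf8_bytes = text.encode("utf-8")           # 'é' becomes the two bytes C3 A9
misread = utf8_bytes.decode("iso-8859-1")   # reading those bytes as Latin-1 gives 'Ã©'

print(repr(text), utf8_bytes, repr(misread))
```

Reading the UTF-8 bytes as if they were Latin-1 is what turns "é" into "Ã©", which matches the two characters described in the original question.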
From Alexandre.Fayolle@logilab.fr Mon Apr 30 15:12:06 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 30 Apr 2001 16:12:06 +0200 (CEST)
Subject: [XML-SIG] Wooops... (One more remark on latin/UTF-8 and XBEL)
In-Reply-To:
Message-ID:

Sorry, I hit the wrong key, sending the mail instead of canceling it
(^C and ^X are pretty close on my keyboard). Forget you even saw this mail
and delete it straight away...

Alexandre 'yes, I'm using Pine' Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From martin@loewis.home.cs.tu-berlin.de Mon Apr 30 19:39:55 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 30 Apr 2001 20:39:55 +0200
Subject: [XML-SIG] problems reading iso-8859-1 data
In-Reply-To: (message from Alexandre Fayolle on Mon, 30 Apr 2001 15:52:38 +0200 (CEST))
References:
Message-ID: <200104301839.f3UIdtL01407@mira.informatik.hu-berlin.de>

> > Yes, I have encoding="iso-8859-1" set in the file. It sounds as if
> > expat is converting the iso-8859-1 to UTF-8. Is there a way to convert
> > back to iso-8859-1? Unfortunately, I'm stuck with 1.5.2.
> >
> Try this :
>
> ---------------------------8<---------------------------------
> from xml.unicode.utf8_iso import utf8_to_code, code_to_utf8
> import cStringIO

Or, shorter yet:

from xml.unicode.iso8859 import wstring

def utf8_to_latin1(s):
    return wstring.decode("utf-8",s).encode("iso-8859-1")

def latin1_to_utf8(s):
    return wstring.decode("iso-8859-1",s).encode("utf-8")

Regards,
Martin
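
For readers who are not tied to Python 1.5.2, the same two helpers can be sketched with the codecs built into the language. This is a minimal sketch that assumes Python 3 and byte-string input. It does not use the PyXML `wstring` module from the message above, and the function names are only borrowed from that message for comparison.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 bytes."""
    # Raises UnicodeEncodeError if a character has no Latin-1 equivalent.
    return data.decode("utf-8").encode("iso-8859-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    # Every byte value is valid Latin-1, so the decode step cannot fail.
    return data.decode("iso-8859-1").encode("utf-8")


sample = b"caf\xc3\xa9"              # 'café' encoded as UTF-8
latin1 = utf8_to_latin1(sample)      # single byte E9 for 'é'
roundtrip = latin1_to_utf8(latin1)   # back to the original UTF-8 bytes
print(latin1, roundtrip)
```

Text containing characters outside Latin-1 makes `utf8_to_latin1` raise; passing `errors="replace"` to `encode` is one way to substitute a placeholder instead, at the cost of losing those characters.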