From ken@bitsko.slc.ut.us Fri Dec 1 00:09:07 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 30 Nov 2000 18:09:07 -0600
Subject: [XML-SIG] 0.5.1 and 0.6.2
In-Reply-To: Michael Sobolev's message of "Fri, 1 Dec 2000 00:22:42 +0300"
References: <20001201002242.A5950@transas.com>
Message-ID:

Michael Sobolev writes:

[Michael already pointed out he's using DOM, but I already had this
written in case anyone finds it useful.]

I have a SAX client that I've got working well with SAX1 and SAX2 so
far. Having Unicode strings caught me in one place, and I needed to
wrap it with str() to make it work in that context, but I have no
general tips if that goes wrong for anyone.

In my SAX handler module, in the file global scope, I do this:

    import sys
    if hasattr(sys, 'version_info'):
        isPy2 = 1
        import xml.sax
        from xml.sax.handler import feature_namespaces
        from xml.sax import SAXException
    else:
        isPy2 = 0
        from xml.sax import saxexts
        from xml.sax.saxlib import SAXException

When creating the parser I do this:

    if isPy2:
        self.parser = xml.sax.make_parser()
        self.parser.setFeature(feature_namespaces, 0)
        self.parser.setContentHandler(self)
    else:
        self.parser = saxexts.make_parser()
        self.parser.setDocumentHandler(self)

I'm parsing files, so later I do:

    if isPy2:
        self.parser.parse(file)
    else:
        self.parser.parseFile(file)

While working with attributes in startElement(), I do this to get a
list of attribute names to use as indexes into the attributes:

    if isPy2:
        att_names = atts.keys()
    else:
        att_names = []
        for ii in range(0, len(atts)):
            att_names.append(atts[ii])

And for characters(), I do this:

    def characters(self, ch, start=0, length=-1):
        if length == -1:
            # SAX2
            self.text = self.text + ch
        else:
            self.text = self.text + ch[start:start+length]

I do my own namespace processing (more for convenience than for
SAX1/SAX2 differences), so that makes start/endElement() usable for
both SAX1 and SAX2. Otherwise you'll need both start/endElement() and
start/endElementNS().
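Ken's dual-signature characters() and the SAX2 attribute-name lookup can be tried against the xml.sax package in today's standard library. The following is a minimal sketch, assuming Python 3; CompatHandler and the sample document are made up for illustration, and since no SAX1 parser ships any more, the (ch, start, length) branch can only be exercised by calling characters() directly.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class CompatHandler(ContentHandler):
    """Accumulates text and attribute names; accepts SAX1- and SAX2-style calls."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.text = ""
        self.att_names = []

    def startElement(self, name, atts):
        # A SAX2 Attributes object behaves like a read-only mapping.
        self.att_names.extend(atts.keys())

    def characters(self, ch, start=0, length=-1):
        if length == -1:
            # SAX2: ch holds exactly the new text
            self.text = self.text + ch
        else:
            # SAX1 style: take a slice of the shared buffer
            self.text = self.text + ch[start:start + length]


handler = CompatHandler()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, 0)  # plain startElement(), as in Ken's setup
parser.setContentHandler(handler)
parser.feed('<doc id="d1" lang="en">hello <b>world</b></doc>')
parser.close()

print(handler.text)       # hello world
print(handler.att_names)  # ['id', 'lang']
```

Calling characters("xxabcyy", 2, 3) on a fresh CompatHandler takes the SAX1 branch and appends "abc", so one method body serves both calling conventions.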
If you do use namespace processing, you need no special code in
startElement() (as above) because you know only startElement() will
be called from SAX1 and startElementNS() will be called from SAX2.

--
Ken

From calvin@cs.uni-sb.de Fri Dec 1 00:16:48 2000
From: calvin@cs.uni-sb.de (Bastian Kleineidam)
Date: Fri, 1 Dec 2000 01:16:48 +0100 (CET)
Subject: [XML-SIG] 0.5.1 and 0.6.2
In-Reply-To: <20001201002242.A5950@transas.com>
Message-ID:

> Can anybody give a hint on how to correctly write applications
> that may need to work with both versions of python-xml?

Make a compatibility layer with try: except: statements. I am using
this:

    #-----8<------
    try:
        try:
            # xml interface (DOM-2, SAX-2) as found in PyXML 0.6.2
            from xml.dom.ext.reader.Sax2 import Reader
            def _get_dom(filename):
                return Reader(validate=1).fromStream(open(filename))
        except ImportError:
            # xml interface (DOM-2, SAX-2) as found in PyXML 0.6.1
            from xml.dom.ext.reader.Sax2 import FromXmlFile
            def _get_dom(filename):
                return FromXmlFile(filename, validate=1)

        def get_dom(filename):
            # change dir to find DTD file
            import os
            olddir = os.getcwd()
            dirname = os.path.dirname(filename)
            if dirname:
                os.chdir(dirname)
            try:
                # after chdir(), only the base name is valid relative to cwd
                dom = _get_dom(os.path.basename(filename))
            finally:
                os.chdir(olddir)
            return dom

        def get_attr(attrs, name):
            if attrs.has_key(('', name)):
                return attrs[('', name)]._get_value()

        def get_dom_attrs(dom):
            return dom.documentElement._get_attributes()

        def get_node_attrs(node):
            return node._get_attributes()

        def get_node_name(node):
            return node._get_nodeName()

        def get_childnodes(node):
            return node._get_childNodes()

        def node_value(node):
            from xml.dom.Node import Node
            if node._get_nodeType() == Node.TEXT_NODE:
                return node._get_nodeValue()
            s = ""
            for n in node._get_childNodes():
                s = s + node_value(n)
            return s

    except ImportError:
        # xml interface (DOM-1, SAX-1) as found in PyXML 0.5.x
        from xml.sax import saxexts, saxutils
        from xml.dom.sax_builder import SaxBuilder
        _parser = saxexts.XMLValParserFactory.make_parser()
        _parser.setErrorHandler(saxutils.ErrorPrinter())

        def get_dom(filename):
            _dom_builder = SaxBuilder()
            _parser.setDocumentHandler(_dom_builder)
            _parser.parse(filename)
            _parser.reset()
            return _dom_builder.document

        def get_attr(attrs, name):
            if attrs.has_key(name):
                return attrs[name].get_value()

        def get_dom_attrs(dom):
            return dom.get_documentElement().get_attributes()

        def get_node_attrs(node):
            return node.get_attributes()

        def get_node_name(node):
            return node.get_name()

        def get_childnodes(node):
            return node.get_childNodes()

        def node_value(node):
            from xml.dom.core import TEXT_NODE
            if node.get_nodeType() == TEXT_NODE:
                return node.get_nodeValue()
            s = ""
            for n in node.get_childNodes():
                s = s + node_value(n)
            return s

    def get_node_attr(node, name):
        return get_attr(get_node_attrs(node), name)
    #---8<----

Bastian

From uche.ogbuji@fourthought.com Fri Dec 1 11:16:03 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 1 Dec 2000 04:16:03 -0700
Subject: [XML-SIG] ANN: 4Suite 0.10.0
Message-ID: <200012011116.EAA09752@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                          4Suite 0.10.0
                   ---------------------------
   Open source tools for standards-based XML, DOM, XPath, XSLT, RDF,
   XPointer, XLink and object-database development in Python
                       http://4Suite.org

4Suite is a collection of Python tools for XML processing and object
database management. An integrated packaging of several formerly
separately-distributed components: 4DOM, 4XPath and 4XSLT, 4RDF,
4ODS, 4XPointer, 4XLink and DbDOM.

News
----

* RDF: Added a driver based on shelve (DB/DBM)
* ODS: Added a driver based on anydbm
* Fix format-number support and implement in C
* Improve Unicode and other encoding support
* Documentation updates
* Many misc optimizations
* Many misc bug-fixes

More info and Obtaining 4Suite
------------------------------

Please see http://4Suite.org, from where you can download source,
Windows and Linux binaries.

4Suite is distributed under a license similar to that of the Apache
Web Server.
--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Fri Dec 1 11:19:34 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 1 Dec 2000 04:19:34 -0700
Subject: [XML-SIG] ANN: 4Suite Server 0.10.0
Message-ID: <200012011119.EAA09842@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                       4Suite Server 0.10.0
                   ----------------------------
   An open source XML data server based on open standards
   implemented using 4Suite and other tools
                http://FourThought.com/4SuiteServer
                        http://4Suite.org

4Suite Server is a platform for handling XML processing needs in
application development. It is an XML data repository with a
rules-based engine. It supports DOM access, XSLT transformation, XPath
and RDF-based indexing and query, XLink resolution and many other XML
services. It also supports other related services such as distributed
transactions and access control lists. It supports remote,
cross-platform and cross-language access through CORBA and other
request protocols to be added shortly.

4Suite Server is not designed to be a full-blown application server.
It provides highly-specialized services for XML processing that can be
used with other application servers.

4Suite Server is open-source and free to download. Priority support
and customization are available from Fourthought, Inc. For more
information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900.

The 4Suite Server home page is http://FourThought.com/4SuiteServer,
from where you can download the software itself or an executive
summary thereof, read usage scenarios and find other information.
From yang13@126.com Fri Dec 1 16:21:05 2000
From: yang13@126.com (Xiao Yang, 小杨)
Date: Sat, 2 Dec 2000 00:21:05 +0800
Subject: [XML-SIG] (no subject)
Message-ID:

Hello, XML-SIG!

Best regards,
Xiao Yang
yang13@126.com

From fdrake@acm.org Fri Dec 1 16:32:19 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 11:32:19 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200011261856.TAA00929@loewis.home.cs.tu-berlin.de>
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
 <14880.4503.540928.777303@cj42289-a.reston1.va.home.com>
 <200011261856.TAA00929@loewis.home.cs.tu-berlin.de>
Message-ID: <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
> Good, I'll add this to PyXML first, and then move it over to Python
> later. Please note that DOMException is already defined in
> xml.dom(.__init__) of PyXML, so it is merely a matter of adding the
> derived classes, and adding them in 4DOM.

Have you had time to work on this? Would you like me to take a look
at it? I'm not familiar with the 4DOM code, but would like to see the
exceptions defined and available from xml.dom soon.

I said:
> I'd also like to see the .nodeType values defined this way, and
> shared by the implementations.

and Martin responded:
> It's more difficult with those, since the spec says they are defined
> inside of the Node interface. We could deviate from the DOM spec in

Perhaps we should provide a Node class in xml.dom that defines just
those values, and implementations can inherit that or duplicate the
values in their own Node implementation. Nothing other than the
enumeration values should be defined in xml.dom.Node (except maybe a
docstring).

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From fdrake@acm.org Fri Dec 1 16:44:40 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 11:44:40 -0500 (EST)
Subject: [XML-SIG] minidom/pulldom connection
In-Reply-To: <200011232203.XAA01220@loewis.home.cs.tu-berlin.de>
References: <14876.11936.725389.726400@cj42289-a.reston1.va.home.com>
 <200011232203.XAA01220@loewis.home.cs.tu-berlin.de>
Message-ID: <14887.54648.796431.588740@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
> From the conformance point of view, minidom is *wrong* by not raising
> exceptions in appropriate places. However, I doubt anybody fixing this
> would start with pulldom.

I think this is mostly a minidom problem and not a pulldom issue.

[in response to my proposal to pass a Document factory to PullDOM:]
> I don't see the need to provide this kind of extensibility until
> somebody actually wants to implement an alternative minidom on top of
> pulldom. However, if this is added now, I'd agree with Mike that it
> would be better to support DOMImplementation objects in minidom.

I'll point out that if anyone should want to do this, they'll have to
hack pulldom to do it, and not be able to share their DOM
implementation until pulldom is updated at least in PyXML. I think
this should be done sooner rather than later.

I agree that a DOMImplementation would be better than some other
Document factory. My preliminary DOMImplementation code for minidom
is not correct (but works in context); I'll try and fix it this
weekend. pulldom will require some corresponding changes. (The
documentElement on created documents is supposed to already be
created, as well as the doctype. I'll write up some notes on what
I've found there for things that the recommendation doesn't seem to
say.)

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From uche.ogbuji@fourthought.com Fri Dec 1 20:47:04 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 01 Dec 2000 13:47:04 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from "Fred L. Drake, Jr."
  of "Fri, 01 Dec 2000 11:32:19 EST." <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
Message-ID: <200012012047.NAA10970@localhost.localdomain>

> I said:
> > I'd also like to see the .nodeType values defined this way, and
> > shared by the implementations.
>
> and Martin responded:
> > It's more difficult with those, since the spec says they are defined
> > inside of the Node interface. We could deviate from the DOM spec in
>
> Perhaps we should provide a Node class in xml.dom that defines just
> those values, and implementations can inherit that or duplicate the
> values in their own Node implementation. Nothing other than the
> enumeration values should be defined in xml.dom.Node (except maybe a
> docstring).

Well, this would interfere pretty badly with 4DOM. There is an
xml.dom.Node.py file in 4DOM and having a Node class in the __init__
would cause problems with the import.

What's wrong with

    from xml.dom.Node import Node

    n.nodeType == Node.ELEMENT_NODE

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From mss@transas.com Fri Dec 1 20:58:34 2000
From: mss@transas.com (Michael Sobolev)
Date: Fri, 1 Dec 2000 23:58:34 +0300
Subject: [XML-SIG] 0.5.1 and 0.6.2
In-Reply-To: <20001201002242.A5950@transas.com>; from mss@transas.com on Fri, Dec 01, 2000 at 12:22:42AM +0300
References: <20001201002242.A5950@transas.com>
Message-ID: <20001201235834.A31966@transas.com>

On Fri, Dec 01, 2000 at 12:22:42AM +0300, Michael Sobolev wrote:
> I have a small problem here. :)

Thank you all. I am going to try to implement some of the advice
given. :)

Regards,
--
Misha

From fdrake@acm.org Fri Dec 1 21:16:34 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 16:16:34 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012012047.NAA10970@localhost.localdomain>
References: <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
 <200012012047.NAA10970@localhost.localdomain>
Message-ID: <14888.5426.542871.817456@cj42289-a.reston1.va.home.com>

uche.ogbuji@fourthought.com writes:
> Well, this would interfere pretty badly with 4DOM. There is an
> xml.dom.Node.py file in 4DOM and having a Node class in the __init__ would
> cause problems with the import.

That sucks.

> What's wrong with
>
> from xml.dom.Node import Node
>
> n.nodeType == Node.ELEMENT_NODE

I was hoping for a nice simple way of sharing the values, and a common
place to pick them up. The latter is more important for client code, I
think.

If we have DOMException & friends as:

    xml.dom.DOMException
    xml.dom.DOMStringSizeError
    xml.dom.HierarchyRequestError
    ...
    xml.dom.DOMSTRING_SIZE_ERR
    ...

then it seems we also want to be able to access the .nodeType codes
according to the spec from the same location:

    xml.dom.Node
    xml.dom.Node.ELEMENT_NODE
    ...

I can live with the .nodeType values being directly in the
__init__.py, so we have:

    xml.dom.ELEMENT_NODE
    ...

That just means we can't provide a Node class in a common place that
provides the constants for *_NODE values. Not a huge problem, but not
as nice as I'd hoped for.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From akuchlin@mems-exchange.org Fri Dec 1 23:33:42 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 01 Dec 2000 18:33:42 -0500
Subject: [XML-SIG] Two minidom patches
Message-ID:

I've submitted two patches to minidom.py using the Python project's
patch manager. (Should such patches be submitted to the PyXML patch
manager, or the Python one?)
https://sourceforge.net/patch/?func=detailpatch&patch_id=102485&group_id=5470
[ Patch #102485 ] minidom.py: Check for legal children

https://sourceforge.net/patch/?func=detailpatch&patch_id=102492&group_id=5470
[ Patch #102492 ] minidom/pulldom: remove nodes already in the tree

Anyone want to review them?

--amk

From uche.ogbuji@fourthought.com Sat Dec 2 00:01:24 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 01 Dec 2000 17:01:24 -0700
Subject: [XML-SIG] XML 2000 anyone?
Message-ID: <200012020001.RAA11619@localhost.localdomain>

Just wanted to say that if any of you lot will be at XML 2000, do come
by Fourthought's booth (#900). We'd love to put some more faces to
names. We'll be demoing the soon-to-be-relaunched OpenTechnology.org,
which has been completely re-architected to run on top of 4Suite
Server.

Thanks.

--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From fdrake@acm.org Sat Dec 2 00:11:00 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 19:11:00 -0500 (EST)
Subject: [XML-SIG] Two minidom patches
In-Reply-To:
References:
Message-ID: <14888.15892.801277.516501@cj42289-a.reston1.va.home.com>

Andrew Kuchling writes:
> I've submitted two patches to minidom.py using the Python project's
> patch manager. (Should such patches be submitted to the PyXML patch
> manager, or the Python one?)
>
> https://sourceforge.net/patch/?func=detailpatch&patch_id=102485&group_id=5470
> [ Patch #102485 ] minidom.py: Check for legal children
>
> https://sourceforge.net/patch/?func=detailpatch&patch_id=102492&group_id=5470
> [ Patch #102492 ] minidom/pulldom: remove nodes already in the tree
>
> Anyone want to review them?

I'll be glad to take a look at them this weekend.
Did you check to see if they're compatible with the patch to
minidom/pulldom I have in the Python PM? If not, I'll integrate them if
they look good, and check them in if no one objects to the combined
patch.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From akuchlin@mems-exchange.org Sat Dec 2 00:19:12 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 1 Dec 2000 19:19:12 -0500
Subject: [XML-SIG] Two minidom patches
In-Reply-To: <14888.15892.801277.516501@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Dec 01, 2000 at 07:11:00PM -0500
References: <14888.15892.801277.516501@cj42289-a.reston1.va.home.com>
Message-ID: <20001201191912.B28955@kronos.cnri.reston.va.us>

On Fri, Dec 01, 2000 at 07:11:00PM -0500, Fred L. Drake, Jr. wrote:
> I'll be glad to take a look at them this weekend. Did you check to
> see if they're compatible with the patch to minidom/pulldom I have in
> the Python PM? If not, I'll integrate them if they look good, and

No. I can check if they collide and reconcile them if you like. I'm
most uncertain about the pulldom changes, so it's probably best to
look *very* carefully at those bits.

--amk

From martin@loewis.home.cs.tu-berlin.de Sat Dec 2 07:54:29 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 2 Dec 2000 08:54:29 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <14887.53907.908244.249743@cj42289-a.reston1.va.home.com> (fdrake@acm.org)
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
 <14880.4503.540928.777303@cj42289-a.reston1.va.home.com>
 <200011261856.TAA00929@loewis.home.cs.tu-berlin.de>
 <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
Message-ID: <200012020754.IAA00799@loewis.home.cs.tu-berlin.de>

> Have you had time to work on this? Would you like me to take a look
> at it? I'm not familiar with the 4DOM code, but would like to see the
> exceptions defined and available from xml.dom soon.

Please have a look at the current PyXML CVS.
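For readers comparing this thread with today's standard library: in the xml.dom package that later shipped with Python, the DOMException subclasses sit at module level with an integer code attribute (spelled with an Err suffix, e.g. HierarchyRequestErr, rather than the ...Error names floated in this thread), the error codes such as HIERARCHY_REQUEST_ERR are module-level constants, and xml.dom.Node is a class holding only the nodeType constants, which minidom's node classes inherit. The sketch below assumes a current Python 3 and uses minidom only because it is bundled; it illustrates the stdlib layout, not the PyXML CVS code Martin refers to, and the variable names are made up.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString("<root><child/></root>")
root = doc.documentElement

# nodeType codes are read from xml.dom.Node, a class that only holds constants.
print(root.nodeType == xml.dom.Node.ELEMENT_NODE)    # True

# minidom's node classes derive from xml.dom.Node, so isinstance() succeeds.
is_dom_node = isinstance(root, xml.dom.Node)
print(is_dom_node)                                   # True

# Specialized exceptions subclass xml.dom.DOMException and carry a .code.
caught = None
try:
    root.appendChild(doc)  # a Document node may not become a child of an Element
except xml.dom.DOMException as exc:
    caught = exc
print(type(caught).__name__)                         # HierarchyRequestErr
print(caught.code == xml.dom.HIERARCHY_REQUEST_ERR)  # True
```

Catching the DOMException base class while inspecting .code mirrors how client code can treat every DOM error uniformly and still tell them apart.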
To copy the code into the Python core, some work is probably necessary
on the exception message strings - unless you also want to copy
en_US.py.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Sat Dec 2 08:03:34 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 2 Dec 2000 09:03:34 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012012047.NAA10970@localhost.localdomain> (uche.ogbuji@fourthought.com)
References: <200012012047.NAA10970@localhost.localdomain>
Message-ID: <200012020803.JAA00847@loewis.home.cs.tu-berlin.de>

> Well, this would interfere pretty badly with 4DOM. There is an
> xml.dom.Node.py file in 4DOM and having a Node class in the __init__
> would cause problems with the import.

What exactly would those problems be?

> What's wrong with
>
> from xml.dom.Node import Node
>
> n.nodeType == Node.ELEMENT_NODE

The problem is that we'd expose xml.dom.Node as a public class as
defined in the DOM, giving the impression that it is the base of all
other DOM classes. Yet, when you do isinstance with a 4DOM object and
that xml.dom.Node, it will fail.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Sat Dec 2 08:05:04 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 2 Dec 2000 09:05:04 +0100
Subject: [XML-SIG] Two minidom patches
In-Reply-To: (message from Andrew Kuchling on Fri, 01 Dec 2000 18:33:42 -0500)
References:
Message-ID: <200012020805.JAA00877@loewis.home.cs.tu-berlin.de>

> Anyone want to review them?

I have just assigned them to me, and will take a look soon.

Regards,
Martin

From dieter@handshake.de Sun Dec 3 22:32:43 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 3 Dec 2000 23:32:43 +0100
Subject: [XML-SIG] 4XSLT: excessive time complexity (in stylesheet size)
Message-ID: <200012032232.XAA02858@lindm.dm>

I have tried to use 4XSLT to transform an XML/DocBook document using
Norman Walsh's stylesheets.
The stylesheet files were read in and parsed in about 1 to 2 minutes.
However, "stylesheet.setup" took about 15 CPU minutes before I
interrupted it. I repeated this twice. In both cases, the interrupt
was reported in the function "getChildNodeIndex". It was looking for
the child index of about the 600th child in the top-level child list
of 1000 elements.

Apparently, there is at least quadratic time complexity in the number
of children, and "getChildNodeIndex" seems to be largely responsible
for this behaviour.

Dieter

From Taylor.Johnd@emeryworld.com Mon Dec 4 04:31:52 2000
From: Taylor.Johnd@emeryworld.com (Taylor, John D MWA)
Date: Mon, 4 Dec 2000 04:31:52 -0000
Subject: [XML-SIG] PyXML-0.6.2 install
Message-ID:

Hi,

My name is John Taylor and I'm learning Python/XML via Sean McGrath's
truly great book. I just got permission to install Python on one of
our Solaris boxes, and ....

I just ran (after building PyXML-0.6.2) 'setup.py install' on my
Solaris machine and got the following:

    creating /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
    copying build/lib.solaris-2.5.1-sun4d-2.0/_xmlplus/utils/__init__.py -> /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
    copying build/lib.solaris-2.5.1-sun4d-2.0/_xmlplus/utils/iso8601.py -> /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
    copying build/lib.solaris-2.5.1-sun4d-2.0/_xmlplus/utils/qp_xml.py -> /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
    byte-compiling /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/__init__.py to __init__.pyc
    ld.so.1: python: fatal: relocation error: file python: symbol fseeko: referenced symbol not found

Funny thing is, each time I rerun the install, it gets to the next
file, then it dies (the next run it died on __checkversion__.pyc, and
the run after that it went into ./dom and died after Attr.pyc). If I
wanted to rerun this thing about 2000 times, I might just get all the
way through....
So I thought I'd check with you all, just in case someone had run into
this before.

Thanks in advance,

John Taylor
MQ-Series Support
BEST Consulting
Portland, OR 97210
(503)450-5984
taylor.johnd@emeryworld.com or jdta@uswest.net

From martin@loewis.home.cs.tu-berlin.de Mon Dec 4 08:34:32 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 4 Dec 2000 09:34:32 +0100
Subject: [XML-SIG] PyXML-0.6.2 install
In-Reply-To: (Taylor.Johnd@emeryworld.com)
References:
Message-ID: <200012040834.JAA00677@loewis.home.cs.tu-berlin.de>

> My name is John Taylor and I'm learning Python/XML via Sean McGrath's truly
> great book. I just got permission to install Python on one of our Solaris
> boxes, and ....

How exactly did you install Python? What compiler did you use, what
configure options did you give, and did you tell it to compile all C
modules as *shared* libraries? I recommend against the latter.

> ld.so.1: python: fatal: relocation error: file python: symbol fseeko:
> referenced symbol not found

That appears to be a problem with the Python installation; apparently
importing distutils.util.byte_compile (or running it) results in an
import of an external module which cannot be loaded. It then somehow
still manages to generate the pyc file (although that may be
corrupted), and goes to the next file.

In any case, you will probably have to fix the Python installation:
whatever the problem is, it will probably recur in another context
(other than installing PyXML).

Regards,
Martin

From matt@clondiag.com Mon Dec 4 09:24:22 2000
From: matt@clondiag.com (Matthias Kirst)
Date: Mon, 04 Dec 2000 10:24:22 +0100
Subject: [XML-SIG] ODBC-XML-Interface
Message-ID: <3A2B62C6.2DF381C1@clondiag.com>

Hi folks,

Is there any Python SAX2 driver available that parses databases via
the Python Database API?
Thanks,
Matthias, CLONDIAG

From Alexandre.Fayolle@logilab.fr Mon Dec 4 09:57:32 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 4 Dec 2000 10:57:32 +0100 (CET)
Subject: [XML-SIG] location of the PyXML package with python 2.0
Message-ID:

Quite a while ago, there was a discussion on how PyXML could avoid a
clash with the built-in xml package in Python 2.0 using some deep
import voodoo processing. I cannot recall what the outcome of the
discussion was.

In other words, to use PyXML with Python 2.0, is it necessary to use
"from _xmlplus.dom.ext.reader import Sax2", or can I safely write
"from xml.dom.ext.reader import Sax2" and assume the import voodoo
magick is performed when the xml package is imported?

Thanks for the support.

Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

From noreply@sourceforge.net Mon Dec 4 13:55:50 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 05:55:50 -0800
Subject: [XML-SIG] [Bug #124375] DbDom/4ODS bug : InitDomDb fails with Dbm backend
Message-ID: <200012041355.FAA30080@sf-web3.vaspecialprojects.com>

Bug #124375, was updated on 2000-Dec-04 05:55
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: ornicar
Assigned to : Nobody
Summary: DbDom/4ODS bug : InitDomDb fails with Dbm backend

Details: Maybe it's only a documentation issue; however, running
initDomDb gives the following stack trace:

    initDomDb test_dom_db
    Traceback (innermost last):
      File "/usr/bin/initDomDb", line 4, in ?
        from Ft.DbDom import initDomDb
      File "/usr/lib/python1.5/site-packages/Ft/DbDom/initDomDb.py", line 18, in ?
        from Ft.Ods.StorageManager import Adapters
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/__init__.py", line 15, in ?
        from Ft.Ods.StorageManager.Adapters import g_driverModule
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/__init__.py", line 25, in ?
        SetDriver(os.environ['FTODS_DB_DRIVER'])
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/__init__.py", line 21, in SetDriver
        g_driverModule = __import__("Ft.Ods.StorageManager.Adapters." + g_driverName, globals(), locals(), [g_driverName])
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/Dbm.py", line 21, in ?
        import DbmMappings, DbmHelper
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/DbmHelper.py", line 19, in ?
        from Ft.Lib import DbmDatabase
      File "/usr/lib/python1.5/site-packages/Ft/Lib/DbmDatabase.py", line 154, in ?
        Database()
      File "/usr/lib/python1.5/site-packages/Ft/Lib/DbmDatabase.py", line 93, in __init__
        os.makedirs(self._dbpath)
      File "/usr/lib/python1.5/os.py", line 114, in makedirs
        mkdir(name, mode)
    OSError: [Errno 2] No such file or directory: '/var/local/data/ftdatabase/'

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124375&group_id=6473

From noreply@sourceforge.net Mon Dec 4 14:38:24 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 06:38:24 -0800
Subject: [XML-SIG] [Bug #124380] DbDom: usage for initDomDb shows the wrong executable name
Message-ID: <200012041438.GAA02514@sf-web3.vaspecialprojects.com>

Bug #124380, was updated on 2000-Dec-04 06:38
Here is a current snapshot of the bug.
Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom: usage for initDomDb shows the wrong executable name

Details: Here's a patch:

    --- initDomDb.py Mon Dec 4 15:35:42 2000
    +++ /usr/lib/python1.5/site-packages/Ft/DbDom/initDomDb.py Mon Dec 4 15:36:48 2000
    @@ -20,8 +20,8 @@
     from Ft.Ods.Tools import _4odb_create
     from Ft.Ods.Parsers.Odl import OdlParse
    -usage = """initDbDom connString
    -  connString the strin to connect to the database with
    +usage = """initDomDb connString
    +  connString the string to connect to the database with
       odlFileLocation dom.odl, defaults to directory of this file
     """

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124380&group_id=6473

From noreply@sourceforge.net Mon Dec 4 14:48:39 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 06:48:39 -0800
Subject: [XML-SIG] [Bug #124382] xml.dom.ext.PyExpat.Reader is useless as is.
Message-ID: <200012041448.GAA32017@sf-web2.i.sourceforge.net>

Bug #124382, was updated on 2000-Dec-04 06:48
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : Nobody
Summary: xml.dom.ext.PyExpat.Reader is useless as is.

Details: With the conversion from 4Suite to Python-XML, PyExpat.Reader
has not been fully stripped of FourThought references. fromStream
still has two references to Ft.Lib code.

Also, PyExpat isn't imported; the code importing it was stripped
accidentally, I think.
Looking at 4Suite 0.10, the following code should be inserted before
line 24:

    try:
        #Python 2.0
        import pyexpat
    except ImportError:
        #Python 1.x with PyXML
        from xml.parsers import pyexpat

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124382&group_id=6473

From noreply@sourceforge.net Mon Dec 4 15:08:28 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 07:08:28 -0800
Subject: [XML-SIG] [Bug #124387] DbDom + Dbm fails create_test.py
Message-ID: <200012041508.HAA32454@sf-web2.i.sourceforge.net>

Bug #124387, was updated on 2000-Dec-04 07:08
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom + Dbm fails create_test.py

Details: I tried running
/usr/doc/4Suite-0.10.0/DbDom/test_suite/create_test.py using Dbm as a
database backend. It failed on the first commit() statement.

    export FT_DATABASE_DIR=/home/alf/DbDom
    export FTODS_DB_DRIVER=Dbm
    export ODS_TEST_DB=ods_test
    $ initDomDb ods_test
    $ python create_test.py
    Instance
    Node Type 1
    Prefix foo
    local name bar
    Namespace URI http://www.foo.com
    tag name foo:bar
    ownerDocument
    Traceback (innermost last):
      File "create_test.py", line 151, in ?
        test1()
      File "create_test.py", line 41, in test1
        tx.commit()
      File "/usr/lib/python1.5/site-packages/Ft/Ods/Transaction.py", line 91, in commit
        self.checkpoint()
      File "/usr/lib/python1.5/site-packages/Ft/Ods/Transaction.py", line 170, in checkpoint
        self.__storageManager.writeObject(o)
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/__init__.py", line 78, in writeObject
        self._dba.writeObject(o)
      File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/Dbm.py", line 317, in writeObject
        self._db.insertInto(tableName)[str(oid)] = o._4ods_getFullTuple()
      File "/usr/lib/python1.5/site-packages/Ft/Lib/DbmDatabase.py", line 140, in insertInto
        db = anydbm.open(table_file, WRITEABLE)
      File "/usr/lib/python1.5/anydbm.py", line 80, in open
        raise error, "need 'c' or 'n' flag to open new db"
    anydbm.error: need 'c' or 'n' flag to open new db

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124387&group_id=6473

From noreply@sourceforge.net Mon Dec 4 16:55:07 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 08:55:07 -0800
Subject: [XML-SIG] [Patch #102641] patch for bug #124387
Message-ID: <200012041655.IAA02336@sf-web2.i.sourceforge.net>

Patch #102641 has been updated.

Project: pyxml
Category: None
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: patch for bug #124387

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102641&group_id=6473

From fdrake@acm.org Mon Dec 4 18:12:30 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Dec 2000 13:12:30 -0500 (EST) Subject: [XML-SIG] location of the PyXML package with python 2.0 In-Reply-To: References: Message-ID: <14891.56974.204543.338570@cj42289-a.reston1.va.home.com> Alexandre Fayolle writes: > In other words, to use PyXML with python2.0, is it necessary to use "from > _xmlplus.dom.ext.reader import Sax2" or can I safely write "from > xml.dom.ext.reader import Sax2" and assume the import voodoo magick is > performed when the xml package is imported ? Use the latter. This will raise ImportError if PyXML is not installed. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik@effbot.org Mon Dec 4 19:07:58 2000 From: fredrik@effbot.org (Fredrik Lundh) Date: Mon, 4 Dec 2000 20:07:58 +0100 Subject: [XML-SIG] sax parser leaks memory? Message-ID: <001001c05e25$87e7cc10$3c6340d5@hagrid> on my windows box, this little script runs out of memory within 30 seconds or so...

    import xml.sax, xml.sax.handler

    class myHandler(xml.sax.handler.ContentHandler):
        def startElement(self, name, attrs):
            pass # print "START", name, attrs.items()
        def endElement(self, name):
            pass # print "END", name
        def characters(self, content):
            pass # print "DATA", content

    while 1:
        p = xml.sax.make_parser()
        p.setContentHandler(myHandler())
        p.feed("hello")
        p.close()
        del p

what am I doing wrong? or is this what I think it is... From fdrake@acm.org Mon Dec 4 19:19:34 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Dec 2000 14:19:34 -0500 (EST) Subject: [XML-SIG] confusability ... Message-ID: <14891.60998.848950.528003@cj42289-a.reston1.va.home.com> I've been poring over the DOM spec the last few days. Now, I'm confused. ;) When the recommendation refers to the "name" of a node, does it refer to the qualified name? From the text, I'd take it that I should be looking at "prefix:localName" when it says "name" -- is that correct? Or should I only be thinking of this as localName? Thanks! -Fred -- Fred L. Drake, Jr.
PythonLabs at Digital Creations From dieter@handshake.de Mon Dec 4 20:04:21 2000 From: dieter@handshake.de (Dieter Maurer) Date: Mon, 4 Dec 2000 21:04:21 +0100 (CET) Subject: [XML-SIG] PyXML-0.6.2 install In-Reply-To: References: Message-ID: <14891.63685.106639.569043@lindm.dm> Taylor, John D MWA writes: > /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/__init__.py to __init__.pyc > ld.so.1: python: fatal: relocation error: file python: symbol fseeko: > referenced symbol not found You have a Python compiled for large file support (>= Solaris 2.6). You try to run it on a system without "fseeko" in the standard library (< Solaris 2.6). Your options:

 * find a Python binary compiled for Solaris 2.5 or below

 * fetch the Python source and compile it yourself (is very easy)

 * upgrade your Solaris

Dieter From fredrik@effbot.org Mon Dec 4 20:46:53 2000 From: fredrik@effbot.org (Fredrik Lundh) Date: Mon, 4 Dec 2000 21:46:53 +0100 Subject: [XML-SIG] Re: sax parser leaks memory? Message-ID: <000501c05e33$573184e0$3c6340d5@hagrid> I wrote: > on my windows box, this little script runs out of memory > within 30 seconds or so... here's another example:

    from xml.parsers import expat
    while 1:
        p = expat.ParserCreate()

From martin@loewis.home.cs.tu-berlin.de Mon Dec 4 23:06:07 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 00:06:07 +0100 Subject: [XML-SIG] ODBC-XML-Interface In-Reply-To: <3A2B62C6.2DF381C1@clondiag.com> (message from Matthias Kirst on Mon, 04 Dec 2000 10:24:22 +0100) References: <3A2B62C6.2DF381C1@clondiag.com> Message-ID: <200012042306.AAA00767@loewis.home.cs.tu-berlin.de> > Is there any Python SAX2-Driver available that parses Databases via > Python Database API. I guess the answer to that question is "no"; I couldn't really tell what "parsing a database" would mean when it comes to XML files. An XML file is a byte sequence in some specific format, and a parser analyses its structure.
A database typically is a byte sequence (or several of them) in a totally different structure, and a DBMS is used to access the bytes. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Dec 4 23:04:01 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 00:04:01 +0100 Subject: [XML-SIG] location of the PyXML package with python 2.0 In-Reply-To: (message from Alexandre Fayolle on Mon, 4 Dec 2000 10:57:32 +0100 (CET)) References: Message-ID: <200012042304.AAA00766@loewis.home.cs.tu-berlin.de> > Quite a while ago, there was a discussion on how PyXML could avoid a > clash with the built-in xml package in python 2.0 using some deep > import voodoo processing. I cannot recall what the outcome of the > discussion was. The voodoo magic was applied: "import xml.something" will import PyXML if installed, and Python 2.0 xml otherwise. See Python's xml/__init__.py if you want to know how exactly this works - there isn't much magic behind it, really. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon Dec 4 23:15:19 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 00:15:19 +0100 Subject: [XML-SIG] PyXML-0.6.2 install In-Reply-To: <14891.63685.106639.569043@lindm.dm> (message from Dieter Maurer on Mon, 4 Dec 2000 21:04:21 +0100 (CET)) References: <14891.63685.106639.569043@lindm.dm> Message-ID: <200012042315.AAA00833@loewis.home.cs.tu-berlin.de> > You have a Python compiled for large file support (>= Solaris 2.6). > You try to run it on a system without "fseeko" in the standard > library (< Solaris 2.6). > > Your options: > > * find a Python binary compiled for Solaris 2.5 or below > > * fetch the Python source and compile it yourself > (is very easy) > > * upgrade your Solaris Thanks for this clear analysis (although I'd like confirmation from the original poster that this is indeed the problem). Perhaps you can post it on the Python 2.0 MoinMoin?
Regards, Martin From Reza Naima Mon Dec 4 23:40:56 2000 From: Reza Naima (Reza Naima) Date: Mon, 4 Dec 2000 15:40:56 -0800 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order Message-ID: <20001204154056.K25116@reza.net> I'm using PyXML to parse an XML document, modify it, and spit it back out. I'm having a lame problem. It seems as if this third-party software is not working properly, and I need to work around it. Their problem is that they have an element that looks like well, after I parse it, I will occasionally change the attribute1 to be 'true'. After generating the XML from the DOM, it prints it out like this : Now, the 3rd party software is broken and rather than looking for attribute1, it just assumes that attribute1 is the first attribute, and mistakenly reads it as 'false'. (it's actually attribute2 that it's reading). Now, there are two work-arounds... First off, would there be a way for me to guarantee that attribute1 is first on the list of attributes for that element? The other work-around is to get rid of attribute2 and attribute3. This works, but it seems as if PyXML looks at the DTD spec, notices that they are missing, and fills them in. So, I'd like to find a way to get PyXML to ignore the DTD. Is either of these options possible? I've started going through the source, but it's getting uglier and uglier. Thanks, Reza From martin@loewis.home.cs.tu-berlin.de Mon Dec 4 23:40:14 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 00:40:14 +0100 Subject: [XML-SIG] confusability ... In-Reply-To: <14891.60998.848950.528003@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <14891.60998.848950.528003@cj42289-a.reston1.va.home.com> Message-ID: <200012042340.AAA00966@loewis.home.cs.tu-berlin.de> > When the recommendation refers to the "name" of a node, does it > refer to the qualified name? From the text, I'd take it that I should
From the text, I'd take it that I should > be looking at "prefix:localName" when it says "name" -- is that > correct? Or should I only be thinking of this as localName? You mean, e.g. as the parameter tagElement to createElement? In that case, neither nor - think "namespace unaware". All of localName, prefix and namespaceURI will be None in the Element node being created. Or do you mean the description of the tagName attribute for Element? In that case, it would depend whether the Element was create through createElement or createElementNS - for either case, its content is well-defined. Apart from these two occurences, I can't find any phrase that resembles "name of a node" that isn't also qualified as, e.g. "local name of a node". In general, I believe the intent is that the tagName attribute is the string of tag as it appeared literally in the XML document (if the DOM tree was created through parsing). If you were looking at some other text, please tell us what that was? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Dec 5 00:04:47 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 01:04:47 +0100 Subject: [XML-SIG] Re: sax parser leaks memory? In-Reply-To: <000501c05e33$573184e0$3c6340d5@hagrid> (fredrik@effbot.org) References: <000501c05e33$573184e0$3c6340d5@hagrid> Message-ID: <200012050004.BAA01145@loewis.home.cs.tu-berlin.de> > > on my windows box, this little script runs out of memory > > within 30 seconds or so... > > here's another example: Thanks for the report. Here is a patch. Regards, Martin P.S. It seems like pyexpat also needs to be told about garbage collection... 
Index: pyexpat.c
===================================================================
RCS file: /cvsroot/pyxml/xml/extensions/pyexpat.c,v
retrieving revision 1.16
diff -u -r1.16 pyexpat.c
--- pyexpat.c  2000/11/02 04:57:40  1.16
+++ pyexpat.c  2000/12/05 00:00:33
@@ -680,6 +680,7 @@
     for (i=0; handler_info[i].name != NULL; i++) {
         Py_XDECREF(self->handlers[i]);
     }
+    free (self->handlers);
 #if PY_MAJOR_VERSION == 1 && PY_MINOR_VERSION < 6
     /* Code for versions before 1.6 */
     free(self);

From martin@loewis.home.cs.tu-berlin.de Tue Dec 5 00:11:50 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 01:11:50 +0100 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order In-Reply-To: <20001204154056.K25116@reza.net> (message from Reza Naima on Mon, 4 Dec 2000 15:40:56 -0800) References: <20001204154056.K25116@reza.net> Message-ID: <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> > Now, there are two work-arounds... First off, would there be a way for me > to guarantee that attribute1 is first on the list of attributes for that > element? That shouldn't be hard to achieve if you use the xml.dom.ext.Printer framework - just subclass the PrintVisitor (or the PrettyPrintVisitor) and replace the visitNameNodeMap method. That iterates over the attributes in the order they have in the dictionary; you could sort them (lexically) before that. > The other work-around is to get rid of attribute2 and attribute3. This > works, but it seems as if PyXML looks at the DTD spec, notices that they > are missing, and fills them in. So, I'd like to find a way to get > PyXML to ignore the DTD. I'm surprised it looks into the DTD. During parsing, you mean? Then you probably use xmlproc as the parser, which is validating. If you'd use pyexpat (or some other non-validating parser), it couldn't possibly use the DTD.
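Martin's first suggestion, controlling the attribute order at output time, can be sketched with the standard library's xml.dom.minidom instead of the 4DOM xml.dom.ext.Printer classes (which are not used here). The helper name start_tag and its first parameter are hypothetical. Rather than a purely lexical sort, it lets the caller name the attributes that must come first, since XML attaches no meaning to attribute order and only a broken consumer cares about it.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def start_tag(element, first=()):
    """Return the start tag of a minidom Element as a string.

    Hypothetical helper (not part of PyXML or 4DOM): attributes named in
    `first` are written first, in that order; all remaining attributes
    follow in lexical order.
    """
    names = [name for name in first if element.hasAttribute(name)]
    names += sorted(n for n in element.attributes.keys() if n not in names)
    parts = [element.tagName]
    for name in names:
        # quoteattr escapes the value and wraps it in suitable quotes
        parts.append("%s=%s" % (name, quoteattr(element.getAttribute(name))))
    return "<" + " ".join(parts) + ">"


# Example data, made up for illustration.
doc = minidom.parseString(
    '<elem attribute2="false" attribute3="x" attribute1="false"/>')
elem = doc.documentElement
elem.setAttribute("attribute1", "true")
print(start_tag(elem, first=["attribute1"]))
# <elem attribute1="true" attribute2="false" attribute3="x">
```

Only the start tag is produced here; a complete writer would also have to emit the children and the end tag, for example by recursing over element.childNodes.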
Regards, Martin From iron@mso.oz.net Tue Dec 5 00:27:44 2000 From: iron@mso.oz.net (Mike Orr) Date: Mon, 4 Dec 2000 16:27:44 -0800 Subject: [XML-SIG] ODBC-XML-Interface In-Reply-To: <200012042306.AAA00767@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Dec 05, 2000 at 12:06:07AM +0100 References: <3A2B62C6.2DF381C1@clondiag.com> <200012042306.AAA00767@loewis.home.cs.tu-berlin.de> Message-ID: <20001204162744.B2465@mso.oz.net> On Tue, Dec 05, 2000 at 12:06:07AM +0100, Martin v. Loewis wrote: > > Is there any Python SAX2-Driver available that parses Databases via > > Python Database API. > > I guess the answer to that question is "no"; I couldn't really tell > what "parsing a database" would mean when it comes to XML files. An > XML file is a byte sequence in some specific format, and a parser > analyses its structure. A database typically is a byte sequence (or > several of them) in a totally different structure, and an DBMS is used > to access the bytes. Dunno if this'll help, but just in case... I've been thinking for a while about XML's relationship to databases and evaluating its use as an "editing UI" for the (MySQL) databases. It would involve converting a database structure to XML and back, although not using the Database API for the XML part. My idea was to use a list (the rows) of dictionaries (each record) as the intermediate format and to make it "generic" for a variety of databases. In this case, one level of XML tags would correspond to the records, and the child level would be the fields. A parent level could then mean "tables", if that was desired. (And the script would then have to check referential integrity after the edit.) I've done a few prototype tests and am undecided whether to proceed at this point. It's hard to imagine how one would write a Database API driver for XML. XML has no native concept of "this is a record level" and "this is a field level"; the application or DTD has to infer this. 
XML just has an arbitrary nesting of tags. So for a database driver to extract

    SELECT name, phone FROM contact_manager.phone_list
    WHERE name LIKE "Mc%" ORDER BY name

from an XML file, the file would have to conform to a specific DTD; it couldn't be just any XML file. At that point, one wonders whether perhaps either XML or the Database API should be thrown out of this project, because the project belongs more naturally to one or the other. -- -Mike (Iron) Orr, iron@mso.oz.net (if mail problems: mso@jimpick.com) http://mso.oz.net/ English * Esperanto * Russkiy * Deutsch * Espan~ol From rsalz@caveosystems.com Tue Dec 5 00:53:43 2000 From: rsalz@caveosystems.com (Rich Salz) Date: Mon, 04 Dec 2000 19:53:43 -0500 Subject: [XML-SIG] ODBC-XML-Interface References: <3A2B62C6.2DF381C1@clondiag.com> <200012042306.AAA00767@loewis.home.cs.tu-berlin.de> <20001204162744.B2465@mso.oz.net> Message-ID: <3A2C3C97.49B16532@caveosystems.com> > I've been thinking for a while about XML's relationship to databases and > evaluating its use as an "editing UI" for the (MySQL) databases. It > would involve converting a database structure to XML and back, although > not using the Database API for the XML part. You might want to poke around microsoft.com and see how they're integrating xml, sqlserver, etc. Query the schema and write the DTD/schema on the fly. Replace SQL queries with xpath, etc. Parts are pretty cool. Should be some good ideas there. /r$ From Reza Naima Tue Dec 5 01:12:24 2000 From: Reza Naima (Reza Naima) Date: Mon, 4 Dec 2000 17:12:24 -0800 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order In-Reply-To: <200012050011.BAA01196@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Dec 05, 2000 at 01:11:50AM +0100 References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> Message-ID: <20001204171224.M25116@reza.net> On Tue, Dec 05, 2000 at 01:11:50AM +0100, Martin v. Loewis sent me this...
> > Now, there are two work-arounds... First off, would there be a way for me > > to guarantee that attribute1 is first on the list of attributes for that > > element? > > That shouldn't be hard to achieve if you use the xml.dom.ext.Printer > framework - just subclass the PrintVisitor (or the PrettyPrintVisitor) > and replace the visitNameNodeMap method. That iterates over the > attributes in the order they have in the dictionary; you could sort > them (lexically) before that. I'm lost here.. I can't find anything called PrintVisitor or PrettyPrintVisitor in any of the PyXML Code :

    reza@gooz:/usr/local/src/PyXML-0.5.4 > find . -type f -print | xargs grep -i NameNodeMap
    reza@gooz:/usr/local/src/PyXML-0.5.4 > find . -type f -print | xargs grep -i PrettyPrintVisitor
    reza@gooz:/usr/local/src/PyXML-0.5.4 > find . -type f -print | xargs grep -i PrintVisitor
    reza@gooz:/usr/local/src/PyXML-0.5.4 >

> > The other work-around is to get rid of attribute2 and attribute3. This > > works, but it seems as if PyXML looks at the DTD spec, notices that they > > are missing, and fills them in. So, I'd like to find a way to get > > PyXML to ignore the DTD. > > I'm surprised it looks into the DTD. During parsing, you mean? Then > you probably use xmlproc as the parser, which is validating. If you'd > use pyexpat (or some other non-validating parser), it couldn't > possibly use the DTD. I tried to specify pyexpat as the parser :

    -------------
    from xml.dom import core, utils
    import sys

    fr = utils.FileReader()
    path = sys.argv[1]
    file = open(path, 'r')
    document = fr.readXml(file, 'pyexpat')
    print document.toxml()
    ---------------

and I got this exception thrown :

    ---------------
    # /lc/bin/python /tmp/test.py /var/tmp/JUNIPER.xml
    Traceback (innermost last):
      File "/tmp/test.py", line 7, in ?
        document = fr.readXml(file, 'pyexpat')
      File "/lc/blackshadow/PyXML/xml/dom/utils.py", line 162, in readXml
        p = saxexts.make_parser(parserName)
      File "/lc/blackshadow/PyXML/xml/sax/saxexts.py", line 159, in make_parser
        return XMLParserFactory.make_parser(parser)
      File "/lc/blackshadow/PyXML/xml/sax/saxexts.py", line 65, in make_parser
        raise saxlib.SAXException("No parsers found",None)
    xml.sax.saxlib.SAXException: No parsers found
    -------------

Am I doing something wrong? Thanks, Reza From tpassin@home.com Tue Dec 5 01:22:24 2000 From: tpassin@home.com (Thomas B. Passin) Date: Mon, 4 Dec 2000 20:22:24 -0500 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> Message-ID: <004701c05e59$d2d441c0$7cac1218@reston1.va.home.com> Martin v. Loewis wrote - > > I'm surprised it looks into the DTD. During parsing, you mean? Then > you probably use xmlproc as the parser, which is validating. If you'd > use pyexpat (or some other non-validating parser), it couldn't > possibly use the DTD. > What, don't the default parsers read the internal subset and insert default values? I never tried it, but always assumed they did (it's allowed by the Rec for non-validating parsers). Tom P From tpassin@home.com Tue Dec 5 01:27:01 2000 From: tpassin@home.com (Thomas B. Passin) Date: Mon, 4 Dec 2000 20:27:01 -0500 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order References: <20001204154056.K25116@reza.net> Message-ID: <005101c05e5a$77c460c0$7cac1218@reston1.va.home.com> Reza Naima wrote - > ... > The other work-around is to get rid of attribute2 and attribute3. This > works, but it seems as if PyXML looks at the DTD spec, notices that they > are missing, and fills them in. So, I'd like to find a way to get > PyXML to ignore the DTD. > If you can change the DTD, you could make these attributes #IMPLIED without any default values.
Then the parser shouldn't be adding them. Martin's solution of sorting would only work if your "broken" 3rd party software wants to see alphabetical order. Fundamentally, XML attributes are never guaranteed to be in any particular order - basically, they are a set, not a list. Cheers, Tom P From Reza Naima Tue Dec 5 02:23:41 2000 From: Reza Naima (Reza Naima) Date: Mon, 4 Dec 2000 18:23:41 -0800 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order In-Reply-To: <005101c05e5a$77c460c0$7cac1218@reston1.va.home.com>; from tpassin@home.com on Mon, Dec 04, 2000 at 08:27:01PM -0500 References: <20001204154056.K25116@reza.net> <005101c05e5a$77c460c0$7cac1218@reston1.va.home.com> Message-ID: <20001204182341.O25116@reza.net> Alas, I don't want to touch the DTD as it will break the 3rd party software. -r On Mon, Dec 04, 2000 at 08:27:01PM -0500, Thomas B. Passin sent me this... > Reza Naima wrote - > > > ... > > The other work-around is to get rid of attribute2 and attribute3. This > > works, but it seems as if PyXML looks at the DTD spec, notices that they > > are missing, and fills them in. So, I'd like to find a way to get > > PyXML to ignore the DTD. > > > > If you can change the DTD, you could make these attributes #IMPLIED without > any default values. Then the parser shouldn't be adding them. > > Martin's solution of sorting would only work if your "broken" 3rd party > software wants to see alphabetical order. Fundamentally, XML attributes are > never guaranteed to be in any particular order - basically, they are a set, > not a list. From martin@loewis.home.cs.tu-berlin.de Tue Dec 5 08:28:15 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Tue, 5 Dec 2000 09:28:15 +0100 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order In-Reply-To: <20001204171224.M25116@reza.net> (message from Reza Naima on Mon, 4 Dec 2000 17:12:24 -0800) References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> <20001204171224.M25116@reza.net> Message-ID: <200012050828.JAA00752@loewis.home.cs.tu-berlin.de> > I'm lost here.. I can't find anything called PrintVisitor or > PrettyPrintVisitor in any of the PyXML Code : Yes, that's part of 4DOM, which only appears in PyXML 0.6. > Am I doing something wrong? Probably, although I can't tell what it is - I don't know the signature of readXml. Regards, Martin From Alexandre.Fayolle@logilab.fr Tue Dec 5 08:37:04 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 5 Dec 2000 09:37:04 +0100 (CET) Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order In-Reply-To: <20001204171224.M25116@reza.net> Message-ID: On Mon, 4 Dec 2000, Reza Naima wrote: > I'm lost here.. I can't find anything called PrintVisitor or > PrettyPrintVisitor in any of the PyXML Code : Try upgrading to the latest release of PyXML (0.6.2 if I'm not mistaken). This might require changing some code since the DOM implementation has changed. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From martin@loewis.home.cs.tu-berlin.de Tue Dec 5 08:53:26 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 5 Dec 2000 09:53:26 +0100 Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order In-Reply-To: <004701c05e59$d2d441c0$7cac1218@reston1.va.home.com> (tpassin@home.com) References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> <004701c05e59$d2d441c0$7cac1218@reston1.va.home.com> Message-ID: <200012050853.JAA00940@loewis.home.cs.tu-berlin.de> > What, don't the default parsers read the internal subset and insert > default values? I never tried it, but always assumed they did (it's > allowed by the Rec for non-validating parsers). Indeed, at least pyexpat does. I was assuming there is an external subset in the original poster's problem; it makes more sense to assume that it was the internal subset. In that case, I don't see a way to stop the parser from filling in the default values. Regards, Martin From paul@prescod.net Tue Dec 5 08:58:58 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 05 Dec 2000 03:58:58 -0500 Subject: [XML-SIG] Specializing DOM exceptions References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> Message-ID: <3A2CAE52.999C4EE5@prescod.net> Sorry for being long-delayed in writing this. "Martin v. Loewis" wrote: > > I'd like to propose an enhancement to the DOM exception classes, > namely that different codes are mapped to different subclasses:
>
> class IndexSizeErr(DOMException):
>     code = INDEX_SIZE_ERR
> Also, I'd like to make DOMException, the code constants, and the > derived classes part of the official Python API, so all DOM > implementations use the same set of exceptions. My concern is that Python already has an IndexError and it is raised "naturally" (and efficiently) in a lot of places in minidom. At one point we had talked about formalizing a mechanism where Python exceptions stand for DOM exceptions. So IndexSizeErr could be a subclass of Python's IndexError. Python "clients" could check for IndexError as they would in any other Python code.
Those that want to treat the DOM stuff specially could do so. This would all be part of the Python-DOM mapping. Paul Prescod From noreply@sourceforge.net Tue Dec 5 10:36:03 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 5 Dec 2000 02:36:03 -0800 Subject: [XML-SIG] [Bug #124521] 4ODS : transaction.begin() throws unexpected exception Message-ID: <200012051036.CAA22000@sf-web2.i.sourceforge.net> Bug #124521, was updated on 2000-Dec-05 02:36 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: 4ODS : transaction.begin() throws unexpected exception Details: When calling begin() on a transaction that has just been committed, a TransactionInProgress exception is raised. My reading of the ODMG C++ and Java bindings (p 179 and 252) is that this should not occur. I'm using 4Suite 0.10.0.

    >>> from Ft.DbDom import Dom
    >>> from Ft.Ods import Database
    >>> from xml.dom import ext
    >>> import sys, os
    >>> DBNAME=os.environ.get("ODS_TEST_DB","ods:test")
    >>> db = Database.Database()
    >>> db.open(DBNAME)
    >>> tx = db.new()
    >>> tx.begin()
    >>> from Ft.DbDom import Reader
    >>> r = Reader.Reader()
    >>> f = open('/home/alf/memory.xml') # or some other file
    >>> doc = r.fromStream(f)
    >>> db.bind(doc,'memory')
    >>> tx.commit()
    >>> tx.begin()
    Traceback (innermost last):
      File "", line 1, in ?
      File "/usr/lib/python1.5/site-packages/Ft/Ods/Transaction.py", line 54, in begin
        raise TransactionInProgress()
    Ft.Ods.Transaction.TransactionInProgress:

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124521&group_id=6473 From larsga@garshol.priv.no Tue Dec 5 11:44:29 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 05 Dec 2000 12:44:29 +0100 Subject: [XML-SIG] sax parser leaks memory?
In-Reply-To: <001001c05e25$87e7cc10$3c6340d5@hagrid> References: <001001c05e25$87e7cc10$3c6340d5@hagrid> Message-ID: * Fredrik Lundh | | on my windows box, this little script runs out of memory | within 30 seconds or so... There is nothing wrong with the script, so there must be a memory leak somewhere. I did a similar test where I used pyexpat directly:

    import pyexpat
    while 1:
        p = pyexpat.ParserCreate()
        p.Parse("This is a little document", 1)
        del p

and that also leaked memory. (Incidentally, it crashed my Win98 box so hard I had to physically turn it off and back on again.) So apparently the leak is in pyexpat somewhere. I tried running Plumbo on your application, but it couldn't find any cycles. --Lars M. From noreply@sourceforge.net Tue Dec 5 12:20:50 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 5 Dec 2000 04:20:50 -0800 Subject: [XML-SIG] [Bug #124529] DbDom : Dom.py uses DOMError which is not declared Message-ID: <200012051220.EAA23700@sf-web2.i.sourceforge.net> Bug #124529, was updated on 2000-Dec-05 04:20 Here is a current snapshot of the bug.
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: DbDom : Dom.py uses DOMError which is not declared Details: Here's a patch:

--- /home/alf/4Suite-0.10/DbDom/Dom.py  Fri Nov 17 00:05:37 2000
+++ /usr/lib/python1.5/site-packages/Ft/DbDom/Dom.py  Tue Dec 5 12:50:13 2000
@@ -24,6 +24,14 @@
 from Ft.DbDom import Comment
 from Ft.DbDom import ProcessingInstruction
+from xml.dom import DOMException
+from xml.dom import INDEX_SIZE_ERR,DOMSTRING_SIZE_ERR,HIERARCHY_REQUEST_ERR
+from xml.dom import WRONG_DOCUMENT_ERR,INVALID_CHARACTER_ERR,NO_DATA_ALLOWED_ERR
+from xml.dom import NO_MODIFICATION_ALLOWED_ERR,NOT_FOUND_ERR,NOT_SUPPORTED_ERR
+from xml.dom import INUSE_ATTRIBUTE_ERR,INVALID_STATE_ERR,SYNTAX_ERR
+from xml.dom import INVALID_MODIFICATION_ERR,NAMESPACE_ERR,INVALID_ACCESS_ERR
+
+
 from Ft.Ods.Collections import LiteralListOfObjects

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124529&group_id=6473 From noreply@sourceforge.net Tue Dec 5 12:36:42 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 5 Dec 2000 04:36:42 -0800 Subject: [XML-SIG] [Bug #124531] DbDom : reader fails when passed an owner document Message-ID: <200012051236.EAA23968@sf-web2.i.sourceforge.net> Bug #124531, was updated on 2000-Dec-05 04:36 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: DbDom : reader fails when passed an owner document Details: The version is 4Suite 0.10.0 with the patches I posted today applied.
Here's a sample test session:

    >>> DBNAME = 'ods:alf@orion:5432:dom_test'
    >>> from Ft.DbDom import Dom
    >>> from Ft.Ods import Database
    >>> from xml.dom import ext
    >>> import sys, os
    >>> db = Database.Database()
    >>> db.open(DBNAME)
    >>> tx = db.new()
    >>> tx.begin()
    >>> doc = Dom.DocumentImp()
    >>> e = doc.createElementNS('','root')
    >>> doc.appendChild(e)
    >>> fragment = ''
    >>> from Ft.DbDom import Reader
    >>> r = Reader.Reader()
    >>> r.fromString(fragment,doc)
    Traceback (innermost last):
      File "", line 1, in ?
      File "/usr/lib/python1.5/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString
        rt = self.fromStream(stream, ownerDoc)
      File "/usr/lib/python1.5/site-packages/Ft/DbDom/Reader.py", line 27, in fromStream
        Sax2.Reader.fromStream(self,stream,ownerDocument=ownerDocument)
      File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 267, in fromStream
        self.parser.parseFile(stream)
      File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 68, in parseFile
        if self.parser.Parse(buf, 0) != 1:
      File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 49, in endElement
        self.doc_handler.endElement(name)
      File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 170, in endElement
        self._nodeStack[-1].appendChild(new_element)
      File "/usr/lib/python1.5/site-packages/Ft/DbDom/Dom.py", line 336, in appendChild
        raise DOMException(HIERARCHY_REQUEST_ERR)
    xml.dom.DOMException: DOM Error Code 3: Node manipulation results in invalid parent/child relationship.

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124531&group_id=6473 From noreply@sourceforge.net Tue Dec 5 12:59:08 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 5 Dec 2000 04:59:08 -0800 Subject: [XML-SIG] [Patch #102658] patch for bugs #124529 and #124531 Message-ID: <200012051259.EAA00548@sf-web3.vaspecialprojects.com> Patch #102658 has been updated.
Project: pyxml Category: None Status: Open Submitted by: AFayolle Assigned to : Nobody Summary: patch for bugs #124529 and #124531 ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102658&group_id=6473 From martin@loewis.home.cs.tu-berlin.de Tue Dec 5 22:01:05 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Dec 2000 23:01:05 +0100 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <3A2CAE52.999C4EE5@prescod.net> (message from Paul Prescod on Tue, 05 Dec 2000 03:58:58 -0500) References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> Message-ID: <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> > My concern is that Python already has an IndexError and it is raised > "naturally" (and efficiently) in a lot of places in minidom. At one > point we had talked about formalizing a mechanism where Python > exceptions stand for DOM exceptions. > > So IndexSizeErr could be a subclass of Python's IndexError. Python > "clients" could check for IndexError as they would in any other Python > code. Those that want to treat the DOM stuff specially could do so. This > would all be part of the Python-DOM mapping. I don't see the value of this. When applications catch IndexError, they normally do so to wrap a specific index access. In the Python library, I found the following places where IndexError is caught:

    try:
        bp = Breakpoint.bpbynumber[number]
    except IndexError:
        return 'Breakpoint number (%d) out of range' % number
    #############
    try:
        result.append(self[key])
    except IndexError:
        result.append(self.dict[key])
    #############
    try:
        self.response = args[0]
    except IndexError:
        self.response = 'No response given'
    ...

I can't imagine a scenario where a DOM INDEX_SIZE_ERR and a Python IndexError could both occur in the same block of code and would deserve identical, specific treatment.
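What Paul's proposal might look like in practice can be sketched with the standard library's xml.dom module, which in current Python versions defines DOMException, the code constants and per-code subclasses such as IndexSizeErr much as Martin proposed. The class IndexSizeError and the function split_text below are hypothetical; they are not part of xml.dom, minidom or PyXML.

```python
import xml.dom


class IndexSizeError(xml.dom.IndexSizeErr, IndexError):
    """Hypothetical INDEX_SIZE_ERR that is also a plain Python IndexError.

    It can be caught as IndexError, as xml.dom.IndexSizeErr, or as
    xml.dom.DOMException (and then inspected via .code).
    """


def split_text(data, offset):
    """Hypothetical splitText-like helper operating on a plain string.

    Raises IndexSizeError when offset is negative or larger than len(data),
    mirroring the DOM rule for Text.splitText.
    """
    if offset < 0 or offset > len(data):
        raise IndexSizeError("offset %d out of range" % offset)
    return data[:offset], data[offset:]


try:
    split_text("hello", 10)
except IndexError as exc:  # the Python-style spelling Paul suggests
    print("IndexError:", exc, "code =", exc.code)
```

Martin's objection still applies to this design: an "except IndexError" clause around a whole DOM call can also swallow unrelated index errors raised inside the implementation.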
That said, if you think it is useful: go ahead and propose a specific patch. It probably can't hurt. Regards, Martin From paul@prescod.net Tue Dec 5 23:02:19 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 05 Dec 2000 18:02:19 -0500 Subject: [XML-SIG] Specializing DOM exceptions References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> Message-ID: <3A2D73FB.19BA4E12@prescod.net> "Martin v. Loewis" wrote: > > ... > > I don't see the value of this. When applications catch IndexError, > they normally do so to wrap a specific index access. I agree. My point is simply that Python already has a way to spell "index-related error" and Python programmers are used to using it. The implementation raises them naturally when you try to do something Index-ish using minidom. So why not use IndexError instead of or in addition to DOM_INDEX_SIZE_ERR. Then, just as you would write:

try:
    bp = Breakpoint.bpbynumber[number]
except IndexError:
    error message

You could write:

try:
    element = node.childNodes[number]
except IndexError:
    error message

Paul Prescod From martin@loewis.home.cs.tu-berlin.de Wed Dec 6 07:36:49 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 6 Dec 2000 08:36:49 +0100 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <3A2D73FB.19BA4E12@prescod.net> (message from Paul Prescod on Tue, 05 Dec 2000 18:02:19 -0500) References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> Message-ID: <200012060736.IAA00698@loewis.home.cs.tu-berlin.de>
> Then, just as you would write:
>
> try:
>     bp = Breakpoint.bpbynumber[number]
> except IndexError:
>     error message
>
> You could write:
>
> try:
>     element = node.childNodes[number]
> except IndexError:
>     error message

You certainly would - no matter how DOMExceptions work (*).
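Paul's proposal amounts to a single exception class that both spellings can catch. A minimal sketch, assuming Python 3 and the modern standard-library `xml.dom` module (which defines `DOMException`, `IndexSizeErr` and `INDEX_SIZE_ERR`); `SequenceIndexSizeErr` and `split_text` are hypothetical names used only for illustration, not part of any library:

```python
from xml.dom import DOMException, IndexSizeErr, INDEX_SIZE_ERR


class SequenceIndexSizeErr(IndexSizeErr, IndexError):
    """Hypothetical: a DOM INDEX_SIZE_ERR that is also a Python IndexError."""
    code = INDEX_SIZE_ERR


def split_text(data, offset):
    # Hypothetical stand-in for Text.splitText(): the DOM raises INDEX_SIZE_ERR
    # when the offset is negative or greater than the length of the data.
    if offset < 0 or offset > len(data):
        raise SequenceIndexSizeErr("offset %d out of range" % offset)
    return data[:offset], data[offset:]


# Each of these handlers catches the very same exception object.
for handler in (IndexError, IndexSizeErr, DOMException):
    try:
        split_text("hello", 10)
    except handler as exc:
        print(handler.__name__, "caught, code =", exc.code)
```

With such a class, choosing between `except IndexError` and `except IndexSizeErr` becomes a matter of taste, which is exactly the trade-off being weighed here.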
The question is whether users would prefer to write

try:
    text1 = text.splitText(offs)
except IndexError:
    error message

over

try:
    text1 = text.splitText(offs)
except IndexSizeErr:
    error message

Nobody would expect that splitText could possibly raise IndexError. Nobody would guess that it could raise IndexSizeErr, either - but at least you'd have the DOM documentation to tell you. Regards, Martin (*) In DOM, childNodes does not have a []-operator; only a method item(). Interestingly enough, that method is specified to return null in case of an out-of-range index, not to raise INDEX_SIZE_ERR. From noreply@sourceforge.net Wed Dec 6 14:10:31 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 6 Dec 2000 06:10:31 -0800 Subject: [XML-SIG] [Bug #124715] DbDom : DocFrag children are orphans Message-ID: <200012061410.GAA06471@sf-web1.i.sourceforge.net> Bug #124715, was updated on 2000-Dec-06 06:10 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: DbDom : DocFrag children are orphans Details: When using DocumentFragments in DbDom, a child node of the fragment has no parentNode. This causes StripXml to crash, and possibly other things.
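For comparison with the DbDom behaviour described in this bug, here is a minimal sketch of what the DOM specifies. It uses the standard-library `xml.dom.minidom` of a modern Python 3, not DbDom or 4Suite: a node appended to a DocumentFragment gets the fragment as its parentNode, and appending the fragment elsewhere moves its children to the new parent.

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.appendChild(doc.createElement("root"))

frag = doc.createDocumentFragment()
child = frag.appendChild(doc.createElement("item"))

# While it lives in the fragment, the child's parent is the fragment itself.
print(child.parentNode is frag)   # True

# Appending a fragment moves its children; the fragment is left empty.
root.appendChild(frag)
print(child.parentNode is root)   # True
print(len(frag.childNodes))       # 0
```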
Here's a sample demo code: >>> from Ft.DbDom import Dom >>> from Ft.Ods import Database >>> from Ft.DbDom import Reader >>> from xml.dom.ext import PrettyPrint,StripXml >>> >>> DBNAME='ods:alf@orion:5432:dom_test' >>> >>> db = Database.Database() >>> db.open(DBNAME) >>> tx = db.new() >>> tx.begin() >>> >>> doc = Dom.DocumentImp() >>> >>> e = doc.createElementNS('','root') >>> doc.appendChild(e) >>> >>> fragment='''''' >>> r = Reader.Reader() >>> f = r.fromString(fragment,doc) >>> print f.firstChild >>> print f.firstChild.parentNode None For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124715&group_id=6473 From noreply@sourceforge.net Wed Dec 6 16:24:55 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 6 Dec 2000 08:24:55 -0800 Subject: [XML-SIG] [Patch #102687] DbDom patch for bug #124715 Message-ID: <200012061624.IAA25268@sf-web2.i.sourceforge.net> Patch #102687 has been updated. Project: pyxml Category: 4Suite Status: Open Submitted by: AFayolle Assigned to : Nobody Summary: DbDom patch for bug #124715 ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102687&group_id=6473 From noreply@sourceforge.net Wed Dec 6 17:41:35 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 6 Dec 2000 09:41:35 -0800 Subject: [XML-SIG] [Bug #124736] 4Ods LiteralListOfObjects fails on python list operations Message-ID: <200012061741.JAA20916@sf-web1.i.sourceforge.net> Bug #124736, was updated on 2000-Dec-06 09:41 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: 4Ods LiteralListOfObjects fails on python list operations Details: using DbDom, e is an Element: >>> e.childNodes[:] Traceback (innermost last): File "", line 1, in ? 
File "/usr/lib/python1.5/site-packages/Ft/Ods/Collections/CollectionBase.py", line 120, in __getslice__ rt._4ods_initialize(self._4ods_getContents()[i:j]) TypeError: not enough arguments; expected 3, got 2 For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124736&group_id=6473 From noreply@sourceforge.net Wed Dec 6 18:09:06 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 6 Dec 2000 10:09:06 -0800 Subject: [XML-SIG] [Patch #102688] 4ODS patch for bug #124736 (__getslice__) Message-ID: <200012061809.KAA02218@sf-web3.vaspecialprojects.com> Patch #102688 has been updated. Project: pyxml Category: 4Suite Status: Open Submitted by: AFayolle Assigned to : Nobody Summary: 4ODS patch for bug #124736 (__getslice__) ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102688&group_id=6473 From noreply@sourceforge.net Wed Dec 6 19:17:45 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 6 Dec 2000 11:17:45 -0800 Subject: [XML-SIG] [Patch #102690] PyXML 0.6.2 compile error with Python 2.0b1 Message-ID: <200012061917.LAA23932@sf-web1.i.sourceforge.net> Patch #102690 has been updated. Project: pyxml Category: expat Status: Open Submitted by: calvin Assigned to : Nobody Summary: PyXML 0.6.2 compile error with Python 2.0b1 ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102690&group_id=6473 From fdrake@acm.org Thu Dec 7 06:18:22 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 7 Dec 2000 01:18:22 -0500 (EST) Subject: [XML-SIG] Re: sax parser leaks memory? In-Reply-To: <200012050004.BAA01145@loewis.home.cs.tu-berlin.de> References: <000501c05e33$573184e0$3c6340d5@hagrid> <200012050004.BAA01145@loewis.home.cs.tu-berlin.de> Message-ID: <14895.11182.359006.281805@cj42289-a.reston1.va.home.com> Martin v. 
Loewis writes: > Thanks for the report. Here is a patch. Are you planning to check this in to either Python or PyXML? I think both could use it. ;) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From noreply@sourceforge.net Thu Dec 7 10:25:56 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 7 Dec 2000 02:25:56 -0800 Subject: [XML-SIG] [Bug #124829] DbDom : getAttribute / getAttributeNode bad implementation Message-ID: <200012071025.CAA11021@sf-web2.i.sourceforge.net> Bug #124829, was updated on 2000-Dec-07 02:25 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: DbDom : getAttribute / getAttributeNode bad implementation Details: Using DbDom, getAttributeNS returns an AttributeImp object (instead of the value of the attribute) and getAttributeNodeNS is not implemented. Sample code (e is an ElementImp object): >>> e.setAttributeNS('','toto','5') >>> e.getAttributeNS('','toto') >>> e.getAttributeNodeNS('','toto') Traceback (innermost last): File "", line 1, in ? File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 163, in __getattr__ raise AttributeError(name) AttributeError: getAttributeNodeNS For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124829&group_id=6473 From noreply@sourceforge.net Thu Dec 7 10:45:58 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 7 Dec 2000 02:45:58 -0800 Subject: [XML-SIG] [Patch #102700] DbDom : bug #124829 getAttribute/getAttributeNode Message-ID: <200012071045.CAA25413@sf-web1.i.sourceforge.net> Patch #102700 has been updated. 
Project: pyxml Category: None Status: Open Submitted by: AFayolle Assigned to : Nobody Summary: DbDom : bug #124829 getAttribute/getAttributeNode ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102700&group_id=6473 From paul@prescod.net Thu Dec 7 11:07:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 07 Dec 2000 06:07:42 -0500 Subject: [XML-SIG] Specializing DOM exceptions References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> <200012060736.IAA00698@loewis.home.cs.tu-berlin.de> Message-ID: <3A2F6F7E.6149A760@prescod.net> "Martin v. Loewis" wrote: > >... > > Nobody would expect that splitText could possibly raise > IndexError. Nobody would guess that it could raise IndexSizeErr, > either - but at least you'd have the DOM documentation to tell you. The DOM documentation does not mention a Python IndexSizeErr exception. That's part of the Python binding so you can only find out about it in the Python documentation. > (*) In DOM, childNodes does not have a []-operator; only a method > item(). Interestingly enough, that method is specified to return null > in case of an out-of-range index, not to raise INDEX_SIZE_ERR. That's part of the Python binding also: >>> from xml.dom.minidom import parse >>> d = parse("c:\\temp\\test.xml") >>> d.childNodes[0] I don't think returning null would be very Pythonic. Paul Prescod From noreply@sourceforge.net Thu Dec 7 11:11:10 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 7 Dec 2000 03:11:10 -0800 Subject: [XML-SIG] [Bug #124839] DbDom : reader.releaseNode fails on DocumentFragments Message-ID: <200012071111.DAA27999@sf-web1.i.sourceforge.net> Bug #124839, was updated on 2000-Dec-07 03:11 Here is a current snapshot of the bug. 
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: DbDom : reader.releaseNode fails on DocumentFragments Details: releaseNode calls FreePersistentObject on DF, which are not persistent objects. And fails miserably... sample code (doc is a DocumentImp object) >>> fragment=''' ... ''' >>> r = Reader.Reader() >>> f = r.fromString(fragment,doc) >>> r.releaseNode(f) Traceback (innermost last): File "", line 1, in ? File "/usr/lib/python1.5/site-packages/Ft/DbDom/Reader.py", line 30, in releaseNode FreePersistentObject(doc) File "/usr/lib/python1.5/site-packages/Ft/Ods/__init__.py", line 57, in FreePersistentObject obj._pseudo_del() AttributeError: _pseudo_del For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124839&group_id=6473 From noreply@sourceforge.net Thu Dec 7 11:47:00 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 7 Dec 2000 03:47:00 -0800 Subject: [XML-SIG] [Patch #102704] DbDom releaseNode patch (bug #124839) Message-ID: <200012071147.DAA30704@sf-web1.i.sourceforge.net> Patch #102704 has been updated. Project: pyxml Category: 4Suite Status: Open Submitted by: AFayolle Assigned to : Nobody Summary: DbDom releaseNode patch (bug #124839) ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102704&group_id=6473 From fdrake@acm.org Thu Dec 7 14:15:18 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 7 Dec 2000 09:15:18 -0500 (EST) Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <3A2F6F7E.6149A760@prescod.net> References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> <200012060736.IAA00698@loewis.home.cs.tu-berlin.de> <3A2F6F7E.6149A760@prescod.net> Message-ID: <14895.39798.226480.773640@cj42289-a.reston1.va.home.com> Paul Prescod writes: > The DOM documentation does not mention a Python IndexSizeErr exception. > That's part of the Python binding so you can only find out about it in > the Python documentation. I'll take a look at this today and see what I think the right thing is. Martin says: > (*) In DOM, childNodes does not have a []-operator; only a method > item(). Interestingly enough, that method is specified to return null > in case of an out-of-range index, not to raise INDEX_SIZE_ERR. Paul responds: > That's part of the Python binding also: ... > I don't think returning null would be very Pythonic. NodeList.item(i) should return None if the recommendation says it should return null, but NodeList[] should handle negative indexes and raise IndexError in the appropriate Pythonic way. The Python DOM API is written that way as well: http://python.sourceforge.net/devel-docs/lib/dom-nodelist-objects.html -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From noreply@sourceforge.net Thu Dec 7 14:32:46 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 7 Dec 2000 06:32:46 -0800 Subject: [XML-SIG] [Bug #124857] 4ODS operations can occur outside transactions Message-ID: <200012071432.GAA07800@sf-web1.i.sourceforge.net> Bug #124857, was updated on 2000-Dec-07 06:32 Here is a current snapshot of the bug. 
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: AFayolle Assigned to : Nobody Summary: 4ODS operations can occur outside transactions Details: Here's some sample code. I would expect the last line to raise a TransactionNotInProgress exception, but it does not.

from Ft.DbDom import Dom
from Ft.Ods import Database
DBNAME='ods:alf@orion:5432:dom_test'
db = Database.Database()
db.open(DBNAME)
tx = db.new()
tx.begin()
doc = Dom.DocumentImp()
e = doc.createElementNS('','root')
doc.appendChild(e)
tx.commit()
e.setAttributeNS('','foo','bar')

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=124857&group_id=6473 From martin@loewis.home.cs.tu-berlin.de Thu Dec 7 16:11:00 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 7 Dec 2000 17:11:00 +0100 Subject: [XML-SIG] Re: sax parser leaks memory? In-Reply-To: <14895.11182.359006.281805@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <000501c05e33$573184e0$3c6340d5@hagrid> <200012050004.BAA01145@loewis.home.cs.tu-berlin.de> <14895.11182.359006.281805@cj42289-a.reston1.va.home.com> Message-ID: <200012071611.RAA00732@loewis.home.cs.tu-berlin.de> > Are you planning to check this in to either Python or PyXML? I > think both could use it. ;) I just committed it to PyXML; thanks for the reminder :-) I plan to synchronize Python pyexpat.c with PyXML pyexpat.c around the time of the next PyXML release; there are a number of other changes that need to be carried over as well. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Dec 7 16:20:32 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 7 Dec 2000 17:20:32 +0100 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <3A2F6F7E.6149A760@prescod.net> (message from Paul Prescod on Thu, 07 Dec 2000 06:07:42 -0500) References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> <200012060736.IAA00698@loewis.home.cs.tu-berlin.de> <3A2F6F7E.6149A760@prescod.net> Message-ID: <200012071620.RAA00800@loewis.home.cs.tu-berlin.de> > The DOM documentation does not mention a Python IndexSizeErr exception. > That's part of the Python binding so you can only find out about it in > the Python documentation. No, but it does mention DOMException with an INDEX_SIZE_ERR code. Such an exception is represented in Python by an IndexSizeErr object (which is indeed a DOMException instance with a .code field of INDEX_SIZE_ERR). So Python's IndexSizeErr and DOM's INDEX_SIZE_ERR are really one and the same - it's just that IDL cannot express exception specialization. > > (*) In DOM, childNodes does not have a []-operator; only a method > > item(). Interestingly enough, that method is specified to return null > > in case of an out-of-range index, not to raise INDEX_SIZE_ERR. > > That's part of the Python binding also: > > >>> from xml.dom.minidom import parse > >>> d = parse("c:\\temp\\test.xml") > > >>> d.childNodes[0] > > > I don't think returning null would be very Pythonic. Indeed. The childNodes collection behaves like a Python sequence - so you'd expect sequence exceptions for the sequence operations. It is also a DOM NodeList implementation, and I'd expect DOM exceptions for DOM operations. I would not, however, expect standard Python exceptions coming out of DOM operations, or DOM exceptions coming out of Python sequence operations. 
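The behaviour discussed in this thread (a DOM-style `item()` that returns `None` for an out-of-range index, while `[]` follows ordinary Python sequence rules, including negative indexes and `IndexError`) can be sketched as a list subclass. This is a minimal illustration assuming Python 3; the class is hypothetical, although as far as I know the modern `xml.dom.minicompat.NodeList` used by minidom takes a similar approach.

```python
class NodeList(list):
    """Hypothetical NodeList: a Python sequence plus the DOM item()/length API."""

    @property
    def length(self):
        return len(self)

    def item(self, index):
        # DOM semantics: an out-of-range index yields null (None), not an error.
        if 0 <= index < len(self):
            return self[index]
        return None


nodes = NodeList(["a", "b", "c"])
print(nodes.length)    # 3
print(nodes.item(5))   # None
print(nodes.item(-1))  # None: DOM indexes are never negative
print(nodes[-1])       # c: ordinary Python negative indexing
try:
    nodes[5]
except IndexError:
    print("IndexError")  # sequence access keeps Python's exception
```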
Again, if you think otherwise, just propose a specification (or is

class IndexSizeErr(DOMException, IndexError):
    code = INDEX_SIZE_ERR

really all that you are proposing?) I won't object to adding specific text or code, even if I don't see a value in such an addition. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Dec 7 16:22:00 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 7 Dec 2000 17:22:00 +0100 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <14895.39798.226480.773640@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> <200012060736.IAA00698@loewis.home.cs.tu-berlin.de> <3A2F6F7E.6149A760@prescod.net> <14895.39798.226480.773640@cj42289-a.reston1.va.home.com> Message-ID: <200012071622.RAA00801@loewis.home.cs.tu-berlin.de> > NodeList.item(i) should return None if the recommendation says it > should return null, but NodeList[] should handle negative indexes and > raise IndexError in the appropriate Pythonic way. Exactly my understanding. Regards, Martin From noreply@sourceforge.net Fri Dec 8 16:22:59 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 8 Dec 2000 08:22:59 -0800 Subject: [XML-SIG] [Bug #125004] 4xslt: XPath doesn't like ISO-8859-1 Message-ID: <200012081622.IAA14454@sf-web3.vaspecialprojects.com> Bug #125004, was updated on 2000-Dec-08 08:22 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: ornicar Assigned to : Nobody Summary: 4xslt: XPath doesn't like ISO-8859-1 Details: Hello, Today, I'm using the 4xslt XSLT engine to transform an XML file into another, nicer XML file.
In the attached example, there is a data.xml file that contains a short description of my agenda, but I don't like this representation because the node contains the time of the meeting (before the ' Durée: ' word) and the duration of the meeting (after the ' Durée: ' word) (in French, 'Durée' means 'Duration') : 11h00 Durée: 20mn Therefore, I constructed an XSLT file to turn my agenda into a new agenda with separate nodes for the time of the meeting and the duration of the meeting : 20mn The XSLT stylesheet has to take the content of the text node child of the node and divide it into two parts: what is before ' Durée: ' and what is after. This can easily be done in XSLT by using the substring-before() and substring-after() functions : and Unfortunately, 4xslt doesn't like the "é" character in an XPath expression (the expression inside the select) and returns the attached stacktrace ending with: xml.xpath.XPathParserBase.SyntaxException: ********** Syntax Exception ********** Exception at or near "Ã" Line: 0, Production Number: 0 Of course, replacing "Durée" with "Duree" both in the XML file and in the XSLT stylesheet fixes the bug, but this is not very satisfying. Using another XSLT engine (e.g. Xalan) allows the transformation even with an "é" character. This seems to be a bug in XPath expression processing (4xpath doesn't like ISO-8859-1 characters). O. CAYROL. PS: the attached files are below ...
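The splitting the stylesheet needs follows the XPath 1.0 definitions of substring-before() and substring-after(), which return the empty string when the separator does not occur. A minimal Python 3 sketch of those semantics on the sample text; the function names are illustrative, not 4XPath API. The last lines show one plausible reading of the error message, an assumption not confirmed in the report: "é" encoded as UTF-8 is the two bytes 0xC3 0xA9, and 0xC3 read as a single ISO-8859-1 character is exactly the "Ã" that the parser reports.

```python
def substring_before(s, sep):
    # XPath 1.0: text before the first occurrence of sep, or "" if absent.
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    # XPath 1.0: text after the first occurrence of sep, or "" if absent.
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


text = "11h00 Durée: 20mn"
print(substring_before(text, " Durée: "))   # 11h00
print(substring_after(text, " Durée: "))    # 20mn

# UTF-8 bytes for "é", misread one byte per character as ISO-8859-1.
print("é".encode("utf-8"))                    # b'\xc3\xa9'
print("é".encode("utf-8").decode("latin-1"))  # Ã©
```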
_________________________________________________________________________ Olivier CAYROL LOGILAB - Paris (France) http://www.logilab.com/ For Christmas, give yourself an Intelligent Personal Assistant (free) Pour Noël, offrez-vous un Assistant Personnel Intelligent (c'est gratuit) _________________________________________________________________________ _________________________________________________________________________ data.xml "Initial XML file" 11h00 Durée: 20mn 11h30 Durée: 40mn _________________________________________________________________________ transf.xslt "XSLT stylesheet" _________________________________________________________________________ agenda.xml "Expected XML output" 20mn 40mn _________________________________________________________________________ Stacktrace $ 4xslt data.xml transf.xslt Traceback (innermost last): File "/usr/bin/4xslt", line 5, in ? _4xslt.Run(sys.argv) File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 85, in Run processor.appendStylesheetUri(sty) File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 86, in appendStylesheetUri sty = self._styReader.fromUri(styleSheetUri) File "/usr/lib/python1.5/site-packages/Ft/Lib/ReaderBase.py", line 99, in fromUri rt = self.fromStream(stream, baseUri, ownerDoc, stripElements) File "/usr/lib/python1.5/site-packages/xml/xslt/StylesheetReader.py", line 300, in fromStream sheet.setup() File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 144, in setup curr_node.setup() File "/usr/lib/python1.5/site-packages/xml/xslt/ValueOfElement.py", line 34, in setup self.__dict__['_expr'] = parser.parseExpression(self._select) File "/usr/lib/python1.5/site-packages/xml/xpath/XPathParser.py", line 36, in parseExpression XPathParserBase.XPathParserBase.parse(self, st) File "/usr/lib/python1.5/site-packages/xml/xpath/XPathParserBase.py", line 60, in parse XPath.cvar.g_prodNum) xml.xpath.XPathParserBase.SyntaxException: ********** Syntax Exception 
********** Exception at or near "Ã" Line: 0, Production Number: 0 _________________________________________________________________________ For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125004&group_id=6473 From noreply@sourceforge.net Sat Dec 9 00:26:51 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 8 Dec 2000 16:26:51 -0800 Subject: [XML-SIG] [Bug #125043] Losing attributes when cloning an element Message-ID: <200012090026.QAA32725@usw-sf-web1.sourceforge.net> Bug #125043, was updated on 2000-Dec-08 16:26 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: jkloth Assigned to : nobody Summary: Losing attributes when cloning an element Details: I ran into a bug with cloning non-namespace XML Elements with attributes, illustrated by the following testcase: >>> from xml.dom.Document import Document >>> dom=Document(None) >>> dom.appendChild(dom.createElement('foo')) >>> dom.documentElement.setAttribute('name', 'bar') >>> dom.documentElement.setAttribute('spam', 'eggs') >>> clone=dom.documentElement.cloneNode(deep=0) >>> clone >>> dom.documentElement >>> clone.attributes }> >>> dom.documentElement.attributes , 'name': }> For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125043&group_id=6473 From Mike.Olson@fourthought.com Sat Dec 9 06:52:58 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 08 Dec 2000 23:52:58 -0700 Subject: [XML-SIG] SourceForge Message-ID: <3A31D6CA.5F0629CD@FourThought.com> Am I just having a bad night, or is something borken at SourceForge? I've been trying to update bug #124375 and I keep getting index errors. So I tried to submit a bug to SourceForge, and I get roughly the same error. I looked all over the site but couldn't find an email address to ask, so I thought I'd see if others are having problems.
Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sat Dec 9 08:07:35 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 9 Dec 2000 09:07:35 +0100 Subject: [XML-SIG] SourceForge In-Reply-To: <3A31D6CA.5F0629CD@FourThought.com> (message from Mike Olson on Fri, 08 Dec 2000 23:52:58 -0700) References: <3A31D6CA.5F0629CD@FourThought.com> Message-ID: <200012090807.JAA00693@loewis.home.cs.tu-berlin.de> > Am I just having a bad night, or is something borken at SourceForge? It appears indeed that SF is down. Currently, I get a page that reads An error occured in the logger. ERROR: Relation 'activity_log' does not exist That's what you get for using PHP3 instead of Python :-) Regards, Martin From Alexandre.Fayolle@logilab.fr Sat Dec 9 10:04:38 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Sat, 9 Dec 2000 11:04:38 +0100 (CET) Subject: [XML-SIG] sourceforge PyXML project disappeared ?! Message-ID: Hello, It looks like sourceforge is back online. It also looks like the PyXML project was lost in deep space: http://sourceforge.net/projects/PyXML/ send me to an invalid project page. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From Alexandre.Fayolle@logilab.fr Sat Dec 9 10:09:03 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Sat, 9 Dec 2000 11:09:03 +0100 (CET) Subject: [XML-SIG] sourceforge PyXML project disappeared ?! In-Reply-To: Message-ID: On Sat, 9 Dec 2000, Alexandre Fayolle wrote: > Hello, > > It looks like sourceforge is back online. It also looks like the PyXML > project was lost in deep space: http://sourceforge.net/projects/PyXML/ > send me to an invalid project page. 
Well, sorry for raising a false alarm, it's still there, but the page is /projects/pyxml/ (lowercase), and this is why my bookmark no longer worked. This leaves me wondering why the name changed, though... Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From martin@loewis.home.cs.tu-berlin.de Sat Dec 9 10:33:29 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 9 Dec 2000 11:33:29 +0100 Subject: [XML-SIG] sourceforge PyXML project disappeared ?! In-Reply-To: (message from Alexandre Fayolle on Sat, 9 Dec 2000 11:09:03 +0100 (CET)) References: Message-ID: <200012091033.LAA01259@loewis.home.cs.tu-berlin.de> > Well, sorry for raising a false alarm, it's still there, but the page is > /projects/pyxml/ (lowercase), and this is why my bookmark no longer > worked. > > This leaves me wondering why the name changed, though... To my knowledge, the SF project was always pyxml, and thus never changed. I don't know why a different spelling was accepted before. Regards, Martin From Mike.Olson@fourthought.com Sat Dec 9 18:53:02 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sat, 09 Dec 2000 11:53:02 -0700 Subject: [XML-SIG] sourceforge PyXML project disappeared ?! References: <200012091033.LAA01259@loewis.home.cs.tu-berlin.de> Message-ID: <3A327F8E.C65DA0B6@FourThought.com> "Martin v. Loewis" wrote: > hmm, I still cannot change the status of a bug though.... Mike > > Regards, > Martin > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sat Dec 9 21:33:09 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 9 Dec 2000 22:33:09 +0100 Subject: [XML-SIG] sourceforge PyXML project disappeared ?! In-Reply-To: <3A327F8E.C65DA0B6@FourThought.com> (message from Mike Olson on Sat, 09 Dec 2000 11:53:02 -0700) References: <200012091033.LAA01259@loewis.home.cs.tu-berlin.de> <3A327F8E.C65DA0B6@FourThought.com> Message-ID: <200012092133.WAA01892@loewis.home.cs.tu-berlin.de> > hmm, I still cannot change the status of a bug though.... You could not modify the status of a patch; I have changed that. I can't see any reason why you can't modify the status of a bug - what is the response you get from SF? Regards, Martin From Mike.Olson@fourthought.com Sat Dec 9 22:06:42 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sat, 09 Dec 2000 15:06:42 -0700 Subject: [XML-SIG] sourceforge PyXML project disappeared ?! References: <200012091033.LAA01259@loewis.home.cs.tu-berlin.de> <3A327F8E.C65DA0B6@FourThought.com> <200012092133.WAA01892@loewis.home.cs.tu-berlin.de> Message-ID: <3A32ACF2.3D387E23@FourThought.com> "Martin v. Loewis" wrote: > > > hmm, I still cannot change the status of a bug though.... > > You could not modify the status of a patch; I have changed that. I > can't see any reason why you can't modify the status of a bug - what > is the response you get from SF? Never mind, it seems as they have fixed something there. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Sat Dec 9 22:14:26 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 9 Dec 2000 14:14:26 -0800 Subject: [XML-SIG] [Bug #125186] xsl:number fails for two-level numbering Message-ID: <200012092214.OAA27650@usw-sf-web2.sourceforge.net> Bug #125186, was updated on 2000-Dec-09 14:14 Here is a current snapshot of the bug. 
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: nobody Assigned to : nobody Summary: xsl:number fails for two-level numbering Details: Consider the following stylesheet.

Apply this to the following XML: Chapter 1 Chapter 1 content. Section 1.1 Section 1.1 content. Section 1.2 Section 1.2 content. Chapter 2 Chapter 2 content. Section 2.1 Section 2.1 content. Section 2.2 Section 2.2 content. The result is:

1 Chapter 1

Chapter 1 content.

3 Section 1.1

Section 1.1 content.

3 Section 1.2

Section 1.2 content.

2 Chapter 2

Chapter 2 content.

4 Section 2.1

Section 2.1 content.

4 Section 2.2

Section 2.2 content. As you can see, the level two numbers are wrong. Instead of e.g. "2.2", it gives a single number, which appears to be the total of all div1 elements through the current one, plus div2 elements through the current entire div1 section. In other words, Section 2.1 is numbered as "4" since there are two div1 sections up through 2.2, and two div2 sections in the second div1 section. This is a subset of the "xmlspec.xsl" file, used to transform W3C specifications. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125186&group_id=6473 From dieter@handshake.de Sun Dec 10 08:45:32 2000 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 10 Dec 2000 09:45:32 +0100 (CET) Subject: [XML-SIG] XMLPROC: unsupported character number '>255' in character reference Message-ID: <14899.17068.574076.957348@lindm.dm> I use the SAX2 implementation bundled with the Python 2.0 distribution to process DocBook/XML documents. When I turn on validation, "xmlproc" complains "unsupported character number 'XXXX' in character reference" for each XXXX larger than 255. Apparently, "xmlproc" does not yet know that such character references no longer cause problems with the new Python Unicode support. Is there already a fix? If not, I can look into the problem. Dieter From noreply@sourceforge.net Sun Dec 10 09:10:34 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Dec 2000 01:10:34 -0800 Subject: [XML-SIG] [Bug #125225] system-property(xsl:vendor-url) fails Message-ID: <200012100910.BAA06127@usw-sf-web1.sourceforge.net> Bug #125225, was updated on 2000-Dec-10 01:10 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: rtmyers Assigned to : nobody Summary: system-property(xsl:vendor-url) fails Details: Using 4XSLT v.0.10.2, RH 6.2, Python 1.5.2.
Following stylesheet: Running this against arbitrary XML file gives traceback:

[rtm@rabbit xsgf]# 4xslt table.xml vendor.xsl
Traceback (innermost last):
  File "/usr/bin/4xslt", line 5, in ?
    _4xslt.Run(sys.argv)
  File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 87, in Run
    topLevelParams=top_level_params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 127, in runUri
    writer, uri, outputStream)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 177, in runNode
    self.applyTemplates(context, None)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 193, in applyTemplates
    found = sty.applyTemplates(context, mode, self, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 356, in applyTemplates
    patternInfo[TEMPLATE].instantiate(context, processor, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py", line 115, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/MessageElement.py", line 41, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/ValueOfElement.py", line 41, in instantiate
    result = self._expr.evaluate(context)
  File "/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py", line 171, in evaluate
    return self._func(context, arg0)
  File "/usr/lib/python1.5/site-packages/xml/xslt/ExtFunctions.py", line 126, in SystemProperty
    if split_name[0] == XSL_NAMESPACE:
NameError: XSL_NAMESPACE
[rtm@rabbit xsgf]#

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125225&group_id=6473 From kentsin@sinaman.com Mon Dec 11 00:00:53 2000 From: kentsin@sinaman.com (kentsin) Date: Sun Dec 10 18:00:53 CST 2000 Subject: [XML-SIG] xml / html parsing for webbot Message-ID: <20001210100053.20311.qmail@hk.sina.com.hk> Dear All, I am learning to build a webbot. I am reading Jeff's webbot code.
I have some difficulties and doubts: 1. xml.dom.walker and xml.dom.writer are missing from Python 2.0's xml package. What were they used for? 2. I have thought of not building a DOM tree but using regular expressions to extract all links. Can somebody tell me from experience how the two approaches compare? Which is better? In particular, I found that some script-generated pages contain unmatched tags. How do the two approaches handle them? Rgs, Kent Sin =================================================================== Sina free e-mail: http://sinamail.sina.com.hk Download SinaTicker now: http://sinaticker.sina.com.hk From martin@loewis.home.cs.tu-berlin.de Sun Dec 10 10:48:39 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 10 Dec 2000 11:48:39 +0100 Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: <20001210100053.20311.qmail@hk.sina.com.hk> (message from kentsin on Sun Dec 10 18:00:53 CST 2000) References: <20001210100053.20311.qmail@hk.sina.com.hk> Message-ID: <200012101048.LAA00761@loewis.home.cs.tu-berlin.de> > 1. xml.dom.walker and xml.dom.writer is missing in python 2.0 's xml > package. What are their usage? Indeed. These classes originate from PyDOM, which is obsolete. In Python 2.0, only minidom is included. There is no equivalent of a walker class in minidom. Instead of a writer, you can probably use .toxml() in most cases. > I have think of not building a dom tree but using regular > expressions to extract all links. Can somebody tell me from their > experience some comparision of the two approaches? What is better? In principle, an approach using regular expressions could fail more easily than a solution that really analyses the structure of the document. For most practical purposes, the solution using regular expressions will work just fine. In the end, all that matters is that it works. > Especially I found some pages which were generated by scripts, do > contain unmatched tags in the pages.
How the two approaches handle > them? For that purpose, the DOM authors made special support for HTML. You normally need a special parser, one that is capable of processing HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I believe, is capable of converting arbitrary HTML into a DOM tree. Regards, Martin From Alexandre.Fayolle@logilab.fr Sun Dec 10 13:21:37 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Sun, 10 Dec 2000 14:21:37 +0100 (CET) Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: <200012101048.LAA00761@loewis.home.cs.tu-berlin.de> Message-ID: > For that purpose, the DOM authors made special support for HTML. You > normally need a special parser, one that is capable of processing > HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I > believe, is capable of converting arbitrary HTML into a DOM tree. Logilab contributed a much improved version of FromHtml to 4DOM a while ago which was included in 4Suite 0.9.2 I think. I don't know which version is shipped in PyXML 0.6.2, though. If you need this piece of code, and can't find it in your distribution, just ask. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From uche.ogbuji@fourthought.com Sun Dec 10 13:32:03 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 10 Dec 2000 06:32:03 -0700 Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: Message from "Martin v. Loewis" of "Sun, 10 Dec 2000 11:48:39 +0100." <200012101048.LAA00761@loewis.home.cs.tu-berlin.de> Message-ID: <200012101332.GAA11760@localhost.localdomain> > > Especially I found some pages which were generated by scripts, do > > contain unmatched tags in the pages. How the two approaches handle > > them? > > For that purpose, the DOM authors made special support for HTML.
You > normally need a special parser, one that is capable of processing > HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I > believe, is capable of converting arbitrary HTML into a DOM tree. Correct as usual, Martin, although Python's standard htmllib gets much of the credit for wrangling unruly HTML. Here's a little demo. It shows how to read in any HTML and print out shiny XHTML. Basically, it has the functionality of the highly popular Tidy (http://www.w3.org/People/Raggett/tidy/) or JTidy (http://lempinen.net/sami/jtidy/), but with XHTML output (it can easily be modified to produce cleaned HTML output).

[uogbuji@borgia one-offs]$ cat html-to-xhtml-converter.py
import sys
from xml.dom.ext.reader import HtmlLib
import xml.dom.ext
#set up a re-usable reader object
reader = HtmlLib.Reader()
#parse HTML from file or URI given on command line. Return the DOM document
doc = reader.fromUri(sys.argv[1])
#Just for kicks, write it out as XHTML, i.e. all lowercase, XML syntax for
#empty tags, all attributes with given value, etc.
xml.dom.ext.XHtmlPrettyPrint(doc)
[uogbuji@borgia one-offs]$ cat data/example-from-wsdl-xslt-article.html
Service summary: EndorsementSearch

Service summary: EndorsementSearch


Service: EndorsementSearchService
snowboarding-info.com Endorsement Service
Port: http://www.snowboard-info.com/EndorsementSearch SOAP
[uogbuji@borgia one-offs]$ python html-to-xhtml-converter.py data/example-from-wsdl-xslt-article.html
Service summary: EndorsementSearch
<meta charset='UTF-8' http-equiv='content-type' content='text/html'/>
</head>
<body style='background: #ffffff'>
<h1>Service summary: EndorsementSearch</h1>
<hr/>
<table>
<thead/>Service: EndorsementSearchService
<tbody/>
<tr>
<td style='background: #ccffff' colspan='3'>
<i>snowboarding-info.com Endorsement Service</i>
</td>
</tr>
<tr>
<td>Port:</td>
<td style='background: #ffccff'>http://www.snowboard-info.com/EndorsementSearch</td>
<td style='background: #ff66ff'>SOAP</td>
</tr>
</table>
</body>
</html>
[uogbuji@borgia one-offs]$
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun Dec 10 13:51:59 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 10 Dec 2000 06:51:59 -0700 Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> of "Sun, 10 Dec 2000 14:21:37 +0100." <Pine.LNX.4.21.0012101415420.16772-100000@orion.logilab.fr> Message-ID: <200012101351.GAA11831@localhost.localdomain> > > For that purpose, the DOM authors made special support for HTML. You > > normally need a special parser, one that is capable of processing > > HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I > > believe, is capable of converting arbitrary HTML into a DOM tree. > > Logilab contributed a much improved version of FromHtml to 4DOM a while > ago which was included in 4Suite 0.9.2 I think. I don't know which version > is shipped in PyXml 0.6.2, though. If you need this piece of code, and > can't find it in your distribution, jsut ask. This was after PyXML 0.6.2, so it's not included.
We still have a few improvements to make to 4DOM before we release 4Suite 0.10.1 in a few weeks. Are there any plans on the horizon to release PyXML 0.6.3? If so, we'll get all the changes in before then. I should note that the code from Logilab meticulously sets up the HTML content model according to spec. It's a brilliant piece of work. However, in many cases of HTML usage you would be able to get by just fine with the DOM code in PyXML 0.6.2. If you start to run into problems, you might want to install 4Suite 0.10.0, which includes Logilab's code and many other fixes. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun Dec 10 14:20:59 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 10 Dec 2000 07:20:59 -0700 Subject: [XML-SIG] 4XPath and Unicode Message-ID: <3A33914B.6A28143A@fourthought.com> See https://sourceforge.net/bugs/?func=detailbug&group_id=6473&bug_id=125004 This bug covers the fact that 4XPath is really limited to the US-ASCII encoding. No ISO-8859-?, no Unicode, none of the other encodings supported by the i18n-sig such as JIS or BIG5. This really sucks, especially after we've put so much work into i18n in other parts of 4Suite, and especially since Python 2.0 finally gives us native character encoding support. The problem is that 4XPath's lexer is implemented using Flex. Flex is really ancient code still mired in the world of C's char. Even 8-bit scanners can be a big deal for Flex, never mind wide characters. We could hack ISO-8859-? support into the Flex scanner with great effort and close the above bug, but that doesn't provide a long-term fix. Another problem with Flex is that we are having the devil of a time making it thread-safe, which we need for 4Suite Server.
Bison, in contrast, we've already made safely concurrent. Conclusion: we've pretty much decided to ditch Flex, and ditch it quickly. In fact, we're working towards having 4Suite 0.10.1 use a different scanner entirely when it's released in a couple of weeks. Here are the options we're exploring:
1) Move all XPath parsing to another technology, perhaps Spark (http://www.csr.uvic.ca/~aycock/python/). Pro: it's in Python and should be easy to maintain. Con: we might lose performance, and most Python scanner/parser packages seem to be only sporadically maintained. For instance, Spark's last update (0.6.1) was in April. We'd like to avoid being stuck maintaining a parser package in addition to everything else.
2) Use an existing Python package for lexing, for instance mxTextTools. Pro: should be easier to convert and maintain. Con: performance? encoding support?
3) Write our own scanner in Python using SRE. We'd probably use Python code to tokenize and then write a shell in C to feed the tokens to Bison. This should give the best performance. Pro: performance, and we get to add all the encoding support we want directly. Con: maintainability.
We'd love to hear of any other ideas or comments on the above. It will be a good deal of work to fix our scanner, and we'd like to only have to do it once, with relatively straightforward maintenance thereafter. Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From calvin@cs.uni-sb.de Sun Dec 10 14:35:14 2000 From: calvin@cs.uni-sb.de (Bastian Kleineidam) Date: Sun, 10 Dec 2000 15:35:14 +0100 (CET) Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: <20001210100053.20311.qmail@hk.sina.com.hk> Message-ID: <Pine.LNX.4.21.0012101504151.20378-100000@earth.cs.uni-sb.de> Hello Kent, >2.
I have think of not building a dom tree but using regular expressions > to extract all links. Can somebody tell me from their experience some > comparision of the two approaches? What is better? Especially I found > some pages which were generated by scripts, do contain unmatched tags in > the pages. How the two approaches handle them? I am using Regexps:

import re

_linkMatcher = r"""
    (?i)                    # case insensitive
    <                       # open tag
    \s*                     # whitespace
    %s                      # tag name
    \s+                     # whitespace
    [^>]*?                  # skip leading attributes
    %s                      # attrib name
    \s*                     # whitespace
    =                       # equal sign
    \s*                     # whitespace
    (?P<value>              # attribute value
        ".*?" |             # in double quotes
        '.*?' |             # in single quotes
        [^\s>]+)            # unquoted
    ([^">]|".*?")*          # skip trailing attributes
    >                       # close tag
    """

# and now fill in some tags:
LinkPatterns = (
    re.compile(_linkMatcher % ("a", "href"), re.VERBOSE),
    re.compile(_linkMatcher % ("img", "src"), re.VERBOSE),
    re.compile(_linkMatcher % ("form", "action"), re.VERBOSE),
    re.compile(_linkMatcher % ("body", "background"), re.VERBOSE),
    re.compile(_linkMatcher % ("frame", "src"), re.VERBOSE),
    re.compile(_linkMatcher % ("link", "href"), re.VERBOSE),
    # <meta http-equiv="refresh" content="x; url=...">
    re.compile(_linkMatcher % ("meta", "url"), re.VERBOSE),
    re.compile(_linkMatcher % ("area", "href"), re.VERBOSE),
    re.compile(_linkMatcher % ("script", "src"), re.VERBOSE),
)

This regex even catches missing quotes: <a href="bla> <a href=bla"> But only if you strip leading and trailing quotes from the URL.
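For comparison with the regex approach above, here is a minimal sketch of the parser-based alternative discussed earlier in this thread. It assumes modern Python 3 and its standard html.parser module (not available to the Python 2.0 of this thread, where htmllib and sgmllib filled that role); LinkExtractor, extract_links and LINK_ATTRS are illustrative names, and the sketch makes no attempt at the missing-quote recovery described above.

```python
from html.parser import HTMLParser

# Tag -> attribute pairs, mirroring most of the LinkPatterns list above
# (meta refresh URLs live inside the content attribute and are skipped here).
LINK_ATTRS = {
    "a": "href", "area": "href", "link": "href",
    "img": "src", "frame": "src", "script": "src",
    "form": "action", "body": "background",
}


class LinkExtractor(HTMLParser):
    """Collect link-bearing attribute values from start tags.

    HTMLParser does not check that tags are balanced, so unclosed or
    unmatched tags in script-generated pages do not stop extraction.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values arrive unquoted.
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value is not None:
                self.links.append(value)


def extract_links(html_text):
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


page = "<body background=bg.gif><p><a HREF=\"a.html\">A<p><img src='i.png'></body>"
print(extract_links(page))  # ['bg.gif', 'a.html', 'i.png']
```

Because the parser reacts to individual tags rather than matching them up, pages with unclosed <p> elements or stray end tags still yield their links; what it gives up relative to a DOM builder is any notion of the document's structure.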
For a complete code example get Linkchecker: http://linkchecker.sourceforge.net and look in linkcheck/UrlData.py Bastian From chapmanb@arches.uga.edu Sun Dec 10 14:51:21 2000 From: chapmanb@arches.uga.edu (Brad Chapman) Date: Sun, 10 Dec 2000 09:51:21 -0500 (EST) Subject: [XML-SIG] 4XPath and Unicode In-Reply-To: <3A33914B.6A28143A@fourthought.com> References: <3A33914B.6A28143A@fourthought.com> Message-ID: <14899.39017.639236.429461@taxus.athen1.ga.home.com> Uche writes: > Conclusion: we've pretty much decided to ditch Flex > > Here are the options we're exploring: [Spark, mxTextTools, SRE] > We'd love to hear of any other ideas or comments on the above. One option to consider is Martel, written by Andrew Dalke: http://www.biopython.org/~dalke/Martel It's a parser generator which allows you to build up a grammar for a format using regular expressions. It provides a bunch of "high level" regular expressions to allow you to build up a readable and maintainable grammar for what you want to parse. It uses SRE and mxTextTools (both of which you mention above) under the covers, and returns the parse tree as XML callbacks that you can deal with using a standard SAX handler. I've used it to develop parsers for a couple of different formats and found it very nice to use. It is a "spare time" project of Andrew's, but he is working on it quite often, so it is currently very well-maintained. I hope this helps! Brad From martin@loewis.home.cs.tu-berlin.de Sun Dec 10 18:32:18 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 10 Dec 2000 19:32:18 +0100 Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: <200012101351.GAA11831@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012101351.GAA11831@localhost.localdomain> Message-ID: <200012101832.TAA00709@loewis.home.cs.tu-berlin.de> > This was after PyXML 0.6.2, so it's not included.
We have a few > improvements to make yet to 4DOM before we release 4Suite 0.10.1 in > a few weeks. Are there any plans on the horizon to release PyXML > 0.6.3? If so, we'll get all the changes in before then. There are a number of pending minidom changes which need to be reviewed, corrected, and applied in order, both to PyXML and Python proper. I don't know when this will happen; it depends largely on Fred, Andrew and me finding the time for it. After that, I'd like to release 0.6.3. If possible, I'd like to get a 4DOM update there, too - but it would not be a problem to release PyXML 0.6.4 shortly after 4Suite 0.10.1. Release early, release often. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sun Dec 10 18:41:32 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 10 Dec 2000 19:41:32 +0100 Subject: [XML-SIG] 4XPath and Unicode In-Reply-To: <3A33914B.6A28143A@fourthought.com> (message from Uche Ogbuji on Sun, 10 Dec 2000 07:20:59 -0700) References: <3A33914B.6A28143A@fourthought.com> Message-ID: <200012101841.TAA00760@loewis.home.cs.tu-berlin.de> > 1) Move all XPath parsing to another technology, perhaps Spark > (http://www.csr.uvic.ca/~aycock/python/). Pro: it's in Python and > should be easy to maintain. I hope I can find some time to write an XPath parser in YAPPS. Is there some readily readable grammar for XPath? I find the bisongen input of 4Suite extremely hard to read. I think it would be time well spent to evaluate different parser toolkits on this application. I have the feeling that XPath is simple enough to put together a parser for it in any of these toolkits; we could then evaluate the speed and the readability of the generator input. > Con: we might lose performance, and most Python scanner/parser > packages seem to be only sporadically maintained. For instance, > Spark's last update (0.6.1) was in April. We'd like to avoid being > stuck maintaining a parser package in addition to everything else.
As for performance: most of it probably comes from the lexing speed; with sre, I hope that we can perform comparably to flex. If 4Suite (and perhaps PyXML) made an educated selection of a parser generator toolkit, that may set enough of a precedent to establish a standard, and get the author of the toolkit interested in improving it. Furthermore, these things normally don't need much maintenance - bison is still in wide use, even though it is not maintained anymore. > 2) Use an existing Python package for lexing, for instance mxTextTools. > Pro: should be easier to convert and maintain. Con: performance? > encoding support? I'd discourage yet another C module. It is *very* unlikely that they get reasonable Unicode support. > 3) Write our own scanner in Python using SRE. We'd probably have one > Python code to tokenize and then write a shell in C to feed the tokens > to Bison. This would ensure best performance. Pro: performance, we get > to add all the encoding support we want directly. Con: maintainability. Also, this is exactly what all these parser toolkits do - I don't think there is a need for yet another one. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sun Dec 10 18:46:36 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 10 Dec 2000 19:46:36 +0100 Subject: [XML-SIG] 4XPath and Unicode In-Reply-To: <14899.39017.639236.429461@taxus.athen1.ga.home.com> (message from Brad Chapman on Sun, 10 Dec 2000 09:51:21 -0500 (EST)) References: <3A33914B.6A28143A@fourthought.com> <14899.39017.639236.429461@taxus.athen1.ga.home.com> Message-ID: <200012101846.TAA00854@loewis.home.cs.tu-berlin.de> > I've used it to develop parsers for a couple of different formats and > found it very nice to use. It is a "spare time" project of Andrew's, > but he is working on it quite often, so it is currently very > well-maintained.
It seems that this supports only regular expressions, so it can't really express an LR(n) language, such as XPath, can it? Regards, Martin From uche.ogbuji@fourthought.com Sun Dec 10 19:19:59 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 10 Dec 2000 12:19:59 -0700 Subject: [XML-SIG] 4XPath and Unicode In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> of "Sun, 10 Dec 2000 19:46:36 +0100." <200012101846.TAA00854@loewis.home.cs.tu-berlin.de> Message-ID: <200012101919.MAA01996@localhost.localdomain> > > I've used it to develop parsers for a couple of different formats and > > found it very nice to use. It is a "spare time" project of Andrew's, > > but he is working on it quite often, so it is currently very > > well-maintained. > > It seems that this supports only regular expressions, so it can't > really express an LR(n) language, such as XPath, can it? I think this shoots it down. XPath is not an enormously complex language, but it's not a regular grammar either. I don't have a formal proof that XPath is LR(k), but I've written enough parsers that I think I can confidently say so (besides, Martin thinks so as well). This means that we'll either have to find an LR(k) parser engine for Python, or just replace the scanner and stick with Bison. I'm inclined to agree with Martin in his other post that we should just find a scanner package for Python that already takes advantage of SRE and feed its token stream to Bison. I'll try to investigate some lexer toolkits today. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun Dec 10 19:48:57 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sun, 10 Dec 2000 12:48:57 -0700 Subject: [XML-SIG] [Fwd: [4suite] For Python 1.5.2 users] Message-ID: <3A33DE29.7C758006@fourthought.com> -------- Original Message -------- Subject: [4suite] For Python 1.5.2 users Date: Sun, 10 Dec 2000 12:42:17 -0700 From: Uche Ogbuji <uche.ogbuji@fourthought.com> Organization: Fourthought, Inc To: 4suite@fourthought.com At Alexandre's suggestion I've put up Martin von Loewis's add-on package for Unicode and ISO-8859-?. This can be used with PyXML 0.6.0 through 0.6.2. Versions 0.6.3 and higher will have it built in, but for now it's available at ftp://ftp.fourthought.com/pub/third-party/xml-sig/unicode-py152-20001210.tar.gz -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python _______________________________________________ 4suite mailing list 4suite@lists.fourthought.com http://lists.fourthought.com/mailman/listinfo/4suite From fdrake@acm.org Sun Dec 10 20:56:45 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 10 Dec 2000 15:56:45 -0500 (EST) Subject: [XML-SIG] xml / html parsing for webbot In-Reply-To: <200012101832.TAA00709@loewis.home.cs.tu-berlin.de> References: <200012101351.GAA11831@localhost.localdomain> <200012101832.TAA00709@loewis.home.cs.tu-berlin.de> Message-ID: <14899.60941.119363.272129@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > There is a number of pending minidom changes which need to be > reviewed, corrected, and applied in order, both to PyXML and Python > proper. I don't know when this will happen, it much depends on Fred, > Andrew and myself finding the time for it.
I've been doing more XML stuff lately, so this is becoming more of a priority for me. I'm not sure exactly when I'll be able to get it done, but it should be before too long. On a related note, I've just written an xml.sax.xmlreader.XMLReader subclass that reads ESIS data, so we should be able to drive a SAX application from an ESIS stream. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From kens@sightreader.com Mon Dec 11 04:23:39 2000 From: kens@sightreader.com (Ken) Date: Sun, 10 Dec 2000 22:23:39 -0600 Subject: [XML-SIG] Child nodes and lazy evaluation (Generators) Message-ID: <003501c0632a$237aa3b0$04090a0a@devup.upcast.com> >> This sounds like an excellent utility for a "pull DOM parser", where >> you receive DOM events as you ask for them, out of a queue. In a >> basic "pull DOM parser" though, no real magic is necessary as long as >> you have an incremental parser feeding the DOM builder. >> >> James Clark's Jade DSSSL processor uses a similar technique for >> manipulating partial groves. Jade had the ability to be parsing the >> source file and doing the transform in parallel, if any node requested >> was not yet parsed, the node request would block until the parser >> thread caught up. > >Yes. If Python gets coroutines, this would be pretty simple to implement as >well. As I've mentioned on the 4Suite lists, if some of the facilities from >Stackless were to move into cpython (which seems likely), a _lot_ of >sophistication will become available for XML processing patterns that I think >would put us way ahead of Java, Perl, etc. Who needs to wait for coroutines? The generator module already works! Coroutines would, of course, make it faster, but it's fine as it is for I/O bound processes. Also, the Generator module can be rewritten later with coroutines (or a related technique) without changing the usage syntax, so a current solution could have a long lifetime.
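The blocking behaviour described in the quoted Jade discussion above (a request for a not-yet-parsed node waits until the parser thread catches up) can be sketched in modern Python with a condition variable. BlockingSequence and slow_children are hypothetical names; this is not the API of the Generator module mentioned in this message, only an illustration of the idea.

```python
import threading
import time


class BlockingSequence:
    """List-like view of items that a background thread is still producing.

    Indexing blocks until the requested item exists. Hypothetical
    illustration only, not the Generator module's actual API.
    """

    def __init__(self, producer):
        self._items = []
        self._done = False
        self._cond = threading.Condition()
        worker = threading.Thread(target=self._run, args=(producer,), daemon=True)
        worker.start()

    def _run(self, producer):
        # Runs in the worker thread: append each item and wake any waiters.
        try:
            for item in producer:
                with self._cond:
                    self._items.append(item)
                    self._cond.notify_all()
        finally:
            with self._cond:
                self._done = True
                self._cond.notify_all()

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not supported in this sketch")
        with self._cond:
            while index >= len(self._items) and not self._done:
                self._cond.wait()
            if index >= len(self._items):
                raise IndexError(index)  # also ends a for-loop over the sequence
            return self._items[index]


def slow_children():
    # Stands in for a parser thread that emits child nodes one at a time.
    for name in ["title", "para", "para"]:
        time.sleep(0.01)
        yield name


children = BlockingSequence(slow_children())
print(children[1])     # waits for the second child: para
print(list(children))  # ['title', 'para', 'para']
```

Since the class only defines __getitem__ and raises IndexError once the producer is exhausted, ordinary for-loops and list() work through Python's old sequence protocol, so callers can treat it like a list.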
The main point of Generator is the pretty usage syntax (i.e. a buffered asynchronous threaded data stream as a simple sequence object). James Clark's Jade approach sounds exactly like what I have in mind, except for the usage syntax. The children of a node would be returned as a Generator (which would behave just like a list, except that it would block for unparsed children). Admittedly, this approach is a little frivolous in its creation of threads (you should ideally only need one parser thread), but as I mentioned, this shouldn't be a problem for I/O bound situations, and maybe the nested Generator concept could be improved upon without changing the syntax (e.g. the generators could share a thread). The Generator module is available at: http://starship.python.net/crew/seehof/Generator.html From mal@lemburg.com Mon Dec 11 10:03:37 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Dec 2000 11:03:37 +0100 Subject: [XML-SIG] 4XPath and Unicode References: <3A33914B.6A28143A@fourthought.com> <200012101841.TAA00760@loewis.home.cs.tu-berlin.de> Message-ID: <3A34A679.F2F6BC49@lemburg.com> "Martin v. Loewis" wrote: > > > 2) Use an existing Python package for lexing, for instance mxTextTools. > > Pro: should be easier to convert and maintain. Con: performance? > > encoding support? > > I'd discourage yet another C module. It is *very* unlikely that they > get reasonable Unicode support. I wouldn't count on that given mxTextTools' heritage ;-) In fact, there will be a version which supports Unicode by mid-2001 because I have a need for this myself. It will most likely use the same technique as SRE: simply provide two separate implementations, one for 8-bit and one for 16-bit characters. BTW, why can't you design a parser API and then provide parser implementations which provide it ?! You'd then have the possibility to switch to another implementation later on.
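One possible reading of Marc-Andre's suggestion, applied to the scanner that started this thread: callers depend only on a small interface, and backends register behind it so one can be swapped for another later. This is a minimal sketch in modern Python; XPathScanner, register_scanner, get_scanner, SreScanner and the "flex"/"sre" backend names are all hypothetical, and the toy token set is far smaller than real XPath needs.

```python
import re


class XPathScanner:
    """Hypothetical interface: turn an expression into (kind, text) tokens."""

    def tokens(self, expr):
        raise NotImplementedError


_SCANNERS = {}


def register_scanner(name):
    """Class decorator recording a backend under `name`."""
    def decorate(cls):
        _SCANNERS[name] = cls
        return cls
    return decorate


def get_scanner(preferred=("flex", "sre")):
    """Instantiate the first registered backend named in `preferred`."""
    for name in preferred:
        if name in _SCANNERS:
            return _SCANNERS[name]()
    raise LookupError("no scanner backend registered")


@register_scanner("sre")
class SreScanner(XPathScanner):
    # Deliberately tiny token set. On str patterns \w is Unicode-aware,
    # so non-ASCII names scan without extra work.
    _token = re.compile(r"\s*(?:(?P<name>[^\W\d]\w*)|(?P<op>//|::|[/@\[\]()*=,.])|(?P<num>\d+))")

    def tokens(self, expr):
        pos, end = 0, len(expr.rstrip())
        while pos < end:
            m = self._token.match(expr, pos)
            if m is None:
                raise ValueError("unexpected character at offset %d" % pos)
            yield m.lastgroup, m.group(m.lastgroup)
            pos = m.end()


scanner = get_scanner()  # no "flex" backend is registered, so "sre" is used
print(list(scanner.tokens("child::para[1]")))
```

Martin's reply below explains why this is harder for full parsers, since parser generators rarely share an API; a token-level interface like this one is the narrower case where the idea is easiest to apply.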
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From uche.ogbuji@fourthought.com Mon Dec 11 16:08:55 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 11 Dec 2000 09:08:55 -0700 Subject: [XML-SIG] Reader architecture and 4DOM Message-ID: <200012111608.JAA04088@localhost.localdomain> See https://sourceforge.net/bugs/?func=detailbug&group_id=6473&bug_id=124382 I've taken care of most of this, but there is one remaining dependency on Ft.Lib in 4DOM. All the readers inherit from Ft.Lib.ReaderBase. The problem is that the same ReaderBase is used for the Domlettes in Ft.Lib. It seems the only ways to eliminate the dependency are: 1) Move ReaderBase to xml.dom.ext. Probably easiest, but I think it's logically incorrect, since the reader architecture is more general than just 4DOM. 2) Hack the distribution code to maintain copies of the reader base between Ft.Lib and xml.dom.ext. This would be more work, and likely error-prone. Any ideas? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Mon Dec 11 17:49:14 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 11 Dec 2000 18:49:14 +0100 Subject: [XML-SIG] 4XPath and Unicode In-Reply-To: <3A34A679.F2F6BC49@lemburg.com> (mal@lemburg.com) References: <3A33914B.6A28143A@fourthought.com> <200012101841.TAA00760@loewis.home.cs.tu-berlin.de> <3A34A679.F2F6BC49@lemburg.com> Message-ID: <200012111749.SAA00696@loewis.home.cs.tu-berlin.de> > I wouldn't count on that given mxTextTools' heritage ;-) > > In fact, there will be a version which supports Unicode by mid-2001 > because I have a need for this myself. That's good to hear; but I'll wait until then... > BTW, why can't you design a parser API and then provide parser > implementations which provide it ?! Mostly because parser generators typically don't have APIs. Many of them have entirely different input syntaxes, which are then converted into programming language code. Now, it might be possible to have a callback-style API for our grammar (XPath). Adapting a specific parser generator for this callback API is just as much work as writing a fresh parser in the generator language. Reusing the abstract syntax tree might be feasible, though; most likely the current 4XPath AS would be reused rather than somebody designing a new AS. Regards, Martin From Mike.Olson@fourthought.com Mon Dec 11 18:57:55 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 11 Dec 2000 11:57:55 -0700 Subject: [XML-SIG] Reader architecture and 4DOM References: <200012111608.JAA04088@localhost.localdomain> Message-ID: <3A3523B3.280CFB98@FourThought.com> uche.ogbuji@fourthought.com wrote: > > 2) Hack the distribution code to maintain copies of the reader base between > Ft.Lib and xml.dom.ext. This would be more work, and likely error-prone. This is what we did for our test suite and it seems to be working fine. Mike > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Mon Dec 11 19:08:08 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 11 Dec 2000 12:08:08 -0700 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> of "Sat, 02 Dec 2000 09:03:34 +0100." <200012020803.JAA00847@loewis.home.cs.tu-berlin.de> Message-ID: <200012111908.MAA05005@localhost.localdomain> > > Well, this would interfere pretty badly with 4DOM. There is an > > xml.dom.Node.py file in 4DOM and having a Node class in the __init__ > > would cause problems with the import. > > What exactly would those problems be? I guess I'm wrong about this. I just tried adding

class Node:
    pass

to Ft/Dom/__init__.py and expected everything to break, but all was well. It seems that at least Python 2.0 is clever when the same name can be imported both as a module and as an object. Is this also the case with Python 1.5.2? Given this, I guess it does make sense to move a base Node class to __init__.py. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Mon Dec 11 22:27:29 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 11 Dec 2000 17:27:29 -0500 (EST) Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <200012111908.MAA05005@localhost.localdomain> References: <martin@loewis.home.cs.tu-berlin.de> <200012020803.JAA00847@loewis.home.cs.tu-berlin.de> <200012111908.MAA05005@localhost.localdomain> Message-ID: <14901.21713.862544.22201@cj42289-a.reston1.va.home.com> uche.ogbuji@fourthought.com writes: > Given this, I guess it does make sense to move a base Node class to the > __init__.py Done. ;) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Mon Dec 11 23:24:52 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Dec 2000 00:24:52 +0100 Subject: [XML-SIG] Announcing PyXPath 1.0 Message-ID: <200012112324.AAA01115@loewis.home.cs.tu-berlin.de> --Multipart_Tue_Dec_12_00:24:52_2000-1 Content-Type: text/plain; charset=US-ASCII After recent discussions on removing lex and yacc from 4XPath, I got interested in writing a 100% pure XPath parser in Python, using available parser generators. The first result of this research is attached below. It hasn't been tested much, but it does recognize the LocationPath expressions that are given as examples in the XPath spec. The parser is based on YAPPS. Since YAPPS is LL(1), some rewriting of the grammar was necessary to make it LL(1). I found that the generated scanner class of YAPPS is not usable for XPath: there are a number of context-sensitive aspects in the XPath lexis that make the straightforward longest-match approach of YAPPS unsuitable. In particular, a regex lexer cannot distinguish between an NCName and a FunctionName, and may decide to return an OperatorName in places where it shouldn't.
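The context sensitivity Martin describes is spelled out in the XPath 1.0 Recommendation itself (section 3.7, Lexical Structure): if there is a preceding token and it is not one of @, ::, (, [, , or an Operator, then * must be read as a MultiplyOperator and an NCName as an OperatorName; an NCName followed by ( is a NodeType or a FunctionName; an NCName followed by :: is an AxisName. A plain longest-match regex lexer cannot apply these rules, but a second pass over its output can. The sketch below is only an illustration under simplifying assumptions (no namespace prefixes, looser NCName characters); it is not Martin's YAPPS grammar, and tokenize() and classify() are hypothetical names.

```python
import re

# Simplified XPath 1.0 tokens (assumption): no namespace prefixes, and \w admits
# any Unicode letter or digit, which is looser than the real NCName rules.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<name>[A-Za-z_][\w.\-]*)
      | (?P<number>\d+(?:\.\d*)?|\.\d+)
      | (?P<literal>"[^"]*"|'[^']*')
      | (?P<variable>\$[A-Za-z_][\w.\-]*)
      | (?P<op>//|::|\.\.|!=|<=|>=|[/()\[\]@,|+\-=<>*.])
    )""", re.VERBOSE)

OPERATOR_NAMES = {"and", "or", "mod", "div"}
NODE_TYPES = {"comment", "text", "processing-instruction", "node"}
OPERAND_PRECEDERS = {"@", "::", "(", "[", ","}
SYMBOL_OPERATORS = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}


def tokenize(expr):
    """Split an expression into raw (kind, text) tokens, longest match first."""
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise SyntaxError("unexpected character %r at %d" % (expr[pos], pos))
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _operand_expected(prev):
    """True if the previous classified token leaves us expecting an operand."""
    kind, value = prev
    if kind in ("OperatorName", "MultiplyOperator"):
        return True
    return kind == "op" and (value in OPERAND_PRECEDERS or value in SYMBOL_OPERATORS)


def classify(tokens):
    """Apply the section 3.7 disambiguation rules to the raw tokens."""
    out = []
    for i, (kind, value) in enumerate(tokens):
        if kind == "name" or (kind == "op" and value == "*"):
            nxt = tokens[i + 1][1] if i + 1 < len(tokens) else None
            if out and not _operand_expected(out[-1]):
                # Rule 1: in operator position, '*' multiplies and a name is an operator.
                if value == "*":
                    kind = "MultiplyOperator"
                elif value in OPERATOR_NAMES:
                    kind = "OperatorName"
                else:
                    raise SyntaxError("expected an operator, got %r" % value)
            elif value == "*":
                kind = "NameTest"
            elif nxt == "(":
                # Rule 2: a name before '(' is a NodeType or a FunctionName.
                kind = "NodeType" if value in NODE_TYPES else "FunctionName"
            elif nxt == "::":
                # Rule 3: a name before '::' is an AxisName.
                kind = "AxisName"
            else:
                kind = "NameTest"
        out.append((kind, value))
    return out


print(classify(tokenize("div div div")))
# [('NameTest', 'div'), ('OperatorName', 'div'), ('NameTest', 'div')]
```

The same three-character name gets three different readings depending only on its neighbours, which is exactly what a context-free longest-match scanner cannot express on its own.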
I tried resolving the former problem by only having NCName as a token, but that caused a conflict in the LL(1) parsing algorithm, which could not tell whether an expression was going to be a FunctionCall (that would require looking ahead to the LPAREN). I haven't done any performance measurements with this grammar yet. Also, it returns some ad-hoc data structure as the parse tree. If there is interest, I will try to have it generate 4XPath data structures; I'd probably need help from a 4Suite expert here. I have tested the capability of parsing a Unicode string. The definition of an NCName needs further work, since it does not yet reflect the set of characters that count as letters in XML (or what else is allowed in NCNames). Regards, Martin --Multipart_Tue_Dec_12_00:24:52_2000-1 Content-Type: application/octet-stream; type=tar+gzip Content-Disposition: attachment; filename="PyXPath.tgz" Content-Transfer-Encoding: base64 [base64-encoded attachment PyXPath.tgz omitted] --Multipart_Tue_Dec_12_00:24:52_2000-1 Content-Type: text/plain; charset=US-ASCII --Multipart_Tue_Dec_12_00:24:52_2000-1-- From chetan@pybiz.com Tue Dec 12 01:07:14 2000 From: chetan@pybiz.com (chetan patel) Date: Mon, 11 Dec 2000 17:07:14 -0800 Subject: [XML-SIG] [ANNOUNCE] XDisect 1.0 - An XML Indexing and Search Engine References: <200012112324.AAA01115@loewis.home.cs.tu-berlin.de> Message-ID: <05bf01c063d7$dd8540f0$09d40518@C746107A> PyBiz Inc announces release 1.0 of its product XDisect, an XML Indexing and Search Engine. The release can be downloaded for free evaluation at the following URL: http://www.xdfind.com/ I would appreciate your feedback and comments on the product.
Product Overview
================
XDisect is an enterprise-class XML search product with high-speed XML indexing capabilities. XDisect is ideal for distributed management of XML documents. XDisect provides a solid foundation for next-generation vertical markets, secure portals and other dynamic e-business applications.

Features in this release
========================
- XDisect is completely written in Python 1.5.2
- High-speed indexing of millions of XML documents
- Index sizes can be in excess of 2 GB
- Supports incremental indexing / updates to documents
- Runs on Linux, Solaris, Win NT/2000
- Supports the SQL query language
- Supports sophisticated joins, keywords, free text, path-based searching
- Supports Oracle's XSQL query standard
- Open HTTP/XML API interface for integration with most popular programming environments
- XSLT integration for direct HTML rendering of XML query results or transformation
- Browser-based GUI for developers to look at the schemas and documents stored in the repository

regards Chetan Patel PyBiz, Inc www.pybiz.com From kentsin@sinaman.com Tue Dec 12 15:21:31 2000 From: kentsin@sinaman.com (kentsin) Date: Tue Dec 12 09:21:31 CST 2000 Subject: [XML-SIG] xml / html parsing for web Message-ID: <20001212012131.22258.qmail@hk.sina.com.hk> I have downloaded 4Suite, but I found it difficult to understand from the documentation how to build what I want. I have also read the linkcheck code which contain a very smart regular expression to parse almost all links. What I found missing is a javascript driven or form driven links : some site have <option .... value="link1"... Which linkchecker can not follow. Moreover, I would like to extract the form data and link them with labels found on the page. Associating the link with the hot text or image. Which linkchecker can not. Linkchecker's regular expression approach is much clear to me, but as a newbie I would like to hear from you that how far can it go? Is it worth it for me to go the 4DOM way?
Can somebody point me to some 4dom sample code? Many thanks to all who reply. Best Regards, Kent Sin =================================================================== Sina free e-mail http://sinamail.sina.com.hk Download SinaTicker now http://sinaticker.sina.com.hk From kentsin@sinaman.com Tue Dec 12 09:26:08 2000 From: kentsin@sinaman.com (kentsin) Date: Tue Dec 12 09:26:08 HKT 2000 Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web Message-ID: <20001212012608.14096.qmail@hk.sina.com.hk> Dear all, I just come across SAX, is it useful in my task? How does it compare to DOM and regular expression? Rgs, Kent Sin From noreply@sourceforge.net Tue Dec 12 01:51:03 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Dec 2000 17:51:03 -0800 Subject: [XML-SIG] [Bug #125424] Node.replaceChild broken in minidom Message-ID: <200012120151.RAA03038@usw-sf-web2.sourceforge.net> Bug #125424, was updated on 2000-Dec-11 17:51 Here is a current snapshot of the bug. Project: Python/XML Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: keffy Assigned to : nobody Summary: Node.replaceChild broken in minidom Details: In xml.dom.minidom, Node.replaceChild doesn't replace any children. The definition from the source is:

    def replaceChild(self, newChild, oldChild):
        index = self.childNodes.index(oldChild)
        self.childNodes[index] = oldChild

Is there a good reason why it's not the following?

    def replaceChild(self, newChild, oldChild):
        index = self.childNodes.index(oldChild)
        self.childNodes[index] = newChild

Sorry if this is a repeat report or addresses a design decision for the "mini" of minidom. Sorry also that I'm clueless about the Unix-style patch system -- this is as close as I come to submitting a fix.
:-) -- Kevin Russell krussll@cc.umanitoba.ca For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125424&group_id=6473 From martin@loewis.home.cs.tu-berlin.de Tue Dec 12 08:32:41 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Dec 2000 09:32:41 +0100 Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web In-Reply-To: <20001212012608.14096.qmail@hk.sina.com.hk> (message from kentsin on Tue Dec 12 09:26:08 HKT 2000) References: <20001212012608.14096.qmail@hk.sina.com.hk> Message-ID: <200012120832.JAA00700@loewis.home.cs.tu-berlin.de> > I just come across SAX, is it useful in my task? If you use an HTML parser (instead of an XML one), then maybe, yes. > How does it compare to DOM and regular expression? It is an event-based API, instead of a tree-based or a function-based one. Regards, Martin From larsga@garshol.priv.no Tue Dec 12 09:25:41 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Dec 2000 10:25:41 +0100 Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web In-Reply-To: <20001212012608.14096.qmail@hk.sina.com.hk> References: <20001212012608.14096.qmail@hk.sina.com.hk> Message-ID: <m3bsuiq9d6.fsf@lambda.garshol.priv.no> * kentsin@sinaman.com | | I just come across SAX, is it useful in my task? How does it compare | to DOM and regular expression? Like the DOM, SAX is used to work with a real XML parser. The DOM gives you back the document as a full object structure, whereas SAX instead gives you the document as a series of method calls. So SAX is faster and requires less memory, while the DOM is easier to understand. Which is easier to use depends on what you want to do. --Lars M.
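To make Lars's distinction concrete, here is a minimal sketch that collects the href values of <a> elements twice: once with a SAX ContentHandler, where the parser calls startElement() as it reads and nothing else is kept, and once with xml.dom.minidom, where the whole tree is built first and then searched. It uses the xml.sax and xml.dom.minidom modules of a current Python 3 standard library rather than the PyXML 0.6 interfaces discussed in this thread, and it assumes well-formed XHTML-style input; LinkHandler, links_with_sax() and links_with_dom() are illustrative names.

```python
import xml.sax
import xml.dom.minidom

DOC = """<html><body>
<a href="one.html">One</a>
<p>See <a href="two.html">two</a> and <a name="anchor">this</a>.</p>
</body></html>"""


class LinkHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods while it reads the document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def startElement(self, name, attrs):
        href = attrs.get("href")
        if name == "a" and href is not None:
            self.links.append(href)


def links_with_sax(text):
    handler = LinkHandler()
    # Only handler.links survives; no tree is ever built.
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.links


def links_with_dom(text):
    # The whole document becomes an object tree before we look at it.
    doc = xml.dom.minidom.parseString(text)
    return [a.getAttribute("href")
            for a in doc.getElementsByTagName("a")
            if a.hasAttribute("href")]


print(links_with_sax(DOC))  # ['one.html', 'two.html']
print(links_with_dom(DOC))  # ['one.html', 'two.html']
```

Both give the same answer here. The SAX version's memory use does not grow with the document, while the DOM version can revisit any part of the tree after parsing.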
From mak@mikroplan.com.pl Tue Dec 12 10:26:38 2000 From: mak@mikroplan.com.pl (Grzegorz Makarewicz) Date: Tue, 12 Dec 2000 11:26:38 +0100 Subject: [XML-SIG] [BUG] sax.ExpatParser.reset Message-ID: <NDBBIKNLJKPLOLAJJJAPEENGFDAA.mak@mikroplan.com.pl> Test failure in test/test_sax.test_expat_incremental_reset due to bug in sax.expatreader. mak --- expatreader.py Thu Nov 02 18:23:08 2000 +++ _xmlplus\sax\expatreader.py Tue Dec 12 11:16:05 2000 @@ -69,8 +69,8 @@ def feed(self, data, isFinal = 0): if not self._parsing: - self._parsing = 1 self.reset() + self._parsing = 1 self._cont_handler.startDocument() try: @@ -118,6 +118,7 @@ # self._parser.NotStandaloneHandler = self._parser.ExternalEntityRefHandler = self.external_entity_ref + self._parsing = 0 self._entity_stack = [] # Locator methods From martin@loewis.home.cs.tu-berlin.de Tue Dec 12 08:35:26 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Dec 2000 09:35:26 +0100 Subject: [XML-SIG] xml / html parsing for web In-Reply-To: <20001212012131.22258.qmail@hk.sina.com.hk> (message from kentsin on Tue Dec 12 09:21:31 CST 2000) References: <20001212012131.22258.qmail@hk.sina.com.hk> Message-ID: <200012120835.JAA00746@loewis.home.cs.tu-berlin.de> > Linkchecker's regular expression approach is much clear to me, but > as a newbie I would like to hear from you that how far can it go? I think that's hard to tell. Just draft some code, and see for yourself. > Can somebody point me to some 4dom sample code? Please have a look at the demo/dom directory of PyXML.
Regards, Martin From larsga@garshol.priv.no Tue Dec 12 10:31:27 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Dec 2000 11:31:27 +0100 Subject: [XML-SIG] [BUG] sax.ExpatParser.reset In-Reply-To: <NDBBIKNLJKPLOLAJJJAPEENGFDAA.mak@mikroplan.com.pl> References: <NDBBIKNLJKPLOLAJJJAPEENGFDAA.mak@mikroplan.com.pl> Message-ID: <m33dfuq6bk.fsf@lambda.garshol.priv.no> * Grzegorz Makarewicz | | Test failure in test/test_sax.test_expat_incremental_reset | due to bug in sax.expatreader. Thank you Grzegorz, but this bug was already fixed in the CVS tree. (Revision 1.18, 2000-10-14.) --Lars M. From larsga@garshol.priv.no Tue Dec 12 11:43:09 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Dec 2000 12:43:09 +0100 Subject: [XML-SIG] XMLPROC: unsupported character number '>255' in character reference In-Reply-To: <14899.17068.574076.957348@lindm.dm> References: <14899.17068.574076.957348@lindm.dm> Message-ID: <m3y9xlq302.fsf@lambda.garshol.priv.no> * Dieter Maurer | | I use the SAX2 implementation bundled with the Python 2.0 | distribution to process DocBook/XML documents. | | When I turn on validation, "xmlproc" complains | "unsupported character number 'XXXX' in character reference" | for each XXXX larger than 255. | | Apparently, "xmlproc" does not yet know that such character | references no longer make problems with the new Python | unicode support. You are quite right, xmlproc has not yet been updated to Python 2.0, chiefly because I am too busy writing my book to do much development these days. I'm planning to add full Unicode support to it, but that probably won't happen for another couple of months. --Lars M. 
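For comparison with the xmlproc limitation described above, here is a minimal sketch showing that Python 3's expat-based `xml.sax` parser returns character references above 255 as ordinary Unicode characters. It says nothing about whether xmlproc itself was later fixed. The `TextCollector` class and the sample document are invented for illustration.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Accumulate all character data reported by the parser."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        # content is already a Python (Unicode) str
        self.parts.append(content)

# &#x4E2D; and &#25991; (U+6587) are both above 255; &#233; is not.
DOC = b"<p>&#x4E2D;&#25991; &#233;</p>"

handler = TextCollector()
xml.sax.parseString(DOC, handler)
text = "".join(handler.parts)
print(text)                         # 中文 é
print([hex(ord(c)) for c in text])  # ['0x4e2d', '0x6587', '0x20', '0xe9']
```

The handler joins the pieces itself because a parser may report one run of text through several `characters()` calls.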
From calvin@cs.uni-sb.de Tue Dec 12 15:35:20 2000 From: calvin@cs.uni-sb.de (Bastian Kleineidam) Date: Tue, 12 Dec 2000 16:35:20 +0100 (CET) Subject: [XML-SIG] xml / html parsing for web In-Reply-To: <20001212012131.22258.qmail@hk.sina.com.hk> Message-ID: <Pine.LNX.4.21.0012121624570.31907-100000@earth.cs.uni-sb.de> Kent, > contain a very smart regular expression to parse almost all links. What > I found missing is a javascript driven or form driven links : some site > have <option .... value="link1"... > Which linkchecker can not follow. Yes. In general you cannot tell whether an option "value" is a link or just some data. The same goes for JavaScript, where I can construct links out of many parts: <script> var mybase = "mydata/sub1"; if (browser == "IE") { url = mybase + "/ieblubb.html"; } else { url = mybase + "/netscapeblubb.html"; } </script> It is difficult to extract such dynamic URLs. > Moreover, I would like to extract the form data and link them with > labels found on the page. Associating the link with the hot text or > image. Which linkchecker can not. Yes, it's the same problem. In general, you cannot always extract dynamic URLs from forms or JavaScript, because you never know whether they are really URLs or just data. Bastian From calvin@cs.uni-sb.de Tue Dec 12 15:36:47 2000 From: calvin@cs.uni-sb.de (Bastian Kleineidam) Date: Tue, 12 Dec 2000 16:36:47 +0100 (CET) Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web In-Reply-To: <20001212012608.14096.qmail@hk.sina.com.hk> Message-ID: <Pine.LNX.4.21.0012121635430.31907-100000@earth.cs.uni-sb.de> > I just come across SAX, is it useful in my task? How does it compare to > DOM and regular expression? SAX is a parser interface, and the DOM is a parse-tree format. You can use both; a parse tree is usually built from a parser's output.
Bastian From larsga@garshol.priv.no Tue Dec 12 21:52:44 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 12 Dec 2000 22:52:44 +0100 Subject: [XML-SIG] Sab-pyth Message-ID: <m3g0jtz4r7.fsf@lambda.garshol.priv.no> Has anyone been able to compile Sab-pyth? I can't do it at all on Windows and am having problems on Linux, so if anyone could make this available to me I would be very grateful. Windows is preferred, but Linux is also good. --Lars M. From martin@loewis.home.cs.tu-berlin.de Tue Dec 12 22:40:19 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Dec 2000 23:40:19 +0100 Subject: [XML-SIG] XMLPROC: unsupported character number '>255' in character reference In-Reply-To: <14899.17068.574076.957348@lindm.dm> (message from Dieter Maurer on Sun, 10 Dec 2000 09:45:32 +0100 (CET)) References: <14899.17068.574076.957348@lindm.dm> Message-ID: <200012122240.XAA00752@loewis.home.cs.tu-berlin.de> > Apparently, "xmlproc" does not yet know that such character > references no longer make problems with the new Python > unicode support. Indeed. xmlproc currently does not use the Unicode type. > Is there already a fix? Not that I know of; Lars has not put anything into PyXML yet. Regards, Martin From kentsin@sinaman.com Wed Dec 13 21:45:15 2000 From: kentsin@sinaman.com (kentsin) Date: Wed Dec 13 21:45:15 HKT 2000 Subject: [XML-SIG] xml / html parsing for web Message-ID: <20001213134515.25819.qmail@hk.sina.com.hk> Yes, you are right. There is no general way to do this. I am not making a general spider; my job is to collect some information from the web automatically. I have a small set of targets, so I would like to build a spider framework that I could customize for each target site. One of the targets contains links built with a pull-down option list, so I need a way to handle that. I think the regular expression approach is simple enough for a newbie like me, but it seems very difficult to customize for cases like the one above.
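The pull-down case can be handled with a tolerant HTML parser instead of a regular expression. Below is a minimal sketch with Python 3's standard `html.parser` module (not one of the 2000-era modules discussed in this thread); `LinkCollector` and the sample page are hypothetical. It records `<a href>` links and `<option value>` entries together with their visible text, in document order, and it tolerates unclosed `<option>` and `<a>` tags. As Bastian notes above, deciding whether an option value is really a URL is still up to the caller.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect <a href> links and <option value> entries, with their
    visible text, in the order they appear in the page."""

    def __init__(self):
        super().__init__()
        self.entries = []     # [kind, url, text] lists, in document order
        self.current = None   # entry whose text is being collected, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self.entries.append(["a", attrs["href"], ""])
            self.current = self.entries[-1]
        elif tag == "option" and attrs.get("value") is not None:
            # </option> is optional in HTML, so a new <option> ends the old one.
            self.entries.append(["option", attrs["value"], ""])
            self.current = self.entries[-1]
        elif tag in ("a", "option"):
            self.current = None

    def handle_endtag(self, tag):
        if tag in ("a", "option", "select"):
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current[2] += data

    def results(self):
        # Collapse runs of whitespace in the collected text.
        return [(kind, url, " ".join(text.split()))
                for kind, url, text in self.entries]

page = """
<p><a href="/news.html">Latest <b>news</b></a>
<select name="jump">
  <option value="/a.html">Section A
  <option value="/b.html">Section B
</select>
<a href="/about.html">About us
"""

collector = LinkCollector()
collector.feed(page)
collector.close()
for kind, url, text in collector.results():
    print(kind, url, text)
```

Because entries are kept in document order, a per-site rule can pick a link by its text or by its index in `results()`.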
The other problem is that I want to choose which action to take based on the hot words (the words between <a> and </a>). I also want to preserve the order of the links, so that I can customize the action to pick a specific link by its position. I think the regular expression method makes this very difficult, but when I tried the parser approach, the parsers crashed on badly structured HTML. Many parser modules come with Python; can someone comment on them for my case? How should I choose between them? From noreply@sourceforge.net Wed Dec 13 15:54:27 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Dec 2000 07:54:27 -0800 Subject: [XML-SIG] [Bug #125668] DbDom : Reader produces DocumentFragments Message-ID: <E146EEd-0005Dt-00@usw-sf-web3.sourceforge.net> Bug #125668, was updated on 2000-Dec-13 07:54 Here is a current snapshot of the bug. Project: Python/XML Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: DbDom : Reader produces DocumentFragments Details: Hi Mike! When used without a Document parameter, Reader.fromStream() returns a DocumentFragment (instead of a Document). This is because when the document parameter is None, fromStream creates a new DocumentImp and passes it to Sax2.Reader.fromStream, which returns a DocumentFragment. A way to correct this would be to append the DF to the newly created Document in that case. I'll see if I can set up a patch for this one.
For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125668&group_id=6473 From noreply@sourceforge.net Wed Dec 13 16:01:51 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Dec 2000 08:01:51 -0800 Subject: [XML-SIG] [Patch #102818] DbDom patch for bug #125668 (Reader and Doc Frags) Message-ID: <E146ELn-0003wn-00@usw-sf-web1.sourceforge.net> Patch #102818 has been updated. Project: pyxml Category: 4Suite Status: Open Submitted by: afayolle Assigned to : nobody Summary: DbDom patch for bug #125668 (Reader and Doc Frags) ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102818&group_id=6473 From Alexandre.Fayolle@logilab.fr Wed Dec 13 17:45:25 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 13 Dec 2000 18:45:25 +0100 (CET) Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <200012111908.MAA05005@localhost.localdomain> Message-ID: <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr> On Mon, 11 Dec 2000 uche.ogbuji@fourthought.com wrote: > To Ft/Dom/__init__.py and expected everything to break, but all was well. It > seems that at least Python 2.0 is clever when the same import can be made as a > package and an object. Is this also the casde with Python 1.5.2? I tried that with Python 1.5.2 (adding an empty Node class to xml/dom/__init__.py) and it looks like it's fine too. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From fdrake@acm.org Wed Dec 13 17:44:36 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 13 Dec 2000 12:44:36 -0500 (EST) Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr> References: <200012111908.MAA05005@localhost.localdomain> <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr> Message-ID: <14903.46468.622296.363688@cj42289-a.reston1.va.home.com> Alexandre Fayolle writes: > I tried that with python 1.5.2 (adding a empty Node class to > xml/dom/__init__.py) and it looks like it's fine too. Great! Now I won't have to worry about needing to back out my changes. ;-) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From uche.ogbuji@fourthought.com Wed Dec 13 22:59:31 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 13 Dec 2000 15:59:31 -0700 Subject: [XML-SIG] Mixed encodings and XML Message-ID: <3A37FF53.206662F3@fourthought.com> [crossposted: 4Suite, xml-sig, i18n-sig] Time for me to expose my ignorance on XML and i18n again. How would one go about creating a well-formed XML document with multiple encodings? For instance, if I had UCS-2, UTF-8 and BIG5 all in one doc, how could I make it work? Take the following example: ftp://ftp.fourthought.com/pub/etc/HOWTO/cjkv.doc This document is a CJKV HOWTO by Chen Chien-Hsun. He originally wrote it in HTML. See ftp://ftp.fourthought.com/pub/etc/HOWTO/CJKV_4XSLT.HTM It contains many sections within HTML PREs with the different encodings I mentioned. They look like <PRE LANG="zh-TW"> ... BIG5-encoded stuff ... </PRE> I need to convert the document to XML Docbook format. My naive attempts at converting to <screen xml:lang="zh-TW"> ... BIG5-encoded stuff ... </screen> Of course don't work because the parser takes one look at the BIG5 and throws a well-formedness error. Is there any way to manage this besides using XInclude? Do any of the Python parsers have any tricks that could help? Thanks.
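Later in this thread Lars Marius Garshol points out that the one-encoding rule applies per entity rather than per document, so text in another encoding can live in an external entity that carries its own text declaration. Below is a minimal sketch of that idea using Python 3's `xml.sax` (not the PyXML 0.6 parsers discussed here). The entity name `latin.xml`, the in-memory `InMemoryResolver`, and the element names are hypothetical. External general entities are off by default in recent Python 3 releases, so the sketch switches them on explicitly. It uses ISO-8859-1 for the external entity because, as far as I know, Python's expat binding only decodes expat's built-in encodings plus single-byte codecs, so a Big5 entity would probably be rejected by this particular parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Document entity: UTF-8, declares an external entity stored separately.
MAIN = ('<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE doc [\n'
        '  <!ENTITY latin SYSTEM "latin.xml">\n'
        ']>\n'
        '<doc><zh>中文</zh>&latin;</doc>').encode("utf-8")

# External entity: its own text declaration names a different encoding.
ENTITIES = {
    "latin.xml": ('<?xml version="1.0" encoding="ISO-8859-1"?>'
                  '<fr>café</fr>').encode("iso-8859-1"),
}

class InMemoryResolver(EntityResolver):
    """Serve external entities from the ENTITIES dict (hypothetical store)."""
    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(ENTITIES[systemId]))
        return source

class TextByElement(ContentHandler):
    """Collect decoded text per element name."""
    def __init__(self):
        super().__init__()
        self.current = None
        self.text = {}

    def startElement(self, name, attrs):
        self.current = name
        self.text.setdefault(name, "")

    def endElement(self, name):
        self.current = None

    def characters(self, content):
        if self.current is not None:
            self.text[self.current] += content

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)   # resolve external entities
parser.setEntityResolver(InMemoryResolver())
handler = TextByElement()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(MAIN))

print(handler.text["zh"], handler.text["fr"])  # 中文 café
```

The application only ever sees decoded Unicode text; the per-entity encodings are a storage detail that the parser resolves.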
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tree@basistech.com Wed Dec 13 23:09:47 2000 From: tree@basistech.com (Tom Emerson) Date: Wed, 13 Dec 2000 18:09:47 -0500 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: <3A37FF53.206662F3@fourthought.com> References: <3A37FF53.206662F3@fourthought.com> Message-ID: <14904.443.228020.168633@cymru.basistech.com> Uche Ogbuji writes: > It contains many sections within HTML PREs with the different encodings > I mentioned. They look like > > <PRE LANG="zh-TW"> > ... BIG5-encoded stuff ... > </PRE> The LANG attribute does not specify an encoding, it specifies a language. You cannot safely imply anything about the encoding based on the value of the LANG attribute. For example, "zh-TW" text could be encoded in Big 5, Big 5+, GBK, CP950, CP936, EUC-CN (depending on the text), ISO-2022-CN, ISO-2022-CN-EXT, and others. The LANG attribute can be used by the application to help generate the appropriate glyph variants, however, though I don't know of any off hand that do this. > I need to convert the document to XML Docbook format. My naive attempts > at converting to > > <screen xml:lang="zh-TW"> > ... BIG5-encoded stuff ... > </screen> > > Of course don't work because the parser takes one look at the BIG5 and > throws a well-formedness error. Which it is required to do, see Section 4.3.3 of the XML specification. > Is there any way to manage this besides using XInclude? Do any of the > Python parsers have any tricks that could help? Convert all of those sections into Unicode, using UTF-8 as the encoding form. You could write a trivial Python script to do this for you. The bigger problem (IMHO) will be convincing your DocBook tool chain to handle the Asian characters. 
If you find a good solution to that (i.e., allowing Simplified and Traditional Chinese, Korean, and (say) Thai in a single document) let me know. -tree -- Tom Emerson Basis Technology Corp. Zenkaku Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From mal@lemburg.com Wed Dec 13 23:22:50 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 14 Dec 2000 00:22:50 +0100 Subject: [XML-SIG] Mixed encodings and XML References: <3A37FF53.206662F3@fourthought.com> <14904.443.228020.168633@cymru.basistech.com> Message-ID: <3A3804CA.5DC4B238@lemburg.com> Tom Emerson wrote: > > > I need to convert the document to XML Docbook format. My naive attempts > > at converting to > > > > <screen xml:lang="zh-TW"> > > ... BIG5-encoded stuff ... > > </screen> > > > > Of course don't work because the parser takes one look at the BIG5 and > > throws a well-formedness error. > > Which it is required to do, see Section 4.3.3 of the XML specification. This is not really related to text encodings, but somewhat similar: Is there a standard way of including binary data in XML files ? I would like to put a complete web-site into a (large) XML file. The XML file should ideally contain not only the structure information, attributes, etc. but also the HTML files, the images and maybe even sound files or flash apps. Is something like this possible or will I have to use some other storage method for the binary parts and reference these from within the XML file (I would prefer not to, so that I can include e.g. the HTML file content in XML searches) ? 
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From uche.ogbuji@fourthought.com Thu Dec 14 00:14:40 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 13 Dec 2000 17:14:40 -0700 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: Message from Tom Emerson <tree@basistech.com> of "Wed, 13 Dec 2000 18:09:47 EST." <14904.443.228020.168633@cymru.basistech.com> Message-ID: <200012140014.RAA15620@localhost.localdomain> > Uche Ogbuji writes: > > It contains many sections within HTML PREs with the different encodings > > I mentioned. They look like > > > > <PRE LANG="zh-TW"> > > ... BIG5-encoded stuff ... > > </PRE> > > The LANG attribute does not specify an encoding, it specifies a > language. You cannot safely imply anything about the encoding based on > the value of the LANG attribute. For example, "zh-TW" text could be > encoded in Big 5, Big 5+, GBK, CP950, CP936, EUC-CN (depending on the > text), ISO-2022-CN, ISO-2022-CN-EXT, and others. > > The LANG attribute can be used by the application to help generate the > appropriate glyph variants, however, though I don't know of any off > hand that do this. Makes sense, but I wasn't clear on this. > > I need to convert the document to XML Docbook format. My naive attempts > > at converting to > > > > <screen xml:lang="zh-TW"> > > ... BIG5-encoded stuff ... > > </screen> > > > > Of course don't work because the parser takes one look at the BIG5 and > > throws a well-formedness error. > > Which it is required to do, see Section 4.3.3 of the XML specification. I'm quite aware of this (I read the XML spec more often than I'd like to). That's why I said "of course". > > Is there any way to manage this besides using XInclude? Do any of the > > Python parsers have any tricks that could help?
> > Convert all of those sections into Unicode, using UTF-8 as the > encoding form. You could write a trivial Python script to do this for > you. Not what I need, unfortunately. The whole point of the exercise is to have examples in the actual encodings. > The bigger problem (IMHO) will be convincing your DocBook tool chain > to handle the Asian characters. If you find a good solution to that > (i.e., allowing Simplified and Traditional Chinese, Korean, and (say) > Thai in a single document) let me know. Hmm? My docbook tool is simply 4XSLT, which handles the individual encodings just fine now. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu Dec 14 00:18:49 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 13 Dec 2000 17:18:49 -0700 Subject: [XML-SIG] Mixed encodings and XML In-Reply-To: Message from "M.-A. Lemburg" <mal@lemburg.com> of "Thu, 14 Dec 2000 00:22:50 +0100." <3A3804CA.5DC4B238@lemburg.com> Message-ID: <200012140018.RAA15661@localhost.localdomain> > This is not really related to text encodings, but somewhat similar: > > Is there a standard way of including binary data in XML files ? No. > I would like to put a complete web-site into a (large) XML file. > The XML file should ideally contain not only the structure > information, attributes, etc. but also the HTML files, the images > and maybe even sound files or flash apps. Ah. This is similar to what the ebXML folks and the SOAP folks were at odds over. Note that this is a well-known deficiency in XML. The most common suggestion is: put it all into one file, separate them with form-feeds, and have the application process each bit separately. Clearly this doesn't suit your needs, but there's not much more to go on right now.
> Is something like this possible or will I have to use some > other storage method for the binary parts and reference these > from within the XML file (I would prefer not to, so that I can > include e.g. the HTML file content in XML searches) ? Could you expand on this last bit about the searches? It hints at what might be a work-around if that's your main concern. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tree@basistech.com Thu Dec 14 01:05:43 2000 From: tree@basistech.com (Tom Emerson) Date: Wed, 13 Dec 2000 20:05:43 -0500 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: <200012140014.RAA15620@localhost.localdomain> References: <tree@basistech.com> <14904.443.228020.168633@cymru.basistech.com> <200012140014.RAA15620@localhost.localdomain> Message-ID: <14904.7399.328781.898962@cymru.basistech.com> uche.ogbuji@fourthought.com writes: > > Convert all of those sections into Unicode, using UTF-8 as the > > encoding form. You could write a trivial Python script to do this for > > you. > > Not what I need, unfortunately. The whole point of the exercise is > to have examples in the actual encodings. And the point of that is what? They will display (most probably) as jibberish within the browser... or is that the point? > Hmm? My docbook tool is simply 4XSLT, which handles the individual encodings > just fine now. Sure, but if you want to generate a LaTeX (and from there PDF or PS) version you're screwed, AFAIK. If you are just generating HTML then you're OK. -tree -- Tom Emerson Basis Technology Corp. 
Zenkaku Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From uche.ogbuji@fourthought.com Thu Dec 14 01:17:51 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 13 Dec 2000 18:17:51 -0700 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: Message from Tom Emerson <tree@basistech.com> of "Wed, 13 Dec 2000 20:05:43 EST." <14904.7399.328781.898962@cymru.basistech.com> Message-ID: <200012140117.SAA15823@localhost.localdomain> > uche.ogbuji@fourthought.com writes: > > > Convert all of those sections into Unicode, using UTF-8 as the > > > encoding form. You could write a trivial Python script to do this for > > > you. > > > > Not what I need, unfortunately. The whole point of the exercise is > > to have examples in the actual encodings. > > And the point of that is what? They will display (most probably) as > jibberish within the browser... or is that the point? Good question. I have not tried Chen Chien-Hsun's original HTML. Perhaps even that won't work in a browser. Makes sense. What does a browser do with a document with <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'> ^^^^^^^^^^ !!!!???!!!! In the header and then runs into a big patch of UCS-2 or BIG5? My guess is that it displays gibberish as you suggest. In this case, I think there's no point expecting HTML generated from XML to do any better and it simply makes sense to break out the alternatively encoded portions into separate, linked files. Chen, does this make sense? > > Hmm? My docbook tool is simply 4XSLT, which handles the individual encodings > > just fine now. > > Sure, but if you want to generate a LaTeX (and from there PDF or PS) > version you're screwed, AFAIK. If you are just generating HTML then > you're OK. Yeah. That's all for now. Thanks much. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tree@basistech.com Thu Dec 14 01:22:19 2000 From: tree@basistech.com (Tom Emerson) Date: Wed, 13 Dec 2000 20:22:19 -0500 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: <200012140117.SAA15823@localhost.localdomain> References: <tree@basistech.com> <14904.7399.328781.898962@cymru.basistech.com> <200012140117.SAA15823@localhost.localdomain> Message-ID: <14904.8395.286379.623954@cymru.basistech.com> uche.ogbuji@fourthought.com writes: > Good question. I have not tried Chen Chien-Hsun's original HTML. > Perhaps even that won't work in a browser. Makes sense. What does > a browser do with a document with > > <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'> > ^^^^^^^^^^ > !!!!???!!!! > > In the header and then runs into a big patch of UCS-2 or BIG5? It treats those bytes as 8-bit Latin 1 characters and it displays them. Once you've seen enough of these you start recognizing the patterns, but it is still junk. > My guess is that it displays gibberish as you suggest. In this case, I think > there's no point expecting HTML generated from XML to do any better and it > simply makes sense to break out the alternatively encoded portions into > separate, linked files. No. What makes sense, if the intention of the original author is to show the Chinese text correctly, is to convert that section to UTF-8 and put that in the document. -tree -- Tom Emerson Basis Technology Corp. 
Zenkaku Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From uche.ogbuji@fourthought.com Thu Dec 14 02:45:46 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 13 Dec 2000 19:45:46 -0700 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: Message from Tom Emerson <tree@basistech.com> of "Wed, 13 Dec 2000 20:22:19 EST." <14904.8395.286379.623954@cymru.basistech.com> Message-ID: <200012140245.TAA16426@localhost.localdomain> > > My guess is that it displays gibberish as you suggest. In this case, I think > > there's no point expecting HTML generated from XML to do any better and it > > simply makes sense to break out the alternatively encoded portions into > > separate, linked files. > > No. What makes sense, if the intention of the original author is to > show the Chinese text correctly, is to convert that section to UTF-8 > and put that in the document. Eccovi! Now I understand why we've been talking past each other. I assumed you'd read the text in question: bad assumption, I admit. No. The intention is not to display Chinese characters correctly. The intention, I'm pretty sure, is to provide examples that can be cut and pasted in order for people to play with the various snippets themselves. As such, I'm not really concerned about what the HTML rendering looks like when it hits the different encodings. What I was originally writing about was: 1. Is there any way to convince an XML parser to work with source with mixed encoding? The exchange with you has helped disabuse me of any silly notion that this might be so. So I shall have to use XInclude. 2. Will the results of the rendering be such that the LATIN-1 parts can be read normally and the portions with other encodings would be available for cut and paste? If I use XInclude, no reason why not. So thanks for all the help.
I think I was pretty much on a fool's errand from the start, but at least I know how to proceed. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Thu Dec 14 03:05:01 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 14 Dec 2000 04:05:01 +0100 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: <3A37FF53.206662F3@fourthought.com> (message from Uche Ogbuji on Wed, 13 Dec 2000 15:59:31 -0700) References: <3A37FF53.206662F3@fourthought.com> Message-ID: <200012140305.EAA00999@loewis.home.cs.tu-berlin.de> > How would one go about creating a well-formed XML document with multiple > encodings? As others have pointed out: You don't. XML documents are in Unicode. They may have some other encoding *for transfer*, but conceptually, they are still in Unicode. > It contains many sections within HTML PREs with the different encodings > I mentioned. They look like > > <PRE LANG="zh-TW"> > ... BIG5-encoded stuff ... > </PRE> So what you really want is to include binary data in a tag. As you've explained yourself when answering Marc-Andre: That is not supported in XML. Of course, if XML had a BDATA type (or section) you could include a binary data fragment, and then any presentation tool would have to provide visualization (such as opening a hex editor on double-click). In the specific case of cjkv.doc, I guess the best approach would be: - use Python string escapes in Python code, e.g. sjisStr = "\x88\xc0\x91\x53\x82\xc9\x8e\x67\x82\xa6\x82\xe9" # Shift-JIS encoded source string - use Unicode text data where output is intended to be displayed properly - don't cite the output if it will come out as gibberish on any terminal (e.g.
when printing both SJIS and UTF-8 on the same terminal). Instead, explain what the user will likely see. Regards, Martin From tpassin@home.com Thu Dec 14 04:00:00 2000 From: tpassin@home.com (Thomas B. Passin) Date: Wed, 13 Dec 2000 23:00:00 -0500 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML References: <3A37FF53.206662F3@fourthought.com> <200012140305.EAA00999@loewis.home.cs.tu-berlin.de> Message-ID: <00c001c06582$54fa0840$7cac1218@reston1.va.home.com> Martin v. Loewis chimed in - > So what you really want is to include binary data in a tag. As you've > explained yourself when answering to Marc-Andre: That is not supported > in XML. Of course, if XML had a BDATA type (or section) you could > include a binary data fragment, and then any presentation tool would > have to provide visualization (such as opening a hex editor on > double-click). > > In the specific case of cjkv.doc, I guess the best approach would be: > - use Python string escapes in Python code, e.g. > sjisStr = "\0x88\0xc0\0x91\0x53\0x82\0xc9\0x8e\0x67\0x82\0xa6\0x82\0xe9" > # Shift-JIS encoded source string > - use Unicode text data where output is intended to be displayed properly > - don't cite the output if it will come out as gibberish on any terminal > (e.g. when printing both SJIS and UTF-8 on the same terminal). Instead, > explain what the user will likely see. > How about a good old-fashioned PI? The PI could indicate when to switch to another encoding for the purposes of display or conversion. True, this takes a specialized processor, but you are asking for specialized processing anyway. This kind of instruction to a processor is just what a PI is supposed to be for, I always thought. Cheers, Tom P From uche.ogbuji@fourthought.com Thu Dec 14 04:14:47 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 13 Dec 2000 21:14:47 -0700 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: Message from "Thomas B. 
Passin" <tpassin@home.com> of "Wed, 13 Dec 2000 23:00:00 EST." <00c001c06582$54fa0840$7cac1218@reston1.va.home.com> Message-ID: <200012140414.VAA16674@localhost.localdomain> > Martin v. Loewis chimed in - > > > So what you really want is to include binary data in a tag. As you've > > explained yourself when answering to Marc-Andre: That is not supported > > in XML. Of course, if XML had a BDATA type (or section) you could > > include a binary data fragment, and then any presentation tool would > > have to provide visualization (such as opening a hex editor on > > double-click). > > > > In the specific case of cjkv.doc, I guess the best approach would be: > > - use Python string escapes in Python code, e.g. > > sjisStr = "\0x88\0xc0\0x91\0x53\0x82\0xc9\0x8e\0x67\0x82\0xa6\0x82\0xe9" > > # Shift-JIS encoded source string > > - use Unicode text data where output is intended to be displayed properly > > - don't cite the output if it will come out as gibberish on any terminal > > (e.g. when printing both SJIS and UTF-8 on the same terminal). Instead, > > explain what the user will likely see. > > > How about a good old-fashioned PI? The PI could indicate when to switch to > another encoding for the purposes of display or conversion. True, this takes > a specialized processor, but you are asking for specialized processing anyway. > This kind of instruction to a processor is just what a PI is supposed to be > for, I always thought. Very interesting thought. However, my intention is to try to handle the CJKV doc with a minimum of highly specialized processing. So now that I've come to my senses, I think I'll stick to my conclusion. Besides, it will give me a chance to consider XInclude support throughout 4Suite. Thanks to all for your patience even when I wasn't making much sense. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From kajiyama@grad.sccs.chukyo-u.ac.jp Thu Dec 14 04:31:20 2000 From: kajiyama@grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA) Date: Thu, 14 Dec 2000 13:31:20 +0900 Subject: [XML-SIG] Re: Mixed encodings and XML In-Reply-To: <200012140245.TAA16426@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012140120.KAA14252@dhcp198.grad.sccs.chukyo-u.ac.jp> Message-ID: <200012140431.NAA14495@dhcp198.grad.sccs.chukyo-u.ac.jp> uche.ogbuji@fourthought.com wrote: | | The intention, I'm pretty sure, is to provide examples than | can be cut and pasted in order for people to play with the | various snippets themselves. I don't think that mixing different encodings in a document is a good idea. A browser assumes an encoding when reading a sequence of characters from a stream. If the browser finds one or more bytes out of the expected range, the result of decoding is undefined in general. So, cut-and-paste may or may not pass correct character data to the user. Safer ways of giving examples in various encodings are: - to use Unicode for displaying code snippets in the document the end users see on their browsers, and - to use native encodings in separate files to provide the real code snippets. Authoring an XML source of the document is another story. Regards, -- KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp> From fdrake@acm.org Thu Dec 14 04:54:58 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Dec 2000 23:54:58 -0500 (EST) Subject: [XML-SIG] Pending xml.dom patches for Python 2.1 Message-ID: <14904.21154.374339.173523@cj42289-a.reston1.va.home.com> There are currently patches pending for xml.dom in the SourceForge patch manager for Python: http://sourceforge.net/patch/?func=detailpatch&patch_id=102477&group_id=5470 This extends the minidom and pulldom modules to support more of the DOM and fix a range of smallish bugs.
It is an improvement, but should not be considered "complete"; see the notes I added to the patch with today's update for a TODO list. Assigned to Martin von Löwis for review. http://sourceforge.net/patch/?func=detailpatch&patch_id=102485&group_id=5470 Andrew's patch to check the validity of node insertions by their nodeType. This needs to be updated to use the exceptions recently added to the xml.dom package (in __init__.py), but otherwise should be easy to integrate with the changes from the first patch. Marked out-of-date since it needs an update and integration with the first patch. http://sourceforge.net/patch/?func=detailpatch&patch_id=102492&group_id=5470 This patch will probably need to be substantially revised once the changes noted in the TODO list in the comments on the first patch have been made, but should work reasonably once those changes have been made. Marked postponed because the other patches and noted changes need to be resolved first, since they heavily impact the implementation of this functionality. Assigned back to Andrew to update and re-open once the other changes have been handled and checked in. Getting these patches finished and checked in should allow both open bugs against the XML support in Python CVS to be closed: http://sourceforge.net/bugs/?func=detailbug&bug_id=116677&group_id=5470 http://sourceforge.net/bugs/?func=detailbug&bug_id=116678&group_id=5470 -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From larsga@garshol.priv.no Thu Dec 14 10:03:11 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 14 Dec 2000 11:03:11 +0100 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: <200012140245.TAA16426@localhost.localdomain> References: <200012140245.TAA16426@localhost.localdomain> Message-ID: <m3zohzwc9s.fsf@lambda.garshol.priv.no> * uche ogbuji | | 1.
Is there any way to convince an XML parser to work with source | with mixed encoding. A single XML entity must be entirely in a single character encoding. A document, however, can be in any number of different encodings, provided each entity is internally consistent. You can have encoding declarations on both the document entity (in the form of the XML declaration) and on subordinate entities (using text declarations). So you can do what you want using entities. --Lars M. From frank63@ms5.hinet.net Thu Dec 14 19:00:31 2000 From: frank63@ms5.hinet.net (Frank Chen) Date: Thu, 14 Dec 2000 19:00:31 -0000 Subject: [XML-SIG] Re:Mixed encodings and XML Message-ID: <200012141105.TAA16020@ms5.hinet.net> Hi: When I wrote this document, I made an assumption: if someone cannot see the BIG5 or Shift_JIS text, they know they can view it with a CJK viewer, such as NJStar. Frank Chen From mal@lemburg.com Thu Dec 14 11:10:08 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 14 Dec 2000 12:10:08 +0100 Subject: [XML-SIG] Mixed encodings and XML References: <200012140018.RAA15661@localhost.localdomain> Message-ID: <3A38AA90.139D7FDB@lemburg.com> uche.ogbuji@fourthought.com wrote: > > > This is not really related to text encodings, but somewhat similar: > > > > Is there a standard way of including binary data in XML files ? > > No. Rich Salz pointed out in private mail that I could use base64 as encoding (can '<' and '>' appear in base64 ?). Alas, I would lose the search capability... > > I would like to put a complete web-site into a (large) XML file. > > The XML file should ideally contain not only the structure > > information, attributes, etc. but also the HTML files, the images > > and maybe even sound files or flash apps. > > Ah. This is similar to what the ebXML folks and the SOAP folks were at odds > over. Note that this is a well-known deficiency in XML. The most common
The most common > suggestion is: put it all into one file, separate them with form-feeds, and > have the application process each bit separately. Clearly this doesn't suit > your needs, but there's not much more to go on right now. Now thats about as non-XML like as it could get: form-feeds to separate file parts... ;-) > > Is something like this possible or will I have to use some > > other storage method for the binary parts and reference these > > from within the XML file (I would prefer not to, so that I can > > include e.g. the HTML file content in XML searches) ? > > Could you expand on this last bit about the searches? It hints at what might > be a work-around if that's your main concern. I would like to be able to use XML searching machinery to scan over web site structures. This includes limiting searches to certain attributes, e.g. keywords or meta-descriptions of the content, but should also cover full-text search of the content itself. Even better would be a possible recursive application of this scheme to embedded XML files, e.g. take a product catalog which is stored as XML and made available on the site using special site tools which only show the relevant parts of that file. I think I would have to provide a special tag <content encoding="base64|hex|plain|..." mimetype="..."> ... </content> to enable this. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Thu Dec 14 11:09:43 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 14 Dec 2000 12:09:43 +0100 Subject: [XML-SIG] PyXPath 1.1 Message-ID: <200012141109.MAA00827@loewis.home.cs.tu-berlin.de> As promised earlier, I tried to use another alternative parser toolkit for parsing XPath LocationPath expressions. Since Uche proposed to use Spark, that's what I did. 
As a result, I can now announce PyXPath 1.1, which is available from http://www.informatik.hu-berlin.de/~loewis/xml/PyXPath-1.1.tgz The major change over the previous version is the xpathspark module, which requires spark.py from Spark 0.6.1 (I decided not to include spark; let me know if you think I should). As with the YAPPS parser, the Spark parser generates an ad-hoc syntax tree, namely a nested list consisting of the rhs tuple for each production that was applied. I modified the list only in trivial cases, such as unwrapping a list of a single item. With the three parsers which I now have (the YAPPS parser, 4XPath, and the Spark parser), I performed some measurements. I took the list of the LocationPath examples from the recommendation and asked each parser to parse each expression 10 times in one run and 100 times in another. On an AMD K6 with 350 MHz, using Linux 2.4t7, glibc 2.2, and Python 2.0, I got the following results:

                    10 iterations   100 iterations
  4XPath                1.58s            5.16s
  YAPPS                 1.43s           12.35s
  YAPPS with pre        2.31s           22.54s
  Spark                12.58s          124.92s

In these numbers, "pre" is the PCRE regex module of 1.5.2, still executed in Python 2; the default is sre. From these numbers, I conclude:

- sre is significantly faster than pre, so Python 2.0 is better for processing regular expressions than 1.5.2. Even when parsing from a Unicode string, the parser does not get much slower (numbers not shown here).
- Spark is an order of magnitude slower than YAPPS. The Spark documentation suggests that the parsing algorithm used in Spark is quite general, but also quite slow. YAPPS uses recursive-descent LL(1) parsing, which seems to win easily.
- The pure Python solution takes about twice as much time as the bison/flex solution. Note that for parsing a "small" number of expressions (300), the startup time of 4XPath outweighs the parsing time, so the YAPPS parser is actually faster here. That may change once the YAPPS parser generates the same structure as 4XPath.
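Any pure-Python lexer for these LocationPath benchmarks has to handle the one context-sensitive rule in XPath 1.0 tokenization. Mike Olson raises it with "mod mod mod" and "* * *", and section 3.7 of the recommendation states it: if there is a preceding token and it is not one of @, ::, (, [, "," or an operator, then * is the multiply operator and an NCName is an operator name. Below is a minimal Python 3 sketch of that rule, built on the re module. It is not the 4XPath, YAPPS or Spark lexer. The function name tokenize and the token kinds (name, operator, punct, number, literal) are made up for illustration. NCName characters are approximated with \w, and node-type tests such as text() simply come out as a name followed by parentheses.

```python
import re

# One alternative per token class; whitespace is matched and then skipped.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<number>\d+(?:\.\d*)?|\.\d+)
  | (?P<literal>"[^"]*"|'[^']*')
  | (?P<punct>::|//|!=|<=|>=|\.\.|[()\[\]@,|+\-=<>/.*$])
  | (?P<name>[^\W\d][\w.\-]*(?::(?:\*|[^\W\d][\w.\-]*))?)
""", re.VERBOSE)

OPERATOR_NAMES = {"and", "or", "mod", "div"}
SYMBOL_OPERATORS = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}
# After these tokens, '*' and NCNames keep their ordinary (name test) meaning.
NAME_CONTEXT_AFTER = {"@", "::", "(", "[", ","}


def tokenize(expr):
    """Split an XPath 1.0 expression into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError("unexpected character %r at offset %d" % (expr[pos], pos))
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue
        prev = tokens[-1] if tokens else None
        # Section 3.7: a preceding token that is not @ :: ( [ , or an
        # operator forces '*' and NCNames to be read as operators.
        operator_context = prev is not None and not (
            prev[1] in NAME_CONTEXT_AFTER or prev[0] == "operator")
        if text == "*":
            kind = "operator" if operator_context else "name"  # multiply vs. wildcard
        elif kind == "name" and operator_context and text in OPERATOR_NAMES:
            kind = "operator"
        elif kind == "punct" and text in SYMBOL_OPERATORS:
            kind = "operator"
        tokens.append((kind, text))
    return tokens
```

With this rule, tokenize("mod mod mod") yields a name, an operator and a name, which is the name test / operator / name test split that a Bison grammar downstream expects. A real lexer would also reject NCNames other than and/or/mod/div in operator position rather than leave that to the parser.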
IMO, this overhead is a fair price to pay for the increased portability, the Unicode support and the thread-safety of the Python solution. Unless somebody can suggest more parser generators to try (*), I'll now proceed with making the YAPPS parser 4XPath-compatible. Regards, Martin (*) Be aware that any alternative parser generator should: - support tokenization of Unicode strings, either via an external lexer, or on its own using the re module - support LL(1) or LALR(1) grammars - ideally be pure Python, although an additional C module is acceptable as long as the resulting parser is still thread-safe. From larsga@garshol.priv.no Thu Dec 14 11:42:46 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 14 Dec 2000 12:42:46 +0100 Subject: [XML-SIG] Mixed encodings and XML In-Reply-To: <3A38AA90.139D7FDB@lemburg.com> References: <200012140018.RAA15661@localhost.localdomain> <3A38AA90.139D7FDB@lemburg.com> Message-ID: <m3snnrw7nt.fsf@lambda.garshol.priv.no> * mal@lemburg.com | | Rich Salz pointed out in private mail that I could use base64 | as encoding (can '<' and '>' appear in base64 ?). base64 is indeed the common way to encode binary material inside XML documents. It uses only A-Za-z0-9+/= for encoding. | I would like to be able to use XML searching machinery to scan over web site structures. This includes limiting searches to certain | attributes, e.g. keywords or meta-descriptions of the content, but | should also cover full-text search of the content itself. In that case I would recommend keeping the non-XML content external to the XML documents and only referencing it from the XML content. | I think I would have to provide a special tag | | <content encoding="base64|hex|plain|..." mimetype="..."> | ... | </content> | | to enable this. That seems like a very reasonable solution. --Lars M.
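Below is a minimal sketch of the base64 approach discussed in this thread, in Python 3 using only the standard library. Raw bytes are base64-encoded into a content element that carries the encoding and mimetype attributes Marc-Andre proposed. The document is then serialized and re-parsed, and the payload is decoded again. The root element name "site" and the mimetype value are illustrative assumptions, and only the base64 case of the proposed encoding attribute is handled.

```python
import base64
from xml.dom import minidom


def make_content_element(doc, data, mimetype):
    """Wrap raw bytes in <content encoding="base64" mimetype="...">."""
    elem = doc.createElement("content")
    elem.setAttribute("encoding", "base64")
    elem.setAttribute("mimetype", mimetype)
    # The base64 alphabet is A-Z, a-z, 0-9, '+', '/' plus '=' padding,
    # so the text never contains markup characters such as '<' or '&'.
    elem.appendChild(doc.createTextNode(base64.b64encode(data).decode("ascii")))
    return elem


def read_content_element(elem):
    """Return the decoded bytes of a <content> element (base64 only)."""
    if elem.getAttribute("encoding") != "base64":
        raise ValueError("this sketch only handles encoding='base64'")
    text = "".join(node.data for node in elem.childNodes
                   if node.nodeType == node.TEXT_NODE)
    return base64.b64decode(text)


# Build a document holding arbitrary bytes, including 0x3C ('<') and 0x3E ('>').
doc = minidom.Document()
site = doc.appendChild(doc.createElement("site"))
payload = bytes(range(256))
site.appendChild(make_content_element(doc, payload, "application/octet-stream"))
xml_text = doc.toxml()

# Round trip: re-parse the serialized XML and decode the payload again.
parsed = minidom.parseString(xml_text)
content = parsed.getElementsByTagName("content")[0]
restored = read_content_element(content)
```

The trade-off Lars points out still applies. Base64 text is about 4/3 the size of the original data and cannot be searched as text, so content that must be full-text searchable is better kept external and referenced.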
From uche.ogbuji@fourthought.com Thu Dec 14 15:21:44 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 14 Dec 2000 08:21:44 -0700 Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no> of "14 Dec 2000 11:03:11 +0100." <m3zohzwc9s.fsf@lambda.garshol.priv.no> Message-ID: <200012141521.IAA18229@localhost.localdomain> > > * uche ogbuji > | > | 1. Is there any way to convince an XML parser to work with source > | with mixed encoding. > > A single XML entity must be entirely in a single character encoding. > A document, however, can be in any number of different encodings, > provided each entity is internally consistent. You can have encoding > declarations on both the document entity (in the form of the XML > declaration) and on subordinate entities (using text declarations). > > So you can do what you want using entities. Excellent! Just when I'd convinced myself that I was on a fool's errand, along comes Lars to the rescue. I guess it's been too long since I've exercised all of XML 1.0. I so rarely use entities that I completely forgot that they are exactly the solution. I can use entities in special XML elements, and extend the DocBook stylesheet to output the contents of those elements to a separate file using the "ft:write-file" extension element. Perfect. Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Thu Dec 14 15:27:31 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 14 Dec 2000 08:27:31 -0700 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> of "Thu, 14 Dec 2000 12:09:43 +0100."
<200012141109.MAA00827@loewis.home.cs.tu-berlin.de> Message-ID: <200012141527.IAA18240@localhost.localdomain> Wow Martin! Brilliant work as usual. Last weekend Jeremy quietly wrote a partial XPath lexer all in Python/SRE. We'll try to bind it to bison and post this today so you can run your test harness on it. I agree that we have little choice but to expect some slow-down. flex/bison is certainly very fast, but it doesn't deal with wide chars and flex doesn't bother with thread-safety. So that speed comes at a dear price. Maybe it's worth designing a plug-in API for XPath implementations so people can make their choices. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From paulp@ActiveState.com Thu Dec 14 16:57:32 2000 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 14 Dec 2000 08:57:32 -0800 Subject: [XML-SIG] PyXPath 1.1 References: <200012141527.IAA18240@localhost.localdomain> Message-ID: <3A38FBFC.6082824D@ActiveState.com> uche.ogbuji@fourthought.com wrote: > > Wow Martin! Brilliant work as usual. Strongly agree. > Maybe it's worth designing a plug-in API for XPath implementations so people > can make their choices. That's a good idea independent of this parsing issue. XPath implementations will always have different performance characteristics, especially if they take advantage of "secret handshakes" with certain underlying DOMs. What ever happened to this effort: http://lists.w3.org/Archives/Public/www-dom-xpath/ Paul Prescod From fdrake@acm.org Thu Dec 14 16:56:09 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 14 Dec 2000 11:56:09 -0500 (EST) Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: <3A38FBFC.6082824D@ActiveState.com> References: <200012141527.IAA18240@localhost.localdomain> <3A38FBFC.6082824D@ActiveState.com> Message-ID: <14904.64425.308375.787523@cj42289-a.reston1.va.home.com> Paul Prescod writes: > What ever happened to this effort: > > http://lists.w3.org/Archives/Public/www-dom-xpath/ I wasn't even aware of this -- it looks like a little spam killed the list in the end! Frankly, I imagine everyone's been too busy to work it up, and the DOM still seems to be evolving quite rapidly. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations From mclay@nist.gov Thu Dec 14 05:08:12 2000 From: mclay@nist.gov (Michael McLay) Date: Thu, 14 Dec 2000 00:08:12 -0500 Subject: [XML-SIG] Mixed encodings and XML In-Reply-To: <3A3804CA.5DC4B238@lemburg.com> References: <3A37FF53.206662F3@fourthought.com> <14904.443.228020.168633@cymru.basistech.com> <3A3804CA.5DC4B238@lemburg.com> Message-ID: <00121400081206.16898@fermi.eeel.nist.gov> On Wednesday 13 December 2000 18:22, M.-A. Lemburg wrote: > Tom Emerson wrote: > > > I need to convert the document to XML Docbook format. My naive > > > attempts at converting to > > > > > > <screen xml:lang="zh-TW"> > > > ... BIG5-encoded stuff ... > > > </screen> > > > > > > Of course don't work because the parser takes one look at the BIG5 and > > > throws a well-formedness error. > > > > Which it is required to do, see Section 4.3.3 of the XML specification. > > This is not really related to text encodings, but somewhat similar: > > Is there a standard way of including binary data in XML files ? There is a standard solution defined for binary encoding in XML Schema. 
Search for the term binary in http://www.w3.org/TR/xmlschema-0/ The specification for binary encoding in XML Schema is at: http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/#binary Is anyone working on an XML Schema validator that works with the standard Python XML library? From uche.ogbuji@fourthought.com Fri Dec 15 04:11:24 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 14 Dec 2000 21:11:24 -0700 Subject: [XML-SIG] Re: [4suite] memory leak problem 4DOM - update References: <3A37FD5F.B9FD3620@fourthought.com> <3A393F92.C689182D@fourthought.com> <00121511533903.00886@localhost.localdomain> <0012151603120H.00886@localhost.localdomain> Message-ID: <3A3999EC.47ACB9FF@fourthought.com> We really should share this discussion with XML-SIG. matt wrote: > > Some answers to my own questions ... but still a problem > > On Fri, 15 Dec 2000, matt wrote: > > Ok, have done some more experimentation ..... I stepped through everything with > > pdb, let things settle over a few iterations and then discovered the recurring > > process. Some of it is indeed in Py_expat .... in > > xml/sax/drivers/drv_pyexpat.py to be specific. The offending lines being the > > buf = fileobj.read(16384) ones (see function below) ... these chomp 4 kb each > > time through. Well, they are not really that offending, they're just loading a > > buffer like they are supposed to be doing. > > > > I made the patch. Looking more carefully the memory gulp comes from > self.parser.Parse in the parseFile function ..... which confuses me, because I > made the patch, rebuilt and reinstalled ... including to make sure that all was > updated : > i.e.
: > copying xml/dom/ext/Printer.py -> build/lib.linux-i686-1.5/xml/dom/ext (I had > found a patch for that too) > > gcc -g -O2 -fpic -DXML_NS -Iextensions/expat/xmltok > -Iextensions/expat/xmlparse -I/usr/local/include/python1.5 -c extensi > > copying build/lib.linux-i686-1.5/xml/parsers/pyexpat.so -> > /usr/local/lib/python1.5/site-packages/xml/parsers > > the patch was : > Index: pyexpat.c > =================================================================== > RCS file: /cvsroot/pyxml/xml/extensions/pyexpat.c,v > retrieving revision 1.16 > diff -u -r1.16 pyexpat.c > --- pyexpat.c 2000/11/02 04:57:40 1.16 > +++ pyexpat.c 2000/12/05 00:00:33 > @@ -680,6 +680,7 @@ > for (i=0; handler_info[i].name != NULL; i++) { > Py_XDECREF(self->handlers[i]); > } > + free (self->handlers); > #if PY_MAJOR_VERSION == 1 && PY_MINOR_VERSION < 6 > /* Code for versions before 1.6 */ > free(self); > > and it indeed did succeed. > > I guess I'll keep looking. Has anyone found that this patch did not help? > > regards > Matt > > > > > > > def parseFile(self,fileobj,sysID=None): > > self.reset() > > self.sysID=sysID > > self.doc_handler.startDocument() > > > > buf = fileobj.read(16384) > > while buf != "": > > if self.parser.Parse(buf, 0) != 1: > > self.__report_error() > > buf = fileobj.read(16384) > > self.parser.Parse("", 1) > > > > self.doc_handler.endDocument() > > > > > > So the problem I see is the freeing of this buffer 'buf' : I can only guess a > > few things : > > 1) obviously it gets put into the py_expat parser document, which space for > > that frame gets allocated on the first time through. Perhaps the py_expat > > document is not releasing this buffer properly when ext.ReleaseNode(d) calls > > all the delete nodes. I haven't looked for anything circular there. > > > > 2) the fileobj.read above is actually doing something weird. The 4kb seems > > weird considering it a) reads 16384 bytes, and my file is only 190 bytes, and b) > > 16384 bytes = 16 kb and not 4 kb.
> > 4 kb seems to me the size of some sort of stack frame for a function that never > > gets released to be used again???? > > > > Either way, using ext.ReleaseNode(d) did help somewhat, so I would guess that > > py_expat is to blame somewhere. I will now go in search of the patch for > > py_expat and see if this solves the problem overall. > > > > to be continued ..... > > > > Matt > > > > > > > > > > > > On Fri, 15 Dec 2000, Uche Ogbuji wrote: > > > matt wrote: > > > > > > > > Using ext.ReleaseNode(d) helped partially. On the first iteration through the > > > > first loop it chomps about 332kb, which I never get back in either case, i.e. > > > > a) using ext.ReleaseNode(d) or b) not. After that I get smaller bites, if > > > > using a) they are 4-12 kb bites, or in b) 16-20 kb bites. Both methods seem to > > > > oscillate between two values. So there was an improvement, i.e. approx 8 kb > > > > improvement with using ext.ReleaseNode(d). That first jump in both methods is > > > > a bit of a shock, especially because it never gets given back. However I had > > > > the feeling this first jump was just python memory allocation, and that it > > > > might release it some time later. > > > > > > This is pretty common because of Python's dynamic nature. The first > > > time in the loop you are importing a whole bunch of modules, which are of > > > course added to the memory footprint. After that, subsequent imports > > > don't add to memory. The little incremental jumps are probably indeed > > > memory leaks, so any more info you have to help us track it down would > > > be appreciated. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From dagb@fast.no Fri Dec 15 10:44:15 2000 From: dagb@fast.no (Dag Brattli) Date: Fri, 15 Dec 2000 10:44:15 GMT Subject: [XML-SIG] PyXML, sgmlop and xmllib Message-ID: <200012151044.KAA34209@tepid.osl.fast.no> Hi, The xmllib.py for sgmlop is missing from PyXML. Does anybody know where to find an updated version? Both README.sgmlop and xml/parsers/__init__.py say that there should be an xmllib.py around, but there isn't. -- Dag ---- Dag Brattli, Mail: dagb@fast.no Senior Systems Engineer Web: http://www.fastsearch.com/ Fast Search & Transfer ASA Phone: +47 776 96 688 P.O. Box 621 Fax: +47 776 96 689 NO-9257 Tromsø, NORWAY Cell: +47 415 72 969 (new) Try FAST Mobile Search: http://mobile.alltheweb.com/ From noreply@sourceforge.net Fri Dec 15 14:49:11 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Dec 2000 06:49:11 -0800 Subject: [XML-SIG] [Bug #125896] Ods problem with checkpoints Message-ID: <E146wAZ-0003lt-00@usw-sf-web1.sourceforge.net> Bug #125896, was updated on 2000-Dec-15 06:49 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: Ods problem with checkpoints Details: It's got to do with transactions and checkpoints. I guess Narval is stressing the Ods engine a bit. I've attached a sample test file, largely inspired by some code from Narval init. It runs as is, but if you uncomment line 32, everything falls apart (and you get a stacktrace similar to the one reported before).
-------------------------8<--------------------------------
from Ft.DbDom import Dom
from Ft.Ods import Database
from Ft.DbDom import Reader
from xml.dom.ext import PrettyPrint,StripXml,Print
from Ft.Ods import FreePersistentObject
from xml.xpath import Evaluate


AL_NS = ''


class MemoryDocument(Dom.DocumentImp) :

    def __init__(self) :
        Dom.DocumentImp.__init__(self)
        self.eid_count = 1
        self.eid_ref_count = {}

    def add_element(self,element) :
        """if the element has not already an eid (== not yet in memory)
        assign unique id to the element and append it to memory.
        """
        global tx
        eid = element.getAttributeNS(AL_NS,'eid')
        if not eid :
            eid = str(self.eid_count)
            element.setAttributeNS(AL_NS,'eid',eid)
            self.eid_count = self.eid_count + 1
            self.eid_ref_count[eid] = 1
            self.documentElement.appendChild(element)
            ### Uncomment following line to see the bug
            #tx.checkpoint()
            for node in self.documentElement.childNodes[:] :
                if node.tagName == 'plan' :
                    node.element_change(element)
        return eid, element

DBNAME='ods:alf@orion:5432:dom_test'
mydoc='''<root><child id="1"><info/></child>
<child id="2"><info>foo</info></child></root>'''

db = Database.Database()
db.open(DBNAME)
tx = db.new()
tx.begin()
doc = MemoryDocument()
e = doc.createElementNS('','elt')
doc.appendChild(e)
tx.checkpoint()
r = Reader.DbDomReader()
frag = r.fromString(mydoc,doc)
map(doc.add_element, Evaluate('root/child',frag))
tx.commit()
PrettyPrint(doc)

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125896&group_id=6473 From noreply@sourceforge.net Fri Dec 15 14:51:11 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Dec 2000 06:51:11 -0800 Subject: [XML-SIG] [Bug #125897] PyExpat still uses Expat version 1.1 Message-ID: <E146wCV-0003mN-00@usw-sf-web1.sourceforge.net> Bug #125897, was updated on 2000-Dec-15 06:51 Here is a current snapshot of the bug.
Project: Python/XML Category: expat Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: mjpieters Assigned to : nobody Summary: PyExpat still uses Expat version 1.1 Details: PyExpat should be upgraded to Expat 1.2. Expat 1.2 adds support for parsing external DTDs and parameter entities. The xml.dom.ext.PyExpat reader (once unbroken ;)) already supports the additional interface for Expat 1.2 (XML_StartDoctypeDeclHandler -> Reader.startDTD). This functionality is needed to parse out the public and system IDs of a <!DOCTYPE> declaration, for example. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125897&group_id=6473 From Mike.Olson@fourthought.com Fri Dec 15 14:48:28 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 15 Dec 2000 07:48:28 -0700 Subject: [XML-SIG] PyXPath 1.1 References: <200012141527.IAA18240@localhost.localdomain> Message-ID: <3A3A2F3C.8AE8A27E@FourThought.com> uche.ogbuji@fourthought.com wrote: > > Wow Martin! Brilliant work as usual. Last weekend Jeremy quietly wrote a > partial XPath lexer all in Python/SRE. We'll try to bind it to bison and post > this today so you can run your test harness on it. Yes, thanks, Martin; this saves us a lot of time. Question: will it handle "mod mod mod" or "* * *"? These need to translate to the tokens: wildcard name, operator, wildcard name. I ask because this caused us many headaches with 4XPath. We had to do it with flex state. > > Maybe it's worth designing a plug-in API for XPath implementations so people > can make their choices. This wouldn't be that difficult. A simple interface to get a list of tokens (and the matched string) from the scanner would suffice. We will need some logic to turn this list into YY unions for Bison, but that is pretty simple as well. Mike > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA > Software-engineering, knowledge-management, XML, CORBA, Linux, Python > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Nicolas.Chauvat@logilab.fr Fri Dec 15 15:08:07 2000 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Fri, 15 Dec 2000 16:08:07 +0100 (CET) Subject: [XML-SIG] 4DOM.xml.xslt.DomWriter.py bugfix Message-ID: <Pine.LNX.4.21.0012151544180.24303-100000@aries> In 4Suite's 4DOM xml.xslt.DomWriter.py, at line 76, read: pi = self.__ownerDoc.createProcessingInstruction(target,data) ^^ at line 81, read: comment = self.__ownerDoc.createDocument(text) ^^ ^^^^ And you're back on track :-) -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From uche.ogbuji@fourthought.com Fri Dec 15 15:23:04 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 15 Dec 2000 08:23:04 -0700 Subject: [XML-SIG] Re: [4suite] 4DOM.xml.xslt.DomWriter.py bugfix References: <Pine.LNX.4.21.0012151544180.24303-100000@aries> Message-ID: <3A3A3758.62591C7B@fourthought.com> Nicolas Chauvat wrote: > > In 4Suite's 4DOM xml.xslt.DomWriter.py, > > at line 76, read: > > pi = self.__ownerDoc.createProcessingInstruction(target,data) > ^^ > > at line 81, read: > > comment = self.__ownerDoc.createDocument(text) > ^^ ^^^^ > > And you're back on track :-) Done. Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ngps@post1.com Fri Dec 15 16:17:23 2000 From: ngps@post1.com (Ng Pheng Siong) Date: Sat, 16 Dec 2000 00:17:23 +0800 Subject: [XML-SIG] Copyright character chokes parser Message-ID: <20001216001723.A1163@madcap.dyndns.org> Hi, I'm fiddling with XBEL using PyXML 0.6.2. I have a bookmark entry as follows: <bookmark href="http://www.optioninsight.com/" added="946429657" visited="946444587" modified="946429652" > <title>Option Insight© - Home of the Greatest Option Program. Ever. The copyright character (you might see it as ) in the title chokes xbel_parse.py: $ python xbel_parse.py --xbel < bm.xml Traceback (most recent call last): File "xbel_parse.py", line 91, in ? p.parseFile( sys.stdin ) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers/drv_pyexpat.py", line 68, in parseFile if self.parser.Parse(buf, 0) != 1: xml.parsers.expat.error: not well-formed: line 68, column 27 A simple SAX-based parser written per the XML HOWTO throws an exception at the same spot: $ python xbp.py < bm.xml Traceback (most recent call last): File "xbp.py", line 19, in ? p.parse(sys.stdin) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 42, in parse xmlreader.IncrementalParser.parse(self, source) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse self.feed(buffer) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 86, in feed self._err_handler.fatalError(exc) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError raise exception xml.sax._exceptions.SAXParseException: :68:27: not well-formed Line 68 column 27 is where the copyright character is. Any hints to a workaround? (I'm not subscribed. Please cc replies.) TIA. Cheers. 
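This failure can be reproduced, and the usual remedy checked, with Python 3's xml.dom.minidom. That is a stand-in for the PyXML 0.6.2 drivers in the traceback, not the same code. The title text mirrors the bookmark entry above. Encoded as ISO-8859-1 (Latin-1), the copyright sign becomes the single byte 0xA9. That byte is not a valid UTF-8 sequence, so a parser that falls back to its UTF-8 default rejects it. The sketch assumes the file really is Latin-1 and declares it as such in an XML declaration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# The bookmark title as ISO-8859-1 bytes: U+00A9 becomes the lone byte 0xA9.
body = '<title>Option Insight\u00a9 - Home of the Greatest Option Program. Ever.</title>'
latin1_bytes = body.encode("iso-8859-1")


def parse_bytes(data):
    """Parse raw bytes; return the title text, or the parser's error message."""
    try:
        doc = xml.dom.minidom.parseString(data)
    except ExpatError as exc:
        return "error: %s" % exc
    return doc.documentElement.firstChild.data


# Without an encoding declaration the parser assumes UTF-8 and rejects 0xA9.
without_decl = parse_bytes(latin1_bytes)

# With an XML declaration naming the real encoding, the same bytes parse.
decl = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
with_decl = parse_bytes(decl + latin1_bytes)
```

Transcoding the file to UTF-8 is the other way out. It needs no declaration at all, since UTF-8 is what the parser assumes when none is given.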
-- Ng Pheng Siong * http://www.post1.com/home/ngps From uche.ogbuji@fourthought.com Fri Dec 15 16:49:40 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 15 Dec 2000 09:49:40 -0700 Subject: [XML-SIG] Copyright character chokes parser In-Reply-To: Message from Ng Pheng Siong of "Sat, 16 Dec 2000 00:17:23 +0800." <20001216001723.A1163@madcap.dyndns.org> Message-ID: <200012151649.JAA22126@localhost.localdomain> > I'm fiddling with XBEL using PyXML 0.6.2. > > I have a bookmark entry as follows: > > > Option Insight© - Home of the Greatest Option Program. Ever. > I just went through encoding hell of a more involved sort so I might as well chip in here. Add As the first thing in your XML file (that is, even before any white space) and you should be fine. If you don't specify an encoding, the parser assumes UTF-8 (except if you use a byte-order mark, in which case it assumes UTF-16). The copyright char is not legal UTF-8 here because it's a single byte with a value exceeding 127, and such a byte on its own is not a valid UTF-8 sequence. ISO-8859-1 (also called Latin-1) allows you to use byte values above 127. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Fri Dec 15 17:17:33 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Dec 2000 09:17:33 -0800 Subject: [XML-SIG] [Bug #125909] xml.dom.ext.Printer produces invalid or incomplete DTDs Message-ID: Bug #125909, was updated on 2000-Dec-15 09:17 Here is a current snapshot of the bug.
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: mjpieters Assigned to : nobody Summary: xml.dom.ext.Printer produces invalid or incomplete DTDs Details: xml.dom.Printer.PrintVisitor.visitDocumentType will produce incorrect or incomplete XML in the following cases: - There is no System ID defined, but there are entities or notations: The entities and notations are not written out. The XML spec says that a System ID isn't mandatory, DTDs with ]> is perfectly valid. - There is both a System ID and a Public ID defined: The Public and System ID are written out as: " SYSTEM ""> The keyword 'SYSTEM' is illegal in this context, it should read: " ""> - There is a double-quote character (") in either the System ID or the Public ID: The Public or System ID in question will be written out enclosed with double-quotes, while the XML spec provides for enclosing the ID in single quotes ('). I'll submit a patch to the patch manager. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125909&group_id=6473 From ken@bitsko.slc.ut.us Fri Dec 15 17:49:34 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 15 Dec 2000 11:49:34 -0600 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: Mike Olson's message of "Fri, 15 Dec 2000 07:48:28 -0700" References: <200012141527.IAA18240@localhost.localdomain> <3A3A2F3C.8AE8A27E@FourThought.com> Message-ID: Mike Olson writes: > uche.ogbuji@fourthought.com wrote: > > > Maybe it's worth designing a plug-in API for XPath implementations > > so people can make their choices. > > This wouldn't be that difficult. A simple interface to get a list > of tokens (and the matched string) from the scanner would suffice. > We will need some logic to turn this list into YY unions for Bison, > but that is pretty simple as well.
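Instead of a raw token list, a plug-in API could exchange parsed location paths: a list of steps, each holding an axis, a node test and a list of predicates. The Python 3.8+ sketch below shows one shape such a shared data model might take. The names Step, LocationPath and XPathParser are hypothetical, and this is not 4Suite's ParsedLocationPath. Predicates are kept as unparsed strings to keep it short; a real model would store expression trees there.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Step:
    """One location step, e.g. child::chapter[5]."""
    axis: str                      # "child", "descendant", "attribute", ...
    node_test: str                 # "chapter", "*", "text()", ...
    predicates: List[str] = field(default_factory=list)  # unparsed in this sketch

    def __str__(self):
        return "%s::%s%s" % (self.axis, self.node_test,
                             "".join("[%s]" % p for p in self.predicates))


@dataclass
class LocationPath:
    absolute: bool
    steps: List[Step]

    def __str__(self):
        # Unabbreviated syntax; an absolute path with no steps is just "/".
        return ("/" if self.absolute else "") + "/".join(str(s) for s in self.steps)


class XPathParser(Protocol):
    """What each pluggable implementation (YAPPS, Spark, 4XPath, ...) would provide."""
    def parse(self, expr: str) -> LocationPath: ...


# The path /doc/chapter[5]/section[2], written out step by step.
path = LocationPath(True, [Step("child", "doc"),
                           Step("child", "chapter", ["5"]),
                           Step("child", "section", ["2"])])
```

Because every implementation would return the same LocationPath structure, callers could switch between a fast C-backed parser and a thread-safe pure-Python one without changing the code that evaluates the steps.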
At the plug-in API level, I'd be interested in something more at the "location path" level, possibly an array of steps, each step with axis, node test, and list of predicates. This would involve defining a common, sharable data model for these, but I think it would be more useful overall than a raw token list. -- Ken From akuchlin@mems-exchange.org Fri Dec 15 18:27:25 2000 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Fri, 15 Dec 2000 13:27:25 -0500 Subject: [XML-SIG] Adding scripts Message-ID: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> What do people think about adding some useful scripts to PyXML that get installed in /usr/local/bin or somewhere like that? Possibilities would be (names off the top of my head): xmlproc_val : Validate files using xmlproc xmlrpc_call : Make an XML-RPC call (useful for shell scripts, or using XML-RPC from languages w/o an XML parser, such as Emacs Lisp) Anyone have additional ideas? --amk From noreply@sourceforge.net Fri Dec 15 18:25:52 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Dec 2000 10:25:52 -0800 Subject: [XML-SIG] [Patch #102861] xml.dom.ext.Printer produces invalid or incomplete DTDs Message-ID: Patch #102861 has been updated. Project: pyxml Category: None Status: Open Submitted by: mjpieters Assigned to : nobody Summary: xml.dom.ext.Printer produces invalid or incomplete DTDs ------------------------------------------------------- For more info, visit: http://sourceforge.net/patch/?func=detailpatch&patch_id=102861&group_id=6473 From noreply@sourceforge.net Fri Dec 15 18:44:25 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Dec 2000 10:44:25 -0800 Subject: [XML-SIG] [Bug #125917] DbDom : no cloneNode method. Message-ID: Bug #125917, was updated on 2000-Dec-15 10:44 Here is a current snapshot of the bug. 
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: DbDom : no cloneNode method. Details: Here's a small script that demonstrates the bug:
from Ft.DbDom.Dom import DocumentImp

d = DocumentImp()
e = d.createElementNS('','root')
d.appendChild(e)
f = e.cloneNode(1)
--------------
[alf@leo alf]$ python dbdomclone.py
Traceback (innermost last):
  File "dbdomclone.py", line 6, in ?
    f = e.cloneNode(1)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 170, in __getattr__
    raise AttributeError(name)
AttributeError: cloneNode
For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=125917&group_id=6473 From martin@loewis.home.cs.tu-berlin.de Fri Dec 15 21:06:23 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 15 Dec 2000 22:06:23 +0100 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: (message from Ken MacLeod on 15 Dec 2000 11:49:34 -0600) References: <200012141527.IAA18240@localhost.localdomain> <3A3A2F3C.8AE8A27E@FourThought.com> Message-ID: <200012152106.WAA00918@loewis.home.cs.tu-berlin.de> > At the plug-in API level, I'd be interested in something more at the > "location path" level, possibly an array of steps, each step with > axis, node test, and list of predicates. Yes, that would be a reasonable XPath API. How do you like the 4Suite ParsedLocationPath class, and corresponding structures? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Dec 15 20:58:59 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 15 Dec 2000 21:58:59 +0100 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: <3A3A2F3C.8AE8A27E@FourThought.com> (message from Mike Olson on Fri, 15 Dec 2000 07:48:28 -0700) References: <200012141527.IAA18240@localhost.localdomain> <3A3A2F3C.8AE8A27E@FourThought.com> Message-ID: <200012152058.VAA00867@loewis.home.cs.tu-berlin.de> > Yes, thanks Martin this saves us a lot of time. Question, will it > handle "mod mod mod" or "* * *"? These need to translate to the > token wildcard name, operator, wildcard name. I ask 'cause this > caused us many headaches with 4XPath. We had to do it with flex > state. Until now, "* * *" was recognized as NameTest NameTest MultiplyOperator, which was incorrect due to a minor bug. I just fixed that; it now tokenizes "* * *" as STAR (i.e. NameTest) MultiplyOperator STAR, and "mod mod mod" as NCName mod NCName. The scanner generator in yapps was not suitable for the special rules, so I have my own hand-written parser. For Spark, the "generated" tokenization could easily be expanded to provide the correct token sequence. > > Maybe it's worth designing a plug-in API for XPath implementations > > so people can make their choices. > > This wouldn't be that difficult. I think there is no need to have two different XPath tokenizers that both use sre. Instead, I hope we can merge the two implementations, using correctness and speed as measurements. It then still needs to be adjusted to the parser, but that is normally a simple transformation - the underlying code should always be the same. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Dec 15 21:09:28 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 15 Dec 2000 22:09:28 +0100 Subject: [XML-SIG] Adding scripts In-Reply-To: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> (amk@mira.erols.com) References: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> Message-ID: <200012152109.WAA00962@loewis.home.cs.tu-berlin.de> > xmlproc_val : Validate files using xmlproc Sounds like a good idea. > xmlrpc_call : Make an XML-RPC call (useful for shell scripts, or using > XML-RPC from languages w/o an XML parser, such as Emacs Lisp) I can't see the usage for that one. Why would you need an XML parser to make an XML-RPC call? Formatting the request is easy. For processing the response, there might indeed be a need for a parser - how would this script present the result to the caller? Regards, Martin From ngps@post1.com Sun Dec 17 15:02:28 2000 From: ngps@post1.com (Ng Pheng Siong) Date: Sun, 17 Dec 2000 23:02:28 +0800 Subject: [XML-SIG] Copyright character chokes parser In-Reply-To: <200012151649.JAA22126@localhost.localdomain>; from uche.ogbuji@fourthought.com on Fri, Dec 15, 2000 at 09:49:40AM -0700 References: <200012151649.JAA22126@localhost.localdomain> Message-ID: <20001217230228.B300@madcap.dyndns.org> On Fri, Dec 15, 2000 at 09:49:40AM -0700, uche.ogbuji@fourthought.com wrote: > Add > Thanks, that did it. Cheers. -- Ng Pheng Siong * http://www.post1.com/home/ngps From keichwa@gmx.net Mon Dec 18 05:52:04 2000 From: keichwa@gmx.net (Karl Eichwalder) Date: 18 Dec 2000 06:52:04 +0100 Subject: [XML-SIG] Re: Adding scripts In-Reply-To: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> References: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> Message-ID: "A.M. Kuchling" writes: > xmlproc_val : Validate files using xmlproc > xmlrpc_call : Make an XML-RPC call (useful for shell scripts, or using > XML-RPC from languages w/o an XML parser, such as Emacs Lisp) > > Anyone have additional ideas?
Please, consider the prefix »py_« or something. -- work : ke@suse.de | ,__o : http://www.suse.de/~ke/ | _-\_<, home : keichwa@gmx.net | (*)/'(*) From Mike.Olson@fourthought.com Mon Dec 18 08:47:04 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 18 Dec 2000 01:47:04 -0700 Subject: [XML-SIG] Small memory leak Message-ID: <3A3DCF08.12C1A98F@FourThought.com> In the DTDParser. There is a circular reference; Cyclops output follows:
0x84e1850 rc:1 instance xml.parsers.xmlproc.dtdparser.DTDParser repr:
  this.ent -> 0x84e1968 rc:1 instance xml.parsers.xmlproc.xmlapp.EntityHandler repr:
  this.parser -> 0x84e18d0 rc:1 instance xml.parsers.xmlproc.xmlapp.ErrorHandler repr:
  this.locator -> 0x84e1850 rc:1 instance xml.parsers.xmlproc.dtdparser.DTDParser repr:
I got around it by changing the deref function on DTDParser to also set self.ent to None.
    def deref(self):
        "Removes circular references."
        self.ent = self.dtd_consumer = self.dtd = self.app = self.err = None
Mike -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Mon Dec 18 10:49:56 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 18 Dec 2000 11:49:56 +0100 Subject: [XML-SIG] Small memory leak In-Reply-To: <3A3DCF08.12C1A98F@FourThought.com> (message from Mike Olson on Mon, 18 Dec 2000 01:47:04 -0700) References: <3A3DCF08.12C1A98F@FourThought.com> Message-ID: <200012181049.LAA00706@loewis.home.cs.tu-berlin.de> > I got around it by changing the deref function on DTDParser to also set > self.ent to None. Unless Lars Marius objects - would you like to commit that change to PyXML? Regards, Martin From uche.ogbuji@fourthought.com Tue Dec 19 03:28:02 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 18 Dec 2000 20:28:02 -0700 Subject: [XML-SIG] Lexical handlers for PyXML?
Message-ID: <200012190328.UAA13043@localhost.localdomain> Looks as if there is no lexical handler support in drv_pyexpat or drv_xmlproc. They're all mentioned in to-do lists. I know Lars is pretty much buried in work and unless someone else picks up the flag it might be a while before it happens. I can certainly add lexical handler support to drv_pyexpat (I'll sign up for the easy part. Heh!) Mostly I wanted to be sure it's not completely forgotten. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From Mike.Olson@fourthought.com Tue Dec 19 03:33:00 2000 From: Mike.Olson@fourthought.com (Mike Olson) Date: Mon, 18 Dec 2000 20:33:00 -0700 Subject: [XML-SIG] Small memory leak References: <3A3DCF08.12C1A98F@FourThought.com> <200012181049.LAA00706@loewis.home.cs.tu-berlin.de> Message-ID: <3A3ED6EC.754898E0@FourThought.com> "Martin v. Loewis" wrote: > > > I got around it by changing the deref function on DTDParser to also set > > self.ent to None. > > Unless Lars Marius objects - would you like to commit that change to > PyXML? No objections so I checked it in. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From noreply@sourceforge.net Tue Dec 19 03:51:39 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Dec 2000 19:51:39 -0800 Subject: [XML-SIG] [Bug #126272] LexicalHandler not supported for drv_pyexpat. Message-ID: Bug #126272, was updated on 2000-Dec-18 19:51 Here is a current snapshot of the bug. 
Project: Python/XML Category: SAX Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: fdrake Assigned to : uche Summary: LexicalHandler not supported for drv_pyexpat. Details: Uche pointed out that LexicalHandler wasn't supported for either pyexpat or xmlproc, and volunteered to implement it. This bug report is his reminder! For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=126272&group_id=6473 From fdrake@acm.org Tue Dec 19 03:47:51 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 18 Dec 2000 22:47:51 -0500 (EST) Subject: [XML-SIG] Lexical handlers for PyXML? In-Reply-To: <200012190328.UAA13043@localhost.localdomain> References: <200012190328.UAA13043@localhost.localdomain> Message-ID: <14910.55911.770435.756449@cj42289-a.reston1.va.home.com> uche.ogbuji@fourthought.com writes: > Looks as if there is no lexical handler support in drv_pyexpat or > drv_xmlproc. ... > I can certainly add lexical handler support to drv_pyexpat (I'll > sign up for the easy part. Heh!) I'd love to see it get done! In fact, I just filed a bug & assigned it to you as a reminder. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From noreply@sourceforge.net Tue Dec 19 04:06:19 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Dec 2000 20:06:19 -0800 Subject: [XML-SIG] [Bug #126275] pyexpat.c doesn't match docs or SAX parser Message-ID: Bug #126275, was updated on 2000-Dec-18 20:06 Here is a current snapshot of the bug. Project: Python/XML Category: expat Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: uche Assigned to : nobody Summary: pyexpat.c doesn't match docs or SAX parser Details: _xmlplus/sax/expatreader.py, line 81:
        self._parser.Parse(data, isFinal)
And the Python 2.0 docs say this is right.
But 'ave a butcher's at PyXML-0.6.1/extensions/pyexpat.c line 379 and following, particularly the PyArg_ParseTuple:
static PyObject *
xmlparse_Parse(xmlparseobject *self, PyObject *args)
{
    char *s;
    int slen;
    int isFinal = 0;
    int rv;

    if (!PyArg_ParseTuple(args, "s#|i:Parse", &s, &slen, &isFinal))
        return NULL;
Uh oh. Sure enough:
>>> doc = r.fromString(s)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString
    rt = self.fromStream(stream, ownerDoc)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 267, in fromStream
    self.parser.parse(stream)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 42, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 81, in feed
    self._parser.Parse(data, isFinal)
TypeError: not enough arguments; expected 4, got 2
>>>
Hmm. So what's right? The C code or the SAX driver and docs? Note: Python 2.0's pyexpat.c is the same as PyXML's. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=126275&group_id=6473 From uche.ogbuji@fourthought.com Tue Dec 19 04:14:45 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 18 Dec 2000 21:14:45 -0700 Subject: [XML-SIG] Lexical handlers for PyXML? In-Reply-To: Message from "Fred L. Drake, Jr." of "Mon, 18 Dec 2000 22:47:51 EST." <14910.55911.770435.756449@cj42289-a.reston1.va.home.com> Message-ID: <200012190414.VAA13313@localhost.localdomain> > > uche.ogbuji@fourthought.com writes: > > Looks as if there is no lexical handler support in drv_pyexpat or > > drv_xmlproc. > ... > > I can certainly add lexical handler support to drv_pyexpat (I'll > > sign up for the easy part. Heh!)
> > I'd love to see it get done! In fact, I just filed a bug & assigned > it to you as a reminder. ;-) Oh yeah? Remind me to send you a time machine and a ticket on the Titanic. Ah well, I'll take it on. Even more serious, probably, is this bug, which I just submitted. https://sourceforge.net/bugs/?func=detailbug&bug_id=126275&group_id=6473 I assume Paul or whoever wrote pyexpat sent you the interface. Looks as if the code doesn't match the docs, and it bombs SAX2 with pyexpat. It looks as if the right thing to do is to just match the docs and ixnay the extra parameters in the C code, but I'm guessing there are others who know better. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Dec 19 04:23:54 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Mon, 18 Dec 2000 21:23:54 -0700 Subject: [XML-SIG] Lexical handlers for PyXML? In-Reply-To: Message from uche.ogbuji@fourthought.com of "Mon, 18 Dec 2000 21:14:45 MST." <200012190414.VAA13313@localhost.localdomain> Message-ID: <200012190423.VAA13351@localhost.localdomain> > > I'd love to see it get done! In fact, I just filed a bug & assigned > > it to you as a reminder. ;-) > > Oh yeah? Remind me to send you a time machine and a ticket on the Titanic. > Ah well, I'll take it on. Just in case anyone is low on humor supplements, this was a joke. I know, I know, but you never know, especially since I don't use emoticons on principle. Happy holiday, all. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Tue Dec 19 04:36:54 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 18 Dec 2000 23:36:54 -0500 (EST) Subject: [XML-SIG] Lexical handlers for PyXML? In-Reply-To: <200012190414.VAA13313@localhost.localdomain> References: <200012190414.VAA13313@localhost.localdomain> <200012190423.VAA13351@localhost.localdomain> <14910.55911.770435.756449@cj42289-a.reston1.va.home.com> Message-ID: <14910.58854.493580.318328@cj42289-a.reston1.va.home.com> uche.ogbuji@fourthought.com writes: > Even more serious, probably, is this bug, which I just submitted. I agree; this is a problem. The version in the Python CVS tree (xml.sax.expatreader) seems fine, or it was working for me this morning (I don't find I use PyXML often anymore now that we have something in the standard library). I'll try and look at it this week. > Just in case anyone is low on humor supplements, this was a joke. Sure.... ;) > Happy holiday, all. Bah, humbug! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Tue Dec 19 10:05:36 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 19 Dec 2000 11:05:36 +0100 Subject: [XML-SIG] [Bug #126275] pyexpat.c doesn't match docs or SAX parser In-Reply-To: (noreply@sourceforge.net) References: Message-ID: <200012191005.LAA10014@loewis.home.cs.tu-berlin.de> > Hmm. So what's right? The C code or the SAX driver and docs? My guess is that this has nothing to do with Parse(), the function works correctly. Instead, the problem is that pyexpat invokes a callback on the content handler, and *that* call has problems with the number of arguments. Most likely, it's a call to characters, which occurs frequently when a DocumentHandler is used in a place where a ContentHandler is expected (i.e. in SAX2). 
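The symptom Martin describes can be reproduced in a few lines. This is a minimal sketch against today's standard-library `xml.sax` rather than PyXML 0.6.x (an assumption: the driver differs in detail, but the expat reader still installs the content handler's `characters` method directly as expat's character-data callback). The class `Sax1StyleHandler` and the function `reproduce` are hypothetical names: the handler keeps the SAX1 `characters(ch, start, length)` signature while being registered as a SAX2 ContentHandler, so the `TypeError` comes out of the parse call rather than from any line of user code that calls `characters` itself.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Sax1StyleHandler(ContentHandler):
    """Hypothetical handler that still uses the SAX1 characters() signature."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, ch, start, length):
        # SAX1 style: buffer plus offset/length. A SAX2 parser passes only `ch`.
        self.text.append(ch[start:start + length])


def reproduce(document=b"<doc>some text</doc>"):
    """Parse `document` and return the TypeError raised by the callback, if any."""
    try:
        xml.sax.parseString(document, Sax1StyleHandler())
    except TypeError as exc:
        return exc
    return None


error = reproduce()
print(repr(error))
```

The error message names `characters()` and the two missing arguments, which is the hint Martin suggests looking for when a traceback appears to blame `Parse()`.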
The straightforward solution is to have expat call a Python function with the right number of arguments, and to have that function call the content handler. Unfortunately, that will add another Python function call for every characters event, even though in every working application the argument number mismatch will never be a problem. So somehow pyexpat should put itself into the traceback. I'm not sure how this would be done best, though - we can't give a reasonable line number, for example. Contributions are welcome. Regards, Martin From noreply@sourceforge.net Tue Dec 19 15:19:15 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Dec 2000 07:19:15 -0800 Subject: [XML-SIG] [Bug #126342] DbDom: cloneNode bug (18/12 snapshot) Message-ID: Bug #126342, was updated on 2000-Dec-19 07:19 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: DbDom: cloneNode bug (18/12 snapshot) Details: I get an integrity exception when trying to set a cloned Attribute. Sample code:
from Ft.DbDom import Dom
from Ft.Ods import Database
from Ft.DbDom import Reader
from xml.dom.ext import PrettyPrint,StripXml,Print
from Ft.Ods import FreePersistentObject
from Ft.DbDom.Dom import DocumentImp

DBNAME='ods:alf@orion:5432:dom_test'

db = Database.Database()
db.open(DBNAME)
tx = db.new()
tx.begin()

d = DocumentImp()
e = d.createElementNS('','root')
d.appendChild(e)
e.setAttributeNS('','foo','bar')
f=d.createElementNS('','child')
e.appendChild(f)
for attr in e.attributes:
    f.setAttributeNodeNS(attr.cloneNode(1))
tx.commit()
------------------8<-------------------
Sample output:
[alf@leo alf]$ python dbdomclone.py
Traceback (innermost last):
  File "dbdomclone.py", line 22, in ?
    f.setAttributeNodeNS(attr.cloneNode(1))
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Dom.py", line 226, in setAttributeNodeNS
    self.add_attributes(node)
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Element/__init__.py", line 22, in add_attributes
    self._4ods_addRelationship('attributes',Attribute.Attribute_stub,'ownerElement','form',target,inverse)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 271, in _4ods_addRelationship
    val._4ods_formRelationship(inverseName,self.__class__,name,'add',self,0)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 232, in _4ods_formRelationship
    raise IntegrityException(name)
Ft.Ods.IntegrityException: Integrity error on relationship ownerElement
For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=126342&group_id=6473 From martin@loewis.home.cs.tu-berlin.de Tue Dec 19 15:53:31 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 19 Dec 2000 16:53:31 +0100 Subject: [XML-SIG] Upgrading Expat Message-ID: <200012191553.QAA11565@loewis.home.cs.tu-berlin.de> I just imported Expat 1.2 into the PyXML tree, and updated the pyexpat module to expose the new handlers supported by Expat. Unfortunately, there is no version number in the Expat headers, so anybody compiling the expat module must know what the Expat version is. For PyXML, setup.py always knows which Expat version we ship; for Python proper, it would default to 1.1 unless specified otherwise. Regards, Martin From uche.ogbuji@fourthought.com Tue Dec 19 16:09:29 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 19 Dec 2000 09:09:29 -0700 Subject: [XML-SIG] [Bug #126275] pyexpat.c doesn't match docs or SAX parser In-Reply-To: Message from "Martin v. Loewis" of "Tue, 19 Dec 2000 11:05:36 +0100." <200012191005.LAA10014@loewis.home.cs.tu-berlin.de> Message-ID: <200012191609.JAA31431@localhost.localdomain> > > Hmm. So what's right?
The C code or the SAX driver and docs? > > My guess is that this has nothing to do with Parse(), the function > works correctly. Instead, the problem is that pyexpat invokes a > callback on the content handler, and *that* call has problems with the > number of arguments. Most likely, it's a call to characters, which > occurs frequently when a DocumentHandler is used in a place where a > ContentHandler is expected (i.e. in SAX2). Aieee. Just so. I need to stop raising alarms when I should be sleeping. By the time I glanced at the PyArg_ParseTuple I had already convinced myself what the bug was, so I quite readily read it wrongly. Culpa mea. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Tue Dec 19 16:07:23 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Dec 2000 11:07:23 -0500 (EST) Subject: [XML-SIG] Upgrading Expat In-Reply-To: <200012191553.QAA11565@loewis.home.cs.tu-berlin.de> References: <200012191553.QAA11565@loewis.home.cs.tu-berlin.de> Message-ID: <14911.34747.370233.637321@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > I just imported Expat 1.2 into the PyXML tree, and updated the pyexpat > module to expose the new handlers supported by Expat. Unfortunately, > there is no version number in the Expat headers, so anybody compiling Could you file a bug report for this at http://sourceforge.net/projects/expat/? I'll try and make sure something gets added. There is an XML_ExpatVersion() function in the CVS version, but that still doesn't provide for compile-time checking, or support the older versions we've been using. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Tue Dec 19 18:55:11 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 19 Dec 2000 19:55:11 +0100 Subject: [XML-SIG] PyXML, sgmlop and xmllib In-Reply-To: <200012151044.KAA34209@tepid.osl.fast.no> (message from Dag Brattli on Fri, 15 Dec 2000 10:44:15 GMT) References: <200012151044.KAA34209@tepid.osl.fast.no> Message-ID: <200012191855.TAA12487@loewis.home.cs.tu-berlin.de> > The xmllib.py for sgmlop is missing from PyXML. Does anybody know > where to find an updated version? You can find one in old copies of PyXML, e.g. in PyXML 0.5.1. > Both README.sgmlop and xml/parsers/__init__.py say that there > should be an xmllib.py around, but there isn't. Yes, that's an error in the documentation which will be corrected in the next release; users should use sgmlop directly, or, say, the SAX driver. Regards, Martin From Alexandre.Fayolle@logilab.fr Wed Dec 20 13:03:39 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 20 Dec 2000 14:03:39 +0100 (CET) Subject: [XML-SIG] 4DOM and DTD Message-ID: Hello, I was wondering if there's a way to get a reference to the DTD object once an XML document has been read using the validating reader stub in 4DOM (the idea is to be able to validate it at some later point, after it's been modified, to ensure that the document is still valid before flushing it to disk, for example). Thanks for your help. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From martin@loewis.home.cs.tu-berlin.de Wed Dec 20 14:52:43 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 20 Dec 2000 15:52:43 +0100 Subject: [XML-SIG] Lexical handlers for PyXML? Message-ID: <200012201452.PAA00800@loewis.home.cs.tu-berlin.de> > Looks as if there is no lexical handler support in drv_pyexpat or > drv_xmlproc. Sure there is. The SAX2 xmlproc driver definitely emits LexicalHandler and DeclHandler events.
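For comparison, here is a sketch against today's standard library, not the PyXML 0.6 drivers discussed in this thread. The expat-based reader returned by `xml.sax.make_parser()` accepts a lexical handler through the `property_lexical_handler` property and reports comments, CDATA boundaries, and the DOCTYPE to it. `LexicalCollector` and `collect_lexical_events` are hypothetical names; the callback names follow the SAX2 LexicalHandler interface.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class LexicalCollector:
    """Hypothetical object implementing the SAX2 LexicalHandler callbacks."""

    def __init__(self):
        self.comments = []
        self.cdata_sections = 0
        self.doctype = None

    def comment(self, content):
        self.comments.append(content)

    def startCDATA(self):
        self.cdata_sections += 1

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        self.doctype = name

    def endDTD(self):
        pass


def collect_lexical_events(data):
    """Parse bytes `data` and return the LexicalCollector that saw the events."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    lexical = LexicalCollector()
    # Register the lexical handler before parsing starts.
    parser.setProperty(property_lexical_handler, lexical)
    parser.parse(io.BytesIO(data))
    return lexical


events = collect_lexical_events(
    b"<!DOCTYPE root><root><!-- hello --><![CDATA[x < y]]></root>"
)
print(events.doctype, events.comments, events.cdata_sections)
```

The collector is a plain class rather than a subclass, so the sketch does not depend on a `LexicalHandler` base class being present in the installed Python version.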
Regards, Martin From martin@loewis.home.cs.tu-berlin.de Wed Dec 20 14:49:13 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 20 Dec 2000 15:49:13 +0100 Subject: [XML-SIG] 4DOM and DTD In-Reply-To: (message from Alexandre Fayolle on Wed, 20 Dec 2000 14:03:39 +0100 (CET)) References: Message-ID: <200012201449.PAA00741@loewis.home.cs.tu-berlin.de> > I was wondering if there's a way to get a reference to the DTD > object once an XML document has been read using the the validating > reader stub in 4DOM I believe that is not possible: The 4DOM readers use only SAX1 parsers, and the only reader that reports DeclHandler and LexicalHandler events is the SAX2 xmlproc driver. Regards, Martin From uche.ogbuji@fourthought.com Wed Dec 20 15:52:24 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Wed, 20 Dec 2000 08:52:24 -0700 Subject: [XML-SIG] 4DOM and DTD In-Reply-To: Message from "Martin v. Loewis" of "Wed, 20 Dec 2000 15:49:13 +0100." <200012201449.PAA00741@loewis.home.cs.tu-berlin.de> Message-ID: <200012201552.IAA16613@localhost.localdomain> > > I was wondering if there's a way to get a reference to the DTD > > object once an XML document has been read using the the validating > > reader stub in 4DOM > > I believe that is not possible: The 4DOM readers use only SAX1 > parsers, and the only reader that reports DeclHandler and > LexicalHandler events is the SAX2 xmlproc driver. I was actually in the process of migrating to the SAX2 framework when I ran into all the troubles I've been reporting. You've corrected me on some things so I'll have a second look, but it has been much more of a chore than it needs to be. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Wed Dec 20 18:05:15 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 20 Dec 2000 19:05:15 +0100 Subject: [XML-SIG] 4DOM and DTD In-Reply-To: <200012201552.IAA16613@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012201552.IAA16613@localhost.localdomain> Message-ID: <200012201805.TAA01147@loewis.home.cs.tu-berlin.de> > I was actually in the process of migrating to the SAX2 framework > when I ran into all the troubles I've been reporting. You've > corrected me on some things so I'll have a second look, but it has > been much more of a chore than it needs to be. I think the decision to change the signature of characters between a DocumentHandler and a ContentHandler has by far caused the most portability problems recently. Since the number of authors that have written DocumentHandlers is limited, I hope there will be a time when this is not a problem anymore. Regards, Martin From larsga@garshol.priv.no Wed Dec 20 19:22:12 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Dec 2000 20:22:12 +0100 Subject: [XML-SIG] 4DOM and DTD In-Reply-To: <200012201805.TAA01147@loewis.home.cs.tu-berlin.de> References: <200012201552.IAA16613@localhost.localdomain> <200012201805.TAA01147@loewis.home.cs.tu-berlin.de> Message-ID: * Martin v. Loewis | | I think the decision to change the signature of characters between a | DocumentHandler and a ContentHandler has by far caused the most | portability problems recently. Since the number of authors that have | written DocumentHandlers is limited, I hope there will be a time | when this is not a problem anymore. It's beginning to look like adding an adapter to the PyXML package would be a good idea, perhaps as part of the saxtools. I can't do it just yet, but if nobody gets there before me I will probably do it once the book is done. --Lars M. 
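Lars's adapter idea can be sketched in a few lines. This is a hypothetical helper (`Sax1Adapter` is not a PyXML or standard-library class), written against today's `xml.sax`. It is registered as the SAX2 ContentHandler and forwards each event to a legacy SAX1 DocumentHandler, expanding the single SAX2 `characters(content)` argument back into the old `(ch, start, length)` triple. Attribute objects pass through unchanged, so SAX1 code that indexes attributes by position would still need attention. `LegacyTextHandler` is likewise an illustrative stand-in for an existing SAX1 handler.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Sax1Adapter(ContentHandler):
    """Hypothetical adapter: SAX2 ContentHandler events -> SAX1 DocumentHandler calls."""

    def __init__(self, document_handler):
        super().__init__()
        self._target = document_handler

    def setDocumentLocator(self, locator):
        self._target.setDocumentLocator(locator)

    def startDocument(self):
        self._target.startDocument()

    def endDocument(self):
        self._target.endDocument()

    def startElement(self, name, attrs):
        # attrs is a SAX2 Attributes object; SAX1 positional access is not emulated.
        self._target.startElement(name, attrs)

    def endElement(self, name):
        self._target.endElement(name)

    def characters(self, content):
        # SAX1 handlers expect (buffer, start, length); pass the whole string.
        self._target.characters(content, 0, len(content))

    def ignorableWhitespace(self, whitespace):
        self._target.ignorableWhitespace(whitespace, 0, len(whitespace))

    def processingInstruction(self, target, data):
        self._target.processingInstruction(target, data)


class LegacyTextHandler:
    """Illustrative SAX1-style handler that collects character data."""

    def __init__(self):
        self.chunks = []

    def setDocumentLocator(self, locator):
        pass

    def startDocument(self):
        pass

    def endDocument(self):
        pass

    def startElement(self, name, attrs):
        pass

    def endElement(self, name):
        pass

    def ignorableWhitespace(self, ch, start, length):
        pass

    def processingInstruction(self, target, data):
        pass

    def characters(self, ch, start, length):
        self.chunks.append(ch[start:start + length])


legacy = LegacyTextHandler()
xml.sax.parseString(b"<doc>Hello, <b>world</b></doc>", Sax1Adapter(legacy))
print("".join(legacy.chunks))  # Hello, world
```

As Martin notes in his reply, changing the handler code directly is usually trivial. An adapter like this mainly helps when the SAX1 handler cannot easily be edited.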
From larsga@garshol.priv.no Wed Dec 20 19:27:45 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Dec 2000 20:27:45 +0100 Subject: [XML-SIG] Adding scripts In-Reply-To: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> References: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com> Message-ID: * A. M. Kuchling | | What do people think about adding some useful scripts to PyXML that | get installed in /usr/local/bin or somewhere like that? I like the idea. | Possibilities would be (names off the top of my head): | | xmlproc_val : Validate files using xmlproc xvcmd.py in the xmlproc distribution does this and could be used. The xmlproc distribution contains more scripts that might fall into this category:
wxValidator.py : wxPython-based parser interface
xpcmd.py : non-validating cousin of xvcmd.py
dtdcmd.py : parse and check DTDs
dtd2schema.py : naive DTD to XML Schema converter
I've also been thinking about tools like:
- something that normalizes XML documents
- something that makes XML documents standalone
- a DTD normalizer (this exists, but is not in the xmlproc distro yet)
--Lars M. From larsga@garshol.priv.no Wed Dec 20 19:30:21 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 20 Dec 2000 20:30:21 +0100 Subject: [XML-SIG] Small memory leak In-Reply-To: <200012181049.LAA00706@loewis.home.cs.tu-berlin.de> References: <3A3DCF08.12C1A98F@FourThought.com> <200012181049.LAA00706@loewis.home.cs.tu-berlin.de> Message-ID: * Mike Olson | | I got around it by changing the deref function on DTDParser to also set | self.ent to None. * Martin v. Loewis | | Unless Lars Marius objects - would you like to commit that change to | PyXML? The fix is perfectly fine. I've now also applied it to my local CVS tree, which will be merged with the PyXML one as soon as I have time. --Lars M. From martin@loewis.home.cs.tu-berlin.de Wed Dec 20 21:26:39 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Wed, 20 Dec 2000 22:26:39 +0100 Subject: [XML-SIG] 4DOM and DTD In-Reply-To: (message from Lars Marius Garshol on 20 Dec 2000 20:22:12 +0100) References: <200012201552.IAA16613@localhost.localdomain> <200012201805.TAA01147@loewis.home.cs.tu-berlin.de> Message-ID: <200012202126.WAA01550@loewis.home.cs.tu-berlin.de> > It's beginning to look like adding an adapter to the PyXML package > would be a good idea, perhaps as part of the saxtools. I can't do it > just yet, but if nobody gets there before me I will probably do it > once the book is done. It's not that changing the code is so difficult that you'd need support libraries - in my experience, the necessary changes are trivial. What *is* a problem is to know that you have to make changes, and to find out what those changes are. It is particularly confusing that the Python traceback puts you on the wrong track. Regards, Martin From fdrake@acm.org Thu Dec 21 01:55:30 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 20 Dec 2000 20:55:30 -0500 (EST) Subject: [XML-SIG] forwarded message from noreply@sourceforge.net Message-ID: <14913.25362.547012.190609@cj42289-a.reston1.va.home.com> --PrH0oNW7ir Content-Type: text/plain; charset=us-ascii Content-Description: message body and .signature Content-Transfer-Encoding: 7bit Progress! ;-) -Fred -- Fred L. Drake, Jr.
PythonLabs at Digital Creations --PrH0oNW7ir Content-Type: message/rfc822 Content-Description: forwarded message Content-Transfer-Encoding: 7bit Return-Path: Received: from mh3-sfba.mail.home.com ([24.0.95.134]) by mail.rdc1.md.home.com (InterMail vM.4.01.03.00 201-229-121) with ESMTP id <20001221015740.ZWNK10139.mail.rdc1.md.home.com@mh3-sfba.mail.home.com> for ; Wed, 20 Dec 2000 17:57:40 -0800 Received: from mx3-sfba.mail.home.com (mx3-sfba.mail.home.com [24.0.95.138]) by mh3-sfba.mail.home.com (8.9.3/8.9.0) with ESMTP id RAA19288 for ; Wed, 20 Dec 2000 17:57:39 -0800 (PST) Received: from mail.acm.org (mail.acm.org [199.222.69.4]) by mx3-sfba.mail.home.com (8.9.1/8.9.1) with ESMTP id RAA17390 for ; Wed, 20 Dec 2000 17:57:39 -0800 (PST) Received: from usw-sf-netmisc.sourceforge.net (usw-sf-sshgate.sourceforge.net [216.136.171.253]) by mail.acm.org (8.9.3/8.9.3) with ESMTP id UAA39740 for ; Wed, 20 Dec 2000 20:57:34 -0500 Received: from usw-sf-web2-b.sourceforge.net ([10.3.1.6] helo=usw-sf-web2.sourceforge.net ident=mail) by usw-sf-netmisc.sourceforge.net with esmtp (Exim 3.16 #1 (Debian)) id 148uz4-0001hS-00; Wed, 20 Dec 2000 17:57:30 -0800 Received: from nobody by usw-sf-web2.sourceforge.net with local (Exim 3.16 #1 (Debian)) id 148uz5-0000Mm-00; Wed, 20 Dec 2000 17:57:31 -0800 Message-Id: From: noreply@sourceforge.net Sender: nobody To: loewis@informatik.hu-berlin.de, fdrake@acm.org, expat-bugs@sourceforge.net Subject: [Bug #126353] xmlparse.h does not indicate a version Date: Wed, 20 Dec 2000 17:57:31 -0800 Bug #126353, was updated on 2000-Dec-19 09:04 Here is a current snapshot of the bug. Project: Expat XML Parser Category: None Status: Closed Resolution: Fixed Bug Group: None Priority: 6 Submitted by: loewis Assigned to : fdrake Summary: xmlparse.h does not indicate a version Details: Applications that need to compile for different versions of expat cannot determine the expat version at compile time. 
Therefore, manual intervention or advanced guessing is necessary to compile such applications, which is undesirable. Follow-Ups: Date: 2000-Dec-20 17:57 By: fdrake Comment: Added compile-time detectable version information to expat.h (new name for xmlparse.h). Three new #defines, XML_MAJOR_VERSION, XML_MINOR_VERSION, and XML_MICRO_VERSION, have been added. XML_ExpatVersion() computes its result dynamically using this information, and the new function XML_ExpatVersionInfo() returns this information in a structure. This will be available in Expat 1.96.0. ------------------------------------------------------- Date: 2000-Dec-19 09:09 By: fdrake Comment: Assigned to me, since I asked Martin to actually make this a bug report. I'll note that the application in question is the Python binding for Expat, but the need is not limited to scripting language bindings. ------------------------------------------------------- For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=126353&group_id=10127 --PrH0oNW7ir-- From Alexandre.Fayolle@logilab.fr Thu Dec 21 09:27:39 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 21 Dec 2000 10:27:39 +0100 (CET) Subject: [XML-SIG] New stuff on w3.org Message-ID: Since I believe not everybody on this list monitors the W3C website closely (I, for one, do not), I thought I might as well post a few pieces of info here concerning Recommendations and Proposed Recommendations. For more info, please refer to http://www.w3.org On Dec. 19th, XHTML Basic became a Recommendation. On Dec. 20th, XLink and XML Base became Proposed Recommendations. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France).
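The Expat bug report above (#126353) is about C-level, compile-time macros. On the Python side, current CPython's `xml.parsers.expat` module exposes the same information at run time: `version_info` is a `(major, minor, micro)` tuple filled in from `XML_ExpatVersionInfo()`, and `EXPAT_VERSION` is the string returned by `XML_ExpatVersion()`. The helper name `expat_at_least` is hypothetical, and this is a sketch for modern Python, not something available in the PyXML releases discussed in this thread.

```python
from xml.parsers import expat


def expat_at_least(major, minor=0, micro=0):
    """Hypothetical helper: True if the Expat library behind pyexpat is at
    least the given (major, minor, micro) version."""
    # version_info is a tuple of three ints, so plain tuple comparison works.
    return expat.version_info >= (major, minor, micro)


if __name__ == "__main__":
    # Both values depend on the Expat your Python was built against.
    print("library string: ", expat.EXPAT_VERSION)
    print("version tuple:  ", expat.version_info)
    print("at least 1.95.0:", expat_at_least(1, 95, 0))
```

Comparing tuples rather than parsing the version string avoids the "advanced guessing" the bug report complains about.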
From mbulik@mecalog.fr Thu Dec 21 14:05:47 2000 From: mbulik@mecalog.fr (Michal BULIK) Date: Thu, 21 Dec 2000 15:05:47 +0100 Subject: [XML-SIG] problem with install Message-ID: <3A420E3B.B8131D61@mecalog.fr> I have just installed python 1.5.2 from source on an SGI with Irix 6.5 and then I've tried to install PyXML. When I try to execute setup.py the program complains about missing distutils.core : jorasses 1788% python setup.py build Traceback (innermost last): File "setup.py", line 8, in ? from distutils.core import setup, Extension ImportError: No module named distutils.core I could not find such a file in the python tree ... I'm sorry if the question is completely stupid, but I'm a python newbie ... Best regards, Michal Bulik ------------------------------------------------------------- Michal BULIK Tel. : 33 (0) 1 55 59 01 90 MECALOG Fax : 33 (0) 1 55 59 96 36 Centre d'affaires, Bat. A E-mail : mbulik@mecalog.fr 2, rue de la Renaissance F - 92184 ANTONY CEDEX http://www.radioss.com ------------------------------------------------------------- From martin@loewis.home.cs.tu-berlin.de Thu Dec 21 14:58:33 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 21 Dec 2000 15:58:33 +0100 Subject: [XML-SIG] problem with install In-Reply-To: <3A420E3B.B8131D61@mecalog.fr> (message from Michal BULIK on Thu, 21 Dec 2000 15:05:47 +0100) References: <3A420E3B.B8131D61@mecalog.fr> Message-ID: <200012211458.PAA00665@loewis.home.cs.tu-berlin.de> > ImportError: No module named distutils.core > > I could not find such a file in the python tree ... You need to install the distutils, http://www.python.org/sigs/distutils-sig Distutils are included with Python starting from 1.6.
Regards, Martin From Alexandre.Fayolle@logilab.fr Thu Dec 21 16:30:25 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 21 Dec 2000 17:30:25 +0100 (CET) Subject: [XML-SIG] 4DOM and DTD In-Reply-To: <200012201449.PAA00741@loewis.home.cs.tu-berlin.de> Message-ID: On Wed, 20 Dec 2000, Martin v. Loewis wrote: > > I was wondering if there's a way to get a reference to the DTD > > object once an XML document has been read using the the validating > > reader stub in 4DOM > > I believe that is not possible: The 4DOM readers use only SAX1 > parsers, and the only reader that reports DeclHandler and > LexicalHandler events is the SAX2 xmlproc driver. I'm a bit surprised, but Uche did not comment on this, so you must be right. Just being curious, what does the xml.dom.ext.reader.Sax2 provide then? I really thought that specifying validate=1 in FromXml made it use xmlproc. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From noreply@sourceforge.net Thu Dec 21 16:52:37 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Dec 2000 08:52:37 -0800 Subject: [XML-SIG] [Bug #126612] 4DOM: handling attribute default value Message-ID: Bug #126612, was updated on 2000-Dec-21 08:52 Here is a current snapshot of the bug. Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: 4DOM: handling attribute default value Details: Hi there, I tried to investigate this, but got stuck with the lack of Sax2 support, since resolution involves accessing a DTD object. The DOM spec says that Element.removeAttribute should do the following: "If the removed attribute is known to have a default value, an attribute immediately appears containing the default value as well as the corresponding namespace URI, local name, and prefix when applicable." 
This is not the case in the current implementation of 4DOM. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=126612&group_id=6473 From Alexandre.Fayolle@logilab.fr Thu Dec 21 17:00:14 2000 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 21 Dec 2000 18:00:14 +0100 (CET) Subject: [XML-SIG] 4DOM and DTD In-Reply-To: Message-ID: On Thu, 21 Dec 2000, Alexandre Fayolle wrote: > I'm a bit surprised, but Uche did not comment on this, so you must be > right. Just being curious, what does the xml.dom.ext.reader.Sax2 provide > then? I really thought that specifying validate=1 in FromXml made it use > xmlproc. OK, I referred to the source code, and I see the problem. I'll sum it up, in case someone else is interested but is too lazy to check for him/herself. If I'm wrong, please correct me. There are two packages providing Sax interface to parsers, xml.sax.drivers and xml.sax.drivers2. The first one uses Sax1 parsers, and is used by xml.dom.ext.reader.Sax2. I reckon the latter will soon be upgraded to use xml.sax.drivers2, but could not so far because of the lack of SAX2 parsers in xml-sig (?). However, everything should be ready in reader.Sax2 to use the Sax2 interfaces to the xml-sig parsers. Alexandre Fayolle -- http://www.logilab.com Narval is the first software agent available as free software (GPL). LOGILAB, Paris (France). From noreply@sourceforge.net Thu Dec 21 17:01:55 2000 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Dec 2000 09:01:55 -0800 Subject: [XML-SIG] [Bug #126613] 4DOM: documentType node has empty systemID Message-ID: Bug #126613, was updated on 2000-Dec-21 09:01 Here is a current snapshot of the bug.
Project: Python/XML Category: 4Suite Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: afayolle Assigned to : nobody Summary: 4DOM: documentType node has empty systemID Details: This is probably due to a SAX1 parser being used in reader.Sax2, which therefore does not report the documentType properly; if so, please consider this report a reminder of something to check when the package is updated. When building a DOM with validate=1, the doctype systemID and publicID are empty strings. For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=126613&group_id=6473 From div@commerceflow.com Thu Dec 21 21:59:21 2000 From: div@commerceflow.com (Div Shekhar) Date: Thu, 21 Dec 2000 13:59:21 -0800 Subject: [XML-SIG] problem with install References: <3A420E3B.B8131D61@mecalog.fr> <200012211458.PAA00665@loewis.home.cs.tu-berlin.de> Message-ID: <3A427D39.F352B92F@commerceflow.com> I've had a similar problem with 1.6a2, so I'm currently using PyXML 0.5.5.1 div@div:~/py/PyXML-0.6.2$ python setup.py Traceback (most recent call last): File "setup.py", line 8, in ? from distutils.core import setup, Extension ImportError: cannot import name Extension From uche.ogbuji@fourthought.com Fri Dec 22 04:17:49 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 21 Dec 2000 21:17:49 -0700 Subject: [XML-SIG] TEST: IGNORE Message-ID: <3A42D5ED.60CF99B5@fourthought.com> -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Dec 22 04:57:23 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 21 Dec 2000 21:57:23 -0700 Subject: [XML-SIG] 4DOM and DTD In-Reply-To: Message from Alexandre Fayolle of "Thu, 21 Dec 2000 18:00:14 +0100."
Message-ID: <200012220457.VAA01821@localhost.localdomain> > On Thu, 21 Dec 2000, Alexandre Fayolle wrote: > > > I'm a bit surprised, but Uche did not comment on this, so you must be > > right. Just being curious, what does the xml.dom.ext.reader.Sax2 provide > > then? I really thought that specifying validate=1 in FromXml made it use > > xmlproc. > > OK, I referred to the source code, and I see the problem. I'll sum it up, in > case someone else is interested but is too lazy to check for > him/herself. If I'm wrong, please correct me. > > There are two packages providing Sax interface to parsers, xml.sax.drivers > and xml.sax.drivers2. The first one uses Sax1 parsers, and is used by > xml.dom.ext.reader.Sax2. I reckon the latter will soon be upgraded to use > xml.sax.drivers2, but could not so far because of the lack of SAX2 parsers > in xml-sig (?). However, everything should be ready in reader.Sax2 to use > the Sax2 interfaces to the xml-sig parsers. Close. I actually went most of the way on this, as you can see from the latest CVS snapshot. I ran into a lot of problems which I mostly misinterpreted out of fatigue and sloth. I plan to have another go, probably today, and I might even add LexicalHandler support to drv_pyexpat while I'm at it. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Fri Dec 22 05:26:19 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Thu, 21 Dec 2000 22:26:19 -0700 Subject: [XML-SIG] Oddities Message-ID: <200012220526.WAA01901@localhost.localdomain> I think I have the pyexpat/lexhandler work in hand. However, while testing it, I ran into two oddities in setup.py. Firstly, PyXML CVS wouldn't compile pyexpat because it couldn't find extensions/expat/xmlparse/hashtable.c.
I just commented this out of setup.py and it compiles fine now. Secondly, it tries to place the docs at /usr/local/xmldoc Tsk. tsk. That should be /usr/local/doc/PyXML- It looks, however, as if someone went to some length to avoid the standard way, so I'd like to know why before fixing it. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From eugeneai@icc.ru Fri Dec 22 09:19:11 2000 From: eugeneai@icc.ru (Evgeny Cherkashin) Date: Fri, 22 Dec 2000 17:19:11 +0800 Subject: [XML-SIG] Python>=1.6 SIMPLE encoding support patch for pyexpat Message-ID: <200012220816.QAA08820@monster.icc.ru> This is a multi-part message in MIME format. --Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Hi! Please find attached a patch that adds support for Python encodings to pyexpat. Is it possible to include it in the next release of PyXML? It seems that the patch will work fine for 8bit->unicode translation.
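The idea behind Evgeny's patch can be sketched in pure Python: decode each byte value 0x00 through 0xFF with the named Python codec and record the resulting code point, using -1 (Expat's marker for an invalid byte in `XML_Encoding.map`) wherever the codec yields U+FFFD. The function name `build_expat_map` is hypothetical. The real patch does this in C inside pyexpat's unknown-encoding handler, and, like the patch, this approach only makes sense for single-byte encodings.

```python
import codecs


def build_expat_map(encoding):
    """Hypothetical helper: a 256-entry byte -> code point table for a
    single-byte encoding, in the style of Expat's XML_Encoding.map."""
    codecs.lookup(encoding)  # raises LookupError for unknown encodings
    table = []
    for b in range(256):
        # Decode one byte at a time; "replace" turns undecodable bytes
        # into U+FFFD instead of raising.
        ch = bytes([b]).decode(encoding, "replace")
        if len(ch) != 1 or ch == "\ufffd":
            table.append(-1)  # -1 marks a byte that is invalid in this encoding
        else:
            table.append(ord(ch))
    return table
```

For example, `build_expat_map("koi8-r")[0xC1]` is `0x0430` (CYRILLIC SMALL LETTER A). Unlike the patch, which decodes the whole 256-byte template in one call, this sketch decodes each byte separately, so a codec that treats some byte as the start of a multibyte sequence cannot shift the rest of the table.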
The patch works simple: it builds expat structure encoding table by translation of template (vector of chars "\0\1...\0xff') into desired encoding (no translation procedure needed) Sincerely, Evgeny -- --Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30 Content-Type: application/octet-stream; name="pyexpat_diff" Content-Disposition: attachment; filename="pyexpat_diff" Content-Transfer-Encoding: base64 QmluYXJ5IGZpbGVzIG9yaWcvUHlYTUwtMC42LjIvYnVpbGQvbGliLmxpbnV4LWk1ODYtMi4wL194 bWxwbHVzL3BhcnNlcnMvcHlleHBhdC5zbyBhbmQgbmV3L1B5WE1MLTAuNi4yL2J1aWxkL2xpYi5s aW51eC1pNTg2LTIuMC9feG1scGx1cy9wYXJzZXJzL3B5ZXhwYXQuc28gZGlmZmVyCmRpZmYgLXJ1 TiBvcmlnL1B5WE1MLTAuNi4yL2V4dGVuc2lvbnMvcHlleHBhdC5jIG5ldy9QeVhNTC0wLjYuMi9l eHRlbnNpb25zL3B5ZXhwYXQuYwotLS0gb3JpZy9QeVhNTC0wLjYuMi9leHRlbnNpb25zL3B5ZXhw YXQuYwlUaHUgTm92ICAyIDEyOjU0OjUwIDIwMDAKKysrIG5ldy9QeVhNTC0wLjYuMi9leHRlbnNp b25zL3B5ZXhwYXQuYwlGcmkgRGVjIDIyIDExOjE2OjMyIDIwMDAKQEAgLTYyNSw2ICs2MjUsNjEg QEAKIC8qIC0tLS0tLS0tLS0gKi8KIAogCisjaWYgIShQWV9NQUpPUl9WRVJTSU9OID09IDEgJiYg UFlfTUlOT1JfVkVSU0lPTiA8IDYpCisKKy8qIAorICAgIHB5ZXhwYXQgaW50ZXJuYXRpb25hbCBl bmNvZGluZyBzdXBwb3J0LgorICAgIE1ha2UgaXQgYXMgc2ltcGxlIGFzIHBvc3NpYmxlLgorKi8K Kworc3RhdGljIGNoYXIgdGVtcGxhdGVfYnVmZmVyWzI1Nl07CitQeU9iamVjdCAqIHRlbXBsYXRl X3N0cmluZz1OVUxMOworCitzdGF0aWMgdm9pZCAKK2luaXRfdGVtcGxhdGVfYnVmZmVyKCkKK3sK KyAgICBpbnQgaTsKKyAgICBmb3IgKGk9MDtpPDI1NjtpKyspIHsKKwl0ZW1wbGF0ZV9idWZmZXJb aV09aTsKKyAgICB9OworICAgIHRlbXBsYXRlX2J1ZmZlclsyNTZdPTA7Cit9OworCitpbnQgCitQ eVVua25vd25FbmNvZGluZ0hhbmRsZXIodm9pZCAqZW5jb2RpbmdIYW5kbGVyRGF0YSwgCitjb25z dCBYTUxfQ2hhciAqbmFtZSwgCitYTUxfRW5jb2RpbmcgKiBpbmZvKQoreworICAgIFB5VW5pY29k ZU9iamVjdCAqIF91X3N0cmluZz1OVUxMOworICAgIGludCByZXN1bHQ9MDsKKyAgICBpbnQgaTsK KyAgICAKKyAgICBfdV9zdHJpbmc9KFB5VW5pY29kZU9iamVjdCAqKSBQeVVuaWNvZGVfRGVjb2Rl KHRlbXBsYXRlX2J1ZmZlciwgMjU2LCBuYW1lLCAicmVwbGFjZSIpOyAvLyBZZXMsIHN1cHBvcnRz IG9ubHkgOGJpdCBlbmNvZGluZ3MKKyAgICAKKyAgICBpZiAoX3Vfc3RyaW5nPT1OVUxMKSB7CisJ 
cmV0dXJuIHJlc3VsdDsKKyAgICB9OworICAgIAorICAgIGZvciAoaT0wOyBpPDI1NjsgaSsrKSB7 CisJUHlfVU5JQ09ERSBjID0gX3Vfc3RyaW5nLT5zdHJbaV0gOyAvLyBTdHVwaWQgdG8gYWNjZXNz IGRpcmVjdGx5LCBidXQgZmFzdAorCWlmIChjPT1QeV9VTklDT0RFX1JFUExBQ0VNRU5UX0NIQVJB Q1RFUikgeworCSAgICBpbmZvLT5tYXBbaV0gPSAtMTsKKwl9IGVsc2UgeworCSAgICBpbmZvLT5t YXBbaV0gPSBjOworCX07CisgICAgfTsKKyAgICAKKyAgICBpbmZvLT5kYXRhID0gTlVMTDsKKyAg ICBpbmZvLT5jb252ZXJ0ID0gTlVMTDsKKyAgICBpbmZvLT5yZWxlYXNlID0gTlVMTDsKKyAgICBy ZXN1bHQ9MTsKKyAgICAKKyAgICBQeV9ERUNSRUYoX3Vfc3RyaW5nKTsKKyAgICByZXR1cm4gcmVz dWx0OworfQorCisjZW5kaWYKKwogc3RhdGljIHhtbHBhcnNlb2JqZWN0ICoKIG5ld3htbHBhcnNl b2JqZWN0KGNoYXIgKmVuY29kaW5nLCBjaGFyICpuYW1lc3BhY2Vfc2VwYXJhdG9yKQogewpAQCAt NjU4LDYgKzcxMywxMCBAQAogICAgICAgICByZXR1cm4gTlVMTDsKICAgICB9CiAgICAgWE1MX1Nl dFVzZXJEYXRhKHNlbGYtPml0c2VsZiwgKHZvaWQgKilzZWxmKTsKKyNpZiBQWV9NQUpPUl9WRVJT SU9OID09IDEgJiYgUFlfTUlOT1JfVkVSU0lPTiA8IDYKKyNlbHNlCisgICAgWE1MX1NldFVua25v d25FbmNvZGluZ0hhbmRsZXIoc2VsZi0+aXRzZWxmLCAoWE1MX1Vua25vd25FbmNvZGluZ0hhbmRs ZXIpIFB5VW5rbm93bkVuY29kaW5nSGFuZGxlciwgTlVMTCk7CisjZW5kaWYKIAogICAgIGZvcihp ID0gMDsgaGFuZGxlcl9pbmZvW2ldLm5hbWUgIT0gTlVMTDsgaSsrKQogICAgICAgICAvKiBkbyBu b3RoaW5nICovOwpAQCAtODIxLDcgKzg4MCw2IEBACiAvKiBFbmQgb2YgY29kZSBmb3IgeG1scGFy c2VyIG9iamVjdHMgKi8KIC8qIC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tICovCiAKLQogc3RhdGljIGNoYXIgcHlleHBhdF9QYXJzZXJDcmVh dGVfX2RvY19fW10gPQogIlBhcnNlckNyZWF0ZShbZW5jb2RpbmdbLCBuYW1lc3BhY2Vfc2VwYXJh dG9yXV0pIC0+IHBhcnNlclxuXAogUmV0dXJuIGEgbmV3IFhNTCBwYXJzZXIgb2JqZWN0LiI7CkBA IC05MzcsNiArOTk1LDEwIEBACiAgICAgUHlNb2R1bGVfQWRkT2JqZWN0KG0sICJfX3ZlcnNpb25f XyIsCiAgICAgICAgICAgICAgICAgICAgICAgIFB5U3RyaW5nX0Zyb21TdHJpbmdBbmRTaXplKHJl disxMSwgc3RybGVuKHJldisxMSktMikpOwogCisjaWYgUFlfTUFKT1JfVkVSU0lPTiA9PSAxICYm IFBZX01JTk9SX1ZFUlNJT04gPCA2CisjZWxzZQorICAgIGluaXRfdGVtcGxhdGVfYnVmZmVyKCk7 CisjZW5kaWYKICAgICAvKiBYWFggV2hlbiBFeHBhdCBzdXBwb3J0cyBzb21lIHdheSBvZiBmaWd1 
cmluZyBvdXQgaG93IGl0IHdhcwogICAgICAgIGNvbXBpbGVkLCB0aGlzIHNob3VsZCBjaGVjayBh bmQgc2V0IG5hdGl2ZV9lbmNvZGluZyAKICAgICAgICBhcHByb3ByaWF0ZWx5LiAK --Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30 Content-Type: application/octet-stream; name="enc_test.xml" Content-Disposition: attachment; filename="enc_test.xml" Content-Transfer-Encoding: base64 PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0ia29pOC1yIj8+Cjx0YWcgbmFtZT0i6c3RIiB2 YWx1ZT0i+s7B3sXOycUiPgrhINzUzyDX08Ugz9PUwczYztnFINPJzdfPzNkgwsXaIChcImUpOgoK ysPVy8XOx9vd2sjfxtnXwdDSz8zE1tHe083J1NjCwOrj9evl7uf7/fro/+b59+Hw8u/s5Pb88f7z 7en0+OLgCjwvdGFnPg== --Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30 Content-Type: application/octet-stream; name="test_encodings.py" Content-Disposition: attachment; filename="test_encodings.py" Content-Transfer-Encoding: base64 IyEvdXNyL2Jpbi9lbnYgcHl0aG9uCgoiIiIKVGhpcyB3aWxsIHNob3cgcnVzc2lhbiB0ZXh0IGlu IGtvaTgtciBlbmNvZGluZy4KIiIiCgpmcm9tIHhtbC5wYXJzZXJzIGltcG9ydCBleHBhdA0KaW1w b3J0IHN0cmluZw0KDQpjbGFzcyBYTUxUcmVlOg0KCWRlZiBfX2luaXRfXyhzZWxmKToNCgkJcGFz cw0KDQoJIyBEZWZpbmUgYSBoYW5kbGVyIGZvciBzdGFydCBlbGVtZW50IGV2ZW50cw0KCWRlZiBT dGFydEVsZW1lbnQoc2VsZiwgbmFtZSwgYXR0cnMgKToNCgkJI25hbWUgPSBuYW1lLmVuY29kZSgp DQoJCXByaW50ICI8IiwgcmVwcihuYW1lKSwgIj4iDQoJCXByaW50ICJhdHRyIG5hbWU6IiwgYXR0 cnMuZ2V0KCJuYW1lIix1IiIpLmVuY29kZSgia29pOC1yIikKCQlwcmludCAiYXR0ciB2YWx1ZToi LCBhdHRycy5nZXQoInZhbHVlIix1IiIpLmVuY29kZSgia29pOC1yIikKDQoJZGVmIEVuZEVsZW1l bnQoc2VsZiwgIG5hbWUgKToNCgkJcHJpbnQgIjwvIiwgcmVwcihuYW1lKSwgIj4iDQoNCglkZWYg Q2hhcmFjdGVyRGF0YShzZWxmLCBkYXRhICk6DQoJCWlmIHN0cmluZy5zdHJpcChkYXRhKToNCgkJ CWRhdGEgPSBkYXRhLmVuY29kZSgia29pOC1yIikNCgkJCXByaW50IGRhdGENCg0KDQoJZGVmIExv YWRUcmVlKHNlbGYsIGZpbGVuYW1lKToNCgkJIyBDcmVhdGUgYSBwYXJzZXINCgkJUGFyc2VyID0g ZXhwYXQuUGFyc2VyQ3JlYXRlKCkNCg0KCQkjIFRlbGwgdGhlIHBhcnNlciB3aGF0IHRoZSBzdGFy dCBlbGVtZW50IGhhbmRsZXIgaXMNCgkJUGFyc2VyLlN0YXJ0RWxlbWVudEhhbmRsZXIgPSBzZWxm LlN0YXJ0RWxlbWVudA0KCQlQYXJzZXIuRW5kRWxlbWVudEhhbmRsZXIgPSBzZWxmLkVuZEVsZW1l 
bnQNCgkJUGFyc2VyLkNoYXJhY3RlckRhdGFIYW5kbGVyID0gc2VsZi5DaGFyYWN0ZXJEYXRhDQoN CgkJIyBQYXJzZSB0aGUgWE1MIEZpbGUNCgkJUGFyc2VyU3RhdHVzID0gUGFyc2VyLlBhcnNlKG9w ZW4oZmlsZW5hbWUsJ3InKS5yZWFkKCksIDEpDQoNCg0KZGVmIHJ1blRlc3QoKToNCgl3aW4gPSBY TUxUcmVlKCkNCgl3aW4uTG9hZFRyZWUoImVuY190ZXN0LnhtbCIpDQoJcmV0dXJuIHdpbg0KDQpy dW5UZXN0KCkK --Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30-- From martin@loewis.home.cs.tu-berlin.de Fri Dec 22 13:31:59 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 22 Dec 2000 14:31:59 +0100 Subject: [XML-SIG] Oddities In-Reply-To: <200012220526.WAA01901@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012220526.WAA01901@localhost.localdomain> Message-ID: <200012221331.OAA00885@loewis.home.cs.tu-berlin.de> > Firstly, PyXML CVS wouldn't compile pyexpat because it couldn't find > extensions/expat/xmlparse/hashtable.c. I just commented this out of > setup.py and it compiles fine now. Oops. I had an uncommitted fix for that in my setup.py... > Secondly, it tries to place the docs at > > /usr/local/xmldoc > > Tsk. tsk. That should be > > /usr/local/doc/PyXML- > > It looks, however, as if someone went to some length to avoid the standard > way, so I'd like to know why before fixing it. By default, setup.py should not install the doc files at all - what would be the standard way to have them installed there? Again, it was a checkin error that it is installed - only that the doc2xmldoc=1 line should *not* have been committed :-( The intent here is that the doc files go into the RPM as %doc, are installed as xmldoc on Windows, and are not touched otherwise. Regards, Martin From uche.ogbuji@fourthought.com Fri Dec 22 15:44:37 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 22 Dec 2000 08:44:37 -0700 Subject: [XML-SIG] Oddities In-Reply-To: Message from "Martin v. Loewis" of "Fri, 22 Dec 2000 14:31:59 +0100."
<200012221331.OAA00885@loewis.home.cs.tu-berlin.de> Message-ID: <200012221544.IAA03136@localhost.localdomain> > > Firstly, PyXML CVS wouldn't compile pyexpat because it couldn't find > > extensions/expat/xmlparse/hashtable.c. I just commented this out of > > setup.py and it compiles fine now. > > Oops. I had an uncommitted fix for that in my setup.py... Ah. Never mind, though. I checked it in with my pyexpat changes. > > Secondly, it tries to place the docs at > > > > /usr/local/xmldoc > > > > Tsk. tsk. That should be > > > > /usr/local/doc/PyXML- > > > > It looks, however, as if someone went to some length to avoid the standard > > way, so I'd like to know why before fixing it. > > By default, setup.py should not install the doc files at all - what > would be the standard way to have them installed there? By "standard" I mean Linux standard. I'm not sure if Solaris, etc. place docs at the same spot. But for Linux, vendor-packaged docs go in /usr/doc/- and third-party package docs to /usr/local/doc/- Actually, it looks as if Red Hat has started moving to the latter location for all docs. Anyway, every Python/distutils package I've installed follows this convention and places its docs in /usr/local/doc/ -. As does 4Suite, of course. I think the default should be to install docs. They are an important part of the package. Even better if people know exactly where to look for them. > Again, it was a > checkin error that it is installed - only that the doc2xmldoc=1 line > should *not* have been committed :-( > > The intent here is that the doc files go into the RPM as %doc, are > installed as xmldoc on Windows, and are not touched otherwise. Hmm. I think they should be installed by setup.py as well. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste.
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From ben@thoughtstream.org Fri Dec 22 16:00:44 2000 From: ben@thoughtstream.org (Ben Darnell) Date: Fri, 22 Dec 2000 11:00:44 -0500 Subject: [XML-SIG] Oddities In-Reply-To: <200012221544.IAA03136@localhost.localdomain>; from uche.ogbuji@fourthought.com on Fri, Dec 22, 2000 at 08:44:37AM -0700 References: <200012221544.IAA03136@localhost.localdomain> Message-ID: <20001222110044.B2227@unity.ncsu.edu> On Fri, Dec 22, 2000 at 08:44:37AM -0700, uche.ogbuji@fourthought.com wrote: > By "standard" I mean Linux standard. I'm not sure if Solaris, etc. place docs > at the same spot. But for Linux, vendor-packaged docs go in > > /usr/doc/- > > and third-party package docs to > > /usr/local/doc/- By "standard" you mean Red Hat standard. Debian, for instance, uses /usr/share/doc/ -Ben -- Ben Darnell ben@thoughtstream.org http://thoughtstream.org Finger bgdarnel@debian.org for PGP/GPG key 1024D/1F06E509 From uche.ogbuji@fourthought.com Fri Dec 22 16:13:54 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 22 Dec 2000 09:13:54 -0700 Subject: [XML-SIG] Oddities In-Reply-To: Message from Ben Darnell of "Fri, 22 Dec 2000 11:00:44 EST." <20001222110044.B2227@unity.ncsu.edu> Message-ID: <200012221613.JAA03272@localhost.localdomain> > On Fri, Dec 22, 2000 at 08:44:37AM -0700, uche.ogbuji@fourthought.com wrote: > > By "standard" I mean Linux standard. I'm not sure if Solaris, etc. place docs > > at the same spot. But for Linux, vendor-packaged docs go in > > > > /usr/doc/- > > > > and third-party package docs to > > > > /usr/local/doc/- > > By "standard" you mean Red Hat standard. Debian, for instance, uses > /usr/share/doc/ Really? I thought /usr/local/doc was Linux Standard Base. I don't have a ref, mind, I was commenting off-head. Also, many other distros besides Red Hat do it this way. 
Nevertheless, I still think docs should be installed with every package. Do you have any idea for an algorithm for package documentation location? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Fri Dec 22 16:06:13 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 22 Dec 2000 17:06:13 +0100 Subject: [XML-SIG] Oddities In-Reply-To: <200012221544.IAA03136@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012221544.IAA03136@localhost.localdomain> Message-ID: <200012221606.RAA01619@loewis.home.cs.tu-berlin.de> > I think the default should be to install docs. They are an > important part of the package. That poses an interesting problem for distutils. Karl Eichwalder from SuSE requested that the PyXML RPM should use the %doc directive for declaring documentation files. That is easy enough to do; rpm will then, on installation, choose a location for these files (typically /usr/doc or /usr/share/doc). *That*, AFAIK, is the official way. Now, if I also install them, then bdist_rpm will include them twice, and they will also get installed twice. That is undesirable. Regards, Martin From teg@redhat.com Fri Dec 22 23:34:50 2000 From: teg@redhat.com (Trond Eivind Glomsrød) Date: 22 Dec 2000 18:34:50 -0500 Subject: [XML-SIG] Oddities In-Reply-To: <200012221544.IAA03136@localhost.localdomain> References: <200012221544.IAA03136@localhost.localdomain> Message-ID: uche.ogbuji@fourthought.com writes: > By "standard" I mean Linux standard. I'm not sure if Solaris, etc. place docs > at the same spot.
But for Linux, vendor-packaged docs go in > > /usr/doc/- It's /usr/share/doc/- now (FHS) > /usr/local/doc/- > > Actually, it looks as if Red Hat has started moving to the latter location for > all docs. No, we don't touch /usr/local at all. -- Trond Eivind Glomsrød Red Hat, Inc. From paulp@ActiveState.com Sat Dec 23 00:13:27 2000 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 22 Dec 2000 16:13:27 -0800 Subject: [XML-SIG] New stuff on w3.org References: Message-ID: <3A43EE27.BBC9D32A@ActiveState.com> Alexandre Fayolle wrote: > > Since I believe not everybody on this list monitors the W3C website > closely (I, for one, do not), ... > On Dec. 19th, XHTML Basic became a Recommendation. > On Dec. 20th, XLink and XML Base became Proposed Recommendations. An even more interesting development is that a draft version of XSLT now has a formal mechanism for embedding other scripting languages. An example is at the bottom of http://www.w3.org/TR/xslt11 Paul Prescod function upper(n) { return n.toUpperCase(); } function lower(n) { return n.toLowerCase(); } function iff(arg1, arg2, arg3) { if (arg1) { return arg2; } else { return arg3; } } From div@commerceflow.com Sat Dec 23 01:50:20 2000 From: div@commerceflow.com (Div Shekhar) Date: Fri, 22 Dec 2000 17:50:20 -0800 Subject: [XML-SIG] how to clean up parser without causing parsing? Message-ID: <3A4404DC.D76CB60D@commerceflow.com> Hi! I'm using xmlproc through the SAX interface. (PyXML 0.5.5.1/Python 1.6) I have this code to parse a file: ! p = XMLParserFactory.make_parser( 'xml.sax.drivers.drv_xmlproc' ) ! sp = MyHandler() ! p.setDocumentHandler( sp ) # other handlers left out for simplicity ! try: ! p.parseFile( file ) ! finally: # even if an exception is raised ! p.close() # call close() to free memory My handler does some validation, and raises an exception when it's not happy with the XML that comes from the file.
The close() causes remaining data to be parsed, which results in more SAX callbacks coming to my handler, which then throws new, very confusing exceptions. To work around this, I replaced the 'p.close()' with the close() implementation in drv_xmlproc, and moved one line above the 'finally': ! try: ! p.parseFile( file ) ! p.parser.close() # \ cut & paste from ! finally: ! p.parser.deref() # | drv_xmlproc.close() ! p.err_handler = p.dtd_handler = None # | ! p.doc_handler = p.parser = None # | ! p.locator = p.ent_handler = None # / I thought of the following alternatives: 1. have my handler set a flag, and then ignore further calls. 2. point the parser to a do-nothing handler before calling close() 3. do the following: p.reset() p.close() But they're not as efficient. What should I be doing? Sincerely, Div (P.S. Any chance of ExtendedParser adding a free() method? :) From uche.ogbuji@fourthought.com Sat Dec 23 04:07:51 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 22 Dec 2000 21:07:51 -0700 Subject: [XML-SIG] Oddities In-Reply-To: Message from teg@redhat.com (Trond Eivind Glomsrød) of "22 Dec 2000 18:34:50 EST." Message-ID: <200012230407.VAA01167@localhost.localdomain> > uche.ogbuji@fourthought.com writes: > > > By "standard" I mean Linux standard. I'm not sure if Solaris, etc. place docs > > at the same spot. But for Linux, vendor-packaged docs go in > > > > /usr/doc/- > > It's /usr/share/doc/- now (FHS) > > > /usr/local/doc/- > > > > Actually, it looks as if Red Hat has started moving to the latter location for > > all docs. > > No, we don't touch /usr/local at all. Good to have an authority on the subject. Thanks. So it looks like /usr/share/doc/- across the board now. Does this settle the matter for Linux? Any thoughts about other Unixen? -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc.
http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sat Dec 23 04:10:17 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 22 Dec 2000 21:10:17 -0700 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: Message from Paul Prescod of "Fri, 22 Dec 2000 16:13:27 PST." <3A43EE27.BBC9D32A@ActiveState.com> Message-ID: <200012230410.VAA01180@localhost.localdomain> > Alexandre Fayolle wrote: > > > > Since I believe not everybody on this list monitors the W3C website > > closely (I, for one, do not), ... > > > On Dec. 19th, XHTML Basic became a Recommendation. > > On Dec. 20th, XLink and XML Base became Proposed Recommendations. > > An even more interesting development is that a draft version of XSLT now > has a formal mechanism for embedding other scripting languages. An > example is at the bottom Yes. This was one of my more depressing discoveries of the month. They couldn't just provide a node-set function, maybe some grouping primitives, and be done with XSLT 1.1. Sigh. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 11:23:14 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 23 Dec 2000 12:23:14 +0100 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: (message from Alexandre Fayolle on Wed, 13 Dec 2000 18:45:25 +0100 (CET)) References: Message-ID: <200012231123.MAA01132@loewis.home.cs.tu-berlin.de> > > To Ft/Dom/__init__.py and expected everything to break, but all > > was well. It seems that at least Python 2.0 is clever when the > > same import can be made as a package and an object.
Is this also > > the case with Python 1.5.2? > > > > I tried that with python 1.5.2 (adding an empty Node class to > > xml/dom/__init__.py) and it looks like it's fine too. Actually, there is a problem. If you do "import xml.dom.Node", then you'll lose the class from __init__. Please see the attached example. So I think the change of adding xml.dom.Node needs to be reverted somehow. Regards Martin #!/bin/sh # This is a shell archive (produced by GNU sharutils 4.2). # To extract the files from this archive, save it to some FILE, remove # everything before the `!/bin/sh' line above, then type `sh FILE'. # # Made on 2000-12-23 12:22 CET by . # Source directory was `/home/martin/tmp/x'. # # Existing files will *not* be overwritten unless `-c' is specified. # # This shar contains: # length mode name # ------ ---------- ------------------------------------------ # 19 -rw-r--r-- pack/__init__.py # 19 -rw-r--r-- pack/A.py # 61 -rw-r--r-- testing.py # save_IFS="${IFS}" IFS="${IFS}:" gettext_dir=FAILED locale_dir=FAILED first_param="$1" for dir in $PATH do if test "$gettext_dir" = FAILED && test -f $dir/gettext \ && ($dir/gettext --version >/dev/null 2>&1) then set `$dir/gettext --version 2>&1` if test "$3" = GNU then gettext_dir=$dir fi fi if test "$locale_dir" = FAILED && test -f $dir/shar \ && ($dir/shar --print-text-domain-dir >/dev/null 2>&1) then locale_dir=`$dir/shar --print-text-domain-dir` fi done IFS="$save_IFS" if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED then echo=echo else TEXTDOMAINDIR=$locale_dir export TEXTDOMAINDIR TEXTDOMAIN=sharutils export TEXTDOMAIN echo="$gettext_dir/gettext -s" fi touch -am 1231235999 $$.touch >/dev/null 2>&1 if test ! -f 1231235999 && test -f $$.touch; then shar_touch=touch else shar_touch=: echo $echo 'WARNING: not restoring timestamps. Consider getting and' $echo "installing GNU \`touch', distributed in GNU File Utilities..."
echo fi rm -f 1231235999 $$.touch # if mkdir _sh01060; then $echo 'x -' 'creating lock directory' else $echo 'failed to create lock directory' exit 1 fi # ============= pack/__init__.py ============== if test ! -d 'pack'; then $echo 'x -' 'creating directory' 'pack' mkdir 'pack' fi if test -f 'pack/__init__.py' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'pack/__init__.py' '(file already exists)' else $echo 'x -' extracting 'pack/__init__.py' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'pack/__init__.py' && class A: X val = 3 SHAR_EOF $shar_touch -am 12231215100 'pack/__init__.py' && chmod 0644 'pack/__init__.py' || $echo 'restore of' 'pack/__init__.py' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'pack/__init__.py:' 'MD5 check failed' d0e22baa34ce648d02a5985bb626ca97 pack/__init__.py SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'pack/__init__.py'`" test 19 -eq "$shar_count" || $echo 'pack/__init__.py:' 'original size' '19,' 'current size' "$shar_count!" fi fi # ============= pack/A.py ============== if test -f 'pack/A.py' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'pack/A.py' '(file already exists)' else $echo 'x -' extracting 'pack/A.py' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'pack/A.py' && class B: X val = 4 SHAR_EOF $shar_touch -am 12231215100 'pack/A.py' && chmod 0644 'pack/A.py' || $echo 'restore of' 'pack/A.py' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'pack/A.py:' 'MD5 check failed' 7c0bf0114ca239435403d33f3c475cb3 pack/A.py SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'pack/A.py'`" test 19 -eq "$shar_count" || $echo 'pack/A.py:' 'original size' '19,' 'current size' "$shar_count!" 
fi fi # ============= testing.py ============== if test -f 'testing.py' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'testing.py' '(file already exists)' else $echo 'x -' extracting 'testing.py' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'testing.py' && import pack print pack.A.val import pack.A print pack.A.val X SHAR_EOF $shar_touch -am 12231216100 'testing.py' && chmod 0644 'testing.py' || $echo 'restore of' 'testing.py' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'testing.py:' 'MD5 check failed' 045d5a097b0968507fce45f10d00c5b2 testing.py SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'testing.py'`" test 61 -eq "$shar_count" || $echo 'testing.py:' 'original size' '61,' 'current size' "$shar_count!" fi fi rm -fr _sh01060 exit 0 From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 11:44:31 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 23 Dec 2000 12:44:31 +0100 Subject: [XML-SIG] how to clean up parser without causing parsing? In-Reply-To: <3A4404DC.D76CB60D@commerceflow.com> (message from Div Shekhar on Fri, 22 Dec 2000 17:50:20 -0800) References: <3A4404DC.D76CB60D@commerceflow.com> Message-ID: <200012231144.MAA01234@loewis.home.cs.tu-berlin.de> > I thought of the following alternatives: > > 1. have my handler set a flag, and then ignore further calls. > 2. point the parser to a do nothing handler before calling close() > 3. doing the following: p.reset() p.close() > > But they're not as efficient. What should I be doing? I suggest using PyXML 0.6, and the SAX2 xmlproc driver. AFAICT, it is safe to just drop the reference to the parser (certainly in Python 2.0, where potential cycles are collected). The xmlproc driver is not incremental, so the SAX2 version releases the underlying parser at the end of parse().
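The advice that a SAX2 reader needs no explicit close() after parse() and can simply be dropped can be sketched with the SAX2 API as it survives in today's standard-library xml.sax. This is a minimal sketch under stated assumptions: Python 3, the stdlib expat driver rather than xmlproc, and a hypothetical ValidatingHandler standing in for Div's MyHandler. The handler's own exception propagates out of parse() unchanged, and the handler sees no elements after the failing one.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ValidatingHandler(ContentHandler):
    """Hypothetical stand-in for Div's MyHandler: refuses <bad> elements."""

    def __init__(self):
        super().__init__()
        self.seen = []  # element names accepted before any failure

    def startElement(self, name, attrs):
        if name == "bad":
            # Application-level validation failure, raised from a callback.
            raise xml.sax.SAXException("unexpected <bad> element")
        self.seen.append(name)


def parse_and_release(data):
    """Parse bytes `data`; return (elements seen, error message or None)."""
    handler = ValidatingHandler()
    reader = xml.sax.make_parser()      # stdlib SAX2 reader (expat driver)
    reader.setContentHandler(handler)
    error = None
    try:
        reader.parse(io.BytesIO(data))  # the handler's exception propagates out
    except xml.sax.SAXException as exc:
        error = str(exc)
    # Deliberately no reader.close(): close() asks the driver to finish the
    # document, which is the extra work Div wanted to avoid. Dropping our
    # reference is enough; the garbage collector reclaims the reader.
    del reader
    return handler.seen, error


if __name__ == "__main__":
    print(parse_and_release(b"<doc><ok/><bad/><never/></doc>"))
    # → (['doc', 'ok'], 'unexpected <bad> element')
```

The design choice mirrors the thread: instead of tearing down driver internals by hand, let parse() end (normally or by exception) and release the reader object.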
If you release the reader object, that will in turn release the references to your handlers. So in short, you should write

! p = XMLParserFactory.make_parser( 'xml.sax.drivers.drv_xmlproc' )
! sp = MyHandler()
! p.setDocumentHandler( sp )     # other handlers left out for simplicity
! p.parseFile( file )
! p = None

> (P.S. Any chance of ExtendedParser adding a free() method? :) Since it is an experimental interface, why not? Please submit patches to sourceforge.net/projects/pyxml. In Python, explicit memory management is normally not necessary. So these methods are typically called close() or release(). Please note that adding the operation to the interface won't give you anything; you'd also have to modify the existing parsers. I personally won't change any of the existing SAX1 drivers; efforts should be put into the SAX2 drivers, IMO. Also, PyXML 0.5 is no longer maintained, so I'd apply any patches I get only to 0.6.x. Regards, Martin From akuchlin@mems-exchange.org Sat Dec 23 13:57:53 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Sat, 23 Dec 2000 08:57:53 -0500 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: <200012230410.VAA01180@localhost.localdomain>; from uche.ogbuji@fourthought.com on Fri, Dec 22, 2000 at 09:10:17PM -0700 References: <200012230410.VAA01180@localhost.localdomain> Message-ID: <20001223085753.A11534@newcnri.cnri.reston.va.us> On Fri, Dec 22, 2000 at 09:10:17PM -0700, uche.ogbuji@fourthought.com wrote: >They couldn't just provide a node-set function, maybe some grouping >primitives, and be done with XSLT 1.1. Lots of people on W3C mailing lists do seem hell-bent on giving the world another example of rampant overcomplexity to put on the shelf next to the OSI protocols. (For me, it was XSchema: two documents specify it, and they're around 400K and 600K of HTML. Don't hold your breath waiting for a Python implementation...)
--amk From uche.ogbuji@fourthought.com Sat Dec 23 16:45:32 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sat, 23 Dec 2000 09:45:32 -0700 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: Message from Andrew Kuchling of "Sat, 23 Dec 2000 08:57:53 EST." <20001223085753.A11534@newcnri.cnri.reston.va.us> Message-ID: <200012231645.JAA02928@localhost.localdomain> > On Fri, Dec 22, 2000 at 09:10:17PM -0700, uche.ogbuji@fourthought.com wrote: > >They couldn't just provide a node-set function, maybe some grouping > >primitives, and be done with XSLT 1.1. > > Lots of people on W3C mailing lists do seem hell-bent on giving the > world another example of rampant overcomplexity to put on the shelf > next to the OSI protocols. (For me, it was XSchema: two documents > specify it, and they're around 400K and 600K of HTML. Don't hold your > breath waiting for a Python implementation...) Yeah. I just had my moment with XSchema: while wrestling with SOAP. I'd always been familiar with them, and that's why I had always shunned them for Schematron, but now that I've got even more close and personal with XSchema, I think I can say I've never seen a worse example of an overwrought specification since ANSI STD C++. I know that a Python version of XSchema is unlikely to come from this quarter. We're happily chugging away with Schematron. Now we use it through XSLT, but we might consider writing a pure Python engine for it (a Perl engine was recently announced). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sat Dec 23 16:49:37 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sat, 23 Dec 2000 09:49:37 -0700 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: Message from "Martin v. 
Loewis" of "Sat, 23 Dec 2000 12:23:14 +0100." <200012231123.MAA01132@loewis.home.cs.tu-berlin.de> Message-ID: <200012231649.JAA02948@localhost.localdomain> > > > To Ft/Dom/__init__.py and expected everything to break, but all > > > was well. It seems that at least Python 2.0 is clever when the > > > same import can be made as a package and an object. Is this also > > > the case with Python 1.5.2? > > > > I tried that with python 1.5.2 (adding an empty Node class to > > xml/dom/__init__.py) and it looks like it's fine too. > > Actually, there is a problem. If you do "import xml.dom.Node", then > you'll lose the class from __init__. Please see the attached example. OK, not so fast, mate. Do you really think we'll let yer out so easily after yer talked us into this? Seriously, after a quick survey of my code, the only place I import Node is in order to get at the constants. I think we can deal with the problem you mentioned by re-naming xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all the internal 4DOM imports accordingly. This should break little existing code and it would keep the benefit of being able to share the constants and any other material we need for normalization across the DOM implementations. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fdrake@acm.org Sat Dec 23 16:50:49 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 23 Dec 2000 11:50:49 -0500 (EST) Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <200012231649.JAA02948@localhost.localdomain> References: <200012231123.MAA01132@loewis.home.cs.tu-berlin.de> <200012231649.JAA02948@localhost.localdomain> Message-ID: <14916.55273.8789.573578@cj42289-a.reston1.va.home.com> uche.ogbuji@fourthought.com writes: > I think we can deal with the problem you mentioned by re-naming > xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all > the internal 4DOM imports accordingly. I think this is the best solution. > This should break little existing code and it would keep the > benefit of being able to share the constants and any other material > we need for normalization across the DOM implementations. Especially since this is more important than being able to access the implementation class via import! There's a factory method on the Document object, so there's no need to import the class from outside the DOM implementation. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 21:10:39 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 23 Dec 2000 22:10:39 +0100 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: <20001223085753.A11534@newcnri.cnri.reston.va.us> (message from Andrew Kuchling on Sat, 23 Dec 2000 08:57:53 -0500) References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> Message-ID: <200012232110.WAA00730@loewis.home.cs.tu-berlin.de> > Lots of people on W3C mailing lists do seem hell-bent on giving the > world another example of rampant overcomplexity to put on the shelf > next to the OSI protocols. My feelings exactly. That's what you get when you try to extend an architecture to do things it was not supposed to do... Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 21:17:07 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 23 Dec 2000 22:17:07 +0100 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: <200012231649.JAA02948@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012231649.JAA02948@localhost.localdomain> Message-ID: <200012232117.WAA00775@loewis.home.cs.tu-berlin.de> > I think we can deal with the problem you mentioned by re-naming > xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all > the internal 4DOM imports accordingly. That would solve the problem as well, so I'm all for it. Any proposal for a new module name? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 21:45:18 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 23 Dec 2000 22:45:18 +0100 Subject: [XML-SIG] Python>=1.6 SIMPLE encoding support patch for pyexpat In-Reply-To: <200012220816.QAA08820@monster.icc.ru> (message from Evgeny Cherkashin on Fri, 22 Dec 2000 17:19:11 +0800) References: <200012220816.QAA08820@monster.icc.ru> Message-ID: <200012232145.WAA01155@loewis.home.cs.tu-berlin.de> > Please find patch to support python encodings by pyexpat. > Is it possible to include it in next release of PyXML? Dear Evgeni, I always wanted to have that feature in pyexpat, so I'm glad you wrote the code. I've applied it to the CVS tree, so it will appear in the next release. Thanks for contributing, Martin From fdrake@acm.org Sat Dec 23 22:10:24 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 23 Dec 2000 17:10:24 -0500 (EST) Subject: [XML-SIG] New stuff on w3.org In-Reply-To: <200012232110.WAA00730@loewis.home.cs.tu-berlin.de> References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <200012232110.WAA00730@loewis.home.cs.tu-berlin.de> Message-ID: <14917.8912.749393.373487@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > My feelings exactly. That's what you get when you try to extend an > archtitecture to do things it was not supposed to do... 
I'm afraid the DOM isn't faring much better. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From seven.nine@gte.net Sat Dec 23 22:23:50 2000 From: seven.nine@gte.net (Chris Jones) Date: Sat, 23 Dec 2000 14:23:50 -0800 Subject: [XML-SIG] New stuff on w3.org References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> Message-ID: <3A4525F6.8010906@gte.net> Forgive the abrupt de-cloak... but this is nice to hear... I'm diving quite deeply into implementing Python with PyXML, and was really wondering what you (the creators) think the core aspects of PyXML are-- I'm really banking on it, think it's a great API, and would like to know where you're headed. When any organization is going to dive deep into a technology, questions (and FUD) inevitably arise about the longevity and direction of the technologies you're using. I agree that complexity for complexity's sake is the fastest way to kill an API, protocol, or standard. Anyone care to speak up about what they think the core functionality of PyXML should be for the long-term (in this world I think that's about 6 to 9 months)? Thanks in advance, Chris Jones Consultant Andrew Kuchling wrote: > On Fri, Dec 22, 2000 at 09:10:17PM -0700, uche.ogbuji@fourthought.com wrote: > >> They couldn't just provide a node-set function, maybe some grouping >> primitives, and be done with XSLT 1.1. > > > Lots of people on W3C mailing lists do seem hell-bent on giving the > world another example of rampant overcomplexity to put on the shelf > next to the OSI protocols. (For me, it was XSchema: two documents > specify it, and they're around 400K and 600K of HTML. Don't hold your > breath waiting for a Python implementation...)
> > --amk > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig > > From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 22:57:28 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 23 Dec 2000 23:57:28 +0100 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: <14917.8912.749393.373487@cj42289-a.reston1.va.home.com> (fdrake@acm.org) References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <200012232110.WAA00730@loewis.home.cs.tu-berlin.de> <14917.8912.749393.373487@cj42289-a.reston1.va.home.com> Message-ID: <200012232257.XAA01573@loewis.home.cs.tu-berlin.de> > > My feelings exactly. That's what you get when you try to extend an > > architecture to do things it was not supposed to do... > > I'm afraid the DOM isn't faring much better. ;-( It just occurred to me that this is an application of the Peter principle. A good technology results in users asking for more, so it is extended and extended until it reaches its level of incompetence. Martin From martin@loewis.home.cs.tu-berlin.de Sat Dec 23 22:56:29 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 23 Dec 2000 23:56:29 +0100 Subject: [XML-SIG] Better pyexpat backtraces Message-ID: <200012232256.XAA01648@loewis.home.cs.tu-berlin.de> Since a number of people have run into the trap of thinking that Parse is called with a bad argument number, I just checked-in a patch to pyexpat that adds an artificial frame object on the stack. With that, if you pass a DocumentHandler in place of a ContentHandler, you now get a back-trace that reads Traceback (most recent call last): File "a.py", line 48, in ?
parser.parse( comic_xml ) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse xmlreader.IncrementalParser.parse(self, source) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse self.feed(buffer) File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed self._parser.Parse(data, isFinal) File "pyexpat.c", line 370, in CharacterData TypeError: not enough arguments to characters(); expected 4, got 2 Normally, you would not get a stack frame that points to pyexpat.c; please let me know what you think. The "to characters()" part is not my doing; that is a Python 2.1 feature. Regards, Martin From uche.ogbuji@fourthought.com Sun Dec 24 02:08:45 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sat, 23 Dec 2000 19:08:45 -0700 Subject: [XML-SIG] Specializing DOM exceptions In-Reply-To: Message from "Martin v. Loewis" of "Sat, 23 Dec 2000 22:17:07 +0100." <200012232117.WAA00775@loewis.home.cs.tu-berlin.de> Message-ID: <200012240208.TAA01782@localhost.localdomain> > > I think we can deal with the problem you mentioned by re-naming > > xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all > > the internal 4DOM imports accordingly. > > That would solve the problem as well, so I'm all for it. Any proposal > for a new module name? "xml/dom/FtNode.py", since a module name can't start with "4". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Sun Dec 24 02:21:40 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sat, 23 Dec 2000 19:21:40 -0700 Subject: [XML-SIG] Python>=1.6 SIMPLE encoding support patch for pyexpat In-Reply-To: Message from "Martin v. 
Loewis" of "Sat, 23 Dec 2000 22:45:18 +0100." <200012232145.WAA01155@loewis.home.cs.tu-berlin.de> Message-ID: <200012240221.TAA01835@localhost.localdomain> > > Please find patch to support python encodings by pyexpat. > > Is it possible to include it in next release of PyXML? > > Dear Evgeni, > > I always wanted to have that feature in pyexpat, so I'm glad you wrote > the code. I've applied it to the CVS tree, so it will appear in the > next release. > > Thanks for contributing, > Martin Seconded. Now folks can process XML with all the great unicode codecs folks have been contributing without needing to go through Python to convert to Unicode first. Much appreciated. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tpassin@home.com Sun Dec 24 03:04:13 2000 From: tpassin@home.com (Thomas B. Passin) Date: Sat, 23 Dec 2000 22:04:13 -0500 Subject: [XML-SIG] Holiday Best Wishes References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <3A4525F6.8010906@gte.net> Message-ID: <001d01c06d56$320b72c0$7cac1218@reston1.va.home.com> Happy Holidays to everyone on the list. It's been a privilege to share your knowledge and contributions this year. Cheers, Tom P From ken@bitsko.slc.ut.us Tue Dec 26 17:49:41 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 26 Dec 2000 11:49:41 -0600 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: "Martin v. Loewis"'s message of "Fri, 15 Dec 2000 22:06:23 +0100" References: <200012141527.IAA18240@localhost.localdomain> <3A3A2F3C.8AE8A27E@FourThought.com> <200012152106.WAA00918@loewis.home.cs.tu-berlin.de> Message-ID: "Martin v. 
Loewis" writes: > > At the plug-in API level, I'd be interested in something more at > > the "location path" level, possibly an array of steps, each step > > with axis, node test, and list of predicates. > > Yes, that would be a reasonable XPath API. How do you like the > 4Suite ParsedLocationPath class, and corresponding structures? Likely! :-) I briefly skimmed the source and 4suite.org and can't seem to get a good description of what those structures look like, is there a URL I missed? Note also: I'm getting odd URL redirects going to 4suite.{org|com}, with URLs being replaced with quoted strings that then won't resolve: http://www.4suite.org/ --> http://www.4suite.org/"index.epy" This seems to happen on "directory" URLs. -- Ken From ken@bitsko.slc.ut.us Tue Dec 26 23:10:38 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 26 Dec 2000 17:10:38 -0600 Subject: [XML-SIG] Better pyexpat backtraces In-Reply-To: "Martin v. Loewis"'s message of "Sat, 23 Dec 2000 23:56:29 +0100" References: <200012232256.XAA01648@loewis.home.cs.tu-berlin.de> Message-ID: "Martin v. Loewis" writes: > Since a number of people have run into the trap of thinking that Parse > is called with a bad argument number, I just checked-in a patch to > pyexpat that adds an artificial frame object on the stack. With that, > if you pass a DocumentHandler in place of a ContentHandler, you now > get a back-trace that reads > > Traceback (most recent call last): > File "a.py", line 48, in ? 
> parser.parse( comic_xml ) > File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse > xmlreader.IncrementalParser.parse(self, source) > File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse > self.feed(buffer) > File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed > self._parser.Parse(data, isFinal) > File "pyexpat.c", line 370, in CharacterData > TypeError: not enough arguments to characters(); expected 4, got 2 > > Normally, you would not get a stack frame that points to pyexpat.c; > please let me know what you think. > > The "to characters()" part is not my doing; that is a Python 2.1 > feature. But that is correct and the intended error message, right? Passing a DocumentHandler to a SAX2 parser will result in characters() being called with "only" two arguments when a SAX1 handler expects four. Just checking what you meant there. -- Ken From uche.ogbuji@fourthought.com Wed Dec 27 01:20:52 2000 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 26 Dec 2000 18:20:52 -0700 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: Message from Ken MacLeod of "26 Dec 2000 11:49:41 CST." Message-ID: <200012270120.SAA02777@localhost.localdomain> > "Martin v. Loewis" writes: > > > > At the plug-in API level, I'd be interested in something more at > > > the "location path" level, possibly an array of steps, each step > > > with axis, node test, and list of predicates. > > > > Yes, that would be a reasonable XPath API. How do you like the > > 4Suite ParsedLocationPath class, and corresponding structures? > > Likely! :-) I briefly skimmed the source and 4suite.org and can't seem > to get a good description of what those structures look like, is there > a URL I missed? There is no such beast. These were originally intended to be purely internal objects. 
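The structure Ken asks for (a location path as a list of steps, each carrying an axis, a node test and a list of predicates) could look roughly like the sketch below. It is a hypothetical sketch in Python 3, not 4Suite's ParsedLocationPath or any PyXML API; the class names Step and LocationPath are made up, and predicates are kept as raw expression strings where a real implementation would hold parsed expression objects.

```python
from dataclasses import dataclass, field
from typing import List

# The thirteen axis names defined by XPath 1.0.
AXES = {
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
}


@dataclass
class Step:
    axis: str       # one of AXES
    node_test: str  # a name test ("para", "*") or a node type test ("text()")
    predicates: List[str] = field(default_factory=list)  # raw expression text

    def __post_init__(self):
        if self.axis not in AXES:
            raise ValueError("unknown XPath axis: %r" % self.axis)

    def to_xpath(self) -> str:
        """Render the step in unabbreviated syntax, e.g. child::para[1]."""
        preds = "".join("[%s]" % p for p in self.predicates)
        return "%s::%s%s" % (self.axis, self.node_test, preds)


@dataclass
class LocationPath:
    absolute: bool    # True for paths starting at the root ("/...")
    steps: List[Step]

    def to_xpath(self) -> str:
        body = "/".join(step.to_xpath() for step in self.steps)
        return "/" + body if self.absolute else body


path = LocationPath(absolute=True, steps=[
    Step("child", "doc"),
    Step("descendant", "para", ["@lang='en'", "position() = 1"]),
])
print(path.to_xpath())
# → /child::doc/descendant::para[@lang='en'][position() = 1]
```

Rendering back to unabbreviated syntax is only a convenient way to check that the structure holds everything a path needs; a plug-in evaluator would instead walk the steps in order, applying each axis, node test and predicate list to the current node-set.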
If we decided to expose them as an API, we'd want to decide on the naming (Martin doesn't like the "Parsed" prefixes, I'm +0 on killing them) and document them properly. For now, your best bet is to have a look at XPath/Parsed* in 4Suite (and also check out Xslt/Parsed* for the associated Pattern machine objects). > Note also: I'm getting odd URL redirects going to 4suite.{org|com}, > with URLs being replaced with quoted strings that then won't resolve: > > http://www.4suite.org/ > --> http://www.4suite.org/"index.epy" > > This seems to happen on "directory" URLs. Hmm. I looked into this, but I'm not seeing it. I went as bare-bones as possible to avoid user agent artifacts and all that: [uogbuji@borgia uogbuji]$ telnet www.4suite.org 80 Trying 204.144.146.184... Connected to dollar.4suite.org. Escape character is '^]'. GET http://www.4suite.org/ HTTP/1.0 HTTP/1.1 200 OK Date: Wed, 27 Dec 2000 01:14:59 GMT Server: Apache/1.3.12 (Unix) mod_snake/0.4.1 Last-Modified: Thu, 02 Nov 2000 19:07:30 GMT ETag: "36f0d-178-3a01bb72" Accept-Ranges: bytes Content-Length: 376 Connection: close Content-Type: text/html

Click to Enter
Connection closed by foreign host. [uogbuji@borgia uogbuji]$ As you can see, the meta refresh goes to the relative "index.epy". I don't know how this would cause the effect you mention. What user agent are you using? Thanks. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From akuchlin@mems-exchange.org Wed Dec 27 16:26:05 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 27 Dec 2000 11:26:05 -0500 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: <3A4525F6.8010906@gte.net>; from seven.nine@gte.net on Sat, Dec 23, 2000 at 02:23:50PM -0800 References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <3A4525F6.8010906@gte.net> Message-ID: <20001227112605.C31745@kronos.cnri.reston.va.us> On Sat, Dec 23, 2000 at 02:23:50PM -0800, Chris Jones wrote: >Anyone care to speak up about what they think the core functionality of >PyXML should be for the long-term (in this world I think thats about 6 >to 9 months)? Beats me; it's whatever people choose to implement and contribute. To pursue the XSchema example, I'm sure that if someone implemented XSchema for Python, it would certainly be considered for inclusion. But no one has said publicly that they're working on such support or released any code. This is how free software projects work; usually there's no plan, so you can't say what will happen over the next 6 months. If a feature -- XSchema, XSLT, whatever -- matters to you, you can help implement it and rewrite the plan yourself, but prediction is essentially impossible. (At the last Python conference Guido had a set of slides with new features for 1.6 and 2.0; some of those features made it in, but several others didn't.) 
--amk From sean@digitome.com Wed Dec 27 18:04:25 2000 From: sean@digitome.com (Sean McGrath) Date: Wed, 27 Dec 2000 18:04:25 +0000 Subject: [XML-SIG] New stuff on w3.org Message-ID: <4.3.2.7.0.20001227180351.00ba8ee0@www.digitome.com> [Andrew Kuchling] >Beats me; it's whatever people choose to implement and contribute. To >pursue the XSchema example, I'm sure that if someone implemented >XSchema for Python, it would certainly be considered for inclusion. >But no one has said publicly that they're working on such support or >released any code. Henry Thompson's XSL is an XSchema validator written in Python. Source is available. See: http://www.ltg.ed.ac.uk/~ht/xsv-status.html Sean From sean@digitome.com Wed Dec 27 18:08:25 2000 From: sean@digitome.com (Sean McGrath) Date: Wed, 27 Dec 2000 18:08:25 +0000 Subject: Freudian slip alert (Was: Re: [XML-SIG] New stuff on w3.org) Message-ID: <4.3.2.7.0.20001227180604.00ba54f0@www.digitome.com> Of course, I meant "XSV" not "XSL" in my list posting. Sorry. Henry Thompson's Python implementation of an XSchema validator is XSV, not XSL. Sean ------- [Andrew Kuchling] >Beats me; it's whatever people choose to implement and contribute. To >pursue the XSchema example, I'm sure that if someone implemented >XSchema for Python, it would certainly be considered for inclusion. >But no one has said publicly that they're working on such support or >released any code. Henry Thompson's XSV is an XSchema validator written in Python. Source is available. See: http://www.ltg.ed.ac.uk/~ht/xsv-status.html Sean From martin@loewis.home.cs.tu-berlin.de Thu Dec 28 10:01:42 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 28 Dec 2000 11:01:42 +0100 Subject: [XML-SIG] New stuff on w3.org In-Reply-To: <3A4525F6.8010906@gte.net> (message from Chris Jones on Sat, 23 Dec 2000 14:23:50 -0800) References: <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <3A4525F6.8010906@gte.net> Message-ID: <200012281001.LAA00943@loewis.home.cs.tu-berlin.de> > Forgive the abrupt de-cloak... but this is nice to hear... I'm diving > quite deeply into implementing Python with PyXML, and was really > wondering what you (the creators) think the core aspects of PyXML are-- > I'm really banking on it, think its a great API, and would like to know > where you're headed. To me, the core part of PyXML are the parsers (expat and xmlproc), and the parser APIs (SAX and DOM); for all of those, you'll see improvements in upcoming releases. > Anyone care to speak up about what they think the core functionality > of PyXML should be for the long-term (in this world I think thats > about 6 to 9 months)? As amk explained, free software lives from user contributions. Without any contributions, PyXML will look essentially the same in 9 months as it does today. There is a chance that we start distributing more parts of 4Suite in PyXML, in addition to 4DOM; these parts would most likely be 4XPath and 4XSLT. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Thu Dec 28 10:39:13 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 28 Dec 2000 11:39:13 +0100 Subject: Freudian slip alert (Was: Re: [XML-SIG] New stuff on w3.org) In-Reply-To: <4.3.2.7.0.20001227180604.00ba54f0@www.digitome.com> (message from Sean McGrath on Wed, 27 Dec 2000 18:08:25 +0000) References: <4.3.2.7.0.20001227180604.00ba54f0@www.digitome.com> Message-ID: <200012281039.LAA01170@loewis.home.cs.tu-berlin.de> > Henry Thompson's Python implementation of an XSChema > validator is XSV, not XSL. Thanks for the pointer; I've added a link on the PyXML "other software" page. 
Martin From larsga@garshol.priv.no Thu Dec 28 11:47:28 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 28 Dec 2000 12:47:28 +0100 Subject: [XML-SIG] saxtools package Message-ID: I'll start working on the saxtools package once my book is done and the new year begins. Meanwhile, I'll need to refer to it from the book, and so it needs a package name. To me xml.saxtools seems like the obvious solution. What say ye? --Lars M. From larsga@garshol.priv.no Thu Dec 28 11:59:37 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 28 Dec 2000 12:59:37 +0100 Subject: [XML-SIG] Using SP in Python Message-ID: I've written a simple wrapper for the SP SGML parser's generic API and also a SAX driver for that wrapper. The SAX driver probably belongs in saxtools and will be placed there. The SP wrapper is perhaps better off as a separate project, but if anyone feels it belongs in the XML-SIG, I'll be happy to reconsider. Appended are a sample application that emits ESIS, the C module and the SAX driver, in that order. Comments of all kinds would be welcome. --Lars M.
======================================================================
import pysp

class EsisHandler:

    def start_element(self, name, attrs):
        print "(" + name
        for pair in attrs.items():
            print "A%s %s" % pair

    def error(self, msg):
        print "E" + msg

    def data(self, data):
        print "-" + repr(data)

    def sdata(self, text, name):
        print "[" + text + " " + name

    def pi(self, data):
        print "?" + data

    def end_element(self, name):
        print ")" + name

class Empty:
    # a handler with no methods: pysp silently skips every event
    pass

pysp.add_catalog("/home/larsga/data/catalog")
parser = pysp.make_parser("/home/larsga/cvs-co/data/book/bok.sgml")
parser.run(EsisHandler())
======================================================================
/**
 * A wrapper module for the generic API of the SP SGML parser.
 *
 * $Id$
 */

/**
 * Todo:
 * - implement more events
 * - support more SP options
 * - better support for attributes through dedicated attribute type?
 * - let parser use an internal dictionary to intern element and attr names?
 */

#include "Python.h"

// define this if your libsp.a has been built with multibyte support
// (this is the default)
// undefine it if it has not
// if you fail to define this and libsp.a _does_ have multibyte support
// all your element and attribute names will be one character long...
#define SP_MULTI_BYTE 1

#include "ParserEventGeneratorKit.h"

// defines SP_VERSION as SP_T("x.x.x")
#include "version.h"
#define SP_T(x) x

static char pysp_module_documentation[] =
"Python wrapper for the generic API of the SP SGML parser.";

/* ----------------------------------------------------------------------
   INTERNAL STUFF */

ParserEventGeneratorKit parserGenerator;

/* ----------------------------------------------------------------------
   UTILITIES */

// Copies an SP CharString into a new NUL-terminated buffer, narrowing
// each character to a char. The caller must free it with delete[].
char* extract_string(const SGMLApplication::CharString &string) {
  char* str = new char[string.len + 1];
  for (size_t ix = 0; ix < string.len; ix++)
    str[ix] = char(string.ptr[ix]);
  str[string.len] = 0;
  return str;
}

// Copies an SP CharString into an existing buffer (no terminator added).
void extract_string(char* buffer, const SGMLApplication::CharString &string) {
  for (size_t ix = 0; ix < string.len; ix++)
    buffer[ix] = char(string.ptr[ix]);
}

/* ----------------------------------------------------------------------
   SGML APPLICATION */

class PYSPApplication : public SGMLApplication {
public:
  PYSPApplication(PyObject *_pyapp, EventGenerator *_eventGen) {
    Py_INCREF(_pyapp);
    pyapp = _pyapp;
    eventGen = _eventGen;
    position = NULL;
    openEntity = NULL;
  }

  void openEntityChange(const OpenEntityPtr &event) {
    openEntity = (OpenEntityPtr*) &event;
  }

  void startElement(const StartElementEvent &event) {
    position = (Position*) &event.pos;
    char *gi = extract_string(event.gi);
    PyObject *attrs = PyDict_New();
    for (size_t ix = 0; ix < event.nAttributes; ix++) {
      if (event.attributes[ix].type != Attribute::implied &&
          event.attributes[ix].type != Attribute::invalid) {
        char *name = extract_string(event.attributes[ix].name);
        PyObject *value = getValue(event.attributes[ix]);
        if (value != NULL) {
          // PyDict_SetItemString does not steal the reference to value
          PyDict_SetItemString(attrs, name, value);
          Py_DECREF(value);
        }
        delete[] name;
      }
    }
    PyObject *arglist = Py_BuildValue("(sO)", gi, attrs);
    Py_DECREF(attrs); // arglist now holds its own reference
    handleCallback("start_element", arglist);
    delete[] gi;
  }

  void data(const DataEvent &event) {
    position = (Position*) &event.pos;
    char *data = extract_string(event.data);
    PyObject *arglist = Py_BuildValue("(s)", data);
    handleCallback("data", arglist);
    delete[] data;
  }

  void sdata(const SdataEvent &event) {
    position = (Position*) &event.pos;
    char *text = extract_string(event.text);
    char *name = extract_string(event.entityName);
    PyObject *arglist = Py_BuildValue("(ss)", text, name);
    handleCallback("sdata", arglist);
    // two separate deletes: "delete[] text, name;" only frees text
    delete[] text;
    delete[] name;
  }

  void endElement(const EndElementEvent &event) {
    position = (Position*) &event.pos;
    char *gi = extract_string(event.gi);
    PyObject *arglist = Py_BuildValue("(s)", gi);
    handleCallback("end_element", arglist);
    delete[] gi;
  }

  void pi(const PiEvent &event) {
    position = (Position*) &event.pos;
    char *data = extract_string(event.data);
    PyObject *arglist = Py_BuildValue("(s)", data);
    handleCallback("pi", arglist);
    delete[] data;
  }

  void error(const ErrorEvent &event) {
    position = (Position*) &event.pos;
    char* msg = extract_string(event.message);
    PyObject *arglist = Py_BuildValue("(s)", msg);
    handleCallback("error", arglist);
    delete[] msg;
  }

  // The caller owns the returned Location and must delete it.
  Location* getLocation() {
    return new Location(*openEntity, *position);
  }

  ~PYSPApplication() {
    Py_DECREF(pyapp);
  }

private:
  PyObject *pyapp;
  EventGenerator *eventGen;
  Position *position;
  OpenEntityPtr *openEntity;

  // Calls pyapp.<name>(*arglist) if that attribute exists.
  // Consumes the reference to arglist on every path.
  void handleCallback(char *name, PyObject *arglist) {
    if (arglist == NULL || PyErr_Occurred()) {
      // building the arguments failed, or an earlier callback raised
      eventGen->halt();
      Py_XDECREF(arglist);
      return;
    }

    // get function from pyapp
    PyObject *callback = PyObject_GetAttrString(pyapp, name);
    if (callback == NULL) {
      PyErr_Clear(); // not really a problem; ignore
      Py_DECREF(arglist);
      return;
    }

    if (!PyCallable_Check(callback)) {
      eventGen->halt();
      PyErr_SetString(PyExc_TypeError, "callback attribute must be callable");
      Py_DECREF(callback);
      Py_DECREF(arglist);
      return;
    }

    // call function; a NULL result means it raised, so stop parsing
    PyObject *result = PyEval_CallObject(callback, arglist);
    if (result == NULL)
      eventGen->halt();
    else
      Py_DECREF(result);
    Py_DECREF(callback);
    Py_DECREF(arglist);
  }

  PyObject *getValue(const Attribute
&attr) {
    PyObject *value;
    char *tmp_value;
    size_t value_len = 0;
    size_t pos = 0;

    switch(attr.type) {
    case Attribute::cdata:
      // concatenate all the CDATA chunks into a single string
      for (size_t ix = 0; ix < attr.nCdataChunks; ix++)
        value_len += attr.cdataChunks[ix].data.len;
      tmp_value = new char[value_len + 1];
      for (size_t ix = 0; ix < attr.nCdataChunks; ix++) {
        extract_string(tmp_value + pos, attr.cdataChunks[ix].data);
        pos += attr.cdataChunks[ix].data.len;
      }
      tmp_value[pos] = 0;
      value = PyString_FromString(tmp_value);
      delete[] tmp_value;
      break;
    case Attribute::tokenized:
      tmp_value = extract_string(attr.tokens);
      value = PyString_FromString(tmp_value);
      delete[] tmp_value;
      break;
    default:
      // the empty-string fallback is created only here, so it is not
      // leaked when one of the cases above supplies the value
      value = PyString_FromString("");
      break;
    }

    return value;
  }
};

/* ----------------------------------------------------------------------
   SGML PARSER CLASS */

typedef struct {
  PyObject_HEAD
  EventGenerator *eventGen;
  PYSPApplication *application;
} sgmlparseobject;

static char Sgmlparsetype__doc__[] =
"SGML parser.";

static char sgmlparse_halt__doc__[] =
"halt()\n Halt the generation of events by run(). This can be at any point\nduring the execution of run(). It is safe to call this function from a\ndifferent thread from that which called run().";

extern "C" PyObject* sgmlparse_halt(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  self->eventGen->halt();

  Py_INCREF(Py_None);
  return Py_None;
}

static char sgmlparse_get_line_number__doc__[] =
"get_line_number()\n Returns the line number of the current event.";

extern "C" PyObject* sgmlparse_get_line_number(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  // cast explicitly so the vararg matches the "l" format code
  PyObject *value = Py_BuildValue("l", (long) location->lineNumber);
  delete location;
  return value;
}

static char sgmlparse_get_column_number__doc__[] =
"get_column_number()\n Returns the column number of the current event.";

extern "C" PyObject* sgmlparse_get_column_number(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("l", (long) location->columnNumber);
  delete location;
  return value;
}

static char sgmlparse_get_filename__doc__[] =
"get_filename()\n Returns the name of the file where the current event occurred.";

extern "C" PyObject* sgmlparse_get_filename(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  char* tmp = extract_string(location->filename);
  PyObject *value = Py_BuildValue("s", tmp);
  delete location;
  delete[] tmp; // allocated with new[] in extract_string
  return value;
}

static char sgmlparse_get_entity_name__doc__[] =
"get_entity_name()\n Returns the name of the entity where the current event occurred.";

extern "C" PyObject* sgmlparse_get_entity_name(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  char* tmp = extract_string(location->entityName);
  PyObject *value = Py_BuildValue("s", tmp);
  delete location;
  delete[] tmp; // allocated with new[] in extract_string
  return value;
}

static char
sgmlparse_get_byte_offset__doc__[] =
"get_byte_offset()\n Returns number of bytes in the storage object preceding the point\nwhere the current event occurred.";

extern "C" PyObject* sgmlparse_get_byte_offset(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("l", (long) location->byteOffset);
  delete location;
  return value;
}

static char sgmlparse_get_entity_offset__doc__[] =
"get_entity_offset()\n Returns number of characters in the current entity preceding the\npoint where the current event occurred.";

extern "C" PyObject* sgmlparse_get_entity_offset(sgmlparseobject *self, PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("l", (long) location->entityOffset);
  delete location;
  return value;
}

static char sgmlparse_run__doc__[] =
"run(app)\n Generate the sequence of events, calling the corresponding\nmember of app for each event. Returns the number of errors. This must\nnot be called more than once for any SGML parser object.";

extern "C" PyObject* sgmlparse_run(sgmlparseobject *self, PyObject *args) {
  PyObject* app;
  if (!PyArg_ParseTuple(args, "O", &app))
    return NULL;

  // construct directly instead of copying a temporary, so the INCREF in
  // the constructor and the DECREF in the destructor stay paired
  PYSPApplication realapp(app, self->eventGen);
  self->application = &realapp;
  unsigned errors = self->eventGen->run(realapp);

  if (PyErr_Occurred())
    return NULL; // an error occurred in a callback; tell Python about it

  // return the error count, as the docstring promises
  return Py_BuildValue("l", (long) errors);
}

struct PyMethodDef sgmlparse_methods[] = {
  {"halt",             (PyCFunction) sgmlparse_halt,
   METH_VARARGS, sgmlparse_halt__doc__},
  {"run",              (PyCFunction) sgmlparse_run,
   METH_VARARGS, sgmlparse_run__doc__},
  {"get_line_number",  (PyCFunction) sgmlparse_get_line_number,
   METH_VARARGS, sgmlparse_get_line_number__doc__},
  {"get_column_number",(PyCFunction) sgmlparse_get_column_number,
   METH_VARARGS, sgmlparse_get_column_number__doc__},
  {"get_filename",     (PyCFunction) sgmlparse_get_filename,
   METH_VARARGS, sgmlparse_get_filename__doc__},
  {"get_entity_name",  (PyCFunction) sgmlparse_get_entity_name,
   METH_VARARGS, sgmlparse_get_entity_name__doc__},
  {"get_byte_offset",  (PyCFunction) sgmlparse_get_byte_offset,
   METH_VARARGS, sgmlparse_get_byte_offset__doc__},
  {"get_entity_offset",(PyCFunction) sgmlparse_get_entity_offset,
   METH_VARARGS, sgmlparse_get_entity_offset__doc__},
  {NULL, NULL} /* sentinel */
};

extern "C" void sgmlparse_dealloc(sgmlparseobject *self) {
  delete self->eventGen;
  self->eventGen = NULL;
  PyMem_DEL(self);
}

extern "C" PyObject* sgmlparse_getattr(sgmlparseobject *self, char *name) {
  if (strcmp(name, "__members__") == 0) {
    PyObject *list = PyList_New(0);
    for (int ix = 0; sgmlparse_methods[ix].ml_name; ix++) {
      PyObject *method_name = PyString_FromString(sgmlparse_methods[ix].ml_name);
      PyList_Append(list, method_name); // does not steal the reference
      Py_DECREF(method_name);
    }
    return list;
  }

  return Py_FindMethod(sgmlparse_methods, (PyObject*) self, name);
}

static PyTypeObject Sgmlparsetype = {
  PyObject_HEAD_INIT(NULL)
  0,                                 /*ob_size*/
  "sgmlparser",                      /*tp_name*/
  sizeof(sgmlparseobject),           /*tp_basicsize*/
  0,                                 /*tp_itemsize*/
  /* methods */
  (destructor) sgmlparse_dealloc,    /*tp_dealloc*/
  (printfunc) 0,                     /*tp_print*/
  (getattrfunc) sgmlparse_getattr,   /*tp_getattr*/
  (setattrfunc) 0,                   /*tp_setattr*/
  (cmpfunc) 0,                       /*tp_compare*/
  (reprfunc) 0,                      /*tp_repr*/
  0,                                 /*tp_as_number*/
  0,                                 /*tp_as_sequence*/
  0,                                 /*tp_as_mapping*/
  (hashfunc) 0,                      /*tp_hash*/
  (ternaryfunc) 0,                   /*tp_call*/
  (reprfunc) 0,                      /*tp_str*/

  /* Space for future expansion */
  0L,0L,0L,0L,
  Sgmlparsetype__doc__ /* Documentation string */
};

/* ----------------------------------------------------------------------
   FUNCTIONS */

static char pysp_make_parser__doc__[] =
"make_parser(filename) -> parser\n\
Return a new SGML parser object bound to the given file name.";

extern "C" PyObject* pysp_make_parser(PyObject *self, PyObject *args) {
  char *filename;
  sgmlparseobject *parser;

  if (!PyArg_ParseTuple(args, "s", &filename))
    return NULL;

  // allocate the Python object first, so a failure here cannot leak
  // the event generator
  parser = PyObject_NEW(sgmlparseobject, &Sgmlparsetype);
  if (parser == NULL)
    return NULL;

  parser->eventGen = parserGenerator.makeEventGenerator(1, &filename);
  parser->application = NULL; // set by run()
  parser->eventGen->inhibitMessages(1); // don't print error messages to stderr
  return (PyObject*) parser;
}

static char pysp_add_catalog__doc__[] =
"add_catalog(filename)\n\
Tell the pysp module about a catalog file.";

extern "C" PyObject* pysp_add_catalog(PyObject *self, PyObject *args) {
  char *filename;

  if (!PyArg_ParseTuple(args, "s", &filename))
    return NULL;

  parserGenerator.setOption(ParserEventGeneratorKit::addCatalog, filename);

  Py_INCREF(Py_None);
  return Py_None;
}

/* ----------------------------------------------------------------------
   MODULE INITIALIZATION */

static PyMethodDef PYSPMethods[] = {
  {"make_parser", pysp_make_parser, METH_VARARGS, pysp_make_parser__doc__},
  {"add_catalog", pysp_add_catalog, METH_VARARGS, pysp_add_catalog__doc__},
  {NULL, NULL} /* Sentinel */
};

extern "C" void initpysp() {
  PyObject *module, *dict;

  Sgmlparsetype.ob_type = &PyType_Type;
  module = Py_InitModule4("pysp", PYSPMethods,
                          pysp_module_documentation, (PyObject*)
                          NULL, PYTHON_API_VERSION);
  dict = PyModule_GetDict(module);

  PyDict_SetItemString(dict, "sp_version", Py_BuildValue("s", SP_VERSION));
  PyDict_SetItemString(dict, "version", Py_BuildValue("s", "0.01"));
}
======================================================================
"""A SAX driver for the SP SGML parser, using the pysp extension module.

$Id$
"""

# --- Import wizardry

from xml.sax._exceptions import *
try:
    import pysp
except ImportError:
    raise SAXReaderNotAvailable("pysp not supported", None)

from xml.sax import xmlreader, saxutils, handler
AttributesImpl = xmlreader.AttributesImpl

import string

# --- Constants

version = "0.01"
namespace = "http://garshol.priv.no/symbolic/"
property_catalogs = "http://garshol.priv.no/symbolic/" + "properties/catalogs"

# --- PySPParser

class PySPParser(xmlreader.XMLReader, xmlreader.Locator):
    "SAX driver for the pysp C module."

    def __init__(self):
        xmlreader.XMLReader.__init__(self)
        self._source = xmlreader.InputSource()
        self._parser = None
        self._parsing = 0
        self._catalogs = []

    # XMLReader methods

    def parse(self, source):
        "Parse an XML document from a file. (Nothing else is supported.)"
        source = saxutils.prepare_input_source(source)
        for catalog in self._catalogs:
            pysp.add_catalog(catalog)

        # keep the parser on self: the Locator methods below need it
        self._parser = pysp.make_parser(source.getSystemId())
        self._cont_handler.setDocumentLocator(self)
        self._parsing = 1
        try:
            # SAX requires startDocument()/endDocument() around the events
            self._cont_handler.startDocument()
            self._parser.run(self)
            self._cont_handler.endDocument()
        finally:
            self._parsing = 0

    def getFeature(self, name):
        raise SAXNotRecognizedException("Feature '%s' not recognized" % name)

    def setFeature(self, name, state):
        if self._parsing:
            raise SAXNotSupportedException("Cannot set features while parsing")
        raise SAXNotRecognizedException("Feature '%s' not recognized" % name)

    def getProperty(self, name):
        if name == property_catalogs:
            return self._catalogs
        raise SAXNotRecognizedException("Property '%s' not recognized" % name)

    def setProperty(self, name, value):
        if self._parsing:
            raise SAXNotSupportedException("Cannot set properties while parsing")
        if name == property_catalogs:
            if type(value) != type([]):
                raise SAXException("Value must be a list of strings!")
            self._catalogs = value
            return
        raise SAXNotRecognizedException("Property '%s' not recognized" % name)

    # Locator methods

    def getColumnNumber(self):
        return self._parser.get_column_number()

    def getLineNumber(self):
        return self._parser.get_line_number()

    def getPublicId(self):
        return None # FIXME!

    def getSystemId(self):
        return self._parser.get_filename()

    # event handlers

    def start_element(self, name, attrs):
        self._cont_handler.startElement(name, AttributesImpl(attrs))

    def end_element(self, name):
        self._cont_handler.endElement(name)

    def pi(self, data):
        pos = string.find(data, " ")
        if pos != -1:
            self._cont_handler.processingInstruction(data[ : pos],
                                                     data[pos + 1 : ])
        else:
            # a PI consisting of a target only, with no data
            self._cont_handler.processingInstruction(data, "")

    def data(self, data):
        self._cont_handler.characters(data)

    def sdata(self, text, entityname):
        # FIXME: does this make sense?
        self._cont_handler.characters(text)

    def error(self, msg):
        self._err_handler.error(SAXException(msg))

# ---

def create_parser(*args, **kwargs):
    return apply(PySPParser, args, kwargs)

# ---

if __name__ == "__main__":
    from xml.sax.saxutils import XMLGenerator
    from xml.sax.handler import ErrorHandler

    p = create_parser()
    p.setContentHandler(XMLGenerator(open("bok.xml", "w")))
    p.setErrorHandler(ErrorHandler())
    p.setProperty(property_catalogs, ["/home/larsga/data/catalog"])
    p.parse("/home/larsga/cvs-co/data/book/bok.sgml")
From martin@loewis.home.cs.tu-berlin.de Thu Dec 28 15:53:54 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 28 Dec 2000 16:53:54 +0100 Subject: [XML-SIG] Using SP in Python In-Reply-To: (message from Lars Marius Garshol on 28 Dec 2000 12:59:37 +0100) References: Message-ID: <200012281553.QAA00673@loewis.home.cs.tu-berlin.de> > I've written a simple wrapper for the SP SGML parser's generic API > and also a SAX driver for that wrapper. The SAX driver probably > belongs in saxtools and will be placed there. Why not in xml.sax.drivers2? > The SP wrapper is perhaps better off as a separate project, but if > anyone feels it belongs in the XML-SIG, I'll be happy to reconsider. If distributed with PyXML, we'd probably need code in setup.py to detect the presence of an acceptable SP installation. If that was available, I'm +0 for including it in PyXML, probably into xml.parsers. Regards, Martin From fdrake@acm.org Thu Dec 28 16:06:57 2000 From: fdrake@acm.org (Fred L. Drake) Date: Thu, 28 Dec 2000 11:06:57 -0500 Subject: [XML-SIG] Using SP in Python In-Reply-To: Message-ID: On 28 Dec 2000 12:59:37 +0100, Lars Marius Garshol wrote: > I've written a simple wrapper for the SP SGML parser's > generic API and > also a SAX driver for that wrapper. The SAX driver > probably belongs > in saxtools and will be placed there.
The SP wrapper is > perhaps > better off as a separate project, but if anyone feels it > belongs in the > XML-SIG, I'll be happy to reconsider. This is great news! I'm not sure why the extension and driver belong in separate projects; shouldn't they be in the same project? The driver can be listed by name in the table used by xml.sax.make_parser(), but when the import fails it'll just keep going (not having the code available to check, I can make bold assertions! ;). > Appended are a sample application that emits ESIS, the C > module and the SAX driver, in that order. > > Comments of all kinds would be welcome. This reminds me that I have an XMLReader that works from ESIS input data. I plan to add it to xml.sax when I get back from the holidays. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From martin@loewis.home.cs.tu-berlin.de Thu Dec 28 16:46:22 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 28 Dec 2000 17:46:22 +0100 Subject: [XML-SIG] saxtools package In-Reply-To: (message from Lars Marius Garshol on 28 Dec 2000 12:47:28 +0100) References: Message-ID: <200012281646.RAA00972@loewis.home.cs.tu-berlin.de> > I'll start working on the saxtools package once my book is done and > the new year begins. Meanwhile, I'll need to refer to it from the > book, and so it needs a package name. To me xml.saxtools seems like > the obvious solution. > > What say ye? Re-reading your list of things that will go into it (from 24 Oct): I think the extra drivers should be somewhere inside xml.sax, so that xml.sax.parse() can find them. Likewise, LexicalHandler and DTDHandler ought belong into xml.sax; they are interfaces, and the properties to set and retrieve them are there already. For the utilities, it seems that xml.saxtools was already accepted. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Dec 29 15:57:36 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 29 Dec 2000 16:57:36 +0100 Subject: [XML-SIG] Announcing PyXPath 1.2 Message-ID: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de> I have now completed the first fully-functional version of a 4XPath parser, so PyXPath *should* work as a drop-in replacement for the bison/lex part of 4XPath; essentially, it offers a function pyxpath.Compile that has the same meaning as xml.xpath.Compile. It uses the Parsed* classes of 4XPath as-is, so no modification to these classes is necessary. The distribution is available from http://www.informatik.hu-berlin.de/~loewis/xml/PyXPath-1.2.tgz To introduce some abstraction from the specific classes, and from the fact that 4XPath uses bison token numbers in many places, I have defined an abstract interface to XPath, which is attached below. Unlike a former W3C effort, this API is currently designed towards "pluggable parsers", i.e. the implementation of the abstract syntax tree is separated from the parser engine. This interface currently does not attempt to support evaluation at all; thus it is orthogonal to Scott Boag's draft, which only supported evaluation but not creation of an XPath tree. I plan to extend that API to also support evaluation; contributions are welcome. Even though I managed to make the current 4XPath classes appear as an implementation of that API, this conformance so far works only for the ExprFactory interface. According to the API, each object should have a number of attributes to allow navigation in the expression. Since 4XPath does not expose any attributes, I decided to come up with my own attribute names and types. I'd like to know potential improvements to that API before making 4XPath fully conforming. The API is IDL-based, which is meant in the same way as in the DOM: there is a (yet to be specified) mapping to Python, which roughly works this way:
- global constants are defined in the module xml.xpath.
- DOMString means Unicode objects, although normal strings should be accepted where possible.
- attributes are accessed as attributes; _get_ accessor functions are optional.
Any comments are welcome. Regards, Martin

module XPath{
  typedef wstring DOMString;

  const unsigned short ABSOLUTE_LOCATION_PATH = 1;
  const unsigned short ABBREVIATED_ABSOLUTE_LOCATION_PATH = 2;
  const unsigned short RELATIVE_LOCATION_PATH = 3;
  const unsigned short ABBREVIATED_RELATIVE_LOCATION_PATH = 4;
  const unsigned short STEP_EXPR = 5; // STEP would conflict with Step in case
  const unsigned short NODE_TEST = 6;
  const unsigned short NAME_TEST = 7;
  const unsigned short BINARY_EXPR = 8;
  const unsigned short UNARY_EXPR = 9;
  const unsigned short PATH_EXPR = 10;
  const unsigned short ABBREVIATED_PATH_EXPR = 11; // filter '//' path
  const unsigned short FILTER_EXPR = 12;
  const unsigned short VARIABLE_REFERENCE = 13;
  const unsigned short LITERAL_EXPR = 14;
  const unsigned short NUMBER_EXPR = 15;
  const unsigned short FUNCTION_CALL = 16;

  interface Expr{
    readonly attribute unsigned short exprType;
  };

  interface AbsoluteLocationPath;
  interface AbbreviatedAbsoluteLocationPath;
  interface RelativeLocationPath;
  interface Step;
  interface AxisSpecifier;
  interface NodeTest;
  typedef sequence<Expr> PredicateList, ExprList;
  interface NameTest;
  interface BinaryExpr;
  interface UnaryExpr;
  interface UnionExpr;
  interface PathExpr;
  interface FilterExpr;
  interface VariableReference;
  interface Literal;
  interface Number;
  interface FunctionCall;

  interface ExprFactory{
    AbsoluteLocationPath createAbsoluteLocationPath(in RelativeLocationPath p);
    AbsoluteLocationPath createAbbreviatedAbsoluteLocationPath(in RelativeLocationPath p);
    RelativeLocationPath createRelativeLocationPath(in RelativeLocationPath left, in Step right);
    RelativeLocationPath createAbbreviatedRelativeLocationPath(in RelativeLocationPath left, in Step right);
    Step createStep(in AxisSpecifier axis, in NodeTest test, in PredicateList predicates);
    // . is represented as self::node(); .. as parent::node()
    Step createAbbreviatedStep(in boolean dotdot); // false for .; true for ..
    // An omitted axisname is created as CHILD; @ is created as ATTRIBUTE
    AxisSpecifier createAxisSpecifier(in unsigned short name);
    NodeTest createNodeTest(in unsigned short type);
    NameTest createNameTest(in DOMString prefix, in DOMString localName);
    BinaryExpr createBinaryExpr(in unsigned short operator, in Expr left, in Expr right);
    UnaryExpr createUnaryExpr(in Expr exp);
    PathExpr createPathExpr(in Expr filter, in Expr path);
    // filter '//' path
    PathExpr createAbbreviatedPathExpr(in Expr filter, in Expr path);
    FilterExpr createFilterExpr(in Expr filter, in Expr predicate);
    // the name must still contain the leading $
    VariableReference createVariableReference(in DOMString name);
    Literal createLiteral(in DOMString literal);
    Number createNumber(in DOMString value);
    FunctionCall createFunctionCall(in DOMString name, in ExprList args);
  };

  interface Parser{
    Expr parseLocationPath(in DOMString path); // returns absolute or relative path, or step
  };

  interface AbsoluteLocationPath:Expr{
    /* '/' relative-opt, or '//' relative */
    readonly attribute Expr relative; // step or relative path
  };

  interface RelativeLocationPath:Expr{
    readonly attribute Expr left; // step or relative path
    readonly attribute Step right;
  };

  interface Step:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  const unsigned short ANCESTOR = 1;
  const unsigned short ANCESTOR_OR_SELF = 2;
  const unsigned short _ATTRIBUTE = 3; // attribute is a keyword
  const unsigned short CHILD = 4;
  const unsigned short DESCENDANT = 5;
  const unsigned short DESCENDANT_OR_SELF = 6;
  const unsigned short FOLLOWING = 7;
  const unsigned short FOLLOWING_SIBLING = 8;
  const unsigned short NAMESPACE = 9;
  const unsigned short PARENT = 10;
  const unsigned short PRECEDING = 11;
  const unsigned short PRECEDING_SIBLING = 12;
  const unsigned short SELF = 13;

  interface AxisSpecifier:Expr{
    readonly attribute unsigned short name;
  };

  const unsigned short COMMENT = 1;
  const unsigned short TEXT = 2;
  const unsigned short PROCESSING_INSTRUCTION = 3;
  const unsigned short NODE = 4;

  interface NodeTest:Expr{
    readonly attribute unsigned short test;
    readonly attribute DOMString literal; // only for PROCESSING_INSTRUCTION
  };

  interface NameTest:Expr{
    readonly attribute DOMString prefix; // may be null
    readonly attribute DOMString localName; // may be "*"
  };

  const unsigned short BINOP_OR = 1;
  const unsigned short BINOP_AND = 2;
  const unsigned short BINOP_EQ = 3;
  const unsigned short BINOP_NEQ = 4;
  const unsigned short BINOP_LT = 5;
  const unsigned short BINOP_GT = 6;
  const unsigned short BINOP_LE = 7;
  const unsigned short BINOP_GE = 8;
  const unsigned short BINOP_PLUS = 9;
  const unsigned short BINOP_MINUS = 10;
  const unsigned short BINOP_TIMES = 11;
  const unsigned short BINOP_DIV = 12;
  const unsigned short BINOP_MOD = 13;
  const unsigned short BINOP_UNION = 14;

  interface BinaryExpr:Expr{
    readonly attribute unsigned short operator;
    readonly attribute Expr left,right;
  };

  // can be only the unary minus
  interface UnaryExpr:Expr{
    readonly attribute Expr exp;
  };

  interface PathExpr:Expr{
    readonly attribute Expr filter;
    readonly attribute Expr path;
  };

  interface FilterExpr:Expr{
    readonly attribute Expr filter;
    readonly attribute Expr predicate;
  };

  interface VariableReference:Expr{
    readonly attribute DOMString name;
  };

  interface Literal:Expr{
    readonly attribute DOMString value;
  };

  interface Number:Expr{
    readonly attribute double value;
  };

  interface FunctionCall:Expr{
    readonly attribute DOMString name;
    readonly attribute ExprList args;
  };
};
From martin@loewis.home.cs.tu-berlin.de Fri Dec 29 16:03:50 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 29 Dec 2000 17:03:50 +0100 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: <200012270120.SAA02777@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200012270120.SAA02777@localhost.localdomain> Message-ID: <200012291603.RAA01507@loewis.home.cs.tu-berlin.de> > There is no such beast. These were originally intended to be purely internal > objects. If we decided to expose them as an API, we'd want to decide on the > naming (Martin doesn't like the "Parsed" prefixes, I'm +0 on killing them) and > document them properly. If you could follow the IDL API I just posted, renaming the classes would not be necessary: the "official" way to create instances of those classes would be to use the factory; the official way to find out what kind of expression you have would be to look at the exprType attribute. > For now, your best bet is to have a look at XPath/Parsed* in 4Suite > (and also check out Xslt/Parsed* for the associated Pattern machine > objects). Given that the Pattern grammar is only slightly larger than the XPath grammar: Would it be useful to provide only a single interface, with the option of either parsing a LocationPath or a Pattern? At least when using YAPPS, it is not difficult to have two start symbols in a single grammar. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Fri Dec 29 16:26:27 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 29 Dec 2000 17:26:27 +0100 Subject: [XML-SIG] 4XPath: parsing Unicode string In-Reply-To: <200011252105.GAA01817@dhcp198.grad.sccs.chukyo-u.ac.jp> (message from Tamito KAJIYAMA on Sun, 26 Nov 2000 06:05:21 +0900) References: <200011252105.GAA01817@dhcp198.grad.sccs.chukyo-u.ac.jp> Message-ID: <200012291626.RAA01579@loewis.home.cs.tu-berlin.de> > I have a problem that I cannot pass a Unicode string containing > Japanese characters to the 4XPath parser. Following reproduces > the problem: Please have a look at the PyXPath package I've just released. 
I noticed that there is still an incompatibility with 4XPath: it only allows compiling LocationPath expressions, not full expressions. Putting full Unicode into the expression is no problem, though:

>>> print pyxpath.Compile(u'para[substring-after("2000\u5E7410\u670830\u65E5", "\u6708")]')

If you attempt to use that package in addition to 4XPath, please let me know.

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Fri Dec 29 17:54:46 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 29 Dec 2000 18:54:46 +0100 Subject: [XML-SIG] Better pyexpat backtraces In-Reply-To: (message from Ken MacLeod on 26 Dec 2000 17:10:38 -0600) References: <200012232256.XAA01648@loewis.home.cs.tu-berlin.de> Message-ID: <200012291754.SAA02094@loewis.home.cs.tu-berlin.de>

> But that is correct and the intended error message, right? Passing a
> DocumentHandler to a SAX2 parser will result in characters() being
> called with "only" two arguments when a SAX1 handler expects four.

Right. Before, you'd get an error message saying

    self.feed(buffer)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed
    self._parser.Parse(data, isFinal)
TypeError: not enough arguments; expected 4, got 2

That was confusing; it would suggest that there is an error in the call to Parse. It's just the traceback that has changed.

Regards,
Martin

From uche.ogbuji@fourthought.com Fri Dec 29 19:07:01 2000 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Fri, 29 Dec 2000 12:07:01 -0700 Subject: [XML-SIG] PyXPath 1.1 References: <200012270120.SAA02777@localhost.localdomain> <200012291603.RAA01507@loewis.home.cs.tu-berlin.de> Message-ID: <3A4CE0D5.1C51C02D@fourthought.com>

"Martin v. Loewis" wrote:

> If you could follow the IDL API I just posted
                                  ^^^^^^^^^^^^^

No dice. XML-SIG has disappeared again. I haven't received anything
since yesterday afternoon. The archives aren't showing anything either.

What's up with the mailing lists?
--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From larsga@garshol.priv.no Fri Dec 29 19:29:58 2000 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 29 Dec 2000 20:29:58 +0100 Subject: [XML-SIG] saxtools package In-Reply-To: <200012281646.RAA00972@loewis.home.cs.tu-berlin.de> References: <200012281646.RAA00972@loewis.home.cs.tu-berlin.de> Message-ID:

* Martin v. Loewis
|
| Re-reading your list of things that will go into it (from 24 Oct): I
| think the extra drivers should be somewhere inside xml.sax, so that
| xml.sax.parse() can find them.

If that means that they also go into the Python distribution, then I'm
perfectly happy with that.

| Likewise, LexicalHandler and DTDHandler ought to belong in xml.sax;
| they are interfaces, and the properties to set and retrieve them are
| there already.

I agree. This should be in xml.sax.

| For the utilities, it seems that xml.saxtools was already accepted.

Good! Then I'll start checking things in as soon as I can. (I have many of the bits and pieces already.)

--Lars M.

From martin@loewis.home.cs.tu-berlin.de Fri Dec 29 22:56:26 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 29 Dec 2000 23:56:26 +0100 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: <3A4CE0D5.1C51C02D@fourthought.com> (message from Uche Ogbuji on Fri, 29 Dec 2000 12:07:01 -0700) References: <200012270120.SAA02777@localhost.localdomain> <200012291603.RAA01507@loewis.home.cs.tu-berlin.de> <3A4CE0D5.1C51C02D@fourthought.com> Message-ID: <200012292256.XAA00714@loewis.home.cs.tu-berlin.de>

> No dice. XML-SIG has disappeared again. I haven't received anything
> since yesterday afternoon. The archives aren't showing anything either.
>
> What's up with the mailing lists?

Apparently, python.org ran out of disk space.
Barry mentioned that it should be fixed now, but it apparently isn't. I got some messages back (mainly to python-help); when it comes back and doesn't have my messages, I'll have to repost.

Regards,
Martin

From ken@bitsko.slc.ut.us Sat Dec 30 18:01:28 2000 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 30 Dec 2000 12:01:28 -0600 Subject: [XML-SIG] PyXPath 1.1 In-Reply-To: "Martin v. Loewis"'s message of "Fri, 29 Dec 2000 17:03:50 +0100" References: <200012270120.SAA02777@localhost.localdomain> <200012291603.RAA01507@loewis.home.cs.tu-berlin.de> Message-ID:

"Martin v. Loewis" writes:

> > There is no such beast. These were originally intended to be
> > purely internal objects. If we decided to expose them as an API,
> > we'd want to decide on the naming (Martin doesn't like the
> > "Parsed" prefixes, I'm +0 on killing them) and document them
> > properly.
>
> If you could follow the IDL API I just posted, renaming the classes
> would not be necessary: the "official" way to create instances of
> those classes would be to use the factory; the official way to find
> out what kind of expression you have would be to look at the
> exprType attribute.

Yes, the classes/attributes in that IDL look excellent.

> > For now, your best bet is to have a look at XPath/Parsed* in
> > 4Suite (and also check out Xslt/Parsed* for the associated Pattern
> > machine objects).
>
> Given that the Pattern grammar is only slightly larger than the
> XPath grammar: Would it be useful to provide only a single
> interface, with the option of either parsing a LocationPath or a
> Pattern? At least when using YAPPS, it is not difficult to have two
> start symbols in a single grammar.

Yes, I think it would be very useful to reuse the same interface.

--
Ken

From martin@loewis.home.cs.tu-berlin.de Sun Dec 31 08:10:12 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 31 Dec 2000 09:10:12 +0100 Subject: [XML-SIG] saxtools package In-Reply-To: (message from Lars Marius Garshol on 29 Dec 2000 20:29:58 +0100) References: <200012281646.RAA00972@loewis.home.cs.tu-berlin.de> Message-ID: <200012310810.JAA00707@loewis.home.cs.tu-berlin.de>

> | Re-reading your list of things that will go into it (from 24 Oct): I
> | think the extra drivers should be somewhere inside xml.sax, so that
> | xml.sax.parse() can find them.
>
> If that means that they also go into the Python distribution, then I'm
> perfectly happy with that.

That's a different matter. Both Python and PyXML support xml.sax.parse, but only PyXML offers a choice of parsers.

Regards,
Martin
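
The split Martin describes can be illustrated with the xml.sax package as it ships with current Python 3, which is not the PyXML 0.6 code under discussion here. xml.sax.make_parser() takes an optional list of driver module names to try before the built-in defaults; xml.sax.parse() takes no such list. ElementCounter and count_elements are illustrative names, and "hypothetical_missing_driver" stands in for a third-party driver that is not installed. "xml.sax.expatreader" is the Expat driver bundled with Python.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Counts start tags; stands in for any SAX2 content handler."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(data, drivers=("xml.sax.expatreader",)):
    # make_parser() tries each driver module in order, then the default
    # list, and returns the first parser it can create.
    parser = xml.sax.make_parser(list(drivers))
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.count


if __name__ == "__main__":
    doc = b"<doc><p/><p/></doc>"
    print(count_elements(doc))  # prints 3
    # A driver whose module cannot be imported is skipped
    # ("hypothetical_missing_driver" is a made-up name).
    print(count_elements(doc, ("hypothetical_missing_driver",
                               "xml.sax.expatreader")))  # prints 3
```

Because make_parser() skips a driver whose module cannot be imported, the second call still ends up with the Expat driver. In current CPython the default list can also be replaced through the comma-separated PY_SAX_PARSER environment variable, which is one way to steer xml.sax.parse() toward another driver.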