From ken@bitsko.slc.ut.us  Fri Dec  1 00:09:07 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 30 Nov 2000 18:09:07 -0600
Subject: [XML-SIG] 0.5.1 and 0.6.2
In-Reply-To: Michael Sobolev's message of "Fri, 1 Dec 2000 00:22:42 +0300"
References: <20001201002242.A5950@transas.com>
Message-ID: <x7y9y1f1ek.fsf@bitsko.slc.ut.us>

Michael Sobolev <mss@transas.com> writes:

[Michael already pointed out he's using DOM, but I already had this
written in case anyone finds it useful.]

I have a SAX client that I've got working well with SAX1 and SAX2 so
far.  Having Unicode strings caught me in one place, and I needed to
wrap it with str() to make it work in that context, but I have no
general tips if that goes wrong for anyone.

In my SAX handler module, in the file global scope, I do this:

  import sys

  if hasattr(sys, 'version_info'):
      isPy2 = 1
      import xml.sax
      from xml.sax.handler import feature_namespaces
      from xml.sax import SAXException
  else:
      isPy2 = 0
      from xml.sax import saxexts
      from xml.sax.saxlib import SAXException

When creating the parser I do this:

        if isPy2:
            self.parser = xml.sax.make_parser()
            self.parser.setFeature(feature_namespaces, 0)
            self.parser.setContentHandler(self)
        else:
            self.parser = saxexts.make_parser()
            self.parser.setDocumentHandler(self)

I'm parsing files, so later I do:

        if isPy2:
            self.parser.parse(file)
        else:
            self.parser.parseFile(file)

While working with attributes in startElement(), I do this to get a
list of attribute names to use as indexes into the attributes:

        if isPy2:
            att_names = atts.keys()
        else:
            att_names = []
            for ii in range(0, len(atts)):
                att_names.append(atts[ii])

And for characters(), I do this:

    def characters(self, ch, start=0, length=-1):
        if length == -1:   # SAX2
            self.text = self.text + ch
        else:
            self.text = self.text + ch[start:start+length]

I do my own namespace processing (more for convenience than for
SAX1/SAX2 differences), so that makes start/endElement() usable for
both SAX1 and SAX2.  Otherwise you'll need both start/endElement() and
start/endElementNS().  If you do use namespace processing, you need no
special code in startElement() (as above) because you know only
startElement() will be called from SAX1 and startElementNS() will be
called from SAX2.
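
For readers trying this on a current Python, where only the SAX2-style
API survives in the standard library, the isPy2 branches above boil down
to roughly the following.  This is a minimal sketch, not the original
handler module: the names TextCollector and collect() are made up for
illustration, and namespace processing is switched off as in the
setFeature() call above.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class TextCollector(ContentHandler):
    """Hypothetical handler: gathers character data and attribute names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.text = ""
        self.att_names = []

    def startElement(self, name, atts):
        # SAX2 attribute objects behave like mappings, so keys() works.
        self.att_names.extend(atts.keys())

    def characters(self, ch):
        # SAX2 passes only the text; there is no start/length pair.
        self.text = self.text + ch


def collect(xml_bytes):
    """Parse a bytes document and return the populated handler."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, 0)  # plain startElement() calls
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


result = collect(b'<doc id="1">Hello, <b class="x">world</b>!</doc>')
print(result.text)
print(sorted(result.att_names))
```

Because namespaces are off, only startElement() is called, which is the
same assumption the SAX1/SAX2 handler above relies on.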

  -- Ken


From calvin@cs.uni-sb.de  Fri Dec  1 00:16:48 2000
From: calvin@cs.uni-sb.de (Bastian Kleineidam)
Date: Fri, 1 Dec 2000 01:16:48 +0100 (CET)
Subject: [XML-SIG] 0.5.1 and 0.6.2
In-Reply-To: <20001201002242.A5950@transas.com>
Message-ID: <Pine.LNX.4.21.0012010113580.27935-100000@earth.cs.uni-sb.de>

>Can anybody give a hint on how to correctly write applications
>that may need to work with both versions of python-xml?
Make a compatibility layer with try: except: statements.

I am using this:
#-----8<------
try:
    try:
        # xml interface (DOM-2, SAX-2) as found in PyXML 0.6.2
        from xml.dom.ext.reader.Sax2 import Reader
        def _get_dom(filename):
            return Reader(validate=1).fromStream(open(filename))
    except ImportError:
        # xml interface (DOM-2, SAX-2) as found in PyXML 0.6.1
        from xml.dom.ext.reader.Sax2 import FromXmlFile
        def _get_dom(filename):
            return FromXmlFile(filename, validate=1)

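    def get_dom(filename):
        # change dir to find DTD file; use an absolute path so a relative
        # filename still resolves after chdir, and always restore the cwd
        import os
        filename = os.path.abspath(filename)
        olddir = os.getcwd()
        os.chdir(os.path.dirname(filename))
        try:
            dom = _get_dom(filename)
        finally:
            os.chdir(olddir)
        return dom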

    def get_attr(attrs, name):
        if attrs.has_key(('', name)):
            return attrs[('', name)]._get_value()
    def get_dom_attrs(dom):
        return dom.documentElement._get_attributes()
    def get_node_attrs(node):
        return node._get_attributes()
    def get_node_name(node):
        return node._get_nodeName()
    def get_childnodes(node):
        return node._get_childNodes()
    def node_value(node):
        from xml.dom.Node import Node
        if node._get_nodeType() == Node.TEXT_NODE:
            return node._get_nodeValue()
        s = ""
        for n in node._get_childNodes():
            s = s + node_value(n)
        return s

except ImportError:
    # xml interface (DOM-1, SAX-1) as found in PyXML 0.5.x
    from xml.sax import saxexts,saxutils
    from xml.dom.sax_builder import SaxBuilder
    _parser = saxexts.XMLValParserFactory.make_parser()
    _parser.setErrorHandler(saxutils.ErrorPrinter())
    def get_dom(filename):
        _dom_builder = SaxBuilder()
        _parser.setDocumentHandler(_dom_builder)
        _parser.parse(filename)
        _parser.reset()
        return _dom_builder.document
    def get_attr(attrs, name):
        if attrs.has_key(name):
            return attrs[name].get_value()
    def get_dom_attrs(dom):
        return dom.get_documentElement().get_attributes()
    def get_node_attrs(node):
        return node.get_attributes()
    def get_node_name(node):
        return node.get_name()
    def get_childnodes(node):
        return node.get_childNodes()
    def node_value(node):
        from xml.dom.core import TEXT_NODE
        if node.get_nodeType() == TEXT_NODE:
            return node.get_nodeValue()
        s = ""
        for n in node.get_childNodes():
            s = s + node_value(n)
        return s

def get_node_attr(node, name):
    return get_attr(get_node_attrs(node), name)
#---8<----
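
The same accessor-function idea can be tried out against the minidom
that ships with current Python.  The sketch below is only an
illustration under that assumption: it mirrors node_value() and
get_node_attr() from the layer above but uses the attribute-style DOM
API (nodeType, childNodes) rather than the _get_*/get_* methods of
PyXML 0.5.x/0.6.x.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def node_value(node):
    """Concatenate the text found below a node, depth first."""
    if node.nodeType == Node.TEXT_NODE:
        return node.nodeValue
    return "".join(node_value(child) for child in node.childNodes)


def get_node_attr(node, name):
    """Return an attribute value, or None if the attribute is absent."""
    if node.hasAttribute(name):
        return node.getAttribute(name)
    return None


doc = parseString('<a href="x">one<b>two</b>three</a>')
root = doc.documentElement
print(node_value(root))
print(get_node_attr(root, "href"), get_node_attr(root, "missing"))
```

Client code that only calls these helpers does not need to know which
DOM implementation sits underneath, which is the point of the layer.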

Bastian


From uche.ogbuji@fourthought.com  Fri Dec  1 11:16:03 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 1 Dec 2000 04:16:03 -0700
Subject: [XML-SIG] ANN: 4Suite 0.10.0
Message-ID: <200012011116.EAA09752@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                             4Suite 0.10.0
                      ---------------------------
   Open source tools for standards-based XML, DOM, XPath, XSLT, RDF,
       XPointer, XLink and object-database development in Python

                           http://4Suite.org


4Suite is a collection of Python tools for XML processing and object
database management.  It is an integrated packaging of several formerly
separately-distributed components: 4DOM, 4XPath, 4XSLT, 4RDF, 4ODS,
4XPointer, 4XLink and DbDOM.

News
----

  * RDF: Added a driver based on shelve (DB/DBM)
  * ODS: Added a driver based on anydbm
  * Fix format-number support and implement in C
  * Improve Unicode and other encoding support
  * Documentation updates
  * Many misc optimizations
  * Many misc bug-fixes


More info and Obtaining 4Suite
------------------------------

Please see

        http://4Suite.org

from where you can download the source as well as Windows and Linux binaries.

4Suite is distributed under a license similar to that of the
Apache Web Server.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Dec  1 11:19:34 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 1 Dec 2000 04:19:34 -0700
Subject: [XML-SIG] ANN: 4Suite Server 0.10.0
Message-ID: <200012011119.EAA09842@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                          4Suite Server 0.10.0
                      ----------------------------
         An open source XML data server based on open standards
               implemented using 4Suite and other tools


                  http://FourThought.com/4SuiteServer
                           http://4Suite.org


4Suite Server is a platform for handling XML processing needs in
application development.  It is an XML data repository with a
rules-based engine.  It supports DOM access, XSLT transformation, XPath
and RDF-based indexing and query, XLink resolution and many other XML
services.  It also supports other related services such as distributed
transactions and access control lists.  It supports remote,
cross-platform and cross-language access through CORBA, with other
request protocols to be added shortly.

4Suite Server is not designed to be a full-blown application server.
It provides highly-specialized services for XML processing that can be
used with other application servers.

4Suite Server is open-source and free to download.  Priority support
and customization are available from Fourthought, Inc.  For more
information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900.

The 4Suite Server home page is

http://FourThought.com/4SuiteServer

from where you can download the software itself or an executive summary
thereof, read usage scenarios and find other information.


From yang13@126.com  Fri Dec  1 16:21:05 2000
From: yang13@126.com (Xiao Yang)
Date: Sat, 2 Dec 2000 0:21:5 +0800
Subject: [XML-SIG] (no subject)
Message-ID: <G4WD9T00.I6I@public.nn.gx.cn>

Hello, XML-SIG!


                    Regards,

            Xiao Yang
            yang13@126.com


From fdrake@acm.org  Fri Dec  1 16:32:19 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 11:32:19 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200011261856.TAA00929@loewis.home.cs.tu-berlin.de>
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
 <14880.4503.540928.777303@cj42289-a.reston1.va.home.com>
 <200011261856.TAA00929@loewis.home.cs.tu-berlin.de>
Message-ID: <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > Good, I'll add this to PyXML first, and then move it over to Python
 > later. Please note that DOMException is already defined in
 > xml.dom(.__init__) of PyXML, so it is merely a matter of adding the
 > derived classes, and adding them in 4DOM.

  Have you had time to work on this?  Would you like me to take a look
at it?  I'm not familiar with the 4DOM code, but would like to see the
exceptions defined and available from xml.dom soon.

I said:
 >   I'd also like to see the .nodeType values defined this way, and
 >   shared by the implementations.

and Martin responded:
 > It's more difficult with those, since the spec says they are defined
 > inside of the Node interface. We could deviate from the DOM spec in

  Perhaps we should provide a Node class in xml.dom that defines just
those values, and implementations can inherit that or duplicate the
values in their own Node implementation.  Nothing other than the
enumeration values should be defined in xml.dom.Node (except maybe a
docstring).


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Fri Dec  1 16:44:40 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 11:44:40 -0500 (EST)
Subject: [XML-SIG] minidom/pulldom connection
In-Reply-To: <200011232203.XAA01220@loewis.home.cs.tu-berlin.de>
References: <14876.11936.725389.726400@cj42289-a.reston1.va.home.com>
 <200011232203.XAA01220@loewis.home.cs.tu-berlin.de>
Message-ID: <14887.54648.796431.588740@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > From the conformance point of view, minidom is *wrong* by not raising
 > exceptions in appropriate places. However, I doubt anybody fixing this
 > would start with pulldom.

  I think this is mostly a minidom problem and not a pulldom issue.

[in response to my proposal to pass a Document factory to PullDOM:]
 > I don't see the need to provide this kind of extensibility until
 > somebody actually wants to implement an alternative minidom on top of
 > pulldom. However, if this is added now, I'd agree with Mike that it
 > would be better to support DOMImplementation objects in minidom.

  I'll point out that if anyone should want to do this, they'll have
to hack pulldom to do it, and not be able to share their DOM
implementation until pulldom is updated at least in PyXML.  I think
this should be done sooner rather than later.  I agree that a
DOMImplementation would be better than some other Document factory.
  My preliminary DOMImplementation code for minidom is not correct
(but works in context); I'll try and fix it this weekend.  pulldom
will require some corresponding changes.  (The documentElement on
created documents is supposed to already be created, as well as the
doctype.  I'll write up some notes on what I've found there for things
that the recommendation doesn't seem to say.)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Fri Dec  1 20:47:04 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 01 Dec 2000 13:47:04 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org>
 of "Fri, 01 Dec 2000 11:32:19 EST." <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
Message-ID: <200012012047.NAA10970@localhost.localdomain>

> I said:
>  >   I'd also like to see the .nodeType values defined this way, and
>  >   shared by the implementations.
> 
> and Martin responded:
>  > It's more difficult with those, since the spec says they are defined
>  > inside of the Node interface. We could deviate from the DOM spec in
> 
>   Perhaps we should provide a Node class in xml.dom that defines just
> those values, and implementations can inherit that or duplicate the
> values in their own Node implementation.  Nothing other than the
> enumeration values should be defined in xml.dom.Node (except maybe a
> docstring).

Well, this would interfere pretty badly with 4DOM.  There is an 
xml.dom.Node.py file in 4DOM and having a Node class in the __init__ would 
cause problems with the import.

What's wrong with

from xml.dom.Node import Node

n.nodeType == Node.ELEMENT_NODE


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From mss@transas.com  Fri Dec  1 20:58:34 2000
From: mss@transas.com (Michael Sobolev)
Date: Fri, 1 Dec 2000 23:58:34 +0300
Subject: [XML-SIG] 0.5.1 and 0.6.2
In-Reply-To: <20001201002242.A5950@transas.com>; from mss@transas.com on Fri, Dec 01, 2000 at 12:22:42AM +0300
References: <20001201002242.A5950@transas.com>
Message-ID: <20001201235834.A31966@transas.com>

On Fri, Dec 01, 2000 at 12:22:42AM +0300, Michael Sobolev wrote:
> I have a small problem here. :)

Thank you all.  I am going to try to implement some of the advice given. :)

Regards,

--
Misha


From fdrake@acm.org  Fri Dec  1 21:16:34 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 16:16:34 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012012047.NAA10970@localhost.localdomain>
References: <fdrake@acm.org>
 <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
 <200012012047.NAA10970@localhost.localdomain>
Message-ID: <14888.5426.542871.817456@cj42289-a.reston1.va.home.com>

uche.ogbuji@fourthought.com writes:
 > Well, this would interfere pretty badly with 4DOM.  There is an 
 > xml.dom.Node.py file in 4DOM and having a Node class in the __init__ would 
 > cause problems with the import.

  That sucks.

 > What's wrong with
 > 
 > from xml.dom.Node import Node
 > 
 > n.nodeType == Node.ELEMENT_NODE

  I was hoping for a nice simple way of sharing the values, and a
common place to pick them up.  The latter is more important for client
code I think.  If we have DOMException & friends as:

     xml.dom.DOMException
     xml.dom.DOMStringSizeError
     xml.dom.HierarchyRequestError
     ...
     xml.dom.DOMSTRING_SIZE_ERR
     ...

then it seems we also want to be able to access the .nodeType codes
according to the spec from the same location:

     xml.dom.Node
     xml.dom.Node.ELEMENT_NODE
     ...

  I can live with the .nodeType values being directly in the
__init__.py, so we have:

     xml.dom.ELEMENT_NODE
     ...

That just means we can't provide a Node class in a common place that
provides the constants for *_NODE values.  Not a huge problem, but not
as nice as I'd hoped for.
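
For reference, here is how the layout discussed above looks from client
code in the xml.dom package of today's Python standard library, which
postdates this thread.  It is a small sketch using minidom only, and
says nothing about how 4DOM lays things out.  Note that the exception
classes there are spelled with an "Err" suffix (for example
HierarchyRequestErr), not with the names listed above.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString("<root>text</root>")
root = doc.documentElement

# The node-type codes hang off the shared xml.dom.Node class.
is_element = root.nodeType == xml.dom.Node.ELEMENT_NODE
is_text = root.firstChild.nodeType == xml.dom.Node.TEXT_NODE

# minidom nodes derive from that class, so isinstance() works for them.
is_shared_node = isinstance(root, xml.dom.Node)

# Specialized exceptions derive from DOMException and carry a code.
caught = None
try:
    doc.appendChild(doc.createElement("second"))  # only one root allowed
except xml.dom.DOMException as exc:
    caught = exc

print(is_element, is_text, is_shared_node)
print(type(caught).__name__, caught.code == xml.dom.HIERARCHY_REQUEST_ERR)
```

Client code can thus catch the base class or a specific subclass, and
compare against the numeric code when it needs the DOM-level constant.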


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From akuchlin@mems-exchange.org  Fri Dec  1 23:33:42 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 01 Dec 2000 18:33:42 -0500
Subject: [XML-SIG] Two minidom patches
Message-ID: <E141zgU-0007Xe-00@kronos.cnri.reston.va.us>

I've submitted two patches to minidom.py using the Python project's
patch manager.  (Should such patches be submitted to the PyXML patch
manager, or the Python one?)

https://sourceforge.net/patch/?func=detailpatch&patch_id=102485&group_id=5470
[ Patch #102485 ] minidom.py: Check for legal children

https://sourceforge.net/patch/?func=detailpatch&patch_id=102492&group_id=5470
[ Patch #102492 ] minidom/pulldom: remove nodes already in the tree

Anyone want to review them?  

--amk


From uche.ogbuji@fourthought.com  Sat Dec  2 00:01:24 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 01 Dec 2000 17:01:24 -0700
Subject: [XML-SIG] XML 2000 anyone?
Message-ID: <200012020001.RAA11619@localhost.localdomain>

Just wanted to say that if any of you lot will be at XML 2000, do come by 
Fourthought's booth (#900).  We'd love to put some more faces to names.  We'll 
be demoing the soon-to-be-relaunched OpenTechnology.org, which has been 
completely re-architected to run on top of 4Suite Server.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Sat Dec  2 00:11:00 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Dec 2000 19:11:00 -0500 (EST)
Subject: [XML-SIG] Two minidom patches
In-Reply-To: <E141zgU-0007Xe-00@kronos.cnri.reston.va.us>
References: <E141zgU-0007Xe-00@kronos.cnri.reston.va.us>
Message-ID: <14888.15892.801277.516501@cj42289-a.reston1.va.home.com>

Andrew Kuchling writes:
 > I've submitted two patches to minidom.py using the Python project's
 > patch manager.  (Should such patches be submitted to the PyXML patch
 > manager, or the Python one?)
 > 
 > https://sourceforge.net/patch/?func=detailpatch&patch_id=102485&group_id=5470
 > [ Patch #102485 ] minidom.py: Check for legal children
 > 
 > https://sourceforge.net/patch/?func=detailpatch&patch_id=102492&group_id=5470
 > [ Patch #102492 ] minidom/pulldom: remove nodes already in the tree
 > 
 > Anyone want to review them?  

  I'll be glad to take a look at them this weekend.  Did you check to
see if they're compatible with the patch to minidom/pulldom I have in
the Python PM?  If not, I'll integrate them if they look good, and
check them in if no one objects to the combined patch.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From akuchlin@mems-exchange.org  Sat Dec  2 00:19:12 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 1 Dec 2000 19:19:12 -0500
Subject: [XML-SIG] Two minidom patches
In-Reply-To: <14888.15892.801277.516501@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Dec 01, 2000 at 07:11:00PM -0500
References: <E141zgU-0007Xe-00@kronos.cnri.reston.va.us> <14888.15892.801277.516501@cj42289-a.reston1.va.home.com>
Message-ID: <20001201191912.B28955@kronos.cnri.reston.va.us>

On Fri, Dec 01, 2000 at 07:11:00PM -0500, Fred L. Drake, Jr. wrote:
>  I'll be glad to take a look at them this weekend.  Did you check to
>see if they're compatible with the patch to minidom/pulldom I have in
>the Python PM?  If not, I'll integrate them if they look good, and

No.  I can check if they collide and reconcile them if you like.
I'm most uncertain about the pulldom changes, so it's probably best to
look *very* carefully at those bits.

--amk


From martin@loewis.home.cs.tu-berlin.de  Sat Dec  2 07:54:29 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 2 Dec 2000 08:54:29 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
 <14880.4503.540928.777303@cj42289-a.reston1.va.home.com>
 <200011261856.TAA00929@loewis.home.cs.tu-berlin.de> <14887.53907.908244.249743@cj42289-a.reston1.va.home.com>
Message-ID: <200012020754.IAA00799@loewis.home.cs.tu-berlin.de>

>   Have you had time to work on this?  Would you like me to take a look
> at it?  I'm not familiar with the 4DOM code, but would like to see the
> exceptions defined and available from xml.dom soon.

Please have a look at the current PyXML CVS. To copy the code into the
Python core, some work is probably necessary on the exception message
strings - unless you also want to copy en_US.py.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Dec  2 08:03:34 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 2 Dec 2000 09:03:34 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012012047.NAA10970@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012012047.NAA10970@localhost.localdomain>
Message-ID: <200012020803.JAA00847@loewis.home.cs.tu-berlin.de>

> Well, this would interfere pretty badly with 4DOM.  There is an
> xml.dom.Node.py file in 4DOM and having a Node class in the __init__
> would cause problems with the import.

What exactly would those problems be?

> What's wrong with
> 
> from xml.dom.Node import Node
> 
> n.nodeType == Node.ELEMENT_NODE

The problem is that we'd expose xml.dom.Node as a public class as
defined in the DOM, giving the impression that it is the base of all
other DOM classes. Yet, when you do isinstance with a 4DOM object and
that xml.dom.Node, it will fail.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Dec  2 08:05:04 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 2 Dec 2000 09:05:04 +0100
Subject: [XML-SIG] Two minidom patches
In-Reply-To: <E141zgU-0007Xe-00@kronos.cnri.reston.va.us> (message from Andrew
 Kuchling on Fri, 01 Dec 2000 18:33:42 -0500)
References: <E141zgU-0007Xe-00@kronos.cnri.reston.va.us>
Message-ID: <200012020805.JAA00877@loewis.home.cs.tu-berlin.de>

> Anyone want to review them?  

I have just assigned them to me, and will take a look soon.

Regards,
Martin


From dieter@handshake.de  Sun Dec  3 22:32:43 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 3 Dec 2000 23:32:43 +0100
Subject: [XML-SIG] 4XSLT: excessive time complexity (in stylesheet size)
Message-ID: <200012032232.XAA02858@lindm.dm>

I have tried to use 4XSLT to transform an XML/DocBook
document using Norman Walsh's stylesheets.

The stylesheet files have been read in and parsed in about 1 to 2 minutes.
However, the "stylesheet.setup" took about 15 CPU minutes,
before I interrupted it.

I repeated this twice.
In both cases, the interrupt was reported in the
function "getChildNodeIndex". It was looking up the
child index of about the 600th child in the top-level child list with 1000
elements.

Apparently, there is at least quadratic time complexity in the number
of children.
"getChildNodeIndex" seems to be highly responsible for this behaviour.

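The effect described above can be sketched in a few lines.  This is not
4XSLT's code: child_index_linear() and build_index_map() are
hypothetical names, and the sketch only assumes that the index lookup
scans the parent's child list.  Asking for the index of every one of n
children that way costs on the order of n*n/2 comparisons, whereas
building a position table once costs n.

```python
def child_index_linear(children, node):
    """Find a child's position by scanning the list: O(n) per lookup."""
    for i, child in enumerate(children):
        if child is node:
            return i
    raise ValueError("node is not in the child list")


def build_index_map(children):
    """Record every child's position once: O(n) for the whole list."""
    return {id(child): i for i, child in enumerate(children)}


children = [object() for _ in range(1000)]

# One linear scan per child: quadratic in the number of children.
linear = [child_index_linear(children, c) for c in children]

# One table, then constant-time lookups: linear overall.
index_map = build_index_map(children)
mapped = [index_map[id(c)] for c in children]

print(linear == mapped)
```

The table has to be rebuilt or updated whenever the child list changes,
which is the usual price for this kind of cache.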

Dieter


From Taylor.Johnd@emeryworld.com  Mon Dec  4 04:31:52 2000
From: Taylor.Johnd@emeryworld.com (Taylor, John D MWA)
Date: Mon, 4 Dec 2000 04:31:52 -0000
Subject: [XML-SIG] PyXML-0.6.2 install
Message-ID: <B1F9E0D87882D21182F8006094519AD80359367A@mwabs021.emeryworld.com>

Hi,

My name is John Taylor and I'm learning Python/XML via Sean McGrath's truly
great book. I just got permission to install Python on one of our Solaris
boxes, and ....

I just ran (after building PyXML-0.6.2) 'setup.py install' on my Solaris
machine and got the following:

creating /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
copying build/lib.solaris-2.5.1-sun4d-2.0/_xmlplus/utils/__init__.py -> /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
copying build/lib.solaris-2.5.1-sun4d-2.0/_xmlplus/utils/iso8601.py -> /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
copying build/lib.solaris-2.5.1-sun4d-2.0/_xmlplus/utils/qp_xml.py -> /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/utils
byte-compiling /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/__init__.py to __init__.pyc
ld.so.1: python: fatal: relocation error: file python: symbol fseeko: referenced symbol not found

Funny thing is, each time I rerun the install, it gets to the next file,
then it dies (next run it died on __checkversion__.pyc, the next run it went
into ./dom and died after Attr.pyc). If I wanted to rerun this thing about
2000 times, I might just get all the way through.... So I thought I'd check
with you all, just in case someone had run into this before.

Thanks in advance,
John Taylor
MQ-Series Support
BEST Consulting
Portland, OR 97210
(503)450-5984
taylor.johnd@emeryworld.com or jdta@uswest.net


From martin@loewis.home.cs.tu-berlin.de  Mon Dec  4 08:34:32 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 4 Dec 2000 09:34:32 +0100
Subject: [XML-SIG] PyXML-0.6.2 install
In-Reply-To: <B1F9E0D87882D21182F8006094519AD80359367A@mwabs021.emeryworld.com>
 (Taylor.Johnd@emeryworld.com)
References: <B1F9E0D87882D21182F8006094519AD80359367A@mwabs021.emeryworld.com>
Message-ID: <200012040834.JAA00677@loewis.home.cs.tu-berlin.de>

> My name is John Taylor and I'm learning Python/XML via Sean McGrath's truly
> great book. I just got permission to install Python on one of our Solaris
> boxes, and ....

How exactly did you install Python? What compiler did you use, what
configure options did you give, did you tell it to compile all C
modules as *shared* libraries? I recommend against the latter.

> ld.so.1: python: fatal: relocation error: file python: symbol fseeko:
> referenced symbol not found

That appears to be a problem with the Python installation; apparently
importing distutils.util.byte_compile (or running it) results in an
import of an external module which cannot be loaded. It then somehow
still manages to generate the pyc file (although that may be
corrupted), and goes to the next file.

In any case, you probably will have to fix the Python installation, as
whatever the problem is, it probably will re-occur in another context
(other than installing PyXML).

Regards,
Martin


From matt@clondiag.com  Mon Dec  4 09:24:22 2000
From: matt@clondiag.com (Matthias Kirst)
Date: Mon, 04 Dec 2000 10:24:22 +0100
Subject: [XML-SIG] ODBC-XML-Interface
Message-ID: <3A2B62C6.2DF381C1@clondiag.com>

Hi folks,

Is there any Python SAX2 driver available that parses databases via
the Python Database API?

Thanks,

Matthias, CLONDIAG


From Alexandre.Fayolle@logilab.fr  Mon Dec  4 09:57:32 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 4 Dec 2000 10:57:32 +0100 (CET)
Subject: [XML-SIG] location of the PyXML package with python 2.0
Message-ID: <Pine.LNX.4.21.0012041052220.16444-100000@leo.logilab.fr>

Quite a while ago, there was a discussion on how PyXML could avoid a clash
with the built-in xml package in python 2.0 using some deep import voodoo
processing. I cannot recall what the outcome of the discussion was.

In other words, to use PyXML with python2.0, is it necessary to use "from
_xmlplus.dom.ext.reader import Sax2" or can I safely write "from
xml.dom.ext.reader import Sax2" and assume the import voodoo magick is
performed when the xml package is imported?

Thanks for the support.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From noreply@sourceforge.net  Mon Dec  4 13:55:50 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 05:55:50 -0800
Subject: [XML-SIG] [Bug #124375] DbDom/4ODS bug : InitDomDb fails with Dbm backend
Message-ID: <200012041355.FAA30080@sf-web3.vaspecialprojects.com>

Bug #124375, was updated on 2000-Dec-04 05:55
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: ornicar
Assigned to : Nobody
Summary: DbDom/4ODS bug : InitDomDb fails with Dbm backend

Details: Maybe it's only a documentation issue, however running initDomDb gives the following stack trace:

initDomDb test_dom_db
Traceback (innermost last):
  File "/usr/bin/initDomDb", line 4, in ?
    from Ft.DbDom import initDomDb
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/initDomDb.py", line 18, in ?
    from Ft.Ods.StorageManager import Adapters
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/__init__.py", line 15, in ?
    from Ft.Ods.StorageManager.Adapters import g_driverModule
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/__init__.py", line 25, in ?
    SetDriver(os.environ['FTODS_DB_DRIVER'])
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/__init__.py", line 21, in SetDriver
    g_driverModule = __import__("Ft.Ods.StorageManager.Adapters." + g_driverName, globals(), locals(), [g_driverName])
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/Dbm.py", line 21, in ?
    import DbmMappings, DbmHelper
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/DbmHelper.py", line 19, in ?
    from Ft.Lib import DbmDatabase
  File "/usr/lib/python1.5/site-packages/Ft/Lib/DbmDatabase.py", line 154, in ?
    Database()
  File "/usr/lib/python1.5/site-packages/Ft/Lib/DbmDatabase.py", line 93, in __init__
    os.makedirs(self._dbpath)
  File "/usr/lib/python1.5/os.py", line 114, in makedirs
    mkdir(name, mode)
OSError: [Errno 2] No such file or directory: '/var/local/data/ftdatabase/'


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124375&group_id=6473


From noreply@sourceforge.net  Mon Dec  4 14:38:24 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 06:38:24 -0800
Subject: [XML-SIG] [Bug #124380] DbDom: usage for initDomDb shows the wrong executable name
Message-ID: <200012041438.GAA02514@sf-web3.vaspecialprojects.com>

Bug #124380, was updated on 2000-Dec-04 06:38
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom: usage for initDomDb shows the wrong executable name

Details: Here's a patch:

--- initDomDb.py        Mon Dec  4 15:35:42 2000
+++ /usr/lib/python1.5/site-packages/Ft/DbDom/initDomDb.py      Mon Dec  4 15:36:48 2000
@@ -20,8 +20,8 @@
 from Ft.Ods.Tools import _4odb_create
 from Ft.Ods.Parsers.Odl import OdlParse
 
-usage = """initDbDom connString <odlFileLocation>
-   connString the strin to connect to the database with
+usage = """initDomDb connString <odlFileLocation>
+   connString the string to connect to the database with
    odlFileLocation dom.odl, defaults to directory of this file
    """


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124380&group_id=6473


From noreply@sourceforge.net  Mon Dec  4 14:48:39 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 06:48:39 -0800
Subject: [XML-SIG] [Bug #124382] xml.dom.ext.PyExpat.Reader is useless as is.
Message-ID: <200012041448.GAA32017@sf-web2.i.sourceforge.net>

Bug #124382, was updated on 2000-Dec-04 06:48
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : Nobody
Summary: xml.dom.ext.PyExpat.Reader is useless as is.

Details: With the conversion from 4Suite to Python-XML, PyExpat.Reader has not been fully stripped of FourThought references. fromStream still has two references to Ft.Lib code.

Also, PyExpat isn't imported; the code importing it was stripped accidentally, I think. Looking at 4Suite 0.10, the following code should be inserted before line 24:

try:
    #Python 2.0
    import pyexpat
except ImportError:
    #Python 1.x with PyXML
    from xml.parsers import pyexpat


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124382&group_id=6473


From noreply@sourceforge.net  Mon Dec  4 15:08:28 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 07:08:28 -0800
Subject: [XML-SIG] [Bug #124387] DbDom + Dbm fails create_test.py
Message-ID: <200012041508.HAA32454@sf-web2.i.sourceforge.net>

Bug #124387, was updated on 2000-Dec-04 07:08
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom + Dbm fails create_test.py

Details: I tried running  /usr/doc/4Suite-0.10.0/DbDom/test_suite/create_test.py using Dbm as a database backend. It failed on the first commit() statement.

export FT_DATABASE_DIR=/home/alf/DbDom
export FTODS_DB_DRIVER=Dbm
export ODS_TEST_DB=ods_test

$ initDomDb ods_test
$ python create_test.py 
Instance
<DbDom Element Node at 81eb130: name='foo:bar' with 0 attributes and 0 children>
Node Type
1
Prefix
foo
local name
bar
Namespace URI
http://www.foo.com
tag name
foo:bar
ownerDocument
<Ft.DbDom.Dom.DocumentImp instance at 81eab48>
Traceback (innermost last):
  File "create_test.py", line 151, in ?
    test1()
  File "create_test.py", line 41, in test1
    tx.commit()
  File "/usr/lib/python1.5/site-packages/Ft/Ods/Transaction.py", line 91, in commit
    self.checkpoint()
  File "/usr/lib/python1.5/site-packages/Ft/Ods/Transaction.py", line 170, in checkpoint
    self.__storageManager.writeObject(o)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/__init__.py", line 78, in writeObject
    self._dba.writeObject(o)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/StorageManager/Adapters/Dbm.py", line 317, in writeObject
    self._db.insertInto(tableName)[str(oid)] = o._4ods_getFullTuple()
  File "/usr/lib/python1.5/site-packages/Ft/Lib/DbmDatabase.py", line 140, in insertInto
    db = anydbm.open(table_file, WRITEABLE)
  File "/usr/lib/python1.5/anydbm.py", line 80, in open
    raise error, "need 'c' or 'n' flag to open new db"
anydbm.error: need 'c' or 'n' flag to open new db
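
The traceback points at the open flag: a dbm file that does not exist
yet can only be opened with 'c' or 'n'.  A minimal sketch of that
behaviour, using the modern standard-library dbm module as a stand-in
for Python 1.5's anydbm.  The table name and temporary directory are
made up for illustration, the value of WRITEABLE is assumed to be 'w',
and this is not the patch submitted for the bug:

```python
import dbm
import os
import tempfile


def open_table(path, flag):
    """Try to open a dbm file with the given flag; report success."""
    try:
        db = dbm.open(path, flag)
    except dbm.error:
        # dbm.error is a tuple of exception classes, so this catches
        # the "need 'c' or 'n' flag" style failure as well as OSError.
        return False
    db.close()
    return True


# Hypothetical table file inside a fresh, empty directory.
table_file = os.path.join(tempfile.mkdtemp(), "hypothetical_table")

opened_for_write = open_table(table_file, "w")   # file is missing
opened_for_create = open_table(table_file, "c")  # 'c' creates it
print(opened_for_write, opened_for_create)
```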


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124387&group_id=6473


From noreply@sourceforge.net  Mon Dec  4 16:55:07 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 4 Dec 2000 08:55:07 -0800
Subject: [XML-SIG] [Patch #102641] patch for bug #124387
Message-ID: <200012041655.IAA02336@sf-web2.i.sourceforge.net>

Patch #102641 has been updated. 

Project: pyxml
Category: None
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: patch for bug #124387

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102641&group_id=6473


From fdrake@acm.org  Mon Dec  4 18:12:30 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Dec 2000 13:12:30 -0500 (EST)
Subject: [XML-SIG] location of the PyXML package with python 2.0
In-Reply-To: <Pine.LNX.4.21.0012041052220.16444-100000@leo.logilab.fr>
References: <Pine.LNX.4.21.0012041052220.16444-100000@leo.logilab.fr>
Message-ID: <14891.56974.204543.338570@cj42289-a.reston1.va.home.com>

Alexandre Fayolle writes:
 > In other words, to use PyXML with python2.0, is it necessary to use "from
 > _xmlplus.dom.ext.reader import Sax2" or can I safely write "from
 > xml.dom.ext.reader import Sax2" and assume the import voodoo magick is
 > performed when the xml package is imported ?

  Use the latter.  This will raise ImportError if PyXML is not
installed.
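
  One way to act on that ImportError is to guard the import, so the
calling code can degrade gracefully when PyXML is absent.  A minimal
sketch; the HAVE_PYXML flag is a name chosen here for illustration:

```python
try:
    # Resolved through the xml package, which hands over to PyXML
    # when PyXML is installed.
    from xml.dom.ext.reader import Sax2
    HAVE_PYXML = True
except ImportError:
    # An installation without PyXML ends up here.
    Sax2 = None
    HAVE_PYXML = False

print("PyXML available:", HAVE_PYXML)
```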


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fredrik@effbot.org  Mon Dec  4 19:07:58 2000
From: fredrik@effbot.org (Fredrik Lundh)
Date: Mon, 4 Dec 2000 20:07:58 +0100
Subject: [XML-SIG] sax parser leaks memory?
Message-ID: <001001c05e25$87e7cc10$3c6340d5@hagrid>

on my windows box, this little script runs out of memory
within 30 seconds or so...

import xml.sax, xml.sax.handler

class myHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        pass # print "START", name, attrs.items()
    def endElement(self, name):
        pass # print "END", name
    def characters(self, content):
        pass # print "DATA", content

while 1:
    p = xml.sax.make_parser()
    p.setContentHandler(myHandler())
    p.feed("<xml>hello</xml>")
    p.close()
    del p

what am I doing wrong?  or is this what I think it is...

</F>


From fdrake@acm.org  Mon Dec  4 19:19:34 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Dec 2000 14:19:34 -0500 (EST)
Subject: [XML-SIG] confusability ...
Message-ID: <14891.60998.848950.528003@cj42289-a.reston1.va.home.com>

  I've been poring over the DOM spec the last few days.  Now, I'm
confused.  ;)
  When the recommendation refers to the "name" of a node, does it
refer to the qualified name?  From the text, I'd take it that I should
be looking at "prefix:localName" when it says "name" -- is that
correct?  Or should I only be thinking of this as localName?
  Thanks!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From dieter@handshake.de  Mon Dec  4 20:04:21 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Mon, 4 Dec 2000 21:04:21 +0100 (CET)
Subject: [XML-SIG] PyXML-0.6.2 install
In-Reply-To: <B1F9E0D87882D21182F8006094519AD80359367A@mwabs021.emeryworld.com>
References: <B1F9E0D87882D21182F8006094519AD80359367A@mwabs021.emeryworld.com>
Message-ID: <14891.63685.106639.569043@lindm.dm>

Taylor, John D MWA writes:
 > /export/home/jdtaylor/python/lib/python2.0/site-packages/_xmlplus/__init__.p
 > y to __init__.pyc
 > ld.so.1: python: fatal: relocation error: file python: symbol fseeko:
 > referenced symbol not found
You have a Python compiled for large file support (>= Solaris 2.6).
You try to run it on a system without "fseeko" in the standard
library (< Solaris 2.6).

Your options:

  *  find a Python binary compiled for Solaris 2.5 or below

  *  fetch the Python source and compile it yourself
     (is very easy)

  *  upgrade your Solaris


Dieter


From fredrik@effbot.org  Mon Dec  4 20:46:53 2000
From: fredrik@effbot.org (Fredrik Lundh)
Date: Mon, 4 Dec 2000 21:46:53 +0100
Subject: [XML-SIG] Re: sax parser leaks memory?
Message-ID: <000501c05e33$573184e0$3c6340d5@hagrid>

I wrote:
> on my windows box, this little script runs out of memory
> within 30 seconds or so...

here's another example:

from xml.parsers import expat

while 1:
    p = expat.ParserCreate()

</F>


From martin@loewis.home.cs.tu-berlin.de  Mon Dec  4 23:06:07 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 00:06:07 +0100
Subject: [XML-SIG] ODBC-XML-Interface
In-Reply-To: <3A2B62C6.2DF381C1@clondiag.com> (message from Matthias Kirst on
 Mon, 04 Dec 2000 10:24:22 +0100)
References: <3A2B62C6.2DF381C1@clondiag.com>
Message-ID: <200012042306.AAA00767@loewis.home.cs.tu-berlin.de>

> Is there any Python SAX2-Driver available that parses Databases via
> Python Database API.

I guess the answer to that question is "no"; I couldn't really tell
what "parsing a database" would mean when it comes to XML files. An
XML file is a byte sequence in some specific format, and a parser
analyses its structure. A database typically is a byte sequence (or
several of them) in a totally different structure, and a DBMS is used
to access the bytes.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Dec  4 23:04:01 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 00:04:01 +0100
Subject: [XML-SIG] location of the PyXML package with python 2.0
In-Reply-To: <Pine.LNX.4.21.0012041052220.16444-100000@leo.logilab.fr>
 (message from Alexandre Fayolle on Mon, 4 Dec 2000 10:57:32 +0100
 (CET))
References: <Pine.LNX.4.21.0012041052220.16444-100000@leo.logilab.fr>
Message-ID: <200012042304.AAA00766@loewis.home.cs.tu-berlin.de>

> Quite a while ago, there was a discussion on how PyXML could avoid a
> clash with the built-in xml package in python 2.0 using some deep
> import voodoo processing. I cannot recall what the outcome of the
> discussion was.

The voodoo magic was applied, "import xml.something" will import PyXML
if installed, and Python 2.0 xml otherwise. See Python's
xml/__init__.py if you want to know how this exactly works - there
isn't much magic behind it, really.
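
The general technique (a package that swaps its own sys.modules entry
for an add-on package when that add-on can be imported) can be sketched
as follows.  This does not reproduce xml/__init__.py; the module names
basepkg and _basepkg_plus and the install_override helper are
hypothetical stand-ins for xml, _xmlplus and the package's start-up
code:

```python
import sys
import types

# Hypothetical stand-ins: "basepkg" plays the role of Python's xml
# package, "_basepkg_plus" the role of PyXML's _xmlplus.
plus = types.ModuleType("_basepkg_plus")
plus.origin = "add-on"
sys.modules["_basepkg_plus"] = plus

base = types.ModuleType("basepkg")
base.origin = "standard library"
sys.modules["basepkg"] = base


def install_override(name, override_name):
    """If the add-on can be imported, make `name` refer to it."""
    try:
        override = __import__(override_name)
    except ImportError:
        return False          # add-on missing: keep the original
    sys.modules[name] = override
    return True


replaced = install_override("basepkg", "_basepkg_plus")
not_replaced = install_override("basepkg", "_no_such_addon_package")

import basepkg                # now resolves to the add-on module
print(replaced, not_replaced, basepkg.origin)
```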

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Dec  4 23:15:19 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 00:15:19 +0100
Subject: [XML-SIG] PyXML-0.6.2 install
In-Reply-To: <14891.63685.106639.569043@lindm.dm> (message from Dieter Maurer
 on Mon, 4 Dec 2000 21:04:21 +0100 (CET))
References: <B1F9E0D87882D21182F8006094519AD80359367A@mwabs021.emeryworld.com> <14891.63685.106639.569043@lindm.dm>
Message-ID: <200012042315.AAA00833@loewis.home.cs.tu-berlin.de>

> You have a Python compiled for large file support (>= Solaris 2.6).
> You try to run it on a system without "fseeko" in the standard
> library (< Solaris 2.6).
> 
> Your options:
> 
>   *  find a Python binary compiled for Solaris 2.5 or below
> 
>   *  fetch the Python source and compile it yourself
>      (is very easy)
> 
>   *  upgrade your Solaris

Thanks for this clear analysis (although I'd like confirmation from
the original poster that this is indeed the problem). Perhaps you can
post it on the Python 2.0 MoinMoin?

Regards,
Martin


From Reza Naima <reza@reza.net>  Mon Dec  4 23:40:56 2000
From: Reza Naima <reza@reza.net> (Reza Naima)
Date: Mon, 4 Dec 2000 15:40:56 -0800
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
Message-ID: <20001204154056.K25116@reza.net>

I'm using PyXML to parse an XML document, modify it, and spit it back
out.  I'm having a lame problem.  It seems as if this third-party
software is not working properly, and I need to work around it.  Their
problem is that they have an element that looks like

<element attribute1='false' attribute2='false' attribute3='false'>

Well, after I parse it, I will occasionally change attribute1 to be
'true'.  After generating the XML from the DOM, it prints out like
this:

<element attribute2='false' attribute3='false' attribute1='true'>

Now, the 3rd party software is broken and rather than looking for
attribute1, it just assumes that attribute1 is the first attribute, and
mistakenly reads it as 'false'.  (it's actually attribute2 that it's
reading).

Now, there are two work-arounds... First off, would there be a way for me
to guarantee that attribute1 is first on the list of attributes for that
element. 

The other work-around is to get rid of attribute2 and attribute3.  This
works, but it seems as if PyXML looks at the DTD spec, notices that they
are missing, and fills them in.  So, I'd like to find a way to get
PyXML to ignore the DTD.

Are either of these options possible?  I've started going through the
source, but it's getting uglier and uglier..

Thanks,
Reza


From martin@loewis.home.cs.tu-berlin.de  Mon Dec  4 23:40:14 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 00:40:14 +0100
Subject: [XML-SIG] confusability ...
In-Reply-To: <14891.60998.848950.528003@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <14891.60998.848950.528003@cj42289-a.reston1.va.home.com>
Message-ID: <200012042340.AAA00966@loewis.home.cs.tu-berlin.de>

>   When the recommendation refers to the "name" of a node, does it
> refer to the qualified name?  From the text, I'd take it that I should
> be looking at "prefix:localName" when it says "name" -- is that
> correct?  Or should I only be thinking of this as localName?

You mean, e.g. as the parameter tagName to createElement? In that
case, neither - think "namespace unaware". All of localName,
prefix and namespaceURI will be None in the Element node being created.

Or do you mean the description of the tagName attribute for Element?
In that case, it would depend on whether the Element was created through
createElement or createElementNS - for either case, its content is
well-defined.
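
The difference between the two factory methods can be seen in a short
sketch.  It uses the standard library's xml.dom.minidom as a stand-in
DOM (an assumption; other implementations may differ in detail) and
only inspects tagName, namespaceURI and prefix:

```python
from xml.dom import minidom

doc = minidom.Document()

# Namespace-unaware creation: the colon is just part of the name.
plain = doc.createElement("foo:bar")

# Namespace-aware creation: the qualified name is split into parts.
aware = doc.createElementNS("http://www.foo.com", "foo:bar")

for node in (plain, aware):
    print(node.tagName, node.namespaceURI, node.prefix)
```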

Apart from these two occurrences, I can't find any phrase that
resembles "name of a node" that isn't also qualified as, e.g. "local
name of a node".

In general, I believe the intent is that the tagName attribute is the
string of the tag as it appeared literally in the XML document (if the DOM
tree was created through parsing).

If you were looking at some other text, please tell us what that was?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Dec  5 00:04:47 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 01:04:47 +0100
Subject: [XML-SIG] Re: sax parser leaks memory?
In-Reply-To: <000501c05e33$573184e0$3c6340d5@hagrid> (fredrik@effbot.org)
References: <000501c05e33$573184e0$3c6340d5@hagrid>
Message-ID: <200012050004.BAA01145@loewis.home.cs.tu-berlin.de>

> > on my windows box, this little script runs out of memory
> > within 30 seconds or so...
> 
> here's another example:

Thanks for the report. Here is a patch.

Regards,
Martin

P.S. It seems like pyexpat also needs to be told about garbage
collection...

Index: pyexpat.c
===================================================================
RCS file: /cvsroot/pyxml/xml/extensions/pyexpat.c,v
retrieving revision 1.16
diff -u -r1.16 pyexpat.c
--- pyexpat.c	2000/11/02 04:57:40	1.16
+++ pyexpat.c	2000/12/05 00:00:33
@@ -680,6 +680,7 @@
     for (i=0; handler_info[i].name != NULL; i++) {
         Py_XDECREF(self->handlers[i]);
     }
+    free (self->handlers);
 #if PY_MAJOR_VERSION == 1 && PY_MINOR_VERSION < 6
     /* Code for versions before 1.6 */
     free(self);


From martin@loewis.home.cs.tu-berlin.de  Tue Dec  5 00:11:50 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 01:11:50 +0100
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
In-Reply-To: <20001204154056.K25116@reza.net> (message from Reza Naima on Mon,
 4 Dec 2000 15:40:56 -0800)
References: <20001204154056.K25116@reza.net>
Message-ID: <200012050011.BAA01196@loewis.home.cs.tu-berlin.de>

> Now, there are two work-arounds... First off, would there be a way for me
> to guarantee that attribute1 is first on the list of attributes for that
> element. 

That shouldn't be hard to achieve if you use the xml.dom.ext.Printer
framework - just subclass the PrintVisitor (or the PrettyPrintVisitor)
and replace the visitNameNodeMap method. That iterates over the
attributes in the order they have in the dictionary; you could sort
them (lexically) before that.
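
For readers without the 4DOM Printer at hand, the same idea (choose
the attribute order at serialization time) can be sketched with the
standard library's xml.dom.minidom.  This is not the PrintVisitor
subclass itself; the start_tag helper and its `first` parameter are
names invented for illustration:

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def start_tag(element, first=()):
    """Build the start tag, emitting the names in `first` before the
    remaining attributes, which are sorted lexically."""
    names = list(element.attributes.keys())
    ordered = [n for n in first if n in names]
    ordered += sorted(n for n in names if n not in first)
    parts = [element.tagName]
    for name in ordered:
        parts.append("%s=%s" % (name, quoteattr(element.getAttribute(name))))
    return "<" + " ".join(parts) + ">"


doc = minidom.parseString(
    "<element attribute2='false' attribute3='false' attribute1='true'/>")
tag = start_tag(doc.documentElement, first=("attribute1",))
print(tag)
```

Plain lexical sorting happens to be enough for attribute1..attribute3;
the `first` parameter covers the case where the required order is not
alphabetical.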

> The other work-around is to get rid of attribute2 and attribute3.  This
> works, but it seems as if PyXML looks at the DTD spec, notices that they
> are missing, and fills them in.  So, I'd like to find a way to get
> PyXML to ignore the DTD.

I'm surprised it looks into the DTD. During parsing, you mean? Then
you probably use xmlproc as the parser, which is validating. If you'd
use pyexpat (or some other non-validating parser), it couldn't
possibly use the DTD.

Regards,
Martin


From iron@mso.oz.net  Tue Dec  5 00:27:44 2000
From: iron@mso.oz.net (Mike Orr)
Date: Mon, 4 Dec 2000 16:27:44 -0800
Subject: [XML-SIG] ODBC-XML-Interface
In-Reply-To: <200012042306.AAA00767@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Dec 05, 2000 at 12:06:07AM +0100
References: <3A2B62C6.2DF381C1@clondiag.com> <200012042306.AAA00767@loewis.home.cs.tu-berlin.de>
Message-ID: <20001204162744.B2465@mso.oz.net>

On Tue, Dec 05, 2000 at 12:06:07AM +0100, Martin v. Loewis wrote:
> > Is there any Python SAX2-Driver available that parses Databases via
> > Python Database API.
> 
> I guess the answer to that question is "no"; I couldn't really tell
> what "parsing a database" would mean when it comes to XML files. An
> XML file is a byte sequence in some specific format, and a parser
> analyses its structure. A database typically is a byte sequence (or
> several of them) in a totally different structure, and a DBMS is used
> to access the bytes.

Dunno if this'll help, but just in case...

I've been thinking for a while about XML's relationship to databases and
evaluating its use as an "editing UI" for the (MySQL) databases.  It
would involve converting a database structure to XML and back, although
not using the Database API for the XML part.  My idea was to use a list
(the rows) of dictionaries (each record) as the intermediate format and
to make it "generic" for a variety of databases.  In this case, one
level of XML tags would correspond to the records, and the child level
would be the fields.  A parent level could then mean "tables", if that
was desired.  (And the script would then have to check referential
integrity after the edit.)  I've done a few prototype tests and am
undecided whether to proceed at this point.
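
The mapping described above (table element, one child per record, one
grandchild per field) can be sketched roughly like this.  It is not the
prototype mentioned in this message; the element name "record", the
function names and the sample data are assumptions, and every value is
treated as a string:

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table, rows):
    """Turn a list of dictionaries into <table><record><field>..."""
    root = ET.Element(table)
    for row in rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(record, field).text = str(value)
    return root


def xml_to_rows(root):
    """Inverse mapping: one dictionary per record element."""
    return [{field.tag: field.text for field in record} for record in root]


rows = [
    {"name": "McCoy", "phone": "555-0100"},
    {"name": "McKay", "phone": "555-0199"},
]
text = ET.tostring(rows_to_xml("phone_list", rows), encoding="unicode")
round_trip = xml_to_rows(ET.fromstring(text))
print(text)
```

Field names must be valid XML element names for this to work, which is
one of the places where a "generic" converter needs extra rules.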

It's hard to imagine how one would write a Database API driver for XML.
XML has no native concept of "this is a record level" and "this is a
field level"; the application or DTD has to infer this.  XML just has
an arbitrary nesting of tags.  So for a database driver to extract
	SELECT name, phone FROM contact_manager.phone_list
	WHERE name LIKE "Mc%" ORDER BY name
from an XML file, the file would have to conform to a specific DTD, it
couldn't be just any XML file.  At that point, one wonders whether
perhaps either XML or the Database API should be thrown out of this
project.  Because either the project belongs more naturally to one or
to the other.

-- 
-Mike (Iron) Orr, iron@mso.oz.net  (if mail problems: mso@jimpick.com)
   http://mso.oz.net/     English * Esperanto * Russkiy * Deutsch * Espan~ol


From rsalz@caveosystems.com  Tue Dec  5 00:53:43 2000
From: rsalz@caveosystems.com (Rich Salz)
Date: Mon, 04 Dec 2000 19:53:43 -0500
Subject: [XML-SIG] ODBC-XML-Interface
References: <3A2B62C6.2DF381C1@clondiag.com> <200012042306.AAA00767@loewis.home.cs.tu-berlin.de> <20001204162744.B2465@mso.oz.net>
Message-ID: <3A2C3C97.49B16532@caveosystems.com>

> I've been thinking for a while about XML's relationship to databases and
> evaluating its use as an "editing UI" for the (MySQL) databases.  It
> would involve converting a database structure to XML and back, although
> not using the Database API for the XML part.

You might want to poke around microsoft.com and see how they're integrating
xml, sqlserver, etc.   query the scheme and write the DTD/schema on the fly.
replace SQL queries with xpath, etc.  parts are pretty cool.  should be some
good ideas there.
	/r$


From Reza Naima <reza@reza.net>  Tue Dec  5 01:12:24 2000
From: Reza Naima <reza@reza.net> (Reza Naima)
Date: Mon, 4 Dec 2000 17:12:24 -0800
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
In-Reply-To: <200012050011.BAA01196@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Dec 05, 2000 at 01:11:50AM +0100
References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de>
Message-ID: <20001204171224.M25116@reza.net>

On Tue, Dec 05, 2000 at 01:11:50AM +0100, Martin v. Loewis sent me this...
> > Now, there are two work-arounds... First off, would there be a way for me
> > to guarantee that attribute1 is first on the list of attributes for that
> > element. 
> 
> That shouldn't be hard to achieve if you use the xml.dom.ext.Printer
> framework - just subclass the PrintVisitor (or the PrettyPrintVisitor)
> and replace the visitNameNodeMap method. That iterates over the
> attributes in the order they have in the dictionary; you could sort
> them (lexically) before that.

I'm lost here.. I can't find anything called PrintVisitor or
PrettyPrintVisitor in any of the PyXML Code :

reza@gooz:/usr/local/src/PyXML-0.5.4 > find . -type f -print | xargs
grep -i NameNodeMap
reza@gooz:/usr/local/src/PyXML-0.5.4 > find . -type f -print | xargs
grep -i PrettyPrintVisitor
reza@gooz:/usr/local/src/PyXML-0.5.4 > find . -type f -print | xargs
grep -i PrintVisitor
reza@gooz:/usr/local/src/PyXML-0.5.4 > 

> 
> > The other work-around is to get rid of attribute2 and attribute3.  This
> > works, but it seems as if PyXML looks at the DTD spec, notices that they
> > are missing, and fills them in.  So, I'd like to find a way to get
> > PyXML to ignore the DTD.
> 
> I'm surprised it looks into the DTD. During parsing, you mean? Then
> you probably use xmlproc as the parser, which is validating. If you'd
> use pyexpat (or some other non-validating parser), it couldn't
> possibly use the DTD.

I tried to specify pyexpat as the parser :

-------------
from xml.dom import core, utils
import sys

fr = utils.FileReader()
path = sys.argv[1]
file = open(path, 'r')
document = fr.readXml(file, 'pyexpat')
print document.toxml()
---------------

and I got this exception thrown :

---------------
# /lc/bin/python /tmp/test.py /var/tmp/JUNIPER.xml 
Traceback (innermost last):
  File "/tmp/test.py", line 7, in ?
    document = fr.readXml(file, 'pyexpat')
  File "/lc/blackshadow/PyXML/xml/dom/utils.py", line 162, in readXml
    p = saxexts.make_parser(parserName)
  File "/lc/blackshadow/PyXML/xml/sax/saxexts.py", line 159, in make_parser
    return XMLParserFactory.make_parser(parser)
  File "/lc/blackshadow/PyXML/xml/sax/saxexts.py", line 65, in make_parser
    raise saxlib.SAXException("No parsers found",None)
xml.sax.saxlib.SAXException: No parsers found
-------------

Am I doing something wrong?

Thanks,
Reza


From tpassin@home.com  Tue Dec  5 01:22:24 2000
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 4 Dec 2000 20:22:24 -0500
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de>
Message-ID: <004701c05e59$d2d441c0$7cac1218@reston1.va.home.com>

Martin v. Loewis wrote -

>
> I'm surprised it looks into the DTD. During parsing, you mean? Then
> you probably use xmlproc as the parser, which is validating. If you'd
> use pyexpat (or some other non-validating parser), it couldn't
> possibly use the DTD.
>
 What, don't the default parsers read the internal subset and insert default
values?  I never tried it, but always assumed they did (it's allowed by the
Rec for non-validating parsers).

Tom P


From tpassin@home.com  Tue Dec  5 01:27:01 2000
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 4 Dec 2000 20:27:01 -0500
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
References: <20001204154056.K25116@reza.net>
Message-ID: <005101c05e5a$77c460c0$7cac1218@reston1.va.home.com>

Reza Naima wrote -

> ...
> The other work-around is to get rid of attribute2 and attribute3.  This
> works, but it seems as if PyXML looks at the DTD spec, notices that they
> are missing, and fills them in.  So, I'd like to find a way to get
> PyXML to ignore the DTD.
>

If you can change the DTD, you could make these attributes #IMPLIED without
any default values.  Then the parser shouldn't be adding them.

Martin's solution of sorting would only work if your "broken" 3rd party
software wants to see alphabetical order.  Fundamentally, XML attributes are
never guaranteed to be in any particular order - basically, they are a set,
not a list.

Cheers,

Tom P


From Reza Naima <reza@reza.net>  Tue Dec  5 02:23:41 2000
From: Reza Naima <reza@reza.net> (Reza Naima)
Date: Mon, 4 Dec 2000 18:23:41 -0800
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
In-Reply-To: <005101c05e5a$77c460c0$7cac1218@reston1.va.home.com>; from tpassin@home.com on Mon, Dec 04, 2000 at 08:27:01PM -0500
References: <20001204154056.K25116@reza.net> <005101c05e5a$77c460c0$7cac1218@reston1.va.home.com>
Message-ID: <20001204182341.O25116@reza.net>

Alas, I don't want to touch the DTD as it will break the 3rd party
software.  

-r

On Mon, Dec 04, 2000 at 08:27:01PM -0500, Thomas B. Passin sent me this...
> Reza Naima wrote -
> 
> > ...
> > The other work-around is to get rid of attribute2 and attribute3.  This
> > works, but it seems as if PyXML looks at the DTD spec, notices that they
> > are missing, and fills them in.  So, I'd like to find a way to get
> > PyXML to ignore the DTD.
> >
> 
> If you can change the DTD, you could make these attributes #IMPLIED without
> any default values.  Then the parser shouldn't be adding them.
> 
> Martin's solution of sorting would only work if your "broken" 3rd party
> software wants to see alphabetical order.  Fundamentally, XML attributes are
> never guaranteed to be in any particular order - basically, they are a set,
> not a list.


From martin@loewis.home.cs.tu-berlin.de  Tue Dec  5 08:28:15 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 09:28:15 +0100
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
In-Reply-To: <20001204171224.M25116@reza.net> (message from Reza Naima on Mon,
 4 Dec 2000 17:12:24 -0800)
References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> <20001204171224.M25116@reza.net>
Message-ID: <200012050828.JAA00752@loewis.home.cs.tu-berlin.de>

> I'm lost here.. I can't find anything called PrintVisitor or
> PrettyPrintVisitor in any of the PyXML Code :

Yes, that's part of 4DOM, which only appears in PyXML 0.6.

> Am I doing something wrong?

Probably, although I can't tell what it is - I don't know the
signature of readXml.

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Tue Dec  5 08:37:04 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 5 Dec 2000 09:37:04 +0100 (CET)
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
In-Reply-To: <20001204171224.M25116@reza.net>
Message-ID: <Pine.LNX.4.21.0012050935460.18579-100000@leo.logilab.fr>

On Mon, 4 Dec 2000, Reza Naima wrote:

> I'm lost here.. I can't find anything called PrintVisitor or
> PrettyPrintVisitor in any of the PyXML Code :

Try upgrading to the latest release of PyXML (0.6.2 if I'm not
mistaken). This might require changing some code since the DOM
implementation has changed.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From martin@loewis.home.cs.tu-berlin.de  Tue Dec  5 08:53:26 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 09:53:26 +0100
Subject: [XML-SIG] Dissabling DTDs or arranging the Attribute order
In-Reply-To: <004701c05e59$d2d441c0$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <20001204154056.K25116@reza.net> <200012050011.BAA01196@loewis.home.cs.tu-berlin.de> <004701c05e59$d2d441c0$7cac1218@reston1.va.home.com>
Message-ID: <200012050853.JAA00940@loewis.home.cs.tu-berlin.de>

>  What, don't the default parsers read the internal subset and insert
> default values?  I never tried it, but always assumed they did (it's
> allowed by the Rec for non-validating parsers).

Indeed, at least pyexpat does. I was assuming there is an external
subset in the original poster's problem; it makes more sense to assume
that it was the internal subset.

In that case, I don't see a way to stop the parser from filling in the
default values.
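
For readers on a later Python: the pyexpat parser objects shipped with
later Python releases have a specified_attributes flag that limits the
reported attributes to those written in the document instance.  Whether
the pyexpat discussed in this thread offers it is not established here.
A minimal sketch under that assumption, with a made-up document and a
hypothetical parse_attributes helper:

```python
from xml.parsers import expat

# Made-up document: the internal subset declares defaults for
# attribute2 and attribute3, the instance only writes attribute1.
DOC = b"""<!DOCTYPE element [
  <!ATTLIST element attribute2 CDATA "false"
                    attribute3 CDATA "false">
]>
<element attribute1="true"/>"""


def parse_attributes(data, only_specified):
    """Return the attributes expat reports for the root element."""
    seen = {}
    parser = expat.ParserCreate()
    parser.specified_attributes = only_specified
    parser.StartElementHandler = lambda name, attrs: seen.update(attrs)
    parser.Parse(data, True)
    return seen


with_defaults = parse_attributes(DOC, False)
without_defaults = parse_attributes(DOC, True)
print(sorted(with_defaults), sorted(without_defaults))
```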

Regards,
Martin


From paul@prescod.net  Tue Dec  5 08:58:58 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 05 Dec 2000 03:58:58 -0500
Subject: [XML-SIG] Specializing DOM exceptions
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
Message-ID: <3A2CAE52.999C4EE5@prescod.net>

Sorry for being long-delayed in writing this.

"Martin v. Loewis" wrote:
> 
> I'd like to propose an enhancement to the DOM exception classes,
> namely that different codes are mapped to different subclasses:
> 
> class IndexSizeErr(DOMException):
>       code = INDEX_SIZE_ERR

> Also, I'd like to make DOMException, the code constants, and the
> derived classes part of the official Python API, so all DOM
> implementations use the same set of exceptions.

My concern is that Python already has an IndexError and it is raised
"naturally" (and efficiently) in a lot of places in minidom. At one
point we had talked about formalizing a mechanism where Python
exceptions stand for DOM exceptions.

So IndexSizeErr could be a subclass of Python's IndexError. Python
"clients" could check for IndexError as they would in any other Python
code. Those that want to treat the DOM stuff specially could do so. This
would all be part of the Python-DOM mapping.
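
A minimal sketch of that idea, with locally defined classes rather than
the ones in xml.dom; the item() helper is hypothetical and only there
to show where the translation from IndexError could happen:

```python
INDEX_SIZE_ERR = 1  # DOM code for an out-of-range index


class DOMException(Exception):
    code = None


class IndexSizeErr(DOMException, IndexError):
    """Both a DOM exception and an ordinary Python IndexError."""
    code = INDEX_SIZE_ERR


def item(nodes, index):
    """Hypothetical accessor that reports bad indexes the DOM way."""
    try:
        return nodes[index]
    except IndexError:
        raise IndexSizeErr("index %d out of range" % index)


try:
    item(["a", "b"], 5)
except IndexError as exc:      # plain Python clients catch it here
    caught = exc

print(type(caught).__name__, caught.code)
```

Clients that care about the DOM specifics can catch DOMException and
look at the code attribute instead.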

 Paul Prescod


From noreply@sourceforge.net  Tue Dec  5 10:36:03 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 5 Dec 2000 02:36:03 -0800
Subject: [XML-SIG] [Bug #124521] 4ODS : transaction.begin() throws unexpected exception
Message-ID: <200012051036.CAA22000@sf-web2.i.sourceforge.net>

Bug #124521, was updated on 2000-Dec-05 02:36
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: 4ODS : transaction.begin() throws unexpected exception

Details: 
When calling begin() on a transaction that has just been committed, a TransactionInProgress exception is raised. My reading of the ODMG C++ and Java bindings (pp. 179 and 252) is that this should not occur. I'm using 4Suite 0.10.0.


>>> from Ft.DbDom import Dom
>>> from Ft.Ods import Database
>>> from xml.dom import ext
>>> import sys, os
>>> DBNAME=os.environ.get("ODS_TEST_DB","ods:test")
>>> db = Database.Database()
>>> db.open(DBNAME)
>>> tx = db.new()
>>> tx.begin()
>>> from Ft.DbDom import Reader
>>> r = Reader.Reader()
>>> f = open('/home/alf/memory.xml') # or some other file
>>> doc = r.fromStream(f)
>>> db.bind(doc,'memory')
>>> tx.commit()
>>> tx.begin()
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/Ft/Ods/Transaction.py", line 54, in begin
    raise TransactionInProgress()
Ft.Ods.Transaction.TransactionInProgress: <Ft.Ods.Transaction.TransactionInProgress instance at 84f2ea0>


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124521&group_id=6473


From larsga@garshol.priv.no  Tue Dec  5 11:44:29 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Dec 2000 12:44:29 +0100
Subject: [XML-SIG] sax parser leaks memory?
In-Reply-To: <001001c05e25$87e7cc10$3c6340d5@hagrid>
References: <001001c05e25$87e7cc10$3c6340d5@hagrid>
Message-ID: <m3bsurdrdu.fsf@lambda.garshol.priv.no>

* Fredrik Lundh
|
| on my windows box, this little script runs out of memory
| within 30 seconds or so...

There is nothing wrong with the script, so there must be a memory leak
somewhere. I did a similar test where I used pyexpat directly:

import pyexpat

while 1:
  p = pyexpat.ParserCreate()
  p.Parse("<doc>This is a little document</doc>", 1)
  del p

and that also leaked memory. (Incidentally, it crashed my Win98 box so
hard I had to physically turn it off and back on again.) So apparently
the leak is in pyexpat somewhere.

I tried running Plumbo on your application, but it couldn't find any
cycles.

--Lars M.


From noreply@sourceforge.net  Tue Dec  5 12:20:50 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 5 Dec 2000 04:20:50 -0800
Subject: [XML-SIG] [Bug #124529] DbDom : Dom.py uses DOMError which is not declared
Message-ID: <200012051220.EAA23700@sf-web2.i.sourceforge.net>

Bug #124529, was updated on 2000-Dec-05 04:20
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom : Dom.py uses DOMError which is not declared

Details: 

Here's a patch:

--- /home/alf/4Suite-0.10/DbDom/Dom.py  Fri Nov 17 00:05:37 2000
+++ /usr/lib/python1.5/site-packages/Ft/DbDom/Dom.py    Tue Dec  5 12:50:13 2000
@@ -24,6 +24,14 @@
 from Ft.DbDom import Comment
 from Ft.DbDom import ProcessingInstruction
 
+from xml.dom import DOMException
+from xml.dom import INDEX_SIZE_ERR,DOMSTRING_SIZE_ERR,HIERARCHY_REQUEST_ERR
+from xml.dom import WRONG_DOCUMENT_ERR,INVALID_CHARACTER_ERR,NO_DATA_ALLOWED_ERR
+from xml.dom import NO_MODIFICATION_ALLOWED_ERR,NOT_FOUND_ERR,NOT_SUPPORTED_ERR
+from xml.dom import INUSE_ATTRIBUTE_ERR,INVALID_STATE_ERR,SYNTAX_ERR
+from xml.dom import INVALID_MODIFICATION_ERR,NAMESPACE_ERR,INVALID_ACCESS_ERR
+
+
 
 from Ft.Ods.Collections import LiteralListOfObjects


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124529&group_id=6473


From noreply@sourceforge.net  Tue Dec  5 12:36:42 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 5 Dec 2000 04:36:42 -0800
Subject: [XML-SIG] [Bug #124531] DbDom : reader fails when passed an owner document
Message-ID: <200012051236.EAA23968@sf-web2.i.sourceforge.net>

Bug #124531, was updated on 2000-Dec-05 04:36
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom : reader fails when passed an owner document

Details: The version is 4Suite0.10.0 with the patches I posted today applied.

Here's a sample test session:

>>> DBNAME = 'ods:alf@orion:5432:dom_test'
>>> from Ft.DbDom import Dom
>>> from Ft.Ods import Database
>>> from xml.dom import ext
>>> import sys, os
>>> db = Database.Database()
>>> db.open(DBNAME)
>>> tx = db.new()
>>> tx.begin()
>>> doc = Dom.DocumentImp()
>>> e = doc.createElementNS('','root')
>>> doc.appendChild(e)
<DbDom Element Node at 820da60: name='root' with 0 attributes and 0 children>
>>> fragment = '<node1/><node2/>'
>>> from Ft.DbDom import Reader
>>> r = Reader.Reader()
>>> r.fromString(fragment,doc)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString
    rt = self.fromStream(stream, ownerDoc)
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Reader.py", line 27, in fromStream
    Sax2.Reader.fromStream(self,stream,ownerDocument=ownerDocument)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 267, in fromStream
    self.parser.parseFile(stream)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 68, in parseFile
    if self.parser.Parse(buf, 0) != 1:
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 49, in endElement
    self.doc_handler.endElement(name)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 170, in endElement
    self._nodeStack[-1].appendChild(new_element)
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Dom.py", line 336, in appendChild
    raise DOMException(HIERARCHY_REQUEST_ERR)
xml.dom.DOMException: DOM Error Code 3: Node manipulation results in invalid parent/child relationship.


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124531&group_id=6473


From noreply@sourceforge.net  Tue Dec  5 12:59:08 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 5 Dec 2000 04:59:08 -0800
Subject: [XML-SIG] [Patch #102658] patch for bugs #124529 and #124531
Message-ID: <200012051259.EAA00548@sf-web3.vaspecialprojects.com>

Patch #102658 has been updated. 

Project: pyxml
Category: None
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: patch for bugs #124529 and #124531

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102658&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Tue Dec  5 22:01:05 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Dec 2000 23:01:05 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <3A2CAE52.999C4EE5@prescod.net> (message from Paul Prescod on
 Tue, 05 Dec 2000 03:58:58 -0500)
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net>
Message-ID: <200012052201.XAA00773@loewis.home.cs.tu-berlin.de>

> My concern is that Python already has an IndexError and it is raised
> "naturally" (and efficiently) in a lot of places in minidom. At one
> point we had talked about formalizing a mechanism where Python
> exceptions stand for DOM exceptions.
> 
> So IndexSizeErr could be a subclass of Python's IndexError. Python
> "clients" could check for IndexError as they would in any other Python
> code. Those that want to treat the DOM stuff specially could do so. This
> would all be part of the Python-DOM mapping.

I don't see the value of this. When applications catch IndexError,
they normally do so to wrap a specific index access. In the Python
library, I found the following places where IndexError is caught:

        try:
            bp = Breakpoint.bpbynumber[number]
        except IndexError:
            return 'Breakpoint number (%d) out of range' % number
#############
            try:
                result.append(self[key])
            except IndexError:
                result.append(self.dict[key])
#############
                try:
                        self.response = args[0]
                except IndexError:
                        self.response = 'No response given'
...

I can't imagine a scenario where a DOM INDEX_SIZE_ERR and a Python
IndexError could likewise occur for a block of code, and would deserve
identical, specific treatment.

That said, if you think it is useful: go ahead and propose a specific
patch. It probably can't hurt.

Regards,
Martin


From paul@prescod.net  Tue Dec  5 23:02:19 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 05 Dec 2000 18:02:19 -0500
Subject: [XML-SIG] Specializing DOM exceptions
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de>
Message-ID: <3A2D73FB.19BA4E12@prescod.net>

"Martin v. Loewis" wrote:
> 
> ...
> 
> I don't see the value of this. When applications catch IndexError,
> they normally do so to wrap a specific index access. 

I agree. My point is simply that Python already has a way to spell
"index-related error" and Python programmers are used to using it. The
implementation raises them naturally when you try to do something
Index-ish using minidom. So why not use IndexError instead of or in
addition to DOM_INDEX_SIZE_ERR?

Then, just as you would write:

        try:
            bp = Breakpoint.bpbynumber[number]
        except IndexError:
            error message

You could write:

        try:
            element = node.childNodes[number]
        except IndexError:
            error message


 Paul Prescod


From martin@loewis.home.cs.tu-berlin.de  Wed Dec  6 07:36:49 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 6 Dec 2000 08:36:49 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <3A2D73FB.19BA4E12@prescod.net> (message from Paul Prescod on
 Tue, 05 Dec 2000 18:02:19 -0500)
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net>
Message-ID: <200012060736.IAA00698@loewis.home.cs.tu-berlin.de>

> Then, just as you would write:
> 
>         try:
>             bp = Breakpoint.bpbynumber[number]
>         except IndexError:
>             error message
> 
> You could write:
> 
>         try:
>             element = node.childNodes[number]
>         except IndexError:
>             error message

You certainly would - no matter how DOMExceptions work (*). The
question is whether users would prefer to write

   try:
     text1 = text.splitText(offs)
   except IndexError:
     error message

over

   try:
     text1 = text.splitText(offs)
   except IndexSizeErr:
     error message

Nobody would expect that splitText could possibly raise
IndexError. Nobody would guess that it could raise IndexSizeErr,
either - but at least you'd have the DOM documentation to tell you.
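
For illustration, here is how the second form looks against the
xml.dom.minidom that ships with current Python, which (as far as I can
tell) raises xml.dom.IndexSizeErr from splitText for an illegal offset.
The split_or_none helper is a made-up name for this sketch.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString("<doc>hello</doc>")
text = doc.documentElement.firstChild


def split_or_none(node, offset):
    """Split a text node at offset, or return None if the offset is illegal."""
    try:
        return node.splitText(offset)
    except xml.dom.IndexSizeErr:
        return None
```

Calling split_or_none(text, 99) takes the except branch, while an offset
inside the string returns the new trailing text node.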

Regards,
Martin

(*) In DOM, childNodes does not have a []-operator; only a method
item(). Interestingly enough, that method is specified to return null
in case of an out-of-range index, not to raise INDEX_SIZE_ERR.


From noreply@sourceforge.net  Wed Dec  6 14:10:31 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 6 Dec 2000 06:10:31 -0800
Subject: [XML-SIG] [Bug #124715] DbDom : DocFrag children are orphans
Message-ID: <200012061410.GAA06471@sf-web1.i.sourceforge.net>

Bug #124715, was updated on 2000-Dec-06 06:10
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom : DocFrag children are orphans

Details: When using DocumentFragments in DbDom, a child node of the fragment has no
parentNode. This causes StripXml to crash, and possibly other things. 

Here's a sample demo code:


>>> from Ft.DbDom import Dom
>>> from Ft.Ods import Database
>>> from Ft.DbDom import Reader
>>> from xml.dom.ext import PrettyPrint,StripXml
>>> 
>>> DBNAME='ods:alf@orion:5432:dom_test'
>>> 
>>> db = Database.Database()
>>> db.open(DBNAME)
>>> tx = db.new()
>>> tx.begin()
>>> 
>>> doc = Dom.DocumentImp()
>>> 
>>> e = doc.createElementNS('','root')
>>> doc.appendChild(e)
<DbDom Element Node at 8261610: name='root' with 0 attributes and 0 children>
>>> 
>>> fragment='''<children><node1/><node2/></children>'''
>>> r = Reader.Reader()
>>> f = r.fromString(fragment,doc)
>>> print f.firstChild
<DbDom Element Node at 81d8fe0: name='children' with 0 attributes and 3 children>
>>> print f.firstChild.parentNode
None
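
For comparison, a minimal sketch of the expected relationship, using
xml.dom.minidom from a current Python rather than DbDom (so it says
nothing about the DbDom Reader itself):

```python
from xml.dom import minidom

doc = minidom.Document()
fragment = doc.createDocumentFragment()
child = doc.createElement("children")
fragment.appendChild(child)

# The child of a fragment should point back at the fragment.
print(child.parentNode is fragment)
```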


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124715&group_id=6473


From noreply@sourceforge.net  Wed Dec  6 16:24:55 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 6 Dec 2000 08:24:55 -0800
Subject: [XML-SIG] [Patch #102687] DbDom patch for bug #124715
Message-ID: <200012061624.IAA25268@sf-web2.i.sourceforge.net>

Patch #102687 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom patch for bug #124715

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102687&group_id=6473


From noreply@sourceforge.net  Wed Dec  6 17:41:35 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 6 Dec 2000 09:41:35 -0800
Subject: [XML-SIG] [Bug #124736] 4Ods LiteralListOfObjects fails on python list operations
Message-ID: <200012061741.JAA20916@sf-web1.i.sourceforge.net>

Bug #124736, was updated on 2000-Dec-06 09:41
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: 4Ods LiteralListOfObjects fails on python list operations

Details: using DbDom, e is an Element:
>>> e.childNodes[:]
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/Ft/Ods/Collections/CollectionBase.py", line 120, in __getslice__
    rt._4ods_initialize(self._4ods_getContents()[i:j])
TypeError: not enough arguments; expected 3, got 2
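
As a point of reference, slicing on a list-like collection can be
sketched as below. This assumes a current Python, where __getslice__ no
longer exists and slices arrive through __getitem__; ObjectList is a
hypothetical class, not the 4ODS LiteralListOfObjects.

```python
class ObjectList:
    # Hypothetical stand-in for a list-like collection.
    def __init__(self, contents=()):
        self._contents = list(contents)

    def __len__(self):
        return len(self._contents)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields a new collection of the same type.
            return ObjectList(self._contents[index])
        return self._contents[index]


nodes = ObjectList(["a", "b", "c"])
copy = nodes[:]
```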


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124736&group_id=6473


From noreply@sourceforge.net  Wed Dec  6 18:09:06 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 6 Dec 2000 10:09:06 -0800
Subject: [XML-SIG] [Patch #102688] 4ODS patch for bug #124736 (__getslice__)
Message-ID: <200012061809.KAA02218@sf-web3.vaspecialprojects.com>

Patch #102688 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: 4ODS patch for bug #124736 (__getslice__)

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102688&group_id=6473


From noreply@sourceforge.net  Wed Dec  6 19:17:45 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 6 Dec 2000 11:17:45 -0800
Subject: [XML-SIG] [Patch #102690] PyXML 0.6.2 compile error with Python 2.0b1
Message-ID: <200012061917.LAA23932@sf-web1.i.sourceforge.net>

Patch #102690 has been updated. 

Project: pyxml
Category: expat
Status: Open
Submitted by: calvin
Assigned to : Nobody
Summary: PyXML 0.6.2 compile error with Python 2.0b1

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102690&group_id=6473


From fdrake@acm.org  Thu Dec  7 06:18:22 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 7 Dec 2000 01:18:22 -0500 (EST)
Subject: [XML-SIG] Re: sax parser leaks memory?
In-Reply-To: <200012050004.BAA01145@loewis.home.cs.tu-berlin.de>
References: <000501c05e33$573184e0$3c6340d5@hagrid>
 <200012050004.BAA01145@loewis.home.cs.tu-berlin.de>
Message-ID: <14895.11182.359006.281805@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > Thanks for the report. Here is a patch.

  Are you planning to check this in to either Python or PyXML?  I
think both could use it.  ;)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From noreply@sourceforge.net  Thu Dec  7 10:25:56 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 7 Dec 2000 02:25:56 -0800
Subject: [XML-SIG] [Bug #124829] DbDom : getAttribute / getAttributeNode bad implementation
Message-ID: <200012071025.CAA11021@sf-web2.i.sourceforge.net>

Bug #124829, was updated on 2000-Dec-07 02:25
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom : getAttribute / getAttributeNode bad implementation

Details: Using DbDom, getAttributeNS returns an AttributeImp object (instead of the value of the attribute) and getAttributeNodeNS is not implemented.

Sample code (e is an ElementImp object):

>>> e.setAttributeNS('','toto','5')
>>> e.getAttributeNS('','toto')
<Ft.DbDom.Dom.AttributeImp instance at 8281bc0>
>>> e.getAttributeNodeNS('','toto')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 163, in __getattr__
    raise AttributeError(name)
AttributeError: getAttributeNodeNS
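
The expected split between the two methods can be sketched with
xml.dom.minidom from a current Python (not DbDom): getAttributeNS
returns the string value, getAttributeNodeNS returns the attribute node.

```python
from xml.dom import minidom

doc = minidom.Document()
e = doc.createElement("root")
e.setAttributeNS("", "toto", "5")

value = e.getAttributeNS("", "toto")     # the attribute value, a string
node = e.getAttributeNodeNS("", "toto")  # the attribute node itself
```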


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124829&group_id=6473


From noreply@sourceforge.net  Thu Dec  7 10:45:58 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 7 Dec 2000 02:45:58 -0800
Subject: [XML-SIG] [Patch #102700] DbDom : bug #124829 getAttribute/getAttributeNode
Message-ID: <200012071045.CAA25413@sf-web1.i.sourceforge.net>

Patch #102700 has been updated. 

Project: pyxml
Category: None
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom : bug #124829 getAttribute/getAttributeNode

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102700&group_id=6473


From paul@prescod.net  Thu Dec  7 11:07:42 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 07 Dec 2000 06:07:42 -0500
Subject: [XML-SIG] Specializing DOM exceptions
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> <200012060736.IAA00698@loewis.home.cs.tu-berlin.de>
Message-ID: <3A2F6F7E.6149A760@prescod.net>

"Martin v. Loewis" wrote:
> 
>...
> 
> Nobody would expect that splitText could possibly raise
> IndexError. Nobody would guess that it could raise IndexSizeErr,
> either - but at least you'd have the DOM documentation to tell you.

The DOM documentation does not mention a Python IndexSizeErr exception.
That's part of the Python binding so you can only find out about it in
the Python documentation.

> (*) In DOM, childNodes does not have a []-operator; only a method
> item(). Interestingly enough, that method is specified to return null
> in case of an out-of-range index, not to raise INDEX_SIZE_ERR.

That's part of the Python binding also:

>>> from xml.dom.minidom import parse
>>> d = parse("c:\\temp\\test.xml")
<xml.dom.minidom.Document instance at 0083C15C>
>>> d.childNodes[0]
<DOM Element: abc at 8452044>

I don't think returning null would be very Pythonic.

 Paul Prescod


From noreply@sourceforge.net  Thu Dec  7 11:11:10 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 7 Dec 2000 03:11:10 -0800
Subject: [XML-SIG] [Bug #124839] DbDom : reader.releaseNode fails on DocumentFragments
Message-ID: <200012071111.DAA27999@sf-web1.i.sourceforge.net>

Bug #124839, was updated on 2000-Dec-07 03:11
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom : reader.releaseNode fails on DocumentFragments

Details: releaseNode calls FreePersistentObject on DF, which are not persistent objects. And fails miserably...

sample code (doc is a DocumentImp object)

>>> fragment='''<children><node1/>
... <node2/></children>'''
>>> r = Reader.Reader()
>>> f = r.fromString(fragment,doc)
>>> r.releaseNode(f)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Reader.py", line 30, in releaseNode
    FreePersistentObject(doc)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/__init__.py", line 57, in FreePersistentObject
    obj._pseudo_del()
AttributeError: _pseudo_del


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124839&group_id=6473


From noreply@sourceforge.net  Thu Dec  7 11:47:00 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 7 Dec 2000 03:47:00 -0800
Subject: [XML-SIG] [Patch #102704] DbDom releaseNode patch (bug #124839)
Message-ID: <200012071147.DAA30704@sf-web1.i.sourceforge.net>

Patch #102704 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: AFayolle
Assigned to : Nobody
Summary: DbDom releaseNode patch (bug #124839)

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102704&group_id=6473


From fdrake@acm.org  Thu Dec  7 14:15:18 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 7 Dec 2000 09:15:18 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <3A2F6F7E.6149A760@prescod.net>
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
 <3A2CAE52.999C4EE5@prescod.net>
 <200012052201.XAA00773@loewis.home.cs.tu-berlin.de>
 <3A2D73FB.19BA4E12@prescod.net>
 <200012060736.IAA00698@loewis.home.cs.tu-berlin.de>
 <3A2F6F7E.6149A760@prescod.net>
Message-ID: <14895.39798.226480.773640@cj42289-a.reston1.va.home.com>

Paul Prescod writes:
 > The DOM documentation does not mention a Python IndexSizeErr exception.
 > That's part of the Python binding so you can only find out about it in
 > the Python documentation.

  I'll take a look at this today and see what I think the right thing
is.

Martin says:
 > (*) In DOM, childNodes does not have a []-operator; only a method
 > item(). Interestingly enough, that method is specified to return null
 > in case of an out-of-range index, not to raise INDEX_SIZE_ERR.

Paul responds:
 > That's part of the Python binding also:
...
 > I don't think returning null would be very Pythonic.

  NodeList.item(i) should return None if the recommendation says it
should return null, but NodeList[] should handle negative indexes and
raise IndexError in the appropriate Pythonic way.  The Python DOM API
is written that way as well:

http://python.sourceforge.net/devel-docs/lib/dom-nodelist-objects.html
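
A small sketch of the two access styles, assuming the xml.dom.minidom
in a current Python, where childNodes is a list subclass with an item()
method:

```python
from xml.dom import minidom

doc = minidom.parseString("<abc/>")
nodes = doc.childNodes

first = nodes.item(0)    # DOM-style access
missing = nodes.item(5)  # out of range: None, no exception
last = nodes[-1]         # Python-style negative index

try:
    nodes[5]             # out of range: Python-style IndexError
    raised = False
except IndexError:
    raised = True
```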


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From noreply@sourceforge.net  Thu Dec  7 14:32:46 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 7 Dec 2000 06:32:46 -0800
Subject: [XML-SIG] [Bug #124857] 4ODS operations can occur outside transactions
Message-ID: <200012071432.GAA07800@sf-web1.i.sourceforge.net>

Bug #124857, was updated on 2000-Dec-07 06:32
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: AFayolle
Assigned to : Nobody
Summary: 4ODS operations can occur outside transactions

Details: Here's some sample code. I would expect the last line to raise a TransactionNotInProgress exception, but it does not.


from Ft.DbDom import Dom
from Ft.Ods import Database

DBNAME='ods:alf@orion:5432:dom_test'

db = Database.Database()
db.open(DBNAME)
tx = db.new()
tx.begin()

doc = Dom.DocumentImp()
e = doc.createElementNS('','root')
doc.appendChild(e)
tx.commit()
e.setAttributeNS('','foo','bar')
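
One way to get the expected failure is to have every mutating call check
the transaction state first. The sketch below is an assumption about how
such a guard could look; Session and GuardedElement are hypothetical
names and not part of 4ODS or DbDom.

```python
class TransactionNotInProgress(Exception):
    """Raised when an object is modified outside an open transaction."""


class Session:
    # Hypothetical holder for the transaction state.
    def __init__(self):
        self.active = False


class GuardedElement:
    # Hypothetical element that refuses changes outside a transaction.
    def __init__(self, session):
        self._session = session
        self._attrs = {}

    def set_attribute(self, name, value):
        if not self._session.active:
            raise TransactionNotInProgress(name)
        self._attrs[name] = value

    def get_attribute(self, name):
        return self._attrs.get(name, "")


session = Session()
element = GuardedElement(session)
```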


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=124857&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Thu Dec  7 16:11:00 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 7 Dec 2000 17:11:00 +0100
Subject: [XML-SIG] Re: sax parser leaks memory?
In-Reply-To: <14895.11182.359006.281805@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <000501c05e33$573184e0$3c6340d5@hagrid>
 <200012050004.BAA01145@loewis.home.cs.tu-berlin.de> <14895.11182.359006.281805@cj42289-a.reston1.va.home.com>
Message-ID: <200012071611.RAA00732@loewis.home.cs.tu-berlin.de>

>   Are you planning to check this in to either Python or PyXML?  I
> think both could use it.  ;)

I just committed it to PyXML; thanks for the reminder :-)

I plan to synchronize Python pyexpat.c with PyXML pyexpat.c around the
time of the next PyXML release; there are a number of other changes
that need to be carried over as well.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Dec  7 16:20:32 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 7 Dec 2000 17:20:32 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <3A2F6F7E.6149A760@prescod.net> (message from Paul Prescod on
 Thu, 07 Dec 2000 06:07:42 -0500)
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de> <3A2CAE52.999C4EE5@prescod.net> <200012052201.XAA00773@loewis.home.cs.tu-berlin.de> <3A2D73FB.19BA4E12@prescod.net> <200012060736.IAA00698@loewis.home.cs.tu-berlin.de> <3A2F6F7E.6149A760@prescod.net>
Message-ID: <200012071620.RAA00800@loewis.home.cs.tu-berlin.de>

> The DOM documentation does not mention a Python IndexSizeErr exception.
> That's part of the Python binding so you can only find out about it in
> the Python documentation.

No, but it does mention DOMException with an INDEX_SIZE_ERR code. Such
an exception is represented in Python by an IndexSizeErr object (which
is indeed a DOMException instance with a .code field of
INDEX_SIZE_ERR).  So Python's IndexSizeErr and DOM's INDEX_SIZE_ERR
are really one and the same - it's just that IDL cannot express
exception specialization.

> > (*) In DOM, childNodes does not have a []-operator; only a method
> > item(). Interestingly enough, that method is specified to return null
> > in case of an out-of-range index, not to raise INDEX_SIZE_ERR.
> 
> That's part of the Python binding also:
> 
> >>> from xml.dom.minidom import parse
> >>> d = parse("c:\\temp\\test.xml")
> <xml.dom.minidom.Document instance at 0083C15C>
> >>> d.childNodes[0]
> <DOM Element: abc at 8452044>
> 
> I don't think returning null would be very Pythonic.

Indeed. The childNodes collection behaves like a Python sequence - so
you'd expect sequence exceptions for the sequence operations. It is
also a DOM NodeList implementation, and I'd expect DOM exceptions for
DOM operations. I would not, however, expect standard Python
exceptions coming out of DOM operations, or DOM exceptions coming out
of Python sequence operations.

Again, if you think otherwise, just propose a specification (or is

class IndexSizeErr(DOMException, IndexError):
  code = INDEX_SIZE_ERR

really all that you are proposing?) I won't object to adding specific
text or code, even if I don't see a value in such an addition.
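
To make the effect of that two-line class concrete, here is a sketch
using DOMException and INDEX_SIZE_ERR from the xml.dom package of a
current Python. The class is defined locally for the example (it is not
the IndexSizeErr that xml.dom itself exports), and caught_as is a
made-up helper.

```python
from xml.dom import DOMException, INDEX_SIZE_ERR


class IndexSizeErr(DOMException, IndexError):
    code = INDEX_SIZE_ERR


def caught_as(exc_type):
    """Raise the combined exception and report the code seen by the handler."""
    try:
        raise IndexSizeErr("offset out of range")
    except exc_type as exc:
        return exc.code
```

Both caught_as(IndexError) and caught_as(DOMException) reach the
handler, so clients could use either spelling.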

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Dec  7 16:22:00 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 7 Dec 2000 17:22:00 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <14895.39798.226480.773640@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <200011242122.WAA01293@loewis.home.cs.tu-berlin.de>
 <3A2CAE52.999C4EE5@prescod.net>
 <200012052201.XAA00773@loewis.home.cs.tu-berlin.de>
 <3A2D73FB.19BA4E12@prescod.net>
 <200012060736.IAA00698@loewis.home.cs.tu-berlin.de>
 <3A2F6F7E.6149A760@prescod.net> <14895.39798.226480.773640@cj42289-a.reston1.va.home.com>
Message-ID: <200012071622.RAA00801@loewis.home.cs.tu-berlin.de>

>   NodeList.item(i) should return None if the recommendation says it
> should return null, but NodeList[] should handle negative indexes and
> raise IndexError in the appropriate Pythonic way.  

Exactly my understanding.

Regards,
Martin


From noreply@sourceforge.net  Fri Dec  8 16:22:59 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 8 Dec 2000 08:22:59 -0800
Subject: [XML-SIG] [Bug #125004] 4xslt: XPath doesn't like ISO-8859-1
Message-ID: <200012081622.IAA14454@sf-web3.vaspecialprojects.com>

Bug #125004, was updated on 2000-Dec-08 08:22
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: ornicar
Assigned to : Nobody
Summary: 4xslt: XPath doesn't like ISO-8859-1

Details: 
Hello,

  Today, I'm using 4xslt XSLT engine to transform an XML file into another
nicer XML file. In the attached example, there is a data.xml file that
contains a short description of my agenda but I don't like this
representation because the <appointment> node contains the time of the
meeting (before the ' Durée: ' word) and the duration of the meeting
(after the ' Durée: ' word) (in French, 'Durée' means 'Duration') :
    <appointment>11h00 Durée: 20mn</appointment>

  Therefore, I constructed an xslt file to turn my agenda into a new
agenda with separated nodes for the time of the meeting and the duration 
of the meeting :
     <appointment>
       <time>11h00</time>
       <duration>20mn</duration>
     </appointment>

   The xslt stylesheet has to take the content of the text node child of
the <appointment> node and to divide it in two parts: what is before
' Durée: ' and what is after. This can be easily done in xslt by using the
substring-before() and substring-after() functions :
    <xsl:value-of select="substring-before(text(),' Durée: ')" />
and
    <xsl:value-of select="substring-after(text(),' Durée: ')" />
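
For readers less familiar with these two XPath functions, their
behaviour can be sketched in plain Python. The function names below are
made up for the example, and the sketch follows my reading of XPath 1.0:
both return an empty string when the marker does not occur.

```python
def substring_before(text, marker):
    """Part of text before the first occurrence of marker, else ''."""
    if not marker:
        return ""
    head, sep, _ = text.partition(marker)
    return head if sep else ""


def substring_after(text, marker):
    """Part of text after the first occurrence of marker, else ''."""
    if not marker:
        return text
    _, sep, tail = text.partition(marker)
    return tail if sep else ""


appointment = "11h00 Durée: 20mn"
time = substring_before(appointment, " Durée: ")
duration = substring_after(appointment, " Durée: ")
```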

  Unfortunately, 4xslt doesn't like the "é" character in an XPath
expression (the expression inside the select) and returns the attached
stacktrace ending with:
    xml.xpath.XPathParserBase.SyntaxException: 
    ********** Syntax Exception **********
    Exception at or near "é"
      Line: 0, Production Number: 0
Of course replacing "Durée" with "Duree" in both the xml file and
the xslt stylesheet works around the bug, but this is not very satisfying.
Using another xslt engine (e.g. Xalan) allows the transformation even
with an "é" character.

  This seems to be a bug in XPath expression processing (4xpath doesn't
like ISO-8859-1 characters).

    O. CAYROL.

PS: the attached files are below ...
_________________________________________________________________________
Olivier CAYROL LOGILAB - Paris (France)
                                                 http://www.logilab.com/
For Christmas, give yourself an Intelligent Personal Assistant (free)
Pour Noël, offrez-vous un Assistant Personnel Intelligent (c'est gratuit)
_________________________________________________________________________

_________________________________________________________________________
data.xml "Initial XML file"

<?xml version="1.0" encoding="ISO-8859-1"?>

<agenda>
  <appointment>11h00 Durée: 20mn</appointment>
  <appointment>11h30 Durée: 40mn</appointment>
</agenda>
_________________________________________________________________________
transf.xslt "XSLT stylesheet"

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" 
              encoding="ISO-8859-1" 
              indent="yes" />

  <xsl:template match="/">
<xmlagenda>
    <xsl:apply-templates select="agenda/appointment"/>
</xmlagenda>
  </xsl:template>

  <xsl:template match="appointment">
<appointment>
<time>
    <xsl:value-of select="substring-before(text(),' Durée: ')"/>
</time>
<duration>
    <xsl:value-of select="substring-after(text(),' Durée: ')"/>
</duration>
</appointment>
  </xsl:template>

</xsl:stylesheet>
_________________________________________________________________________
agenda.xml "Expected XML output"

<?xml version="1.0" encoding="ISO-8859-1"?>
<xmlagenda>
    <appointment>
        <time>11h00</time>
        <duration>20mn</duration>
    </appointment>
    <appointment>
        <time>11h30</time>
        <duration>40mn</duration>
    </appointment>
</xmlagenda>
_________________________________________________________________________
Stacktrace

 $ 4xslt data.xml transf.xslt
Traceback (innermost last):
  File "/usr/bin/4xslt", line 5, in ?
    _4xslt.Run(sys.argv)
  File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 85, in
Run
    processor.appendStylesheetUri(sty)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 86,
in appendStylesheetUri
    sty = self._styReader.fromUri(styleSheetUri)
  File "/usr/lib/python1.5/site-packages/Ft/Lib/ReaderBase.py", line 99,
in fromUri
    rt = self.fromStream(stream, baseUri, ownerDoc, stripElements)
  File "/usr/lib/python1.5/site-packages/xml/xslt/StylesheetReader.py",
line 300, in fromStream
    sheet.setup()
  File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line
144, in setup
    curr_node.setup()
  File "/usr/lib/python1.5/site-packages/xml/xslt/ValueOfElement.py", line
34, in setup
    self.__dict__['_expr'] = parser.parseExpression(self._select)
  File "/usr/lib/python1.5/site-packages/xml/xpath/XPathParser.py", line
36, in parseExpression
    XPathParserBase.XPathParserBase.parse(self, st)
  File "/usr/lib/python1.5/site-packages/xml/xpath/XPathParserBase.py",
line 60, in parse
    XPath.cvar.g_prodNum)
xml.xpath.XPathParserBase.SyntaxException: 
********** Syntax Exception **********
Exception at or near "é"
  Line: 0, Production Number: 0

_________________________________________________________________________


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125004&group_id=6473


From noreply@sourceforge.net  Sat Dec  9 00:26:51 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 8 Dec 2000 16:26:51 -0800
Subject: [XML-SIG] [Bug #125043] Losing attributes when cloning an element
Message-ID: <200012090026.QAA32725@usw-sf-web1.sourceforge.net>

Bug #125043, was updated on 2000-Dec-08 16:26
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: jkloth
Assigned to : nobody
Summary: Losing attributes when cloning an element

Details: I ran into a bug with cloning non-namespace XML Elements with
attributes, illustrated by the following testcase:

>>> from xml.dom.Document import Document
>>> dom=Document(None)
>>> dom.appendChild(dom.createElement('foo'))
<Element Node at 134866000: Name = 'foo' with 0 attributes and 0
children>
>>> dom.documentElement.setAttribute('name', 'bar')
>>> dom.documentElement.setAttribute('spam', 'eggs')
>>> clone=dom.documentElement.cloneNode(deep=0)
>>> clone
<Element Node at 135075928: Name = 'foo' with 1 attributes and 0
children>
>>> dom.documentElement
<Element Node at 134866000: Name = 'foo' with 2 attributes and 0
children>
>>> clone.attributes
<NamedNodeMap at 135078128: {('', ''): <Attribute Node at 135078696:
Name = "name", Value = "bar">}>
>>> dom.documentElement.attributes
<NamedNodeMap at 134865912: {'spam': <Attribute Node at 135060184: Name
= "spam", Value = "eggs">, 'name': <Attribute Node at 135066776: Name =
"name", Value = "bar">}>
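
For comparison, a shallow clone is expected to carry over every
attribute.  The sketch below uses the standard library's
xml.dom.minidom instead of 4DOM, so it only illustrates the expected DOM
behaviour; it does not exercise the 4DOM code path that fails above.

```python
# Expected behaviour of cloneNode() on an element with two attributes,
# shown with xml.dom.minidom (an assumption: 4DOM should behave alike).
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "foo", None)
root = doc.documentElement
root.setAttribute("name", "bar")
root.setAttribute("spam", "eggs")

# deep=False: children are not copied, but attributes still are
clone = root.cloneNode(False)

original_names = sorted(root.attributes.keys())
clone_names = sorted(clone.attributes.keys())
```

Both lists should contain "name" and "spam"; in the session above the
clone keeps only one attribute, stored under the key ('', '').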


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125043&group_id=6473


From Mike.Olson@fourthought.com  Sat Dec  9 06:52:58 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Fri, 08 Dec 2000 23:52:58 -0700
Subject: [XML-SIG] SourceForge
Message-ID: <3A31D6CA.5F0629CD@FourThought.com>

Am I just having a bad night, or is something borken at SourceForge? 
I've been trying to update bug #124375 and I keep getting index errors. 
So I tried to submit a bug to SourceForge, and I get roughly the same
error.  I looked all over the site but couldn't find an email address to
ask, so I thought I'd see if others are having problems.

Mike


-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Sat Dec  9 08:07:35 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 9 Dec 2000 09:07:35 +0100
Subject: [XML-SIG] SourceForge
In-Reply-To: <3A31D6CA.5F0629CD@FourThought.com> (message from Mike Olson on
 Fri, 08 Dec 2000 23:52:58 -0700)
References: <3A31D6CA.5F0629CD@FourThought.com>
Message-ID: <200012090807.JAA00693@loewis.home.cs.tu-berlin.de>

> Am I just having a bad night, or is something borken at SourceForge? 

It appears indeed that SF is down. Currently, I get a page that reads

An error occured in the logger. ERROR: Relation 'activity_log' does not exist 

That's what you get for using PHP3 instead of Python :-)

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Sat Dec  9 10:04:38 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Sat, 9 Dec 2000 11:04:38 +0100 (CET)
Subject: [XML-SIG] sourceforge PyXML project disappeared ?!
Message-ID: <Pine.LNX.4.21.0012091102570.11554-100000@orion.logilab.fr>

Hello,

It looks like SourceForge is back online. It also looks like the PyXML
project was lost in deep space: http://sourceforge.net/projects/PyXML/
sends me to an invalid project page. 

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From Alexandre.Fayolle@logilab.fr  Sat Dec  9 10:09:03 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Sat, 9 Dec 2000 11:09:03 +0100 (CET)
Subject: [XML-SIG] sourceforge PyXML project disappeared ?!
In-Reply-To: <Pine.LNX.4.21.0012091102570.11554-100000@orion.logilab.fr>
Message-ID: <Pine.LNX.4.21.0012091107170.11583-100000@orion.logilab.fr>

On Sat, 9 Dec 2000, Alexandre Fayolle wrote:

> Hello,
> 
> It looks like sourceforge is back online. It also looks like the PyXML
> project was lost in deep space: http://sourceforge.net/projects/PyXML/
> send me to an invalid project page. 

Well, sorry for raising a false alarm. It's still there, but the page is
/projects/pyxml/ (lowercase), which is why my bookmark no longer
worked. 

This leaves me wondering why the name changed, though...

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From martin@loewis.home.cs.tu-berlin.de  Sat Dec  9 10:33:29 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 9 Dec 2000 11:33:29 +0100
Subject: [XML-SIG] sourceforge PyXML project disappeared ?!
In-Reply-To: <Pine.LNX.4.21.0012091107170.11583-100000@orion.logilab.fr>
 (message from Alexandre Fayolle on Sat, 9 Dec 2000 11:09:03 +0100
 (CET))
References: <Pine.LNX.4.21.0012091107170.11583-100000@orion.logilab.fr>
Message-ID: <200012091033.LAA01259@loewis.home.cs.tu-berlin.de>

> Well, sorry for raising a false alarm, it still there, but the page is
> /projects/pyxml/ (lowercase), and this is why my bookmark no longer
> worked. 
> 
> This leaves me wondering on why the name changed, though...

To my knowledge, the SF project was always pyxml, and thus never
changed. I don't know why a different spelling was accepted before.

Regards,
Martin


From Mike.Olson@fourthought.com  Sat Dec  9 18:53:02 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sat, 09 Dec 2000 11:53:02 -0700
Subject: [XML-SIG] sourceforge PyXML project disappeared ?!
References: <Pine.LNX.4.21.0012091107170.11583-100000@orion.logilab.fr> <200012091033.LAA01259@loewis.home.cs.tu-berlin.de>
Message-ID: <3A327F8E.C65DA0B6@FourThought.com>

"Martin v. Loewis" wrote:
> 
hmm, I still cannot change the status of a bug though....

Mike


> 
> Regards,
> Martin
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Sat Dec  9 21:33:09 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 9 Dec 2000 22:33:09 +0100
Subject: [XML-SIG] sourceforge PyXML project disappeared ?!
In-Reply-To: <3A327F8E.C65DA0B6@FourThought.com> (message from Mike Olson on
 Sat, 09 Dec 2000 11:53:02 -0700)
References: <Pine.LNX.4.21.0012091107170.11583-100000@orion.logilab.fr> <200012091033.LAA01259@loewis.home.cs.tu-berlin.de> <3A327F8E.C65DA0B6@FourThought.com>
Message-ID: <200012092133.WAA01892@loewis.home.cs.tu-berlin.de>

> hmm, I still cannot change the status of a bug though....

You could not modify the status of a patch; I have changed that. I
can't see any reason why you can't modify the status of a bug - what
is the response you get from SF?

Regards,
Martin


From Mike.Olson@fourthought.com  Sat Dec  9 22:06:42 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sat, 09 Dec 2000 15:06:42 -0700
Subject: [XML-SIG] sourceforge PyXML project disappeared ?!
References: <Pine.LNX.4.21.0012091107170.11583-100000@orion.logilab.fr> <200012091033.LAA01259@loewis.home.cs.tu-berlin.de> <3A327F8E.C65DA0B6@FourThought.com> <200012092133.WAA01892@loewis.home.cs.tu-berlin.de>
Message-ID: <3A32ACF2.3D387E23@FourThought.com>

"Martin v. Loewis" wrote:
> 
> > hmm, I still cannot change the status of a bug though....
> 
> You could not modify the status of a patch; I have changed that. I
> can't see any reason why you can't modify the status of a bug - what
> is the response you get from SF?

Never mind, it seems they have fixed something there.

Mike


> 
> Regards,
> Martin

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From noreply@sourceforge.net  Sat Dec  9 22:14:26 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 9 Dec 2000 14:14:26 -0800
Subject: [XML-SIG] [Bug #125186] xsl:number fails for two-level numbering
Message-ID: <200012092214.OAA27650@usw-sf-web2.sourceforge.net>

Bug #125186, was updated on 2000-Dec-09 14:14
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: nobody
Assigned to : nobody
Summary: xsl:number fails for two-level numbering

Details: Consider the following stylesheet.

<?xml version="1.0"?>

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="div1">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="div2">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="div1/head">
    <h2>
      <xsl:apply-templates select=".." mode="divnum"/>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

  <xsl:template match="div2/head">
    <h3>
      <xsl:apply-templates select=".." mode="divnum"/>
      <xsl:apply-templates/>
    </h3>
  </xsl:template>

  <xsl:template mode="divnum" match="div1">
    <xsl:number format="1 "/>
  </xsl:template>

  <xsl:template mode="divnum" match="div2">
    <xsl:number level="multiple" count="div1 | div2" format="1.1 "/>
  </xsl:template>

</xsl:transform>

Apply this to the following XML:

<?xml version="1.0"?>
<book>
<div1>
  <head>Chapter 1</head>
  Chapter 1 content.
  <div2>
    <head> Section 1.1</head>
    Section 1.1 content.
  </div2>
  <div2>
    <head> Section 1.2</head>
    Section 1.2 content.
  </div2>
</div1>
<div1>
  <head>Chapter 2</head>
  Chapter 2 content.
  <div2>
    <head> Section 2.1</head>
    Section 2.1 content.
  </div2>
  <div2>
    <head> Section 2.2</head>
    Section 2.2 content.
  </div2>
</div1>
</book>

The result is:

<?xml version='1.0' encoding='UTF-8'?>


<h2>1 Chapter 1</h2>
  Chapter 1 content.
  
    <h3>3  Section 1.1</h3>
    Section 1.1 content.
  
  
    <h3>3  Section 1.2</h3>
    Section 1.2 content.
  

  <h2>2 Chapter 2</h2>
  Chapter 2 content.
  
    <h3>4  Section 2.1</h3>
    Section 2.1 content.
  
  
    <h3>4  Section 2.2</h3>
    Section 2.2 content.

As you can see, the level two numbers are wrong.  Instead of e.g. "2.2",
it gives a single number, which appears to be the total of all div1
elements through the current one, plus all div2 elements in the current
div1 section.  In other words, Section 2.1 is numbered as "4" since there
are two div1 sections up through 2.2, and two div2 sections in the
second div1 section.

This is a subset of the "xmlspec.xsl" file, used to transform W3C specifications.
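
For reference, the numbering that level="multiple" with
count="div1 | div2" and format="1.1 " is expected to produce can be
sketched in Python.  This is not how 4XSLT computes it; the function
name is made up, and the document is a trimmed copy of the one above.

```python
# Sketch of the expected two-level numbering for the div2 elements.
# Uses ElementTree; each level counts the element's position among
# its same-named siblings, starting at 1.
import xml.etree.ElementTree as ET

BOOK = """<book>
<div1><head>Chapter 1</head>
  <div2><head>Section 1.1</head></div2>
  <div2><head>Section 1.2</head></div2>
</div1>
<div1><head>Chapter 2</head>
  <div2><head>Section 2.1</head></div2>
  <div2><head>Section 2.2</head></div2>
</div1>
</book>"""


def multilevel_numbers(root):
    """Return 'chapter.section' labels for every div2 in document order."""
    numbers = []
    for i, div1 in enumerate(root.findall("div1"), start=1):
        for j, _div2 in enumerate(div1.findall("div2"), start=1):
            numbers.append("%d.%d" % (i, j))
    return numbers


numbers = multilevel_numbers(ET.fromstring(BOOK))
```

The four headings should therefore read 1.1, 1.2, 2.1 and 2.2 rather
than 3, 3, 4, 4.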

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125186&group_id=6473


From dieter@handshake.de  Sun Dec 10 08:45:32 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 10 Dec 2000 09:45:32 +0100 (CET)
Subject: [XML-SIG] XMLPROC: unsupported character number '>255' in character reference
Message-ID: <14899.17068.574076.957348@lindm.dm>

I use the SAX2 implementation bundled with the Python 2.0
distribution to process DocBook/XML documents.

When I turn on validation, "xmlproc" complains
"unsupported character number 'XXXX' in character reference"
for each XXXX larger than 255.

Apparently, "xmlproc" does not yet know that such character
references no longer cause problems with the new Python
Unicode support.

Is there already a fix? If not, I can look into the problem.
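
As a small illustration of the claim that Unicode strings can hold such
characters, here is a sketch using the non-validating expat parser via
xml.dom.minidom.  It says nothing about xmlproc itself, and the sample
document is invented.

```python
# Character references above 255 resolve to ordinary Unicode characters
# when the parser returns Unicode strings.  Sample input is made up.
from xml.dom import minidom

SAMPLE = "<para>em dash: &#8212; omega: &#x3A9;</para>"

doc = minidom.parseString(SAMPLE)
# Join all text children in case the parser splits the character data
text = "".join(node.data for node in doc.documentElement.childNodes)

code_points = [ord(ch) for ch in text if ord(ch) > 255]
```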


Dieter


From noreply@sourceforge.net  Sun Dec 10 09:10:34 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 10 Dec 2000 01:10:34 -0800
Subject: [XML-SIG] [Bug #125225] system-property(xsl:vendor-url) fails
Message-ID: <200012100910.BAA06127@usw-sf-web1.sourceforge.net>

Bug #125225, was updated on 2000-Dec-10 01:10
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: rtmyers
Assigned to : nobody
Summary: system-property(xsl:vendor-url) fails

Details: Using 4XSLT v.0.10.2, RH 6.2, Python 1.5.2.

Following stylesheet:

<?xml version="1.0"?>

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <xsl:message><xsl:value-of select="system-property('xsl:vendor-url')"/></xsl:message>
  </xsl:template>

</xsl:transform>

Running this against arbitrary XML file gives traceback:

[rtm@rabbit xsgf]# 4xslt table.xml vendor.xsl
Traceback (innermost last):
  File "/usr/bin/4xslt", line 5, in ?
    _4xslt.Run(sys.argv)
  File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 87, in Run
    topLevelParams=top_level_params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 127, in runUri
    writer, uri, outputStream)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 177, in runNode
    self.applyTemplates(context, None)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 193, in applyTemplates
    found = sty.applyTemplates(context, mode, self, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 356, in applyTemplates
    patternInfo[TEMPLATE].instantiate(context, processor, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py", line 115, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/MessageElement.py", line 41, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/ValueOfElement.py", line 41, in instantiate
    result = self._expr.evaluate(context)
  File "/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py", line 171, in evaluate
    return self._func(context, arg0)
  File "/usr/lib/python1.5/site-packages/xml/xslt/ExtFunctions.py", line 126, in SystemProperty
    if split_name[0] == XSL_NAMESPACE:
NameError: XSL_NAMESPACE
[rtm@rabbit xsgf]# 
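
The NameError suggests that ExtFunctions.py uses XSL_NAMESPACE without
defining or importing it.  The sketch below shows the general shape of a
fix.  Apart from the namespace URI, which appears in the stylesheet
above, everything here is a guess for illustration: the function
signature, the property table and the vendor values are hypothetical and
are not taken from the 4XSLT source.

```python
# Hypothetical sketch, not the real 4XSLT code: define the constant at
# module level so the lookup function can see it.
XSL_NAMESPACE = "http://www.w3.org/1999/XSL/Transform"

# Placeholder values; the real vendor strings are not known here.
SYSTEM_PROPERTIES = {
    "version": "1.0",
    "vendor": "Example Vendor",
    "vendor-url": "http://example.org/",
}


def system_property(qname, namespaces):
    """Resolve a 'prefix:local' name against a prefix-to-URI mapping."""
    prefix, _, local = qname.rpartition(":")
    uri = namespaces.get(prefix, "")
    if uri == XSL_NAMESPACE:
        return SYSTEM_PROPERTIES.get(local, "")
    return ""


vendor_url = system_property("xsl:vendor-url", {"xsl": XSL_NAMESPACE})
```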

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125225&group_id=6473


From kentsin@sinaman.com  Mon Dec 11 00:00:53 2000
From: kentsin@sinaman.com (kentsin)
Date: Sun Dec 10 18:00:53 CST 2000
Subject: [XML-SIG] xml / html parsing for webbot
Message-ID: <20001210100053.20311.qmail@hk.sina.com.hk>

Dear All,

I am learning to build a webbot. I am reading Jeff's webbot code. 

I have some difficulties and doubts:

1. xml.dom.walker and xml.dom.writer are missing in Python 2.0's xml
package. What are they used for?

2. I have thought of not building a DOM tree but using regular
expressions to extract all links. Can somebody tell me from their
experience how the two approaches compare? Which is better? In
particular, I found some pages generated by scripts that contain
unmatched tags. How do the two approaches handle them?

Rgs,

KEnt Sin




From martin@loewis.home.cs.tu-berlin.de  Sun Dec 10 10:48:39 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 10 Dec 2000 11:48:39 +0100
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: <20001210100053.20311.qmail@hk.sina.com.hk> (message from kentsin
 on Sun Dec 10 18:00:53 CST 2000)
References: <20001210100053.20311.qmail@hk.sina.com.hk>
Message-ID: <200012101048.LAA00761@loewis.home.cs.tu-berlin.de>

> 1. xml.dom.walker and xml.dom.writer is missing in python 2.0 's xml
> package. What are their usage?

Indeed. These classes originate from PyDOM, which is obsolete. In
Python 2.0, only minidom is included. There is no equivalent of a
walker class in minidom. Instead of a writer, you can probably use
.toxml() in most cases.
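
A walker is easy to approximate with a small recursive generator.  The
sketch below is one possible stand-in, not the PyDOM Walker API; the
function name and the sample document are invented.

```python
# Walking a minidom tree by hand and serializing it with toxml().
from xml.dom import minidom


def walk(node, depth=0):
    """Yield (depth, node) for node and its descendants, document order."""
    yield depth, node
    for child in node.childNodes:
        for item in walk(child, depth + 1):
            yield item


SAMPLE = "<page><a href='one.html'>one</a><a href='two.html'>two</a></page>"
doc = minidom.parseString(SAMPLE)

# Collect the href of every <a> element found during the walk
links = [node.getAttribute("href")
         for _depth, node in walk(doc)
         if node.nodeType == node.ELEMENT_NODE and node.tagName == "a"]

# toxml() takes the place of a separate writer class
serialized = doc.documentElement.toxml()
```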

> I have thought of not building a DOM tree but using regular
> expressions to extract all links. Can somebody tell me from their
> experience how the two approaches compare? Which is better?

In principle, an approach using regular expressions could fail more
easily than a solution that actually analyzes the structure of the
document. For most practical purposes, the solution using regular
expressions will work just fine. In the end, all that matters is that
it works.

> Especially I found some pages which were generated by scripts, do
> contain unmatched tags in the pages. How the two approaches handle
> them?

For that purpose, the DOM authors made special support for HTML. You
normally need a special parser, one that is capable of processing
HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I
believe, is capable of converting arbitrary HTML into a DOM tree.
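
To illustrate how a tolerant HTML parser copes with unmatched tags, here
is a sketch that collects links from sloppy markup.  It uses the
event-based html.parser module from current Python as a stand-in; it
does not build a DOM tree and is not the 4DOM reader.  The class name
and the sample markup are invented.

```python
# Link extraction from HTML with unclosed tags, using html.parser.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record href/src values; unmatched tags are simply ignored."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


# <p> and <b> are never closed, and one attribute value is unquoted
SLOPPY = ("<html><body><p>unclosed <b>bold"
          "<A HREF='a.html'>A</A><img src=pic.png></body>")

collector = LinkCollector()
collector.feed(SLOPPY)
collector.close()
links = collector.links
```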

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Sun Dec 10 13:21:37 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Sun, 10 Dec 2000 14:21:37 +0100 (CET)
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: <200012101048.LAA00761@loewis.home.cs.tu-berlin.de>
Message-ID: <Pine.LNX.4.21.0012101415420.16772-100000@orion.logilab.fr>

> For that purpose, the DOM authors made special support for HTML. You
> normally need a special parser, one that is capable of processing
> HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I
> believe, is capable of converting arbitrary HTML into a DOM tree.

Logilab contributed a much improved version of FromHtml to 4DOM a while
ago, which was included in 4Suite 0.9.2, I think. I don't know which
version is shipped in PyXML 0.6.2, though. If you need this piece of
code and can't find it in your distribution, just ask.


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From uche.ogbuji@fourthought.com  Sun Dec 10 13:32:03 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 10 Dec 2000 06:32:03 -0700
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sun, 10 Dec 2000 11:48:39 +0100." <200012101048.LAA00761@loewis.home.cs.tu-berlin.de>
Message-ID: <200012101332.GAA11760@localhost.localdomain>

> > Especially I found some pages which were generated by scripts, do
> > contain unmatched tags in the pages. How the two approaches handle
> > them?
> 
> For that purpose, the DOM authors made special support for HTML. You
> normally need a special parser, one that is capable of processing
> HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I
> believe, is capable of converting arbitrary HTML into a DOM tree.

Correct as usual, Martin, although Python's standard htmllib gets much of the 
credit for wrangling unruly HTML.

Here's a little demo.  It shows how to read in any HTML and print out shiny 
XHTML.  Basically, it has the functionality of the highly popular Tidy 
(http://www.w3.org/People/Raggett/tidy/) or JTidy 
(http://lempinen.net/sami/jtidy/) but with XHTML output (it can easily be 
modified to produce cleaned HTML output).

[uogbuji@borgia one-offs]$ cat html-to-xhtml-converter.py 
import sys
from xml.dom.ext.reader import HtmlLib
import xml.dom.ext

# Set up a reusable reader object
reader = HtmlLib.Reader()

# Parse HTML from the file or URI given on the command line; this
# returns the DOM document
doc = reader.fromUri(sys.argv[1])

# Just for kicks, write it out as XHTML, i.e. all lowercase, XML syntax
# for empty tags, all attributes with given value, etc.
xml.dom.ext.XHtmlPrettyPrint(doc)

[uogbuji@borgia one-offs]$ cat data/example-from-wsdl-xslt-article.html 
<HTML>
  <HEAD>
    <TITLE>Service summary: EndorsementSearch</TITLE>
    <META charset='UTF-8' HTTP-EQUIV='content-type' CONTENT='text/html'>
  </HEAD>
  <BODY STYLE='background: #ffffff'>
    <H1>Service summary: EndorsementSearch</H1>
    <HR>
    <TABLE>
      <THEAD>Service: EndorsementSearchService</THEAD>
      <TBODY>
        <TR>
          <TD STYLE='background: #ccffff' COLSPAN='3'>
            <I>snowboarding-info.com Endorsement Service</I>
          </TD>
        </TR>
        <TR>
          <TD>Port: </TD>
          <TD STYLE='background: #ffccff'>http://www.snowboard-info.com/EndorsementSearch</TD>
          <TD STYLE='background: #ff66ff'>SOAP</TD>
        </TR>
      </TBODY>
    </TABLE>
  </BODY>
</HTML>
[uogbuji@borgia one-offs]$ python html-to-xhtml-converter.py data/example-from-wsdl-xslt-article.html
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">

<html xmlns = 'http://www.w3.org/1999/xhtml'>
 <head>
  <title/>Service summary: EndorsementSearch
  <meta charset='UTF-8' http-equiv='content-type' content='text/html'/>
 </head>
 <body style='background: #ffffff'>
  <h1>Service summary: EndorsementSearch</h1>
  <hr/>
  <table>
   <thead/>Service: EndorsementSearchService
   <tbody/>
   <tr>
    <td style='background: #ccffff' colspan='3'>
     <i>snowboarding-info.com Endorsement Service</i>
    </td>
   </tr>
   <tr>
    <td>Port:</td>
    <td style='background: #ffccff'>http://www.snowboard-info.com/EndorsementSearch</td>
    <td style='background: #ff66ff'>SOAP</td>
   </tr>
  </table>
 </body>
</html>
[uogbuji@borgia one-offs]$ 


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sun Dec 10 13:51:59 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 10 Dec 2000 06:51:59 -0700
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Sun, 10 Dec 2000 14:21:37 +0100." <Pine.LNX.4.21.0012101415420.16772-100000@orion.logilab.fr>
Message-ID: <200012101351.GAA11831@localhost.localdomain>

> > For that purpose, the DOM authors made special support for HTML. You
> > normally need a special parser, one that is capable of processing
> > HTML, and still building a DOM tree. PyXML now includes 4DOM, which, I
> > believe, is capable of converting arbitrary HTML into a DOM tree.
> 
> Logilab contributed a much improved version of FromHtml to 4DOM a while
> ago which was included in 4Suite 0.9.2 I think. I don't know which version
> is shipped in PyXml 0.6.2, though. If you need this piece of code, and
> can't find it in your distribution, jsut ask.

This was after PyXML 0.6.2, so it's not included.  We have a few improvements 
to make yet to 4DOM before we release 4Suite 0.10.1 in a few weeks.  Are there 
any plans on the horizon to release PyXML 0.6.3?  If so, we'll get all the 
changes in before then.

I should note that the code from Logilab meticulously sets up the HTML content 
model according to spec.  It's a brilliant piece of work.  However, in many 
cases of HTML usage you would be able to get by just fine with the DOM code in 
PyXML 0.6.2.  If you start to run into problems, you might want to install 
4Suite 0.10.0 which includes LogiLab's code and many other fixes.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sun Dec 10 14:20:59 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 10 Dec 2000 07:20:59 -0700
Subject: [XML-SIG] 4XPath and Unicode
Message-ID: <3A33914B.6A28143A@fourthought.com>

See

https://sourceforge.net/bugs/?func=detailbug&group_id=6473&bug_id=125004

This bug covers the fact that 4XPath is really limited to the US-ASCII
encoding.  No ISO-8859-?, no Unicode, none of the other encodings
supported by the i18n-sig such as JIS or BIG5.  This really sucks. 
Especially after we've put so much work into i18n in other parts of
4Suite, and especially since Python 2.0 finally gives us native
character encoding support.

The problem is that 4XPath's lexer is implemented using Flex.  Flex is
really ancient code still mired in the world of C's char.  Even 8-bit
scanners can be a big deal for Flex, never mind wide characters.

We could hack ISO-8859-? support into the Flex scanner at great effort
and close the above bug, but that wouldn't provide a long-term fix.

Another problem with Flex is that we are having the devil of a time
making it thread-safe, which we need for 4Suite Server.  Bison, in
contrast, we've got safely concurrent now.

Conclusion: we've pretty much decided to ditch Flex, and ditch it
quickly.  In fact we're working towards 4Suite 0.10.1's using a
different scanner entirely when it's released in a couple of weeks.

Here are the options we're exploring:

1) Move all XPath parsing to another technology, perhaps Spark
(http://www.csr.uvic.ca/~aycock/python/).  Pro: it's in Python and
should be easy to maintain.  Con: we might lose performance, and most
Python scanner/parser packages seem to be only sporadically maintained. 
For instance, Spark's last update (0.6.1) was in April.  We'd like to
avoid being stuck maintaining a parser package in addition to everything
else.

2) Use an existing Python package for lexing, for instance mxTextTools. 
Pro: should be easier to convert and maintain.  Con: performance?
encoding support?

3) Write our own scanner in Python using SRE.  We'd probably write the
tokenizer in Python and then a thin shell in C to feed the tokens to
Bison.  This would ensure the best performance.  Pro: performance, we
get to add all the encoding support we want directly.  Con:
maintainability.
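
To make option 3 concrete, here is a rough sketch of an re-based
tokenizer that handles non-ASCII string literals.  It covers only a
small, simplified subset of XPath tokens; the token names and the
tokenize() function are invented for illustration and are not 4XPath
code, and the hand-off to Bison is left out.

```python
# -*- coding: utf-8 -*-
# Simplified XPath-like scanner built on the re module (illustrative).
import re

TOKEN_RE = re.compile(r"""
    (?P<LITERAL>"[^"]*"|'[^']*')
  | (?P<NUMBER>\d+(?:\.\d*)?|\.\d+)
  | (?P<NAME>[^\W\d][\w.-]*)
  | (?P<OP>//|::|\.\.|!=|<=|>=|[/()\[\],.@|=<>*+$:-])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(expr):
    """Return a list of (kind, text) pairs; whitespace is dropped."""
    tokens = []
    pos = 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise SyntaxError("unexpected character %r at offset %d"
                              % (expr[pos], pos))
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("substring-before(text(),' Durée: ')")
```

Because the pattern works on Unicode strings, the literal from bug
#125004 comes through as a single LITERAL token.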

We'd love to hear of any other ideas or comments on the above.  It will
be a good deal of work to fix our scanner, and we'd like to only have to
do it once, with relatively straightforward maintenance thereafter.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From calvin@cs.uni-sb.de  Sun Dec 10 14:35:14 2000
From: calvin@cs.uni-sb.de (Bastian Kleineidam)
Date: Sun, 10 Dec 2000 15:35:14 +0100 (CET)
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: <20001210100053.20311.qmail@hk.sina.com.hk>
Message-ID: <Pine.LNX.4.21.0012101504151.20378-100000@earth.cs.uni-sb.de>

Hello Kent,

>2. I have think of not building a dom tree but using regular expressions  
> to extract all links. Can somebody tell me from their experience some
> comparision of the two approaches? What is better? Especially I found
> some pages which were generated by scripts, do contain unmatched tags in
> the pages. How the two approaches handle them?

I am using Regexps:
_linkMatcher = r"""
    (?i)           # case insensitive
    <              # open tag
    \s*            # whitespace
    %s             # tag name
    \s+            # whitespace
    [^>]*?         # skip leading attributes
    %s             # attrib name
    \s*            # whitespace
    =              # equal sign
    \s*            # whitespace
    (?P<value>     # attribute value
     ".*?" |       # in double quotes
     '.*?' |       # in single quotes
     [^\s>]+)      # unquoted
    ([^">]|".*?")* # skip trailing attributes
    >              # close tag
    """
# and now fill in some tags:
LinkPatterns = (
    re.compile(_linkMatcher % ("a", "href"), re.VERBOSE),
    re.compile(_linkMatcher % ("img",   "src"), re.VERBOSE),
    re.compile(_linkMatcher % ("form",  "action"), re.VERBOSE),
    re.compile(_linkMatcher % ("body",  "background"), re.VERBOSE),
    re.compile(_linkMatcher % ("frame", "src"), re.VERBOSE),
    re.compile(_linkMatcher % ("link",  "href"), re.VERBOSE),
    # <meta http-equiv="refresh" content="x; url=...">
    re.compile(_linkMatcher % ("meta",  "url"), re.VERBOSE),
    re.compile(_linkMatcher % ("area",  "href"), re.VERBOSE),
    re.compile(_linkMatcher % ("script", "src"), re.VERBOSE),
)

This regex even catches missing quotes:
<a href="bla>
<a href=bla">

But only if you strip leading and trailing quotes from the URL.
For a complete code example get Linkchecker:
http://linkchecker.sourceforge.net
and look in linkcheck/UrlData.py
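
A short usage sketch of the idea follows.  It uses a simplified variant
of the pattern above, not the exact Linkchecker code, and the
extract_links() helper is invented for illustration.  It shows the
quote stripping mentioned above.

```python
# Simplified variant of the link pattern; flags are passed explicitly.
import re

LINK_TEMPLATE = r"""
    <\s*%s\s+            # open tag, tag name, whitespace
    [^>]*?               # skip leading attributes
    %s\s*=\s*            # attribute name and equal sign
    (?P<value>           # attribute value
      "[^">]*" |         # in double quotes
      '[^'>]*' |         # in single quotes
      [^\s>]+)           # unquoted, or with a quote missing
"""


def extract_links(html, tag, attr):
    """Return the values of attr on tag, with stray quotes stripped."""
    pattern = re.compile(LINK_TEMPLATE % (tag, attr),
                         re.VERBOSE | re.IGNORECASE)
    return [m.group("value").strip("\"'") for m in pattern.finditer(html)]


SAMPLE = """<a href="bla> <a href=bla"> <A HREF='ok.html'>"""
links = extract_links(SAMPLE, "a", "href")
```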

Bastian


From chapmanb@arches.uga.edu  Sun Dec 10 14:51:21 2000
From: chapmanb@arches.uga.edu (Brad Chapman)
Date: Sun, 10 Dec 2000 09:51:21 -0500 (EST)
Subject: [XML-SIG] 4XPath and Unicode
In-Reply-To: <3A33914B.6A28143A@fourthought.com>
References: <3A33914B.6A28143A@fourthought.com>
Message-ID: <14899.39017.639236.429461@taxus.athen1.ga.home.com>

Uche writes:
> Conclusion: we've pretty much decided to ditch Flex
> 
> Here are the options we're exploring:

[Spark, mxTextTools, SRE] 

> We'd love to hear of any other ideas or comments on the above.

One option to consider is Martel, written by Andrew Dalke:

http://www.biopython.org/~dalke/Martel

It's a parser generator which allows you to build up a grammar for a
format using regular expressions. It provides a bunch of "high level"
regular expressions to allow you to build up a readable and
maintainable grammar for what you want to parse.

It uses SRE and mxTextTools (both of which you mention above) under
the covers, and returns the parse tree as XML callbacks that you can
deal with using a standard SAX handler.

I've used it to develop parsers for a couple of different formats and
found it very nice to use. It is a "spare time" project of Andrew's,
but he is working on it quite often, so it is currently very
well-maintained. 

I hope this helps!

Brad


From martin@loewis.home.cs.tu-berlin.de  Sun Dec 10 18:32:18 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 10 Dec 2000 19:32:18 +0100
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: <200012101351.GAA11831@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012101351.GAA11831@localhost.localdomain>
Message-ID: <200012101832.TAA00709@loewis.home.cs.tu-berlin.de>

> This was after PyXML 0.6.2, so it's not included.  We have a few
> improvements to make yet to 4DOM before we release 4Suite 0.10.1 in
> a few weeks.  Are there any plans on the horizon to release PyXML
> 0.6.3?  If so, we'll get all the changes in before then.

There are a number of pending minidom changes which need to be
reviewed, corrected, and applied in order, both to PyXML and Python
proper. I don't know when this will happen; it very much depends on
Fred, Andrew and myself finding the time for it. 

After that, I'd like to release 0.6.3. If possible, I'd like to get a
4DOM update there, too - but it would not be a problem to release
PyXML 0.6.4 shortly after 4Suite 0.10.1. Release early, release often.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sun Dec 10 18:41:32 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 10 Dec 2000 19:41:32 +0100
Subject: [XML-SIG] 4XPath and Unicode
In-Reply-To: <3A33914B.6A28143A@fourthought.com> (message from Uche Ogbuji on
 Sun, 10 Dec 2000 07:20:59 -0700)
References: <3A33914B.6A28143A@fourthought.com>
Message-ID: <200012101841.TAA00760@loewis.home.cs.tu-berlin.de>

> 1) Move all XPath parsing to another technology, perhaps Spark
> (http://www.csr.uvic.ca/~aycock/python/).  Pro: it's in Python and
> should be easy to maintain.

I hope I can find some time to write an XPath parser in YAPPS. Is
there some readily-readable grammar for XPath? I find the bisongen
input of 4Suite extremely hard to read.

I think the time spent evaluating different parser toolkits in that
application would not be wasted. I have the feeling that XPath is
sufficiently simple to put together a parser in any of these toolkits;
we could then evaluate speed and readability of the generator input.

> Con: we might lose performance, and most Python scanner/parser
> packages seem to be only sporadically maintained.  For instance,
> Spark's last update (0.6.1) was in April.  We'd like to avoid being
> stuck maintaining a parser package in addition to everything else.

As for performance: Most of it probably comes from the lexing speed;
with sre, I hope that we can perform comparably to flex.

If 4Suite (and perhaps PyXML) made an educated selection of a parser
generator toolkit, that might set sufficient precedent to establish
a standard, and get the author of the toolkit interested in
improving it.

Furthermore, these things normally don't need much maintenance -
bison is still in wide use, even though it is not maintained anymore.

> 2) Use an existing Python package for lexing, for instance mxTextTools. 
> Pro: should be easier to convert and maintain.  Con: performance?
> encoding support?

I'd discourage yet another C module. It is *very* unlikely that they
get reasonable Unicode support.

> 3) Write our own scanner in Python using SRE.  We'd probably have one
> Python code to tokenize and then write a shell in C to feed the tokens
> to Bison.  This would ensure best performance.  Pro: performance, we get
> to add all the encoding support we want directly.  Con: maintainability.

Also, this is exactly what all these parser toolkits do - I don't
think there is need for yet another one.
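
For reference, the scanner half of option 3 is only a few lines with
the re module. The following is a sketch in present-day Python, under
the assumption of a deliberately simplified token set (no namespace
prefixes, no variable references, ASCII names only); TOKEN_RE and
tokenize() are illustrative names, not code from 4XPath or from any of
the toolkits mentioned.

```python
import re

# One alternative per token class; the first alternative that matches wins.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d*)?|\.\d+)
  | (?P<LITERAL>"[^"]*"|'[^']*')
  | (?P<NAME>[A-Za-z_][\w.-]*)
  | (?P<OP>//|::|\.\.|!=|<=|>=|[/()\[\]@,.|=<>*+-])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) pairs for a simplified XPath-like expression."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at offset %d"
                             % (text[pos], pos))
        pos = match.end()
        if match.lastgroup != "WS":  # whitespace is dropped, not reported
            yield match.lastgroup, match.group()
```

The resulting stream of pairs is what would be handed to the parser,
whether that parser is generated by a Python toolkit or is Bison behind
a C shell.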

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sun Dec 10 18:46:36 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 10 Dec 2000 19:46:36 +0100
Subject: [XML-SIG] 4XPath and Unicode
In-Reply-To: <14899.39017.639236.429461@taxus.athen1.ga.home.com> (message
 from Brad Chapman on Sun, 10 Dec 2000 09:51:21 -0500 (EST))
References: <3A33914B.6A28143A@fourthought.com> <14899.39017.639236.429461@taxus.athen1.ga.home.com>
Message-ID: <200012101846.TAA00854@loewis.home.cs.tu-berlin.de>

> I've used it to develop parsers for a couple of different formats and
> found it very nice to use. It is a "spare time" project of Andrew's,
> but he is working on it quite often, so it is currently very
> well-maintained. 

It seems that this supports only regular expressions, so it can't
really express an LR(n) language, such as XPath, can it?

Regards,
Martin


From uche.ogbuji@fourthought.com  Sun Dec 10 19:19:59 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 10 Dec 2000 12:19:59 -0700
Subject: [XML-SIG] 4XPath and Unicode
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sun, 10 Dec 2000 19:46:36 +0100." <200012101846.TAA00854@loewis.home.cs.tu-berlin.de>
Message-ID: <200012101919.MAA01996@localhost.localdomain>

> > I've used it to develop parsers for a couple of different formats and
> > found it very nice to use. It is a "spare time" project of Andrew's,
> > but he is working on it quite often, so it is currently very
> > well-maintained. 
> 
> It seems that this supports only regular expressions, so it can't
> really express an LR(n) language, such as XPath, can it?

I think this shoots it down.  XPath is not an enormously complex language, but 
it's not a regular grammar either.  I don't have a formal proof that XPath is 
LR(k), but I've written enough parsers that I think I can confidently say so 
(besides, Martin thinks so as well).  This means that we'll either have to 
find an LR(k) parser engine for Python, or just replace the scanner and stick 
with Bison.  I'm inclined to agree with Martin in his other post that we 
should just find a scanner package for Python that already takes advantage of 
SRE and feed its token stream to Bison.
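
The reason a purely regular grammar falls short is nesting: predicates
can contain paths which contain predicates, to any depth. The sketch
below, in present-day Python, handles a toy subset only (names, "/"
and "[...]"), and all function names are made up for illustration. The
scanner is a regular expression, but matching the brackets takes the
recursion between parse_path() and parse_step().

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][\w-]*|[/\[\]])")
PUNCTUATION = ("/", "[", "]")


def tokens(text):
    """Split the toy language into names and the three punctuation marks."""
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("bad input at offset %d" % pos)
        out.append(match.group(1))
        pos = match.end()
    return out


def parse_path(toks, i=0):
    """path := step ('/' step)*    Returns (list of steps, next index)."""
    step, i = parse_step(toks, i)
    steps = [step]
    while i < len(toks) and toks[i] == "/":
        step, i = parse_step(toks, i + 1)
        steps.append(step)
    return steps, i


def parse_step(toks, i):
    """step := NAME ('[' path ']')*    Each predicate recurses into a path."""
    if i >= len(toks) or toks[i] in PUNCTUATION:
        raise ValueError("expected a name at token %d" % i)
    name, i = toks[i], i + 1
    predicates = []
    while i < len(toks) and toks[i] == "[":
        inner, i = parse_path(toks, i + 1)
        if i >= len(toks) or toks[i] != "]":
            raise ValueError("missing ']'")
        predicates.append(inner)
        i += 1
    return (name, predicates), i


def parse(text):
    """Parse a whole expression into a nested list of (name, predicates)."""
    toks = tokens(text)
    tree, i = parse_path(toks)
    if i != len(toks):
        raise ValueError("trailing input at token %d" % i)
    return tree
```

For example, parse("a[b[c]]/d") yields a tree in which the step "c"
sits two predicate levels below "a", while parse("a[b") is rejected
because the bracket is never closed.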

I'll try to investigate some lexer toolkits today.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sun Dec 10 19:48:57 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 10 Dec 2000 12:48:57 -0700
Subject: [XML-SIG] [Fwd: [4suite] For Python 1.5.2 users]
Message-ID: <3A33DE29.7C758006@fourthought.com>


-------- Original Message --------
Subject: [4suite] For Python 1.5.2 users
Date: Sun, 10 Dec 2000 12:42:17 -0700
From: Uche Ogbuji <uche.ogbuji@fourthought.com>
Organization: Fourthought, Inc
To: 4suite@fourthought.com

At Alexandre's suggestion I've put up Martin von Loewis's add-on package
for unicode and ISO-8859-?.  This can be used with PyXML 0.6.0 through
0.6.2.  Versions 0.6.3 and higher will have it built in, but for now
it's available at

ftp://ftp.fourthought.com/pub/third-party/xml-sig/unicode-py152-20001210.tar.gz


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
_______________________________________________
4suite mailing list
4suite@lists.fourthought.com
http://lists.fourthought.com/mailman/listinfo/4suite


From fdrake@acm.org  Sun Dec 10 20:56:45 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sun, 10 Dec 2000 15:56:45 -0500 (EST)
Subject: [XML-SIG] xml / html parsing for webbot
In-Reply-To: <200012101832.TAA00709@loewis.home.cs.tu-berlin.de>
References: <200012101351.GAA11831@localhost.localdomain>
 <200012101832.TAA00709@loewis.home.cs.tu-berlin.de>
Message-ID: <14899.60941.119363.272129@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > There is a number of pending minidom changes which need to be
 > reviewed, corrected, and applied in order, both to PyXML and Python
 > proper. I don't know when this will happen, it much depends on Fred,
 > Andrew and myself finding the time for it. 

  I've been doing more XML stuff lately, so this is becoming more of a
priority for me.  I'm not sure exactly when I'll be able to get it
done, but it should be before too long.
  On a related note, I've just written an xml.sax.xmlreader.XMLReader
subclass that reads ESIS data, so we should be able to drive a SAX
application from an ESIS stream.
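
  A rough idea of what such a reader can look like, as a sketch in
present-day Python rather than the code referred to above.  It assumes
the line-oriented ESIS output produced by nsgmls and understands only
four commands ("A" attribute, "(" start tag, ")" end tag, "-" data)
plus the \n and \\ escapes; everything else is ignored.  ESISReader and
Recorder are illustrative names.

```python
import re
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl, XMLReader

_ESCAPE = re.compile(r"\\(n|\\)")


def _unescape(data):
    """Decode the two escapes this sketch knows about: \\n and \\\\."""
    return _ESCAPE.sub(lambda m: "\n" if m.group(1) == "n" else "\\", data)


class ESISReader(XMLReader):
    """Drive a SAX ContentHandler from an iterable of ESIS text lines."""

    def parse(self, source):
        handler = self.getContentHandler()
        pending = {}  # attributes seen since the last start tag
        handler.startDocument()
        for raw in source:
            line = raw.rstrip("\r\n")
            if not line:
                continue
            code, rest = line[0], line[1:]
            if code == "A":
                # "Aname TYPE value"; implied attributes carry no value
                parts = rest.split(" ", 2)
                if len(parts) >= 2 and parts[1] != "IMPLIED":
                    value = parts[2] if len(parts) > 2 else ""
                    pending[parts[0]] = _unescape(value)
            elif code == "(":
                handler.startElement(rest, AttributesImpl(pending))
                pending = {}
            elif code == ")":
                handler.endElement(rest)
            elif code == "-":
                handler.characters(_unescape(rest))
            # any other ESIS command is ignored in this sketch
        handler.endDocument()


class Recorder(ContentHandler):
    """Test handler that records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))
```

  Usage follows the normal SAX pattern: create the reader, call
setContentHandler(), then call parse() with an open text file or a
list of lines.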


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From kens@sightreader.com  Mon Dec 11 04:23:39 2000
From: kens@sightreader.com (Ken)
Date: Sun, 10 Dec 2000 22:23:39 -0600
Subject: [XML-SIG] Child nodes and lazy evaluation (Generators)
Message-ID: <003501c0632a$237aa3b0$04090a0a@devup.upcast.com>

>> This sounds like an excellent utility for a "pull DOM parser", where
>> you receive DOM events as you ask for them, out of a queue.  In a
>> basic "pull DOM parser" though, no real magic is necessary as long as
>> you have an incremental parser feeding the DOM builder.
>>
>> James Clark's Jade DSSSL processor uses a similar technique for
>> manipulating partial groves.  Jade had the ability to be parsing the
>> source file and doing the transform in parallel, if any node requested
>> was not yet parsed, the node request would block until the parser
>> thread caught up.
>
>Yes.  If Python gets coroutines, this would be pretty simple to implement as
>well.  As I've mentioned on the 4Suite lists, if some of the facilities from
>Stackless were to move into cpython (which seems likely), a _lot_ of
>sophistication will become available for XML processing patterns that I think
>would put us way ahead of Java, Perl, etc.

Who needs to wait for coroutines?  The generator module already works!
Coroutines would, of course, make it faster, but it's fine as it is for I/O
bound processes.  Also, the Generator module can be rewritten later with
coroutines (or related technique) without changing the usage syntax, so a
current solution could have a long lifetime.  The main point of Generator is
the pretty usage syntax (i.e. a buffered asynchronous threaded data stream as
a simple sequence object).

James Clark's Jade approach sounds exactly like what I have in mind, except
for the usage syntax.  The children of a node would be returned as a
Generator (which would behave just like a list, except that it would block
for unparsed children).
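
As a sketch of what "behaves just like a list, except that it blocks"
means, here is a small stand-in written in present-day Python.  It is
not the Generator module's API: LazySequence and slow_children() are
made-up names, and the "parser" is simulated by a function that yields
child names after a short delay.

```python
import threading
import time


class LazySequence:
    """Sequence whose items are produced by a background thread.

    Indexing blocks until the requested item has been produced; an index
    past the end raises IndexError once the producer has finished, which
    is also what terminates a for loop over the object.
    """

    def __init__(self, producer):
        self._items = []
        self._finished = False
        self._cond = threading.Condition()
        worker = threading.Thread(target=self._run, args=(producer,),
                                  daemon=True)
        worker.start()

    def _run(self, producer):
        try:
            for item in producer():
                with self._cond:
                    self._items.append(item)
                    self._cond.notify_all()
        finally:
            # Always wake up waiting readers, even if the producer failed.
            with self._cond:
                self._finished = True
                self._cond.notify_all()

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not supported here")
        with self._cond:
            while len(self._items) <= index and not self._finished:
                self._cond.wait()
            return self._items[index]


def slow_children():
    """Stand-in for an incremental parser delivering child nodes."""
    for name in ("title", "para", "para"):
        time.sleep(0.01)
        yield name
```

With children = LazySequence(slow_children), the expression
children[2] waits for the third child to arrive, and a plain for loop
over children sees each child as soon as it has been "parsed".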

Admittedly, this approach is a little frivolous in its creation of threads
(you should ideally only need one parser thread), but as I mentioned, this
shouldn't be a problem for I/O bound situations, and maybe the nested
Generator concept could be improved upon without changing the syntax (e.g.
the generators could share a thread).

The Generator module is available at:
http://starship.python.net/crew/seehof/Generator.html


From mal@lemburg.com  Mon Dec 11 10:03:37 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 11 Dec 2000 11:03:37 +0100
Subject: [XML-SIG] 4XPath and Unicode
References: <3A33914B.6A28143A@fourthought.com> <200012101841.TAA00760@loewis.home.cs.tu-berlin.de>
Message-ID: <3A34A679.F2F6BC49@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > 2) Use an existing Python package for lexing, for instance mxTextTools.
> > Pro: should be easier to convert and maintain.  Con: performance?
> > encoding support?
> 
> I'd discourage yet another C module. It is *very* unlikely that they
> get reasonable Unicode support.

I wouldn't count on that given mxTextTools' heritage ;-) 

In fact, there will be a version which supports Unicode by mid-2001
because I have a need for this myself. It will most likely use the
same technique as SRE: simply provide two separate implementations,
one for 8-bit and one for 16-bit characters.

BTW, why can't you design a parser API and then provide parser
implementations which provide it ?! You'd then have the possibility
to switch to another implementation later on.
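
In outline, and only as a sketch in present-day Python with made-up
names (PathParser, SplitParser, RegexParser), the idea is that client
code depends on one small interface while the implementation behind it
can be swapped.  The toy "grammar" here is just slash-separated step
names, far simpler than XPath.

```python
import re
from abc import ABC, abstractmethod


class PathParser(ABC):
    """Hypothetical interface: turn 'a/b/c' into a list of step names."""

    @abstractmethod
    def parse(self, expression):
        raise NotImplementedError


class SplitParser(PathParser):
    """Implementation built on plain string splitting."""

    def parse(self, expression):
        return [s.strip() for s in expression.split("/") if s.strip()]


class RegexParser(PathParser):
    """Implementation built on a regular expression."""

    _STEP = re.compile(r"[^/\s]+")

    def parse(self, expression):
        return self._STEP.findall(expression)


def count_steps(parser, expression):
    """Client code: works with any PathParser implementation."""
    return len(parser.parse(expression))
```

Client code such as count_steps() never needs to know which
implementation it was given.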

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/


From uche.ogbuji@fourthought.com  Mon Dec 11 16:08:55 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 11 Dec 2000 09:08:55 -0700
Subject: [XML-SIG] Reader architecture and 4DOM
Message-ID: <200012111608.JAA04088@localhost.localdomain>

See

https://sourceforge.net/bugs/?func=detailbug&group_id=6473&bug_id=124382

I've taken care of most of this, but there is one remaining dependence on 
Ft.Lib in 4DOM.  All the readers inherit from Ft.Lib.ReaderBase.  The problem 
is that the same readerbase is used for the Domlettes in Ft.Lib.  It seems the 
only ways to eliminate the dependency are:

1) Move ReaderBase to xml.dom.ext.  Probably easiest, but I think it's 
logically incorrect.  The reader architecture is more general than just 4DOM.

2) Hack the distribution code to maintain copies of the reader base between 
Ft.Lib and xml.dom.ext.  This would be more work, and likely error-prone.

Any ideas?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Mon Dec 11 17:49:14 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 11 Dec 2000 18:49:14 +0100
Subject: [XML-SIG] 4XPath and Unicode
In-Reply-To: <3A34A679.F2F6BC49@lemburg.com> (mal@lemburg.com)
References: <3A33914B.6A28143A@fourthought.com> <200012101841.TAA00760@loewis.home.cs.tu-berlin.de> <3A34A679.F2F6BC49@lemburg.com>
Message-ID: <200012111749.SAA00696@loewis.home.cs.tu-berlin.de>

> I wouldn't count on that given mxTextTools' heritage ;-) 
> 
> In fact, there will be a version which supports Unicode by mid-2001
> because I have a need for this myself.

That's good to hear; but I'll wait until then...

> BTW, why can't you design a parser API and then provide parser
> implementations which provide it ?!

Mostly because parser generators typically don't have APIs. Many of
them have entirely different input syntaxes, which are then converted
into programming language code.

Now, it might be possible to have a callback-style API for our grammar
(XPath). Adapting a specific parser generator for this callback API is
just as much work as writing a fresh parser in the generator language.

Reusing the abstract syntax tree might be feasible, though - although
it is more likely that the current 4XPath AS will be used, instead of
somebody designing a new AS.

Regards,
Martin


From Mike.Olson@fourthought.com  Mon Dec 11 18:57:55 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 11 Dec 2000 11:57:55 -0700
Subject: [XML-SIG] Reader architecture and 4DOM
References: <200012111608.JAA04088@localhost.localdomain>
Message-ID: <3A3523B3.280CFB98@FourThought.com>

uche.ogbuji@fourthought.com wrote:
> 
> 2) Hack the distribution code to maintain copies of the reader base between
> Ft.Lib and xml.dom.ext.  This would be more work, and likely error-prone.

This is what we did for our test suite and it seems to be working fine.

Mike

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Dec 11 19:08:08 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 11 Dec 2000 12:08:08 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sat, 02 Dec 2000 09:03:34 +0100." <200012020803.JAA00847@loewis.home.cs.tu-berlin.de>
Message-ID: <200012111908.MAA05005@localhost.localdomain>

> > Well, this would interfere pretty badly with 4DOM.  There is an
> > xml.dom.Node.py file in 4DOM and having a Node class in the __init__
> > would cause problems with the import.
> 
> What exactly would those problems be?

I guess I'm wrong about this.  I just tried adding 

class Node:
    pass

to Ft/Dom/__init__.py and expected everything to break, but all was well.  It 
seems that at least Python 2.0 is clever when the same name can be imported 
both as a submodule of the package and as an object defined in it.  Is this 
also the case with Python 1.5.2?

Given this, I guess it does make sense to move a base Node class to the 
__init__.py
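
The name clash can be reproduced in isolation.  The sketch below runs
on present-day Python and therefore says nothing about 1.5.2; the
demo_dom package is a throwaway built in a temporary directory, not
4DOM.  It shows that "from package import Node" picks up the class
defined in __init__.py, and that importing the Node submodule later
rebinds package.Node to the module.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create demo_dom/__init__.py and demo_dom/Node.py, both defining Node."""
    pkg = os.path.join(root, "demo_dom")
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("class Node:\n    origin = 'package __init__'\n")
    with open(os.path.join(pkg, "Node.py"), "w") as f:
        f.write("class Node:\n    origin = 'Node.py submodule'\n")


def demo():
    """Return what the name Node refers to before and after the submodule import."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            # The class from __init__.py is found as a package attribute.
            from demo_dom import Node as first
            kind_before = type(first).__name__
            # Importing the submodule rebinds demo_dom.Node to the module.
            importlib.import_module("demo_dom.Node")
            import demo_dom
            kind_after = type(demo_dom.Node).__name__
            return (first.origin, kind_before, kind_after,
                    demo_dom.Node.Node.origin)
        finally:
            sys.path.remove(root)
            for name in ("demo_dom", "demo_dom.Node"):
                sys.modules.pop(name, None)
```

So code that does "from xml.dom import Node" and code that does
"import xml.dom.Node" may see different objects under the same name,
depending on what was imported first.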

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Mon Dec 11 22:27:29 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 11 Dec 2000 17:27:29 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012111908.MAA05005@localhost.localdomain>
References: <martin@loewis.home.cs.tu-berlin.de>
 <200012020803.JAA00847@loewis.home.cs.tu-berlin.de>
 <200012111908.MAA05005@localhost.localdomain>
Message-ID: <14901.21713.862544.22201@cj42289-a.reston1.va.home.com>

uche.ogbuji@fourthought.com writes:
 > Given this, I guess it does make sense to move a base Node class to the 
 > __init__.py

  Done.  ;)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Mon Dec 11 23:24:52 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Dec 2000 00:24:52 +0100
Subject: [XML-SIG] Announcing PyXPath 1.0
Message-ID: <200012112324.AAA01115@loewis.home.cs.tu-berlin.de>

--Multipart_Tue_Dec_12_00:24:52_2000-1
Content-Type: text/plain; charset=US-ASCII

After recent discussions on removing lex and yacc from 4XPath, I got
interested in writing a 100% pure XPath parser in Python, using
available parser generators.

The first result of this research is attached below. It hasn't been
tested much, but it does recognize the LocationPath expressions that
are given as examples in the XPath spec.

The parser is based on YAPPS. Since YAPPS is LL(1), some rewriting of
the grammar was necessary to make it LL(1).
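
The typical rewriting is the removal of left recursion.  The XPath
spec defines OrExpr as "AndExpr | OrExpr 'or' AndExpr", which an LL(1)
parser cannot use directly; the equivalent form "AndExpr ('or'
AndExpr)*" can be parsed with a loop.  The sketch below shows this in
hand-written Python rather than YAPPS input, with operands reduced to
bare words and all function names made up for illustration.

```python
def parse_or(tokens, pos=0):
    """OrExpr := AndExpr ('or' AndExpr)*    (spec: OrExpr 'or' AndExpr)."""
    left, pos = parse_and(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "or":
        right, pos = parse_and(tokens, pos + 1)
        left = ("or", left, right)  # building leftwards keeps left associativity
    return left, pos


def parse_and(tokens, pos):
    """AndExpr := Operand ('and' Operand)*"""
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "and":
        right, pos = parse_operand(tokens, pos + 1)
        left = ("and", left, right)
    return left, pos


def parse_operand(tokens, pos):
    """Stand-in for EqualityExpr: any single word that is not an operator."""
    if pos >= len(tokens) or tokens[pos] in ("or", "and"):
        raise ValueError("expected an operand at token %d" % pos)
    return tokens[pos], pos + 1
```

The loop produces the same left-associative tree that the
left-recursive rule describes, so nothing is lost by the rewriting.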

I found that the generated scanner class of YAPPS is not usable for
XPath: there are a number of context-sensitive aspects in the XPath
lexis that make the straightforward longest-match approach of YAPPS
unsuitable.

In particular, a regex lexer cannot distinguish between an NCName and
a FunctionName, and may decide to return an OperatorName in places
where it shouldn't. I tried resolving the former problem by only
having NCName as a token, but that caused a conflict in the LL(1)
parsing algorithm, which could not tell whether an expression was
going to be a FunctionCall (that would require looking ahead to the
LPAREN).
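
The disambiguation rules in the XPath spec can be applied in a
hand-written scanner by remembering the previous token and peeking at
the text that follows a name.  The following is a sketch in
present-day Python, not the scanner in the attached archive: it
ignores namespace prefixes and variable references, uses ASCII-only
names, and classify() and the token kind names are made up for
illustration.

```python
import re

_TOKEN = re.compile(r"""\s*(?:
    (?P<NUMBER>\d+(?:\.\d*)?|\.\d+)
  | (?P<LITERAL>"[^"]*"|'[^']*')
  | (?P<NAME>[A-Za-z_][\w.-]*)
  | (?P<PUNCT>//|::|\.\.|!=|<=|>=|[/()\[\]@,.|=<>*+-])
)""", re.VERBOSE)

_OPERATOR_NAMES = {"and", "or", "mod", "div"}
_NODE_TYPES = {"comment", "text", "processing-instruction", "node"}
# After any of these tokens an operand must follow, so a name or "*"
# that comes next cannot be an operator.
_OPERAND_MUST_FOLLOW = {"@", "::", "(", "[", ",", "/", "//", "|", "+", "-",
                        "=", "!=", "<", "<=", ">", ">="}


def classify(text):
    """Return (kind, value) pairs, resolving names by their context."""
    text = text.strip()
    tokens, pos, after_operand = [], 0, False
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        kind, value = match.lastgroup, match.group(match.lastgroup)
        pos = match.end()
        if kind == "NAME":
            rest = text[pos:].lstrip()  # lookahead past optional whitespace
            if after_operand:
                if value not in _OPERATOR_NAMES:
                    raise ValueError("operator expected, got %r" % value)
                kind, after_operand = "OPERATOR_NAME", False
            elif rest.startswith("::"):
                kind = "AXIS_NAME"
            elif rest.startswith("("):
                kind = "NODE_TYPE" if value in _NODE_TYPES else "FUNCTION_NAME"
            else:
                after_operand = True  # an ordinary name test
        elif kind == "PUNCT":
            if value == "*":
                kind = "MULTIPLY" if after_operand else "NAME_TEST"
                after_operand = kind == "NAME_TEST"
            else:
                after_operand = value not in _OPERAND_MUST_FOLLOW
        else:
            after_operand = True  # numbers and literals are operands
        tokens.append((kind, value))
    return tokens
```

The expression "div div div" shows why a context-free longest-match
scanner is not enough: the first and last "div" are element names and
only the middle one is the operator.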

I haven't done any performance measurements with this grammar
yet. Also, it returns some ad-hoc data structure as the parse tree. If
there is interest, I will try to have it generate 4XPath data
structures; I'd probably need help from a 4Suite expert here.

I have tested the capability of parsing a Unicode string. The
definition of an NCName needs further work, since it does not yet
reflect the set of characters that count as letters in XML (or what
else is allowed in NCNames).

Regards,
Martin


--Multipart_Tue_Dec_12_00:24:52_2000-1
Content-Type: application/octet-stream; type=tar+gzip
Content-Disposition: attachment; filename="PyXPath.tgz"
Content-Transfer-Encoding: base64

H4sIAL5hNToAA+Q8/ZfaRpL5Ff6KDj4/ISMwTGI7IR7fzibrPe85jp/HuWSPIYyAhpFHSFpJeIZs
9v72q69utQRMvL7k3r13bHYQ/VFdXVVdny2/3v34OiyvHn7yO37U58Mnjx6pT5RSTx5/XvuWzxBa
TkaPTh49OTl5rNRoCM+fqEe/J1Lmsy3KMFfqk02Yl1Hyv7Hi/6nPa+H/LsyyYpDtfo81RsPh488/
P8L/0fCzk5OR4f/jzx7D8+hkOPrsEzX8PZBpfv6f8/+e+ityXo0GI9VXO12qMEnLK52rbFdepYnK
wryAX8WuKPWmfU+dbaJS/UWB0Og4UM/1PN+G+U6NvvzyC+g911pdlWU2fvgQgKT5bgDkTVZpvhzo
5fbhf4UwO3tISz5U0KyW6WK70UkZlhEsFiZLtc2WALtoA7TRsH+2XfcR9lh9Gy61AqAq18U2Bl6t
1SqKoUkXpYrKQscrFcF/pVeo2WwTRslsBjBOHvUBSYFxtlzqpUozWuzVd29VmaptwWAXaVLq21IV
OimiMnqvVbEIk0TnAUCpfW6uosWVgk611ItchwBA5zluRpd6QbAXYRbOoxjgwE7uQkIQeM3EHg0e
qaLcxbjJtb4FvnS9XHtqky63sfaRJE8cSM+jW4A0367VTVReKb3Jyp3KchhMWAANV3m6AYg5Uiva
ZGleqgdteaAlAuQs0PqeenulAZEw17BajgiEy3AOmNxc6UTdaMADmIVw5mkaa9h8UW5XK3Wv+rTL
fKvVqRq1V2Fc4NOQID+P8qKEhdINCIeOMxCn1TZhFN35d3za7aVeKRAbnSy7MUC79cdt5AU8D6T5
1qeWXJfbPMEOnlToMlwuu0WgzJxOpwM8ULdI/MLKDIi9CmPc9g7lIdcwjIbDgFvqjWDPY1U465kF
tgnsBZco60voWKNwFypdKZyPCxqwKP23CLMcVzgamARwFkdF2YW+ooL6dbqZRwlLrIWOx0a2je00
2SxTABsmU7si7wJAjl28yxrpCgeH7oP/4frS3twQCQYA2+DcGCECcqJr+Gh8mGi0yXDNzBTYK63a
9fRmrpd9OcJe4Nkn6lfei2QRb0WjSB+oAGgA+qTbMtuWYA09P6iBo2Pez+A4lQS08bsBOlRJtLC6
ISqyONypHIDDdivIcPz7ib7py4kHqHstAvf7OxRFBU/0WD9KrCZzyHBnr6xzjqotjGM4H9cwSAHH
8OwvCt/z21Pi3PkG++UwL+KwKPSHHuXqSNM8tCRAvKSwIvaNXqE8kLpYgITAdmWTXyG3REmBhknp
9AD/CM/oZy160ojei29AHryfJmH/57P+f86m5mHY/3L64F88GvPq+295EDb2pPH87RtuzN9M//XC
u7joXsBncHHxy+QneBiOTi686cWF/wD7fqFRnQODOjSmI0Blu1/HIfDXbvbftc6UDtGeUAduKUKF
cQNiOH8H5sTsBo/kbBYlUTmbddHagUbdzkHnhPka/5I+hYPawr4BdsEe8EtacBi04JdpYVt1KnOd
VXKd5bIKQpQz7DHu3ftFQP/5nrqvupd2vcvg0q5knwn0JZ539f36CjRkHmUgtn/bpiXYLjZNBUhX
4pVqje5HnCZiz/60CRcFaK2k7Mfp4nqg/k3v1B/DPN992n6v8zn4DJuZvl1oe/g9zxMyn+/Ap7j9
Ex49ofUPbMdUvk1YE4eolUCsgMaACckQoHcVgUkoWL8Z0GKlj/IhS4vT/ihQm2J92vljuFRvEVjH
MgP6ATn4K79hHPyGv8dJHqFdkalP1XBs9Gjn3tOCtsa66Fmn3dJga6t+Z+OTP6jFFTi3HdVTlwba
JfzojKnN4gIt0445j6/Sb8H+0w7MmTwTl7AiCItmQCblxiEsKDeydTSb6ZUB0DYypuKZ0bfMMNcd
MUic84jxUYqL0ghUtE4AXfhOQGUj5eC8GJWCjJx04XETJWEc0Ar+YDCYftVuvaB5NMIOMF0ICntC
kU88gsw50YhkUrkFvFEYs7Dmx7QTPtBA3zU5GJpfBktnlkGlDopRPZW9tlvkrAFu8GNpScFaEX5G
ueqC3szAOV6Kk1ek23whtsJvt5Br10GOuiaz2rdF3kINCePpXPtOr5lhOlX3OmDIA1m1m/u+gimW
eUQ14VyEg5lkp0Ph15/h3OOBizw49XIQ0S9A1wz5kII5ADHiQbAmHlOQORQj3oyJWWjuV3YB5iE6
HZVQwtQQ+MEmJE5vNEADCEPFcHY8bEAsp/VPT1Us2Av7/TFTAtfvmrV8Gf50fzQT7x5EMNfAtW1u
wxgjOIinON3cppc8ByCaJoBCWyWuuY1mnHVRm1I5iaY4rpWHEdiXzvcJHDj23PRyXEMDPDPUGMla
LzuMgPEIq+0AtLaAcjVFlz3I4wqt8xq9JNp5jKwcDV2GqKsQAq65Fq4mILlRAjiwJ0aKgjldbVHQ
6Y+G46mQuBp9v7i4SMAb+gNYKaXuFwoa4YsMFo8KyslwCn9OpsFlOflsCibK2DkeUN8P8Zol2HKc
hff8Kt3GSxpQl0MSYXDrwZ6jAaGdRxQIVegHIIkyqikTGCs0WUlSeU+R0xCn6TUFwSi35tSQhsBW
lGRDKaM2IG4Fyzuy8niuwxwcD1zYKhGiMWIKVhgD601YLsBqc3wZ5nGkczNd2Cc+GJ0y4CIunuV6
BQFUstA8dg6QZgQJ2NAfOY2wLrKrS8bM97gHcchEp2R2F46maklMCcKqF9dMKY2HgMNUSwQy5oQm
zHHOEnEmMwfmcKtDubGSY4bOc5RsYVetDaDNCA5oY91K6QfWcvu87EY9c0hgzuw99YNWaww6mch0
DiAOnWvcJ/5iwgIx30fplrSgTHVIl7lNhsQbaTSceoGUgQhkiU4WONdLFA8gTrIGKtNRhrWIBVbt
VEs47CEKOSuhY4K0IHfGcXzq1DYblmFv8x3xJmVMULeDcu703qVRpU2DTqA6SD5Gz3Fquoa45G6B
X1nb4yrdAkwMHWh7rFuiAvcNC85FNlDny1mPyv0tH5AAkrmvcx2W2hw2PhMkYsuwDGEEN58qB0Xz
1KuoFtiFAtTKqhKciRk9PjBtitRwvAj72HM4Qmh+l8Q70ShG/GtJDufA0vguUAaFDAwCKpUM6CaC
i6Ndtyc1mu3T04YOnhomO83GQaBfvtvvqjQzyrGjooeZMeTdEp5vq+0UrHMrfqrB4EPIU3dDKW6f
cXjOrihqeqe1K+cZfgQmSVClQt5KqBBiriwGmi+3mzkpTpveUkRatHu8ykYXRQh6lRID8W5gYrsM
1ocRiHCbxfl5tEZHAf1pZhYckwTgg/rFAfT7FBTSNikZy8k4mwYK4tHEY1Ns10UXv9eBg8azOr1L
/O6NLnudcUeWe5XeQGS240mkwsOcnSZZnMZRflRc2skmvO1m/S+GAXhxWe+L4dTuJFN95fS2ZY1z
DP2sJdQrtrARKlLUBF2ETrtwNvHuYG8uveJwDVEwu+/gKSIN9U49Ay8bvDSE/a4aia3sV8LQDLW6
wTaC/4NZkv3RUlFvNJ4eRD2P1lcV7hY5RjvIKsRrXbnt+gisEVcXuXFkUUPOwXHISy25EOinLrb2
6Ixikw826MlQTNwz9RgVN2g0CtdW6IxhpFjYVToQCmGISKuhh1XR6om7cmadOo6UAjd+DzHQXkaL
EAIVRyS9Z8oLLJpOm6e8Bxks6v3kuV0vxX9PwP+Yi39ejD17IvlU49Fd63LWDDK7khqszi254ezm
sr9T5dQCJHmi9RIcYDma72HbTZiGOXuZOkoe8npE39di2BspxnCevseslbpJ82umFSzeryX0IPbj
FM6k67kRMuy7+o1Zv1YLawQYfmFXrgemse52YOeG+I8isRms83SbdYe+Xx8lc+vuBM6ttTRAeP6U
vXPyBhFv8THtAXkP3AJ/27fGNpKsBsck/yEEZvJchQWHAhKLDCQYIVag7PeKyWjaew9nlGItBDye
utnm9408wp81/EExrCtvHrPmTs157Js8wkdUvKHh2h0puCTc6EPpCMoX2wQQjgLk8Ws/+DePx6L9
vcTAJsy6cbiZL0N1O1a3GMdYIL6Mp/XRquB322IPdn7G4Y3gjyU0n2T1Gw3kyEMuHLWwfUC06HYO
FZEukg76XngITPrY+ky0po0yXhMl4ZQAc7OdsSlO2o6SceSrcPpWgoAKgQO5Pl/SXmYRk0FoZqjg
gCPy2MV1ziNL8B6plmC26PicQS3Upa2Lf318/6jdD+qHDyaRURlN+nxlAzBxQB4keoGORb57wEGQ
ugmT0pQWD2gWnj7Xi9AUP616ohMhHqJLMQQFpyJe1qE4JDyofKvN+sdYZpb+eEYxAJct7gRud1t4
k52ePZq9jsDonhsnb3/WEQUA5+LYaPwIwEF91kQmYKB7KMBtblo5H8rmZpTGDY6vix9QC+DsOUoF
PD5GeH8etRDxrKpIQeEf0RVfc3XI5V8N3AHKcn4Wl7V1Hyz2faxm5lKhUcw6K6KYShnDahMgYnm4
2YS5Ey78PcH6F6vScReAhJsimHDphdK//j9k7pktyLCHv0jzJZtqU1uZwDeQuCqjsIq086H7wGQx
BcZdMmWZ2twbLD9zVYiGExQHc8UpH/l1t2kylxjUmonhmCbQ+akqQpizk0wKbMNdBiEULKZrFFMB
MbjWu6Jr8poU6aP7wXkQk3sfvPgG1vQxf9DHlJNE8h5mCYQv4FZcrkmQ1WYLPt4cY+v3YRyBh76E
aDRaRcBixPMcdMU2Y4XHit2aURSDffNaVRiseFK77ADa5emguWxRaRrkVaqQ7BECBYASG7mpMjeh
mQW0omsOqxSzyiYCXTCYGfXNQIpn3N81dpo6yEvKC7MpC4If2g6Dm0nVTueNZGgRzzsRlCQeWBmM
5wouNFJCsZZkNYm/UwrhyNXAxKJLPeG/7zuicaBf5IMPmSGIDJmsp+DCWQHiMeNqrPf06f3i2TPM
13IbD43DuY4xw4KhgsG0p7CCJVMhehgrz5/0u9LdO0Hf0OYVqyprE50h5TBqKeTkflHPGTMCgaJc
FYMibYChC5KjZRBE/JBwXWrw+yc+YvaL8ppJZsczw7TDUpdhFBcugzlv/uv8JVbSLZnlcjbPw8W1
LgGQc0yLqkr49CmYv17n2bOOk2Wp7n+IGXV0QZOBwmMMGBsD5MBVHqucDNMvkl4/JyLpJE8YNbpq
qOjZJyNSHBgid8bEHqwX1Xbt8jZBUfOr0GMPvimtMENnESWFMfLo2lKOAwuCDd9ku0SaGrJNgyYG
2lTGMtJgoDuBtOCajcUcmeJVWjzNaZ9EU5Cl2nYpj2jRIZMCo1AeJQ9shOyoOnLkDUuLz1+8OX9L
14SoqghAsUCuFpwA5YIBKt5Bu0XDv3v58rsfGuNhXzSe07AuzAEEDvSL68fO7YfAlN6xElUv2dF9
O8wVoqoNWfLneh0lCS5ALnNo6aQEJWcBh+FA/w9ZRoUrScSX7ux2y8anZPEgPlV07zFUUv/M8Yrc
NS1NVT051FxTbLcqG+RKzd5p+5iD5lhK1BZRGYEV59OHHpScOeRBdSj//g/nMOKPcIExVDGrnCoe
cuBkWmf+LXosHldqGTCSlogPMcHOUbs1k8l+LgyHMyMV77qibh7GA+fQzDeJ5InJMlRYvUPvglCz
BLCFrhpneXEcU0cIM4CYEKXNcGWB3NDEep5MNFMca5CQgeHtSIeOTSK6dICv6XFsDkKnK5dtFIE3
GqS3ZGcJ8/lqC65UrAjtAhxNupj6M10jUFiDMsHiqko5N0WA+BbR5DVMlLyM3ROnGqVVtlONYcws
Yw+qX1DL+7uqKmXPm2qiWfg7e/VXcx1RLVNNhQ47m+r8llP9vooGehCos5cvqyuMc9y1O25gp5s8
/wL6YUMarxsmqGPwieqsBdEBNQn+TFd2Kt0rJpSLw9ANCEulA45KTebJJggKlv6tyg9zDMXYmAWp
5DQJvGZgDjSDQ6s1B819bZZiBLvmEXnBU0xRRxaQAQT1DhmF3oYA1aRWf6zQ7um530Bq7xZbE2iL
rqIS+GmdX1S5OluSBsK9NZRjJdN3+iDOApXzYZSS1CE1ZWcs/MBePwAqReYwgPc/q4UciGXTU6HK
NMY8OBWvGShhjKM+ieVcuoUeOxHLRnzbWaJ4lOFC/22LlwHUHK/xOYfCTqOkDosM6lYRGouzNn5K
JfHYIXG4Q5xqKwO++WjZt3Yy062WSwa+5mxbAveqMpNoTd6jrYgeOUf7qFpxxy5zosx5qUaO1V3o
WAPgHLZ9zmFF1s4ZHzhircpQOrOcSW2HWECnPS8YPdL+iH3StdwsndDAqaWo2TjvQTBfT6nOTdD3
SFtB6Y2mDpkraJ+aMNil8/7mHFu5tvuSoMLGPzX2H0SW8OkDKoGqEf4oUnbCh+Fmh9dRRLWG5TRw
L6LVrq4osOYNnqZzdWwn12XpzlTOehNDQ1KO3RcrM0auyZXiBWHJFXNR27ygBDk4uJs5qM+o3MEW
56Fjq+929r5uGF82ehXWHATQ2yi/5s8ZmWStJgdsX1CNypPA74cwT/jyyjbWXpAEnmiarhfURlKc
fiQs48AdI3W/PsnzgTXACXHviwEE7a12XZPsCTJsgW8prOXi6G4zT2NWwk5CXigWgTu6QEbAGnjl
MKI8WwtchJksKu7eHTahTjoRcULuAP3MGybOCoGFK7c9EKjTj4XBUWVrjC8dquU2i7Goq3+NKXgx
2AvWgSd7xRsIIINjTybiD7tR0zBbhYuSr285PXzJLaLXEgyCVhXVGRxa3S+xs6fUtzJDeQGLBCVs
HJ2+HruXu7B1DbAweZnYgw32lm5ONg8Sv/WFacMaPeAovqwP5DeM6M4pXnWrlomYMIWzFE2tSMEZ
SNMtzKyTKyBYtCmnv6iaRUTqs2jJe+obeTuFX21j4S1sVjaukGEkGlts4onsopPYco5hA1usKfuB
Nxj4RhxYA7HSxOtYoXHKZLWzRL182R355jWhHNyKKNeFvU9i43i66bVNWL2lWzrBNc6cGRg3URzj
YFNpWAYUCUR0dXibxNG1jnemUm/A2ByeOAzNNG0WLa5nmIyuvaZBVs7mWAC8vAlEwynBj1lsE5bS
aLwObS7iiIa7RGCAx4sV9/xoMiYp3vpCjwyvSFDKgtL/Eb1VUcC40KzxFc0mq0CKmn220WDwirSV
hUjLA0QxI3x/DObDkIhh/ChX6/az9jJnRmQs+DZF665KgeNO+HzlxVibsxKYG9I7nqx+9AKjPE72
8xA0nU4gwOUT5qFz2Y2V8RWM3ViLFbqA6DYVX9AnaTG39Y0COuQSGbtF98QJ/7FMJGjcadJ2pzW3
aVytSN+9kdlPFyEA8sP+PATiValLGX/KiplfAzAVIjBJ4Gbja5biHvB1Mfb9TWbXddoo/rS3687K
MpS7KjJR6ntMnwMQqA6ZFr3RJToef0FdIYNCPk/RQiabrPOs0ygxHruNQO8E8Il0so1cFZQsmrkC
LAn7WiWylh5zFiHw5Gz9gLXEQzlEzYscLBibV2acohO+S/PhpeK7i8us2Q8Vr46Ua+n2wV0v9B28
bnAIUfcFjRnwdAbnd5dpg+mhafiBaXvJSIJEMPy7J0e07OSEYhdc7VcWw8/+bWAEgfdfDs30GleN
vR5t6u5lzHusAPezKQ91ri38hsRzX604QL1ATRDK1Ldo7PH/tyvQ0HE6lCk3Kf9Q4Y3qdHl3aCCp
ruBIzS+5q+AH7kDPLfVxRarqPx16x1nQ6SW9DrOAiNcRUL1OjQvxDO8tVIlUTMvPKv9sL1V5VzJG
VL+tAElrLZliF5QYk34GThms5qSLG+5k8Mhg4c2AzBSG5C1PNAEX7DaxRrIeh1uUVM2KEsTvZhpC
dotKEJOyt7tPFEnRsWkyngOYYhxYebdk8cH0pvhPMlCGnK1vSOk8YXyCDsIQ75eaU9CxIlFflzP7
ltNmJuhb8xYL87VVwXTGMOTmTSMOqeV9kIYoyeReZ4b/Hkg5C+B/NBK/Qb/ODiq6TmWa/zlFXDvM
e+pSxxKOXbLIXGIsdvKoj02CqO/ciadr8SndPCfzf5OHGZv9/VXwyhBQqEkrtOWyFrbVcGEP4QCo
w7Mcp2yvvmfcsZqKoZctE/p3PIp/6uyt4lpNuZnKxJzyhi7a0kvjEhzYa2ZS+6ObTOYd8kIi7hWF
0bETWEsL7nUs34OcwHdX/q/pAfZQTYxBV3f+u71nXWvjWDJ/raeYyMffSGYQFyfOFy1SLBs51h4M
tsC5fEJxZDQQfQFJKwkczmH3MfZR9gX2xbYufe8eDWDA+KzmSwx0V3dX36qrq+tCthc8C2uhtatu
5IfHZWb42NEJbJ1aTc4XnHzAewkDYOOTkICygPxdVyiHULGRPF17huHzzvbWr4a5kYcmtalI1Xtl
IymrXs6od6uxu2fU628/45p6xf30gITWdGGl3is7wwN0zzKwNSrZEoplonpVU5fEpocTBauR+MhN
4MBP02LZ74vkCP0xWg0PC7+MzxsXXBcmcqSZT6ovRmOMol292G8ng37/eH71du+9BuQqNrb0eAA3
MUfWT3oYPM54BKAoCPcU34hiIht43Sdaf8Bno5SSZwiePY0I7IJUoptz9vF1Ww3Izng2OBn8g3Vn
+KFf6gTCIkuHfaHhABt6ogrR6z0Z4qn3G15TwoeDKVl2d4E8cemkFYcuj/6wb5/mrKlXltZ0kqpT
H0k+wdJIdI6AV2kSYHP/YxrMB6LyI0V9KrZwXiB5FHp4fSD0D03VES3Vl7JekkpYx70QX5WOiGtg
RmXOIEjJ1TyGhkZBicDMJWIK0gekmYp3BrGc+BHTW9RruHYJImJ6ycZg4uak6qLuf+0r8KyJx1G+
YxM3mcLATVnoLmT82S371UHivnoHKQqcqHUHKdxm9NZMdhAsyzrp9fFmP8BHtMEshjxmqvS2N/tT
jcSkIlMtWjFASnMmijC83CShzIAXBSFL76wsTcOdjxgTVGgeM2aLb0R6JFTfwvDyAp6IWdYFrSsC
TUVRaAkWS3opyMEvm6PvUirFjh25OhtSsgNDWRc7N0hd18zlRjjT2o2rJqjsC66LYDe0XaXRgyKz
YEe/W90gdM2G8ocQmmUuV9ZBqsFFlAyiBlF61js+JdJZpNfkIdwKSdUiBjykJJsIGxm9xkuxkuGZ
o/r72lJvNhogUKdqg3fL2Id4SY27vbC5fHFtSfRdptOp1DJsW907CXLDqdT/VtIp7E6q35vUEtXn
J1pkQO+hj8g74oMWGRbjCSDUvU2L2im9C4jCljq4u4U0NeaXZEys6iLxNnABsQYqZcpL5ZbcYJNG
HJAg7O5e+/LATsVlockhkCuWHk3LxeiRdBeUtd6XilLet8SQaHUAN/IRXMetG4IWwqVoCM48mpQz
XJ0NVMZtkg2tCynpLtFNnwEUWkRIm5BLhKX4UbCQeMMNXkCXiOuTPFcoH/H2xVTyVhm55YrxC1qb
uBzYuo5kGXGGlEqnPniQbamRJUaFsYRFeYpPMEy0hYj3jStexYG3fZvliRJDRtU+/jEZ9oqta/aL
uoFYlJBlVN58woJSIQ2uZUlkS8WyKwM2RoiFwFalAXsYYkEm50onGrgv2CWTErfEj5zlUhluYOGy
bIhmG5FNq1pse7XRVeKSgHE6i9tQGVfKl3OllrJz+QJX39rKwCC/+BXQzapGDGSLWmd/XpdsN4p5
9CMqFSdTNIRPYmEHHyeBSsw64LSxzOunuIqUif3cWbdcaGUL+wWW7Z7ym0VNxZniXjMJb2t07Xn/
vlaLpcPTOKM5NuE8V7MIvM9ZGFIqAgAAHeJPqlJznQn+ErEFsFERAvjaJBqNBTz60okncbmCTjRL
5awxYsGueNMldjXawO1UjzYQFvtUzxyDq/kYzPQ9KO5YZG78sactzfr8fKpenI6kedrXhYVjufvo
WO4G/MXle4uDsShm+oujTM9j3MJfXHRb/uKu5y3uATnowsm6jsM4LJfrMm7hMO6Ldhh3J/7iPs1b
HC3Dq/iLW3iLKyy8xS28xS28xS28xV3dW5zlqGIznR54XkIyHXPPZCZ+GT5AVD5+pbjIyBXJwRP9
Kn26GzCsdYAg9FsAYrlO2cv1QB7esiiXVLT9/AvKvAjkCHcbmM2/BmCqlF0N5LBLLMrmXwMwnWh/
fwZX6/39SXeJQCFldW2NvK2vrn3bXfILNbc3EfBvfk5jb6+NWRsb+/ulzm/17tL+/kUdf0GX7PXA
4LSoqrCveB96l6uHyovUwP5+kZoQLuDLjyG5TH+XyH88uYSPbRDMKLtVw3264w1Gl1eUvyCzF6C1
QvHLcIsR5LktLTZVQ462n4LzFfusHmbr8GXo66nSlmqexnyHXUIZ6l9JdGbgrRgZE4srqfm4PRS9
lK/leOipNW43o0ae61njB46ENkvZg+S4Awb8uoTH9ebDy+fNJwzm5aPDNEEwzxJ8xClBPWW7Huvd
H4h9xyQyJk3S1AO3Xdfvp/S9JtaP08o0DZTIlpN7YnExy3FZT/zu6Qd71k+TO593nBi/YzyRXHPm
rK1lzBr0Qs/baYJTluhpjOM584cYASHzEaJQGBn4wMi+wfdt+Y7KAyFT5Zr1V5+DZ2vTRFMWz0GW
jqvMpVSCamErl29nReHyMZcTk1VjQZG8Qc3apywcGv4QHYDJ0lChjW8TCrEJvJki+ieoABwgXV2C
L9t2AXEBD0C3UV/HBqakEKyBrcAV6YJHrUul1mYi8RbYUJ04rXILmLScV91tzIKz9ohHcM4mQ3rR

--Multipart_Tue_Dec_12_00:24:52_2000-1
Content-Type: text/plain; charset=US-ASCII


--Multipart_Tue_Dec_12_00:24:52_2000-1--


From chetan@pybiz.com  Tue Dec 12 01:07:14 2000
From: chetan@pybiz.com (chetan patel)
Date: Mon, 11 Dec 2000 17:07:14 -0800
Subject: [XML-SIG] [ANNOUNCE] XDisect 1.0 - An XML Indexing and Search Engine
References: <200012112324.AAA01115@loewis.home.cs.tu-berlin.de>
Message-ID: <05bf01c063d7$dd8540f0$09d40518@C746107A>

PyBiz Inc announces release 1.0 of its product XDisect, an XML Indexing and
Search Engine.

The release can be downloaded for free evaluation at the following URL:
http://www.xdfind.com/

I would appreciate your feedback and comments on the product.

Product Overview
=============
XDisect is an enterprise-class XML search product with high speed XML
indexing capabilities.  XDisect is ideal for distributed management of XML
Documents.  XDisect provides a solid foundation for next generation vertical
markets, secure portals and other dynamic e-business applications.

Features in this release
===============
- XDisect is completely written in Python 1.5.2
- High speed indexing of millions of XML documents
- Index sizes can be in excess of 2 GBs
- Supports Incremental indexing / updates to documents
- Runs on Linux, Solaris, Win NT/2000
- Supports the SQL query language
- Supports sophisticated joins, keywords, free text, path based searching
- Supports Oracle's XSQL query standard
- Open HTTP/XML API interface for integration with most popular programming
environments
- XSLT integration for direct HTML rendering of XML query results or
transformation
- Browser-based GUI for developers to look at the schemas and documents
stored in the repository

regards

Chetan Patel
PyBiz, Inc
www.pybiz.com


From kentsin@sinaman.com  Tue Dec 12 15:21:31 2000
From: kentsin@sinaman.com (kentsin)
Date: Tue Dec 12 09:21:31 CST 2000
Subject: [XML-SIG] xml / html parsing for web
Message-ID: <20001212012131.22258.qmail@hk.sina.com.hk>

I have downloaded 4Suite, but I find it difficult to work out from the documentation how to build what I want. I have also read the linkchecker code, which contains a very smart regular expression that parses almost all links. What I find missing is support for JavaScript-driven or form-driven links. Some sites have

<option .... value="link1"...

which linkchecker cannot follow.

Moreover, I would like to extract the form data and link it with the labels found on the page, and to associate each link with its hot text or image. Linkchecker cannot do this either.

Linkchecker's regular expression approach is much clearer to me, but as a newbie I would like to hear from you how far it can go. Is it worth my while to go the 4dom way?

Can somebody point me to some 4dom sample code? 

Many thanks to all who reply.

Best Regards,

Kent Sin




From kentsin@sinaman.com  Tue Dec 12 09:26:08 2000
From: kentsin@sinaman.com (kentsin)
Date: Tue Dec 12 09:26:08 HKT 2000
Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web
Message-ID: <20001212012608.14096.qmail@hk.sina.com.hk>

Dear all,

I just came across SAX. Is it useful for my task? How does it compare to DOM and regular expressions?

Rgs,

Kent Sin




From noreply@sourceforge.net  Tue Dec 12 01:51:03 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 11 Dec 2000 17:51:03 -0800
Subject: [XML-SIG] [Bug #125424] Node.replaceChild broken in minidom
Message-ID: <200012120151.RAA03038@usw-sf-web2.sourceforge.net>

Bug #125424, was updated on 2000-Dec-11 17:51
Here is a current snapshot of the bug.

Project: Python/XML
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: keffy
Assigned to : nobody
Summary: Node.replaceChild broken in minidom

Details: 
In xml.dom.minidom, Node.replaceChild doesn't replace any children.  The definition from the source is:

    def replaceChild(self, newChild, oldChild):
        index = self.childNodes.index(oldChild)
        self.childNodes[index] = oldChild

Is there a good reason why it's not the following?

    def replaceChild(self, newChild, oldChild):
        index = self.childNodes.index(oldChild)
        self.childNodes[index] = newChild

Sorry if this is a repeat report or addresses a design
decision for the "mini" of minidom.  Sorry also that 
I'm clueless about the Unix-style patch system -- this
is as close as I come to submitting a fix. :-)
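
For reference, here is a minimal sketch of the behaviour the proposed fix
is aiming for. It is checked against the xml.dom.minidom shipped with a
current Python 3, not against the 2000-era source quoted above, and the
element names are made up for illustration.

```python
from xml.dom import minidom

# A tiny document with one child that we want to swap out.
doc = minidom.parseString("<root><old/></root>")
root = doc.documentElement
old = root.firstChild
new = doc.createElement("new")

# replaceChild(newChild, oldChild) should put newChild where oldChild was
# and hand back the node that was removed.
removed = root.replaceChild(new, old)
result = root.toxml()
```

With the assignment of oldChild back into the list, as in the quoted
source, the serialized tree would still contain the old element.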

-- Kevin Russell
krussll@cc.umanitoba.ca


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125424&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Tue Dec 12 08:32:41 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Dec 2000 09:32:41 +0100
Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web
In-Reply-To: <20001212012608.14096.qmail@hk.sina.com.hk> (message from kentsin
 on Tue Dec 12 09:26:08 HKT 2000)
References: <20001212012608.14096.qmail@hk.sina.com.hk>
Message-ID: <200012120832.JAA00700@loewis.home.cs.tu-berlin.de>

> I just come across SAX, is it useful in my task? 

If you use an HTML parser (instead of an XML one), then maybe, yes.

> How does it compare to DOM and regular expression?

It is an event-based API, instead of a tree-based or a function-based
one.

Regards,
Martin


From larsga@garshol.priv.no  Tue Dec 12 09:25:41 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Dec 2000 10:25:41 +0100
Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web
In-Reply-To: <20001212012608.14096.qmail@hk.sina.com.hk>
References: <20001212012608.14096.qmail@hk.sina.com.hk>
Message-ID: <m3bsuiq9d6.fsf@lambda.garshol.priv.no>

* kentsin@sinaman.com
| 
| I just come across SAX, is it useful in my task? How does it compare
| to DOM and regular expression?

Like the DOM, SAX is used to work with a real XML parser. The DOM gives
you back the document as a full object structure, whereas SAX instead
gives you the document as a series of method calls.  So SAX is faster
and requires less memory, while the DOM is easier to understand.  Which
is easier to use depends on what you want to do.
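
To make the contrast concrete, here is a small sketch that pulls the same
href values out of a document both ways. It assumes a current Python 3
standard library; the sample document and the HrefCollector class are
invented for this illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = b'<links><a href="one.html">One</a><a href="two.html">Two</a></links>'


class HrefCollector(xml.sax.ContentHandler):
    """SAX style: the parser calls us once per start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def startElement(self, name, attrs):
        href = attrs.get("href")
        if name == "a" and href is not None:
            self.hrefs.append(href)


# SAX: nothing is kept except what the handler chooses to store.
handler = HrefCollector()
xml.sax.parseString(DOC, handler)
sax_hrefs = handler.hrefs

# DOM: the whole tree is built first, then queried.
tree = minidom.parseString(DOC)
dom_hrefs = [a.getAttribute("href") for a in tree.getElementsByTagName("a")]
```

Both lists come out the same; the difference is that the DOM version
holds the entire document in memory while the SAX version only keeps the
list it builds itself.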

--Lars M.


From mak@mikroplan.com.pl  Tue Dec 12 10:26:38 2000
From: mak@mikroplan.com.pl (Grzegorz Makarewicz)
Date: Tue, 12 Dec 2000 11:26:38 +0100
Subject: [XML-SIG] [BUG] sax.ExpatParser.reset
Message-ID: <NDBBIKNLJKPLOLAJJJAPEENGFDAA.mak@mikroplan.com.pl>

Test failure in test/test_sax.test_expat_incremental_reset
due to bug in sax.expatreader.

mak

--- expatreader.py	Thu Nov 02 18:23:08 2000
+++ _xmlplus\sax\expatreader.py	Tue Dec 12 11:16:05 2000
@@ -69,8 +69,8 @@
 
     def feed(self, data, isFinal = 0):
         if not self._parsing:
-            self._parsing = 1
             self.reset()
+            self._parsing = 1
             self._cont_handler.startDocument()
 
         try:
@@ -118,6 +118,7 @@
 #         self._parser.NotStandaloneHandler = 
         self._parser.ExternalEntityRefHandler = self.external_entity_ref
 
+        self._parsing = 0
         self._entity_stack = []
         
     # Locator methods
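
A rough sketch of the usage pattern the failing test exercises: feed part
of a document, call reset(), then feed a complete document. This assumes
the expat-based reader in a current Python 3 standard library, and the
Recorder handler is a made-up helper for the illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Recorder(ContentHandler):
    """Records the names of the structural SAX events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("start:" + name)

    def endElement(self, name):
        self.events.append("end:" + name)

    def endDocument(self):
        self.events.append("endDocument")


parser = xml.sax.make_parser()

# Start a document and abandon it half way through.
first = Recorder()
parser.setContentHandler(first)
parser.feed("<doc>")
parser.feed("text")

# After reset() the parser must behave as if nothing had been fed yet,
# which is why _parsing has to be cleared there.
second = Recorder()
parser.setContentHandler(second)
parser.reset()
parser.feed("<doc>")
parser.feed("text")
parser.feed("</doc>")
parser.close()
```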


From martin@loewis.home.cs.tu-berlin.de  Tue Dec 12 08:35:26 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Dec 2000 09:35:26 +0100
Subject: [XML-SIG] xml / html parsing for web
In-Reply-To: <20001212012131.22258.qmail@hk.sina.com.hk> (message from kentsin
 on Tue Dec 12 09:21:31 CST 2000)
References: <20001212012131.22258.qmail@hk.sina.com.hk>
Message-ID: <200012120835.JAA00746@loewis.home.cs.tu-berlin.de>

> Linkchecker's regular expression approach is much clear to me, but
> as a newbie I would like to hear from you that how far can it go?

I think that's hard to tell. Just draft some code, and see for yourself.

> Can somebody point me to some 4dom sample code? 

Please have a look at the demo/dom directory of PyXML.

Regards,
Martin


From larsga@garshol.priv.no  Tue Dec 12 10:31:27 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Dec 2000 11:31:27 +0100
Subject: [XML-SIG] [BUG] sax.ExpatParser.reset
In-Reply-To: <NDBBIKNLJKPLOLAJJJAPEENGFDAA.mak@mikroplan.com.pl>
References: <NDBBIKNLJKPLOLAJJJAPEENGFDAA.mak@mikroplan.com.pl>
Message-ID: <m33dfuq6bk.fsf@lambda.garshol.priv.no>

* Grzegorz Makarewicz
|
| Test failure in test/test_sax.test_expat_incremental_reset
| due to bug in sax.expatreader.

Thank you Grzegorz, but this bug was already fixed in the CVS tree.
(Revision 1.18, 2000-10-14.)

--Lars M.


From larsga@garshol.priv.no  Tue Dec 12 11:43:09 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Dec 2000 12:43:09 +0100
Subject: [XML-SIG] XMLPROC: unsupported character number '>255' in character reference
In-Reply-To: <14899.17068.574076.957348@lindm.dm>
References: <14899.17068.574076.957348@lindm.dm>
Message-ID: <m3y9xlq302.fsf@lambda.garshol.priv.no>

* Dieter Maurer
|
| I use the SAX2 implementation bundled with the Python 2.0
| distribution to process DocBook/XML documents.
| 
| When I turn on validation, "xmlproc" complains
| "unsupported character number 'XXXX' in character reference"
| for each XXXX larger than 255.
| 
| Apparently, "xmlproc" does not yet know that such character
| references no longer make problems with the new Python
| unicode support.

You are quite right, xmlproc has not yet been updated to Python 2.0,
chiefly because I am too busy writing my book to do much development
these days.  I'm planning to add full Unicode support to it, but that
probably won't happen for another couple of months.

--Lars M.


From calvin@cs.uni-sb.de  Tue Dec 12 15:35:20 2000
From: calvin@cs.uni-sb.de (Bastian Kleineidam)
Date: Tue, 12 Dec 2000 16:35:20 +0100 (CET)
Subject: [XML-SIG] xml / html parsing for web
In-Reply-To: <20001212012131.22258.qmail@hk.sina.com.hk>
Message-ID: <Pine.LNX.4.21.0012121624570.31907-100000@earth.cs.uni-sb.de>

Kent,

> contain a very smart regular expression to parse almost all links. What
> I found missing is a javascript driven or form driven links : some site
> have <option .... value="link1"...
> Which linkchecker can not follow.
Yes. In general you can not tell if the option "value" is a link or if it
is just some data. The same is with Javascript. I can construct links out
of many parts:
<script>
mybase = "mydata/sub1"
if (browser == "IE") {
   url = mybase+"/ieblubb.html"
} else {
   url = mybase+"/netscapeblubb.html"
}
</script>
It is difficult to extract such dynamic urls.

> Moreover, I would like to extract the form data and link them with
> labels found on the page. Associating the link with the hot text or
> image. Which linkchecker can not. 
Yes, it's the same.

Generally I think you can not always extract dynamic URLs out of forms or
Javascript because you never know if they are really URLs or just data.

Bastian


From calvin@cs.uni-sb.de  Tue Dec 12 15:36:47 2000
From: calvin@cs.uni-sb.de (Bastian Kleineidam)
Date: Tue, 12 Dec 2000 16:36:47 +0100 (CET)
Subject: [XML-SIG] Re: [XML-SIG[ xml / html parsing for web
In-Reply-To: <20001212012608.14096.qmail@hk.sina.com.hk>
Message-ID: <Pine.LNX.4.21.0012121635430.31907-100000@earth.cs.uni-sb.de>

> I just come across SAX, is it useful in my task? How does it compare to
> DOM and regular expression?
SAX is a parser interface, DOM is a parsetree format. You can use both. A
parsetree is usually the output from a parser.

Bastian


From larsga@garshol.priv.no  Tue Dec 12 21:52:44 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Dec 2000 22:52:44 +0100
Subject: [XML-SIG] Sab-pyth
Message-ID: <m3g0jtz4r7.fsf@lambda.garshol.priv.no>

Has anyone been able to compile Sab-pyth?  I can't do it at all on
Windows and am having problems on Linux, so if anyone could make this
available to me I would be very grateful.  Windows is preferred, but
Linux is also good.

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Tue Dec 12 22:40:19 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Dec 2000 23:40:19 +0100
Subject: [XML-SIG] XMLPROC: unsupported character number '>255' in character reference
In-Reply-To: <14899.17068.574076.957348@lindm.dm> (message from Dieter Maurer
 on Sun, 10 Dec 2000 09:45:32 +0100 (CET))
References: <14899.17068.574076.957348@lindm.dm>
Message-ID: <200012122240.XAA00752@loewis.home.cs.tu-berlin.de>

> Apparently, "xmlproc" does not yet know that such character
> references no longer make problems with the new Python
> unicode support.

Indeed. xmlproc currently does not use the Unicode type.

> Is there already a fix?

Not that I know of; Lars has not put anything into PyXML, yet.

Regards,
Martin


From kentsin@sinaman.com  Wed Dec 13 21:45:15 2000
From: kentsin@sinaman.com (kentsin)
Date: Wed Dec 13 21:45:15 HKT 2000
Subject: [XML-SIG] xml / html parsing for web
Message-ID: <20001213134515.25819.qmail@hk.sina.com.hk>

Yes, you are right. There is no general way to do this. I am not building a general spider; my job is to collect some information from the web automatically. I have a small set of targets, so I would like to build a spider framework that I can customize for each target site. One of the targets contains links built with a pull-down option list, so I need a way to include those.

I think the regular expression way is simple enough for a newbie like me to handle. The problem is that it seems very difficult to customize for cases like the one above. The other problem is that I want to base the choice of action on hot words (the words between <a> and </a>), and I want to preserve the order of the links so that I can customize the action to choose a specific link by its location.

I think the regular expression method is very difficult for this. I have tried the parser way, but the parsers crash on ill-structured HTML.

There are many parser modules that come with Python. Can someone comment on them for my case? How do I choose between them?
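
One possible shape for such a collector, as a sketch only: it uses
html.parser.HTMLParser from a current Python 3 standard library (a module
not mentioned anywhere in this thread), which tolerates unclosed tags and
unquoted attributes. The LinkCollector class, the collect_links helper and
the sample page are all invented for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects <a href> and <option value> targets in document order,
    together with the text ("hot words") that follows each one."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []       # [kind, url, text] entries, in page order
        self._current = None  # index of the entry still gathering text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(["a", attrs["href"], ""])
            self._current = len(self.links) - 1
        elif tag == "option" and attrs.get("value"):
            self.links.append(["option", attrs["value"], ""])
            self._current = len(self.links) - 1

    def handle_endtag(self, tag):
        # </select> also ends the text of an <option> that was never closed.
        if tag in ("a", "option", "select"):
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.links[self._current][2] += data


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(kind, url, text.strip()) for kind, url, text in parser.links]


# Deliberately sloppy HTML: unclosed <b>, unclosed <option>s, unquoted href.
PAGE = """<p><a href="a.html">First <b>link</a>
<select><option value="b.html">Second<option value="c.html">Third</select>
<a href=d.html>Fourth</p>"""

links = collect_links(PAGE)
```

Because the entries stay in page order, a site-specific rule can pick a
link either by its hot words or by its position. Whether an option value
really is a URL still has to be decided per target site.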




From noreply@sourceforge.net  Wed Dec 13 15:54:27 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 13 Dec 2000 07:54:27 -0800
Subject: [XML-SIG] [Bug #125668] DbDom : Reader produces DocumentFragments
Message-ID: <E146EEd-0005Dt-00@usw-sf-web3.sourceforge.net>

Bug #125668, was updated on 2000-Dec-13 07:54
Here is a current snapshot of the bug.

Project: Python/XML
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: DbDom : Reader produces DocumentFragments

Details: Hi Mike!

When used without a Document parameter, Reader.fromStream() returns a DocumentFragment (instead of a Document).

This is because when the document parameter is None, fromStream creates a new DocumentImp and passes it to Sax2.Reader.fromStream which returns a DocumentFragment. 

A way to correct this would be to append the DF to the newly created Document in that case.

I'll see if I can set up a patch for this one.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125668&group_id=6473


From noreply@sourceforge.net  Wed Dec 13 16:01:51 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 13 Dec 2000 08:01:51 -0800
Subject: [XML-SIG] [Patch #102818] DbDom patch for bug #125668 (Reader and Doc Frags)
Message-ID: <E146ELn-0003wn-00@usw-sf-web1.sourceforge.net>

Patch #102818 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: afayolle
Assigned to : nobody
Summary: DbDom patch for bug #125668 (Reader and Doc Frags)

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102818&group_id=6473


From Alexandre.Fayolle@logilab.fr  Wed Dec 13 17:45:25 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 13 Dec 2000 18:45:25 +0100 (CET)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012111908.MAA05005@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr>

On Mon, 11 Dec 2000 uche.ogbuji@fourthought.com wrote:

> To Ft/Dom/__init__.py and expected everything to break, but all was well.  It 
> seems that at least Python 2.0 is clever when the same import can be made as a 
> package and an object.  Is this also the case with Python 1.5.2?

I tried that with Python 1.5.2 (adding an empty Node class to
xml/dom/__init__.py) and it looks like it's fine too. 

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From fdrake@acm.org  Wed Dec 13 17:44:36 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 13 Dec 2000 12:44:36 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr>
References: <200012111908.MAA05005@localhost.localdomain>
 <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr>
Message-ID: <14903.46468.622296.363688@cj42289-a.reston1.va.home.com>

Alexandre Fayolle writes:
 > I tried that with python 1.5.2 (adding an empty Node class to
 > xml/dom/__init__.py) and it looks like it's fine too. 

  Great!  Now I won't have to worry about needing to back out my
changes.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Wed Dec 13 22:59:31 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 13 Dec 2000 15:59:31 -0700
Subject: [XML-SIG] Mixed encodings and XML
Message-ID: <3A37FF53.206662F3@fourthought.com>

[crossposted: 4Suite, xml-sig, i18n-sig]

Time for me to expose my ignorance on XML and i18n again.

How would one go about creating a well-formed XML document with multiple
encodings?  For instance, if I had UCS-2, UTF-8 and BIG5 all in one doc,
how could I make it work?  Take the following example:

ftp://ftp.fourthought.com/pub/etc/HOWTO/cjkv.doc

This document is a CJKV HOWTO by Chen Chien-Hsun.  He originally wrote
it in HTML.  See

ftp://ftp.fourthought.com/pub/etc/HOWTO/CJKV_4XSLT.HTM

It contains many sections within HTML PREs with the different encodings
I mentioned.  They look like

<PRE LANG="zh-TW">
... BIG5-encoded stuff ...
</PRE>

I need to convert the document to XML Docbook format.  My naive attempts
at converting to 

<screen xml:lang="zh-TW">
... BIG5-encoded stuff ...
</screen>

Of course don't work because the parser takes one look at the BIG5 and
throws a well-formedness error.

Is there any way to manage this besides using XInclude?  Do any of the
Python parsers have any tricks that could help?

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tree@basistech.com  Wed Dec 13 23:09:47 2000
From: tree@basistech.com (Tom Emerson)
Date: Wed, 13 Dec 2000 18:09:47 -0500
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: <3A37FF53.206662F3@fourthought.com>
References: <3A37FF53.206662F3@fourthought.com>
Message-ID: <14904.443.228020.168633@cymru.basistech.com>

Uche Ogbuji writes:
> It contains many sections within HTML PREs with the different encodings
> I mentioned.  They look like
> 
> <PRE LANG="zh-TW">
> ... BIG5-encoded stuff ...
> </PRE>

The LANG attribute does not specify an encoding, it specifies a
language. You cannot safely imply anything about the encoding based on
the value of the LANG attribute. For example, "zh-TW" text could be
encoded in Big 5, Big 5+, GBK, CP950, CP936, EUC-CN (depending on the
text), ISO-2022-CN, ISO-2022-CN-EXT, and others.

The LANG attribute can, however, be used by the application to help
generate the appropriate glyph variants, though I don't know of any
offhand that do this.

> I need to convert the document to XML Docbook format.  My naive attempts
> at converting to 
> 
> <screen xml:lang="zh-TW">
> ... BIG5-encoded stuff ...
> </screen>
>
> Of course don't work because the parser takes one look at the BIG5 and
> throws a well-formedness error.

Which it is required to do, see Section 4.3.3 of the XML specification.
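
A quick way to see that rejection, sketched with the expat-based minidom
from a current Python 3 standard library. The two-character sample text is
made up; with no encoding declaration the parser assumes UTF-8, and the
raw Big5 bytes are not valid UTF-8.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two Chinese characters (U+4E2D U+6587) encoded as Big5.
big5_bytes = "\u4e2d\u6587".encode("big5")

# Raw Big5 dropped into a document the parser will read as UTF-8.
mixed_doc = b'<screen xml:lang="zh-TW">' + big5_bytes + b"</screen>"

try:
    minidom.parseString(mixed_doc)
    accepted = True
except ExpatError:
    # expat reports the stray bytes as a well-formedness error
    accepted = False
```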

> Is there any way to manage this besides using XInclude?  Do any of the
> Python parsers have any tricks that could help?

Convert all of those sections into Unicode, using UTF-8 as the
encoding form. You could write a trivial Python script to do this for
you.
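
Such a script might look roughly like this. It is a sketch that assumes
the sections really are Big5 (which, as noted above, the LANG attribute
does not guarantee) and uses the codecs in a current Python 3 standard
library. The big5_to_utf8 helper and the sample text are invented for the
illustration.

```python
from xml.dom import minidom


def big5_to_utf8(raw):
    """Re-encode a chunk of Big5 bytes as UTF-8 bytes."""
    return raw.decode("big5").encode("utf-8")


# Stand-in for one BIG5-encoded section taken from the HTML source.
big5_section = "\u4e2d\u6587".encode("big5")

# After transcoding, the section can sit in an ordinary UTF-8 document.
doc = (b'<screen xml:lang="zh-TW">'
       + big5_to_utf8(big5_section)
       + b"</screen>")
text = minidom.parseString(doc).documentElement.firstChild.data
```

The characters survive the round trip, but the original byte sequences
do not, so this does not help when the point is to show the bytes of a
particular encoding.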

The bigger problem (IMHO) will be convincing your DocBook tool chain
to handle the Asian characters. If you find a good solution to that
(i.e., allowing Simplified and Traditional Chinese, Korean, and (say)
Thai in a single document) let me know.

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


From mal@lemburg.com  Wed Dec 13 23:22:50 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 14 Dec 2000 00:22:50 +0100
Subject: [XML-SIG] Mixed encodings and XML
References: <3A37FF53.206662F3@fourthought.com> <14904.443.228020.168633@cymru.basistech.com>
Message-ID: <3A3804CA.5DC4B238@lemburg.com>

Tom Emerson wrote:
> 
> > I need to convert the document to XML Docbook format.  My naive attempts
> > at converting to
> >
> > <screen xml:lang="zh-TW">
> > ... BIG5-encoded stuff ...
> > </screen>
> >
> > Of course don't work because the parser takes one look at the BIG5 and
> > throws a well-formedness error.
> 
> Which it is required to do, see Section 4.3.3 of the XML specification.

This is not really related to text encodings, but somewhat similar:

Is there a standard way of including binary data in XML files ?

I would like to put a complete web-site into a (large) XML file.
The XML file should ideally contain not only the structure 
information, attributes, etc. but also the HTML files, the images
and maybe even sound files or flash apps.

Is something like this possible or will I have to use some
other storage method for the binary parts and reference these
from within the XML file (I would prefer not to, so that I can
include e.g. the HTML file content in XML searches) ?

Thanks,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/


From uche.ogbuji@fourthought.com  Thu Dec 14 00:14:40 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 13 Dec 2000 17:14:40 -0700
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: Message from Tom Emerson <tree@basistech.com>
 of "Wed, 13 Dec 2000 18:09:47 EST." <14904.443.228020.168633@cymru.basistech.com>
Message-ID: <200012140014.RAA15620@localhost.localdomain>

> Uche Ogbuji writes:
> > It contains many sections within HTML PREs with the different encodings
> > I mentioned.  They look like
> > 
> > <PRE LANG="zh-TW">
> > ... BIG5-encoded stuff ...
> > </PRE>
> 
> The LANG attribute does not specify an encoding, it specifies a
> language. You cannot safely imply anything about the encoding based on
> the value of the LANG attribute. For example, "zh-TW" text could be
> encoded in Big 5, Big 5+, GBK, CP950, CP936, EUC-CN (depending on the
> text), ISO-2022-CN, ISO-2022-CN-EXT, and others.
> 
> The LANG attribute can be used by the application to help generate the
> appropriate glyph variants, however, though I don't know of any off
> hand that do this.

Makes sense, but I wasn't clear on this.

> > I need to convert the document to XML Docbook format.  My naive attempts
> > at converting to 
> > 
> > <screen xml:lang="zh-TW">
> > ... BIG5-encoded stuff ...
> > </screen>
> >
> > Of course don't work because the parser takes one look at the BIG5 and
> > throws a well-formedness error.
> 
> Which it is required to do, see Section 4.3.3 of the XML specification.

I'm quite aware of this (I read the XML spec more often than I'd like to).  
That's why I said "of course".

> > Is there any way to manage this besides using XInclude?  Do any of the
> > Python parsers have any tricks that could help?
> 
> Convert all of those sections into Unicode, using UTF-8 as the
> encoding form. You could write a trivial Python script to do this for
> you.

Not what I need, unfortunately.  The whole point of the exercise is to have 
examples in the actual encodings.

> The bigger problem (IMHO) will be convincing your DocBook tool chain
> to handle the Asian characters. If you find a good solution to that
> (i.e., allowing Simplified and Traditional Chinese, Korean, and (say)
> Thai in a single document) let me know.

Hmm?  My docbook tool is simply 4XSLT, which handles the individual encodings 
just fine now.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Thu Dec 14 00:18:49 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 13 Dec 2000 17:18:49 -0700
Subject: [XML-SIG] Mixed encodings and XML
In-Reply-To: Message from "M.-A. Lemburg" <mal@lemburg.com>
 of "Thu, 14 Dec 2000 00:22:50 +0100." <3A3804CA.5DC4B238@lemburg.com>
Message-ID: <200012140018.RAA15661@localhost.localdomain>

> This is not really related to text encodings, but somewhat similar:
> 
> Is there a standard way of including binary data in XML files ?

No.

> I would like to put a complete web-site into a (large) XML file.
> The XML file should ideally contain not only the structure 
> information, attributes, etc. but also the HTML files, the images
> and maybe even sound files or flash apps.

Ah.  This is similar to what the ebXML folks and the SOAP folks were at odds 
over.  Note, this is a well-known deficiency in XML.  The most common 
suggestion is: put it all into one file, separate them with form-feeds, and 
have the application process each bit separately.  Clearly this doesn't suit 
your needs, but there's not much more to go on right now.

> Is something like this possible or will I have to use some
> other storage method for the binary parts and reference these
> from within the XML file (I would prefer not to, so that I can
> include e.g. the HTML file content in XML searches) ?

Could you expand on this last bit about the searches?  It hints at what might 
be a work-around if that's your main concern.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tree@basistech.com  Thu Dec 14 01:05:43 2000
From: tree@basistech.com (Tom Emerson)
Date: Wed, 13 Dec 2000 20:05:43 -0500
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: <200012140014.RAA15620@localhost.localdomain>
References: <tree@basistech.com>
 <14904.443.228020.168633@cymru.basistech.com>
 <200012140014.RAA15620@localhost.localdomain>
Message-ID: <14904.7399.328781.898962@cymru.basistech.com>

uche.ogbuji@fourthought.com writes:
> > Convert all of those sections into Unicode, using UTF-8 as the
> > encoding form. You could write a trivial Python script to do this for
> > you.
> 
> Not what I need, unfortunately.  The whole point of the exercise is
> to have examples in the actual encodings.

And the point of that is what? They will display (most probably) as
gibberish within the browser... or is that the point?

> Hmm?  My docbook tool is simply 4XSLT, which handles the individual encodings 
> just fine now.

Sure, but if you want to generate a LaTeX (and from there PDF or PS)
version you're screwed, AFAIK. If you are just generating HTML then
you're OK.

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


From uche.ogbuji@fourthought.com  Thu Dec 14 01:17:51 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 13 Dec 2000 18:17:51 -0700
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: Message from Tom Emerson <tree@basistech.com>
 of "Wed, 13 Dec 2000 20:05:43 EST." <14904.7399.328781.898962@cymru.basistech.com>
Message-ID: <200012140117.SAA15823@localhost.localdomain>

> uche.ogbuji@fourthought.com writes:
> > > Convert all of those sections into Unicode, using UTF-8 as the
> > > encoding form. You could write a trivial Python script to do this for
> > > you.
> > 
> > Not what I need, unfortunately.  The whole point of the exercise is
> > to have examples in the actual encodings.
> 
> And the point of that is what? They will display (most probably) as
> gibberish within the browser... or is that the point?

Good question.  I have not tried Chen Chien-Hsun's original HTML.  Perhaps 
even that won't work in a browser.  Makes sense.  What does a browser do with 
a document with

<META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'>
                                                            ^^^^^^^^^^
                                                            !!!!???!!!!

In the header and then runs into a big patch of UCS-2 or BIG5?

My guess is that it displays gibberish as you suggest.  In this case, I think 
there's no point expecting HTML generated from XML to do any better and it 
simply makes sense to break out the alternatively encoded portions into 
separate, linked files.

Chen, does this make sense?

> > Hmm?  My docbook tool is simply 4XSLT, which handles the individual encodings 
> > just fine now.
> 
> Sure, but if you want to generate a LaTeX (and from there PDF or PS)
> version you're screwed, AFAIK. If you are just generating HTML then
> you're OK.

Yeah.  That's all for now.

Thanks much.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tree@basistech.com  Thu Dec 14 01:22:19 2000
From: tree@basistech.com (Tom Emerson)
Date: Wed, 13 Dec 2000 20:22:19 -0500
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: <200012140117.SAA15823@localhost.localdomain>
References: <tree@basistech.com>
 <14904.7399.328781.898962@cymru.basistech.com>
 <200012140117.SAA15823@localhost.localdomain>
Message-ID: <14904.8395.286379.623954@cymru.basistech.com>

uche.ogbuji@fourthought.com writes:
> Good question.  I have not tried Chen Chien-Hsun's original HTML.
> Perhaps even that won't work in a browser.  Makes sense.  What does
> a browser do with a document with
> 
> <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'>
>                                                             ^^^^^^^^^^
>                                                             !!!!???!!!!
> 
> In the header and then runs into a big patch of UCS-2 or BIG5?

It treats those bytes as 8-bit Latin 1 characters and it displays
them. Once you've seen enough of these you start recognizing the
patterns, but it is still junk.
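
A small sketch of that effect, assuming Python 3 and a two-character
sample string chosen only for illustration (it is not taken from the
document under discussion). The Big5 bytes are decoded as Latin-1, one
character per byte:

```python
# Assumption: Python 3; the sample text is illustrative only.
sample = "\u4e2d\u6587"              # two Chinese characters
big5_bytes = sample.encode("big5")   # Big5 uses two bytes for each of them

# A browser told "charset=iso-8859-1" maps every byte to one Latin-1 character.
as_latin1 = big5_bytes.decode("latin-1")

print(len(sample), len(as_latin1))   # character counts before and after
print(ascii(as_latin1))              # the junk, shown as escapes

# The bytes themselves survive, so the text can be recovered
# by anyone who knows the real encoding.
recovered = as_latin1.encode("latin-1").decode("big5")
```

The last step only works because Latin-1 assigns a character to every
byte value; with other misdeclared charsets the round trip may lose data.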

> My guess is that it displays gibberish as you suggest.  In this case, I think 
> there's no point expecting HTML generated from XML to do any better and it 
> simply makes sense to break out the alternatively encoded portions into 
> separate, linked files.

No. What makes sense, if the intention of the original author is to
show the Chinese text correctly, is to convert that section to UTF-8
and put that in the document.
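
The conversion itself is short. A minimal sketch, assuming Python 3 and
that the source encoding of each section is already known (the function
name and the sample bytes are made up for the example):

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    # The default error mode is "strict": bad input raises UnicodeDecodeError
    # instead of silently producing wrong characters.
    return raw.decode(source_encoding).encode("utf-8")

# Stand-in for the contents of one BIG5-encoded section.
big5_section = "\u4e2d\u6587".encode("big5")
utf8_section = to_utf8(big5_section, "big5")
```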

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


From uche.ogbuji@fourthought.com  Thu Dec 14 02:45:46 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 13 Dec 2000 19:45:46 -0700
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: Message from Tom Emerson <tree@basistech.com>
 of "Wed, 13 Dec 2000 20:22:19 EST." <14904.8395.286379.623954@cymru.basistech.com>
Message-ID: <200012140245.TAA16426@localhost.localdomain>

> > My guess is that it displays gibberish as you suggest.  In this case, I think 
> > there's no point expecting HTML generated from XML to do any better and it 
> > simply makes sense to break out the alternatively encoded portions into 
> > separate, linked files.
> 
> No. What makes sense, if the intention of the original author is to
> show the Chinese text correctly, is to convert that section to UTF-8
> and put that in the document.

Eccovi!  Now I understand why we've been talking past each other.  I assumed 
you'd read the text in question: bad assumption, I admit.

No.  The intention is not to display Chinese characters correctly.  The 
intention, I'm pretty sure, is to provide examples that can be cut and pasted 
in order for people to play with the various snippets themselves.  As such, 
I'm not really concerned about what the HTML rendering looks like when it hits 
the different encodings.  What I was originally writing about was:

1.  Is there any way to convince an XML parser to work with source with mixed 
encoding?  The exchange with you has helped disabuse me of any silly notion 
that this might be so.  So I shall have to use XInclude.

2.  Will the results of the rendering be such that the LATIN-1 parts can be 
read normally and the portions with other encodings would be available for cut 
and paste?  If I use XInclude, no reason why not.

So thanks for all the help.  I think I was pretty much on a fool's errand from 
the start, but at least I know how to proceed.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 14 03:05:01 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 14 Dec 2000 04:05:01 +0100
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: <3A37FF53.206662F3@fourthought.com> (message from Uche Ogbuji on
 Wed, 13 Dec 2000 15:59:31 -0700)
References: <3A37FF53.206662F3@fourthought.com>
Message-ID: <200012140305.EAA00999@loewis.home.cs.tu-berlin.de>

> How would one go about creating a well-formed XML document with multiple
> encodings?

As others have pointed out: You don't. XML documents are in
Unicode. They may have some other encoding *for transfer*, but
conceptually, they are still in Unicode.

> It contains many sections within HTML PREs with the different encodings
> I mentioned.  They look like
> 
> <PRE LANG="zh-TW">
> ... BIG5-encoded stuff ...
> </PRE>

So what you really want is to include binary data in a tag. As you've
explained yourself when answering to Marc-Andre: That is not supported
in XML. Of course, if XML had a BDATA type (or section) you could
include a binary data fragment, and then any presentation tool would
have to provide visualization (such as opening a hex editor on
double-click).

In the specific case of cjkv.doc, I guess the best approach would be:
- use Python string escapes in Python code, e.g.
  sjisStr = "\x88\xc0\x91\x53\x82\xc9\x8e\x67\x82\xa6\x82\xe9"
  # Shift-JIS encoded source string
- use Unicode text data where output is intended to be displayed properly
- don't cite the output if it will come out as gibberish on any terminal
  (e.g. when printing both SJIS and UTF-8 on the same terminal). Instead,
  explain what the user will likely see.
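
A sketch of the first two points, assuming Python 3 (where raw encoded
data is a bytes literal) and using the byte values of the example above
written as \xNN escapes. The variable names are mine:

```python
# Assumption: Python 3, so the encoded data is a bytes literal.
sjis_bytes = b"\x88\xc0\x91\x53\x82\xc9\x8e\x67\x82\xa6\x82\xe9"

text = sjis_bytes.decode("shift_jis")   # bytes -> Unicode text
utf8_bytes = text.encode("utf-8")       # suitable for a UTF-8 XML document

print(len(sjis_bytes), "bytes ->", len(text), "characters")
print(ascii(text))                      # ASCII-only view, safe on any terminal
```

The source file stays pure ASCII, so it can be quoted in an XML document
of any encoding without upsetting the parser.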

Regards,
Martin


From tpassin@home.com  Thu Dec 14 04:00:00 2000
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 13 Dec 2000 23:00:00 -0500
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
References: <3A37FF53.206662F3@fourthought.com> <200012140305.EAA00999@loewis.home.cs.tu-berlin.de>
Message-ID: <00c001c06582$54fa0840$7cac1218@reston1.va.home.com>

Martin v. Loewis chimed in -

> So what you really want is to include binary data in a tag. As you've
> explained yourself when answering to Marc-Andre: That is not supported
> in XML. Of course, if XML had a BDATA type (or section) you could
> include a binary data fragment, and then any presentation tool would
> have to provide visualization (such as opening a hex editor on
> double-click).
>
> In the specific case of cjkv.doc, I guess the best approach would be:
> - use Python string escapes in Python code, e.g.
>   sjisStr = "\x88\xc0\x91\x53\x82\xc9\x8e\x67\x82\xa6\x82\xe9"
>   # Shift-JIS encoded source string
> - use Unicode text data where output is intended to be displayed properly
> - don't cite the output if it will come out as gibberish on any terminal
>   (e.g. when printing both SJIS and UTF-8 on the same terminal). Instead,
>   explain what the user will likely see.
>
 How about a good old-fashioned PI?  The PI could indicate when to switch to
another encoding for the purposes of display or conversion.  True, this takes
a specialized processor, but you are asking for specialized processing anyway.
This kind of instruction to a processor is just what a PI is supposed to be
for, I always thought.
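
A rough sketch of what such a processor could look like, assuming
Python 3 and the standard xml.sax module. The PI target name
"output-encoding" is invented for this example, and the document itself
is still in a single encoding; the PI only says how to re-encode the
text that follows it:

```python
import xml.sax
from xml.sax.handler import ContentHandler

class EncodingSwitchHandler(ContentHandler):
    """Collect text, re-encoding it as directed by a (hypothetical) PI."""

    PI_TARGET = "output-encoding"   # made-up target name for this sketch

    def __init__(self, default="utf-8"):
        super().__init__()
        self.encoding = default
        self.segments = []          # (encoding name, encoded bytes) pairs

    def processingInstruction(self, target, data):
        # Switch the output encoding for all text that follows.
        if target == self.PI_TARGET:
            self.encoding = data.strip()

    def characters(self, content):
        self.segments.append((self.encoding, content.encode(self.encoding)))

doc = ('<doc>plain text '
       '<?output-encoding big5?>\u4e2d\u6587'
       '</doc>').encode("utf-8")

handler = EncodingSwitchHandler()
xml.sax.parseString(doc, handler)
big5_part = b"".join(b for enc, b in handler.segments if enc == "big5")
```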

Cheers,

Tom P


From uche.ogbuji@fourthought.com  Thu Dec 14 04:14:47 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 13 Dec 2000 21:14:47 -0700
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: Message from "Thomas B. Passin" <tpassin@home.com>
 of "Wed, 13 Dec 2000 23:00:00 EST." <00c001c06582$54fa0840$7cac1218@reston1.va.home.com>
Message-ID: <200012140414.VAA16674@localhost.localdomain>

> Martin v. Loewis chimed in -
> 
> > So what you really want is to include binary data in a tag. As you've
> > explained yourself when answering to Marc-Andre: That is not supported
> > in XML. Of course, if XML had a BDATA type (or section) you could
> > include a binary data fragment, and then any presentation tool would
> > have to provide visualization (such as opening a hex editor on
> > double-click).
> >
> > In the specific case of cjkv.doc, I guess the best approach would be:
> > - use Python string escapes in Python code, e.g.
> >   sjisStr = "\x88\xc0\x91\x53\x82\xc9\x8e\x67\x82\xa6\x82\xe9"
> >   # Shift-JIS encoded source string
> > - use Unicode text data where output is intended to be displayed properly
> > - don't cite the output if it will come out as gibberish on any terminal
> >   (e.g. when printing both SJIS and UTF-8 on the same terminal). Instead,
> >   explain what the user will likely see.
> >
>  How about a good old-fashioned PI?  The PI could indicate when to switch to
> another encoding for the purposes of display or conversion.  True, this takes
> a specialized processor, but you are asking for specialized processing anyway.
> This kind of instruction to a processor is just what a PI is supposed to be
> for, I always thought.

Very interesting thought.  However, my intention is to try to handle the CJKV 
doc with a minimum of highly specialized processing.  So now that I've come to 
my senses, I think I'll stick to my conclusion.  Besides, it will give me a 
chance to consider XInclude support throughout 4Suite.

Thanks to all for your patience even when I wasn't making much sense.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From kajiyama@grad.sccs.chukyo-u.ac.jp  Thu Dec 14 04:31:20 2000
From: kajiyama@grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA)
Date: Thu, 14 Dec 2000 13:31:20 +0900
Subject: [XML-SIG] Re: Mixed encodings and XML
In-Reply-To: <200012140245.TAA16426@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012140120.KAA14252@dhcp198.grad.sccs.chukyo-u.ac.jp>
Message-ID: <200012140431.NAA14495@dhcp198.grad.sccs.chukyo-u.ac.jp>

uche.ogbuji@fourthought.com wrote:
| 
| The intention, I'm pretty sure, is to provide examples that
| can be cut and pasted in order for people to play with the
| various snippets themselves.

I don't think that mixing different encodings in a document is a
good idea.  A browser assumes an encoding when reading a sequence
of characters from a stream.  If the browser finds one or more
bytes out of the expected range, the result of decoding is
undefined in general.  So, cut-and-paste may or may not pass
correct character data to the user.

Safer ways for giving examples in various encodings are:
- to use Unicode for displaying code snippets in the document
  the end users see on their browsers, and
- to use native encodings in separate files to provide the real
  code snippets.
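
The second point might be sketched as follows, assuming Python 3. The
file names and sample strings are placeholders, not taken from the
actual document:

```python
import os
import tempfile

# Hypothetical samples: (file name, encoding, text).
SAMPLES = [
    ("sample-big5.txt", "big5", "\u4e2d\u6587"),
    ("sample-sjis.txt", "shift_jis", "\u65e5\u672c\u8a9e"),
]

def write_native_samples(directory):
    """Write each sample in its native encoding; return the paths written."""
    paths = []
    for name, encoding, text in SAMPLES:
        path = os.path.join(directory, name)
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
written = write_native_samples(out_dir)
```

The document shown in the browser would then carry the same text in
Unicode and link to these files for readers who want the real bytes.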

Authoring an XML source of the document is another story.

Regards,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>


From fdrake@acm.org  Thu Dec 14 04:54:58 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 13 Dec 2000 23:54:58 -0500 (EST)
Subject: [XML-SIG] Pending xml.dom patches for Python 2.1
Message-ID: <14904.21154.374339.173523@cj42289-a.reston1.va.home.com>

  There are currently patches pending for xml.dom in the SourceForge
patch manager for Python:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102477&group_id=5470

    This extends the minidom and pulldom modules to support more of
    the DOM and fix a range of smallish bugs.  It is an improvement,
    but should not be considered "complete"; see the notes I added to
    the patch with today's update for a TODO list.

    Assigned to Martin von Löwis for review.

http://sourceforge.net/patch/?func=detailpatch&patch_id=102485&group_id=5470

    Andrew's patch to check the validity of node insertions by their
    nodeType.  This needs to be updated to use the exceptions recently
    added to the xml.dom package (in __init__.py), but otherwise
    should be easy to integrate with the changes from the first patch.

    Marked out-of-date since it needs an update and integration with
    the first patch.

http://sourceforge.net/patch/?func=detailpatch&patch_id=102492&group_id=5470

    This patch will probably need to be substantially revised once the
    changes noted in the TODO list in the comments on the first patch
    have been made, but should work reasonably once those changes have
    been made.

    Marked postponed since the other patches and noted changes need to
    be resolved first, since they heavily impact the implementation of
    this functionality.  Assigned back to Andrew to update and re-open
    once the other changes have been handled and checked in.

Getting these patches finished and checked in should allow both open
bugs against the XML support in Python CVS to be closed:

http://sourceforge.net/bugs/?func=detailbug&bug_id=116677&group_id=5470
http://sourceforge.net/bugs/?func=detailbug&bug_id=116678&group_id=5470


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From larsga@garshol.priv.no  Thu Dec 14 10:03:11 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 14 Dec 2000 11:03:11 +0100
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: <200012140245.TAA16426@localhost.localdomain>
References: <200012140245.TAA16426@localhost.localdomain>
Message-ID: <m3zohzwc9s.fsf@lambda.garshol.priv.no>

* uche ogbuji
| 
| 1.  Is there any way to convince an XML parser to work with source
| with mixed encoding.

A single XML entity must be entirely in a single character encoding.
A document, however, can be in any number of different encodings,
provided each entity is internally consistent.  You can have encoding
declarations on both the document entity (in the form of the XML
declaration) and on subordinate entities (using text declarations).

So you can do what you want using entities.
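
A minimal sketch of this, assuming Python 3 and its expat binding. The
document entity is in ISO-8859-1 and the external entity carries its own
text declaration for UTF-8. The entity name, the system identifier
"sample.ent" and the in-memory dictionary standing in for the file system
are all invented for the example. Whether a given parser accepts an
encoding such as Big5 in the text declaration depends on the parser, so
UTF-8 is used here:

```python
from xml.parsers import expat

# Document entity: declared and encoded as ISO-8859-1 (note the raw 0xE9 byte).
DOCUMENT = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
            b'<!DOCTYPE doc [<!ENTITY sample SYSTEM "sample.ent">]>\n'
            b'<doc>caf\xe9: &sample;</doc>')

# External entity: starts with its own text declaration, encoded as UTF-8.
# "sample.ent" is a made-up system identifier used only as a dictionary key.
ENTITIES = {
    "sample.ent": b'<?xml version="1.0" encoding="UTF-8"?>'
                  + "\u4e2d\u6587".encode("utf-8"),
}

def parse_text(document, entities):
    """Return all character data, following external entity references."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def external_entity_ref(context, base, system_id, public_id):
        # The sub-parser inherits the handlers set on the parent parser.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(entities[system_id], True)
        return 1

    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(document, True)
    return "".join(chunks)

text = parse_text(DOCUMENT, ENTITIES)
```

Both pieces arrive at the application as Unicode text; the original
encodings matter only while each entity is being read.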
 
--Lars M.


From frank63@ms5.hinet.net  Thu Dec 14 19:00:31 2000
From: frank63@ms5.hinet.net (Frank Chen)
Date: Thu, 14 Dec 2000 19:00:31 -0000
Subject: [XML-SIG] Re:Mixed encodings and XML
Message-ID: <200012141105.TAA16020@ms5.hinet.net>

Hi:

When I wrote this document, I made an assumption: if someone cannot see
BIG5 or Shift_JIS text directly, he knows he can view each of them with a
CJK viewer such as NJStar.

Frank Chen


From mal@lemburg.com  Thu Dec 14 11:10:08 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 14 Dec 2000 12:10:08 +0100
Subject: [XML-SIG] Mixed encodings and XML
References: <200012140018.RAA15661@localhost.localdomain>
Message-ID: <3A38AA90.139D7FDB@lemburg.com>

uche.ogbuji@fourthought.com wrote:
> 
> > This is not really related to text encodings, but somewhat similar:
> >
> > Is there a standard way of including binary data in XML files ?
> 
> No.

Rich Salz pointed out in private mail that I could use base64 
as encoding (can '<' and '>' appear in base64 ?). Alas, I would
lose the search capability...

> > I would like to put a complete web-site into a (large) XML file.
> > The XML file should ideally contain not only the structure
> > information, attributes, etc. but also the HTML files, the images
> > and maybe even sound files or flash apps.
> 
> Ah.  This is similar to what the ebXML folks and the SOAP folks were at odds
> over.  Note, this is a well-known deficiency in XML.  The most common
> suggestion is: put it all into one file, separate them with form-feeds, and
> have the application process each bit separately.  Clearly this doesn't suit
> your needs, but there's not much more to go on right now.

Now that's about as non-XML like as it could get: form-feeds
to separate file parts... ;-)
 
> > Is something like this possible or will I have to use some
> > other storage method for the binary parts and reference these
> > from within the XML file (I would prefer not to, so that I can
> > include e.g. the HTML file content in XML searches) ?
> 
> Could you expand on this last bit about the searches?  It hints at what might
> be a work-around if that's your main concern.

I would like to be able to use XML searching machinery to scan
over web site structures. This includes limiting searches to
certain attributes, e.g. keywords or meta-descriptions of the content,
but should also cover full-text search of the content itself.

Even better would be a possible recursive application of this
scheme to embedded XML files, e.g. take a product catalog which
is stored as XML and made available on the site using special
site tools which only show the relevant parts of that file.

I think I would have to provide a special tag

	<content encoding="base64|hex|plain|..." mimetype="...">
	...
	</content>

to enable this.

Thanks,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 14 11:09:43 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 14 Dec 2000 12:09:43 +0100
Subject: [XML-SIG] PyXPath 1.1
Message-ID: <200012141109.MAA00827@loewis.home.cs.tu-berlin.de>

As promised earlier, I tried to use another alternative parser toolkit
for parsing XPath LocationPath expressions. Since Uche proposed to use
Spark, that's what I did.

As a result, I can now announce PyXPath 1.1, which is available from

http://www.informatik.hu-berlin.de/~loewis/xml/PyXPath-1.1.tgz

The major change over the previous version is the xpathspark module,
which requires spark.py from Spark 0.6.1 (I decided not to include
spark; let me know if you think I should).

As with the YAPPS parser, the Spark parser generates an ad-hoc syntax
tree, namely a nested list consisting of the rhs tuple for each
production that was applied. Only in trivial cases, I modified the
list, such as unwrapping a list of a single item.

With the three parsers which I now have (the YAPPS parser, 4XPath, and
the Spark parser), I performed some measurements. I took the list of
the LocationPath examples from the recommendation and asked each
parser to parse each expression 10 and 100 times, respectively. On an
AMD K6 at 350 MHz, using Linux 2.4t7, glibc 2.2, and Python 2.0, I got
the following results:

10 iterations:
4XPath                 1.58s
YAPPS                  1.43s
YAPPS with pre         2.31s
Spark                 12.58s

100 iterations:
4XPath                 5.16s
YAPPS                 12.35s
YAPPS with pre        22.54s
Spark                124.92s

In these numbers, "pre" is the PCRE regex module of 1.5.2, but still
executed in Python 2; the default is sre.
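
For anyone who wants to repeat this kind of measurement, here is a
sketch of a timing loop in the same spirit. It is not the harness used
for the numbers above. It assumes Python 3, and toy_parse is a
stand-in tokenizer invented for the example; replace it with the parse
function of the parser under test:

```python
import re
import time

def toy_parse(expr):
    """Hypothetical stand-in for a real XPath parser: just split into tokens."""
    return re.findall(r"[\w-]+|::|//|\.\.|[^\s\w]", expr)

def time_parser(parse, expressions, iterations):
    """Parse every expression `iterations` times; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        for expr in expressions:
            parse(expr)
    return time.perf_counter() - start

# A few LocationPath expressions in the style of the recommendation's examples.
EXPRESSIONS = ["child::para", "/descendant::para",
               "para[@type='warning']", "../@lang"]

for n in (10, 100):
    print("%3d iterations: %.4fs" % (n, time_parser(toy_parse, EXPRESSIONS, n)))
```

Any start-up cost (imports, table generation) sits outside this loop, so
it has to be measured separately if it matters for the comparison.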

From these numbers, I conclude:

- sre is significantly faster than pre, so Python 2.0 is better for
  processing regular expressions than 1.5.2. Even when parsing from a
  Unicode string, the parser does not get much slower (numbers not
  shown here).

- Spark is an order of magnitude slower than YAPPS. The Spark
  documentation suggests that the parsing algorithm used in Spark is
  quite general, but also quite slow. YAPPS uses recursive-descent
  LL(1) parsing, which seems to win easily.

- The pure Python solution takes twice as much time as the bison/flex
  solution. Note that for parsing a "small" number of expressions
  (300), the startup time of 4XPath outweighs the parsing time, so
  the YAPPS parser is actually faster here. That may change once the
  YAPPS parser generates the same structure as 4XPath. IMO, this
  overhead is a fair price to pay for the increased portability, the
  Unicode support and the thread-safety of the Python solution.

Unless somebody can suggest more parser generators to try (*), I'd now
proceed with making the YAPPS parser 4XPath compatible.

Regards,
Martin

(*) Be aware that any alternative parser generator should support:
- tokenization of Unicode strings, either via an external lexer, or on
  its own using the re module
- support for LL(1) or LALR(1) grammars.
- ideally be pure Python, although an additional C module is acceptable
  as long as the resulting parser is still thread-safe.


From larsga@garshol.priv.no  Thu Dec 14 11:42:46 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 14 Dec 2000 12:42:46 +0100
Subject: [XML-SIG] Mixed encodings and XML
In-Reply-To: <3A38AA90.139D7FDB@lemburg.com>
References: <200012140018.RAA15661@localhost.localdomain> <3A38AA90.139D7FDB@lemburg.com>
Message-ID: <m3snnrw7nt.fsf@lambda.garshol.priv.no>

* mal@lemburg.com
| 
| Rich Salz pointed out in private mail that I could use base64 
| as encoding (can '<' and '>' appear in base64 ?).

base64 is indeed the common way to encode binary material inside XML
documents.  It uses only A-Za-z0-9+/= for encoding.
 
| I would like to be able to use XML searching machinery to scan over
| web site structures. This includes limiting searches to certain
| attributes, e.g. keywords or meta-descriptions of the content, but
| should also cover full-text search of the content itself.

In that case I would recommend keeping the non-XML content external to
the XML documents and only reference them from the XML content.
 
| I think I would have to provide a special tag
| 
| 	<content encoding="base64|hex|plain|..." mimetype="...">
| 	...
| 	</content>
| 
| to enable this.

That seems like a very reasonable solution.
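
A sketch of how such an element could be written and read back, assuming
Python 3 with the standard base64 and ElementTree modules. The element
and attribute names follow the proposal above; the helper names are
invented:

```python
import base64
import xml.etree.ElementTree as ET

def wrap_binary(data, mimetype):
    """Build a <content> element holding base64-encoded binary data."""
    elem = ET.Element("content", {"encoding": "base64", "mimetype": mimetype})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem

def unwrap_binary(elem):
    """Recover the original bytes from a base64 <content> element."""
    if elem.get("encoding") != "base64":
        raise ValueError("unsupported encoding: %r" % elem.get("encoding"))
    return base64.b64decode(elem.text)

payload = bytes(range(256))   # every byte value, including '<' and '&'
xml_text = ET.tostring(wrap_binary(payload, "application/octet-stream"),
                       encoding="unicode")
restored = unwrap_binary(ET.fromstring(xml_text))
```

The encoded text is opaque to XML search tools, which is the trade-off
already noted for base64.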

--Lars M.


From uche.ogbuji@fourthought.com  Thu Dec 14 15:21:44 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 14 Dec 2000 08:21:44 -0700
Subject: [XML-SIG] Re: [I18n-sig] Mixed encodings and XML
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "14 Dec 2000 11:03:11 +0100." <m3zohzwc9s.fsf@lambda.garshol.priv.no>
Message-ID: <200012141521.IAA18229@localhost.localdomain>

> 
> * uche ogbuji
> | 
> | 1.  Is there any way to convince an XML parser to work with source
> | with mixed encoding.
> 
> A single XML entity must be entirely in a single character encoding.
> A document, however, can be in any number of different encodings,
> provided each entity is internally consistent.  You can have encoding
> declarations on both the document entity (in the form of the XML
> declaration) and on subordinate entities (using text declarations).
> 
> So you can do what you want using entities.

Excellent!  Just when I'd convinced myself that I was on a fool's errand, 
comes Lars to the rescue.

I guess it's been too long since I've exercised all of XML 1.0.  I so rarely 
use entities that I completely forgot that they are exactly the solution.  I 
can use entities in special XML elements, and extend the docbook stylesheet to 
output the contents of those elements to a separate file using the 
"ft:write-file" extension element.

Perfect.  Thanks.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Thu Dec 14 15:27:31 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 14 Dec 2000 08:27:31 -0700
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Thu, 14 Dec 2000 12:09:43 +0100." <200012141109.MAA00827@loewis.home.cs.tu-berlin.de>
Message-ID: <200012141527.IAA18240@localhost.localdomain>

Wow Martin!  Brilliant work as usual.  Last weekend Jeremy quietly wrote a 
partial XPath lexer all in Python/SRE.  We'll try to bind it to bison and post 
this today so you can run your test harness on it.

I agree that we have little choice but to expect some slow-down.  flex/bison is 
certainly very fast, but it doesn't deal with wide chars and flex doesn't 
bother with thread-safety.  So speed at a dear price.

Maybe it's worth designing a plug-in API for XPath implementations so people 
can make their choices.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From paulp@ActiveState.com  Thu Dec 14 16:57:32 2000
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 14 Dec 2000 08:57:32 -0800
Subject: [XML-SIG] PyXPath 1.1
References: <200012141527.IAA18240@localhost.localdomain>
Message-ID: <3A38FBFC.6082824D@ActiveState.com>

uche.ogbuji@fourthought.com wrote:
> 
> Wow Martin!  Brilliant work as usual.  

Strongly agree.

> Maybe it's worth designing a plug-in API for XPath implementations so people
> can make their choices.

That's a good idea independent of this parsing issue. XPath
implementations will always have different performance characteristics,
especially if they take advantage of "secret handshakes" with certain
underlying DOMs.

What ever happened to this effort:

http://lists.w3.org/Archives/Public/www-dom-xpath/

 Paul Prescod


From fdrake@acm.org  Thu Dec 14 16:56:09 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 14 Dec 2000 11:56:09 -0500 (EST)
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: <3A38FBFC.6082824D@ActiveState.com>
References: <200012141527.IAA18240@localhost.localdomain>
 <3A38FBFC.6082824D@ActiveState.com>
Message-ID: <14904.64425.308375.787523@cj42289-a.reston1.va.home.com>

Paul Prescod writes:
 > What ever happened to this effort:
 > 
 > http://lists.w3.org/Archives/Public/www-dom-xpath/

  I wasn't even aware of this -- it looks like a little spam killed
the list in the end!
  Frankly, I imagine everyone's been too busy to work it up, and the
DOM still seems to be evolving quite rapidly.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mclay@nist.gov  Thu Dec 14 05:08:12 2000
From: mclay@nist.gov (Michael McLay)
Date: Thu, 14 Dec 2000 00:08:12 -0500
Subject: [XML-SIG] Mixed encodings and XML
In-Reply-To: <3A3804CA.5DC4B238@lemburg.com>
References: <3A37FF53.206662F3@fourthought.com> <14904.443.228020.168633@cymru.basistech.com> <3A3804CA.5DC4B238@lemburg.com>
Message-ID: <00121400081206.16898@fermi.eeel.nist.gov>

On Wednesday 13 December 2000 18:22, M.-A. Lemburg wrote:
> Tom Emerson wrote:
> > > I need to convert the document to XML Docbook format.  My naive
> > > attempts at converting to
> > >
> > > <screen xml:lang="zh-TW">
> > > ... BIG5-encoded stuff ...
> > > </screen>
> > >
> > > Of course don't work because the parser takes one look at the BIG5 and
> > > throws a well-formedness error.
> >
> > Which it is required to do, see Section 4.3.3 of the XML specification.
>
> This is not really related to text encodings, but somewhat similar:
>
> Is there a standard way of including binary data in XML files ?

There is a standard solution defined for binary encoding in XML Schema.
Search for the term binary in http://www.w3.org/TR/xmlschema-0/

The specification for binary encoding in XML Schema is at:
 
   http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/#binary
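
A minimal sketch of carrying binary data as base64 text, one of the
encodings XML Schema allows for binary content.  It assumes a current
Python 3; the element name "attachment" and both helper names are
hypothetical:

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload):
    """Serialize raw bytes as the base64 text content of an element."""
    elem = ET.Element("attachment", {"encoding": "base64"})
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(xml_text):
    """Recover the raw bytes from an element produced by wrap_binary()."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text)


payload = bytes(range(256))
round_trip = unwrap_binary(wrap_binary(payload))
```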

Is anyone working on an XML Schema validator that works with the standard 
Python XML library?  


From uche.ogbuji@fourthought.com  Fri Dec 15 04:11:24 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 14 Dec 2000 21:11:24 -0700
Subject: [XML-SIG] Re: [4suite] memory leak problem 4DOM - update
References: <3A37FD5F.B9FD3620@fourthought.com> <3A393F92.C689182D@fourthought.com> <00121511533903.00886@localhost.localdomain> <0012151603120H.00886@localhost.localdomain>
Message-ID: <3A3999EC.47ACB9FF@fourthought.com>

We really should share this discussion with XML-SIG.

matt wrote:
> 
> Some answers to my own questions ... but still a problem
> 
> On Fri, 15 Dec 2000, matt wrote:
> > Ok, have done some more experimentation ..... I stepped through everything with
> > pdb, let things settle over a few iterations and then discovered the recurring
> > process.  Some of it is indeed in Py_expat .... in
> > xml/sax/drivers/drv_pyexpat.py to be specific.  The offending lines being the
> > buf = fileobj.read(16384) ones(see function below) ... these chomp 4 kb each
> > time through.  Well, they are not really that offending, they're just loading a
> > buffer like they are supposed to be doing.
> >
> 
> I made the patch.  Looking more carefully the memory gulp comes from
> self.parser.Parse in the parseFile function .....  which confuses me, because I
> made the patch, rebuilt and reinstalled ... including to make sure that all was
> updated :
> i.e. :
> copying xml/dom/ext/Printer.py -> build/lib.linux-i686-1.5/xml/dom/ext  (I had
> found a patch for that too)
> 
> gcc -g -O2 -fpic -DXML_NS -Iextensions/expat/xmltok
> -Iextensions/expat/xmlparse -I/usr/local/include/python1.5 -c extensi
> 
> copying build/lib.linux-i686-1.5/xml/parsers/pyexpat.so ->
> /usr/local/lib/python1.5/site-packages/xml/parsers
> 
> the patch was :
> Index: pyexpat.c
> ===================================================================
> RCS file: /cvsroot/pyxml/xml/extensions/pyexpat.c,v
> retrieving revision 1.16
> diff -u -r1.16 pyexpat.c
> --- pyexpat.c   2000/11/02 04:57:40     1.16
> +++ pyexpat.c   2000/12/05 00:00:33
> @@ -680,6 +680,7 @@
>      for (i=0; handler_info[i].name != NULL; i++) {
>          Py_XDECREF(self->handlers[i]);
>      }
> +    free (self->handlers);
>  #if PY_MAJOR_VERSION == 1 && PY_MINOR_VERSION < 6
>      /* Code for versions before 1.6 */
>      free(self);
> 
> and it indeed did succeed.
> 
> I guess I keep looking.  Anyone find this patch did not help?
> 
> regards
> Matt
> 
> >
> >
> > def parseFile(self,fileobj,sysID=None):
> >         self.reset()
> >         self.sysID=sysID
> >         self.doc_handler.startDocument()
> >
> >         buf = fileobj.read(16384)
> >         while buf != "":
> >             if self.parser.Parse(buf, 0) != 1:
> >                 self.__report_error()
> >             buf = fileobj.read(16384)
> >         self.parser.Parse("", 1)
> >
> >         self.doc_handler.endDocument()
> >
> >
> > So the problem I see is the freeing of this buffer 'buf' : I can only guess a
> > few things :
> > 1) obviously it gets put into the py_expat parser document, which space for
> > that frame gets allocated on the first time through.  Perhaps the py_expat
> > document is not releasing this buffer properly when ext.ReleaseNode(d) calls
> > all the delete nodes.  I haven't looked for anything circular there.
> >
> > 2) the fileobj.read above is actually doing something weird.  The 4kb seems
> > weird considering it a) reads 16384 bytes, and my file is only 190 bytes, and b)
> > 16384 bytes = 16 kb and not 4 kb.
> > 4 kb seems to me the size of some sort of stack frame for a function that never
> > gets released to be used again????
> >
> > Either way, using ext.ReleaseNode(d) did help somewhat, so I would guess that
> > py_expat is to blame somewhere.  I will now go in search of the patch for
> > py_expat and see if this solves the problem overall.
> >
> > to be continued .....
> >
> > Matt
> >
> >
> >
> >
> >
> > On Fri, 15 Dec 2000, Uche Ogbuji wrote:
> > > matt wrote:
> > > >
> > > > Using ext.ReleaseNode(d) helped partially.  On the first iteration through the
> > > > first loop it chomps about 332kb, which I never get back in either case, i.e.
> > > > a) using ext.ReleaseNode(d) or b) not.  After that I get smaller bites, if
> > > > using a) they are 4-12 kb bites, or in b) 16-20 kb bites.  Both methods seem to
> > > > oscillate between two values.  So there was an improvement, i.e approx 8 kb
> > > > improvement with using ext.ReleaseNode(d).  That first jump in both methods is
> > > > a bit of a shock, especially because it never gets given back.  However I had
> > > > the feeling this first jump was just python memory allocation, and that it
> > > > might release it some time later.
> > >
> > > This is pretty common because of Python's dynamic nature.  The first
> > > time in the loop you are importing a whole bunch of modules, which are of
> > > course added to the memory footprint.  After that subsequent imports
> > > don't add to memory.  The little incremental jumps are probably indeed
> > > memory leaks, so any more info you have to help us track it down would
> > > be appreciated.
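
A minimal sketch of separating the one-time jump from steady growth,
assuming a current Python 3 with tracemalloc and using minidom as a
stand-in for 4DOM.  doc.unlink() plays the role that ext.ReleaseNode
plays above, and growth_per_iteration() is a hypothetical helper:

```python
import tracemalloc
import xml.dom.minidom


def growth_per_iteration(iterations=5):
    """Return traced-memory deltas between consecutive parse iterations."""
    tracemalloc.start()
    totals = []
    for _ in range(iterations):
        doc = xml.dom.minidom.parseString("<root><child/></root>")
        doc.unlink()  # break the parent/child cycles before dropping the tree
        current, _peak = tracemalloc.get_traced_memory()
        totals.append(current)
    tracemalloc.stop()
    # The first total includes one-time costs, so only differences are reported.
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


deltas = growth_per_iteration()
```

Deltas that stay positive over many iterations would point at a leak
rather than at start-up cost.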


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From dagb@fast.no  Fri Dec 15 10:44:15 2000
From: dagb@fast.no (Dag Brattli)
Date: Fri, 15 Dec 2000 10:44:15 GMT
Subject: [XML-SIG] PyXML, sgmlop and xmllib
Message-ID: <200012151044.KAA34209@tepid.osl.fast.no>

Hi,

The xmllib.py for sgmlop is missing from PyXML. Does anybody
know where to find an updated version? Both README.sgmlop
and xml/parsers/__init__.py say that there should be an xmllib.py 
around, but there isn't one.

-- Dag
----
Dag Brattli,                     Mail:  dagb@fast.no
Senior Systems Engineer          Web:   http://www.fastsearch.com/
Fast Search & Transfer ASA       Phone: +47 776 96 688
P.O. Box 621                     Fax:   +47 776 96 689
NO-9257 Tromsø, NORWAY           Cell:  +47 415 72 969 (new) 

Try FAST Mobile Search: http://mobile.alltheweb.com/


From noreply@sourceforge.net  Fri Dec 15 14:49:11 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Dec 2000 06:49:11 -0800
Subject: [XML-SIG] [Bug #125896] Ods proble with checkpoints
Message-ID: <E146wAZ-0003lt-00@usw-sf-web1.sourceforge.net>

Bug #125896, was updated on 2000-Dec-15 06:49
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: Ods proble with checkpoints

Details: It's got to do with transactions and checkpoints. I guess Narval is
stressing the Ods engine a bit. I've attached a sample test file, largely
inspired by some code from Narval init. It runs as is, but if you
uncomment line 32, everything falls apart (and you get a stacktrace
similar to the one reported before). 

-------------------------8<--------------------------------
from Ft.DbDom import Dom
from Ft.Ods import Database
from Ft.DbDom import Reader
from xml.dom.ext import PrettyPrint,StripXml,Print
from Ft.Ods import FreePersistentObject

from xml.xpath import Evaluate

AL_NS = ''

class MemoryDocument(Dom.DocumentImp) :
    def __init__(self) :
        Dom.DocumentImp.__init__(self)
        self.eid_count = 1
        self.eid_ref_count = {}
        
    def add_element(self,element) :
        """if the element has not already an eid (== not yet in memory)
        assign unique id to the element and append it to memory.
        """
        global tx
        eid = element.getAttributeNS(AL_NS,'eid')
        if not eid :
            eid = str(self.eid_count)
            element.setAttributeNS(AL_NS,'eid',eid)
            self.eid_count = self.eid_count + 1
            self.eid_ref_count[eid] = 1

            self.documentElement.appendChild(element)

            ### Uncomment following line to see the bug
            #tx.checkpoint()

            for node in self.documentElement.childNodes[:] :
                if node.tagName == 'plan' :
                    node.element_change(element)

        return eid, element


DBNAME='ods:alf@orion:5432:dom_test'

mydoc='''<root><child id="1"><info/></child>
<child id="2"><info>foo</info></child></root>'''

db = Database.Database()
db.open(DBNAME)
tx = db.new()
tx.begin()

doc = MemoryDocument()

e = doc.createElementNS('','elt')
doc.appendChild(e)
tx.checkpoint()
r = Reader.DbDomReader()

frag = r.fromString(mydoc,doc)

map(doc.add_element, Evaluate('root/child',frag))

tx.commit()

PrettyPrint(doc)


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125896&group_id=6473


From noreply@sourceforge.net  Fri Dec 15 14:51:11 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Dec 2000 06:51:11 -0800
Subject: [XML-SIG] [Bug #125897] PyExpat still uses Expat version 1.1
Message-ID: <E146wCV-0003mN-00@usw-sf-web1.sourceforge.net>

Bug #125897, was updated on 2000-Dec-15 06:51
Here is a current snapshot of the bug.

Project: Python/XML
Category: expat
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : nobody
Summary: PyExpat still uses Expat version 1.1

Details: PyExpat should be upgraded to Expat 1.2.

Expat 1.2 adds support for parsing external DTDs and parameter entities.

The xml.dom.ext.PyExpat reader (once unbroken ;)) already supports the additional interface for Expat 1.2 (XML_StartDoctypeDeclHandler -> Reader.startDTD).

This functionality is needed to parse out the public and system Ids of a <!DOCTYPE> declaration, for example.
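
A minimal sketch of reading the public and system IDs through the
doctype handler, assuming the expat binding shipped with a current
Python 3 rather than the PyXML reader discussed here.  The helper
read_doctype_ids() is hypothetical:

```python
import xml.parsers.expat as expat


def read_doctype_ids(data):
    """Return the name, public ID and system ID of the DOCTYPE declaration."""
    found = {}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        found.update(name=name, public_id=public_id, system_id=system_id)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(data, True)
    return found


SAMPLE = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    b"<html/>"
)
ids = read_doctype_ids(SAMPLE)
```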

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125897&group_id=6473


From Mike.Olson@fourthought.com  Fri Dec 15 14:48:28 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Fri, 15 Dec 2000 07:48:28 -0700
Subject: [XML-SIG] PyXPath 1.1
References: <200012141527.IAA18240@localhost.localdomain>
Message-ID: <3A3A2F3C.8AE8A27E@FourThought.com>

uche.ogbuji@fourthought.com wrote:
> 
> Wow Martin!  Brilliant work as usual.  Last weekend Jeremy quietly wrote a
> partial XPath lexer all in Python/SRE.  We'll try to bind it to bison and post
> this today so you can run your test harness on it.

Yes, thanks Martin this saves us a lot of time.  Question, will it
handle "mod mod mod" or "* * *"?  These need to translate to the token
wildcard name, operator, wildcard name.  I ask 'cause this caused us
many headaches with 4XPath.  We had to do it with flex state.

> 
> Maybe it's worth designing a plug-in API for XPath implementations so people
> can make their choices.

This wouldn't be that difficult.  A simple interface to get a list of
tokens (and the matched string) from the scanner would suffice.  We will
need some logic to turn this list into YY unions for Bison, but that is
pretty simple as well.

Mike

> 
> --
> Uche Ogbuji                               Principal Consultant
> uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
> Fourthought, Inc.                         http://Fourthought.com
> 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
> Software-engineering, knowledge-management, XML, CORBA, Linux, Python
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Nicolas.Chauvat@logilab.fr  Fri Dec 15 15:08:07 2000
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Fri, 15 Dec 2000 16:08:07 +0100 (CET)
Subject: [XML-SIG] 4DOM.xml.xslt.DomWriter.py bugfix
Message-ID: <Pine.LNX.4.21.0012151544180.24303-100000@aries>

In 4Suite's 4DOM xml.xslt.DomWriter.py,

at line 76, read:

 pi = self.__ownerDoc.createProcessingInstruction(target,data)
           ^^

at line 81, read:

 comment = self.__ownerDoc.createDocument(text)
                ^^                        ^^^^

And you're back on track :-)

-- 
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)


From uche.ogbuji@fourthought.com  Fri Dec 15 15:23:04 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 15 Dec 2000 08:23:04 -0700
Subject: [XML-SIG] Re: [4suite] 4DOM.xml.xslt.DomWriter.py bugfix
References: <Pine.LNX.4.21.0012151544180.24303-100000@aries>
Message-ID: <3A3A3758.62591C7B@fourthought.com>

Nicolas Chauvat wrote:
> 
> In 4Suite's 4DOM xml.xslt.DomWriter.py,
> 
> at line 76, read:
> 
>  pi = self.__ownerDoc.createProcessingInstruction(target,data)
>            ^^
> 
> at line 81, read:
> 
>  comment = self.__ownerDoc.createDocument(text)
>                 ^^                        ^^^^
> 
> And you're back on track :-)

Done.  Thanks.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ngps@post1.com  Fri Dec 15 16:17:23 2000
From: ngps@post1.com (Ng Pheng Siong)
Date: Sat, 16 Dec 2000 00:17:23 +0800
Subject: [XML-SIG] Copyright character chokes parser
Message-ID: <20001216001723.A1163@madcap.dyndns.org>

Hi,

I'm fiddling with XBEL using PyXML 0.6.2.

I have a bookmark entry as follows:

    <bookmark href="http://www.optioninsight.com/" added="946429657" visited="946444587" modified="946429652" >
      <title>Option Insight© - Home of the Greatest Option Program. Ever.</title>
    </bookmark>


The copyright character (you might see it as <A9>) in the title chokes 
xbel_parse.py:

$ python xbel_parse.py --xbel < bm.xml 
Traceback (most recent call last):
  File "xbel_parse.py", line 91, in ?
    p.parseFile( sys.stdin )
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers/drv_pyexpat.py", line 68, in parseFile
    if self.parser.Parse(buf, 0) != 1:
xml.parsers.expat.error: not well-formed: line 68, column 27


A simple SAX-based parser written per the XML HOWTO throws an exception at 
the same spot:

$ python xbp.py < bm.xml
Traceback (most recent call last):
  File "xbp.py", line 19, in ?
    p.parse(sys.stdin)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 42, in parse
    xmlreader.IncrementalParser.parse(self, source)            
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 86, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <stdin>:68:27: not well-formed


Line 68 column 27 is where the copyright character is.

Any hints to a workaround? (I'm not subscribed. Please cc replies.)

TIA. Cheers.
-- 
Ng Pheng Siong <ngps@post1.com> * http://www.post1.com/home/ngps


From uche.ogbuji@fourthought.com  Fri Dec 15 16:49:40 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 15 Dec 2000 09:49:40 -0700
Subject: [XML-SIG] Copyright character chokes parser
In-Reply-To: Message from Ng Pheng Siong <ngps@post1.com>
 of "Sat, 16 Dec 2000 00:17:23 +0800." <20001216001723.A1163@madcap.dyndns.org>
Message-ID: <200012151649.JAA22126@localhost.localdomain>

> I'm fiddling with XBEL using PyXML 0.6.2.
> 
> I have a bookmark entry as follows:
> 
>     <bookmark href="http://www.optioninsight.com/" added="946429657" visited="946444587" modified="946429652" >
>       <title>Option Insight© - Home of the Greatest Option Program. Ever.</title>
>     </bookmark>

I just went through encoding hell of a more involved sort so I might as well 
chip in here.

Add 

<?xml version='1.0' encoding='ISO-8859-1'?>

As the first thing in your XML file (that is even before any white space) and 
you should be fine.  If you don't specify an encoding, the parser assumes UTF-8
(except if you use a byte-order mark in which case it assumes UTF-16).  The 
copyright char as stored in your file is not legal UTF-8 because it is a lone 
byte with a value exceeding 127; in UTF-8 such bytes are only legal as part of 
a multi-byte sequence.  ISO-8859-1 or LATIN-1 allow you to use single byte 
values above 127.
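
A minimal sketch of the difference, assuming the expat binding of a
current Python 3.  The helper parse_title() is hypothetical and the
title is shortened:

```python
import xml.parsers.expat as expat


def parse_title(data):
    """Return the character data found in the given XML bytes."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# 0xA9 is the copyright sign in ISO-8859-1, but a stray byte in UTF-8.
BODY = b"<title>Option Insight\xa9</title>"

try:
    parse_title(BODY)
    undeclared_ok = True
except expat.ExpatError:
    undeclared_ok = False

declared = parse_title(b"<?xml version='1.0' encoding='ISO-8859-1'?>" + BODY)
```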


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From noreply@sourceforge.net  Fri Dec 15 17:17:33 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Dec 2000 09:17:33 -0800
Subject: [XML-SIG] [Bug #125909] xml.dom.ext.Printer produces invalid or incomplete DTDs
Message-ID: <E146yU9-0005jI-00@usw-sf-web3.sourceforge.net>

Bug #125909, was updated on 2000-Dec-15 09:17
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : nobody
Summary: xml.dom.ext.Printer produces invalid or incomplete DTDs

Details: xml.dom.ext.Printer.PrintVisitor.visitDocumentType will produce 
incorrect or incomplete XML in the following cases:

- There is no System ID defined, but there are entities or
  notations:

  The entities and notations are not written out. The XML 
  spec says that a System ID isn't mandatory; a declaration 
  of the form <!DOCTYPE rootName [ <entities and notations> ]> 
  is perfectly valid.

- There is both a System ID and a Public ID defined:

  The Public and System ID are written out as:

     <!DOCTYPE PUBLIC "<public id>" SYSTEM "<system id>">

  The keyword 'SYSTEM' is illegal in this context, it 
  should read:

     <!DOCTYPE PUBLIC "<public id>" "<system id>">

- There is a double-quote character (") in either the 
  System ID or the Public ID:

  The Public or System ID in question will be written out 
  enclosed with double-quotes, while the XML spec provides 
  for enclosing the ID in single quotes (').

I'll submit a patch to the patch manager. 
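
A minimal sketch of the three rules above, assuming a current Python 3.
It is not the submitted patch; format_doctype() and quote_literal() are
hypothetical helpers.  Note that the XML grammar never allows a double
quote inside a public ID, so only the system ID needs the quote switch:

```python
def quote_literal(value):
    """Quote a system ID with whichever quote character it does not contain."""
    if '"' not in value:
        return '"%s"' % value
    if "'" not in value:
        return "'%s'" % value
    raise ValueError("a literal cannot contain both kinds of quote")


def format_doctype(name, public_id=None, system_id=None, internal_subset=None):
    """Build a DOCTYPE declaration string from its parts."""
    parts = ["<!DOCTYPE", name]
    if public_id is not None:
        if system_id is None:
            raise ValueError("a public ID must be followed by a system ID")
        if '"' in public_id:
            raise ValueError("a public ID cannot contain a double quote")
        # No SYSTEM keyword here: the system literal follows directly.
        parts += ["PUBLIC", '"%s"' % public_id, quote_literal(system_id)]
    elif system_id is not None:
        parts += ["SYSTEM", quote_literal(system_id)]
    if internal_subset:
        # Entities and notations are written even without a system ID.
        parts.append("[%s]" % internal_subset)
    return " ".join(parts) + ">"


both = format_doctype("doc", "-//EXAMPLE//DTD Doc//EN", "doc.dtd")
quoted = format_doctype("doc", system_id='a"b.dtd')
subset_only = format_doctype("doc", internal_subset='<!ENTITY e "x">')
```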

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125909&group_id=6473


From ken@bitsko.slc.ut.us  Fri Dec 15 17:49:34 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 15 Dec 2000 11:49:34 -0600
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: Mike Olson's message of "Fri, 15 Dec 2000 07:48:28 -0700"
References: <200012141527.IAA18240@localhost.localdomain>
 <3A3A2F3C.8AE8A27E@FourThought.com>
Message-ID: <x7ae9xpob5.fsf@bitsko.slc.ut.us>

Mike Olson <Mike.Olson@fourthought.com> writes:

> uche.ogbuji@fourthought.com wrote:
> 
> > Maybe it's worth designing a plug-in API for XPath implementations
> > so people can make their choices.
> 
> This wouldn't be that difficult.  A simple interface to get a list
> of tokens (and the matched string) from the scanner would suffice.
> We will need some logic to turn this list into YY unions for Bison,
> but that is pretty simple as well.

At the plug-in API level, I'd be interested in something more at the
"location path" level, possibly an array of steps, each step with
axis, node test, and list of predicates.

This would involve defining a common, sharable data model for these,
but I think it would be more useful overall than a raw token list.
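
A minimal sketch of such a data model, assuming a current Python 3 with
dataclasses.  The class names Step and LocationPath, their fields, and
the choice to keep predicates as unparsed strings are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One location step: an axis, a node test, and zero or more predicates."""
    axis: str
    node_test: str
    predicates: List[str] = field(default_factory=list)


@dataclass
class LocationPath:
    """An ordered list of steps, optionally anchored at the document root."""
    steps: List[Step]
    absolute: bool = False

    def to_xpath(self):
        """Render the path in unabbreviated XPath syntax."""
        rendered = []
        for step in self.steps:
            predicates = "".join("[%s]" % p for p in step.predicates)
            rendered.append("%s::%s%s" % (step.axis, step.node_test, predicates))
        return ("/" if self.absolute else "") + "/".join(rendered)


path = LocationPath(
    [Step("child", "chapter"), Step("child", "para", ["position()=1"])],
    absolute=True,
)
```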

  -- Ken


From akuchlin@mems-exchange.org  Fri Dec 15 18:27:25 2000
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Fri, 15 Dec 2000 13:27:25 -0500
Subject: [XML-SIG] Adding scripts
Message-ID: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>

What do people think about adding some useful scripts to PyXML that
get installed in /usr/local/bin or somewhere like that?  Possibilities
would be (names off the top of my head):

xmlproc_val  : Validate files using xmlproc
xmlrpc_call  : Make an XML-RPC call (useful for shell scripts, or using
               XML-RPC from languages w/o an XML parser, such as Emacs Lisp)

Anyone have additional ideas?

--amk


From noreply@sourceforge.net  Fri Dec 15 18:25:52 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Dec 2000 10:25:52 -0800
Subject: [XML-SIG] [Patch #102861] xml.dom.ext.Printer produces invalid or incomplete DTDs
Message-ID: <E146zYG-0007CX-00@usw-sf-web2.sourceforge.net>

Patch #102861 has been updated. 

Project: pyxml
Category: None
Status: Open
Submitted by: mjpieters
Assigned to : nobody
Summary: xml.dom.ext.Printer produces invalid or incomplete DTDs

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102861&group_id=6473


From noreply@sourceforge.net  Fri Dec 15 18:44:25 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Dec 2000 10:44:25 -0800
Subject: [XML-SIG] [Bug #125917] DbDom : no cloneNode method.
Message-ID: <E146zqD-0007IU-00@usw-sf-web2.sourceforge.net>

Bug #125917, was updated on 2000-Dec-15 10:44
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: DbDom : no cloneNode method.

Details: Here's a small script that demonstrates the bug:

from Ft.DbDom.Dom import DocumentImp

d = DocumentImp()
e = d.createElementNS('','root')
d.appendChild(e)
f = e.cloneNode(1)

--------------

[alf@leo alf]$ python dbdomclone.py
Traceback (innermost last):
  File "dbdomclone.py", line 6, in ?
    f = e.cloneNode(1)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 170, in __getattr__
    raise AttributeError(name)
AttributeError: cloneNode


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125917&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 15 21:06:23 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 15 Dec 2000 22:06:23 +0100
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: <x7ae9xpob5.fsf@bitsko.slc.ut.us> (message from Ken MacLeod on 15
 Dec 2000 11:49:34 -0600)
References: <200012141527.IAA18240@localhost.localdomain>
 <3A3A2F3C.8AE8A27E@FourThought.com> <x7ae9xpob5.fsf@bitsko.slc.ut.us>
Message-ID: <200012152106.WAA00918@loewis.home.cs.tu-berlin.de>

> At the plug-in API level, I'd be interested in something more at the
> "location path" level, possibly an array of steps, each step with
> axis, node test, and list of predicates.

Yes, that would be a reasonable XPath API. How do you like the 4Suite
ParsedLocationPath class, and corresponding structures?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 15 20:58:59 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 15 Dec 2000 21:58:59 +0100
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: <3A3A2F3C.8AE8A27E@FourThought.com> (message from Mike Olson on
 Fri, 15 Dec 2000 07:48:28 -0700)
References: <200012141527.IAA18240@localhost.localdomain> <3A3A2F3C.8AE8A27E@FourThought.com>
Message-ID: <200012152058.VAA00867@loewis.home.cs.tu-berlin.de>

> Yes, thanks Martin this saves us a lot of time.  Question, will it
> handle "mod mod mod" or "* * *"?  These need to translate to the
> token wildcard name, operator, wildcard name.  I ask 'cause this
> caused us many headaches with 4XPath.  We had to do it with flex
> state.

Currently, "* * *" is recognized as NameTest NameTest MultiplyOperator.

This was incorrect due to a minor bug. I just fixed that, it now
tokenizes this as STAR (i.e. NameTest) MultiplyOperator STAR and
NCName mod NCName, respectively.
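
A minimal sketch of that disambiguation rule, assuming a current
Python 3.  It is not the PyXPath scanner: it only knows names, "*" and
punctuation, and the token kind labels are my own.  The rule is the one
from the XPath 1.0 lexical structure section: a "*" or name that follows
an operand is an operator:

```python
import re

# Simplified lexical grammar: names, '*', and punctuation only.
_TOKEN = re.compile(r"\s*(//|::|!=|<=|>=|[-/|+=<>@()\[\],*]|[A-Za-z_][\w.-]*)")
_OPERATOR_NAMES = {"and", "or", "mod", "div"}
_PUNCT_OPERATORS = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}
# After these tokens, as at the start, an operand is expected next.
_OPERAND_EXPECTED_AFTER = {"@", "::", "(", "[", ","}


def tokenize(expr):
    """Split an expression into (kind, text) pairs."""
    tokens = []
    expects_operand = True
    expr = expr.rstrip()
    pos = 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        text = match.group(1)
        pos = match.end()
        if text == "*":
            kind = "STAR" if expects_operand else "MultiplyOperator"
        elif text[0].isalpha() or text[0] == "_":
            if not expects_operand and text in _OPERATOR_NAMES:
                kind = "OperatorName"
            else:
                # Other names in operator position are left for the parser to reject.
                kind = "NCName"
        else:
            kind = "Punctuation"
        tokens.append((kind, text))
        expects_operand = (
            kind in ("MultiplyOperator", "OperatorName")
            or text in _PUNCT_OPERATORS
            or text in _OPERAND_EXPECTED_AFTER
        )
    return tokens


stars = tokenize("* * *")
mods = tokenize("mod mod mod")
```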

The scanner generator in yapps was not suitable for the special rules,
so I have my own hand-written parser. For Spark, the "generated"
tokenization could be easily expanded to provide the correct token
sequence.

> > Maybe it's worth designing a plug-in API for XPath implementations
> > so people can make their choices.
> 
> This wouldn't be that difficult.

I think there is no need to have two different XPath tokenizers that
both use sre. Instead, I hope we can merge the two implementations,
using correctness and speed as measurements.

It then still needs to be adjusted to the parser, but that is normally a
simple transformation - the underlying code should always be the same.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 15 21:09:28 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 15 Dec 2000 22:09:28 +0100
Subject: [XML-SIG] Adding scripts
In-Reply-To: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>
 (amk@mira.erols.com)
References: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>
Message-ID: <200012152109.WAA00962@loewis.home.cs.tu-berlin.de>

> xmlproc_val  : Validate files using xmlproc

Sounds like a good idea. 

> xmlrpc_call  : Make an XML-RPC call (useful for shell scripts, or using
>                XML-RPC from languages w/o an XML parser, such as Emacs Lisp)

I can't see the usage for that one. Why would you need an XML parser
to make an XML-RPC call? Formatting the request is easy. For
processing the response, there might be indeed the need for a parser -
how would this script present the result to the caller?
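
A minimal sketch of both halves, assuming the xmlrpc.client module of a
current Python 3 rather than anything in PyXML.  The method name
"math.add" is hypothetical, and no server is contacted:

```python
import xmlrpc.client

# Formatting the request is a single call.
request = xmlrpc.client.dumps((2, 3), methodname="math.add")
params, method = xmlrpc.client.loads(request)

# A response has to be parsed before its value can be handed to the caller.
response = xmlrpc.client.dumps((5,), methodresponse=True)
result, _no_method = xmlrpc.client.loads(response)
```

A command-line wrapper could print the parsed value as plain text, which
is one possible answer to the question above.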

Regards,
Martin


From ngps@post1.com  Sun Dec 17 15:02:28 2000
From: ngps@post1.com (Ng Pheng Siong)
Date: Sun, 17 Dec 2000 23:02:28 +0800
Subject: [XML-SIG] Copyright character chokes parser
In-Reply-To: <200012151649.JAA22126@localhost.localdomain>; from uche.ogbuji@fourthought.com on Fri, Dec 15, 2000 at 09:49:40AM -0700
References: <ngps@post1.com> <200012151649.JAA22126@localhost.localdomain>
Message-ID: <20001217230228.B300@madcap.dyndns.org>

On Fri, Dec 15, 2000 at 09:49:40AM -0700, uche.ogbuji@fourthought.com wrote:
> Add 
> <?xml version='1.0' encoding='ISO-8859-1'?>

Thanks, that did it.

Cheers.
-- 
Ng Pheng Siong <ngps@post1.com> * http://www.post1.com/home/ngps


From keichwa@gmx.net  Mon Dec 18 05:52:04 2000
From: keichwa@gmx.net (Karl Eichwalder)
Date: 18 Dec 2000 06:52:04 +0100
Subject: [XML-SIG] Re: Adding scripts
In-Reply-To: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>
References: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>
Message-ID: <shelz6e0or.fsf@tux.gnu.franken.de>

"A.M. Kuchling" <amk@mira.erols.com> writes:

> xmlproc_val  : Validate files using xmlproc
> xmlrpc_call  : Make an XML-RPC call (useful for shell scripts, or using
>                XML-RPC from languages w/o an XML parser, such as Emacs Lisp)
> 
> Anyone have additional ideas?

Please, consider the prefix »py_« or something.

-- 
work : ke@suse.de                          |                   ,__o
     : http://www.suse.de/~ke/             |                 _-\_<,
home : keichwa@gmx.net                     |                (*)/'(*)


From Mike.Olson@fourthought.com  Mon Dec 18 08:47:04 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 18 Dec 2000 01:47:04 -0700
Subject: [XML-SIG] Small memory leak
Message-ID: <3A3DCF08.12C1A98F@FourThought.com>

There is a circular reference in the DTDParser; cyclops output follows:

0x84e1850 rc:1 instance xml.parsers.xmlproc.dtdparser.DTDParser
    repr: <xml.parsers.xmlproc.dtdparser.DTDParser instance at 84e1850>
    this.ent ->
0x84e1968 rc:1 instance xml.parsers.xmlproc.xmlapp.EntityHandler
    repr: <xml.parsers.xmlproc.xmlapp.EntityHandler instance at 84e1968>
    this.parser ->
0x84e18d0 rc:1 instance xml.parsers.xmlproc.xmlapp.ErrorHandler
    repr: <xml.parsers.xmlproc.xmlapp.ErrorHandler instance at 84e18d0>
    this.locator ->
0x84e1850 rc:1 instance xml.parsers.xmlproc.dtdparser.DTDParser
    repr: <xml.parsers.xmlproc.dtdparser.DTDParser instance at 84e1850>


I got around it by changing the deref function on DTDParser to also set
self.ent to None.

    def deref(self):
        "Removes circular references."
        self.ent = self.dtd_consumer = self.dtd = self.app = self.err = None
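
A minimal sketch of the kind of cycle shown in the Cyclops output, and of
how a deref-style method breaks it. The class bodies are stand-ins, not
the xmlproc sources; only the attribute names (ent, parser, locator, err)
mirror the output above, and the helper is_freed_without_gc is
hypothetical. The cyclic collector is switched off so that only reference
counting is in play.

```python
import gc
import weakref


class ErrorHandler:
    def __init__(self, locator):
        self.locator = locator          # points back at the parser


class EntityHandler:
    def __init__(self, err):
        self.parser = err               # attribute name as in the Cyclops output


class Parser:
    """Stand-in for DTDParser: owns objects that point back at it."""

    def __init__(self):
        self.err = ErrorHandler(self)
        self.ent = EntityHandler(self.err)

    def deref(self):
        "Removes circular references."
        self.ent = self.err = None


def is_freed_without_gc(call_deref):
    """Return True if the parser is reclaimed by reference counting alone."""
    gc.disable()
    try:
        p = Parser()
        ref = weakref.ref(p)
        if call_deref:
            p.deref()
        del p
        return ref() is None
    finally:
        gc.enable()
        gc.collect()                    # clean up the cycle left behind, if any
```

Without the deref() call the parser stays alive through the handler that
points back at it; with it, dropping the last outside reference is enough.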

Mike

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Mon Dec 18 10:49:56 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 18 Dec 2000 11:49:56 +0100
Subject: [XML-SIG] Small memory leak
In-Reply-To: <3A3DCF08.12C1A98F@FourThought.com> (message from Mike Olson on
 Mon, 18 Dec 2000 01:47:04 -0700)
References: <3A3DCF08.12C1A98F@FourThought.com>
Message-ID: <200012181049.LAA00706@loewis.home.cs.tu-berlin.de>

> I got around it by changing the deref function on DTDParser to also set
> self.ent to None.

Unless Lars Marius objects - would you like to commit that change to
PyXML?

Regards,
Martin


From uche.ogbuji@fourthought.com  Tue Dec 19 03:28:02 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 18 Dec 2000 20:28:02 -0700
Subject: [XML-SIG] Lexical handlers for PyXML?
Message-ID: <200012190328.UAA13043@localhost.localdomain>

Looks as if there is no lexical handler support in drv_pyexpat or drv_xmlproc. 
 They're all mentioned in to-do lists.  I know Lars is pretty much buried in 
work and unless someone else picks up the flag it might be a while before it 
happens.

I can certainly add lexical handler support to drv_pyexpat (I'll sign up for 
the easy part.  Heh!)

Mostly I wanted to be sure it's not completely forgotten.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Mike.Olson@fourthought.com  Tue Dec 19 03:33:00 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 18 Dec 2000 20:33:00 -0700
Subject: [XML-SIG] Small memory leak
References: <3A3DCF08.12C1A98F@FourThought.com> <200012181049.LAA00706@loewis.home.cs.tu-berlin.de>
Message-ID: <3A3ED6EC.754898E0@FourThought.com>

"Martin v. Loewis" wrote:
> 
> > I got around it by changing the deref function on DTDParser to also set
> > self.ent to None.
> 
> Unless Lars Marius objects - would you like to commit that change to
> PyXML?

No objections so I checked it in.

Mike

> 
> Regards,
> Martin

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From noreply@sourceforge.net  Tue Dec 19 03:51:39 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Dec 2000 19:51:39 -0800
Subject: [XML-SIG] [Bug #126272] LexicalHandler not supported for drv_pyexpat.
Message-ID: <E148DoR-0003Ww-00@usw-sf-web1.sourceforge.net>

Bug #126272, was updated on 2000-Dec-18 19:51
Here is a current snapshot of the bug.

Project: Python/XML
Category: SAX
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: fdrake
Assigned to : uche
Summary: LexicalHandler not supported for drv_pyexpat.

Details: Uche pointed out that LexicalHandler wasn't supported for either pyexpat or xmlproc, and volunteered to implement it.

This bug report is his reminder!

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=126272&group_id=6473


From fdrake@acm.org  Tue Dec 19 03:47:51 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 18 Dec 2000 22:47:51 -0500 (EST)
Subject: [XML-SIG] Lexical handlers for PyXML?
In-Reply-To: <200012190328.UAA13043@localhost.localdomain>
References: <200012190328.UAA13043@localhost.localdomain>
Message-ID: <14910.55911.770435.756449@cj42289-a.reston1.va.home.com>

uche.ogbuji@fourthought.com writes:
 > Looks as if there is no lexical handler support in drv_pyexpat or
 > drv_xmlproc. 
...
 > I can certainly add lexical handler support to drv_pyexpat (I'll
 > sign up for the easy part.  Heh!)

  I'd love to see it get done!  In fact, I just filed a bug & assigned
it to you as a reminder.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From noreply@sourceforge.net  Tue Dec 19 04:06:19 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Dec 2000 20:06:19 -0800
Subject: [XML-SIG] [Bug #126275] pyexpat.c doesn't match docs or SAX parser
Message-ID: <E148E2d-0003as-00@usw-sf-web1.sourceforge.net>

Bug #126275, was updated on 2000-Dec-18 20:06
Here is a current snapshot of the bug.

Project: Python/XML
Category: expat
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: uche
Assigned to : nobody
Summary: pyexpat.c doesn't match docs or SAX parser

Details: _xmlplus/sax/expatreader.py, line 81

self._parser.Parse(data, isFinal)

And the Python 2.0 docs say this is right.

But 'ave a butcher's at PyXML-0.6.1/extensions/pyexpat.c line 379 and following, particularly the PyArg_ParseTuple

static PyObject *
xmlparse_Parse(xmlparseobject *self, PyObject *args)
{
    char *s;
    int slen;
    int isFinal = 0;
    int rv;

    if (!PyArg_ParseTuple(args, "s#|i:Parse", &s, &slen, &isFinal))
        return NULL;

Uh oh.  Surely enough:

>>> doc = r.fromString(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString
    rt = self.fromStream(stream, ownerDoc)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 267, in fromStream
    self.parser.parse(stream)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 42, in parse
    xmlreader.IncrementalParser.parse(self, source)            
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 81, in feed
    self._parser.Parse(data, isFinal)
TypeError: not enough arguments; expected 4, got 2
>>> 

Hmm.  So what's right?  The C code or the SAX driver and docs?

Note: Python 2.0's pyexpat.c is the same way as PyXML's


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=126275&group_id=6473


From uche.ogbuji@fourthought.com  Tue Dec 19 04:14:45 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 18 Dec 2000 21:14:45 -0700
Subject: [XML-SIG] Lexical handlers for PyXML?
In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org>
 of "Mon, 18 Dec 2000 22:47:51 EST." <14910.55911.770435.756449@cj42289-a.reston1.va.home.com>
Message-ID: <200012190414.VAA13313@localhost.localdomain>

> 
> uche.ogbuji@fourthought.com writes:
>  > Looks as if there is no lexical handler support in drv_pyexpat or
>  > drv_xmlproc. 
> ...
>  > I can certainly add lexical handler support to drv_pyexpat (I'll
>  > sign up for the easy part.  Heh!)
> 
>   I'd love to see it get done!  In fact, I just filed a bug & assigned
> it to you as a reminder.  ;-)

Oh yeah?  Remind me to send you a time machine and a ticket on the Titanic.  
Ah well, I'll take it on.

Even more serious, probably, is this bug, which I just submitted.

https://sourceforge.net/bugs/?func=detailbug&bug_id=126275&group_id=6473

I assume Paul or whoever wrote pyexpat sent you the interface.  Looks as if 
the code doesn't match the docs, and it bombs SAX2 with pyexpat.

It looks as if the right thing to do is to just match the docs and ixnay the 
extra parameters in the C code, but I'm guessing there are others who know 
better.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Dec 19 04:23:54 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 18 Dec 2000 21:23:54 -0700
Subject: [XML-SIG] Lexical handlers for PyXML?
In-Reply-To: Message from uche.ogbuji@fourthought.com
 of "Mon, 18 Dec 2000 21:14:45 MST." <200012190414.VAA13313@localhost.localdomain>
Message-ID: <200012190423.VAA13351@localhost.localdomain>

> >   I'd love to see it get done!  In fact, I just filed a bug & assigned
> > it to you as a reminder.  ;-)
> 
> Oh yeah?  Remind me to send you a time machine and a ticket on the Titanic.  
> Ah well, I'll take it on.

Just in case anyone is low on humor supplements, this was a joke.  I know, I 
know, but you never know, especially since I don't use emoticons on principle.

Happy holiday, all.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Tue Dec 19 04:36:54 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 18 Dec 2000 23:36:54 -0500 (EST)
Subject: [XML-SIG] Lexical handlers for PyXML?
In-Reply-To: <200012190414.VAA13313@localhost.localdomain>
References: <200012190414.VAA13313@localhost.localdomain>
 <200012190423.VAA13351@localhost.localdomain>
 <fdrake@acm.org>
 <14910.55911.770435.756449@cj42289-a.reston1.va.home.com>
Message-ID: <14910.58854.493580.318328@cj42289-a.reston1.va.home.com>

uche.ogbuji@fourthought.com writes:
 > Even more serious, probably, is this bug, which I just submitted.

  I agree; this is a problem.  The version in the Python CVS tree
(xml.sax.expatreader) seems fine, or it was working for me this
morning (I don't find I use PyXML often anymore now that we have
something in the standard library).  I'll try and look at it this
week.

 > Just in case anyone is low on humor supplements, this was a joke.

  Sure....   ;)

 > Happy holiday, all.

  Bah, humbug!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Tue Dec 19 10:05:36 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 19 Dec 2000 11:05:36 +0100
Subject: [XML-SIG] [Bug #126275] pyexpat.c doesn't match docs or SAX parser
In-Reply-To: <E148E2d-0003as-00@usw-sf-web1.sourceforge.net>
 (noreply@sourceforge.net)
References: <E148E2d-0003as-00@usw-sf-web1.sourceforge.net>
Message-ID: <200012191005.LAA10014@loewis.home.cs.tu-berlin.de>

> Hmm.  So what's right?  The C code or the SAX driver and docs?

My guess is that this has nothing to do with Parse(), the function
works correctly. Instead, the problem is that pyexpat invokes a
callback on the content handler, and *that* call has problems with the
number of arguments. Most likely, it's a call to characters, which
occurs frequently when a DocumentHandler is used in a place where a
ContentHandler is expected (i.e. in SAX2).
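
A small sketch of that mismatch, assuming a current Python where xml.sax
uses the expat reader by default. Sax1StyleHandler and try_parse are
hypothetical names; the handler declares characters() with the SAX1
DocumentHandler arguments (ch, start, length), while a SAX2 reader passes
a single string.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Sax1StyleHandler(ContentHandler):
    """Handler written against the SAX1 DocumentHandler signature."""

    def characters(self, ch, start, length):
        pass


def try_parse(handler):
    """Parse a tiny document; return the TypeError message, or None on success."""
    try:
        xml.sax.parseString(b"<doc>hello</doc>", handler)
    except TypeError as exc:
        return str(exc)
    return None
```

try_parse(Sax1StyleHandler()) returns an argument-count message, while
try_parse(ContentHandler()) returns None. The error is raised when the
reader calls characters(), not when Parse() itself is called.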

The straight-forward solution is to have expat call a Python function
with the right number of arguments, and to have that function call the
content handler. Unfortunately, that will add another Python function
call for every characters event, even though in every working
application the argument number mismatch will never be a problem.

So somehow pyexpat should put itself into the traceback. I'm not sure
how this would be done best, though - we can't give a reasonable line
number, for example.

Contributions are welcome.

Regards,
Martin


From noreply@sourceforge.net  Tue Dec 19 15:19:15 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 19 Dec 2000 07:19:15 -0800
Subject: [XML-SIG] [Bug #126342] DbDom: cloneNode bug (18/12 snapshot)
Message-ID: <E148OXr-0007NE-00@usw-sf-web3.sourceforge.net>

Bug #126342, was updated on 2000-Dec-19 07:19
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: DbDom: cloneNode bug (18/12 snapshot)

Details: I get an integrity exception when trying to set a cloned Attribute. 

Sample code:
from Ft.DbDom import Dom
from Ft.Ods import Database
from Ft.DbDom import Reader
from xml.dom.ext import PrettyPrint,StripXml,Print
from Ft.Ods import FreePersistentObject

from Ft.DbDom.Dom import DocumentImp
DBNAME='ods:alf@orion:5432:dom_test'

db = Database.Database()
db.open(DBNAME)
tx = db.new()
tx.begin()

d = DocumentImp()
e = d.createElementNS('','root')
d.appendChild(e)
e.setAttributeNS('','foo','bar')
f=d.createElementNS('','child')
e.appendChild(f)
for attr in e.attributes:
    f.setAttributeNodeNS(attr.cloneNode(1))

tx.commit()


------------------8<-------------------
Sample output:
[alf@leo alf]$ python dbdomclone.py 
Traceback (innermost last):
  File "dbdomclone.py", line 22, in ?
    f.setAttributeNodeNS(attr.cloneNode(1))
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Dom.py", line 226, in setAttributeNodeNS
    self.add_attributes(node)
  File "/usr/lib/python1.5/site-packages/Ft/DbDom/Element/__init__.py", line 22, in add_attributes
    self._4ods_addRelationship('attributes',Attribute.Attribute_stub,'ownerElement','form',target,inverse)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 271, in _4ods_addRelationship
    val._4ods_formRelationship(inverseName,self.__class__,name,'add',self,0)
  File "/usr/lib/python1.5/site-packages/Ft/Ods/PersistentObject.py", line 232, in _4ods_formRelationship
    raise IntegrityException(name)
Ft.Ods.IntegrityException: Integrity error on relationship ownerElement
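
For comparison, a sketch of the same attribute-copying loop against the
in-memory xml.dom.minidom from the standard library. This is not DbDom
and does not reproduce the integrity error; it only shows the behaviour
the sample code expects, namely that a cloned attribute node has no owner
element and can be attached to another element. The function name
copy_attributes is hypothetical.

```python
from xml.dom.minidom import getDOMImplementation


def copy_attributes():
    """Clone every attribute of the root element onto a child element."""
    doc = getDOMImplementation().createDocument(None, "root", None)
    root = doc.documentElement
    root.setAttribute("foo", "bar")
    child = doc.createElement("child")
    root.appendChild(child)

    attrs = root.attributes
    for i in range(attrs.length):
        # cloneNode() returns a fresh Attr that is not yet owned by any element
        child.setAttributeNode(attrs.item(i).cloneNode(True))
    return root, child
```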


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=126342&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Tue Dec 19 15:53:31 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 19 Dec 2000 16:53:31 +0100
Subject: [XML-SIG] Upgrading Expat
Message-ID: <200012191553.QAA11565@loewis.home.cs.tu-berlin.de>

I just imported Expat 1.2 into the PyXML tree, and updated the pyexpat
module to expose the new handlers supported by Expat. Unfortunately,
there is no version number in the Expat headers, so anybody compiling
the expat module must know what the expat version is. For PyXML,
setup.py can always know what the Expat version is we ship; for Python
proper, it would default to 1.1 unless specified otherwise.

Regards,
Martin


From uche.ogbuji@fourthought.com  Tue Dec 19 16:09:29 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 19 Dec 2000 09:09:29 -0700
Subject: [XML-SIG] [Bug #126275] pyexpat.c doesn't match docs or SAX
 parser
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Tue, 19 Dec 2000 11:05:36 +0100." <200012191005.LAA10014@loewis.home.cs.tu-berlin.de>
Message-ID: <200012191609.JAA31431@localhost.localdomain>

> > Hmm.  So what's right?  The C code or the SAX driver and docs?
> 
> My guess is that this has nothing to do with Parse(), the function
> works correctly. Instead, the problem is that pyexpat invokes a
> callback on the content handler, and *that* call has problems with the
> number of arguments. Most likely, it's a call to characters, which
> occurs frequently when a DocumentHandler is used in a place where a
> ContentHandler is expected (i.e. in SAX2).

Aieee.  Just so.  I need to stop raising alarms when I should be sleeping.  By 
the time I glanced at the PyArg_ParseTuple I had already convinced myself what 
the bug was, so I quite readily read it wrongly.

Culpa mea.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Tue Dec 19 16:07:23 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Dec 2000 11:07:23 -0500 (EST)
Subject: [XML-SIG] Upgrading Expat
In-Reply-To: <200012191553.QAA11565@loewis.home.cs.tu-berlin.de>
References: <200012191553.QAA11565@loewis.home.cs.tu-berlin.de>
Message-ID: <14911.34747.370233.637321@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > I just imported Expat 1.2 into the PyXML tree, and updated the pyexpat
 > module to expose the new handlers supported by Expat. Unfortunately,
 > there is no version number in the Expat headers, so anybody compiling

  Could you file a bug report for this at
http://sourceforge.net/projects/expat/?  I'll try and make sure
something gets added.
  There is an XML_ExpatVersion() function in the CVS version, but that
still doesn't provide for compile-time checking, or support the older
versions we've been using.
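
On the Python side, the run-time half of this can be sketched as follows,
assuming a current CPython whose pyexpat module exposes EXPAT_VERSION and
version_info (this does not apply to the pyexpat of the time). The helper
expat_at_least is a hypothetical name. It does not help with the
compile-time question raised above.

```python
from xml.parsers import expat


def expat_at_least(major, minor, micro=0):
    """Return True if the Expat library behind pyexpat is at least this version."""
    return expat.version_info >= (major, minor, micro)


if __name__ == "__main__":
    # EXPAT_VERSION is the version string reported by the Expat library
    print(expat.EXPAT_VERSION, expat.version_info)
```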


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Tue Dec 19 18:55:11 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 19 Dec 2000 19:55:11 +0100
Subject: [XML-SIG] PyXML, sgmlop and xmllib
In-Reply-To: <200012151044.KAA34209@tepid.osl.fast.no> (message from	Dag Brattli on Fri, 15 Dec 2000 10:44:15 GMT)
References: <200012151044.KAA34209@tepid.osl.fast.no>
Message-ID: <200012191855.TAA12487@loewis.home.cs.tu-berlin.de>

> The xmllib.py for sgmlop is missing from PyXML. Does anybody know
> where to find an updated version?

You can find one in old copies of PyXML, e.g. in PyXML 0.5.1.

> Both README.sgmlop and xml/parsers/__init__.py tells that there
> should be an xmllib.py around but it's not.

Yes, that's an error in the documentation which will be corrected in
the next release; users should use sgmlop directly, or, say, the SAX
driver.

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Wed Dec 20 13:03:39 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 20 Dec 2000 14:03:39 +0100 (CET)
Subject: [XML-SIG] 4DOM and DTD
Message-ID: <Pine.LNX.4.21.0012201359220.4545-100000@leo.logilab.fr>

Hello,

I was wondering if there's a way to get a reference to the DTD object once
an XML document has been read using the validating reader stub in
4DOM (the idea is to be able to validate it at some later point,
after it's been modified, to ensure that the document is still valid
before flushing it to disk, for example).

Thanks for your help.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From martin@loewis.home.cs.tu-berlin.de  Wed Dec 20 14:52:43 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 20 Dec 2000 15:52:43 +0100
Subject: [XML-SIG] Lexical handlers for PyXML?
Message-ID: <200012201452.PAA00800@loewis.home.cs.tu-berlin.de>

> Looks as if there is no lexical handler support in drv_pyexpat or
> drv_xmlproc.

Sure there is. The SAX2 xmlproc driver definitely emits LexicalHandler
and DeclHandler events.
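
A minimal sketch of receiving LexicalHandler events through the SAX2
property interface. It assumes a current Python standard library, where
the default expat reader accepts property_lexical_handler, rather than the
xmlproc driver discussed here. CommentCollector and collect_comments are
hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler


class CommentCollector:
    """Lexical handler that records comments and ignores the other events."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def collect_comments(xml_bytes):
    """Parse xml_bytes and return the text of every comment encountered."""
    parser = xml.sax.make_parser()
    lexical = CommentCollector()
    parser.setProperty(property_lexical_handler, lexical)
    parser.parse(io.BytesIO(xml_bytes))
    return lexical.comments
```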

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed Dec 20 14:49:13 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 20 Dec 2000 15:49:13 +0100
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: <Pine.LNX.4.21.0012201359220.4545-100000@leo.logilab.fr> (message
 from Alexandre Fayolle on Wed, 20 Dec 2000 14:03:39 +0100 (CET))
References: <Pine.LNX.4.21.0012201359220.4545-100000@leo.logilab.fr>
Message-ID: <200012201449.PAA00741@loewis.home.cs.tu-berlin.de>

> I was wondering if there's a way to get a reference to the DTD
> object once an XML document has been read using the the validating
> reader stub in 4DOM

I believe that is not possible: The 4DOM readers use only SAX1
parsers, and the only reader that reports DeclHandler and
LexicalHandler events is the SAX2 xmlproc driver.

Regards,
Martin


From uche.ogbuji@fourthought.com  Wed Dec 20 15:52:24 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 20 Dec 2000 08:52:24 -0700
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Wed, 20 Dec 2000 15:49:13 +0100." <200012201449.PAA00741@loewis.home.cs.tu-berlin.de>
Message-ID: <200012201552.IAA16613@localhost.localdomain>

> > I was wondering if there's a way to get a reference to the DTD
> > object once an XML document has been read using the the validating
> > reader stub in 4DOM
> 
> I believe that is not possible: The 4DOM readers use only SAX1
> parsers, and the only reader that reports DeclHandler and
> LexicalHandler events is the SAX2 xmlproc driver.

I was actually in the process of migrating to the SAX2 framework when I ran 
into all the troubles I've been reporting.  You've corrected me on some things 
so I'll have a second look, but it has been much more of a chore than it needs 
to be.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Wed Dec 20 18:05:15 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 20 Dec 2000 19:05:15 +0100
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: <200012201552.IAA16613@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012201552.IAA16613@localhost.localdomain>
Message-ID: <200012201805.TAA01147@loewis.home.cs.tu-berlin.de>

> I was actually in the process of migrating to the SAX2 framework
> when I ran into all the troubles I've been reporting.  You've
> corrected me on some things so I'll have a second look, but it has
> been much more of a chore than it needs to be.

I think the decision to change the signature of characters between a
DocumentHandler and a ContentHandler has by far caused the most
portability problems recently. Since the number of authors that have
written DocumentHandlers is limited, I hope there will be a time when
this is not a problem anymore.

Regards,
Martin


From larsga@garshol.priv.no  Wed Dec 20 19:22:12 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Dec 2000 20:22:12 +0100
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: <200012201805.TAA01147@loewis.home.cs.tu-berlin.de>
References: <200012201552.IAA16613@localhost.localdomain> <200012201805.TAA01147@loewis.home.cs.tu-berlin.de>
Message-ID: <m3u27yaoez.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| I think the decision to change the signature of characters between a
| DocumentHandler and a ContentHandler has by far caused the most
| portability problems recently. Since the number of authors that have
| written DocumentHandlers is limited, I hope there will be a time
| when this is not a problem anymore.

It's beginning to look like adding an adapter to the PyXML package
would be a good idea, perhaps as part of the saxtools.  I can't do it
just yet, but if nobody gets there before me I will probably do it
once the book is done.
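
A rough sketch of what such an adapter could look like, written against
the current standard-library xml.sax rather than PyXML's saxtools. It
covers only the characters() mismatch. OldStyleCollector,
CharactersAdapter, and collect_text are hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class OldStyleCollector:
    """Stand-in for a SAX1 DocumentHandler: characters(ch, start, length)."""

    def __init__(self):
        self.chunks = []

    def characters(self, ch, start, length):
        self.chunks.append(ch[start:start + length])


class CharactersAdapter(ContentHandler):
    """Presents a SAX2 ContentHandler and forwards text to a SAX1-style handler."""

    def __init__(self, sax1_handler):
        ContentHandler.__init__(self)
        self._sax1 = sax1_handler

    def characters(self, content):
        # SAX2 passes one string; SAX1 expects (buffer, start, length)
        self._sax1.characters(content, 0, len(content))


def collect_text(data):
    """Parse data and return the character data seen by the old-style handler."""
    old = OldStyleCollector()
    xml.sax.parseString(data, CharactersAdapter(old))
    return "".join(old.chunks)
```

A full adapter would also need to forward startElement, endElement, and
the remaining events; they are left out here to keep the sketch short.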

--Lars M.


From larsga@garshol.priv.no  Wed Dec 20 19:27:45 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Dec 2000 20:27:45 +0100
Subject: [XML-SIG] Adding scripts
In-Reply-To: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>
References: <200012151827.NAA01187@207-172-146-21.s21.tnt3.ann.va.dialup.rcn.com>
Message-ID: <m3snniao5q.fsf@lambda.garshol.priv.no>

* A. M. Kuchling
|
| What do people think about adding some useful scripts to PyXML that
| get installed in /usr/local/bin or somewhere like that?  

I like the idea.

| Possibilities would be (names off the top of my head):
| 
| xmlproc_val  : Validate files using xmlproc

xvcmd.py in the xmlproc distribution does this and could be used.

The xmlproc distribution contains more scripts that might fall into
this category:

 wxValidator.py	        : wxPython-based parser interface
 xpcmd.py               : non-validating cousin of xvcmd.py
 dtdcmd.py              : parse and check DTDs
 dtd2schema.py          : naive DTD to XML Schema converter
 

I've also been thinking about tools like:

 - something that normalizes XML documents
 - something that makes XML documents standalone
 - a DTD normalizer (this exists, but is not in the xmlproc distro yet)
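
As an illustration of the simplest tool in this family, here is a sketch
of a non-validating well-formedness check. It uses the standard-library
xml.sax, not xmlproc, and is not the code of xpcmd.py; the function name
check_well_formed is hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(data):
    """Return None if data parses cleanly, else a 'line:column: message' string."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return "%s:%s: %s" % (
            exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage())
    return None
```

A command-line wrapper would read each file named in sys.argv, call the
function, and print any message it returns.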

--Lars M.


From larsga@garshol.priv.no  Wed Dec 20 19:30:21 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Dec 2000 20:30:21 +0100
Subject: [XML-SIG] Small memory leak
In-Reply-To: <200012181049.LAA00706@loewis.home.cs.tu-berlin.de>
References: <3A3DCF08.12C1A98F@FourThought.com> <200012181049.LAA00706@loewis.home.cs.tu-berlin.de>
Message-ID: <m3r932ao1e.fsf@lambda.garshol.priv.no>

* Mike Olson
|
| I got around it by changing the deref function on DTDParser to also set
| self.ent to None.

* Martin v. Loewis
| 
| Unless Lars Marius objects - would you like to commit that change to
| PyXML?

The fix is perfectly fine.  I've now also applied it to my local CVS
tree, which will be merged with the PyXML one as soon as I have time.

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Wed Dec 20 21:26:39 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 20 Dec 2000 22:26:39 +0100
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: <m3u27yaoez.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 20 Dec 2000 20:22:12 +0100)
References: <200012201552.IAA16613@localhost.localdomain> <200012201805.TAA01147@loewis.home.cs.tu-berlin.de> <m3u27yaoez.fsf@lambda.garshol.priv.no>
Message-ID: <200012202126.WAA01550@loewis.home.cs.tu-berlin.de>

> It's beginning to look like adding an adapter to the PyXML package
> would be a good idea, perhaps as part of the saxtools.  I can't do it
> just yet, but if nobody gets there before me I will probably do it
> once the book is done.

It's not that changing the code is so difficult that you'd need
support libraries - in my experience, the necessary changes are
trivial.

What *is* a problem is to know that you have to make changes, and to
find out what those changes are. It is particularly confusing that the
Python traceback puts you on the wrong track.

Regards,
Martin


From fdrake@acm.org  Thu Dec 21 01:55:30 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 20 Dec 2000 20:55:30 -0500 (EST)
Subject: [XML-SIG] forwarded message from noreply@sourceforge.net
Message-ID: <14913.25362.547012.190609@cj42289-a.reston1.va.home.com>

--PrH0oNW7ir
Content-Type: text/plain; charset=us-ascii
Content-Description: message body and .signature
Content-Transfer-Encoding: 7bit


  Progress!  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


--PrH0oNW7ir
Content-Type: message/rfc822
Content-Description: forwarded message
Content-Transfer-Encoding: 7bit

Message-Id: <E148uz5-0000Mm-00@usw-sf-web2.sourceforge.net>
From: noreply@sourceforge.net
Sender: nobody <nobody@sourceforge.net>
To: loewis@informatik.hu-berlin.de, fdrake@acm.org, expat-bugs@sourceforge.net
Subject: [Bug #126353] xmlparse.h does not indicate a version
Date: Wed, 20 Dec 2000 17:57:31 -0800

Bug #126353, was updated on 2000-Dec-19 09:04
Here is a current snapshot of the bug.

Project: Expat XML Parser
Category: None
Status: Closed
Resolution: Fixed
Bug Group: None
Priority: 6
Submitted by: loewis
Assigned to : fdrake
Summary: xmlparse.h does not indicate a version

Details: Applications that need to compile for different versions of expat cannot determine the expat version at compile time. Therefore, manual intervention or advanced guessing is necessary to compile such applications, which is undesirable.

Follow-Ups:

Date: 2000-Dec-20 17:57
By: fdrake

Comment:
Added compile-time detectable version information to expat.h (new name for xmlparse.h).  Three new #defines, XML_MAJOR_VERSION, XML_MINOR_VERSION, and XML_MICRO_VERSION, have been added.  XML_ExpatVersion() computes its result dynamically using this information, and the new function XML_ExpatVersionInfo() returns this information in a structure.

This will be available in Expat 1.96.0.
-------------------------------------------------------

Date: 2000-Dec-19 09:09
By: fdrake

Comment:
Assigned to me, since I asked Martin to actually make this a bug report.  I'll note that the application in question is the Python binding for Expat, but the need is not limited to scripting language bindings.
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=126353&group_id=10127

--PrH0oNW7ir--


From Alexandre.Fayolle@logilab.fr  Thu Dec 21 09:27:39 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 21 Dec 2000 10:27:39 +0100 (CET)
Subject: [XML-SIG] New stuff on w3.org
Message-ID: <Pine.LNX.4.21.0012211022260.5741-100000@leo.logilab.fr>

Since I believe not everybody on this list monitors the W3C website
closely (I, for one, do not), I thought I might as well post a few pieces
of info here concerning Recommendations and Proposed Recommendations. For
more info, please refer to http://www.w3.org

On Dec. 19th, XHTML Basic became a Recommendation.
On Dec. 20th, XLink and XML Base became Proposed Recommendations.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From mbulik@mecalog.fr  Thu Dec 21 14:05:47 2000
From: mbulik@mecalog.fr (Michal BULIK)
Date: Thu, 21 Dec 2000 15:05:47 +0100
Subject: [XML-SIG] problem with install
Message-ID: <3A420E3B.B8131D61@mecalog.fr>

I have just installed python 1.5.2 from source on an
SGI with Irix 6.5 and then I've tried to install 
PyXML.

When I try to execute setup.py the program complains about a
missing distutils.core:

jorasses 1788% python setup.py build
Traceback (innermost last):
  File "setup.py", line 8, in ?
    from distutils.core import setup, Extension
ImportError: No module named distutils.core

I could find no such file in the python tree ...

I'm sorry if the question is completely stupid, but
I'm a python newbie ...

Best regards, Michal Bulik

-------------------------------------------------------------
Michal BULIK                    Tel. : 33 (0) 1 55 59 01 90
MECALOG                         Fax  : 33 (0) 1 55 59 96 36
Centre d'affaires, Bat. A       E-mail : mbulik@mecalog.fr
2, rue de la Renaissance
F - 92184 ANTONY CEDEX          http://www.radioss.com
-------------------------------------------------------------


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 21 14:58:33 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 21 Dec 2000 15:58:33 +0100
Subject: [XML-SIG] problem with install
In-Reply-To: <3A420E3B.B8131D61@mecalog.fr> (message from Michal BULIK on Thu,
 21 Dec 2000 15:05:47 +0100)
References: <3A420E3B.B8131D61@mecalog.fr>
Message-ID: <200012211458.PAA00665@loewis.home.cs.tu-berlin.de>

> ImportError: No module named distutils.core
> 
> I could find no such a file in the python tree ...

You need to install the distutils,
http://www.python.org/sigs/distutils-sig

Distutils are included with Python starting from 1.6.
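
A quick way to check for this before running setup.py can be sketched as
follows. This is only an illustration written for a current Python (the
importlib.util module did not exist in 1.5.2, where a plain try/except
ImportError around the import serves the same purpose); has_module is a
hypothetical helper name, not part of PyXML or distutils.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no finder on sys.path knows the module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not has_module("distutils"):
        print("distutils is missing; install it before running setup.py")
```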

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Thu Dec 21 16:30:25 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 21 Dec 2000 17:30:25 +0100 (CET)
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: <200012201449.PAA00741@loewis.home.cs.tu-berlin.de>
Message-ID: <Pine.LNX.4.21.0012211726420.1285-100000@leo.logilab.fr>

On Wed, 20 Dec 2000, Martin v. Loewis wrote:

> > I was wondering if there's a way to get a reference to the DTD
> > object once an XML document has been read using the validating
> > reader stub in 4DOM
> 
> I believe that is not possible: The 4DOM readers use only SAX1
> parsers, and the only reader that reports DeclHandler and
> LexicalHandler events is the SAX2 xmlproc driver.

I'm a bit surprised, but Uche did not comment on this, so you must be
right. Just being curious, what does the xml.dom.ext.reader.Sax2 provide
then? I really thought that  specifying validate=1 in FromXml made it use
xmlproc. 

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From noreply@sourceforge.net  Thu Dec 21 16:52:37 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 21 Dec 2000 08:52:37 -0800
Subject: [XML-SIG] [Bug #126612] 4DOM: handling attribute default value
Message-ID: <E1498xJ-0005lp-00@usw-sf-web3.sourceforge.net>

Bug #126612, was updated on 2000-Dec-21 08:52
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: 4DOM: handling attribute default value

Details: Hi there,

I tried to investigate this, but got stuck with the lack of Sax2 support, since resolution involves accessing a DTD object. 

The DOM spec says that Element.removeAttribute should do the following: "If the removed attribute is known to have a default value, an attribute immediately appears containing the default value as well as the corresponding namespace URI, local name, and prefix when applicable."

This is not the case in the current implementation of 4DOM.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=126612&group_id=6473


From Alexandre.Fayolle@logilab.fr  Thu Dec 21 17:00:14 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 21 Dec 2000 18:00:14 +0100 (CET)
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: <Pine.LNX.4.21.0012211726420.1285-100000@leo.logilab.fr>
Message-ID: <Pine.LNX.4.21.0012211754590.1365-100000@leo.logilab.fr>

On Thu, 21 Dec 2000, Alexandre Fayolle wrote:

> I'm a bit surprised, but Uche did not comment on this, so you must be
> right. Just being curious, what does the xml.dom.ext.reader.Sax2 provide
> then? I really thought that  specifying validate=1 in FromXml made it use
> xmlproc. 

OK, I referred to the source code, and I see the problem. I'll sum it up, in
case someone else is interested but is too lazy to check for
him/herself. If I'm wrong, please correct me.

There are two packages providing Sax interface to parsers, xml.sax.drivers
and xml.sax.drivers2. The first one uses Sax1 parsers, and is used by
xml.dom.ext.reader.Sax2. I reckon the latter will be soon upgraded to use
xml.sax.drivers2, but could not so far because of the lack of SAX2 parsers
in xml-sig (?). However, everything should be ready in reader.Sax2 to use
the Sax2 interfaces to the xml-sig parsers. 


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From noreply@sourceforge.net  Thu Dec 21 17:01:55 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 21 Dec 2000 09:01:55 -0800
Subject: [XML-SIG] [Bug #126613] 4DOM: documentType node has empty systemID
Message-ID: <E14996J-0005oh-00@usw-sf-web3.sourceforge.net>

Bug #126613, was updated on 2000-Dec-21 09:01
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: 4DOM: documentType node has empty systemID

Details: This is probably because a SAX1 parser is used in reader.Sax2, which therefore does not report the documentType properly; if so, please consider this report a reminder of something to be checked when the package is updated.

When building a DOM with validate=1, the doctype systemID and publicID are empty strings. 

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=126613&group_id=6473


From div@commerceflow.com  Thu Dec 21 21:59:21 2000
From: div@commerceflow.com (Div Shekhar)
Date: Thu, 21 Dec 2000 13:59:21 -0800
Subject: [XML-SIG] problem with install
References: <3A420E3B.B8131D61@mecalog.fr> <200012211458.PAA00665@loewis.home.cs.tu-berlin.de>
Message-ID: <3A427D39.F352B92F@commerceflow.com>

I've had a similar problem with 1.6a2, so I'm currently using PyXML
0.5.5.1

div@div:~/py/PyXML-0.6.2$ python setup.py 
Traceback (most recent call last):
  File "setup.py", line 8, in ?
    from distutils.core import setup, Extension
ImportError: cannot import name Extension


From uche.ogbuji@fourthought.com  Fri Dec 22 04:17:49 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 21 Dec 2000 21:17:49 -0700
Subject: [XML-SIG] TEST: IGNORE
Message-ID: <3A42D5ED.60CF99B5@fourthought.com>


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Dec 22 04:57:23 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 21 Dec 2000 21:57:23 -0700
Subject: [XML-SIG] 4DOM and DTD
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Thu, 21 Dec 2000 18:00:14 +0100." <Pine.LNX.4.21.0012211754590.1365-100000@leo.logilab.fr>
Message-ID: <200012220457.VAA01821@localhost.localdomain>

> On Thu, 21 Dec 2000, Alexandre Fayolle wrote:
> 
> > I'm a bit surprised, but Uche did not comment on this, so you must be
> > right. Just being curious, what does the xml.dom.ext.reader.Sax2 provide
> > then? I really thought that  specifying validate=1 in FromXml made it use
> > xmlproc. 
> 
> OK, I referred to the source code, and I see the problem. I'll sum it up, in
> case someone else is interested but is too lazy to check for
> him/herself. If I'm wrong, please correct me.
> 
> There are two packages providing Sax interface to parsers, xml.sax.drivers
> and xml.sax.drivers2. The first one uses Sax1 parsers, and is used by
> xml.dom.ext.reader.Sax2. I reckon the latter will be soon upgraded to use
> xml.sax.drivers2, but could not so far because of the lack of SAX2 parsers
> in xml-sig (?). However, everything should be ready in reader.Sax2 to use
> the Sax2 interfaces to the xml-sig parsers. 

Close.  I actually went most of the way on this, as you can see from the 
latest CVS snapshot.  I ran into a lot of problems which I mostly 
misinterpreted out of fatigue and sloth.

I plan to have another go, probably today, and I might even add LexicalHandler 
support to drv_pyexpat while I'm at it.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Dec 22 05:26:19 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 21 Dec 2000 22:26:19 -0700
Subject: [XML-SIG] Oddities
Message-ID: <200012220526.WAA01901@localhost.localdomain>

I think I have the pyexpat/lexhandler work in hand.  However, while testing 
it, I ran into two oddities in setup.py.

Firstly, PyXML CVS wouldn't compile pyexpat because it couldn't find 
extensions/expat/xmlparse/hashtable.c.  I just commented this out of setup.py 
and it compiles fine now.

Secondly, it tries to place the docs at

/usr/local/xmldoc

Tsk.  tsk.  That should be

/usr/local/doc/PyXML-<version>

It looks, however, as if someone went to some length to avoid the standard 
way, so I'd like to know why before fixing it.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From eugeneai@icc.ru  Fri Dec 22 09:19:11 2000
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Fri, 22 Dec 2000 17:19:11 +0800
Subject: [XML-SIG] Python>=1.6 SIMPLE encoding support patch for pyexpat
Message-ID: <200012220816.QAA08820@monster.icc.ru>

This is a multi-part message in MIME format.

--Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


Hi!

Please find attached a patch that lets pyexpat use Python's encodings.
Is it possible to include it in the next release of PyXML?

It seems that the patch will work fine for 8bit->unicode translation.

The patch works simply: it builds the encoding table for expat's XML_Encoding structure by decoding a template (the vector of bytes "\0\1...\xff") with the requested encoding, so no separate translation procedure is needed.
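
The idea can be sketched in Python as follows. This is a rough
illustration only, not the C code in the attached patch;
build_encoding_map is a hypothetical name, and the sketch assumes a
current Python where bytes and str are distinct types.

```python
def build_encoding_map(name):
    """Build a 256-entry byte -> code point table for an 8-bit encoding.

    Entries are -1 for bytes the codec cannot decode, mirroring the
    convention used for the map array in expat's XML_Encoding.
    """
    template = bytes(range(256))              # the bytes 0x00 .. 0xff
    decoded = template.decode(name, "replace")
    if len(decoded) != 256:
        # Guard against codecs that do not map one byte to one character.
        raise ValueError("%s is not a simple 8-bit encoding" % name)
    # U+FFFD is what the "replace" error handler emits for undecodable bytes.
    return [-1 if ch == "\ufffd" else ord(ch) for ch in decoded]


if __name__ == "__main__":
    table = build_encoding_map("koi8-r")
    print(hex(table[0xC1]))
```

Like the patch, this handles only single-byte encodings; a multi-byte
encoding would need expat's convert callback instead of a plain table.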

Sincerely,
Evgeny
--


--Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30
Content-Type: application/octet-stream;
 name="pyexpat_diff"
Content-Disposition: attachment;
 filename="pyexpat_diff"
Content-Transfer-Encoding: base64

QmluYXJ5IGZpbGVzIG9yaWcvUHlYTUwtMC42LjIvYnVpbGQvbGliLmxpbnV4LWk1ODYtMi4wL194
bWxwbHVzL3BhcnNlcnMvcHlleHBhdC5zbyBhbmQgbmV3L1B5WE1MLTAuNi4yL2J1aWxkL2xpYi5s
aW51eC1pNTg2LTIuMC9feG1scGx1cy9wYXJzZXJzL3B5ZXhwYXQuc28gZGlmZmVyCmRpZmYgLXJ1
TiBvcmlnL1B5WE1MLTAuNi4yL2V4dGVuc2lvbnMvcHlleHBhdC5jIG5ldy9QeVhNTC0wLjYuMi9l
eHRlbnNpb25zL3B5ZXhwYXQuYwotLS0gb3JpZy9QeVhNTC0wLjYuMi9leHRlbnNpb25zL3B5ZXhw
YXQuYwlUaHUgTm92ICAyIDEyOjU0OjUwIDIwMDAKKysrIG5ldy9QeVhNTC0wLjYuMi9leHRlbnNp
b25zL3B5ZXhwYXQuYwlGcmkgRGVjIDIyIDExOjE2OjMyIDIwMDAKQEAgLTYyNSw2ICs2MjUsNjEg
QEAKIC8qIC0tLS0tLS0tLS0gKi8KIAogCisjaWYgIShQWV9NQUpPUl9WRVJTSU9OID09IDEgJiYg
UFlfTUlOT1JfVkVSU0lPTiA8IDYpCisKKy8qIAorICAgIHB5ZXhwYXQgaW50ZXJuYXRpb25hbCBl
bmNvZGluZyBzdXBwb3J0LgorICAgIE1ha2UgaXQgYXMgc2ltcGxlIGFzIHBvc3NpYmxlLgorKi8K
Kworc3RhdGljIGNoYXIgdGVtcGxhdGVfYnVmZmVyWzI1Nl07CitQeU9iamVjdCAqIHRlbXBsYXRl
X3N0cmluZz1OVUxMOworCitzdGF0aWMgdm9pZCAKK2luaXRfdGVtcGxhdGVfYnVmZmVyKCkKK3sK
KyAgICBpbnQgaTsKKyAgICBmb3IgKGk9MDtpPDI1NjtpKyspIHsKKwl0ZW1wbGF0ZV9idWZmZXJb
aV09aTsKKyAgICB9OworICAgIHRlbXBsYXRlX2J1ZmZlclsyNTZdPTA7Cit9OworCitpbnQgCitQ
eVVua25vd25FbmNvZGluZ0hhbmRsZXIodm9pZCAqZW5jb2RpbmdIYW5kbGVyRGF0YSwgCitjb25z
dCBYTUxfQ2hhciAqbmFtZSwgCitYTUxfRW5jb2RpbmcgKiBpbmZvKQoreworICAgIFB5VW5pY29k
ZU9iamVjdCAqIF91X3N0cmluZz1OVUxMOworICAgIGludCByZXN1bHQ9MDsKKyAgICBpbnQgaTsK
KyAgICAKKyAgICBfdV9zdHJpbmc9KFB5VW5pY29kZU9iamVjdCAqKSBQeVVuaWNvZGVfRGVjb2Rl
KHRlbXBsYXRlX2J1ZmZlciwgMjU2LCBuYW1lLCAicmVwbGFjZSIpOyAvLyBZZXMsIHN1cHBvcnRz
IG9ubHkgOGJpdCBlbmNvZGluZ3MKKyAgICAKKyAgICBpZiAoX3Vfc3RyaW5nPT1OVUxMKSB7CisJ
cmV0dXJuIHJlc3VsdDsKKyAgICB9OworICAgIAorICAgIGZvciAoaT0wOyBpPDI1NjsgaSsrKSB7
CisJUHlfVU5JQ09ERSBjID0gX3Vfc3RyaW5nLT5zdHJbaV0gOyAvLyBTdHVwaWQgdG8gYWNjZXNz
IGRpcmVjdGx5LCBidXQgZmFzdAorCWlmIChjPT1QeV9VTklDT0RFX1JFUExBQ0VNRU5UX0NIQVJB
Q1RFUikgeworCSAgICBpbmZvLT5tYXBbaV0gPSAtMTsKKwl9IGVsc2UgeworCSAgICBpbmZvLT5t
YXBbaV0gPSBjOworCX07CisgICAgfTsKKyAgICAKKyAgICBpbmZvLT5kYXRhID0gTlVMTDsKKyAg
ICBpbmZvLT5jb252ZXJ0ID0gTlVMTDsKKyAgICBpbmZvLT5yZWxlYXNlID0gTlVMTDsKKyAgICBy
ZXN1bHQ9MTsKKyAgICAKKyAgICBQeV9ERUNSRUYoX3Vfc3RyaW5nKTsKKyAgICByZXR1cm4gcmVz
dWx0OworfQorCisjZW5kaWYKKwogc3RhdGljIHhtbHBhcnNlb2JqZWN0ICoKIG5ld3htbHBhcnNl
b2JqZWN0KGNoYXIgKmVuY29kaW5nLCBjaGFyICpuYW1lc3BhY2Vfc2VwYXJhdG9yKQogewpAQCAt
NjU4LDYgKzcxMywxMCBAQAogICAgICAgICByZXR1cm4gTlVMTDsKICAgICB9CiAgICAgWE1MX1Nl
dFVzZXJEYXRhKHNlbGYtPml0c2VsZiwgKHZvaWQgKilzZWxmKTsKKyNpZiBQWV9NQUpPUl9WRVJT
SU9OID09IDEgJiYgUFlfTUlOT1JfVkVSU0lPTiA8IDYKKyNlbHNlCisgICAgWE1MX1NldFVua25v
d25FbmNvZGluZ0hhbmRsZXIoc2VsZi0+aXRzZWxmLCAoWE1MX1Vua25vd25FbmNvZGluZ0hhbmRs
ZXIpIFB5VW5rbm93bkVuY29kaW5nSGFuZGxlciwgTlVMTCk7CisjZW5kaWYKIAogICAgIGZvcihp
ID0gMDsgaGFuZGxlcl9pbmZvW2ldLm5hbWUgIT0gTlVMTDsgaSsrKQogICAgICAgICAvKiBkbyBu
b3RoaW5nICovOwpAQCAtODIxLDcgKzg4MCw2IEBACiAvKiBFbmQgb2YgY29kZSBmb3IgeG1scGFy
c2VyIG9iamVjdHMgKi8KIC8qIC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tICovCiAKLQogc3RhdGljIGNoYXIgcHlleHBhdF9QYXJzZXJDcmVh
dGVfX2RvY19fW10gPQogIlBhcnNlckNyZWF0ZShbZW5jb2RpbmdbLCBuYW1lc3BhY2Vfc2VwYXJh
dG9yXV0pIC0+IHBhcnNlclxuXAogUmV0dXJuIGEgbmV3IFhNTCBwYXJzZXIgb2JqZWN0LiI7CkBA
IC05MzcsNiArOTk1LDEwIEBACiAgICAgUHlNb2R1bGVfQWRkT2JqZWN0KG0sICJfX3ZlcnNpb25f
XyIsCiAgICAgICAgICAgICAgICAgICAgICAgIFB5U3RyaW5nX0Zyb21TdHJpbmdBbmRTaXplKHJl
disxMSwgc3RybGVuKHJldisxMSktMikpOwogCisjaWYgUFlfTUFKT1JfVkVSU0lPTiA9PSAxICYm
IFBZX01JTk9SX1ZFUlNJT04gPCA2CisjZWxzZQorICAgIGluaXRfdGVtcGxhdGVfYnVmZmVyKCk7
CisjZW5kaWYKICAgICAvKiBYWFggV2hlbiBFeHBhdCBzdXBwb3J0cyBzb21lIHdheSBvZiBmaWd1
cmluZyBvdXQgaG93IGl0IHdhcwogICAgICAgIGNvbXBpbGVkLCB0aGlzIHNob3VsZCBjaGVjayBh
bmQgc2V0IG5hdGl2ZV9lbmNvZGluZyAKICAgICAgICBhcHByb3ByaWF0ZWx5LiAK

--Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30
Content-Type: application/octet-stream;
 name="enc_test.xml"
Content-Disposition: attachment;
 filename="enc_test.xml"
Content-Transfer-Encoding: base64

PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0ia29pOC1yIj8+Cjx0YWcgbmFtZT0i6c3RIiB2
YWx1ZT0i+s7B3sXOycUiPgrhINzUzyDX08Ugz9PUwczYztnFINPJzdfPzNkgwsXaIChcImUpOgoK
ysPVy8XOx9vd2sjfxtnXwdDSz8zE1tHe083J1NjCwOrj9evl7uf7/fro/+b59+Hw8u/s5Pb88f7z
7en0+OLgCjwvdGFnPg==

--Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30
Content-Type: application/octet-stream;
 name="test_encodings.py"
Content-Disposition: attachment;
 filename="test_encodings.py"
Content-Transfer-Encoding: base64

IyEvdXNyL2Jpbi9lbnYgcHl0aG9uCgoiIiIKVGhpcyB3aWxsIHNob3cgcnVzc2lhbiB0ZXh0IGlu
IGtvaTgtciBlbmNvZGluZy4KIiIiCgpmcm9tIHhtbC5wYXJzZXJzIGltcG9ydCBleHBhdA0KaW1w
b3J0IHN0cmluZw0KDQpjbGFzcyBYTUxUcmVlOg0KCWRlZiBfX2luaXRfXyhzZWxmKToNCgkJcGFz
cw0KDQoJIyBEZWZpbmUgYSBoYW5kbGVyIGZvciBzdGFydCBlbGVtZW50IGV2ZW50cw0KCWRlZiBT
dGFydEVsZW1lbnQoc2VsZiwgbmFtZSwgYXR0cnMgKToNCgkJI25hbWUgPSBuYW1lLmVuY29kZSgp
DQoJCXByaW50ICI8IiwgcmVwcihuYW1lKSwgIj4iDQoJCXByaW50ICJhdHRyIG5hbWU6IiwgYXR0
cnMuZ2V0KCJuYW1lIix1IiIpLmVuY29kZSgia29pOC1yIikKCQlwcmludCAiYXR0ciB2YWx1ZToi
LCBhdHRycy5nZXQoInZhbHVlIix1IiIpLmVuY29kZSgia29pOC1yIikKDQoJZGVmIEVuZEVsZW1l
bnQoc2VsZiwgIG5hbWUgKToNCgkJcHJpbnQgIjwvIiwgcmVwcihuYW1lKSwgIj4iDQoNCglkZWYg
Q2hhcmFjdGVyRGF0YShzZWxmLCBkYXRhICk6DQoJCWlmIHN0cmluZy5zdHJpcChkYXRhKToNCgkJ
CWRhdGEgPSBkYXRhLmVuY29kZSgia29pOC1yIikNCgkJCXByaW50IGRhdGENCg0KDQoJZGVmIExv
YWRUcmVlKHNlbGYsIGZpbGVuYW1lKToNCgkJIyBDcmVhdGUgYSBwYXJzZXINCgkJUGFyc2VyID0g
ZXhwYXQuUGFyc2VyQ3JlYXRlKCkNCg0KCQkjIFRlbGwgdGhlIHBhcnNlciB3aGF0IHRoZSBzdGFy
dCBlbGVtZW50IGhhbmRsZXIgaXMNCgkJUGFyc2VyLlN0YXJ0RWxlbWVudEhhbmRsZXIgPSBzZWxm
LlN0YXJ0RWxlbWVudA0KCQlQYXJzZXIuRW5kRWxlbWVudEhhbmRsZXIgPSBzZWxmLkVuZEVsZW1l
bnQNCgkJUGFyc2VyLkNoYXJhY3RlckRhdGFIYW5kbGVyID0gc2VsZi5DaGFyYWN0ZXJEYXRhDQoN
CgkJIyBQYXJzZSB0aGUgWE1MIEZpbGUNCgkJUGFyc2VyU3RhdHVzID0gUGFyc2VyLlBhcnNlKG9w
ZW4oZmlsZW5hbWUsJ3InKS5yZWFkKCksIDEpDQoNCg0KZGVmIHJ1blRlc3QoKToNCgl3aW4gPSBY
TUxUcmVlKCkNCgl3aW4uTG9hZFRyZWUoImVuY190ZXN0LnhtbCIpDQoJcmV0dXJuIHdpbg0KDQpy
dW5UZXN0KCkK

--Multipart_Fri__22_Dec_2000_17:19:11_+0800_08163b30--


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 22 13:31:59 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 22 Dec 2000 14:31:59 +0100
Subject: [XML-SIG] Oddities
In-Reply-To: <200012220526.WAA01901@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012220526.WAA01901@localhost.localdomain>
Message-ID: <200012221331.OAA00885@loewis.home.cs.tu-berlin.de>

> Firstly, PyXML CVS wouldn't compile pyexpat because it couldn't find
> extensions/expat/xmlparse/hashtable.c.  I just commented this out of
> setup.py and it compiles fine now.

Oops. I had an uncommitted fix for that in my setup.py...

> Secondly, it tries to place the docs at
> 
> /usr/local/xmldoc
> 
> Tsk.  tsk.  That should be
> 
> /usr/local/doc/PyXML-<version>
> 
> It looks, however, as if someone went to some length to avoid the standard 
> way, so I'd like to know why before fixing it.

By default, setup.py should not install the doc files at all - what
would be the standard way to have them installed there? Again, it was a
checkin error that it is installed - only that the doc2xmldoc=1 line
should *not* have been committed :-(

The intent here is that the doc files go into the RPM as %doc, are
installed as xmldoc on Windows, and are not touched otherwise.

Regards,
Martin


From uche.ogbuji@fourthought.com  Fri Dec 22 15:44:37 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 22 Dec 2000 08:44:37 -0700
Subject: [XML-SIG] Oddities
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Fri, 22 Dec 2000 14:31:59 +0100." <200012221331.OAA00885@loewis.home.cs.tu-berlin.de>
Message-ID: <200012221544.IAA03136@localhost.localdomain>

> > Firstly, PyXML CVS wouldn't compile pyexpat because it couldn't find
> > extensions/expat/xmlparse/hashtable.c.  I just commented this out of
> > setup.py and it compiles fine now.
> 
> Oops. I had an uncommitted fix for that in my setup.py...

Ah.  Never mind, though.  I checked it in with my pyexpat changes.

> > Secondly, it tries to place the docs at
> > 
> > /usr/local/xmldoc
> > 
> > Tsk.  tsk.  That should be
> > 
> > /usr/local/doc/PyXML-<version>
> > 
> > It looks, however, as if someone went to some length to avoid the standard 
> > way, so I'd like to know why before fixing it.
> 
> By default, setup.py should not install the doc files at all - what
> would be the standard way to have them installed there?

By "standard" I mean Linux standard.  I'm not sure if Solaris, etc. place docs 
at the same spot.  But for Linux, vendor-packaged docs go in

/usr/doc/<package-name>-<package-version>

and third-party package docs to

/usr/local/doc/<package-name>-<package-version>

Actually, it looks as if Red Hat has started moving to the latter location for 
all docs.  Anyway, every Python/distutils package I've installed follows this 
convention and places its docs in /usr/local/doc/<package-name>
-<package-version>.  As does 4Suite, of course.

I think the default should be to install docs.  They are an important part of 
the package.  Even better if people know exactly where to look for them.

> Again, it was a
> checkin error that it is installed - only that the doc2xmldoc=1 line
> should *not* have been committed :-(
> 
> The intent here is that the doc files go into the RPM as %doc, are
> installed as xmldoc on Windows, and are not touched otherwise.

Hmm.  I think they should be installed by setup.py as well.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ben@thoughtstream.org  Fri Dec 22 16:00:44 2000
From: ben@thoughtstream.org (Ben Darnell)
Date: Fri, 22 Dec 2000 11:00:44 -0500
Subject: [XML-SIG] Oddities
In-Reply-To: <200012221544.IAA03136@localhost.localdomain>; from uche.ogbuji@fourthought.com on Fri, Dec 22, 2000 at 08:44:37AM -0700
References: <martin@loewis.home.cs.tu-berlin.de> <200012221544.IAA03136@localhost.localdomain>
Message-ID: <20001222110044.B2227@unity.ncsu.edu>

On Fri, Dec 22, 2000 at 08:44:37AM -0700, uche.ogbuji@fourthought.com wrote:
> By "standard" I mean Linux standard.  I'm not sure if Solaris, etc. place docs 
> at the same spot.  But for Linux, vendor-packaged docs go in
> 
> /usr/doc/<package-name>-<package-version>
> 
> and third-party package docs to
> 
> /usr/local/doc/<package-name>-<package-version>

By "standard" you mean Red Hat standard.  Debian, for instance, uses
/usr/share/doc/<package-name>

-Ben
-- 
Ben Darnell              ben@thoughtstream.org
http://thoughtstream.org
Finger bgdarnel@debian.org for PGP/GPG key 1024D/1F06E509


From uche.ogbuji@fourthought.com  Fri Dec 22 16:13:54 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 22 Dec 2000 09:13:54 -0700
Subject: [XML-SIG] Oddities
In-Reply-To: Message from Ben Darnell <ben@thoughtstream.org>
 of "Fri, 22 Dec 2000 11:00:44 EST." <20001222110044.B2227@unity.ncsu.edu>
Message-ID: <200012221613.JAA03272@localhost.localdomain>

> On Fri, Dec 22, 2000 at 08:44:37AM -0700, uche.ogbuji@fourthought.com wrote:
> > By "standard" I mean Linux standard.  I'm not sure if Solaris, etc. place docs 
> > at the same spot.  But for Linux, vendor-packaged docs go in
> > 
> > /usr/doc/<package-name>-<package-version>
> > 
> > and third-party package docs to
> > 
> > /usr/local/doc/<package-name>-<package-version>
> 
> By "standard" you mean Red Hat standard.  Debian, for instance, uses
> /usr/share/doc/<package-name>

Really?  I thought /usr/local/doc was Linux Standard Base.  I don't have a 
ref, mind, I was commenting off-head.  Also, many other distros besides Red 
Hat do it this way.

Nevertheless, I still think docs should be installed with every package.  Do 
you have any idea for an algorithm for package documentation location?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 22 16:06:13 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 22 Dec 2000 17:06:13 +0100
Subject: [XML-SIG] Oddities
In-Reply-To: <200012221544.IAA03136@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012221544.IAA03136@localhost.localdomain>
Message-ID: <200012221606.RAA01619@loewis.home.cs.tu-berlin.de>

> I think the default should be to install docs.  They are an
> important part of the package.

That poses an interesting problem for distutils. Karl Eichwalder from
SuSE requested that the PyXML RPM should use the %doc directive for
declaring documentation files. That is easy enough to do; rpm will
then, on installation, choose a location for these files (typically
/usr/doc or /usr/share/doc). *That*, AFAIK, is the official way.

Now, if I also install them, then bdist_rpm will include them twice,
and they will also get installed twice. That is undesirable.

Regards,
Martin


From teg@redhat.com  Fri Dec 22 23:34:50 2000
From: teg@redhat.com (Trond Eivind =?iso-8859-1?q?Glomsr=F8d?=)
Date: 22 Dec 2000 18:34:50 -0500
Subject: [XML-SIG] Oddities
In-Reply-To: <200012221544.IAA03136@localhost.localdomain>
References: <200012221544.IAA03136@localhost.localdomain>
Message-ID: <xuy4rzw6ndx.fsf@halden.devel.redhat.com>

uche.ogbuji@fourthought.com writes:

> By "standard" I mean Linux standard.  I'm not sure if Solaris, etc. place docs 
> at the same spot.  But for Linux, vendor-packaged docs go in
> 
> /usr/doc/<package-name>-<package-version>

It's /usr/share/doc/<package-name>-<package-version> now (FHS)

> /usr/local/doc/<package-name>-<package-version>
> 
> Actually, it looks as if Red Hat has started moving to the latter location for 
> all docs.

No, we don't touch /usr/local at all.

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From paulp@ActiveState.com  Sat Dec 23 00:13:27 2000
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 22 Dec 2000 16:13:27 -0800
Subject: [XML-SIG] New stuff on w3.org
References: <Pine.LNX.4.21.0012211022260.5741-100000@leo.logilab.fr>
Message-ID: <3A43EE27.BBC9D32A@ActiveState.com>

Alexandre Fayolle wrote:
> 
> Since I believe not everybody on this list monitors the W3C website
> closely (I, for one, do not), ...

> On Dec. 19th, XHTML Basic became a Recommendation.
> On Dec. 20th, XLink and XML Base became Proposed Recommendations.

An even more interesting development is that a draft version of XSLT now
has a formal mechanism for embedding other scripting languages. An
example is at the bottom

http://www.w3.org/TR/xslt11

 Paul Prescod

<xsl:script implements-prefix="date"
             language="ecmascript"
             src="DateRoutines.js"/>
<xsl:script implements-prefix="util"
             language="ecmascript">
function upper(n) {
  return n.toUpperCase();
}
function lower(n) {
  return n.toLowerCase();
}

function iff(arg1, arg2, arg3) {
  if (arg1) {
    return arg2;
  }
  else {
    return arg3;
  }
}
</xsl:script>


From div@commerceflow.com  Sat Dec 23 01:50:20 2000
From: div@commerceflow.com (Div Shekhar)
Date: Fri, 22 Dec 2000 17:50:20 -0800
Subject: [XML-SIG] how to clean up parser without causing parsing?
Message-ID: <3A4404DC.D76CB60D@commerceflow.com>

Hi!

I'm using xmlproc through the SAX interface. (PyXML 0.5.5.1/Python 1.6)
I have this code to parse a file:

! p = XMLParserFactory.make_parser( 'xml.sax.drivers.drv_xmlproc' )
! sp = MyHandler()
! p.setDocumentHandler( sp )   # other handlers left out for simplicity
! try:
!     p.parseFile( file )
! finally:                     # even if an exception is raised
!     p.close()                #  call close() to free memory

My handler does some validation, and raises an exception when it's not
happy with the XML that comes from the file.

The close() causes remaining data to be parsed, which results in more
SAX callbacks coming to my handler, which throws new exceptions which
are very confusing.

To work around this, I replaced the 'p.close()' with the close()
implementation in drv_xmlproc, and moved one line above the 'finally':

! try:
!     p.parseFile( file )
!     p.parser.close()                      # \   cut & paste from
! finally:
!     p.parser.deref()                      #  |  drv_xmlproc.close()
!     p.err_handler = p.dtd_handler = None  #  |
!     p.doc_handler = p.parser = None       #  |  
!     p.locator = p.ent_handler = None      # /

I thought of the following alternatives:

1. have my handler set a flag, and then ignore further calls.
2. point the parser to a do nothing handler before calling close()
3. doing the following:  p.reset()  p.close()

But they're not as efficient. What should I be doing?

Sincerely,
Div

(P.S. Any chance of ExtendedParser adding a free() method? :)
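
For reference, alternative 1 above (a flag in the handler) can be
sketched like this. It is a minimal illustration using the
xml.sax.handler.ContentHandler base class from a current Python rather
than the SAX1 DocumentHandler in PyXML 0.5.5.1; GuardedHandler and the
"forbidden" element rule are hypothetical.

```python
import xml.sax.handler


class GuardedHandler(xml.sax.handler.ContentHandler):
    """Content handler that goes quiet after its first validation failure."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.failed = False
        self.seen = []

    def startElement(self, name, attrs):
        if self.failed:
            # Ignore callbacks that arrive after the failure, for example
            # those triggered when close() flushes the remaining input.
            return
        if name == "forbidden":  # hypothetical validation rule
            self.failed = True
            raise ValueError("unexpected element: %s" % name)
        self.seen.append(name)
```

The cost is one attribute test per callback, and every callback the
handler implements needs the same guard.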


From uche.ogbuji@fourthought.com  Sat Dec 23 04:07:51 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 22 Dec 2000 21:07:51 -0700
Subject: [XML-SIG] Oddities
In-Reply-To: Message from teg@redhat.com (Trond Eivind
 =?iso-8859-1?q?Glomsr=F8d?=)
 of "22 Dec 2000 18:34:50 EST." <xuy4rzw6ndx.fsf@halden.devel.redhat.com>
Message-ID: <200012230407.VAA01167@localhost.localdomain>

> uche.ogbuji@fourthought.com writes:
> 
> > By "standard" I mean Linux standard.  I'm not sure if Solaris, etc. place docs 
> > at the same spot.  But for Linux, vendor-packaged docs go in
> > 
> > /usr/doc/<package-name>-<package-version>
> 
> It's /usr/share/doc/<package-name>-<package-version> now (FHS)
> 
> > /usr/local/doc/<package-name>-<package-version>
> > 
> > Actually, it looks as if Red Hat has started moving to the latter location for 
> > all docs.
> 
> No, we don't touch /usr/local at all.

Good to have an authority on the subject.  Thanks.

So it looks like  /usr/share/doc/<package-name>-<package-version> across the 
board now.  Does this settle the matter for Linux?

Any thoughts about other Unixen?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sat Dec 23 04:10:17 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 22 Dec 2000 21:10:17 -0700
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: Message from Paul Prescod <paulp@ActiveState.com>
 of "Fri, 22 Dec 2000 16:13:27 PST." <3A43EE27.BBC9D32A@ActiveState.com>
Message-ID: <200012230410.VAA01180@localhost.localdomain>

> Alexandre Fayolle wrote:
> > 
> > Since I believe not everybody on this list monitors the W3C website
> > closely (I, for one, do not), ...
> 
> > On Dec. 19th, XHTML Basic became a Recommendation.
> > On Dec. 20th, XLink and XML Base became Proposed Recommendations.
> 
> An even more interesting development is that a draft version of XSLT now
> has a formal mechanism for embedding other scripting languages. An
> example is at the bottom

Yes.  This was one of my more depressing discoveries of the month.

They couldn't just provide a node-set function, maybe some grouping 
primitives, and be done with XSLT 1.1.

Sigh.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 11:23:14 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 12:23:14 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr> (message
 from Alexandre Fayolle on Wed, 13 Dec 2000 18:45:25 +0100 (CET))
References: <Pine.LNX.4.21.0012131843370.1942-100000@leo.logilab.fr>
Message-ID: <200012231123.MAA01132@loewis.home.cs.tu-berlin.de>

> > To Ft/Dom/__init__.py and expected everything to break, but all
> > was well.  It seems that at least Python 2.0 is clever when the
> > same import can be made as a package and an object.  Is this also
> > the case with Python 1.5.2?
>
> I tried that with python 1.5.2 (adding a empty Node class to
> xml/dom/__init__.py, that is, adding an empty Node class) and it looks like it's fine too. 

Actually, there is a problem. If you do "import xml.dom.Node", then
you'll lose the class from __init__. Please see the attached example.

So I think the change of adding xml.dom.Node needs to be reverted
somehow.
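
The effect can also be reproduced without unpacking an archive. The
following is a sketch for a current Python, separate from the attached
example; the package name shadow_demo_pkg and the function
demo_shadowing are made up for illustration.

```python
import importlib
import os
import sys
import tempfile
import types


def demo_shadowing():
    """Create a throwaway package whose __init__ defines class A and which
    also contains a submodule A, then show what pkg.A refers to before and
    after the submodule is imported."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "shadow_demo_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("class A:\n    val = 3\n")
    with open(os.path.join(pkg_dir, "A.py"), "w") as f:
        f.write("val = 4\n")
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("shadow_demo_pkg")
        before = pkg.A                       # the class from __init__.py
        importlib.import_module("shadow_demo_pkg.A")
        after = pkg.A                        # now the submodule A.py
    finally:
        sys.path.remove(root)
    return before, after


if __name__ == "__main__":
    before, after = demo_shadowing()
    print(type(before), isinstance(after, types.ModuleType))
```

Importing the submodule rebinds the name A in the package namespace to
the module object, so the class defined in __init__.py is no longer
reachable under that name.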

Regards
Martin

#!/bin/sh
# This is a shell archive (produced by GNU sharutils 4.2).
# To extract the files from this archive, save it to some FILE, remove
# everything before the `!/bin/sh' line above, then type `sh FILE'.
#
# Made on 2000-12-23 12:22 CET by <martin@mira>.
# Source directory was `/home/martin/tmp/x'.
#
# Existing files will *not* be overwritten unless `-c' is specified.
#
# This shar contains:
# length mode       name
# ------ ---------- ------------------------------------------
#     19 -rw-r--r-- pack/__init__.py
#     19 -rw-r--r-- pack/A.py
#     61 -rw-r--r-- testing.py
#
save_IFS="${IFS}"
IFS="${IFS}:"
gettext_dir=FAILED
locale_dir=FAILED
first_param="$1"
for dir in $PATH
do
  if test "$gettext_dir" = FAILED && test -f $dir/gettext \
     && ($dir/gettext --version >/dev/null 2>&1)
  then
    set `$dir/gettext --version 2>&1`
    if test "$3" = GNU
    then
      gettext_dir=$dir
    fi
  fi
  if test "$locale_dir" = FAILED && test -f $dir/shar \
     && ($dir/shar --print-text-domain-dir >/dev/null 2>&1)
  then
    locale_dir=`$dir/shar --print-text-domain-dir`
  fi
done
IFS="$save_IFS"
if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED
then
  echo=echo
else
  TEXTDOMAINDIR=$locale_dir
  export TEXTDOMAINDIR
  TEXTDOMAIN=sharutils
  export TEXTDOMAIN
  echo="$gettext_dir/gettext -s"
fi
touch -am 1231235999 $$.touch >/dev/null 2>&1
if test ! -f 1231235999 && test -f $$.touch; then
  shar_touch=touch
else
  shar_touch=:
  echo
  $echo 'WARNING: not restoring timestamps.  Consider getting and'
  $echo "installing GNU \`touch', distributed in GNU File Utilities..."
  echo
fi
rm -f 1231235999 $$.touch
#
if mkdir _sh01060; then
  $echo 'x -' 'creating lock directory'
else
  $echo 'failed to create lock directory'
  exit 1
fi
# ============= pack/__init__.py ==============
if test ! -d 'pack'; then
  $echo 'x -' 'creating directory' 'pack'
  mkdir 'pack'
fi
if test -f 'pack/__init__.py' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'pack/__init__.py' '(file already exists)'
else
  $echo 'x -' extracting 'pack/__init__.py' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'pack/__init__.py' &&
class A:
X  val = 3
SHAR_EOF
  $shar_touch -am 12231215100 'pack/__init__.py' &&
  chmod 0644 'pack/__init__.py' ||
  $echo 'restore of' 'pack/__init__.py' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'pack/__init__.py:' 'MD5 check failed'
d0e22baa34ce648d02a5985bb626ca97  pack/__init__.py
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'pack/__init__.py'`"
    test 19 -eq "$shar_count" ||
    $echo 'pack/__init__.py:' 'original size' '19,' 'current size' "$shar_count!"
  fi
fi
# ============= pack/A.py ==============
if test -f 'pack/A.py' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'pack/A.py' '(file already exists)'
else
  $echo 'x -' extracting 'pack/A.py' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'pack/A.py' &&
class B:
X  val = 4
SHAR_EOF
  $shar_touch -am 12231215100 'pack/A.py' &&
  chmod 0644 'pack/A.py' ||
  $echo 'restore of' 'pack/A.py' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'pack/A.py:' 'MD5 check failed'
7c0bf0114ca239435403d33f3c475cb3  pack/A.py
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'pack/A.py'`"
    test 19 -eq "$shar_count" ||
    $echo 'pack/A.py:' 'original size' '19,' 'current size' "$shar_count!"
  fi
fi
# ============= testing.py ==============
if test -f 'testing.py' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'testing.py' '(file already exists)'
else
  $echo 'x -' extracting 'testing.py' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'testing.py' &&
import pack
print pack.A.val
import pack.A
print pack.A.val
X
SHAR_EOF
  $shar_touch -am 12231216100 'testing.py' &&
  chmod 0644 'testing.py' ||
  $echo 'restore of' 'testing.py' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'testing.py:' 'MD5 check failed'
045d5a097b0968507fce45f10d00c5b2  testing.py
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'testing.py'`"
    test 61 -eq "$shar_count" ||
    $echo 'testing.py:' 'original size' '61,' 'current size' "$shar_count!"
  fi
fi
rm -fr _sh01060
exit 0
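
The three files in the archive above can also be reproduced without
unpacking it.  The following is a minimal sketch, written for a current
Python 3 interpreter rather than the Python 1.5.2/2.0 under discussion;
the helper name demo() is made up for illustration.  It writes the same
pack/__init__.py and pack/A.py to a temporary directory and shows that
importing the submodule pack.A rebinds the name A on the package, so the
class defined in __init__.py is no longer reachable as pack.A.

```python
import importlib
import os
import shutil
import sys
import tempfile


def demo():
    """Return (pack.A.val before, kind of pack.A after, has 'val' after)."""
    # Forget earlier imports so the function can be called more than once.
    for name in ("pack.A", "pack"):
        sys.modules.pop(name, None)

    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "pack"))
    with open(os.path.join(root, "pack", "__init__.py"), "w") as f:
        f.write("class A:\n    val = 3\n")
    with open(os.path.join(root, "pack", "A.py"), "w") as f:
        f.write("class B:\n    val = 4\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        pack = importlib.import_module("pack")
        before = pack.A.val                  # class A from __init__.py
        importlib.import_module("pack.A")    # same effect as "import pack.A"
        after = type(pack.A).__name__        # what the name is bound to now
        still_has_val = hasattr(pack.A, "val")
    finally:
        sys.path.remove(root)
        shutil.rmtree(root, ignore_errors=True)
    return before, after, still_has_val
```

Calling demo() returns the value read from the class first, then the
kind of object that pack.A refers to once the submodule has been
imported, which is the shadowing described in the message.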


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 11:44:31 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 12:44:31 +0100
Subject: [XML-SIG] how to clean up parser without causing parsing?
In-Reply-To: <3A4404DC.D76CB60D@commerceflow.com> (message from Div Shekhar on
 Fri, 22 Dec 2000 17:50:20 -0800)
References: <3A4404DC.D76CB60D@commerceflow.com>
Message-ID: <200012231144.MAA01234@loewis.home.cs.tu-berlin.de>

> I thought of the following alternatives:
> 
> 1. have my handler set a flag, and then ignore further calls.
> 2. point the parser to a do nothing handler before calling close()
> 3. doing the following:  p.reset()  p.close()
> 
> But they're not as efficient. What should I be doing?

I suggest using PyXML 0.6 and the SAX2 xmlproc driver. AFAICT, it is
safe to just drop the reference to the parser (certainly in Python
2.0, where potential cycles are collected). The xmlproc driver is not
incremental, so the SAX2 version releases the underlying parser at the
end of parse(). If you release the reader object, that will in turn
release the references to your handlers.

So in short, you should write

! p = XMLParserFactory.make_parser( 'xml.sax.drivers.drv_xmlproc' )
! sp = MyHandler()
! p.setDocumentHandler( sp )   # other handlers left out for simplicity
! p.parseFile( file )
! p = None

> (P.S. Any chance of ExtendedParser adding a free() method? :)

Since it is an experimental interface, why not? Please submit patches
to sourceforge.net/projects/pyxml.

In Python, explicit memory management is normally not necessary. So
these methods are typically called close() or release().

Please note that adding the operation to the interface won't give you
anything; you'd also have to modify the existing parsers. I personally
won't change any of the existing SAX1 drivers; efforts should be put
into the SAX2 drivers, IMO. Also, PyXML 0.5 is no longer maintained,
so I'd apply any patches I get only to 0.6.x.

Regards,
Martin
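
As an illustration of "just drop the reference" (not code from the
thread): the sketch below uses the standard-library xml.sax reader in a
current Python 3 instead of the PyXML xmlproc driver, so it only
approximates the setup described above.  CountingHandler and
parse_and_release() are hypothetical names.  A weak reference to the
handler shows that nothing keeps it alive once the parser and the
handler variables are cleared.

```python
import gc
import io
import weakref
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start tags; stands in for an application handler."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def parse_and_release(data):
    parser = xml.sax.make_parser()
    handler = CountingHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))

    count = handler.elements
    handler_ref = weakref.ref(handler)

    # No explicit close()/free(): dropping the references is enough.
    parser = None
    handler = None
    gc.collect()  # also collects any reference cycles that remain

    return count, handler_ref() is None
```

The explicit gc.collect() is only there to make the check independent
of whether the reader happens to contain reference cycles.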


From akuchlin@mems-exchange.org  Sat Dec 23 13:57:53 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Sat, 23 Dec 2000 08:57:53 -0500
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: <200012230410.VAA01180@localhost.localdomain>; from uche.ogbuji@fourthought.com on Fri, Dec 22, 2000 at 09:10:17PM -0700
References: <paulp@ActiveState.com> <200012230410.VAA01180@localhost.localdomain>
Message-ID: <20001223085753.A11534@newcnri.cnri.reston.va.us>

On Fri, Dec 22, 2000 at 09:10:17PM -0700, uche.ogbuji@fourthought.com wrote:
>They couldn't just provide a node-set function, maybe some grouping 
>primitives, and be done with XSLT 1.1.

Lots of people on W3C mailing lists do seem hell-bent on giving the
world another example of rampant overcomplexity to put on the shelf
next to the OSI protocols.  (For me, it was XSchema: two documents
specify it, and they're around 400K and 600K of HTML.  Don't hold your
breath waiting for a Python implementation...)

--amk


From uche.ogbuji@fourthought.com  Sat Dec 23 16:45:32 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 23 Dec 2000 09:45:32 -0700
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: Message from Andrew Kuchling <akuchlin@cnri.reston.va.us>
 of "Sat, 23 Dec 2000 08:57:53 EST." <20001223085753.A11534@newcnri.cnri.reston.va.us>
Message-ID: <200012231645.JAA02928@localhost.localdomain>

> On Fri, Dec 22, 2000 at 09:10:17PM -0700, uche.ogbuji@fourthought.com wrote:
> >They couldn't just provide a node-set function, maybe some grouping 
> >primitives, and be done with XSLT 1.1.
> 
> Lots of people on W3C mailing lists do seem hell-bent on giving the
> world another example of rampant overcomplexity to put on the shelf
> next to the OSI protocols.  (For me, it was XSchema: two documents
> specify it, and they're around 400K and 600K of HTML.  Don't hold your
> breath waiting for a Python implementation...)

Yeah.  I just had my moment with XSchema while wrestling with SOAP.  I'd 
always been familiar with it, and that's why I had always shunned it for 
Schematron, but now that I've got even more up close and personal with XSchema, 
I think I can say I've never seen a worse example of an overwrought 
specification since ANSI STD C++.

I know that a Python version of XSchema is unlikely to come from this quarter. 
 We're happily chugging away with Schematron.  Now we use it through XSLT, but 
we might consider writing a pure Python engine for it (a Perl engine was 
recently announced).


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sat Dec 23 16:49:37 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 23 Dec 2000 09:49:37 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sat, 23 Dec 2000 12:23:14 +0100." <200012231123.MAA01132@loewis.home.cs.tu-berlin.de>
Message-ID: <200012231649.JAA02948@localhost.localdomain>

> > > To Ft/Dom/__init__.py and expected everything to break, but all
> > > was well.  It seems that at least Python 2.0 is clever when the
> > > same import can be made as a package and an object.  Is this also
> > > the case with Python 1.5.2?
> >
> > I tried that with python 1.5.2 (adding an empty Node class to
> > xml/dom/__init__.py) and it looks like it's fine too. 
> 
> Actually, there is a problem. If you do "import xml.dom.Node", then
> you'll lose the class from __init__. Please see the attached example.

OK, not so fast, mate.  Do you really think we'll let yer out so easily after 
yer talked us into this?

Seriously, after a quick survey of my code, the only place I import Node is in 
order to get at the constants.

I think we can deal with the problem you mentioned by renaming 
xml/dom/Node.py, perhaps to xml/dom/FtNode.py, and then adjusting all the 
internal 4DOM imports accordingly.

This should break little existing code and it would keep the benefit of being 
able to share the constants and any other material we need for normalization 
across the DOM implementations.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Sat Dec 23 16:50:49 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 23 Dec 2000 11:50:49 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012231649.JAA02948@localhost.localdomain>
References: <martin@loewis.home.cs.tu-berlin.de>
 <200012231123.MAA01132@loewis.home.cs.tu-berlin.de>
 <200012231649.JAA02948@localhost.localdomain>
Message-ID: <14916.55273.8789.573578@cj42289-a.reston1.va.home.com>

uche.ogbuji@fourthought.com writes:
 > I think we can deal with the problem you mentioned by re-naming
 > xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all
 > the internal 4DOM imports accordingly.

  I think this is the best solution.

 > This should break little existing code and it would keep the
 > benefit of being able to share the constants and any other material
 > we need for normalization across the DOM implementations.

  Especially since this is more important than being able to access
the implementation class via import!  There's a factory method on the
Document object, so there's no need to import the class from outside
the DOM implementation.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations
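
A small sketch of the two points above, using the standard-library
xml.dom.minidom as a stand-in for 4DOM (an assumption; the thread is
about the PyXML/4DOM packages): nodes are created through the factory
methods on the Document object, and the shared node-type constants are
read from xml.dom.Node, so no implementation class is imported.  The
function name build_doc() is illustrative only.

```python
from xml.dom import Node
from xml.dom.minidom import getDOMImplementation


def build_doc():
    """Build <root><item>hello</item></root> via factory methods only."""
    impl = getDOMImplementation()
    doc = impl.createDocument(None, "root", None)

    item = doc.createElement("item")              # factory on Document
    item.appendChild(doc.createTextNode("hello"))
    doc.documentElement.appendChild(item)
    return doc


def describe(node):
    """Classify a node using the shared constants, not its class."""
    if node.nodeType == Node.ELEMENT_NODE:
        return "element"
    if node.nodeType == Node.TEXT_NODE:
        return "text"
    return "other"
```

Because describe() compares nodeType against the constants, it would
work unchanged with any DOM implementation that follows the same
numbering.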


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 21:10:39 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 22:10:39 +0100
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: <20001223085753.A11534@newcnri.cnri.reston.va.us> (message from
 Andrew Kuchling on Sat, 23 Dec 2000 08:57:53 -0500)
References: <paulp@ActiveState.com> <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us>
Message-ID: <200012232110.WAA00730@loewis.home.cs.tu-berlin.de>

> Lots of people on W3C mailing lists do seem hell-bent on giving the
> world another example of rampant overcomplexity to put on the shelf
> next to the OSI protocols.

My feelings exactly. That's what you get when you try to extend an
architecture to do things it was not supposed to do...

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 21:17:07 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 22:17:07 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012231649.JAA02948@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012231649.JAA02948@localhost.localdomain>
Message-ID: <200012232117.WAA00775@loewis.home.cs.tu-berlin.de>

> I think we can deal with the problem you mentioned by re-naming
> xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all
> the internal 4DOM imports accordingly.

That would solve the problem as well, so I'm all for it. Any proposal
for a new module name?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 21:45:18 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 22:45:18 +0100
Subject: [XML-SIG] Python>=1.6 SIMPLE encoding support patch for pyexpat
In-Reply-To: <200012220816.QAA08820@monster.icc.ru> (message from Evgeny
 Cherkashin on Fri, 22 Dec 2000 17:19:11 +0800)
References: <200012220816.QAA08820@monster.icc.ru>
Message-ID: <200012232145.WAA01155@loewis.home.cs.tu-berlin.de>

> Please find patch to support python encodings by pyexpat.
> Is it possible to include it in next release of PyXML?

Dear Evgeni,

I always wanted to have that feature in pyexpat, so I'm glad you wrote
the code. I've applied it to the CVS tree, so it will appear in the
next release.

Thanks for contributing,
Martin


From fdrake@acm.org  Sat Dec 23 22:10:24 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 23 Dec 2000 17:10:24 -0500 (EST)
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: <200012232110.WAA00730@loewis.home.cs.tu-berlin.de>
References: <paulp@ActiveState.com>
 <200012230410.VAA01180@localhost.localdomain>
 <20001223085753.A11534@newcnri.cnri.reston.va.us>
 <200012232110.WAA00730@loewis.home.cs.tu-berlin.de>
Message-ID: <14917.8912.749393.373487@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > My feelings exactly. That's what you get when you try to extend an
 > architecture to do things it was not supposed to do...

  I'm afraid the DOM isn't faring much better.  ;-(


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From seven.nine@gte.net  Sat Dec 23 22:23:50 2000
From: seven.nine@gte.net (Chris Jones)
Date: Sat, 23 Dec 2000 14:23:50 -0800
Subject: [XML-SIG] New stuff on w3.org
References: <paulp@ActiveState.com> <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us>
Message-ID: <3A4525F6.8010906@gte.net>

Forgive the abrupt de-cloak... but this is nice to hear...  I'm diving 
quite deeply into implementing Python with PyXML, and was really 
wondering what you (the creators) think the core aspects of PyXML are-- 
I'm really banking on it, think it's a great API, and would like to know 
where you're headed.  When any organization is going to dive deep into a 
technology, questions (and FUD) inevitably arise about the longevity and 
direction of the technologies you're using.  I agree that complexity for 
complexity's sake is the fastest way to kill an API, protocol, or standard.

Anyone care to speak up about what they think the core functionality of 
PyXML should be for the long term (in this world I think that's about 6 
to 9 months)?

Thanks in advance,
Chris Jones
Consultant
<seven.nine@gte.net>

Andrew Kuchling wrote:

> On Fri, Dec 22, 2000 at 09:10:17PM -0700, uche.ogbuji@fourthought.com wrote:
> 
>> They couldn't just provide a node-set function, maybe some grouping 
>> primitives, and be done with XSLT 1.1.
> 
> 
> Lots of people on W3C mailing lists do seem hell-bent on giving the
> world another example of rampant overcomplexity to put on the shelf
> next to the OSI protocols.  (For me, it was XSchema: two documents
> specify it, and they're around 400K and 600K of HTML.  Don't hold your
> breath waiting for a Python implementation...)
> 
> --amk


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 22:57:28 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 23:57:28 +0100
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: <14917.8912.749393.373487@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <paulp@ActiveState.com>
 <200012230410.VAA01180@localhost.localdomain>
 <20001223085753.A11534@newcnri.cnri.reston.va.us>
 <200012232110.WAA00730@loewis.home.cs.tu-berlin.de> <14917.8912.749393.373487@cj42289-a.reston1.va.home.com>
Message-ID: <200012232257.XAA01573@loewis.home.cs.tu-berlin.de>

>  > My feelings exactly. That's what you get when you try to extend an
>  > architecture to do things it was not supposed to do...
> 
>   I'm afraid the DOM isn't faring much better.  ;-(

It just occurred to me that this is an application of the Peter
Principle.  A good technology results in users asking for more, so it
is extended and extended until it reaches its level of incompetence.

Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Dec 23 22:56:29 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 23 Dec 2000 23:56:29 +0100
Subject: [XML-SIG] Better pyexpat backtraces
Message-ID: <200012232256.XAA01648@loewis.home.cs.tu-berlin.de>

Since a number of people have run into the trap of thinking that Parse
is called with the wrong number of arguments, I just checked in a patch to
pyexpat that adds an artificial frame object on the stack. With that,
if you pass a DocumentHandler in place of a ContentHandler, you now
get a back-trace that reads

Traceback (most recent call last):
  File "a.py", line 48, in ?
    parser.parse( comic_xml )
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed
    self._parser.Parse(data, isFinal)
  File "pyexpat.c", line 370, in CharacterData
TypeError: not enough arguments to characters(); expected 4, got 2

Normally, you would not get a stack frame that points to pyexpat.c;
please let me know what you think.

The "to characters()" part is not my doing; that is a Python 2.1
feature.

Regards,
Martin
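
The mistake behind that traceback can be reproduced with a few lines.
This is a sketch against the standard-library xml.sax in a current
Python 3, where the wording of the TypeError differs from the one shown
above and no pyexpat.c frame appears; Sax1StyleHandler and try_parse()
are names invented for the example.

```python
import io
import xml.sax


class Sax1StyleHandler(xml.sax.ContentHandler):
    # SAX1 DocumentHandler signature: text arrives as a buffer plus an
    # offset and a length, i.e. four arguments counting self.
    def characters(self, ch, start, length):
        pass


def try_parse(data=b"<doc>some text</doc>"):
    """Return the TypeError message, or None if parsing succeeded."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(Sax1StyleHandler())
    try:
        # The SAX2 reader calls characters(content) with one argument.
        parser.parse(io.BytesIO(data))
    except TypeError as exc:
        return str(exc)
    return None
```

The error is raised inside the reader's feed step, which is why it is
easy to misread as a problem with the call to Parse itself.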


From uche.ogbuji@fourthought.com  Sun Dec 24 02:08:45 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 23 Dec 2000 19:08:45 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sat, 23 Dec 2000 22:17:07 +0100." <200012232117.WAA00775@loewis.home.cs.tu-berlin.de>
Message-ID: <200012240208.TAA01782@localhost.localdomain>

> > I think we can deal with the problem you mentioned by re-naming
> > xml/dom/Node.py, perhaps to xml/dom/FtNode.py and then adjust all
> > the internal 4DOM imports accordingly.
> 
> That would solve the problem as well, so I'm all for it. Any proposal
> for a new module name?

"xml/dom/FtNode.py", since a module name can't start with "4".


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sun Dec 24 02:21:40 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 23 Dec 2000 19:21:40 -0700
Subject: [XML-SIG] Python>=1.6 SIMPLE encoding support patch for pyexpat
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sat, 23 Dec 2000 22:45:18 +0100." <200012232145.WAA01155@loewis.home.cs.tu-berlin.de>
Message-ID: <200012240221.TAA01835@localhost.localdomain>

> > Please find patch to support python encodings by pyexpat.
> > Is it possible to include it in next release of PyXML?
> 
> Dear Evgeni,
> 
> I always wanted to have that feature in pyexpat, so I'm glad you wrote
> the code. I've applied it to the CVS tree, so it will appear in the
> next release.
> 
> Thanks for contributing,
> Martin

Seconded.  Now folks can process XML with all the great unicode codecs folks 
have been contributing without needing to go through Python to convert to 
Unicode first.

Much appreciated.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
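
For illustration only (this is not the patch discussed above): in a
current Python 3, the standard-library expat binding can parse a
document declared in a single-byte encoding that expat itself does not
know, by consulting Python's codec registry.  The sketch assumes that
behaviour and uses KOI8-R as an arbitrary example; parse_text() and
SAMPLE are hypothetical names.

```python
import xml.parsers.expat

# "Privet" in Cyrillic, written with escapes to keep this listing ASCII.
WORD = "\u041f\u0440\u0438\u0432\u0435\u0442"

# The document is encoded as KOI8-R bytes and says so in its declaration.
SAMPLE = (
    '<?xml version="1.0" encoding="koi8-r"?><doc>' + WORD + "</doc>"
).encode("koi8-r")


def parse_text(document_bytes):
    """Return the character data of the document as one str."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # expat may deliver text in several pieces, so collect and join.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document_bytes, True)
    return "".join(chunks)
```

The caller hands over raw bytes and gets Unicode text back, with no
separate decoding step in Python code.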


From tpassin@home.com  Sun Dec 24 03:04:13 2000
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 23 Dec 2000 22:04:13 -0500
Subject: [XML-SIG] Holiday Best Wishes
References: <paulp@ActiveState.com> <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <3A4525F6.8010906@gte.net>
Message-ID: <001d01c06d56$320b72c0$7cac1218@reston1.va.home.com>

Happy Holidays to everyone on the list.    It's been a privilege to share your
knowledge and contributions this year.

Cheers,

Tom P


From ken@bitsko.slc.ut.us  Tue Dec 26 17:49:41 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 26 Dec 2000 11:49:41 -0600
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: "Martin v. Loewis"'s message of "Fri, 15 Dec 2000 22:06:23 +0100"
References: <200012141527.IAA18240@localhost.localdomain>
 <3A3A2F3C.8AE8A27E@FourThought.com> <x7ae9xpob5.fsf@bitsko.slc.ut.us>
 <200012152106.WAA00918@loewis.home.cs.tu-berlin.de>
Message-ID: <x7elyv9ioa.fsf@bitsko.slc.ut.us>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> > At the plug-in API level, I'd be interested in something more at
> > the "location path" level, possibly an array of steps, each step
> > with axis, node test, and list of predicates.
> 
> Yes, that would be a reasonable XPath API. How do you like the
> 4Suite ParsedLocationPath class, and corresponding structures?

Likely! :-) I briefly skimmed the source and 4suite.org and can't seem
to find a good description of what those structures look like.  Is there
a URL I missed?

Note also: I'm getting odd URL redirects going to 4suite.{org|com},
with URLs being replaced with quoted strings that then won't resolve:

  http://www.4suite.org/
    --> http://www.4suite.org/"index.epy"

This seems to happen on "directory" URLs.

  -- Ken
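
To make the "array of steps" idea concrete, here is a minimal sketch of
such a structure.  It is not the 4Suite ParsedLocationPath class, whose
layout is not described in this thread; Step, LocationPath and
to_xpath() are hypothetical names, and predicates are kept as plain
strings for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    axis: str                     # e.g. "child", "descendant", "attribute"
    node_test: str                # e.g. "title", "*", "text()"
    predicates: List[str] = field(default_factory=list)

    def to_xpath(self):
        preds = "".join("[%s]" % p for p in self.predicates)
        return "%s::%s%s" % (self.axis, self.node_test, preds)


@dataclass
class LocationPath:
    steps: List[Step]
    absolute: bool = False

    def to_xpath(self):
        prefix = "/" if self.absolute else ""
        return prefix + "/".join(step.to_xpath() for step in self.steps)


# /child::book/child::title[position()=1]
EXAMPLE = LocationPath(
    steps=[
        Step("child", "book"),
        Step("child", "title", ["position()=1"]),
    ],
    absolute=True,
)
```

A plug-in could walk EXAMPLE.steps and translate each step on its own,
without having to parse XPath syntax again.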


From ken@bitsko.slc.ut.us  Tue Dec 26 23:10:38 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 26 Dec 2000 17:10:38 -0600
Subject: [XML-SIG] Better pyexpat backtraces
In-Reply-To: "Martin v. Loewis"'s message of "Sat, 23 Dec 2000 23:56:29 +0100"
References: <200012232256.XAA01648@loewis.home.cs.tu-berlin.de>
Message-ID: <x7k88m93td.fsf@bitsko.slc.ut.us>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> Since a number of people have run into the trap of thinking that Parse
> is called with a bad argument number, I just checked-in a patch to
> pyexpat that adds an artificial frame object on the stack. With that,
> if you pass a DocumentHandler in place of a ContentHandler, you now
> get a back-trace that reads
> 
> Traceback (most recent call last):
>   File "a.py", line 48, in ?
>     parser.parse( comic_xml )
>   File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 43, in parse
>     xmlreader.IncrementalParser.parse(self, source)
>   File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/xmlreader.py", line 120, in parse
>     self.feed(buffer)
>   File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed
>     self._parser.Parse(data, isFinal)
>   File "pyexpat.c", line 370, in CharacterData
> TypeError: not enough arguments to characters(); expected 4, got 2
> 
> Normally, you would not get a stack frame that points to pyexpat.c;
> please let me know what you think.
> 
> The "to characters()" part is not my doing; that is a Python 2.1
> feature.

But that is correct and the intended error message, right?  Passing a
DocumentHandler to a SAX2 parser will result in characters() being
called with "only" two arguments when a SAX1 handler expects four.

Just checking what you meant there.

  -- Ken
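
One way to write a handler that tolerates both calling conventions is
to give the extra SAX1 parameters default values, as in the characters()
method quoted earlier in this archive.  The sketch below is an
adaptation for a current Python 3 and the standard-library xml.sax;
DualHandler is a made-up name.

```python
import xml.sax


class DualHandler(xml.sax.ContentHandler):
    """Accumulates text whether called SAX1-style or SAX2-style."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def characters(self, ch, start=0, length=-1):
        if length == -1:
            # SAX2: ch is exactly the text to append.
            self.text = self.text + ch
        else:
            # SAX1: ch is a buffer; only the given slice is meaningful.
            self.text = self.text + ch[start:start + length]
```

With this signature a SAX2 reader can call characters(content) and a
SAX1 driver can call characters(buffer, start, length) on the same
object.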


From uche.ogbuji@fourthought.com  Wed Dec 27 01:20:52 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 26 Dec 2000 18:20:52 -0700
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: Message from Ken MacLeod <ken@bitsko.slc.ut.us>
 of "26 Dec 2000 11:49:41 CST." <x7elyv9ioa.fsf@bitsko.slc.ut.us>
Message-ID: <200012270120.SAA02777@localhost.localdomain>

> "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:
> 
> > > At the plug-in API level, I'd be interested in something more at
> > > the "location path" level, possibly an array of steps, each step
> > > with axis, node test, and list of predicates.
> > 
> > Yes, that would be a reasonable XPath API. How do you like the
> > 4Suite ParsedLocationPath class, and corresponding structures?
> 
> Likely! :-) I briefly skimmed the source and 4suite.org and can't seem
> to get a good description of what those structures look like, is there
> a URL I missed?

There is no such beast.  These were originally intended to be purely internal 
objects.  If we decided to expose them as an API, we'd want to decide on the 
naming (Martin doesn't like the "Parsed" prefixes, I'm +0 on killing them) and 
document them properly.

For now, your best bet is to have a look at XPath/Parsed* in 4Suite (and also 
check out Xslt/Parsed* for the associated Pattern machine objects).

> Note also: I'm getting odd URL redirects going to 4suite.{org|com},
> with URLs being replaced with quoted strings that then won't resolve:
> 
>   http://www.4suite.org/
>     --> http://www.4suite.org/"index.epy"
> 
> This seems to happen on "directory" URLs.

Hmm.  I looked into this, but I'm not seeing it.  I went as bare-bones as 
possible to avoid user agent artifacts and all that:

[uogbuji@borgia uogbuji]$ telnet www.4suite.org 80
Trying 204.144.146.184...
Connected to dollar.4suite.org.
Escape character is '^]'.
GET http://www.4suite.org/ HTTP/1.0

HTTP/1.1 200 OK
Date: Wed, 27 Dec 2000 01:14:59 GMT
Server: Apache/1.3.12 (Unix) mod_snake/0.4.1
Last-Modified: Thu, 02 Nov 2000 19:07:30 GMT
ETag: "36f0d-178-3a01bb72"
Accept-Ranges: bytes
Content-Length: 376
Connection: close
Content-Type: text/html

<html>
<head>
  <meta http-equiv='Content-Type' content='text/html'>
  <meta http-equiv='Refresh' content='1;URL="index.epy"'>
</head>
<body>

  <TABLE WIDTH="100%" HEIGHT="100%">
    <TR>
      <TD ALIGN="CENTER">  <img src="images/4suite-org.gif"/><BR>
         <FONT SIZE="+1"><A HREF="index.epy">Click to Enter</A></FONT>
      </TD>
    </TR>
  </TABLE>

</body>
</html>
Connection closed by foreign host.
[uogbuji@borgia uogbuji]$ 

As you can see, the meta refresh goes to the relative "index.epy".  I don't 
know how this would cause the effect you mention.  What user agent are you 
using?

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
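
One possible explanation, which the thread does not confirm, is that
the user agent keeps the quotation marks around the URL in the Refresh
content instead of stripping them.  The sketch below shows how the two
readings differ; refresh_target() is a hypothetical helper, and real
browsers of the time may have behaved differently.

```python
from urllib.parse import urljoin


def refresh_target(base_url, content, strip_quotes=True):
    """Resolve the URL in a meta Refresh value such as '1;URL="x"'."""
    _delay, _sep, rest = content.partition(";")
    rest = rest.strip()
    if rest[:4].lower() == "url=":
        rest = rest[4:]
    if strip_quotes:
        rest = rest.strip("\"'")
    return urljoin(base_url, rest)


BASE = "http://www.4suite.org/"
CONTENT = '1;URL="index.epy"'   # value taken from the response above

lenient = refresh_target(BASE, CONTENT)
literal = refresh_target(BASE, CONTENT, strip_quotes=False)
```

Here lenient resolves to the index page, while literal keeps the quotes
in the path and matches the broken redirect reported earlier.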


From akuchlin@mems-exchange.org  Wed Dec 27 16:26:05 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 27 Dec 2000 11:26:05 -0500
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: <3A4525F6.8010906@gte.net>; from seven.nine@gte.net on Sat, Dec 23, 2000 at 02:23:50PM -0800
References: <paulp@ActiveState.com> <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <3A4525F6.8010906@gte.net>
Message-ID: <20001227112605.C31745@kronos.cnri.reston.va.us>

On Sat, Dec 23, 2000 at 02:23:50PM -0800, Chris Jones wrote:
>Anyone care to speak up about what they think the core functionality of 
>PyXML should be for the long term (in this world I think that's about 6 
>to 9 months)?

Beats me; it's whatever people choose to implement and contribute.  To
pursue the XSchema example, I'm sure that if someone implemented
XSchema for Python, it would certainly be considered for inclusion.
But no one has said publicly that they're working on such support or
released any code.  This is how free software projects work; usually
there's no plan, so you can't say what will happen over the next 6
months.  If a feature -- XSchema, XSLT, whatever -- matters to you,
you can help implement it and rewrite the plan yourself, but
prediction is essentially impossible.  (At the last Python conference
Guido had a set of slides with new features for 1.6 and 2.0; some of
those features made it in, but several others didn't.)

--amk


From sean@digitome.com  Wed Dec 27 18:04:25 2000
From: sean@digitome.com (Sean McGrath)
Date: Wed, 27 Dec 2000 18:04:25 +0000
Subject: [XML-SIG] New stuff on w3.org
Message-ID: <4.3.2.7.0.20001227180351.00ba8ee0@www.digitome.com>

[Andrew Kuchling]

>Beats me; it's whatever people choose to implement and contribute.  To
>pursue the XSchema example, I'm sure that if someone implemented
>XSchema for Python, it would certainly be considered for inclusion.
>But no one has said publicly that they're working on such support or
>released any code.

Henry Thompson's XSL is an XSchema validator written in Python.
Source is available. See:
         http://www.ltg.ed.ac.uk/~ht/xsv-status.html

Sean


From sean@digitome.com  Wed Dec 27 18:08:25 2000
From: sean@digitome.com (Sean McGrath)
Date: Wed, 27 Dec 2000 18:08:25 +0000
Subject: Freudian slip alert (Was: Re: [XML-SIG] New stuff on w3.org)
Message-ID: <4.3.2.7.0.20001227180604.00ba54f0@www.digitome.com>

Of course, I meant "XSV" not "XSL" in my list posting. Sorry.

Henry Thompson's Python implementation of an XSchema
validator is XSV, not XSL.

Sean

-------
[Andrew Kuchling]

>Beats me; it's whatever people choose to implement and contribute.  To
>pursue the XSchema example, I'm sure that if someone implemented
>XSchema for Python, it would certainly be considered for inclusion.
>But no one has said publicly that they're working on such support or
>released any code.

Henry Thompson's XSV is an XSchema validator written in Python.
Source is available. See:
         http://www.ltg.ed.ac.uk/~ht/xsv-status.html

Sean


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 28 10:01:42 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 28 Dec 2000 11:01:42 +0100
Subject: [XML-SIG] New stuff on w3.org
In-Reply-To: <3A4525F6.8010906@gte.net> (message from Chris Jones on Sat, 23
 Dec 2000 14:23:50 -0800)
References: <paulp@ActiveState.com> <200012230410.VAA01180@localhost.localdomain> <20001223085753.A11534@newcnri.cnri.reston.va.us> <3A4525F6.8010906@gte.net>
Message-ID: <200012281001.LAA00943@loewis.home.cs.tu-berlin.de>

> Forgive the abrupt de-cloak... but this is nice to hear...  I'm diving 
> quite deeply into implementing Python with PyXML, and was really 
> wondering what you (the creators) think the core aspects of PyXML are-- 
> I'm really banking on it, think its a great API, and would like to know 
> where you're headed.  

To me, the core parts of PyXML are the parsers (expat and xmlproc), and
the parser APIs (SAX and DOM); for all of those, you'll see
improvements in upcoming releases.

> Anyone care to speak up about what they think the core functionality
> of PyXML should be for the long-term (in this world I think thats
> about 6 to 9 months)?

As amk explained, free software lives on user contributions. Without
any contributions, PyXML will look essentially the same in 9 months as
it does today. There is a chance that we will start distributing more
parts of 4Suite in PyXML, in addition to 4DOM; these parts would most
likely be 4XPath and 4XSLT.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 28 10:39:13 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 28 Dec 2000 11:39:13 +0100
Subject: Freudian slip alert (Was: Re: [XML-SIG] New stuff on w3.org)
In-Reply-To: <4.3.2.7.0.20001227180604.00ba54f0@www.digitome.com> (message
 from Sean McGrath on Wed, 27 Dec 2000 18:08:25 +0000)
References: <4.3.2.7.0.20001227180604.00ba54f0@www.digitome.com>
Message-ID: <200012281039.LAA01170@loewis.home.cs.tu-berlin.de>

> Henry Thompson's Python implementation of an XSChema
> validator is XSV, not XSL.

Thanks for the pointer; I've added a link on the PyXML "other
software" page.

Martin


From larsga@garshol.priv.no  Thu Dec 28 11:47:28 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 28 Dec 2000 12:47:28 +0100
Subject: [XML-SIG] saxtools package
Message-ID: <m3k88kojhr.fsf@lambda.garshol.priv.no>

I'll start working on the saxtools package once my book is done and
the new year begins. Meanwhile, I'll need to refer to it from the
book, and so it needs a package name. To me xml.saxtools seems like
the obvious solution.

What say ye?

--Lars M.


From larsga@garshol.priv.no  Thu Dec 28 11:59:37 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 28 Dec 2000 12:59:37 +0100
Subject: [XML-SIG] Using SP in Python
Message-ID: <m3ito4oixi.fsf@lambda.garshol.priv.no>

I've written a simple wrapper for the SP SGML parser's generic API and
also a SAX driver for that wrapper.  The SAX driver probably belongs
in saxtools and will be placed there.  The SP wrapper is perhaps
better off as a separate project, but if anyone feels it belongs in the
XML-SIG, I'll be happy to reconsider.

Appended are a sample application that emits ESIS, the C module and
the SAX driver, in that order.
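
As a reading aid for the C module: the wrapper looks each callback up
by name on the application object, skips names the object does not
define, and treats a non-callable attribute as an error. A rough
pure-Python sketch of that lookup rule follows; the dispatch() helper
and the Recorder class are hypothetical names used only for
illustration and are not part of pysp.

```python
def dispatch(app, name, *args):
    """Mimic how the wrapper hands one event to the application object."""
    try:
        callback = getattr(app, name)
    except AttributeError:
        return False            # handler not defined: the event is dropped
    if not callable(callback):
        raise TypeError("callback attribute must be callable")
    callback(*args)
    return True


class Recorder:
    """Hypothetical handler that only cares about element boundaries."""

    def __init__(self):
        self.events = []

    def start_element(self, name, attrs):
        self.events.append("(" + name)

    def end_element(self, name):
        self.events.append(")" + name)


rec = Recorder()
dispatch(rec, "start_element", "PARA", {})
dispatch(rec, "data", "some text")      # ignored: Recorder has no data()
dispatch(rec, "end_element", "PARA")
```

This is also why an object with no methods at all can be handed to
run(): every event is simply skipped.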

Comments of all kinds would be welcome.

--Lars M.


======================================================================

import pysp

class EsisHandler:
    
    def start_element(self, name, attrs):
        print "(" + name
        for pair in attrs.items():
            print "A%s %s" % pair

    def error(self, msg):
        print "E" + msg
            
    def data(self, data):
        print "-" + repr(data)

    def sdata(self, text, name):
        print "[" + text + " " + name

    def pi(self, data):
        print "?" + data
        
    def end_element(self, name):
        print ")" + name

class Empty:
    pass
        
pysp.add_catalog("/home/larsga/data/catalog")
parser = pysp.make_parser("/home/larsga/cvs-co/data/book/bok.sgml")
# parser.run(Empty()) would parse the document without printing anything
parser.run(EsisHandler())

======================================================================


/**
 * A wrapper module for the generic API of the SP SGML parser.
 *
 * $Id$
 */

/**
 * Todo:
 * - implement more events
 * - support more SP options
 * - better support for attributes through dedicated attribute type?
 * - let parser use an internal dictionary to intern element and attr names?
 */

#include "Python.h"

// define this if your libsp.a has been built with multibyte support
// (this is the default)
// undefine it if it has not
// if you fail to define this and libsp.a _does_ have multibyte support
// all your element and attribute names will be one character long...
#define SP_MULTI_BYTE 1

#include "ParserEventGeneratorKit.h"

// defines SP_VERSION as SP_T("x.x.x")
#include "version.h"
#define SP_T(x) x

static char pysp_module_documentation[] =
  "Python wrapper for the generic API of the SP SGML parser.";

/* ----------------------------------------------------------------------
   INTERNAL STUFF
 */

ParserEventGeneratorKit parserGenerator;

/* ----------------------------------------------------------------------
   UTILITIES
 */

char* extract_string(const SGMLApplication::CharString &string) {
  char* str = new char[string.len + 1];
  for (int ix = 0; ix < string.len; ix++)
    str[ix] = char(string.ptr[ix]);
  str[string.len] = 0;

  return str;
}

void extract_string(char* buffer, const SGMLApplication::CharString &string) {
  for (int ix = 0; ix < string.len; ix++)
    buffer[ix] = char(string.ptr[ix]);  
}

/* ----------------------------------------------------------------------
   SGML APPLICATION
 */

class PYSPApplication : public SGMLApplication {
 public:
  PYSPApplication(PyObject *_pyapp, EventGenerator *_eventGen) {
    Py_INCREF(_pyapp);
    pyapp = _pyapp;
    eventGen = _eventGen;
    position = NULL;
    openEntity = NULL;
  }

  void openEntityChange(const OpenEntityPtr &event) {
    openEntity = (OpenEntityPtr*) &event;
  }

  void startElement(const StartElementEvent &event) {
    position = (Position*) &event.pos;
    char *gi = extract_string(event.gi);
    PyObject *attrs = PyDict_New();
    for (size_t ix = 0; ix < event.nAttributes; ix++) {
      if (event.attributes[ix].type != Attribute::implied &&
	  event.attributes[ix].type != Attribute::invalid) {
	char *name = extract_string(event.attributes[ix].name);
	// PyDict_SetItemString does not steal the value reference
	PyObject *value = getValue(event.attributes[ix]);
	PyDict_SetItemString(attrs, name, value);
	Py_XDECREF(value);
	delete[] name;
      }
    }
    PyObject *arglist = Py_BuildValue("(sO)", gi, attrs);
    Py_DECREF(attrs); // the "O" format took its own reference

    handleCallback("start_element", arglist);
    
    delete[] gi;
  }

  void data(const DataEvent &event) {
    position = (Position*) &event.pos;
    char *data = extract_string(event.data);    
    PyObject *arglist = Py_BuildValue("(s)", data);

    handleCallback("data", arglist);
    
    delete[] data;
  }

  void sdata(const SdataEvent &event) {
    position = (Position*) &event.pos;
    char *text = extract_string(event.text);    
    char *name = extract_string(event.entityName);    
    PyObject *arglist = Py_BuildValue("(ss)", text, name);

    handleCallback("sdata", arglist);
    
    delete[] text;
    delete[] name;
  }
  
  void endElement(const EndElementEvent &event) {
    position = (Position*) &event.pos;
    char *gi = extract_string(event.gi);    
    PyObject *arglist = Py_BuildValue("(s)", gi);

    handleCallback("end_element", arglist);
    
    delete[] gi;
  }

  void pi(const PiEvent &event) {
    position = (Position*) &event.pos;
    char *data = extract_string(event.data);    
    PyObject *arglist = Py_BuildValue("(s)", data);

    handleCallback("pi", arglist);
    
    delete[] data;
  }
  
  void error(const ErrorEvent &event) {
    position = (Position*) &event.pos;
    char* msg = extract_string(event.message);    
    PyObject *arglist = Py_BuildValue("(s)", msg);

    handleCallback("error", arglist);

    delete[] msg;
  }

  Location* getLocation() {
    return new Location(*openEntity, *position);
  }
  
  ~PYSPApplication() {
    Py_DECREF(pyapp);
  }
  
 private:
  PyObject *pyapp;
  EventGenerator *eventGen;
  Position *position;
  OpenEntityPtr *openEntity;

  void handleCallback(char *name, PyObject *arglist) {
    // get function from pyapp
    PyObject *callback = PyObject_GetAttrString(pyapp, name);
    if (callback == NULL) {
      PyErr_Clear(); // not really a problem; ignore
      Py_DECREF(arglist);
      return;
    }

    if (!PyCallable_Check(callback)) {
      eventGen->halt();
      PyErr_SetString(PyExc_TypeError, "callback attribute must be callable");
      Py_DECREF(callback);
      Py_DECREF(arglist);
      return;
    }
    
    // call function; a NULL result means the callback raised an exception
    PyObject *result = PyEval_CallObject(callback, arglist);
    if (result == NULL) 
      eventGen->halt();
    Py_XDECREF(result);

    Py_DECREF(callback);
    Py_DECREF(arglist);    
  }

  PyObject *getValue(const Attribute &attr) {
    PyObject *value = NULL;
    char *tmp_value;
    int value_len = 0;
    int pos = 0;
    
    switch(attr.type) {
    case Attribute::cdata:
      for (int ix = 0; ix < attr.nCdataChunks; ix++) 
	value_len += attr.cdataChunks[ix].data.len;
      
      tmp_value = new char[value_len + 1];
      for (int ix = 0; ix < attr.nCdataChunks; ix++) {
	extract_string(tmp_value + pos, attr.cdataChunks[ix].data);
	pos += attr.cdataChunks[ix].data.len;
      }
      tmp_value[pos] = 0;

      value = PyString_FromString(tmp_value);
      delete[] tmp_value;
      break;
    case Attribute::tokenized:
      tmp_value = extract_string(attr.tokens);
      value = PyString_FromString(tmp_value);
      delete[] tmp_value;
      break;      
    default:
      // placeholder for attribute types that are not handled yet
      value = PyString_FromString("<value>");
      break;
    }

    return value;
  }
};


/* ----------------------------------------------------------------------
   SGML PARSER CLASS
 */

typedef struct {
  PyObject_HEAD

  EventGenerator *eventGen;
  PYSPApplication *application;
} sgmlparseobject;

static char Sgmlparsetype__doc__[] = "SGML parser.";

static char sgmlparse_halt__doc__[] =
"halt()\n Halt the generation of events by run(). This can be called at any point\nduring the execution of run(). It is safe to call this function from a\ndifferent thread from that which called run(). ";

extern "C" PyObject* sgmlparse_halt(sgmlparseobject *self,
				    PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  self->eventGen->halt();

  Py_INCREF(Py_None);
  return Py_None;
}

static char sgmlparse_get_line_number__doc__[] =
"get_line_number()\n Returns the line number of the current event.";

extern "C" PyObject* sgmlparse_get_line_number(sgmlparseobject *self,
					       PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("i", location->lineNumber);
  delete location;
  return value;
}

static char sgmlparse_get_column_number__doc__[] =
"get_column_number()\n Returns the column number of the current event.";

extern "C" PyObject* sgmlparse_get_column_number(sgmlparseobject *self,
						 PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("i", location->columnNumber);
  delete location;
  return value;
}

static char sgmlparse_get_filename__doc__[] =
"get_filename()\n Returns the name of the file where the current event occurred.";

extern "C" PyObject* sgmlparse_get_filename(sgmlparseobject *self,
					    PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  char* tmp = extract_string(location->filename);
  PyObject *value = Py_BuildValue("s", tmp);
  delete location;
  delete[] tmp; // allocated with new[] in extract_string
  return value;
}

static char sgmlparse_get_entity_name__doc__[] =
"get_entity_name()\n Returns the name of the entity where the current event occurred.";

extern "C" PyObject* sgmlparse_get_entity_name(sgmlparseobject *self,
					       PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  char* tmp = extract_string(location->entityName);
  PyObject *value = Py_BuildValue("s", tmp);
  delete location;
  delete[] tmp; // allocated with new[] in extract_string
  return value;
}

static char sgmlparse_get_byte_offset__doc__[] =
"get_byte_offset()\n Returns number of bytes in the storage object preceding the point\nwhere the current event occurred.";

extern "C" PyObject* sgmlparse_get_byte_offset(sgmlparseobject *self,
					       PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("i", location->byteOffset);
  delete location;
  return value;
}

static char sgmlparse_get_entity_offset__doc__[] =
"get_entity_offset()\n Returns number of characters in the current entity preceding the\npoint where the current event occurred.";

extern "C" PyObject* sgmlparse_get_entity_offset(sgmlparseobject *self,
						 PyObject *args) {
  if (!PyArg_ParseTuple(args, ""))
    return NULL;

  SGMLApplication::Location *location = self->application->getLocation();
  PyObject *value = Py_BuildValue("i", location->entityOffset);
  delete location;
  return value;
}

static char sgmlparse_run__doc__[] =
"run(app)\n Generate the sequence of events, calling the corresponding\nmember of app for each event. Returns the number of errors. This must\nnot be called more than once for any SGML parser object.";

extern "C" PyObject* sgmlparse_run(sgmlparseobject *self,
				    PyObject *args) {
  PyObject* app;
  
  if (!PyArg_ParseTuple(args, "O", &app))
    return NULL;

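  // construct in place so no temporary copy shares the pyapp reference
  PYSPApplication realapp(app, self->eventGen);
  self->application = &realapp;
  self->eventGen->run(realapp);
  self->application = NULL; // realapp is destroyed when this function returns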
  
  if (PyErr_Occurred()) 
    return NULL; // an error occurred in a callback; tell Python about it
  
  Py_INCREF(Py_None);
  return Py_None;
}

struct PyMethodDef sgmlparse_methods[] = {
        {"halt",             (PyCFunction) sgmlparse_halt,
	 METH_VARARGS,   sgmlparse_halt__doc__},
        {"run",              (PyCFunction) sgmlparse_run,
	 METH_VARARGS,   sgmlparse_run__doc__},
        {"get_line_number",  (PyCFunction) sgmlparse_get_line_number,
	 METH_VARARGS,   sgmlparse_get_line_number__doc__},
        {"get_column_number",(PyCFunction) sgmlparse_get_column_number,
	 METH_VARARGS,   sgmlparse_get_column_number__doc__},
        {"get_filename",     (PyCFunction) sgmlparse_get_filename,
	 METH_VARARGS,   sgmlparse_get_filename__doc__},
        {"get_entity_name",  (PyCFunction) sgmlparse_get_entity_name,
	 METH_VARARGS,   sgmlparse_get_entity_name__doc__},
        {"get_byte_offset",  (PyCFunction) sgmlparse_get_byte_offset,
	 METH_VARARGS,   sgmlparse_get_byte_offset__doc__},
        {"get_entity_offset",(PyCFunction) sgmlparse_get_entity_offset,
	 METH_VARARGS,   sgmlparse_get_entity_offset__doc__},
        {NULL,          NULL}           /* sentinel */
};

extern "C" void sgmlparse_dealloc(sgmlparseobject *self) {
  delete self->eventGen;
  self->eventGen = NULL;
  PyMem_DEL(self);
}

extern "C" PyObject* sgmlparse_getattr(sgmlparseobject *self, char *name)
{
  if (strcmp(name, "__members__") == 0){
    PyObject *list = PyList_New(0);
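    for (int ix = 0; sgmlparse_methods[ix].ml_name; ix++) {
      // PyList_Append takes its own reference, so release ours
      PyObject *item = PyString_FromString(sgmlparse_methods[ix].ml_name);
      PyList_Append(list, item);
      Py_XDECREF(item);
    }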
    return list;
  }
  
  return Py_FindMethod(sgmlparse_methods, (PyObject*) self, name);
}

static PyTypeObject Sgmlparsetype = {
        PyObject_HEAD_INIT(NULL) 0,              /*ob_size*/
        "sgmlparser",                            /*tp_name*/
        sizeof(sgmlparseobject),                 /*tp_basicsize*/
        0,                                       /*tp_itemsize*/
        /* methods */
        (destructor)  sgmlparse_dealloc,         /*tp_dealloc*/
        (printfunc)   0,                         /*tp_print*/
        (getattrfunc) sgmlparse_getattr,         /*tp_getattr*/
        (setattrfunc) 0,                         /*tp_setattr*/
        (cmpfunc)     0,                         /*tp_compare*/
        (reprfunc)    0,                         /*tp_repr*/
                      0,                         /*tp_as_number*/
                      0,                         /*tp_as_sequence*/
                      0,                         /*tp_as_mapping*/
        (hashfunc)    0,                         /*tp_hash*/
        (ternaryfunc) 0,                         /*tp_call*/
        (reprfunc)    0,                         /*tp_str*/

        /* Space for future expansion */
        0L,0L,0L,0L,
        Sgmlparsetype__doc__ /* Documentation string */
};

/* ----------------------------------------------------------------------
   FUNCTIONS
 */

static char pysp_make_parser__doc__[] =
"make_parser(filename) -> parser\n\
Return a new SGML parser object bound to the given file name.";

extern "C" PyObject* pysp_make_parser(PyObject *self, PyObject *args) {
    char *filename;
    sgmlparseobject *parser;

    if (!PyArg_ParseTuple(args, "s", &filename))
      return NULL;

    EventGenerator *evg = parserGenerator.makeEventGenerator(1, &filename);

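    parser = PyObject_NEW(sgmlparseobject, &Sgmlparsetype);
    if (parser == NULL) {
      delete evg; // do not leak the event generator
      return NULL;
    }
   
    parser->eventGen = evg;
    parser->application = NULL; // only valid while run() is executing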
    evg->inhibitMessages(1); // don't print error messages to stderr
    return (PyObject*) parser;
}

static char pysp_add_catalog__doc__[] =
"add_catalog(filename)\n\
Tell the pysp module about a catalog file.";

extern "C" PyObject* pysp_add_catalog(PyObject *self, PyObject *args) {
    char *filename;

    if (!PyArg_ParseTuple(args, "s", &filename))
      return NULL;

    parserGenerator.setOption(ParserEventGeneratorKit::addCatalog, filename);
    
    Py_INCREF(Py_None);
    return Py_None;
}

/* ----------------------------------------------------------------------
   MODULE INITIALIZATION
 */

static PyMethodDef PYSPMethods[] = {
  {"make_parser",  pysp_make_parser, METH_VARARGS, pysp_make_parser__doc__},
  {"add_catalog",  pysp_add_catalog, METH_VARARGS, pysp_add_catalog__doc__},
  {NULL,      NULL}        /* Sentinel */
};

extern "C" void initpysp() {
  PyObject *module, *dict;

  Sgmlparsetype.ob_type = &PyType_Type;
                          
  module = Py_InitModule4("pysp", PYSPMethods, pysp_module_documentation,
			  (PyObject*) NULL, PYTHON_API_VERSION);
  dict = PyModule_GetDict(module);
  PyDict_SetItemString(dict, "sp_version", Py_BuildValue("s", SP_VERSION));
  PyDict_SetItemString(dict, "version", Py_BuildValue("s", "0.01"));
}

======================================================================

"""A SAX driver for the SP SGML parser, using the pysp extension module.

$Id$
"""

# --- Import wizardry

from xml.sax._exceptions import *
try:
    import pysp
except ImportError:
    raise SAXReaderNotAvailable("pysp not supported", None)

from xml.sax import xmlreader, saxutils, handler

AttributesImpl = xmlreader.AttributesImpl

import string

# --- Constants

version = "0.01"

namespace = "http://garshol.priv.no/symbolic/"
property_catalogs = "http://garshol.priv.no/symbolic/" + "properties/catalogs"

# --- PySPParser

class PySPParser(xmlreader.XMLReader, xmlreader.Locator):
    "SAX driver for the pysp C module."

    def __init__(self):
        xmlreader.XMLReader.__init__(self)
        self._source = xmlreader.InputSource()
        self._parser = None
        self._parsing = 0

        self._catalogs = []

    # XMLReader methods

    def parse(self, source):
        "Parse an SGML document from a file. (Nothing else is supported.)"
        source = saxutils.prepare_input_source(source)

        self._cont_handler.setDocumentLocator(self)
        for catalog in self._catalogs:
            pysp.add_catalog(catalog)

        # Keep the parser on self: the Locator methods read self._parser.
        self._parser = pysp.make_parser(source.getSystemId())
        self._parsing = 1
        try:
            self._parser.run(self)
        finally:
            self._parsing = 0
            self._parser = None
        
    def getFeature(self, name):
        raise SAXNotRecognizedException("Feature '%s' not recognized" % name)

    def setFeature(self, name, state):
        if self._parsing:
            raise SAXNotSupportedException("Cannot set features while parsing")

        raise SAXNotRecognizedException("Feature '%s' not recognized" % name)

    def getProperty(self, name):
        if name == property_catalogs:
            return self._catalogs
        
        raise SAXNotRecognizedException("Property '%s' not recognized" % name)

    def setProperty(self, name, value):
        if self._parsing:
            raise SAXNotSupportedException("Cannot set properties while parsing")

        if name == property_catalogs:
            if type(value) != type([]):
                raise SAXException("Value must be a list of strings!")

            self._catalogs = value
            return
        
        raise SAXNotRecognizedException("Property '%s' not recognized" % name)
        
    # Locator methods

    def getColumnNumber(self):
        return self._parser.get_column_number()

    def getLineNumber(self):
        return self._parser.get_line_number()

    def getPublicId(self):
        return None # FIXME!

    def getSystemId(self):
        return self._parser.get_filename()
    
    # event handlers
    def start_element(self, name, attrs):
        self._cont_handler.startElement(name, AttributesImpl(attrs))

    def end_element(self, name):
        self._cont_handler.endElement(name)

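    def pi(self, data):
        pos = string.find(data, " ")
        if pos != -1:
            self._cont_handler.processingInstruction(data[ : pos],
                                                     data[pos + 1 : ])
        else:
            # no space: the whole string is the target and there is no data
            self._cont_handler.processingInstruction(data, "")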

    def data(self, data):
        self._cont_handler.characters(data)

    def sdata(self, text, entityname):
        # FIXME: does this make sense?
        self._cont_handler.characters(text)

    def error(self, msg):
        self._err_handler.error(SAXException(msg))

# ---
        
def create_parser(*args, **kwargs):
    return apply(PySPParser, args, kwargs)
        
# ---

if __name__ == "__main__":
    from xml.sax.saxutils import XMLGenerator
    from xml.sax.handler import ErrorHandler
    p = create_parser()
    p.setContentHandler(XMLGenerator(open("bok.xml", "w")))
    p.setErrorHandler(ErrorHandler())
    p.setProperty(property_catalogs, ["/home/larsga/data/catalog"])
    p.parse("/home/larsga/cvs-co/data/book/bok.sgml")


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 28 15:53:54 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 28 Dec 2000 16:53:54 +0100
Subject: [XML-SIG] Using SP in Python
In-Reply-To: <m3ito4oixi.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 28 Dec 2000 12:59:37 +0100)
References: <m3ito4oixi.fsf@lambda.garshol.priv.no>
Message-ID: <200012281553.QAA00673@loewis.home.cs.tu-berlin.de>

> I've written a simple wrapper for the SP SGML parser's generic API
> and also a SAX driver for that wrapper.  The SAX driver probably
> belongs in saxtools and will be placed there.

Why not in xml.sax.drivers2?

> The SP wrapper is perhaps better off as a separate project, but if
> anyone feels it belongs in the XML-SIG, I'll be happy to reconsider.

If distributed with PyXML, we'd probably need code in setup.py to
detect the presence of an acceptable SP installation. If that were
available, I'm +0 for including it in PyXML, probably into
xml.parsers.
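
To make the setup.py idea a bit more concrete, here is a minimal
sketch of such a check. Everything in it is an assumption made for
illustration: the find_sp() helper is hypothetical, the header name is
simply the one the wrapper includes, and the library names and the
include/lib directory layout will differ between installations.

```python
import os

# Assumed file names: the header is the one the wrapper includes; the
# library names are guesses and will differ between platforms.
SP_HEADER = "ParserEventGeneratorKit.h"
SP_LIBRARIES = ("libsp.a", "libsp.so")


def find_sp(prefixes):
    """Return (include_dir, lib_dir) for the first prefix that has SP, else None."""
    for prefix in prefixes:
        include_dir = os.path.join(prefix, "include")
        lib_dir = os.path.join(prefix, "lib")
        has_header = os.path.isfile(os.path.join(include_dir, SP_HEADER))
        has_library = any(
            os.path.isfile(os.path.join(lib_dir, name)) for name in SP_LIBRARIES
        )
        if has_header and has_library:
            return include_dir, lib_dir
    return None
```

setup.py would then build the extension only when find_sp() returns a
pair of directories, and skip it quietly otherwise.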

Regards,
Martin


From fdrake@acm.org  Thu Dec 28 16:06:57 2000
From: fdrake@acm.org (Fred L. Drake)
Date: Thu, 28 Dec 2000 11:06:57 -0500
Subject: [XML-SIG] Using SP in Python
In-Reply-To: <m3ito4oixi.fsf@lambda.garshol.priv.no>
Message-ID: <web-403127@digicool.com>

On 28 Dec 2000 12:59:37 +0100,
 Lars Marius Garshol <larsga@garshol.priv.no> wrote:
 > I've written a simple wrapper for the SP SGML parser's
 > generic API and
 > also a SAX driver for that wrapper.  The SAX driver
 > probably belongs
 > in saxtools and will be placed there.  The SP wrapper is
 > perhaps
 > better off as a separate project, but if anyone feels it
 > belongs in the
 > XML-SIG, I'll be happy to reconsider.

  This is great news!
  I'm not sure why the extension and driver belong in
separate projects; shouldn't they be in the same project?
The driver can be listed by name in the table used by
xml.sax.make_parser(), but when the import fails it'll just
keep going (not having the code available to check, I can
make bold assertions! ;).
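
  For what it's worth, the fallback can be checked in a few lines.
This is only a sketch against the xml.sax in the current standard
library; "no_such_sp_driver" is a made-up module name that is assumed
not to be installed.

```python
import xml.sax
from xml.sax import xmlreader

# "no_such_sp_driver" is a made-up module name that is assumed not to be
# installed; make_parser() tries it first and then moves on to its defaults.
parser = xml.sax.make_parser(["no_such_sp_driver"])
```

  The failed import is swallowed and the caller gets one of the default
drivers back instead.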

 > Appended are a sample application that emits ESIS, the C
 > module and the SAX driver, in that order.
 > 
 > Comments of all kinds would be welcome.

  This reminds me that I have an XMLReader that works from
ESIS input data.  I plan to add it to xml.sax when I get
back from the holidays.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Thu Dec 28 16:46:22 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 28 Dec 2000 17:46:22 +0100
Subject: [XML-SIG] saxtools package
In-Reply-To: <m3k88kojhr.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 28 Dec 2000 12:47:28 +0100)
References: <m3k88kojhr.fsf@lambda.garshol.priv.no>
Message-ID: <200012281646.RAA00972@loewis.home.cs.tu-berlin.de>

> I'll start working on the saxtools package once my book is done and
> the new year begins. Meanwhile, I'll need to refer to it from the
> book, and so it needs a package name. To me xml.saxtools seems like
> the obvious solution.
> 
> What say ye?

Re-reading your list of things that will go into it (from 24 Oct): I
think the extra drivers should be somewhere inside xml.sax, so that
xml.sax.parse() can find them. Likewise, LexicalHandler and DTDHandler
ought to belong in xml.sax; they are interfaces, and the properties to
set and retrieve them are there already.
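
As an illustration of the property mechanism, a lexical handler can be
attached roughly as follows. This is a sketch against the expat driver
in the current standard library; CommentCollector is a hypothetical
class, and it defines all five lexical callbacks because the driver
looks them up when the property is set.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler


class CommentCollector:
    """Hypothetical lexical handler that records comment text."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    # The remaining lexical events are not interesting here.
    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


collector = CommentCollector()
parser = xml.sax.make_parser()
parser.setProperty(property_lexical_handler, collector)
parser.parse(io.BytesIO(b"<doc><!-- note --></doc>"))
```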

For the utilities, it seems that xml.saxtools was already accepted.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 29 15:57:36 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 29 Dec 2000 16:57:36 +0100
Subject: [XML-SIG] Announcing PyXPath 1.2
Message-ID: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de>

I have now completed the first fully-functional version of a 4XPath
parser, so PyXPath *should* work as a drop-in replacement for the
bison/lex part of 4XPath; essentially, it offers a function
pyxpath.Compile that has the same meaning as xml.xpath.Compile. It
uses the Parsed* classes of 4XPath as-is, so no modification to these
classes is necessary.

The distribution is available from

http://www.informatik.hu-berlin.de/~loewis/xml/PyXPath-1.2.tgz

To introduce some abstraction from the specific classes, and from the
fact that 4XPath uses bison token numbers in many places, I have
defined an abstract interface to XPath, which is attached
below. Unlike a former W3C effort, this API is currently designed
towards "pluggable parsers", i.e. the implementation of the abstract
syntax tree is separated from the parser engine.
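
To illustrate what "pluggable" means here, the following toy sketch
drives a factory from a parser that knows nothing about the tree
classes. It is not PyXPath code: TupleFactory and parse_child_path()
are hypothetical, the parser only understands paths like "a/b/c", and
the factory builds plain tuples. Only the create* method names and the
CHILD value are taken from the IDL at the end of this message.

```python
class TupleFactory:
    """Hypothetical factory: builds plain tuples instead of 4XPath objects."""

    def createNameTest(self, prefix, localName):
        return ("name-test", prefix, localName)

    def createAxisSpecifier(self, name):
        return ("axis", name)

    def createStep(self, axis, test, predicates):
        return ("step", axis, test, predicates)

    def createRelativeLocationPath(self, left, right):
        return ("relative-path", left, right)


CHILD = 4  # axis constant, value taken from the IDL


def parse_child_path(path, factory):
    """Toy parser: handles only 'a/b/c' style paths of unprefixed child steps."""
    result = None
    for local_name in path.split("/"):
        step = factory.createStep(
            factory.createAxisSpecifier(CHILD),
            factory.createNameTest(None, local_name),
            [],
        )
        if result is None:
            result = step
        else:
            result = factory.createRelativeLocationPath(result, step)
    return result


tree = parse_child_path("book/title", TupleFactory())
```

Swapping in a different factory changes the tree representation
without touching the parsing function.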

This interface currently does not at all attempt to support
evaluation; thus it is orthogonal to Scott Boag's draft, which only
supported evaluation but not creation of an XPath tree. I plan to
extend that API to also support evaluation; contributions are welcome.

Even though I managed to make the current 4XPath classes appear as an
implementation of that API, this conformance so far covers only the
ExprFactory interface. According to the API, each object should
have a number of attributes to allow navigation in the
expression. Since 4XPath does not expose any attributes, I decided to
come up with my own attribute names and types. I'd like to hear about
potential improvements to that API before making 4XPath fully
conforming.

The API is IDL-based, in the same sense as the DOM: there is a (yet to
be specified) mapping to Python, which works roughly like this:
- global constants are defined in the module xml.xpath.
- DOMString means Unicode objects, although normal strings should
  be accepted where possible.
- attributes are accessed as attributes; _get_ accessor functions
  are optional.
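
A small sketch of how one interface could look under those rules.
Since the mapping is not specified yet, this is only one possible
reading: the NameTest class below is hypothetical, and representing an
absent prefix as None is an assumption.

```python
# Hypothetical module-level constant; the proposal places these in xml.xpath.
NAME_TEST = 7


class NameTest:
    """Sketch of one Expr implementation following the mapping rules."""

    exprType = NAME_TEST

    def __init__(self, prefix, localName):
        self.prefix = prefix          # assumed to be None when absent
        self.localName = localName    # may be "*"

    # Optional accessor, equivalent to reading the attribute directly.
    def _get_localName(self):
        return self.localName


test = NameTest(None, u"para")
```

Callers would normally read test.localName directly; the _get_ form is
there only for code that prefers accessor functions.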

Any comments are welcome.

Regards,
Martin

module XPath{

  typedef wstring DOMString;

  const unsigned short ABSOLUTE_LOCATION_PATH = 1;
  const unsigned short ABBREVIATED_ABSOLUTE_LOCATION_PATH = 2;
  const unsigned short RELATIVE_LOCATION_PATH = 3;
  const unsigned short ABBREVIATED_RELATIVE_LOCATION_PATH = 4;
  const unsigned short STEP_EXPR = 5; // plain STEP would clash with interface Step: IDL names differing only in case collide
  const unsigned short NODE_TEST = 6;
  const unsigned short NAME_TEST = 7;
  const unsigned short BINARY_EXPR = 8;
  const unsigned short UNARY_EXPR = 9;
  const unsigned short PATH_EXPR = 10;
  const unsigned short ABBREVIATED_PATH_EXPR = 11; // filter '//' path
  const unsigned short FILTER_EXPR = 12;
  const unsigned short VARIABLE_REFERENCE = 13;
  const unsigned short LITERAL_EXPR = 14;
  const unsigned short NUMBER_EXPR = 15;
  const unsigned short FUNCTION_CALL = 16;                              


  interface Expr{
    readonly attribute unsigned short exprType;
  };

  interface AbsoluteLocationPath;
  interface AbbreviatedAbsoluteLocationPath;
  interface RelativeLocationPath;
  interface Step;
  interface AxisSpecifier;
  interface NodeTest;
  typedef sequence<Expr> PredicateList, ExprList;
  interface NameTest;
  interface BinaryExpr;
  interface UnaryExpr;
  interface UnionExpr;
  interface PathExpr;
  interface FilterExpr;
  interface VariableReference;
  interface Literal;
  interface Number;
  interface FunctionCall;

  interface ExprFactory{
    AbsoluteLocationPath createAbsoluteLocationPath(in RelativeLocationPath p);
    AbsoluteLocationPath createAbbreviatedAbsoluteLocationPath(in RelativeLocationPath p);
    RelativeLocationPath createRelativeLocationPath(in RelativeLocationPath left,
						    in Step right);
    RelativeLocationPath createAbbreviatedRelativeLocationPath(in RelativeLocationPath left,
							       in Step right);

    Step createStep(in AxisSpecifier axis, in NodeTest test, in PredicateList predicates);
    // . is represented as self::node(); .. as parent::node()
    Step createAbbreviatedStep(in boolean dotdot); // false for .; true for ..
    // An omitted axisname is created as CHILD; @ is created as ATTRIBUTE

    AxisSpecifier createAxisSpecifier(in unsigned short name);

    NodeTest createNodeTest(in unsigned short type);
    NameTest createNameTest(in DOMString prefix, in DOMString localName);

    BinaryExpr createBinaryExpr(in unsigned short operator, in Expr left, in Expr right);

    UnaryExpr createUnaryExpr(in Expr exp);

    PathExpr createPathExpr(in Expr filter, in Expr path);
    // filter '//' path
    PathExpr createAbbreviatedPathExpr(in Expr filter, in Expr path);

    FilterExpr createFilterExpr(in Expr filter, in Expr predicate);

    // the name must still contain the leading $
    VariableReference createVariableReference(in DOMString name);

    Literal createLiteral(in DOMString literal);
    Number createNumber(in DOMString value);
    FunctionCall createFunctionCall(in DOMString name, in ExprList args);
  };

  interface Parser{
    Expr parseLocationPath(in DOMString path); // returns absolute or relative path, or step
  };
  
  interface AbsoluteLocationPath:Expr{
    /* '/' relative-opt, or '//' relative */
    readonly attribute Expr relative; // step or relative path
  };

  interface RelativeLocationPath:Expr{
    readonly attribute Expr left; // step or relative path
    readonly attribute Step right;
  };

  interface Step:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  const unsigned short ANCESTOR = 1;
  const unsigned short ANCESTOR_OR_SELF = 2;
  const unsigned short _ATTRIBUTE = 3; // attribute is a keyword
  const unsigned short CHILD = 4;
  const unsigned short DESCENDANT = 5;
  const unsigned short DESCENDANT_OR_SELF = 6;
  const unsigned short FOLLOWING = 7;
  const unsigned short FOLLOWING_SIBLING = 8;
  const unsigned short NAMESPACE = 9;
  const unsigned short PARENT = 10;
  const unsigned short PRECEDING = 11;
  const unsigned short PRECEDING_SIBLING = 12;
  const unsigned short SELF = 13;
  interface AxisSpecifier:Expr{
    readonly attribute unsigned short name;
  };

  const unsigned short COMMENT = 1;
  const unsigned short TEXT = 2;
  const unsigned short PROCESSING_INSTRUCTION = 3;
  const unsigned short NODE = 4;
  interface NodeTest:Expr{
    readonly attribute unsigned short test;
    readonly attribute DOMString literal; // only for PROCESSING_INSTRUCTION
  };

  interface NameTest:Expr{
    readonly attribute DOMString prefix; // may be null
    readonly attribute DOMString localName; // may be "*"
  };

  const unsigned short BINOP_OR = 1;
  const unsigned short BINOP_AND = 2;
  const unsigned short BINOP_EQ = 3;
  const unsigned short BINOP_NEQ = 4;
  const unsigned short BINOP_LT = 5;
  const unsigned short BINOP_GT = 6;
  const unsigned short BINOP_LE = 7;
  const unsigned short BINOP_GE = 8;
  const unsigned short BINOP_PLUS = 9;
  const unsigned short BINOP_MINUS = 10;
  const unsigned short BINOP_TIMES = 11;
  const unsigned short BINOP_DIV = 12;
  const unsigned short BINOP_MOD = 13;
  const unsigned short BINOP_UNION = 14;
  interface BinaryExpr:Expr{
    readonly attribute unsigned short operator;
    readonly attribute Expr left,right;
  };

  // can only be the unary minus
  interface UnaryExpr:Expr{
    readonly attribute Expr exp;
  };

  interface PathExpr:Expr{
    readonly attribute Expr filter;
    readonly attribute Expr path;
  };

  interface FilterExpr:Expr{
    readonly attribute Expr filter;
    readonly attribute Expr predicate;
  };

  interface VariableReference:Expr{
    readonly attribute DOMString name;
  };

  interface Literal:Expr{
    readonly attribute DOMString value;
  };

  interface Number:Expr{
    readonly attribute double value;
  };

  interface FunctionCall:Expr{
    readonly attribute DOMString name;
    readonly attribute ExprList args;
  };

};


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 29 16:03:50 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 29 Dec 2000 17:03:50 +0100
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: <200012270120.SAA02777@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200012270120.SAA02777@localhost.localdomain>
Message-ID: <200012291603.RAA01507@loewis.home.cs.tu-berlin.de>

> There is no such beast.  These were originally intended to be purely internal 
> objects.  If we decided to expose them as an API, we'd want to decide on the 
> naming (Martin doesn't like the "Parsed" prefixes, I'm +0 on killing them) and 
> document them properly.

If you could follow the IDL API I just posted, renaming the classes
would not be necessary: the "official" way to create instances of
those classes would be to use the factory; the official way to find
out what kind of expression you have would be to look at the exprType
attribute.

> For now, your best bet is to have a look at XPath/Parsed* in 4Suite
> (and also check out Xslt/Parsed* for the associated Pattern machine
> objects).

Given that the Pattern grammar is only slightly larger than the XPath
grammar: Would it be useful to provide only a single interface, with
the option of either parsing a LocationPath or a Pattern? At least
when using YAPPS, it is not difficult to have two start symbols in a
single grammar.
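
To show what two start symbols over shared rules could look like, here
is a hand-written toy parser. It is not YAPPS output and covers only a
tiny subset (plain names, '/', '@', 'axis::' and '|'); ToyParser and
all of its method names are made up for this sketch.

```python
import re

NAME = r"[\w*-]+"
PATTERN_AXES = ("child", "attribute")  # the only axes a pattern may use


class ToyParser:
    def __init__(self, text):
        # Whitespace and unknown characters are simply skipped here.
        self.toks = re.findall(r"::|[/|@]|" + NAME, text)
        self.pos = 0

    def _peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def _next(self):
        tok = self._peek()
        self.pos += 1
        return tok

    def _step(self, allowed_axes):
        axis = "child"  # an omitted axis means child
        if self._peek() == "@":
            self._next()
            axis = "attribute"
        elif self.toks[self.pos + 1:self.pos + 2] == ["::"]:
            axis = self._next()
            self._next()  # consume '::'
        if allowed_axes is not None and axis not in allowed_axes:
            raise SyntaxError("axis %r not allowed here" % axis)
        name = self._next()
        if name is None or not re.fullmatch(NAME, name):
            raise SyntaxError("expected a name, got %r" % name)
        return (axis, name)

    def _path(self, allowed_axes=None):
        absolute = self._peek() == "/"
        if absolute:
            self._next()
        steps = [self._step(allowed_axes)]
        while self._peek() == "/":
            self._next()
            steps.append(self._step(allowed_axes))
        return (absolute, steps)

    def _expect_end(self):
        if self._peek() is not None:
            raise SyntaxError("unexpected token %r" % self._peek())

    def parse_location_path(self):
        """Start symbol 1: a single path, any axis."""
        path = self._path()
        self._expect_end()
        return path

    def parse_pattern(self):
        """Start symbol 2: '|'-separated paths, restricted axes."""
        alternatives = [self._path(PATTERN_AXES)]
        while self._peek() == "|":
            self._next()
            alternatives.append(self._path(PATTERN_AXES))
        self._expect_end()
        return alternatives
```

Both entry points reuse _step() and _path(); only the top-level rule
and the set of permitted axes differ.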

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 29 16:26:27 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 29 Dec 2000 17:26:27 +0100
Subject: [XML-SIG] 4XPath: parsing Unicode string
In-Reply-To: <200011252105.GAA01817@dhcp198.grad.sccs.chukyo-u.ac.jp> (message
 from Tamito KAJIYAMA on Sun, 26 Nov 2000 06:05:21 +0900)
References: <200011252105.GAA01817@dhcp198.grad.sccs.chukyo-u.ac.jp>
Message-ID: <200012291626.RAA01579@loewis.home.cs.tu-berlin.de>

> I have a problem that I cannot pass a Unicode string containing
> Japanese characters to the 4XPath parser.  Following reproduces
> the problem:

Please have a look at the PyXPath package I've just released. I
noticed that there is still an incompatibility with 4XPath: it only
allows compiling LocationPath expressions, not full expressions.
Putting full Unicode into the expression is no problem, though:

>>> print pyxpath.Compile(u'para[substring-after("2000\u5E7410\u670830\u65E5", "\u6708")]')
<Step at 82ff5dc: child::para[substring-after("2000\u5E7410\u670830\u65E5", "\u6708")]>

If you attempt to use that package in addition to 4XPath, please let me know.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 29 17:54:46 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 29 Dec 2000 18:54:46 +0100
Subject: [XML-SIG] Better pyexpat backtraces
In-Reply-To: <x7k88m93td.fsf@bitsko.slc.ut.us> (message from Ken MacLeod on 26
 Dec 2000 17:10:38 -0600)
References: <200012232256.XAA01648@loewis.home.cs.tu-berlin.de> <x7k88m93td.fsf@bitsko.slc.ut.us>
Message-ID: <200012291754.SAA02094@loewis.home.cs.tu-berlin.de>

> But that is correct and the intended error message, right?  Passing a
> DocumentHandler to a SAX2 parser will result in characters() being
> called with "only" two arguments when a SAX1 handler expects four.

Right. Before, you'd get an error message saying

     self.feed(buffer)
   File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/expatreader.py", line 87, in feed
     self._parser.Parse(data, isFinal)
TypeError: not enough arguments; expected 4, got 2

That was confusing: it suggested that the error was in the call to
Parse, not in the handler's characters() method. Only the traceback
has changed.
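
For illustration, the mismatch can be reproduced with the xml.sax
module of a current Python 3 standard library (an assumption; the
traceback above is from Python 2.0 with PyXML). The class name
Sax1StyleHandler is made up for this sketch.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Sax1StyleHandler(ContentHandler):
    """Hypothetical handler written against the SAX1 DocumentHandler
    signature, where characters() takes (ch, start, length)."""

    def characters(self, ch, start, length):
        print(ch[start:start + length])


def try_parse():
    # A SAX2 parser calls characters(content) with a single data
    # argument, so the SAX1-style method raises TypeError.
    try:
        xml.sax.parseString(b"<doc>text</doc>", Sax1StyleHandler())
    except TypeError as exc:
        return exc
    return None


error = try_parse()
```

Giving start and length default values, as in characters(self, ch,
start=0, length=-1), lets one handler serve both interfaces.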

Regards,
Martin


From uche.ogbuji@fourthought.com  Fri Dec 29 19:07:01 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 29 Dec 2000 12:07:01 -0700
Subject: [XML-SIG] PyXPath 1.1
References: <200012270120.SAA02777@localhost.localdomain> <200012291603.RAA01507@loewis.home.cs.tu-berlin.de>
Message-ID: <3A4CE0D5.1C51C02D@fourthought.com>

"Martin v. Loewis" wrote:

> If you could follow the IDL API I just posted
                                  ^^^^^^^^^^^^^
No dice.  XML-SIG has disappeared again.  I haven't received anything
since yesterday afternoon.  The archives aren't showing anything either.

What's up with the mailing lists?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From larsga@garshol.priv.no  Fri Dec 29 19:29:58 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Dec 2000 20:29:58 +0100
Subject: [XML-SIG] saxtools package
In-Reply-To: <200012281646.RAA00972@loewis.home.cs.tu-berlin.de>
References: <m3k88kojhr.fsf@lambda.garshol.priv.no> <200012281646.RAA00972@loewis.home.cs.tu-berlin.de>
Message-ID: <m37l4jqb49.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| Re-reading your list of things that will go into it (from 24 Oct): I
| think the extra drivers should be somewhere inside xml.sax, so that
| xml.sax.parse() can find them.

If that means that they also go into the Python distribution, then I'm
perfectly happy with that.

| Likewise, LexicalHandler and DTDHandler ought to belong in xml.sax;
| they are interfaces, and the properties to set and retrieve them are
| there already.

I agree.  This should be in xml.sax.
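
As a sketch of how such a handler is attached through a property, the
following uses xml.sax as shipped with a current Python 3 (an
assumption, not the state of the code at the time of this thread). The
class name CommentCollector is hypothetical; the five method names are
the callbacks the expat driver uses on the property value.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler


class CommentCollector:
    """Hypothetical lexical handler that records comments only."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    # The remaining lexical events are accepted and ignored.
    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


lexical = CommentCollector()
parser = xml.sax.make_parser()
# The lexical handler is set as a property, not through a setter
# method of its own.
parser.setProperty(property_lexical_handler, lexical)
parser.parse(io.BytesIO(b"<doc><!-- note -->text</doc>"))
```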
 
| For the utilities, it seems that xml.saxtools was already accepted.

Good!  Then I'll start checking things in as soon as I can.  (I have
many of the bits and pieces already.)

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Fri Dec 29 22:56:26 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 29 Dec 2000 23:56:26 +0100
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: <3A4CE0D5.1C51C02D@fourthought.com> (message from Uche Ogbuji on
 Fri, 29 Dec 2000 12:07:01 -0700)
References: <200012270120.SAA02777@localhost.localdomain> <200012291603.RAA01507@loewis.home.cs.tu-berlin.de> <3A4CE0D5.1C51C02D@fourthought.com>
Message-ID: <200012292256.XAA00714@loewis.home.cs.tu-berlin.de>

> No dice.  XML-SIG has disappeared again.  I haven't received anything
> since yesterday afternoon.  The archives aren't showing anything either.
> 
> What's up with the mailing lists?

Apparently, python.org ran out of disk space. Barry mentioned that it
should be fixed now, but it apparently isn't. I got some messages back
(mainly to python-help); if the list comes back without my messages,
I'll have to repost.

Regards,
Martin


From ken@bitsko.slc.ut.us  Sat Dec 30 18:01:28 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 30 Dec 2000 12:01:28 -0600
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: "Martin v. Loewis"'s message of "Fri, 29 Dec 2000 17:03:50 +0100"
References: <200012270120.SAA02777@localhost.localdomain>
 <200012291603.RAA01507@loewis.home.cs.tu-berlin.de>
Message-ID: <x7elyp6b5z.fsf@bitsko.slc.ut.us>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> > There is no such beast.  These were originally intended to be
> > purely internal objects.  If we decided to expose them as an API,
> > we'd want to decide on the naming (Martin doesn't like the
> > "Parsed" prefixes, I'm +0 on killing them) and document them
> > properly.
> 
> If you could follow the IDL API I just posted, renaming the classes
> would not be necessary: the "official" way to create instances of
> those classes would be to use the factory; the official way to find
> out what kind of expression you have would be to look at the
> exprType attribute.

Yes, the classes/attributes in that IDL look excellent.

> > For now, your best bet is to have a look at XPath/Parsed* in
> > 4Suite (and also check out Xslt/Parsed* for the associated Pattern
> > machine objects).
> 
> Given that the Pattern grammar is only slightly larger than the
> XPath grammar: Would it be useful to provide only a single
> interface, with the option of either parsing a LocationPath or a
> Pattern? At least when using YAPPS, it is not difficult to have two
> start symbols in a single grammar.

Yes, I think it would be very useful to reuse the same interface.

  -- Ken


From martin@loewis.home.cs.tu-berlin.de  Sun Dec 31 08:10:12 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 31 Dec 2000 09:10:12 +0100
Subject: [XML-SIG] saxtools package
In-Reply-To: <m37l4jqb49.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 29 Dec 2000 20:29:58 +0100)
References: <m3k88kojhr.fsf@lambda.garshol.priv.no> <200012281646.RAA00972@loewis.home.cs.tu-berlin.de> <m37l4jqb49.fsf@lambda.garshol.priv.no>
Message-ID: <200012310810.JAA00707@loewis.home.cs.tu-berlin.de>

> | Re-reading your list of things that will go into it (from 24 Oct): I
> | think the extra drivers should be somewhere inside xml.sax, so that
> | xml.sax.parse() can find them.
> 
> If that means that they also go into the Python distribution, then I'm
> perfectly happy with that.

That's a different matter. Both Python and PyXML support
xml.sax.parse, but only PyXML offers a choice of parsers.
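
As a rough illustration of parser selection, make_parser() takes a
list of driver module names and tries them in order. The sketch below
assumes a current Python 3 standard library, where the expat driver is
the only one shipped; ElementCounter is a made-up handler.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


# Driver modules are tried in the order given; the first one that can
# be imported and provides create_parser() is used.
parser = xml.sax.make_parser(["xml.sax.expatreader"])
counter = ElementCounter()
parser.setContentHandler(counter)
parser.parse(io.BytesIO(b"<a><b/><b/></a>"))
```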

Regards,
Martin