From teep@mozart.inet.co.th  Mon Jan  3 14:11:09 2000
From: teep@mozart.inet.co.th (Prateep Siamwalla)
Date: Mon, 3 Jan 2000 21:11:09 +0700 (GMT+0700)
Subject: [XML-SIG] Trouble installing PyXML-0.5.2
Message-ID: <Pine.OSF.3.96.1000103205922.30102A-100000@mozart.inet.co.th>

Hello pythoners,
I have been having some problems installing the latest python xml package
(downloaded from the XML-SIG pages).

My system is a RedHat 6.0, and I have installed rpms of Python 1.5.2-2 and
pythonlib-1.22-5.  I downloaded the PyXML 0.5.2 package and extracted to
/usr/local/PyXML-0.5.2

I've tried running "make -f Makefile.pre.in boot" and 
"python setup.py build" from /usr/local/PyXML-0.5.2/ and I seem to be
repeatedly running against a wall which reads :

make[1]: Entering directory `/usr/local/pythonish/xml-0.5.1'
make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile',
needed by `sedscript'.  Stop.
make[1]: Leaving directory `/usr/local/pythonish/xml-0.5.1'
make: *** [boot] Error 2 


Incidentally VERSION=1.5, installdir=/usr, and exec_installdir=/usr

I am very new to python and I fear I am doing something completely stupid,
hopefully this description is enough for someone to point out my errors,
if not, please tell me what other information I should provide.

-looking forward to xmling,
teep


From akuchlin@mems-exchange.org  Mon Jan  3 15:10:49 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Mon, 3 Jan 2000 10:10:49 -0500 (EST)
Subject: [XML-SIG] Trouble installing PyXML-0.5.2
In-Reply-To: <Pine.OSF.3.96.1000103205922.30102A-100000@mozart.inet.co.th>
References: <Pine.OSF.3.96.1000103205922.30102A-100000@mozart.inet.co.th>
Message-ID: <14448.48121.821005.55136@amarok.cnri.reston.va.us>

Prateep Siamwalla writes:
>My system is a RedHat 6.0, and I have installed rpms of Python 1.5.2-2 and
>pythonlib-1.22-5.  I downloaded the PyXML 0.5.2 package and extracted to
>/usr/local/PyXML-0.5.2

You also need to install the python-devel RPM to get the
/usr/lib/python1.5/config/ directory, which contains the files needed
to compile new Python extensions.  

You shouldn't have to extract the files into /usr/local/PyXML-0.5.2,
though that shouldn't cause any problems; running 'python setup.py
install' should copy all the files into
/usr/lib/python1.5/site-packages/xml/ .
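
A quick way to confirm whether that config directory is present is sketched
below. This is only an illustration written for a current Python 3
interpreter; the helper names are made up, and the default prefix and
version are assumptions taken from the paths quoted in this thread.

```python
import os
import posixpath


def config_makefile_path(prefix="/usr", version="1.5"):
    """Build the path the build looks for, e.g. /usr/lib/python1.5/config/Makefile."""
    return posixpath.join(prefix, "lib", "python" + version, "config", "Makefile")


def has_python_devel(prefix="/usr", version="1.5"):
    """Return True if the config Makefile (shipped in python-devel on Red Hat) exists."""
    return os.path.isfile(config_makefile_path(prefix, version))


if __name__ == "__main__":
    path = config_makefile_path()
    status = "found" if has_python_devel() else "missing (install python-devel)"
    print(path, "->", status)
```

If the file is reported missing, "make -f Makefile.pre.in boot" can be
expected to stop with the "No rule to make target" error shown in the
original report.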

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Every man is wise when attacked by a mad dog; fewer when pursued by a mad
woman; only the wisest survive when attacked by a mad notion.
    -- Robertson Davies, _Marchbanks' Almanac_


From guido@CNRI.Reston.VA.US  Mon Jan  3 17:37:59 2000
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Mon, 03 Jan 2000 12:37:59 -0500
Subject: [XML-SIG] Python Conference -- Early Bird Registration ends Jan 5!
Message-ID: <200001031737.MAA24342@eric.cnri.reston.va.us>

This is the last warning.  The conference is getting booked full,
don't wait till the last moment!

If you haven't registered and paid by January 5, you will be paying full
price...  So, be smart and register NOW.  Also don't forget to book
your hotel room by January 3 to qualify for the conference rate!

Some highlights from the conference program:

- 8 tutorials on topics ranging from JPython to Fnorb;
- a keynote by Open Source evangelist Eric Raymond;
- another by Randy Pausch, father of the Alice Virtual Reality project;
- a separate track for Zope developers and users;
- live demonstrations of important Python applications;
- refereed papers, and short talks on current topics;
- a developers' day where the feature set of Python 2.0 is worked out.

Our motto, due to Bruce Eckel, is: "Life's better without braces."

Come and join us at the Key Bridge Marriott in Rosslyn (across the
bridge from Georgetown), January 24-27 in 2000.  Make the Python
conference the first conference you attend in the new millennium!

The early bird registration deadline is January 5.  More info:

    http://www.python.org/workshops/2000-01/

The program is now complete with the titles of all presentations.
There is still space in the demo session and in the short talks
session.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From uche.ogbuji@fourthought.com  Tue Jan  4 09:14:30 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 04 Jan 2000 02:14:30 -0700
Subject: [XML-SIG] ANN: 4DOM 0.9.1
Message-ID: <3871B9F6.F55BF3B2@fourthought.com>

FourThought LLC (http://FourThought.com) announces the release of

                             4DOM 0.9.1
                      -----------------------
                An XML/HTML Python library using the
                  Document Object Model interface

4DOM is a Python library for XML and HTML processing and manipulation
using the W3C's Document Object Model for interface.  4DOM implements
DOM Core level 2, HTML level 2 and Level 2 Document Traversal.

4DOM should work on all platforms supported by Python.  If you have
any problems with a particular platform, please e-mail the authors.

4DOM is designed to allow developers to rapidly design applications
that read, write or manipulate HTML and XML.

News
----

This is a bug-fix release.

More info and Obtaining 4DOM
----------------------------

Please see

        http://FourThought.com/4Suite/4DOM

Or you can download 4DOM from

        ftp://FourThought.com/pub/4Suite/4DOM

4DOM is distributed under a license similar to that of Python.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Tue Jan  4 09:29:34 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 04 Jan 2000 02:29:34 -0700
Subject: [XML-SIG] ANN: 4XPath and 4XSLT 0.8.1
Message-ID: <3871BD7E.A6AF1AC0@fourthought.com>

FourThought LLC (http://FourThought.com) announces the release of

                      4XSLT and 4XPath 0.8.1
                      ----------------------
                      A python implementation
                     of the W3C's XSLT language


4XSLT is an XML transformation processor based on the W3C's specification
for the XSLT transform language.  4XPath implements the W3C XPath language
for indicating and selecting XML document components.

http://www.w3.org/TR/xslt

4XPath implements the full XPath recommendation except for the 'lang'
core function.

Currently, 4XSLT supports a sub-set of the XSLT recommendation including
the following:

Full expression support and attribute-value template expansion
xsl:include                     xsl:import
xsl:template                    xsl:apply-imports
xsl:apply-templates             xsl:copy
xsl:call-template               xsl:if
xsl:for-each                    xsl:choose
xsl:element                     xsl:when
xsl:attribute                   xsl:otherwise
xsl:text                        xsl:message
xsl:value-of                    xsl:variable
xsl:processing-instruction      xsl:param
xsl:comment                     xsl:with-param
xsl:strip-space                 xsl:key
xsl:preserve-space              xsl:copy-of
xsl:sort                        xsl:namespace-alias
xsl:output

and, of course, xsl:stylesheet, xsl:transform, literal elements and text

Using the xml output method, 4XSLT produces the result tree by throwing
events from the emerging SAX 2 standard to a handler, so it can be easily
modified to supply results to any SAX 2 consumer.  For the 'html' and
'text' output methods special SAX consumers produce HTML DOM nodes and
plain text respectively.
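
To give a feel for what such a consumer looks like, here is a minimal
sketch of a SAX handler that keeps only character data, roughly in the
spirit of a 'text' output method. It is not the 4XSLT API: it assumes the
xml.sax package from a current Python standard library, the events come
from a parser rather than from a transform, and the class and function
names are invented for the example.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical consumer: ignores markup events and keeps character data."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        # Called once or more for each run of text between tags.
        self.parts.append(content)

    def text(self):
        return "".join(self.parts)


def collect_text(xml_bytes):
    """Feed SAX events for xml_bytes to a TextCollector and return the text."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.text()


if __name__ == "__main__":
    print(collect_text(b"<greeting>Hello, <b>world</b>!</greeting>"))
```

Any producer that emits the same events could drive the handler, which is
the property the paragraph above relies on.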


News
----

Changes in 0.8.1
----------------

 - 4XSLT implements xsl:sort and xsl:namespace-alias
 - 4XSLT now implements template priorities
 - 4XPath now has a clear DOM-query interface
 - many bug-fixes and more extensive testing

More info and Obtaining 4XPath and 4XSLT
----------------------------------------

Please see

        http://FourThought.com/4Suite/4XPath
        http://FourThought.com/4Suite/4XSLT

Or you can download 4XPath and 4XSLT from

        ftp://FourThought.com/pub/4Suite/4XPath
        ftp://FourThought.com/pub/4Suite/4XSLT

4XPath and 4XSLT are distributed under a license similar to that of
Python.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From teep@inet.co.th  Tue Jan  4 14:12:10 2000
From: teep@inet.co.th (Prateep Siamwalla)
Date: Tue, 4 Jan 2000 21:12:10 +0700
Subject: [XML-SIG] Trouble installing PyXML-0.5.2
Message-ID: <001101bf56bd$b3603e60$46a809c0@tarzan>

Thanks very much, I've downloaded the extra rpms and have the modules
compiled.

-teep

-----Original Message-----
From: Andrew M. Kuchling <akuchlin@mems-exchange.org>
To: Prateep Siamwalla <teep@mozart.inet.co.th>
Cc: xml-sig@python.org <xml-sig@python.org>
Date: Monday, January 03, 2000 10:10 PM
Subject: Re: [XML-SIG] Trouble installing PyXML-0.5.2


>Prateep Siamwalla writes:
>>My system is a RedHat 6.0, and I have installed rpms of Python 1.5.2-2 and
>>pythonlib-1.22-5.  I downloaded the PyXML 0.5.2 package and extracted to
>>/usr/local/PyXML-0.5.2
>
>You also need to install the python-devel RPM to get the
>/usr/lib/python1.5/config/ directory, which contains the files needed
>to compile new Python extensions.
>
>You shouldn't have to extract the files into /usr/local/PyXML-0.5.2,
>though that shouldn't cause any problems; running 'python setup.py
>install' should copy all the files into
>/usr/lib/python1.5/site-packages/xml/ .
>
>--
>A.M. Kuchling http://starship.python.net/crew/amk/
>Every man is wise when attacked by a mad dog; fewer when pursued by a mad
>woman; only the wisest survive when attacked by a mad notion.
>    -- Robertson Davies, _Marchbanks' Almanac_
>
>


From larsga@garshol.priv.no  Wed Jan  5 11:12:07 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: Wed, 5 Jan 2000 12:12:07 +0100
Subject: [XML-SIG] dtddoc: Version 0.11 released!
Message-ID: <200001051112.MAA01505@lambda.garshol.priv.no>

Changes since version 0.11:

  The DTD is unchanged. Other than that, the following has changed: 

    * The makeskel tool has been added. 
    * A very experimental DocBook RefEntry backend has been added. 
    * The -t strict option has been added. 
    * dtddoc now checks for the correct xmlproc version. 
    * All reported bugs have been fixed. (Thanks to Stig Erik Sandø,
    Phong Vu and Alan Karben.)


This version is mainly released as a bug fix release, but some users
may also find the other changes useful.

The home page has moved to a permanent new location, which is:

  <URL: http://www.garshol.priv.no/download/software/dtddoc/ >

--Lars M.


From larsga@garshol.priv.no  Wed Jan  5 11:29:33 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jan 2000 12:29:33 +0100
Subject: [XML-SIG] Future plans
In-Reply-To: <199912201809.LAA06565@localhost.localdomain>
References: <199912201809.LAA06565@localhost.localdomain>
Message-ID: <m34scs69k2.fsf@ifi.uio.no>

* uche ogbuji
| 
| Lars published a SAX2 module that pretty much covers the ground of
| the current status.  I've been cajoling the folks on XML-DEV to
| finish the SAX2 spec, and things are coming about slowly.  

I've been trying to follow the current discussion on XML-DEV (of
course it had to happen when I'm on vacation), and the plan is to let
the dust settle a little on XML-DEV before discussion is started here.
There are some things we might want to do differently from the Java
folks, so there will probably need to be some discussion.

In any case, things are moving along, although slowly.

| 4DOM comes with a pretty complete SAX2 -> DOM reader, which is used
| by 4XSLT.

Do you use my package or do you use the stuff that you put together?

--Lars M.


From larsga@garshol.priv.no  Wed Jan  5 11:34:29 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jan 2000 12:34:29 +0100
Subject: [XML-SIG] Future plans
In-Reply-To: <199912201637.LAA12992@amarok.cnri.reston.va.us>
References: <199912201637.LAA12992@amarok.cnri.reston.va.us>
Message-ID: <m33dsc69bu.fsf@ifi.uio.no>

* Andrew M. Kuchling
| 
| Some things to do:
| 
|      * I propose dropping the wstrop and xmlarch code from the CVS
|        tree: wstrop because Python 1.6 will have built-in Unicode
|        support of some stripe, and xmlarch because architectural forms
|        are fairly rarely used, and don't need to be in the core.

I agree that wstrop should be dropped. 
  
|      * What about namespace support in SAX -- what's the status of SAX2?

SAX2 will have namespace support, but the actual form of it is
uncertain at the moment. I've also been thinking that we may want
qualified names to be represented as tuples, either

  (namespace name (URI), localpart (element type name), prefix)

or

  (namespace name (URI), localpart (element type name))
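
The two shapes can be put side by side in a small sketch. This assumes a
current Python 3 and a plain dictionary of prefix bindings; the function
name and the use of '' for "no prefix" or "no namespace" are illustrative
choices, not a settled SAX2 interface.

```python
def split_qname(qname, prefix_map):
    """Resolve 'prefix:local' against prefix_map; return (triple, pair)."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: the whole name is the local part, and '' is looked up
        # as the prefix, i.e. the default namespace if one is bound.
        prefix, local = "", qname
    uri = prefix_map.get(prefix, "")
    return (uri, local, prefix), (uri, local)


if __name__ == "__main__":
    bindings = {"xsl": "http://www.w3.org/1999/XSL/Transform"}
    triple, pair = split_qname("xsl:template", bindings)
    print(triple)
    print(pair)
```

With the two-item form, names written with different prefixes for the same
URI compare equal directly; with the three-item form, only the first two
items should be compared.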

--Lars M.


From uche.ogbuji@fourthought.com  Wed Jan  5 16:09:37 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 05 Jan 2000 09:09:37 -0700
Subject: [XML-SIG] Future plans
In-Reply-To: Your message of "05 Jan 2000 12:29:33 +0100."
 <m34scs69k2.fsf@ifi.uio.no>
Message-ID: <200001051609.JAA02544@localhost.localdomain>

> | 4DOM comes with a pretty complete SAX2 -> DOM reader, which is used
> | by 4XSLT.
> 
> Do you use my package or do you use the stuff that you put together?

We use your package.  We bundled it with 4Suite-base.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Wed Jan  5 16:13:23 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 05 Jan 2000 09:13:23 -0700
Subject: [XML-SIG] Future plans
In-Reply-To: Your message of "05 Jan 2000 12:34:29 +0100."
 <m33dsc69bu.fsf@ifi.uio.no>
Message-ID: <200001051613.JAA02566@localhost.localdomain>

> |      * What about namespace support in SAX -- what's the status of SAX2?
> 
> SAX2 will have namespace support, but the actual form of it is
> uncertain at the moment.

Actually, it has settled down, and is probably pretty much determined at this 
point.  There was much good debate about it on XML-DEV.

> I've also been thinking that we may want
> qualified names to be represented as tuples, either
> 
>   (namespace name (URI), localpart (element type name), prefix)
> 
> or
> 
>   (namespace name (URI), localpart (element type name))

I think it might be more natural to always make it a triple, and simply have 
'' as the third item when there is no namespace.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From larsga@garshol.priv.no  Wed Jan  5 16:30:34 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Jan 2000 17:30:34 +0100
Subject: [XML-SIG] Future plans
In-Reply-To: <200001051613.JAA02566@localhost.localdomain>
References: <200001051613.JAA02566@localhost.localdomain>
Message-ID: <m34scs4h1x.fsf@ifi.uio.no>

* Lars Marius Garshol
|
| SAX2 will have namespace support, but the actual form of it is
| uncertain at the moment.

* uche ogbuji
| 
| Actually, it has settled down, and is probably pretty much
| determined at this point.  There was much good debate about it on
| XML-DEV.

I've read through the debate, but I've failed to notice any
agreement. However, I've printed out the main posts for perusal at
home, so maybe I'll find it there.
 
* Lars Marius Garshol
|
| I've also been thinking that we may want
| qualified names to be represented as tuples, either
| 
|   (namespace name (URI), localpart (element type name), prefix)
| 
| or
| 
|   (namespace name (URI), localpart (element type name))

* uche ogbuji
|
| I think it might be more natural to always make it a triple, and
| simply have '' as the third item when there is no namespace.

No prefix, you mean? I agree, but the question is whether we really
want the prefix here or whether we should just always use a binary
tuple.

--Lars M.


From gstein@lyra.org  Wed Jan  5 22:19:09 2000
From: gstein@lyra.org (Greg Stein)
Date: Wed, 5 Jan 2000 14:19:09 -0800 (PST)
Subject: [XML-SIG] namespace/localpart tuples (was: Future plans)
In-Reply-To: <200001051613.JAA02566@localhost.localdomain>
Message-ID: <Pine.LNX.4.10.10001051415380.412-100000@nebula.lyra.org>

On Wed, 5 Jan 2000 uche.ogbuji@fourthought.com wrote:
>...
> > I've also been thinking that we may want
> > qualified names to be represented as tuples, either
> > 
> >   (namespace name (URI), localpart (element type name), prefix)
> > 
> > or
> > 
> >   (namespace name (URI), localpart (element type name))
> 
> I think it might be more natural to always make it a triple, and simply have 
> '' as the third item when there is no namespace.

At processing time, the prefix that was used is irrelevant. It shouldn't
be preserved.

You could end up in a situation where a client thinks that prefix "should"
be used when regenerating XML output... the problem is that it may
conflict (say, if you combined a couple XML docs) or not be defined in the
(new) output (if you dropped some portion that defined the namespace).

IMO, it is much better to regenerate a new set of prefixes for the set of
namespace URIs that are present in an XML document.
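
A minimal sketch of that regeneration step, assuming a current Python 3;
the function name and the "ns" stem for generated prefixes are made up for
the example.

```python
def assign_prefixes(namespace_uris, stem="ns"):
    """Map each distinct namespace URI to a generated prefix (ns0, ns1, ...)."""
    mapping = {}
    for uri in namespace_uris:
        if uri not in mapping:
            # Number prefixes in order of first appearance.
            mapping[uri] = "%s%d" % (stem, len(mapping))
    return mapping


if __name__ == "__main__":
    uris = ["DAV:", "urn:example:props", "DAV:"]
    for uri, prefix in assign_prefixes(uris).items():
        print('xmlns:%s="%s"' % (prefix, uri))
```

Because the prefixes are derived only from the URIs actually present,
combining documents cannot produce a conflicting or undeclared prefix.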

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From uche.ogbuji@fourthought.com  Wed Jan  5 23:13:46 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 05 Jan 2000 16:13:46 -0700
Subject: [XML-SIG] namespace/localpart tuples (was: Future plans)
In-Reply-To: Your message of "Wed, 05 Jan 2000 14:19:09 PST."
 <Pine.LNX.4.10.10001051415380.412-100000@nebula.lyra.org>
Message-ID: <200001052313.QAA03676@localhost.localdomain>

> > > I've also been thinking that we may want
> > > qualified names to be represented as tuples, either
> > > 
> > >   (namespace name (URI), localpart (element type name), prefix)
> > > 
> > > or
> > > 
> > >   (namespace name (URI), localpart (element type name))
> > 
> > I think it might be more natural to always make it a triple, and simply have 
> > '' as the third item when there is no namespace.
> 
> At processing time, the prefix that was used is irrelevant. It shouldn't
> be preserved.

The prefix has no semantic value: it is indeed syntactic sugar.  However, it 
is very important to maintain the "principle of least surprise" for users.

If a user runs his XSLT stylesheet through a SAX processor and finds that all 
his "xsl:template" elements have been renamed to "prefix00001:template", he 
might be very confused indeed.

Note that there is at least one case in which the prefix does matter: XSLT 
uses the prefix to match declared namespaces in the stylesheet to namespaces 
in the source document.  Now many people have already railed against this 
violation of the spirit of XML Namespaces 1.0, but there is no arguing that it 
was the most elegant solution to a difficult problem that the XSLT WG faced in 
dealing with namespaces.

So, in short, though prefixes are not technically part of the document, there 
are good arguments for including them in the SAX binding.
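
The matching of prefixes to namespaces can be illustrated with a rough
sketch. It does not use an XSLT processor: it assumes the ElementTree module
from a current Python standard library, whose limited path syntax resolves
prefixes through an explicit mapping in a similar way. The URI, element
names and function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# The source document happens to use the prefix "a".
SOURCE = b'<a:book xmlns:a="urn:example:books"><a:title>Dune</a:title></a:book>'

# Bindings as a stylesheet might declare them; note the prefix is "bk", not "a".
STYLESHEET_NAMESPACES = {"bk": "urn:example:books"}


def select_titles(source_bytes, namespaces):
    """Evaluate the path 'bk:title' using the caller's prefix bindings."""
    root = ET.fromstring(source_bytes)
    return [el.text for el in root.findall("bk:title", namespaces)]


if __name__ == "__main__":
    print(select_titles(SOURCE, STYLESHEET_NAMESPACES))
```

The expression only makes sense together with the bindings in force where it
was written, so a processor that discards or renames those prefixes breaks
the expression even though the document's meaning is unchanged.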

> You could end up in a situation where a client thinks that prefix "should"
> be used when regenerating XML output... the problem is that it may
> conflict (say, if you combined a couple XML docs) or not be defined in the
> (new) output (if you dropped some portion that defined the namespace).

The best solution to this is education.  If the interface documentation 
clearly states that prefixes are not technically part of the document, 
hopefully users will avoid mis-using them.  This is not ideal, but there's not 
much better to do given the practical issues involved.

> IMO, it is much better to regenerate a new set of prefixes for the set of
> namespace URIs that are present in an XML document.

Even as a user who knows better about the meaning of prefixes, I would be very 
annoyed at a processor that did this.  I often deal with documents with 4 or 
more namespaces (this is not too unusual: very common in RDF) and I give my 
prefixes mnemonic names to help sort things out.  I don't want processors 
renaming them to "p01a3", etc.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From gstein@lyra.org  Wed Jan  5 23:30:04 2000
From: gstein@lyra.org (Greg Stein)
Date: Wed, 5 Jan 2000 15:30:04 -0800 (PST)
Subject: [XML-SIG] namespace/localpart tuples
In-Reply-To: <200001052313.QAA03676@localhost.localdomain>
Message-ID: <Pine.LNX.4.10.10001051526550.412-100000@nebula.lyra.org>

On Wed, 5 Jan 2000 uche.ogbuji@fourthought.com wrote:
>...
> The prefix has no semantic value: it is indeed syntactic sugar.  However, it 
> is very important to maintain the "principle of least surprise" for users.
> 
> If a user runs his XSLT stylesheet through a SAX processor and finds that all 
> his "xsl:template" elements have been renamed to "prefix00001:template", he 
> might be very confused indeed.

hehe... agreed on that one :-)

> Note that there is at least one case in which the prefix does matter: XSLT 
> uses the prefix to match declared namespaces in the stylesheet to namespaces 
> in the source document.  Now many people have already railed against this 
> violation of the spirit of XML Namespaces 1.0, but there is no arguing that it 
> was the most elegant solution to a difficult problem that the XSLT WG faced in 
> dealing with namespaces.

Oh, kee-rist. What dickheads.

All right... then there is a reason to keep the prefix. Sigh. The
three-tuple containing the prefix should be used since XSLT applies
semantic meaning to it.

>...
> > IMO, it is much better to regenerate a new set of prefixes for the set of
> > namespace URIs that are present in an XML document.
> 
> Even as a user who knows better about the meaning of prefixes, I would be very 
> annoyed at a processor that did this.  I often deal with documents with 4 or 
> more namespaces (this is not too unusual: very common in RDF) and I give my 
> prefixes mnemonic names to help sort things out.  I don't want processors 
> renaming them to "p01a3", etc.

Yah. DAV typically uses a few namespaces, too (DAV: itself plus
product-specific properties), so I'm familiar with this.

If you don't like the renaming, then avoid mod_dav :-)  (it renames stuff
to things like ns0, ns1, i0, i1, ...). I could explain why, but you
probably don't want to hear... hehe

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From tpassin@idsonline.com  Fri Jan  7 03:31:20 2000
From: tpassin@idsonline.com (Thomas B. Passin)
Date: Thu, 6 Jan 2000 22:31:20 -0500
Subject: [XML-SIG] Future plans
References: <199912201637.LAA12992@amarok.cnri.reston.va.us> <m33dsc69bu.fsf@ifi.uio.no>
Message-ID: <003601bf58c0$aae89e80$de2a08d1@tomshp>

Lars Marius Garshol wrote:
>
> * Andrew M. Kuchling
> |
> | Some things to do:
> |
> |      * I propose dropping the wstrop and xmlarch code from the CVS
> |        tree: wstrop because Python 1.6 will have built-in Unicode
> |        support of some stripe, and xmlarch because architectural forms
> |        are fairly rarely used, and don't need to be in the core.
>
> I agree that wstrop should be dropped.
>
> |      * What about namespace support in SAX -- what's the status of SAX2?
>
> SAX2 will have namespace support, but the actual form of it is
> uncertain at the moment. I've also been thinking that we may want
> qualified names to be represented as tuples, either
>
>   (namespace name (URI), localpart (element type name), prefix)
>
> or
>
>   (namespace name (URI), localpart (element type name))
>
I think we should follow the lead of Megginson and the XML-DEV discussions
on whether there should be a separate prefix part - I personally think there
should be, but let's follow what they end up with on this.  Yes, tuples seem
to be the perfect way to do qualified names, no matter how the others want
to do them for Java or C++.

> --Lars M.
>
>
Regards,
Tom Passin


From tpassin@idsonline.com  Fri Jan  7 03:36:40 2000
From: tpassin@idsonline.com (Thomas B. Passin)
Date: Thu, 6 Jan 2000 22:36:40 -0500
Subject: [XML-SIG] namespace/localpart tuples (was: Future plans)
References: <200001052313.QAA03676@localhost.localdomain>
Message-ID: <003701bf58c0$abed8b60$de2a08d1@tomshp>

<uche.ogbuji@fourthought.com> wrote:

> > > > I've also been thinking that we may want
> > > > qualified names to be represented as tuples, either
> > > >
> > > >   (namespace name (URI), localpart (element type name), prefix)
> > > >
> > > > or
> > > >
> > > >   (namespace name (URI), localpart (element type name))
> > >
> > > I think it might be more natural to always make it a triple, and simply have
> > > '' as the third item when there is no namespace.
> >
> > At processing time, the prefix that was used is irrelevant. It shouldn't
> > be preserved.
>
> The prefix has no semantic value: it is indeed syntactic sugar.  However, it
> is very important to maintain the "principle of least surprise" for users.
>
> If a user runs his XSLT stylesheet through a SAX processor and finds that all
> his "xsl:template" elements have been renamed to "prefix00001:template", he
> might be very confused indeed.
>
> Note that there is at least one case in which the prefix does matter: XSLT
> uses the prefix to match declared namespaces in the stylesheet to namespaces
> in the source document.  Now many people have already railed against this
> violation of the spirit of XML Namespaces 1.0, but there is no arguing that it
> was the most elegant solution to a difficult problem that the XSLT WG faced in
> dealing with namespaces.
>
> So, in short, though prefixes are not technically part of the document, there
> are good arguments for including them in the SAX binding.
>
> > You could end up in a situation where a client thinks that prefix "should"
> > be used when regenerating XML output... the problem is that it may
> > conflict (say, if you combined a couple XML docs) or not be defined in the
> > (new) output (if you dropped some portion that defined the namespace).
>
> The best solution to this is education.  If the interface documentation
> clearly states that prefixes are not technically part of the document,
> hopefully users will avoid mis-using them.  This is not ideal, but there's not
> much better to do given the practical issues involved.
>
> > IMO, it is much better to regenerate a new set of prefixes for the set of
> > namespace URIs that are present in an XML document.
>
> Even as a user who knows better about the meaning of prefixes, I would be very
> annoyed at a processor that did this.  I often deal with documents with 4 or
> more namespaces (this is not too unusual: very common in RDF) and I give my
> prefixes mnemonic names to help sort things out.  I don't want processors
> renaming them to "p01a3", etc.
>
>
I'm completely with Uche on this - sugar or not, we should preserve the
prefixes.  After all, your software can always ignore them later if you
don't care.  And, again as Uche says, the prefixes sometimes are chosen to
help document the meaning of the document.

Tom Passin


From aa8vb@yahoo.com  Fri Jan  7 16:23:40 2000
From: aa8vb@yahoo.com (Randall Hopper)
Date: Fri, 7 Jan 2000 11:23:40 -0500
Subject: [XML-SIG] XML Writing Tools (& escape_markup)
Message-ID: <20000107112340.A27660@vislab.epa.gov>

Do any Python tools exist to write (or aid in writing) indented XML?

Thanks,

-- 
Randall Hopper
aa8vb@yahoo.com


From fdrake@acm.org  Fri Jan  7 16:48:56 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 7 Jan 2000 11:48:56 -0500 (EST)
Subject: [XML-SIG] XML Writing Tools (& escape_markup)
In-Reply-To: <20000107112340.A27660@vislab.epa.gov>
References: <20000107112340.A27660@vislab.epa.gov>
Message-ID: <14454.6392.791758.466469@weyr.cnri.reston.va.us>

Randall Hopper writes:
 > Do any Python tools exist to write (or aid in writing) indented XML?

  In the CVS repository, check xml.sax.writer.  It needs documentation 
and a bit of work, but my working copy at home is pretty messed up at
the moment, and I don't know just when I'll get back to it.  ;(
  On the other hand, if you try it and have any specific suggestions
or comments, I'd appreciate hearing them!
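
For readers without the CVS tree at hand, the general idea of a SAX-style
writer can be sketched as follows. This is not the xml.sax.writer module
mentioned above: it assumes XMLGenerator from xml.sax.saxutils in a current
Python standard library, and the function name, element names and two-space
indent are made up for the example. Indentation is emitted by hand, while
markup characters in text are escaped by the generator.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_entry(name, email):
    """Write a small indented <ENTRY> element and return it as a string."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("ENTRY", {})
    for tag, value in (("NAME", name), ("EMAIL", email)):
        gen.ignorableWhitespace("\n  ")   # newline plus one indent level
        gen.startElement(tag, {})
        gen.characters(value)             # &, < and > are escaped here
        gen.endElement(tag)
    gen.ignorableWhitespace("\n")
    gen.endElement("ENTRY")
    gen.endDocument()
    return out.getvalue()


if __name__ == "__main__":
    print(write_entry("Smith & Sons", "info@example.com"))
```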


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From uche.ogbuji@fourthought.com  Fri Jan  7 18:28:21 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 07 Jan 2000 11:28:21 -0700
Subject: [XML-SIG] XML Writing Tools (& escape_markup)
In-Reply-To: Your message of "Fri, 07 Jan 2000 11:23:40 EST."
 <20000107112340.A27660@vislab.epa.gov>
Message-ID: <200001071828.LAA03595@localhost.localdomain>

> Do any Python tools exist to write (or aid in writing) indented XML?

4DOM (http://FourThought.com/4Suite/4DOM) has a pretty-printer which emits 
indented XML.  Here is an example:

<ADDRBOOK >
  <ENTRY ID='pa' >
    <NAME>Pieter Aaron</NAME>
    <ADDRESS>404 Error Way</ADDRESS>
    <PHONENUM DESC='Work' >404-555-1234</PHONENUM>
    <PHONENUM DESC='Fax' >404-555-4321</PHONENUM>
    <PHONENUM DESC='Pager' >404-555-5555</PHONENUM>
    <EMAIL>pieter.aaron@inter.net</EMAIL>
  </ENTRY>
  <ENTRY-LINK xlink:href='addr_book2.xml'  xlink:link='simple' >
  </ENTRY-LINK>
  <ENTRY ID='en' >
    <NAME>Emeka Ndubuisi</NAME>
    <ADDRESS>42 Spam Blvd</ADDRESS>
    <PHONENUM DESC='Work' >767-555-7676</PHONENUM>
    <PHONENUM DESC='Fax' >767-555-7642</PHONENUM>
    <PHONENUM DESC='Pager' >800-SKY-PAGEx767676</PHONENUM>
    <EMAIL>endubuisi@spamtron.com</EMAIL>
  </ENTRY>
</ADDRBOOK >
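
A comparable result can be sketched with a DOM pretty-printer from a current
Python standard library. This uses xml.dom.minidom rather than 4DOM, so the
exact spacing and quoting differ from the output above; the function name and
sample input are made up for the example.

```python
from xml.dom import minidom


def pretty(xml_text, indent="  "):
    """Parse xml_text and return the root element re-serialized with indentation."""
    doc = minidom.parseString(xml_text)
    try:
        return doc.documentElement.toprettyxml(indent=indent)
    finally:
        doc.unlink()  # release the tree once the string has been built


if __name__ == "__main__":
    flat = "<ADDRBOOK><ENTRY ID='pa'><NAME>Pieter Aaron</NAME></ENTRY></ADDRBOOK>"
    print(pretty(flat))
```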

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From chapmanb@arches.uga.edu  Fri Jan  7 22:50:08 2000
From: chapmanb@arches.uga.edu (Brad Chapman)
Date: Fri, 7 Jan 2000 17:50:08 -0500
Subject: [XML-SIG] Removing a set of DOM nodes
Message-ID: <l0313030bb49c1a291870@[172.16.0.2]>

Hello all!
	I have a quick question about removing a node and all its child
nodes from a DOM tree.
	I have an application where I identify a particular node in a DOM
tree and then need to remove it and all of its subnodes from the tree. The
tree can branch quite far beneath the node I am deleting, so there are many
sub-sub...-sub-nodes under the parent node. So far I have just been dealing
with this by destroying all of the node's immediate child nodes by
iterating through them and applying nodeToDelete.removeChild(oldNode), and
then deleting the parent node.
	Although this has the desired effect (gets the node and everything
under it out of the tree) I am curious about memory leakage. Are all of the
sub-sub...-sub-nodes that were not explicitly deleted removed once they are
detached from the main document node, or do they continue to exist? And, if
they do need to be explicitly deleted, is there any code already available
that does this? (ie. a class inheriting from the Walker class)
	Thanks in advance for any advice!

Brad


From dieter@handshake.de  Sun Jan  9 10:04:09 2000
From: dieter@handshake.de (Dieter Maurer)
Date: Sun,  9 Jan 2000 11:04:09 +0100 (CET)
Subject: [XML-SIG] Removing a set of DOM nodes
In-Reply-To: <l0313030bb49c1a291870@[172.16.0.2]>
References: <l0313030bb49c1a291870@[172.16.0.2]>
Message-ID: <14456.23567.554931.949249@lindm.dm>

Hello Brad,

if you use PyDOM (i.e. the DOM implementation of the XML SIG)
then it will be sufficient to just remove the parent of the
subtree. PyDOM uses proxies to avoid any cyclic references
such that the usual Python garbage collection can clean
up without problem.

If you use 4DOM, the FourThought people should tell you
about the necessary procedure to perform manual garbage
collection.
I assume you must call a "destroy" method after the
subtree has been removed from the tree. This will
take care of any recursion down the tree.
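
A minimal sketch of the detach-then-release pattern described above. It
uses the standard library's xml.dom.minidom as a stand-in, which is an
assumption: neither PyDOM nor 4DOM appears here, and minidom's unlink()
only plays the role of the "destroy" call mentioned in this message. The
element names are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical document: <drop> has several levels of descendants.
doc = minidom.parseString(
    "<root><keep/><drop><a><b><c/></b></a></drop></root>"
)
root = doc.documentElement
node_to_delete = root.getElementsByTagName("drop")[0]

# removeChild() detaches the node together with its whole subtree,
# so there is no need to remove the children one by one first.
removed = root.removeChild(node_to_delete)

# unlink() breaks the internal parent/child references of the detached
# subtree so it can be reclaimed promptly.
removed.unlink()

remaining = [child.tagName for child in root.childNodes]
print(remaining)
```

On an interpreter with a cycle-detecting garbage collector the unlink()
call is an optimization rather than a requirement; it is shown because it
mirrors the explicit release step discussed in this thread.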

Dieter


From JKnight496@aol.com  Sun Jan  9 20:05:29 2000
From: JKnight496@aol.com (JKnight496@aol.com)
Date: Sun, 9 Jan 2000 15:05:29 EST
Subject: [XML-SIG] DATABASE PUBLISHING
Message-ID: <39.3926f5ad.25aa4409@aol.com>

To whom it may concern,

I recently checked out an XML book from a library.  In it, the author used
Microsoft ACCESS as an example database and published it on-line.  To do
this, he referenced a "python" script.

I have an ACCESS database that I want to publish on-line.  Do I need to
download any type of decompiler to read the python scripting language?

Any help would be greatly appreciated!!

Many thanks

Jared Knight


From mgushee@havenrock.com  Sun Jan  9 21:54:18 2000
From: mgushee@havenrock.com (Matt Gushee)
Date: Sun, 9 Jan 2000 16:54:18 -0500 (EST)
Subject: [XML-SIG] DATABASE PUBLISHING
In-Reply-To: <39.3926f5ad.25aa4409@aol.com>
References: <39.3926f5ad.25aa4409@aol.com>
Message-ID: <14457.906.341911.535565@gargle.gargle.HOWL>

Hmm ... this type of question is probably best directed to
comp.lang.python, rather than XML-SIG. But since we're already here:

JKnight496@aol.com writes:

 > I have an ACCESS database that I want to publish on-line.  Do I need to
 > download any type of decompiler to read the python scripting language?

In general, no (I'm not sure such a thing exists for Python, anyway).
Python is an interpreted language, which means the executable
scripts/programs are plain text. Well, actually, Python programs *can
be* distributed in a compiled, therefore unreadable, form, but it's not
all that common. And the fact that your author referred to a 'script'
as opposed to a 'program' suggests to me that it's almost certainly in
plain text form.

Best of luck!

--
Matt Gushee
Portland, Maine, USA
mgushee@havenrock.com
http://www.havenrock.com/


From tpassin@idsonline.com  Mon Jan 10 00:01:35 2000
From: tpassin@idsonline.com (Thomas B. Passin)
Date: Sun, 9 Jan 2000 19:01:35 -0500
Subject: [XML-SIG] DATABASE PUBLISHING
References: <39.3926f5ad.25aa4409@aol.com>
Message-ID: <001701bf5afd$df0166e0$b02a08d1@tomshp>

<JKnight496@aol.com> wrote

>
> To whom it may concern,
>
> I recently checked out an XML book from a library.  In it, the author used
> Microsoft ACCESS as an example database and published it on-line.  To do
> this, he referenced a "python" script.
>
> I have an ACCESS database that I want to publish on-line.  Do I need to
> download any type of decompiler to read the python scripting language?
>
> Any help would be greatly appreciated!!
>

Python scripts are usually text (*.py), although they could be compiled
(*.pyc).  If your book references a python script, the script may require
another python package or module to work.  You can tell what is needed by
looking at the "import" statements.  Most import statements call for
standard modules that come with python, but if you see one called, for
example, mxODBC, or most anything with "ODBC", that is one you'd have to
get.  (This particular one you can find at the Database SIG page on the
www.python.org site).
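
A rough sketch of the "look at the import statements" advice, done
mechanically with the standard library's ast module. The function name
imported_modules and the sample script are hypothetical, and the
mx.ODBC.Windows import is only an illustrative spelling. One caveat: ast
parses only syntax that the running interpreter accepts, so a script
written for a much older Python may need to be read by eye instead.

```python
import ast

def imported_modules(source):
    """Return the sorted top-level module names imported by the source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c" contributes "a" and "c"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" contributes "a"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return sorted(names)

# Hypothetical script text standing in for the one from the book.
example_script = """
import os, sys
import mx.ODBC.Windows
from xml.dom import minidom
"""

print(imported_modules(example_script))
```

Any name in the result that does not ship with Python is a package that
would have to be installed separately.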

What is the book you mentioned?

Tom Passin


From uche.ogbuji@fourthought.com  Mon Jan 10 00:28:02 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 09 Jan 2000 17:28:02 -0700
Subject: [XML-SIG] Removing a set of DOM nodes
In-Reply-To: Your message of "Sun, 09 Jan 2000 11:04:09 +0100."
 <14456.23567.554931.949249@lindm.dm>
Message-ID: <200001100028.RAA18892@localhost.localdomain>

I think he's using PyDOM, because near the end of his message he spoke about 
the "DOM Walker".  4DOM instead has a generic Visitor pattern implementation 
(although a simple document-order visitor comes with it).

> if you use PyDOM (i.e. the DOM implementation of the XML SIG)
> then it will be sufficient to just remove the parent of the
> subtree. PyDOM uses proxies to avoid any cyclic references
> such that the usual Python garbage collection can clean
> up without problem.
> 
> If you use 4DOM, the FourThought people should tell you
> about the necessary procedure to perform manual garbage
> collection.
> I assume you must call a "destroy" method after the
> subtree has been removed from the tree. This will
> take care of any recursion down the tree.

Yes.  You just call ReleaseNode() on the node when you're done with it and it 
will release the node and all descendants.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From larsga@garshol.priv.no  Tue Jan 11 17:10:05 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 11 Jan 2000 18:10:05 +0100
Subject: [XML-SIG] Future plans
In-Reply-To: <003601bf58c0$aae89e80$de2a08d1@tomshp>
References: <199912201637.LAA12992@amarok.cnri.reston.va.us> <m33dsc69bu.fsf@ifi.uio.no> <003601bf58c0$aae89e80$de2a08d1@tomshp>
Message-ID: <m3r9foa61e.fsf@ifi.uio.no>

* Thomas B. Passin
|
| I think we should follow the lead of Megginson and the XML-DEV
| discussions on whether there should be a separate prefix part - I
| personally think there should be, but let's follow what they end up
| with on this.  

They have decided that the prefix should be made available.

| Yes, tuples seem to be the perfect way to do qualified names, no
| matter how the others want to do them for Java or C++.

They do seem attractive, but the trouble is that if we include the
prefix in the tuples, then the same name with different prefixes will
not be equal. We will have to solve this somehow, perhaps by making
incompatible changes the way they will for Java.
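
To make the equality problem concrete, here is a small sketch. The tuple
layout (uri, prefix, local part) and the QName class are both
hypothetical; they are not what SAX2 or this SIG settled on, just one
possible way to carry the prefix without letting it affect comparisons.

```python
uri = "http://www.w3.org/1999/xhtml"

# Same namespace URI and local part, different prefixes: as plain tuples
# these compare unequal, which is the problem described above.
name_a = (uri, "h", "body")
name_b = (uri, "x", "body")
print(name_a == name_b)

class QName:
    """Hypothetical qualified name: the prefix is kept but ignored in comparisons."""

    def __init__(self, uri, local, prefix=None):
        self.uri = uri
        self.local = local
        self.prefix = prefix

    def __eq__(self, other):
        if not isinstance(other, QName):
            return NotImplemented
        return (self.uri, self.local) == (other.uri, other.local)

    def __hash__(self):
        # Must agree with __eq__, so the prefix is left out here too.
        return hash((self.uri, self.local))

print(QName(uri, "body", "h") == QName(uri, "body", "x"))
```

The cost of this design is that names are no longer plain tuples, which
is exactly the kind of incompatible change mentioned above.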

--Lars M.


From larsga@garshol.priv.no  Tue Jan 11 17:13:35 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 11 Jan 2000 18:13:35 +0100
Subject: [XML-SIG] Groves
In-Reply-To: <199912291541.IAA02695@localhost.localdomain>
References: <199912291541.IAA02695@localhost.localdomain>
Message-ID: <m3puv8a5vk.fsf@ifi.uio.no>

* uche ogbuji
| 
| Unfortunately, this probably puts practical use of the pure grove
| model on hold for us.  We are actually working heavily with RDF in a
| current project, using internal Python/RDF tools that the Python
| community will probably see soon in OSS form, and we'll probably
| continue to use RDF directly. 

Just for the record: I also have an RDF tool for Python slowly brewing
at home. It might be an idea for us to agree on the interface to
in-memory RDF objects, and perhaps also to make it the same as the one
for grove nodes.

| I shall continue to study the grove model, however, as a means of
| thinking as clearly as possible about data.

As will I. Some sort of harmonization between groves and RDF would be
most interesting, I think.

--Lars M.


From evangelo@pigdog.org  Sat Jan 15 01:13:44 2000
From: evangelo@pigdog.org (ESP)
Date: 14 Jan 2000 17:13:44 -0800
Subject: [XML-SIG] XBEL
Message-ID: <u66www307.fsf@coursenet.com>

I dunno if anyone's keeping an eye on this stuff, but I have a patch
to submit for the msie_parse.py util.

If someone could tell me how to submit it, I'd be super grateful.

~ESP

-- 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ESP <evangelo@pigdog.org> | http://pigdog.org/ |  RoR - Alucard
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


From jsiek@lsc.nd.edu  Sun Jan 16 02:45:40 2000
From: jsiek@lsc.nd.edu (Jeremy Siek)
Date: Sat, 15 Jan 2000 21:45:40 -0500 (EST)
Subject: [XML-SIG] trouble shooting XSL demo
Message-ID: <200001160245.VAA03829@philoctetes.lsc.nd.edu>

Hi,

I'm trying to learn how to use the python XSL package
and ran into a difficulty right away:

I try to run the demo like so:

python /usr/local/lib/XSL/Processor.py addr_book1.xml addr_book1.xsl 

And get the following error:

IOError: (2, 'No such file or directory')

The traceback started at:
engine.include_xsl_file(name)
...


Suggestions? Probably some simple install problem, yes?


Thanks ahead of time,

Jeremy


From uche.ogbuji@fourthought.com  Sun Jan 16 03:17:33 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 15 Jan 2000 20:17:33 -0700
Subject: [XML-SIG] trouble shooting XSL demo
In-Reply-To: Your message of "Sat, 15 Jan 2000 21:45:40 EST."
 <200001160245.VAA03829@philoctetes.lsc.nd.edu>
Message-ID: <200001160317.UAA03338@localhost.localdomain>

> I'm trying to learn how to use the python XSL package
> and ran into a difficulty right away:
> 
> I try to run the demo like so:
> 
> python /usr/local/lib/XSL/Processor.py addr_book1.xml addr_book1.xsl 
> 
> And get the following error:
> 
> IOError: (2, 'No such file or directory')
> 
> The traceback started at:
> engine.include_xsl_file(name)
> ...

Whoa!  It looks as if the version you are trying to use is a year old.  Not 
only has the software changed significantly, but so has the standard.

You can get the latest version of 4XSLT at

http://FourThought.com/4Suite/4XSLT

If you are using Linux, RPMs are available, and if you are running Windows, 
please let us know and we can send you some tips for compiling the package.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From akuchlin@mems-exchange.org  Tue Jan 18 01:49:52 2000
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Mon, 17 Jan 2000 20:49:52 -0500
Subject: [XML-SIG] Developer's Day position paper
Message-ID: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>

Here's the position paper for the XML-SIG's Developer's Day session.
The issues I'd most like to see resolved are:

    * Should we switch to 4DOM?  Or should I seriously start pushing
on implementing DOM Level 2?  Attendees are encouraged to look at
4DOM's code, so they can form some opinions of its quality.

    * Are there any issues about a SAX2 Python binding that we should
discuss?

    * What about xmllib.py? (I raise this issue with some trepidation :) )

Some things on the list are quite simple; for example, consensus
seemed to be that qp_xml.py should go into the tree.  Greg, you want
to go ahead and check it in?

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
    "Don't try anything."
    "It's all right; I won't hurt you."
    -- Unnamed thug and Liz Shaw, in "The Ambassadors of Death"


The XML-SIG has been mostly drifting for the last year, with little
attempt made to push out a 1.0 version.  Part of this stemmed from
external forces (little development on a Python Unicode type, no
namespace support in SAX or DOM), and part stemmed from internal
reasons (my getting distracted, few people having time to work on it).

We need to finish a 1.0 version as soon as possible.  Paul Prescod
listed some standards that should be supported: XML, SAX, Unicode, XPath,
XPointer, DOM, and XSLT.  XSLT seems too far afield, but the rest are
probably worthwhile candidates.  (Can we prioritize them?)

Issues remaining:

DOM Level 2 support  
===================

Should we switch to 4Thought's implementations of DOM, XPath, and
XPointer?  Participants should try to at least read through some of
4Thought's code, so they can form an opinion of its quality.

Pros:

* An existing DOM Level 2 implementation.

* Maintainers use it actively for real work; PyDOM maintainer has
  short attention span.

* Has XPath and XSLT tools built on top of it.
  (Paul Prescod wrote a few weeks ago that "Ideally we would have one
  (or at most two!) implementation of each of the major specs: XML,
  SAX, Unicode, XPath, XPointer, XSLT, DOM"; if you take 4DOM + 4XSL +
  4Path, this would mean that Unicode is the only missing piece.)

* Faster than PyDOM

* Potential for CORBA support by adding some extra bits

Cons:

* Does anyone other than the maintainers have any experience with it?
  Any comments?  (If you don't want to slag it off publicly, you can
  send me unfavorable comments privately, and I'll preserve your
  anonymity.)

* Uses Ft.Dom package name, not xml.dom

* Potential incompatibilities with existing code, Sean's book, etc.
  (But probably a bit of glue code will let us smooth over such
  problems.)

* Requires releasing nodes explicitly

* Requires that 4Suite base be added to XML-SIG distribution
  (But the only dependency, at least in the DOM, seems to be on
  Ft.Lib.TraceOut.)


SAX Namespace support 
===================== 
This requires that XML-DEV converges to some consensus on SAX2.  Has
it done so?  Are there issues facing a SAX2 binding that we should
discuss?


Unicode
=======

We've been waiting on the official Python solution.  M.-A. Lemburg is
implementing a proposal
(http://starship.skyport.net/~lemburg/unicode-proposal.txt) but it
hasn't reached the Python CVS tree yet.  Therefore, we still haven't
dropped wstrop.  My inclination is to do this as soon as Unicode is
reasonably stable in the CVS tree.


Adding qp_xml.py
================

To the xml.parsers package, presumably?

xmllib.py
=========

A touchy catfight ensued on the xml-sig mailing list about xmllib.py's
standards compliance (with or without sgmlop).  Should xmllib.py be
dropped?  Can we do an xmllib.py compatible class on top of Expat?


Glue for Java and COM parsers
=============================
We don't have any particular support for Java-based parsers, or
Microsoft's XML-parsing COM component, but we probably should.
(Still, this is probably a low priority.)


From gstein@lyra.org  Tue Jan 18 12:36:56 2000
From: gstein@lyra.org (Greg Stein)
Date: Tue, 18 Jan 2000 04:36:56 -0800 (PST)
Subject: [XML-SIG] Developer's Day position paper
In-Reply-To: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
Message-ID: <Pine.LNX.4.10.10001180429300.13911-100000@nebula.lyra.org>

On Mon, 17 Jan 2000, A.M. Kuchling wrote:
>...
> Some things on the list are quite simple; for example, consensus
> seemed to be that qp_xml.py should go into the tree.  Greg, you want
> to go ahead and check it in?

I did not consider myself "authoritative" regarding the XML-SIG
distribution, so I never gave myself checkin privileges :-). I can go
ahead and do so, though, so that I can check in (and maintain) qp_xml.

>...
> Adding qp_xml.py
> ================
> 
> To the xml.parsers package, presumably?

Either there or xml.utils. I think xml.parsers makes more sense, but am
open to opinion.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From fdrake@acm.org  Tue Jan 18 18:18:20 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 18 Jan 2000 13:18:20 -0500 (EST)
Subject: [XML-SIG] XBEL
In-Reply-To: <u66www307.fsf@coursenet.com>
References: <u66www307.fsf@coursenet.com>
Message-ID: <14468.44652.383946.198726@weyr.cnri.reston.va.us>

ESP writes:
 > I dunno if anyone's keeping an eye on this stuff, but I have a patch
 > to submit for the msie_parse.py util.
 > 
 > If someone could tell me how to submit it, I'd be super grateful.

  If Andrew hasn't already asked for it, please post it to the XML-SIG 
list if it isn't too long.  (If it is long, summarize the change to
the list and send the patch to Andrew or myself.)
  Thanks!


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From larsga@garshol.priv.no  Tue Jan 18 18:25:26 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 18 Jan 2000 19:25:26 +0100
Subject: [XML-SIG] Developer's Day position paper
In-Reply-To: <Pine.LNX.4.10.10001180429300.13911-100000@nebula.lyra.org>
References: <Pine.LNX.4.10.10001180429300.13911-100000@nebula.lyra.org>
Message-ID: <m3ln5nqlt5.fsf@lambda.garshol.priv.no>

* Greg Stein
| 
| [qp_xml location]
| Either there or xml.utils. I think xml.parsers makes more sense, but
| am open to opinion.

Personally, I don't think it's a parser. expat is the parser, and
qp_xml is a client to that parser that builds a data structure
suitable for navigation. xml.utils is better, IMHO.
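
To illustrate the distinction, here is a minimal sketch of a "client of
the parser": expat does the parsing, and a few handlers build a
navigable structure from its events. This is not qp_xml's actual API;
the build_tree function and the nested-dict layout are hypothetical, and
the module is imported under its current standard-library name,
xml.parsers.expat.

```python
from xml.parsers import expat

def build_tree(text):
    """Parse XML text and return the root element as a nested dict."""
    root = None
    stack = []  # elements that are currently open, innermost last

    def start(name, attrs):
        nonlocal root
        node = {"name": name, "attrs": attrs, "children": [], "text": ""}
        if stack:
            stack[-1]["children"].append(node)
        else:
            root = node
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        # Character data may arrive in several pieces; accumulate it.
        if stack:
            stack[-1]["text"] += data

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)  # True: this is the final (and only) chunk
    return root

tree = build_tree("<play><title>Hamlet</title><act n='1'/></play>")
print(tree["children"][0]["text"])
```

Nothing in build_tree tokenizes XML itself, which is the sense in which
such a module is a utility layered on a parser rather than a parser.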

--Lars M.


From skip@mojam.com  Tue Jan 18 18:33:19 2000
From: skip@mojam.com (Skip Montanaro)
Date: Tue, 18 Jan 2000 12:33:19 -0600 (CST)
Subject: [XML-SIG] Developer's Day position paper
In-Reply-To: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
Message-ID: <14468.45551.632590.597771@beluga.mojam.com>

    AMK> xmllib.py
    AMK> =========

    AMK> A touchy catfight ensued on the xml-sig mailing list about xmllib.py's
    AMK> standards compliance (with or without sgmlop).  Should xmllib.py be
    AMK> dropped?  Can we do an xmllib.py compatible class on top of Expat?

I haven't really been following this list for a while, because my eyes just
sort of glaze over when everyone starts slinging various XML-related
acronyms (does XML seem to be worse afflicted with this disease than the
rest of the Internet community?  seems like it to me).  I will throw in my
two cents on this issue (not having noticed the aforementioned catfight in
this list because of optical glazing).  I currently use
xmllib+sgmlop+xmlrpclib to do XML-RPC stuff in a production server.  If you
can't find a solution that is as easy to use and that performs at least as
well, I'll have to freeze on what I have now.  I've tried solutions that
were implemented in straight Python and Python+private C-based extensions.
The former is useless performance-wise; the latter is a major headache to
maintain.  I for one am thankful that Fredrik Lundh developed and released
both sgmlop and xmlrpclib.  They make my days much more pleasant.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
847-971-7098


From paul@prescod.net  Tue Jan 18 19:16:47 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 18 Jan 2000 11:16:47 -0800
Subject: [XML-SIG] Developer's Day position paper
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com>
Message-ID: <3884BC1F.3CBD0B2B@prescod.net>

I agree with everything Skip says. We need something as fast and easy as
xmllib. We also need something compliant to the Unicode and XML
specifications. Does anyone disagree? Does anyone think that these goals
are mutually exclusive?

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Earth will soon support only survivor species -- dandelions, roaches, 
lizards, thistles, crows, rats. Not to mention 10 billion humans.
	- Planet of the Weeds, Harper's Magazine, October 1998


From fdrake@acm.org  Tue Jan 18 19:56:45 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 18 Jan 2000 14:56:45 -0500 (EST)
Subject: [XML-SIG] Developer's Day position paper
In-Reply-To: <3884BC1F.3CBD0B2B@prescod.net>
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
 <14468.45551.632590.597771@beluga.mojam.com>
 <3884BC1F.3CBD0B2B@prescod.net>
Message-ID: <14468.50557.762952.669644@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > I agree with everything Skip says. We need something as fast and easy as
 > xmllib. We also need something compliant to the Unicode and XML
 > specifications. Does anyone disagree? Does anyone think that these goals
 > are mutually exclusive?

  No and no.  There's absolutely no reason to drop support for
xmllib.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From akuchlin@mems-exchange.org  Wed Jan 19 03:21:40 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 18 Jan 2000 22:21:40 -0500 (EST)
Subject: [XML-SIG] Developer's Day position paper
In-Reply-To: <14468.45551.632590.597771@beluga.mojam.com>
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
 <14468.45551.632590.597771@beluga.mojam.com>
Message-ID: <14469.11716.833194.855673@newcnri.cnri.reston.va.us>

Skip Montanaro writes:
>this list because of optical glazing).  I currently use
>xmllib+sgmlop+xmlrpclib to do XML-RPC stuff in a production server.  If you
>can't find a solution that is as easy to use and that performs at least as
>well, I'll have to freeze on what I have now. 

OK, hold that thought.  Attached is a simple benchmark script that uses
PyExpat and xmllib+sgmlop to parse the 279K hamlet.xml file
(ummm.. it's part of Jon Bosak's XML sample data, but I'm not sure
where to download it from).  Could someone please verify these
results, or point out some stupid error in the benchmark script?
(Since I use my system for developing the XML package, it's possible
that I'm getting an old or broken version of xmllib+sgmlop.)

[amk@mira bench]$ python pyexp.py
PyExpat w/ null handlers: 279663 bytes in 0.04 seconds = 6575.70 K/sec
PyExpat w/ StartElementHandler: 279663 bytes in 0.19 seconds = 1457.33 K/sec
PyExpat w/ Start,End: 279663 bytes in 0.25 seconds = 1102.61 K/sec
PyExpat w/ Start,End,Char,PI: 279663 bytes in 0.36 seconds = 758.03 K/sec
Fast xmllib: 279663 bytes in 3.15 seconds = 86.66 K/sec
Slow xmllib: 279663 bytes in 17.77 seconds = 15.37 K/sec
Raw sgmlop: 279663 bytes in 0.02 seconds = 11004.42 K/sec
[amk@mira bench]$

Assuming no errors in the benchmark, xmllib on top of PyExpat should
be around half as fast as xmllib on top of sgmlop, probably roughly
40K/sec on my machine.  (That's just a guess, though.)  Like
economists, this benchmark probably points in several directions. :)

--amk


import os, time
from xml.parsers import pyexpat

f = open('hamlet.xml', 'r')
data = f.read()
size = f.tell()

def dummy(*args): pass

def print_duration(parser, duration):
    print '%s: %i bytes in %.02f seconds = %.02f K/sec' % (parser, size,
                                       duration, size/duration/1024.0)

parser = pyexpat.ParserCreate( )
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ null handlers', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ StartElementHandler', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
parser.EndElementHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ Start,End', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
parser.EndElementHandler = dummy
parser.CharacterDataHandler = dummy
parser.ProcessingInstructionHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ Start,End,Char,PI', time.time() - start)


from xml.parsers import xmllib
p = xmllib.FastXMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Fast xmllib', time.time() - start)

p = xmllib.SlowXMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Slow xmllib', time.time() - start)

import sgmlop
p = sgmlop.XMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Raw sgmlop', time.time() - start)
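
For anyone repeating the measurement on a current interpreter, here is a
sketch of just the PyExpat portion. Several things are assumptions: the
module is imported under its present-day name xml.parsers.expat, xmllib
and sgmlop are left out because they may not be installed, and a
synthetic document stands in for hamlet.xml. The numbers it prints are
therefore not comparable with the figures above.

```python
import time
from xml.parsers import expat

# Synthetic stand-in for hamlet.xml (an assumption, not the real file).
data = ("<PLAY>" + "<LINE>To be, or not to be</LINE>" * 5000 + "</PLAY>").encode("utf-8")

def dummy(*args):
    pass

def time_parse(handler_names):
    """Parse the document once with no-op handlers installed; return seconds."""
    parser = expat.ParserCreate()
    for name in handler_names:
        setattr(parser, name, dummy)
    start = time.perf_counter()
    parser.Parse(data, True)
    return time.perf_counter() - start

cases = [
    ("null handlers", ()),
    ("StartElementHandler", ("StartElementHandler",)),
    ("Start,End", ("StartElementHandler", "EndElementHandler")),
    ("Start,End,Char,PI", ("StartElementHandler", "EndElementHandler",
                           "CharacterDataHandler",
                           "ProcessingInstructionHandler")),
]

for label, names in cases:
    duration = time_parse(names)
    print("expat w/ %s: %i bytes in %.4f seconds" % (label, len(data), duration))
```

A single run per case is noisy; repeating each case several times and
taking the minimum would give steadier figures.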


From tpassin@idsonline.com  Wed Jan 19 04:34:56 2000
From: tpassin@idsonline.com (Thomas B. Passin)
Date: Tue, 18 Jan 2000 23:34:56 -0500
Subject: [XML-SIG] Developer's Day position paper
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com> <14468.45551.632590.597771@beluga.mojam.com> <3884BC1F.3CBD0B2B@prescod.net>
Message-ID: <002f01bf6236$8c674f40$342a08d1@tomshp>

Paul wrote:


> I agree with everything Skip says. We need something as fast and easy as
> xmllib. We also need something compliant to the Unicode and XML
> specifications. Does anyone disagree? Does anyone think that these goals
> are mutually exclusive?
>
> --

I agree - easy and reasonably fast.  There are lots of jobs out there that
use small xml files and need to be easy to get up and running. xmllib is
good for that.  I also agree on Unicode and the XML standards (I think we
can wait a while longer for the dust to settle on xml-schemas, though).  I
think that an XPath processor would be important since it could be the basis
of any number of query processors.

Another thought: compatibility with JPython.  Does anyone else think this
should count for anything?  Right now, xmllib (minus the C-based parsers)
ought to work with JPython, but the 4thought suite won't, since it needs to
compile C stuff with bison, etc.  I think we ought to have a basic library
that's easy to use and will work with both flavors of Python on any machine.
(On the other hand, JPython should make it relatively easy to work with all
those nice Java products.)  This leads me to think that we shouldn't rely on
the 4thought suite as the **sole** processors in the library.  Anyone want
to add some thoughts to this?

Tom Passin


From paul@prescod.net  Fri Jan 21 11:38:01 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 Jan 2000 03:38:01 -0800
Subject: [XML-SIG] xmllib on expat
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
 <14468.45551.632590.597771@beluga.mojam.com> <14469.11716.833194.855673@newcnri.cnri.reston.va.us>
Message-ID: <38884519.CEB31DE0@prescod.net>

I had just decided to do this "xmllib on top of PyExpat" idea this
afternoon! I was planning to make big changes to pyexpat but then it
looked like I wouldn't have to. The interfaces are pretty close. But I'm
actually having trouble getting PyExpat to work at all. Am I doing
something stupid?

I wanted to get it out tonight but it is getting very late so I'll have
to figure it out on a plane tomorrow and send it Saturday morning. :(

Here's my code:

import pyexpat

parser = pyexpat.ParserCreate()
def show( *x ):
	print x

parser.StartElementHandler=show
parser.EndElementHandler=show
parser.CharacterDataHandler=show
parser.ProcessingInstructionHandler=show

data=open( "hamlet.xml" ).read()
parser.Parse( data, 0 )

Nothing happens. Nothing ever gets printed.

I've already written the xmllib on pyexpat code so if I can get this
little example to work, I would be home free.

Also, I'm having trouble with pyexpat sometimes hanging depending on the
buffer length. That code needs a general cleanup, but it's only 500 lines,
so we're not talking about a lot of code.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Earth will soon support only survivor species -- dandelions, roaches, 
lizards, thistles, crows, rats. Not to mention 10 billion humans.
	- Planet of the Weeds, Harper's Magazine, October 1998


From paul@prescod.net  Fri Jan 21 11:37:34 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 21 Jan 2000 03:37:34 -0800
Subject: [XML-SIG] Developer's Day position paper
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
 <14468.45551.632590.597771@beluga.mojam.com> <14469.11716.833194.855673@newcnri.cnri.reston.va.us>
Message-ID: <388844FE.2B682DC7@prescod.net>

Andrew Kuchling wrote:
> 
> [amk@mira bench]$ python pyexp.py
> PyExpat w/ null handlers: 279663 bytes in 0.04 seconds = 6575.70 K/sec
> PyExpat w/ StartElementHandler: 279663 bytes in 0.19 seconds = 1457.33 K/sec
> PyExpat w/ Start,End: 279663 bytes in 0.25 seconds = 1102.61 K/sec
> PyExpat w/ Start,End,Char,PI: 279663 bytes in 0.36 seconds = 758.03 K/sec
> Fast xmllib: 279663 bytes in 3.15 seconds = 86.66 K/sec
> Slow xmllib: 279663 bytes in 17.77 seconds = 15.37 K/sec
> Raw sgmlop: 279663 bytes in 0.02 seconds = 11004.42 K/sec
> [amk@mira bench]$
> 
> Assuming no errors in the benchmark, xmllib on top of PyExpat should
> be around half as fast as xmllib on top of sgmlop, probably roughly
> 40K/sec on my machine.  (That's just a guess, though.)  Like
> economists, this benchmark probably points in several directions. :)

I don't see where you get that figure. Actual parsing takes up a small
fraction of xmllib's time. If you reduce that to zero you still don't
speed up the entire process much. If you double it, you don't slow down
the entire process much. If you double the 0.02 seconds (parsing time)
in the 3.15 (xmllib processing time) you change the time to 3.17 seconds
-- an increase of just 0.6%. (But it's late... I may be missing something.)
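
The arithmetic above, spelled out with the two figures taken from the
quoted benchmark output (0.02 seconds for raw sgmlop, 3.15 seconds for
fast xmllib). The variable names are only for illustration.

```python
parse_time = 0.02   # seconds, "Raw sgmlop" line of the benchmark
total_time = 3.15   # seconds, "Fast xmllib" line of the benchmark

# Doubling the parsing time adds one more parse_time to the total.
doubled_total = total_time + parse_time
increase_pct = 100.0 * parse_time / total_time

print("%.2f seconds, +%.1f%%" % (doubled_total, increase_pct))
```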

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Earth will soon support only survivor species -- dandelions, roaches, 
lizards, thistles, crows, rats. Not to mention 10 billion humans.
	- Planet of the Weeds, Harper's Magazine, October 1998


From guido@CNRI.Reston.VA.US  Fri Jan 21 19:01:08 2000
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Fri, 21 Jan 2000 14:01:08 -0500
Subject: [XML-SIG] Python Conference weather
Message-ID: <200001211901.OAA29501@eric.cnri.reston.va.us>

If you're traveling to the Python Conference, be advised that winter
has finally arrived in the Washington, DC area.  We're currently
experiencing *high* temperatures of 22 degrees F (-6 degrees C); with
the wind chill it will feel much colder.  So be sure to pack warm
clothes.

Yesterday, about 6 inches of snow fell, disrupting air travel; more is
expected on Sunday, so expect delays flying into DC.  Local
transportation should be fully operational, but may experience some
delays.

We've placed a weather advisory in the local section of the conference
website:

  http://www.python.org/workshops/2000-01/local.html

Over 250 people have registered for the conference.  See you all there!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From akuchlin@mems-exchange.org  Fri Jan 21 20:27:58 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 21 Jan 2000 15:27:58 -0500 (EST)
Subject: [XML-SIG] Developer's Day position paper
In-Reply-To: <388844FE.2B682DC7@prescod.net>
References: <200001180149.UAA01038@207-172-112-37.s37.tnt4.ann.va.dialup.rcn.com>
 <14468.45551.632590.597771@beluga.mojam.com>
 <14469.11716.833194.855673@newcnri.cnri.reston.va.us>
 <388844FE.2B682DC7@prescod.net>
Message-ID: <14472.49486.805766.80052@amarok.cnri.reston.va.us>

Paul Prescod writes:
>I don't see where you get that figure. Actual parsing takes up a small
>fraction of xmllib's time. If you reduce that to zero you still don't
>speed up the entire process much. If you double it, you don't slow

<slaps self silly>  Of course, you're right; the vast majority of the
time is spent in calling Python code, not parsing, and the calling
time shouldn't scale as badly as I thought.  

I've been reading through the sgmlop.c code.  While it probably
wouldn't be too difficult to add the missing features (there are
already FIXMEs in the code where you would do DTD-related parsing) and
tighten up the parser's strictness when in XML mode, it is a
significant bit of work.  Also, given that Expat handles multiple
encodings, while sgmlop gains its speed from being able to use code
like this:

            while (ISALNUM(*p) || *p == '.')
                if (++p >= end)
                    goto eol;

I wonder if the added dereferences of a character encoding translation
table would knock sgmlop's speed down to Expat's?

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
It would be nice to be unfailingly, perpetually, remorselessly funny, day in
and day out, year in and year out until somebody murdered you, now wouldn't it?
    -- Robertson Davies, _The Diary of Samuel Marchbanks_


From gstein@lyra.org  Sat Jan 22 02:48:51 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 21 Jan 2000 18:48:51 -0800 (PST)
Subject: [XML-SIG] xmllib on expat
In-Reply-To: <38884519.CEB31DE0@prescod.net>
Message-ID: <Pine.LNX.4.10.10001211846460.13911-100000@nebula.lyra.org>

On Fri, 21 Jan 2000, Paul Prescod wrote:
>...
> Here's my code:
> 
> import pyexpat
> 
> parser = pyexpat.ParserCreate()
> def show( *x ):
> 	print x
> 
> parser.StartElementHandler=show
> parser.EndElementHandler=show
> parser.CharacterDataHandler=show
> parser.ProcessingInstructionHandler=show
> 
> data=open( "hamlet.xml" ).read()
> parser.Parse( data, 0 )

Try changing that 0 to a 1... meaning "end of input." It may be possible
that Expat is buffering the input for some weird reason.

Otherwise, the code looks fine to me. qp_xml uses pyexpat if you want a
short, simple reference.

  http://www.lyra.org/greg/python/qp_xml.py
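
For reference, a minimal sketch of the end-of-input suggestion above. It
assumes the xml.parsers.expat module of current Python rather than the
1.5-era pyexpat import, and the inline sample document is a made-up stand-in
for hamlet.xml:

```python
import xml.parsers.expat

starts, ends, text = [], [], []

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: starts.append(name)
parser.EndElementHandler = lambda name: ends.append(name)
parser.CharacterDataHandler = lambda data: text.append(data)

# Hypothetical stand-in for the contents of hamlet.xml.
data = "<play><title>Hamlet</title></play>"

# The second argument is the "final" flag: a true value tells Expat that no
# more input follows, so it can finish the document and report any errors.
parser.Parse(data, True)

print(starts, ends, "".join(text))
```

When feeding a file in chunks, the same idea applies: pass a false flag for
every chunk and a true flag on the last call.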

>...
> Also, I'm having trouble with pyexpat sometimes hanging depending on the
> buffer length. That code needs a general cleanup but it's only 500 lines
> so we're not talking about a lot of code.

Never seen this.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From anthony@interlink.com.au  Sat Jan 22 03:13:49 2000
From: anthony@interlink.com.au (Anthony Baxter)
Date: Sat, 22 Jan 2000 14:13:49 +1100
Subject: [XML-SIG] request for advice on decoding Zope XML exports.
Message-ID: <200001220313.OAA30273@mbuna.arbhome.com.au>

I'd like to put together something to parse the XML format export that
Zope can produce. Unfortunately, I've not yet delved into the wonders
(if that is the word) of the XML-SIG's work, and I was wondering if
there's pointers to some simple examples of this sort of thing out
there...

thanks,
Anthony


From larsga@garshol.priv.no  Sat Jan 22 09:06:19 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 22 Jan 2000 10:06:19 +0100
Subject: [XML-SIG] request for advice on decoding Zope XML exports.
In-Reply-To: <200001220313.OAA30273@mbuna.arbhome.com.au>
References: <200001220313.OAA30273@mbuna.arbhome.com.au>
Message-ID: <m366wmear8.fsf@lambda.garshol.priv.no>

* Anthony Baxter
|
| I'd like to put together something to parse the XML format export
| that Zope can produce. Unfortunately, I've not yet delved into the
| wonders (if that is the word) of the XML-SIG's work, and I was
| wondering if there's pointers to some simple examples of this sort
| of thing out there...

There are some examples in the XML-SIG package in the demo directory.
The quotes and xbel directories contain SAX examples, whereas the dom
directory has a html2html sample that might be useful. 

There don't seem to be any qp_xml samples, but perhaps there should
be?

--Lars M.


From larsga@garshol.priv.no  Sat Jan 22 16:47:11 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 22 Jan 2000 17:47:11 +0100
Subject: [XML-SIG] XBEL support in Jazilla
Message-ID: <m3k8l2jbow.fsf@lambda.garshol.priv.no>

As of Jazilla 0.2, it supports XBEL as a bookmark format. This might
be worth listing on the XBEL pages.

  <URL: http://jazilla.sourceforge.net/ >

--Lars M.


From paul@prescod.net  Mon Jan 24 15:56:47 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 24 Jan 2000 09:56:47 -0600
Subject: [XML-SIG] Expat as xmllib
Message-ID: <388C763F.13264AF0@prescod.net>

This is a multi-part message in MIME format.
--------------E2834FEC56D5F06E9B5E259A
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

The attached library allows expat to be used as a basis for a parser
with the xmllib interface.
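
A rough sketch of the idea in current Python, for readers without the
attachment at hand. It assumes the xml.parsers.expat module; the class names
ExpatAdapter and Collector are hypothetical and only illustrate how
xmllib-style method names can be mapped onto Expat handler slots:

```python
import xml.parsers.expat

# xmllib/sgmlop-style method name -> Expat handler attribute
HANDLER_MAP = [
    ("finish_starttag", "StartElementHandler"),
    ("finish_endtag", "EndElementHandler"),
    ("handle_data", "CharacterDataHandler"),
    ("handle_proc", "ProcessingInstructionHandler"),
]


class ExpatAdapter:
    """Hypothetical adapter exposing register/feed/close on top of Expat."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()

    def register(self, consumer):
        # Install only the callbacks the consumer actually defines.
        for old_name, expat_name in HANDLER_MAP:
            method = getattr(consumer, old_name, None)
            if method is not None:
                setattr(self._parser, expat_name, method)

    def feed(self, data):
        self._parser.Parse(data, False)   # more input may follow

    def close(self):
        self._parser.Parse("", True)      # signal end of input


class Collector:
    """Hypothetical consumer that records the events it receives."""

    def __init__(self):
        self.items = []

    def finish_starttag(self, tag, attrs):
        self.items.append(("start", tag, attrs))

    def finish_endtag(self, tag):
        self.items.append(("end", tag))

    def handle_data(self, data):
        self.items.append(("data", data))


collector = Collector()
adapter = ExpatAdapter()
adapter.register(collector)
adapter.feed("<a x='1'>hi")
adapter.feed("</a>")
adapter.close()
print(collector.items)
```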

Performance:

Without any xmllib-specific optimization, pyexpat runs almost as fast as
sgmlop:

raw sgmlop: 13222 items; 0.426 seconds; 1281.29 kbytes per second
fast xmllib: 13222 items; 1.445 seconds; 378.03 kbytes per second
slow xmllib: 13222 items; 6.651 seconds; 82.11 kbytes per second
pyexpat: 13210 items; 1.527 seconds; 357.68 kbytes per second

I can think of several optimizations that could speed it up quite a bit.
Also if you compare it to the xmllib in the standard distribution, we
are talking night and day so if we bundle expat we're only improving
things for them.

Conformance

Pyexpat caught more errors than xmllib, was more accepting of legal XML
input (e.g. <?foo?>) and handled entities (especially character
entities) in a manner consistent with the XML specification.

These explain the difference in the number of "items" above.
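
A small sketch of both behaviours, assuming the xml.parsers.expat module of
current Python (the sample documents and the parse helper are invented for
illustration):

```python
import xml.parsers.expat


def parse(document):
    """Parse a whole document and return the (target, data) PIs seen."""
    pis = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = (
        lambda target, data: pis.append((target, data)))
    parser.Parse(document, True)
    return pis


# A processing instruction with a target and no data is legal XML.
pis = parse("<doc><?foo?></doc>")

# A document that is not well-formed (mismatched tags) is rejected.
try:
    parse("<doc><open></doc>")
    rejected = False
except xml.parsers.expat.ExpatError:
    rejected = True

print(pis, rejected)
```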

Backwards Compatibility

The only big compatibility difference between xmllib on pyexpat and
xmllib on sgmlop is that expat expands entity references like &amp; to
"&" in the character data instead of reporting them as a separate event.
This is actually a feature of expat, because it is doing entity
expansion *for you*. The XML spec requires this behavior.
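
To make the entity behaviour concrete, here is a short sketch assuming the
xml.parsers.expat module of current Python (the sample document is made up).
Expat may deliver the text in several pieces, so the pieces are joined before
use:

```python
import xml.parsers.expat

chunks = []
parser = xml.parsers.expat.ParserCreate()
# The expanded "&" arrives through the ordinary character data handler;
# no separate entity-reference event is involved.
parser.CharacterDataHandler = chunks.append
parser.Parse("<menu>fish &amp; chips</menu>", True)

text = "".join(chunks)
print(text)
```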

The library and a test program are attached.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Earth will soon support only survivor species -- dandelions, roaches, 
lizards, thistles, crows, rats. Not to mention 10 billion humans.
	- Planet of the Weeds, Harper's Magazine, October 1998
--------------E2834FEC56D5F06E9B5E259A
Content-Type: text/plain; charset=us-ascii;
 name="ExpatOp.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="ExpatOp.py"

from xml.parsers import xmllib
import pyexpat

handlerMap=[("finish_starttag", "StartElementHandler"),
	    ("finish_endtag", "EndElementHandler"),
	    ("handle_data","CharacterDataHandler"),
	    ("handle_proc","ProcessingInstructionHandler")]

class ExpatPretendingToBeSGMLOp:
	def __init__(self, encoding=None):
		if encoding:
			self.pyexpat=pyexpat.ParserCreate(encoding)
		else:
			self.pyexpat=pyexpat.ParserCreate()
	def close( self ):
		self.pyexpat.Parse( "", 1 )
	def parse( self, data ):
		self.pyexpat.Parse( data, 1 )
	def feed( self, data ):
		self.pyexpat.Parse( data, 0 )
	def register( self, obj ):
		for oldname,newname in handlerMap:
			method=getattr( obj, oldname, None )
			setattr( self.pyexpat, newname, method )

class XMLParser( xmllib.FastXMLParser ):
	def reset( self ):
		xmllib.FastXMLParser.reset(self)
		self.parser=ExpatPretendingToBeSGMLOp()
		self.feed=self.parser.pyexpat.Parse
		self.parser.register( self )

if __name__=="__main__":
	import sys
	junk = open( "out.tmp","w")

	if len( sys.argv )>1:
		filename=sys.argv[1]
	else:
		filename="hamlet.xml"

	class myparser( XMLParser ):
		def handle_proc(self, target,data):
			junk.write( "\n?"+target+data )
		def handle_data( self, data):
			junk.write( "\n'"+data)
		def finish_starttag(self,gi,attrs):
			junk.write( "\n<>"+gi+ `attrs` )
		def finish_endtag( self, gi ):
			junk.write( "\n</>"+gi )

	myparser().feed( open( filename).read() )


--------------E2834FEC56D5F06E9B5E259A
Content-Type: text/plain; charset=us-ascii;
 name="testxml1.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="testxml1.py"

# basic tests

test_sgmlop = 1

import sys
import time, string
from xml.parsers import sgmlop, xmllib, ExpatOp

try:
    FILE, VERBOSE = sys.argv[1], 2
except IndexError:
    FILE, VERBOSE = "hamlet.xml", 1

print
print "test collecting parsers on", FILE
print

# --------------------------------------------------------------------
# sgmlop

class myCollector:
    def __init__(self):
        self.data = []
        self.text = []
    def finish_starttag(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("start", tag, data))
    def handle_proc(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("pi", tag, data))
    def handle_special(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("special", data))
    def handle_entityref(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("entity", data))
    def handle_data(self, data):
        self.text.append(data)
    def handle_cdata(self, data):
        self.text.append("CDATA" + data)

def doRawSGMLOp():
	global parser
	t = time.clock()
	for i in range(1):
	    out = myCollector()
	    fp = open(FILE)
	    parser = sgmlop.XMLParser()
	    parser.register(out)
	    b = 0
	    while 1:
		data = fp.read(512)
		if not data:
		    break
		parser.feed(data)
		b = b + len(data)
	    parser.close()
	t1 = time.clock() - t

	print "raw sgmlop:", len(out.data), "items;", round(t1, 3), "seconds;",
	print round(b / t1 / 512, 2), "kbytes per second"
	return t1

# --------------------------------------------------------------------
# xmllib

base=None

def makeparser( basecls ):
	global base
	base=basecls
	class FastXMLParser(base):
	    def __init__(self):
		base.__init__(self)
		self.data = []
		self.text = []
	    def unknown_starttag(self, tag, data):
		if self.text:
		    self.data.append(repr(string.join(self.text, "")))
		    self.text = []
		self.data.append(("start", tag, data))
	    def handle_proc(self, tag, data):
		if self.text:
		    self.data.append(repr(string.join(self.text, "")))
		    self.text = []
		self.data.append(("pi", tag, data))
	    def handle_special(self, data):
		if self.text:
		    self.data.append(repr(string.join(self.text, "")))
		    self.text = []
		self.data.append(("special", data))
	    def handle_entityref(self, data):
		if self.text:
		    self.data.append(repr(string.join(self.text, "")))
		    self.text = []
		self.data.append(("entity", data))
	    def handle_data(self, data):
		self.text.append(data)
	    def handle_cdata(self, data):
		self.text.append("CDATA" + data)
	return FastXMLParser

def doFastXMLLib():
	global parser2

	FastXMLParser = makeparser( xmllib.FastXMLParser )

	t = time.clock()
	for i in range(1):
	    fp = open(FILE)
	    parser2 = FastXMLParser()
	    b = 0
	    while 1:
		data = fp.read(512)
		if not data:
		    break
		parser2.feed(data)
		b = b + len(data)
	    parser2.close()
	t2 = time.clock() - t

	print "fast xmllib:", len(parser2.data), "items;", round(t2, 3), "seconds;",
	print round(b / t2 / 512, 2), "kbytes per second"
	return t2

class SlowXMLParser(xmllib.SlowXMLParser):
    def __init__(self):
        xmllib.SlowXMLParser.__init__(self)
        self.data = []
        self.text = []
    def unknown_starttag(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("start", tag, data))
    def handle_proc(self, tag, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("pi", tag, data))
    def handle_special(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("special", data))
    def handle_entityref(self, data):
        if self.text:
            self.data.append(repr(string.join(self.text, "")))
            self.text = []
        self.data.append(("entity", data))
    def handle_data(self, data):
        self.text.append(data)
    def handle_cdata(self, data):
        self.text.append("CDATA" + data)

def doSlowXMLLib():
	global parser3
	t = time.clock()
	for i in range(1):
	    fp = open(FILE)
	    parser3 = SlowXMLParser()
	    b = 0
	    while 1:
		data = fp.read(512)
		if not data:
		    break
		parser3.feed(data)
		b = b + len(data)
	    parser3.close()
	t3 = time.clock() - t

	print "slow xmllib:", len(parser3.data), "items;", round(t3, 3), "seconds;",
	print round(b / t3 / 512, 2), "kbytes per second"
	return t3

def doPyExpat():
	global parser4
	# PyExpat
	FastXMLParser = makeparser( ExpatOp.XMLParser )

	t = time.clock()
	for i in range(1):
	    fp = open(FILE)
	    parser4 = FastXMLParser()
	    b = 0
	    while 1:
		data = fp.read(512)
		if not data:
		    break
		parser4.feed(data)
		b = b + len(data)
	    parser4.close()
	t4 = time.clock() - t

	print "pyexpat:", len(parser4.data), "items;", round(t4, 3), "seconds;",
	print round(b / t4 / 512, 2), "kbytes per second"
	return t4

t1=doRawSGMLOp()
t2=doFastXMLLib()
t3=doSlowXMLLib()
t4=doPyExpat()

print
print "normalized timing:"
print "slow xmllib", 1.0
print "fast xmllib", round(t2 / t3, 2), "(%sx)" % round(t3 / t2, 1)
print "sgmlop     ", round(t1 / t3, 2), "(%sx)" % round(t3 / t1, 1)
print "pyexpat ", round(t4 / t3, 2), "(%sx)" % round(t3 / t4, 1)
print

print "looking for differences:"

# compare fast xmllib (parser2) against pyexpat (parser4)
items = min(len(parser2.data), len(parser4.data))

for i in xrange(items):
    if parser2.data[i] != parser4.data[i]:
        for j in range(max(i-5, 0), min(i+5, items)):
            if parser2.data[j] != parser4.data[j]:
                print "+", j+1, parser2.data[j]
                print "*", j+1, parser4.data[j]
            else:
                print "=", j+1, parser2.data[j]
        break
else:
    print "(no differences)"


--------------E2834FEC56D5F06E9B5E259A--


From akuchlin@mems-exchange.org  Mon Jan 24 16:49:51 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Mon, 24 Jan 2000 11:49:51 -0500 (EST)
Subject: [XML-SIG] PyXML 0.5.3 released
Message-ID: <200001241649.LAA12996@amarok.cnri.reston.va.us>

I've released a new snapshot of the PyXML package.  
The changes are pretty minor:

    * Fixed setup.py to work with the Distutils, following suggestions
from Greg Ward
    * Dropped the xmlarch code, as previously discussed
    * Started signing the distribution with my GnuPG key

That's about it.  I'll try to add the Expat/xmllib code for the next
snapshot.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
To see the world in a grain of sand, / And a heaven in a wild flower; / Hold
infinity in the palm of your hand, / And eternity in an hour.
    -- William Blake, "Auguries of Innocence"


From akuchlin@mems-exchange.org  Mon Jan 24 17:35:52 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Mon, 24 Jan 2000 12:35:52 -0500 (EST)
Subject: [XML-SIG] Expat as xmllib
In-Reply-To: <388C763F.13264AF0@prescod.net>
References: <388C763F.13264AF0@prescod.net>
Message-ID: <14476.36216.685678.946314@amarok.cnri.reston.va.us>

Paul Prescod writes:
>Without any xmllib-specific optimization, pyexpat runs almost as fast as
>sgmlop:
>raw sgmlop: 13222 items; 0.426 seconds; 1281.29 kbytes per second
>fast xmllib: 13222 items; 1.445 seconds; 378.03 kbytes per second
>slow xmllib: 13222 items; 6.651 seconds; 82.11 kbytes per second
>pyexpat: 13210 items; 1.527 seconds; 357.68 kbytes per second
>I can think of several optimizations that could speed it up quite a bit.

21K/sec difference, or around 6% slower; very good.  Let's discuss
these optimizations at IPC8; I'd like to get a version of this into
the CVS tree ASAP.
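
As a quick check of that arithmetic, using the two throughput figures quoted
above (the variable names are just for illustration):

```python
# Throughput figures from the quoted benchmark, in kbytes per second.
fast_xmllib = 378.03
pyexpat_rate = 357.68

difference = fast_xmllib - pyexpat_rate
percent_slower = difference / fast_xmllib * 100

print(round(difference, 2), "kbytes/sec difference")
print(round(percent_slower, 1), "percent slower than fast xmllib")
```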

>Also if you compare it to the xmllib in the standard distribution, we
>are talking night and day so if we bundle expat we're only improving
>things for them.

Note that the xmllib in 1.5.2 and xml.parsers.xmllib are different;
namespace support has been added to the 1.5.2 version.  This is a
divergence that's needed fixing for a while, and now seems like a good
opportunity.

Is Expat becoming a fairly common component of Linux and *BSD
distributions?  I still dislike the idea of adding Expat to the Python
distribution, because of possible collisions with updated versions of
Expat.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
And at times the fact of her absence will hit you like a blow to the chest,
and you will weep. But this will happen less and less as time goes on.
    -- From SANDMAN: "The Song of Orpheus"


From paul@prescod.net  Tue Jan 25 06:15:49 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 25 Jan 2000 01:15:49 -0500
Subject: [XML-SIG] Expat as xmllib
References: <388C763F.13264AF0@prescod.net> <14476.36216.685678.946314@amarok.cnri.reston.va.us>
Message-ID: <388D3F95.82F32C4@prescod.net>

"Andrew M. Kuchling" wrote:
> 
> 21K/sec difference, or around 6% slower; very good.  Let's discuss
> these optimizations at IPC8; I'd like to get a version of this into
> the CVS tree ASAP.

The optimizations I am thinking about are in pyexpat itself. It also
needs better error handling.

> Note that the xmllib in 1.5.2 and xml.parsers.xmllib are different;
> namespace support has been added to the 1.5.2 version.  This is a
> divergence that's needed fixing for a while, and now seems like a good
> opportunity..

Expat can do namespaces for us in C. 
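
A minimal sketch of what that looks like from Python, assuming the
xml.parsers.expat module of current Python and its namespace_separator
argument (the namespace URI and element names are made up):

```python
import xml.parsers.expat

names = []

# With a namespace separator set, Expat resolves prefixes itself and
# reports element names as "<namespace URI><separator><local name>".
parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.Parse('<x:item xmlns:x="urn:example:ns"><plain/></x:item>', True)

print(names)
```

Elements that are in no namespace are reported with their plain name.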

> Is Expat becoming a fairly common component of Linux and *BSD
> distributions?  I still dislike the idea of adding Expat to the Python
> distribution, because of possible collisions with updated versions of
> Expat.

I'm not convinced that this is a big problem but let's just say it is.
How hard would it be to rename exported object names and the final
library name?  It seems like it would be useful (and reasonably doable)
to create a tool that uniqifies dynamic libraries in general. You can
get the names using object-file reading tools or by parsing the text.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
The new revolutionaries believe the time has come for an aggressive 
move against our oppressors. We have established a solid beachhead 
on Friday. We now intend to fight vigorously for 'casual Thursdays.'
  -- who says America's revolutionary spirit is dead?


From ke@gnu.franken.de  Tue Jan 25 18:18:30 2000
From: ke@gnu.franken.de (Karl EICHWALDER)
Date: 25 Jan 2000 19:18:30 +0100
Subject: [XML-SIG] Re: Expat as xmllib
In-Reply-To: "Andrew M. Kuchling"'s message of "Mon, 24 Jan 2000 12:35:52 -0500 (EST)"
References: <388C763F.13264AF0@prescod.net> <14476.36216.685678.946314@amarok.cnri.reston.va.us>
Message-ID: <sh901erp55.fsf@tux.gnu.franken.de>

"Andrew M. Kuchling" <akuchlin@mems-exchange.org> writes:

|   Is Expat becoming a fairly common component of Linux and *BSD
|   distributions?

Debian and SuSE are featuring expat; SuSE with the next release,
scheduled for March 2000 (IIRC).  I didn't check other distributions.

-- 
work : ke@suse.de                          |
     : http://www.suse.de/~ke/             |          ------    ,__o
home : ke@gnu.franken.de                   |         ------   _-\_<,
     : http://www.franken.de/users/gnu/ke/ |        ------   (*)/'(*)


From uche.ogbuji@fourthought.com  Wed Jan 26 18:35:18 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 26 Jan 2000 11:35:18 -0700
Subject: [XML-SIG] ANN: 4DOM 0.9.2
Message-ID: <388F3E66.ADEC966D@fourthought.com>

FourThought LLC (http://FourThought.com) announces the release of

                             4DOM 0.9.2
                      -----------------------
                An XML/HTML Python library using the
                  Document Object Model interface

4DOM is a Python library for XML and HTML processing and manipulation
using the W3C's Document Object Model for interface.  4DOM implements
DOM Core level 2, HTML level 2 and Level 2 Document Traversal.

4DOM should work on all platforms supported by Python.  If you have
any problems with a particular platform, please e-mail the authors.

4DOM is designed to allow developers to rapidly design applications
that read, write or manipulate HTML and XML.

News
----

 - Major fixes to namespace code
 - Other bug-fixes

More info and Obtaining 4DOM
----------------------------

Please see

        http://FourThought.com/4Suite/4DOM

Or you can download 4DOM from

        ftp://FourThought.com/pub/4Suite/4DOM

4DOM is distributed under a license similar to that of Python.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Wed Jan 26 18:41:48 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 26 Jan 2000 11:41:48 -0700
Subject: [XML-SIG] ANN: 4XSLT 0.8.2 and 4XPath 4.8.2
Message-ID: <388F3FEC.C2256552@fourthought.com>

FourThought LLC (http://FourThought.com) announces the release of

                      4XSLT and 4XPath 0.8.2
                      ----------------------
                      A python implementation
                     of the W3C's XSLT language


4XSLT is an XML transformation processor based on the W3C's
specification for the XSLT transform language.  4XPath implements the
W3C XPath language for indicating and selecting XML document components.
 
http://www.w3.org/TR/xslt 
 
4XPath implements the full XPath recommendation except for the 'lang'
core function.

Currently, 4XSLT supports a sub-set of the XSLT recommendation including
the following:

Full expression support and attribute-value template expansion
xsl:include                     xsl:import
xsl:template                    xsl:apply-imports
xsl:apply-templates             xsl:copy
xsl:call-template               xsl:if
xsl:for-each                    xsl:choose 
xsl:element                     xsl:when
xsl:attribute                   xsl:otherwise
xsl:text                        xsl:message
xsl:value-of                    xsl:variable   
xsl:processing-instruction      xsl:param
xsl:comment                     xsl:with-param
xsl:strip-space                 xsl:key
xsl:preserve-space              xsl:copy-of
xsl:sort                        xsl:namespace-alias
xsl:output
  
and, of course, xsl:stylesheet, xsl:transform, literal elements and text

Using the xml output method, 4XSLT produces the result tree by throwing
events from the emerging SAX 2 standard to a handler, so it can be
easily modified to supply results to any SAX 2 consumer.  For the 'html'
and 'text' output methods special SAX consumers produce HTML DOM nodes
and plain text respectively.

Note: 4XSLT and 4XPath cannot work with JPython.

News
----
                     
Changes in 0.8.2
----------------

 - Added i18n hooks
 - Added support for terminate option on xsl:message 
 - Added more error checks
 - Fixed attribute-value templates
 - Fixed params
 - Kludge to avoid strtod('NaN') problem on Windows and FreeBSD:
   hopefully temporary
 - Bug-fixes

More info and Obtaining 4XPath and 4XSLT
----------------------------------------

Please see

        http://FourThought.com/4Suite/4XPath
        http://FourThought.com/4Suite/4XSLT

Or you can download 4XPath and 4XSLT from

        ftp://FourThought.com/pub/4Suite/4XPath
        ftp://FourThought.com/pub/4Suite/4XSLT 

4XPath and 4XSLT are distributed under a license similar to that of
Python.


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From forkel@arsnova.de  Wed Jan 26 19:09:47 2000
From: forkel@arsnova.de (Malte Forkel)
Date: Wed, 26 Jan 2000 20:09:47 +0100
Subject: [XML-SIG] precompiled version?
Message-ID: <388F467B.B6758C2C@arsnova.de>

Hi,

any chance to find a precompiled version of the XML toolkit?
I'm using Python on Windows/NT.

Thanks, Malte


From akuchlin@mems-exchange.org  Fri Jan 28 18:07:30 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 28 Jan 2000 13:07:30 -0500 (EST)
Subject: [XML-SIG] DevDay results
Message-ID: <200001281807.NAA18796@amarok.cnri.reston.va.us>

The XML-SIG's developer's day session went well, and, unlike most DD
sessions, we actually achieved consensus on something. :) To summarize
the outcome:

    * The current PyDOM code will be dropped and replaced with 4DOM.
      The precise details of how this will work are still to be
      resolved; will the 4DOM code move into xml.dom, or will xml.dom
      import from xml.Ft.dom and provide some wrappers?

    * PyExpat's interface will be changed to be SAX-like, and we'll
      lobby Guido to add PyExpat to 1.6, along with Expat itself.  It
      will be renamed, preferably to something with SAX in the name.  
      (expat_sax? pysax?  pyxml? whatever...)  It'll be updated to
      support all the features in current versions of Expat; Jim
      Fulton has an updated version of PyExpat inside Zope that will
      probably be used.

    * xmllib.py will be left unmodified, though it'll be deprecated in
      favor of PyExpat.   
      
    * When 1.6 begins supporting Unicode, we'll fork the development
      tree into two branches; the branch that works with 1.5 will be
      maintained, though probably not actively developed.  This will
      leave the other branch free to use 1.6-specific features without
      worrying about backward compatibility.  

If I've forgotten something from the session, please let me know.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
First things first, but not necessarily in that order.
    -- The Doctor, in John Flanagan and Andrew McCulloch's _Meglos_


From jack@oratrix.nl  Fri Jan 28 21:59:45 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Fri, 28 Jan 2000 22:59:45 +0100
Subject: [XML-SIG] DevDay results
In-Reply-To: Message by "Andrew M. Kuchling" <akuchlin@mems-exchange.org> ,
 Fri, 28 Jan 2000 13:07:30 -0500 (EST) , <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID: <20000128215951.5DC70189FE1@oratrix.oratrix.nl>

Recently, "Andrew M. Kuchling" <akuchlin@mems-exchange.org> said:
>     * PyExpat's interface will be changed to be SAX-like, and we'll
>       lobby Guido to add PyExpat to 1.6, along with Expat itself.  It
>       will be renamed, preferably to something with SAX in the name.  
>       (expat_sax? pysax?  pyxml? whatever...)  It'll be updated to
>       support all the features in current versions of Expat; Jim
>       Fulton has an updated version of PyExpat inside Zope that will
>       probably be used.

A suggestion to whoever is going to implement this: if we're going to
include a private version of expat it's probably a good idea to change 
all the C global symbols. Expat is pretty popular, and I've been
bitten a few times by global symbol name clashes where Python used one 
version of a library and a package it used (or an application in which
Python was embedded) had incorporated a different version.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From paul@prescod.net  Fri Jan 28 22:04:59 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 28 Jan 2000 16:04:59 -0600
Subject: [XML-SIG] Expat answers
References: <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID: <3892128B.4C72799F@prescod.net>

Answers to questions that people have asked me about Expat:

1. Expat can parse DTDs if we want it to. If you compile DTD support in
then you can turn on or off parameter entity parsing on a per-parser
instance basis.

"""
XML_PARAM_ENTITY_PARSING_NEVER 
Don't parse parameter entities or the external subset 


XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE 
Parse parameter entities and the external subset unless standalone was
set to "yes" in the XML declaration. 


XML_PARAM_ENTITY_PARSING_ALWAYS 
Always parse parameter entities and the external subset 
"""

It doesn't validate but we could probably build that in Python.
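
For illustration, this is roughly how the per-parser switch can be set from
Python. The sketch assumes the xml.parsers.expat module of current Python,
which exposes the three constants above and a SetParamEntityParsing method;
the sample document is made up:

```python
import xml.parsers.expat

text = []
parser = xml.parsers.expat.ParserCreate()

# Must be called before parsing starts. The return value is true if the
# underlying Expat was built with DTD support and accepted the setting.
accepted = parser.SetParamEntityParsing(
    xml.parsers.expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)

parser.CharacterDataHandler = text.append
parser.Parse("<doc>no DTD here</doc>", True)

print(bool(accepted), "".join(text))
```

Actually reading an external subset would additionally need an external
entity handler, which this sketch leaves out.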

2. Expat can be compiled to output either UTF-8 or UTF-16 (which is for
our purposes the same as UCS-2). It is theoretically possible to make a
parser that understands Unicode enough to do proper well-formedness
checking yet leaves characters in their native encoding but as far as I
know, no such tool exists. I don't believe that sgmlop could ever be
that tool, even when it is rewritten on top of Fredrik's fast Unicode
regexp engine because that engine would still be UTF-16/UCS-2 specific.

If you need to process shift-JIS information then you need to allow
Expat to convert it to UTF-16 and then convert it back to shift-JIS. I
don't think that there is any XML parser in the world that allows you to
work in any arbitrary native encoding with no conversions. Maybe some
day.
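
The round trip itself is simple to sketch. The example below uses the codecs
built into current Python rather than Expat (an assumption made purely for
illustration), and the sample text is invented:

```python
# Shift-JIS bytes as they might arrive from a legacy source.
# The escapes spell the three characters of the word for "Japanese language".
original = "\u65e5\u672c\u8a9e".encode("shift_jis")

# Step 1: convert to Unicode for processing.
as_unicode = original.decode("shift_jis")

# The same text as UTF-16 code units, the form a UTF-16 build would hand out.
utf16_bytes = as_unicode.encode("utf-16-le")

# Step 2: convert back to Shift-JIS once processing is done.
round_tripped = as_unicode.encode("shift_jis")

print(round_tripped == original, len(utf16_bytes) // 2, "UTF-16 code units")
```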

Handling for non-Unicode character sets is simply not supported. The XML
world decided specifically against this based on two arguments:

 * one cannot argue against Unicode on the basis of character encoding
*efficiency* because we allow any encoding (even those compatible with
the Unicode subset of shift-JIS etc.) to be used.

 * one cannot argue against Unicode on the basis that it does not allow
"private" characters because it does:

http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Oct-1998/0366.html
http://www.ascc.net/xml/en/utf-8/faq/faq-xsl.html

3. Expat outputs UTF-16 so it is ready for 20-bit Unicode, wherein we
will find:

"Plane 1 is going to hold ancient and invented scripts and musical
symbols, while Plane 2 (U-0002xxxx) is reserved for additional Han
ideographs, Plane 14 (U-000Exxxx) is going to start with some meta
characters for language tagging and there are two entire bonus
private-use planes."

Python itself will not handle 20bit characters yet, so the situation
with them will be just like the situation with 16 bit characters in
Python/xmllib today (Python will think that they are two characters).
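
To see why such a character looks like two characters to 16-bit code, here is
a small sketch in current Python (chosen only for illustration) that encodes
a Plane 1 musical symbol as UTF-16:

```python
# U+1D11E MUSICAL SYMBOL G CLEF lives in Plane 1, outside the 16-bit range.
clef = "\U0001D11E"

units = clef.encode("utf-16-be")
code_units = [int.from_bytes(units[i:i + 2], "big")
              for i in range(0, len(units), 2)]

# One character becomes a surrogate pair: two 16-bit code units.
print([hex(u) for u in code_units])
```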

 Paul Prescod

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
The new revolutionaries believe the time has come for an aggressive 
move against our oppressors. We have established a solid beachhead 
on Friday. We now intend to fight vigorously for 'casual Thursdays.'
  -- who says America's revolutionary spirit is dead?


From paul@prescod.net  Mon Jan 31 08:46:59 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 02:46:59 -0600
Subject: [XML-SIG] Expat strategy
References: <20000128215951.5DC70189FE1@oratrix.oratrix.nl>
Message-ID: <38954C03.D8C05F1@prescod.net>

Jack Jansen wrote:
> 
> A suggestion to whoever is going to implement this: if we're going to
> include a private version of expat it's probably a good idea to change
> all the C global symbols. Expat is pretty popular, and I've been
> bitten a few times by global symbol name clashes where Python used one
> version of a library and a package used (or embedded in an application
> in which Python was also embedded) had incorporated a different version.

1. Exports

There was some debate about what would happen if we statically linked
pyexpat to xmlparse.dll. I am confident that we could, on most
reasonable platforms, export only the symbols Python needs to bootstrap
and not all of Expat's static symbols. It is routine on Windows to
statically link to a C library without worrying about conflicts with
"open". 

Perl's expat.dll exports exactly two names: _boot_XML__Parser__Expat and
_boot_XML__Parser__Expat. BTW, it's 112K.

Anyhow, I count 49 exported symbols and all of them begin with the
prefix XML_ so they can be safely renamed with 49 #defines if we decide
it is necessary. That's ugly but safe and effective.
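
If the renaming does turn out to be necessary, the #defines would not
have to be written by hand. A minimal sketch in Python, assuming a
made-up "PyExpat_" prefix and listing only a handful of the 49 symbols
for illustration (the real list would come from the library's export
table):

```python
# Sketch only: emit one #define per exported Expat symbol so that a
# private copy of the library cannot clash with another copy.
# PREFIX is a hypothetical name; SYMBOLS is a small sample, not all 49.
PREFIX = "PyExpat_"

SYMBOLS = [
    "XML_ParserCreate",
    "XML_ParserFree",
    "XML_Parse",
    "XML_SetElementHandler",
    "XML_SetCharacterDataHandler",
    "XML_ErrorString",
]

def rename_header(symbols, prefix=PREFIX):
    """Return the text of a C header that renames each symbol."""
    lines = ["/* Generated: rename Expat's global symbols. */"]
    for name in sorted(symbols):
        # Guard against symbols that do not follow the XML_ convention.
        if not name.startswith("XML_"):
            raise ValueError("unexpected symbol: %s" % name)
        lines.append("#define %s %s%s" % (name, prefix, name))
    return "\n".join(lines) + "\n"

header = rename_header(SYMBOLS)
print(header)
```

The generated header would have to be included before Expat's own
headers, in both the library sources and the wrapper module.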

2. API

We had talked of embedding SAX directly in PyExpat but in retrospect I
don't think that there is any need to do so. We can layer SAX 1 and 2 on
top of a transliterated Expat API without any loss of performance. This
is true because of Expat's handler architecture. Even if you layer
xmllib on top of sax 1 on top of another implementation of xmllib on top
of another layer of sax 2 on top of expat, you get high performance if
the "handler" is the same method at all levels. In other words, we can
"wrap" expat at the Python level without doing any proxying of events.

I'm only mentioning xmllib to emphasize the point that the number of
layers doesn't matter because you don't lose performance in the layers.
I'm not proposing that we layer xmllib on top of Expat.

If you pass a method "foo" to xmllib as finish_starttag and it passes it
to sax 2 as SAX2_StartElement and it passes it to SAX1 as
SAX1_StartElement which passes it to Expat as XML_SetElementHandler, you
still only get one Python function call per element in the document.

So let's expose the raw Expat API and build SAX 1 and SAX 2 layers on
top of it. 
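
To make the "no proxying" point concrete, here is a rough sketch
written against the xml.parsers.expat module of current Python 3, not
the PyExpat build under discussion. The ThinSax and ThinXmllib classes
and their method names are hypothetical; each layer only forwards the
caller's function downwards, so Expat ends up holding the original
function itself:

```python
import xml.parsers.expat

class ThinSax:
    """Hypothetical SAX-flavoured layer that proxies no events."""
    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()

    def set_start_element(self, handler):
        # Hand the caller's function directly to Expat.
        self._parser.StartElementHandler = handler

    def parse(self, data):
        self._parser.Parse(data, True)

class ThinXmllib:
    """A second hypothetical layer stacked on top of the first."""
    def __init__(self):
        self._sax = ThinSax()

    def set_finish_starttag(self, handler):
        # Pass the same function object one level down, unchanged.
        self._sax.set_start_element(handler)

    def parse(self, data):
        self._sax.parse(data)

calls = []

def foo(name, attrs):
    # Called once per start tag, directly by the parser.
    calls.append(name)

layered = ThinXmllib()
layered.set_finish_starttag(foo)
layered.parse("<abc><def></def></abc>")
print(calls)
```

This only holds while every layer can accept the handler with the same
signature; as soon as a layer has to adapt the arguments, it needs a
wrapper function and the extra call per event comes back.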

3. Error handling

PyExpat is one of a very few modules in the library to use setjmp. It
uses it for error handling and I'm not sure if there is any way around
it so I won't advocate its removal unless someone can propose a better
way.  I'm not clear how to signal to expat that it should quit parsing
other than through setjmp/longjmp.

In general, though, error handling doesn't seem to work for me:

>>> from xml.parsers.pyexpat import ParserCreate, ErrorString
>>> p=ParserCreate()
>>> p.foo="abc"
Traceback (innermost last):
  File "<stdin>", line 1, in ?
SystemError: error return without exception set
>>> p.StartElementHandler=junk
>>> p.Parse( "<a></a>" )
0
>>> from xml.parsers.pyexpat import ParserCreate, ErrorString
>>> p=ParserCreate()
>>> def junk2(a,b):
...     print a,b
...     assert 0
...
>>> p.StartElementHandler=junk2
>>> print p.Parse( "<abc><def></def></abc>", 1 )
abc []
def []
1

Errors in the Python handlers do not abort the parse as they should, 
despite the setjmp/longjmp. I am guessing that this is due to the fact 
that the call goes across Windows DLL boundaries. If that's really all
it is then it will work better once we statically link to expat. I'd
still rather not use setjmp/longjmp if there was a way around it...

if (rv == NULL) {
	if (self->jmpbuf_valid)
		longjmp(self->jmpbuf, 1);
	My_WriteStderr("Exception in CharacterDataHandler()\n");
	PyErr_Clear();
}

One funny thing is the code after the longjmp. I guess maybe it's a
fallback for when the long-jump doesn't work. It doesn't seem to work on
Windows, though.
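
For comparison, a sketch of the behaviour being asked for, written
against the xml.parsers.expat module of current Python 3 rather than
the PyExpat build shown in the transcript above. The handler mirrors
junk2; the names seen and aborted are only there for illustration. An
exception raised inside the handler is expected to surface from
Parse() instead of being swallowed:

```python
import xml.parsers.expat

seen = []

def junk2(name, attrs):
    # Record the element, then fail on purpose.
    seen.append(name)
    raise AssertionError("handler failed on <%s>" % name)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = junk2

try:
    p.Parse("<abc><def></def></abc>", True)
    aborted = False
except AssertionError as exc:
    # The handler's own exception reaches the caller of Parse().
    aborted = True
    print("parse aborted:", exc)

print(seen)
```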

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
The new revolutionaries believe the time has come for an aggressive 
move against our oppressors. We have established a solid beachhead 
on Friday. We now intend to fight vigorously for 'casual Thursdays.'
  -- who says America's revolutionary spirit is dead?


From larsga@garshol.priv.no  Mon Jan 31 09:15:59 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 31 Jan 2000 10:15:59 +0100
Subject: [XML-SIG] DevDay results
In-Reply-To: <200001281807.NAA18796@amarok.cnri.reston.va.us>
References: <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID: <m3g0ve7ga8.fsf@lambda.garshol.priv.no>

* Andrew M. Kuchling
| 
|     * PyExpat's interface will be changed to be SAX-like, and we'll
|       lobby Guido to add PyExpat to 1.6, along with Expat itself.  It
|       will be renamed, preferably to something with SAX in the name.  
|       (expat_sax? pysax?  pyxml? whatever...)

Saxpat?

--Lars M.


From paul@prescod.net  Mon Jan 31 09:08:00 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 03:08:00 -0600
Subject: [XML-SIG] DevDay results
References: <200001281807.NAA18796@amarok.cnri.reston.va.us>
Message-ID: <389550F0.5D1AE3F@prescod.net>

I don't remember if we achieved clear consensus on whether to bundle the
DOM or anything else into 1.6 along with Python. I think we were leaning
towards bundling the DOM based on the argument that SAX and DOM were the
"two biggies" in terms of API.
-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
The new revolutionaries believe the time has come for an aggressive 
move against our oppressors. We have established a solid beachhead 
on Friday. We now intend to fight vigorously for 'casual Thursdays.'
  -- who says America's revolutionary spirit is dead?


From uche.ogbuji@fourthought.com  Mon Jan 31 12:24:18 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 31 Jan 2000 05:24:18 -0700
Subject: [XML-SIG] DevDay results
In-Reply-To: Your message of "Mon, 31 Jan 2000 03:08:00 CST."
 <389550F0.5D1AE3F@prescod.net>
Message-ID: <200001311224.FAA02204@localhost.localdomain>

> I don't remember if we achieved clear consensus on whether to bundle the
> DOM or anything else into 1.6 along with Python. I think we were leaning
> towards bundling the DOM based on the argument that SAX and DOM were the
> "two biggies" in terms of API.

Hmm.  As much as I'd find it cool to have 4DOM bundled into Python, it is 
rather vast: 104 files, excluding the 84-file test-suite (which we haven't 
been publishing, but we shall now that the xml-sig has adopted it).  I have 
the sense that a raised eyebrow would be the nicest we can expect from Guido.

My vote would be to bundle SAX and Expat, which will do for many uses.  If 
they need more sophisticated XML, they can download the XML package to get 
DOM, XPath, XSLT, etc.

I think this is the way it is in Perl (in fact, I'm not even sure XML is 
bundled at all in Perl).  Of course, Perl has CPAN, which makes finding 
modules much less travail, but that is a problem for Python to solve in other 
ways than bundling every package into the main distro.  I understand the 
dist-utils SIG are close to a solution.

-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From fdrake@acm.org  Mon Jan 31 15:53:04 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 31 Jan 2000 10:53:04 -0500 (EST)
Subject: [XML-SIG] DevDay results
In-Reply-To: <200001311224.FAA02204@localhost.localdomain>
References: <389550F0.5D1AE3F@prescod.net>
 <200001311224.FAA02204@localhost.localdomain>
Message-ID: <14485.45024.576090.154937@weyr.cnri.reston.va.us>

uche.ogbuji@fourthought.com writes:
 > My vote would be to bundle SAX and Expat, which will do for many uses.  If 
 > they need more sophisticated XML, they can download the XML package to get 
 > DOM, XPath, XSLT, etc.

Uche,
  I agree; I think that's pretty much the consensus.  It certainly
seems reasonable.  That allows existing xmllib users to convert easily 
to something that will be maintained in the standard library and users 
of the other APIs will still need to do what they have to do now
(download something); it'll just be easier to have one XML package for 
the "advanced" users.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From ken@bitsko.slc.ut.us  Mon Jan 31 20:03:56 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 31 Jan 2000 14:03:56 -0600
Subject: [XML-SIG] Expat strategy
In-Reply-To: Paul Prescod's message of Mon, 31 Jan 2000 02:46:59 -0600
References: <20000128215951.5DC70189FE1@oratrix.oratrix.nl> <38954C03.D8C05F1@prescod.net>
Message-ID: <x5hffuknyr.fsf@bitsko.slc.ut.us>

Paul Prescod <paul@prescod.net> writes:

> 2. API
> 
> We had talked of embedding SAX directly in PyExpat but in retrospect I
> don't think that there is any need to do so. We can layer SAX 1 and 2 on
> top of a transliterated Expat API without any loss of performance. This
> is true because of Expat's handler architecture. Even if you layer
> xmllib on top of sax 1 on top of another implementation of xmllib on top
> of another layer of sax 2 on top of expat, you get high performance if
> the "handler" is the same method at all levels. In other words, we can
> "wrap" expat at the Python level without doing any proxying of events.

> So let's expose the raw Expat API and build SAX 1 and SAX 2 layers on
> top of it. 

I agree that there's no reason to try to "block" the raw API from
being used, but general usage documentation should focus on SAX.
Otherwise new module authors might write to the raw interface and
lose interoperability with other SAX modules.

  -- Ken


From jack@oratrix.nl  Mon Jan 31 21:21:04 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 31 Jan 2000 22:21:04 +0100
Subject: [XML-SIG] Expat strategy
In-Reply-To: Message by Paul Prescod <paul@prescod.net> ,
 Mon, 31 Jan 2000 02:46:59 -0600 , <38954C03.D8C05F1@prescod.net>
Message-ID: <20000131212109.195E0D3AC2@oratrix.oratrix.nl>

Recently, Paul Prescod <paul@prescod.net> said:
> PyExpat is one of a very few modules in the library to use setjmp. It
> uses it for error handling and I'm not sure if there is any way around
> it so I won't advocate its removal unless someone can propose a better
> way.  I'm not clear how to signal to expat that it should quit parsing
> other than through setjmp/longjmp.

I think I put in the setjmp/longjmp, basically because I could see no
other way to stop the parser, indeed. There's a couple of other
libraries I embedded in Python that have the same problem (jpeg and
pbm spring to mind).

Aside from the cross-segment longjmps, which needed a bit of massaging 
on the Mac (so I assume the same could be true on Windows), there's
one very big problem with setjmp/longjmp and that is that they're not
thread-safe.

However, in the case of the use in Pyexpat the Python programmer will
have to do something pretty gross to invoke this bug: as the
jmpbuf_valid flag is saved in the parser object and set/reset around
the Parse() call you'll have to create one parser and call
parser.Parse() on the one object simultaneously in two threads.
Still, putting a mutex in the object is probably a good idea.
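
A rough illustration of the mutex idea, done at the Python level with
the xml.parsers.expat module of current Python 3; the suggestion above
is about the C object, so this is an analogy rather than the actual
fix. LockedParser and its feed() method are hypothetical names:

```python
import threading
import xml.parsers.expat

class LockedParser:
    """Hypothetical wrapper holding one lock per parser object."""
    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._lock = threading.Lock()
        self.started = []
        self._parser.StartElementHandler = self._start

    def _start(self, name, attrs):
        self.started.append(name)

    def feed(self, data, final=False):
        # Only one thread at a time may be inside Parse() for this object.
        with self._lock:
            return self._parser.Parse(data, final)

shared = LockedParser()
shared.feed("<root>")

# Each worker feeds one complete element, so the order in which the
# threads run does not affect well-formedness.
workers = [threading.Thread(target=shared.feed, args=("<item/>",))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

shared.feed("</root>", final=True)
print(shared.started)
```

The lock serializes whole Parse() calls, which is the granularity at
which jmpbuf_valid is set and reset.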

> if (rv == NULL) {
> 	if (self->jmpbuf_valid)
> 		longjmp(self->jmpbuf, 1);
> 	My_WriteStderr("Exception in CharacterDataHandler()\n");
> 	PyErr_Clear();
> }
> 
> One funny thing is the code after the longjmp. I guess maybe it's a
> fallback for when the long-jump doesn't work. It doesn't seem to work on
> Windows, though.

I think the code is better replaced by an abort(): if
myStartElementHandler and myEndElementHandler are called outside of a
Parse() invocation there's something pretty basic about expat that I
didn't understand when I wrote this code:-)

But please note that all this is based on how Pyexpat looked when I
maintained it, I haven't had the time to track developments since
then... 
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From jack@oratrix.nl  Mon Jan 31 21:27:02 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 31 Jan 2000 22:27:02 +0100
Subject: [XML-SIG] Expat strategy
In-Reply-To: Message by Jack Jansen <jack@oratrix.nl> ,
 Mon, 31 Jan 2000 22:21:04 +0100 , <20000131212109.195E0D3AC2@oratrix.oratrix.nl>
Message-ID: <20000131212707.626A4D3AC2@oratrix.oratrix.nl>

Recently, Jack Jansen <jack@oratrix.nl> said:
> I think the code is better replaced by an abort(): if
> myStartElementHandler and myEndElementHandler are called outside of a
> Parse() invocation there's something pretty basic about expat that I
> didn't understand when I wrote this code:-)

Whoops, there is a buglet in the code on second inspection. While I
don't think it can be triggered normally, can someone add a line to
clear the jmpbuf_valid flag when the setjmp returns in the longjmp()ed 
condition?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From paul@prescod.net  Mon Jan 31 22:42:53 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 14:42:53 -0800
Subject: [XML-SIG] Expat strategy
References: <20000128215951.5DC70189FE1@oratrix.oratrix.nl> <38954C03.D8C05F1@prescod.net> <x5hffuknyr.fsf@bitsko.slc.ut.us>
Message-ID: <38960FED.8ED5867C@prescod.net>

Ken MacLeod wrote:
> 
> I agree that there's no reason to try to "block" the raw API from
> being used, but general usage documentation should focus on SAX.
> Otherwise new module authors might write to the raw interface and
> lose interoperability with other SAX modules.

Agree 100%. My documentation for the raw API would be: "read the source
code or go read Clark Cooper's article on XML.com." :)

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Ivory towers are no longer in order. We need ivory 
networks. Today, sitting quietly and thinking is the 
world's greatest generator of wealth and prosperity."
 - http://www.bespoke.org/viridian/print.asp?t=140


From paul@prescod.net  Mon Jan 31 22:51:51 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 31 Jan 2000 14:51:51 -0800
Subject: [XML-SIG] DevDay results
References: <200001311224.FAA02204@localhost.localdomain>
Message-ID: <38961207.EF6835F2@prescod.net>

uche.ogbuji@fourthought.com wrote:
> 

> My vote would be to bundle SAX and Expat, which will do for many uses.  If
> they need more sophisticated XML, they can download the XML package to get
> DOM, XPath, XSLT, etc.

My concern is that I don't consider the DOM "advanced". Hell, Visual
Basic and Javascript programmers can't even spell SAX but they all use
the DOM. If a new user asked me which to learn first, I'd say "the DOM"
because any semi-competent newbie can find their way around a tree(?) to
get the information they need whereas being smart enough to buffer the
right information in the right order takes a little more algorithmic
forethought ('scuse me).
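
A small side-by-side sketch of that difference, using xml.dom.minidom
and xml.sax from current Python 3 as stand-ins (neither is the DOM or
SAX code discussed in this thread). The document and the NameGrabber
class are invented for the example:

```python
import xml.dom.minidom
import xml.sax

DOC = "<order><item><name>tea</name><qty>2</qty></item></order>"

# Tree style: parse everything, then walk to the node you want.
tree = xml.dom.minidom.parseString(DOC)
name_node = tree.getElementsByTagName("name")[0]
dom_name = name_node.firstChild.data

# Event style: the handler must remember where it is and buffer text.
class NameGrabber(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_name = False
        self.chunks = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self.in_name = True

    def characters(self, text):
        # Text may arrive in several pieces, so collect them.
        if self.in_name:
            self.chunks.append(text)

    def endElement(self, tag):
        if tag == "name":
            self.in_name = False

grabber = NameGrabber()
xml.sax.parseString(DOC.encode("utf-8"), grabber)
sax_name = "".join(grabber.chunks)

print(dom_name, sax_name)
```

Both halves fetch the same string; the event version needs a state
flag and a buffer to do it, which is the extra forethought in question.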

Plus, I kind of feel that an XSL-ish tree iteration with triggers is
going to be the dominant XML processing model of the 21st century.

Nevertheless, I'll leave this for now. I don't want to jinx our chances
of getting expat in. Maybe in 1.7 we could have some kind of minimal
read-only DOM 1 with namespaces.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Ivory towers are no longer in order. We need ivory 
networks. Today, sitting quietly and thinking is the 
world's greatest generator of wealth and prosperity."
 - http://www.bespoke.org/viridian/print.asp?t=140