From larsga@ifi.uio.no  Tue Dec  1 14:12:05 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 01 Dec 1998 15:12:05 +0100
Subject: [XML-SIG] Trivial DOM patch
Message-ID: <wk4srglznu.fsf@ifi.uio.no>

This patch fixes a trivial buglet in the DOM example in xml.dom.core:

[larsga@birk105 dom]$ cvs diff core.py
Index: core.py
===================================================================
RCS file: /projects/cvsroot/xml/dom/core.py,v
retrieving revision 1.33
diff -r1.33 core.py
30c30,31
< doc.appendChild (head)                  # and this
---
> html.appendChild(head)                  # and this
> doc.appendChild (html)                  # and this
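
The corrected nesting can be sketched as follows. This is a minimal
illustration that assumes the standard library's xml.dom.minidom in place of
xml.dom.core, so that it runs on a current Python; build_document is a
hypothetical helper name, not part of core.py.

```python
from xml.dom.minidom import Document


def build_document():
    """Build doc -> html -> head, mirroring the patched example."""
    doc = Document()
    html = doc.createElement("html")
    head = doc.createElement("head")
    html.appendChild(head)  # head belongs inside html ...
    doc.appendChild(html)   # ... and html is the single document element
    return doc
```

Appending head directly to the document, as the old line did, would leave
the html element out of the tree.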

--Lars M.


From akuchlin@cnri.reston.va.us  Tue Dec  1 14:14:49 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue,  1 Dec 1998 09:14:49 -0500 (EST)
Subject: [XML-SIG] New xml-0.5 prerelease
Message-ID: <13923.63646.128559.398310@amarok.cnri.reston.va.us>

Round the loop we go again!  I've put a new prerelease of the XML
package up at http://www.python.org/sigs/xml-sig/files/ ; look for
xml-0.5.tgz or .zip.

	I really want to announce 0.5, so please try compiling it and
let me know if it goes smoothly; send me private e-mail even if you
have no problems, so that I know that people have actually tried it.
If I hear no problem reports by Thursday or Friday, I'll call it 0.5
final and announce it in various places, because we really need to
start grabbing some mindshare for Python in the XML field.  

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Chemistry is physics without thought; mathematics is physics without purpose.
    -- Anonymous


From Jeff.Johnson@icn.siemens.com  Tue Dec  1 16:05:41 1998
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Tue, 1 Dec 1998 11:05:41 -0500
Subject: [XML-SIG] SGML to DOM?
Message-ID: <852566CD.00588464.00@li01.lm.ssc.siemens.com>


I sure hope this is a stupid question with an easy answer...

I have a large SGML document that I am converting to HTML and another SGML
DTD.  For my prototype, I opened the document in ArborText's Adept SGML
editor and saved it as XML.  This made it well formed and escaped some '<'
and '>' characters that were not markup.  Unfortunately, it took the line
feeds out of some pre-formatted text.

I figured that was not a problem because I wanted to have Python read in
the native SGML anyway.  Then I briefly read the sgmllib docs; the part
about it not supporting full SGML, just whatever HTML needs.

Could someone tell me if there is a way to read in a non-well formed SGML
document, with preformatted text into a DOM tree?

Thanks in advance,
Jeff


From akuchlin@cnri.reston.va.us  Tue Dec  1 16:28:39 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue,  1 Dec 1998 11:28:39 -0500 (EST)
Subject: [XML-SIG] SGML to DOM?
In-Reply-To: <852566CD.00588464.00@li01.lm.ssc.siemens.com>
References: <852566CD.00588464.00@li01.lm.ssc.siemens.com>
Message-ID: <13924.6050.370984.918423@amarok.cnri.reston.va.us>

Jeff.Johnson@icn.siemens.com writes:
>Could someone tell me if there is a way to read in a non-well formed SGML
>document, with preformatted text into a DOM tree?

	If your SGML parser can output ESIS-formatted data, there's
xml.dom.esis_builder which might help you.  It doesn't support all of
ESIS, though, and that really needs to be fixed; I think ESIS support
is important to the really serious SGML users, as opposed to the XML
dilettantes.  Unfortunately, I'm just a dilettante.

	BTW, does anyone have a test SGML document which exercises all
of the ESIS command characters?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It is not that I wanted to know a great deal, in order to acquire what is now
called expertise, and which enables one to become an expert-tease to people
who don't know as much as you do about the tiny corner you have made your own.
    -- Robertson Davies, _The Rebel Angels_


From digitome@iol.ie  Wed Dec  2 13:49:32 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Wed, 2 Dec 1998 13:49:32 GMT
Subject: [XML-SIG] SGML to DOM?
Message-ID: <199812021349.NAA19589@GPO.iol.ie>

A couple of ideas for you:-

1) Instead of "save-as" from Adept (and losing the line feeds you mention),
have you tried converting with James Clark's SX utility?

2) If you generate ESIS (with James Clark's nsgmls), you can get into DOM in
Python via the ESIS builder. There will be ESIS event types in the output
that are invalid if your SGML uses features not available in XML. These are
easily spotted owing to the line-oriented nature of ESIS.
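
Because each ESIS line starts with a one-character command, a reader for the
common cases is short. The sketch below is an assumption-laden illustration,
not xml.dom.esis_builder: it builds a tree with the standard library's
xml.dom.minidom, handles only the "A", "(", ")", "-" and "C" commands plus
the \n and \\ escapes, and returns every other line untouched so it can be
inspected. The names esis_to_dom and _unescape are hypothetical.

```python
import re
from xml.dom.minidom import Document


def _unescape(data):
    # Only two escapes are handled here: \n (record end) and \\ (backslash).
    return re.sub(r"\\(n|\\)",
                  lambda m: "\n" if m.group(1) == "n" else "\\",
                  data)


def esis_to_dom(lines):
    """Return (document, skipped_lines) for an iterable of ESIS lines."""
    doc = Document()
    stack = [doc]   # open elements; the document is the outermost parent
    pending = {}    # attributes announced before the next start tag
    skipped = []    # lines whose command character is not handled
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        cmd, arg = line[0], line[1:]
        if cmd == "A":                      # Aname TYPE value
            parts = arg.split(" ", 2)
            if len(parts) >= 2 and parts[1] != "IMPLIED":
                pending[parts[0]] = _unescape(parts[2]) if len(parts) == 3 else ""
        elif cmd == "(":                    # start of element
            elem = doc.createElement(arg)
            for name, value in pending.items():
                elem.setAttribute(name, value)
            pending = {}
            stack[-1].appendChild(elem)
            stack.append(elem)
        elif cmd == ")":                    # end of element
            stack.pop()
        elif cmd == "-":                    # character data
            stack[-1].appendChild(doc.createTextNode(_unescape(arg)))
        elif cmd == "C":                    # document was conforming
            pass
        else:
            skipped.append(line)
    return doc, skipped
```

Since the \n escape is turned back into a newline, line feeds inside
preformatted text survive the round trip.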

</Sean>
SELUR NOHTYP


From Jeff.Johnson@icn.siemens.com  Wed Dec  2 15:11:34 1998
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Wed, 2 Dec 1998 10:11:34 -0500
Subject: [XML-SIG] SGML to DOM?
Message-ID: <852566CE.00538F2B.00@li01.lm.ssc.siemens.com>


I was lucky enough to have found SX yesterday and within 20 minutes had a
working solution :)

I haven't tried ESIS yet but my files look really good after I convert them
to HTML so I think everything is working fine.

I do have a small problem with SX.  My files convert well on Win98 but get
truncated on Win NT.  I'll try to figure out what is wrong (maybe an EOF
character?) and if it's a problem with SX I'll send the info to James
Clark.

Thanks for the help from Sean and Andrew!


From akuchlin@cnri.reston.va.us  Wed Dec  2 15:31:14 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed,  2 Dec 1998 10:31:14 -0500 (EST)
Subject: [XML-SIG] SGML to DOM?
In-Reply-To: <852566CE.00538F2B.00@li01.lm.ssc.siemens.com>
References: <852566CE.00538F2B.00@li01.lm.ssc.siemens.com>
Message-ID: <13925.23562.658406.15353@amarok.cnri.reston.va.us>

Jeff.Johnson@icn.siemens.com writes:
>I haven't tried ESIS yet but my files look really good after I convert them
>to HTML so I think everything is working fine.

	BTW, last night I checked in a few changes to dom.esis_builder
which add support for a few more ESIS events, but I'm not sure which
ones are important to support.  

	Also, I fiddled a bit with demo/xbel to get handling of Lynx
bookmark files working again, and added a .toxml() method for the
dom.core.Notation class.

	Has anyone tried the 0.5 prerelease yet?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
When one has stopped loving somebody, one feels that he has become someone
else, even though he is still the same person.
    -- Sei Shonagon, _The Pillow Book_


From fdrake@acm.org  Wed Dec  2 16:02:03 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 2 Dec 1998 11:02:03 -0500 (EST)
Subject: [XML-SIG] unicode package?
Message-ID: <13925.25723.635465.305860@weyr.cnri.reston.va.us>

  Is the unicode/ directory in the xml tree supposed to be a package?
If so, it needs an __init__.py.  I'd also recommend moving wstrop.*
into that package.
  The other C modules also should be moved into appropriate
directories, and not installed in the site-packages/ directory but
within the xml package at appropriate points.
  I think the C modules should end up being the following modules:

	intl		xml.unicode.intl
	pyexpat		xml.parsers.expat
	sgmlop		xml.parsers._sgmlop
	wstrop		xml.unicode._wstrop

  Note the addition of underscore prefixes for implementation-only
modules.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From akuchlin@cnri.reston.va.us  Wed Dec  2 16:51:26 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed,  2 Dec 1998 11:51:26 -0500 (EST)
Subject: [XML-SIG] unicode package?
In-Reply-To: <13925.25723.635465.305860@weyr.cnri.reston.va.us>
References: <13925.25723.635465.305860@weyr.cnri.reston.va.us>
Message-ID: <13925.27965.508563.446147@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>  Is the unicode/ directory in the xml tree supposed to be a package?
>If so, it needs an __init__.py.  I'd also recommend moving wstrop.*
>into that package.

	At the moment, no.  The modules in unicode/ all get installed
in site-packages, so once installed, they're not associated with the 
XML code at all.

>  The other C modules also should be moved into appropriate
>directories, and not installed in the site-packages/ directory but
>within the xml package at appropriate points.
>  I think the C modules should end up being the following modules:
>
>	intl		xml.unicode.intl
>	pyexpat		xml.parsers.expat
>	sgmlop		xml.parsers._sgmlop
>	wstrop		xml.unicode._wstrop

	This is a good question; should the Unicode support be
included as a subpackage of xml, or should it be a standalone system
that just happens to come with the XML package?  I can see arguments
for both possibilities; what does everyone think?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Python is an experiment in how much freedom programmers need. Too much freedom
and nobody can read another's code; too little and expressiveness is
endangered.
    -- Guido van Rossum, 13 Aug 1996


From fdrake@acm.org  Wed Dec  2 17:15:17 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 2 Dec 1998 12:15:17 -0500 (EST)
Subject: [XML-SIG] unicode package?
In-Reply-To: <13925.27965.508563.446147@amarok.cnri.reston.va.us>
References: <13925.25723.635465.305860@weyr.cnri.reston.va.us>
 <13925.27965.508563.446147@amarok.cnri.reston.va.us>
Message-ID: <13925.30117.718226.570922@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
 > 	At the moment, no.  The modules in unicode/ all get installed
 > in site-packages, so once installed, they're not associated with the 
 > XML code at all.

  Ugh!

 > 	This is a good question; should the Unicode support be
 > included as a subpackage of xml, or should it be a standalone system
 > that just happens to come with the XML package?  I can see arguments
 > for both possibilities; what does everyone think?

  The xml package should not install *anything* outside the xml
package.  My understanding from the break-out session at IPC7 was that 
the support included in the package is largely a stop-gap solution
until a more general solution for Python 1.6 has been implemented.  At 
that point, the xml.unicode support can be either updated to use the
standard support or removed; which we pick should depend on how much
of the installed base won't be migrating to Python 1.6 quickly.
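
One way to make that later switch painless is to look for the standard
support first and fall back to the bundled copy. A minimal sketch, assuming
a current Python with importlib; first_available is a hypothetical helper
and the module names passed to it are up to the caller:

```python
import importlib


def first_available(*module_names):
    """Import and return the first module in module_names that exists."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %s could be imported" % ", ".join(module_names))
```

Callers list the preferred module first, so code written against the
stop-gap keeps working once the standard support appears.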


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From digitome@iol.ie  Wed Dec  2 17:18:55 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Wed, 2 Dec 1998 17:18:55 GMT
Subject: [XML-SIG] unicode package?
Message-ID: <199812021718.RAA32292@GPO.iol.ie>

[AMK]
>
>	This is a good question; should the Unicode support be
>included as a subpackage of xml, or should it be a standalone system
>that just happens to come with the XML package?  I can see arguments
>for both possibilities; what does everyone think?
>
I think it should just happen to come with the XML package.
As Unicode support grows, we will see Unicode popping out
of relational databases, bad HTML 4.0 and plain text
files. IOW, lots of other Python modules will want to
make use of it.


</Sean>
SELUR NOHTYP


From fdrake@acm.org  Wed Dec  2 17:47:29 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 2 Dec 1998 12:47:29 -0500 (EST)
Subject: [XML-SIG] unicode package?
In-Reply-To: <199812021718.RAA32292@GPO.iol.ie>
References: <199812021718.RAA32292@GPO.iol.ie>
Message-ID: <13925.32049.57046.953765@weyr.cnri.reston.va.us>

Sean Mc Grath writes:
 > I think it should just happen to come with the XML package.
 > As Unicode support grows, we will see Unicode popping out
 > of relational databases, bad HTML 4.0 and plain text
 > files. IOW, lots of other Python modules will want to
 > make use of it.

Sean,
  Since Python will eventually provide these facilities as part of the 
base installation, the support provided by/with the XML package should 
only be at the "global" level if we're sure that the public interfaces 
to these things won't be significantly different.  There's no reason
other packages can't use what we provide, but it needs to be clear
that what's being provided is a stop-gap solution and may behave
differently from what's eventually provided with Python.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From kajiyama@etl.go.jp  Thu Dec  3 03:03:47 1998
From: kajiyama@etl.go.jp (Tamito Kajiyama)
Date: Thu, 3 Dec 98 03:03:47 JST
Subject: [XML-SIG] New xml-0.5 prerelease
In-Reply-To: <13923.63646.128559.398310@amarok.cnri.reston.va.us> (akuchlin@cnri.reston.va.us)
Message-ID: <9812021803.AA26308@etlibs2.etl.go.jp>

"Andrew M. Kuchling" <akuchlin@cnri.reston.va.us> writes:
| Round the loop we go again!  I've put a new prerelease of the XML
| package up at http://www.python.org/sigs/xml-sig/files/ ; look for
| xml-0.5.tgz or .zip.
| 
| 	I really want to announce 0.5, so please try compiling it and
| let me know if it goes smoothly; send me private e-mail even if you
| have no problems, so that I know that people have actually tried it.

I tried xml-0.5.tgz on SunOS 4.1.4_JL.

This OS seems not to have libintl.h, so compiling intl.c failed.
I built the XML package by removing 'intl*' from Makefile.pre.in and
Setup.in.  IMHO, it's not worth supporting SunOS 4.x, since this
version branch seems to be no longer supported by Sun and many vendors,
and it will be out of date in the near future.

Also, Python 1.5.1 seems not to define PySys_WriteStderr, and I had the
following error:

    Python 1.5.1 (#45, Jul 16 1998, 10:46:19)  [GCC 2.7.2.1] on sunos4
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from xml.sax import saxexts
    >>> parser = saxexts.make_parser()
    ld.so: Undefined symbol: _PySys_WriteStderr

I avoided this error by explicitly giving saxexts.make_parser() a parser
name (e.g. 'xmlproc').
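
The same workaround, naming the driver explicitly, can be sketched against
the SAX support in a current standard library. This is an assumption for
illustration: xml.sax.make_parser and the "xml.sax.expatreader" driver are
the modern descendants of saxexts.make_parser and its drivers, not the 0.5
API, and ElementNames and parse_with are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementNames(ContentHandler):
    """Record the name of every element as it starts."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with(driver, data):
    """Parse bytes with an explicitly named SAX driver module."""
    parser = xml.sax.make_parser([driver])  # named driver is tried first
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names
```

Note that the standard library's make_parser still falls back to its default
driver list when the named module cannot be imported.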

BTW, if the directory $prefix/lib/python1.5/site-packages does not
exist, the installation process simply fails.  How about creating it if
it does not exist, as is already done for subdirectories?  I wonder if
this is a general installation problem of Python...
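
In Python itself, the "create it if missing" step can be sketched like
this. It assumes a current Python, where os.makedirs accepts exist_ok;
ensure_install_dir and its version parameter are hypothetical and are not
taken from the package's Makefile.

```python
import os


def ensure_install_dir(prefix, version="1.5"):
    """Create $prefix/lib/python<version>/site-packages if needed."""
    target = os.path.join(prefix, "lib", "python" + version, "site-packages")
    os.makedirs(target, exist_ok=True)  # no error if it already exists
    return target
```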

Regards,

-- 
KAJIYAMA, Tamito <kajiyama@etl.go.jp>


From akuchlin@cnri.reston.va.us  Wed Dec  2 18:30:59 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed,  2 Dec 1998 13:30:59 -0500 (EST)
Subject: [XML-SIG] New xml-0.5 prerelease
In-Reply-To: <9812021803.AA26308@etlibs2.etl.go.jp>
References: <13923.63646.128559.398310@amarok.cnri.reston.va.us>
 <9812021803.AA26308@etlibs2.etl.go.jp>
Message-ID: <13925.34498.718079.620238@amarok.cnri.reston.va.us>

Tamito Kajiyama writes:
>I tried xml-0.5.tgz on SunOS 4.1.4_JL.
  <Excellent bug reports deleted>

Thank you very much; these installation issues are all rather serious,
and just the sort of thing that we don't want to have in a formal
release.  I'll work on fixing them tonight, and will try to issue a
new prerelease tomorrow.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
There is no excellent beauty that hath not some strangeness in the proportion.
    -- Francis Bacon, "Of Beauty"


From gstein@lyra.org  Wed Dec  2 21:35:06 1998
From: gstein@lyra.org (Greg Stein)
Date: Wed, 02 Dec 1998 13:35:06 -0800
Subject: [XML-SIG] unicode package?
References: <199812021718.RAA32292@GPO.iol.ie> <13925.32049.57046.953765@weyr.cnri.reston.va.us>
Message-ID: <3665B28A.12C15E2C@lyra.org>

Fred L. Drake wrote:
> 
> Sean Mc Grath writes:
>  > I think it should just happen to come with the XML package.
>  > As Unicode support grows, we will see Unicode popping out
>  > of relational databases, bad HTML 4.0 and plain text
>  > files. IOW, lots of other Python modules will want to
>  > make use of it.
> 
> Sean,
>   Since Python will eventually provide these facilities as part of the
> base installation, the support provided by/with the XML package should
> only be at the "global" level if we're sure that the public interfaces
> to these things won't be significantly different.  There's no reason
> other packages can't use what we provide, but it needs to be clear
> that what's being provided is a stop-gap solution and may behave
> differently from what's eventually provided with Python.

I *very* strongly agree with Fred's position here and in his prior
email. We shouldn't mess around with trying to pretend something is
applicable generally until we're sure that it is right.

Here is a perfect case in point: the existence of _wstrop alone is not
right -- the final implementation should use Unicode object methods, not
external functions.

-g

--
Greg Stein, http://www.lyra.org/


From fdrake@acm.org  Wed Dec  2 22:40:09 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 2 Dec 1998 17:40:09 -0500 (EST)
Subject: [XML-SIG] unicode package?
In-Reply-To: <3665B28A.12C15E2C@lyra.org>
References: <199812021718.RAA32292@GPO.iol.ie>
 <13925.32049.57046.953765@weyr.cnri.reston.va.us>
 <3665B28A.12C15E2C@lyra.org>
Message-ID: <13925.49609.390429.961575@weyr.cnri.reston.va.us>

Greg Stein writes:
 > I *very* strongly agree with Fred's position here and in his prior

  Wow, I must be having a good day!  (And to think I spent half of it
in a meeting! ;-)

 > Here is a perfect case in point: the existence of _wstrop alone is not
 > right -- the final implementation should use Unicode object methods, not

  Good point.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From akuchlin@cnri.reston.va.us  Thu Dec  3 15:13:21 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu,  3 Dec 1998 10:13:21 -0500 (EST)
Subject: [XML-SIG] xml-0.5pre2 released
Message-ID: <13926.42346.660340.459393@amarok.cnri.reston.va.us>

Here's the second pre-release of the XML package:

	http://www.python.org/sigs/xml-sig/files/xml-0.5pre2.tgz

I've fixed the 1.5.2-ism of PySys_WriteStderr that crept into the
pyexpat module, and also moved the Unicode stuff into xml.unicode, as
argued by Fred and Greg S.; the package should now not install
anything outside of site-packages/xml.  While I was at it, I twiddled
the test suite a bit and moved some files around.

	Dieter Maurer sent me a lengthy list of errors in private
e-mail; most are minor things like broken links and demo programs, so
I'm not sure if I'll do a third pre-release, though I will fix the
problems.  Anyway, please try this new version, and let me know if
anything broke in the process.  I'd still like to do an announcement
tomorrow...

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
On Tuesdays he also wears the blue socks and the grey underwear and counts his
bath towels. He has twenty-five bath towels. But how could anyone survive with
less?
    -- The narrator introduces us to Michael Smith, in ENIGMA #1: "The Lizard,
       The Head, The Enigma"


From jday@csihq.com  Thu Dec  3 15:56:49 1998
From: jday@csihq.com (John Day)
Date: Thu, 03 Dec 1998 10:56:49 -0500
Subject: [XML-SIG] xmlproc/DOM vs. WebL?
Message-ID: <3.0.1.32.19981203105649.00687874@mail.csihq.com>

Hi,

I'm a newbie to both Python and XML, trying to figure
out how it works and how to make it useful for creating
Web agents, concept databases etc.

I have recently stumbled across another scripting language
called "WebL", which seems to be a smallish but elegant
XML/HTML interpreter written in Java. COMPAQ is giving
it away free w/src for non-commercial use:

http://www.research.digital.com/SRC/WebL/

My reason for addressing this group is that I would like to
know how it stacks up against Python/XML. In particular, does
Python have any XML functions that do 'markup algebra' as
described in the WebL docs? How would you compare their
respective capabilities in general? (I'm hoping one of you
Python gurus has already looked at WebL.)

The WebL script examples for web-crawling and other agent actions are
amazingly small. (On the downside, they seem to run extremely
slowly on my machine).

Perhaps there is some functionality here that could be applied
to Python/XML. My hunch is that this markup algebra stuff could
run a lot faster in Python.

John Day


From fdrake@acm.org  Thu Dec  3 16:02:38 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 3 Dec 1998 11:02:38 -0500 (EST)
Subject: [XML-SIG] xml-0.5pre2 released
In-Reply-To: <13926.42346.660340.459393@amarok.cnri.reston.va.us>
References: <13926.42346.660340.459393@amarok.cnri.reston.va.us>
Message-ID: <13926.46622.322046.458881@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
 > Here's the second pre-release of the XML package:

  doc/xml-ref.txt needs to be regenerated since xml-ref.tex has
changed.
  I'll take a look at the updated installation.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From michael@graphion.com  Thu Dec  3 18:25:21 1998
From: michael@graphion.com (Michael Sanborn)
Date: Thu, 03 Dec 1998 10:25:21 -0800
Subject: [XML-SIG] Why is Builder.push() the way it is?
Message-ID: <3666D791.F1252EB0@graphion.com>

I'm new to Python, and am very interested in using the DOM
implementation. So I'm puzzling over something in builder.py, and hope
to have help understanding it.

There's a section in push() like this:

if self.current_element:
    self.current_element.insertBefore(node, None)
elif nodetype in _LEGAL_DOCUMENT_CHILDREN:
    if nodetype == TEXT_NODE:
        if string.strip(node.get_nodeValue()) != "":
            self.document.appendChild(node)
    else:
        self.document.appendChild(node)

Now, as far as I can see from the DOM spec and the definition of
_LEGAL_DOCUMENT_CHILDREN, if nodetype is in _LEGAL_DOCUMENT_CHILDREN,
nodetype will never be equal to TEXT_NODE. I was rather imagining that
this section would read:

if self.current_element:
    if nodetype == TEXT_NODE:
        if string.strip(node.get_nodeValue()) != "":
            self.document.appendChild(node)
    else:
        self.current_element.insertBefore(node, None)
elif nodetype in _LEGAL_DOCUMENT_CHILDREN:
    self.document.appendChild(node)

I expect that I'm mistaken, but I'd like to know why.
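
The claim that a Text node is never a legal child of the document can be
checked against another DOM implementation. The sketch below assumes the
standard library's xml.dom.minidom, which is not builder.py and may differ
from it in detail; document_accepts is a hypothetical helper.

```python
import xml.dom
from xml.dom.minidom import Document


def document_accepts(make_node):
    """Return True if a fresh Document accepts make_node(doc) as a child."""
    doc = Document()
    try:
        doc.appendChild(make_node(doc))
    except xml.dom.HierarchyRequestErr:
        return False
    return True
```

With minidom, an element or a comment is accepted at the top level, while a
Text node is rejected, which is consistent with the reading of the DOM spec
given above.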

As a second, minor question, why does one sometimes use
appendChild(node) and other times insertBefore(node, None)?

Best regards,

Michael Sanborn
Graphion Typesetting


From akuchlin@cnri.reston.va.us  Thu Dec  3 19:52:31 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu,  3 Dec 1998 14:52:31 -0500 (EST)
Subject: [XML-SIG] Why is Builder.push() the way it is?
In-Reply-To: <3666D791.F1252EB0@graphion.com>
References: <3666D791.F1252EB0@graphion.com>
Message-ID: <13926.58361.528160.505768@amarok.cnri.reston.va.us>

Michael Sanborn writes:
>Now, as far as I can see from the DOM spec and the definition of
>_LEGAL_DOCUMENT_CHILDREN, if nodetype is in _LEGAL_DOCUMENT_CHILDREN,
>nodetype will never be equal to TEXT_NODE. I was rather imagining that
>this section would read:

	Hmm... Actually, you're not mistaken; that code does look
suspicious.  Thanks for the bug report!  Certainly there's nothing
clever going on under the covers that makes that code reasonable.  I'll do
some archaeology in the CVS logs and try to figure out when the
problem crept in, and fix it.  (Maybe not in time for the 0.5 release,
though.)

>As a second, minor question, why does one sometimes use
>appendChild(node) and other times insertBefore(node, None)?

	appendChild(node) actually just calls insertBefore(node,
None), so there's no real difference other than the extra method call.
If you were trying to do something very high-performance, you might
use the latter form just to avoid the extra function call, but for
most uses it doesn't matter.
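
A small check of that equivalence, sketched with the standard library's
xml.dom.minidom (an assumption; it is a different implementation from
dom.core, though both follow the DOM rule that a null reference child means
"append at the end"). The function name build is hypothetical.

```python
from xml.dom.minidom import Document


def build(use_insert_before):
    """Build <root><a/><b/></root> with either method and serialize it."""
    doc = Document()
    root = doc.createElement("root")
    doc.appendChild(root)
    for name in ("a", "b"):
        child = doc.createElement(name)
        if use_insert_before:
            root.insertBefore(child, None)  # None means "at the end"
        else:
            root.appendChild(child)
    return root.toxml()
```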

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I suppose I had vaguely hoped that you had changed, my brother. That you'd
noticed that there were other people in the world. That you had begun to see
people as other than things that dream, as creatures of stories.
    -- Destruction to Dream, in SANDMAN #48: "Brief Lives:8"


From dkuhlman@enterpriselink.com  Fri Dec  4 20:07:27 1998
From: dkuhlman@enterpriselink.com (Dave Kuhlman)
Date: Fri, 04 Dec 1998 12:07:27 -0800
Subject: [XML-SIG] Installing and Test xml-0.5pre2
Message-ID: <366840FF.3EDAA9CC@EnterpriseLink.com>

xml-0.5pre2 looks very good to me.

I installed and tested under Linux Debian 2.0.
I'm using Python 1.5.1.

I had no problems compiling and installing.

In case it is not obvious, I'm extremely grateful for the work you
have done in support of XML for Python.  I really believe that
Python is going to be one of the best tools for processing XML. 
And, you people are making it happen.  Thanks.

Here are some notes about changes I made when running the demos. 
Read these with some skepticism.  Please don't spend too much time
replying to my comments.  I'm happier when you're fixing the code,
and I need to learn some of this for myself.

In demo/quotes/qtfmt.py, I changed:

19c19,21
< import wstring, iso8859  # For fixing UTF-8 encoding
---
> from xml.unicode import wstring
> from xml.unicode import iso8859
> 
353c355
< 	p=xml.sax.saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")
---
> 	p=xml.sax.saxexts.XMLParserFactory.make_parser("pyexpat")


This probably should have been fixed by setting something in my 
environment.  (And, I believe that you fixed the path to pyexpat in 
pre3.)  You might explain how to set up for these tests in the README
in demo/quotes.

Whoa!  I just saw a xml-0.5pre3 on the download site.  You guys work 
fast. I downloaded it and installed it on my WinNT 4.0 machine here 
at work.  (I only get to use Linux at home; here at work the 
un-enlightened make me use WinNT.)  Again, with Python 1.5.1. Here 
is what I did to make it work:

1. Unzip the .zip file in a directory named, say, C:\Python\Test.
It created a sub-directory C:\Python\Test\xml-0.5.

2. Rename directory "xml-0.5" to "xml" (because on my Linux 
machine, that's the name of the sub-directory it looks for under 
site-packages, so I guessed that this is the PYTHONPATH we need).

3. Create and run a batch file set_envir.bat containing the 
following:

    set PATH=C:\Python\Test\xml\windows;%PATH%
    set PYTHONPATH=C:\Python\Test;%PYTHONPATH%

4. Run some demos.

You should consider including a README.windows file containing the 
above instructions or the correct ones if mine are wrong. (And it
doesn't quite work for pyexpat.  See below.)

A comment on SAX drivers -- Are all the files in 
site-packages/xml/sax/drivers that begin with "drv_" supposed to be 
SAX drivers?  There were several that didn't work when I gave them 
as arguments to saxtimer.py.  Testing on WinNT 4.0, now.  
Specifically I got the error message "ERROR: Parser not available" 
when I tried to use:

    xmltoolkit
    xmldc
    sgmlop
    pyexpat

The following SAX drivers worked:

    xmllib
    sgmllib
    xmlproc

I can't get pyexpat to load.  This fails in demo/quotes/qtfmt.py and 
demo/sax/saxtimer.py.  I'm guessing that it has something to do with 
my path or PYTHONPATH, but I have not figured out what.  I have to 
spend more time looking at rec_find_module in saxexts.py, I suppose.

I'd like to see a few more notes (in README files) on how to run 
each of the demos.  Also, I'd like a few notes on how to set up my 
environment to run the demos.  I looked at some of the stuff in the 
doc directory, but have not had time to read it thoroughly. When I 
do, maybe my questions will be answered.

Many thanks again.

  -- Dave

-- 
Dave Kuhlman
EnterpriseLink Technology Corp
http://www.enterpriselink.com
2542 S. Bascom Ave., Suite #203
Campbell, CA 95008
dkuhlman@EnterpriseLink.com
408-558-2011


From Jeff.Johnson@icn.siemens.com  Fri Dec  4 20:51:30 1998
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Fri, 4 Dec 1998 15:51:30 -0500
Subject: [XML-SIG] Installing and Test xml-0.5pre2
Message-ID: <852566D0.00727B2B.00@li01.lm.ssc.siemens.com>


> I can't get pyexpat to load.

The way I do it for Windows is...

copy xml/windows/pyexpat.dll to xml/parsers
copy xml/expat/bin/xmlparse.dll to somewhere in path
copy xml/expat/bin/xmltok.dll to somewhere in path

For a while, the pyexpat.dll in CVS was corrupted, but I think it's been
fixed.  I'm not sure where xmlparse.dll and xmltok.dll should be, but as
long as they are in a directory in your PATH, they will be used.


That's all I can remember; I hope that covers it.


From akuchlin@cnri.reston.va.us  Fri Dec  4 21:55:57 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri,  4 Dec 1998 16:55:57 -0500 (EST)
Subject: [XML-SIG] Eliminating whitespace
Message-ID: <13928.20594.800816.604650@amarok.cnri.reston.va.us>

A common task when processing a document using the DOM is to strip out
unnecessary whitespace.  I'd definitely like to have a function or set
of functions to do this, and would like to discuss what the interface
should look like.

The problem: given a DOM tree, you want to remove whitespace from it.
There are several dimensions to the problem:

	* Delete whitespace, or collapse it down to a single space?

	* Just act on Text nodes that are all whitespace?  Or act on
Text nodes with leading, trailing, or internal whitespace?  (If acting
on internal whitespace, you'll probably be collapsing down to a single
space, not deleting everything.  Though who knows?)

	Anyway, I don't think there's any call for making elaborate
whitespace-deleting classes that can be customized in various ways.
So, how about a function (or method on dom.core.Node?).  Strawman
interface:

normalize_whitespace( DOMtree, 
      collapse = [true | false] default false,
      inside_node = [true | false] default false,
      where = LEFT, RIGHT, INSIDE, or a bitwise OR of these flags
	      Default = all of them
)	

Examples:

normalize_whitespace( DOMtree )   Drop all whitespace-only nodes
    
normalize_whitespace( DOMtree, 1, 1 )   Collapse all runs of
					whitespace down to single spaces

normalize_whitespace( DOMtree, 1, 1, LEFT | RIGHT ) 
	Strip trailing and leading whitespace from all Text nodes

I have a sneaking feeling that there's one argument too many in that
function, and it could be made more compact somehow, but can't think
of anything definite.  Anyone got suggestions?  (Where's Tim Peters
when you need him?)
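
A minimal sketch of the strawman interface above, written against the
standard-library xml.dom.minidom in present-day Python rather than this
package's xml.dom.core. The numeric flag values and the rule that
collapse=False means "delete the run" are assumptions made for illustration;
the proposal leaves both open.

```python
import re
from xml.dom import Node, minidom

# Flag values are illustrative; the strawman names them but assigns no numbers.
LEFT, RIGHT, INSIDE = 1, 2, 4

_PATTERNS = [
    (LEFT, re.compile(r"^\s+")),                # leading run
    (RIGHT, re.compile(r"\s+$")),               # trailing run
    (INSIDE, re.compile(r"(?<=\S)\s+(?=\S)")),  # run between two non-space chars
]


def normalize_whitespace(node, collapse=False, inside_node=False,
                         where=LEFT | RIGHT | INSIDE):
    """Delete (collapse=False) or collapse whitespace below node, in place."""
    replacement = " " if collapse else ""
    for child in list(node.childNodes):         # copy: children may be removed
        if child.nodeType == Node.TEXT_NODE:
            text = child.data
            if not inside_node:
                # Only touch Text nodes that consist entirely of whitespace.
                if text.strip() == "":
                    text = replacement
            else:
                for flag, pattern in _PATTERNS:
                    if where & flag:
                        text = pattern.sub(replacement, text)
            if text == "":
                node.removeChild(child)
            else:
                child.data = text
        elif child.hasChildNodes():
            normalize_whitespace(child, collapse, inside_node, where)


if __name__ == "__main__":
    doc = minidom.parseString("<a>  <b>  x   y  </b>  </a>")
    normalize_whitespace(doc)                   # drop whitespace-only nodes
    print(doc.documentElement.toxml())
```

Folding the two booleans into the where flags (for example a hypothetical
WHOLE_NODE flag) would be one way to drop an argument, but that is only a
suggestion.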

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    "I'll be curious to see what he thinks Hell is."
    "Garn, I hope he ain't British. Some of that stuff them people dream up...
it's enough to gag a maggot."
    -- Demons awaiting Stanley's arrival in Hell in STANLEY AND HIS MONSTER #4


From akuchlin@cnri.reston.va.us  Fri Dec  4 22:13:27 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri,  4 Dec 1998 17:13:27 -0500 (EST)
Subject: [XML-SIG] pre3 -> 0.5
Message-ID: <13928.24060.914543.362950@amarok.cnri.reston.va.us>

I've copied xml-0.5pre3 and renamed it to xml-0.5, the final package.
The difference between pre2 and pre3 is simply fixing some minor nits,
and not compiling the intl module by default.  I really don't want to
mess around with pre-releases anymore, so, if you try it out and find
some hideous bug, let me know at amk1@erols.com .  Otherwise, I'll
write up an announcement and start sending it out over the weekend.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The universe may / be as great as they say. / But it wouldn't be missed / if
it didn't exist.
    -- Piet Hein


From mazito@softlab.com.ar  Fri Dec  4 22:15:29 1998
From: mazito@softlab.com.ar (Mario A. Zito)
Date: Fri, 04 Dec 1998 19:15:29 -0300
Subject: [XML-SIG] On going projects (and our own project)
Message-ID: <36685F01.D67A5EF6@softlab.com.ar>

I am new to XML, this list and to Python. It would be of interest to me
(and perhaps to other list members) if more senior members could describe
the ongoing projects they plan to use XML and Python for, so that more
junior members (like me) can get an idea of the different ways of using
this combination.

In particular, I am planning to use it to construct an integrated,
mail-based defect tracking, project management and distributed version
control system. All mails will be submitted as XML documents, and a
Python based mail server will parse them and take the needed actions,
correlate versions with defects, send bug reports to the right developer
(also as XML docs), save the bugs in a database, store new versions in
CVS, generate project status reports, etc.

This will be our first Python/XML project, and (we expect) it will be
able to support our own development projects. Our idea is to try to
assemble it (as much as possible) from existing software (such
as CVS), and concentrate on the mail processor. And of course, to gain
working experience with Python and XML along the way.

If we come up with something that (really) works, and others are
interested in it, we may release it for public use.

Any ideas, suggestions or any type of collaboration will be welcome.
If anyone is interested in them, I can post our proposed DTDs to this
list as they evolve.

If someone objects to this, please let me know so I don't break any
explicit or implicit rules (maybe this belongs on some other list?)

Thanks.

Mario A. Zito
SoftLab SRL


From fleck@informatik.uni-bonn.de  Sat Dec  5 13:18:22 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Sat, 05 Dec 1998 14:18:22 +0100
Subject: [XML-SIG] On going projects (and our own project)
References: <36685F01.D67A5EF6@softlab.com.ar>
Message-ID: <3669329E.57@informatik.uni-bonn.de>

Mario A. Zito wrote:
> In particular, I am planning to use it to construct an integrated mail
> based  defect tracking, project managment and distributed version
> control system. All mails will be submitted as XML documents, and a
> Python based mail server will parse them and take the needed actions,
> correlate versions with defects, send bug reports to the right developer
> (also as XML docs), save the bugs in a database, store new versions in
> CVS, generate project status reports, etc.

Cool. The "GNU Gather" project will use Python and the Roxen WWW server
to create a WebDAV/RTSP-based groupware framework, including applications
like issue tracking and "knowledge database" management.

We don't have any release schedule yet; in fact, the programming for
"GNU Gather" hasn't even started. So while you might want to subscribe
yourself to the "GNU Gather" initial announcements mailing list (see my
.sig for URL), "GNU Gather" is probably a bit too heavyweight (and will
take too long to finish) if all you want at the moment is "just" a mail
tracking system.

BTW, there's a web page about bug tracking and problem management tools
for Linux at <http://linas.org/linux/pm.html>.

Yours,
Markus.

-- 
////////////////////////////////////////////////////////////////////////////
   Markus B Fleck - University of Bonn - CS Department IV - WHOIS MF5079
          UNIX Administrator - comp.lang.python.announce Moderator
   "GNU Gather" Free Internet Groupware Project - http://cscw.net/gather/
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\


From larsga@ifi.uio.no  Sat Dec  5 21:33:42 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 05 Dec 1998 22:33:42 +0100
Subject: [XML-SIG] Installing and Test xml-0.5pre2
In-Reply-To: <366840FF.3EDAA9CC@EnterpriseLink.com>
References: <366840FF.3EDAA9CC@EnterpriseLink.com>
Message-ID: <wkyaomxoi1.fsf@ifi.uio.no>

* Dave Kuhlman
| 
| A comment on SAX drivers -- Are all the files in 
| site-packages/xml/SAX/drivers that begin with "drv_" supposed to be 
| SAX drivers? 

Yes, although drv_xmltok is for XMLTok (the older version of expat).
The rest are common libraries shared between different drivers.

| There were several that didn't work when I gave them as arguments to
| saxtimer.py.  Testing on WinNT 4.0, now.  Specifically I got the
| error message "ERROR: Parser not available" when I tried to use:
| 
|     xmltoolkit
|     xmldc
|     sgmlop
|     pyexpat

Well, you need to have the parsers installed. They work for me, but if
they don't for you I'm very interested in hearing about it. Could you
check if they're installed so you can load the parser from the command
line and let me know how it works out?
 
| I can't get pyexpat to load.  This fails in demo/quotes/qtfmt.py and
| demo/sax/saxtimer.py.  I'm guessing that it has something to do with
| my path or PYTHONPATH, but I have not figured out what.  I have to
| spend more time looking at rec_find_module in saxexts.py, I suppose.

Try importing it from the command-line. If that works and
rec_find_module does not, then please send me a bug report and I'll
fix it.
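
A small sketch of the "try importing it" check, in present-day Python. The
helper name check_modules is hypothetical, and importlib postdates the
Python 1.5 discussed in this thread; there, a bare "import pyexpat" at the
interpreter prompt does the same job. The module names in the demo are taken
from the list quoted above and may not exist on a given machine.

```python
import importlib


def check_modules(names):
    """Try to import each module; map its name to None or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            results[name] = str(exc)    # keep the reason for the bug report
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, err in check_modules(["pyexpat", "sgmlop"]).items():
        print(name, "OK" if err is None else "FAILED: " + err)
```

If a module imports fine here but the SAX driver still reports "Parser not
available", the difference is worth including in the bug report.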

(Oh, and thanks for giving us some feedback.)

--Lars M.


From akuchlin@cnri.reston.va.us  Sun Dec  6 16:47:14 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 6 Dec 1998 11:47:14 -0500
Subject: [XML-SIG] Proposed announcement
Message-ID: <199812061647.LAA26113@207-172-39-232.s232.tnt10.ann.erols.com>

To be sent to: c.l.py.announce, comp.text.xml, xml-dev, www-dom, 
	       announcement on freshmeat.net,
	       other suggestions?

==================
Version 0.5 of the Python/XML distribution can be downloaded from 
	http://www.python.org/sigs/xml-sig/files/xml-0.5.tgz

The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package.  The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.  Version 0.5 can be considered a beta release.

Major changes in this version:
	* The DOM implementation has been extensively modified, and is
now much closer to compliance with the DOM Recommendation.  

	* A Unicode type has been added as the subpackage xml.unicode.wstring. 

	* Various subpackages have been upgraded to their most recent versions.

The package currently contains:

	* XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius
Garshol), xmllib.py (Sjoerd Mullender) using the sgmlop.c accelerator
module (Fredrik Lundh).

	* SAX interface (Lars Marius Garshol)
	* DOM interface (Stefane Fermigier, A.M. Kuchling)
	* xmlarch.py, for architectural forms processing (Geir Ove Grønmo)
	* Unicode wide-string module (Martin von Löwis)
	* Various utility modules and functions (various people)
	* Documentation and example programs (various people)

The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments, questions,
or bug reports to <xml-sig@python.org>.

For general information about Python, see:
	http://www.python.org
The Python XML-SIG home page is:
	http://www.python.org/sigs/xml-sig/

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Problems worthy of attack / prove their worth by hitting back.
    -- Piet Hein


From akuchlin@cnri.reston.va.us  Sun Dec  6 22:05:23 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 6 Dec 1998 17:05:23 -0500
Subject: [XML-SIG] XML and Zope
Message-ID: <199812062205.RAA26917@207-172-56-151.s151.tnt12.ann.erols.com>

As part of experimenting with Zope, I wanted to create a new tag under
DocumentTemplate, and chose to create one that formatted some XML; I
re-used my quotation formatting code, which made the job pretty
trivial.  It wasn't too hard to do, and you can see some notes on it
at http://starship.skyport.net/crew/amk/zope/new-tag.html .

As an example, in a DTML document I can now put:

<!--#quotation-->
  <quotation>
   The days come and go... <source>Ralph Waldo Emerson</source>
  </quotation>
<!--#/quotation-->

The tag will convert the fragment of XML it contains into HTML; more
realistically, the content would come from a database query or some
other source, and be present as a variable, something like:

<!--#quotation-->
  <!--#var text-->
<!--#/quotation-->

This is a very simple example, of course.  What could we do that would
be more general and more useful?  An XSL styler (think of it: <!--#xsl
stylesheet="mine.xsl"--> ...) would be an obvious prospect, but would
also be a sizable job.  Is there something smaller that would be
easier to implement, but still useful for someone?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Barney turned his little squinty blue eyes on me. "We go to the garrick now
and become warbs," he said. "The hell we do!" I thought to myself quickly.
    -- James Thurber, "The Black Magic of Barney Haller", in _The Thurber
       Carnival_


From kajiyama@etl.go.jp  Mon Dec  7 09:34:34 1998
From: kajiyama@etl.go.jp (Tamito Kajiyama)
Date: Mon, 7 Dec 98 09:34:34 JST
Subject: [XML-SIG] Proposed announcement
In-Reply-To: <199812061647.LAA26113@207-172-39-232.s232.tnt10.ann.erols.com> (amk1@erols.com)
Message-ID: <9812070034.AA05534@etlibs2.etl.go.jp>

"A.M. Kuchling" <amk1@erols.com> writes:
| To be sent to: c.l.py.announce, comp.text.xml, xml-dev, www-dom, 
| 	       announcement on freshmeat.net,
| 	       other suggestions?

Please excuse me for sending a problem report at the last moment before
the final release (the wide range of destinations for the proposed
announcement reminded me ;).

On SunOS 4.1.4_JL, the pyexpat module fails because of a call to an
undefined procedure at runtime.  The fix is simple: run ranlib on
expat/libexpat.a before linking it into pyexpat.so.  Here is a trivial
patch:

*** expat/Makefile.orig Sun Dec  6 01:02:48 1998
--- expat/Makefile      Sun Dec  6 01:01:56 1998
***************
*** 40,43 ****
--- 40,44 ----
  
  libexpat.a: $(OBJS)
        ar cr libexpat.a $(OBJS)
+       ranlib libexpat.a
  
I don't know whether this problem happens on platforms other than SunOS.

Regards,

-- 
KAJIYAMA, Tamito <kajiyama@etl.go.jp>


From fleck@informatik.uni-bonn.de  Mon Dec  7 10:41:08 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Mon, 07 Dec 1998 11:41:08 +0100
Subject: [XML-SIG] Pointer: "SPIN_py - SGML Parser Integration
 Project"
Message-ID: <366BB0C4.105B@informatik.uni-bonn.de>

Hi!

32BITSONLINE has an article about SPIN_py:

> SPIN_py - SGML Parser Integration Project 
> [...]
> SPIN is an interface to SP. It delivers edge
> events from SP to your script directly from
> the C++ API to your Python script.

URL: <http://www.32bitsonline.com/news.php3?news=news/199811/linux/lx199811301&page=1>

Greets,
Markus.

-- 
////////////////////////////////////////////////////////////////////////////
   Markus B Fleck - University of Bonn - CS Department IV - WHOIS MF5079
          UNIX Administrator - comp.lang.python.announce Moderator
   "GNU Gather" Free Internet Groupware Project - http://cscw.net/gather/
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\


From SHunting@goSPS.com  Mon Dec  7 15:23:19 1998
From: SHunting@goSPS.com (Hunting, Sam)
Date: Mon, 7 Dec 1998 10:23:19 -0500
Subject: [XML-SIG] Parameter entity visualization tool
Message-ID: <518E520AF877D111B58100A0C9920BF527B060@SPS01>

In Python, is there such a thing as a parameter entity visualization tool
that would show how content model "building blocks" work? This would seem to
be very useful for understanding, maintaining, configuring, and
extending DTDs/schemas like Voyager
(http://www.w3.org/TR/1998/WD-html-in-xml-19981205/) and of course the usual
suspects like TEI and DocBook.

<!ELEMENT (%paragraph.stuff)>

%paragraph.stuff
    #PCDATA
    | %this.stuff
        a
        | b
    | %that.stuff
        c
        | d
    | %the.other.stuff
        ""
I envision it working like a collapsible outliner, but a printout would be
fine too.
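
A rough sketch of the printout variant, in present-day Python. It is not an
existing tool: the function names are hypothetical, the regular expression
only recognises double-quoted internal parameter entity declarations, and it
splits content models naively on "|", so it will not cope with sequences,
groups or external entities.

```python
import re

# Matches declarations of the form: <!ENTITY % name "value">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.\-]+)\s+"([^"]*)"\s*>')
# Matches a parameter entity reference such as %this.stuff;
PE_REF = re.compile(r"%([\w.\-]+);")


def parse_parameter_entities(dtd_text):
    """Return a dict mapping parameter entity names to replacement text."""
    return dict(ENTITY_DECL.findall(dtd_text))


def outline(name, entities, indent=0, seen=()):
    """Return indented outline lines showing how an entity expands."""
    lines = [" " * indent + "%" + name]
    if name in seen or name not in entities:
        return lines                    # circular or undeclared: name only
    for part in entities[name].split("|"):
        part = part.strip()
        ref = PE_REF.fullmatch(part)
        if ref:
            lines.extend(outline(ref.group(1), entities,
                                 indent + 4, seen + (name,)))
        elif part:
            lines.append(" " * (indent + 4) + part)
    return lines


if __name__ == "__main__":
    dtd = '''
    <!ENTITY % this.stuff "a | b">
    <!ENTITY % that.stuff "c | d">
    <!ENTITY % paragraph.stuff "#PCDATA | %this.stuff; | %that.stuff;">
    '''
    print("\n".join(outline("paragraph.stuff", parse_parameter_entities(dtd))))
```

A collapsible view could be layered on the same nested structure, but a real
tool would want a proper DTD parser rather than regular expressions.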

            
From p_schneider1@yahoo.com  Tue Dec  8 17:47:52 1998
From: p_schneider1@yahoo.com (Paul Schneider)
Date: Wed, 9 Dec 1998 04:47:52 +1100 (EST)
Subject: [XML-SIG] XML for Windows
Message-ID: <19981208174752.5216.rocketmail@send105.yahoomail.com>

Hi there!

I just downloaded and unpacked the xml-package
xml-0_5pre3.zip. The makefile supplied to compile
and install it is only for UNIX. 

- Is there a different package for NT?
- How do I get the package running under Windows NT?

Paul




From jday@csihq.com  Wed Dec  9 00:19:58 1998
From: jday@csihq.com (John Day)
Date: Tue, 08 Dec 1998 19:19:58 -0500
Subject: [XML-SIG] xml install problems
Message-ID: <3.0.1.32.19981208191958.00692914@mail.csihq.com>

Hi,

I'm a Python and XML newbie, having problems installing the latest
xml-0.5 under Linux.

I did not have Python installed, so I got the latest Python
1.5.1 from www.python.org and installed it with prefix=/home/jday
It seemed to install OK and make test ran OK.


Then I unzipped the xml-0.5 into /home/jday/xml/xml-0.5/ and
did:
  make -f Makefile.pre.in Makefile VERSION=1.5 installdir=/home/jday
  make
  make install

The make seemed to make and install everything OK, _but_ 6 out of 7 tests
failed:


jday@medusa:/home/jday/xml/xml-0.5> make test
cd test ; PYTHONPATH=.. python testxml.py
test_arch
test test_arch skipped -- an optional feature could not be imported
test_dom
test test_dom skipped -- an optional feature could not be imported
test_pyexpat
test_sax
test test_sax skipped -- an optional feature could not be imported
test_unicode
test test_unicode skipped -- an optional feature could not be imported
test_utils
test test_utils skipped -- an optional feature could not be imported
test_xmllib
test test_xmllib skipped -- an optional feature could not be imported
1 test OK.
6 tests skipped: test_arch test_dom test_sax test_unicode test_utils
test_xmllib

jday@medusa:/home/jday/xml/xml-0.5/test> python test_arch.py
Traceback (innermost last):
  File "test_arch.py", line 6, in ?
    from xml.sax import saxexts, saxlib, saxutils
ImportError: No module named xml.sax

PYTHONPATH was not defined so I tried
 setenv PYTHONPATH /home/jday/lib/python1.5/
with no improvement.

I know very little about python and your XML implementations. What
am I doing wrong?

Thanks,
John Day
Palm Bay, Florida 


From betty@eccnet.eccnet.com  Wed Dec  9 01:33:23 1998
From: betty@eccnet.eccnet.com (Betty Harvey)
Date: Tue, 8 Dec 1998 20:33:23 -0500 (EST)
Subject: [XML-SIG] xml install problems
In-Reply-To: <3.0.1.32.19981208191958.00692914@mail.csihq.com>
Message-ID: <Pine.LNX.3.96.981208202935.5713A-100000@eccnet.eccnet.com>

John:

I had similar problems, but much earlier in the process.  I tried installing
Python on Linux 5.0.  The Makefile.pre.in worked just fine,
however, when I tried the 'make' I got the following
error:

gcc -fPIC -O -I/usr/include/python1.4 -I/usr/include/python1.4
-DHAVE_CONFIG_H
-Iexpat/xmlparse -c ./pyexpat.c
./pyexpat.c: In function `mywrite':
./pyexpat.c:64: void value not ignored as it ought to be
make: *** [pyexpat.o] Error 1                                    

Betty




From akuchlin@cnri.reston.va.us  Wed Dec  9 03:48:56 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Tue, 8 Dec 1998 22:48:56 -0500
Subject: [XML-SIG] Whitespace stripping functions
Message-ID: <199812090348.WAA03586@207-172-46-251.s251.tnt9.ann.erols.com>

I've added a dom.utils module for small utility functions for the DOM
and plan to check it into the CVS tree.  cvs.python.org is
inaccessible for some reason, so a copy is appended below.

It implements tree_print(), strip_whitespace(), and
collapse_whitespace().  tree_print() is intended for debugging, and
returns a string showing the tree structure of a DOM subtree.
strip_whitespace() removes leading/trailing/both whitespace in-place
from a DOM tree, and collapse_whitespace() folds runs of whitespace
into a single space.  

     Comments, suggestions?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I spent a busy day today, but got little done. This is because I am at last
becoming perfect in the art of seeming busy, even when very little is going on
in my head or under my hands. This is an art which every man learns, if he
does not intend to work himself to death.
    -- Robertson Davies, _The Table Talk of Samuel Marchbanks_

# utils.py

import re
from xml.dom import core

# Various utility functions that are often handy.

def tree_print(node, indent = 0):
    """Print a representation of a tree that makes the tree structure explicit.
    Intended mostly for debugging use, so it's a lossy printout."""
    s = indent*' ' + repr(node) + '\n'
    for n in node.get_childNodes():
        s = s + tree_print(n, indent + 2)
    return s
    
# this should grow up into a general-purpose whitespace post-processor,
# options to include:
#   - whether to strip (s/\s+//) or collapse (s/\s+/ /)
#   - where to do it: head, tail, or interior of text nodes, or
#                     all-whitespace nodes only
# Initial implementation by Greg Ward; modified and collapse_whitespace added
# by AMK.

import string
WS_LEFT, WS_BOTH, WS_RIGHT, WS_INTERNAL = [1,2,3,4]

strip_func = {WS_LEFT: string.lstrip,
              WS_BOTH: string.strip,
              WS_RIGHT: string.rstrip }

# Raw strings keep the backslashes intact for the re module.
collapse_pat = {WS_LEFT: r'^\s+',
                WS_BOTH: r'(^\s+)|(\s+$)',
                WS_RIGHT: r'\s+$',
                WS_INTERNAL: r'\s+'}
                
def strip_whitespace (node, func = WS_BOTH):
    """Remove leading and/or trailing whitespace from a DOM tree.
    node -- top node; its subtree will be traversed
    func -- one of WS_LEFT, WS_RIGHT, WS_BOTH telling which whitespace to strip
    """
    if func == WS_INTERNAL:
        raise ValueError, "WS_INTERNAL not acceptable value for strip_whitespace()"
    func = strip_func[func]
    if node.nodeType == core.DOCUMENT_NODE:
        node = node.documentElement

    stack = [node]

    while (stack):
        # get the top node from the stack
        node = stack[-1]
        # XXX a general-purpose "visit" operation could go right here

        # walk this node's list of children, deleting those that are
        # all whitespace and saving the rest to be pushed onto the stack
        children = []
        for child in node.childNodes[:] :
            if child.nodeType == core.TEXT_NODE:
                orig = child.get_nodeValue()
                v = func( orig )
                if v == "":
                    node.removeChild (child)
                elif v != orig:
                    child.set_nodeValue( v )
            elif child.hasChildNodes():
                children.append (child)
        children.reverse()
        stack[-1:] = children
        
    # end: while stack not empty

# end strip_whitespace

def collapse_whitespace (node, func = WS_BOTH):
    """Collapse runs of whitespace down to a single space.
    
    node -- top node; its subtree will be traversed
    func -- one of WS_LEFT, WS_RIGHT, WS_BOTH, WS_INTERNAL telling which
            whitespace should be collapsed.  
    """
    pat = collapse_pat[ func ]
    pat = re.compile( pat )
    if node.nodeType == core.DOCUMENT_NODE:
        node = node.documentElement

    stack = [node]

    while (stack):
        # get the top node from the stack
        node = stack[-1]
        # XXX a general-purpose "visit" operation could go right here

        # walk this node's list of children, collapsing whitespace in
        # text nodes and saving element children to be pushed onto the stack
        children = []
        
        for child in node.childNodes[:] :
            if child.nodeType == core.TEXT_NODE:
                orig = child.get_nodeValue()
                v = pat.sub(' ', orig)
                if v != orig:
                    child.set_nodeValue( v )
            elif child.hasChildNodes():
                children.append (child)
        children.reverse()
        stack[-1:] = children
        
    # end: while stack not empty

# end collapse_whitespace
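
For readers without this package's xml.dom.core, the following is a rough
port of tree_print() and strip_whitespace() to the standard-library
xml.dom.minidom in present-day Python. It is a sketch, not the module above:
it prints node names instead of repr(node), takes the string method directly
instead of the WS_* constants, and pops the stack rather than splicing the
children in, so the visiting order differs while the result is the same.

```python
from xml.dom import Node, minidom


def tree_print(node, indent=0):
    """Return an indented outline of the subtree, one node per line."""
    label = node.nodeName
    if node.nodeType == Node.TEXT_NODE:
        label = label + " " + repr(node.data)
    s = " " * indent + label + "\n"
    for child in node.childNodes:
        s = s + tree_print(child, indent + 2)
    return s


def strip_whitespace(node, func=str.strip):
    """Strip text nodes in place; func is str.strip, str.lstrip or str.rstrip."""
    if node.nodeType == Node.DOCUMENT_NODE:
        node = node.documentElement
    stack = [node]
    while stack:
        node = stack.pop()
        for child in list(node.childNodes):     # copy: children may be removed
            if child.nodeType == Node.TEXT_NODE:
                v = func(child.data)
                if v == "":
                    node.removeChild(child)     # nothing but whitespace
                elif v != child.data:
                    child.data = v
            elif child.hasChildNodes():
                stack.append(child)             # visit this element later


if __name__ == "__main__":
    doc = minidom.parseString("<a>\n  <b> hi </b>\n</a>")
    strip_whitespace(doc)
    print(tree_print(doc.documentElement), end="")
```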


From jday@csihq.com  Wed Dec  9 11:15:57 1998
From: jday@csihq.com (John Day)
Date: Wed, 09 Dec 1998 06:15:57 -0500
Subject: [XML-SIG] xml install problems
Message-ID: <3.0.1.32.19981209061557.006e5384@mail.csihq.com>

Hi,

I wrote yesterday that the Python and xml-0.5 installs proceeded
without error, yet I could not run the xml-0.5 test files. I'm still
trying to pinpoint the exact problem.

My problem seems to be that Python 1.5 can't see the xml site-package.
It exists and seems to contain everything:
jday@medusa:/home/jday/lib/python1.5/site-packages/xml> dir
total 14
drwxr-xr-x   8 jday     csi          1024 Dec  8 19:11 ./
drwxr-xr-x   3 jday     csi          1024 Dec  5 06:24 ../
-rw-r--r--   1 jday     csi            37 Dec  8 19:11 __init__.py
-rw-r--r--   1 jday     csi           175 Dec  8 19:11 __init__.pyc
-rw-r--r--   1 jday     csi           169 Dec  8 19:11 __init__.pyo
-rw-r--r--   1 jday     csi           427 Dec  8 19:11 _checkversion.py
-rw-r--r--   1 jday     csi           654 Dec  8 19:11 _checkversion.pyc
-rw-r--r--   1 jday     csi           621 Dec  8 19:11 _checkversion.pyo
drwxrwxr-x   2 jday     csi          1024 Dec  8 19:11 arch/
drwxrwxr-x   2 jday     csi          1024 Dec  8 19:11 dom/
drwxrwxr-x   3 jday     csi          1024 Dec  8 19:11 parsers/
drwxrwxr-x   3 jday     csi          1024 Dec  8 19:11 sax/
drwxrwxr-x   2 jday     csi          1024 Dec  8 19:11 unicode/
drwxrwxr-x   2 jday     csi          1024 Dec  8 19:11 utils/

But when I run python I can't import any of them:

jday@medusa:/home/jday/lib/python1.5/site-packages/xml> python
Python 1.5.1 (#1, Dec  8 1998, 18:51:08)  [GCC 2.8.1] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import xml.sax
Traceback (innermost last):
  File "<stdin>", line 1, in ?
ImportError: No module named xml.sax

Python obviously can't see the xml package. I've got python in my home
bin and I've set PYTHONPATH to /home/jday/lib/python1.5/ What else is
there to do?

I'm new to Python, so I don't really know the basic mechanism for installing
these packages, and I can't find it in any of the docs. (I'm guessing this is a
simple problem to fix :-)

Thanks,

John Day


From fredrik@pythonware.com  Wed Dec  9 11:25:34 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 9 Dec 1998 12:25:34 +0100
Subject: [XML-SIG] xml install problems
Message-ID: <001301be2366$a42b4c90$f29b12c2@pythonware.com>

>Python obviously can't see the xml package. I've got python in my home
>bin and I've set PYTHONPATH to /home/jday/lib/python1.5/ What else is
>there to do?

try this:

$ python
>>> import sys
>>> sys.path

this prints a list of all entries added to the python path.

by the way, does "import sax" work ?

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From jday@csihq.com  Wed Dec  9 12:16:52 1998
From: jday@csihq.com (John Day)
Date: Wed, 09 Dec 1998 07:16:52 -0500
Subject: [XML-SIG] xml install problems
In-Reply-To: <001301be2366$a42b4c90$f29b12c2@pythonware.com>
Message-ID: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com>

You wrote:
.....
>try this:
>
>$ python
>>>> import sys
>>>> sys.path
>
>this prints a list of all entries added to the python path.
>
>by the way, does "import sax" work ?
>
.....................................................
Here's my sys.path (thanks, I didn't know about this):

>>> import sys
>>> for i in sys.path: print i
...
/home/jday/bin/lib/python1.5/
/home/jday/bin/lib/python1.5/test
/home/jday/bin/lib/python1.5/plat-linux2
/home/jday/bin/lib/python1.5/lib-tk
/home/jday/bin/lib/python1.5/lib-dynload
>>> import sax
Traceback (innermost last):
  File "<stdin>", line 1, in ?
ImportError: No module named sax
.......................................................

Now I'm confused. None of the above packages are members
of the "site-packages" directory. (xml is the only entry)
I assumed "site-packages" would list _all_ installed pythonic
packages. 


-jday


From kajiyama@etl.go.jp  Wed Dec  9 23:49:21 1998
From: kajiyama@etl.go.jp (Tamito Kajiyama)
Date: Wed, 9 Dec 98 23:49:21 JST
Subject: [XML-SIG] xml install problems
In-Reply-To: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com> (message from John Day on Wed, 09 Dec 1998 07:16:52 -0500)
Message-ID: <9812091449.AA11121@etlibs2.etl.go.jp>

John Day <jday@csihq.com> writes:
| 
| >>> import sys
| >>> for i in sys.path: print i
| ..
| /home/jday/bin/lib/python1.5/
| /home/jday/bin/lib/python1.5/test
| /home/jday/bin/lib/python1.5/plat-linux2
| /home/jday/bin/lib/python1.5/lib-tk
| /home/jday/bin/lib/python1.5/lib-dynload
| 
| Now I'm confused. None of the above packages are members
| of the "site-packages" directory. (xml is the only entry)
| I assumed "site-packages" would list _all_ installed pythonic
| packages. 

Each element of sys.path is a directory that Python searches for modules.  See
Section 3.1 of the Python Library Manual for more information about
`sys.path' (http://www.python.org/doc/lib/module-sys.html).

BTW, in your message <3.0.1.32.19981208191958.00692914@mail.csihq.com>,
you said you installed Python 1.5.1 with prefix=/home/jday.  So, the
directories listed in sys.path should be

  /home/jday/lib/python1.5/
  /home/jday/lib/python1.5/test
  /home/jday/lib/python1.5/plat-linux2

and so on.  Also, the directory /home/jday/lib/python1.5/site-packages
should be in sys.path if you don't run Python with the -S option.
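
To see which sys.path entry, if any, actually holds a given package, a check
along these lines can help. This is a sketch in present-day Python; the
function name find_package_dirs is hypothetical, and it only recognises
ordinary directory packages that have an __init__.py file.

```python
import os
import sys


def find_package_dirs(package, path=None):
    """Return the search-path entries that contain the named package."""
    if path is None:
        path = sys.path
    hits = []
    for entry in path:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(entry or ".", package)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    print(find_package_dirs("xml"))
```

An empty result for "xml" would suggest that the directory holding the
package, such as site-packages, is not on the search path at all.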

I believe it is a problem with your Python installation, not with the XML
package, though I can't work out the cause.  How about posting a
message to comp.lang.python?

-- 
KAJIYAMA, Tamito <kajiyama@etl.go.jp>


From akuchlin@cnri.reston.va.us  Wed Dec  9 15:05:18 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed,  9 Dec 1998 10:05:18 -0500 (EST)
Subject: [XML-SIG] xml install problems
In-Reply-To: <Pine.LNX.3.96.981208202935.5713A-100000@eccnet.eccnet.com>
References: <3.0.1.32.19981208191958.00692914@mail.csihq.com>
 <Pine.LNX.3.96.981208202935.5713A-100000@eccnet.eccnet.com>
Message-ID: <13934.36700.619892.696779@amarok.cnri.reston.va.us>

Betty Harvey writes:
>I had similar problems but much earlier.  I tried installing
>Python on Linux 5.0.  The Makefile.pre.in worked just fine,
>however, when I tried the 'make' I got the following
>error:
>gcc -fPIC -O -I/usr/include/python1.4 -I/usr/include/python1.4
>-DHAVE_CONFIG_H
>-Iexpat/xmlparse -c ./pyexpat.c
>./pyexpat.c: In function `mywrite':
>./pyexpat.c:64: void value not ignored as it ought to be
>make: *** [pyexpat.o] Error 1                                    

	I'd recommend using Python 1.5, because 1.5 added several new
features that are used in the XML code, most notably packages and the
class-based exceptions.  The compile error you report also stems from
a difference between Python 1.4 and 1.5.  While I can easily produce a
patch that fixes pyexpat.c, you'd then run into more difficulties: the
missing package support, exceptions, no re module, etc.

	(Or do XML-SIG people think that compatibility with 1.4 is
important?  If so, we can work on making sure it works with the older
version.)  In the meantime, I'll document the dependence on Python 1.5
more explicitly in the README; thanks for your bug report!

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    "What are we going to do now?"
    "Keep it confused, feed it with useless information. I wonder if I have a
television set handy?"
    -- Sgt. Benton and the second Doctor, in "The Three Doctors"


From fdrake@acm.org  Wed Dec  9 15:39:09 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 Dec 1998 10:39:09 -0500 (EST)
Subject: [XML-SIG] xml install problems
In-Reply-To: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com>
References: <001301be2366$a42b4c90$f29b12c2@pythonware.com>
 <3.0.1.32.19981209071652.006fadc8@mail.csihq.com>
Message-ID: <13934.39325.895915.517536@weyr.cnri.reston.va.us>

John Day writes:
 > Here's my sys.path (thanks, I didn't know about this):
 > 
 > >>> import sys
 > >>> for i in sys.path: print i
 > ..
 > /home/jday/bin/lib/python1.5/
 > /home/jday/bin/lib/python1.5/test
 > /home/jday/bin/lib/python1.5/plat-linux2
 > /home/jday/bin/lib/python1.5/lib-tk
 > /home/jday/bin/lib/python1.5/lib-dynload
 > >>> import sax
 > Traceback (innermost last):
 >   File "<stdin>", line 1, in ?
 > ImportError: No module named sax
 > ......................................................
 > 
 > Now I'm confused. None of the above packages are members
 > of the "site-packages" directory. (xml is the only entry)
 > I assumed "site-packages" would list _all_ installed pythonic
 > packages. 

  I'm going to be bold and guess that you're using Python 1.5, not
1.5.1.  Python 1.5 did not automatically import the "site" module, but 
1.5.1 does (if I recall correctly ;).
  Try doing "import site, xml".  If that works, then you can do either 
of two things:

	- Add "import site" before your "import xml" in your
	  application code, or
	- Upgrade to Python 1.5.1.

  Of course, now that I've written all this, you probably already have 
1.5.1 and I'm confused.  ;-)
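
  The "import site" workaround can be sketched roughly as follows, in
present-day Python 3 syntax.  The helper name import_after_site is
made up for this example; site.main() is the modern way to re-run the
site initialisation explicitly and did not exist in 1.5.

```python
import importlib
import site


def import_after_site(name):
    """Import NAME; if that fails, run the site setup and retry once."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Adds the site-packages directories to sys.path, as importing
        # the "site" module does at start-up unless -S was given.
        site.main()
        return importlib.import_module(name)
```

If the retry also fails, the ImportError propagates, which suggests
the package is not under any site-packages directory at all.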


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From larsga@ifi.uio.no  Wed Dec  9 16:14:59 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 09 Dec 1998 17:14:59 +0100
Subject: [XML-SIG] xml install problems
In-Reply-To: <13934.36700.619892.696779@amarok.cnri.reston.va.us>
References: <3.0.1.32.19981208191958.00692914@mail.csihq.com> 	<Pine.LNX.3.96.981208202935.5713A-100000@eccnet.eccnet.com> <13934.36700.619892.696779@amarok.cnri.reston.va.us>
Message-ID: <wkemq9thq4.fsf@ifi.uio.no>

* Andrew M. Kuchling
| 
| (Or do XML-SIG people think that compatibility with 1.4 is
| important?  If so, we can work on making sure it works with the
| older version.)

Sounds pretty hopeless to me, I'm afraid. Both xmllib and xmlproc use
the re module, saxlib uses class exceptions (and has to) and I guess
the DOM does too. That leaves the C-based parsers and Dan Connolly's
more or less useless parser.

Personally, I wouldn't give it priority.

--Lars M.


From jday@csihq.com  Wed Dec  9 17:12:46 1998
From: jday@csihq.com (John Day)
Date: Wed, 09 Dec 1998 12:12:46 -0500
Subject: [XML-SIG] xml install problems
In-Reply-To: <9812091449.AA11121@etlibs2.etl.go.jp>
References: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com>
Message-ID: <3.0.1.32.19981209121246.007682ec@mail.csihq.com>

At 11:49 PM 12/9/98 JST, Tamito Kajiyama wrote:
>
>BTW, in your message <3.0.1.32.19981208191958.00692914@mail.csihq.com>,
>you said you installed Python 1.5.1 with prefix=/home/jday.  So, the
>directories listed in sys.path should be
>
>  /home/jday/lib/python1.5/
>  /home/jday/lib/python1.5/test
>  /home/jday/lib/python1.5/plat-linux2
>
>and so on.  Also, the directory /home/jday/lib/python1.5/site-packages
>should be in sys.path if you don't run Python with the -S option.
>

Tamito,

You were right, problem was the Python installation not xml-0.5. I've got
everything working more or less OK now. Here's what I did (for the 
benefit of any silent minority having similar problems):

1. [From Python make directory]:
   Did 'make distclean' to clear out the original Python installation.
   Rebuilt Python1.5.1 from './configure --prefix=/home/jday' [I must have
   used the wrong prefix before]
2. After 'make' and 'make install' I ended up with _two_ executables:
   one in [prefix]/bin and the other in the make directory. The one in
   the make directory immediately allowed me to 'import xml.sax' etc
   The one in [prefix]/bin still couldn't see the site-packages until
   I defined 
      setenv PYTHONPATH /home/jday/lib/python1.5/site-packages
   Then it allowed 'import xml.sax' also. [I don't understand why the
   executable in the make directory creates a different sys.path than
   the one in the [prefix]/bin directory]
3. In the xml-0.5 directory I rebuilt everything. The 'make test' still
   doesn't work [because it temporarily trashes PYTHONPATH] but I was
   able to run each test separately 'python test/test_arch.py' etc.
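
For readers who would rather not set PYTHONPATH, a rough equivalent
of step 2 can be done from inside Python.  This is a sketch in
present-day Python 3 syntax; add_site_packages is a name invented for
this example, while site.addsitedir() is a standard library call.

```python
import os
import site
import sys


def add_site_packages(directory):
    """Add DIRECTORY to sys.path (processing any .pth files in it) and
    report whether it is on the path afterwards."""
    site.addsitedir(directory)
    # addsitedir() stores the absolute form of the directory.
    return os.path.abspath(directory) in sys.path
```

With the layout above, add_site_packages(
"/home/jday/lib/python1.5/site-packages") would be the call to try
before importing xml.sax.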
   

So I think I am now XML-enabled. Thanks to everybody for helping me
out. Now to figure out how the XML parser and other stuff works ;-)

-jday


From jday@csihq.com  Wed Dec  9 19:30:24 1998
From: jday@csihq.com (John Day)
Date: Wed, 09 Dec 1998 14:30:24 -0500
Subject: [XML-SIG] sax demo
Message-ID: <3.0.1.32.19981209143024.0076a5b8@mail.csihq.com>

FYI, in demo/sax/saxhack.py: line 82

  class slowParser(xmllib.SlowXMLParser):

causes error: only parser in xmllib appears to be

  class slowParser(xmllib.TestXMLParser):

This works OK.
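
A demo that has to cope with either class name could pick whichever
one exists.  The sketch below uses present-day Python 3 syntax and a
stand-in namespace instead of the real xmllib, which is not available
in current Pythons; first_available and fake_xmllib are names made up
for this example.

```python
from types import SimpleNamespace


def first_available(module, names):
    """Return the first attribute of MODULE listed in NAMES."""
    for name in names:
        candidate = getattr(module, name, None)
        if candidate is not None:
            return candidate
    raise AttributeError("none of %s found" % ", ".join(names))


class TestXMLParser:
    """Stand-in for the only parser class found in xmllib."""


# Hypothetical module object that, like the xmllib in this report,
# defines TestXMLParser but not SlowXMLParser.
fake_xmllib = SimpleNamespace(TestXMLParser=TestXMLParser)

base = first_available(fake_xmllib, ["SlowXMLParser", "TestXMLParser"])


class slowParser(base):
    pass
```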

-jday


From akuchlin@cnri.reston.va.us  Wed Dec  9 20:36:19 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed,  9 Dec 1998 15:36:19 -0500 (EST)
Subject: [XML-SIG] xml install problems
In-Reply-To: <3.0.1.32.19981209121246.007682ec@mail.csihq.com>
References: <3.0.1.32.19981209071652.006fadc8@mail.csihq.com>
 <9812091449.AA11121@etlibs2.etl.go.jp>
 <3.0.1.32.19981209121246.007682ec@mail.csihq.com>
Message-ID: <13934.50329.296793.922448@amarok.cnri.reston.va.us>

John Day writes:
>3. In the xml-0.5 directory I rebuilt everything. The 'make test' still
>   doesn't work [because it temporarily trashes PYTHONPATH] but I was
>   able to run each test separately 'python test/test_arch.py' etc.

	I should fix that; the intention is that you can run the test
suite without having to actually install the package, but that relies
on having a symlink from xml to '.' in the main directory.  Perhaps it
should behave in a way being suggested in the Distutils-sig, and
construct a fake installation tree inside the package; actual
installation would then be a matter of just copying the tree.
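
	A very rough sketch of the "fake installation tree" idea, in
present-day Python 3 syntax.  This is not the Distutils-sig proposal
itself; build_fake_tree and install_tree are names made up for this
example, and the package contents are passed in as a dictionary purely
to keep the sketch self-contained.

```python
import os
import shutil
import tempfile


def build_fake_tree(package_name, modules):
    """Create ROOT/PACKAGE_NAME/ holding the given {filename: source}
    files and return ROOT, which can be put on sys.path for testing."""
    root = tempfile.mkdtemp(prefix="fake-install-")
    package_dir = os.path.join(root, package_name)
    os.makedirs(package_dir)
    for filename, source in modules.items():
        with open(os.path.join(package_dir, filename), "w") as handle:
            handle.write(source)
    return root


def install_tree(root, destination):
    """Install by copying the whole tree; DESTINATION must not exist."""
    shutil.copytree(root, destination)
```

The test suite would prepend the returned root to sys.path, so no
symlink and no PYTHONPATH juggling would be needed.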

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It's the same old story... Whatever it turns into on the way, whatever it is
you originally undertake to spin or knit or weave, keep it going long enough
and, in the end, my lilies, it's always a winding sheet.
    -- One of the three Fates, in SANDMAN #69: "The Kindly Ones:13"


From betty@eccnet.eccnet.com  Thu Dec 10 17:41:06 1998
From: betty@eccnet.eccnet.com (Betty Harvey)
Date: Thu, 10 Dec 1998 12:41:06 -0500 (EST)
Subject: [XML-SIG] xml install problems
In-Reply-To: <13934.36700.619892.696779@amarok.cnri.reston.va.us>
Message-ID: <Pine.LNX.3.96.981210123755.2985A-100000@eccnet.eccnet.com>


On Wed, 9 Dec 1998, Andrew M. Kuchling wrote:

> 	I'd recommend using Python 1.5, because 1.5 added several new
> features that are used in the XML code, most notably packages and the
> class-based exceptions.  The compile error you report also stems from
> a difference between Python 1.4 and 1.5.  While I can easily produce a
> patch that fixes pyexpat.c, you'd then run into more difficulties: the
> missing package support, exceptions, no re module, etc.

Question about installing 1.5 on LINUX 5.0.  I am unable to install 1.5
because LINUX is using Python 1.4 for some system support, including
RPM.  Is there a safe method for installing 1.5?

Has anyone installed Linux 5.2?  Is Python 1.5 available on 5.2?  I
have the CD for 5.2 but haven't upgraded yet.

Betty

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
Betty Harvey                           | Phone: 301-540-8251 FAX: 4268
Electronic Commerce Connection, Inc.   | 
13017 Wisteria Drive, P.O. Box 333     | 
Germantown, Md.  20874                 |
harvey@eccnet.eccnet.com               | Washington,DC SGML Users Grp
URL:  http://www.eccnet.com            | http://www.eccnet.com/sgmlug/
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\\/\/  


From fredrik@pythonware.com  Thu Dec 10 18:01:29 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 10 Dec 1998 19:01:29 +0100
Subject: [XML-SIG] xml install problems
Message-ID: <00d301be2467$1e7ef220$f29b12c2@pythonware.com>

>Question about installing 1.5 on LINUX 5.0.  I am unable to install 1.5
>because LINUX is using Python 1.4 for some system support, including
>RPM.  Is there a safe method for installing 1.5?

Sure.  Quoting from the README file:

    All subdirectories created will have Python's version number in their
    name, e.g. the library modules are installed in
    "/usr/local/lib/python1.5/" by default.  The Python binary is
    installed as "python1.5" and a hard link named "python" is created.
    The only file not installed with a version number in its name is the
    manual page, installed as "/usr/local/man/man1/python.1" by default.

    If you have a previous installation of a pre-1.5 Python that you don't
    want to replace yet, use

    	make altinstall

    This installs the same set of files as "make install" except it
    doesn't create the hard link to "python1.5" named "python" and it
    doesn't install the manual page at all.

Dunno about RedHat 5.2; we're still on 4.2 over here.  But
http://www.redhat.com/product.phtml/RH5020 says it's
using Python 1.5.1.

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From fdrake@acm.org  Thu Dec 10 18:34:33 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 Dec 1998 13:34:33 -0500 (EST)
Subject: [XML-SIG] xml.dom.core.NamedNodeMap.get() method
Message-ID: <13936.5177.144595.797587@weyr.cnri.reston.va.us>

  I've attached a patch to add a get() method to NamedNodeMap, to make 
it a little more dictionary like.
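
  The intended behaviour can be tried out with the self-contained
sketch below, written in present-day Python 3 syntax.  The class here
is a simplified stand-in and not the real xml.dom.core.NamedNodeMap;
only the get() method mirrors the patch, with "in" used in place of
has_key().

```python
class NamedNodeMap:
    """Simplified stand-in: maps node names to nodes via self.data."""

    def __init__(self):
        self.data = {}

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        # Same logic as the patch: return the item, or DEFAULT if absent.
        if key in self.data:
            return self[key]
        return default
```

Like dictionary get(), this never raises KeyError for a missing name.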


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


Index: core.py
===================================================================
RCS file: /projects/cvsroot/xml/dom/core.py,v
retrieving revision 1.36
diff -c -c -r1.36 core.py
*** core.py	1998/12/09 03:18:58	1.36
--- core.py	1998/12/10 18:30:30
***************
*** 191,196 ****
--- 191,201 ----
          key = arg.nodeName
          self[key] = arg
  
+     def get(self, key, default=None):
+         if self.data.has_key(key):
+             return self[key]
+         return default
+ 
      def item(self, index):
          return self.data.values[ index ]
  

From gstein@lyra.org  Thu Dec 10 18:52:29 1998
From: gstein@lyra.org (Greg Stein)
Date: Thu, 10 Dec 1998 10:52:29 -0800
Subject: [XML-SIG] xml install problems
References: <00d301be2467$1e7ef220$f29b12c2@pythonware.com>
Message-ID: <3670186D.50B8090A@lyra.org>

Fredrik Lundh wrote:
> 
> >Question about installing 1.5 on LINUX 5.0.  I am unable to install 1.5
> >because LINUX is using Python 1.4 for some system support, including
> >RPM.  Is there a safe method for installing 1.5?
> ...
> Dunno about RedHat 5.2; we're still on 4.2 over here.  But
> http://www.redhat.com/product.phtml/RH5020 says it's
> using Python 1.5.1.

RedHat started installing 1.5.1 as part of RedHat 5.1.

In other words, RedHat 5.1 and 5.2 have the most recent (public) version
of Python.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From Jean-Michel.Bruel@univ-pau.fr  Fri Dec 11 11:02:59 1998
From: Jean-Michel.Bruel@univ-pau.fr (Jean-Michel BRUEL)
Date: Fri, 11 Dec 1998 12:02:59 +0100 (MET)
Subject: [XML-SIG] [CFP] UML'99
Message-ID: <199812111102.MAA02794@crisv4.univ-pau.fr>

[apologies if you receive multiple copies of this announcement]

=================================================================
 Call for Papers                <<UML>>'99
=================================================================

 Second International Conference on the
 Unified Modeling Language

 October 28-30, 1999, Fort Collins, Colorado, USA
 (just before OOPSLA)
=================================================================
 http://www.cs.colostate.edu/UML99
=================================================================

Invited Speaker:
                        Grady Booch

Scope:
   <<UML>>'99 will bring together researchers in academia and
   industry who are developing processes, methods, techniques,
   and semantic foundations for the UML. The conference will
   provide a forum for discussing and evaluating promising
   approaches that will enhance the application of UML.

   The <<UML>>'99 organizing committee invites authors to
   submit papers presenting original and unpublished research
   and experience reports on UML or related topics.
   Typical areas include (but are not limited to):

   - Integration of software development techniques
   - Significant or useful extensions
   - Metamodels and model interchange
   - Formal semantics
   - Business processes and modeling
   - Experience reports that contribute significant research
     ideas or make unconventional use of UML
   - OCL and other constraint notations
   - Reuse at the modeling level
   - Patterns, pattern mining
   - Extensions and restrictions of UML
   - UML compared to other notations
   - Mapping of UML to programming languages, frameworks,
     databases, and architectures
   - Modeling software architectures with UML
   - Verification with UML models
   - Transformation of UML models (incl. code generation)
   - Refinement and composition of UML models
   - Method engineering in the large
   - Management of UML projects
   - Modeling of distributed systems
   - UML and real-time
   - Metrics and measures based on UML

Important dates (deadlines are hard!):
   Deadline for abstract                05 May 1999
   Deadline for submission              15 May 1999
   Notification to authors              15 July 1999
   Final version of accepted papers     25 August 1999

Conference web page:
   http://www.cs.colostate.edu/UML99

Submissions:
   Submit your 10-15 page manuscript electronically in Postscript
   or pdf using the Springer LNCS style. Details are available at
   the conference web page. The <<UML>>'99 proceedings will be
   published by Springer-Verlag in the LNCS series.

Program Committee:
   C. Atkinson, Germany         J. Bezivin, France
   J. Bieman, USA               G. v. Bochmann, Canada
   R. Breu, Germany             J.-M. Bruel, France
   F. Buschmann, Germany        B. Cheng, USA
   D. Coleman, USA              S. Cook, UK
   D. D'Souza, USA              J. Daniels, UK
   G. Engels, Germany           A. S. Evans, UK
   E. Fernandez, USA            M. Fowler, USA
   E. Gery, Israel              M. Gogolla, Germany
   M. Griss, USA                R. Grosu, USA               
   D. Harel, Israel             B. Henderson-Sellers, Australia
   P. Hruby, Denmark            H. Hussmann, Germany        
   I. Jacobson, USA             G. Kappel, Austria          
   S. Kent, UK                  H. Kilov, USA               
   C. Kobryn, USA               P. Kruchten, USA            
   K. Lano, UK                  G. Leavens, USA             
   M. Loomis, USA               S. Mellor, USA              
   R. Mitchell, UK              A. Moreira, Portugal        
   P.-A. Muller, France         L. Northrop, USA            
   G. Overgaard, Sweden         B. Paech, Germany           
   J. Rumbaugh, USA             A. Schurr, Germany
   E. Seidewitz, USA            B. Selic, Canada            
   R. Soley, USA                J. Warmer, Netherlands      
   T. Wasserman, USA            A. Wills, UK                
   R. Wirfs-Brock, USA

Organizing Committee:
   Conference Chair:
     Robert B. France, USA
   Program Chair:
     Bernhard Rumpe, Germany
   Publicity Chairs:
     J.-M. Bruel, France
     J. Bieman, USA
     J. Suzuki, Japan
   Steering Committee:
     J. Bezivin, France
     R. B. France, USA
     P.-A. Muller, France
     B. Rumpe, Germany

Further Information:
   Robert B. France             E-mail: france@cs.colostate.edu
   Computer Science Department  Tel:    970-491-6356
   Colorado State University    Fax:    970-491-2466
   Fort Collins, CO 80523, USA

   Bernhard Rumpe               E-mail: rumpe@in.tum.de
   Institut fuer Informatik     Tel:    0049-89-289-28129
   T. Universitaet Muenchen     Fax:    0049-89-289-28183
   80290 Muenchen, Germany


From George.J.McNinch.1@nd.edu  Fri Dec 11 14:47:42 1998
From: George.J.McNinch.1@nd.edu (George J McNinch)
Date: 11 Dec 1998 09:47:42 -0500
Subject: [XML-SIG] build problems: xml-0.5
Message-ID: <yvz3ogpau44x.fsf@galois.math.nd.edu>

Hi--

I have not been able to build xml-0.4 or xml-0.5

gmcninch@galois 7% uname -a
IRIX galois 6.2 03131015 IP22

I'm _not_ using gcc, but IRIX cc.

Find attached the compile log.

Best,
George McNinch


[Attachment: ~/lib/python/xml-0.5/compile_outcome.txt]

cd /usr/people/gmcninch/lib/python/xml-0.5/
make -k 
	cd expat ; make libexpat.a CC="cc -n32" CFLAGS=" -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse"
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -o gennmtab/gennmtab gennmtab/gennmtab.c
	rm -f xmltok/nametab.h
	gennmtab/gennmtab >xmltok/nametab.h
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmltok/xmltok.o xmltok/xmltok.c
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmltok/xmlrole.o xmltok/xmlrole.c
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlwf/xmlwf.o xmlwf/xmlwf.c
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlwf/codepage.o xmlwf/codepage.c
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlparse/xmlparse.o xmlparse/xmlparse.c
"xmlparse/xmlparse.c", line 723: error(1131): expected a field name
    int tok = XmlContentTok(encoding, start, end, &next);
              ^

"xmlparse/xmlparse.c", line 723: error(1131): expected a field name
    int tok = XmlContentTok(encoding, start, end, &next);
              ^

"xmlparse/xmlparse.c", line 754: error(1131): expected a field name
    int tok = XmlContentTok(encoding, start, end, &next);
              ^

"xmlparse/xmlparse.c", line 754: error(1131): expected a field name
    int tok = XmlContentTok(encoding, start, end, &next);
              ^

"xmlparse/xmlparse.c", line 1510: error(1131): expected a field name
      int tok = XmlPrologTok(encoding, s, end, &next);
                ^

"xmlparse/xmlparse.c", line 1510: error(1131): expected a field name
      int tok = XmlPrologTok(encoding, s, end, &next);
                ^

"xmlparse/xmlparse.c", line 1807: error(1131): expected a field name
      int tok = XmlPrologTok(encoding, s, end, &next);
                ^

"xmlparse/xmlparse.c", line 1807: error(1131): expected a field name
      int tok = XmlPrologTok(encoding, s, end, &next);
                ^

"xmlparse/xmlparse.c", line 1925: warning(1110): statement is unreachable
        break;
        ^

"xmlparse/xmlparse.c", line 2007: error(1131): expected a field name
      int tok = XmlEntityValueTok(encoding, entityTextPtr, entityTextEnd, &next);
                ^

"xmlparse/xmlparse.c", line 2007: error(1131): expected a field name
      int tok = XmlEntityValueTok(encoding, entityTextPtr, entityTextEnd, &next);
                ^

10 errors detected in the compilation of "xmlparse/xmlparse.c".
*** Error code 2 (bu21)
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlparse/hashtable.o xmlparse/hashtable.c
	cc -n32 -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H -Ixmltok -Ixmlparse -c -o xmlwf/unixfilemap.o xmlwf/unixfilemap.c
`libexpat.a' not remade because of errors (bu14)
	cc -n32  -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H  -Iexpat/xmlparse -c ./pyexpat.c
"./pyexpat.c", line 297: warning(1164): argument of type "void (*)()" is
          incompatible with parameter of type "XML_StartElementHandler"
  	XML_SetElementHandler(self->itself, my_StartElementHandler,
  	                                    ^

"./pyexpat.c", line 298: warning(1164): argument of type "void (*)()" is
          incompatible with parameter of type "XML_EndElementHandler"
  			      my_EndElementHandler);
  			      ^

"./pyexpat.c", line 299: warning(1164): argument of type "void (*)()" is
          incompatible with parameter of type "XML_CharacterDataHandler"
  	XML_SetCharacterDataHandler(self->itself, my_CharacterDataHandler);
  	                                          ^

"./pyexpat.c", line 301: warning(1164): argument of type "void (*)()" is
          incompatible with parameter of type
          "XML_ProcessingInstructionHandler"
  					    my_ProcessingInstructionHandler);
  					    ^

	ld -n32 -shared -all  pyexpat.o  expat/libexpat.a -o pyexpat.so
ld32: FATAL 9: I/O error (expat/libexpat.a): No such file or directory
*** Error code 32 (bu21)
	cc -n32  -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H  -c ./sgmlop.c
	ld -n32 -shared -all  sgmlop.o  -o sgmlop.so
	cc -n32  -O -I/usr/freeware/include/python1.5 -I/usr/freeware/include/python1.5 -DHAVE_CONFIG_H  -c ./wstrop.c
"./wstrop.c", line 417: warning(1164): argument of type "char *" is
          incompatible with parameter of type "const unsigned char *"
      l1=from_utf8(string+i,&wtmp);
                   ^

"./wstrop.c", line 426: warning(1164): argument of type "char *" is
          incompatible with parameter of type "const unsigned char *"
      tmp+=from_utf8(tmp,wstr->string+i);
                     ^

"./wstrop.c", line 627: warning(1164): argument of type "char *" is
          incompatible with parameter of type "unsigned char *"
      str+=to_utf8(self->string[i],str);
                                   ^

"./wstrop.c", line 807: warning(1164): argument of type "char *" is
          incompatible with parameter of type "unsigned char *"
    utf7_to_ucs2(PyString_AsString(ucs2),string,len,flags);
                 ^

"./wstrop.c", line 829: warning(1164): argument of type "char *" is
          incompatible with parameter of type "unsigned char *"
    len=ucs2_to_utf7(0,PyString_AsString(ucs2),PyObject_Length(ucs2),
                       ^

"./wstrop.c", line 838: warning(1164): argument of type "char *" is
          incompatible with parameter of type "unsigned char *"
    ucs2_to_utf7(PyString_AsString(utf7),PyString_AsString(ucs2),
                                         ^

"./wstrop.c", line 892: warning(1515): a value of type "char *" cannot be
          assigned to an entity of type "unsigned char *"
    s=PyString_AsString(result);
     ^

	ld -n32 -shared -all  wstrop.o  -o wstrop.so
`default' not remade because of errors (bu14)

Compilation finished at Fri Dec 11 09:29:57


From akuchlin@cnri.reston.va.us  Fri Dec 11 16:53:02 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 11 Dec 1998 11:53:02 -0500 (EST)
Subject: [XML-SIG] Equality tests on DOM nodes
Message-ID: <13937.18858.948855.840376@amarok.cnri.reston.va.us>

[CC'ed to xml-sig@python.org and www-dom@w3.org; followups to
 www-dom@w3.org]

With reference to the Python DOM implementation, someone has raised
the question of testing the equality of nodes.  I don't think there's
anything in the DOM Recommendation that discusses this question,
possibly because the issue doesn't raise its head in Java.

	Briefly, what should 'node1 == node2' do?  In Python, object
identity is tested using the 'is' operator, so 'node1 is node2'
returns true iff node1 and node2 are actually the same object.  'node1
== node2' should therefore test for equal values of the node.  This
differs from Java, where n1==n2 tests object identity, and a further
comparison would have to be implemented as a method.

	It seems fairly obvious that node1==node2 should check whether 
the node type and value are identical, and return false if they're
not.  But there are some trickier questions:

	* Should Element instances also compare their attributes?  
I would say 'yes', since the attributes are really associated with the 
Element node.

	* If the two nodes have identical type and value, should the
comparison be recursive, comparing the children of the nodes?  The ==
operator would then be comparing entire subtrees rooted at node1 and
node2.  I'm not certain if this is the best choice for the meaning of
==, but see no clear reason to choose recursive vs. non-recursive ==.
Any suggestions?
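
	To make the two options concrete, here is a minimal sketch in
present-day Python 3 syntax.  The Node class and its attribute names
are hypothetical and are not the xml.dom.core classes; shallow_equals
compares type, value and attributes only, while == as defined here
also compares the children recursively.

```python
class Node:
    """Hypothetical node with a type, a value, attributes and children."""

    def __init__(self, node_type, value=None, attributes=None, children=None):
        self.node_type = node_type
        self.value = value
        self.attributes = dict(attributes or {})
        self.children = list(children or [])

    def shallow_equals(self, other):
        """Non-recursive comparison: type, value and attributes only."""
        return (isinstance(other, Node)
                and self.node_type == other.node_type
                and self.value == other.value
                and self.attributes == other.attributes)

    def __eq__(self, other):
        """Recursive comparison: list equality applies == to each child."""
        if not isinstance(other, Node):
            return NotImplemented
        return self.shallow_equals(other) and self.children == other.children
```

With this definition 'node1 is node2' still tests identity, and two
separately built but identical subtrees compare equal under ==.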

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    Q. Does Kibo believe in furniture?
    A. No. Go away, furniture!
    -- The alt.religion.kibology FAQ


From fdrake@acm.org  Fri Dec 11 17:20:01 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 Dec 1998 12:20:01 -0500 (EST)
Subject: [XML-SIG] Equality tests on DOM nodes
In-Reply-To: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
Message-ID: <13937.21569.329411.356332@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
 > 	* If the two nodes have identical type and value, should the
 > comparison be recursive, comparing the children of the nodes.  The ==
 > operator would then be comparing entire subtrees rooted at node1 and
 > node2.  I'm not certain if this is the best choice for the meaning of
 > ==, but see no clear reason to choose recursive vs. non-recursive ==.
 > Any suggestions?


  Since I'm the one who raised this with Andrew, I'll mention that my
first reaction was that it would be recursive.  I don't see any clear
indication that "shallow" equality has any real meaning.
  This corresponds to the basic notion of equality testing in Python.
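
  A quick illustration of that notion with built-in containers, in
present-day Python 3 syntax (the variable names are made up for this
example): == descends into nested lists and dictionaries, while 'is'
only tests identity.

```python
# Two separately built, structurally identical nested values.
left = ["html", ["head", ["title"]], {"lang": "en"}]
right = ["html", ["head", ["title"]], {"lang": "en"}]

# A value that differs only deep inside the nesting.
other = ["html", ["head", ["TITLE"]], {"lang": "en"}]

same_value = left == right      # compares element by element, recursively
same_object = left is right     # identity test; these are distinct objects
differs_deep = left != other    # the nested difference is detected
```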


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From arabbit@earthlink.net  Fri Dec 11 17:59:20 1998
From: arabbit@earthlink.net (Paul Butkiewicz)
Date: Fri, 11 Dec 1998 12:59:20 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
Message-ID: <000101be252f$fa764c60$da39bfa8@arabbit>

Not to sound facetious, but to put this question in context, I might well
ask how we implement < and > for nodes?  We generally don't use those
particular operators on something real.  I would never say rock a > rock b,
but I might say rock a weighs more than rock b.  With respect to the
equality and equivalence, I am very safe saying book a has the same author
as book b, because she's really the same person.  If I'm talking about book
a and someone else is talking about book b, I might point out that they are
talking about the same book.  But if I say book a is the same as book b for
two different books, while this is a commonly used construct, it invites
argument --- "No, this book is dog-eared and has coffee stains on it.  I
want *my* book back!"

"Honey, these two coffee tables are identical.  Let's get the cheaper one."
"No.  This one is particle board and veneer, while this one is mission oak!
How can you think they're the same?"

What was my point?  I think it was to say that it invites folly, especially
when you're talking about an international, world-wide, universal standard,
to specify that two things are equal when they do not refer to the same
thing and/or measurable differences exist between them.  It seems obvious
(perhaps only to me) that attributes must be equal and the equality must be
true recursively, if you dare to define equality for nodes.  I think the
next question might be,

Does context make a difference in equality or equivalence?

I could easily say that this paragraph is identical to that paragraph when
we're talking about a printed page, but XML, in its most commonly discussed
usage, is about document metadata, and context is a part of that metadata.
A node is, after all, part of a larger document.

Paul

-----Original Message-----
From: www-dom-request@w3.org [mailto:www-dom-request@w3.org]On Behalf Of
Andrew M. Kuchling
Sent: Friday, December 11, 1998 11:53 AM
To: www-dom@w3.org
Cc: xml-sig@python.org
Subject: Equality tests on DOM nodes


[CC'ed to xml-sig@python.org and www-dom@w3.org; followups to
 www-dom@w3.org]

With reference to the Python DOM implementation, someone has raised
the question of testing the equality of nodes.  I don't think there's
anything in the DOM Recommendation that discusses this question,
possibly because the issue doesn't raise its head in Java.

	Briefly, what should 'node1 == node2' do?  In Python, object
identity is tested using the 'is' operator, so 'node1 is node2'
returns true iff node1 and node2 are actually the same object.  'node1
== node2' should therefore test for equal values of the node.  This
differs from Java, where n1==n2 tests object identity, and a further
comparison would have to be implemented as a method.

	It seems fairly obvious that node1==node2 should check whether
the node type and value are identical, and return false if they're
not.  But there are some trickier questions:

	* Should Element instances also compare their attributes?
I would say 'yes', since the attributes are really associated with the
Element node.

	* If the two nodes have identical type and value, should the
comparison be recursive, comparing the children of the nodes.  The ==
operator would then be comparing entire subtrees rooted at node1 and
node2.  I'm not certain if this is the best choice for the meaning of
==, but see no clear reason to choose recursive vs. non-recursive ==.
Any suggestions?

--
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    Q. Does Kibo believe in furniture?
    A. No. Go away, furniture!
    -- The alt.religion.kibology FAQ


From ray@imall.com  Fri Dec 11 17:58:35 1998
From: ray@imall.com (Ray Whitmer)
Date: Fri, 11 Dec 1998 10:58:35 -0700
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
Message-ID: <36715D4A.9660A0D0@imall.com>

Andrew M. Kuchling wrote:

> [CC'ed to xml-sig@python.org and www-dom@w3.org; followups to
>  www-dom@w3.org]
>
> With reference to the Python DOM implementation, someone has raised
> the question of testing the equality of nodes.  I don't think there's
> anything in the DOM Recommendation that discusses this question,
> possibly because the issue doesn't raise its head in Java.

I don't know Python, but every object in Java has an equals method to
signify deeper comparison than "==", for example, String.equals tells
whether the contents of two strings are identical.

>         * Should Element instances also compare their attributes?
> I would say 'yes', since the attributes are really associated with the
> Element node.
>
>         * If the two nodes have identical type and value, should the
> comparison be recursive, comparing the children of the nodes.  The ==
> operator would then be comparing entire subtrees rooted at node1 and
> node2.  I'm not certain if this is the best choice for the meaning of
> ==, but see no clear reason to choose recursive vs. non-recursive ==.
> Any suggestions?

For my own uses on both the client and server (in Java, not Python), the
full/deep comparison is the most useful and as such I implemented it in a
private API extension extremely efficiently.  A full/deep comparison is
very useful in many situations, and can be implemented much more
efficiently than forcing the user to check equality one attribute or
recursive child at a time (with acceptable tradeoffs in other parts of the
implementation).

But I would recommend NOT using the built-in Python operator, just as I am
not using the built-in equals method in Java, until it has been defined in
the standard how this should be implemented.  Otherwise, users of your
implementation will not be interoperable with users of other
implementations, and also possibly not interoperable with the standard
definition if one is ever officially formulated.  Instead, define the
operator to raise an exception, if Python has one, and if you need an
equality check, write one in a private API with your own name on it so it
will be clear to users that by using your method, they will be sacrificing
portability, in exchange for a concise, permanent definition of its
behavior.

The problem in Python is much bigger -- possibly rendering my advice
irrelevant -- since no official DOM API binding has been released for that
language in the first place.  I am just following how I would tell someone
to deal with the equals function in Java where users will expect
portability between implementations.

I don't know Python, so it is also possible that Python may impose more
rigidity on the requirements of == (than Java does on equals), making it
possible to know what the standard implementation should be, but your
raising the question would seem to indicate that it does not.

Ray Whitmer


From fdrake@acm.org  Fri Dec 11 18:06:38 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 Dec 1998 13:06:38 -0500 (EST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <000101be252f$fa764c60$da39bfa8@arabbit>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <000101be252f$fa764c60$da39bfa8@arabbit>
Message-ID: <13937.24366.729293.26105@weyr.cnri.reston.va.us>

Paul Butkiewicz writes:
 > Not to sound facetious, but to put this question in context, I might well
 > ask how we implement < and > for nodes?  We generally don't use those

  This is a very real concern.  I think comparison of nodes is only
interesting for equality.  When Python finally implements the "rich
comparison" semantics that have been proposed, equality will be
testable independently of ordering.

 > to specify that two things are equal when they do not refer to the same
 > thing and/or measurable differences exist between them.  It seems obvious
 > ( perhaps only to me ) that attributes must be equal and the equality must be
 > true recursively, if you dare to define equality for nodes.  I think the

  Well said.

 > Does context make a difference in equality or equivalence?
 > 
 > I could easily say that this paragraph is identical to that paragraph when
 > we're talking about a printed page, but XML, in its most commonly discussed
 > usage, is about document metadata, and context is a part of that metadata.
 > A node is, after all, part of a larger document.

  A good point.  I was particularly interested in equality *without
consideration for parent*.  So, I was ignoring context.
  Perhaps there is no fully general equality that isn't identity?  I
think the Python implementation would still require implementation of
a comparison method to achieve this since it uses proxy nodes, but
that's really just an implementation detail.  Python's native identity 
operator doesn't work in the presence of proxies that represent the
same node.
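
  A minimal sketch of the proxy problem, using stand-in classes rather
than the real xml.dom.core ones (RealNode, NodeProxy and same_node are
hypothetical names): two proxies for one underlying node are distinct
objects, so "is" says no even though they denote the same node.

```python
class RealNode:
    """Stands in for a node of the hidden tree (hypothetical)."""

    def __init__(self, name):
        self.name = name


class NodeProxy:
    """Hypothetical proxy; a fresh one is handed out on each lookup."""

    def __init__(self, real):
        self._real = real

    def same_node(self, other):
        # Identity of the underlying node, not of the proxy object.
        return self._real is other._real


real = RealNode("title")
p1, p2 = NodeProxy(real), NodeProxy(real)
print(p1 is p2)           # False: two distinct proxy objects
print(p1.same_node(p2))   # True: both wrap the same underlying node
```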


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From michael@graphion.com  Fri Dec 11 18:11:58 1998
From: michael@graphion.com (Michael Sanborn)
Date: Fri, 11 Dec 1998 10:11:58 -0800
Subject: [XML-SIG] New to Python OO
Message-ID: <3671606D.6D731B98@graphion.com>

Suppose I wanted to create a customized method to write out a DOM tree,
say as plain text, like a totxt() paralleling toxml(). And say my program
imports xml.dom.core and xml.dom.builder. I would have thought that the
way to approach this would be to define a local Node class derived from
core.py that added an empty totxt() method, and then to define local
subclasses of Node (such as Text) with specific totxt() methods. My
reasoning was that the Builder class would then build the tree with my
enhanced Nodes. But that doesn't seem to be happening. Instead, Builder
seems to be constructing the tree with regular core Nodes that don't
recognize my totxt() method. Can anyone give me advice on how to achieve
this?

Thanks,

Michael Sanborn
Graphion Typesetting


From fdrake@acm.org  Fri Dec 11 18:17:21 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 Dec 1998 13:17:21 -0500 (EST)
Subject: [XML-SIG] Re: Equality tests on DOM nodes
In-Reply-To: <36715D4A.9660A0D0@imall.com>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <36715D4A.9660A0D0@imall.com>
Message-ID: <13937.25009.925375.550977@weyr.cnri.reston.va.us>

Ray Whitmer writes:
 > The problem in Python is much bigger -- possibly rendering my advice
 > irrelevant -- since no official DOM API binding has been released for that

  The spec does include IDL, and a Python binding for IDL is being
developed.  (Now, I've not checked that the Python DOM uses the Python 
IDL binding.  Andrew, perhaps you can address this in the Python
XML-SIG?)

 > I don't know Python, so it is also possible that Python may impose more
 > rigidity on the requirements of == (than Java does on equals), making it
 > possible to know what the standard implementation should be, but your
 > raising the question would seem to indicate that it does not.

  A couple of issues seem apparent to me, but depth is not one of
them.
  First, the current implementation of Python's comparison semantics
requires complete ordering, which doesn't make sense in this case.
That can be ignored for now if the documentation states that only
equality/inequality is supported.  Future versions of Python are
expected to correct this problem.
  Second, the concerns Paul Butkiewicz raised about the relevance of
context need to be addressed.  Basic equality may have to be
interpreted as node identity.
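
  For readers on a later Python: rich comparisons did arrive, and a
class can now define equality without any ordering.  The sketch below
assumes Python 3 and a made-up Node class; it is not the xml.dom.core
class.

```python
class Node:
    """Hypothetical node that supports == and != but defines no ordering."""

    def __init__(self, node_type, value):
        self.node_type = node_type
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return (self.node_type, self.value) == (other.node_type, other.value)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.node_type, self.value))


a, b = Node("text", "hi"), Node("text", "hi")
print(a == b)   # True
try:
    a < b       # no __lt__ is defined, so ordering is rejected
except TypeError as exc:
    print("ordering not supported:", exc)
```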


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From arabbit@earthlink.net  Fri Dec 11 18:19:45 1998
From: arabbit@earthlink.net (Paul Butkiewicz)
Date: Fri, 11 Dec 1998 13:19:45 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <36715D4A.9660A0D0@imall.com>
Message-ID: <000401be2532$d4647b20$da39bfa8@arabbit>

>I don't know Python, but [e]very object in Java has an equals method to
>signify deeper comparison than "==", for example, String.equals tells
>whether the contents of two strings are identical.

I must be feeling contrary today, but I think what you're saying isn't true.
String.equals( String ) does examine the contents of two different objects
to determine that they are identical.  But this is the case only because
String explicitly overrides the equals( Object ) method in Object, which
isn't true of many objects.  The equals( Object ) method in Object only
returns true if the objects are actually the same object, i.e.
( *x )->equals( *y ) if and only if x == y.

Paul


From fdrake@acm.org  Fri Dec 11 18:31:12 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 Dec 1998 13:31:12 -0500 (EST)
Subject: [XML-SIG] New to Python OO
In-Reply-To: <3671606D.6D731B98@graphion.com>
References: <3671606D.6D731B98@graphion.com>
Message-ID: <13937.25840.255411.454141@weyr.cnri.reston.va.us>

Michael Sanborn writes:
 > Suppose I wanted to create a customized method to write out a DOM tree,
 > say as plain text, like a totxt() paralleling toxml(). And say my program
 > imports xml.dom.core and xml.dom.builder. I would have thought that the
 > way to approach this would be to define a local Node class derived from
 > core.py that added an empty totxt() method, and then to define local
 > subclasses of Node (such as Text) with specific totxt() methods. My
 > reasoning was that the Builder class would then build the tree with my
 > enhanced Nodes. But that doesn't seem to be happening. Instead, Builder
 > seems to be constructing the tree with regular core Nodes that don't
 > recognize my totxt() method. Can anyone give me advice on how to achieve
 > this?

  There are two questions that need to be addressed here:  1) How
should all this work, and 2) how to make it work now.
  Let's start with the second question, since it's easier.  This is an 
approach I've used to write out an ESIS stream, so I can claim it
works.  Write the transform you want as a function (or maybe an
object, if that's more convenient for state management), and pass the 
document to it.  It just needs to walk the tree and handle each node
type appropriately.  From your brief description, I'd say this
wouldn't be too hard.  (You may find the stuff in the formatter module
from the standard library handy as well.)
  What *should* be done is different.  ;-)  First, the DOM should
support the visitor pattern.  Not difficult to implement, but it's not 
in the DOM spec (yet).  This would allow transforms to be written more 
cleanly.
  The ability to subclass the node types and have the subclasses be
used would be really nice.  The builder (and anything else) should
only use the methods on the document object to create new nodes.  You
should then be able to subclass the Document class to make the factory 
methods do the right thing.  (Some details may need to change on the
builder, but that's trivial.)  The biggest issue with this is the
performance hit.  That may be more than is acceptable.
  Use of the visitor pattern would certainly be more useful and easier 
in most cases.
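
  The "make it work now" approach can be sketched as a plain function
that walks the tree.  This assumes the standard library's
xml.dom.minidom in place of xml.dom.core and xml.dom.builder, and the
totxt() shown is only one guess at what a plain-text dump should do
(concatenate character data, ignore everything else).

```python
from xml.dom import minidom


def totxt(node):
    """Return the concatenated text content of a DOM subtree."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Comments and processing instructions have no children and so
    # contribute nothing; elements and documents recurse.
    return "".join(totxt(child) for child in node.childNodes)


doc = minidom.parseString("<doc><h>Title</h><p>Some <b>bold</b> text.</p></doc>")
print(totxt(doc))
```

No subclassing is involved, so it does not matter which node classes
the builder chooses to instantiate.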


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From arabbit@earthlink.net  Fri Dec 11 18:35:23 1998
From: arabbit@earthlink.net (Paul Butkiewicz)
Date: Fri, 11 Dec 1998 13:35:23 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <000401be2532$d4647b20$da39bfa8@arabbit>
Message-ID: <000501be2535$03afcd60$da39bfa8@arabbit>

Wow.  I'm replying to myself.  If I did that walking down the street, people
would stare at me.

A further implementation difficulty has occurred to me:  There are likely
many people out there who would like to or are using the DOM in conjunction
with a database, making the node objects persistent.  These folks would
probably prefer that equality indicate not just that two nodes are identical
but that they represent the same record in the database.

Paul


From gwachob@aimnet.com  Fri Dec 11 18:48:06 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Fri, 11 Dec 1998 10:48:06 -0800 (PST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <13937.24366.729293.26105@weyr.cnri.reston.va.us>
Message-ID: <Pine.GSO.4.05.9812111039350.19876-100000@shell1.ncal.verio.com>

On Fri, 11 Dec 1998, Fred L. Drake wrote:

> 
> Paul Butkiewicz writes:
>  > Not to sound facetious, but to put this question in context, I might well
>  > ask how we implement < and > for nodes?  We generally don't use those
>   Perhaps there is no fully general equality that isn't identity?  I
> think the Python implementation would still require implementation of
> a comparison method to achieve this since it uses proxy nodes, but
> that's really just an implementation detail.  Python's native identity 
> operator doesn't work in the presence of proxies that represent the
> same node.

Before you define equality generally for nodes, don't you have to define
equality for each element and even each attribute? This may be a trivial
task, but I suspect there are some issues (like if a Text node contains an
entity reference, which, after being evaluated, results in a text string
which is the same text string contained in another Text node without that
entity reference) that are not specified. 

Another issue would be order of children. Without a DTD, how do you tell
when order of child elements is significant? Perhaps this has to be a
parameter to the deep comparison operator. 

If you made a decision on these issues and defined a comparison operator,
I would say that it should be recursive because otherwise, the comparison
operator isn't all that useful. Of course, given all the vagaries in the
mapping of semantics of the word "equal" to the semantic meanings of
various subtrees of two DOM trees, I wonder whether a single generalized
equality operator will be useful to many people..
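
As a rough sketch of the "order as a parameter" idea, assuming the
standard library's xml.dom.minidom and two invented helpers, signature()
and subtree_equal(): each subtree is reduced to a nested tuple, and the
children are put into a canonical order when sibling order should not
count.  It does not attempt the entity-reference question.

```python
from xml.dom import minidom


def signature(node, ordered=True):
    """Build a nested tuple that describes a subtree for comparison."""
    attrs = tuple(sorted(node.attributes.items())) if node.attributes else ()
    children = [signature(c, ordered) for c in node.childNodes]
    if not ordered:
        children.sort(key=repr)  # canonical order, so sibling order is ignored
    return (node.nodeName, node.nodeValue, attrs, tuple(children))


def subtree_equal(a, b, ordered=True):
    """Deep comparison; `ordered` says whether child order is significant."""
    return signature(a, ordered) == signature(b, ordered)


x = minidom.parseString("<r><a/><b/></r>").documentElement
y = minidom.parseString("<r><b/><a/></r>").documentElement
print(subtree_equal(x, y))                 # sibling order differs
print(subtree_equal(x, y, ordered=False))  # same children, order ignored
```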

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From fdrake@acm.org  Fri Dec 11 18:54:32 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 Dec 1998 13:54:32 -0500 (EST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <Pine.GSO.4.05.9812111039350.19876-100000@shell1.ncal.verio.com>
References: <13937.24366.729293.26105@weyr.cnri.reston.va.us>
 <Pine.GSO.4.05.9812111039350.19876-100000@shell1.ncal.verio.com>
Message-ID: <13937.27240.670116.621025@weyr.cnri.reston.va.us>

Gabe Wachob writes:
 > Before you define equality generally for nodes, don't you have to define
 > equality for each element and even each attribute? This may be a trivial
...
 > Another issue would be order of children. Without a DTD, how do you tell
 > when order of child elements is significant? Perhaps this has to be an

  Very good points.  This makes it incredibly expensive to "do it
right" with any level of abstraction.
  I guess it's not that hard to just write a routine that "does the
right thing" for exactly what is needed in each case.  And it's
looking increasingly appropriate.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From jday@csihq.com  Fri Dec 11 19:12:05 1998
From: jday@csihq.com (John Day)
Date: Fri, 11 Dec 1998 14:12:05 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <000101be252f$fa764c60$da39bfa8@arabbit>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
Message-ID: <3.0.1.32.19981211141205.00767594@mail.csihq.com>

At 12:59 PM 12/11/98 -0500, you wrote:
>Not to sound facetious, but to put this question in context, I might well
>ask how we implement < and > for nodes?  We generally don't use those
>particular operators on something real.  I would never say rock a > rock b,
>but I might say rock a weighs more than rock b. 

This is a valid question with a meaningful reply. Operators like '<' and '>' 
can be implemented by any relation which is transitive, reflexive, and
anti-symmetric. Since reflexivity implies A<A, it is more correct to use 
notations like '>=' and '<='. The relation doesn't have to mean 'greater'
or 'less'. It can be _any_ relation which satisfies the partial order
definition. A very useful one is "IS_A_SUBSET_OF".

[It is understood that 'rock' itself is an "extensional" object, understood
by some set of "intents" (attributes) such as 'heavy', 'gray', 'hard',
'big' etc. The relation can be written in extensional form
but its meaning is usually applied to the intents. An extent like a rock
cannot be perceived unless it has intents]

Such relations define a "partial order" which has many uses in information
retrieval, which XML certainly applies to. 

Let's say I'm searching for documents containing Concept X, where a concept
is defined by the presence of a certain element node ("extent"), possibly 
qualified by attributes ("intents"). So 'equality' could be viewed as equivalence 
in the sense that two documents are equivalent if they contain the same 
concept(s). 

There may be other concepts in the documents that don't match, but this
does not necessarily destroy the equivalence that we're searching for.

Doesn't this imply that there is room for 'shallow' kinds of matching to 
support this kind of reasoning? Of course, there is still a need for 
relations like "exactly identical", but subsethood is also a useful
relation.
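
One possible reading of the subset relation, as a sketch only: it
assumes the standard library's xml.dom.minidom, and the helper names
concepts() and subsumes() are made up here.  Each document is reduced
to a set of "concepts" (element names, optionally qualified by an
attribute), and Python's set '<=' supplies the partial order.

```python
from xml.dom import minidom


def concepts(doc):
    """Collect (element name, attribute name, attribute value) triples."""
    found = set()
    for elem in doc.getElementsByTagName("*"):
        found.add((elem.tagName, None, None))
        for name, value in elem.attributes.items():
            found.add((elem.tagName, name, value))
    return found


def subsumes(big, small):
    """True if every concept in `small` also occurs in `big`."""
    return concepts(small) <= concepts(big)


query = minidom.parseString('<q><author role="editor"/></q>')
doc = minidom.parseString('<q><author role="editor"/><title/></q>')
print(subsumes(doc, query))   # every concept of the query is in doc
print(subsumes(query, doc))   # doc has a <title> the query lacks
```

Set inclusion is reflexive, anti-symmetric and transitive, so it meets
the partial order definition; two documents are "equivalent" under this
reading when each subsumes the other.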

-jday


From arabbit@earthlink.net  Fri Dec 11 19:29:54 1998
From: arabbit@earthlink.net (Paul Butkiewicz)
Date: Fri, 11 Dec 1998 14:29:54 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <3.0.1.32.19981211141205.00767594@mail.csihq.com>
Message-ID: <000701be253c$a1688680$da39bfa8@arabbit>

OK, I hadn't really thought about that.  But can you come up with a way of
ordering nodes that deserves to be defined as part of a global and timeless
standard instead of being merely implementation specific?

Paul


From gwachob@aimnet.com  Fri Dec 11 19:45:01 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Fri, 11 Dec 1998 11:45:01 -0800 (PST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <13937.27240.670116.621025@weyr.cnri.reston.va.us>
Message-ID: <Pine.GSO.4.05.9812111128530.24079-100000@shell1.ncal.verio.com>

On Fri, 11 Dec 1998, Fred L. Drake wrote:

> 
> Gabe Wachob writes:
>  > Before you define equality generally for nodes, don't you have to define
>  > equality for each element and even each attribute? This may be a trivial
> ..
>  > Another issue would be order of children. Without a DTD, how do you tell
>  > when order of child elements is significant? Perhaps this has to be an
> 
>   Very good points.  This makes it incredibly expensive to "do it
> right" with any level of abstraction.
>   I guess it's not that hard to just write a routine that "does the
> right thing" for exactly what is needed in each case.  And it's
> looking increasingly appropriate.

Someone mentioned in this list or another that a set of objects
corresponding to a Visitor pattern is something that should be added to
DOM. There could be a default "equalityVisitor" that would have certain
default equality rules built in (let's say, a separate equality test method
for each DOM class). You could simply subclass the equalityVisitor to
modify the semantics of the equality test for whatever particular elements
you needed. Perhaps the equality visitor could simply have enough
configurable parameters to make it do what you want without having to
subclass.
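
A sketch of what such an equality visitor might look like, assuming the
standard library's xml.dom.minidom.  The class and method names
(EqualityVisitor, equal_element, and so on) are invented for
illustration; nothing like this is in the DOM spec.

```python
from xml.dom import minidom, Node


class EqualityVisitor:
    """Hypothetical visitor with one comparison method per node type."""

    def equal(self, a, b):
        if a.nodeType != b.nodeType:
            return False
        method = {
            Node.ELEMENT_NODE: self.equal_element,
            Node.TEXT_NODE: self.equal_text,
        }.get(a.nodeType, self.equal_other)
        return method(a, b)

    def equal_element(self, a, b):
        return (a.tagName == b.tagName
                and dict(a.attributes.items()) == dict(b.attributes.items())
                and self.equal_children(a, b))

    def equal_text(self, a, b):
        return a.data == b.data

    def equal_other(self, a, b):
        return a.nodeValue == b.nodeValue and self.equal_children(a, b)

    def equal_children(self, a, b):
        return (len(a.childNodes) == len(b.childNodes)
                and all(self.equal(x, y)
                        for x, y in zip(a.childNodes, b.childNodes)))


class WhitespaceInsensitive(EqualityVisitor):
    """Subclass that changes the rule for text nodes only."""

    def equal_text(self, a, b):
        return a.data.split() == b.data.split()


a = minidom.parseString("<p>hello   world</p>").documentElement
b = minidom.parseString("<p>hello world</p>").documentElement
print(EqualityVisitor().equal(a, b))         # strict text comparison
print(WhitespaceInsensitive().equal(a, b))   # whitespace runs collapsed
```

Overriding a single method changes the semantics for one node type
while the traversal and the other rules stay as they were.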

The original context of the Visitor suggestion was for rendering XML into
HTML (I believe).

For info on Visitor Pattern see the Gang of Four book "Design Patterns"
http://iamwww.unibe.ch/CHOOSE/Articles/95-1/DP-book-review.html
 
	-Gabe


-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From akuchlin@cnri.reston.va.us  Fri Dec 11 20:06:40 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 11 Dec 1998 15:06:40 -0500 (EST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <13937.27240.670116.621025@weyr.cnri.reston.va.us>
References: <13937.24366.729293.26105@weyr.cnri.reston.va.us>
 <Pine.GSO.4.05.9812111039350.19876-100000@shell1.ncal.verio.com>
 <13937.27240.670116.621025@weyr.cnri.reston.va.us>
Message-ID: <13937.31273.223202.338497@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>
>Gabe Wachob writes:
> > Another issue would be order of children. Without a DTD, how do you tell
> > when order of child elements is significant? Perhaps this has to be an
>
>  I guess it's not that hard to just write a routine that "does the
>right thing" for exactly what is needed in each case.  And it's
>looking increasingly appropriate.

Indeed; it looks like there are several different variations on what
equality would mean for a DOM node, and none seems obvious as the most
intuitive meaning for ==.  So the best course seems to be to define ==
applied between nodes to raise an exception, to be changed in case DOM
Level N defines it, and have a collection of functions which implement
different equality tests.
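
A minimal sketch of that arrangement, written for Python 3 with a
stand-in Node class and made-up function names (same_node,
equal_shallow, equal_deep); the real xml.dom.core classes are not
shown.

```python
class Node:
    """Hypothetical node whose == is deliberately undefined."""

    def __init__(self, node_type, value, children=()):
        self.node_type = node_type
        self.value = value
        self.children = list(children)

    def __eq__(self, other):
        raise TypeError("== is undefined for nodes; use same_node(), "
                        "equal_shallow() or equal_deep()")

    __hash__ = object.__hash__   # keep nodes usable as dictionary keys


def same_node(a, b):
    """Identity: both names refer to one node object."""
    return a is b


def equal_shallow(a, b):
    """Same type and value; children are ignored."""
    return (a.node_type, a.value) == (b.node_type, b.value)


def equal_deep(a, b):
    """Same type and value, and pairwise deep-equal children."""
    return (equal_shallow(a, b)
            and len(a.children) == len(b.children)
            and all(equal_deep(x, y) for x, y in zip(a.children, b.children)))


a = Node("element", "p", [Node("text", "hi")])
b = Node("element", "p", [Node("text", "bye")])
print(equal_shallow(a, b), equal_deep(a, b), same_node(a, a))
```

One cost of this design: anything that falls back on == internally,
such as a membership test against a list of other nodes, will raise
as well.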

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The large body of the swan wedged in the shattered glass of the car windscreen
fills the film frame. Its head is bent back on itself in a parody of its
orthodox gracefulness.
    -- Peter Greenaway, _A Zed and Two Noughts_ (1986)


From akuchlin@cnri.reston.va.us  Fri Dec 11 20:30:17 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 11 Dec 1998 15:30:17 -0500 (EST)
Subject: [XML-SIG] New to Python OO
In-Reply-To: <3671606D.6D731B98@graphion.com>
References: <3671606D.6D731B98@graphion.com>
Message-ID: <13937.31578.270593.15411@amarok.cnri.reston.va.us>

Michael Sanborn writes:
>imports xml.dom.core and xml.dom.builder. I would have thought that the
>way to approach this would be to define a local Node class derived from
>core.py that added an empty totxt() method, and then to define local
>subclasses of Node (such as Text) with specific totxt() methods. 

	Things aren't that simple, because of the implementation,
which consists of a tree of hidden objects; the classes that you
interact with, such as Node, Element, Text, etc. are all proxies for
that hidden tree, and create new Node, Element, Text, ... proxies when 
you request a new portion.  So all the retrieval methods would have to 
be aware 
	
>My
>reasoning was that the Builder class would then build the tree with my
>enhanced Nodes. But that doesn't seem to be happening. Instead, Builder
>seems to be constructing the tree with regular core Nodes that don't
>recognize my totxt() method. Can anyone give me advice on how to achieve
>this?

	My suspicion is that subclassing Node classes isn't the way to
go; instead, you'll write functions and classes (probably using
existing classes such as Builder and Walker) that operate on DOM
trees.  However, I'd really like to see a discussion of this.  We need
to work out common Python/DOM patterns, so that we can add appropriate
helper modules and functions.  (They'll also be useful to document as
examples.)

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The multiple human needs and desires that demand privacy among two or more
people in the midst of social life must inevitably lead to cryptology wherever
men thrive and wherever they write.
    -- David Kahn, _The Codebreakers_


From akuchlin@cnri.reston.va.us  Fri Dec 11 20:35:30 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 11 Dec 1998 15:35:30 -0500 (EST)
Subject: [XML-SIG] Re: Equality tests on DOM nodes
In-Reply-To: <13937.25009.925375.550977@weyr.cnri.reston.va.us>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <36715D4A.9660A0D0@imall.com>
 <13937.25009.925375.550977@weyr.cnri.reston.va.us>
Message-ID: <13937.33182.137625.135265@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>Ray Whitmer writes:
> > The problem in Python is much bigger -- possibly rendering my advice
> > irrelevant -- since no official DOM API binding has been released for that
>
>  The spec does include IDL, and a Python binding for IDL is being
>developed.  (Now, I've not checked that the Python DOM uses the Python 
>IDL binding.  Andrew, perhaps you can address this in the Python
>XML-SIG?)

	I haven't checked it either, not having read the Python IDL
binding.  Since it uses Fnorb, I'd imagine that 4DOM definitely would
follow the IDL binding.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The boast of heraldry, the pomp of power, / And all that beauty, all that
wealth e'er gave, / Awaits alike th' inevitable hour: / The paths of glory
lead but to the grave.
    -- Thomas Gray


From jday@csihq.com  Fri Dec 11 20:45:08 1998
From: jday@csihq.com (John Day)
Date: Fri, 11 Dec 1998 15:45:08 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <000701be253c$a1688680$da39bfa8@arabbit>
References: <3.0.1.32.19981211141205.00767594@mail.csihq.com>
Message-ID: <3.0.1.32.19981211154508.0076cc40@mail.csihq.com>

At 02:29 PM 12/11/98 -0500, Paul Butkiewicz wrote:
>OK, I hadn't really thought about that.  But can you come up with a way of
>ordering nodes that deserves to be defined as part of a global and timeless
>standard instead of being merely implementation specific?
>
>

The branch of mathematics called "Order Theory" (a subset of Discrete Math)
is already a 'global and timeless standard'. I don't think we would want
to dictate any specific orders. That should be left to specific
implementors. 

Example: Compare two documents: the Bible and the Koran. 
Under the concept 'testaments of religious beliefs' they are virtually
identical. Under the concept '<somebody's personal beliefs>', the books
might be completely different. [Jon Bosak's 'tstmt.dtd' is a kind
of 'most general unifier' for the first concept above]

-jday


From ray@imall.com  Fri Dec 11 21:01:33 1998
From: ray@imall.com (Ray Whitmer)
Date: Fri, 11 Dec 1998 14:01:33 -0700
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <000501be2535$03afcd60$da39bfa8@arabbit>
Message-ID: <3671882D.DC7D9E5A@imall.com>

Paul Butkiewicz wrote:

> A further implementation difficulty has occurred to me:  There are likely
> many people out there who would like to or are using the DOM in conjunction
> with a database, making the node objects persistent.  These folks would
> probably prefer that equality indicate not just that two nodes are identical
> but that they represent the same record in the database.

While this would be a useful function, I don't think it makes sense that it
should be the function of "equals".  But it does point out the many possible
interpretations, which was the point of my original response.  As I stated
before, overriding equals would be a bad idea without an agreed-upon portable
interpretation.  People wonder why some of us are not sad that Java doesn't
support general operator overloading, which would add yet another whole set of
such ambiguities as "equals" provides.

> I must be feeling contrary today, but I think you're saying isn't true.
> String.equals( String ) does examine the contents of two different objects
> to determine that they are identical.  But this is the case only because
> String explicitly overrides the equals( Object ) method in Object, which
> isn't true of many objects.  The equals( Object ) method in Object only
> returns true if the objects are actually the same object, ie.
> ( *x )->equals( *y ) if and only if x == y.

The point of equals is so that it can be overridden with a deeper,
class-specific interpretation.  While Object's own version is too weak for a good
deeper sense of equality, equals is only really useful with a set of classes
where it is overridden in at least some of the classes to provide a deeper (but
still consistent, transitive, symmetric, reflexive, useful) sense of equality.
Otherwise, just use the "==" operator.  Not only is the Object implementation of
equals redundant with the "==" operator, but it is also less efficient.

Ray Whitmer


From gwachob@aimnet.com  Fri Dec 11 21:19:35 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Fri, 11 Dec 1998 13:19:35 -0800 (PST)
Subject: [XML-SIG] New to Python OO
In-Reply-To: <13937.31578.270593.15411@amarok.cnri.reston.va.us>
Message-ID: <Pine.GSO.4.05.9812111308250.28967-100000@shell1.ncal.verio.com>

On Fri, 11 Dec 1998, Andrew M. Kuchling wrote:

> >My
> >reasoning was that the Builder class would then build the tree with my
> >enhanced Nodes. But that doesn't seem to be happening. Instead, Builder
> >seems to be constructing the tree with regular core Nodes that don't
> >recognize my totxt() method. Can anyone give me advice on how to achieve
> >this?
> 
> 	My suspicion is that subclassing Node classes isn't the way to
> go; instead, you'll write functions and classes (probably using
> existing classes such as Builder and Walker) that operate on DOM
> trees.  However I'd really like to see a discussion of this.  We need
> to work out common Python/DOM patterns, so that we can add appropriate
> helper modules and functions.  (They'll also be useful to document as
> examples.)

(Speaking of the Python DOM implementation here)

The Walker class is sort of a Visitor (not really). The walker "calls
back" (really calls methods of its subclass) methods when the walker first
visits and when the walker leaves a particular Node (assuming a depth
first left-to-right traversal). 

A Visitor pattern class would not necessarily include the "traversal"
function (subclasses could) -- it would simply have "handleElement", 
"handleAttribute", "handleText", etc (sorta like SAX). A visitor pattern
would handle an entire "subtree" at a time (I would guess) instead of
thinking of the tree in a traversal sense (ie "startElement",
"endElement"). 

It seems to me conceptually cleaner for most applications (if somewhat
less efficient in some cases) to deal with the tree structurally instead
of procedurally and that's why I would like to see a Visitor pattern.

Ultimately, it would be nice to be able to encode "transform" functions on
trees -- approaching and surpassing the functionality of XSL from a
programmatic (instead of stylesheet) point of view. For example (in prose):

Take all the children of the "AUTHOR" element which have the attribute
"INFORMATION" value of "PRIVATE" and compute a function on that attribute
value and put it in a list. 
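
A minimal sketch of that prose, assuming the standard library's
xml.dom.minidom rather than the xml.dom.core classes discussed in this
thread.  The sample document, the collect() helper and the choice of
str.lower as "the function" are all made up for illustration, and the
function is applied to the attribute value exactly as the prose says.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<BOOK>
  <AUTHOR>
    <NAME INFORMATION="PUBLIC">A. Writer</NAME>
    <PHONE INFORMATION="PRIVATE">555-0100</PHONE>
    <EMAIL INFORMATION="PRIVATE">writer@example.org</EMAIL>
  </AUTHOR>
</BOOK>"""


def collect(doc, parent_tag, attr, wanted, func):
    """Apply func to attr on each child of parent_tag whose attr == wanted."""
    results = []
    for parent in doc.getElementsByTagName(parent_tag):
        for child in parent.childNodes:
            # Skip text nodes; only elements carry attributes.
            if (child.nodeType == child.ELEMENT_NODE
                    and child.getAttribute(attr) == wanted):
                results.append(func(child.getAttribute(attr)))
    return results


doc = minidom.parseString(SAMPLE)
values = collect(doc, "AUTHOR", "INFORMATION", "PRIVATE", str.lower)
print(values)   # ['private', 'private']
```

In practice the function would more likely receive the element itself, but
that is a design decision the prose leaves open.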

XSL can do a lot of this, but not all (or at least not cleanly, IMHO). 

Thoughts?

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From rll@eps.inso.com  Fri Dec 11 21:47:39 1998
From: rll@eps.inso.com (Richard L. Lavallee)
Date: Fri, 11 Dec 1998 16:47:39 -0500
Subject: [XML-SIG] Re: Equality tests on DOM nodes
Message-ID: <199812112147.QAA11011@chineseballs.eps.inso.com>

Regarding the problem of comparing DOM nodes,
one implementation solution is to assign a "DOM node identifier" (DNI)
to each DOM node, and use these as the basis for comparison.

A DNI is an integer, base 1, which monotonically increases up to the
maximum number of nodes in a particular DOM.
The root node DNI is assigned "1", and the remainder are assigned
in pre-order.

When nodes persist their DNI's persist with them, for any given
version of the particular DOM instance.

So:  how may any two DOM nodes be compared?

Just examine their respective DNI's numerically.

E.g., a DOM node with DNI 42 is "==" to a DOM node with DNI 42.

DNI_42 > DNI_5

DNI_9 < DNI_12

Of course, this works best for read-only DOM's;
because arbitrary node insertion would disrupt the DNI sequencing.
But I would argue that node insertion results in a new document version
which necessarily has its own unique set of DNI's anyway.
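
A small sketch of the numbering, assuming the standard library's
xml.dom.minidom and a read-only tree.  The assign_dnis() helper is a
hypothetical name, and keeping the DNIs in a side dictionary (rather than on
the nodes) is just one possible choice.

```python
from xml.dom import minidom


def assign_dnis(root):
    """Return {node: DNI}, numbering nodes from 1 in pre-order."""
    dnis = {}
    stack = [root]
    while stack:
        node = stack.pop()
        dnis[node] = len(dnis) + 1
        # Push children reversed so the leftmost child is numbered next.
        stack.extend(reversed(node.childNodes))
    return dnis


doc = minidom.parseString("<a><b><c/></b><d/></a>")
dnis = assign_dnis(doc.documentElement)
by_name = {node.tagName: dni for node, dni in dnis.items()}
print(by_name)   # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Comparing two nodes is then a plain integer comparison of their DNIs, valid
only until the next insertion or removal.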

How's that?

-rll


From ray@imall.com  Fri Dec 11 23:21:38 1998
From: ray@imall.com (Ray Whitmer)
Date: Fri, 11 Dec 1998 16:21:38 -0700
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <199812112147.QAA11011@chineseballs.eps.inso.com>
Message-ID: <3671A902.60DC600@imall.com>

Richard L. Lavallee wrote:

> Regarding the problem of comparing DOM nodes,
> one implementation solution is to assign a "DOM node identifier" (DNI)
> to each DOM node, and use these as the basis for comparison.
>
> A DNI is an integer, base 1, which monotonically increases up to the
> maximum number of nodes in a particular DOM.
> The root node DNI is assigned "1", and the remainder are assigned
> in pre-order.
>
> When nodes persist their DNI's persist with them, for any given
> version of the particular DOM instance.
>
> So:  how may any two DOM nodes be compared?
>
> Just examine their respective DNI's numerically.
>
> E.g., a DOM node with DNI 42 is "==" to a DOM node with DNI 42.
>
> DNI_42 > DNI_5
>
> DNI_9 < DNI_12
>
> Of course, this works best for read-only DOM's;
> because arbitrary node insertion would disrupt the DNI sequencing.
> But I would argue that node insertion results in a new document version
> which necessarily has its own unique set of DNI's anyway.

I think what you are proposing is yet another type of comparison, one that
determines the order of two nodes in a traversal of the hierarchy.  This is
a very useful operation too, and it deserves a function of its own.

I had to improve on the methodology you describe as follows to efficiently
manage a mutable (modifiable) hierarchy:

First, don't run the numbers continuously through the hierarchy, but rather
keep different sequences for each set of siblings.  Then, count the depth
of each node being compared, replace the node that is deeper with its
ancestor at the higher level, and go up the tree until you find the
siblings with the common ancestor.  Then, use the numbers to find the order
there.  But with large numbers of siblings, an insertion can still mean
renumbering large ranges, potentially millions of siblings.

So my final solution was to represent siblings in a btree, and then order
just within fixed-length btree nodes, so you never have to shift many at
all, and you can still compare quite rapidly.
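
The climb to a common parent can be sketched roughly like this.  The Node
class and compare_order() are hypothetical, both nodes are assumed to live in
the same tree, and a plain list lookup stands in for the per-sibling sequence
numbers (or btree) described above.

```python
class Node:
    """Bare-bones tree node, invented for this sketch."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def depth(self):
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d


def compare_order(a, b):
    """Return -1, 0 or 1 as a precedes, is, or follows b in document order."""
    if a is b:
        return 0
    da, db = a.depth(), b.depth()
    x, y = a, b
    # Replace the deeper node with its ancestor at the shallower depth.
    while da > db:
        x, da = x.parent, da - 1
    while db > da:
        y, db = y.parent, db - 1
    if x is y:
        # One node is an ancestor of the other; the ancestor comes first.
        return -1 if a is x else 1
    # Climb in step until x and y are siblings under a common parent.
    while x.parent is not y.parent:
        x, y = x.parent, y.parent
    siblings = x.parent.children
    # list.index is linear; stored sibling numbers would replace it.
    return -1 if siblings.index(x) < siblings.index(y) else 1


root = Node("root")
first = Node("first", root)
second = Node("second", root)
deep = Node("deep", first)

print(compare_order(deep, second))   # -1
print(compare_order(second, deep))   # 1
print(compare_order(first, deep))    # -1
```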

Ray Whitmer


From uche.ogbuji@fourthought.com  Sat Dec 12 00:30:37 1998
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 11 Dec 1998 17:30:37 -0700
Subject: [XML-SIG] Re: Equality tests on DOM nodes
In-Reply-To: Your message of "Fri, 11 Dec 1998 15:35:30 EST."
 <13937.33182.137625.135265@amarok.cnri.reston.va.us>
Message-ID: <199812120030.RAA08745@malatesta.local>

> Fred L. Drake writes:
> >Ray Whitmer writes:
> > > The problem in Python is much bigger -- possibly rendering my advice
> > > irrelevant -- since no official DOM API binding has been released for that
> >
> >  The spec does include IDL, and a Python binding for IDL is being
> >developed.  (Now, I've not checked that the Python DOM uses the Python 
> >IDL binding.  Andrew, perhaps you can address this in the Python
> >XML-SIG?)
> 
> 	I haven't checked it either, not having read the Python IDL
> binding.  Since it uses Fnorb, I'd imagine that 4DOM definitely would
> follow the IDL binding.

Yes.  In fact, we are participating in the do-sig to complete and formalize 
the IDL binding.  We ran into problematic differences between both of the main 
current ORBs that support Python: Fnorb and ILU, and there has been discussion 
of this, all of which should lead to even more clarity in the Python-IDL 
binding.

The IDL binding _does_ give good guidance on how to interpret the DOM spec, 
since so much of DOM is formally specified in IDL, and the Python-IDL binding 
in its current state is not too difficult a read, so you might want to check 
it out at:

http://www.python.org/sigs/do-sig/corbamap.html


-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sat Dec 12 00:59:06 1998
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 11 Dec 1998 17:59:06 -0700
Subject: [XML-SIG] New to Python OO
In-Reply-To: Your message of "Fri, 11 Dec 1998 13:19:35 PST."
 <Pine.GSO.4.05.9812111308250.28967-100000@shell1.ncal.verio.com>
Message-ID: <199812120059.RAA08780@malatesta.local>

> The Walker class is sort of a Visitor (not really). The walker "calls
> back" (really calls methods of its subclass) methods when the walker first
> visits and when the walker leaves a particular Node (assuming a depth
> first left-to-right traversal). 
> 
> A Visitor pattern class would not necessarily include the "traversal"
> function (subclasses could) -- it would simply have "handleElement", 
> "handleAttribute", "handleText", etc (sorta like SAX). A visitor pattern
> would handle an entire "subtree" at a time (I would guess) instead of
> thinking of the tree in a traversal sense (ie "startElement",
> "endElement"). 

This is an excellent point.  We are currently working on introducing the 
visitor pattern into 4DOM for the next version or two, over which we would 
overlay a global function, tentatively VisitInOrder(), which does the 
equivalent of the walker on PyDOM by doing an in-order traversal and invoking 
accept(AppropriateVisitor) on each of the DOM nodes.  We like this idea 
because of the extensibility: we can then have visitors that print out raw 
text, or that pretty-print with extra whitespace.  A user could add his own 
visitor that performs transforms as you mention, etc.

> It seems to me conceptually cleaner for most applications (if somewhat
> less efficient in some cases) to deal with the tree structurally instead
> of procedurally and that's why I would like to see a Visitor pattern.
> 
> Ultimately, it would be nice to be able to encode "transform" functions on
> trees -- approaching and surpassing the functionality of XSL from a
> programmatic (instead of stylesheet) point of view. For example (in prose):
> 
> Take all the children of the "AUTHOR" element which have the attribute
> "INFORMATION" value of "PRIVATE" and compute a function on that attribute
> value and put it in a list. 
> 
> XSL can do a lot of this, but not all (or at least not cleanly, IMHO). 
> 
> Thoughts?

It seems to me that your example above belongs not to the domain of 
style-sheets but to DOM programming.  I guess I could imagine the ECMAScript 
to do it in XSL, but it just makes me ask "why?".

A DOM visitor, or a SAX application, however, appears to be a far more 
appropriate way to do this.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org


From gwachob@aimnet.com  Sat Dec 12 07:54:06 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Fri, 11 Dec 1998 23:54:06 -0800 (PST)
Subject: [XML-SIG] My DOM Visitor Class(es)
Message-ID: <Pine.GSO.4.05.9812112340350.24313-100000@shell1.ncal.verio.com>

Hi folks-
	I threw together a very simple DOM Visitor class (it also has a
"Walker" mixin to create a Visitor that automatically walks a tree and
visits each Node).
	You can get it at:
http://www.aimnet.com/~gwachob/DOMVisitor.py

	I use the term "Visitor" loosely -- while inspired by the Visitor
Design Pattern in the book "Design Patterns", it is technically not
following that pattern. It looks to be useful nonetheless. 
	The basic Visitor class does very very little -- if you subclass
it, you must add visit_ELEMENT, visit_TEXT, etc methods. The basic Visitor
class simply gets the type of the node you pass it and calls a
visit_<TYPENAME> method on that Node. (ie visit_TEXT(node)). 
	The WalkerMixin changes this basic behavior by visiting the Node's
children after the Node itself is visited. What makes this really nice is
that the method which visits the Node returns a value which tells the main
dispatcher method (visit) whether or not to visit the Node's children.
Thus, whole subtrees can be treated separately (or not at all) depending
on a visit to the root node of the subtree.  That visit to the "root" node
can itself process parts of the subtree, or you may decide in implementing
the visit method that you can skip the entire subtree because it is
irrelevant for your purposes.  You can even build a separate walker for
that subtree to do some completely different processing.  How about
multithreaded parsing?
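
For readers without the file to hand, the dispatch described above might look
roughly like the following.  This is an independent sketch written against
the standard library's xml.dom.minidom, not the contents of DOMVisitor.py;
TYPE_NAMES, TextCollector and the "private" tag are invented for the example.

```python
from xml.dom import Node, minidom

# Hypothetical mapping from DOM node types to method-name suffixes.
TYPE_NAMES = {
    Node.ELEMENT_NODE: "ELEMENT",
    Node.TEXT_NODE: "TEXT",
    Node.COMMENT_NODE: "COMMENT",
    Node.DOCUMENT_NODE: "DOCUMENT",
}


class Visitor:
    def visit(self, node):
        """Dispatch to visit_<TYPENAME>(node) if the subclass defines it."""
        name = TYPE_NAMES.get(node.nodeType)
        method = getattr(self, "visit_" + name, None) if name else None
        if method is None:
            return True   # unhandled node types: assume "keep walking"
        return method(node)


class WalkerMixin:
    def visit(self, node):
        """Visit the node, then its children if the visit returned true."""
        descend = super().visit(node)
        if descend:
            for child in node.childNodes:
                self.visit(child)
        return descend


class TextCollector(WalkerMixin, Visitor):
    """Collects text, skipping every subtree rooted at a <private> element."""

    def __init__(self):
        self.chunks = []

    def visit_ELEMENT(self, node):
        return node.tagName != "private"

    def visit_TEXT(self, node):
        self.chunks.append(node.data)
        return False


doc = minidom.parseString("<doc>keep <private>skip</private>this</doc>")
collector = TextCollector()
collector.visit(doc)
print("".join(collector.chunks))   # keep this
```

The return value of each visit_ method is what lets a whole subtree be
skipped without the walker knowing anything about the document type.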

Wouldn't using a DOM tree in this way (structurally) better allow DOM
parsers to only hold part of the DOM tree in memory?

	Anyway, enough rambling, I'd like people to take a look at the
code, tell me what they think (has anybody else written code like this?),
tell me what improvements the code would need (yeah, yeah, it uses
recursion), etc. 
	Oh yeah, and please use it for your own projects!

	-Gabe
	

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From fleck@informatik.uni-bonn.de  Sat Dec 12 11:02:13 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Sat, 12 Dec 1998 12:02:13 +0100
Subject: [XML-SIG] Python WebDAV at Xerox?
Message-ID: <36724D35.3345@informatik.uni-bonn.de>

Hi!

I just found out about Xerox's DAV server & client
in Python, <http://sandbox.xerox.com/webdav/>.
The code hasn't been released yet, but it is
mentioned in the WebDAV interoperability matrix at
<http://www.ics.uci.edu/pub/ietf/webdav/interop.html>.

Quoting from <http://sandbox.xerox.com/webdav/>:

>  About the implementation
>
>  The server is implemented in Python, and runs on
>  Python 1.4 or later. It runs on Unix and Windows.
>  The persistent store for resources is a Posix file,
>  properties are stored in a dbm database. 
>
>  I am attempting to make the source code available, but
>  must secure permission from Xerox lawyers. Please be patient. 
>
>  I also have a client-side library in Python. Likewise, I am
>  attempting to release it. 
>
>  Feedback
>
>  Send comments to jdavis@parc.xerox.com. 

Yours,
Markus.

-- 
////////////////////////////////////////////////////////////////////////////
   Markus B Fleck - University of Bonn - CS Department IV - WHOIS MF5079
          UNIX Administrator - comp.lang.python.announce Moderator
   "GNU Gather" Free Internet Groupware Project - http://cscw.net/gather/
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\


From MHammond@skippinet.com.au  Sat Dec 12 11:12:47 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Sat, 12 Dec 1998 22:12:47 +1100
Subject: [XML-SIG] XBEL Patch to msie_parse.py
Message-ID: <002501be25c0$59c25890$0801a8c0@bobcat>

This is a multi-part message in MIME format.

------=_NextPart_000_0026_01BE261C.8D32D090
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Better late than never :-)

I've attached a diff which attempts to use the "win32api" module to
locate the favorites folder in the registry.  It has been tested on NT
and 98, but only on English systems - I'm fairly sure that it will also
work on non-English systems and Windows 95, but all testing
appreciated :-)

I took the approach that a command line arg could point to the
favorites folder - if not specified, then it attempts to use win32api
to find it.  If that fails, it prints a message asking for the command
line param.
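
The lookup order described above can be sketched as follows.  Note the
assumptions: this uses the standard library's winreg module in place of
win32api, find_favorites() is a hypothetical name, and the registry key and
the "Favorites" value name are taken to be the ones used in the attached
diff.

```python
# Registry key under HKEY_CURRENT_USER (assumed to match the attached diff).
KEYNAME = r"Software\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders"


def find_favorites(argv):
    """Return the favorites folder path, or None if it cannot be found."""
    if len(argv) > 1:
        return argv[1]            # an explicit command-line path wins
    try:
        import winreg             # standard library, Windows only
    except ImportError:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEYNAME) as key:
            path, value_type = winreg.QueryValueEx(key, "Favorites")
    except OSError:
        return None
    return path if value_type == winreg.REG_SZ else None


# With an explicit argument the registry is never consulted.
print(find_favorites(["msie_parse.py", r"C:\WINDOWS\Favorites"]))
```

A caller that gets None back would print the message asking for the
command-line parameter and exit.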

Also note that there were a couple of other changes WRT the arguments
to certain functions - it appears this file did not keep up to date
with bookmark.py.

Mark.

------=_NextPart_000_0026_01BE261C.8D32D090
Content-Type: application/octet-stream;
	name="msie_parse.diff"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
	filename="msie_parse.diff"

KioqIFx0ZW1wXG1zaWVfcGFyc2UucHkJV2VkIERlYyAwMiAyMjozNDowMiAxOTk4Ci0tLSBtc2ll
X3BhcnNlLnB5CVNhdCBEZWMgMTIgMjE6MDg6NDkgMTk5OAoqKioqKioqKioqKioqKioKKioqIDE3
LDI4ICoqKioKICBjbGFzcyBNU0lFOg0KICAgICAgIyBpbnRlcm5ldCBleHBsb3Jlcg0KICANCiEg
ICAgIGRlZiBfX2luaXRfXyhzZWxmLGJvb2ttYXJrcyk6DQohICAgICAgICAgIyBGSVhNRTogdXNl
IHJlZ2lzdHJ5IGZvciB0aGlzIQ0KISANCiAgICAgICAgICBzZWxmLmJtcz1ib29rbWFya3MNCiAg
ICAgICAgICBzZWxmLnJvb3QgPSBOb25lDQohICAgICAgICAgc2VsZi5wYXRoID0gb3MucGF0aC5q
b2luKFVTUkRJUiwgRElSKQ0KICANCiAgICAgICAgICBzZWxmLl9fd2FsaygpDQogIA0KLS0tIDE3
LDI2IC0tLS0KICBjbGFzcyBNU0lFOg0KICAgICAgIyBpbnRlcm5ldCBleHBsb3Jlcg0KICANCiEg
ICAgIGRlZiBfX2luaXRfXyhzZWxmLGJvb2ttYXJrcywgcGF0aCk6DQogICAgICAgICAgc2VsZi5i
bXM9Ym9va21hcmtzDQogICAgICAgICAgc2VsZi5yb290ID0gTm9uZQ0KISAgICAgICAgIHNlbGYu
cGF0aCA9IHBhdGgNCiAgDQogICAgICAgICAgc2VsZi5fX3dhbGsoKQ0KICANCioqKioqKioqKioq
KioqKgoqKiogMzIsNDQgKioqKgogICAgICAgICAgZm9yIGZpbGUgaW4gb3MubGlzdGRpcihwYXRo
KToNCiAgICAgICAgICAgICAgZnVsbG5hbWUgPSBvcy5wYXRoLmpvaW4ocGF0aCwgZmlsZSkNCiAg
ICAgICAgICAgICAgaWYgb3MucGF0aC5pc2RpcihmdWxsbmFtZSk6DQohICAgICAgICAgICAgICAg
ICBzZWxmLmJtcy5hZGRfZm9sZGVyKGZpbGUsTm9uZSxOb25lKQ0KICAgICAgICAgICAgICAgICAg
c2VsZi5fX3dhbGsoc3VicGF0aCArIFtmaWxlXSkNCiAgICAgICAgICAgICAgZWxzZToNCiAgICAg
ICAgICAgICAgICAgIHVybCA9IHNlbGYuX19nZXR1cmwoZnVsbG5hbWUpDQogICAgICAgICAgICAg
ICAgICBpZiB1cmw6DQogICAgICAgICAgICAgICAgICAgICAgc2VsZi5ibXMuYWRkX2Jvb2ttYXJr
KG9zLnBhdGguc3BsaXRleHQoZmlsZSlbMF0sTm9uZSwNCiEgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgTm9uZSx1cmwpDQogIA0KICAgICAgZGVmIF9fZ2V0dXJsKHNl
bGYsIGZpbGUpOg0KICAgICAgICAgIHRyeToNCi0tLSAzMCw0MiAtLS0tCiAgICAgICAgICBmb3Ig
ZmlsZSBpbiBvcy5saXN0ZGlyKHBhdGgpOg0KICAgICAgICAgICAgICBmdWxsbmFtZSA9IG9zLnBh
dGguam9pbihwYXRoLCBmaWxlKQ0KICAgICAgICAgICAgICBpZiBvcy5wYXRoLmlzZGlyKGZ1bGxu
YW1lKToNCiEgICAgICAgICAgICAgICAgIHNlbGYuYm1zLmFkZF9mb2xkZXIoZmlsZSxOb25lKQ0K
ICAgICAgICAgICAgICAgICAgc2VsZi5fX3dhbGsoc3VicGF0aCArIFtmaWxlXSkNCiAgICAgICAg
ICAgICAgZWxzZToNCiAgICAgICAgICAgICAgICAgIHVybCA9IHNlbGYuX19nZXR1cmwoZnVsbG5h
bWUpDQogICAgICAgICAgICAgICAgICBpZiB1cmw6DQogICAgICAgICAgICAgICAgICAgICAgc2Vs
Zi5ibXMuYWRkX2Jvb2ttYXJrKG9zLnBhdGguc3BsaXRleHQoZmlsZSlbMF0sTm9uZSwNCiEgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgTm9uZSxOb25lLHVybCkNCiAg
DQogICAgICBkZWYgX19nZXR1cmwoc2VsZiwgZmlsZSk6DQogICAgICAgICAgdHJ5Og0KKioqKioq
KioqKioqKioqCioqKiA1OCw2MiAqKioqCiAgIyAtLS0gVGVzdHByb2dyYW0NCiAgDQogIGlmIF9f
bmFtZV9fID09ICdfX21haW5fXyc6DQohICAgICBtc2llPU1TSUUoYm9va21hcmsuQm9va21hcmtz
KCkpDQogICAgICBtc2llLmJtcy5kdW1wX3hiZWwoKQ0KLS0tIDU2LDc3IC0tLS0KICAjIC0tLSBU
ZXN0cHJvZ3JhbQ0KICANCiAgaWYgX19uYW1lX18gPT0gJ19fbWFpbl9fJzoNCiEgICAgIGltcG9y
dCBzeXMNCiEgICAgIGlmIGxlbihzeXMuYXJndik+MToNCiEgICAgICAgICBwYXRoID0gc3lzLmFy
Z3ZbMV0NCiEgICAgIGVsc2U6DQohICAgICAgICAgdHJ5Og0KISAgICAgICAgICAgICBpbXBvcnQg
d2luMzJhcGksIHdpbjMyY29uDQohICAgICAgICAgZXhjZXB0IEltcG9ydEVycm9yOg0KISAgICAg
ICAgICAgICBwcmludCAiVGhlIHdpbjMyYXBpIG1vZHVsZSBpcyBub3QgYXZhaWxhYmxlIG9uIHRo
aXMgc3lzdGVtIg0KISAgICAgICAgICAgICBwcmludCAic28gd2UgY2FudCBhdXRvbWF0aWNhbGx5
IGZpbmQgeW91ciBmYXZvcml0ZXMgZm9sZGVyLiINCiEgICAgICAgICAgICAgcHJpbnQgIlBsZWFz
ZSByZS1ydW4gdGhpcyBwcm9ncmFtIHNwZWNpZml5aW5nIHRoZSBsb2NhdGlvbiBvZiB5b3VyIg0K
ISAgICAgICAgICAgICBwcmludCAiZmF2b3JpdGVzIGZvbGRlciBvbiB0aGUgY29tbWFuZCBsaW5l
LiINCiEgICAgICAgICAgICAgc3lzLmV4aXQoMSkNCiEgICAgICAgICBrZXluYW1lID0gciJTb2Z0
d2FyZVxNaWNyb3NvZnRcV2luZG93c1xDdXJyZW50VmVyc2lvblxFeHBsb3JlclxTaGVsbCBGb2xk
ZXJzIg0KISAgICAgICAgIGhrZXkgPSB3aW4zMmFwaS5SZWdPcGVuS2V5KHdpbjMyY29uLkhLRVlf
Q1VSUkVOVF9VU0VSLCBrZXluYW1lKQ0KISAgICAgICAgIHBhdGgsIHBhdGh0eXBlID0gd2luMzJh
cGkuUmVnUXVlcnlWYWx1ZUV4KGhrZXksICJGYXZvcml0ZXMiKQ0KISAgICAgICAgIGFzc2VydCBw
YXRodHlwZSA9PSB3aW4zMmNvbi5SRUdfU1oNCiEgDQohICAgICBtc2llPU1TSUUoYm9va21hcmsu
Qm9va21hcmtzKCksIHBhdGgpDQogICAgICBtc2llLmJtcy5kdW1wX3hiZWwoKQ0K

------=_NextPart_000_0026_01BE261C.8D32D090--


From gstein@lyra.org  Sat Dec 12 11:29:12 1998
From: gstein@lyra.org (Greg Stein)
Date: Sat, 12 Dec 1998 03:29:12 -0800
Subject: [XML-SIG] Python WebDAV at Xerox?
References: <36724D35.3345@informatik.uni-bonn.de>
Message-ID: <36725388.36448E78@lyra.org>

Markus Fleck wrote:
> 
> Hi!
> 
> I just found out about Xerox's DAV server & client
> in Python, <http://sandbox.xerox.com/webdav/>.
> The code hasn't been released yet, but it is
> mentioned in the WebDAV interoperability matrix at
> <http://www.ics.uci.edu/pub/ietf/webdav/interop.html>.

He has been itching to get it released since September :-)

Jim did state there is one benefit to the delay in the release. It
guarantees that mod_dav was built independently. The IETF likes
independent implementations before moving a Proposed Standard to an
Actual Standard.

On topic:
I'm not sure what XML parser he uses for the message bodies. I got a
couple tracebacks during some initial interop testing, but I didn't
immediately recognize anything.

Jim also has a DAV client written in Python. No idea on the XML stuff
there either.

Note that he doesn't deal with some of the encoding issues yet.

mod_dav uses James Clark's Expat parser (nice parser!).

Cheers,
-g

p.s. okay. so I didn't really say anything interesting or useful. bleh.
:-)

--
Greg Stein, http://www.lyra.org/


From arabbit@earthlink.net  Sat Dec 12 15:34:21 1998
From: arabbit@earthlink.net (Paul Butkiewicz)
Date: Sat, 12 Dec 1998 10:34:21 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <3671A902.60DC600@imall.com>
Message-ID: <000701be25e4$e3826f60$5839bfa8@arabbit>

>First, don't run the numbers continuously through the hierarchy, but rather
>keep different sequences for each set of siblings.  Then, count the depth
>of each node being compared, replace the node that is deeper with its
>ancestor at the higher level, and go up the tree until you find the
>siblings with the common ancestor.  Then, use the numbers to find the order
>there.  But with large numbers of siblings, an insertion can still mean
>renumbering large ranges, potentially millions of siblings.

>So my final solution was to represent siblings in a btree, and then order
>just within fixed-length btree nodes, so you never have to shift many at
>all, and you can still compare quite rapidly.

We're getting way into implementation-specific details here, but in the
first proposed solution:  Suppose we are in an environment that requires us
to both be able to insert nodes quickly and obtain a node's order quickly
and we have a large number of nodes.  And we're implementing the first
solution.  There isn't really a reason that the number has to be an integer,
is there?  For quick insertion and ordering, we could very well keep two
integers, numerator and denominator, and if something belongs between 1/1
and 2/1 we just stick it at 3/2 rather than changing the numbers on the next
20000 nodes.  And then, later, when the system is taking a breather, we can
come back, lock the whole set of siblings, and rearrange the numbers?
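
A toy version of the idea, assuming Python's fractions.Fraction for the
numerator/denominator pair; key_between() and renumber() are hypothetical
helper names, and the locking is left out entirely.

```python
from fractions import Fraction


def key_between(left, right):
    """Return an order key strictly between two neighbouring sibling keys."""
    return (left + right) / 2


def renumber(keys):
    """Reassign consecutive integer keys, preserving the existing order."""
    return [Fraction(i) for i in range(1, len(keys) + 1)]


keys = [Fraction(1), Fraction(2)]
keys.insert(1, key_between(keys[0], keys[1]))   # 3/2, between 1 and 2
keys.insert(1, key_between(keys[0], keys[1]))   # 5/4, between 1 and 3/2
print(keys == sorted(keys))                     # True: no neighbour was touched

keys = renumber(keys)                           # the later "breather" pass
print([int(k) for k in keys])                   # [1, 2, 3, 4]
```

Repeated insertion at the same spot makes the denominators grow, which is
one reason the occasional renumbering pass is still needed.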

Not that anyone actually implements things this way, probably for good
reason, but if I can't throw out crazy ideas here, where can I?

Paul

P.S.  Ray, you missed my point on the whole Object.equals thing.  My point
is that if we look to Java for guidance (which must make *someone* out there
cringe :), then the way equals is implemented in String is the exception
rather than the norm.  I don't think nodes are like strings at all.


From bbennett@unixg.ubc.ca  Sat Dec 12 22:40:28 1998
From: bbennett@unixg.ubc.ca (Bruce Bennett)
Date: Sat, 12 Dec 1998 14:40:28 -0800
Subject: [XML-SIG] Mac Python (CFM68K) won't import pyexpat
Message-ID: <l03130300b295f5e65ff5@[207.23.94.54]>

Greetings, xml-sig folk --

When I try to import pyexpat (as at the beginning of pyexpattest.py), I see
a curious error message:

	Python 1.5.1 (#37, Apr 27 1998, 13:36:17)  [CW CFM68K w/GUSI w/MSL]
	Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
	>>> import sys
	>>> import pyexpat
	Traceback (innermost last):
	  File "<stdin>", line 1, in ?
	ImportError: PythonCore--PySys_WriteStderr:
	  A fragment had "hard" unresolved imports.

Having no clue what this errmsg really means, I then try to import
'Pyexpat', hoping to confirm that Python is seeing the pyexpat lib file:

	>>> import Pyexpat
	Traceback (innermost last):
	  File "<stdin>", line 1, in ?
	NameError: Case mismatch for module name Pyexpat
	(filename pyexpat.cfm68k.slb)

So yes, it's seeing it.

Is there a problem with pyexpat.cfm68k.slb? Or with something I'm (not)
doing? Does importing pyexpat require the definition of paths to other
dependencies in the xml-0.5 package?

I'm running System 7.5.5 with CFM-68K Runtime Enabler v. 4.0, and
encountering no other problems with Mac Python 1.5.1.

--

BTW, in the recently-released xml-0.5 package for Python, the file
README.pyexpat says the requisite Macintosh binaries are available as

>   ftp://ftp.cwi.nl/pub/jack/python/pyexpat.hqx (macintosh binary-only).
                                     ^^^^^^^^^^^
At present, however, the filename in fact seems to be 'pyexpat.sit.hqx'.
Further (in case the preceding observation wasn't petty enough), conformity
with other Mac Python shared libs suggests the orthography
'pyexpat.CFM68k.slb' instead of the current 'pyexpat.cfm68k.slb'.

Regards,

-- Bruce Bennett <bbennett@unixg.ubc.ca>


From kajiyama@etl.go.jp  Sun Dec 13 08:14:49 1998
From: kajiyama@etl.go.jp (Tamito Kajiyama)
Date: Sun, 13 Dec 98 08:14:49 JST
Subject: [XML-SIG] Mac Python (CFM68K) won't import pyexpat
In-Reply-To: <l03130300b295f5e65ff5@[207.23.94.54]> (bbennett@unixg.ubc.ca)
Message-ID: <9812122314.AA16915@etlibs2.etl.go.jp>

bbennett@unixg.ubc.ca (Bruce Bennett) writes:
| 
| When I try to import pyexpat (as at the beginning of pyexpattest.py), I see
| a curious error message:
| 
| 	Python 1.5.1 (#37, Apr 27 1998, 13:36:17)  [CW CFM68K w/GUSI w/MSL]
| 	Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
| 	>>> import sys
| 	>>> import pyexpat
| 	Traceback (innermost last):
| 	  File "<stdin>", line 1, in ?
| 	ImportError: PythonCore--PySys_WriteStderr:
| 	  A fragment had "hard" unresolved imports.

I suspect that the XML package you are using is of a pre-release version
(probably xml-0.5pre1).

If you have compiled the XML package yourself, try the final version of
it.  If you have installed a binary distribution for Macintosh, ask the
maintainer of the binary distribution ;-)

-- 
KAJIYAMA, Tamito <kajiyama@etl.go.jp>


From dieter@handshake.de  Sun Dec 13 21:17:23 1998
From: dieter@handshake.de (Dieter Maurer)
Date: 13 Dec 1998 22:17:23 +0100
Subject: [XML-SIG] ANN: WeakDict's: addressing CPython's problem with cyclic structures
Message-ID: <x7g1aj91y3.fsf@lindm.dm>

The following message is a courtesy copy of an article
that has been posted as well.

WeakDict (Weak Dictionaries) have been designed to address
CPython's problems with cyclic references.
More precisely, WeakDict's allow the realization
of weak references, references that are **NOT** counted in
the reference count and can therefore be used to build
cyclic structures without obstructing the reference counting
scheme.

This might be interesting e.g. for the DOM implementation of
the XML-SIG. Other applications include object maps and
caches of various kinds.

WeakDict's are very similar to normal Python dictionaries,
with the following essential exceptions:

 - all values in a WeakDict must be instances of 'WeakValue'
   (or a derived class)

 - the reference to a value in a WeakDict is *NOT* counted
   in the reference count of the value.
   Thus, it does not prevent the value from being garbage collected.

 - When a value is garbage collected, the corresponding
   entry disappears from the WeakDict.
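
The behaviour in the list above can be illustrated with the standard
library's weakref.WeakValueDictionary, which is used here purely as a
stand-in: it is a different implementation from WeakDict, and it does not
require values to derive from a 'WeakValue' class.  The Node class and the
"chapter" key are made up for the example.

```python
import gc
import weakref


class Node:
    """Any ordinary class instance can be weakly referenced."""

    def __init__(self, name):
        self.name = name


registry = weakref.WeakValueDictionary()

node = Node("chapter")
registry["chapter"] = node       # this reference is not counted
print("chapter" in registry)     # True while a strong reference exists

del node                         # drop the only strong reference
gc.collect()                     # not needed on CPython, harmless elsewhere
print("chapter" in registry)     # False: the entry has disappeared
```

A DOM implementation could keep its parent pointers in the same way, so
that parent and child no longer form a counted cycle.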

More information and download:
	URL:http://www.handshake.de/~dieter/weakdict.html


From paul@prescod.net  Sun Dec 13 21:32:33 1998
From: paul@prescod.net (Paul Prescod)
Date: Sun, 13 Dec 1998 15:32:33 -0600
Subject: [XML-SIG] Zope, DTML and XML
Message-ID: <36743271.376A09A8@prescod.net>

Of course Zope must eventually move into the XML world. Zope needs to do
templates. XSL also does templates. In fact templates are almost as
central to XSL as they are to Zope. I would suggest that Zope should use
XSL template syntax for DTML templates as far as is possible. In fact,
maybe when XSL becomes popular enough, it might make sense to describe the
interaction between Zope and the Python runtime in terms of XML
transformations. That's for the future, though. 

In the meantime, the point is that the template syntax should be the same.
Here are the details from the current XSL spec:

"The value of an attribute of a literal result element is interpreted as
an attribute value template: it can contain string expressions contained
in curly braces ({})."

"Within a template, the xsl:value-of element can be used to compute
generated text, for example by extracting text from the source tree or by
inserting the value of a string constant. The xsl:value-of element does
this with a string expression that is specified as the value of the expr
attribute. String expressions can also be used inside attribute values of
literal result elements by enclosing the string expression in curly braces
({})."

"The xsl:value-of element is replaced by the value of the string
expression specified by the expr attribute. The expr attribute is
required."

"e.g. <xsl:value-of expr="attribute(first-name)"/>"

"In an attribute value that is interpreted as an attribute value template,
such as an attribute of a literal result element, string expressions can
be used by surrounding the string expression with curly braces ({}). The
attribute value template is instantiated by replacing the string
expression together with surrounding curly braces by the value of the
string expression.

The following example creates an IMG result element from a photograph
element in the source; the value of the SRC attribute of the IMG element
is computed from the value of the image-dir constant and the content of
the href child of the photograph element; the value of the WIDTH attribute
of the IMG element is computed from the value of the width attribute
of the size child of the photograph element:

<xsl:define-constant name="image-dir" value="/images"/>

<xsl:template match="photograph">
<IMG SRC="{constant(image-dir)}/{href}" WIDTH="{size/attribute(width)}"/>
</xsl:template>

With this source

<photograph>
  <href>headquarters.jpg</href>
  <size width="300"/>
</photograph>

the result would be

<IMG SRC="/images/headquarters.jpg" WIDTH="300"/>

When an attribute value template is instantiated, a double left or right
curly brace outside a string expression will be replaced by a single curly
brace. It is an error if a right curly brace occurs in an attribute value
template outside a string expression without being followed by a second
right curly brace; an XSL processor may signal the error or recover by
treating the right curly brace as if it had been doubled. A right curly
brace inside an AttributeValue in a string expression is not recognized as
terminating the string expression."
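
To make the brace rules concrete, here is a small Python sketch of
attribute value template expansion.  It is not an XSL processor:
expand_avt() is a hypothetical helper, string expressions are "evaluated"
by a caller-supplied function (a dictionary lookup in the demo), and the
last rule quoted above (a right curly brace inside a quoted literal within
an expression) is deliberately not handled.

```python
def expand_avt(template, evaluate):
    """Expand an attribute value template.

    evaluate maps the text of a string expression to its string value.
    """
    out = []
    i, n = 0, len(template)
    while i < n:
        ch = template[i]
        if template.startswith("{{", i) or template.startswith("}}", i):
            out.append(ch)            # doubled brace -> one literal brace
            i += 2
        elif ch == "{":
            end = template.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated string expression")
            out.append(evaluate(template[i + 1:end]))
            i = end + 1
        elif ch == "}":
            # Lone right brace: recover by treating it as if doubled.
            out.append("}")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Stand-in for expression evaluation against the photograph example.
values = {
    "constant(image-dir)": "/images",
    "href": "headquarters.jpg",
    "size/attribute(width)": "300",
}

print(expand_avt("{constant(image-dir)}/{href}", values.__getitem__))
# /images/headquarters.jpg
print(expand_avt("a{{b}}c", values.__getitem__))
# a{b}c
```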

http://www.w3.org/TR/WD-xsl

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From jeremy@allaire.com  Mon Dec 14 01:33:05 1998
From: jeremy@allaire.com (Jeremy Allaire)
Date: Sun, 13 Dec 1998 20:33:05 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <C3843BD1B83DD2119D79000092A7BAD449BFC4@PLATINUM.allaire.com>

Hello folks-

I'm interested in engaging anyone/everyone from the Python community to
work with us on a WDDX platform module for Python.  With the help of a few
developers, we've been able to muster/ship WDDX modules for ASP/COM, Java,
ColdFusion, Perl and JavaScript, and would love to see a Python
implementation.

Given the recent XML release for Python, it seems like it would be a great
project to make cross-language distributed web applications even more
possible.

Pay a visit to www.WDDX.org, and most importantly take a look at the SDK,
developed by Nate Weiss, which brings it all together with all of the above
languages.

Best and regards,
Jeremy Allaire


From jim@Digicool.com  Mon Dec 14 12:57:54 1998
From: jim@Digicool.com (Jim Fulton)
Date: Mon, 14 Dec 1998 12:57:54 +0000
Subject: [XML-SIG] Re: [Zope] - Zope, DTML and XML
References: <36743271.376A09A8@prescod.net>
Message-ID: <36750B52.EE1EBC7D@digicool.com>

Paul Prescod wrote:
> 
> Of course Zope must eventually move into the XML world. Zope needs to do
> templates.

It already does, via DTML.

> XSL also does templates.

I would have thought that XSL *was* a template mechanism.  What do you mean
by "template"?

> In fact templates are almost as central to XSL as they are to Zope.

I would say far more so.

> I would suggest that Zope should use
> XSL template syntax for DTML templates as far as is possible.

It appears to me that DTML and XSL represent two very different
approaches to solving the same or similar problems.  They are
both intended for generating text from objects.  DTML generates text
from Python objects.  XSL generates text from XML objects.

DTML takes a highly procedural approach.  In DTML, you generate
text directly.  In XSL (as I understand it) you specify a set of
rules for applying transformations to XML elements.  This is fairly 
declarative in nature.  In the example you gave, you didn't render a 
specific picture element.  Instead, you have a rule for converting
picture elements to img tags.

Another difference between DTML and XSL is in how content is determined.
DTML is typically used to define as well as format content.  A DTML 
document directly specifies data that is often extracted from large
object spaces.  In XSL, it appears that the content is largely defined
by a source document and an XSL "template" simply specifies transformations.
Of course, an XSL specification can also filter, so there is some
ability to extract, but it is much less direct than with DTML.

Given the very different natures of DTML and XSL, I don't see much
point in making the syntaxes all that consistent. 

> In fact,
> maybe when XSL becomes popular enough, it might make sense to describe the
> interaction between Zope and the Python runtime in terms of XML
> transformations.

It may very well. If Zope made it easy to generate XML from Zope (ie Python)
objects, then people who  like XSL could apply XSL transformations to the
resulting XML, bypassing DTML altogether.

In other words, I see XSL as an alternative to DTML, not another form of it.

Or, DTML may turn out to be a good tool for generating XML from objects, and
then XSL could be applied to DTML output, in which case the two would
act in tandem.

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (540) 371-6909              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.


From paul@prescod.net  Mon Dec 14 12:52:27 1998
From: paul@prescod.net (Paul Prescod)
Date: Mon, 14 Dec 1998 06:52:27 -0600
Subject: [XML-SIG] Perl and character encodings
Message-ID: <36750A0B.EBEB7355@prescod.net>

Thought this might be of interest:

> Version 2.17 of XML::Parser has been uploaded to CPAN. With this version,
> the entire API of James Clark's expat library is accessible from perl.
> 
> The major new feature is access to character set encodings other than
> expat's built-in set (UTF-8, UTF-16, ISO-8859-1, US-ASCII). This is done
> through binary character encoding maps appearing in the pathlist
> represented by @XML::Parser::Expat::Encoding_Path. The following encoding
> maps come with this distribution and require no further action on the part
> of the user, i.e. if expat comes across the encoding, it will just use it
> without user intervention:
> 
> Big5
> ISO-8859-2
> ISO-8859-3
> ISO-8859-4
> ISO-8859-5
> ISO-8859-7
> ISO-8859-8
> ISO-8859-9
> Shift_JIS
> windows-1250
> 
> Other maps may be created and installed in the encoding search path by
> using the tools in the newly released XML::Encoding distribution.

--

> Subject: Re: XML::Parser Version 2.17 has been uploaded to CPAN
> From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
> Date: Mon, 14 Dec 1998 15:53:48 +0900
> X-Message-Number: 4
> 
> I tried an XML document in Shift_JIS and an equivalent document in UTF-16.  
> XML::Parser created exactly the same result.  Great work!
> 
> Cheers,
> 
> Makoto
>  
> Fuji Xerox Information Systems


-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From ray@imall.com  Mon Dec 14 18:12:23 1998
From: ray@imall.com (Ray Whitmer)
Date: Mon, 14 Dec 1998 11:12:23 -0700
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <000701be25e4$e3826f60$5839bfa8@arabbit>
Message-ID: <36755507.F8657565@imall.com>

Paul Butkiewicz wrote:

> We're getting way into implementation-specific details here, but in the
> first proposed solution:  Suppose we are in an environment that requires us
> to both be able to insert nodes quickly and obtain a node's order quickly
> and we have a large number of nodes.  And we're implementing the first
> solution.  There isn't really a reason that the number has to be an integer,
> is there?  For quick insertion and ordering, we could very well keep two
> integers, numerator and denominator, and if something belongs between 1/1
> and 2/1 we just stick it at 1/2 rather than changing the numbers on the next
> 20000 nodes.  And then, later, when the system is taking a breather, we can
> come back, lock the whole set of siblings, and rearrange the numbers?
>
> Not that anyone actually implements things this way, probably for good
> reason, but if I can't throw out crazy ideas here, where can I?

Yes, or leave huge gaps in your integer values, or use something like a bit
string, where you can keep tacking bits on.  I pursued this type of solution
for quite a while before I used the BTree solution.  It still gets quite messy
in large situations.  I came up with the BTree solution because it was far less
messy, in my experience, and scaled much better.

It is not clear when you talk about "the first solution" if you mean keeping
consecutive ordering throughout the hierarchy, or only of siblings.  Keeping it
throughout the hierarchy is even less manageable.
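
To make the numerator/denominator idea concrete, here is a minimal sketch
in Python.  It is only an illustration, not anyone's DOM implementation:
the names key_between and renumber are made up for the example, and it
uses the standard fractions module in place of two hand-kept integers.

```python
from fractions import Fraction

def key_between(left, right):
    """Return an order key strictly between two neighbouring keys."""
    return (left + right) / 2

def renumber(keys):
    """Reassign consecutive whole-number keys, preserving the current order.

    This is the "taking a breather" pass: it touches every sibling once.
    """
    in_order = sorted(keys, key=keys.get)
    return {name: Fraction(i) for i, name in enumerate(in_order, start=1)}

# Sibling order keys start out as whole numbers.
siblings = {"a": Fraction(1), "b": Fraction(2)}

# Insert "x" between "a" and "b" without renumbering anything else.
siblings["x"] = key_between(siblings["a"], siblings["b"])

# Insert "y" between "a" and "x"; the key gets finer with each split.
siblings["y"] = key_between(siblings["a"], siblings["x"])

ordered = sorted(siblings, key=siblings.get)
print(ordered)
print(siblings["y"])

# Later, tidy the keys back to 1, 2, 3, 4 in the same order.
tidy = renumber(siblings)
print(sorted(tidy, key=tidy.get))
```

Each insertion is cheap, but the keys keep growing until the renumbering
pass runs, which is part of why this gets messy at scale.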

> P.S.  Ray, you missed my point on the whole Object.equals thing.  My point
> is that if we look to java for guidance (which must make *someone* out there
> cringe :), then the way equals is implemented in String is the exception
> rather than the norm.  I don't think nodes are like strings at all.

I don't think I missed the point.  You didn't say to look to Java for guidance.
You said to look to the default implementation in Java Object, which I argued
does not and can not represent the purpose of equals in Java, which String,
Color, DataFlavor, Dimension, Font, Insets, MenuShortcut, Point, Rectangle, File
... -- any of the 63 classes in jdk1.1.7a that override equals -- do a better
job of representing.

String and these other classes are not the exception.  They are the rule, point,
and whole purpose of having an equals method.  Classes which have not overridden
equals have a less meaningful definition.  I use "equals" on classes which have
overridden it much more often than on those which have not overridden it.  If I
want to know whether two are the same allocation, I will use "==".  If I want to
know if one successfully substitutes for the other without changing meaning, I
use "equals".  There can be ambiguity in judging what should be significant in
the equals call, but it is not unreasonable to expect that the Java DOM binding
might eventually specify some behavior here, which would not be the "=="
comparison.

Ray Whitmer


From paul@prescod.net  Mon Dec 14 19:03:23 1998
From: paul@prescod.net (Paul Prescod)
Date: Mon, 14 Dec 1998 13:03:23 -0600
Subject: [XML-SIG] Re: [Zope] - Zope, DTML and XML
References: <36743271.376A09A8@prescod.net> <36750B52.EE1EBC7D@digicool.com>
Message-ID: <367560FB.632C3ED5@prescod.net>

Jim Fulton wrote:
> 
> Paul Prescod wrote:
> >
> > Of course Zope must eventually move into the XML world. Zope needs to do
> > templates.
> 
> It already does, via DTML.

Right, but DTML code is not valid XML code. It can't be edited in an XML
editor, stored in an XML repository, routed through XML-based workflow,
etc. etc.

> > XSL also does templates.
> 
> I would have thought that XSL *was* a template mechanism.  What do you mean
> by "template"?

XSL can be thought of as a template mechanism. But an XSL stylesheet has
many templates and describes a flow of control between them, whereas DTML
documents are a single template.

> > In fact templates are almost as central to XSL as they are to Zope.
> 
> I would say far more so,

Fair enough. I meant to say that they are almost as central to XSL as
they are to DTML. six of one...

> > I would suggest that Zope should use
> > XSL template syntax for DTML templates as far as is possible.
> 
> It appears to me that DTML and XSL represent two very different
> approaches to solving the same or similar problems.  They are
> both intended for generating text from objects.  DTML generates text
> from Python objects.  XSL generates text from XML objects.

Not quite. XSL generates XML objects (technically speaking, "nodes") from
other XML objects (other nodes).

> DTML takes a highly procedural approach.  In DTML, you generate
> text directly.  In XSL (as I understand it) you specify a set of
> rules for applying transformations to XML elements.  This is fairly
> declarative in nature.  In the example you gave, you didn't render a
> specific picture element.  Instead, you have a rule for converting
> picture elements to img tags.

Right. But the same holds for DTML. You don't write DTML to generate an
IMG tag for a specific picture. If you knew exactly what picture you
wanted, you would use the HTML for it. You use DTML extensions when you
want to figure out the picture to use at runtime, just like in XSL. I
don't see this as a difference.

> Another difference between DTML and XSL is in how content is determined.
> DTML is typically used to define as well as format content.  A DTML
> document directly specifies data that is often extracted from large
> object spaces.  In XSL, it appears that the content is largely defined
> by a source document and an XSL "template" simply specifies transformations.
> Of course, an XSL specification can also filter, so there is some
> ability to extract, but it is much less direct than with DTML.

What you seem to be saying is that DTML works on large Python object-bases
and XSL works on small XML document inputs. But that is a difference in
degree, not in kind. I could encode a phonebook as a single XML document
and use XSL to generate a list of all of the numbers in a particular
zipcode. How is that different from using DTML in the same context to
solve the same problem?
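
As a rough illustration of the phonebook case, here is a sketch in Python
rather than XSL or DTML.  The document layout (a phonebook element holding
listing elements with a zip attribute), the sample records, and the
function name phones_in_zip are all assumptions made up for the example;
it uses the ElementTree module from the current standard library.

```python
import xml.etree.ElementTree as ET

# A hypothetical phonebook encoded as a single XML document.
PHONEBOOK = """\
<phonebook>
  <listing zip="22401"><fullname>A. Example</fullname><phone>555-0100</phone></listing>
  <listing zip="90210"><fullname>B. Example</fullname><phone>555-0101</phone></listing>
  <listing zip="22401"><fullname>C. Example</fullname><phone>555-0102</phone></listing>
</phonebook>
"""

def phones_in_zip(xml_text, zipcode):
    """Return the phone numbers of all listings in one zipcode."""
    root = ET.fromstring(xml_text)
    return [
        listing.findtext("phone")
        for listing in root.findall("listing")
        if listing.get("zip") == zipcode
    ]

print(phones_in_zip(PHONEBOOK, "22401"))
```

Whether the selection is written as an XSL pattern, a DTML loop, or a
Python expression, the work is the same: pick the matching listings and
emit something for each.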

The big difference, of course, is that XSL's set of expressions is quite
limited whereas Python is quite flexible. That's why I propose using the
same syntax but changing the expressions to be Python expressions.

> Given the very different natures of DTML and XSL, I don't see much
> point in making the syntaxes all that consistent.

Do you have another XML-compliant syntax in mind or have you decided that
XML compliance isn't critical?

> It may very well. If Zope made it easy to generate XML from Zope (ie Python)
> objects, then people who  like XSL could apply XSL transformations to the
> resulting XML, bypassing DTML altogether.

Sure, but how do I specify the objects that I want to work on from the XSL
stylesheet? You can't [*] export the database as a single XML document, so
you must allow a syntax that allows drilling into Python objects: Python
syntax.

[*] It is vaguely possible that un-extended XSL could work directly on a
Zope database if we could express all Python objects as XML data... this
requires more thought...but even so, you couldn't evaluate arbitrary
Python code, you could only refer to preexisting objects.

> In other words, I see XSL as an alternative to DTML, not another form of it.

I don't really see the difference. Either an extended XSL replaces DTML or
an XSL-syntax DTML replaces DTML. All I'm saying is that the next
generation templating syntax should be XSL-based.

> Or, DTML may turn out to be a good tool for generating XML from objects, and
> then XSL could be applied to DTML output, in which case the two would
> act in tandem.

Why have two steps? It seems better to just use XSL syntax, either
extended with Python expression syntax or not.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From hinsen@cnrs-orleans.fr  Mon Dec 14 20:37:48 1998
From: hinsen@cnrs-orleans.fr (Konrad Hinsen)
Date: Mon, 14 Dec 1998 21:37:48 +0100
Subject: [XML-SIG] XML 0.5 problems
Message-ID: <199812142037.VAA19848@dirac.cnrs-orleans.fr>

I just tried to install the latest XML package release, to make sure
that my XML exploration session planned for the Christmas vacation
won't be spoiled by technical problems. And here they are. I did the
test installation on an AIX 4.3 machine running Python 1.5.1.

1) At first try nothing looked right, and nothing worked. Some exploration
   revealed that my standard reflex of replacing Makefile.pre.in by
   my patched one was not such a good idea, because the XML package
   includes a modified version.
   I understand that this is the easiest way to handle installation,
   but it also presents problems:
   - Makefile.pre.in varies with Python versions
   - Some people need patched versions; for example, the standard
     version does not work for AIX.
   It wasn't much trouble for me to patch the file coming with XML
   to work with AIX, but only because I had also done the original
   patch. I recommend a more robust installation approach for the
   final release (perhaps a short Python script...)

2) I then tried some of the demos, again with little success. Some
   examples:

     cd unicode; python test.py 
     Traceback (innermost last):
       File "test.py", line 1, in ?
         from xml.unicode import wstring
     ImportError: No module named unicode

     cd sax; python saxdemo.py 
     Traceback (innermost last):
       File "saxdemo.py", line 5, in ?
         from xml.sax import saxexts, saxlib, saxutils
     ImportError: No module named sax

   Then I tried a few simple imports, with the result that I can
   import xml, but none of its subpackages, although all the
   directories exist and contain something that looks right.
   But all imports *do* work if the current directory is
   the xml-0.5 installation directory. I do have . in
   PYTHONPATH, which probably explains the difference. My
   conclusion: something is wrong with the installation!
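
One way to check which xml package Python is actually picking up is to
ask the imported package where it lives.  This is only a diagnostic
sketch: describe_package is a made-up helper name, and it is written for
a current Python rather than 1.5.1.

```python
import sys
import xml

def describe_package(pkg):
    """Report where an imported package was loaded from."""
    return {
        "file": getattr(pkg, "__file__", None),
        "path": list(getattr(pkg, "__path__", [])),
    }

info = describe_package(xml)
print("loaded from:", info["file"])
print("subpackages searched in:", info["path"])
print("first sys.path entries:", sys.path[:3])
```

If the reported location changes with the current directory, an "xml"
directory found through "." is shadowing (or being shadowed by) the
installed one.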

Happy bug hunting,
  Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------


From cowan@locke.ccil.org  Mon Dec 14 20:45:48 1998
From: cowan@locke.ccil.org (John Cowan)
Date: Mon, 14 Dec 1998 15:45:48 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <000101be252f$fa764c60$da39bfa8@arabbit> <13937.24366.729293.26105@weyr.cnri.reston.va.us>
Message-ID: <367578FC.373DACD1@locke.ccil.org>

Fred L. Drake wrote:

>   Perhaps there is no fully general equality that isn't identity?

To be precise:  Fully general equality (fge) for mutable objects is
identity.  Fge for immutable objects is the fge-ness of their parts,
since indiscernible objects are identical (Leibniz's criterion).

E.g. immutable strings are equal if their characters are equal,
but (mutable) vectors are equal only if they are identical objects.

(There are other definitions of equality, of course, but they are
not general.)

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


From fdrake@acm.org  Mon Dec 14 20:55:25 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Mon, 14 Dec 1998 15:55:25 -0500 (EST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <367578FC.373DACD1@locke.ccil.org>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <000101be252f$fa764c60$da39bfa8@arabbit>
 <13937.24366.729293.26105@weyr.cnri.reston.va.us>
 <367578FC.373DACD1@locke.ccil.org>
Message-ID: <13941.31549.873183.1048@weyr.cnri.reston.va.us>

John Cowan writes:
 > To be precise:  Fully general equality (fge) for mutable objects is
 > identity.  Fge for immutable objects is the fge-ness of their parts,
 > since indiscernible objects are identical (Leibniz's criterion).

  Leibniz?  Wow, and to think I actually know the name!  Shades of a
day long past!  (I first heard of Leibniz when I studied architecture, 
of all things!)
  I think this is just about where we've ended up on this one, but it
is definitely stricter than is generally used for Python.  Typically,
two Python objects (let's take lists as an example) are considered
equal if their contents are the same; equality of two objects is not
considered to be an unchangeable characteristic.  If I have two lists:

	a = [1, 2]
	b = [1, 2]

they are considered equal now, but if I then do this:

	a.reverse()

they are no longer equal.  I think the biggest problem for doing this
with DOM nodes is the issue of context: if the parents are different,
the nodes should probably be considered different.
  Now, if I create two different nodes and insert equivalent data into 
each (say, character data nodes that contain equal data), I think they 
should compare equal.  The problem is that this is not the interesting 
case in practice.  What I *wanted* was less clearly a matter of
equality, and more a matter of a particularly strong correspondence.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From Jeffrey@digicool.com  Mon Dec 14 21:15:55 1998
From: Jeffrey@digicool.com (Jeffrey Shell)
Date: Mon, 14 Dec 1998 16:15:55 -0500
Subject: [XML-SIG] RE: [Zope] - Re: [XML-SIG] Re: [Zope] - Zope, DTML and XML
Message-ID: <613145F79272D211914B0020AFF6401901AD4A@gandalf.digicool.com>

> Right, but DTML code is not valid XML code. It can't be 
> edited in an XML
> editor, stored in an XML repository, routed through XML-based 
> workflow,
> etc. etc.

But using DTML you can generate any kind of XML that you want, and get
the level of effectiveness that you are stating.  And then that
generated XML can be routed through all the XML based workflow that you
want.

I've done a couple of relatively small experiments with this, once using
DTML over a sequence of Tabula records to generate a good size XML file
to test an XML parser.  So it was an XML expression of objects in the
database, but using DTML allowed me to make the XML structure that I
wanted independent of the structure of the Zope database or the Tabula
database or what-have-you.

And even that doesn't stop you from using XSL directly in Zope.  On NT
*twitch* with IE5 *twitch* you could make a Zope object to call
Microsoft's XSL processor.  Then you could have a DTML page that goes
over some sort of query to return an XML document linked to the XSL
style sheet.  Add another document that calls the XSL processor and
passes in the rendered XML document, and voilà!

> > > I would suggest that Zope should use
> > > XSL template syntax for DTML templates as far as is possible.
> > 
> > It appears to me that DTML and XSL represent two very different
> > approaches to solving the same or similar problems.  They are
> > both intended for generating text from objects.  DTML generates text
> > from Python objects.  XSL generates text from XML objects.
> 
> Not quite. XSL generates XML objects (technically speaking, 
> "nodes") from
> other XML objects (other nodes).

Using XML to go through XML and generate XML?  :)

> The big difference, of course, is that XSL's set of 
> expressions is quite
> limited where as Python is quite flexible. That's why I 
> propose using the
> same syntax but changing the expressions to be Python expressions.

Doesn't this kill off any sort of 'XSL portability'?  I can imagine a
system where both Zope and, say, Access *twitch* (independently of Zope)
could generate XML documents of the same or similar DTD and have the
same XSL document(s) be able to render them both on an entirely
different machine.

> > Given the very different natures of DTML and XSL, I don't see much
> > point in making the syntaxes all that consistent.
> 
> Do you have another XML-compliant syntax in mind or have you 
> decided that
> XML compliance isn't critical?

It's easy to write some DTML to generate XML.  There's a big piece of
compliance right there.  Currently, there's no XML on the intake side.
I think this is a _far_ more important thing to do than spend a bunch of
time writing yet another XSL parser.  I would rather be able to generate
that phone book file as XML and be able to upload it into Zope as
intelligent-ish Zope objects (into a Tabula, as Zope Folders, or
who-knows-what) and write a simple

<!--#in "phone_numbers(zip=22401)"-->
 <TR>
  <TD><!--#var fullname--></TD><TD><!--#var phone--></TD>
 </TR>
<!--#/in-->

DTML document rather than the complex XSL involved.  Then I can add the
ability for people to add new phone numbers and modify their entries in
this dataset and re-export it to a new updated XML file, and use some
other XSL parser to generate a printable phone book, PDF, RTF, and HTML
from that.  The XML file can be easily done in DTML by:

<?xml version="1.0"?>
<phonebook zip="<!--#var inZip-->">
 <!--#in "phone_numbers(zip=inZip)"-->
  <listing for="<!--#var fullname-->">
   <fullname><!--#var fullname--></fullname>
   <phone><!--#var phone--></phone>
  </listing>
 <!--#/in-->
</phonebook>
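
For comparison, here is a sketch of producing the same phonebook
structure from plain Python.  It is not DTML and not part of Zope: the
function name phonebook_xml and the sample record are made up, and it
uses ElementTree from the current standard library.  One thing it gets
for free is escaping of characters such as "&" in names.

```python
import xml.etree.ElementTree as ET

def phonebook_xml(zipcode, records):
    """Build a <phonebook> document from (fullname, phone) pairs."""
    root = ET.Element("phonebook", zip=zipcode)
    for fullname, phone in records:
        # "for" is a Python keyword, so the attribute goes in a dict.
        listing = ET.SubElement(root, "listing", {"for": fullname})
        ET.SubElement(listing, "fullname").text = fullname
        ET.SubElement(listing, "phone").text = phone
    return ET.tostring(root, encoding="unicode")

# Hypothetical data standing in for phone_numbers(zip=inZip).
doc = phonebook_xml("22401", [("A. Example & Co.", "555-0100")])
print(doc)
```

A template that pastes values straight into markup, as the DTML above
does, has to make sure those values are escaped some other way.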

> > It may very well. If Zope made it easy to generate XML from 
> Zope (ie Python)
> > objects, then people who  like XSL could apply XSL 
> transformations to the
> > resulting XML, bypassing DTML altogether.
> 
> Sure, but how do I specify the objects that I want to work on 
> from the XSL
> stylesheet? You can't [*] export the database as a single XML 
> document, so
> you must allow a syntax that allows drilling into Python 
> objects: Python
> syntax.

See above.  There's a few variations that can be done.

> I don't really see the difference. Either an extended XSL 
> replaces DTML or
> an XSL-syntax DTML replaces DTML. All I'm saying is that the next
> generation templating syntax should be XSL-based.

XSL is _much much much_ tougher for beginners to Grok.  It's very very
powerful, yes, but sometimes just to do a simple tabular based report in
it is waaaay too much of a headache.  We discussed this a long time ago
here at digicool with just an XML based replacement for DTML (XSL was
just barely off the drawing board at the time of these discussions).  I
still emphasize that it's (a) not that hard to generate compliant XML
using DTML (the DTML document itself doesn't have to be XML compliant,
just the document as rendered), and (b) importing XML should be a bigger
priority.

just my <CURRENCY STYLE="cents">.02</CURRENCY>.


From jday@csihq.com  Mon Dec 14 21:20:14 1998
From: jday@csihq.com (John Day)
Date: Mon, 14 Dec 1998 16:20:14 -0500
Subject: [XML-SIG] Normalized AttVals
Message-ID: <3.0.1.32.19981214162014.006a5290@mail.csihq.com>

Forgive my ignorance of Python and the XML standards, but I 
am confused by the behavior of pyexpat.

Re: quoted attribute contents ("AttVal")
When '>' is encountered e.g. <code op=">"> it is "normalized"
to '&gt;', however, when '&' is encountered it is a fatal
error e.g. <a href="www.zzz.com?a=1&b=3">

Is this pyexpat behavior correct? Why can't the parser tell that
'&b' above is _not_ a defined entity because it is not terminated
by ';'? It seems to me that this usage could be normalized to
'&amp;b', just like pyexpat did for '>'. Then it would be backward
compatible with HTML (sort of).

The impact of this seems to be enormous. All of the existing HTML
parameter generators will have to change the way they post arguments,
when HTML is replaced by XML, right?

-jday 


From michael@graphion.com  Mon Dec 14 21:25:10 1998
From: michael@graphion.com (Michael Sanborn)
Date: Mon, 14 Dec 1998 13:25:10 -0800
Subject: [XML-SIG] Re: New to Python OO
Message-ID: <36758235.57BC4FBE@graphion.com>

Fred L. Drake writes:

>   There are two questions that need to be addressed here:  1) How
> should all this work, and 2) how to make it work now.
>   Let's start with the second question, since it's easier.  This is an
> approach I've used to write out an ESIS stream, so I can claim it
> works.  Write the transform you want as a function (or maybe an
> object, if that's more convenient for state management), and pass the
> document to it.  It just needs to walk the tree and handle each node
> type appropriately.

Yes, this gets me over the hump just fine, thanks. I'm now able to write
out the result of SQL queries as XML and then, with only a few lines of
additional code (subclassing Walker), alternatively write it out in my
company's proprietary typesetting markup. And this after less than a
month's acquaintance with Python. I think I'm in love!
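
For anyone following along, the walk-the-tree approach looks roughly
like the sketch below.  It is not the Walker class from the XML package
and not my actual code: the function name walk, the output format, and
the sample document are invented for illustration, and it uses
xml.dom.minidom from the current standard library.

```python
from xml.dom import minidom, Node

def walk(node, out, depth=0):
    """Visit a DOM subtree, appending one line per element or text node."""
    indent = "  " * depth
    if node.nodeType == Node.DOCUMENT_NODE:
        # The document itself produces no output; descend into it.
        for child in node.childNodes:
            walk(child, out, depth)
    elif node.nodeType == Node.ELEMENT_NODE:
        out.append(indent + "<" + node.tagName + ">")
        for child in node.childNodes:
            walk(child, out, depth + 1)
    elif node.nodeType == Node.TEXT_NODE:
        text = node.data.strip()
        if text:
            out.append(indent + text)
    # Other node types (comments, processing instructions) are skipped.

doc = minidom.parseString("<row><name>Ada</name><city>London</city></row>")
lines = []
walk(doc, lines)
print("\n".join(lines))
```

Swapping in a different output format means changing only what each
branch appends, which is why a second target markup takes so few lines.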

When I have a little more time, I'll also look at Gabe Wachob's Visitor
class (recently posted to this list), to see if I can also do it the way
it 'should' be done. :-)

Thanks for everything.

Michael Sanborn
Graphion Typesetting


From akuchlin@cnri.reston.va.us  Mon Dec 14 21:51:01 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 14 Dec 1998 16:51:01 -0500 (EST)
Subject: [XML-SIG] Normalized AttVals
In-Reply-To: <3.0.1.32.19981214162014.006a5290@mail.csihq.com>
References: <3.0.1.32.19981214162014.006a5290@mail.csihq.com>
Message-ID: <13941.34375.768950.944498@amarok.cnri.reston.va.us>

John Day writes:
>Re: quoted attribute contents ("AttVal")
>When '>' is encountered e.g. <code op=">"> it is "normalized"
>to '&gt;', however, when '&' is encountered it is a fatal
>error e.g. <a href="www.zzz.com?a=1&b=3">
>
>Is this pyexpat behavior correct? Why can't the parser tell that
>'&b' above is _not_ a defined entity because it is not terminated
>by ';'? It seems to me that this usage could be normalized to
>'&amp;b', just like pyexpat did for '>'. Then it would be backward
>compatible with HTML (sort of).

	Actually, the fact that the above HTML href works is an
artifact of the error recovery in HTML parsers; you really are
supposed to write <a href="www.zzz.com?a=1&amp;b=3">.  There were some
lengthy threads about this in comp.infosystems.www.authoring.html a
few months ago, when someone found that in "a=1&section=4", their
browser was picking up &sect and turning it into a character, which
made the link not behave as expected.

	I think the XML community wishes to avoid depending on error
recovery in this way, because it leads to the same pit that HTML fell
into.  HTML parsers were really forgiving of invalid HTML, so few
authors bothered to check whether their HTML was valid, so you could
never, ever switch to using a stricter parser because so little of the
HTML in existence would be accepted by it.
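
	A small sketch of the behaviour, using the expat binding and
saxutils helpers as they exist in the current Python standard library
(xml.parsers.expat and xml.sax.saxutils); attr_value is just a helper
name made up for this example:

```python
from xml.parsers import expat
from xml.sax.saxutils import quoteattr

def attr_value(document, name):
    """Parse a one-element document and return one attribute's value."""
    found = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda tag, attrs: found.append(attrs.get(name))
    parser.Parse(document, True)
    return found[0]

# A bare "&" in an attribute value is not well-formed XML.
try:
    attr_value('<a href="www.zzz.com?a=1&b=3"/>', "href")
    bare_ampersand_ok = True
except expat.ExpatError:
    bare_ampersand_ok = False

# Written as &amp;, the application sees a plain "&" again.
escaped = attr_value('<a href="www.zzz.com?a=1&amp;b=3"/>', "href")

# A literal ">" is allowed inside an attribute value.
greater = attr_value('<code op=">"/>', "op")

# When generating XML, let a helper do the escaping and quoting.
quoted = quoteattr("www.zzz.com?a=1&b=3")

print(bare_ampersand_ok, escaped, greater, quoted)
```

	So generators only need to escape on the way out; the parser
hands back the unescaped value on the way in.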

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
And Herakles was full of it. He just got dead drunk for a couple of weeks in
Phrygia and told everyone he'd been to the land of the dead.
    -- Death, in SANDMAN: "The Song of Orpheus"


From akuchlin@cnri.reston.va.us  Mon Dec 14 21:55:47 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 14 Dec 1998 16:55:47 -0500 (EST)
Subject: [XML-SIG] XML 0.5 problems
In-Reply-To: <199812142037.VAA19848@dirac.cnrs-orleans.fr>
References: <199812142037.VAA19848@dirac.cnrs-orleans.fr>
Message-ID: <13941.35069.861693.617350@amarok.cnri.reston.va.us>

Konrad Hinsen writes:
>   - Some people need patched versions; for example, the standard
>     version does not work for AIX.

	What's the patch that's required for AIX?  And is there some
reason it can't be rolled into the Makefile.pre.in for 1.5.2?

>2) I then tried some of the demos, again with little success. Some
>   examples:
>
>     cd unicode; python test.py 
>     Traceback (innermost last):
>       File "test.py", line 1, in ?
>	 from xml.unicode import wstring
>     ImportError: No module named unicode

	Are you getting this error after you've installed the package
under site-packages?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Americans are benevolently ignorant about Canada, while Canadians are
malevolently well informed about the United States.
    -- J. Bartlett Brebner


From cowan@locke.ccil.org  Mon Dec 14 21:56:38 1998
From: cowan@locke.ccil.org (John Cowan)
Date: Mon, 14 Dec 1998 16:56:38 -0500
Subject: [XML-SIG] RE: Equality tests on DOM nodes
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <000101be252f$fa764c60$da39bfa8@arabbit>
 <13937.24366.729293.26105@weyr.cnri.reston.va.us>
 <367578FC.373DACD1@locke.ccil.org> <13941.31549.873183.1048@weyr.cnri.reston.va.us>
Message-ID: <36758996.B9842B78@locke.ccil.org>

Fred L. Drake wrote:

> Typically,
> two Python objects (let's take lists as an example) are considered
> equal if their contents are the same; equality of two objects is not
> considered to be an unchangeable characteristic.

The trouble with that scheme is that it makes equality hard to
reason about.  Intuitively, we expect equality to be transitive,
(if a = b and b = c then a = c), reflexive (a = a), and symmetrical
(if a = b then b = a).  Making equality depend on mutable properties
defeats this: a might = b at one time, but a later check for 
b = a might fail.

>         a.reverse()

I presume this is a *destructive* reverse (leaves a reversed)?
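
For reference, a short sketch of how Python's built-in lists behave
here (standard language behaviour; the variable names are only for
illustration):

```python
a = [1, 2]
b = [1, 2]

equal_before = (a == b)   # == compares contents, element by element
same_object = (a is b)    # "is" compares identity: two separate lists

result = a.reverse()      # reverses a in place and returns None

equal_after = (a == b)    # a is now [2, 1], so the lists differ
still_itself = (a is a)   # identity is unaffected by mutation

print(equal_before, same_object, result, equal_after, still_itself)
```

So the reverse is destructive, and "==" on lists answers a question
about current contents, while "is" gives the identity test that never
changes over an object's lifetime.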
 
-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


From Michael.Scharf@gmx.de  Mon Dec 14 21:56:11 1998
From: Michael.Scharf@gmx.de (Michael Scharf)
Date: Mon, 14 Dec 1998 22:56:11 +0100
Subject: [XML-SIG] Q: which XML would you recommend?
Message-ID: <3675897B.D926F51@gmx.de>

I need a  Christmas present for myself ;-)

Today I was in the bookstore looking for an XML book. There
are very many (some have ~1000 pages?!)! What I am looking
for is a Python-Tutorial/O'Reilly style book. Something for
someone who knows programming and HTML and a bit of SGML. No
XML for dummies with 10 pages explaining what <xx/>
means. Also nothing that explains everything 'very
theoretically' without any example. A practical introduction
where I can start doing while I read (or imagining what and
how I could do it). 100-200 pages would be best.

Thanks for your help.

Michael
-- 
     ''''\     Michael Scharf
    ` c-@@     TakeFive Software
    `    >     http://www.TakeFive.com
     \_ V      mailto:Michael_Scharf@TakeFive.co.at


From akuchlin@cnri.reston.va.us  Mon Dec 14 22:02:44 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 14 Dec 1998 17:02:44 -0500 (EST)
Subject: [XML-SIG] Mac Python (CFM68K) won't import pyexpat
In-Reply-To: <l03130300b295f5e65ff5@[207.23.94.54]>
References: <l03130300b295f5e65ff5@[207.23.94.54]>
Message-ID: <13941.35434.210358.532950@amarok.cnri.reston.va.us>

Bruce Bennett writes:
>	Python 1.5.1 (#37, Apr 27 1998, 13:36:17)  [CW CFM68K w/GUSI w/MSL]
>	Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>	>>> import sys
>	>>> import pyexpat
>	Traceback (innermost last):
>	  File "<stdin>", line 1, in ?
>	ImportError: PythonCore--PySys_WriteStderr:
>	  A fragment had "hard" unresolved imports.

	Are you using one of the pre-releases of the xml-0.5 package?
PySys_WriteStderr is a C function that was added after 1.5.1; this
problem was fixed in the final release by adding a private version of
PySys_WriteStderr.  Possibly the #ifdef that enables the private
version is wrong, or perhaps you have one of the prereleases of the
code.  (Let me know what you find...)

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
A slovenly action repeated thrice has become a habit.
    -- Robertson Davies, _Leaven of Malice_ (?)


From jim.fulton@Digicool.com  Mon Dec 14 22:13:02 1998
From: jim.fulton@Digicool.com (Jim Fulton)
Date: Mon, 14 Dec 1998 17:13:02 -0500
Subject: [XML-SIG] Re: [Zope] - Zope, DTML and XML
References: <36743271.376A09A8@prescod.net> <36750B52.EE1EBC7D@digicool.com> <367560FB.632C3ED5@prescod.net>
Message-ID: <36758D6E.A925B0FF@digicool.com>

Paul Prescod wrote:
> 
> Jim Fulton wrote:
> >
> > Paul Prescod wrote:
> > >
> > > Of course Zope must eventually move into the XML world. Zope needs to do
> > > templates.
> >
> > It already does, via DTML.
> 
> Right, but DTML code is not valid XML code. It can't be edited in an XML
> editor, stored in an XML repository, routed through XML-based workflow,
> etc. etc.

Is that important?  Python isn't valid XML code either, but it's
still useful.  I think it would be useful if there was an XML-compatible
syntax for DTML, but I don't see that having much to do with XSL.
The differences between XSL and DTML run far deeper than syntax.

(I had a similar discussion with some folks a while back wrt
 ASP and DTML.  On the surface, DTML and ASP are similar, but the
 semantics are really very different.)

> > > XSL also does templates.
> >
> > I would have thought that XSL *was* a template mechanism.  What do you mean
> > by "template"?
> 
> XSL can be thought of as a template mechanism. But an XSL stylesheet has
> many templates and describes a flow of control between them, whereas DTML
> documents are a single template.

OK, OK, whatever.  You know a lot more about XSL than I do. :)

> > > In fact templates are almost as central to XSL as they are to Zope.
> >
> > I would say far more so,
> 
> Fair enough. I meant to say that they are almost as central to XSL as
> they are to DTML. six of one...
> 
> > > I would suggest that Zope should use
> > > XSL template syntax for DTML templates as far as is possible.
> >
> > It appears to me that DTML and XSL represent two very different
> > approaches to solving the same or similar problems.  They are
> > both intended for generating text from objects.  DTML generates text
> > from Python objects.  XSL generates text from XML objects.
> 
> Not quite. XSL generates XML objects (technically speaking, "nodes") from
> other XML objects (other nodes).

Ditto.

> > DTML takes a highly procedural approach.  In DTML, you generate
> > text directly.  In XSL (as I understand it) you specify a set of
> > rules for applying transformations to XML elements.  This is fairly
> > declarative in nature.  In the example you gave, you didn't render a
> > specific picture element.  Instead, you have a rule for converting
> > picture elements to img tags.
> 
> Right. But the same holds for DTML. You don't write DTML to generate an
> IMG tag for a specific picture.

Often you do.  Or at least, you typically start out with a relatively 
specific thing.  For example, an in tag is applied to a specific collection
or to the results of a specific call (e.g. a database query).  Then, 
code is applied to elements within the collection.

> If you knew exactly what picture you
> wanted, you would use the HTML for it. You use DTML extensions when you
> want to figure out the picture to use at runtime, just like in XSL. I
> don't see this as a difference.

XSL is rule-based.  You don't say "iterate over this and within this
iteration output X and then output Y".  In XSL (speaking as someone
pretty ignorant of XSL ;) you say things like "if you see a Foo, convert
it to a bar ....".  It's like the difference between Python and Prolog
(or, uh, sendmail.cf ... sorry, low blow revealing XSL skepticism ;).
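
A rough Python sketch of the contrast, for illustration only. This is
neither DTML nor XSL; the function names, the RULES table and the
picture/file/img vocabulary are assumptions made up for the example:

```python
from xml.dom import minidom

def procedural(pictures):
    # DTML-like: iterate over one specific collection, emit text directly.
    out = []
    for src in pictures:
        out.append('<img src="%s"/>' % src)
    return "".join(out)

# XSL-like: a table of rules keyed by element name; traversal is generic.
RULES = {
    "picture": lambda node, kids: '<img src="%s"/>' % node.getAttribute("file"),
}

def transform(node):
    if node.nodeType == node.TEXT_NODE:
        return node.data
    kids = "".join(transform(child) for child in node.childNodes)
    if node.nodeType == node.ELEMENT_NODE and node.tagName in RULES:
        return RULES[node.tagName](node, kids)
    return kids  # no rule for this element: pass the children through

def rule_based(xml_text):
    return transform(minidom.parseString(xml_text).documentElement)
```

In the first function the author names the collection and the output;
in the second, the input document decides which rules fire and when.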

> > Another difference between DTML and XSL is in how content is determined.
> > DTML is typically used to define as well as format content.  A DTML
> > document directly specifies data that is often extracted from large
> > object spaces.  In XSL, it appears that the content is largely defined
> > by a source document and an XSL "template" simply specifies transformations.
> > Of course, an XSL specification can also filter, so there is some
> > ability to extract, but it is much less direct than with DTML.
> 
> What you seem to be saying is that DTML works on large Python object-bases
> and XSL works on small XML document inputs.

DTML makes calls into a large object base, typically pulling out a small
subset.  XSL on the other hand seems to be geared toward transforming a 
body of data.

> But that is a difference in
> degree, not in kind.

It doesn't feel like the same sort of thing to me.  Perhaps I'm
just too ignorant of XSL.

> I could encode a phonebook as a single XML document
> and use XSL to generate a list of all of the numbers in a particular
> zipcode. How is that different from using DTML in the same context to
> solve the same problem?

It's not different.  I think the DTML and XSL problem spaces 
definitely overlap.  In fact, I'd be happy to drop the argument that
the problem spaces are different.  I still think the approaches are too 
different to make it worthwhile to try to turn one into the other.

> The big difference, of course, is that XSL's set of expressions is quite
> limited whereas Python is quite flexible. That's why I propose using the
> same syntax but changing the expressions to be Python expressions.
> 
> > Given the very different natures of DTML and XSL, I don't see much
> > point in making the syntaxes all that consistent.
> 
> Do you have another XML-compliant syntax in mind or have you decided that
> XML compliance isn't critical?

I have a syntax in mind.  But that seems to me to be beside the point.
This discussion isn't really about syntax issues, is it?
 
> > It may very well. If Zope made it easy to generate XML from Zope (ie Python)
> > objects, then people who  like XSL could apply XSL transformations to the
> > resulting XML, bypassing DTML altogether.
> 
> Sure, but how do I specify the objects that I want to work on from the XSL
> stylesheet? You can't [*] export the database as a single XML document, so
> you must allow a syntax that allows drilling into Python objects: Python
> syntax.

If this is true, then you seem to be supporting my argument that one way
that the two are different is that DTML is geared toward drilling into an
object space while XSL is geared to transforming a body of data.
 
> [*] It is vaguely possible that un-extended XSL could work directly on a
> Zope database if we could express all Python objects as XML data... this
> requires more thought...but even so, you couldn't evaluate arbitrary
> Python code, you could only refer to preexisting objects.
> 
> > In other words, I see XSL as an alternative to DTML, not another form of it.
> 
> I don't really see the difference. Either an extended XSL replaces DTML or
> an XSL-syntax DTML replaces DTML.

Why must one replace the other?  You don't believe that there
should be only one programming language, do you?  I think that the 
approaches of these two systems appeal to different users.

> All I'm saying is that the next
> generation templating syntax should be XSL-based.

This is what you think.  We'll have to agree to disagree.
 
> > Or, DTML may turn out to be a good tool for generating XML from objects, and
> > then XSL could be applied to DTML output, in which case the two would
> > act in tandem.
> 
> Why have two steps?

OK, let's eliminate the XSL step. ;)

Seriously, DTML and XSL have different strengths.
Sometimes we combine DTML and Python, or even C, because
DTML isn't good for everything.  

In fact, an idea that we are very fond of with Zope
is that objects can have methods written in a multitude of
languages (possibly by a multitude of people).  Right now, 
it's not unusual to have objects with methods written in 4 
different languages (Python, C, DTML, SQL).
I'm perfectly happy to see XSL thrown into the mix.

> It seems better to just use XSL syntax, either
> extended with Python expression syntax or not.

I don't agree.

Of course, I'm happy to see people experiment.

It doesn't sound to me like you want an XSL syntax for DTML.
It sounds more to me like you want some sort of XSL processor in 
Zope (or just Python) that is extended to make calls into an
object system.  If you think you can adapt DTML to this somehow, 
go for it.  I'll be interested to see what you come up with.

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.


From gwachob@aimnet.com  Mon Dec 14 22:43:31 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Mon, 14 Dec 1998 14:43:31 -0800 (PST)
Subject: [XML-SIG] Re: New to Python OO
In-Reply-To: <36758235.57BC4FBE@graphion.com>
Message-ID: <Pine.GSO.4.05.9812141437460.2493-100000@shell1.ncal.verio.com>

On Mon, 14 Dec 1998, Michael Sanborn wrote:

> Fred L. Drake writes:
> When I have a little more time, I'll also look at Gabe Wachob's Visitor
> class (recently posted to this list), to see if I can also do it the way
> it 'should' be done. :-)

I'm reworking it constantly -- I've added a notion of "subtree value" to
it -- the idea that an entire subtree can be visited and produce a string
which represents its "value" (alternatively, visiting a subtree can
produce side effects like populating a dictionary for use by another
subtree).

It's not terribly clean: the default behavior for a node's value is to
take the node's "Value" as returned by the visit method called on that
node and concatenate it with the values of each of the node's children.
This will basically "flatten" an XML file (the default value of a text
node is the text itself; other nodes' default values are "").
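
A minimal sketch of that default behavior, written against the standard
xml.dom.minidom. It is not the DOMVisitor.py class itself; the
ValueVisitor name and its method names are assumptions for illustration:

```python
from xml.dom import minidom

class ValueVisitor:
    """Compute a string "value" for a subtree (hypothetical sketch)."""

    def visit(self, node):
        # Dispatch on node type; anything that is not text uses the default.
        if node.nodeType == node.TEXT_NODE:
            own = self.visit_text(node)
        else:
            own = self.visit_default(node)
        # Default combination: own value, then each child's subtree value.
        return own + "".join(self.visit(child) for child in node.childNodes)

    def visit_text(self, node):
        return node.data   # a text node's value is the text itself

    def visit_default(self, node):
        return ""          # other nodes contribute nothing by default

def flatten(xml_text):
    return ValueVisitor().visit(minidom.parseString(xml_text))
```

A subclass could override visit_default to produce side effects (such as
filling a dictionary) instead of returning a string.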

Anyway, when I get the current version working I'll post it up at the same
URL -- http://www.aimnet.com/~gwachob/DOMVisitor.py

	-Gabe

 -------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From paul@prescod.net  Mon Dec 14 23:55:52 1998
From: paul@prescod.net (Paul Prescod)
Date: Mon, 14 Dec 1998 17:55:52 -0600
Subject: [XML-SIG] Normalized AttVals
References: <3.0.1.32.19981214162014.006a5290@mail.csihq.com>
Message-ID: <3675A588.E7A7D999@prescod.net>

John Day wrote:
> 
> Re: quoted attribute contents ("AttVal")
> When '>' is encountered e.g. <code op=">"> it is "normalized"
> to '&gt;', however, when '&' is encountered it is a fatal
> error e.g. <a href="www.zzz.com?a=1&b=3">

That's what the XML spec says.

AttValue ::=  '"' ([^<&"] | Reference)* '"'  
   |  "'" ([^<&'] | Reference)* "'" 

That means that a literal "<" is never allowed in an attribute value, and
"&" is allowed only as the start of an entity or character reference.
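
A small check of this rule using the expat binding in today's Python
standard library (xml.parsers.expat, the descendant of pyexpat). The
is_well_formed helper is a name invented for this sketch:

```python
import xml.parsers.expat

def is_well_formed(text):
    """Return True if expat accepts text as a complete XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)   # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True

samples = [
    '<code op=">"/>',                        # ">" is allowed in AttValue
    '<a href="www.zzz.com?a=1&b=3"/>',       # bare "&" is not a Reference
    '<a href="www.zzz.com?a=1&amp;b=3"/>',   # escaped form
]
results = [is_well_formed(s) for s in samples]
```

Only the second sample should be rejected; the third is the form a
generator would need to emit.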

> Is this pyexpat behavior correct? Why can't the parser tell that
> '&b' above is _not_ a defined entity because it is not terminated
> by ';'? 

That's what full SGML does, but that's not what XML does. XML is supposed
to be easier to implement.

> It seems to me that this usage could be normalized to
> '&amp;b', just like pyexpat did for '>'. Then it would be backward
> compatible with HTML (sort of).

There are several ways that it isn't backwards compatible with HTML.

> The impact of this seems to be enormous. All of the existing HTML
> parameter generators will have to change the way they post arguments,
> when HTML is replaced by XML, right?

This has been a known problem for a long time.

http://www.uni-ulm.de/uni/fak/natwis/strudo/ampersand.html
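
On the generating side, one way to cope is to escape attribute values
before writing them out. A hedged sketch using xml.sax.saxutils.quoteattr
from the modern standard library (the link() helper is hypothetical):

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr

def link(url):
    # quoteattr escapes "&", "<" and ">" and adds the surrounding quotes.
    return "<a href=%s/>" % quoteattr(url)

def href_of(xml_text):
    # The parser undoes the escaping, so the application sees the raw URL.
    return minidom.parseString(xml_text).documentElement.getAttribute("href")
```

The escaped form travels in the document; the original query string is
what the receiving application reads back.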

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From SBEAKLEY@uact.edu  Tue Dec 15 02:36:51 1998
From: SBEAKLEY@uact.edu (Sara Beakley)
Date: Mon, 14 Dec 1998 19:36:51 -0700
Subject: [XML-SIG] unsubscribe
Message-ID: <F034C7B16862D211A78200A0C9A840F629CDE9@ARRAKIS>

unsubscribe
> ----------
> From: 	John Cowan[SMTP:cowan@locke.ccil.org]
> Sent: 	Monday, December 14, 1998 2:56 PM
> To: 	DOM List; xml-sig@python.org
> Subject: 	Re: [XML-SIG] RE: Equality tests on DOM nodes
> 
> Fred L. Drake wrote:
> 
> > Typically,
> > two Python objects (let's take lists as an example) are considered
> > equal if their contents are the same; equality of two objects is not
> > considered to be an unchangeable characteristic.
> 
> The trouble with that scheme is that it makes equality hard to
> reason about.  Intuitively, we expect equality to be transitive,
> (if a = b and b = c then a = c), reflexive (a = a), and symmetrical
> (if a = b then b = a).  Making equality depend on mutable properties
> defeats this: a might = b at one time, but a later check for 
> b = a might fail.
> 
> >         a.reverse()
> 
> I presume this is a *destructive* reverse (leaves a reversed)?
>  
> -- 
> John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
> 	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
> 	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
> 		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
> 



From gwachob@aimnet.com  Tue Dec 15 07:05:27 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Mon, 14 Dec 1998 23:05:27 -0800 (PST)
Subject: [XML-SIG] Parsers which include external parsed entities
Message-ID: <Pine.GSO.4.05.9812142300060.11145-100000@shell1.ncal.verio.com>

Are there any parsers out there which automatically include external
parsed entities? 

I am building an app which has (to begin with) a list of items (urls) and
a categorization breakdown of those items. I'd like to keep the items in a
separate file from the categorizations and "glue" them together for
purposes of the application in a third file by including external parsed
entity references to those other xml files. I could parse them as separate
files, but that's not "pretty" (but it probably is more efficient ;-)

I've found that I have to pipe my xml files through SGMLNORM (which, ugh,
upcases all my tags) to get this effect. 

Is there a technical reason why these parsers DON'T automatically "include"
externally parsed entities when producing SAX or ESIS (and then DOM)
output? I know there is no requirement that the external entities be
parsed, but are there parsers (written in Python or other languages) that
you can force to include external parsed entities?

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From fredrik@pythonware.com  Tue Dec 15 08:56:59 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 15 Dec 1998 09:56:59 +0100
Subject: [XML-SIG] RE: Equality tests on DOM nodes
Message-ID: <00db01be2808$f0d01250$f29b12c2@pythonware.com>

John Cowan wrote:
> Fred L. Drake wrote:
>
>> Typically,
>> two Python objects (let's take lists as an example) are considered
>> equal if their contents are the same; equality of two objects is not
>> considered to be an unchangeable characteristic.
>
>The trouble with that scheme is that it makes equality hard to
>reason about.  Intuitively, we expect equality to be transitive,
>(if a = b and b = c then a = c), reflexive (a = a), and symmetrical
>(if a = b then b = a).  Making equality depend on mutable properties
>defeats this: a might = b at one time, but a later check for 
>b = a might fail.

Does your bank agree with you on this one?

("hey, I know there was $1000 on this account a week ago,
and it's definitely the same account number!")

(but sure, Python provides the "is" operator if you
really want to test for object identity.  Beginners
seem to have trouble grasping that concept, though,
so I doubt it qualifies as "intuitive"...)
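
For reference, a short illustration of the two tests on plain lists
(the function name is made up for this example):

```python
def equality_over_time():
    a = [1, 2, 3]
    b = [1, 2, 3]
    # "==" compares contents; "is" compares object identity.
    before = (a == b, a is b)
    a.append(4)                # mutate a; b is untouched
    after = (a == b, a is b)
    return before, after
```

The two lists start out equal but never identical; after the append
they are no longer equal either, while identity is unaffected.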

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From hinsen@cnrs-orleans.fr  Tue Dec 15 09:48:25 1998
From: hinsen@cnrs-orleans.fr (Konrad Hinsen)
Date: Tue, 15 Dec 1998 10:48:25 +0100
Subject: [XML-SIG] XML 0.5 problems
In-Reply-To: <13941.35069.861693.617350@amarok.cnri.reston.va.us>
 (akuchlin@cnri.reston.va.us)
References: <199812142037.VAA19848@dirac.cnrs-orleans.fr> <13941.35069.861693.617350@amarok.cnri.reston.va.us>
Message-ID: <199812150948.KAA13922@dirac.cnrs-orleans.fr>

> >   - Some people need patched versions; for example, the standard
> >     version does not work for AIX.
> 
> 	What's the patch that's required for AIX?  And is there some
> reason it can't be rolled into the Makefile.pre.in for 1.5.2?

I don't know the system well enough to decide. The problem is that
shared library linking is a rather complicated process under AIX,
which is handled by two shell scripts. These shell scripts come with
the Python distribution (they are "ld_so_aix" and "makexp_aix") and
are ultimately installed in the "config" subdirectory of the Python
library. But during the compilation of the interpreter and its
standard library modules, they reside in the "Modules" subdirectory of
the Python distribution. The settings in the configuration reflect
this initial situation, not the one after installation. So if you
use the standard Makefile.pre.in, the two critical definitions become

LINKCC=		$(srcdir)/makexp_aix python.exp "" $(LIBRARY); $(PURIFY) $(CC)
LDSHARED=	$(srcdir)/ld_so_aix $(CC)

whereas they should be

LINKCC=		$(LIBPL)/makexp_aix $(LIBPL)/python.exp "" $(LIBRARY); $(PURIFY) $(CC)
LDSHARED=	$(LIBPL)/ld_so_aix $(CC) -bI:$(LIBPL)/python.exp


I suppose this could be arranged during the installation process, but
I don't really want to figure out how that works!

> >     cd unicode; python test.py 
> >     Traceback (innermost last):
> >       File "test.py", line 1, in ?
> >	 from xml.unicode import wstring
> >     ImportError: No module named unicode
> 
> 	Are you getting this error after you've installed the package
> under site-packages?

Forget about this problem; I found out that I had a file xml.py
somewhere else on my PYTHONPATH. I have no idea where it comes from,
but deleting it didn't seem to have any negative effect. Sorry for
the false alarm!
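
A quick way to spot this kind of shadowing is to ask the interpreter
where it would load a name from. The sketch below assumes a current
Python with importlib (not available in 1.5), and module_origin is a
name chosen for this example:

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module or package would be loaded from."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# A package reports ".../xml/__init__.py"; a stray module earlier on
# the path would instead show up as ".../xml.py".
origin = module_origin("xml")
```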

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------


From larsga@ifi.uio.no  Tue Dec 15 10:27:55 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 15 Dec 1998 11:27:55 +0100
Subject: [XML-SIG] Parsers which include external parsed entities
In-Reply-To: <Pine.GSO.4.05.9812142300060.11145-100000@shell1.ncal.verio.com>
References: <Pine.GSO.4.05.9812142300060.11145-100000@shell1.ncal.verio.com>
Message-ID: <wkww3tbsyc.fsf@ifi.uio.no>

* Gabe Wachob
|
| Are there any parsers out there which automatically include external
| parsed entities?

xmlproc does, both in validating and well-formedness mode.
 
| I've found that I have to pipe my xml files through SGMLNORM (which,
| ugh, upcases all my tags) to get this effect.
 
Why not use SX instead? That shouldn't have the same problem.

| Is there a technical reason why these parsers DON'T automatically
| "include" externally parsed entities when producing SAX or ESIS (and
| then DOM) output?

Many parsers don't bother to parse the internal DTD subset and so
don't have any entity information. For the rest I don't really know.
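
For expat-based parsers, inclusion has to be wired up by the
application. A minimal sketch with the xml.parsers.expat binding from
the current standard library; the element_events function and the
resolve callback (which maps a system identifier to the entity's text)
are assumptions for illustration:

```python
import xml.parsers.expat

def element_events(text, resolve):
    """Parse text, expanding external parsed entities via resolve(system_id)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def external_entity(context, base, system_id, public_id):
        # The child parser inherits the parent's handlers, so the
        # entity's elements land in the same event list.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(resolve(system_id), True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(text, True)
    return events
```

Without an ExternalEntityRefHandler the reference is simply skipped,
which matches the "no errors, but no content" symptom some people see.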

--Lars M.


From akuchlin@cnri.reston.va.us  Tue Dec 15 13:42:50 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue, 15 Dec 1998 08:42:50 -0500 (EST)
Subject: [XML-SIG] Q: which XML would you recommend?
In-Reply-To: <3675897B.D926F51@gmx.de>
References: <3675897B.D926F51@gmx.de>
Message-ID: <13942.25794.548851.696945@amarok.cnri.reston.va.us>

Michael Scharf writes:
>Today I was in the bookstore looking for an XML book. There
>are very many (some have ~1000 pages?!)! What I am looking
>for is a Python-Tutorial/O'Reilly style book. Something for
>someone who knows programming and HTML and a bit of SGML. No

	I've read the issue of O'Reilly's late _Web Journal_ about
XML; it was a nice overview, but it's now outdated in many respects.
Sean McGrath's _XML By Example_ is sitting in my to-read pile, but I
haven't gotten around to it yet.  If anyone has read other XML books,
brief recommendations (or warnings) for the book page would be
great...

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    "And are your Lord's lessons learned in *you*, Cannon?"
    "I am confident that I will pass through St. Peter's gates with only minor
negotiations."
    -- The Sandman and the Cannon, in SANDMAN MYSTERY THEATRE: "The Cannon",
       act IV


From fdrake@acm.org  Tue Dec 15 14:37:45 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 Dec 1998 09:37:45 -0500 (EST)
Subject: [XML-SIG] RE: Equality tests on DOM nodes
In-Reply-To: <36758996.B9842B78@locke.ccil.org>
References: <13937.18858.948855.840376@amarok.cnri.reston.va.us>
 <000101be252f$fa764c60$da39bfa8@arabbit>
 <13937.24366.729293.26105@weyr.cnri.reston.va.us>
 <367578FC.373DACD1@locke.ccil.org>
 <13941.31549.873183.1048@weyr.cnri.reston.va.us>
 <36758996.B9842B78@locke.ccil.org>
Message-ID: <13942.29753.814023.621356@weyr.cnri.reston.va.us>

John Cowan writes:
 > The trouble with that scheme is that it makes equality hard to
 > reason about.  Intuitively, we expect equality to be transitive,
 > (if a = b and b = c then a = c), reflexive (a = a), and symmetrical
 > (if a = b then b = a).  Making equality depend on mutable properties
 > defeats this: a might = b at one time, but a later check for 
 > b = a might fail.

  That is correct.  This is very important for the programmer to know
about, and is a real consideration when designing a class for which
equality or ordering are important issues.  This is one reason why
many Python programmers use a minimalist approach for immutable data:
it's clear that a particular value will not change underneath you.
  However, I don't think comparison of mutable objects is necessarily
a significant problem.  I think most programmers expect equality of
objects to be meaningful only when the comparison is made; any longevity
of the result depends on the specific guarantees made by that object.

 > >         a.reverse()
 > 
 > I presume this is a *destructive* reverse (leaves a reversed)?

  Yes, that's exactly how the list .reverse() method operates.
  I think we're sufficiently off-topic; we can move this to personal
email or some other forum if you wish to continue.  The topic is
interesting.  This might be good for comp.lang.python.
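
  A short illustration of the in-place behavior (the function name is
invented for this example):

```python
def reverse_demo():
    a = [1, 2, 3]
    alias = a                # a second name for the same list object
    result = a.reverse()     # reverses in place and returns None
    copy = a[::-1]           # slicing builds a new, re-reversed list
    return result, alias, copy, a
```

  Every name bound to the list sees the reversal; a slice is one way to
get a reversed copy while leaving the original alone.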


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From db@Eng.Sun.COM  Tue Dec 15 17:42:09 1998
From: db@Eng.Sun.COM (David Brownell)
Date: Tue, 15 Dec 1998 09:42:09 -0800
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <000701be25e4$e3826f60$5839bfa8@arabbit> <36755507.F8657565@imall.com>
Message-ID: <36769F71.A90D8C2A@eng.sun.com>

On the general topic of "equality", I hope that it's clear to everyone
that there are almost innumerable definitions of the notion based on
the particular task being performed ... don't go hoping for a single
universal "always useful" definition!!!


Ray Whitmer wrote:
> 
> 	 it is not unreasonable to expect that the Java DOM binding
> might eventually specify some behavior here, which would not be the "=="
> comparison.

Though there's one thing to consider:  The behavior of Object.equals()
and Object.hashCode() is specified to make objects work as hashtable
keys in the natural manner.  For example, strings can be used as keys
since they're immutable and equals() is overridden ... were they mutable,
or did they not override equals(), that'd not be so.

If org.w3c.dom.Node.equals(Object) were defined to invoke the DOM
method equals(Node, true) then when a node was changed, it'd need
to get moved to a different location in any hashtable.

For the moment, I have a hard time seeing any better implementations
of Object.equals() and Object.hashCode() for DOM nodes than the default!
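
The same hazard can be sketched in Python, where __eq__ and __hash__
play the roles of equals() and hashCode(). The Node class below is a
toy made up for this example, not a DOM implementation:

```python
class Node:
    """Toy node whose equality and hash depend on mutable content."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Node) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

def lookup_after_mutation():
    node = Node("head")
    table = {node: "payload"}
    before = Node("head") in table   # found by value
    node.name = "body"               # content changes while it is a key
    by_old = Node("head") in table   # old hash bucket, but no longer equal
    by_new = Node("body") in table   # equal now, but filed under the old hash
    return before, by_old, by_new
```

After the mutation the entry can be reached by neither the old nor the
new value, which is why content-based equality would force a rehash.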

- Dave


From gwachob@aimnet.com  Tue Dec 15 18:44:59 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Tue, 15 Dec 1998 10:44:59 -0800 (PST)
Subject: [XML-SIG] Parsers which include external parsed entities
In-Reply-To: <wkww3tbsyc.fsf@ifi.uio.no>
Message-ID: <Pine.GSO.4.05.9812151019300.13233-100000@shell1.ncal.verio.com>

On 15 Dec 1998, Lars Marius Garshol wrote:

> 
> * Gabe Wachob
> |
> | Are there any parsers out there which automatically include external
> | parsed entities?
> 
> xmlproc does, both in validating and well-formedness mode.

I'm having problems parsing this with all of the python xml parsers:

<?xml version="1.0"?>
<!DOCTYPE top  [
	<!ENTITY linklist SYSTEM "links.xml">
	<!ENTITY classification SYSTEM "classification.xml">
]
>
<top>
	&linklist;
</top>


Is my brain mush? What's wrong with this? I get no errors, but I also get
no DOM tree. Is this a problem with the XML here, the parser, or the DOM 
builder? If I try to parse a more "vanilla" XML file, I get a DOM tree
just fine:

<top>
<head>
This is head text
</head>
<body>
This is body text
</body>
</top>

> | I've found that I have to pipe my xml files through SGMLNORM (which,
> | ugh, upcases all my tags) to get this effect.
>  
> Why not use SX instead? That shouldn't have the same problem.

It does have the same problem (upcasing). XML is case sensitive, while 
SGML is not -- SX assumes the incoming data is SGML and therefore
ignores the case of the incoming element tags. Is there an option for
SX to behave case sensitively? (as an aside, I wish they had named it
something besides SX, since the xmodem protocol handler also has a binary
named sx)

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From spepping@scaprea.hobby.nl  Mon Dec 14 19:24:23 1998
From: spepping@scaprea.hobby.nl (Simon Pepping)
Date: Mon, 14 Dec 1998 20:24:23 +0100 (MET)
Subject: [XML-SIG] Installing XML package
Message-ID: <Pine.LNX.3.95.981214201143.323A-100000@scaprea.hobby.nl>

Hello,

When installing the Python XML package, I encountered the following
difficulties:

- make install did not work, because the site-packages directory did
not yet exist. I think the installation should check for this.
- Many py files in the parsers directory had mixed tabs/spaces. This
is awkward when exchanging files; e.g., in my settings tab = 4 spaces,
so that the first and second level indentations were identical.

I hope this helps.

Simon Pepping
email: spepping@scaprea.hobby.nl


From paul@prescod.net  Tue Dec 15 22:02:56 1998
From: paul@prescod.net (Paul Prescod)
Date: Tue, 15 Dec 1998 16:02:56 -0600
Subject: [XML-SIG] WDDX for Python
References: <C3843BD1B83DD2119D79000092A7BAD449BFC4@PLATINUM.allaire.com>
Message-ID: <3676DC90.2AD41AA2@prescod.net>

Simeon, I am looking into the development of the Python binding for WDDX
as I said I might a few weeks ago. I'm cc:ing the Python xml-sig because
they might be interested.

I'm not entirely happy with the logical level of WDDX. My problem is I
can't easily understand when I would use WDDX.

Information passing situations seem to fall under two wide categories.
Either we have a negotiated format (i.e. packet template) or we do not. 

If we DO, then why do we want to tag things <STRING>...</STRING>,
<NUMBER>...</NUMBER> etc. Can't we infer the types of things from our
pre-negotiated template?

If we DO NOT, then wouldn't it be useful to be able to linearize objects
of *named types* instead of only primitive types? i.e. 

<OBJECT TYPE="Traceback">
  <VAR ...>
  <VAR ...>
  <VAR ...>
  <VAR ...>
</OBJECT>

Using the TYPE attribute, we could look up the constructor for the
appropriate type and invoke it. The problem is that Python programmers
seldom work with data structures made of primitive and compound types.
Rather they work with structures of objects. If you can't encode and
decode those easily then you haven't made the job of encoding data
structures much easier.

We could encode objects as structs, but then their type gets lost so that
they cannot be rebuilt "on the other end."
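
The TYPE lookup described above might be sketched as follows. This is an
illustration only, written against today's xml.etree.ElementTree; the
NAME attribute on VAR, the CONSTRUCTORS registry, and decode_object are
all assumptions, since the message leaves the VAR details elided:

```python
import xml.etree.ElementTree as ET

class Traceback:
    """Stand-in class; a real type would define its own fields."""
    def __init__(self, **fields):
        self.fields = fields

# Hypothetical registry mapping TYPE names to constructors.
CONSTRUCTORS = {"Traceback": Traceback}

def decode_object(xml_text):
    """Rebuild an object from an <OBJECT TYPE="..."> element."""
    elem = ET.fromstring(xml_text)
    constructor = CONSTRUCTORS[elem.get("TYPE")]
    # Assumed layout: each VAR carries a NAME attribute and a text value.
    fields = {var.get("NAME"): var.text for var in elem.findall("VAR")}
    return constructor(**fields)

obj = decode_object(
    '<OBJECT TYPE="Traceback">'
    '<VAR NAME="file">core.py</VAR><VAR NAME="line">30</VAR>'
    '</OBJECT>'
)
print(type(obj).__name__, obj.fields)
```

An unknown TYPE raises KeyError here, which is one place where the two
ends would still need some prior agreement.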

Maybe you could help me to understand a typical usage situation.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From simeons@allaire.com  Tue Dec 15 23:02:30 1998
From: simeons@allaire.com (Simeon Simeonov)
Date: Tue, 15 Dec 1998 18:02:30 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <009e01be287e$fe0b3410$7315b5cd@ssimeonov.allaire.com>

Hi, Paul!

It's good to hear from you. My comments are interspersed below:


>I'm not entirely happy with the logical level of WDDX. My problem is I
>can't easily understand when I would use WDDX.

Think of WDDX as the epitome of the 80/20 rule. It tries to provide a
solution for 80% of the meaningful data exchange problems with 20% of the
effort. The 80% that WDDX focuses on involve the easy and efficient exchange
of complex structured _data_ (not objects) between different language
platforms. So far, WDDX can be used with C++, Java, COM (VBScript, ASP,
Delphi, PowerBuilder, etc.), ColdFusion, JavaScript, and Perl. WDDX is
particularly well-suited for use on the Web because its XML data format can
be easily transported over HTTP.

Example apps:

- At Allaire we use WDDX to exchange data between the ColdFusion Application
Server and the ColdFusion Studio remote debugger.

- Some big public ecommerce and content providers are working on
WDDX-enabling their sites to expose data for application use.

- Try this URL for another cool example of WDDX use:
http://forums.allaire.com/Forums/Index.cfm?CFApp=49&Message_ID=225377

>Information passing situations seem to fall under two wide categories.
>Either we have a negotiated format (i.e. packet template) or we do not.
>
>If we DO, then why do we want to tag things <STRING>...</STRING>,
><NUMBER>...</NUMBER> etc. Can't we infer the types of things from our
>pre-negotiated template?
>

I agree. We fall in the latter category.

>If we DO NOT, then wouldn't it be useful to be able to linearize objects
>of *named types* instead of only primitive types? i.e.
>
><OBJECT TYPE="Traceback">
>  <VAR ...>
>  <VAR ...>
>  <VAR ...>
>  <VAR ...>
></OBJECT>
>
>Using the TYPE attribute, we could look up the constructor for the
>appropriate type and invoke it. The problem is that Python programmers
>seldom work with data structures made of primitive and compound types.
>Rather they work with structures of objects.

When you want to reach such a wide audience you have to make concessions. In
particular, we had to decide that we couldn't exchange objects because some
of the target languages have no notion of such.

>If you can't encode and
>decode those easily then you haven't made the job of encoding data
>structures much easier.

I would disagree with you here... How can a Python app exchange data with an
ecommerce app written in ColdFusion? Or a book browser that's written in
Perl? Or with Microsoft Word? How can it send a recordset and a three
dimensional array to a web browser where these data can be used to build
cool DHTML UI?

The core problem of cross-language data exchange is very difficult. WDDX
offers you one way to talk to a _huge_ audience of applications. It is not
perfect, but it is far better than the "roll-your-own" approach.

>We could encode objects as structs, but then their type gets lost so that
>they cannot be rebuilt "on the other end."

This is correct. It will be easy to work with objects in Python and encode
them as structs using something like the dynamic serialization shown by the
JavaScript serializer.

And, yes, it is not easy to wrap objects around the data returned by a
deserializer. Probably the easiest way to do this is to build an object
factory for particular types of WDDX packets and apply it on the result of
the deserialization. Whether this will be worth doing depends on your
application.
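
Such an object factory could look roughly like this. A minimal sketch,
assuming the deserializer hands back a struct as a plain Python dict; the
Customer class, its fields, and customer_factory are invented for the
example and do not come from any WDDX SDK:

```python
class Customer:
    """Application object we want instead of a bare struct."""
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance

def customer_factory(struct):
    """Wrap a deserialized struct (a dict here) in a Customer."""
    return Customer(name=struct["name"], balance=float(struct["balance"]))

# Stand-in for what a deserializer might return for one packet type.
packet = {"name": "Ada", "balance": "12.5"}
customer = customer_factory(packet)
print(customer.name, customer.balance)
```

The factory is tied to one kind of packet, which matches the point that
its worth depends on the application.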

>Maybe you could help me to understand a typical usage situation.

Bottom line: WDDX is not a solution for python-python object serialization.
It can, however, open python apps up and let them communicate with a _huge_
number of other applications.

Hope this helps. Stay in touch.

Regards,

Sim
Allaire


From gwachob@aimnet.com  Wed Dec 16 00:13:32 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Tue, 15 Dec 1998 16:13:32 -0800 (PST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <C3843BD1B83DD2119D79000092A7BAD449BFC4@PLATINUM.allaire.com>
Message-ID: <Pine.GSO.4.05.9812151606070.13587-100000@shell1.ncal.verio.com>

Hi folks-
	In response to this request, I put together a Deserializer (there
are some issues in serializing that I didn't want to address yet) for WDDX
data into a python object.

	One question I had is this:

In the DTD, you show that a data element can contain one or more of any of
the data types plus recordset/struct/array. Does this mean that this is a
valid XML fragment:

<data>
<number>43</number>
<struct>
...
</struct>
</data>

I made the assumption that it was, so in my deserialization, I create an
object WDDXObject which contains an array items -- in the previous case
the array would contain a number as its first element, and the struct
object (WDDXStruct) as its second element. 

If data has more than one child, then how do you refer to each child if
you don't implement the deserialization the way I do with an array as the
"top level" child of the deserialized object (I ask because I didn't want
to do it this way, but I couldn't think of another simple way of doing
it). What if you have two structs with two element variables with the same
name? 

So, anyway, my deserializer fully implements the DTD and the spec as far
as I understand it. It does not parse the dateTime type (I could throw it
in a wrapper object with nice methods and all).

The URL is http://www.aimnet.com/~gwachob/software.html

It uses my current rev of my DOMVisitor.py  Everything is not well tested,
and in fact, may not be the most efficient. However, here it is...

	-Gabe 

On Sun, 13 Dec 1998, Jeremy Allaire wrote:

> Hello folks-
> 
> I'm interested in engaging anyone/everyone from the Python community to
> work with us on a WDDX platform module for Python.  With the help of a few
> developers, we've been able to muster/ship WDDX modules for ASP/COM, Java,
> ColdFusion, Perl and JavaScript, and would love to see a Python
> implementation.
> 
> Given the recent XML release for Python, seems like it would be a great
> project to make cross-language distributed web applications even more
> possible.
> 
> Take a visit to www.WDDX.org, and most importantly take a view of the SDK,
> developed by Nate Weiss, which brings it all together with all of the above
> languages.
> 
> Best and regards,
> Jeremy Allaire
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig
> 

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From JackUnger@aol.com  Wed Dec 16 04:43:44 1998
From: JackUnger@aol.com (JackUnger@aol.com)
Date: Tue, 15 Dec 1998 23:43:44 EST
Subject: [XML-SIG] (no subject)
Message-ID: <e4e24254.36773a80@aol.com>

In a message dated 12/15/98 7:56:59 AM Central Standard Time,
akuchlin@cnri.reston.va.us writes:

<< Michael Scharf writes:
 >Today I was in the bookstore looking for a XML book. There
 >are very many (some have ~1000 pages?!)! What I am looking
 >for is a Python-Tutorial/O'Reilly style book. Something for
 >someone who knows programming and HTML and a bit of SGML. No
 
 	I've read the issue of O'Reilly's late _Web Journal_ about
 XML; it was a nice overview, but it's now outdated in many respects.
 Sean McGrath's _XML By Example_ is sitting in my to-read pile, but I
 haven't gotten around to it yet.  If anyone has read other XML books,
 brief recommendations (or warnings) for the book page would be
 great...
  >>

</lurk>
One comment on Steven Holzner's XML Complete. It's Java-based, and when I got
it last March there were already problems with the code. It was based on an
early version of the MS Parser for Java, and by the time I got the book, JDK 1.1
and a corresponding version of the MS Parser were in place and most of the code
in the book didn't work. I haven't revisited the book with Python in hand to
translate the examples and see if they work.

Back to <lurk> mode. 8^)
Jack Ungerleider


From simeons@allaire.com  Wed Dec 16 14:40:04 1998
From: simeons@allaire.com (Simeon Simeonov)
Date: Wed, 16 Dec 1998 09:40:04 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <00ca01be2901$f824a5d0$7315b5cd@ssimeonov.allaire.com>

Hi, Gabe!

Great work!

> One question I had is this:
>
>In the DTD, you show that a data element can contain one or more of any of
>the data types plus recordset/struct/array. Does this mean that this is a
>valid XML fragment:
>
><data>
><number>43</number>
><struct>
>...
></struct>
></data>

Nope, it does not. There was a "bug" in the DTD. The content of the data
element had a *. It really should have one and only one child element. The
version on the site must not have been updated. I'll make sure it is.
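
A deserializer can enforce the one-child rule up front. A minimal sketch
using today's xml.etree.ElementTree; single_data_child is a hypothetical
helper, not part of any WDDX module:

```python
import xml.etree.ElementTree as ET

def single_data_child(xml_text):
    """Return the only child of a <data> element, or raise ValueError."""
    data = ET.fromstring(xml_text)
    if data.tag != "data":
        raise ValueError("expected a data element, got %r" % data.tag)
    children = list(data)
    if len(children) != 1:
        raise ValueError(
            "data must contain exactly one child element, found %d"
            % len(children))
    return children[0]

print(single_data_child("<data><number>43</number></data>").tag)
```

With this check in place the deserialized result can be the child's value
itself, with no top-level array needed.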

Regards,

Sim
Allaire


From akuchlin@cnri.reston.va.us  Wed Dec 16 15:20:47 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 16 Dec 1998 10:20:47 -0500 (EST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <Pine.GSO.4.05.9812151606070.13587-100000@shell1.ncal.verio.com>
References: <C3843BD1B83DD2119D79000092A7BAD449BFC4@PLATINUM.allaire.com>
 <Pine.GSO.4.05.9812151606070.13587-100000@shell1.ncal.verio.com>
Message-ID: <13943.53067.284156.866042@amarok.cnri.reston.va.us>

Gabe Wachob writes:
>Hi folks-
>	In response to this request, I put together a Deserializer (there
>are some issues in serializing that I didn't want to address yet) for WDDX
>data into a python object.

	Neat! FYI, I've also been working on marshalling a bit, trying
to produce a generic Python-to-XML-marshalling class that can be
subclassed to implement a specific format like WDDX or XML-RPC.  It's
too early to report any results, since I haven't actually implemented
unmarshalling yet, and the code hasn't been added to the CVS tree.
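
The subclassing idea can be illustrated with a toy. This is a sketch in
present-day Python under stated assumptions, not the code being described:
the class names, the m_* hook names, and the generic tag names are all
invented, and only numbers, strings, and sequences are handled:

```python
from xml.sax.saxutils import escape

class Marshaller:
    """Generic marshaller; subclasses override the m_* hooks per format."""

    def dumps(self, value):
        # Dispatch on the Python type of the value.
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, (int, float)):
            return self.m_number(value)
        if isinstance(value, str):
            return self.m_string(value)
        if isinstance(value, (list, tuple)):
            return self.m_sequence(value)
        raise TypeError("cannot marshal %r" % type(value))

    def m_number(self, value):
        return "<number>%s</number>" % value

    def m_string(self, value):
        return "<string>%s</string>" % escape(value)

    def m_sequence(self, value):
        return "<list>%s</list>" % "".join(self.dumps(v) for v in value)

class WDDXLikeMarshaller(Marshaller):
    """Overrides only the sequence hook to emit a WDDX-style array."""

    def m_sequence(self, value):
        items = "".join(self.dumps(v) for v in value)
        return '<array length="%d">%s</array>' % (len(value), items)

print(Marshaller().dumps([43, "a<b"]))
print(WDDXLikeMarshaller().dumps([43, "a<b"]))
```

Only the hook that differs between formats is overridden; the type
dispatch in dumps() is shared.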

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Things in Python are very clear, but are harder to find than the secrets of
wizards. Things in Perl are easy to find, but look like arcane spells to
invoke magic.
    -- Mike Meyer, 6 Nov 1997


From larsga@ifi.uio.no  Wed Dec 16 15:29:51 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 16 Dec 1998 16:29:51 +0100
Subject: [XML-SIG] Parsers which include external parsed entities
In-Reply-To: <Pine.GSO.4.05.9812151019300.13233-100000@shell1.ncal.verio.com>
References: <Pine.GSO.4.05.9812151019300.13233-100000@shell1.ncal.verio.com>
Message-ID: <wkemq0t89c.fsf@ifi.uio.no>

* Gabe Wachob
| 
| I'm having problems parsing this with all of the python xml parsers:
|
| <?xml version="1.0"?>
| <!DOCTYPE top  [
| 	  <!ENTITY linklist SYSTEM "links.xml">
| 	  <!ENTITY classification SYSTEM "classification.xml">
| ]
| >

Congratulations! You've found a bug in xmlproc! It turns out that this
way to end the internal DTD subset is well-formed after all. I'll fix
this now so that it will work with the next release (which shouldn't
be too far off).

Meanwhile, just change ']\n>' to ']>' and it should work.

| Is my brain mush? 

Don't think so. I'm more worried about mine... :)

| I get no errors, but I also get no DOM tree. 

This is probably because you don't set any errorhandler so the errors
are just silently swallowed. saxutils.ErrorPrinter is handy if you
want one that simply prints the error messages.
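
The effect of installing an error handler can be sketched with the
xml.sax module from today's standard library (an assumption; the 1998
package API differs in detail). CollectingErrorHandler and
parse_with_errors are hypothetical names:

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler

class CollectingErrorHandler(ErrorHandler):
    """Record every problem the parser reports instead of losing it."""

    def __init__(self):
        self.messages = []

    def warning(self, exception):
        self.messages.append("warning: %s" % exception)

    def error(self, exception):
        self.messages.append("error: %s" % exception)

    def fatalError(self, exception):
        self.messages.append("fatal: %s" % exception)
        raise exception  # parsing cannot continue after a fatal error

def parse_with_errors(data):
    """Parse bytes and return the list of reported problems."""
    handler = CollectingErrorHandler()
    try:
        xml.sax.parseString(data, ContentHandler(), handler)
    except xml.sax.SAXParseException:
        pass  # already recorded by the handler
    return handler.messages

for message in parse_with_errors(b"<top><head></top>"):
    print(message)
```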

| XML is case sensitive, while SGML is not -- SX assumes the
| incoming data is SGML and therefore ignores the case of the incoming
| element tags text. 

I should have guessed that, of course.

| Is there an option for SX to behave case sensitively?

In a sense, yes. If you use an SGML declaration where you set element
type names to be case sensitive it shouldn't behave in this way. 

Another trick you can try is jade with an identity-transform DSSSL
stylesheet like:

jade -d id.dsl -t xml mydoc.sgml

and the stylesheet:

(default
   (make element))

You'll lose comments and PIs. You'll also lose attributes, but it
shouldn't be too hard to write a little snippet that puts them in,
using queries on (current-node). Don't have time to put that together
now, unfortunately.

--Lars M.


From gwachob@aimnet.com  Wed Dec 16 16:06:54 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Wed, 16 Dec 1998 08:06:54 -0800 (PST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <13943.53067.284156.866042@amarok.cnri.reston.va.us>
Message-ID: <Pine.GSO.4.05.9812160804290.4153-100000@shell1.ncal.verio.com>

On Wed, 16 Dec 1998, Andrew M. Kuchling wrote:

> Gabe Wachob writes:
> >Hi folks-
> >	In response to this request, I put together a Deserializer (there
> >are some issues in serializing that I didn't want to address yet) for WDDX
> >data into a python object.
> 
> 	Neat! FYI, I've also been working on marshalling a bit, trying
> to produce a generic Python-to-XML-marshalling class that can be
> subclassed to implement a specific format like WDDX or XML-RPC.  It's
> too early to report any results, since I haven't actually implemented
> unmarshalling yet, and the code hasn't been added to the CVS tree.

If you get this done, I know there are people at the casbah project who
might want to use such a thing for LDO (their lightweight distributed
object) component. I discussed LDO with Ken MacLeod, and there are some
thorny issues that I'm sure you will run across if you haven't already (my
memory of the specific issues is cloudy). Anyway, the Casbah URL is
http://www.ntlug.org/casbah

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From ray@imall.com  Wed Dec 16 16:45:18 1998
From: ray@imall.com (Ray Whitmer)
Date: Wed, 16 Dec 1998 09:45:18 -0700
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <000701be25e4$e3826f60$5839bfa8@arabbit> <36755507.F8657565@imall.com> <36769F71.A90D8C2A@eng.sun.com>
Message-ID: <3677E39E.8BAECF6E@imall.com>

David Brownell wrote:

> Though there's one thing to consider:  The behavior of Object.equals()
> and Object.hashCode() is specified to make objects work as hashtable
> keys in the natural manner.  For example, strings can be used as keys
> since they're immutable and equals() is overridden ... were they mutable,
> or did they not override equals(), that'd not be so.

It is sad but true that Hashtable influenced the implementation of
Object.  Equals is problematic in Object's API because of its ambiguity, but
just about every other language seems to do something similarly ambiguous.  You
raise a connection between equals and immutability that I generally tend to
overlook as nonessential.  There are plenty of other examples in the jdk that
also overlook it that I cited before (like Point or Rectangle), again
demonstrating the ambiguity of the interpretation of equals, which I think we
are mostly agreed upon.  Users of Hashtable must rely on discipline, because
there is not enough typing to otherwise guarantee that the interpretation of
equals will not change.

In any case, equals should not be usable for Node until a clear portable
definition is established, whether that be the identity interpretation or some
deeper interpretation.

Ray


From db@Eng.Sun.COM  Wed Dec 16 17:39:27 1998
From: db@Eng.Sun.COM (David Brownell)
Date: Wed, 16 Dec 1998 09:39:27 -0800
Subject: [XML-SIG] Re: Equality tests on DOM nodes
References: <000701be25e4$e3826f60$5839bfa8@arabbit> <36755507.F8657565@imall.com> <36769F71.A90D8C2A@eng.sun.com> <3677E39E.8BAECF6E@imall.com>
Message-ID: <3677F04F.391A3B35@eng.sun.com>

Ray Whitmer wrote:
> 
> David Brownell wrote:
> 
> > Though there's one thing to consider:  The behavior of Object.equals()
> > and Object.hashCode() is specified to make objects work as hashtable
> > keys in the natural manner.  For example, strings can be used as keys
> > since they're immutable and equals() is overridden ... were they mutable,
> > or did they not override equals(), that'd not be so.
> 
> It is sad but true that Hashtable influenced the implementation of
> Object.  Equals is problematic in Object's API because of its ambiguity, but
> about every other language seems to do something similarly ambiguous. 

I don't see anything being "sad" in the influence you mention.

Any answer that's picked to define "equality" (or "identity") is going to
be pretty arbitrary, and become (in some context/task) "ambiguous".  So there
will always be a need to define application-specific definitions for this. 

I'll also note that after several years (!) of discussion on the topic,
OMG decided to -- gasp! -- let objects be used as keys into hashtables
in CORBA 2.0, as its first foray into the murky waters of this problem.
It's got a low system-wide cost, and provides the benefits folk need.
 

>	 You
> raise a connection between equals and immutability that I generally tend to
> overlook as nonessential.  There are plenty of other examples in the jdk that
> also overlook it that I cited before (like Point or Rectangle),

There may be no official API policy with respect to immutability, though I'll
ask about that one.  One can adopt a policy (with some imperfect degree of
success) like "if you want to change it, don't use it as a hashtable key...".
I mentioned it to highlight some of the complexity behind the notion of one
thing being "equal" to another -- it could change over time!
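
The hazard with mutable keys is easy to reproduce. The sketch below uses
Python's __eq__ and __hash__ as stand-ins for Java's equals() and
hashCode(); the Point class is invented for the example, in the spirit of
the Point case mentioned earlier in the thread:

```python
class Point:
    """Mutable object with value-based equality and hashing."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (isinstance(other, Point)
                and (self.x, self.y) == (other.x, other.y))

    def __hash__(self):
        return hash((self.x, self.y))

p = Point(1, 2)
table = {p: "stored"}

found_before = Point(1, 2) in table  # an equal key finds the entry
p.x = 99                             # mutate the key while it is in the table
found_after = Point(1, 2) in table   # the stored key no longer equals (1, 2)

print(found_before, found_after, len(table))
```

The entry is still in the table but can no longer be reached through the
value it was stored under, and a lookup with the new value is likely to
miss as well because the entry was filed under the old hash.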


> In any case, equals should not be usable for Node until a clear portable
> definition is established, whether that be the identity interpretation or some
> deeper interpretation.

At this point in time, the definition would seem to be the default that's
supported by all java.lang.Object instances.

- Dave


From akuchlin@cnri.reston.va.us  Thu Dec 17 01:48:09 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 16 Dec 1998 20:48:09 -0500
Subject: [XML-SIG] Open issues: Namespaces and Unicode
Message-ID: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com>

There are two major issues still unresolved at this point, from the
list assembled during the Developer's Day session at IPC7.  Other
things, like WDDX and all that, are more minor and not showstoppers.

     1) Unicode support.  

The wstring type was added in version 0.5 of the package, but it was
just added to the installation, not integrated with the XML parsers.
sgmlop and pyexpat are probably the only parsers that stand a chance
of handling 16-bit Unicode.  xmlproc relies on the re module, and
making re handle Unicode would be a big job, so users would have to
UTF-8 encode their data first.  

      From poking around inside Expat, it looks like it can handle
UTF-16, agreeing with a simple test with xmlwf; try running this test
program to generate a file named t.xml and then run it through xmlwf:

from xml.unicode import wstring
s=wstring.L("""<?xml version="1.0" encoding="UTF-16"?>
<thing>text</thing>""")
f = open('t.xml', 'wb') ; f.write(s.utf16() ) ; f.close()

Amazingly, if the resulting file is then parsed by Python code using
pyexpat, the resulting UTF8 output is correct, even though the code
doesn't do anything special about Unicode at all.  I suspect that this
is only a coincidence, and won't work on a machine of different
endianness.  
	     
     Anyway, we should probably modify at least one of the parsers to
handle a wide string.  Pyexpat is probably the best candidate, since
the Unicode support is already there in Expat itself.  Does this seem
to be a reasonable course of action?  Any volunteers?

     2) Namespace support.  

We also wanted to arrive at some form of namespace support for the SAX
and DOM interfaces.  Unfortunately, no one responsible seems to be
defining what namespace support should look like in SAX and DOM.  The
plan for SAX might be to use a parser filter that implemented the
additional namespace processing; in a Nov. 13 xml-dev post David
Megginson supported this idea, and said he'd like to formalise the
idea of a SAX filter in SAX 1.0.1.  I'm not aware of any public info
about the changes, but have written Megginson asking about it.

      There also seems no sign of namespace support for the DOM,
though I've posted to the www-dom mailing list asking about it.  This
presents us with two options: ignore DOM namespaces completely for 1.0
and wait for some guidance from the working group; or add some utility
function or module to do it, knowing that it will probably be made
obsolete in the future.  (For example, there might be a
do_namespaces() function in xml.dom.utils that walked over a DOM tree
looking for xmlns:* attributes and decorated all the nodes with an
attribute containing the namespace URI, or a Node method that scanned
its ancestors looking for namespace declarations.)
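
A tree walk of that kind might look like the sketch below. It is an
assumption-laden illustration: it uses xml.dom.minidom from today's
standard library rather than xml.dom.core, and instead of decorating
nodes it returns (tag name, namespace URI) pairs in document order.
minidom already exposes namespaceURI on its nodes; the scan is done by
hand here only to show the technique:

```python
from xml.dom import minidom, Node

def do_namespaces(element, scope=None, result=None):
    """Walk a DOM subtree, resolving each element's namespace URI
    from the xmlns and xmlns:* attributes in scope."""
    scope = dict(scope or {})  # copy so siblings do not see our bindings
    if result is None:
        result = []
    for name, value in element.attributes.items():
        if name == "xmlns":
            scope[""] = value          # default namespace
        elif name.startswith("xmlns:"):
            scope[name[6:]] = value    # prefixed namespace
    tag = element.tagName
    prefix = tag.split(":", 1)[0] if ":" in tag else ""
    result.append((tag, scope.get(prefix)))
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            do_namespaces(child, scope, result)
    return result

doc = minidom.parseString(
    '<a xmlns="urn:x" xmlns:p="urn:p">'
    '<p:b><c/></p:b><d xmlns="urn:y"><e/></d><f/></a>'
)
for tag, uri in do_namespaces(doc.documentElement):
    print(tag, uri)
```

Keeping the function outside the node classes makes it easy to drop once
the DOM itself defines namespace support.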

    What do you think?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It is in this matter that I fall foul of so many American writers on writing;
they seem to think that writing is a confidence game by means of which the
author cajoles a restless, dull-witted, shallow audience into hearing his
point of view. Such an attitude is base, and can only beget base prose.
    -- Robertson Davies, "Elements of Style"


From fdrake@acm.org  Thu Dec 17 16:42:11 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 17 Dec 1998 11:42:11 -0500 (EST)
Subject: [XML-SIG] Open issues: Namespaces and Unicode
In-Reply-To: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com>
References: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com>
Message-ID: <13945.13411.498583.532812@weyr.cnri.reston.va.us>

A.M. Kuchling writes:
 >      1) Unicode support.  
...
 > is only a coincidence, and won't work on a machine of different
 > endianness.  

  I suspect expat is able to determine endianness and takes care of
byteswapping as needed.
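
This can be checked by feeding expat the same document in both byte
orders, each with a byte order mark. A minimal sketch using the
xml.parsers.expat module from today's standard library (the 1998 pyexpat
and wstring APIs differ); parse_text is a made-up helper:

```python
import codecs
import xml.parsers.expat

def parse_text(data):
    """Parse bytes with expat and return the concatenated character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)

doc = '<?xml version="1.0" encoding="UTF-16"?>\n<thing>text</thing>'

# Same document, both byte orders, each prefixed with its byte order mark.
little = codecs.BOM_UTF16_LE + doc.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + doc.encode("utf-16-be")

print(parse_text(little))
print(parse_text(big))
```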

 >      Anyway, we should probably modify at least one of the parsers to
 > handle a wide string.  Pyexpat is probably the best candidate, since
 > the Unicode support is already there in Expat itself.  Does this seem
 > to be a reasonable course of action?  Any volunteers?

  Yes, and no.  ;-)

 >      2) Namespace support.  
...
 > and DOM interfaces.  Unfortunately, no one responsible seems to be
 > defining what namespace support should look like in SAX and DOM.  The
 > plan for SAX might be to use a parser filter that implemented the

  If we can get an agreement as to just what SAX filters are supposed
to look like, I'm willing to do any new coding needed to implement a
namespace handler.  I understand that someone has already done some
work on SAX filters, but SAX itself really needs to define this, and
preferably define the SAX interface in IDL as well.  Let us know if
you get any info from Dave.
  The results of the last call on the Namespace draft should be known
in early January.  We should wait until that's done before worrying
about it much.

 >       There also seems no sign of namespace support for the DOM,
...
 > function or module to do it, knowing that it will probably be made
 > obsolete in the future.  (For example, there might be a
 > do_namespaces() function in xml.dom.utils that walked over a DOM tree
 > looking for xmlns:* attributes and decorated all the nodes with an

  I think a "decorating" function like this would be a sufficient
interim solution.  I would not place it in the Node or Element class
because of the expected obsolescence.
  I'm willing, but for either project (SAX or DOM namespaces), I won't
have time until January.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From Jeff.Johnson@icn.siemens.com  Thu Dec 17 17:54:00 1998
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Thu, 17 Dec 1998 12:54:00 -0500
Subject: [XML-SIG] new code for xml.dom.utils
Message-ID: <852566DD.0062463D.00@li01.lm.ssc.siemens.com>


I threw together this class and thought it might be a good candidate for
the xml.dom.utils module.  It makes it really easy to get a DOM tree from a
file.  I made it a class even though it could just as easily be a bunch of
functions but as a class it might be subclassed for some neat things I
can't think of right now.

The following subclass would allow an HTML or XML file to be pretty printed
with a single line of code (a pretty silly example but it's just an
example):

class DomDumper(DomHelper):
     def __init__(self,filename):
          DomHelper.__init__(self,filename)
          self.dom.dump()

d = DomDumper(sys.argv[1])


Here's the file:


import sys, string, os
from xml.dom import core
from xml.dom import html_builder
from xml.sax import saxexts
from xml.dom.sax_builder import SaxBuilder

class DomHelper:
     def __init__(self,filename=None):
          self.filename = filename
          if filename != None:
               self.dom = self.readFile(filename)

     def readFile(self,filename):
          """Given an XML, HTML, or SGML filename with an appropriate
          file extension, return the DOM document."""

          type = self.getFileType(filename)
          file = open(filename,'r')
          dom = self.readStream(file,type)
          file.close()
          return dom

     def readStream(self,stream,type='XML'):
          if type == 'XML':
               dom = self.readXml(stream)
          elif type == 'HTML':
               dom = self.readHtml(stream)
          elif type == 'SGML':
               dom = self.readSgml(stream)
          else:
               dom = None
          return dom

     def readXml(self,stream,parserName=None):
          """parserName could be 'pyexpat', 'sgmlop', etc."""
          p = saxexts.make_parser(parserName)
          dh = SaxBuilder()
          p.setDocumentHandler(dh)
          p.feed(stream.read())
          doc = dh.document
          p.close()
          return doc

     def readHtml(self,stream):
          b = html_builder.HtmlBuilder()
          b.feed(stream.read())
          b.close()
          doc = b.document
          # There was some bug that prevents the builder from
          # freeing itself (maybe it has already been fixed?).
          # The next two lines break its references to the DOM
          # tree so that it can be freed.
          b.document = None
          b.current_element = None
          return doc

     def readSgml(self,stream):
          # Don't know much about this part.  This could call SX to
          # convert the SGML to XML, then read it in.  That's what I
          # do for some SGML files I need to convert.  Any suggestions?
          print "This is not implemented."

     def getFileType(self,filename):
          """Given a filename, figure out if the file contains XML, HTML,
          or SGML.  For now, use the file extension to make the
          determination."""

          filename = string.lower(filename)
          (name,ext) = os.path.splitext(filename)

          if ext in ('.htm','.html'):
               type = 'HTML'
          elif ext in ('.sgm','.sgml'):
               type = 'SGML'
          elif ext == '.xml':
               type = 'XML'
          else:
               type = '' # should this return None instead?
          return type


if __name__ == '__main__':
     if len(sys.argv) == 2:
          d = DomHelper()
          dom = d.readFile(sys.argv[1])
          dom.dump()
     else:
          print "Usage: python %s <?ML filename>" % sys.argv[0]


From jeremy@allaire.com  Thu Dec 17 20:43:57 1998
From: jeremy@allaire.com (Jeremy Allaire)
Date: Thu, 17 Dec 1998 15:43:57 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <005001be29fd$f80bcc20$2b15b5cd@jallaire_lt.allaire.com>

Gabe-

This is awesome!  Lovin' support for Python in the mix.  Is there anything we
can do to help solve problems on the serialization side?  Also, you should
drop a note to Nate Weiss (nweiss@icesinc.com) who is the creator of the SDK
so he can include your bits and build some samples off of it.

Also, FYI, PCWeek just ran a story on WDDX:
   http://www.zdnet.com/pcweek/stories/news/0,4153,380476,00.html

Thanks and regards,
Jeremy

-----Original Message-----
From: Gabe Wachob <gwachob@aimnet.com>
To: Jeremy Allaire <jeremy@allaire.com>
Cc: 'xml-sig@python.org' <xml-sig@python.org>; Simeon Simeonov
<simeons@allaire.com>
Date: Tuesday, December 15, 1998 7:15 PM
Subject: Re: [XML-SIG] WDDX for Python


>Hi folks-
> In response to this request, I put together a Deserializer (there
>are some issues in serializing that I didn't want to address yet) for WDDX
>data into a python object.
>
> One question I had is this:
>
>In the DTD, you show that a data element can contain one or more of any of
>the data types plus recordset/struct/array. Does this mean that this is a
>valid XML fragment:
>
><data>
><number>43</number>
><struct>
>...
></struct>
></data>
>
>I made the assumption that it was, so in my deserialization, I create an
>object WDDXObject which contains an array items -- in the previous case
>the array would contain a number as its first element, and the struct
>object (WDDXStruct) as its second element.
>
>If data has more than one child, then how do you refer to each child if
>you don't implement the deserialization the way I do with an array as the
>"top level" child of the deserialized object (I ask because I didn't want
>to do it this way, but I couldn't think of another simple way of doing
>it). What if you have two structs with two element variables with the same
>name?
>
>So, anyway, my deserializer fully implements the DTD and the spec as far
>as I understand it. It does not parse the dateTime type (I could throw it
>in a wrapper object with nice methods and all).
>
>The URL is http://www.aimnet.com/~gwachob/software.html
>
>It uses my current rev of my DOMVisitor.py  Everything is not well tested,
>and in fact, may not be the most efficient. However, here it is...
>
> -Gabe
>
>On Sun, 13 Dec 1998, Jeremy Allaire wrote:
>
>> Hello folks-
>>
>> I'm interested in engaging anyone/everyone from the Python community to
>> work with us on a WDDX platform module for Python.  With the help of a few
>> developers, we've been able to muster/ship WDDX modules for ASP/COM, Java,
>> ColdFusion, Perl and JavaScript, and would love to see a Python
>> implementation.
>>
>> Given the recent XML release for Python, seems like it would be a great
>> project to make cross-language distributed web applications even more
>> possible.
>>
>> Take a visit to www.WDDX.org, and most importantly take a view of the SDK,
>> developed by Nate Weiss, which brings it all together with all of the above
>> languages.
>>
>> Best and regards,
>> Jeremy Allaire
>>
>> _______________________________________________
>> XML-SIG maillist  -  XML-SIG@python.org
>> http://www.python.org/mailman/listinfo/xml-sig
>>
>
>-------------------------------------------------------------------
>http://www.aimnet.com/~gwachob               http://www.findlaw.com
>"A popular Government, without popular information, or the means of
>acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps
>both." -- James Madison
>                       import std.disclaimer
>
>


From gwachob@aimnet.com  Thu Dec 17 21:12:18 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Thu, 17 Dec 1998 13:12:18 -0800 (PST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <005001be29fd$f80bcc20$2b15b5cd@jallaire_lt.allaire.com>
Message-ID: <Pine.GSO.4.05.9812171304040.26076-100000@shell1.ncal.verio.com>

On Thu, 17 Dec 1998, Jeremy Allaire wrote:

> Gabe-
> 
> This is awesome!  Lovin support for Python in the mix.  Is there anything we
> can do to help solve problems on the serialization side?  

Well, I wonder aloud whether it's possible (or worth attempting) to write a
serializer for arbitrary python objects. What is the approach taken in
other languages? I have not looked at much WDDX stuff besides the DTD..
(in fact, the first time I had ever looked at the WDDX stuff for more than
a minute was when I sat down to write the Deserializer).

Also, I'm not sure what sort of Python objects or data types would map to
a dateTime WDDX element.

I'm thinking that the best thing to do would be to create a WDDXCreator
object that would work on WDDXObjects (i.e. WDDXStruct, WDDXdateTime, etc.).

I don't know -- looking at how other languages like Java do it would be
instructional..

> Also, you should
> drop a note to Nate Weiss (nweiss@icesinc.com) who is the creator of the SDK
> so he can include your bits and build some samples off of it.

Done

> Also, FYI, PCWeek just ran a story on WDDX:
>    http://www.zdnet.com/pcweek/stories/news/0,4153,380476,00.html

Nice.

I actually have no immediate use for WDDX, nor any past experience in it.
I've recently been getting into XML using Python and your message to
XML-SIG (the Python XML SIG list) was timed perfectly for a "Gee, that
looks like a cool thing to play around with" project... Turns out that
Python is such a cool language that it only took an hour or so to write...

	-Gabe


-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From jeremy@allaire.com  Thu Dec 17 21:31:19 1998
From: jeremy@allaire.com (Jeremy Allaire)
Date: Thu, 17 Dec 1998 16:31:19 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <009001be2a04$95f3a100$2b15b5cd@jallaire_lt.allaire.com>

>Well, I wonder aloud whether it's possible (or worth attempting) to write a
>serializer for arbitrary python objects. What is the approach taken in
>other languages? I have not looked at much WDDX stuff besides the DTD..
>(in fact, the first time I had ever looked at the WDDX stuff for more than
>a minute was when I sat down to write the Deserializer).
>
>Also, I'm not sure what sort of Python objects or data types would map to
>a dateTime WDDX element.
>
>I'm thinking that the best thing to do would be to create a WDDXCreator
>object that would work on WDDXObjects (ie WDDXStruct, WDDXdateTime, etc).
>
>I don't know -- looking at how other languages like Java do it would be
>instructional..


You should look at the Perl and COM implementations -- they're part of the
downloadable SDK, including references and examples.

>I actually have no immediate use for WDDX, nor any past experience in it.
>I've recently been getting into XML using Python and your message to
>XML-SIG (the Python XML SIG list) was timed perfectly for a "Gee, that
>looks like a cool thing to play around with" project... Turns out that
>Python is such a cool language that it only took an hour or so to write...


That's awesome that it was so easy to put together.  I think the serializer
side is doable without a lot of work.  That would then let Python be an
'object server' to any other scripting language on the Web, as opposed to
the deserializer, which would allow Python to be a 'client' to other
distributed web applications.

WDDX is useful for a lot of things.  For one, it allows you to tie together
applications created with different languages.  It also allows you to
expose your Python apps as 'services' that can be leveraged over the net by
any other web application, creating what we're calling 'web syndicate
networks'.  It's even useful for doing rich DHTML/JavaScript front-ends with
Python back-ends, as with WDDX you can pass live objects from your server to
the browser and have them load automagically as JavaScript objects in the
page.  There are some good examples in the SDK of this behavior.

Regards,
Jeremy


From simeons@allaire.com  Thu Dec 17 22:44:20 1998
From: simeons@allaire.com (Simeon Simeonov)
Date: Thu, 17 Dec 1998 17:44:20 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <023c01be2a0e$c92de710$7315b5cd@ssimeonov.allaire.com>

Gabe,

>Also, I'm not sure what sort of Python objects or data types would map to
>a dateTime WDDX element.

To do WDDX serialization you really need to define a set of Python objects /
interfaces that other developers should use. Probably the best example code
to look at is the JavaScript serializer. Here is what I did:

- I created a WddxRecordset object because JS did not have the notion of a
recordset. Internally, I used it just as you deserialize recordsets--as an
object with property arrays. However, making it an object allowed me to
provide custom serialization semantics via a wddxSerialize(serializer)
method.

- All arrays and simple types I mapped to WDDX directly.

- All objects that did not define a custom serialization method I serialized
as structs. This allows for convenient serialization of any JS object.
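
The three rules above could be sketched in Python roughly as follows. This
is only an illustration under stated assumptions, not code from the WDDX
SDK: the function name to_wddx, the hook name wddx_serialize and the
Recordset class are all hypothetical, and the element names reflect one
reading of the WDDX DTD that should be checked against the real thing.

```python
from xml.sax.saxutils import escape, quoteattr


def to_wddx(value):
    """Return a WDDX-style XML fragment for value (small illustrative subset)."""
    # Rule 1: an object may provide its own serialization hook.
    hook = getattr(value, "wddx_serialize", None)
    if callable(hook):
        return hook()
    # Rule 2: simple types and arrays map directly.
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return '<boolean value="%s"/>' % ("true" if value else "false")
    if isinstance(value, (int, float)):
        return "<number>%s</number>" % value
    if isinstance(value, str):
        return "<string>%s</string>" % escape(value)
    if isinstance(value, (list, tuple)):
        body = "".join(to_wddx(item) for item in value)
        return '<array length="%d">%s</array>' % (len(value), body)
    # Rule 3: everything else is written as a struct of its attributes.
    fields = value if isinstance(value, dict) else vars(value)
    body = "".join(
        "<var name=%s>%s</var>" % (quoteattr(name), to_wddx(item))
        for name, item in fields.items()
    )
    return "<struct>%s</struct>" % body


class Recordset:
    """Hypothetical recordset: a mapping of column name -> list of values."""

    def __init__(self, columns):
        self.columns = columns

    def wddx_serialize(self):
        # Only the field names are written here; row data is omitted for brevity.
        names = ",".join(self.columns)
        return "<recordset fieldNames=%s/>" % quoteattr(names)
```

Checking for the hook first means a class can opt out of the generic
struct treatment without the serializer knowing about it in advance.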

Hope this provides some food for thought.

Regards,

Sim
Allaire


From paul@prescod.net  Thu Dec 17 22:07:03 1998
From: paul@prescod.net (Paul Prescod)
Date: Thu, 17 Dec 1998 16:07:03 -0600
Subject: [XML-SIG] WDDX for Python
References: <Pine.GSO.4.05.9812171304040.26076-100000@shell1.ncal.verio.com>
Message-ID: <36798087.FC366A71@prescod.net>

The serializer is a little bit more tricky. We should probably discuss
what the right thing here is.

Gabe Wachob wrote:
> 
> Well, I wonder aloud whether it's possible (or worth attempting) to write a
> serializer for arbitrary python objects. 

Depends on your definition:

 * arbitrary Python instances and a finite list of builtin types? Yes.
 * transient objects such as file handles and Tkinter windows? No.
 * what about objects like compiled regular expressions and ASTs?

According to the Pickle documentation, no C built-ins can be pickled
except the most basic types. I'm surprised that there isn't any way to
make user-defined built-in types (e.g. a C-programmed DOM-node) picklable.
Anyone know more about this? The docs say:

> Classes can further influence how their instances are pickled -- if 
> the class defines the method __getstate__(), it is called and the 
> return state is pickled as the contents for the instance, 

Does this really apply ONLY to classes, or also to built-in types?

Another issue is whether we try to be smart about Python instances that
represent lists of things and mappings. Do we map them to lists and
structs or not?

> Also, I'm not sure what sort of Python objects or data types would map to
> a dateTime WDDX element.

This is a problem I have been discussing in the newsgroup. We would have
to define a WDDX time object and Python programmers could convert
seconds-past-the-epoch integers or time tuple-lists to time objects:
wddx.time( time.gmtime()). It would be nicer to have 1.5.2 contain some
tiny time class but I haven't got any feedback to indicate that that will
happen, so shipping our own is the next best thing.
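
A minimal sketch of such a wrapper, assuming it only needs to accept
seconds since the epoch or a time tuple and hand back an ISO 8601 style
string. The class name WddxTime and its method are hypothetical, and the
exact lexical form WDDX expects for dates is not checked here.

```python
import time


class WddxTime:
    """Hypothetical marker type: tells a serializer this value is a date/time."""

    def __init__(self, value):
        if isinstance(value, (int, float)):
            # Seconds since the epoch: convert to a UTC time tuple first.
            value = time.gmtime(value)
        # Keep year, month, day, hour, minute, second; drop the remaining fields.
        self.fields = tuple(value)[:6]

    def isoformat(self):
        return "%04d-%02d-%02dT%02d:%02d:%02d" % self.fields
```

With this, WddxTime(time.time()) and WddxTime(time.gmtime()) both end up
with the same six-field representation.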

> I'm thinking that the best thing to do would be to create a WDDXCreator
> object that would work on WDDXObjects (ie WDDXStruct, WDDXdateTime, etc).

That's fine for date/time and for the top-level packets, but you don't
want to force the programmer to convert every item in a list (e.g.) to a
WDDX type. That would be onerous.

> I don't know -- looking at how other languages like Java do it would be
> instructional..

I think that Javascript is a better guide because it is a more dynamic
language like Python.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From fdrake@acm.org  Thu Dec 17 23:10:16 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 17 Dec 1998 18:10:16 -0500 (EST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <36798087.FC366A71@prescod.net>
References: <Pine.GSO.4.05.9812171304040.26076-100000@shell1.ncal.verio.com>
 <36798087.FC366A71@prescod.net>
Message-ID: <13945.36696.210677.726104@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > According to the Pickle documentation, no C built-ins can be pickled
 > except the most basic types. I'm surprised that there isn't any way to
 > make user-defined built-in types (e.g. a C-programmed DOM-node) picklable.

Hey Paul!
  You can use the copy_reg module to register pickling operations on
built-in types that aren't already picklable.  To see how to do this from
C, look at Modules/parsermodule.c.
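
From Python, the registration might look roughly like this. It is a sketch
under assumptions: it is written for current Python, where the module is
spelled copyreg rather than copy_reg, and Node is a made-up stand-in for an
extension type, not anything from the DOM package.

```python
import copyreg  # spelled copy_reg in Python 1.x and 2.x


class Node:
    """Hypothetical stand-in for a C-implemented type."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def rebuild_node(name, children):
    return Node(name, children)


def reduce_node(node):
    # A reduction function returns (callable, args): enough to recreate
    # the object by calling callable(*args) later.
    return rebuild_node, (node.name, node.children)


# Register the reduction function for the type.
copyreg.pickle(Node, reduce_node)

# pickle and copy look types up in this table; doing the lookup by hand
# shows what they get back.
reducer = copyreg.dispatch_table[Node]
constructor, args = reducer(Node("root", [Node("leaf")]))
clone = constructor(*args)
```

The manual lookup at the end is only there to make the mechanism visible;
normally pickle does that step itself.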

 > This is a problem I have been discussing in the newsgroup. We would have
 > to define a WDDX time object and Python programmers could convert
 > seconds-past-the-epoch integers or time tuple-lists to time objects:
 > wddx.time( time.gmtime()). It would be nicer to have 1.5.2 contain some

  I've not had time to keep up with the newsgroup / list, but agree we 
need this.  I've thought a little about this for the iso8601 module;
I'd like a class that can represent dates that are "not precise", like
"december, 1998".  The ISO 8601 standard includes such things, and
being able to represent them is useful.  (I've not had time to look at 
mxDateTime yet.)

 > want to force the programmer to convert every item in a list (e.g.) to a
 > WDDX type. That would be onerous.

  Supporting a commonly used type (mxDateTime stuff?) might be the
best way, along with providing a type for people without that extension.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From paul@prescod.net  Thu Dec 17 22:43:43 1998
From: paul@prescod.net (Paul Prescod)
Date: Thu, 17 Dec 1998 16:43:43 -0600
Subject: [XML-SIG] WDDX for Python
References: <009e01be287e$fe0b3410$7315b5cd@ssimeonov.allaire.com>
Message-ID: <3679891F.865B7550@prescod.net>

Simeon Simeonov wrote:
> 
> Think of WDDX as the epitome of the 80/20 rule. 

Maybe 60/40? :)

> When you want to reach such a wide audience you have to make concessions. In
> particular, we had to decide that we couldn't exchange objects because some
> of the target languages have no notion of such.

Objects??? Are we integrating with Perl 4.0?

Okay, what if we just add an *optional* attribute called "type" to
structs. People could ignore it if they want to but as a Python programmer
I wouldn't feel like I was throwing away Really Important Information. 

Also, what if we added an optional "id" attribute and a <REFERENCE>
type...(maybe I can wait on the reference type for WDDX 2, but I'd rather
not)

> I would disagree with you here... How can a Python app exchange data with an
> ecommerce app written in ColdFusion? Or a book browser that's written in
> Perl? Or with Microsoft Word? How can it send a recordset and a three
> dimensional array to a web browser where these data can be used to build
> cool DHTML UI?

If I had to send a 3D array of integers to Perl, I would send a bunch of
lines like this:

23 43 564 234
40 203 03 203 
23 430 23 10

It is presumably two lines of Perl code to split that up and convert it to
integers. To me, the big win comes when I can send an OBJECT to Perl
without dumbing it down into basic types. In fact, I think that the only
features that I need to do this AS WELL AS the native Python tool called
"pickle" is the "type" attribute and ID/IDREF. If I could just not throw
away types then I could at least handle simple, non-recursive data
structures okay (i.e. ID/IDREF can maybe wait).

> And, yes, it is not easy to wrap objects around the data returned by a
> deserializer. Probably the easiest way to do this is to build an object
> factory for particular types of WDDX packets and apply it on the result of
> the deserialization. Whether this will be worth doing depends on your
> application.

Except that packet types aren't self-labelling either. They do have a
place for meta-data, however. If we could provide a place in structs for
arbitrary metadata, we would be almost home.

BTW, wouldn't the packet metadata be more useful if there was some
attribute that let me say what kind of metadata it was, like HTML META
tags?

> Bottom line: WDDX is not a solution for python-python object serialization.
> It can, however, open python apps up and let them communicate with a _huge_
> number of other applications.

Sure, but we're so close to making it useful for Python->Python and (more
interesting) Python->arbitrary OO language (including Perl 5) object
exchange. I think that all we need is one attribute.

The attribute should contain a URI (URIs are language independent) and
each deserializer could have a mapping from URIs to class constructors.
Languages that don't have a notion of class would ignore the URI. URIs are
verbose but of course they compress beautifully.
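
A minimal sketch of that lookup on the deserializing side, assuming the
struct has already been decoded into a dictionary of fields. The URI, the
Point class and the build function are hypothetical names used only for
illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Hypothetical registry: type URI -> constructor taking the struct's fields.
constructors = {
    "http://example.org/types/point": Point,
}


def build(fields, type_uri=None):
    """Turn a decoded struct into an object if its type URI is known."""
    constructor = constructors.get(type_uri)
    if constructor is None:
        # No type attribute, or a URI nobody registered: keep the plain struct.
        return dict(fields)
    return constructor(**fields)
```

A deserializer that knows nothing about the URI falls through to the
plain dictionary, which is the "ignore it" behaviour described above.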

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From deltab@ps.cus.umist.ac.uk  Thu Dec 17 23:25:09 1998
From: deltab@ps.cus.umist.ac.uk (Daniel Biddle)
Date: Thu, 17 Dec 1998 23:25:09 +0000 (GMT)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <13945.36696.210677.726104@weyr.cnri.reston.va.us>
Message-ID: <Pine.LNX.3.96.981217231714.16079A-100000@ps.cus.umist.ac.uk>

On 1998-12-17 (Thu) Fred L. Drake wrote:

>   I've not had time to keep up with the newsgroup / list, but agree we 
> need this.  I've thought a little about this for the iso8601 module;
> I'd like a class that can represent dates that are "not precise", like
> "december, 1998".  The ISO 8601 standard includes such things, and
> being able to represent them is useful.  (I've not had time to look at 
> mxDateTime yet.)

Does it? I've typed out the whole standard and am about to convert it into
HTML, and I've not noticed anything like "december, 1998" being possible.
Do you mean "1998-12"?

-- Daniel Biddle


From simeons@allaire.com  Thu Dec 17 23:52:36 1998
From: simeons@allaire.com (Simeon Simeonov)
Date: Thu, 17 Dec 1998 18:52:36 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com>

Hi, Paul!

Many nice comments here.

>Okay, what if we just add an *optional* attribute called "type" to
>structs. People could ignore it if they want to but as a Python programmer
>I wouldn't feel like I was throwing away Really Important Information.

Yup, this is probably the easiest way to go about providing some basic
object serialization. I don't have a problem with this.

>Also, what if we added an optional "id" attribute and a <REFERENCE>
>type...(maybe I can wait on the reference type for WDDX 2, but I'd rather
>not)

This is a much nastier problem as it complicates and slows down both the
serialization and deserialization algorithms. Not that it's a difficult
thing to implement, but it does require the maintenance of data global to
the entire serialization/deserialization process and it slows the process
down considerably. We should probably handle this by optionally notifying
the serializer/deserializer that they are dealing with aggregate data and no
references.

>> I would disagree with you here... How can a Python app exchange data with an
>> ecommerce app written in ColdFusion? Or a book browser that's written in
>> Perl? Or with Microsoft Word? How can it send a recordset and a three
>> dimensional array to a web browser where these data can be used to build
>> cool DHTML UI?
>
>If I had to send a 3D array of integers to Perl, I would send a bunch of
>lines like this:
>
>23 43 564 234
>40 203 03 203
>23 430 23 10
>
>It is presumably two lines of Perl code to split that up and convert it to
>integers. To me, the big win comes when I can send an OBJECT to Perl
>without dumbing it down into basic types. In fact, I think that the only
>features that I need to do this AS WELL AS the native Python tool called
>"pickle" is the "type" attribute and ID/IDREF. If I could just not throw
>away types then I could at least handle simple, non-recursive data
>structures okay (i.e. ID/IDREF can maybe wait).

Humor me and try to do the same using JavaScript or VBScript. Humor me even
further and exchange an array of arbitrary strings in a safe and efficient
manner. :) I think you'll find the problem unpleasantly fickle...
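
To make the string case concrete, here is a small sketch (the sample
strings and the array/string element names are only illustrative). A
whitespace-delimited line loses the boundaries between strings, while an
escaped XML encoding keeps them.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

strings = ["two words", "a,b", "<tag> & more"]

# Naive approach: join on a delimiter and split on the other side.
naive = " ".join(strings)
naive_back = naive.split(" ")  # boundaries between the original strings are lost

# XML approach: escape each string and wrap it in its own element.
packet = "<array>%s</array>" % "".join(
    "<string>%s</string>" % escape(s) for s in strings
)
xml_back = [element.text or "" for element in ET.fromstring(packet)]
```

Picking a rarer delimiter only moves the problem; some form of escaping
is needed as soon as the strings are arbitrary.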

>> Bottom line: WDDX is not a solution for python-python object serialization.
>> It can, however, open python apps up and let them communicate with a _huge_
>> number of other applications.
>
>Sure, but we're so close to making it useful for Python->Python and (more
>interesting) Python->arbitrary OO language (including Perl 5) object
>exchange. I think that all we need is one attribute.
>
>The attribute should contain a URI (URIs are language independent) and
>each deserializer could have a mapping from URIs to class constructors.
>Languages that don't have a notion of class would ignore the URI. URIs are
>verbose but of course they compress beautifully.


I agree with you here. Do you have a particular URI type (look'n'feel) in
mind?

Sim


From paul@prescod.net  Fri Dec 18 03:54:15 1998
From: paul@prescod.net (Paul Prescod)
Date: Thu, 17 Dec 1998 21:54:15 -0600
Subject: [XML-SIG] WDDX for Python
References: <Pine.GSO.4.05.9812171304040.26076-100000@shell1.ncal.verio.com>
 <36798087.FC366A71@prescod.net> <13945.36696.210677.726104@weyr.cnri.reston.va.us>
Message-ID: <3679D1E7.B6881DB4@prescod.net>

"Fred L. Drake" wrote:
> 
>   You can use the copy_reg module to register pickling operations on
> built-in types that aren't already picklable.  To see how to do this from
> C, look at Modules/parsermodule.c.

I'll have to implement a similar module for WDDX. I can't use copy_reg
because WDDX has a cross-language requirement. I can't encode the type
name in terms of modules and constructor functions: I must indirect
through a URI.

>   Support for a commonly used type (mxDateTime stuff?) might be the
> best way, and provide a type for people without that extension.

I can make a two-way registry which describes mappings in both directions.
Then I'll prime the registry with any date/time classes people give me URLs
for. Then users can choose their own date/time class. Whichever one appears
as input (mxDateTime, /F's, etc.) will get interpreted correctly as a date.
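
Roughly, such a two-way registry could look like the sketch below. All of
the names are hypothetical (the URI, TimeTuple, register, encode, decode),
and datetime.datetime merely stands in for whatever date/time classes get
registered. Several classes may map to the same URI on the way out, and
the user's preferred class is used on the way back in.

```python
import datetime
from collections import namedtuple

DATETIME_URI = "http://example.org/wddx/dateTime"  # hypothetical URI

TimeTuple = namedtuple("TimeTuple", "year month day hour minute second")

outgoing = {}  # Python type -> (uri, function producing text)
incoming = {}  # uri -> function building an object from text


def register(uri, py_type, to_text, from_text, preferred=False):
    outgoing[py_type] = (uri, to_text)
    # Only the preferred (or first) class is used when reading a URI back.
    if preferred or uri not in incoming:
        incoming[uri] = from_text


def tuple_to_text(t):
    return "%04d-%02d-%02dT%02d:%02d:%02d" % t


def tuple_from_text(text):
    d = datetime.datetime.fromisoformat(text)
    return TimeTuple(d.year, d.month, d.day, d.hour, d.minute, d.second)


register(DATETIME_URI, datetime.datetime,
         lambda d: d.isoformat(), datetime.datetime.fromisoformat)
register(DATETIME_URI, TimeTuple, tuple_to_text, tuple_from_text, preferred=True)


def encode(obj):
    uri, to_text = outgoing[type(obj)]
    return uri, to_text(obj)


def decode(uri, text):
    return incoming[uri](text)
```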

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From paul@prescod.net  Fri Dec 18 03:02:04 1998
From: paul@prescod.net (Paul Prescod)
Date: Thu, 17 Dec 1998 21:02:04 -0600
Subject: [XML-SIG] WDDX for Python
References: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com>
Message-ID: <3679C5AC.83D10000@prescod.net>

Simeon Simeonov wrote:
> 
> Yup, this is probably the easiest way to go about providing some basic
> object serialization. I don't have a problem with this.

Great!

> I agree with you here. Do you have a particular URI type (look'n'feel) in
> mind?

There are three conventions that should be followed:

 * SGML convention is that the URI should be to a document describing the
object type. That way if you ever "find" a packet, (e.g. as a
serialization of a large data structure) then you can research it.

 * XML Namespaces convention is that applications should not depend on any
particular type of data at the other end (or of the URI pointing to
anything at all)

 * general URL convention is that you or your organization should own the
domain name.

> >Also, what if we added an optional "id" attribute and a <REFERENCE>
> >type...(maybe I can wait on the reference type for WDDX 2, but I'd rather
> >not)
> 
> This is a much nastier problem as it complicates and slows down both the
> serialization and deserialization algorithms. Not that it's a difficult
> thing to implement, but it does require the maintenance of data global to
> the entire serialization/deserialization process and it slows the process
> down considerably. We should probably handle this by optionally notifying
> the serializer/deserializer that they are dealing with aggregate data and no
> references.

I admit that this increases the complexity a lot. The biggest problem is
dealing with mutually recursive references between objects: especially in
strongly typed programming languages. In dynamically typed languages you
can easily build proxies for the object that isn't available yet. In a
static language I don't know offhand what you would do.
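
For the dynamic case, a minimal sketch of the bookkeeping involved. To stay
short it encodes to nested tuples rather than XML, and the tags "value",
"ref", "array" and "struct" are made up for this example. The memo
dictionary is the global state mentioned above; mutual references work
because each container is registered before its children are filled in.

```python
def encode(obj, memo=None):
    """Encode nested lists/dicts; repeated containers become ("ref", n)."""
    if memo is None:
        memo = {}
    if not isinstance(obj, (list, dict)):
        return ("value", obj)
    key = id(obj)
    if key in memo:
        return ("ref", memo[key])
    ident = len(memo)
    memo[key] = ident
    if isinstance(obj, list):
        return ("array", ident, [encode(item, memo) for item in obj])
    return ("struct", ident, {k: encode(v, memo) for k, v in obj.items()})


def decode(node, seen=None):
    """Rebuild the structure, resolving ("ref", n) to already-built containers."""
    if seen is None:
        seen = {}
    kind = node[0]
    if kind == "value":
        return node[1]
    if kind == "ref":
        return seen[node[1]]
    _, ident, body = node
    if kind == "array":
        result = seen[ident] = []  # register first, then fill in
        for item in body:
            result.append(decode(item, seen))
    else:
        result = seen[ident] = {}
        for name, item in body.items():
            result[name] = decode(item, seen)
    return result
```

This relies on the containers being mutable; for immutable or statically
typed targets some kind of placeholder would still be needed.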

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From akuchlin@cnri.reston.va.us  Fri Dec 18 04:22:35 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Thu, 17 Dec 1998 23:22:35 -0500
Subject: [XML-SIG] Recent CVS changes
Message-ID: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>

Some stuff that's been added to the XML CVS tree tonight:

	* Jeff Johnson's DomHelper class has been added to
xml.dom.utils, renamed to FileReader and with some minor changes to
allow passing in a file-like object.  I hope I didn't break anything
in those changes.

	* While waiting for a friend to show up for dinner, I got my
generic marshalling code finished and cleaned up, and also worked on
subclassing it to handle WDDX and XML-RPC, finishing neither of them
but getting pretty close.  XML-RPC is complete except for the
datetime.iso8601 type; I'm not sure how the caller should pass in
something to be marshalled as a date.  (This ties in to the absence of
a standard date-time type.)  WDDX is still missing dateTime,
recordSet, and some other things I can't remember.  Another hour should
suffice to finish it.  (That's what I like about Python: writing 90%
of the code takes 10% of the time, and the other 10% also takes 10% of
the time.)

	I'd be interested in seeing what people think of
xml.marshal.generic; does its structure seem easily amenable to
further subclassing to implement other data serializers?  Also, does
anyone know of other DTDs for data serialization?  I'd like to take a
crack at implementing them all, and seeing if they're all fairly clean 
to implement.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Who was it that designed brown envelopes? I feel sure that he hated people
whoever he was. I wonder where he's buried?
    -- Tom Baker, in his autobiography


From paul@prescod.net  Fri Dec 18 06:28:35 1998
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Dec 1998 00:28:35 -0600
Subject: [XML-SIG] Marshalling
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
Message-ID: <3679F613.68A22D40@prescod.net>

"A.M. Kuchling" wrote:
> 
>        I'd be interested in seeing what people think of
> xml.marshal.generic; does its structure seem easily amenable to
> further subclassing to implement other data serializers?  Also, does
> anyone know of other DTDs for data serialization?  I'd like to take a
> crack at implementing them all, and seeing if they're all fairly clean 
> to implement.

It looks like you've put a lot of thought into it, so please forgive my
random, partially thought-out questions:

 * why have a single class for marshalling and unmarshalling?

 * this stuff is a little weird: "m = self.__class__()" Could we put all
of the mutable data in a separate class and avoid it? Maybe I'm just
skittish about strange idioms...

 * Could m_unimplemented be called by default for unhandled classes?

 * Maybe string handling should be safer...i.e. control characters

User defined types issues:

 1. What do we do about instances? I suggest looping over data-properties
and saving them as named structs. The names should be unique URIs.

 2. what do we do about built-in types (i.e. complex)? I suggest using
copy_reg to deconstruct ... and using URI-named structs again.

 3. pickle uses various magic methods: __reduce__, __getinitargs__,
__getstate__. Should XML marshalling support some or all of that stuff?
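
As a rough illustration of the hooks named in point 3, here is a minimal sketch for current Python 3 (not the 1.5-era interpreter this thread was written against). The Account class and its fields are hypothetical. copy.copy() is used only because it consults the same __getstate__/__setstate__ hooks that pickle does; an XML marshaller that wanted to honour them would presumably call them the same way.

```python
import copy


class Account:
    """Hypothetical class with one attribute that should not be serialized."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self.cache = {"hits": 3}   # transient data, not worth saving

    def __getstate__(self):
        # Hand the serializer a copy of the instance dict minus the cache.
        state = self.__dict__.copy()
        del state["cache"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and recreate the transient one.
        self.__dict__.update(state)
        self.cache = {}


original = Account("amk", 10)
clone = copy.copy(original)   # goes through __getstate__ / __setstate__
```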

My modest contribution is the following code which handles the mapping
from URIs to types and also registers types with copy_reg .

"""Type_reg.py

Type registry -- mapping from URLs to builders and decomposers.
"""
import copy_reg

registry={}

def register( url, type, pickle_function, constructor ):
	copy_reg.pickle( type, pickle_function, constructor )
	registry[url]=type, constructor

def rebuild( url, args ):
	type, cons = registry[url]
	return apply( cons, args  )

def decompose( obj ):
	pickle_function = copy_reg.dispatch_table[type( obj )]
	return pickle_function( obj )[1]

register( "http://www.python.org/doc/ref/types.html#complex", 
	type( 1j ), copy_reg.pickle_complex, complex )

### Todo: register various date/time types
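
For readers on current Python, the same idea can be sketched as follows. This is an adaptation under stated assumptions, not the code posted above: copy_reg is now called copyreg, apply() is gone, and the helper reduce_complex is a local stand-in rather than anything the standard library documents.

```python
import copyreg

# Maps a type URI to (type, constructor).
registry = {}


def register(uri, type_, reduce_func, constructor):
    """Register a type with copyreg and remember it under its URI."""
    copyreg.pickle(type_, reduce_func)
    registry[uri] = (type_, constructor)


def rebuild(uri, args):
    """Rebuild an object from its URI and constructor arguments."""
    _type, constructor = registry[uri]
    return constructor(*args)


def decompose(obj):
    """Return the constructor arguments for a registered object."""
    reduce_func = copyreg.dispatch_table[type(obj)]
    return reduce_func(obj)[1]


def reduce_complex(c):
    # Local stand-in: reduce a complex number to (constructor, args).
    return complex, (c.real, c.imag)


COMPLEX_URI = "http://www.python.org/doc/ref/types.html#complex"
register(COMPLEX_URI, complex, reduce_complex, complex)

args = decompose(3 + 4j)
value = rebuild(COMPLEX_URI, args)
```

Here decompose() yields the argument tuple and rebuild() turns it back into an equal complex number.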

--

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From gstein@lyra.org  Fri Dec 18 08:34:03 1998
From: gstein@lyra.org (Greg Stein)
Date: Fri, 18 Dec 1998 00:34:03 -0800
Subject: [XML-SIG] WDDX for Python
References: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com> <3679C5AC.83D10000@prescod.net>
Message-ID: <367A137B.470C3AC4@lyra.org>

Paul Prescod wrote:
> ...
>  * XML Namespaces convention is that applications should not depend on any
> particular type of data at the other end (or of the URI pointing to
> anything at all)

In short: it is a URI, not a URL. It doesn't locate anything; it just
identifies something uniquely.

Nominally, if an XML element looks like:

<foo:ELEM xmlns:foo="URI_goes_here"/>

Then, the element is uniquely identified as "URI_goes_hereELEM" (they're
appended). In a more familiar form, you might have a URI of
"http://my.domain.com/some_app/xml_elems/" so that you end up with final
URIs like "http://my.domain.com/some_app/xml_elems/ELEM"
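
The appending rule described above can be sketched in a few lines of Python 3. The function name expand_name and the prefix table are hypothetical, and plain concatenation is used only because that is the convention this message describes.

```python
def expand_name(qname, namespaces):
    """Expand 'prefix:local' to namespace URI + local name.

    `namespaces` maps prefixes to URIs; the empty string stands for a
    default namespace, if one has been declared.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No prefix: use the default namespace when there is one.
        return namespaces.get("", "") + qname
    return namespaces[prefix] + local


declared = {"foo": "http://my.domain.com/some_app/xml_elems/"}
full_name = expand_name("foo:ELEM", declared)
```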

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From Sjoerd.Mullender@cwi.nl  Fri Dec 18 12:11:07 1998
From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender)
Date: Fri, 18 Dec 1998 13:11:07 +0100
Subject: [XML-SIG] Open issues: Namespaces and Unicode
In-Reply-To: Your message of Thu, 17 Dec 1998 11:42:11 -0500.
 <13945.13411.498583.532812@weyr.cnri.reston.va.us>
References: <199812170148.UAA00786@207-172-59-116.s306.tnt2.ann.erols.com>
 <13945.13411.498583.532812@weyr.cnri.reston.va.us>
Message-ID: <UTC199812181211.NAA20132.sjoerd@bireme.cwi.nl>

On Thu, Dec 17 1998 "Fred L. Drake" wrote:

>  >      2) Namespace support.  

In my private version of xmllib I have support for XML namespaces.  I
haven't submitted this version to Guido yet for several reasons:
- The namespace support (at least for the current namespace proposal)
  is very new (like 1 day).
- My current version isn't compatible with the old version that is in
  the Python core.
- I haven't documented the new interface yet.

Is anybody interested in taking a look at my new version anyway?

The most important API changes are:
- I don't look to see if there are any methods with a name matching
  start_TAG and end_TAG since TAG can contain characters that aren't
  allowed in Python identifiers.  Instead I look in a dictionary that
  maps tag names to start and end methods.
- You can specify the valid attributes and default values for all
  elements.  The way this is done has also changed.
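
A minimal sketch of the dictionary-based lookup described in the first point, in current Python 3. The TagDispatcher class, its method names and the sample tag are hypothetical; this is not the interface of the new xmllib, only an illustration of why a dictionary is needed.

```python
class TagDispatcher:
    """Route tags to handlers through a dictionary, not method names."""

    def __init__(self):
        self.events = []
        # "dc:date.created" is a legal XML name but not a legal Python
        # identifier, so a start_<tag> method could never be written.
        self.elements = {
            "dc:date.created": (self.start_created, self.end_created),
        }

    def start_created(self, attrs):
        self.events.append(("start", attrs))

    def end_created(self):
        self.events.append(("end",))

    def handle_starttag(self, tag, attrs):
        start, _end = self.elements.get(tag, (None, None))
        if start is None:
            self.events.append(("unknown", tag))
        else:
            start(attrs)

    def handle_endtag(self, tag):
        _start, end = self.elements.get(tag, (None, None))
        if end is not None:
            end()


d = TagDispatcher()
d.handle_starttag("dc:date.created", {"scheme": "ISO8601"})
d.handle_endtag("dc:date.created")
d.handle_starttag("other", {})
```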

-- Sjoerd Mullender <Sjoerd.Mullender@cwi.nl>
   <URL:http://www.cwi.nl/~sjoerd/>


From akuchlin@cnri.reston.va.us  Fri Dec 18 14:13:59 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 18 Dec 1998 09:13:59 -0500 (EST)
Subject: [XML-SIG] Marshalling
In-Reply-To: <3679F613.68A22D40@prescod.net>
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
 <3679F613.68A22D40@prescod.net>
Message-ID: <13946.24429.905213.372579@amarok.cnri.reston.va.us>

Paul Prescod writes:
> * why have a single class for marshalling and unmarshalling?

	My fuzzy argument for this was that I wanted the user to write
only a single subclass, not two of them. 

> * this stuff is a little weird: "m = self.__class__()" Could we put all
>of the mutable data in a separate class and avoid it? Maybe I'm just
>skittish about strange idioms...

	Probably data_stack shouldn't be an attribute of the class,
but be passed to each of the unmarshalling functions.  That would mean
that the Marshaller class would have no mutable attributes at all, and
the self.__class__ thing would be unnecessary.

> * Could m_unimplemented be called by default for unhandled classes?

	Good point; I'll clean that up, and also make the listing of
unmarshalling functions tidier.

> * Maybe string handling should be safer...i.e. control characters

	Shouldn't control characters, such as chr(9) or chr(7) be
fine?  The code already escapes <,&,>, and aren't those the only
characters to worry about?  

	Another potential problem is that on unmarshalling, the XML
parser may change newlines around inside your string.  If you care,
then you'd have to base64-encode all your strings.  I may add code to
check for Tim Bray's proposed attribute, xml:packed="base64" (or
whatever it is), and automatically decode it.
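
To make the two options concrete, here is a small Python 3 sketch. The function names are hypothetical; only the three-character escaping and the base64 fallback come from the discussion above.

```python
import base64
from xml.sax.saxutils import escape


def escape_text(text):
    """Escape &, < and > so the text can sit inside an element."""
    return escape(text)


def pack_payload(text):
    """Base64-encode text so a parser cannot alter its newlines."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def unpack_payload(payload):
    """Reverse pack_payload()."""
    return base64.b64decode(payload).decode("utf-8")


escaped = escape_text("a<b&c>d")
round_trip = unpack_payload(pack_payload("one\r\ntwo"))
```

The base64 form costs roughly a third more space, which is why it would only be worth applying to strings whose exact line endings matter.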

>User defined types issues:
> 1. What do we do about instances? I suggest looping over data-properties
>and saving them as named structs. The names should be unique URIs.

> 2. what do we do about built-in types (i.e. complex)? I suggest using
>copy_reg to deconstruct ... and using URI-named structs again.

	The generic code actually does complex numbers, but I see your 
point.  

> 3. pickle uses various magic methods: __reduce__, __getinitargs__,
>__getstate__. Should XML marshalling support some or all of that stuff?

 <sigh> Definitely, if it supports generic Python instances.  However, 
I'm less interested in reproducing pickle in XML than in providing a
base for supporting all the various DTDs that are popping up.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
A wise man can do no better than to turn from the churches and look up through
the airy majesty of the wayside trees with exultation, with resignation, at
the unconquerable unimplicated sun.
    -- Llewelyn Powys, _The Pathetic Fallacy_


From fdrake@acm.org  Fri Dec 18 15:03:09 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 18 Dec 1998 10:03:09 -0500 (EST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <Pine.LNX.3.96.981217231714.16079A-100000@ps.cus.umist.ac.uk>
References: <13945.36696.210677.726104@weyr.cnri.reston.va.us>
 <Pine.LNX.3.96.981217231714.16079A-100000@ps.cus.umist.ac.uk>
Message-ID: <13946.28333.391076.100292@weyr.cnri.reston.va.us>

I wrote:
 > "december, 1998".  The ISO 8601 standard includes such things, and
 > being able to represent them is useful.  (I've not had time to look at 

Daniel Biddle replied:
 > Does it? I've typed out the whole standard and am about to convert it into
 > HTML, and I've not noticed anything like "december, 1998" being possible.
 > Do you mean "1998-12"?

  Yes.  I was not meaning that the syntax I presented was ISO 8601
compliant, only that the date I described was expressible.
  Sorry for any confusion; the ISO 8601 syntax is quite strict, and
terse (and appropriately so).
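
  For example, the month-precision form can be produced with a couple
of lines of Python 3 (the helper name is made up for this sketch):

```python
def iso8601_year_month(year, month):
    """Format a month-precision date in the ISO 8601 YYYY-MM form."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return "%04d-%02d" % (year, month)


december = iso8601_year_month(1998, 12)
```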


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From paul@prescod.net  Fri Dec 18 15:04:55 1998
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Dec 1998 09:04:55 -0600
Subject: [XML-SIG] Marshalling
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
 <3679F613.68A22D40@prescod.net> <13946.24429.905213.372579@amarok.cnri.reston.va.us>
Message-ID: <367A6F17.3823C9B3@prescod.net>

"Andrew M. Kuchling" wrote:
> 
>         My fuzzy argument for this was that I wanted the user to write
> only a single subclass, not two of them.

Consider having some kind of DTD-adapter class. Python is sufficiently
flexible that sometimes delegation and adapters are simpler than
subclassing.

>         Probably data_stack shouldn't be an attribute of the class,
> but be passed to each of the unmarshalling functions.  That would mean
> that the Marshaller class would have no mutable attributes at all, and
> the self.__class__ thing would be unnecessary.

Good idea.

> > * Maybe string handling should be safer...i.e. control characters
> 
>         Shouldn't control characters, such as chr(9) or chr(7) be
> fine?  The code already escapes <,&,>, and aren't those the only
> characters to worry about?

chr(9), yes. chr(7) no.

From REC-XML:

Char ::=  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
              | [#x10000-#x10FFFF]
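
A check against that production might look like the following Python 3 sketch. The names are hypothetical, and it assumes the input is already a Unicode string rather than raw bytes.

```python
import re

# Anything outside the Char production of REC-XML is illegal.
_ILLEGAL_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_legal_xml_text(text):
    """Return True if every character in text matches Char."""
    return _ILLEGAL_XML_CHAR.search(text) is None


tab_ok = is_legal_xml_text(chr(9))
bell_ok = is_legal_xml_text(chr(7))
```

A marshaller could use such a test to decide when a string has to be encoded some other way, since escaping alone cannot make chr(7) legal.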

>  <sigh> Definitely, if it supports generic Python instances.  However,
> I'm less interested in reproducing pickle in XML than in providing a
> base for supporting all the various DTDs that are popping up.

Presumably the number of new DTDs is going to slow down. That territory on
the noosphere is getting crowded.

To me, transporting instances is the difference between being useful and
being mildly convenient. Allaire has agreed to support my "type"
attribute, which strikes me as the major thing required to make this stuff
useful for Python->Python object transmission.

Also, I think it would be a good idea for Python's ASCII pickle format to
(eventually!) be standards-based (i.e. WDDX or something). Sure, it would
result in a blow-up, but ASCII pickle is already verbose and slow. Given
the choice between proprietary, verbose and slow or open, really verbose
and very slow, I think that the latter would be better. If ASCII pickle is
intended for human readability and debugging, then why not make it more
readable and even editable in XML editors?

The whole basis for WDDX and XML-RPC is that XML is bloody verbose but it
is also very human-friendly.

Anyhow, I'm not trying to invent work for you. If there is some easy way I
can add instance marshalling support to only the WDDX subclass (or
"adapter") then I will do that. We can migrate it towards full pickle
functionality when and if it becomes popular enough to justify the work.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From fdrake@acm.org  Fri Dec 18 15:35:48 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 18 Dec 1998 10:35:48 -0500 (EST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <3679D1E7.B6881DB4@prescod.net>
References: <Pine.GSO.4.05.9812171304040.26076-100000@shell1.ncal.verio.com>
 <36798087.FC366A71@prescod.net>
 <13945.36696.210677.726104@weyr.cnri.reston.va.us>
 <3679D1E7.B6881DB4@prescod.net>
Message-ID: <13946.30292.59954.75623@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > I'll have to implement a similar module for WDDX. I can't use copy_reg
 > because WDDX has a cross-language requirement. I can't encode the type

  Yes; my response was only to the pickle part of your question.  I
don't see why there can't be an xml.wddx.registry module or something
like that which implements the specific mechanics.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From fdrake@acm.org  Fri Dec 18 15:58:23 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 18 Dec 1998 10:58:23 -0500 (EST)
Subject: [XML-SIG] Marshalling
In-Reply-To: <13946.24429.905213.372579@amarok.cnri.reston.va.us>
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
 <3679F613.68A22D40@prescod.net>
 <13946.24429.905213.372579@amarok.cnri.reston.va.us>
Message-ID: <13946.31647.653312.303899@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > * why have a single class for marshalling and unmarshalling?

Andrew M. Kuchling writes:
 > 	My fuzzy argument for this was that I wanted the user to write
 > only a single subclass, not two of them. 

  Now's my turn to say "this is bogus".  This is bogus.  It's
entirely appropriate to separate the two functions.  This also makes
sense if you only need to support one or the other for some format not 
provided with the base package.  There is precedent for separate
classes in pickle and xdrlib.
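
  The pickle half of that precedent looks like this in current
Python 3 (xdrlib has the same split between Packer and Unpacker, but
it has since been removed from the standard library, so it is left out
of this sketch):

```python
import io
import pickle

buffer = io.BytesIO()

# Writing side: a Pickler knows nothing about reading.
pickle.Pickler(buffer).dump({"a": 1, "b": [2, 3]})

# Reading side: a separate Unpickler consumes the same stream.
buffer.seek(0)
restored = pickle.Unpickler(buffer).load()
```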

Paul Prescod writes:
 > 3. pickle uses various magic methods: __reduce__, __getinitargs__,
 >__getstate__. Should XML marshalling support some or all of that stuff?

Andrew M. Kuchling writes:
 >  <sigh> Definitely, if it supports generic Python instances.  However, 
 > I'm less interested in reproducing pickle in XML than in providing a
 > base for supporting all the various DTDs that are popping up.

  This seems to be an issue for the specific subclasses; some systems
will support more than others, and the Python implementations should
"do the right thing" as appropriate for the specific requirements.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From fdrake@acm.org  Fri Dec 18 16:08:47 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Fri, 18 Dec 1998 11:08:47 -0500 (EST)
Subject: [XML-SIG] WDDX for Python
In-Reply-To: <3679C5AC.83D10000@prescod.net>
References: <027501be2a18$52bc0cb0$7315b5cd@ssimeonov.allaire.com>
 <3679C5AC.83D10000@prescod.net>
Message-ID: <13946.32271.869068.621724@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > dealing with mutually recursive references between objects: especially in
 > strongly typed programming languages. In dynamically typed languages you
 > can easily build proxies for the object that isn't available yet. In a
 > static language I don't know offhand what you would do.

  Either static or dynamic languages can be supported using a patch
list.  Using a patch list eliminates the need to construct proxies as
well.
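
  One way to read "patch list", sketched in Python 3: build every
object first, note each reference that cannot be resolved yet, and
fill those in once all objects exist.  The record layout and the
build_graph name are assumptions made for this illustration.

```python
def build_graph(records):
    """Build mutually referencing objects from flat records.

    `records` maps an id to {"name": ..., "refs": {attr: target_id}}.
    """
    objects = {}
    patches = []   # (object, attribute name, target id) still to resolve

    for obj_id, record in records.items():
        node = {"name": record["name"]}
        objects[obj_id] = node
        for attr, target_id in record["refs"].items():
            if target_id in objects:
                node[attr] = objects[target_id]
            else:
                # Target not built yet: leave a hole and remember it.
                node[attr] = None
                patches.append((node, attr, target_id))

    # Second pass: every target exists now, so fill in the holes.
    for node, attr, target_id in patches:
        node[attr] = objects[target_id]
    return objects


graph = build_graph({
    "a": {"name": "first", "refs": {"peer": "b"}},
    "b": {"name": "second", "refs": {"peer": "a"}},
})
```

  Nothing here depends on dynamic typing: the placeholder is an empty
slot of the final type rather than a proxy object.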


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From simeons@allaire.com  Fri Dec 18 16:06:06 1998
From: simeons@allaire.com (Simeon Simeonov)
Date: Fri, 18 Dec 1998 11:06:06 -0500
Subject: [XML-SIG] WDDX for Python
Message-ID: <029b01be2aa0$51b939e0$7315b5cd@ssimeonov.allaire.com>

>There are three conventions that should be followed:
>
> * SGML convention is that the URI should be to a document describing the
>object type. That way if you ever "find" a packet, (e.g. as a
>serialization of a large data structure) then you can research it.
>
> * XML Namespaces convention is that applications should not depend on any
>particular type of data at the other end (or of the URI pointing to
>anything at all)
>
> * general URL convention is that you or your organization should own the
>domain name.
>


I like the XML Namespaces approach. The type URI should be no more than a
unique ID that both ends of a data exchange will use (most likely) to plug
into some kind of an object factory. So a generic Python object can
serialize its data to a structure w/ a type= attribute obtained from this
object factory.
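
A minimal Python 3 sketch of that arrangement follows. The factory table, the dict-based "struct" and the urn:example:point identifier are all hypothetical; the point is only that the type URI is an opaque key which both ends agree on.

```python
# Maps an opaque type URI to the class that rebuilds such objects.
factories = {}


def register_type(type_uri, cls):
    factories[type_uri] = cls


def to_struct(obj, type_uri):
    """Serialize an instance to a struct tagged with its type URI."""
    return {"type": type_uri, "fields": dict(vars(obj))}


def from_struct(struct):
    """Look up the type URI in the factory and rebuild the instance."""
    cls = factories[struct["type"]]
    obj = cls.__new__(cls)           # bypass __init__, as pickle does
    obj.__dict__.update(struct["fields"])
    return obj


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


POINT_URI = "urn:example:point"
register_type(POINT_URI, Point)

struct = to_struct(Point(1, 2), POINT_URI)
point = from_struct(struct)
```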

Sim
Allaire


From ken@bitsko.slc.ut.us  Fri Dec 18 16:51:26 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: Fri, 18 Dec 1998 10:51:26 -0600 (CST)
Subject: [XML-SIG] Recent CVS changes
In-Reply-To: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com> from "A.M. Kuchling" at Dec 17, 98 11:22:35 pm
Message-ID: <199812181651.KAA31392@bitsko.slc.ut.us>

Andrew Kuchling wrote:
>	I'd be interested in seeing what people think of
> xml.marshal.generic; does its structure seem easily amenable to further
> subclassing to implement other data serializers?  Also, does anyone
> know of other DTDs for data serialization?  I'd like to take a crack
> at implementing them all, and seeing if they're all fairly clean
> to implement.

Another is LDO's XML serialization:

  <http://www.ntlug.org/cgi-bin/cvsweb/LDO/ldo-xml.dtd>

The DTD itself has basic specs and I hope to complete more docs over
Christmas vacation.

  -- Ken


From Sjoerd.Mullender@cwi.nl  Fri Dec 18 17:33:02 1998
From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender)
Date: Fri, 18 Dec 1998 18:33:02 +0100
Subject: [XML-SIG] New version of xmllib
Message-ID: <UTC199812181733.SAA10767.sjoerd@bireme.cwi.nl>

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <10433.914002226.1@bireme.cwi.nl>

Here is my current version of xmllib.py and the documentation.  This
version has some API changes with respect to the version currently in
Python (also the one in 1.5.2a).
This version supports XML namespaces.

-- Sjoerd Mullender <Sjoerd.Mullender@cwi.nl>
   <URL:http://www.cwi.nl/~sjoerd/>


------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <10433.914002226.2@bireme.cwi.nl>
Content-Description: xmllib.py
Content-Disposition: attachment; filename="xmllib.py"

# A parser for XML, using the derived class as static DTD.
# Author: Sjoerd Mullender.

import re
import string


version = '0.2'

# Regular expressions used for parsing

_S = '[ \t\r\n]+'                       # white space
_opS = '[ \t\r\n]*'                     # optional white space
_Name = '[a-zA-Z_:][-a-zA-Z0-9._:]*'    # valid XML name
_QStr = "(?:'[^']*'|\"[^\"]*\")"        # quoted XML string
illegal = re.compile('[^\t\r\n -\176\240-\377]') # illegal chars in content
interesting = re.compile('[]&<]')

amp = re.compile('&')
ref = re.compile('&(' + _Name + '|#[0-9]+|#x[0-9a-fA-F]+)[^-a-zA-Z0-9._:]')
entityref = re.compile('&(?P<name>' + _Name + ')[^-a-zA-Z0-9._:]')
charref = re.compile('&#(?P<char>[0-9]+[^0-9]|x[0-9a-fA-F]+[^0-9a-fA-F])')
space = re.compile(_S + '$')
newline = re.compile('\n')

attrfind = re.compile(
    _S + '(?P<name>' + _Name + ')'
    '(' + _opS + '=' + _opS +
    '(?P<value>'+_QStr+'|[-a-zA-Z0-9.:+*%?!()_#=~]+))?')
starttagopen = re.compile('<' + _Name)
starttagend = re.compile(_opS + '(?P<slash>/?)>')
starttagmatch = re.compile('<(?P<tagname>'+_Name+')'
                      '(?P<attrs>(?:'+attrfind.pattern+')*)'+
                      starttagend.pattern)
endtagopen = re.compile('</')
endbracket = re.compile(_opS + '>')
endbracketfind = re.compile('(?:[^>\'"]|'+_QStr+')*>')
tagfind = re.compile(_Name)
cdataopen = re.compile(r'<!\[CDATA\[')
cdataclose = re.compile(r'\]\]>')
# this matches one of the following:
# SYSTEM SystemLiteral
# PUBLIC PubidLiteral SystemLiteral
_SystemLiteral = '(?P<%s>'+_QStr+')'
_PublicLiteral = '(?P<%s>"[-\'()+,./:=?;!*#@$_%% \n\ra-zA-Z0-9]*"|' \
                        "'[-()+,./:=?;!*#@$_%% \n\ra-zA-Z0-9]*')"
_ExternalId = '(?:SYSTEM|' \
                 'PUBLIC'+_S+_PublicLiteral%'pubid'+ \
              ')'+_S+_SystemLiteral%'syslit'
doctype = re.compile('<!DOCTYPE'+_S+'(?P<name>'+_Name+')'
                     '(?:'+_S+_ExternalId+')?'+_opS)
xmldecl = re.compile('<\?xml'+_S+
                     'version'+_opS+'='+_opS+'(?P<version>'+_QStr+')'+
                     '(?:'+_S+'encoding'+_opS+'='+_opS+
                        "(?P<encoding>'[A-Za-z][-A-Za-z0-9._]*'|"
                        '"[A-Za-z][-A-Za-z0-9._]*"))?'
                     '(?:'+_S+'standalone'+_opS+'='+_opS+
                        '(?P<standalone>\'(?:yes|no)\'|"(?:yes|no)"))?'+
                     _opS+'\?>')
procopen = re.compile(r'<\?(?P<proc>' + _Name + ')' + _opS)
procclose = re.compile(_opS + r'\?>')
commentopen = re.compile('<!--')
commentclose = re.compile('-->')
doubledash = re.compile('--')
attrtrans = string.maketrans(' \r\n\t', '    ')

# definitions for XML namespaces
_NCName = '[a-zA-Z_][-a-zA-Z0-9._]*'    # XML Name, minus the ":"
ncname = re.compile(_NCName + '$')
qname = re.compile('(?:(?P<prefix>' + _NCName + '):)?' # optional prefix
                   '(?P<local>' + _NCName + ')$')

xmlns = re.compile('xmlns(?::(?P<ncname>'+_NCName+'))?$')

# XML parser base class -- find tags and call handler functions.
# Usage: p = XMLParser(); p.feed(data); ...; p.close().
# The dtd is defined by deriving a class which defines methods with
# special names to handle tags: start_foo and end_foo to handle <foo>
# and </foo>, respectively.  The data between tags is passed to the
# parser by calling self.handle_data() with some data as argument (the
# data may be split up in arbitrary chunks).  Entity references are
# passed by calling self.handle_entityref() with the entity reference
# as argument.

class XMLParser:
    attributes = {}                     # default, to be overridden
    elements = {}                       # default, to be overridden

    # Interface -- initialize and reset this instance
    def __init__(self):
        self.reset()

    # Interface -- reset this instance.  Loses all unprocessed data
    def reset(self):
        self.rawdata = ''
        self.stack = []
        self.nomoretags = 0
        self.literal = 0
        self.lineno = 1
        self.__at_start = 1
        self.__seen_doctype = None
        self.__seen_starttag = 0
        self.__namespaces = {'xml':None}   # xml is implicitly declared

    # For derived classes only -- enter literal mode (CDATA) till EOF
    def setnomoretags(self):
        self.nomoretags = self.literal = 1

    # For derived classes only -- enter literal mode (CDATA)
    def setliteral(self, *args):
        self.literal = 1

    # Interface -- feed some data to the parser.  Call this as
    # often as you want, with as little or as much text as you
    # want (may include '\n').  (This just saves the text, all the
    # processing is done by goahead().)
    def feed(self, data):
        self.rawdata = self.rawdata + data
        self.goahead(0)

    # Interface -- handle the remaining data
    def close(self):
        self.goahead(1)

    # Interface -- translate references
    def translate_references(self, data, all = 1):
        i = 0
        while 1:
            res = amp.search(data, i)
            if res is None:
                return data
            res = ref.match(data, res.start(0))
            if res is None:
                self.syntax_error("bogus `&'")
                i =i+1
                continue
            i = res.end(0)
            if data[i - 1] != ';':
                self.syntax_error("`;' missing after entity/char reference")
                i = i-1
            str = res.group(1)
            pre = data[:res.start(0)]
            post = data[i:]
            if str[0] == '#':
                if str[1] == 'x':
                    str = chr(string.atoi(str[2:], 16))
                else:
                    str = chr(string.atoi(str[1:]))
                data = pre + str + post
                i = res.start(0)+len(str)
            elif all:
                if self.entitydefs.has_key(str):
                    data = pre + self.entitydefs[str] + post
                    i = res.start(0)    # rescan substituted text
                else:
                    self.syntax_error('reference to unknown entity')
                    # can't do it, so keep the entity ref in
                    data = pre + '&' + str + ';' + post
                    i = res.start(0) + len(str) + 2
            else:
                # just translating character references
                pass                    # i is already positioned correctly

    # Internal -- handle data as far as reasonable.  May leave state
    # and data to be processed by a subsequent call.  If 'end' is
    # true, force handling all data as if followed by EOF marker.
    def goahead(self, end):
        rawdata = self.rawdata
        i = 0
        n = len(rawdata)
        while i < n:
            if i > 0:
                self.__at_start = 0
            if self.nomoretags:
                data = rawdata[i:n]
                self.handle_data(data)
                self.lineno = self.lineno + string.count(data, '\n')
                i = n
                break
            res = interesting.search(rawdata, i)
            if res:
                j = res.start(0)
            else:
                j = n
            if i < j:
                if self.__at_start:
                    self.syntax_error('illegal data at start of file')
                self.__at_start = 0
                data = rawdata[i:j]
                if not self.stack and space.match(data) is None:
                    self.syntax_error('data not in content')
                if illegal.search(data):
                    self.syntax_error('illegal character in content')
                self.handle_data(data)
                self.lineno = self.lineno + string.count(data, '\n')
            i = j
            if i == n: break
            if rawdata[i] == '<':
                if starttagopen.match(rawdata, i):
                    if self.literal:
                        data = rawdata[i]
                        self.handle_data(data)
                        self.lineno = self.lineno + string.count(data, '\n')
                        i = i+1
                        continue
                    k = self.parse_starttag(i)
                    if k < 0: break
                    self.__seen_starttag = 1
                    self.lineno = self.lineno + string.count(rawdata[i:k], '\n')
                    i = k
                    continue
                if endtagopen.match(rawdata, i):
                    k = self.parse_endtag(i)
                    if k < 0: break
                    self.lineno = self.lineno + string.count(rawdata[i:k], '\n')
                    i =  k
                    continue
                if commentopen.match(rawdata, i):
                    if self.literal:
                        data = rawdata[i]
                        self.handle_data(data)
                        self.lineno = self.lineno + string.count(data, '\n')
                        i = i+1
                        continue
                    k = self.parse_comment(i)
                    if k < 0: break
                    self.lineno = self.lineno + string.count(rawdata[i:k], '\n')
                    i = k
                    continue
                if cdataopen.match(rawdata, i):
                    k = self.parse_cdata(i)
                    if k < 0: break
                    self.lineno = self.lineno + string.count(rawdata[i:k], '\n')
                    i = k
                    continue
                res = xmldecl.match(rawdata, i)
                if res:
                    if not self.__at_start:
                        self.syntax_error("<?xml?> declaration not at start of document")
                    version, encoding, standalone = res.group('version',
                                                              'encoding',
                                                              'standalone')
                    if version[1:-1] != '1.0':
                        raise RuntimeError, 'only XML version 1.0 supported'
                    if encoding: encoding = encoding[1:-1]
                    if standalone: standalone = standalone[1:-1]
                    self.handle_xml(encoding, standalone)
                    i = res.end(0)
                    continue
                res = procopen.match(rawdata, i)
                if res:
                    k = self.parse_proc(i)
                    if k < 0: break
                    self.lineno = self.lineno + string.count(rawdata[i:k], '\n')
                    i = k
                    continue
                res = doctype.match(rawdata, i)
                if res:
                    if self.literal:
                        data = rawdata[i]
                        self.handle_data(data)
                        self.lineno = self.lineno + string.count(data, '\n')
                        i = i+1
                        continue
                    if self.__seen_doctype:
                        self.syntax_error('multiple DOCTYPE elements')
                    if self.__seen_starttag:
                        self.syntax_error('DOCTYPE not at beginning of document')
                    k = self.parse_doctype(res)
                    if k < 0: break
                    self.__seen_doctype = res.group('name')
                    self.lineno = self.lineno + string.count(rawdata[i:k], '\n')
                    i = k
                    continue
            elif rawdata[i] == '&':
                if self.literal:
                    data = rawdata[i]
                    self.handle_data(data)
                    i = i+1
                    continue
                res = charref.match(rawdata, i)
                if res is not None:
                    i = res.end(0)
                    if rawdata[i-1] != ';':
                        self.syntax_error("`;' missing in charref")
                        i = i-1
                    if not self.stack:
                        self.syntax_error('data not in content')
                    self.handle_charref(res.group('char')[:-1])
                    self.lineno = self.lineno + string.count(res.group(0), '\n')
                    continue
                res = entityref.match(rawdata, i)
                if res is not None:
                    i = res.end(0)
                    if rawdata[i-1] != ';':
                        self.syntax_error("`;' missing in entityref")
                        i = i-1
                    name = res.group('name')
                    if self.entitydefs.has_key(name):
                        self.rawdata = rawdata = rawdata[:res.start(0)] + self.entitydefs[name] + rawdata[i:]
                        n = len(rawdata)
                        i = res.start(0)
                    else:
                        self.syntax_error('reference to unknown entity')
                        self.unknown_entityref(name)
                    self.lineno = self.lineno + string.count(res.group(0), '\n')
                    continue
            elif rawdata[i] == ']':
                if self.literal:
                    data = rawdata[i]
                    self.handle_data(data)
                    i = i+1
                    continue
                if n-i < 3:
                    break
                if cdataclose.match(rawdata, i):
                    self.syntax_error("bogus `]]>'")
                self.handle_data(rawdata[i])
                i = i+1
                continue
            else:
                raise RuntimeError, 'neither < nor & ??'
            # We get here only if incomplete matches but
            # nothing else
            break
        # end while
        if i > 0:
            self.__at_start = 0
        if end and i < n:
            data = rawdata[i]
            self.syntax_error("bogus `%s'" % data)
            if illegal.search(data):
                self.syntax_error('illegal character in content')
            self.handle_data(data)
            self.lineno = self.lineno + string.count(data, '\n')
            self.rawdata = rawdata[i+1:]
            return self.goahead(end)
        self.rawdata = rawdata[i:]
        if end:
            if not self.__seen_starttag:
                self.syntax_error('no elements in file')
            if self.stack:
                self.syntax_error('missing end tags')
                while self.stack:
                    self.finish_endtag(self.stack[-1][0])

    # Internal -- parse comment, return length or -1 if not terminated
    def parse_comment(self, i):
        rawdata = self.rawdata
        if rawdata[i:i+4] <> '<!--':
            raise RuntimeError, 'unexpected call to parse_comment'
        res = commentclose.search(rawdata, i+4)
        if res is None:
            return -1
        if doubledash.search(rawdata, i+4, res.start(0)):
            self.syntax_error("`--' inside comment")
        if rawdata[res.start(0)-1] == '-':
            self.syntax_error('comment cannot end in three dashes')
        if illegal.search(rawdata, i+4, res.start(0)):
            self.syntax_error('illegal character in comment')
        self.handle_comment(rawdata[i+4: res.start(0)])
        return res.end(0)

    # Internal -- handle DOCTYPE tag, return length or -1 if not terminated
    def parse_doctype(self, res):
        rawdata = self.rawdata
        n = len(rawdata)
        name = res.group('name')
        pubid, syslit = res.group('pubid', 'syslit')
        if pubid is not None:
            pubid = pubid[1:-1]         # remove quotes
            pubid = string.join(string.split(pubid)) # normalize
        if syslit is not None: syslit = syslit[1:-1] # remove quotes
        j = k = res.end(0)
        if k >= n:
            return -1
        if rawdata[k] == '[':
            level = 0
            k = k+1
            dq = sq = 0
            while k < n:
                c = rawdata[k]
                if not sq and c == '"':
                    dq = not dq
                elif not dq and c == "'":
                    sq = not sq
                elif sq or dq:
                    pass
                elif level <= 0 and c == ']':
                    res = endbracket.match(rawdata, k+1)
                    if res is None:
                        return -1
                    self.handle_doctype(name, pubid, syslit, rawdata[j+1:k])
                    return res.end(0)
                elif c == '<':
                    level = level + 1
                elif c == '>':
                    level = level - 1
                    if level < 0:
                        self.syntax_error("bogus `>' in DOCTYPE")
                k = k+1
        res = endbracketfind.match(rawdata, k)
        if res is None:
            return -1
        if endbracket.match(rawdata, k) is None:
            self.syntax_error('garbage in DOCTYPE')
        self.handle_doctype(name, pubid, syslit, None)
        return res.end(0)

    # Internal -- handle CDATA tag, return length or -1 if not terminated
    def parse_cdata(self, i):
        rawdata = self.rawdata
        if rawdata[i:i+9] <> '<![CDATA[':
            raise RuntimeError, 'unexpected call to parse_cdata'
        res = cdataclose.search(rawdata, i+9)
        if res is None:
            return -1
        if illegal.search(rawdata, i+9, res.start(0)):
            self.syntax_error('illegal character in CDATA')
        if not self.stack:
            self.syntax_error('CDATA not in content')
        self.handle_cdata(rawdata[i+9:res.start(0)])
        return res.end(0)

    __xml_namespace_attributes = {'ns':None, 'src':None, 'prefix':None}
    # Internal -- handle a processing instruction tag
    def parse_proc(self, i):
        rawdata = self.rawdata
        end = procclose.search(rawdata, i)
        if end is None:
            return -1
        j = end.start(0)
        if illegal.search(rawdata, i+2, j):
            self.syntax_error('illegal character in processing instruction')
        res = tagfind.match(rawdata, i+2)
        if res is None:
            raise RuntimeError, 'unexpected call to parse_proc'
        k = res.end(0)
        name = res.group(0)
        if name == 'xml:namespace':
            self.syntax_error('old-fashioned namespace declaration')
            # namespace declaration
            # this must come after the <?xml?> declaration (if any)
            # and before the <!DOCTYPE> (if any).
            if self.__seen_doctype or self.__seen_starttag:
                self.syntax_error('xml:namespace declaration too late in document')
            attrdict, namespace, k = self.parse_attributes(name, k, j)
            if namespace:
                self.syntax_error('namespace declaration inside namespace declaration')
            for attrname in attrdict.keys():
                if not self.__xml_namespace_attributes.has_key(attrname):
                    self.syntax_error("unknown attribute `%s' in xml:namespace tag" % attrname)
            if not attrdict.has_key('ns') or not attrdict.has_key('prefix'):
                self.syntax_error('xml:namespace without required attributes')
                # without both attributes there is nothing to register
                return end.end(0)
            prefix = attrdict.get('prefix')
            if ncname.match(prefix) is None:
                self.syntax_error('xml:namespace illegal prefix value')
                return end.end(0)
            if self.__namespaces.has_key(prefix):
                self.syntax_error('xml:namespace prefix not unique')
            self.__namespaces[prefix] = attrdict['ns']
        else:
            if string.find(string.lower(name), 'xml') >= 0:
                self.syntax_error('illegal processing instruction target name')
            self.handle_proc(name, rawdata[k:j])
        return end.end(0)

    # Internal -- parse attributes between i and j
    def parse_attributes(self, tag, i, j):
        rawdata = self.rawdata
        attrdict = {}
        namespace = {}
        while i < j:
            res = attrfind.match(rawdata, i)
            if res is None:
                break
            attrname, attrvalue = res.group('name', 'value')
            i = res.end(0)
            if attrvalue is None:
                self.syntax_error("no value specified for attribute `%s'" % attrname)
                attrvalue = attrname
            elif attrvalue[:1] == "'" == attrvalue[-1:] or \
                 attrvalue[:1] == '"' == attrvalue[-1:]:
                attrvalue = attrvalue[1:-1]
            else:
                self.syntax_error("attribute `%s' value not quoted" % attrname)
            res = xmlns.match(attrname)
            if res is not None:
                # namespace declaration
                ncname = res.group('ncname')
                namespace[ncname or ''] = attrvalue or None
                continue
            if '<' in attrvalue:
                self.syntax_error("`<' illegal in attribute value")
            if attrdict.has_key(attrname):
                self.syntax_error("attribute `%s' specified twice" % attrname)
            attrvalue = string.translate(attrvalue, attrtrans)
            attrdict[attrname] = self.translate_references(attrvalue)
        return attrdict, namespace, i

    # Internal -- handle starttag, return length or -1 if not terminated
    def parse_starttag(self, i):
        rawdata = self.rawdata
        # i points to start of tag
        end = endbracketfind.match(rawdata, i+1)
        if end is None:
            return -1
        tag = starttagmatch.match(rawdata, i)
        if tag is None or tag.end(0) != end.end(0):
            self.syntax_error('garbage in starttag')
            return end.end(0)
        nstag = tagname = tag.group('tagname')
        if not self.__seen_starttag and self.__seen_doctype and \
           tagname != self.__seen_doctype:
            self.syntax_error('starttag does not match DOCTYPE')
        if self.__seen_starttag and not self.stack:
            self.syntax_error('multiple elements on top level')
        k, j = tag.span('attrs')
        attrdict, nsdict, k = self.parse_attributes(tagname, k, j)
        self.stack.append((tagname, nsdict, nstag))
        ns = None               # referenced below even if tagname is no qname
        res = qname.match(tagname)
        if res is not None:
            prefix, nstag = res.group('prefix', 'local')
            if prefix is None:
                prefix = ''
            ns = None
            for t, d, nst in self.stack:
                if d.has_key(prefix):
                    ns = d[prefix]
            if ns is None and prefix != '':
                ns = self.__namespaces.get(prefix)
            if ns is not None:
                nstag = ns + ' ' + nstag
            elif prefix != '':
                nstag = prefix + ':' + nstag # undo split
            self.stack[-1] = tagname, nsdict, nstag
        # translate namespace of attributes
        nattrdict = {}
        for key, val in attrdict.items():
            res = qname.match(key)
            if res is not None:
                aprefix, key = res.group('prefix', 'local')
                if aprefix is None:
                    aprefix = ''
                ans = None
                for t, d, nst in self.stack:
                    if d.has_key(aprefix):
                        ans = d[aprefix]
                if ans is None and aprefix != '':
                    ans = self.__namespaces.get(aprefix)
                if ans is not None:
                    key = ans + ' ' + key
                elif aprefix != '':
                    key = aprefix + ':' + key
                elif ns is not None:
                    key = ns + ' ' + key
            nattrdict[key] = val
        attrdict = nattrdict
        attributes = self.attributes.get(nstag)
        if attributes is not None:
            for key in attrdict.keys():
                if not attributes.has_key(key):
                    self.syntax_error("unknown attribute `%s' in tag `%s'" % (key, tagname))
            for key, val in attributes.items():
                if val is not None and not attrdict.has_key(key):
                    attrdict[key] = val
        method = self.elements.get(nstag, (None, None))[0]
        self.finish_starttag(nstag, attrdict, method)
        if tag.group('slash') == '/':
            self.finish_endtag(tagname)
        return tag.end(0)

    # Internal -- parse endtag
    def parse_endtag(self, i):
        rawdata = self.rawdata
        end = endbracketfind.match(rawdata, i+1)
        if end is None:
            return -1
        res = tagfind.match(rawdata, i+2)
        if res is None:
            if self.literal:
                self.handle_data(rawdata[i])
                return i+1
            self.syntax_error('no name specified in end tag')
            tag = ''
            k = i+2
        else:
            tag = res.group(0)
            if self.literal:
                if not self.stack or tag != self.stack[-1][0]:
                    self.handle_data(rawdata[i])
                    return i+1
                self.literal = 0
            k = res.end(0)
        if endbracket.match(rawdata, k) is None:
            self.syntax_error('garbage in end tag')
        self.finish_endtag(tag)
        return end.end(0)

    # Internal -- finish processing of start tag
    def finish_starttag(self, tagname, attrdict, method):
        if method is not None:
            self.handle_starttag(tagname, method, attrdict)
        else:
            self.unknown_starttag(tagname, attrdict)

    # Internal -- finish processing of end tag
    def finish_endtag(self, tag):
        if not tag:
            self.syntax_error('name-less end tag')
            found = len(self.stack) - 1
            if found < 0:
                self.unknown_endtag(tag)
                return
        else:
            found = -1
            for i in range(len(self.stack)):
                if tag == self.stack[i][0]:
                    found = i
            if found == -1:
                self.syntax_error('unopened end tag')
                method = self.elements.get(tag, (None, None))[1]
                if method is not None:
                    self.handle_endtag(tag, method)
                else:
                    self.unknown_endtag(tag)
                return
        while len(self.stack) > found:
            if found < len(self.stack) - 1:
                self.syntax_error('missing close tag for %s' % self.stack[-1][2])
            nstag = self.stack[-1][2]
            method = self.elements.get(nstag, (None, None))[1]
            if method is not None:
                self.handle_endtag(nstag, method)
            else:
                self.unknown_endtag(nstag)
            del self.stack[-1]

    # Overridable -- handle xml processing instruction
    def handle_xml(self, encoding, standalone):
        pass

    # Overridable -- handle DOCTYPE
    def handle_doctype(self, tag, pubid, syslit, data):
        pass

    # Overridable -- handle start tag
    def handle_starttag(self, tag, method, attrs):
        method(attrs)

    # Overridable -- handle end tag
    def handle_endtag(self, tag, method):
        method()

    # Example -- handle character reference, no need to override
    def handle_charref(self, name):
        try:
            if name[0] == 'x':
                n = string.atoi(name[1:], 16)
            else:
                n = string.atoi(name)
        except string.atoi_error:
            self.unknown_charref(name)
            return
        if not 0 <= n <= 255:
            self.unknown_charref(name)
            return
        self.handle_data(chr(n))

    # Definition of entities -- derived classes may override
    entitydefs = {'lt': '&#60;',        # must use charref
                  'gt': '&#62;',
                  'amp': '&#38;',       # must use charref
                  'quot': '&#34;',
                  'apos': '&#39;',
                  }

    # Example -- handle entity reference, no need to override
    def handle_entityref(self, name):
        table = self.entitydefs
        if table.has_key(name):
            self.handle_data(table[name])
        else:
            self.unknown_entityref(name)
            return

    # Example -- handle data, should be overridden
    def handle_data(self, data):
        pass

    # Example -- handle cdata, could be overridden
    def handle_cdata(self, data):
        pass

    # Example -- handle comment, could be overridden
    def handle_comment(self, data):
        pass

    # Example -- handle processing instructions, could be overridden
    def handle_proc(self, name, data):
        pass

    # Example -- handle relatively harmless syntax errors, could be overridden
    def syntax_error(self, message):
        raise RuntimeError, 'Syntax error at line %d: %s' % (self.lineno, message)

    # To be overridden -- handlers for unknown objects
    def unknown_starttag(self, tag, attrs): pass
    def unknown_endtag(self, tag): pass
    def unknown_charref(self, ref): pass
    def unknown_entityref(self, ref): pass


class TestXMLParser(XMLParser):

    def __init__(self):
        self.testdata = ""
        XMLParser.__init__(self)

    def handle_xml(self, encoding, standalone):
        self.flush()
        print 'xml: encoding =',encoding,'standalone =',standalone

    def handle_doctype(self, tag, pubid, syslit, data):
        self.flush()
        print 'DOCTYPE:',tag, `data`

    def handle_entity(self, name, strval, pubid, syslit, ndata):
        self.flush()
        print 'ENTITY:',name,`strval`

    def handle_data(self, data):
        self.testdata = self.testdata + data
        if len(`self.testdata`) >= 70:
            self.flush()

    def flush(self):
        data = self.testdata
        if data:
            self.testdata = ""
            print 'data:', `data`

    def handle_cdata(self, data):
        self.flush()
        print 'cdata:', `data`

    def handle_proc(self, name, data):
        self.flush()
        print 'processing:',name,`data`

    def handle_comment(self, data):
        self.flush()
        r = `data`
        if len(r) > 68:
            r = r[:32] + '...' + r[-32:]
        print 'comment:', r

    def syntax_error(self, message):
        print 'error at line %d:' % self.lineno, message

    def unknown_starttag(self, tag, attrs):
        self.flush()
        if not attrs:
            print 'start tag: <' + tag + '>'
        else:
            print 'start tag: <' + tag,
            for name, value in attrs.items():
                print name + '=' + '"' + value + '"',
            print '>'

    def unknown_endtag(self, tag):
        self.flush()
        print 'end tag: </' + tag + '>'

    def unknown_entityref(self, ref):
        self.flush()
        print '*** unknown entity ref: &' + ref + ';'

    def unknown_charref(self, ref):
        self.flush()
        print '*** unknown char ref: &#' + ref + ';'

    def close(self):
        XMLParser.close(self)
        self.flush()

def test(args = None):
    import sys

    if not args:
        args = sys.argv[1:]

    if args and args[0] == '-s':
        args = args[1:]
        klass = XMLParser
    else:
        klass = TestXMLParser

    if args:
        file = args[0]
    else:
        file = 'test.xml'

    if file == '-':
        f = sys.stdin
    else:
        try:
            f = open(file, 'r')
        except IOError, msg:
            print file, ":", msg
            sys.exit(1)

    data = f.read()
    if f is not sys.stdin:
        f.close()

    x = klass()
    try:
        for c in data:
            x.feed(c)
        x.close()
    except RuntimeError, msg:
        print msg
        sys.exit(1)


if __name__ == '__main__':
    test()

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <10433.914002226.3@bireme.cwi.nl>
Content-Description: libxmllib.tex
Content-Disposition: attachment; filename="libxmllib.tex"

\section{\module{xmllib} ---
         A parser for XML documents.}
% Author: Sjoerd Mullender
\declaremodule{standard}{xmllib}

\modulesynopsis{A parser for XML documents.}

\index{XML}

This module defines a class \class{XMLParser} which serves as the basis 
for parsing text files formatted in XML (Extensible Markup Language).

\begin{classdesc}{XMLParser}{}
The \class{XMLParser} class must be instantiated without arguments.
\end{classdesc}

This class provides the following interface methods and instance variables:

\begin{memberdesc}{attributes}
A mapping of element names to mappings.  The latter mapping maps
attribute names that are valid for the element to the default value of 
the attribute, or if there is no default to \code{None}.  The default
value is the empty dictionary.
\end{memberdesc}
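
As a minimal sketch of how such a mapping might be applied (written in
present-day Python rather than the Python 1.5 of the module itself; the
function name \code{apply_attribute_defaults} and the sample table are
hypothetical and not part of \module{xmllib}), unknown attributes are
reported and missing attributes that have a default are filled in:

```python
def apply_attribute_defaults(tag, attrdict, attributes):
    """Return (attrs, errors) for one start tag.

    attributes maps element names to {attribute name: default or None}.
    Elements without an entry are passed through unchecked.
    """
    errors = []
    result = dict(attrdict)
    allowed = attributes.get(tag)
    if allowed is not None:
        # Report attributes that are not listed for this element.
        for key in attrdict:
            if key not in allowed:
                errors.append("unknown attribute `%s' in tag `%s'" % (key, tag))
        # Fill in defaults for attributes that were not given.
        for key, default in allowed.items():
            if default is not None and key not in result:
                result[key] = default
    return result, errors


# Hypothetical table: "src" has no default, "align" defaults to "left".
ATTRIBUTES = {"img": {"src": None, "align": "left"}}
attrs, errors = apply_attribute_defaults(
    "img", {"src": "a.png", "alt": "x"}, ATTRIBUTES)
```

Here \code{attrs} gains the default for \code{align}, while the
unlisted \code{alt} attribute is kept but reported in \code{errors}.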

\begin{memberdesc}{elements} 
A mapping of element names to tuples.  The tuples contain a function
for handling the start and end tag respectively of the element, or
\code{None} if the method \method{unknown_starttag()} or
\method{unknown_endtag()} is to be called.  The default value is the
empty dictionary.
\end{memberdesc}
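
The dispatch this describes can be sketched roughly as follows.  This
is an illustration in present-day Python, not the module's code: the
class \code{ToyDispatcher} and its handler names are hypothetical, and
only the lookup in \member{elements} with the fallback to the
\code{unknown_*} methods is modelled.

```python
class ToyDispatcher:
    """Hypothetical stand-in showing element-table dispatch only."""

    def __init__(self):
        self.log = []
        # element name -> (start handler, end handler); either may be None
        self.elements = {
            "title": (self.start_title, self.end_title),
            "br": (None, None),
        }

    def start_title(self, attrs):
        self.log.append(("start_title", attrs))

    def end_title(self):
        self.log.append(("end_title",))

    def unknown_starttag(self, tag, attrs):
        self.log.append(("unknown_starttag", tag, attrs))

    def unknown_endtag(self, tag):
        self.log.append(("unknown_endtag", tag))

    def finish_starttag(self, tag, attrs):
        # Missing entries and None handlers both fall back to unknown_starttag.
        method = self.elements.get(tag, (None, None))[0]
        if method is not None:
            method(attrs)
        else:
            self.unknown_starttag(tag, attrs)

    def finish_endtag(self, tag):
        method = self.elements.get(tag, (None, None))[1]
        if method is not None:
            method()
        else:
            self.unknown_endtag(tag)


d = ToyDispatcher()
d.finish_starttag("title", {})
d.finish_starttag("br", {})
d.finish_endtag("title")
d.finish_endtag("p")
```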

\begin{memberdesc}{entitydefs}
A mapping of entity names to their values.  The default value contains
definitions for \code{'lt'}, \code{'gt'}, \code{'amp'}, \code{'quot'}, 
and \code{'apos'}.
\end{memberdesc}

\begin{methoddesc}{reset}{}
Reset the instance.  Loses all unprocessed data.  This is called
implicitly at instantiation time.
\end{methoddesc}

\begin{methoddesc}{setnomoretags}{}
Stop processing tags.  Treat all following input as literal input
(CDATA).
\end{methoddesc}

\begin{methoddesc}{setliteral}{}
Enter literal mode (CDATA mode).  This mode is automatically exited
when the close tag matching the last unclosed open tag is encountered.
\end{methoddesc}

\begin{methoddesc}{feed}{data}
Feed some text to the parser.  It is processed insofar as it consists
of complete tags; incomplete data is buffered until more data is
fed or \method{close()} is called.
\end{methoddesc}

\begin{methoddesc}{close}{}
Force processing of all buffered data as if it were followed by an
end-of-file mark.  This method may be redefined by a derived class to
define additional processing at the end of the input, but the
redefined version should always call the base class \method{close()}.
\end{methoddesc}
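
The buffering contract of \method{feed()} and \method{close()} can be
illustrated with a deliberately simplified sketch in present-day
Python.  The class \code{ToyTagCollector} is hypothetical; it treats
anything between \code{<} and \code{>} as a tag and does not attempt
to reproduce the module's real tokenizer.

```python
class ToyTagCollector:
    """Hypothetical incremental consumer: complete items are emitted,
    incomplete input stays buffered until more arrives or close()."""

    def __init__(self):
        self.rawdata = ""
        self.events = []

    def feed(self, data):
        self.rawdata += data
        self._goahead(end=False)

    def close(self):
        self._goahead(end=True)

    def _goahead(self, end):
        rawdata = self.rawdata
        i, n = 0, len(rawdata)
        while i < n:
            if rawdata[i] == "<":
                j = rawdata.find(">", i)
                if j < 0:
                    break              # incomplete tag: wait for more input
                self.events.append(("tag", rawdata[i:j + 1]))
                i = j + 1
            else:
                j = rawdata.find("<", i)
                if j < 0:
                    if not end:
                        break          # text may continue in the next chunk
                    j = n
                self.events.append(("data", rawdata[i:j]))
                i = j
        if end and i < n:
            # input ended inside a tag: report it rather than keep waiting
            self.events.append(("error", rawdata[i:]))
            i = n
        self.rawdata = rawdata[i:]


whole = ToyTagCollector()
whole.feed("<a>hi</a>")
whole.close()

piecewise = ToyTagCollector()
for ch in "<a>hi</a>":
    piecewise.feed(ch)
piecewise.close()
```

Feeding the document in one piece or one character at a time yields
the same sequence of events, which is the property the buffering is
meant to provide.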

\begin{methoddesc}{translate_references}{data}
Translate all entity and character references in \var{data} and
return the translated string.
\end{methoddesc}

\begin{methoddesc}{handle_xml}{encoding, standalone}
This method is called when the \samp{<?xml ...?>} tag is processed.
The arguments are the values of the encoding and standalone attributes 
in the tag.  Both encoding and standalone are optional.  The values
passed to \method{handle_xml()} default to \code{None} and the string
\code{'no'} respectively.
\end{methoddesc}

\begin{methoddesc}{handle_doctype}{tag, pubid, syslit, data}
This method is called when the \samp{<!DOCTYPE...>} tag is processed.
The arguments are the name of the root element, the Formal Public
Identifier (or \code{None} if not specified), the system identifier
(or \code{None} if not specified), and the uninterpreted contents of
the internal DTD subset (or \code{None} if not present).
\end{methoddesc}

\begin{methoddesc}{handle_starttag}{tag, method, attributes}
This method is called to handle start tags for which a start tag
handler is defined in the instance variable \member{elements}.  The
\var{tag} argument is the name of the tag, and the \var{method}
argument is the function (method) which should be used to support semantic
interpretation of the start tag.  The \var{attributes} argument is a
dictionary of attributes, the key being the \var{name} and the value
being the \var{value} of the attribute found inside the tag's
\code{<>} brackets.  Character and entity references in the
\var{value} have been interpreted.  For instance, for the start tag
\code{<A HREF="http://www.cwi.nl/">}, this method would be called as
\code{handle_starttag('A', self.elements['A'][0], \{'HREF': 'http://www.cwi.nl/'\})}.
The base implementation simply calls \var{method} with \var{attributes}
as the only argument.
\end{methoddesc}

\begin{methoddesc}{handle_endtag}{tag, method}
This method is called to handle end tags for which an end tag handler
is defined in the instance variable \member{elements}.  The \var{tag}
argument is the name of the tag, and the \var{method} argument is the
function (method) which should be used to support semantic
interpretation of the end tag.  For instance, for the end tag
\code{</A>}, this method would be called as \code{handle_endtag('A',
self.elements['A'][1])}.  The base implementation simply calls
\var{method}.
\end{methoddesc}

\begin{methoddesc}{handle_data}{data}
This method is called to process arbitrary data.  It is intended to be
overridden by a derived class; the base class implementation does
nothing.
\end{methoddesc}

\begin{methoddesc}{handle_charref}{ref}
This method is called to process a character reference of the form
\samp{\&\#\var{ref};}.  \var{ref} can either be a decimal number,
or a hexadecimal number when preceded by an \character{x}.
In the base implementation, \var{ref} must be a number in the
range 0-255.  It converts the number to the corresponding character
and calls the method \method{handle_data()} with the character as
argument.  If \var{ref} is invalid or out of range, the method
\code{unknown_charref(\var{ref})} is called to handle the error.  A
subclass must override this method to provide support for character
references outside of the range 0-255.
\end{methoddesc}
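
The decoding rule can be sketched as a small stand-alone function in
present-day Python.  The name \code{decode_charref} is hypothetical;
it returns \code{None} where the base implementation would call
\method{unknown_charref()}.

```python
def decode_charref(ref):
    """Return the character for a numeric reference body such as '65'
    or 'x41', or None if it is malformed or outside the range 0-255."""
    try:
        if ref[:1] == "x":
            n = int(ref[1:], 16)   # hexadecimal form: &#x41;
        else:
            n = int(ref)           # decimal form: &#65;
    except ValueError:
        return None
    if not 0 <= n <= 255:
        return None
    return chr(n)
```

For example, both \code{decode_charref('65')} and
\code{decode_charref('x41')} give \code{'A'}, while
\code{decode_charref('300')} gives \code{None}.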

\begin{methoddesc}{handle_entityref}{ref}
This method is called to process a general entity reference of the
form \samp{\&\var{ref};} where \var{ref} is the name of a general
entity.  It looks for \var{ref} in the instance (or class)
variable \member{entitydefs} which should be a mapping from entity
names to corresponding translations.
If a translation is found, it calls the method \method{handle_data()}
with the translation; otherwise, it calls the method
\code{unknown_entityref(\var{ref})}.  The default \member{entitydefs}
defines translations for \code{\&amp;}, \code{\&apos;}, \code{\&gt;},
\code{\&lt;}, and \code{\&quot;}.
\end{methoddesc}

\begin{methoddesc}{handle_comment}{comment}
This method is called when a comment is encountered.  The
\var{comment} argument is a string containing the text between the
\samp{<!--} and \samp{-->} delimiters, but not the delimiters
themselves.  For example, the comment \samp{<!--text-->} will
cause this method to be called with the argument \code{'text'}.  The
default method does nothing.
\end{methoddesc}

\begin{methoddesc}{handle_cdata}{data}
This method is called when a CDATA section is encountered.  The
\var{data} argument is a string containing the text between the
\samp{<![CDATA[} and \samp{]]>} delimiters, but not the delimiters
themselves.  For example, the section \samp{<![CDATA[text]]>} will
cause this method to be called with the argument \code{'text'}.  The
default method does nothing, and is intended to be overridden.
\end{methoddesc}

\begin{methoddesc}{handle_proc}{name, data}
This method is called when a processing instruction (PI) is
encountered.  The \var{name} is the PI target, and the \var{data}
argument is a string containing the text between the PI target and the
closing delimiter, but not the delimiter itself.  For example, the
instruction \samp{<?XML text?>} will cause this method to be called
with the arguments \code{'XML'} and \code{'text'}.  The default method
does nothing.  Note that if a document starts with \samp{<?xml
...?>}, \method{handle_xml()} is called to handle it.
\end{methoddesc}

\begin{methoddesc}{handle_special}{data}
This method is called when a declaration is encountered.  The
\var{data} argument is a string containing the text between the
\samp{<!} and \samp{>} delimiters, but not the delimiters
themselves.  For example, the declaration \samp{<!ENTITY text>} will
cause this method to be called with the argument \code{'ENTITY text'}.  The
default method does nothing.  Note that \samp{<!DOCTYPE ...>} is
handled separately if it is located at the start of the document.
\end{methoddesc}

\begin{methoddesc}{syntax_error}{message}
This method is called when a syntax error is encountered.  The
\var{message} is a description of what was wrong.  The default method 
raises a \exception{RuntimeError} exception.  If this method is
overridden, it is permissible for it to return.  This method is only
called when the error can be recovered from.  Unrecoverable errors
raise a \exception{RuntimeError} without first calling
\method{syntax_error()}.
\end{methoddesc}

\begin{methoddesc}{unknown_starttag}{tag, attributes}
This method is called to process an unknown start tag.  It is intended
to be overridden by a derived class; the base class implementation
does nothing.
\end{methoddesc}

\begin{methoddesc}{unknown_endtag}{tag}
This method is called to process an unknown end tag.  It is intended
to be overridden by a derived class; the base class implementation
does nothing.
\end{methoddesc}

\begin{methoddesc}{unknown_charref}{ref}
This method is called to process unresolvable numeric character
references.  It is intended to be overridden by a derived class; the
base class implementation does nothing.
\end{methoddesc}

\begin{methoddesc}{unknown_entityref}{ref}
This method is called to process an unknown entity reference.  It is
intended to be overridden by a derived class; the base class
implementation does nothing.
\end{methoddesc}

\subsection{XML Namespaces}

This module has support for XML namespaces as defined in the XML
Namespaces proposed recommendation.

Tag and attribute names that are defined in an XML namespace are
handled as if the name of the tag or attribute consisted of the
namespace (i.e. the URL that defines the namespace) followed by a
space and the name of the tag or attribute.  For instance, the tag
\code{<html xmlns='http://www.w3.org/TR/REC-html40'>} is treated as if 
the tag name were \code{'http://www.w3.org/TR/REC-html40 html'}, and
the tag \code{<html:a href='http://frob.com'>} inside the above
mentioned element is treated as if the tag name were
\code{'http://www.w3.org/TR/REC-html40 a'} and the attribute name as
if it were \code{'http://www.w3.org/TR/REC-html40 href'}.

An older draft of the XML Namespaces proposal is also recognized, but
triggers a warning.
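
The name expansion can be sketched as follows in present-day Python.
This is only an illustration: the helper names are hypothetical, the
scope list stands in for the parser's stack of open elements, and the
example assumes the namespace was declared with an explicit
\code{html} prefix.

```python
def split_qname(name):
    """Split 'prefix:local' into (prefix, local); prefix is '' if absent."""
    prefix, sep, local = name.partition(":")
    if not sep:
        return "", name
    return prefix, local


def lookup(prefix, scopes):
    """Find the URI bound to prefix; scopes is a list of {prefix: uri}
    dicts, outermost element first, so inner declarations win."""
    uri = None
    for declared in scopes:
        if prefix in declared:
            uri = declared[prefix]
    return uri


def expand_tag(name, scopes):
    prefix, local = split_qname(name)
    uri = lookup(prefix, scopes)
    if uri is not None:
        return uri + " " + local
    return name                    # undeclared prefix: leave as written


def expand_attribute(name, tag, scopes):
    prefix, local = split_qname(name)
    uri = lookup(prefix, scopes)
    if uri is None and not prefix:
        # an unprefixed attribute takes the namespace of its element
        uri = lookup(split_qname(tag)[0], scopes)
    if uri is not None:
        return uri + " " + local
    return name


# Assumed declaration: xmlns:html='http://www.w3.org/TR/REC-html40'
scopes = [{"html": "http://www.w3.org/TR/REC-html40"}]
tag_name = expand_tag("html:a", scopes)
attr_name = expand_attribute("href", "html:a", scopes)
```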

------- =_aaaaaaaaaa0--


From paul@prescod.net  Fri Dec 18 20:10:16 1998
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Dec 1998 14:10:16 -0600
Subject: [XML-SIG] Recent CVS changes
References: <199812181651.KAA31392@bitsko.slc.ut.us>
Message-ID: <367AB6A8.F2D7B76@prescod.net>

Ken MacLeod wrote:
> 
> Another is LDO's XML serialization:
> 
>   <http://www.ntlug.org/cgi-bin/cvsweb/LDO/ldo-xml.dtd>
> 
> The DTD itself has basic specs and I hope to complete more docs over
> Christmas vacation.

Can you people please explain why we need all of these competing
proposals? XML-RPC looks like a superset of WDDX (in that it has a concept
of "method"). It could be described as a superset of WDDX, couldn't it? 

LDO looks like a *subset* of WDDX except for the REF element type. 

Can't we all just get along?

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon


From akuchlin@cnri.reston.va.us  Fri Dec 18 22:10:22 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 18 Dec 1998 17:10:22 -0500 (EST)
Subject: [XML-SIG] Marshalling
In-Reply-To: <13946.31647.653312.303899@weyr.cnri.reston.va.us>
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
 <3679F613.68A22D40@prescod.net>
 <13946.24429.905213.372579@amarok.cnri.reston.va.us>
 <13946.31647.653312.303899@weyr.cnri.reston.va.us>
Message-ID: <13946.53571.550184.123458@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>
>>>Paul Prescod writes:
>>> * why have a single class for marshalling and unmarshalling?
>>Andrew M. Kuchling writes:
>> 	My fuzzy argument for this was that I wanted the user to write
>> only a single subclass, not two of them. 
>
>  Now's my turn to say "this is bogus".  This is bogus.  It's

	<wrestling-announcer>And Prescod slams Kuchling into the mat,
stunning him!  Now Drake has him in a headlock!  Oh, the
humanity...</wrestling-announcer> OK, I'll try to divide the two
functions into separate classes, and see how it goes.  Would it be all
right if I left both the m_* and um_* methods on the basic Marshalling
class, and just pushed out the SAX handler methods?  Or should there
be different Marshaller and Unmarshaller classes?

	Incidentally, Paul's idea of changing Python's pickle module
to XML is an interesting one for Python 2.0, but not really possible
before then.  It would be nice if xml.marshal could do what pickle
does, though.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in your
home.
    -- Terry Pratchett & Neil Gaiman, _Good Omens_


From fdrake@acm.org  Fri Dec 18 22:22:48 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 Dec 1998 17:22:48 -0500 (EST)
Subject: [XML-SIG] Marshalling
In-Reply-To: <13946.53571.550184.123458@amarok.cnri.reston.va.us>
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
 <3679F613.68A22D40@prescod.net>
 <13946.24429.905213.372579@amarok.cnri.reston.va.us>
 <13946.31647.653312.303899@weyr.cnri.reston.va.us>
 <13946.53571.550184.123458@amarok.cnri.reston.va.us>
Message-ID: <13946.54712.149166.228072@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
 > right if I left both the m_* and um_* methods on the basic Marshalling
 > class, and just pushed out the SAX handler methods?  Or should there
 > be different Marshaller and Unmarshaller classes?

  Wasn't that the point?  I think pickle and xdrlib got the model
right: packing and unpacking are two different functions.
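
For illustration only, a minimal sketch of that split as the standard
pickle module exposes it in present-day Python (the sample data is made
up): one class writes, a separate class reads.

```python
import io
import pickle

original = {"name": "xml-sig", "count": 3}

# Packing: a Pickler writes to a file-like object.
buf = io.BytesIO()
pickle.Pickler(buf).dump(original)

# Unpacking: a separate Unpickler reads it back.
buf.seek(0)
restored = pickle.Unpickler(buf).load()
```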


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From gstein@lyra.org  Fri Dec 18 22:59:57 1998
From: gstein@lyra.org (Greg Stein)
Date: Fri, 18 Dec 1998 14:59:57 -0800
Subject: [XML-SIG] Recent CVS changes
References: <199812181651.KAA31392@bitsko.slc.ut.us> <367AB6A8.F2D7B76@prescod.net>
Message-ID: <367ADE6D.357DABD3@lyra.org>

Paul Prescod wrote:
> 
> Ken MacLeod wrote:
> >
> > Another is LDO's XML serialization:
> >
> >   <http://www.ntlug.org/cgi-bin/cvsweb/LDO/ldo-xml.dtd>
> >
> > The DTD itself has basic specs and I hope to complete more docs over
> > Christmas vacation.
> 
> Can you people please explain why we need all of these competing
> proposals? XML-RPC looks like a superset of WDDX (in that it has a concept
> of "method"). It could be described as a superset of WDDX, couldn't it?
> 
> LDO looks like a *subset* of WDDX except for the REF element type.
> 
> Can't we all just get along?

Reality says "no"

I think we would be in error to create a new one, but since those others
are already out there, then (IMO) it is best if we can work with them.
Put politics and ideals aside -- pragmatism says "damn it, I need a
connector because I need to work with XYZ". It would be nice to keep
Python in the game here.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/


From digitome@iol.ie  Sat Dec 19 10:58:44 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Sat, 19 Dec 1998 10:58:44 +0000
Subject: [XML-SIG] Python tutorial at XML Europe '99
Message-ID: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie>

My proposal to present a half day tutorial on
Python at XML Europe '99 in Spain has
been accepted. Python goes mainstream
at GCA SGML/XML Conference. Great!

I will also be doing a half day Python tutorial
at WWW8 where XML will receive more than
a passing mention:-)

The Python/XML combo marches ever onward...

See www.gca.org and www.www8.org for
conference details.

Regards,
Sean

<!ELEMENT turtle (turtle?)>


From fdrake@acm.org  Sat Dec 19 15:38:07 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 19 Dec 1998 10:38:07 -0500 (EST)
Subject: [XML-SIG] Python tutorial at XML Europe '99
In-Reply-To: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie>
References: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie>
Message-ID: <13947.51295.280708.515434@weyr.cnri.reston.va.us>

Sean Mc Grath writes:
 > My proposal to present a half day tutorial on
 > Python at XML Europe '99 in Spain has
 > been accepted. Python goes mainstream
 > at GCA SGML/XML Conference. Great!

  Congratulations!

 > I will also be doing a half day Python tutorial
 > at WWW8 where XML will receive more than
 > a passing mention:-)

  Sounds like you've been busy!


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From ken@bitsko.slc.ut.us  Sat Dec 19 16:55:22 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 19 Dec 1998 10:55:22 -0600
Subject: [XML-SIG] Recent CVS changes
In-Reply-To: Paul Prescod's message of Fri, 18 Dec 1998 14:10:16 -0600
References: <199812181651.KAA31392@bitsko.slc.ut.us> <367AB6A8.F2D7B76@prescod.net>
Message-ID: <m3vhj8ysud.fsf@biff.bitsko.slc.ut.us>

Paul Prescod <paul@prescod.net> writes:

> Ken MacLeod wrote:
> > 
> > Another is LDO's XML serialization:
> > 
> >   <http://www.ntlug.org/cgi-bin/cvsweb/LDO/ldo-xml.dtd>
> > 
> > The DTD itself has basic specs and I hope to complete more docs over
> > Christmas vacation.
> 
> Can you people please explain why we need all of these competing
> proposals? XML-RPC looks like a superset of WDDX (in that it has a concept
> of "method"). It could be described as a superset of WDDX, couldn't it? 
> 
> Can't we all just get along?

Serialization in LDO is modular, and LDO includes binary and XML
serialization specs that are a ``best fit'' for how LDO handles
distributed objects.  Python's `pickle' and Perl's `Storable' also
work well within LDO for python-to-python or perl-to-perl messages.

I would be glad to support WDDX serialization too, or in place of
LDO's XML serialization, but it's not a ``best fit'' for LDO right
now, in part because it's not specified how to handle binary values
(using base64 for example), null values are explicitly unsupported,
there's no type or class attributes, no support for object references,
and no support for non-string keys in dictionaries (structures).

> LDO looks like a *subset* of WDDX except for the REF element type. 

LDO's XML serialization may have fewer tags, but it does support all
the semantics described above.  I would say it is actually a superset,
because everything in WDDX can be encoded in LDO's XML serialization,
but the reverse is not true.

-- 
  Ken MacLeod
  ken@bitsko.slc.ut.us


From gwachob@aimnet.com  Sat Dec 19 20:28:01 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Sat, 19 Dec 1998 12:28:01 -0800 (PST)
Subject: [XML-SIG] Recent CVS changes
In-Reply-To: <m3vhj8ysud.fsf@biff.bitsko.slc.ut.us>
Message-ID: <Pine.GSO.4.05.9812191211440.26128-100000@shell1.ncal.verio.com>

On 19 Dec 1998, Ken MacLeod wrote:

> Paul Prescod <paul@prescod.net> writes:
> 
> > Ken MacLeod wrote:
> > LDO looks like a *subset* of WDDX except for the REF element type. 
> 
> LDO's XML serialization may have fewer tags, but it does support all
> the semantics described above.  I would say it is actually a superset,
> because everything in WDDX can be encoded in LDO's XML serialization,
> but the reverse is not true.

Having worked a little with Ken on the LDO/Python stuff as well as the
WDDX stuff, I must say that they do serve very similar functions. Ken's
stuff I think has more "requirements" and thus is a little more
complicated. WDDX is simpler, and is much easier to implement (that's not a
knock against Ken's work -- his work is more ambitious, IMHO). 

The one thing I would say is that Ken's LDO specification relies more on
the processes at each end of the wire to decode what the information
traveling over the wire means in a semantic sense. LDO explicitly has no
concept of type (which leads to some thorny issues ;-), whereas WDDX has
hints or outright imposition of type information. 

I look at LDO as the "XML" of serialization, whereas WDDX is more like the
"HTML" of serialization (in that LDO can be used for more different
things, but it requires more work on the processing ends by application
writers).

If that isn't flame bait, I don't know what is ;-)

I like both. <cheek><tongue>Why can't we all get along!</tongue></cheek>
One will probably be used more widely than the other. I think LDO is more
consistent, but I think WDDX is obviously easier to use. One is written by
a really bright guy for a great opensource project (Casbah -
http://www.ntlug.org/casbah), one is written by a well-known
application-server company who have a lot of recognition. I don't know
which one will survive (hell, maybe they *both* will -- that'd be ok)

	-Gabe

 
-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From jtauber@jtauber.com  Sun Dec 20 06:21:25 1998
From: jtauber@jtauber.com (James Tauber)
Date: Sun, 20 Dec 1998 14:21:25 +0800
Subject: [XML-SIG] Python tutorial at XML Europe '99
Message-ID: <00c201be2be1$4988f160$0300000a@othniel.cygnus.uwa.edu.au>

-----Original Message-----
From: Sean Mc Grath <digitome@iol.ie>
>I will also be doing a half day Python tutorial
>at WWW8 where XML will receive more than
>a passing mention:-)

And I will be doing a full day XML tutorial at WWW8 where Python will
receive mention :-)

>The Python/XML combo marches ever onward...

Indeed.

James
--
James Tauber / jtauber@jtauber.com / www.jtauber.com
Associate Researcher, Electronic Commerce Network
Curtin University of Technology, Perth, Western Australia

Maintainer of : www.xmlinfo.com,  www.xmlsoftware.com and www.schema.net


From larsga@ifi.uio.no  Sun Dec 20 12:02:55 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 20 Dec 1998 13:02:55 +0100
Subject: [XML-SIG] Python tutorial at XML Europe '99
In-Reply-To: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie>
References: <3.0.6.32.19981219105844.0092b320@gpo.iol.ie>
Message-ID: <wkogoznhqo.fsf@ifi.uio.no>

* Sean Mc Grath
|
| My proposal to present a half day tutorial on Python at XML Europe
| '99 in Spain has been accepted. 

Cool! :)

For my own part, I will give a full-day tutorial on XML processing
(sort of an expansion of the workshop Paul, Geir Ove and I did at
SGML/XML Norway '98 a couple of weeks ago) at the same conference. Of
course, Python will receive more than a passing mention.

--Lars M.


From larsga@ifi.uio.no  Sun Dec 20 14:29:45 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 20 Dec 1998 15:29:45 +0100
Subject: [XML-SIG] Perl and character encodings
In-Reply-To: <36750A0B.EBEB7355@prescod.net>
References: <36750A0B.EBEB7355@prescod.net>
Message-ID: <wklnk2opie.fsf@ifi.uio.no>

* Paul Prescod [quoting an XML::Parser release announcement]
|
| > The major new feature is access to character set encodings other than
| > expat's built-in set (UTF-8, UTF-16, ISO-8859-1, US-ASCII). This is done
| > through binary character encoding maps appearing in the pathlist
| > represented by @XML::Parser::Expat::Encoding_Path. 

Just for the record: xmlproc has something similar in its charconv
module. This module is currently not used by the parser, but modifying
xmlproc to use it is a very simple job. I've not given these changes
priority, since the conversions that are not simple mappings that can
be handled by string.translate are way too slow (and these are of
course the most interesting ones, such as utf-8 -> iso-8859-1 and vice
versa).
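
For reference, the non-trivial conversion mentioned above can be
sketched as follows.  This uses the codecs built into present-day
Python rather than xmlproc's charconv module, whose interface is not
shown here; the function names are made up for the example:

```python
def utf8_to_latin1(data, errors="strict"):
    """Convert UTF-8 encoded bytes to ISO-8859-1 encoded bytes.

    Characters that do not exist in ISO-8859-1 raise UnicodeEncodeError
    unless another error handler (for example "replace") is given.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors)


def latin1_to_utf8(data):
    """Convert ISO-8859-1 encoded bytes to UTF-8 encoded bytes."""
    return data.decode("iso-8859-1").encode("utf-8")
```

Unlike a one-to-one byte table, the UTF-8 direction has to consume a
variable number of input bytes per character, which is why it cannot
be done with a plain translation table.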

Martin von Löwis' module looks like it has some stuff I can use, so
this may appear soon if anyone wants it enough to ask for it (or if I
one day feel like making it).

If anyone else feels like having a go at this, then feel free.

--Lars M.


From Sjoerd.Mullender@cwi.nl  Mon Dec 21 10:35:19 1998
From: Sjoerd.Mullender@cwi.nl (Sjoerd Mullender)
Date: Mon, 21 Dec 1998 11:35:19 +0100
Subject: [XML-SIG] New version of xmllib
In-Reply-To: Your message of Fri, 18 Dec 1998 18:33:02 +0100.
 <UTC199812181733.SAA10767.sjoerd@bireme.cwi.nl>
References: <UTC199812181733.SAA10767.sjoerd@bireme.cwi.nl>
Message-ID: <UTC199812211035.LAA21684.sjoerd@bireme.cwi.nl>

On Fri, Dec 18 1998 Sjoerd Mullender wrote:

> Here is my current version of xmllib.py and the documentation.  This
> version has some API changes with respect to the version currently in
> Python (also the one in 1.5.2a).
> This version supports XML namespaces.

And here is a patch to this version.  There are two improvements:

- Fixed a bug where a syntax error was reported when a document
  started with white space.  (White space at the start of a document
  is valid if there is no XML declaration.)
- Improved the speed quite a bit for documents that don't make use of
  namespaces.
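
The first fix boils down to testing the leading data against a
white-space pattern before reporting an error.  A stand-alone sketch
of that test; the pattern below is an assumed stand-in for the one
named `space` in xmllib.py, and the helper name is hypothetical:

```python
import re

# XML white space: space, tab, carriage return, line feed.
# Assumed equivalent of the `space` pattern used in xmllib.py.
space = re.compile(r"[ \t\r\n]+$")


def illegal_at_start(data):
    """Return True if `data` is not allowed before the first tag."""
    return space.match(data) is None
```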

-- Sjoerd Mullender <Sjoerd.Mullender@cwi.nl>
   <URL:http://www.cwi.nl/~sjoerd/>

Index: xmllib.py
===================================================================
RCS file: /ufs/sjoerd/.CVSroot/mm/demo/pylib/xmllib.py,v
retrieving revision 1.24
diff -u -r1.24 xmllib.py
--- xmllib.py	1998/12/18 17:33:50	1.24
+++ xmllib.py	1998/12/21 10:25:29
@@ -100,6 +100,7 @@
         self.__at_start = 1
         self.__seen_doctype = None
         self.__seen_starttag = 0
+        self.__use_namespaces = 0
         self.__namespaces = {'xml':None}   # xml is implicitly declared
 
     # For derived classes only -- enter literal mode (CDATA) till EOF
@@ -183,10 +184,10 @@
             else:
                     j = n
             if i < j:
-                if self.__at_start:
+                data = rawdata[i:j]
+                if self.__at_start and space.match(data) is None:
                     self.syntax_error('illegal data at start of file')
                 self.__at_start = 0
-                data = rawdata[i:j]
                 if not self.stack and space.match(data) is None:
                     self.syntax_error('data not in content')
                 if illegal.search(data):
@@ -439,6 +440,7 @@
         name = res.group(0)
         if name == 'xml:namespace':
             self.syntax_error('old-fashioned namespace declaration')
+            self.__use_namespaces = -1
             # namespace declaration
             # this must come after the <?xml?> declaration (if any)
             # and before the <!DOCTYPE> (if any).
@@ -489,6 +491,8 @@
                 # namespace declaration
                 ncname = res.group('ncname')
                 namespace[ncname or ''] = attrvalue or None
+                if not self.__use_namespaces:
+                    self.__use_namespaces = len(self.stack)+1
                 continue
             if '<' in attrvalue:
                 self.syntax_error("`<' illegal in attribute value")
@@ -518,7 +522,10 @@
         k, j = tag.span('attrs')
         attrdict, nsdict, k = self.parse_attributes(tagname, k, j)
         self.stack.append((tagname, nsdict, nstag))
-        res = qname.match(tagname)
+        if self.__use_namespaces:
+            res = qname.match(tagname)
+        else:
+            res = None
         if res is not None:
             prefix, nstag = res.group('prefix', 'local')
             if prefix is None:
@@ -535,27 +542,28 @@
                 nstag = prefix + ':' + nstag # undo split
             self.stack[-1] = tagname, nsdict, nstag
         # translate namespace of attributes
-        nattrdict = {}
-        for key, val in attrdict.items():
-            res = qname.match(key)
-            if res is not None:
-                aprefix, key = res.group('prefix', 'local')
-                if aprefix is None:
-                    aprefix = ''
-                ans = None
-                for t, d, nst in self.stack:
-                    if d.has_key(aprefix):
-                        ans = d[aprefix]
-                if ans is None and aprefix != '':
-                    ans = self.__namespaces.get(aprefix)
-                if ans is not None:
-                    key = ans + ' ' + key
-                elif aprefix != '':
-                    key = aprefix + ':' + key
-                elif ns is not None:
-                    key = ns + ' ' + key
-            nattrdict[key] = val
-        attrdict = nattrdict
+        if self.__use_namespaces:
+            nattrdict = {}
+            for key, val in attrdict.items():
+                res = qname.match(key)
+                if res is not None:
+                    aprefix, key = res.group('prefix', 'local')
+                    if aprefix is None:
+                        aprefix = ''
+                    ans = None
+                    for t, d, nst in self.stack:
+                        if d.has_key(aprefix):
+                            ans = d[aprefix]
+                    if ans is None and aprefix != '':
+                        ans = self.__namespaces.get(aprefix)
+                    if ans is not None:
+                        key = ans + ' ' + key
+                    elif aprefix != '':
+                        key = aprefix + ':' + key
+                    elif ns is not None:
+                        key = ns + ' ' + key
+                nattrdict[key] = val
+            attrdict = nattrdict
         attributes = self.attributes.get(nstag)
         if attributes is not None:
             for key in attrdict.keys():
@@ -634,6 +642,8 @@
                 self.handle_endtag(nstag, method)
             else:
                 self.unknown_endtag(nstag)
+            if self.__use_namespaces == len(self.stack):
+                self.__use_namespaces = 0
             del self.stack[-1]
 
     # Overridable -- handle xml processing instruction


From Milan.Hemzal@pvt.cz  Mon Dec 21 15:02:38 1998
From: Milan.Hemzal@pvt.cz (Hemžal Milan)
Date: Mon, 21 Dec 1998 16:02:38 +0100
Subject: [XML-SIG] (no subject)
Message-ID: <6CD0F60F48F9D1119E4B0000F87A9AE2287724@p40w13.plz.pvt.cz>


From gwachob@aimnet.com  Tue Dec 22 02:06:04 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Mon, 21 Dec 1998 18:06:04 -0800 (PST)
Subject: [XML-SIG] Simple WDDX Serialization
Message-ID: <Pine.GSO.4.05.9812211759550.15617-100000@shell1.ncal.verio.com>

OK, I have not been following the serialization thread very closely.

I want to put together a simple WDDX serializer, and I want to throw out
my idea to see if anyone can see any major problems.

Basically, serialization is easy for most objects.

Tuples, Arrays -> WDDX Arrays
Objects -> Structs (obviously, skipping methods)
Number -> Numbers
String -> String
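
The mapping above can be sketched as below.  The function names are
hypothetical, the element names follow my reading of the WDDX DTD, and
two cases not in the list are added as assumptions: booleans (which
would otherwise be written as numbers) and dictionaries, which are
treated like objects:

```python
import xml.etree.ElementTree as ET


def to_wddx_element(value):
    """Map one Python value to a WDDX element (hypothetical helper)."""
    # bool must be tested before int/float because bool subclasses int.
    if isinstance(value, bool):
        return ET.Element("boolean", value="true" if value else "false")
    if isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = repr(value)
        return elem
    if isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
        return elem
    if isinstance(value, (list, tuple)):
        elem = ET.Element("array", length=str(len(value)))
        for item in value:
            elem.append(to_wddx_element(item))
        return elem
    # Dictionaries and plain instances both become structs; for an
    # instance only its attributes (vars()) are written, not its methods.
    fields = value if isinstance(value, dict) else vars(value)
    elem = ET.Element("struct")
    for name, item in fields.items():
        var = ET.SubElement(elem, "var", name=str(name))
        var.append(to_wddx_element(item))
    return elem


def dumps(value):
    """Serialize one whole object into a WDDX packet string."""
    packet = ET.Element("wddxPacket", version="1.0")
    ET.SubElement(packet, "header")
    ET.SubElement(packet, "data").append(to_wddx_element(value))
    return ET.tostring(packet, encoding="unicode")
```

Note that dateTime and recordset are deliberately left out of this
sketch.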

For the dateTime WDDX type, I am thinking either 1) do pattern matching on
strings to determine if they are valid time/dates -- if so, make them
dateTime WDDX elements, or 2) if a string begins with a magic code, then
the rest of the string is interpreted as a dateTime element. We could also
have a flag in the serializer which turns on or off serialization into
dateTime globally for the serialization of a particular object.

I'm thinking that the serializer would only serialize a whole object at a
time (ie it would not allow for "building" WDDX packets programmatically)

Thoughts? Bumps in the road? 

	-Gabe


-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From gwachob@aimnet.com  Tue Dec 22 02:17:15 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Mon, 21 Dec 1998 18:17:15 -0800 (PST)
Subject: [XML-SIG] More on WDDX Serialization
Message-ID: <Pine.GSO.4.05.9812211814320.16588-100000@shell1.ncal.verio.com>

Oh yeah, the thorny Recordset issue.

I guess the rule would be if a dictionary contains a number of keys which
map to arrays of equal size, then the dictionary should be encoded as a
recordset (but this also requires that the arrays of equal size contain
only "simple" types -- otherwise, we are to use an array of structures
(have to think about this one)). 
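
A rough sketch of that rule, with a hypothetical function name and an
assumed notion of "simple" (booleans, numbers and strings):

```python
SIMPLE_TYPES = (bool, int, float, str)


def looks_like_recordset(value):
    """True if `value` is a dict of equal-length columns of simple values."""
    if not isinstance(value, dict) or not value:
        return False
    columns = list(value.values())
    # Every value must be a list or tuple ...
    if not all(isinstance(col, (list, tuple)) for col in columns):
        return False
    # ... all of the same length ...
    if len({len(col) for col in columns}) != 1:
        return False
    # ... and every cell must be a simple type, so each one is visited.
    return all(isinstance(item, SIMPLE_TYPES)
               for col in columns for item in col)
```

Anything that fails the test would fall back to an ordinary struct of
arrays.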

There are also issues about enforcing the distinguishability of recordset
field names (that's not too difficult). 

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From paul@prescod.net  Tue Dec 22 20:54:02 1998
From: paul@prescod.net (Paul Prescod)
Date: Tue, 22 Dec 1998 14:54:02 -0600
Subject: [XML-SIG] Simple WDDX Serialization
References: <Pine.GSO.4.05.9812211759550.15617-100000@shell1.ncal.verio.com>
Message-ID: <368006EA.C3328307@prescod.net>

Gabe Wachob wrote:
> 
> OK, I have not been following the serialization thread very closely.
> 
> I want to put together a simple WDDX serializer, and I want to throw out
> my idea to see if anyone can see any major problems.

You should probably build on the work that Andrew Kuchling is doing in his
"universal serializer."

> String -> String
> 
> For the dateTime WDDX type, I am thinking either 1) do pattern matching on
> strings to determine if they are valid time/dates -- if so, make them
> dateTime WDDX elements, or 2) if a string begins with a magic code, then
> the rest of the string is interpreted as a dateTime element. 

Autosensing of either sort seems dangerous. Also, Python dates can be
encoded as integers and tuples (see the time module for more information).
What we need is to ship some particular date/time class with the XML
package and require people to use it on both input and output.

> We could also
> have a flag in the serializer which turns on or off serialization into
> dateTime globally for the serialization of a particular object.

I'm not sure what you mean.

> I'm thinking that the serializer would only serialize a whole object at a
> time (ie it would not allow for "building" WDDX packets programmatically)

That sounds fine.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Are the social and economic benefits of capital punishment sufficient
to outweigh the injustice of accidentally executing innocents?"
"What benefits???"


From akuchlin@cnri.reston.va.us  Tue Dec 22 21:17:13 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue, 22 Dec 1998 16:17:13 -0500 (EST)
Subject: [XML-SIG] Simple WDDX Serialization
In-Reply-To: <Pine.GSO.4.05.9812211759550.15617-100000@shell1.ncal.verio.com>
References: <Pine.GSO.4.05.9812211759550.15617-100000@shell1.ncal.verio.com>
Message-ID: <13952.2782.43087.9551@amarok.cnri.reston.va.us>

Gabe Wachob writes:
>Tuples, Arrays -> WDDX Arrays
>Objects -> Structs (obviously, skipping methods)
>Number -> Numbers
>String -> String

Dictionaries -> Structs would be another possibility. 

>For the dateTime WDDX type, I am thinking either 1) do pattern matching on
>strings to determine if they are valid time/dates -- if so, make them
>dateTime WDDX elements, or 2) if a string begins with a magic code, then
>the rest of the string is interpreted as a dateTime element. We could also

For dateTime, we would really need a standard date/time object,
included in either the Python standard library or in the XML package.
Instances of this object would then become dateTime elements in the
generated WDDX.
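
A sketch of what that could look like, assuming the date/time object
is Python's standard datetime.datetime and that ISO 8601 text is what
a WDDX dateTime element should contain.  The helper name is
hypothetical:

```python
import datetime
import xml.etree.ElementTree as ET


def datetime_to_wddx(value):
    """Turn a datetime instance into a WDDX dateTime element."""
    if not isinstance(value, datetime.datetime):
        raise TypeError("expected datetime.datetime, got %r" % type(value))
    elem = ET.Element("dateTime")
    # ISO 8601 text such as 1998-12-22T16:17:13; strings are never
    # inspected, so no autosensing is involved.
    elem.text = value.isoformat(timespec="seconds")
    return elem
```

The type of the object alone decides the element, so ordinary strings
that happen to look like dates stay strings.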

For record sets, I haven't thought up anything yet, but I like your
idea of a dictionary of keys mapping to equal-sized lists.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The world is full of people whose notion of a satisfactory future is, in fact,
a return to an idealised past.
    -- Robertson Davies, _A Voice from the Attic_


From gwachob@aimnet.com  Wed Dec 23 01:30:43 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Tue, 22 Dec 1998 17:30:43 -0800 (PST)
Subject: [XML-SIG] Simple WDDX Serialization
In-Reply-To: <368006EA.C3328307@prescod.net>
Message-ID: <Pine.GSO.4.05.9812221728410.3709-100000@shell1.ncal.verio.com>

On Tue, 22 Dec 1998, Paul Prescod wrote:

> Gabe Wachob wrote:
> > 
> > OK, I have not been following the serialization thread very closely.
> > 
> > I want to put together a simple WDDX serializer, and I want to throw out
> > my idea to see if anyone can see any major problems.
> 
> You should probably build on the work that Andrew Kuchling is doing in his
> "universal serializer."

I saw mentions of this, but I haven't seen it. Pointers? 

> 
> > String -> String
> > 
> > For the dateTime WDDX type, I am thinking either 1) do pattern matching on
> > strings to determine if they are valid time/dates -- if so, make them
> > dateTime WDDX elements, or 2) if a string begins with a magic code, then
> > the rest of the string is interpreted as a dateTime element. 
> 
> Autosensing of either sort seems dangerous. Also, Python dates can be
> encoded as integers and tuples (see the time module for more information).
> What we need is to ship some particular date/time class with the XML
> package and require people to use it on both input and output.

Well, that's fine -- I'm just trying to suggest something I can do now... 

> > We could also
> > have a flag in the serializer which turns on or off serialization into
> > dateTime globally for the serialization of a particular object.
> 
> I'm not sure what you mean.

Not important -- basically to allow globally for NOT doing the
autosensing. 

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From gwachob@aimnet.com  Wed Dec 23 01:37:36 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Tue, 22 Dec 1998 17:37:36 -0800 (PST)
Subject: [XML-SIG] Simple WDDX Serialization
In-Reply-To: <13952.2782.43087.9551@amarok.cnri.reston.va.us>
Message-ID: <Pine.GSO.4.05.9812221731300.3709-100000@shell1.ncal.verio.com>

On Tue, 22 Dec 1998, Andrew M. Kuchling wrote:

> Gabe Wachob writes:
> >Tuples, Arrays -> WDDX Arrays
> >Objects -> Structs (obviously, skipping methods)
> >Number -> Numbers
> >String -> String
> 
> Dictionaries -> Structs would be another possibility. 

I think I mentioned that in another email.. If I didn't then oops. 

> >For the dateTime WDDX type, I am thinking either 1) do pattern matching on
> >strings to determine if they are valid time/dates -- if so, make them
> >dateTime WDDX elements, or 2) if a string begins with a magic code, then
> >the rest of the string is interpreted as a dateTime element. We could also
> 
> For dateTime, we would really need a standard date/time object,
> included in either the Python standard library or in the XML package.
> Instances of this object would then become dateTime elements in the
> generated WDDX.

Hey, I'm all for a dateTime object in the Python lib... However, isn't the
point of writing a WDDX serializer to make WDDX *transparent*? That is,
don't you want to eliminate special effort on the part of the Python
programmer in composing WDDX packets from Python entities? 

It seems unclean to have the WDDX serializer be transparent *except* for
the dateTime object -- perhaps this is WDDX's fault (dateTime seems to me
to be a higher level abstraction than String, Number, Array, etc). 

> For record sets, I haven't thought up anything yet, but I like your
> idea of a dictionary of keys mapping to equal-sized lists.

I see this as an unavoidable kludge, actually. The problem is that the array
elements have to consist solely of "simple" types (according to the DTD).
That means that to "autodetect" that a dictionary should be mapped to a
recordset, we need to figure out the type of every element in every array
in the dictionary. Now, I suppose this may not be a big issue if we assume
that the data structures involved are not too complex or large (a valid
assumption given the type of applications likely to use WDDX, I would
think).

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From paul@prescod.net  Wed Dec 23 05:19:29 1998
From: paul@prescod.net (Paul Prescod)
Date: Tue, 22 Dec 1998 23:19:29 -0600
Subject: [XML-SIG] Simple WDDX Serialization
References: <Pine.GSO.4.05.9812221731300.3709-100000@shell1.ncal.verio.com>
Message-ID: <36807D61.13A47F5C@prescod.net>

Gabe Wachob wrote:
> 
> It seems unclean to have the WDDX serializer be transparent *except* for
> the dateTime object -- perhaps this is WDDX's fault (dateTime seems to me
> to be a higher level abstraction than String, Number, Array, etc).

WDDX is not going to be transparent unless it handles instances and none
of the implementations handle those yet. I can't remember the last time I
created a Python data structure that consisted of only dictionaries,
tuples, lists and other built-in types. Anyhow, there is no way to make
date/time handling transparent in Python until there is a Python date/time
class.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"In spite of everything I still believe that people are basically 
good at heart." - Anne Frank


From fm@synchrologic.com  Sun Dec 27 04:24:27 1998
From: fm@synchrologic.com (Frank McGeough)
Date: Sat, 26 Dec 1998 23:24:27 -0500
Subject: [XML-SIG] Running XML on NT
Message-ID: <000c01be3150$cb73ef30$289b90d1@frank_home.synchrologic.com>

Hi,

Is it possible to run the test release of XML
on NT? I downloaded the software from :
http://www.python.org/topics/xml/download.html

The README says to run make. I don't have a
Unix style make. Is there a version that would
work with Microsoft's nmake and VC compiler?

Thanks,
Frank

Synchrologic, Inc.
http://www.synchrologic.com
T: 770.754.5600
F: 770.619.5612


From akuchlin@cnri.reston.va.us  Sun Dec 27 16:30:28 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 27 Dec 1998 11:30:28 -0500
Subject: [XML-SIG] Marshalling
In-Reply-To: <13946.24429.905213.372579@amarok.cnri.reston.va.us>
References: <199812180422.XAA00960@207-172-59-23.s277.tnt2.ann.erols.com>
 <3679F613.68A22D40@prescod.net>
 <13946.24429.905213.372579@amarok.cnri.reston.va.us>
Message-ID: <199812271630.LAA14883@207-172-46-235.s235.tnt9.ann.erols.com>

I've been working on the XML marshalling some more, and have
implemented handling of Python instances.  In the generic module, 
instances are marshalled as:

<object module="__main__" class="A">
  <tuple>... init args</tuple>
  <dictionary> ... contents of __dict__ ... </dictionary>
</object>
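
Roughly how such an element could be produced.  This is a sketch
under assumptions, not the module's actual code: the function name
and the marshal_value callback are hypothetical, init args are taken
from __getinitargs__ as pickle does, and the dictionary is assumed to
hold alternating key and value elements:

```python
import xml.etree.ElementTree as ET


def marshal_instance(obj, marshal_value):
    """Build the <object> element for the instance `obj`.

    `marshal_value` is an assumed callback that turns any other
    Python value into an element.
    """
    cls = type(obj)
    # "class" is a Python keyword, so pass the attributes as a dict.
    elem = ET.Element("object",
                      {"module": cls.__module__, "class": cls.__name__})
    if hasattr(obj, "__getinitargs__"):
        init_args = obj.__getinitargs__()
    else:
        init_args = ()
    args = ET.SubElement(elem, "tuple")
    for arg in init_args:
        args.append(marshal_value(arg))
    contents = ET.SubElement(elem, "dictionary")
    for key, value in vars(obj).items():
        contents.append(marshal_value(key))
        contents.append(marshal_value(value))
    return elem
```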

I don't know what to do for WDDX and XML-RPC, if anything.

Earlier, I wrote:
 > 	Probably data_stack shouldn't be an attribute of the class,
 > but be passed to each of the unmarshalling functions.  That would mean
 > that the Marshaller class would have no mutable attributes at all, and
 > the self.__class__ thing would be unnecessary.

	Unfortunately, I realized this isn't possible because I'm
using SAX to parse the XML when unmarshalling, and that gives me no
way to pass in an additional argument.  So the self.__class__() hack
has to stay.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Perhaps they are leaving the village. They are going up to the high place, to
wait there for the end of their world. And here in my room (I will be fifty
soon. I wonder if I will see that birthday, if I will be here to
celebrate?)... all alone, I am going with them.
    -- The director's last screenplay in SIGNAL TO NOISE


From akuchlin@cnri.reston.va.us  Sun Dec 27 17:07:42 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 27 Dec 1998 12:07:42 -0500
Subject: [XML-SIG] Namespace support for DOM
Message-ID: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com>

After reading over the current namespace working draft, I thought a
little bit about how PyDOM should support it.  I'd like to hear
opinions on this...

       Thought 1: the idea of walking over the whole tree and
annotating it is bad, because if you modify the tree, the annotations
become outdated and you have to recompute them.  

       Thought 2: similar reasoning applies to modifying element or
attribute names by removing the namespace prefix.

       Thought 3: therefore, the better course of action is to have
functions or methods that dynamically compute what namespaces apply by
looking at a node's ancestors.

       Thought 4: looking at the draft, the bits that are needed are
functions or methods to do the following:

	  1) Get a dictionary mapping namespace prefixes to URIs, and
vice versa; this would be done by walking up the tree looking at
xmlns:* attributes.

	  2) Get the default namespace (might be prefix = "" in the
dictionary returned from the previous function)

	  3) Divide an element or attribute name into the prefix and
the rest of the name.

This means that namespace-using applications won't have everything
done for them; Python code might look vaguely like:

XSL_URI = "http://www.w3.org/..."
uri = node.get_namespace_mapping()

# Next line assumes node is an Element tag
nsp, name = divide_qualified_name( node.tagName )

if uri[nsp] == XSL_URI:
    # node is a tag in the XSL namespace; react appropriately
    pass

elif uri[nsp] == other_namespace:
    # do something else
    pass
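
The three pieces listed above might be sketched roughly as follows. This
is only an illustration under stated assumptions: it uses the standard
library's xml.dom.minidom as a stand-in for PyDOM, it stores the default
namespace under the prefix "", and the function names simply follow the
hypothetical ones used in the snippet above.

```python
from xml.dom import minidom


def divide_qualified_name(qname):
    """Split 'prefix:local' into (prefix, local); prefix is '' if absent."""
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return prefix, local
    return "", qname


def get_namespace_mapping(node):
    """Collect the prefix -> URI declarations in scope at `node`.

    Walks from the node up through its ancestors.  Because the walk starts
    at the node itself, setdefault() keeps the nearest declaration and
    ignores any outer declaration of the same prefix.
    """
    mapping = {}
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == "xmlns":
                mapping.setdefault("", attr.value)      # default namespace
            elif attr.name.startswith("xmlns:"):
                mapping.setdefault(attr.name[len("xmlns:"):], attr.value)
        node = node.parentNode
    return mapping
```

Nothing is cached on the tree here, so the mapping is recomputed on every
call and cannot go stale when nodes are moved.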

Proposed interfaces need to be tried out by actually implementing
something on top of them, in order to find areas that have been
missed.  Can anyone suggest some namespace-using application that
would be useful as a test case?  It would also provide another demo
application.  The transformation portion of XSL is one candidate, but
I haven't read enough of the XSL draft to get an idea of how big the
job would be.  Anyone know of something small?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The bitterest tears shed over graves are for words left unsaid and for deeds
left undone.
    -- Harriet Beecher Stowe


From gwachob@aimnet.com  Mon Dec 28 03:11:51 1998
From: gwachob@aimnet.com (Gabe Wachob)
Date: Sun, 27 Dec 1998 19:11:51 -0800 (PST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com>
Message-ID: <Pine.GSO.4.05.9812271851550.11931-100000@shell1.ncal.verio.com>

On Sun, 27 Dec 1998, A.M. Kuchling wrote:

> After reading over the current namespace working draft, I thought a
> little bit about how PyDOM should support it.  I'd like to hear
> opinions on this...
> 
>        Thought 1: the idea of walking over the whole tree and
> annotating it is bad, because if you modify the tree, the annotations
> become outdated and you have to recompute them.  

What about an annotation scheme where, when you modify a node, you simply
recompute the namespace annotations for all the nodes in the subtree of
that changed node?

I *think* you can do this efficiently (in other words, the newly changed
node can be scanned to see if it could possibly have an effect on its
children's namespaces). If it doesn't contain any namespace-related
declarations, for example, there shouldn't be any need to update subtree
namespace annotations...

>        Thought 2: similar reasoning applies to modifying element or
> attribute names by removing the namespace prefix.

I think similar reasoning would also apply to my previous comment (though
not sure).

>        Thought 3: therefore, the better course of action is to have
> functions or methods that dynamically compute what namespaces apply by
> looking at a node's ancestors.

I'm not sure if you are suggesting what I mention in my first
response, or whether (as I think) you are suggesting a "get_namespace" (I
assume that's what get_namespace_mapping() is below).

Is there a concise statement of the algorithm for determining the
namespace of an element or attribute somewhere? I have not been able to
find one.. 

> Proposed interfaces need to be tried out by actually implementing
> something on top of them, in order to find areas that have been
> missed.  Can anyone suggest some namespace-using application that
> would be useful as a test case?  It would also provide another demo
> application.  The transformation portion of XSL is one candidate, but
> I haven't read enough of the XSL draft to get an idea of how big the
> job would be.  Anyone know of something small?

Well, to be a nontrivial test, wouldn't we want some app built on
documents using multiple namespaces? I mean, if everything is
xsl:<something>, what's the point -- you'll always get either xml or
"another" namespace (whatever your output namespace is, I guess).

How about something simple with RDF? How about a RDF equality tool? Takes
two RDF XML documents and determines if the two are semantically
equivalent forms? 

<?xml version="1.0"?>
  <RDF
    xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:s="http://description.org/schema/">
    <Description about="http://www.w3.org/Home/Lassila">
      <s:Creator>Ora Lassila</s:Creator>
    </Description>
  </RDF>

 <?xml version="1.0"?>
  <RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#">
    <Description about="http://www.w3.org/Home/Lassila">
      <s:Creator xmlns:s="http://description.org/schema/">Ora
Lassila</s:Creator>
    </Description>
  </RDF>


  <?xml version="1.0"?>
  <RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#">
    <Description about="http://www.w3.org/Home/Lassila">
      <Creator xmlns="http://description.org/schema/">Ora
Lassila</Creator>
    </Description>
  </RDF>

All three of these are "semantically equivalent" (or are they -- I notice
a lack of namespace declarations in the latter two examples?), but not
syntactically equivalent. This would be a more interesting tool if more
than two namespaces (RDF and then multiple schemas) were involved. I don't
know how complicated this would be (I'm not extremely familiar with RDF). If,
as I gather, RDF documents can be treated as directed graphs, it would
seem to me that equivalence shouldn't be too hard a task to take on...
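
A much weaker check than RDF equivalence, but one that shows the
namespace part of the problem, is to compare two documents after every
prefix has been resolved to its URI. The sketch below is an assumption-
laden illustration: it uses the standard library's xml.dom.minidom (which
records namespaceURI and localName at parse time) rather than PyDOM, it
ignores whitespace differences in text, and the function names are made
up for the example.

```python
from xml.dom import minidom

# Namespace that minidom assigns to xmlns / xmlns:* declaration attributes.
XMLNS_URI = "http://www.w3.org/2000/xmlns/"


def expanded_tree(node):
    """Return a prefix-free nested tuple describing `node`, or None."""
    if node.nodeType == node.TEXT_NODE:
        text = " ".join(node.data.split())   # collapse runs of whitespace
        return text or None
    if node.nodeType != node.ELEMENT_NODE:
        return None
    attrs = []
    for i in range(node.attributes.length):
        attr = node.attributes.item(i)
        if attr.namespaceURI != XMLNS_URI:   # skip the declarations themselves
            attrs.append((attr.namespaceURI, attr.localName, attr.value))
    attrs.sort(key=lambda a: (a[0] or "", a[1]))
    children = [expanded_tree(child) for child in node.childNodes]
    return (node.namespaceURI, node.localName, tuple(attrs),
            tuple(c for c in children if c is not None))


def same_after_namespace_resolution(xml_a, xml_b):
    """True if both documents have the same elements once prefixes are resolved."""
    a = minidom.parseString(xml_a).documentElement
    b = minidom.parseString(xml_b).documentElement
    return expanded_tree(a) == expanded_tree(b)
```

Fed the three documents above (without the leading whitespace before the
XML declaration), a comparison of this kind treats "s:Creator" and the
re-defaulted "Creator" as the same element, since both resolve to the
same URI and local name.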

	-Gabe

-------------------------------------------------------------------
http://www.aimnet.com/~gwachob               http://www.findlaw.com
"A popular Government, without popular information, or the means of 
acquiring it, is but a Prologue to a Farce or a Tragedy; or perhaps 
both." -- James Madison 
                       import std.disclaimer


From prescod@prescod.net  Mon Dec 28 14:01:43 1998
From: prescod@prescod.net (Paul)
Date: Mon, 28 Dec 1998 08:01:43 -0600 (CST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com>
Message-ID: <Pine.LNX.3.91.981228074112.23254A-100000@amati.techno.com>

On Sun, 27 Dec 1998, A.M. Kuchling wrote:
> 
>        Thought 3: therefore, the better course of action is to have
> functions or methods that dynamically compute what namespaces apply by
> looking at a node's ancestors.

This is probably the best, mostly because you want your 
namespace-enhanced DOM to be a superset of the regular DOM.

> 	  1) Get a dictionary mapping namespace prefixes to URIs, and
> vice versa; this would be done by walking up the tree looking at
> xmlns:* attributes.

I don't think that the programmer needs access to this dictionary. 
Internally you need it, but I don't think that the programmer should.

> This means that namespace-using applications won't have everything
> done for them; 

Why not?

> Python code might look vaguely like:
> 
> XSL_URI = "http://www.w3.org/..."
> uri = node.get_namespace_mapping()
> 
> # Next line assumes node is an Element tag
> nsp, name = divide_qualified_name( node.tagName )
> 
> if uri[nsp] == XSL_URI:

I think that this would be better:

uri, name = namespace_divide( node.tagName )

You can do the lookup internally, whether you use a dictionary or a walk 
up the tree is your business.

> would be useful as a test case?  It would also provide another demo
> application.  The transformation portion of XSL is one candidate, but
> I haven't read enough of the XSL draft to get an idea of how big the
> job would be.  Anyone know of something small?

How about an app that rewrote namespace prefixes to some canonical form to 
allow simple diff-ing. So maybe you have a configuration file like this:

<CONFIG>
<NAMESPACE URI="http://www.w3.org/XSL" PREFIX="xsl"/>
<NAMESPACE URI="http://www.w3.org/RDF" PREFIX="rdf"/>
<NAMESPACE URI="http://www.w3.org/XLink" PREFIX="xlink"/>
</CONFIG>

and you would feed in a document like this:

<DOC ...presume namespace declarations here...>
<x-style:element>...</x-style:element>
<metadata:description>...</metadata:description>
<xll:link>...</xll:link>
</DOC>

and would rewrite it (based on the elided namespace declarations and the 
configuration file) as:

<DOC ...presume namespace declarations here...>
<xsl:element>...</xsl:element>
<rdf:description>...</rdf:description>
<xlink:link>...</xlink:link>
</DOC>

This is useful for all of the usual reasons canonicalization is useful: to 
write simpler software that depends on the output instead of 
understanding the input. For instance, if you were writing an RDF 
processor but were too lazy to handle the various requirements of 
namespaces, you would pipe your data through the canonicalizer and do "dumb 
checks" like tagName=="rdf:description".

To be totally useful to programmers and not just as a demo app, it should 
actually transform one DOM into another (or, better, act as a lazy proxy).

I think that this app would use all features of the namespace draft.
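
A minimal sketch of such a prefix canonicalizer, under some loud
assumptions: it uses the standard library's xml.etree.ElementTree instead
of a DOM, the URI-to-prefix table is hard-coded from the example
configuration above rather than read from a file, and the function name
is invented for the illustration. Note that ET.register_namespace() is a
process-wide setting, so this is not suitable as a library routine as
written.

```python
import xml.etree.ElementTree as ET

# URI -> canonical prefix, mirroring the <CONFIG> example above.
CANONICAL_PREFIXES = {
    "http://www.w3.org/XSL": "xsl",
    "http://www.w3.org/RDF": "rdf",
    "http://www.w3.org/XLink": "xlink",
}


def canonicalize_prefixes(xml_text, canonical=CANONICAL_PREFIXES):
    """Re-serialize `xml_text` using the canonical prefix for each known URI."""
    for uri, prefix in canonical.items():
        # Global registration: affects every later ElementTree serialization.
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_text)
    # ElementTree stores names as {uri}local, so the original prefixes are
    # already gone; the serializer picks the registered ones on output.
    return ET.tostring(root, encoding="unicode")
```

The serializer declares all namespaces on the root element, so the output
differs from the input in more than just the prefixes; for diff-ing two
canonicalized documents against each other that should not matter.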

 Paul Prescod


From akuchlin@cnri.reston.va.us  Mon Dec 28 15:15:57 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 28 Dec 1998 10:15:57 -0500 (EST)
Subject: [XML-SIG] Updated XML HOWTO with DOM coverage
Message-ID: <199812281515.KAA19087@amarok.cnri.reston.va.us>

I've added more coverage of how to use PyDOM to the XML HOWTO.  The
new material starts at:
	http://www.python.org/doc/howto/xml/DOM.html

As usual, comments are welcome.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    "I... I did not intend to hurt you."
    "And what if you did not? Intent and outcome are so rarely coincident."
    -- Dream and Larissa, in SANDMAN #65: "The Kindly Ones:9"


From spepping@scaprea.hobby.nl  Tue Dec 29 18:31:23 1998
From: spepping@scaprea.hobby.nl (Simon Pepping)
Date: Tue, 29 Dec 1998 19:31:23 +0100 (MET)
Subject: [XML-SIG] Documentation and problems
Message-ID: <Pine.LNX.3.95.981228125504.724A-100000@scaprea.hobby.nl>

Hi,

I have spent quite some time with the XML package, mainly with the SAX
interface and xmlproc. As a result I have written a(nother) document
about the interaction of an application and a SAX parser, and how to
write a SAX application. I also wrote a simple application to
demonstrate it.

Check it out at http://www.hobby.nl/~scaprea/XML/index.html.

I also made a short list of problems I encountered:

Pr. SAXParseException.__str__ reads:

return "%s at %s:%d:%d" % (self.msg,self.getSystemId(),
    self.getColumnNumber(),self.getLineNumber())

getColumnNumber and getLineNumber should be swapped.
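
With the two calls swapped, the method would read roughly as in the
sketch below. The class here is a hypothetical stand-in written only to
make the example self-contained; it is not the package's actual
SAXParseException, and only the order of the last two arguments is the
point.

```python
class ParseErrorSketch(Exception):
    """Hypothetical stand-in for SAXParseException, for illustration only."""

    def __init__(self, msg, system_id, line, column):
        Exception.__init__(self, msg)
        self.msg = msg
        self._system_id = system_id
        self._line = line
        self._column = column

    def getSystemId(self):
        return self._system_id

    def getLineNumber(self):
        return self._line

    def getColumnNumber(self):
        return self._column

    def __str__(self):
        # Line number first, then column: "message at file:line:column".
        return "%s at %s:%d:%d" % (self.msg, self.getSystemId(),
                                   self.getLineNumber(),
                                   self.getColumnNumber())
```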

========================

Pr. pyexpat does not report the document name with the getSystemId
method:

Document: 
Fatal error: not well-formed at :5:1 (SAXParseException.__str__)

========================

Pr. XMLValidator does not use my error handlers:

ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at ./waarnemingen.dtd:17:25
TEXT: '#PCDATA )>'

Possible cause: XMLValidator.reset() and
XMLValidator.set_dtd_listener().
With these modifications it works, but now the location in the DTD is no
longer reported:

    def reset(self):
        self.dtd=CompleteDTD(ErrorHandler(self.parser))
        # added SP 1998/12/23
        self.dtd.set_dtd_listener(self.parser.dtd_listener)
        # added SP 1998/12/23
        self.dtd.set_error_handler(self.parser.err)
        self.val=ValidatingApp(self.dtd)
        self.val.set_real_app(self.app)
        # added SP 1998/12/23
        self.val.set_error_handler(self.parser.err)

        self.parser.reset()
        self.parser.set_application(self.val)
        self.parser.dtd=self.dtd
        self.parser.ent=self.dtd
        
    def set_dtd_listener(self,dtd_listener):
        self.parser.set_dtd_listener(dtd_listener)
        # added SP 1998/12/23
        self.dtd.set_dtd_listener(dtd_listener)

========================

Pr. drv_xmlproc does not implement a getPublicId method:

    # added SP 1998/12/24
    def getPublicId(self):
        # Hmmm, the parser has no method to get the PubID
        # return self.parser.get_current_pubid()
        return 'unknown'

=========================

Pr. XMLValidator does not accept spaces around #PCDATA as content in
an element type declaration:

<!ELEMENT	period					( #PCDATA )>

ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at ./waarnemingen.dtd:17:25
TEXT: '#PCDATA )>'

=========================

Pr. XMLValidator does not accept the following construction in an
external DTD:

<!ENTITY %  tekst                   "(#PCDATA|taxon|label|opsomming)*">
<!ELEMENT   p                       (%tekst;)>

ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at waarnemingen.dtd:22:38
TEXT: '%tekst;)>
'
(the declaration of p is line 22)

I am not sure whether this is allowed. nsgmls gives the warning:
'#PCDATA in nested model group'. 

I hope this is useful. And thanks for the work you have already put
into this. It generally works fine.

Simon Pepping
email: spepping@scaprea.hobby.nl


From fm@synchrologic.com  Tue Dec 29 20:11:51 1998
From: fm@synchrologic.com (Frank McGeough)
Date: Tue, 29 Dec 1998 15:11:51 -0500
Subject: [XML-SIG] Documentation and problems
Message-ID: <009e01be3367$794b5220$529b90d1@synchrologic.com>

Simon,

In your doc at :
http://www.hobby.nl/~scaprea/XML/t173.html

I believe the

2. Call the parser factory with the name of a known driver module, e.g.,
SAXparser=xml.sax.saxexts.make_parser("xml.sax.drivers.drv_xmlproc")

is incorrect.  The saxexts.py has the following code in it:
parser_name = 'xml.sax.drivers.drv_' + parser_name

therefore you should create the parser with :

SAXparser=xml.sax.saxexts.make_parser("xmlproc")

This may have been a recent change. I just started in with
Python XML stuff. I have downloaded the xml-0_5.zip
version.

Thanks for putting that doc on-line. I found it very helpful.

-----Original Message-----
From: Simon Pepping <simon@scaprea.hobby.nl>
To: Python XML-SIG <xml-sig@python.org>
Date: Tuesday, December 29, 1998 2:56 PM
Subject: [XML-SIG] Documentation and problems


>Hi,
>
>I have spent quite some time with the XML package, mainly with the SAX
>interface and xmlproc. As a result I have written a(nother) document
>about the interaction of an application and a SAX parser, and how to
>write a SAX application. I also wrote a simple application to
>demonstrate it.
>
>Check it out at http://www.hobby.nl/~scaprea/XML/index.html.
>


From dieter@handshake.de  Wed Dec 30 18:39:28 1998
From: dieter@handshake.de (Dieter Maurer)
Date: Wed, 30 Dec 1998 19:39:28 +0100
Subject: [XML-SIG] Experiences with xml-0.5
Message-ID: <199812301839.TAA01200@lindm.dm>

This is a multi-part MIME message.
--------------FC5583E803777E8ABB8C4995
Content-Type: text/plain; charset=iso-8859-1

Based on our xml-0.5 release, I have made a small tool which adds
a hierarchical content table to HTML documents:

	URL:http://www.handshake.de/~dieter/pyprojects/addContentTable.html

I encountered three bugs:

 1. "xml.dom.core.Document"s methods "get_firstChild" and
    "get_lastChild" (inherited from "xml.dom.core.Node")
    fail to initialize the "ownerDocument" in the children
    correctly (patch attached).

 2. "xml.dom.write.OutputStream.write" folds successive '\n'
    into a single '\n' (i.e. it eliminates empty lines).
    This is bad for preformatted elements (patch attached).

 3. The "NodeList" returned by "get_childNodes" is live (as
    required by the standard).
    This can make children processing a bit hazardous (the downside
    of liveness), e.g.

	f= dom.createDocumentFragment()
	for c in node.childNodes: f.appendChild(c)
   
        will *NOT* put all children of "node" into "f" (it does for
	about every second, and leaves the remaining children)
	because the list is modified as a side effect.

	This is a well known problem with Python's for loop.
	However, the standard workaround (using a slice copy
	of the list) does not work in this case, because
	"NodeList[:]" does not yield a NodeList but rather
	a "_nodeData".
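
The effect described in item 3, and one workaround that does not need a
slice copy, can be sketched as follows. This assumes the standard
library's xml.dom.minidom as a stand-in for PyDOM (its child list is also
updated as nodes are moved); the two function names are invented for the
example.

```python
from xml.dom import minidom


def move_children_naive(src, dst):
    """Buggy on purpose: iterates the child list while appendChild shrinks it."""
    for child in src.childNodes:
        dst.appendChild(child)   # also removes `child` from src.childNodes


def move_children(src, dst):
    """Move every child by repeatedly taking whatever is currently first."""
    while src.firstChild is not None:
        dst.appendChild(src.firstChild)
```

The second version never holds an index into the list, so it does not
care that the list changes underneath it.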

Dieter

--------------FC5583E803777E8ABB8C4995
Content-Type: application/x-patch; name="docowner.pat"
Content-Description: Patch to provide "xml.dom.core.Document" its own
		     implementation of "get_firstChild" and "get_lastChild"
		     correctly initializing "ownerDocument" of the
		     children.

--- :core.py	Tue Dec 29 10:45:25 1998
+++ core.py	Tue Dec 29 14:59:35 1998
@@ -1041,6 +1041,27 @@
     def get_childNodes(self):
         return NodeList(self._node.children, self, self)
 
+    ## DM: the inherited method fails to set "._document" correctly
+    def get_firstChild(self):
+        """Return the first child of this node. If there is no such node, this
+        returns null."""
+
+        if self._node.children:
+            n = self._node.children[0]
+            return NODE_CLASS[ n.type ] (n, self, self)
+        else:
+            return None
+
+    ## DM: the inherited method fails to set "._document" correctly
+    def get_lastChild(self):
+        """Return the last child of this node. If there is no such node, this
+        returns null."""
+        if self._node.children:
+            n = self._node.children[-1]
+            return NODE_CLASS[ n.type ] (n, self, self)
+        else:
+            return None
+
     def get_documentElement(self):
         """Return the root element of the Document object, or None
         if there is no root element."""


--------------FC5583E803777E8ABB8C4995
Content-Type: application/x-patch; name="emptyline.pat"
Content-Description: Patch to remove empty line removal in "xml.dom.writer"

--- :writer.py	Tue Dec 29 10:45:27 1998
+++ writer.py	Wed Dec 30 11:51:50 1998
@@ -16,7 +16,9 @@
 
 	def write(self, s):
 		#print 'write', `s`
-		self.file.write(re.sub('\n+', '\n', s))
+		#self.file.write(re.sub('\n+', '\n', s))
+	        # removing newlines is not a good idea for 'pre', e.g.
+		self.file.write(s)
 		if s and s[-1] == '\n':
 			self.new_line = 1
 		else:


--------------FC5583E803777E8ABB8C4995--


From pas@xis.xerox.com  Tue Dec 29 21:42:54 1998
From: pas@xis.xerox.com (Perry Stoll)
Date: Tue, 29 Dec 1998 13:42:54 -0800
Subject: [XML-SIG] Running XML on NT
Message-ID: <004601be3374$6a718500$b54cf60d@bushido>

In case no one has responded to you offline, I'll respond here. Yes, I have
the xml-0.5 release running on NT.

There are some dlls in the xml-0.5/windows/ directory. The wstrop module
is not there, although it's not strictly necessary (if you don't have utf8
encoded data).  I compiled sgmlop, wstring, pyexpat modules. I also copied
over the xml.* modules into a directory on my path by hand. It seems to do
the trick.

If the packager (Andrew?) of the xml module would like a wstrop.dll, I'd
pass it along.

Frank, if you want more specific instructions (as opposed to "Yep, I've done
it!"), let me know.


-Perry


-----Original Message-----
From: Frank McGeough <fm@synchrologic.com>
To: xml-sig@python.org <xml-sig@python.org>
Date: Saturday, December 26, 1998 8:41 PM
Subject: [XML-SIG] Running XML on NT


>Hi,
>
>Is it possible to run the test release of XML
>on NT? I downloaded the software from :
>http://www.python.org/topics/xml/download.html
>
>The README says to run make. I don't have a
>Unix style make. Is there a version that would
>work with Microsoft's nmake and VC compiler.
>
>Thanks,
>Frank
>
>Synchrologic, Inc.
>http://www.synchrologic.com
>T: 770.754.5600
>F: 770.619.5612


From akuchlin@cnri.reston.va.us  Mon Dec 28 22:07:56 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 28 Dec 1998 17:07:56 -0500 (EST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <Pine.LNX.3.91.981228074112.23254A-100000@amati.techno.com>
References: <199812271707.MAA14899@207-172-46-235.s235.tnt9.ann.erols.com>
 <Pine.LNX.3.91.981228074112.23254A-100000@amati.techno.com>
Message-ID: <13959.65326.842240.956946@amarok.cnri.reston.va.us>

Paul writes:
>> 	  1) Get a dictionary mapping namespace prefixes to URIs, and
>> vice versa; this would be done by walking up the tree looking at
>> xmlns:* attributes.
>
>I don't think that the programmer needs access to this dictionary. 
>Internally you need it, but I don't think that the programmer should.

>I think that this would be better:
>uri, name = namespace_divide( node.tagName )

	Talking about this with Fred at lunch today, I realized that
this is probably not sufficient, and that you really do need access to
the dictionary.  Consider an Element node with no namespace prefix;
its namespace is therefore assumed to be the default one.  Take that
node out of the tree, and insert it somewhere else, where the default
namespace is *different*.  Assume that this behaviour isn't what you
want; instead, you want to keep the element in the same namespace as
it was originally in.

	This may mean adding the right prefix for the namespace's URI,
which means you need some way of getting at the prefixes and URIs
available at the new location. (It could also be done by adding an
xmlns="URI" attribute to the element, but that makes solving this
problem too easy. :) More seriously, there might be applications where
adding the NS prefix is the only way to go.)
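
	The "too easy" variant can be sketched as below, purely as an
illustration. It assumes the standard library's xml.dom.minidom in place
of PyDOM (minidom has parent pointers and remembers each element's
namespaceURI from parse time), it only handles unprefixed elements, and
both function names are made up.

```python
from xml.dom import minidom


def default_namespace_in_scope(node):
    """Return the default namespace URI in effect at `node`, or None."""
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute("xmlns"):
            return node.getAttribute("xmlns") or None
        node = node.parentNode
    return None


def move_preserving_namespace(elem, new_parent):
    """Move an unprefixed element, re-declaring its namespace if needed."""
    uri = elem.namespaceURI            # namespace it had where it was parsed
    new_parent.appendChild(elem)       # also detaches it from its old parent
    if ":" not in elem.tagName and default_namespace_in_scope(elem) != uri:
        # The default namespace at the new location differs, so pin the
        # original one with an explicit xmlns declaration on the element.
        elem.setAttribute("xmlns", uri or "")
    return elem
```

Choosing a prefix instead of re-declaring the default would need the full
prefix-to-URI dictionary at the new location, which is the argument for
exposing it.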

	I like the idea of the namespace canonicalizer as a demo app,
BTW.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I must be strong. And in my head a voice says, Yes, Dear, you must. And in my
head another voice is muttering Oh that I were a man, or that I had power to
execute my apprehended wishes: I would whip some with scorpions... And a voice
says, You know what you must do.
    -- Lyta is told her son is dead, in SANDMAN #59: "The Kindly Ones:3"


From prescod@prescod.net  Thu Dec 31 04:47:14 1998
From: prescod@prescod.net (Paul)
Date: Wed, 30 Dec 1998 22:47:14 -0600 (CST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <13959.65326.842240.956946@amarok.cnri.reston.va.us>
Message-ID: <Pine.LNX.3.91.981230224232.15531A-100000@amati.techno.com>

On Mon, 28 Dec 1998, Andrew M. Kuchling wrote:

> 	Talking about this with Fred at lunch today, I realized that
> this is probably not sufficient, and that you really do need access to
> the dictionary.  Consider an Element node with no namespace prefix;
> its namespace is therefore assumed to be the default one.  Take that
> node out of the tree, and insert it somewhere else, where the default
> namespace is *different*.  Assume that this behaviour isn't what you
> want; instead, you want to keep the element in the same namespace as
> it was originally in.

Since namespace defaulting is just a typing convenience, I would argue that
moving a node should never change its namespace.

> 	This may mean adding the right prefix for the namespace's URI,
> which means you need some way of getting at the prefixes and URIs
> available at the new location. (It could also be done by adding an
> xmlns="URI" attribute to the element, but that makes solving this
> problem too easy. :) More seriously, there might be applications where
> adding the NS prefix is the only way to go.)

I think that the namespace-aware node-moving-method should do the fixup 
automatically.

Maybe my desire to have everything be automatic and semantically clean is 
at odds with your desire to have this be a transparent extension to the 
DOM that doesn't change the behavior of any DOM-builtin method.

 Paul Prescod


From akuchlin@cnri.reston.va.us  Thu Dec 31 19:08:41 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 31 Dec 1998 14:08:41 -0500 (EST)
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <Pine.LNX.3.91.981230224232.15531A-100000@amati.techno.com>
References: <13959.65326.842240.956946@amarok.cnri.reston.va.us>
 <Pine.LNX.3.91.981230224232.15531A-100000@amati.techno.com>
Message-ID: <13963.51590.804687.473078@amarok.cnri.reston.va.us>

Paul writes:
>I think that the namespace-aware node-moving-method should do the fixup 
>automatically.
>
>Maybe my desire to have everything be automatic and semantically clean is 
>at odds with your desire to have this be a transparent extension to the 
>DOM that doesn't change the behavior of any DOM-builtin method.

	Indeed; I'm frightened of adding some sort of clever,
invalidate-namespaces-on-a-move, scheme and opening the door to lots
of subtle bugs.  Also, the PyDOM representation has nodes with a list
of their children, and no parent pointers; this makes the traversing
of ancestors difficult.  I'm somewhat tempted to toss the recently
announced WeakDict object into the XML package and add parent
pointers, but it may be too late to undertake such a large change to
the DOM code.  Any opinions?
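
	For discussion, here is a minimal sketch of what weak parent
pointers could look like. It is not PyDOM code and does not use the
WeakDict object mentioned above; it assumes Python's standard weakref
module, and the Node class and its method names are hypothetical.

```python
import weakref


class Node:
    """Toy tree node: strong references down, weak reference up."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent = None          # weakref.ref to the parent, or None

    @property
    def parent(self):
        # Dereference the weak reference; None if unset or parent is gone.
        return self._parent() if self._parent is not None else None

    def append(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)
        return child

    def ancestors(self):
        """Yield the parent, grandparent, ... up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent
```

Because the upward link is weak, parent and child do not form a
reference cycle, so dropping the last outside reference to a subtree
still lets reference counting reclaim it.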

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    "Wow. That's wicked! Like _Star Wars_."
    "A strange analogy, child, but indeed, there was a war in heaven, and you
see the vanquished now, burning as they fall, like stars. In the darkness
before the first dawn, theirs was the first folly; theirs the first rebellion."
    -- Tim and Dr Occult, in BOOKS OF MAGIC #1