From xml-sig@teleo.net  Thu Mar  1 02:52:55 2001
From: xml-sig@teleo.net (Patrick Phalen)
Date: Wed, 28 Feb 2001 18:52:55 -0800
Subject: [XML-SIG] DTD design: include categorization, or use RDF?
In-Reply-To: <0102281124320Y.04301@quadra.teleo.net>
References: <E14Xmuc-0004ah-00@ute.cnri.reston.va.us> <0102281124320Y.04301@quadra.teleo.net>
Message-ID: <0102281852551G.04301@quadra.teleo.net>

On Wednesday 28 February 2001 11:24, Patrick Phalen wrote:
> There's now an Open Source TM engine in Python:
> http://ontopia.net/software/tmproc/

oops ... make that http://www.ontopia.net/software/tmproc/


From loewis@informatik.hu-berlin.de  Thu Mar  1 14:22:34 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Thu, 1 Mar 2001 15:22:34 +0100 (MET)
Subject: [XML-SIG] Re: Version number question on PyXML 0.6.4
In-Reply-To: <01022521251706.28858@fermi.eeel.nist.gov> (message from Michael
 McLay on Sun, 25 Feb 2001 21:25:17 -0500)
References: <200102260841.JAA09898@pandora> <01022521251706.28858@fermi.eeel.nist.gov>
Message-ID: <200103011422.PAA04177@pandora.informatik.hu-berlin.de>

> I'm beginning to think someone from the Enlightenment window manager project 
> has been given control of the version numbering for PyXML.

I don't know much about Enlightenment, so I can't tell whether this is
applause or criticism - I assume it's the latter...

> Version numbers are arbitrary, but some people will mistakenly read
> the low number on PyXML as an indication of unstable and immature
> software.  Based on the improved level of integration of this latest
> release the version number should have at least been bumped to a
> 0.7.0 release number.

For 0.7, I hope to provide XPath support.

> What needs to be added/finished before the number can be bumped to
> 1.0?

If the major components have no well-known and problematic
deficiencies left, I'll call it 1.0. One well-known deficiency, for
example, is the set of Unicode problems.

Regards,
Martin


From stefan.marsiske@sysdata.siemens.hu  Thu Mar  1 14:31:19 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Thu, 1 Mar 2001 15:31:19 +0100
Subject: [XML-SIG] Re: Version number question on PyXML 0.6.4
In-Reply-To: <200103011422.PAA04177@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Thu, Mar 01, 2001 at 03:22:34PM +0100
References: <200102260841.JAA09898@pandora> <01022521251706.28858@fermi.eeel.nist.gov> <200103011422.PAA04177@pandora.informatik.hu-berlin.de>
Message-ID: <20010301153119.C12848@sysdata.siemens.hu>

hi,

On Thu, Mar 01, 2001 at 03:22:34PM +0100, Martin von Loewis wrote:
> > I'm beginning to think someone from the Enlightenment window manager project 
> > has been given control of the version numbering for PyXML.
> 
> I don't know much about Enlightenment, so I can't tell whether this is
> applause or criticism - I assume it's the latter...

i feel offended here, since i'm involved with E. and i agree totally with
the versioning.  atm we're at our 4th rewrite of the whole app, so the low
version numbering is ok. though maybe for each rewrite we could also choose a
new name, and start over from 0.1. e-0.16.5 could be considered a major
release. but we scrapped that, and started over. once again. always
improving. :)
---end quoted text---

-- 
Stefan [http://web.interware.hu/stef] UPDATED:001031
quote: "happy(y2k++)"
gpg-key: http://web.interware.hu/stef/gpg.txt


From uche.ogbuji@fourthought.com  Thu Mar  1 17:37:38 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 01 Mar 2001 10:37:38 -0700
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
References: <Pine.LNX.4.21.0102281721500.17468-100000@leo.logilab.fr>
Message-ID: <3A9E88E2.D0E836B5@fourthought.com>

Alexandre Fayolle wrote:
> 
> Our dear friend Uche is quoted on
> http://www.xml.com/pub/a/2001/02/14/deviant.html about the <xsl:script>
> element.
> 
> The article is worth reading, I think.

Actually, I've gone beyond that.  With Clark Evans and other concerned
parties, I've set up a petition against the xsl:script nonsense and
language bindings.  Please see

http://uche.ogbuji.net:8000/etc/no-xsl-script.xhtml

I think Python XML users should be worried about the W3C's continual
efforts to enshrine particular languages as first-class XML-processing
environments.  It wouldn't be so bad if things such as xsl:script were
not so bloody unnecessary.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From mclay@nist.gov  Thu Mar  1 05:44:05 2001
From: mclay@nist.gov (Michael McLay)
Date: Thu, 1 Mar 2001 00:44:05 -0500
Subject: [XML-SIG] Re: Version number question on PyXML 0.6.4
In-Reply-To: <20010301153119.C12848@sysdata.siemens.hu>
References: <200102260841.JAA09898@pandora> <200103011422.PAA04177@pandora.informatik.hu-berlin.de> <20010301153119.C12848@sysdata.siemens.hu>
Message-ID: <0103010044050Q.28858@fermi.eeel.nist.gov>

On Thursday 01 March 2001 09:31, Marsiske Stefan - 3244 wrote:
> hi,
>
> On Thu, Mar 01, 2001 at 03:22:34PM +0100, Martin von Loewis wrote:
> > > I'm beginning to think someone from the Enlightenment window manager
> > > project has been given control of the version numbering for PyXML.
> >
> > I don't know much about Enlightenment, so I can't tell whether this is
> > applause or criticism - I assume it's the latter...
>
> i feel offended here, since i'm involved with E. and i agree totally with
> the versioning.  atm we're at our 4th rewrite of the whole app, so the
> low version numbering is ok. though maybe for each rewrite we could also
> choose a new name, and start over from 0.1. e-0.16.5 could be considered a
> major release. but we scrapped that, and started over. once again. always
> improving. :)

No offense was intended.  I used E as an example of a project that has been 
very conservative with version numbering increments.  Python has been 
conservative as well.  They finally bumped Python up to 2.0 for marketing 
purposes.  If anything it should be taken as a compliment.  There is nothing 
wrong with being conservative about moving to a 1.0 release.  I was just 
looking for some indication of when 1.0 might happen.  

The low version number does have a down side.  Many people won't touch code 
below a 1.0 or 1.2 release.  This may be dumb logic on their part, but it is 
reality. 


From Alexandre.Fayolle@logilab.fr  Thu Mar  1 17:53:02 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 1 Mar 2001 18:53:02 +0100 (CET)
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: <3A9E88E2.D0E836B5@fourthought.com>
Message-ID: <Pine.LNX.4.21.0103011848250.21344-100000@leo.logilab.fr>

On Thu, 1 Mar 2001, Uche Ogbuji wrote:

> 
> Actually, I've gone beyond that.  With Clark Evans and other concerned
> parties, I've set up a petition against the xsl:script nonsense and
> language bindings.  Please see
> 
> http://uche.ogbuji.net:8000/etc/no-xsl-script.xhtml

The text of the petition says:

"7. With [...] recent changes to the DOM specification, it appears that
the W3C strongly favors Java and Javascript over other equally qualified
languages."

Could you please detail this? I'm interested in learning how the DOM can
be language biased. 


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From uche.ogbuji@fourthought.com  Thu Mar  1 18:27:49 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 01 Mar 2001 11:27:49 -0700
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
References: <Pine.LNX.4.21.0103011848250.21344-100000@leo.logilab.fr>
Message-ID: <3A9E94A5.12873CBC@fourthought.com>

Alexandre Fayolle wrote:
> 
> On Thu, 1 Mar 2001, Uche Ogbuji wrote:
> 
> >
> > Actually, I've gone beyond that.  With Clark Evans and other concerned
> > parties, I've set up a petition against the xsl:script nonsense and
> > language bindings.  Please see
> >
> > http://uche.ogbuji.net:8000/etc/no-xsl-script.xhtml
> 
> The text of the petition says:
> 
> "7. With [...] recent changes to the DOM specification, it appears that
> the W3C strongly favors Java and Javascript over other equally qualified
> languages."
> 
> Could you please detail this? I'm interested in learning how the DOM can
> be language biased.

Ah.  I'm on the spot.  Note that the petition is the synthesis of the
entire "gang of eight" that put it together.

But I think the DOM clause is a mistake which I missed on earlier
editing.  It probably refers to the inclusion of the Java and ECMA
bindings in level 2, which isn't all that recent, and is not, I think, as
bad an instance of language bias as the XSLT 1.1 language binding
section.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From guido@digicool.com  Thu Mar  1 18:38:17 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 01 Mar 2001 13:38:17 -0500
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: Your message of "Thu, 01 Mar 2001 10:37:38 MST."
 <3A9E88E2.D0E836B5@fourthought.com>
References: <Pine.LNX.4.21.0102281721500.17468-100000@leo.logilab.fr>
 <3A9E88E2.D0E836B5@fourthought.com>
Message-ID: <200103011838.NAA17049@cj20424-a.reston1.va.home.com>

> Alexandre Fayolle wrote:
> > 
> > Our dear friend Uche is quoted on
> > http://www.xml.com/pub/a/2001/02/14/deviant.html about the <xsl:script>
> > element.
> > 
> > The article is worth reading, I think.

Uche:
> Actually, I've gone beyond that.  With Clark Evans and other concerned
> parties, I've set up a petition against the xsl:script nonsense and
> language bindings.  Please see
> 
> http://uche.ogbuji.net:8000/etc/no-xsl-script.xhtml
> 
> I think Python XML users should be worried about the W3C's continual
> efforts to enshrine particular languages as first-class XML-processing
> environments.  It wouldn't be so bad if things such as xsl:script were
> not so bloody unnecessary.

What does our friend Dan Connolly think of all this?  He's our secret
ally in the W3C, I believe! :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Alexandre.Fayolle@logilab.fr  Thu Mar  1 18:50:30 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 1 Mar 2001 19:50:30 +0100 (CET)
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: <3A9E94A5.12873CBC@fourthought.com>
Message-ID: <Pine.LNX.4.21.0103011933460.21549-100000@leo.logilab.fr>

On Thu, 1 Mar 2001, Uche Ogbuji wrote:

> Alexandre Fayolle wrote:
> 
> > The text of the petition says:
> > 
> > "7. With [...] recent changes to the DOM specification, it appears that
> > the W3C strongly favors Java and Javascript over other equally qualified
> > languages."
> > 
> > Could you please detail this? I'm interested in learning how the DOM can
> > be language biased.
> 
> Ah.  I'm on the spot.  Note that the petition is the synthesis of the
> entire "gang of eight" that put it together.

I was not accusing you or anything. Just being curious. I remember hearing
you complain about something in number handling (or date handling,
I'm not sure) in XSLT which was Java biased (this does not appear in
the petition, as far as I can tell), but I could not see what the issue
was with DOM. 

Now, as for the bindings, I have to admit that it is one part of the spec
that I had never looked at. I checked it 30 seconds ago to see what it
looks like, and I really do not see the point of putting this in the
spec: it brings nothing new, and the IDL is all you need.


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From stefan.marsiske@sysdata.siemens.hu  Thu Mar  1 17:58:10 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Thu, 1 Mar 2001 18:58:10 +0100
Subject: [XML-SIG] Re: Version number question on PyXML 0.6.4
In-Reply-To: <0103010044050Q.28858@fermi.eeel.nist.gov>; from mclay@nist.gov on Thu, Mar 01, 2001 at 12:44:05AM -0500
References: <200102260841.JAA09898@pandora> <200103011422.PAA04177@pandora.informatik.hu-berlin.de> <20010301153119.C12848@sysdata.siemens.hu> <0103010044050Q.28858@fermi.eeel.nist.gov>
Message-ID: <20010301185810.F12848@sysdata.siemens.hu>

On Thu, Mar 01, 2001 at 12:44:05AM -0500, Michael McLay wrote:
> On Thursday 01 March 2001 09:31, Marsiske Stefan - 3244 wrote:
> > hi,
> >
> > On Thu, Mar 01, 2001 at 03:22:34PM +0100, Martin von Loewis wrote:
> > > > I'm beginning to think someone from the Enlightenment window manager
> > > > project has been given control of the version numbering for PyXML.
> > >
> > > I don't know much about Enlightenment, so I can't tell whether this is
> > > applause or criticism - I assume it's the latter...
> >
> > i feel offended here, since i'm involved with E. and i agree totally with
> > the versioning.  atm we're at our 4th rewrite of the whole app, so the
> > low version numbering is ok. though maybe for each rewrite we could also
> > choose a new name, and start over from 0.1. e-0.16.5 could be considered a
> > major release. but we scrapped that, and started over. once again. always
> > improving. :)
> 
> No offense was intended.  I used E as an example of a project that has been 
> very conservative with version numbering increments.  Python has been 
> conservative as well.  They finally bumped Python up to 2.0 for marketing 
> purposes.  If anything it should be taken as a compliment.  There is nothing 
> wrong with being conservative about moving to a 1.0 release.  I was just 
> looking for some indication of when 1.0 might happen.  
> 
> The low version number does have a down side.  Many people won't touch code 
> below a 1.0 or 1.2 release.  This may be dumb logic on their part, but it is 
> reality. 

ok, i'll admit, i wasn't really offended, somebody just needed to defend E... :)
i agree with you on low (sub 1.0) version numbers, but not in the case of E. E
has quite a big userbase; a long time ago, when raster was working for redhat,
it was the default window manager for gnome. and most people are aware that
there will never be a 1.0 version of E. although i, for example, fear anything
that has a version number ending in .0. that's a bad sign. remember
linux-2.2.0? or redhat [567].0? eeek, never, even with a 100 foot pole...

---end quoted text---

-- 
Stefan [http://web.interware.hu/stef] UPDATED:001031
quote: "happy(y2k++)"
gpg-key: http://web.interware.hu/stef/gpg.txt


From akuchlin@mems-exchange.org  Thu Mar  1 19:07:51 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Thu, 1 Mar 2001 14:07:51 -0500
Subject: [XML-SIG] Re: Version number question on PyXML 0.6.4
In-Reply-To: <200103011422.PAA04177@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Thu, Mar 01, 2001 at 03:22:34PM +0100
References: <200102260841.JAA09898@pandora> <01022521251706.28858@fermi.eeel.nist.gov> <200103011422.PAA04177@pandora.informatik.hu-berlin.de>
Message-ID: <20010301140751.B9504@ute.cnri.reston.va.us>

On Thu, Mar 01, 2001 at 03:22:34PM +0100, Martin von Loewis wrote:
>If the major components have no well-known and problematic
>deficiencies left, I'll call it 1.0. A well-known deficiency are the
>Unicode problems, for example.

What problem is that?  Is it that if a parser outputs regular strings,
you don't know what encoding they're in?

Regarding version numbers: the PyXML code base is certainly
full-featured enough that it could be called 1.0.  We'd want
to work on bringing the docs back up to date, though; I'm planning to
revise the XML HOWTO next week (after I get the QEL release out).

--amk


From uche.ogbuji@fourthought.com  Thu Mar  1 19:12:12 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 01 Mar 2001 12:12:12 -0700
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
References: <Pine.LNX.4.21.0103011933460.21549-100000@leo.logilab.fr>
Message-ID: <3A9E9F0C.63C14E1C@fourthought.com>

Alexandre Fayolle wrote:
> 
> On Thu, 1 Mar 2001, Uche Ogbuji wrote:
> 
> > Alexandre Fayolle wrote:
> >
> > > The text of the petition says:
> > >
> > > "7. With [...] recent changes to the DOM specification, it appears that
> > > the W3C strongly favors Java and Javascript over other equally qualified
> > > languages."
> > >
> > > Could you please detail this? I'm interested in learning how the DOM can
> > > be language biased.
> >
> > Ah.  I'm on the spot.  Note that the petition is the synthesis of the
> > entire "gang of eight" that put it together.
> 
> I was not accusing you or anything. Just being curious. I remember hearing
> you pestering about some stuff in numbering handling (or date handling,
> I'm not sure) in XSLT, which was Java biased (and this does not appear in
> the petition, as far as I can tell), but could not see what was the thing
> with DOM.
> 
> Now, as for the bindings, I have to admit that it is one part of the spec
> that I have never looked at (I've just checked it 30 seconds ago to see
> what it looks like, and I really do not see the point in putting this in
> the spec. It brings nothing new, and the IDL is all you need.)

All true.  You've brought to light that the DOM clause was a mistake. 
Oh well.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From cce@clarkevans.com  Thu Mar  1 21:01:31 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Thu, 1 Mar 2001 16:01:31 -0500 (EST)
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: <Pine.LNX.4.21.0103011848250.21344-100000@leo.logilab.fr>
Message-ID: <Pine.LNX.4.21.0103011552400.4143-100000@clarkevans.com>

On Thu, 1 Mar 2001, Alexandre Fayolle wrote:
> On Thu, 1 Mar 2001, Uche Ogbuji wrote:
> > Actually, I've gone beyond that.  With Clark Evans and other concerned
> > parties, I've set up a petition against the xsl:script nonsense and
> > language bindings.  Please see
> > 
> > http://uche.ogbuji.net:8000/etc/no-xsl-script.xhtml
> 
> The text of the petition says:
> 
> "7. With [...] recent changes to the DOM specification, it appears that
> the W3C strongly favors Java and Javascript over other equally qualified
> languages."
> 
> Could you please detail this? I'm interested in learning how the DOM can
> be language biased. 

I authored this clause, and in the back of my head while I was writing
was a message by Mike Champion (perhaps a private one) about the new 
working draft having Java-specific stuff.  I never followed up or
verified the reference.  So, when your post was brought to my attention
I freaked out, went scurrying about looking for this Java reference,
and didn't find it.  Since I was the author of this paragraph, I
labeled it as a "bug" in the draft and posted my apologies for the
error to the xsl-list, which was probably a politic thing to do anyway.
(IMHO, it is better to admit to a possible error before you are accused
of it on a public list, even if it turns out not to be an error.)

However, just for your edification, Robin Berjon <robin@knowscape.com>
posted the following to xml-dev regarding the Java litter in
the DOM WG recent draft:

> In fact, if you look at the WD for DOM3-Core
> (http://www.w3.org/TR/2001/WD-DOM-Level-3-Core-20010126/core.html) you'll
> see that Java is not at all relegated to an appendix. Section 1.2 is
> *entirely* about Java. I'm certain that the intentions behind that section
> are good, and I am aware that it is only a WD but that section has nothing
> to do there and I nevertheless find its presence alarming. Either it ought
> to describe bindings (and in this case, implementation because it's what it
> does) for all languages susceptible of supporting a DOM interface, or it
> should be language independent. A DOMImplementationFactory is probably a
> good idea, describing that interface as part of the DOM is certainly enough.

So, I hope this will help.  In any case, Uche was not responsible
for this goof... it was my bad.

Clark

P.S.  I look forward to using 4SuiteServer!  Sorry this had to be
      my first post...


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  1 21:46:32 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 1 Mar 2001 22:46:32 +0100
Subject: [XML-SIG] Re: Version number question on PyXML 0.6.4
In-Reply-To: <20010301140751.B9504@ute.cnri.reston.va.us> (message from Andrew
 Kuchling on Thu, 1 Mar 2001 14:07:51 -0500)
References: <200102260841.JAA09898@pandora> <01022521251706.28858@fermi.eeel.nist.gov> <200103011422.PAA04177@pandora.informatik.hu-berlin.de> <20010301140751.B9504@ute.cnri.reston.va.us>
Message-ID: <200103012146.f21LkWu01187@mira.informatik.hu-berlin.de>

> On Thu, Mar 01, 2001 at 03:22:34PM +0100, Martin von Loewis wrote:
> >If the major components have no well-known and problematic
> >deficiencies left, I'll call it 1.0. A well-known deficiency are the
> >Unicode problems, for example.
> 
> What problem is that?  Is it that if a parser outputs regular strings,
> you don't know what encoding they're in?

Mainly that, yes. Plus, you cannot tell what kind of string you'll
get, except by trying.
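
To make the problem concrete, here is a minimal sketch written for a
current Python, where byte strings and text are distinct types. The
helper name to_text and the UTF-8 default are assumptions for
illustration only, not part of PyXML; the default is exactly the guess
a caller is forced to make when a parser hands back a plain byte string.

```python
def to_text(value, assumed_encoding="utf-8"):
    """Return value as text.

    Byte strings are decoded with assumed_encoding, which is only a
    guess: nothing in the byte string itself says how it was encoded.
    """
    if isinstance(value, bytes):
        return value.decode(assumed_encoding)
    return value


# The same word as a parser might deliver it: once as bytes, once as text.
as_bytes = b"caf\xc3\xa9"
as_text = "caf\u00e9"

# With the right guess both forms normalize to the same text.
normalized = [to_text(as_bytes), to_text(as_text)]

# With the wrong guess the bytes decode without error, but to garbage.
wrong_guess = to_text(as_bytes, "latin-1")
```

The wrong guess does not raise an error, which is why the encoding
has to be known rather than discovered by trying.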

> Regarding version numbers: the PyXML code base is certainly
> full-featured enough that it could certainly be called 1.0.  We'd want
> to work on bringing the docs back up to date, though; I'm planning to
> revise the XML HOWTO next week (after I get the QEL release out).

Actually, the reference is much more outdated than the
howto. Everything in the reference is probably outdated; everything
not documented in the Python library documentation is probably
undocumented (with the exception of aspects of 4DOM).

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  1 21:43:34 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 1 Mar 2001 22:43:34 +0100
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: <Pine.LNX.4.21.0103011933460.21549-100000@leo.logilab.fr>
 (message from Alexandre Fayolle on Thu, 1 Mar 2001 19:50:30 +0100
 (CET))
References: <Pine.LNX.4.21.0103011933460.21549-100000@leo.logilab.fr>
Message-ID: <200103012143.f21LhYh01164@mira.informatik.hu-berlin.de>

> I was not accusing you or anything. Just being curious. I remember hearing
> you pestering about some stuff in numbering handling (or date handling,
> I'm not sure) in XSLT, which was Java biased (and this does not appear in
> the petition, as far as I can tell), but could not see what was the thing
> with DOM. 

For XSLT numbers, the spec indeed defines exactly the same floating
point semantics as used in Java. That is not a bad thing in itself, as
the Java meaning is a variant of IEEE 754 (i.e. selecting specific
options where the spec leaves options).
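
A few of the IEEE 754 behaviours in question can be seen from Python,
whose float is a double-precision IEEE 754 value on common platforms
(an assumption about the platform, not a language guarantee). This is
only a sketch of the arithmetic, not of any XSLT processor. Note one
difference: XPath's "1 div 0" yields Infinity, whereas Python raises
an exception for float division by zero.

```python
import math

nan = float("nan")
inf = float("inf")

# NaN compares unequal to everything, including itself.
nan_equals_itself = (nan == nan)

# Infinity absorbs finite additions.
inf_plus_one_is_inf = (inf + 1 == inf)

# Decimal fractions are not exactly representable in binary doubles.
tenth_sum_exact = (0.1 + 0.2 == 0.3)

# Negative zero compares equal to zero but keeps its sign bit.
neg_zero_equal = (-0.0 == 0.0)
neg_zero_sign = math.copysign(1.0, -0.0)

# Python departs from plain IEEE 754 here: division by zero raises.
try:
    1.0 / 0.0
except ZeroDivisionError:
    division_raises = True
else:
    division_raises = False
```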

On the DOM, I notice a number of Java-isms, all of them minor:

- naming conventions. OMG style would be has_feature and
  get_dom_implementation, W3C style is hasFeature and getDOMImplementation.

- nesting. IMO, enums should be in module scope; W3C puts them in
  interface scope - presumably since Java does not allow package-level
  constants.
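
Both points are visible in the Python binding. The following is a
minimal sketch using xml.dom.minidom from the standard library; the
feature string "Core" and version "2.0" are just example arguments.
The method keeps the W3C spelling hasFeature, and the node type
constants live on the Node interface rather than at module level.

```python
import xml.dom
from xml.dom import minidom

# W3C-style mixed-case method name, carried over unchanged into Python.
impl = minidom.getDOMImplementation()
supports_core2 = impl.hasFeature("Core", "2.0")

# The enumeration values are scoped inside the Node interface
# (a class in Python), mirroring the nesting used in the W3C IDL.
element_code = xml.dom.Node.ELEMENT_NODE
text_code = xml.dom.Node.TEXT_NODE
```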

> Now, as for the bindings, I have to admit that it is one part of the spec
> that I have never looked at (I've just checked it 30 seconds ago to see
> what it looks like, and I really do not see the point in putting this in
> the spec. It brings nothing new, and the IDL is all you need.)

You do need it, as it does not follow the CORBA language
mappings. E.g. everything is in a module "dom", but that ends up as
package org.w3c.dom in Java, and xml.dom in Python. Likewise, the
"readonly attribute nodeType" maps to getNodeType() in Java, whereas
the IDL mapping would produce a method named nodeType().
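
For comparison, the Python binding exposes the same IDL attribute as a
plain attribute rather than an accessor method. A minimal sketch with
xml.dom.minidom from the standard library; the sample document is made
up for illustration.

```python
from xml.dom import Node, minidom

# A tiny, made-up document to have some nodes to inspect.
doc = minidom.parseString("<greeting>hello</greeting>")
root = doc.documentElement

# "readonly attribute nodeType" is read as an attribute in Python,
# where the Java binding would call getNodeType().
root_type = root.nodeType
text_type = root.firstChild.nodeType
```

Comparing against the constants on Node (for example
root_type == Node.ELEMENT_NODE) avoids hard-coding the numeric values.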

You probably could have done all that by spelling out the mapping
rules instead of providing the mapping result; in Java, it is easier
just to write down the interface definitions.

Regards,
Martin


From frank@quantiva.com  Thu Mar  1 22:02:28 2001
From: frank@quantiva.com (Frank Stolze)
Date: Thu, 01 Mar 2001 17:02:28 -0500
Subject: [XML-SIG] [OT - JOB AD] Python / XML / Distributed Systems Developer
Message-ID: <5.0.2.1.0.20010301163601.00a29af0@pop3.norton.antivirus>

Sorry for the off-topic post. We are a well-funded, stealth mode startup in the
network service management field. We are building a novel, distributed system
and service that involves Internet protocol implementations (HTTP, SMTP, POP3,
DNS, ping, etc.), statistical analysis, some AI, database storage and
reporting, as well as issues such as routing, load balancing, fail-over,
firewalls, etc.

Almost all of the implementation is being done in Python. We also use XML,
XML-RPC, HTTP tunneling and a few new things. We are looking for two
enthusiastic Python & networking gurus to join a small team of hands-on
people to help us implement a great vision!

The "official" job description is below.


Regards,
Frank

===============
Company Profile:

We're onto something really big in network service management.  We have
VC backing and a clear view of the future.  Our stealth situation hides our
long history in this area and lets us focus on the goal.  Join the team of
innovative, visionary, enthusiastic, and passionate people that will shape
our future.  Share the opportunity to work with industry leading experts on
the design and creation of a next generation system.

Quantiva is looking for people who can quickly grasp new concepts,
develop new, original solutions to existing problems, and in general can
"hit the ground running."

Quantiva is located in Princeton, New Jersey.

Please send resumes to techjobs@quantiva.com.


Job Description:

The Network Software Engineer will contribute to the design and development of
Quantiva's network service management software. This position will involve
active participation in the design, architecture, and implementation of the
product.


Requirements:
·       Strong software development skills in a Unix environment.
·       Strong network and distributed systems programming experience. Good
        knowledge of network protocols such as TCP/IP, HTTP, DNS, SMTP, POP3 and
        concepts such as firewalls, routing required.
·       4+ years of hands-on experience with scripting languages such as
        Perl, Python or Tcl. Python proficiency is required.
·       Applications programming experience in object-oriented languages
        such as C++ or Java.
·       DBMS experience including database schemas, SQL, and database
        programming.
·       Solid Unix experience (Solaris preferred) in a commercial environment.
·       XML experience is a plus.
·       BS/MS/PhD in CS, EE, CE or related field.


From frank@quantiva.com  Thu Mar  1 22:12:45 2001
From: frank@quantiva.com (Frank Stolze)
Date: Thu, 1 Mar 2001 17:12:45 -0500 (EST)
Subject: [XML-SIG] [OT - JOB AD][REPOST] Python / XML / Distributed Systems Developer
Message-ID: <Pine.LNX.4.30.0103011711120.9554-100000@localhost.localdomain>

This time without the HTML nonsense...


Sorry for the off-topic post. We are a well-funded, stealth mode startup
in the network service management field. We are building a novel,
distributed system and service that involves Internet protocol
implementations (HTTP, SMTP, POP3, DNS, ping, etc.), statistical analysis,
some AI, database storage and reporting, as well as issues such as
routing, load balancing, fail-over, firewalls, etc.

Almost all of the implementation is being done in Python. We also use XML,
XML-RPC, HTTP tunneling and a few new things. We are looking for two
enthusiastic Python & networking gurus to join a small team of hands-on
people to help us implement a great vision!

The "official" job description is below.


Regards,
Frank

===============
Company Profile:

We're onto something really big in network service management. We have
VC backing and a clear view of the future. Our stealth situation hides our
long history in this area and lets us focus on the goal. Join the team of
innovative, visionary, enthusiastic, and passionate people that will shape
our future. Share the opportunity to work with industry leading experts on
the design and creation of a next generation system.

Quantiva is looking for people who can quickly grasp new concepts,
develop new, original solutions to existing problems, and in general can
"hit the ground running."

Quantiva is located in Princeton, New Jersey.

Please send resumes to techjobs@quantiva.com.


Job Description:

The Network Software Engineer will contribute to the design and development
of Quantiva's network service management software. This position will
involve active participation in the design, architecture, and
implementation of the product.


Requirements:
· Strong software development skills in a Unix environment.
· Strong network and distributed systems programming experience. Good
  knowledge of network protocols such as TCP/IP, HTTP, DNS, SMTP, POP3 and
  concepts such as firewalls, routing required.
· 4+ years of hands-on experience with scripting languages such as Perl,
  Python or Tcl. Python proficiency is required.
· Applications programming experience in object-oriented languages such
  as C++ or Java.
· DBMS experience including database schemas, SQL, and database
  programming.
· Solid Unix experience (Solaris preferred) in a commercial environment.
· XML experience is a plus.
· BS/MS/PhD in CS, EE, CE or related field.


From smith@xml-doc.org  Fri Mar  2 04:37:18 2001
From: smith@xml-doc.org (Michael Smith)
Date: 01 Mar 2001 20:37:18 -0800
Subject: [XML-SIG] Maintaining catalogs
In-Reply-To: Andrew Kuchling's message of "Tue, 27 Feb 2001 10:11:19 -0500"
References: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us>
Message-ID: <uzof423cx.fsf@openwave.com>

Andrew Kuchling <akuchlin@mems-exchange.org> writes:

> For a project, I'd like to install a DTD on the system and
> automatically add its public identifier to the catalog. Is there a
> standard place to put SGML/XML catalogs on Unix systems?
> /usr/(local)?/lib/sgml? /etc/sgml/?

I'm following up on this a little late, so maybe somebody already pointed
you to the SGML/XML part of the proposed Linux Standard Base (LSB) spec:

  http://www.linuxbase.org/spec/gLSB/gLSB/lsbsgml.html

or for specifics on directory structure:

  http://www.linuxbase.org/spec/gLSB/gLSB/sgmlr001.html

It's a proposed standard, so current distributions aren't yet
necessarily consistent with it of course.


From akuchlin@mems-exchange.org  Fri Mar  2 06:33:25 2001
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Fri, 2 Mar 2001 01:33:25 -0500
Subject: [XML-SIG] ANN: quotation-tools 0.0.1
Message-ID: <200103020633.BAA02006@mira.erols.com>

I've made a first release of quotation-tools, which contains a Python
package for parsing QEL 2.0, and provides additional tools using the
'qel package'.  It can be downloaded from the QEL software page
at http://www.amk.ca/qel/software.html.

This is a first release, and the only two tools implemented at this
point are qtformat, for formatting QEL, and qtgrep, for searching
through QEL files.  In future releases I want to add more tools,
provide convertors to QEL from other formats, and eventually produce a
GUI editor, but that's some way off.

--amk


From uche.ogbuji@fourthought.com  Fri Mar  2 07:56:38 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 02 Mar 2001 00:56:38 -0700
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: Message from Guido van Rossum <guido@digicool.com>
 of "Thu, 01 Mar 2001 13:38:17 EST." <200103011838.NAA17049@cj20424-a.reston1.va.home.com>
Message-ID: <200103020756.AAA12971@localhost.localdomain>

> > I think Python XML users should be worried about the W3C's continual
> > efforts to enshrine particular languages as first-class XML-processing
> > environments.  It wouldn't be so bad if things such as xsl:script were
> > not so bloody unnecessary.
> 
> What does our friend Dan Connolly think of all this?  He's our secret
> ally in the W3C, I believe! :-)

Actually, based on my correspondence, I think we have several allies in the 
W3C.  Dan Brickley is another.  Henry Thompson, Schemas WG chair, made a very 
early Python binding for his prototype XSV schemas implementation, and 
Philippe Le Hégaret, current DOM WG chair, and I have chatted about making 
the Python/DOM binding an official annex.

However, I've often noticed that W3C staffers tend to avoid public jousting 
with member company reps over matters that might be considered political.

In good news, though, it looks as if over a hundred people have signed our 
petition in under 24 hours.  That should ring a bell for the W3C.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From scott snyder <snyder@fnal.gov>  Sat Mar  3 01:39:59 2001
From: scott snyder <snyder@fnal.gov> (scott snyder)
Date: Fri, 02 Mar 2001 19:39:59 CST
Subject: [XML-SIG] 0.6.4 problem with reading DOM tree from XML with validation
Message-ID: <200103030140.TAA01207@d0sgibnl1.fnal.gov>

hi -

Reading a DOM tree from XML with validation seems to have broken between
0.6.2 and 0.6.4.  For example, if i run the following program:

-------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile

f = open ('test.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "NONEXISTENT.dtd">
<configuration/>""")
f.close()

doc = FromXmlFile ('test.xml', None, 1)
print doc
-------------------------------------------------------------

with 0.6.4, it runs without error, even though the DTD referred to
does not exist.

$ python read.py
<XML Document at 82026f8>


0.6.2, on the other hand, does give me the error i expect:

[sss@karma xmltest]$ python read.py
Traceback (innermost last):
  File "read.py", line 9, in ?
    doc = FromXmlFile ('test.xml', None, 1)
  ... (traceback trimmed) ...
  File "xml/dom/ext/reader/Sax2.py", line 240, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: Unknown:2:50: Couldn't open resource 'NONEXISTENT.dtd'


The immediate problem is fixed by this change:


*** xml/dom/ext/reader/Sax2.py-orig	Tue Feb 20 00:47:40 2001
--- xml/dom/ext/reader/Sax2.py	Fri Mar  2 18:29:21 2001
***************
*** 274,279 ****
--- 274,281 ----
      def __init__(self, validate=0, keepAllWs=0, catName=None,
                   saxHandlerClass=XmlDomGenerator, parser=None):
          self.parser = parser or (validate and sax2exts.XMLValParserFactory.make_parser()) or sax2exts.XMLParserFactory.make_parser()
+         if validate:
+             self.parser.setFeature (saxlib.feature_validation, 1)
          if catName:
              #set up the catalog, if there is one
              from xml.parsers.xmlproc import catalog


However, with this change, i run into another bug:

$ python read.py
Traceback (innermost last):
  File "read.py", line 9, in ?
    doc = FromXmlFile ('test.xml', None, 1)
  File "xml/dom/ext/reader/Sax2.py", line 330, in FromXmlFile
    saxHandlerClass, parser)
  File "xml/dom/ext/reader/Sax2.py", line 315, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "xml/dom/ext/reader/Sax2.py", line 301, in fromStream
    self.parser.parse(s)
  File "xml/sax/drivers2/drv_xmlproc.py", line 90, in parse
    parser.read_from(source.getByteStream(), bufsize)
TypeError: too many arguments; expected 2, got 3


Pooh.  The interfaces for the validating and non-validating parsers are
not compatible.  Patched thusly:


*** xml/parsers/xmlproc/xmlval.py-orig	Fri Mar  2 18:26:47 2001
--- xml/parsers/xmlproc/xmlval.py	Fri Mar  2 18:26:53 2001
***************
*** 98,105 ****
      def parseEnd(self):
          self.parser.parseEnd()
  
!     def read_from(self,file):
!         self.parser.read_from(file)
  
      def flush(self):
          self.parser.flush()
--- 98,105 ----
      def parseEnd(self):
          self.parser.parseEnd()
  
!     def read_from(self,file,bufsize=16384):
!         self.parser.read_from(file,bufsize)
  
      def flush(self):
          self.parser.flush()


With these changes, the example above works (i.e., gives an error).
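
The read_from patch above comes down to a general rule for delegating
wrappers: the wrapper has to accept every argument that callers may pass
to the object it wraps. A minimal sketch of that rule follows. The class
names are hypothetical stand-ins, not the real xmlproc classes, and only
the bufsize default of 16384 is taken from the patch.

```python
import io


class Parser:
    """Stand-in for the non-validating parser."""

    def __init__(self):
        self.chunks = []

    def read_from(self, fileobj, bufsize=16384):
        # Read the stream in bufsize pieces until it is exhausted.
        while True:
            buf = fileobj.read(bufsize)
            if not buf:
                break
            self.chunks.append(buf)


class ValidatingParser:
    """Stand-in for the validating wrapper around Parser."""

    def __init__(self):
        self.parser = Parser()

    def read_from(self, fileobj, bufsize=16384):
        # Same signature as Parser.read_from, so a driver can treat
        # both classes alike and pass bufsize to either one.
        self.parser.read_from(fileobj, bufsize)


def drive(parser, fileobj, bufsize=4):
    """Plays the role of the SAX driver, which always passes bufsize."""
    parser.read_from(fileobj, bufsize)
```

If ValidatingParser.read_from took only the file argument, the call in
drive() would fail with a TypeError, just as in the traceback above.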

However, the following program then fails:

----------------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile

f = open ('test2.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "test2.dtd">
<configuration>
</configuration>
""")
f.close()

f = open ('test2.dtd', 'w')
f.write ("<!ELEMENT configuration EMPTY>\n")
f.close ()

doc = FromXmlFile ('test2.xml', None, 1)
print doc
----------------------------------------------------------------------

$ python read2.py
Traceback (innermost last):
  File "read2.py", line 15, in ?
    doc = FromXmlFile ('test2.xml', None, 1)
  File "xml/dom/ext/reader/Sax2.py", line 330, in FromXmlFile
    saxHandlerClass, parser)
  File "xml/dom/ext/reader/Sax2.py", line 315, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "xml/dom/ext/reader/Sax2.py", line 301, in fromStream
    self.parser.parse(s)
  File "xml/sax/drivers2/drv_xmlproc.py", line 90, in parse
    parser.read_from(source.getByteStream(), bufsize)
  File "xml/parsers/xmlproc/xmlval.py", line 102, in read_from
    self.parser.read_from(file,bufsize)
  File "xml/parsers/xmlproc/xmlutils.py", line 137, in read_from
    self.feed(buf)
  File "xml/parsers/xmlproc/xmlutils.py", line 185, in feed
    self.do_parse()
  File "xml/parsers/xmlproc/xmlproc.py", line 115, in do_parse
    self.parse_data()
  File "xml/parsers/xmlproc/xmlproc.py", line 377, in parse_data
    self.app.handle_data(self.data,start,end)
  File "xml/parsers/xmlproc/xmlval.py", line 213, in handle_data
    self.realapp.handle_ignorable_data(data,start,end)
  File "xml/sax/drivers2/drv_xmlproc.py", line 355, in handle_ignorable_data
    self._cont_handler.ignorableWhitespace(data, start, end) # FIXME?
TypeError: too many arguments; expected 2, got 4


This patch seems to fix this:

*** xml/dom/ext/reader/Sax2.py-orig	Tue Feb 20 00:47:40 2001
--- xml/dom/ext/reader/Sax2.py	Fri Mar  2 18:59:31 2001
***************
*** 199,205 ****
              self._nodeStack[-1].appendChild(new_element)
          return
  
!     def ignorableWhitespace(self, chars):
          """
          If 'keepAllWs' permits, add ignorable white-space as a text node.
          A Document node cannot contain text nodes directly.
--- 199,205 ----
              self._nodeStack[-1].appendChild(new_element)
          return
  
!     def ignorableWhitespace(self, chars, start, length):
          """
          If 'keepAllWs' permits, add ignorable white-space as a text node.
          A Document node cannot contain text nodes directly.
***************
*** 207,213 ****
          for it in the DOM and it must be discarded.
          """
          if self._keepAllWs and self._nodeStack[-1].nodeType !=  Node.DOCUMENT_NODE:
!             self._currText = self._currText + chars
          return
  
      def characters(self, chars):
--- 207,213 ----
          for it in the DOM and it must be discarded.
          """
          if self._keepAllWs and self._nodeStack[-1].nodeType !=  Node.DOCUMENT_NODE:
!             self._currText = self._currText + chars[start:start+length]
          return
  
      def characters(self, chars):


From scott snyder <snyder@fnal.gov>  Sat Mar  3 02:01:02 2001
From: scott snyder <snyder@fnal.gov> (scott snyder)
Date: Fri, 02 Mar 2001 20:01:02 CST
Subject: [XML-SIG] 0.6.4: problems with sax exceptions
Message-ID: <200103030201.UAA01583@d0sgibnl1.fnal.gov>

hi -

I've been having some problems with sax exceptions in 0.6.4,
while trying to build DOM trees from XML.

Consider this program.  It creates an invalid xml file and reads it.
The resulting exception is caught and printed.

---------------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile
from xml.sax                         import saxlib

f = open ('test3.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "NONEXISTENT.dtd">
<""")
f.close()

try:
    doc = FromXmlFile ('test3.xml')
except saxlib.SAXException, e:
    print e
---------------------------------------------------------------------

However, when i run this, i get


[sss@karma xmltest]$ python read3.py

Traceback (innermost last):
  File "read3.py", line 13, in ?
    print e
  File "xml/sax/_exceptions.py", line 83, in __str__
    sysid = self.getSystemId()
  File "xml/sax/_exceptions.py", line 79, in getSystemId
    return self._locator.getSystemId()
  File "xml/sax/drivers2/drv_xmlproc.py", line 161, in getSystemId
    return self._parser.get_current_sysid() # FIXME?
AttributeError: 'None' object has no attribute 'get_current_sysid'


It looks like the objects that get followed to get this information
get deleted during the stack unwind.

Here's an attempt at a fix:


*** xml/sax/_exceptions.py-orig	Fri Mar  2 19:43:46 2001
--- xml/sax/_exceptions.py	Fri Mar  2 19:43:59 2001
***************
*** 61,74 ****
          SAXException.__init__(self, msg, exception)
          self._locator = locator
  
      def getColumnNumber(self):
          """The column number of the end of the text where the exception
          occurred."""
!         return self._locator.getColumnNumber()
  
      def getLineNumber(self):
          "The line number of the end of the text where the exception occurred."
!         return self._locator.getLineNumber()
  
      def getPublicId(self):
          "Get the public identifier of the entity where the exception occurred."
--- 61,82 ----
          SAXException.__init__(self, msg, exception)
          self._locator = locator
  
+         # We need to cache this stuff at construction time.
+         # If this exception is thrown, the objects through which we must
+         # traverse to get this information may be deleted by the time
+         # it gets caught.
+         self._systemId = self._locator.getSystemId()
+         self._colnum = self._locator.getColumnNumber()
+         self._linenum = self._locator.getLineNumber()
+ 
      def getColumnNumber(self):
          """The column number of the end of the text where the exception
          occurred."""
!         return self._colnum
  
      def getLineNumber(self):
          "The line number of the end of the text where the exception occurred."
!         return self._linenum
  
      def getPublicId(self):
          "Get the public identifier of the entity where the exception occurred."
***************
*** 76,82 ****
  
      def getSystemId(self):
          "Get the system identifier of the entity where the exception occurred."
!         return self._locator.getSystemId()
  
      def __str__(self):
          "Create a string representation of the exception."
--- 84,90 ----
  
      def getSystemId(self):
          "Get the system identifier of the entity where the exception occurred."
!         return self._systemId
  
      def __str__(self):
          "Create a string representation of the exception."


With this change, the program prints this:

$ python read3.py
test3.xml:3:1: Premature document end, no root element
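
The idea behind the patch above can be shown in isolation. This is only a
sketch in current Python: FakeParser, FakeLocator and both exception
classes are hypothetical, and close() merely imitates a driver dropping
its parser reference while the exception propagates.

```python
class FakeParser:
    """Hypothetical parser exposing its current position."""

    def __init__(self, sysid, line, col):
        self.sysid, self.line, self.col = sysid, line, col


class FakeLocator:
    """Hypothetical locator that forwards every query to a live parser."""

    def __init__(self, parser):
        self._parser = parser

    def getSystemId(self):
        return self._parser.sysid

    def getLineNumber(self):
        return self._parser.line

    def getColumnNumber(self):
        return self._parser.col

    def close(self):
        # Imitates the driver releasing its parser during cleanup.
        self._parser = None


class LazyParseError(Exception):
    """Asks the locator only when the message is formatted."""

    def __init__(self, msg, locator):
        super().__init__(msg)
        self._msg = msg
        self._locator = locator

    def __str__(self):
        loc = self._locator
        return "%s:%d:%d: %s" % (loc.getSystemId(), loc.getLineNumber(),
                                 loc.getColumnNumber(), self._msg)


class CachedParseError(Exception):
    """Copies the position out of the locator at construction time."""

    def __init__(self, msg, locator):
        super().__init__(msg)
        self._msg = msg
        self._where = (locator.getSystemId(), locator.getLineNumber(),
                       locator.getColumnNumber())

    def __str__(self):
        return "%s:%d:%d: %s" % (self._where + (self._msg,))


def fail(error_class):
    """Raise error_class, then tear the locator down as a driver might."""
    locator = FakeLocator(FakeParser("test3.xml", 3, 1))
    try:
        raise error_class("Premature document end", locator)
    finally:
        locator.close()
```

Formatting a LazyParseError raised by fail() hits the same AttributeError
as in the traceback above, while a CachedParseError still knows where the
error happened.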


However, if i switch to using a validating XML parser, then i lose
the file name in the exception (this assumes the patches in my last
note to make the validating parser actually work are applied).


---------------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile
from xml.sax                         import saxlib

f = open ('test3.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "test4.dtd">
<""")
f.close()

f = open ('test4.dtd', 'w')
f.write ("<!ELEMENT configuration EMPTY>\n")
f.close ()

try:
    doc = FromXmlFile ('test3.xml', None, 1)
except saxlib.SAXException, e:
    print e
---------------------------------------------------------------------


$ python read4.py
Unknown:3:1: Premature document end, no root element


The following patch seems to fix the problem.


*** xml/parsers/xmlproc/xmlval.py-orig2	Fri Mar  2 19:55:03 2001
--- xml/parsers/xmlproc/xmlval.py	Fri Mar  2 19:55:33 2001
***************
*** 26,31 ****
--- 26,32 ----
          self.app=Application()
          self.dtd=CompleteDTD(self.parser)
          self.val=ValidatingApp(self.dtd,self.parser)
+         self.current_sysID = "Unknown"
          self.reset()
  
      def parse_resource(self,sysid):
***************
*** 99,104 ****
--- 100,106 ----
          self.parser.parseEnd()
  
      def read_from(self,file,bufsize=16384):
+         self.parser.current_sysID = self.current_sysID
          self.parser.read_from(file,bufsize)
  
      def flush(self):


Now, when i run the program, i get

$ python read4.py
test3.xml:3:1: Premature document end, no root element


From scott snyder <snyder@fnal.gov>  Sat Mar  3 02:29:41 2001
From: scott snyder <snyder@fnal.gov> (scott snyder)
Date: Fri, 02 Mar 2001 20:29:41 CST
Subject: [XML-SIG] 0.6.4: another problem with building DOM using validating parser
Message-ID: <200103030229.UAA02146@d0sgibnl1.fnal.gov>

hi -

Here's another problem with building DOM trees from XML with the validating
parser with 0.6.4.

--------------------------------------------------------------------
from xml.dom.ext.reader.Sax2         import FromXmlFile

f = open ('test5.xml', 'w')
f.write ("""<?xml version="1.0"?>
<!DOCTYPE configuration  [
  <!ENTITY testscrap   SYSTEM "testscrap">
  <!ELEMENT configuration EMPTY>
]>

<configuration/>
""")
f.close()

doc = FromXmlFile ('test5.xml', None, 1)

print doc
--------------------------------------------------------------------


When i run this:

$ python read5.py
Traceback (innermost last):
  File "read5.py", line 14, in ?
    doc = FromXmlFile ('test5.xml', None, 1)
  File "xml/dom/ext/reader/Sax2.py", line 330, in FromXmlFile
    saxHandlerClass, parser)
  File "xml/dom/ext/reader/Sax2.py", line 315, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "xml/dom/ext/reader/Sax2.py", line 301, in fromStream
    self.parser.parse(s)
  File "xml/sax/drivers2/drv_xmlproc.py", line 90, in parse
    parser.read_from(source.getByteStream(), bufsize)
  File "xml/parsers/xmlproc/xmlval.py", line 104, in read_from
    self.parser.read_from(file,bufsize)
  File "xml/parsers/xmlproc/xmlutils.py", line 137, in read_from
    self.feed(buf)
  File "xml/parsers/xmlproc/xmlutils.py", line 185, in feed
    self.do_parse()
  File "xml/parsers/xmlproc/xmlproc.py", line 104, in do_parse
    self.parse_doctype()
  File "xml/parsers/xmlproc/xmlproc.py", line 482, in parse_doctype
    self.parse_internal_dtd()    
  File "xml/parsers/xmlproc/xmlproc.py", line 532, in parse_internal_dtd
    self.handle_internal_dtd(line,lb,self.get_region()[:-last_part_size])
  File "xml/parsers/xmlproc/xmlproc.py", line 544, in handle_internal_dtd
    p.feed(int_dtd)
  File "xml/parsers/xmlproc/xmlutils.py", line 185, in feed
    self.do_parse()
  File "xml/parsers/xmlproc/dtdparser.py", line 251, in do_parse
    self.parse_entity()
  File "xml/parsers/xmlproc/dtdparser.py", line 341, in parse_entity
    self.dtd_consumer.new_external_entity(ent_name,pub_id,sys_id,ndata)
  File "xml/parsers/xmlproc/xmldtd.py", line 151, in new_external_entity
    self.dtd_listener.new_external_entity(ent_name,pubid,sysid,ndata)
  File "xml/sax/drivers2/drv_xmlproc.py", line 239, in new_external_entity
    ndata)
TypeError: too many arguments; expected 4, got 5


This seems to work around the problem, though i think it's probably not
the correct fix.

*** xml/dom/ext/reader/Sax2.py-orig2	Fri Mar  2 20:10:52 2001
--- xml/dom/ext/reader/Sax2.py	Fri Mar  2 20:23:31 2001
***************
*** 255,262 ****
          self._ownerDoc.getDocumentType().getNotations().setNamedItem(new_notation)
          return
  
!     def unparsedEntityDecl (self, publicId, systemId, notationName):
!         new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, notationName)
          self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
          return
  
--- 255,264 ----
          self._ownerDoc.getDocumentType().getNotations().setNamedItem(new_notation)
          return
  
!     def unparsedEntityDecl (self, name, publicId, systemId, ndata):
!         if not self._ownerDoc:
!             return
!         new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, name)
          self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
          return
  

From martin@loewis.home.cs.tu-berlin.de  Sat Mar  3 08:10:01 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 3 Mar 2001 09:10:01 +0100
Subject: [XML-SIG] 0.6.4: problems with sax exceptions
In-Reply-To: <200103030201.UAA01583@d0sgibnl1.fnal.gov> (message from scott
 snyder on Fri, 02 Mar 2001 20:01:02 CST)
References: <200103030201.UAA01583@d0sgibnl1.fnal.gov>
Message-ID: <200103030810.f238A1h01334@mira.informatik.hu-berlin.de>

> It looks like the objects that get followed to get this information
> get deleted during the stack unwind.
> 
> Here's an attempt at a fix:

Thanks, committed as-is.

> *** xml/parsers/xmlproc/xmlval.py-orig2	Fri Mar  2 19:55:03 2001
> --- xml/parsers/xmlproc/xmlval.py	Fri Mar  2 19:55:33 2001
> ***************
> *** 26,31 ****
> --- 26,32 ----
>           self.app=Application()
>           self.dtd=CompleteDTD(self.parser)
>           self.val=ValidatingApp(self.dtd,self.parser)
> +         self.current_sysID = "Unknown"
>           self.reset()
>   
>       def parse_resource(self,sysid):
> ***************
> *** 99,104 ****
> --- 100,106 ----
>           self.parser.parseEnd()
>   
>       def read_from(self,file,bufsize=16384):
> +         self.parser.current_sysID = self.current_sysID
>           self.parser.read_from(file,bufsize)
>   
>       def flush(self):

That did not seem right. Instead, I've used set_sysid throughout.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Mar  3 08:08:27 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 3 Mar 2001 09:08:27 +0100
Subject: [XML-SIG] 0.6.4 problem with reading DOM tree from XML with validation
In-Reply-To: <200103030140.TAA01207@d0sgibnl1.fnal.gov> (message from scott
 snyder on Fri, 02 Mar 2001 19:39:59 CST)
References: <200103030140.TAA01207@d0sgibnl1.fnal.gov>
Message-ID: <200103030808.f2388Rh01332@mira.informatik.hu-berlin.de>

Hi Scott,

Thanks for your comments and patches, they are quite helpful.

> *** xml/dom/ext/reader/Sax2.py-orig	Tue Feb 20 00:47:40 2001
> --- xml/dom/ext/reader/Sax2.py	Fri Mar  2 18:29:21 2001
> ***************
> *** 274,279 ****
> --- 274,281 ----
>       def __init__(self, validate=0, keepAllWs=0, catName=None,
>                    saxHandlerClass=XmlDomGenerator, parser=None):
>           self.parser = parser or (validate and sax2exts.XMLValParserFactory.make_parser()) or sax2exts.XMLParserFactory.make_parser()
> +         if validate:
> +             self.parser.setFeature (saxlib.feature_validation, 1)
>           if catName:
>               #set up the catalog, if there is one
>               from xml.parsers.xmlproc import catalog

I think the bug is actually in the XMLValParserFactory, which should
return a validating parser (with validation turned on).

> *** xml/parsers/xmlproc/xmlval.py-orig	Fri Mar  2 18:26:47 2001
> --- xml/parsers/xmlproc/xmlval.py	Fri Mar  2 18:26:53 2001
> ***************
> *** 98,105 ****
>       def parseEnd(self):
>           self.parser.parseEnd()
>   
> !     def read_from(self,file):
> !         self.parser.read_from(file)
>   
>       def flush(self):
>           self.parser.flush()
> --- 98,105 ----
>       def parseEnd(self):
>           self.parser.parseEnd()
>   
> !     def read_from(self,file,bufsize=16384):
> !         self.parser.read_from(file,bufsize)
>   
>       def flush(self):
>           self.parser.flush()

I've committed this as-is.

More later,
Martin


From larsga@garshol.priv.no  Sun Mar  4 12:32:14 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 Mar 2001 13:32:14 +0100
Subject: [XML-SIG] Re: [4suite] Article about XSLT 1.1 and <xsl:script>
In-Reply-To: <Pine.LNX.4.21.0103011848250.21344-100000@leo.logilab.fr>
References: <Pine.LNX.4.21.0103011848250.21344-100000@leo.logilab.fr>
Message-ID: <m3wva5pve9.fsf@lambda.garshol.priv.no>

* Alexandre Fayolle
| 
| I'm interested in learning how the DOM can be language biased.

The DOM is not biased towards any particular language, but it does
have a strong bias towards a particular family of languages:
mainstream statically typed object-oriented languages. This bias is,
of course, more or less inherited from IDL.

The further away you are from that core family of languages the more
painful you'll find implementing and using the DOM, since its design
will follow a philosophy increasingly distant from that of your
language.

In Python, a mainstream object-oriented language, the pain is not too
great, even though it can be felt. In Common Lisp, an object-oriented
language, it would be felt more strongly. In Haskell, a functional
programming language, the DOM is better ignored. Ditto for Prolog,
Forth and many other languages.

To put it another way the question is not how the DOM can be biased
towards a particular language, but more how it could possibly avoid
such a bias.
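
A small illustration of the point, as a sketch only: the same lookup
written once against a DOM-style API and once against a tree API designed
around Python's own idioms. Both modules used here (xml.dom.minidom and
xml.etree.ElementTree) ship with current Python, and ElementTree postdates
this discussion; the sample document and function names are made up.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

XML = ('<config><option name="depth">3</option>'
       '<option name="mode">fast</option></config>')


def read_with_dom(text):
    """DOM style: accessor methods and explicit text nodes."""
    doc = minidom.parseString(text)
    result = {}
    for node in doc.getElementsByTagName("option"):
        # The element's text lives in a separate child Text node.
        result[node.getAttribute("name")] = node.firstChild.data
    return result


def read_with_etree(text):
    """ElementTree style: attributes and text are plain Python values."""
    root = ET.fromstring(text)
    return {el.get("name"): el.text for el in root.findall("option")}
```

Both functions return the same dictionary; the difference is in how much
of the IDL-shaped interface shows through.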

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Sun Mar  4 22:15:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 4 Mar 2001 23:15:36 +0100
Subject: [XML-SIG] 0.6.4 problem with reading DOM tree from XML with validation
In-Reply-To: <200103030140.TAA01207@d0sgibnl1.fnal.gov> (message from scott
 snyder on Fri, 02 Mar 2001 19:39:59 CST)
References: <200103030140.TAA01207@d0sgibnl1.fnal.gov>
Message-ID: <200103042215.f24MFa902951@mira.informatik.hu-berlin.de>

>   File "xml/sax/drivers2/drv_xmlproc.py", line 355, in handle_ignorable_data
>     self._cont_handler.ignorableWhitespace(data, start, end) # FIXME?
> TypeError: too many arguments; expected 2, got 4
> 
> 
> This patch seems to fix this:

Thanks for the report. The patch is incorrect: the official SAX2
interface (in xml.sax.handler) is that ignorableWhitespace gets a
single data argument, so the bug was actually in drv_xmlproc. I've
installed an appropriate fix.
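
For reference, a handler written against that interface looks roughly
like the sketch below. It uses the xml.sax package from the current
standard library, not PyXML 0.6.4, and the TextCollector class is a
made-up example. The bundled expat driver is non-validating, so it will
usually report such whitespace through characters() instead.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Collects character data and ignorable whitespace separately."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.ignorable = []

    def characters(self, content):
        # One string argument, no offsets.
        self.text.append(content)

    def ignorableWhitespace(self, whitespace):
        # Same shape as characters(): a single string argument.
        self.ignorable.append(whitespace)


def collect(xml_bytes):
    """Parse xml_bytes and return the handler that saw the events."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler
```

A driver that holds a buffer plus offsets is expected to slice the buffer
itself and pass the handler only the resulting string.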

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sun Mar  4 22:26:42 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 4 Mar 2001 23:26:42 +0100
Subject: [XML-SIG] 0.6.4: another problem with building DOM using validating parser
In-Reply-To: <200103030229.UAA02146@d0sgibnl1.fnal.gov> (message from scott
 snyder on Fri, 02 Mar 2001 20:29:41 CST)
References: <200103030229.UAA02146@d0sgibnl1.fnal.gov>
Message-ID: <200103042226.f24MQgP03017@mira.informatik.hu-berlin.de>

> from xml.dom.ext.reader.Sax2         import FromXmlFile
> 
> f = open ('test5.xml', 'w')
> f.write ("""<?xml version="1.0"?>
> <!DOCTYPE configuration  [
>   <!ENTITY testscrap   SYSTEM "testscrap">
>   <!ELEMENT configuration EMPTY>
> ]>
> 
> <configuration/>
> """)
> f.close()
> 
> doc = FromXmlFile ('test5.xml', None, 1)
> 
> print doc
[...]
> !     def unparsedEntityDecl (self, publicId, systemId, notationName):
> !         new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, notationName)
>           self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
>           return

I'm glad that others are as confused about the matter as I am. What
you have in your document is not an unparsed entity, but an external
one - the unparsed ones have an NDATA notation name. xmlproc detected
that properly (by setting ndata to ""), but drv_xmlproc expected None
as the ndata. So I changed it to invoke externalEntityDecl in that
case, which is not handled by Sax2.
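
The distinction can be seen in a small sketch that uses the xml.sax
package from the current standard library (expat driver), not
PyXML/xmlproc. The document and the EntityRecorder class are made-up
examples: "logo" carries an NDATA notation and is therefore unparsed,
while "testscrap" is an ordinary external entity.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE configuration [
  <!ELEMENT configuration EMPTY>
  <!NOTATION gif SYSTEM "viewer">
  <!ENTITY testscrap SYSTEM "testscrap">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<configuration/>
"""


class EntityRecorder(DTDHandler):
    """Records the DTD events the parser reports."""

    def __init__(self):
        self.unparsed = []
        self.notations = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append(name)

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        # Four arguments: the entity name comes first, the notation last.
        self.unparsed.append((name, systemId, ndata))


def record_entities(data):
    """Parse data and return the recorder that received the DTD events."""
    parser = xml.sax.make_parser()
    recorder = EntityRecorder()
    parser.setDTDHandler(recorder)
    parser.parse(io.BytesIO(data))
    return recorder
```

With this setup only "logo" should reach unparsedEntityDecl; the
declaration of "testscrap" is not a DTDHandler event at all.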

As you found, *if* this was ever invoked, _ownerDoc will be None
(since the document element has not been seen yet). Instead of
ignoring the unparsed entity, it would be better to put it into
_orphanedNodes; I've changed it thus. In the process, I found that
things are put into _orphanedNodes which are later not processed -
I've fixed that too.

I still think that the unparsedEntityDecl callback is completely
broken. What are getFactory and getEntities? Also, if there is a
feature for creating entities, it is surely part of a 4DOM extension -
probably on the document type. However, that apparently is not capable
of distinguishing between external and unparsed entities; not sure
whether it should.

In any case, I've applied the following patch. I'd appreciate if
somebody at Fourthought could take a look.

Regards,
Martin

Index: xml/dom/ext/reader/Sax2.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/ext/reader/Sax2.py,v
retrieving revision 1.7
diff -u -r1.7 Sax2.py
--- xml/dom/ext/reader/Sax2.py	2001/02/20 01:00:03	1.7
+++ xml/dom/ext/reader/Sax2.py	2001/03/04 22:05:59
@@ -8,7 +8,7 @@
 Components for reading XML files from a SAX2 producer.
 WWW: http://4suite.com/4DOM         e-mail: support@4suite.com
 
-Copyright (c) 2000 Fourthought Inc, USA.   All Rights Reserved.
+Copyright (c) 2000, 2001 Fourthought Inc, USA.   All Rights Reserved.
 See  http://4suite.com/COPYRIGHT  for license and copyright information
 """
 
@@ -148,6 +148,10 @@
                     self._ownerDoc.appendChild(comment)
             elif o_node[0] == 'doctype':
                 before_doctype = 0
+            elif o_node[0] == 'unparsedentitydecl':
+                apply(self.unparsedEntityDecl, o_node[1:])
+            else:
+                raise "Unknown orphaned node:"+o_node[0]
         self._rootNode = self._ownerDoc
         self._nodeStack.append(self._rootNode)
         return
@@ -222,7 +226,7 @@
     def startDTD(self, doctype, publicID, systemID):
         if not self._rootNode:
             self._dt = implementation.createDocumentType(doctype, publicID, systemID)
-            self._orphanedNodes.append(('doctype'))
+            self._orphanedNodes.append(('doctype',))
         else:
             raise 'Illegal DocType declaration'
         return
@@ -255,9 +259,12 @@
         self._ownerDoc.getDocumentType().getNotations().setNamedItem(new_notation)
         return
 
-    def unparsedEntityDecl (self, publicId, systemId, notationName):
-        new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, notationName)
-        self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
+    def unparsedEntityDecl (self, name, publicId, systemId, ndata):
+        if self._ownerDoc:
+            new_notation = self._ownerDoc.getFactory().createEntity(self._ownerDoc,  publicId, systemId, name)
+            self._ownerDoc.getDocumentType().getEntities().setNamedItem(new_notation)
+        else:
+            self._orphanedNodes.append(('unparsedentitydecl', name, publicId, systemId, ndata))
         return
 
     #Overridden ErrorHandler methods


From larsga@garshol.priv.no  Mon Mar  5 09:44:34 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Mar 2001 10:44:34 +0100
Subject: [XML-SIG] 0.6.4: another problem with building DOM using validating parser
In-Reply-To: <200103042226.f24MQgP03017@mira.informatik.hu-berlin.de>
References: <200103030229.UAA02146@d0sgibnl1.fnal.gov> <200103042226.f24MQgP03017@mira.informatik.hu-berlin.de>
Message-ID: <m3bsrgy2gt.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| I'm glad that others are as confused about the matter as I am. What
| you have in your document is not an unparsed entity, but an external
| one - the unparsed ones have an NDATA notation name. xmlproc detected
| that properly (by setting ndata to ""), but drv_xmlproc expected None
| as the ndata. So I changed it to invoke externalEntityDecl in that
| case, which is not handled by Sax2.

Whoops. Please note that xmlproc should report None rather than "".
This is one of the fixes either waiting in my CVS tree or lost in my
disk crash. So the driver was correct, and xmlproc incorrect.
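
For reference, here is a minimal sketch of how a SAX2 DTD handler sees
this distinction. It assumes a current Python whose standard library
ships xml.sax with the expat driver, not the xmlproc driver discussed
above. The class name, function name and sample document are made up for
illustration. The point is that unparsedEntityDecl takes four arguments
(name, publicId, systemId, ndata) and should only be called for entities
that carry an NDATA notation name.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

# Hypothetical sample: "pic" is an unparsed entity (it has NDATA),
# "chapter" is a plain external entity (no NDATA).
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "viewer">
  <!ENTITY pic SYSTEM "pic.gif" NDATA gif>
  <!ENTITY chapter SYSTEM "chapter.xml">
]>
<doc/>
"""


class EntityRecorder(DTDHandler):
    """Records every unparsed entity declaration reported by the parser."""

    def __init__(self):
        self.unparsed = []

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed.append((name, publicId, systemId, ndata))


def collect_unparsed_entities(data):
    """Parse the bytes in `data` and return the recorded declarations."""
    recorder = EntityRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    parser.setDTDHandler(recorder)
    parser.parse(io.BytesIO(data))
    return recorder.unparsed
```

With the sample above, only the "pic" declaration should reach the
handler; the "chapter" entity has no notation and is not an unparsed
entity.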

--Lars M.
 

From crawford@goingware.com  Mon Mar  5 08:52:19 2001
From: crawford@goingware.com (Michael D. Crawford)
Date: Mon, 05 Mar 2001 05:22:19 -0330
Subject: [XML-SIG] Web App Testing article at LinuxQuality
Message-ID: <3AA353C3.935C1510@goingware.com>

Tonight I posted:

Use Validators and Load Generators to Test Your Web Applications
http://linuxquality.sunsite.dk/articles/webapptesting/

The article generally promotes the idea that one should use
validators to ensure that the pages produced by a web application
conform to W3C standards.  I also talk about stress testing with
load generators and bring up the idea of combining the two to check
for document corruption from a server under heavy stress.

I mention PyXML and the Python XML Sig in the section "XML Validators":

http://linuxquality.sunsite.dk/articles/webapptesting/validators.html#xml

where I suggest that if a web application generates XHTML rather than
HTML, one can make use of one of the many available XML software
packages for validating and processing one's documents.

I don't say a lot specifically about PyXML (although I do say I've used
it and it's good) - is there anything I should add?  Do you have any
comments on any part of the page?

The Linux Quality Database at http://linuxquality.sunsite.dk/ has the
dual purpose of promoting better quality in Free and Open Source
Software programs by publishing articles like this one, and the
eventual development of an easy-to-use but powerful bug database to
ease widespread public quality assurance of the Linux kernel.

Any articles you might like to submit yourself are appreciated.  Also
appreciated are helpful hands to contribute to the database project.

Regards,

Mike Crawford
-- 
Michael D. Crawford
GoingWare Inc. - Expert Software Development and Consulting
http://www.goingware.com
crawford@goingware.com

  Tilting at Windmills for a Better Tomorrow.


From Alexandre.Fayolle@logilab.fr  Mon Mar  5 10:57:40 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 5 Mar 2001 11:57:40 +0100 (CET)
Subject: [XML-SIG] [ANN] PyPaSax
Message-ID: <Pine.LNX.4.21.0103051152380.29612-100000@leo.logilab.fr>

I'm releasing today a utility we use at Logilab for documenting the code
of Narval. We call it pypasax. It uses the parser module to extract
information about classes and methods and generates an XML tree from this.
We are working on XSLT to generate XMI files so that we can import the
data in some UML tool such as ArgoUML. 

More information can be found at http://www.logilab.org/pypasax/

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From nyenyec@mailbox.hu  Mon Mar  5 14:29:54 2001
From: nyenyec@mailbox.hu (Nyenyec)
Date: 5 Mar 2001 14:29:54 -0000
Subject: [XML-SIG] Missing DOCTYPE when pretty printing
Message-ID: <20010305142954.16107.qmail@netfinity2.mailbox.hu>

Hi,

I'm trying to pretty-print an XML file using the XML package v0.6.2.

My problem is with the doctype node.
I'm writing a web.xml (Java Servlet config) file and it has a DOCTYPE
like this:

<!DOCTYPE web-app 
      PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN" 
      "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">

The problematic code in xml.dom.ext.Printer.py is in the PrintVisitor class:

    def visitDocumentType(self, node):
        if node.systemId != '': ################### WHY?????
            self.__emptyReturn = 0
            self.stream.write("<!DOCTYPE " + node.name)
            if node.publicId != '':
                self.stream.write(' PUBLIC "' + node.publicId + '"')
            self.stream.write(' SYSTEM "' + node.systemId + '" ')
            if node.entities.length or node.notations.length:
                self.stream.write('[')
                self.visitNamedNodeMap(node.entities)
                self.visitNamedNodeMap(node.notations)
                self.stream.write(']')
            self.stream.write('>')
        return


It seems that if the DOCTYPE node does not have a SYSTEM id, it will not
be printed at all? Is this deliberate or is this a bug?
What is the simplest workaround?

Is there another (simpler) way to pretty print XML files?

Thanks,
Nyenyec

--------------------------------------------------
 What is your MailBox address? - http://mailbox.hu


From akuchlin@mems-exchange.org  Mon Mar  5 14:04:28 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 5 Mar 2001 09:04:28 -0500
Subject: [XML-SIG] Missing DOCTYPE when pretty printing
In-Reply-To: <20010305142954.16107.qmail@netfinity2.mailbox.hu>; from nyenyec@mailbox.hu on Mon, Mar 05, 2001 at 02:29:54PM -0000
References: <20010305142954.16107.qmail@netfinity2.mailbox.hu>
Message-ID: <20010305090428.A27565@newcnri.cnri.reston.va.us>

On Mon, Mar 05, 2001 at 02:29:54PM -0000, Nyenyec wrote:
>It seems that if the DOCTYPE node does not have a SYSTEM id, it will not
>be printed at all? Is this deliberate or is this a bug?

Likely to be deliberate, because I don't think you can have a DOCTYPE
without a system ID; the two choices are:

<!DOCTYPE PUBLIC pubId systemId>
<!DOCTYPE SYSTEM systemId>

Is this actually causing a problem for you?  Is pubId present while
systemId is "" or None?  If so, what parser are you using, and what
does your file's declaration look like?

--amk


From xml-sig@thewrittenword.com  Mon Mar  5 16:51:42 2001
From: xml-sig@thewrittenword.com (xml-sig@thewrittenword.com)
Date: Mon, 5 Mar 2001 10:51:42 -0600
Subject: [XML-SIG] Patch to 0.6.4 to add --with-libexpat and --ldflags
Message-ID: <20010305105142.A23879@postal.il.thewrittenword.com>

Patch to add command-line arguments --with-libexpat=PATH to specify
location of libexpat include/lib files and --ldflags=STR to add
arbitrary linker flags to build the resulting object file (on some
systems, need to set runtime path to the libexpat shared library).
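
The option handling can be sketched on its own as follows. This is a
rough illustration in current Python, not the patch itself: the function
name extract_build_options is made up, and unlike the patch it splits
only on the first "=" so that values which themselves contain "="
survive intact.

```python
def extract_build_options(argv):
    """Pull --with-libexpat=PATH and --ldflags=STR out of an argument list.

    Returns (libexpat, ldflags, remaining), where `remaining` holds the
    arguments that should still be passed on to distutils.
    """
    libexpat = None
    ldflags = []
    remaining = []
    for arg in argv:
        if arg.startswith("--with-libexpat="):
            # Split once, so a PATH containing "=" is kept whole.
            libexpat = arg.split("=", 1)[1]
        elif arg.startswith("--ldflags="):
            # The value is a whitespace-separated list of linker flags.
            ldflags = arg.split("=", 1)[1].split()
        else:
            remaining.append(arg)
    return libexpat, ldflags, remaining
```

Building a new list instead of calling sys.argv.remove() also avoids
removing the wrong element when the same string appears twice.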

-- 
albert chin (china@thewrittenword.com)

-- snip snip
--- setup.py.orig	Thu Mar  1 11:45:51 2001
+++ setup.py	Thu Mar  1 12:01:45 2001
@@ -35,6 +35,19 @@
     def xml(s):
         return "_xmlplus"+s
 
+# special command-line arguments
+LIBEXPAT = None
+LDFLAGS = []
+
+args = sys.argv[:]
+for arg in args:
+    if string.find(arg, '--with-libexpat=') == 0:
+        LIBEXPAT = string.split(arg, '=')[1]
+        sys.argv.remove(arg)
+    elif string.find(arg, '--ldflags=') == 0:
+        LDFLAGS = string.split(string.split(arg, '=')[1])
+        sys.argv.remove(arg)
+
 def should_build_pyexpat():
     try:
         import pyexpat
@@ -56,6 +69,9 @@
         return 0
 
 def get_expat_prefix():
+    if LIBEXPAT:
+        return LIBEXPAT
+
     for p in ("/usr", "/usr/local"):
         incs = os.path.join(p, "include")
         libs = os.path.join(p, "lib")
@@ -100,6 +116,7 @@
                       include_dirs=include_dirs,
                       library_dirs=library_dirs,
                       libraries=libraries,
+                      extra_link_args=LDFLAGS,
                       sources=sources
                       ))
 

From martin@loewis.home.cs.tu-berlin.de  Mon Mar  5 22:47:45 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Mar 2001 23:47:45 +0100
Subject: [XML-SIG] Missing DOCTYPE when pretty printing
In-Reply-To: <20010305142954.16107.qmail@netfinity2.mailbox.hu>
 (nyenyec@mailbox.hu)
References: <20010305142954.16107.qmail@netfinity2.mailbox.hu>
Message-ID: <200103052247.f25Mljc00873@mira.informatik.hu-berlin.de>

> It seems that if the DOCTYPE node does not have a SYSTEM id, it will not
> be printed at all? Is this deliberate or is this a bug?

That's a bug.

> What is the simplest workaround?

Update to 0.6.4, which has this bug fixed.

> Is there another (simpler) way to pretty print XML files?

Depends on what you've got. If it is a DOM tree, then the answer is
probably "no". You could traverse it yourself, but it won't be
simpler.
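
If the input is a file or string rather than an existing 4DOM tree, one
alternative is sketched below. It is an assumption on my part that a
current Python with xml.dom.minidom in the standard library is
available; this is not the xml.dom.ext printer discussed in this thread.
The sample document and the function name pretty are illustrative only.
As far as I know minidom does not fetch the external DTD, it only
records the identifiers.

```python
from xml.dom import minidom

# Illustrative input modelled on the web.xml DOCTYPE quoted in the thread.
SOURCE = (
    '<!DOCTYPE web-app PUBLIC '
    '"-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN" '
    '"http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">'
    '<web-app><display-name>demo</display-name></web-app>'
)


def pretty(xml_text):
    """Parse xml_text with minidom and return an indented serialization."""
    document = minidom.parseString(xml_text)
    return document.toprettyxml(indent="  ")


if __name__ == "__main__":
    print(pretty(SOURCE))
```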

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Mar  5 22:56:22 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Mar 2001 23:56:22 +0100
Subject: [XML-SIG] Missing DOCTYPE when pretty printing
In-Reply-To: <20010305090428.A27565@newcnri.cnri.reston.va.us> (message from
 Andrew Kuchling on Mon, 5 Mar 2001 09:04:28 -0500)
References: <20010305142954.16107.qmail@netfinity2.mailbox.hu> <20010305090428.A27565@newcnri.cnri.reston.va.us>
Message-ID: <200103052256.f25MuMw00895@mira.informatik.hu-berlin.de>

> Likely to be deliberate, because I don't think you can have a DOCTYPE
> without a system ID

Why is that? If the doctype only consists of an internal subset, then
there is nothing wrong with not having a system id, e.g. as in

<!DOCTYPE foo [
 <!ELEMENT foo (bar*)>
 <!ELEMENT bar (#PCDATA)>
]>


> <!DOCTYPE PUBLIC pubId systemId>
> <!DOCTYPE SYSTEM systemId>

I think the name of the root element is required. The syntax of
doctypedecl is

[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? 
            ('[' (markupdecl | PEReference | S)* ']' S?)? '>'

where ExternalID is

[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 
                    'PUBLIC' S PubidLiteral S SystemLiteral

So you can't have a public ID without a system ID; you certainly can
have neither.
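
The two productions can be turned into a small serializer. The sketch
below is mine, not the Printer code: format_doctype and its parameters
are hypothetical names, and quoting is simplified (it assumes the
literals contain no double quotes).

```python
def format_doctype(name, public_id=None, system_id=None,
                   internal_subset=None):
    """Serialize a doctype declaration following productions [28] and [75].

    The name is required; the external ID and the internal subset are
    both optional; a public ID is only allowed together with a system ID.
    """
    if not name:
        raise ValueError("the root element name is required")
    if public_id is not None and system_id is None:
        raise ValueError("a public ID requires a system ID")

    parts = ["<!DOCTYPE " + name]
    if public_id is not None:
        # 'PUBLIC' S PubidLiteral S SystemLiteral
        parts.append(' PUBLIC "%s" "%s"' % (public_id, system_id))
    elif system_id is not None:
        # 'SYSTEM' S SystemLiteral
        parts.append(' SYSTEM "%s"' % system_id)
    if internal_subset is not None:
        parts.append(" [" + internal_subset + "]")
    parts.append(">")
    return "".join(parts)
```

Note that a doctype with neither identifier is still written out, which
is the case the 0.6.2 printer skipped.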

A related question: Is it well-formed to have neither system id nor
internal subset, i.e.

<!DOCTYPE foo>

If well-formed, can that ever appear in a valid document?

Regards,
Martin


From jeremy.kloth@fourthought.com  Tue Mar  6 00:01:23 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Mon, 05 Mar 2001 17:01:23 -0700
Subject: [XML-SIG] Missing DOCTYPE when pretty printing
References: <20010305142954.16107.qmail@netfinity2.mailbox.hu> <20010305090428.A27565@newcnri.cnri.reston.va.us> <200103052256.f25MuMw00895@mira.informatik.hu-berlin.de>
Message-ID: <3AA428D3.CDEA60D9@fourthought.com>

"Martin v. Loewis" wrote:
> 
> > Likely to be deliberate, because I don't think you can have a DOCTYPE
> > without a system ID
> 
> Why is that? If the doctype only consists of an internal subset, then
> there is nothing wrong with not having a system id, e.g. as in
> 
> <!DOCTYPE foo [
>  <!ELEMENT foo (bar*)>
>  <!ELEMENT bar (#PCDATA)>
> ]>
> 
> > <!DOCTYPE PUBLIC pubId systemId>
> > <!DOCTYPE SYSTEM systemId>
> 
> I think the name of the root element is required. The syntax of
> doctypedecl is
> 
> [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
>             ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
> 
> where ExternalID is
> 
> [75] ExternalID ::= 'SYSTEM' S SystemLiteral |
>                     'PUBLIC' S PubidLiteral S SystemLiteral
> 
> So you can't have a public ID without a system ID; you certainly can
> have neither.
> 
> A related question: Is it well-formed to have neither system id nor
> internal subset, i.e.
> 
> <!DOCTYPE foo>
> 
> If well-formed, can that ever appear in a valid document?

According to the doctypedecl, the answer would be yes.  The ExternalID
is optional and so is the internal subset.  Both are followed by a
question mark.

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Tue Mar  6 06:48:17 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 6 Mar 2001 07:48:17 +0100
Subject: [XML-SIG] Missing DOCTYPE when pretty printing
In-Reply-To: <3AA428D3.CDEA60D9@fourthought.com> (message from Jeremy Kloth on
 Mon, 05 Mar 2001 17:01:23 -0700)
References: <20010305142954.16107.qmail@netfinity2.mailbox.hu> <20010305090428.A27565@newcnri.cnri.reston.va.us> <200103052256.f25MuMw00895@mira.informatik.hu-berlin.de> <3AA428D3.CDEA60D9@fourthought.com>
Message-ID: <200103060648.f266mHD00831@mira.informatik.hu-berlin.de>

> According to the doctypedecl, the answer would be yes.  The ExternalID
> is optional and so is the internal subset.  Both are followed by a
> question mark.

So it would be well-formed, yes. However, it would seem that this
gives a document type with no element definitions. In turn, any
document using that doctype will be invalid - even the root element is
undefined.

Regards,
Martin


From mda@discerning.com  Tue Mar  6 22:47:49 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Tue, 6 Mar 2001 14:47:49 -0800
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
Message-ID: <027601c0a68f$7abee980$9200a8c0@mdaxke>

this morning i decided to try out python for an xml hack, rather than my tried-and-true perl.
well, that was this morning, and now it is the afternoon....
i do have tmproc working now (my first goal), but it was heavy going there because
of the lack of road signs for the new person (new to python and its xml tools, but
experienced with xml and other programming languages).

so this is me just letting off steam (i know there is equal or greater chaos in the
current state of perl xml tools, but i already know that chaos).

in particular, what is the relationship between:
- the saxlib available from http://www.garshol.priv.no/download/software/saxlib/
- the xml core package that comes with python 2.x
- the _xmlplus package that comes with the pyxml package from the xml-sig at sourceforge

i can't find any explanation accessible from various top-level pages:
   http://pyxml.sourceforge.net/topics/
   http://www.python.org/sigs/xml-sig/
   http://www.python.org/sigs/xml-sig/status.html
   http://www.python.org/doc/howto/xml/ .
nor do any of the three packages above seem to have any obvious mention of the other two.
nor can i find an "xml and python faq", though surely this issue is an example of such a faq.
another would be: "will old python programs written against sax1 work with the latest pyxml?"

i did find a long, confusing, and inconclusive email thread several months ago on python-dev
http://mail.python.org/pipermail/python-dev/2000-September/009369.html

i've also looked at the ugly hack in xml/__init__.py for loading _xmlplus, though i still don't
know what the difference is between the packages.

btw, http://mail.python.org/mailman/listinfo/xml-sig
has dead links to http://mail.python.org/sigs/xml-sig/status.html
and http://mail.python.org/sigs/xml-sig/links.html

btw also, is it expected that the pyxml win32 installer for 2.0 not work with the python 2.1 beta?
when i ran the installer, it didn't even find the 2.1 installation.
if binary packages are obsoleted by dot revisions in the core, it is going to be painful for everyone.

btw again, another faq should be how urllib deals with win32 drive letters.
it barfs on things like "c:/tmp/myfile.xml" which is inconvenient but understandable, because there is no such thing
as a "c" scheme. using the "|" convention works: "c|/tmp/myfile.xml".
it works with "file:c:/tmp/myfile.xml" and "file:c|/tmp/myfile.xml".
the strings file:///c|/tmp/myfile.xml and file://c|/tmp/myfile.xml fail but file:/c|/tmp/myfile.xml works.
AFAIK this all differs slightly from java and from rfc1738.

-mda


From martin@loewis.home.cs.tu-berlin.de  Wed Mar  7 07:16:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 7 Mar 2001 08:16:39 +0100
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
In-Reply-To: <027601c0a68f$7abee980$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke>
Message-ID: <200103070716.f277Gds01961@mira.informatik.hu-berlin.de>

> i do have tmproc working now (my first goal), but it was heavy going
> there because of the lack of road signs for the new person (new to
> python and its xml tools, but experienced with xml and other
> programming languages).

Sorry for the confusion. Please notice that you are a "rare case";
most people complaining about bad documentation are familiar with
Python but new to XML, so they need to understand terms like "parser",
"event-driven", "tree-based", etc.

> in particular, what is the relationship between:
> - the saxlib available from http://www.garshol.priv.no/download/software/saxlib/
> - the xml core package that comes with python 2.x
> - the _xmlplus package that comes with the pyxml package from the xml-sig at sourceforge

As you can see from the "last release" date on the saxlib page, this
package is quite outdated. It has been incorporated in PyXML in the
past, and is known today as "Python SAX version 1". Today, the
preferred SAX API is SAX2, which is included in Python 2 and PyXML
(PyXML continues to provide the SAX1 interfaces).

In addition to the API spec, there are a number of SAX drivers in each
package. The saxlib has the SAX1 drivers, Python 2 only has an Expat
SAX2 driver, and PyXML has SAX1 and SAX2 drivers (in the latter
category, only Expat and xmlproc).
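
To give an idea of what the SAX2 API looks like from the application
side, here is a minimal sketch. It assumes a current Python, where
xml.sax is part of the standard library and uses the Expat driver by
default; ElementCounter and count_elements are made-up names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """SAX2 content handler that counts start tags per element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(data):
    """Parse `data` and return a dict mapping element names to counts."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.counts
```

For example, count_elements(b"<a><b/><b/></a>") reports one "a" and two
"b" elements.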

PyXML is meant as a strict superset of the Python 2 XML offerings; in
all aspects that are present in Python 2, PyXML should behave
identically (as far as possible and reasonable).

> i can't find any explanation accessible from various top-level pages:
>    http://pyxml.sourceforge.net/topics/
>    http://www.python.org/sigs/xml-sig/
>    http://www.python.org/sigs/xml-sig/status.html
>    http://www.python.org/doc/howto/xml/ .

> nor do any of the three packages above seem to have any obvious
> mention of the other two.

In the README of PyXML itself, you'll notice that saxlib 1.0 is
included. The relationship with Python 2 should be documented better;
thanks for pointing that out.

> nor can i find an "xml and python faq", though surely this issue is
> an example of such a faq.

So far, people have been using the tutorial, and API documentation. I
couldn't say that any specific question is asked frequently - this is
the first time that your question comes up on this list.

> another would be: "will old python programs written against sax1
> work with the latest pyxml?"

Yes; people find out by trying. There is at least one minor
incompatibility: In Python 2, SAX drivers may produce Unicode strings,
which old applications may not expect.

> i've also looked at the ugly hack in xml/__init__.py for loading
> _xmlplus, though i still don't know what the difference is between
> the packages.

That hack is needed to provide the "strict superset" relationship
between PyXML and Python 2. It allows you to think of PyXML in terms
of "from xml.sax import ...", instead of "from _xmlplus.sax import
...". If PyXML is installed on top of Python 1.5.2, it will call its
package directory "xml".
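
The general technique can be sketched as follows. This is an
illustration of the idea only, in current Python, and not the actual
contents of xml/__init__.py, which may do more than this. The names
prefer_override, demo_xml and _demo_xmlplus are hypothetical.

```python
import importlib
import sys
import types


def prefer_override(public_name, override_name):
    """Make `import public_name` return the override package if it exists.

    Returns True when the override was installed, False when the
    override package could not be imported.
    """
    try:
        override = importlib.import_module(override_name)
    except ImportError:
        return False
    # Later imports of public_name find this entry in sys.modules first.
    sys.modules[public_name] = override
    return True


# Stand-in for an installed add-on package, registered by hand so the
# sketch is self-contained.
_fake = types.ModuleType("_demo_xmlplus")
_fake.flavour = "extended"
sys.modules["_demo_xmlplus"] = _fake
```

After prefer_override("demo_xml", "_demo_xmlplus"), code that imports
demo_xml gets the add-on module without knowing its real name.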

> btw also, is it expected that the pyxml win32 installer for 2.0 not
> work with the python 2.1 beta?

Yes, binary modules will need recompilation - the extension modules
contain references to "python20.dll", and all hell breaks loose if you load
conflicting python<foo>.dll into the same process (and try to access
them from the same interpreter).

> when i ran the installer, it didn't even find the 2.1 installation.

That is intentional, yes. To use PyXML with Python 2.1b1, you'll need
to compile it yourself from sources; that requires a VC++
installation.

> if binary packages are obsoleted by dot revisions in the core, it is
> going to be painful for everyone.

Unfortunately, that is a specific form of "DLL hell"; there is not
much that can be done about it except guaranteeing that conflicting
things are not used together - the installer refusing to install the
package anywhere else is one aspect of that.

> btw again, another faq should be how urllib deals with win32 drive letters.
>
> it barfs on things like "c:/tmp/myfile.xml" which is inconvenient
> but understandable, because there is no such thing

Likely, there should be, yes - but there appears to be no expert that
can say for sure what the "right way" is. In any case, you'll need to
pass URLs to urllib, and as system identifiers to XML libraries. On
Unix, passing file names should "work" in most cases; on Windows,
things are a bit more complicated.

If you can give a consistent story of how things *should* work, I'll
start a FAQ list (since your message is the third instance of this
question during this year - which makes it frequent :-). Out of
curiosity: how do you interpret RFC 1738 with regard to drive letters?
I.e. what is the URL referring to C:\autoexec.bat?

Regards,
Martin


From nobody@usw-sf-web2.sourceforge.net  Wed Mar  7 17:15:55 2001
From: nobody@usw-sf-web2.sourceforge.net (nobody)
Date: Wed, 07 Mar 2001 09:15:55 -0800
Subject: [XML-SIG] [ pyxml-Patches-406732 ] --with-libexpat and --ldflags options
Message-ID: <E14ahXX-0008Vf-00@usw-sf-web2.sourceforge.net>

Patches #406732, was updated on 2001-03-07 09:15
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=406732&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: The Written Word (china)
Assigned to: Nobody/Anonymous
Summary: --with-libexpat and --ldflags options

Initial Comment:
Patch to add command-line arguments
--with-libexpat=PATH to specify location of libexpat
include/lib files and --ldflags=STR to add arbitrary
linker flags to build the resulting object file (on
some systems, need to set runtime path to the libexpat
shared library).

ftp://ftp.thewrittenword.com/outgoing/pub/PyXML-0.6.4.patch

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=406732&group_id=6473


From mda@discerning.com  Wed Mar  7 17:53:01 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Wed, 7 Mar 2001 09:53:01 -0800
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de>
Message-ID: <001101c0a72f$761f2cf0$9200a8c0@mdaxke>

> Sorry for the confusion. Please notice that you are a "rare case";

i've heard that before :).

> As you can see from the "last release" date on the saxlib page, this
> package is quite outdated. It has been incorporated in PyXML in the
> past, and is known today as "Python SAX version 1".

it'd be nice if lars updated his page to note this. though it is old, there are still
quite a few links pointing to his page, not to pyxml or python 2.

> In addition to the API spec, there are a number of SAX drivers in each
> package. The saxlib has the SAX1 drivers, Python 2 only has an Expat
> SAX2 driver, and PyXML has SAX1 and SAX2 drivers (in the latter
> category, only Expat and xmlproc).

another useful faq somewhere would be about expat. this is actually
a PITA for the perl world too right now -- apache links in expat
(optional, but not if dav is linked in), and then XML::Parser pulls in another expat,
and probably both are different from the latest one, things start crashing.
they all used to be statically linked but can now use a separate package direct from
the sourceforge expat project. (I'm one of the unfortunate few who actually
understands all this, and i'm the first to admit i haven't done my part in writing it up).

so for python, suppose you wanted to upgrade to the latest sourceforge expat.
is that possible? is the expat dll relied upon by any core python modules?
does pyexpat make any changes relative to the SF expat distribution?
what happens if you want to use mod_py and an apache with expat linked in?
or worse yet, suppose you wanted to link mod_py and mod_perl and mod_dav into apache?

> PyXML is meant as a strict superset of the Python 2 XML offerings; in
> all aspects that are present in Python 2, PyXML should behave
> identically (as far as possible and reasonable).

is this situation going to remain indefinitely?
does this mean that any other "foo" sig who produces something part of python
core is going to have to do a similarly ugly python/_fooplus ?

> In the README of PyXML itself, you'll notice that saxlib 1.0 is
> included. 

true, though generally people like to know what is in a package before
they download it...

> > if binary packages are obsoleted by dot revisions in the core, it is
> > going to be painful for everyone.
> 
> Unfortunately, that is a specific form of "DLL hell"; there is not
> much that can be done about it except guaranteeing that conflicting
> things are not used together - the installer refusing to install the
> package anywhere else is one aspect of that.

well, 2.1 doesn't *have* to call its dll python21.dll
after all, why do we all have msvc42.dll on our windows boxes?
that is just one choice about how to force incompatibility.
obviously someone chose to make all the 2.1 betas and alphas share a dll name.

what win32 perl is currently doing is a perl56.dll.
that would seem similar to what python is doing today, except that:
1. perl changes its second version digit far less often. that one dll works with all but the earliest
activestate 600 series, spanning well over a year. if python is going to be doing a dot rev every 3 months,
things will be painful.
2. python distutils is not very close yet to the power and convenience of perl's ppm or
"perl -MCPAN -e shell" so upgrading binary packages over the net is harder.
3. perl has a more sophisticated import search facility than python's, which
attempts to pick the highest version of a module which is applicable, for lib
directories structured a certain way, making it possible to have a single lib directory shared
among multiple perls.

> > btw again, another faq should be how urllib deals with win32 drive letters.

i'll start a new thread on this.

-mda


From mda@discerning.com  Wed Mar  7 18:25:56 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Wed, 7 Mar 2001 10:25:56 -0800
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de>
Message-ID: <003b01c0a735$b2b78210$9200a8c0@mdaxke>

(was: "saxlib, xml, _xmlplus, etc.")

Martin v. Loewis says:
> Likely, there should be, yes - but there appears to be no expert that
> can say for sure what the "right way" is. 

true enough. a lot has happened since rfc1738.

>In any case, you'll need to
> pass URLs to urllib, and as system identifiers to XML libraries. On
> Unix, passing file names should "work" in most cases; on Windows,
> things are a bit more complicated.

and unfortunately often the effort to make the unix case "work" makes the
windows case work less often. i've had the same difficulty with various java tools.
they check for a leading slash or a "^\w:" match to determine whether the 
string which is passed in is a uri or a host path.

> If you can give a consistent story of how things *should* work, I'll
> start a FAQ list (since your message is the third instance of this
> question during this year - which makes it frequent :-). Out of
> curiosity: how do you interpret RFC 1738 with regard to drive letters?
> I.e. what is the URL referring to C:\autoexec.bat?

it really is a morass. here are some notes which mostly just serve to clarify how awful it is....

rfc 1738 states:

   A file URL takes the form:
       file://<host>/<path>
   where <host> is the fully qualified domain name of the system on
   which the <path> is accessible, and <path> is a hierarchical
   directory path of the form <directory>/<directory>/.../<name>.
   [...]
   As a special case, <host> can be the string "localhost" or the empty
   string; this is interpreted as `the machine from which the URL is
   being interpreted'.


So this would mean that if localhost is implied, all file urls should have (at least) three slashes.
Assuming that the rfc means that the "/" is purely syntactic, what you should expect to work is:
   file:////etc/passwd        (4 slashes, because of the leading "/")
   file:///c:\autoexec.bat
   file:///\\drv\autoexec.bat
   file://///drv/autoexec.bat       (5 slashes, since forward slashes work on win32 too)

but:
- there is sometimes the convention (not rfc that i know of) of allowing "|" for ":"
- there is sometimes the convention (not rfc that i know of) of allowing file:<path> without the 3 slashes
- most software gives unhelpful errors if someone attempts to specify a host in the file url
- relative urls (i.e. without a scheme; see rfcs 1808 and 2396) complicate matters; in particular
  they indicate that absolute urls are signaled with a leading slash, suggesting "/c:/autoexec.bat",
  which rarely works in any software.
- existing software usually treats the url "/" before the path to be part of the path, using 3 slashes, not 4
  and in particular most url libraries return the leading slash in their path() function, and CGI variables
  like SCRIPT_PATH usually do too. It seems clear though that the original intent was for the path
  to not include the url syntactic separator, and in fact the rfc for NFS urls (rfc2224) makes this explicit
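
Two of these points can be observed with a generic URL splitter. The
sketch below uses urllib.parse.urlsplit from a current Python standard
library, which is an assumption on my part and not the urllib of this
thread; describe is a made-up helper name.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (scheme, host, path) as a generic URL splitter sees them."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    for text in ("file:///c:/autoexec.bat",
                 "file://localhost/etc/passwd",
                 "c:/autoexec.bat"):
        print(text, "->", describe(text))
```

The three-slash form yields the path "/c:/autoexec.bat", leading slash
included, and the bare "c:/autoexec.bat" is read as a URL with scheme
"c".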

personally, what i'd suggest is:
1. RFC-compliant urls must be handled. 
2. Any code which attempts to accept a string which may be either a url or a local path,
should be as flexible on win32 as unix. That is, if the code accepts "/etc/passwd", it should
also accept "c:/autoexec.bat", even though "c:" might be mistaken as a url scheme.
there is zero chance of a single-letter url scheme being standardized, and anyway it actually
isn't ambiguous because win32 paths are never of the form "c://", so the double slash can
distinguish things.
3. When not introducing conflicts with current standards or other platforms, software should
match the defacto behavior of internet explorer when parsing file: urls.
4. URL libraries must at least document what they choose to return as the path for the strings
   file:, http://localhost, file:/, http://localhost/
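
Suggestion 2 could be sketched roughly like this. It is a heuristic
under my own assumptions, not a description of what urllib does: the
function name classify is made up, and anything that is neither a drive
path nor a scheme-prefixed string is simply treated as a path.

```python
import re

# A single letter plus ":" is taken as a drive, unless "//" follows,
# since win32 paths are never of the form "c://".
_DRIVE = re.compile(r"^[A-Za-z]:(?!//)")
# Generic URL scheme: a letter, then letters, digits, "+", "." or "-".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def classify(text):
    """Guess whether `text` is a local "path" or a "url"."""
    if _DRIVE.match(text):
        return "path"
    if text.startswith(("/", "\\")):
        # Unix absolute paths and win32 UNC or root-relative paths.
        return "path"
    if _SCHEME.match(text):
        return "url"
    return "path"
```

With this rule "c:/autoexec.bat" and "/etc/passwd" are both paths,
"file:///c:/autoexec.bat" is a url, and a hypothetical single-letter
scheme written as "c://host/x" is still recognised as a url.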

Today, python urllib is not doing any of these, rejecting file:///c:\autoexec.bat and c:/autoexec.bat

-mda


From martin@loewis.home.cs.tu-berlin.de  Wed Mar  7 21:47:58 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 7 Mar 2001 22:47:58 +0100
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
In-Reply-To: <001101c0a72f$761f2cf0$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <001101c0a72f$761f2cf0$9200a8c0@mdaxke>
Message-ID: <200103072147.f27Llwm01178@mira.informatik.hu-berlin.de>

> another useful faq somewhere would be about expat. this is actually
> a PITA for the perl world too right now -- apache links in expat
> (optional, but not if dav is linked in), and then XML::Parser pulls
> in another expat, and probably both are different from the latest
> one, things start crashing.  they all used to be statically linked
> but can now use a separate package direct from the sourceforge expat
> project. (I'm one of the unfortunate few who actually understands
> all this, and i'm the first to admit i haven't done my part in
> writing it up).

In Python, on Unix, multiple and different copies of expat are a
problem only if you have one statically linked into Python; PyXML will
then refuse to install. If multiple extension modules link expat,
those are opened with RTLD_LOCAL, so they won't interfere with each
other. Problems will occur once people decide that building expat as a
shared library is a good idea; at a minimum, you need different
sonames for them.

Don't know what the status on Windows is - expat *is* typically a DLL
there, so that is a bit more tricky; the expat maintainers better
start to put a version number into the DLL name.

> so for python, suppose you wanted to upgrade to the latest
> sourceforge expat.  is that possible? is the expat dll relied upon
> by any core python modules?

The pyexpat module (pyexpat.pyd) shipped with BeOpen Python 2.0 relies
on the expat DLLs (multiple!). The pyexpat.pyd shipped with the PyXML
binary distribution has expat linked statically, so it won't care
about any expat DLLs. If PyXML is installed, the pyexpat shipped with
Python won't be used anymore (unless you explicitly request it - it is
not overwritten).

> does pyexpat make any changes relative to the SF expat distribution?

Not sure what the question means. pyexpat.c can now use features of
multiple expat versions (although you can't distinguish 1.1, 1.2, and
1.95.1 programmatically - in the expat CVS, there is a version #define
now). The expat version being used is used unmodified, except perhaps
for the build procedure: in PyXML, we have expat 1.2 incorporated, so
we build it ourselves. There are actually a few changes compared to
expat 1.2, e.g. to not use C++-style comments; the stock version will
work fine as well.
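
For reference, later Python releases expose the version of the expat
library that pyexpat was built against, so the check can be done from
Python code. A short sketch, assuming a Python 3 interpreter with the
standard xml.parsers.expat module; the helper expat_at_least is a
hypothetical name introduced here.

```python
import xml.parsers.expat as expat

# EXPAT_VERSION is a string naming the linked expat release;
# version_info carries the same version as a 3-item tuple.
print(expat.EXPAT_VERSION)
print(expat.version_info)

def expat_at_least(major, minor, micro=0):
    """Return True if the linked expat is at least the given version."""
    return tuple(expat.version_info) >= (major, minor, micro)

print(expat_at_least(1, 95, 1))
```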

> what happens if you want to use mod_py and an apache with expat
> linked in?  or worse yet, suppose you wanted to link mod_py and
> mod_perl and mod_dav into apache?

On Unix, nothing bad will happen. On Windows, it would be best to use
the PyXML build process: link it statically. Or, if building from
sources, use the same expat version to build all of them.

> > PyXML is meant as a strict superset of the Python 2 XML offerings; in
> > all aspects that are present in Python 2, PyXML should behave
> > identical (as far as possible and reasonable).
> 
> is this situation going to remain indefinitely?

I'm not planning for the eternity. For the foreseeable future, yes.

> does this mean that any other "foo" sig who produces something part
> of python core is going to have to do a similarly ugly
> python/_fooplus ?

No. Normally, you cannot replace a module from the standard
library. For the XML package, there was a special exception. It is
only ugly when you look at it; normally, you don't need to be concerned
with it.

> > Unfortunately, that is a specific form of "DLL hell"; there is not
> > much that can be done about it except guaranteeing that conflicting
> > things are not used together - the installer refusing to install the
> > package anywhere else is one aspect of that.
> 
> well, 2.1 doesn't *have* to call its dll python21.dll
> after all, why do we all have msvc42.dll on our windows boxes?

Because Microsoft has frozen the API of msvc42.dll. Actually, this
library is only needed for old applications; new applications link
with msvcp60.dll (or msvcp60d.dll or msvc60u.dll or ...).

You must rename the library if you change the API - even if it is a
change to a function "that nobody uses". Such a change happened for
2.1 - the function that creates frame objects takes two additional
parameters (lists of cell objects).

> obviously someone chose to make all the 2.1 betas and alphas share a
> dll name.

Yes, that might cause binary incompatibilities - if you have built
programs against the betas, you need to rebuild once the final release
happens.

> 1. perl changes its second version digit far less often.

That is simply not true. Over a period of eight years, there were 6
Python releases (I think); only three of them resulting in Win32 DLLs.
python15.dll lasted 2.8 years or so (since January 1998).

> if python is going to be doing a dot rev every 3 months, things will
> be painful.

It won't.

> 2. python distutils is not very close yet to the power and
> convenience of perl's ppm or "perl -MCPAN -e shell" so upgrading
> binary packages over the net is harder.

Distutils is rather the equivalent of Makefile.PL, not of CPAN; I
agree that upgrading is harder.

> 3. perl has a more sophisticated import search facility than python's,
> which attempts to pick the highest version of a module which is
> applicable, for lib directories structured a certain way, making it
> possible to have a single lib directory shared among multiple perls.

Won't this break terribly if the ABI has changed (or even if just a
different set of options was used to build the previous release)? I
always tell perl installations not to look for old versions of the
modules; every time I use CPAN, it essentially reinstalls my entire
system (as far as perl modules go). It might be better, but it isn't
perfect, either...

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed Mar  7 22:15:17 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 7 Mar 2001 23:15:17 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <003b01c0a735$b2b78210$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <003b01c0a735$b2b78210$9200a8c0@mdaxke>
Message-ID: <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de>

> rfc 1738 states:
> 
>    A file URL takes the form:
>        file://<host>/<path>
>    where <host> is the fully qualified domain name of the system on
>    which the <path> is accessible, and <path> is a hierarchical
>    directory path of the form <directory>/<directory>/.../<name>.
>    [...]
>    As a special case, <host> can be the string "localhost" or the empty
>    string; this is interpreted as `the machine from which the URL is
>    being interpreted'.
> 
> 
> So this would mean that if localhost is implied, all file urls should have (at least) three slashes.
> Assuming that the rfc means that the "/" is purely syntactic, what you should expect to work is:
>    file:////etc/passwd        (4 slashes, because of the leading "/")
>    file:///c:\autoexec.bat
>    file:///\\drv\autoexec.bat
>    file://///drv/autoexec.bat       (5 slashes, since forward slashes work on win32 too)

That clearly is not the intention of the RFC. It "essentially" says
that <path> is a slash-separated list of directories, forming a
hierarchy; ie. the intention is that it does not start with a
slash. So /etc/passwd clearly is

file:///etc/passwd

It then gives the example of a VMS file name
DISK$USER:[MY.NOTES]NOTE123456.TXT, saying that it might become (*)
file://vms.host.edu/disk$user/my/notes/note12345.txt. So the intention
clearly is that hierarchy is presented using /. Apparently,
translation between a file name and a <path> is meant to be executed
in a system-dependent manner, but many systems failed to define a
procedure for doing so. Considering that one needs to distinguish the
drv case, the logical form would be

file://C:/autoexec.bat

Regards,
Martin

(*) The 'might' probably refers to the fact that the URL introduces
vms.host.edu, which was not mentioned before.


From mclay@nist.gov  Wed Mar  7 22:44:11 2001
From: mclay@nist.gov (Michael McLay)
Date: Wed, 7 Mar 2001 17:44:11 -0500
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de>
Message-ID: <0103071744111X.28858@fermi.eeel.nist.gov>

On Wednesday 07 March 2001 17:15, Martin v. Loewis wrote:
> > rfc 1738 states:
>
> translation between a file name and a <path> is meant to be executed
> in a system-dependent manner, but many systems failed to define a
> procedure for doing so. Considering that one needs to distinguish the
> drv case, the logical form would be
>
> file://C:/autoexec.bat

This mapping skips a slash for the hostname. I'm using a commercial tool, XML 
Authority, that is written in Java.  It maps the local file:

	C:/windows/command.com

to:
	file:///C:/windows/command.com
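
The "skipped slash" can be made visible with a generic URL splitter. The
sketch below assumes Python 3's urllib.parse (not the urllib of the time)
and only prints how each URL is split into host and path. With two
slashes the drive letter lands in the host position; with three slashes
the host is empty and the drive letter is part of the path.

```python
from urllib.parse import urlsplit

urls = (
    "file://C:/autoexec.bat",
    "file:///C:/windows/command.com",
    "file://vms.host.edu/disk$user/my/notes/note12345.txt",
)

for url in urls:
    parts = urlsplit(url)
    # netloc is the <host> part; path keeps its leading "/"
    print("%s\n    host=%r path=%r" % (url, parts.netloc, parts.path))
```

Note that the path returned for the three-slash form is
"/C:/windows/command.com", leading slash included.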

This looks consistent with the example mapping of a VMS logical drive in RFC 
1738:

  For example, a VMS file

     DISK$USER:[MY.NOTES]NOTE123456.TXT

   might become

     <URL:file://vms.host.edu/disk$user/my/notes/note12345.txt>


From mda@discerning.com  Thu Mar  8 00:31:31 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Wed, 7 Mar 2001 16:31:31 -0800
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <001101c0a72f$761f2cf0$9200a8c0@mdaxke> <200103072147.f27Llwm01178@mira.informatik.hu-berlin.de>
Message-ID: <00f501c0a767$1fd75650$9200a8c0@mdaxke>

> You must rename the library if you change the API - even if it is a
> change to a function "that nobody uses". Such a change happened for
> 2.1 - the function that creates frame objects takes two additional
> parameters (lists of cell objects).

true enough. i didn't know that changing function signatures was "allowed"
in dot revs of python (versus only adding functions, possibly with "Ext" or "2" at the end...).

> > 1. perl changes its second version digit far less often.
> 
> That is simply not true. Over a period of eight years, there were 6
> Python releases (I think); only three of them resulting in Win32 DLLs.
> python15.dll lasted 2.8 years or so (since January 1998).

i meant perl changing the version digit in the dll in such a way as to invalidate existing
binary modules. my perl56.dll has worked with binary modules built with other
perls, and i have upgraded my perl56.dll repeatedly with different activestate releases.

in retrospect, i'm not actually sure there is much difference here between python
and perl in binary compatibility. it is just that python is bringing out 2.1 shortly after 2.0,
while the perl world has been effectively frozen for a year or so while the powers that be
contemplate perl6.

> > 3. perl has a more sophisticated import search facility than python's,
> > which attempts to pick the highest version of a module which is
> > applicable, for lib directories structured a certain way, making it
> > possible to have a single lib directory shared among multiple perls.
> 
> Won't this break terribly if the ABI has changed (or even if just a
> different set of options was used to build the previous release)? I
> always tell perl installations not to look for old versions of the
> modules; every time I use CPAN, it essentially reinstalls my entire
> system (as far as perl modules go). It might be better, but it isn't
> perfect, either...

on unix, perl embeds the perl version in the site_perl hierarchy, so that 
multiple perl installations can share that same hierarchy. then the import search
path is initialized appropriately in any perl using that site_perl so it only "sees"
the site_perl branches that match its version.

windows perl doesn't do that, and it should. 
historically activestate has horked up the install directory structure in various
ways deviating from the unix one; it seems to have gotten more similar over
the years. 

-mda


From tpassin@home.com  Thu Mar  8 00:48:53 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 7 Mar 2001 19:48:53 -0500
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <003b01c0a735$b2b78210$9200a8c0@mdaxke>
Message-ID: <008a01c0a769$8d6cdb20$7cac1218@reston1.va.home.com>

Mark D. Anderson writes about file: urls.

Mark, here is a copy of a message I posted last month on this tricky subject.
I've been hoping to get some agreement on the usage so we can start building
it in.  I'm glad you brought it up.

Cheers,

Tom P


==================================================
This file: business is trickier than it seems, because the RFC is ambiguous
for file: urls.  A pipe character isn't in the rfc at all even though it's
used by some of the browsers.

I strongly suggest that when a local file is intended, that one should use the
file: scheme.  That way, the application doesn't have to guess and it won't
try a spurious url if the file isn't found.  The way it's done in this example
is just asking for continuous trouble, as I guess we're seeing now.

I think we should come to an agreement with the maintainer of the urllib about
the allowed forms for file: schemes.  It's mainly on Windows (and, perhaps,
Macs) that there would be a problem.  My preferred forms are these, for a file
at d:\temp\python\thefile.xml -

1) file:///d:/temp/python/thefile.xml

2) file:///d:\temp\python\thefile.xml

Both of these comply fully with the rfc.  2) is an "opaque" form - no further
parsing would be done by the url processor, it would just pass it to the os.
1) is what you get according to the rfc when you want the url processor to be
able to parse out the path parts.  The processor is supposed to know to
replace slashes by backslashes if appropriate for the os.

Either 1) or 2) would also work for files on a network file system, if you put
the host name in there -

file://host/temp/python/thefile.xml

1) would be more portable, and is my preference.  The processor should be able
to handle both, however.  For backwards compatibility, form 3) should also be
accepted, I suppose:

3) file:d:\temp\python\thefile.xml

This could be negotiated, though.

Let's agree on this and get it working right!
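
The three forms could be accepted by a small converter along these lines.
This is a sketch under stated assumptions, not urllib's behavior: the
function name file_url_to_windows_path is hypothetical, percent-escapes
are not decoded, and URLs that name a remote host are rejected rather
than mapped to a network path.

```python
def file_url_to_windows_path(url):
    """Map forms 1), 2) and 3) above to a backslash-separated path."""
    if not url.lower().startswith("file:"):
        raise ValueError("not a file: URL: %r" % (url,))
    rest = url[len("file:"):]
    if rest.startswith("///"):
        rest = rest[3:]            # empty host: forms 1) and 2)
    elif rest.startswith("//"):
        raise ValueError("remote host not handled in this sketch: %r" % (url,))
    # form 3) has no slashes after the colon and falls through unchanged
    return rest.replace("/", "\\")

for u in ("file:///d:/temp/python/thefile.xml",
          "file:///d:\\temp\\python\\thefile.xml",
          "file:d:\\temp\\python\\thefile.xml"):
    print(file_url_to_windows_path(u))
```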


From mda@discerning.com  Thu Mar  8 00:47:13 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Wed, 7 Mar 2001 16:47:13 -0800
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov>
Message-ID: <010901c0a769$50e0d350$9200a8c0@mdaxke>

> > file://C:/autoexec.bat
> 
> This mapping skips a slash for the hostname. I'm using a commercial tool, XML 
> Authority, that is written in Java.  It maps the local file:
> C:/windows/command.com
> to:
> file:///C:/windows/command.com
> This looks consistent with the example mapping of a VMS logical drive in RFC 
> 1738:
>...

exactly. if file:///C:/autoexec.bat is correct then file:////etc/passwd should be, regardless
of current practice. as i mentioned, this is clarified in the direction of the slash being
a syntactic separator only in the nfs url rfc2224.

however, i do realize current practice is for the 3-slash version for unix-style paths.

-mda


From tpassin@home.com  Thu Mar  8 01:37:46 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 7 Mar 2001 20:37:46 -0500
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke>
Message-ID: <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com>

Mark D. Anderson wrote -

> > > file://C:/autoexec.bat
> >
> > This mapping skips a slash for the hostname. I'm using a commercial tool,
> > XML Authority, that is written in Java.  It maps the local file:
> > C:/windows/command.com
> > to:
> > file:///C:/windows/command.com
> > This looks consistent with the example mapping of a VMS logical drive in
> > RFC 1738:
> >...
>
> exactly. if file:///C:/autoexec.bat is correct then file:////etc/passwd
> should be, regardless of current practice. as i mentioned, this is
> clarified in the direction of the slash being a syntactic separator only
> in the nfs url rfc2224.
>
> however, i do realize current practice is for the 3-slash version for
> unix-style paths.
>

The triple slash really comes from an abbreviation.  The basic form is

scheme://host/path-on-host

For the file: scheme, the host is supposed to be localhost (for your own
machine), or the name of a network host if you want to refer to a file on a
network file system.  You are allowed to replace "localhost" by an empty
string, so you have either

file://localhost/path-on-local-machine

or

file:///path-on-local-machine

So far, so good.  The problem comes in when you ask what is the path for
windows?  You could use an opaque path, wherein the entire path is not to be
parsed by the url handler.  This should give you file:urls like this, which is
completely compatible with both the old and the new rfc:

1) file:///c:\temp\file_url.txt

Or you could use the parsable form, which uses forward slashes, which is also
compatible with the rfcs:

2) file:///c:/temp/file_url.txt

The rfcs don't allow a form with no slashes after the scheme's colon.  But
it's common enough that  it might be worthwhile to support it anyway.

Double or quadruple slashes should be disallowed.  To see this, just imagine
that you restore the "localhost" host name, or some other network host name.
It just doesn't work unless you have three slashes.
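
The "restore the host name" test can be written down directly. The helper
below is a hypothetical illustration (the name restore_localhost is an
assumption): it puts "localhost" back wherever the host was left empty,
so the effect of each slash count can be inspected.

```python
def restore_localhost(url):
    """Put "localhost" back into a file: URL whose host part is empty."""
    prefix = "file://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with file://: %r" % (url,))
    rest = url[len(prefix):]
    if rest.startswith("/"):
        return prefix + "localhost" + rest    # host was the empty string
    return url                                # something already sits in the host slot

for u in ("file:///c:/temp/file_url.txt",     # three slashes
          "file:////etc/passwd",              # four slashes
          "file://c:/temp/file_url.txt"):     # two slashes
    print(u, "->", restore_localhost(u))
```

With three slashes the result is file://localhost/c:/temp/file_url.txt.
With four, the path after the host starts with a doubled slash, and with
two, "c:" is already occupying the host position.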

My recommendation is to allow both 1) and 2), and also possibly (needs more
discussion) to allow the form

file:c:\temp\file_url.txt

Cheers,

Tom P


From mda@discerning.com  Wed Mar  7 12:22:33 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Wed, 7 Mar 2001 04:22:33 -0800
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com>
Message-ID: <012301c0a701$4aa61c60$9200a8c0@mdaxke>

i'm definitely getting academic here, because i think the appropriate handling for windows file:
urls is fairly clear, and they are not handled properly by urllib, while the handling by urllib of unix-style
paths, while not what i consider "right thing", is what everyone else does.

but....

suppose we agree that file:///c:/autoexec.bat should work (this is the case of a collapsed localhost).
then the processing model is that if a url starts with file:/// then remove that prefix, and consider
the remainder (because /c:/autoexec.bat is not a proper local file).
ok, now do that to file:///etc/passwd and you get etc/passwd.
so that means a parser has to look at c:/autoexec.bat and etc/passwd and conclude that because
the first segment looks like a drive letter, it is ok, while etc/passwd needs a leading slash.
if the host slash separator were treated as purely a separator, then this heuristic would not be necessary.
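
The two processing models can be put side by side. Both functions are
hypothetical sketches (the names and the drive-letter pattern are
assumptions): the first needs the drive-letter heuristic described above,
the second treats the slash after the host as a pure separator and needs
no heuristic, at the cost of requiring four slashes for unix paths.

```python
import re

_PREFIX = "file:///"
_DRIVE = re.compile(r"^[A-Za-z]:")     # first segment looks like "c:"

def _strip_prefix(url):
    if not url.startswith(_PREFIX):
        raise ValueError("expected a file:/// url: %r" % (url,))
    return url[len(_PREFIX):]

def path_with_heuristic(url):
    """3-slash convention: guess from the first segment whether to add "/"."""
    rest = _strip_prefix(url)
    if _DRIVE.match(rest):
        return rest                    # c:/autoexec.bat is fine as it stands
    return "/" + rest                  # etc/passwd needs its leading slash back

def path_purely_lexical(url):
    """Separator-only reading: whatever follows file:/// is the path."""
    return _strip_prefix(url)

print(path_with_heuristic("file:///c:/autoexec.bat"))
print(path_with_heuristic("file:///etc/passwd"))
print(path_purely_lexical("file:////etc/passwd"))
```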

i think it is fair to say that rfc1738 is ambiguous since they only give a VMS example.
but nfs urls are defined clearly to match my "cleaner" notion of purely lexical url processing,
as per http://www.faqs.org/rfcs/rfc2224.html :
   Note that the initial "/" that introduces the <url-path> of an NFS
   URL must not be passed to the server for multi-component lookup since
   the pathname is to be evaluated relative to the public filehandle
   directory.  For example, if the public filehandle is associated with
   the server's directory "/a/b/c" then the URL:
        nfs://server/d/e/f
   will be evaluated with a multi-component lookup of the path
   "d/e/f" relative to the server's directory "/a/b/c" while
   the URL:
        nfs://server//a/b/c/d/e/f
   will locate the same file with an absolute multi-component lookup of
   the path "/a/b/c/d/e/f" relative to the server's filesystem root.
   Notice that a double slash is required at the beginning of the path.

but wait, it gets worse.

we'd like certain functions to "just work" and handle either
a url or a local host path -- this is certainly what we'd like when we specify an
xml source on a command line. 
if so, then we'd also like to sometimes specify *relative urls* in some of those same
circumstances. and guess what? relative urls have no leading scheme and therefore
are lexically indistinguishable from some local host paths.
so in that case, if a processor sees etc/passwd, it should *not* add a leading slash,
since it is relative to either current working directory or the current url base, whichever
you like, and should instead look at /usr/etc/passwd or whatever.

so if we'd like to follow the non-rfc convention that a file:foobar url is allowed,
without the net_loc part of the url, then we should say that file:etc/passwd is
a relative url while file:/etc/passwd is absolute.
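
The relative/absolute distinction is the ordinary rule for resolving a
reference against a base url. A small sketch, assuming Python 3's
urllib.parse.urljoin and a made-up base of file:///usr/ standing in for
the "current url base":

```python
from urllib.parse import urljoin

base = "file:///usr/"      # hypothetical current url base

relative = urljoin(base, "etc/passwd")     # no leading slash: resolved under base
absolute = urljoin(base, "/etc/passwd")    # leading slash: replaces the base path

print(relative)    # file:///usr/etc/passwd
print(absolute)    # file:///etc/passwd

# The scheme-only spelling proposed above; how it is resolved depends on
# how lenient the url library chooses to be, so no output is claimed here.
print(urljoin(base, "file:etc/passwd"))
```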

regardless, i think the policy should be independent of the operating system of the
server. that is, the url file:///c:/autoexec.bat should look for the file c:/autoexec.bat
on unix systems as well. It should be a purely lexical operation.
this is incidentally one of the annoying features of the rfc for imap urls, where in
their infinite wisdom they did not designate a standardized hierarchy separator, nor
even a url parameter to indicate one -- it is entirely up to the server to interpret.
this means no url processing library can do anything with an imap url by itself.

-mda


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  8 07:04:42 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Mar 2001 08:04:42 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <010901c0a769$50e0d350$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke>
Message-ID: <200103080704.f2874g501415@mira.informatik.hu-berlin.de>

> > > file://C:/autoexec.bat
> > 
> > This mapping skips a slash for the hostname. 

Oops, yes, it should be file:///C:/autoexec.bat

> exactly. if file:///C:/autoexec.bat is correct then
> file:////etc/passwd should be

No. The <path> is built as a sequence of directories, with a slash
between each directory, and a slash between the host and the first
hierarchy component (C: in the windows case). On Unix, the first
hierarchy component is etc, so it should use only three slashes.
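
Read as a construction rule, that is: join the hierarchy components with
slashes and put one slash between the host and the first component. A
minimal sketch follows; the function name build_file_url is hypothetical
and no percent-encoding is attempted.

```python
def build_file_url(host, components):
    """Build file://<host>/<c1>/<c2>/... from a list of hierarchy components."""
    return "file://" + host + "/" + "/".join(components)

# Unix: the first hierarchy component is "etc"
print(build_file_url("", ["etc", "passwd"]))
# Windows: the first hierarchy component is the drive, "C:"
print(build_file_url("", ["C:", "autoexec.bat"]))
# The VMS example from RFC 1738, with an explicit host
print(build_file_url("vms.host.edu", ["disk$user", "my", "notes", "note12345.txt"]))
```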

> as i mentioned, this is clarified in the direction of the slash
> being a syntactic separator only in the nfs url rfc2224.

Looking at rfc2224, I can find no such clarification. It mentions that
the first slash is a syntactic separator in the nfs url; how does that
affect the file url?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  8 06:55:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Mar 2001 07:55:31 +0100
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
In-Reply-To: <00f501c0a767$1fd75650$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <001101c0a72f$761f2cf0$9200a8c0@mdaxke> <200103072147.f27Llwm01178@mira.informatik.hu-berlin.de> <00f501c0a767$1fd75650$9200a8c0@mdaxke>
Message-ID: <200103080655.f286tVd01338@mira.informatik.hu-berlin.de>

> true enough. i didn't know that changing function signatures was
> "allowed" in dot revs of python (versus only adding functions,
> possibly with "Ext" or "2" at the end...).

It depends on what a "dot rev" is. Python 2.0.1 would be a pure bugfix
release; 2.1 isn't.

> i meant perl changing the version digit in the dll in such a way as
> to invalidate existing binary modules. my perl56.dll has worked with
> binary modules built with other perls, and i have upgraded my
> perl56.dll repeatedly with different activestate releases.

How does that work? Are these other perls also using perl56.dll? If
they had been using, say, perl55.dll, are the binary modules not
linked with perl55.dll? If they are, how does perl manage to use
perl56.dll and perl55.dll simultaneously?

> in retrospect, i'm not actually sure there is much different here
> between python and perl in binary compatibility. it is just that
> python is bringing out 2.1 shortly after 2.0, while the perl world
> has been effectively frozen for a year or so while the powers that
> be contemplate perl6.

Indeed. Python 2.0 was a major change over Python 1, and a number of
things needed to be fixed/extended/improved/completed, which caused
2.1 to be released only five months after 2.0.

> > > 3. perl has a more sophisticated import search facility than python's,
> > > which attempts to pick the highest version of a module which is
> > > applicable, for lib directories structured a certain way, making it
> > > possible to have a single lib directory shared among multiple perls.
...
> on unix, perl embeds the perl version in the site_perl hierarchy, so
> that multiple perl installations can share that same hierarchy. then
> the import search path is initialized appropriately in any perl
> using that site_perl so it only "sees" the site_perl branches that
> match its version.

So you are saying that different perl versions share the same toplevel
lib directory, but do not share any library files? Why is that a good
thing?

In Python, if you want to share packages between Python installations,
you can put them in <prefix>/lib/site-python (instead of
<prefix>/lib/python<version>/site-packages). That, of course, requires
that the package actually works with all the Python versions
installed. Distutils cannot know for sure, so it installs packages
into site-packages by default.
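
The per-version layout is easy to spell out. The helper below is a
hypothetical illustration of the <prefix>/lib/python<version>/site-packages
pattern on Unix; the second print uses the standard sysconfig module of
current Python releases to show where the running interpreter installs
pure-Python packages. Later Python 3 releases dropped support for the
shared site-python directory.

```python
import posixpath
import sysconfig

def versioned_site_packages(prefix, version):
    """Return <prefix>/lib/python<major>.<minor>/site-packages (Unix layout)."""
    major, minor = version
    return posixpath.join(prefix, "lib",
                          "python%d.%d" % (major, minor), "site-packages")

print(versioned_site_packages("/usr/local", (2, 1)))
print(sysconfig.get_path("purelib"))     # location used by this interpreter
```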

> windows perl doesn't do that, and it should.  historically
> activestate has horked up the install directory structure in various
> ways deviating from the unix one; it seems to have gotten more
> similar over the years.

That is the case with Python also.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  8 07:12:59 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Mar 2001 08:12:59 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <008a01c0a769$8d6cdb20$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <008a01c0a769$8d6cdb20$7cac1218@reston1.va.home.com>
Message-ID: <200103080712.f287Cx401471@mira.informatik.hu-berlin.de>

> I think we should come to an agreement with the maintainer of the
> urllib about the allowed forms for file: schemes.  It's mainly on
> Windows (and, perhaps, Macs) that there would be a problem.  My
> preferred forms are these, for a file

> at d:\temp\python\thefile.xml -
> 
> 1) file:///d:/temp/python/thefile.xml
> 
> 2) file:///d:\temp\python\thefile.xml

While there certainly appears to be a need to change something, it is
not clear to me how we should come to an agreement. It seems that
there is already agreement on the fact that file URLs have a
system-specific syntax, so we can easily do NT/Win/DOS independently
from Mac, and that independently from Unix.

It also seems that for Unix, it "works" most of the time; focus should
probably be on Windows. Now, since file: works in a system dependent
manner, I'd look to the operating system manufacturer for
guidance. Does MS have any documentation on how file: URLs are
supposed to work? Does their software behave in a consistent way in
that matter? If so, I'd say it is safest to copy what MS does.

I can see the point of your proposal, and I agree it is in the spirit
of the RFC. I'd avoid implementing it until it can be established that
MS software works in the same way.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  8 07:43:35 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Mar 2001 08:43:35 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <012301c0a701$4aa61c60$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com> <012301c0a701$4aa61c60$9200a8c0@mdaxke>
Message-ID: <200103080743.f287hZ001784@mira.informatik.hu-berlin.de>

> suppose we agree that file:///c:/autoexec.bat should work (this is
> the case of a collapsed localhost).  then the processing model is
> that if a url starts with file:/// then remove that prefix, and
> consider the remainder (because /c:/autoexec.bat is not a proper
> local file).

Perhaps. Processing of file: URLs happens in a system-dependent
manner, so it could use one procedure on one system and another
procedure on another.

> ok, now do that to file:///etc/passwd and you get etc/passwd.

Sure. And that <path> denotes the file /etc/passwd, on Unix.

> so that means a parser has to look at c:/autoexec.bat and etc/passwd
> and conclude that because the first segment looks like a drive
> letter, it is ok, while etc/passwd needs a leading slash.

A different parser is used on Windows and Unix, so file:///etc/passwd
could mean different things on Windows and Unix. On Windows, it might
be ill-formed: for an absolute path, you need a drive letter (or else
you need to learn the current drive based on some magic processing
context); or it could mean \\etc\passwd (i.e. etc being the topmost
hierarchy level, if you allow file: URLs to denote UNC names). On
Unix, it clearly means /etc/passwd.
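
System-dependent processing could be organized as one converter per
system. The sketch below is hypothetical throughout (names, the dispatch
table, and the choice to reject a Windows path without a drive letter
rather than read it as a UNC name are all assumptions). Each converter
takes the <path> part of the URL, that is, the text after file:///.

```python
import re

_DRIVE = re.compile(r"^[A-Za-z]:$")

def unix_path(url_path):
    """On Unix the first component hangs directly off the root."""
    return "/" + url_path

def windows_path(url_path):
    """On Windows the first component must be a drive letter (in this sketch)."""
    first, _, rest = url_path.partition("/")
    if not _DRIVE.match(first):
        raise ValueError("no drive letter in %r" % (url_path,))
    return first + "\\" + rest.replace("/", "\\")

CONVERTERS = {"unix": unix_path, "windows": windows_path}

def local_path(url_path, system):
    """Convert the <path> of a file: URL using the rules of the given system."""
    return CONVERTERS[system](url_path)

print(local_path("etc/passwd", "unix"))
print(local_path("c:/autoexec.bat", "windows"))
```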

> i think it is fair to say that rfc1738 is ambiguous since they only
> give a VMS example.  but nfs urls are defined clearly to match my
> "cleaner" notion of purely lexical url processing,

Yes, but that is for the nfs: scheme; it does not tell anything about
the file: scheme.

> as per http://www.faqs.org/rfcs/rfc2396.html :
>    Note that the initial "/" that introduces the <url-path> of an NFS
>    URL must not be passed to the server for multi-component lookup since
>    the pathname is to be evaluated relative to the public filehandle
>    directory.  For example, if the public filehandle is associated with
>    the server's directory "/a/b/c" then the URL:
>         nfs://server/d/e/f
>    will be evaluated with a multi-component lookup of the path
>    "d/e/f" relative to the server's directory

That means something non-obvious: WebNFS (RFC 2054) has the notion of
a "public filehandle", which is an all-null file handle in NFSv2, and a
zero-length file handle in NFSv3; the directory associated with the
public filehandle is a matter of server configuration. So a "relative
path" starts at the directory associated with the public filehandle;
an "absolute path" starts with the directory associated with / on the
server. That does not readily translate to the file: scheme.

> we'd like certain functions to "just work" and handle either a url
> or a local host path -- this is certainly what we'd like when we
> specify an xml source on a command line.

Well, Guido argues that file names and URLs should not be mixed in XML
processing; that there should be separate APIs for putting in file
names and URLs. That is currently not the case, but it probably should
be. Then it is the application's matter to decide whether a string
they have is a file name or an URL.
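
Separate entry points might look like the following. This is a sketch
only: the function names are hypothetical and are not an existing XML
API, and it assumes Python 3's urllib.request. Neither function tries to
guess what kind of string it was given.

```python
import urllib.request

def open_filename(filename):
    """Open a local file name; the string is never examined for a scheme."""
    return open(filename, "rb")

def open_url(url):
    """Open a URL, file: URLs included; the string is never treated as a path."""
    return urllib.request.urlopen(url)
```

An application that accepts a command-line argument would call
open_filename, while an entity resolver working from system identifiers
would call open_url.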

> so in that case, if a processor sees etc/passwd, it should *not* add
> a leading slash, since it is relative to either current working
> directory or the current url base, whichever you like

It should be clear from the context whether a relative thing is a
relative file name or a relative URL; e.g. when it is passed by the
user, it is normally a relative file name; if it is an entity
definition, it is a relative URL.

> It should be a purely lexical operation.

That is clearly not the intention of the RFC; the conversion in the
VMS example shows that knowledge about the local file system is
required to process a file: URL.

Regards,
Martin


From larsga@garshol.priv.no  Thu Mar  8 09:02:46 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 08 Mar 2001 10:02:46 +0100
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
In-Reply-To: <001101c0a72f$761f2cf0$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <001101c0a72f$761f2cf0$9200a8c0@mdaxke>
Message-ID: <m3zoew7hvt.fsf@lambda.garshol.priv.no>

* Mark D. Anderson
| 
| it'd be nice if lars updated his page to note this. though it is
| old, there are still quite a few links pointing to his page, not to
| pyxml or python 2.

I will update my page. I've had it on the todo list for a long time,
but will finally do it now. Thanks for pushing me.

--Lars M.


From nobody@usw-sf-web3.sourceforge.net  Thu Mar  8 13:18:05 2001
From: nobody@usw-sf-web3.sourceforge.net (nobody)
Date: Thu, 08 Mar 2001 05:18:05 -0800
Subject: [XML-SIG] [ pyxml-Bugs-407007 ] Insane amount of memory lost in FromXml
Message-ID: <E14b0Iv-0006Bk-00@usw-sf-web3.sourceforge.net>

Bugs #407007, was updated on 2001-03-08 05:18
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407007&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Luke Kenneth Casson Leighton
Assigned to: Nobody/Anonymous
Summary: Insane amount of memory lost in FromXml

Initial Comment:
calling FromXml uses an INSANE amount of memory.  the
larger the document, the more memory is lost.  here is
a demonstration that uses Cyclops.py (found from
searches on python.org for memory usage).

#!/usr/bin/env python
"""
"""
resdata = """<root >
<Schema name="Schema1" >
</Schema>
</root>
"""

from xml.dom.ext.reader import Sax2
from Cyclops import CycleFinder
def test():

    z = CycleFinder()
    d = Sax2.FromXml(resdata, validate=0, keepAllWs=1)
    z.register(d)
    del d
    z.find_cycles()
    z.show_stats()
    z.show_cycles()
    z.show_cycleobjs()
    z.show_sccs()
    z.show_arcs()
    print "dead root set objects:"
    for rc, cyclic, x in z.get_rootset():
        if rc == 0:
            z.show_obj(x)
    z.find_cycles(1)
    z.show_stats()

if __name__ == '__main__':
    test()
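
One plausible reading of the report, and it is only an assumption since
nothing here diagnoses the cause, is that DOM nodes point at both their
parents and their children, which creates reference cycles. The sketch
below shows the effect with the standard library's xml.dom.minidom as a
stand-in for PyXML's Sax2 reader (Python 3 syntax); minidom's unlink()
clears those links explicitly.

```python
from xml.dom import minidom

doc = minidom.parseString('<root><Schema name="Schema1"/></root>')
root = doc.documentElement
schema = root.firstChild

# Parent and child point at each other, which is a reference cycle.
has_cycle = schema.parentNode is root and root.firstChild is schema

# unlink() clears those links so the nodes can be freed promptly.
doc.unlink()
cycle_broken = schema.parentNode is None and not root.childNodes
```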


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407007&group_id=6473


From rnd@onego.ru  Thu Mar  8 14:37:51 2001
From: rnd@onego.ru (Roman Suzi)
Date: Thu, 8 Mar 2001 17:37:51 +0300 (MSK)
Subject: [XML-SIG] Bug in re found ( expand hangs)
Message-ID: <Pine.LNX.4.30.0103081723510.27723-100000@rnd.onego.ru>

Hello!

It seems I have hit a bug in sre, which could be of interest to you.

Python 2.0 (#1, Oct 16 2000, 18:10:03)
[GCC 2.95.2 19991024 (release)] on linux2

import re
a = "abcdefghijklmnop"
m = re.match("(.)"*15, a)
print m.expand(r"\1")
print m.expand(r"\10")
... this takes too much time (probably forever)

- as I have not submitted any bugs yet, I am not sure
if I did it correctly on
http://sourceforge.net/tracker/? ...

(Could anyone check if my bug report succeeded? I was
submitting it from Lynx and probably missed something).

- I have not found anything like the example above in
the known bugs. If "\10" is not supported, then the correct
behaviour is to return it as is (or return something!), not just
hang.
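
For reference, the re module documents the \g<N> form for writing a
group reference that cannot be confused with following digits. A short
sketch in present-day Python 3 (it says nothing about how Python 2.0
behaved), reusing the string and pattern from the report:

```python
import re

a = "abcdefghijklmnop"
m = re.match("(.)" * 15, a)

tenth_group = m.expand(r"\g<10>")       # unambiguous reference to group 10
first_then_zero = m.expand(r"\g<1>0")   # group 1 followed by a literal "0"
```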

Sincerely yours, Roman Suzi
-- 
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Thursday, March 08, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Gun Control: Keep muzzle pointed at target." _/


From mda@discerning.com  Thu Mar  8 18:14:27 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Thu, 8 Mar 2001 10:14:27 -0800
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com> <012301c0a701$4aa61c60$9200a8c0@mdaxke> <200103080743.f287hZ001784@mira.informatik.hu-berlin.de>
Message-ID: <01b201c0a7fb$af0ba620$9200a8c0@mdaxke>

> A different parser is used on Windows and Unix, so file:///etc/passwd
> could mean different things on Windows and Unix. On Windows, it might
> be ill-formed: for an absolute path, you need a drive letter (or else
> you need to learn the current drive based on some magic processing
> context); or it could mean \\etc\passwd (i.e. etc being the topmost
> hierarchy level, if you allow file: URLs to denote UNC names). On
> Unix, it clearly means /etc/passwd.

it is certainly the case that interpretation of the path portion is server-specific.

what is bothering me is that assembly of a url from its scheme,net_loc,path components
(or parsing a url into those components) would seemingly have to know about the server OS,
just to know what to do with the host-path separator slash, which is sometimes significant and sometimes not.

but maybe it is all ok....

on the client, suppose i am given a server-specific host path (c:\autoexec.bat or /etc/passwd)
and want to make a url. so i follow the rule
   C1. if the path starts with /, prepend file://
   C2. else prepend file:///

on the server, suppose i am given a file: url. So i follow these rules:
   S1. if there are exactly 0 or 1 slashes after file:, remove file: and take the rest to be the path, possibly relative
   S2. else if there are exactly 2 slashes after file:, error
   S3. else if there are 3 or more slashes after file:, remove file:/// and consider the remainder:
      a. if the remainder starts with a system-specific file system root (such as / or c: or c| or \\), use the string as the absolute path
      b. else prepend "/" and use that string as the absolute path

would that work?
note that this treats backward slashes like any other character.
server rule S1 is to allow the convenience of just prepending "file:" in front of anything, although
clients obeying the client rules above would never do that.
it also introduces a (non-rfc) convention for sending a relative file url.
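
The rules above can be transcribed quite literally. This is a sketch in
present-day Python 3 under the stated rules only: the function names are
hypothetical, nothing is escaped or unescaped, and the set of "system
roots" is just the four examples listed in S3a.

```python
import re

# S3a: a leading "/", a leading "\\", or a drive such as "c:" or "c|".
_SYSTEM_ROOT = re.compile(r"^(/|\\\\|[A-Za-z][:|])")


def path_to_file_url(path):
    """Client rules C1 and C2."""
    if path.startswith("/"):
        return "file://" + path       # C1
    return "file:///" + path          # C2


def file_url_to_path(url):
    """Server rules S1 to S3."""
    if not url.startswith("file:"):
        raise ValueError("not a file: url: %r" % url)
    rest = url[len("file:"):]
    slashes = len(rest) - len(rest.lstrip("/"))
    if slashes <= 1:
        return rest                   # S1: path as given, possibly relative
    if slashes == 2:
        raise ValueError("exactly two slashes after file: (rule S2)")
    remainder = rest[3:]              # S3: strip "///"
    if _SYSTEM_ROOT.match(remainder):
        return remainder              # S3a
    return "/" + remainder            # S3b
```

Backslashes are passed through untouched, matching the note that they
are treated like any other character.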

-mda


From mda@discerning.com  Thu Mar  8 19:43:37 2001
From: mda@discerning.com (Mark D. Anderson)
Date: Thu, 8 Mar 2001 11:43:37 -0800
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <001101c0a72f$761f2cf0$9200a8c0@mdaxke> <200103072147.f27Llwm01178@mira.informatik.hu-berlin.de> <00f501c0a767$1fd75650$9200a8c0@mdaxke> <200103080655.f286tVd01338@mira.informatik.hu-berlin.de>
Message-ID: <01d401c0a808$238fb840$9200a8c0@mdaxke>

> How does that work? Are these other perls also using perl56.dll? If
> they had been using, say, perl55.dll, are the binary modules not
> linked with perl55.dll? If they are, how does perl manage to use
> perl56.dll and perl55.dll simultaneously?

it just works because perl hasn't changed for a while. they are all using perl56.
in retrospect, i do think perl and python are not that much different here.
i can't compare how much functionality perl chose to insert in the 5.6 patch series
as compared to python 2.0 patch series.

> So you are saying that different perl versions share the same toplevel
> lib directory, but do not share any library files? Why is that a good
> thing?

perl has a lib/site_perl hierarchy and a lib hierarchy. both are searched, and both
may be updated using cpan.
nominally, the lib hierarchy is for modules that come bundled with the perl distro, and
lib/site_perl is for ones that are add-ons, although this isn't entirely true for reasons probably having
to do with activestate. (yes, site_perl is located inside lib, but pretend that isn't true).

in both cases, all modules embed the perl version in their path, and for binary modules
(but not pure perl modules), the OS name is also embedded.
here is an excerpt from a perl-5.6 installation on linux.

./lib/5.6.0/CGI.pm
./lib/5.6.0/CPAN.pm

./lib/5.6.0/i686-linux/Data/Dumper.pm
./lib/5.6.0/i686-linux/auto/Data/Dumper/Dumper.so

./lib/site_perl/5.6.0/URI/file/Base.pm
./lib/site_perl/5.6.0/URI/file/Unix.pm

./lib/site_perl/5.6.0/i686-linux/SQL/Statement.pm
./lib/site_perl/5.6.0/i686-linux/Storable.pm

./lib/site_perl/5.6.0/i686-linux/auto/Storable/Storable.so
./lib/site_perl/5.6.0/i686-linux/auto/SQL/Statement/Statement.so

the separation of "lib" from "lib/site_perl" lets you separately upgrade (or rollback) your perl distribution
from whatever site-specific addons you have.
the embedding of the OS name allows you to overlay multiple operating systems in your site_perl
area (say, hpux, linux, and freebsd) for a single perl version, which may be convenient either
for a single developer or multiple developers sharing a mounted install.
by default, perl of some version X will attempt to load the module in the version directory
which is most recent but not more recent than X. So a 5.6.0 perl would attempt to load a 5.005
module if there was none in the 5.6.0 tree, but a 5.005 perl would ignore all 5.6.0 modules.
In some cases, a binary module (or even a pure-perl module) might not be compatible with
a newer perl. In that case, this algorithm would cause a runtime failure. The solution
is to either make a change in the configuration for the perl, or to simply install a more recent module
for that perl.
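
The lookup order described above can be sketched in Python. This is an
illustration of the rule only, not perl's actual implementation:
versions are modelled as tuples (with 5.005 written as (5, 5, 0), an
assumption made for easy comparison), and find_module_version is a
hypothetical name.

```python
def find_module_version(module, installed, running):
    """Pick the newest version directory, not newer than `running`,
    that contains `module`.

    `installed` maps a version tuple to the set of module names found
    under that version directory.
    """
    eligible = [
        version
        for version, modules in installed.items()
        if version <= running and module in modules
    ]
    return max(eligible) if eligible else None


installed = {
    (5, 5, 0): {"Storable", "URI::file"},
    (5, 6, 0): {"Storable"},
}
```

A 5.6.0 interpreter falls back to the older tree for URI::file, while a
5.005 interpreter never looks inside the 5.6.0 tree.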

this was a change from pre-5.6, and remains poorly documented.
the best discussion is probably "Coexistence with earlier versions of perl5" in the INSTALL file.
it also has its limitations; for example a person could build the same version of perl ("5.6.0")
in multiple ways (say with threads and sfio) and probably get into trouble if some modules were shared.

note that at the perl source level, perl has other versioning facilities.
perl allows any program to require a minimum perl core version ("require 5.005").
you can also require a minimum version from any module being imported (assuming that module exports
a version), for example "use XML::Simple qw(1.04)".
the request for a particular minimum version of a module has no effect on the search, but
will abort if the search doesn't find a suitable one.

with my limited python experience i haven't yet seen analogous abilities to declare an assumed
python core version, or require a minimum version from another module.
but i realize the purpose of this mailing list is not to educate me in python :).

-mda


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  8 21:14:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Mar 2001 22:14:36 +0100
Subject: [XML-SIG] saxlib, xml, _xmlplus, etc.
In-Reply-To: <01d401c0a808$238fb840$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <001101c0a72f$761f2cf0$9200a8c0@mdaxke> <200103072147.f27Llwm01178@mira.informatik.hu-berlin.de> <00f501c0a767$1fd75650$9200a8c0@mdaxke> <200103080655.f286tVd01338@mira.informatik.hu-berlin.de> <01d401c0a808$238fb840$9200a8c0@mdaxke>
Message-ID: <200103082114.f28LEaJ01267@mira.informatik.hu-berlin.de>

Hi Mark,

Thanks for your elaboration of perl versioning mechanics. I agree that
the Python workings appear to be quite similar.

> with my limited python experience i haven't yet seen analogous
> abilities to declare an assumed python core version, or require a
> minimum version from another module.

Sure there is

import sys
assert sys.version_info >= (2, 0) # requires Python 2.0 or better

In fact, this is how the _xmlplus hack works. xml/__init__.py has

_MINIMUM_XMLPLUS_VERSION = (0, 6, 1)
...
        v = _xmlplus.version_info
        if v >= _MINIMUM_XMLPLUS_VERSION:
            import sys
            sys.modules[__name__] = _xmlplus

This only installs "_xmlplus" as "xml" if _xmlplus is recent enough.
If you study that code, you'll notice that it also deals with the case
of old PyXML versions, which did not provide version_info.
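
The same idea as a self-contained sketch, in present-day Python 3. The
helper recent_enough and the two stand-in objects are hypothetical, and
treating a missing version_info as "too old" is an assumption of this
sketch rather than a description of the real xml/__init__ code.

```python
import sys
from types import SimpleNamespace


def recent_enough(module, minimum):
    """True if module.version_info exists and is at least `minimum`."""
    version = getattr(module, "version_info", None)
    return version is not None and tuple(version) >= tuple(minimum)


# Hypothetical stand-ins for an up-to-date and a very old add-on package.
new_package = SimpleNamespace(version_info=(0, 6, 4))
old_package = SimpleNamespace()          # predates version_info

core_ok = sys.version_info >= (2, 0)     # same comparison for the interpreter
```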

> but i realize the purpose of this mailing list is not to educate me
> in python :).

It's ok, we are back to your original question.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Mar  8 21:05:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Mar 2001 22:05:36 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <01b201c0a7fb$af0ba620$9200a8c0@mdaxke>
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com> <012301c0a701$4aa61c60$9200a8c0@mdaxke> <200103080743.f287hZ001784@mira.informatik.hu-berlin.de> <01b201c0a7fb$af0ba620$9200a8c0@mdaxke>
Message-ID: <200103082105.f28L5ai01262@mira.informatik.hu-berlin.de>

> what is bothering me is that assembly of a url from its
> scheme,net_loc,path components (or parsing a url into those
> components) would seemingly have to know about the server OS, just
> to know what to do with host-path separator slash, which is
> sometimes significant and sometimes not.

It is, in general, not possible to interpret a file: URL on any system
other than the local one. In fact, I cannot think of a single system
where it *is* possible.

> on the client, suppose i am given a server-specific host path
> (c:\autoexec.bat or /etc/passwd)

Not sure what the client and the server is here.

> and want to make a url. so i follow the rule
>    C1. if the path starts with /, prepend file://
>    C2. else prepend file:///

No. On DOS, build a list of components, starting with drive,
directory, ... On Unix, build a list of components, starting with
directory, directory, ... Then join the components with slashes. Put
your machine name in front of it if you want, or else leave it blank.

If you meant to take the local filename literally, it would not work
if the file name uses characters that are reserved in URLs.
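
A sketch of the component-based approach, in present-day Python 3.
components_to_file_url is a hypothetical helper; urllib.parse.quote is
the standard-library escaping function. Splitting a native file name
into components is the platform-specific part and is assumed to have
been done already.

```python
from urllib.parse import quote


def components_to_file_url(components, host=""):
    """Join already-split path components into a file: URL.

    Each component is escaped on its own, so characters that are
    reserved in URLs cannot change the structure of the result.
    The colon is kept so that a drive such as "D:" stays readable.
    """
    path = "/".join(quote(part, safe=":") for part in components)
    return "file://" + host + "/" + path
```

On Unix the list would start with the first directory; on DOS it would
start with the drive.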

> on the server, suppose i am given a file: url. So i follow these rules:
>    S1. if there are exactly 0 or 1 slashes after file:, remove file: and take the rest to be the path, possibly relative
>    S2. else if there are exactly 2 slashes after file:, error
>    S3. else if there are 3 or more slashes after file:, remove file:/// and consider the remainder:
>       a. if the remainder starts with a system-specific file system root (such as / or c: or c| or \\), use the string as the
> absolute path
>       b. else prepend "/" and use that string as the absolute path
> 
> would that work?

No. For Windows NT and Unix, it would probably work. On the Mac, it
probably wouldn't - you'll have to replace slashes in the path with
colons. On VMS, using the example from the RFC, it probably would fail
as well.

Regards,
Martin


From tpassin@home.com  Fri Mar  9 04:34:28 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 8 Mar 2001 23:34:28 -0500
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <008a01c0a769$8d6cdb20$7cac1218@reston1.va.home.com> <200103080712.f287Cx401471@mira.informatik.hu-berlin.de>
Message-ID: <000801c0a852$3b3cd140$7cac1218@reston1.va.home.com>

"Martin v. Loewis" writes,

> I can see the point of your proposal, and I agree it is in the spirit
> of the RFC. I'd avoid implementing it until it can be established that
> MS software works in the same way.
>
I just tested the following combinations using IE5.5 on Win98:

OK (i.e., it works):
file:///D:\temp\xxx.html
D:\temp\xxx.html
D:/temp/xxx.html
file:/D:\temp\xxx.html
file:D:/temp/xxx.html
file:///D:/temp/xxx.html
file:///D|/temp/xxx.html
file:///D|\temp\xxx.html
file://localhost/D:/temp/xxx.html
file://localhost/D:\temp\xxx.html

Not OK:
D|\temp\xxx.html


On NS4.08,

OK:
file:///D:\temp\xxx.html
D:\temp\xxx.html
D:/temp/xxx.html
file:/D:\temp\xxx.html
file:///D:/temp/xxx.html
file:///D|/temp/xxx.html
file:///D|\temp\xxx.html
file://localhost/D:/temp/xxx.html
file://localhost/D:\temp\xxx.html


Not OK:
file:D:/temp/xxx.html (doesn't work)

for D|\temp\xxx.html, NS thought it was a real url and tried to do a DNS
lookup on it. (Huh???)

Pretty amazing, eh?  Looks like they are following the maxim, write strict,
accept loose.

Does anyone think we should go to these extremes?

Shaking-his-head-in-wonder-ly,

Tom P


From tpassin@home.com  Fri Mar  9 04:40:47 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 8 Mar 2001 23:40:47 -0500
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com> <012301c0a701$4aa61c60$9200a8c0@mdaxke> <200103080743.f287hZ001784@mira.informatik.hu-berlin.de> <01b201c0a7fb$af0ba620$9200a8c0@mdaxke>
Message-ID: <001501c0a853$1c8b0a40$7cac1218@reston1.va.home.com>

Mark D. Anderson wrote -

> on the server, suppose i am given a file: url. So i follow these rules:
>    S1. if there are exactly 0 or 1 slashes after file:, remove file: and
>    take the rest to be the path, possibly relative

If it is a relative path, it can't have the "file:" part, since that was
already established by the base url.  Conversely, if the url starts with
"file:", it must be absolute, as best as I can see.

Cheers,

Tom P


From tpassin@home.com  Fri Mar  9 04:48:12 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 8 Mar 2001 23:48:12 -0500
Subject: [XML-SIG] file urls in urllib
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com> <012301c0a701$4aa61c60$9200a8c0@mdaxke> <200103080743.f287hZ001784@mira.informatik.hu-berlin.de> <01b201c0a7fb$af0ba620$9200a8c0@mdaxke>
Message-ID: <001b01c0a854$2625bae0$7cac1218@reston1.va.home.com>

Mark D. Anderson wrote -

>
> what is bothering me is that assembly of a url from its
> scheme,net_loc,path components (or parsing a url into those
> components) would seemingly have to know about the server OS, just
> to know what to do with host-path separator slash, which is
> sometimes significant and sometimes not.
>
If you are getting a file by http, you ALWAYS use forward slashes, no volume
name, and the "http:" scheme.  No server-os ambiguity here.

The only time this issue would arise is when you want to load files on your
own machine or on a networked file system connected to your machine.  In this
case, you presumably know the right form.

The real issue, I think, is for the handler to know when it encounters an
opaque file: path, so that it can send it as is to the OS.  Otherwise, if the
url follows the rfc for transparent file: urls, use forward slashes and the
volume designator (c:/ on Windows, for example).  The handler is supposed to
be able to parse this and translate it for the OS it is running on.

The other issue is to decide how lenient we want to be in allowing variant
forms.  Does anyone know what works and what doesn't on a Mac?  And will this
change with OS-X?

Cheers,

Tom P


From martin@loewis.home.cs.tu-berlin.de  Fri Mar  9 07:28:18 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 9 Mar 2001 08:28:18 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <001501c0a853$1c8b0a40$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <200103072215.f27MFHN01380@mira.informatik.hu-berlin.de> <0103071744111X.28858@fermi.eeel.nist.gov> <010901c0a769$50e0d350$9200a8c0@mdaxke> <000a01c0a770$62c187c0$7cac1218@reston1.va.home.com> <012301c0a701$4aa61c60$9200a8c0@mdaxke> <200103080743.f287hZ001784@mira.informatik.hu-berlin.de> <01b201c0a7fb$af0ba620$9200a8c0@mdaxke> <001501c0a853$1c8b0a40$7cac1218@reston1.va.home.com>
Message-ID: <200103090728.f297SIq01298@mira.informatik.hu-berlin.de>

> If it is a relative path, it can't have the "file:' part, since that was
> already established by the base url.  Conversely, if the url starts with
> "file:", it must be absolute, as best as I can see.

That is my understanding as well.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Mar  9 07:27:44 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 9 Mar 2001 08:27:44 +0100
Subject: [XML-SIG] file urls in urllib
In-Reply-To: <000801c0a852$3b3cd140$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <027601c0a68f$7abee980$9200a8c0@mdaxke> <200103070716.f277Gds01961@mira.informatik.hu-berlin.de> <003b01c0a735$b2b78210$9200a8c0@mdaxke> <008a01c0a769$8d6cdb20$7cac1218@reston1.va.home.com> <200103080712.f287Cx401471@mira.informatik.hu-berlin.de> <000801c0a852$3b3cd140$7cac1218@reston1.va.home.com>
Message-ID: <200103090727.f297Rif01296@mira.informatik.hu-berlin.de>

> I just tested the following combinations usng IE5.5 on Win98:
> 
> OK (i.e., it works):
> file:///D:\temp\xxx.html
> D:\temp\xxx.html
> D:/temp/xxx.html
> file:/D:\temp\xxx.html
> file:D:/temp/xxx.html
> file:///D:/temp/xxx.html
> file:///D|/temp/xxx.html
> file:///D|\temp\xxx.html
> file://localhost/D:/temp/xxx.html
> file://localhost/D:\temp\xxx.html

Thanks for these investigations. That seems to confirm that at least

file:///D:/temp/xxx.html

is accepted as a URL, so I think urllib should accept it as well. As
for the others, I noticed one aspect that seems to have escaped (pun
intended) the discussion so far: According to RFC 1738, both | and
\ are *unsafe*. That means they MUST be escaped in a URL (although the
rfc only writes "must"); in turn, the proper form of some of the
others would be

file:///D%7C/temp/xxx.html
file:///D%7C%5Ctemp%5Cxxx.html
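
These escaped forms can be reproduced with the standard library. The
sketch uses urllib.parse.quote and unquote from present-day Python 3
(the Python 2 spelling in this thread is urllib.quote); with the default
safe="/" the forward slashes are left alone.

```python
from urllib.parse import quote, unquote

pipe_form = "file:///" + quote("D|/temp/xxx.html")
backslash_form = "file:///" + quote("D|\\temp\\xxx.html")

# Unquoting restores the original characters.
round_trip = unquote("D%7C%5Ctemp%5Cxxx.html")
```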

> Pretty amazing, eh?  Looks like they are following the maxim, write
> strict, accept loose.

I'd like urllib to follow that as well; the strict case probably being
the one with the forward slashes (as the required escaping for the
REVERSE SOLIDUS and the VERTICAL LINE looks ugly). Please note that
urllib.quote quotes the COLON, although the RFC does not require this:
the colon would need to be quoted only if the scheme reserved it.

As for accepting: We should at least accept what is clearly conforming
to the RFC, i.e. the forms starting with file://<optional host>/; we
should probably accept that not everything that should be quoted
is. We also need backwards compatibility, so the forms using the
vertical line should be accepted.
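
A rough sketch of "accept loose, write strict" for drive-letter URLs, in
present-day Python 3. normalize_drive_url is a hypothetical helper, not
urllib behaviour; it covers only the Windows-style forms tested earlier
in this thread and rejects everything else.

```python
import re
from urllib.parse import quote, unquote

# Optional //host, any number of slashes, a drive letter with ":" or "|".
_LOOSE = re.compile(
    r"(?://(?P<host>[^/]*))?/*(?P<drive>[A-Za-z])[:|](?P<path>/.*)?$"
)


def normalize_drive_url(url):
    """Map loosely written file: URLs onto file://<host>/X:/path."""
    if not url.lower().startswith("file:"):
        raise ValueError("not a file: URL: %r" % url)
    # Undo any escaping, then treat backslashes as path separators.
    rest = unquote(url[len("file:"):]).replace("\\", "/")
    match = _LOOSE.match(rest)
    if match is None:
        raise ValueError("no drive letter found in %r" % url)
    host = match.group("host") or ""
    path = match.group("path") or "/"
    # Re-escape the path so the output is in the strict form.
    return "file://%s/%s:%s" % (host, match.group("drive"), quote(path))
```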

Regards,
Martin


From nobody@sourceforge.net  Fri Mar  9 12:15:03 2001
From: nobody@sourceforge.net (nobody)
Date: Fri, 09 Mar 2001 04:15:03 -0800
Subject: [XML-SIG] [ pyxml-Bugs-407288 ] tabs inside attribute values removed
Message-ID: <E14bLnT-0000HR-00@usw-sf-web1.sourceforge.net>

Bugs #407288, was updated on 2001-03-09 04:15
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407288&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Luke Kenneth Casson Leighton
Assigned to: Nobody/Anonymous
Summary: tabs inside attribute values removed

Initial Comment:
i am having to pre-process all text, substituting
&#x09; for "\t" as a work-around for this problem.

if this is not performed, then all tabs inside
attribute's values, e.g.
<node attr="value\tsep\tby\ttabs"/>, are turned into
spaces.

i am storing python code in an attribute value, so i
_must_ have my tabs!!! :) :)

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407288&group_id=6473


From Eugene.Leitl@lrz.uni-muenchen.de  Fri Mar  9 17:05:15 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene Leitl)
Date: Fri, 9 Mar 2001 18:05:15 +0100 (MET)
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
Message-ID: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de>

Excuse me if I'm on crack, but is it possible to dump a DOM (i.e. an
object tree representation of the XML document) XML parser skeleton
(preferably in Python, but C++ and Java would be also welcome), using a
DTD as input?

If it is possible, has it been done? With a free tool?

TIA,
-- Eugene


From jeremy.kloth@fourthought.com  Fri Mar  9 17:16:00 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Fri, 09 Mar 2001 10:16:00 -0700
Subject: [XML-SIG] Re: tabs inside attribute values removed
References: <E14bLnT-0000HR-00@usw-sf-web1.sourceforge.net>
Message-ID: <3AA90FD0.B5777324@fourthought.com>


> i am having to pre-process all text, substituting
> &#x09; for "\t" as a work-around for this problem.
> 
> if this is not performed, then all tabs inside
> attribute's values, e.g.
> <node attr="value\tsep\tby\ttabs"/>, are turned into
> spaces.

Using PyXML 0.6.4, I didn't see this behavior.

from xml.dom.ext.reader import Sax2
doc = Sax2.FromXml('<element attr="a&#x09;tab"/>')
attr = doc.documentElement.attributes.item(0)
print repr(attr.value)
'a\011tab'
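
For comparison, attribute-value normalization in XML 1.0 (section 3.3.3)
turns a literal tab inside an attribute value into a space, while a
character reference such as &#x09; survives as a tab. The sketch below
shows both cases with the standard library's xml.dom.minidom in
present-day Python 3, used here as a stand-in for the PyXML reader;
attr_value is a hypothetical helper.

```python
from xml.dom import minidom


def attr_value(xml_text):
    """Parse a one-element document and return its 'attr' attribute."""
    return minidom.parseString(xml_text).documentElement.getAttribute("attr")


literal_tab = attr_value('<node attr="a\tb"/>')        # real tab character
char_ref_tab = attr_value('<node attr="a&#x09;b"/>')   # character reference
```

Here literal_tab comes back with a space and char_ref_tab with a tab.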


-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Fri Mar  9 21:27:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 9 Mar 2001 22:27:47 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
In-Reply-To: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de>
 (message from Eugene Leitl on Fri, 9 Mar 2001 18:05:15 +0100 (MET))
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de>
Message-ID: <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de>

> Excuse me if I'm on crack, but is it possible to dump a DOM (i.e. an
> object tree representation of the XML document) XML parser skeleton
> (preferably in Python, but C++ and Java would be also welcome), using a
> DTD as input?

Hard to say, I don't even understand the question. What is a "DOM XML
parser skeleton"? And how would you like to "dump" it? If you are
asking whether you can convert a DOM tree into an XML document -
certainly, you don't even need a DTD as input.

Regards,
Martin


From Eugene.Leitl@lrz.uni-muenchen.de  Fri Mar  9 22:19:11 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de)
Date: Fri, 09 Mar 2001 23:19:11 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de>
Message-ID: <3AA956DF.EAC34D7D@lrz.uni-muenchen.de>

"Martin v. Loewis" wrote:

> Hard to say, I don't even understand the question. What is a "DOM XML
> parser skeleton"? And how would you like to "dump" it? If you are

It is a program that parses XML files in a certain fashion, by creating
a tree of objects (so it has to be an OO language it dumps) representing 
the structure of the XML file. It is a skeleton because it just does that, 
as lacking true understanding of my further intentions it has no clue 
as what I'm going to do with the data created from the parsing of the 
document, so it has to leave the action field blank, to be filled out by 
me. (Assuming (foolishly) that I know what I'm doing).

It is dumped because I'm asking for a program that will dump a program
(see above), when supplied with a DTD of the XML it is supposed to be
able to parse.

As I said, correct me if my glass pipe has burned out. I've been only 
checking out the whole XML thingy for the last couple of days.

> asking whether you can convert a DOM tree into an XML document -
> certainly, you don't even need a DTD as input.

No, I'm asking for a program that will dump a (skeleton of a, to be filled
in at earliest convenience) parser program, when supplied with the DTD of
the XML document.


From gtn@ebt.com  Sat Mar 10 00:23:05 2001
From: gtn@ebt.com (Gavin Thomas Nicol)
Date: Fri, 9 Mar 2001 19:23:05 -0500
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
In-Reply-To: <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de>
Message-ID: <NCBBJNEMNEOKNGLADMAHOEFBCMAC.gtn@ebt.com>

> Hard to say, I don't even understand the question. What is a "DOM XML
> parser skeleton"? 

I'm not aware of anyone that has code... it probably exists
somewhere though.

Should be pretty trivial though. Take the DTD, compile it into
a state machine, and then spit the state machine back out in
code.
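
A much simpler toy than the state-machine approach, offered only as a
sketch in present-day Python 3: it scans a DTD for <!ELEMENT ...>
declarations with a regular expression and writes out a class with one
empty method per element, to be filled in by hand. skeleton_from_dtd and
the handle_* naming are hypothetical, and the regular expression ignores
parameter entities, comments, and everything else a real DTD parser
would handle.

```python
import re

ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]+)>")


def skeleton_from_dtd(dtd_text):
    """Return Python source for a class with one stub per declared element."""
    lines = ["class Handlers:"]
    for name, model in ELEMENT_DECL.findall(dtd_text):
        method = re.sub(r"\W", "_", name)       # make a valid identifier
        model = " ".join(model.split())         # keep the comment on one line
        lines.append("    def handle_%s(self, node):" % method)
        lines.append("        # content model: %s" % model)
        lines.append("        pass")
    if len(lines) == 1:
        lines.append("    pass")                # no declarations found
    return "\n".join(lines) + "\n"


dtd = "<!ELEMENT foo (bar*)>\n<!ELEMENT bar EMPTY>"
source = skeleton_from_dtd(dtd)
```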


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 10 07:00:41 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 10 Mar 2001 08:00:41 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
In-Reply-To: <3AA956DF.EAC34D7D@lrz.uni-muenchen.de>
 (Eugene.Leitl@lrz.uni-muenchen.de)
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de>
Message-ID: <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de>

> > Hard to say, I don't even understand the question. What is a "DOM XML
> > parser skeleton"? And how would you like to "dump" it? If you are
> 
> It is a program that parses XML files in a certain fashion, by creating
> a tree of objects (so it has to be an OO language it dumps) representing 
> the structure of the XML file. 

I get the feeling of being dumb here, since I still cannot understand
what you are asking for. Let me interpret it word-by-word.

You want a program that parses XML files: Well, there are plenty of
XML parsers, I can recommend PyXML. It shall create a tree of objects
...  I recommend using a parser that creates a DOM tree: That is a
tree of objects.

... representing the structure of the XML file. That I cannot
understand: Do you want the content of the XML file being represented
by the tree of objects (i.e. the tag names of the elements, their
attributes and attribute values, and strings for the text fragments in
the elements)? That is what the DOM does. If this is not what you
want, what is it about the "structure of the XML file" that you want
to be represented? E.g. given

<foo><bar/></foo>

what is the tree of objects that you want to get?

> It is a skeleton because it just does that, as lacking true
> understanding of my further intentions it has no clue as what I'm
> going to do with the data created from the parsing of the document,
> so it has to leave the action field blank, to be filled out by me.

The DOM tree is good for that - it has no understanding of your plans
to process the document.

> It is dumped because I'm asking for a program that will dump a program
> (see above), when supplied with a DTD of the XML it is supposed to be
> able to parse.

So you want to generate a program? Given a DTD? How about this program?

print "from xml.dom.ext.reader import Sax2"
print "import sys"
print "doc = Sax2.FromXmlFile(sys.argv[1])"

When being executed, it will always generate the same program:

from xml.dom.ext.reader import Sax2
import sys
doc = Sax2.FromXmlFile(sys.argv[1])

This is a program that can read an XML document and build a tree of
objects. The tree of objects is stored in a variable named doc. You
can give a DTD to the first program, but it is ignored as it is not
needed.

> No, I'm asking for a program that will dump a (skeleton of a, to be
> filled in at earliest convenience) parser program, when supplied
> with the DTD of the XML document.

The nice thing about XML is that you can parse it without a DTD, and
that you can furthermore use the same parser for all XML documents.

Regards,
Martin


From Eugene.Leitl@lrz.uni-muenchen.de  Sat Mar 10 10:29:34 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de)
Date: Sat, 10 Mar 2001 11:29:34 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de>
Message-ID: <3AAA020E.335812E@lrz.uni-muenchen.de>

"Martin v. Loewis" wrote:

> I get the feeling of being dumb here, since I still cannot understand
> what you are asking for. Let me interpret it word-by-word.

That's highly unlikely. I'm just trolling for clue, being forced
to learn XML in the course of a few days.

The company I'm with has the following ad hoc approach to XML: 
whip up some XML fitting the problem, don't bother with writing 
a DTD, code up a parser in an OO language, which recursively 
reads the tags into memory, creating a hierarchy/tree of objects. 
Fill in methods to deal with the data sitting in the tree, finis.

I looked at the way other people parse XML, and ran into DOM, which seemed
to imply the company has reinvented the wheel. I'm trying to understand
what Python DOM does (the regression test I ran yesterday did dump core, 
so I don't have a working installation up yet).
 
> You want to program that parses XML files: Well, there are plenty of
> XML parsers, I can recommend PyXML. It shall create a tree of objects
> ...  I recommend to use a parser that creates a DOM tree: That is a
> tree of objects.

Excellent. So, DOM parses the XML file (any well-formed XML file).
Because it is agnostic of what tags might be coming (since, as you
say, it doesn't need a DTD), it doesn't offer any hooks, calling a
matching method if a given tag is encountered.

So essentially, I wind up with a representation of the XML file
as tree of objects, which I process after the fact, right? Iirc,
DOM offers some helpful routines, allowing me to parse the tree.

So, where do I put my handler, interpreting the stuff as it passes
by? 

Let's say I have a reaction tree (molecule A is precursor of molecule B
is precursor of molecule C is educt of product Z) as result of a query. 
So building XML as a representation of it is quite natural, as it *is* 
a tree. I want to transform this into a variety of formats: mapping the
tree to a number of .png images laid out in an HTML table, or using a 
Tree Widget to paint a large bitmap, potentially with server-side 
clickable maps.

So, where does Python DOM offer me ways I can get at the data in
the object tree? 

> ... representing the structure of the XML file. That I cannot
> understand: Do you want the content of the XML file being represented
> by the tree of objects (i.e. the tag names of the elements, their
> attributes and attribute values, and strings for the text fragments in
> the elements)? That is what the DOM does. If this is not what you

This is what I need, yes.

> > It is a skeleton because it just does that, as lacking true
> > understanding of my further intentions it has no clue as what I'm
> > going to do with the data created from the parsing of the document,
> > so it has to leave the action field blank, to be filled out by me.
> 
> The DOM tree is good for that - it has no understanding of your plans
> to process the document.

Ok, very good, but where can I get at the data sitting there?
  
> from xml.dom.ext.reader import Sax2
> import sys
> doc = Sax2.FromXmlFile(sys.argv[1])
> 
> This is a program that can read an XML document and build a tree of
> objects. The tree of objects is stored in a variable named doc. You
> can give a DTD to the first program, but it is ignored as it is not
> needed.
> 
> > No, I'm asking for a program that will dump a (skeleton of a, to be
> > filled in at earliest convenience) parser program, when supplied
> > with the DTD of the XML document.
> 
> The nice thing about XML is that you can parse it without a DTD, and
> that you can furthermore use the same parser for all XML documents.

Good, now I only need to get Python DOM pass the regression tests,
and find out how I can get at the data.


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 10 13:24:02 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 10 Mar 2001 14:24:02 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
In-Reply-To: <3AAA020E.335812E@lrz.uni-muenchen.de>
 (Eugene.Leitl@lrz.uni-muenchen.de)
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de> <3AAA020E.335812E@lrz.uni-muenchen.de>
Message-ID: <200103101324.f2ADO2g03086@mira.informatik.hu-berlin.de>

> I looked at the way other people parse XML, and ran into DOM, which seemed
> to imply the company has reinvented the wheel. I'm trying to understand
> what Python DOM does (the regression test I ran yesterday did dump core, 
> so I don't have a working installation up yet).

What operating system, what version of Python and PyXML? Python should
*never* coredump; at worst, you might get Python exceptions.

> Excellent. So, DOM parses the XML file (any well-formed XML file).

Indeed. You have the choice of either a validating parser (one that
looks at the DOCTYPE declaration in the document, and complains when
elements are used incorrectly) or a non-validating parser, one that
looks only for well-formedness.

In either case, you get the same DOM tree (well, almost - a validating
parser may fill in DEFAULT values of attributes from the DTD; a
non-validating parser won't normally).

> Because it is agnostic of what tags might be coming (since, as you
> say, it doesn't need a DTD), it doesn't offer any hooks, calling a
> matching method if a given tag is encountered.

Yes and no. The DOM does not call any callbacks. Instead, you give the
parser the document URL, and it gives you back a DOM tree; no
application interaction during parsing.

If you want event-oriented XML processing, you should study the SAX
interface. This calls your callback for every start and end tag, text
nodes, and so on. It does not build any kind of tree. In many XML
libraries, it is possible to implement a "DOM builder" on top of a
"SAX parser"; this is in fact how PyXML operates.

> So essentially, I wind up with a representation of the XML file
> as tree of objects, which I process after the fact, right?

Exactly.

> Iirc, DOM offers some helpful routines, allowing me to parse the
> tree.

Yes, depending on what exactly you want to do with the tree; not all
routines are helpful for all applications.

> So, where do I put my handler, interpreting the stuff as it passes
> by? 

You don't, unless you implement your own SAX content handler - which
might or might not choose to build a DOM tree.
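
As a rough sketch of such a content handler, the following uses the
xml.sax package from the Python standard library (an assumption; PyXML
ships its own SAX drivers). The class name EventLogger is made up; it
only records the callbacks it receives and builds no tree.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class EventLogger(ContentHandler):
    """Records each SAX callback instead of building a tree."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # skip whitespace-only text between elements
        if content.strip():
            self.events.append(("text", content))

handler = EventLogger()
xml.sax.parseString(b"<foo><bar>hi</bar></foo>", handler)
print(handler.events)
```

Your own processing would go where the events are appended; the
handler sees each tag as it passes by.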

> I want to transform this into a variety of formats: mapping the
> tree to a number of .png images layed out in a HTML table, or use a 
> Tree Widget to paint a large bitmap, potentially with server-side 
> clickable maps.
> 
> So, where does Python DOM offer me ways I can get at the data in
> the object tree? 

The DOM itself offers standard accessor functions - they are not only
standard across Python DOM implementations, but also standard across
programming languages.

The "DOM Core" interface only provides accessor functions to
"navigate" the tree: Give me the name of the element (elem.tagName);
give me all the children (elem.childNodes), give me the next sibling,
give me the attribute named "atomWeight". There are some query
functions: give me all element nodes with a certain element name, ...
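
A small sketch of these accessors, assuming xml.dom.minidom from the
standard library as the DOM implementation; the element names reaction
and molecule are invented for the example.

```python
from xml.dom import minidom

# Hypothetical document; only the attribute name atomWeight comes
# from the discussion above.
doc = minidom.parseString(
    '<reaction><molecule name="A" atomWeight="12"/>'
    '<molecule name="B" atomWeight="16"/></reaction>'
)
root = doc.documentElement
print(root.tagName)                            # name of the element

first = root.childNodes[0]                     # pick one of the children
print(first.getAttribute("atomWeight"))        # attribute by name
print(first.nextSibling.getAttribute("name"))  # the next sibling

# query: all element nodes with a certain element name
names = [m.getAttribute("name") for m in doc.getElementsByTagName("molecule")]
print(names)
```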

"DOM 2 Navigation" offers traversal interfaces. You might be tempted
to use those, but I suggest working with the core interfaces only at
first; you'll find that it is quite easy to do your own traversal with
just the accessor functions.

Depending on the output format, it might be easy to write a SAX
ContentHandler.

Alternatively, if you can describe the output in terms of "for every
foo element write bar, then go to the child nodes, then write foobar",
it might be that XSLT is the right transformation language. There is
no single best way to process XML - the only rule is that nobody ever
writes his own parser, since that's already done.
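
The "write something, go to the child nodes, write something else"
pattern can also be sketched directly in Python with the core
accessors. This is only an illustration, assuming xml.dom.minidom and
an invented molecule element with a name attribute; it is not XSLT,
and attribute values are not HTML-escaped.

```python
from xml.dom import minidom

def to_html(elem):
    """Write an opening part, recurse into child elements, write a closing part."""
    parts = ["<li>" + elem.getAttribute("name")]
    children = [c for c in elem.childNodes if c.nodeType == c.ELEMENT_NODE]
    if children:
        parts.append("<ul>")
        for child in children:
            parts.append(to_html(child))
        parts.append("</ul>")
    parts.append("</li>")
    return "".join(parts)

# Hypothetical reaction tree: Z is made from C, which is made from B.
doc = minidom.parseString(
    '<molecule name="Z"><molecule name="C">'
    '<molecule name="B"/></molecule></molecule>'
)
html = "<ul>" + to_html(doc.documentElement) + "</ul>"
print(html)
```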

> Good, now I only need to get Python DOM pass the regression tests,
> and find out how I can get at the data.

I'd rather recommend looking at the demos. It may indeed be that some
tests fail, e.g. when running PyXML on Python 1.5, which does not
support Unicode strings.

Regards,
Martin


From tpassin@home.com  Sat Mar 10 15:10:27 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 10 Mar 2001 10:10:27 -0500
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de> <3AAA020E.335812E@lrz.uni-muenchen.de>
Message-ID: <001801c0a974$3d868360$7cac1218@reston1.va.home.com>

<Eugene.Leitl@lrz.uni-muenchen.de> wrote -

You are mixing up several concepts or processing steps.

1) Parsing  xml.
This means getting hold of the structural elements of the xml document and giving
them to another application for further processing.  There are many xml
parsers out there, some command line and some not.  It's almost certainly not
worth it to roll your own.

2) Creating a tree-like structure to represent the structure of the xml
document.
The DOM is an API for a tree-like representation.  Most major parsers out
there either include a DOM api or can work with another DOM API.  (SAX is a
non-DOM api, but the output of a sax processor can be used to build a tree,
too).  The DOM is an object oriented api.

3) DOM manipulation, using the DOM api. There are already good processors that
can use the DOM api to manipulate actual, populated DOM trees.  So don't
roll your own there, either.

4) You don't need a DTD, but it's a good idea to make one anyway because then
you can use a validating parser to check that the first xml examples that you
build are "valid" - i.e., put together correctly from a structural point of
view.  It's amazing how easy it is to accidentally create something other than
what you thought you were making.

Otherwise, you can start simple with no DTD and later define one after you
have some hands-on experience working with xml.

As Martin said, the  Python PyXML package is very good.  There's also the
Microsoft xml processor, which can be written to as a COM object, in VBscript,
or in Javascript.  There are several good java processors, and some good Perl
ones.  Python would be the quickest and easiest to use, especially if you are
not already up to speed in one of the other languages.  Even if you are,
Python will be faster and easier to use than one of the strongly typed
compiled languages like java.

Get a good book or two, like Wrox's Professional XML and XML in a Nutshell
from O'Reilly, to mention only two of the good ones out there.

>
> The company I'm with has the following ad hoc approach to XML:
> whip up some XML fitting the problem, don't bother with writing
> a DTD, code up a parser in an OO language, which recursively
> reads the tags into memory, creating a hierarchy/tree of objects.
> Fill in methods to deal with the data sitting in the tree, finis.
>
> I looked at the way other people parse XML, and ran into DOM, which seemed
> to imply the company has reinvented the wheel
>
Yes, the wheel has already been invented.  But core dumps aren't going to be
very useful.  Do examples from a book or tutorial site, fix them til they run
right, then start morphing them closer to what you want to do.  You don't need
to try to understand a DOM tree from a core dump.  Learn about the api
instead.

Cheers,

Tom P


From Eugene.Leitl@lrz.uni-muenchen.de  Sat Mar 10 15:41:09 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de)
Date: Sat, 10 Mar 2001 16:41:09 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de> <3AAA020E.335812E@lrz.uni-muenchen.de> <001801c0a974$3d868360$7cac1218@reston1.va.home.com>
Message-ID: <3AAA4B15.E2D84D6F@lrz.uni-muenchen.de>

"Thomas B. Passin" wrote:

> You are mixing up several concepts or processing steps.

I realize that. It comes from being a newbie with a deadline
breathing down my neck.
 
> 1) Parsing  xml.
> This means to get hold of the structural elements of the xml document and give
> them to another application for further processing.  There are many xml
> parsers out there, some command line and some not.  It's almost certainly not
> worth it to roll your own.

I know that, but apparently not my senior cow-orkers. It's a C/C++ shop
with an occasional sprinkling of Java, my choice of Python is purely
personal (note to myself: not to goof up this one).
 
Before I try selling them on the DOM thing, I'd rather know what I'm
doing. It cost them three days to whip up their object tree XML parser
in Java.

> 2) Creating a tree-like structure to represent the structure of the xml
> document.
> The DOM is an API for a tree-like representation.  Most major parsers out
> there either include a DOM api or can work with another DOM API.  (SAX is a
> non-DOM api, but the output of a sax processor can be used to build a tree,
> too).  The DOM is an object oriented api.

They (said cow-orkers) insist on an object tree based approach.
 
> 3) DOM manipulation, using the DOM api. There are already good processors that
> can use the DOM api to manipulate actual, populated DOM trees.  So don't
> roll your own there, either.

Does http://4suite.org/download.epy fill the ticket? The regression tests of it
dumped core on me at work, let's see whether I can get it running at home.
 
> 4) You don't need a DTD, but it's a good idea to make one anyway because then
> you can use a validating parser to check that the first xml examples that you
> build are "valid" - i.e., put together correctly from a structural point of
> view.  It's amazing how easy it is to accidentally create something else besides
> what you thought you were making.

I think Emacs psgml mode will take care of that.
 
> Otherwise, you can start simple with no DTD and later define one after you
> have some hands-on experience working with xml.
> 
> As Martin said, the  Python PyXML package is very good.  There's also the

Downloading it now.

> Microsoft xml processor, which can be written to as a COM object, in VBscript,
> or in Javascript.  There are several good java processors, and some good Perl
> ones.  Python would be the quickest and easiest to use, especially if you are
> not already up to speed in one of the other languages.  Even if you are,
> Python will be faster and easier to use than one of the strongly typed
> compiled languages like java.
> 
> Get a good book or two, like Wrox's Professional XML and XML in a Nutshell
> from O'Reilly, to mention only two of the good ones out there.

I've got myself Learning XML from ORA, which was a breath of fresh air in
comparison to SGML & XML Cookbook.

> Yes, the wheel has already been invented.  But core dumps aren't going to be
> very useful.  Do examples from a book or tutorial site, fix them til they run
> right, then start morphing them closer to what you want to do.  You don't need
> to try to understand a DOM tree from a core dump.  Learn about the api

The 4Suite DOM package dumped core on me when I was running regression tests as
part of the build. Perhaps I should try sticking with PyXML at first.

> instead.

Thanks for all the good advice.


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 10 18:33:30 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 10 Mar 2001 19:33:30 +0100
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
In-Reply-To: <3AAA4B15.E2D84D6F@lrz.uni-muenchen.de>
 (Eugene.Leitl@lrz.uni-muenchen.de)
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de> <3AAA020E.335812E@lrz.uni-muenchen.de> <001801c0a974$3d868360$7cac1218@reston1.va.home.com> <3AAA4B15.E2D84D6F@lrz.uni-muenchen.de>
Message-ID: <200103101833.f2AIXUB04062@mira.informatik.hu-berlin.de>

> Does http://4suite.org/download.epy fill the ticket? The regression
> tests of it dumped core on me at work, let's see whether I can get
> it running at home.

To install just PyXML, the download section (Latest File Releases) on

http://sourceforge.net/projects/pyxml

should be sufficient; 4suite.org offers the full 4Suite set of
libraries.

Regards,
Martin


From tpassin@home.com  Sat Mar 10 17:53:34 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 10 Mar 2001 12:53:34 -0500
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de> <3AAA020E.335812E@lrz.uni-muenchen.de> <001801c0a974$3d868360$7cac1218@reston1.va.home.com> <3AAA4B15.E2D84D6F@lrz.uni-muenchen.de>
Message-ID: <003001c0a98b$07a9f120$7cac1218@reston1.va.home.com>

<Eugene.Leitl@lrz.uni-muenchen.de> wrote -

>
> Before I try selling them on the DOM thing, I'd rather know what I'm
> doing. It cost them three days to whip up their object tree XML parser
> in Java.
>
Yes, it's easy to make a basic xml parser, and it's easy to come up with a
tree structure.  Lots of us have done something like this.  But there are a
lot of  specialized wrinkles to xml.  If you are only ever going to work with
your own xml, it may not matter.  But if you want to work with xml produced by
others, it may use features that require these wrinkles.  Your home-grown
parser and tree structure likely won't handle them all.  Handling of external
entities, namespaces, whitespace normalization, character encodings, and CDATA
sections are some of these wrinkles that can get tricky.
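
To make a few of these wrinkles concrete, here is a small sketch of a
standard parser dealing with a namespace prefix, a CDATA section, an
entity reference, and a character reference in one go. It assumes
xml.dom.minidom from the Python standard library; the namespace URI and
element names are invented.

```python
from xml.dom import minidom

# Hypothetical input combining several of the wrinkles listed above.
source = (
    '<r xmlns:c="urn:example:chem">'
    '<c:note><![CDATA[1 < 2]]> &amp; &#233;</c:note>'
    '</r>'
)
doc = minidom.parseString(source)
note = doc.documentElement.firstChild

# The prefix c: has been resolved to its namespace URI.
print(note.namespaceURI, note.localName)

# The CDATA section, &amp; and &#233; arrive as plain character data.
text = "".join(child.data for child in note.childNodes)
print(text)
```

A three-day home-grown parser would need explicit code for each of
these cases.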

Also, if you use your own tree API, you won't be able to make use of other
software that uses the DOM, like XSLT, XPath, XPointer, etc. (I'm not sure how
many of these are out yet in C++, but they will be coming).

> > 2) Creating a tree-like structure to represent the structure of the xml
> > document.
> > The DOM is an API for a tree-like representation.  Most major parsers out
> > there either include a DOM api or can work with another DOM API.  (SAX is a
> > non-DOM api, but the output of a sax processor can be used to build a tree,
> > too).  The DOM is an object oriented api.
>
> They (said cow-orkers) insist on an object tree based approach.
>

Oh, yes, a tree approach is fine for a lot of things.  Takes a lot of memory
if you have a large chunk of xml.  It isn't so much the tree as the api for it
that you probably want to concentrate on first.

Cheers,

Tom P


From ken@bitsko.slc.ut.us  Sat Mar 10 18:55:02 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 10 Mar 2001 12:55:02 -0600
Subject: [XML-SIG] dumping an XML parser skeleton from DTD input
In-Reply-To: Eugene.Leitl@lrz.uni-muenchen.de's message of "Sat, 10 Mar 2001 16:41:09 +0100"
References: <Pine.GSO.4.03.10103091800500.26053-100000@sun1.lrz-muenchen.de> <200103092127.f29LRlR00921@mira.informatik.hu-berlin.de> <3AA956DF.EAC34D7D@lrz.uni-muenchen.de> <200103100700.f2A70fK01248@mira.informatik.hu-berlin.de> <3AAA020E.335812E@lrz.uni-muenchen.de> <001801c0a974$3d868360$7cac1218@reston1.va.home.com> <3AAA4B15.E2D84D6F@lrz.uni-muenchen.de>
Message-ID: <x5bsr9qws9.fsf@jess.bitsko.slc.ut.us>

Eugene.Leitl@lrz.uni-muenchen.de writes:

> "Thomas B. Passin" wrote:
> 
> > You are mixing up several concepts or processing steps.
> 
> I realize that. It comes from being a newbie with a deadline
> breathing down my neck.
>  
> > 1) Parsing  xml.

> > This means to get hold of the structural elements of the xml
> > document and give them to another application for further
> > processing.  There are many xml parsers out there, some command
> > line and some not.  It's almost certainly not worth it to roll
> > your own.
> 
> I know that, but apparently not my senior cow-orkers. It's a C/C++
> shop with an occasional sprinkling of Java, my choice of Python is
> purely personal (note to myself: not to goof up this one).
>  
> Before I try selling them on the DOM thing, I'd rather know what I'm
> doing. It cost them three days to whip up their object tree XML
> parser in Java.
> 
> > 2) Creating a tree-like structure to represent the structure of
> > the xml document.  The DOM is an API for a tree-like
> > representation.  Most major parsers out there either include a DOM
> > api or can work with another DOM API.  (SAX is a non-DOM api, but
> > the output of a sax processor can be used to build a tree, too).
> > The DOM is an object oriented api.
> 
> They (said cow-orkers) insist on an object tree based approach.

Note that DOM objects are a raw, in-memory version of the XML document
(objects representing XML elements, attributes, text nodes).  What you
(or your coworkers) are probably wanting are normal application
objects exported and imported via XML.

The way your coworkers seem to have started is to create a unique
XML format for each application object or file, and then write
per-file importers and exporters for each format.

As you suspected, there is probably a way to refactor this code so
that you need only have one importer and exporter regardless of which
application objects or file format is used.  Your first post suggested
having some kind of "DTD compiler" that could digest a DTD and produce
a per-file "parser" for you, for reading in arbitrary XML.

Practically speaking, that's a hard problem.  The difficulty is that
each XML format is being created "by hand", unique and tweaked to each
application object, yet you're expecting some kind of compiler to
generalize the XML and re-create usable application objects from the
various uniquely designed formats.

So what's the easy way?  Instead of creating a unique format by hand
for each application object, create a set of generic encoding rules
for converting any type of object into XML, and then write a parser to
read the generic XML and convert it into objects.

SOAP is one such set of encoding rules (SOAP Section 5, to be exact),
and if you're comfortable with using the SOAP libraries to read and
write XML, I would highly recommend going that way.  The problem is
that most SOAP libraries are a little tedious to use for "just
serializing objects" (thinking of Apache Java SOAP here in
particular).

To roll your own, you just need a set of simple rules for encoding.
Here's an example XML:

  <top>
    <field1>A simple value in a record, structure, or object</field1>
    <field2 isArray="1">
      <item>A simple value in a list</item>
      <item>
        <subfield1>A simple value, in a structure, in a list</subfield1>
        <subfield2>12345</subfield2>
      </item>
    </field2>
    <field3>
      <subfieldA>A simple value, in a structure, in a structure</subfieldA>
      <subfieldB>12345</subfieldB>
    </field3>
  </top>

The rules are:

  1) If an XML element contains subelements, then the value is an
     array or a structure.

  2) The sub-element names of structures (objects) are the field, key,
     or member names of the structure or object.

  3) An array is indicated by an attribute isArray="1".

  4) The sub-element names of an array are arbitrary, so you can pick
     something like <item>.

  5) If an element has no sub-elements, then that element is a simple
     value (a string, integer, date, whatever).

I didn't put this in the example, but it's easiest to store type
information for every element, whether it be a class name on a
structure or list, or a simple value type (string, integer, date) on a
simple value.  Use an attribute like type="someType".
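
Going the other way (writing), a minimal sketch of an encoder that
follows these rules might look like this. The function name encode and
the type name "string" are assumptions made for the example; "integer"
and "float" match the type names the decoder further down looks for.

```python
from xml.sax.saxutils import escape

def encode(name, value, indent=0):
    """Encode a dict, list or simple value as XML following the rules above."""
    pad = "  " * indent
    if isinstance(value, dict):
        # rules 1 and 2: a structure; sub-element names are the keys
        body = "".join(encode(k, v, indent + 1) for k, v in value.items())
        return "%s<%s>\n%s%s</%s>\n" % (pad, name, body, pad, name)
    if isinstance(value, list):
        # rules 3 and 4: an array, flagged by isArray, items named <item>
        body = "".join(encode("item", v, indent + 1) for v in value)
        return '%s<%s isArray="1">\n%s%s</%s>\n' % (pad, name, body, pad, name)
    # rule 5: a simple value, tagged with a type attribute
    if isinstance(value, int):
        type_name = "integer"
    elif isinstance(value, float):
        type_name = "float"
    else:
        type_name = "string"
    return '%s<%s type="%s">%s</%s>\n' % (
        pad, name, type_name, escape(str(value)), name)

record = {"field1": "a < b", "field2": ["x", {"subfield2": 12345}]}
xml_text = encode("top", record)
print(xml_text)
```

Note that an empty dictionary or list has no sub-elements, so under
rule 1 a reader could not tell an empty dictionary from an empty simple
value; a real format would need a convention for that.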

Here's the relevant part of a decoder for this format, converted by
hand from the Orchard SOAP parser[1]; it should give you a start.
Note that it's not trying to decode the class names of objects, but
when you want to do that, add the code to the endElement handler in
the 'else' clause of the 'if utype is _CHAR'.

import xml.sax
import xml.sax.handler

# just constants
_DICT = "dict"
_ARRAY = "array"
_CHAR = "char"

# Subclassing ContentHandler supplies no-op defaults for the SAX
# callbacks that are not needed here (startDocument, endDocument,
# setDocumentLocator, ignorableWhitespace, ...).
class Unpickler(xml.sax.handler.ContentHandler):
    def __init__(self, file):
        xml.sax.handler.ContentHandler.__init__(self)
        self.file = file

    def load(self):
        self.parse_value_stack = [ {} ]
        self.parse_utype_stack = [ _DICT ]
        self.parse_type_stack = [ ]

        parser = xml.sax.make_parser()
        parser.setContentHandler(self)
        parser.setErrorHandler(self)
        parser.parse(self.file)
        object = self.parse_value_stack[0]
        delattr(self, 'parse_value_stack')
        return object

    def startElement(self, name, atts):
        self.chars = ""

        # the type attribute is optional; get() returns None if absent
        self.parse_type_stack.append(atts.get('type'))

        if atts.get('isArray') is not None:
            self.parse_utype_stack.append(_ARRAY)
            self.parse_value_stack.append( [ ] )
        else:
            # will be set to _DICT if a sub-element is found
            self.parse_utype_stack.append(_CHAR)

    def endElement(self, name):
        type = self.parse_type_stack.pop()
        utype = self.parse_utype_stack.pop()

        if utype is _CHAR:
            if type == 'integer':
                value = int(self.chars)
            elif type == 'float':
                value = float(self.chars)
            else:
                value = self.chars
        else:
            value = self.parse_value_stack.pop()

        # if we're in an element, and our parent element was defaulted
        # to _CHAR, then we're in a struct and we need to create that
        # dictionary.
        if self.parse_utype_stack[-1] is _CHAR:
            self.parse_value_stack.append( {} )
            self.parse_utype_stack[-1] = _DICT

        if self.parse_utype_stack[-1] is _DICT:
            self.parse_value_stack[-1][name] = value
        else:
            self.parse_value_stack[-1].append(value)

    def characters(self, chars):
        # SAX2 passes the character data as a plain string
        self.chars = self.chars + chars

    # ErrorHandler interface
    def error(self, exc): raise exc
    def fatalError(self, exc): raise exc
    def warning(self, exc): pass

In C++ or Java, you might consider having each class you expect to be
ex/imported from XML to have a constructor that accepts a dictionary
from the XML reader (to create the new object just read from XML) and
a method asDictionary() that will return the representation of the
object as a dictionary (to be written to XML).
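
The same pattern, sketched in Python to keep the examples in one
language. The class Molecule and its field names are hypothetical; the
only point is the pair of a constructor taking a dictionary and a
method returning one.

```python
class Molecule:
    """Hypothetical application class that round-trips through a dictionary."""

    def __init__(self, fields):
        # 'fields' is the dictionary produced by the XML reader
        self.name = fields["name"]
        self.atom_weight = float(fields["atomWeight"])
        self.precursors = [Molecule(p) for p in fields.get("precursors", [])]

    def as_dictionary(self):
        # the representation handed to the XML writer
        return {
            "name": self.name,
            "atomWeight": self.atom_weight,
            "precursors": [p.as_dictionary() for p in self.precursors],
        }

b = Molecule({
    "name": "B",
    "atomWeight": "16",
    "precursors": [{"name": "A", "atomWeight": "12"}],
})
round_trip = b.as_dictionary()
print(round_trip)
```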

  -- Ken

[1] <http://casbah.org/~kmacleod/orchard/SOAP.py>


From nobody@sourceforge.net  Sat Mar 10 20:35:36 2001
From: nobody@sourceforge.net (nobody)
Date: Sat, 10 Mar 2001 12:35:36 -0800
Subject: [XML-SIG] [ pyxml-Bugs-407587 ] ns_parse.py and ampersand
Message-ID: <E14bq5Q-0001K3-00@usw-sf-web2.sourceforge.net>

Bugs #407587, was updated on 2001-03-10 12:35
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407587&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Sam Lowry
Assigned to: Nobody/Anonymous
Summary: ns_parse.py and ampersand

Initial Comment:
ns_parse.py fails converting NN bookmarks if it
encounters ampersand sign in the href of a 
bookmark file, e.g. <a href="...&...">.

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407587&group_id=6473


From nobody@sourceforge.net  Sat Mar 10 20:39:52 2001
From: nobody@sourceforge.net (nobody)
Date: Sat, 10 Mar 2001 12:39:52 -0800
Subject: [XML-SIG] [ pyxml-Bugs-407588 ] broken links on pyxml homepage
Message-ID: <E14bq9Y-0001Lz-00@usw-sf-web2.sourceforge.net>

Bugs #407588, was updated on 2001-03-10 12:39
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407588&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Sam Lowry
Assigned to: Nobody/Anonymous
Summary: broken links on pyxml homepage

Initial Comment:
The bug is self-explanatory ;-)

BTW, how can I subscribe to XML-SIG group?
Link at http://www.python.org/sigs/ and
at http://pyxml.sourceforge.net/ leads nowhere...

I've made an XSL for xbel that I want to share with
others.

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407588&group_id=6473


From nobody@sourceforge.net  Sun Mar 11 04:31:06 2001
From: nobody@sourceforge.net (nobody)
Date: Sat, 10 Mar 2001 20:31:06 -0800
Subject: [XML-SIG] [ pyxml-Patches-407630 ] Fix ns_parse.py from XBEL to accept ampe
Message-ID: <E14bxVa-0000AM-00@usw-sf-web3.sourceforge.net>

Patches #407630, was updated on 2001-03-10 20:31
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=407630&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Uche Ogbuji
Assigned to: Uche Ogbuji
Summary: Fix ns_parse.py from XBEL to accept ampe

Initial Comment:
I submitted this patch before I was hacking at PyXML
itself, but I guess it vanished into the ether. 
Addresses bug at

http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407587&group_id=6473


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=407630&group_id=6473


From uche.ogbuji@fourthought.com  Sun Mar 11 04:49:54 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 10 Mar 2001 21:49:54 -0700
Subject: [XML-SIG] News on Sourceforge
Message-ID: <200103110449.VAA16481@localhost.localdomain>

The latest news on

https://sourceforge.net/projects/pyxml/

is the 0.6.1 release back in October.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From carlos@eberhardt.net  Sun Mar 11 16:59:20 2001
From: carlos@eberhardt.net (Carlos Eberhardt)
Date: Sun, 11 Mar 2001 10:59:20 CST
Subject: [XML-SIG] PyXML-0.6.4 on BeOS
Message-ID: <20010311165641.275FE813E@conn.mc.mpls.visi.com>

Hello-

Just wanted to drop a note mentioning that the setup.py script fails 
under BeOS R5.0.3 (x86) due to the expat filemap stuff. BeOS doesn't 
have mmap (I guess), so it needs to use the readfilemap.c (like the mac 
setup):

# Use either unixfilemap or readfilemap depending on the platform
if sys.platform == 'win32':
    FILEMAP_SRC = 'extensions/expat/xmlwf/win32filemap.c'
elif sys.platform[:3] == 'mac':
    FILEMAP_SRC = 'extensions/expat/xmlwf/readfilemap.c'
elif sys.platform[:4] == 'beos':
    FILEMAP_SRC = 'extensions/expat/xmlwf/readfilemap.c'
else:
    # Assume all other platforms are Unix-compatible; this is almost
    # certainly wrong. :)
    FILEMAP_SRC = 'extensions/expat/xmlwf/unixfilemap.c'

(actually, I cheated and just set the FILEMAP_SRC in the else block to 
use readfilemap.c, but I would assume adding the check for beos would do 
the trick as well)
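
For what it's worth, the same selection could be written table-driven,
so that adding another platform is a one-line change. This is only a
sketch: the helper name pick_filemap_src and the lookup table are made
up for illustration and are not part of setup.py; only the source paths
come from the snippet above.

```python
import sys

# Hypothetical table: platform-name prefix -> xmlwf file-mapping source.
# The paths are the ones quoted from setup.py above.
_FILEMAP_BY_PREFIX = (
    ("win32", "extensions/expat/xmlwf/win32filemap.c"),
    ("mac", "extensions/expat/xmlwf/readfilemap.c"),
    ("beos", "extensions/expat/xmlwf/readfilemap.c"),
)

# Fallback for everything else, assumed to be Unix-compatible (have mmap).
_DEFAULT_FILEMAP = "extensions/expat/xmlwf/unixfilemap.c"


def pick_filemap_src(platform=None):
    """Return the file-mapping source for a sys.platform-style name."""
    if platform is None:
        platform = sys.platform
    for prefix, src in _FILEMAP_BY_PREFIX:
        if platform.startswith(prefix):
            return src
    return _DEFAULT_FILEMAP


FILEMAP_SRC = pick_filemap_src()
```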

... Just FYI!

Thanks for all the hard work!

Carlos
carlos@eberhardt.net
 


From guido@digicool.com  Sun Mar 11 21:33:59 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 11 Mar 2001 16:33:59 -0500
Subject: [XML-SIG] News on Sourceforge
In-Reply-To: Your message of "Sat, 10 Mar 2001 21:49:54 MST."
 <200103110449.VAA16481@localhost.localdomain>
References: <200103110449.VAA16481@localhost.localdomain>
Message-ID: <200103112133.QAA13056@cj20424-a.reston1.va.home.com>

> The latest news on
> 
> https://sourceforge.net/projects/pyxml/
> 
> Is the 0.6.1 release back in October.

As a sworn-in developer, you should be able to submit a news item to
fix this!  Go to "News" and then click on "Submit".  If you can't,
one of the project admins (e.g. Fred, Andrew or Martin) should do it,
or they can give you permission to submit new news items by going into
the Admin page.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From uche.ogbuji@fourthought.com  Sun Mar 11 21:58:01 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 11 Mar 2001 14:58:01 -0700
Subject: [XML-SIG] News on Sourceforge
In-Reply-To: Message from Guido van Rossum <guido@digicool.com>
 of "Sun, 11 Mar 2001 16:33:59 EST." <200103112133.QAA13056@cj20424-a.reston1.va.home.com>
Message-ID: <200103112158.OAA07682@localhost.localdomain>

> > The latest news on
> > 
> > https://sourceforge.net/projects/pyxml/
> > 
> > Is the 0.6.1 release back in October.
> 
> As a sworn-in developer, you should be able to submit a news item to
> fix this!  Go to "News" and then click on "Submit".  If you can't,
> one of the project admins (e.g. Fred, Andrew or Martin) should do it,
> or they can give you permission to submit new news items by going into
> the Admin page.

Yes.  I should have completed my question.  I'm never sure what only admins 
can do and what only mere developers can.  The impression I've developed is 
that all I can do is check in code, which is why I didn't look to add the news 
items myself.

If I find that I do have permissions, I'll do so.

More importantly, it would be nice for whoever is releasing a PyXML package to 
update SF at the same time.  Of course it's hard to remember such things, so 
perhaps we need to make up a release check-list.

My first attempt:

*  Ask all developers to check in (say 72 hours before planned release)

  - Note: I actually had some fixes in my local repo that would have been nice 
to get into 0.6.4 (they're in now).  I guess I should just check in more often.

*  Check all test suites (all are in the test directory, except for 
PyXML/xml/dom/ext/reader/test_suite/Benchmark.py, which looks as if it should 
just be nuked)

*  Update any docs

*  Draft announcement

*  Update SF page

Anything else?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From guido@digicool.com  Sun Mar 11 22:32:47 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 11 Mar 2001 17:32:47 -0500
Subject: [XML-SIG] News on Sourceforge
In-Reply-To: Your message of "Sun, 11 Mar 2001 14:58:01 MST."
 <200103112158.OAA07682@localhost.localdomain>
References: <200103112158.OAA07682@localhost.localdomain>
Message-ID: <200103112232.RAA13985@cj20424-a.reston1.va.home.com>

> > As a sworn-in developer, you should be able to submit a news item to
> > fix this!  Go to "News" and then click on "Submit".  If you can't,
> > one of the project admins (e.g. Fred, Andrew or Martin) should do it,
> > or they can give you permission to submit new news items by going into
> > the Admin page.
> 
> Yes.  I should have completed my question.  I'm never sure what only
> admins can do and what only mere developers can.  The impression
> I've developed is that all I can do is check in code, which is why I
> didn't look to add the news items myself.

Actually, it's up to the admins to give the "mere" developers
additional permissions.  In the Python project, it is a policy to give
all developers all permissions -- because in our view checkin
permission (which every developer has) is more powerful than any of
the sourceforge admin things, so why not give everybody all
permissions!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Sun Mar 11 18:07:27 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 11 Mar 2001 19:07:27 +0100
Subject: [XML-SIG] PyXML-0.6.4 on BeOS
In-Reply-To: <20010311165641.275FE813E@conn.mc.mpls.visi.com>
 (carlos@eberhardt.net)
References: <20010311165641.275FE813E@conn.mc.mpls.visi.com>
Message-ID: <200103111807.f2BI7Rn03851@mira.informatik.hu-berlin.de>

> elif sys.platform[:4] == 'beos':
>     FILEMAP_SRC = 'extensions/expat/xmlwf/readfilemap.c'

Thanks, I've added this to my local copy of setup.py.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sun Mar 11 22:57:41 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 11 Mar 2001 23:57:41 +0100
Subject: [XML-SIG] News on Sourceforge
In-Reply-To: <200103112158.OAA07682@localhost.localdomain> (message from Uche
 Ogbuji on Sun, 11 Mar 2001 14:58:01 -0700)
References: <200103112158.OAA07682@localhost.localdomain>
Message-ID: <200103112257.f2BMvfh00991@mira.informatik.hu-berlin.de>

> More importantly, it would be nice for whoever is releasing a PyXML
> package to update SF at the same time.  Of course it's hard to
> remember such things, so perhaps we need to make up a release
> check-list.

I'm actually following a checklist; the one at the top of the ANNOUNCE
file. So far, none of the 0.6.x releases did *all* of the release
procedure steps; that was intentional on my part as I otherwise would
have released nothing (release early, release often).

E.g. in 0.6.4, for the first time, I put a notice on freshmeat. That
took quite some time in itself, as I had to get a freshmeat account,
find the name that freshmeat uses for the package, and update all the
outdated information (the last freshmeat announcement was in the 0.5.x
series, by amk).

As for SF announcements, after posting the 0.6.1 one, I found that it
might be pointless - only people looking at the project page will see
it, and they see what the recent release is by looking just above that
field. It might be useful to post other announcements there, e.g. when
important check-ins occur, or related software is released :-)

> *  Ask all developers to check in (say 72 hours before planned release)
> 
>   - Note: I actually had some fixes in my local repo that would have
> been nice to get into 0.6.4 (they're in now).  I guess I should just
> check in more often.

For 0.6.4, I sent a message on Feb 20 that I would be releasing it a
few days later. I got some useful feedback in response to that
message; the release was on Feb 25.


> * Check all test suites (all are in the test directory, except for
> PyXML/xml/dom/ext/reader/test_suite/Benchmark.py, which looks as if
> it should just be nuked)

I normally run all of the test directory on Linux and Solaris; this
time, I also ran it on WinNT (and noticed that the packaging would
forget the output/test_ files due to a bug in distutils).

> *  Update any docs

I normally do that before running the test suite.

> *  Draft announcement

At least for 0.6.4, that happened quite some time before that:
revisions 1.10-1.12 all deal with 0.6.4.

> *  Update SF page

So far, this is uploading only. If people feel that I should post a
news item also, I can add this to my list.

> Anything else?

* Place CVS tag on all files
* Post announcements (the one to xml-dev always returns since only
  subscribers can post, and since I was not subscribed and always forgot
  that restriction)

Regards,
Martin


From fdrake@acm.org  Mon Mar 12 02:27:48 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sun, 11 Mar 2001 21:27:48 -0500 (EST)
Subject: [XML-SIG] PyXML-0.6.4 on BeOS
In-Reply-To: <200103111807.f2BI7Rn03851@mira.informatik.hu-berlin.de>
References: <20010311165641.275FE813E@conn.mc.mpls.visi.com>
 <200103111807.f2BI7Rn03851@mira.informatik.hu-berlin.de>
Message-ID: <15020.13348.988144.898070@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > Thanks, I've added this to my local copy of setup.py.

  Then check it in!  The first thing I did when I saw the report was
to check for checkins, then make the change myself.  I saw your note
before checking in, but ... part of "release early, release often" is
"share code base updates".  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From nobody@sourceforge.net  Mon Mar 12 02:42:47 2001
From: nobody@sourceforge.net (nobody)
Date: Sun, 11 Mar 2001 18:42:47 -0800
Subject: [XML-SIG] [ pyxml-Bugs-407810 ] xmlproc chokes on lengthy comments
Message-ID: <E14cIIJ-00083M-00@usw-sf-web1.sourceforge.net>

Bugs #407810, was updated on 2001-03-11 18:42
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407810&group_id=6473

Category: xmlproc
Group: None
Status: Open
Priority: 5
Submitted By: A.M. Kuchling
Assigned to: Lars Marius Garshol
Summary: xmlproc chokes on lengthy comments

Initial Comment:
Lengthy comment blocks cause xmlproc to raise a
RuntimeError: "maximum recursion depth exceeded"
error.  The problem is that a group is used to match an
individual character, and SRE recurses 
on group repeats: '([^-]|-[^-])*'.

Fix: would '(.*?)--' be an equivalent pattern?
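
A quick way to compare the two patterns is to run both over the same
text and check that they stop at the same place. This is a sketch, not
xmlproc's actual code: the trailing "--" on the grouped pattern and the
helper name body_length are assumptions made for the comparison. Note
that "." does not match a newline by default, while "[^-]" does, so the
non-greedy form needs re.DOTALL to behave the same on multi-line
comments.

```python
import re

# Grouped form from the report, with "--" appended (an assumption) so that
# both patterns consume the same thing: the body plus the closing "--".
GROUPED = re.compile(r"([^-]|-[^-])*--")

# Suggested form: one non-greedy run up to the first "--".  DOTALL makes
# "." match newlines as well, as "[^-]" already does.
LAZY = re.compile(r"(.*?)--", re.DOTALL)


def body_length(pattern, text):
    """Length of the comment body matched at the start of text, or None."""
    m = pattern.match(text)
    return None if m is None else m.end() - 2


# A long, multi-line body containing a single (legal) "-" near the end.
body = "x" * 20000 + "\n-y"
text = body + "-->"

grouped_len = body_length(GROUPED, text)
lazy_len = body_length(LAZY, text)
```

Both patterns stop at the first "--", because a "-" can only be consumed
by the grouped form when the next character is not another "-".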


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=407810&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Mon Mar 12 07:12:32 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 12 Mar 2001 08:12:32 +0100
Subject: [XML-SIG] PyXML-0.6.4 on BeOS
In-Reply-To: <15020.13348.988144.898070@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <20010311165641.275FE813E@conn.mc.mpls.visi.com>
 <200103111807.f2BI7Rn03851@mira.informatik.hu-berlin.de> <15020.13348.988144.898070@cj42289-a.reston1.va.home.com>
Message-ID: <200103120712.f2C7CWa01347@mira.informatik.hu-berlin.de>

>   Then check it in!  The first thing I did when I saw the report was
> to check for checkins, then make the change myself.  I saw your note
> before checking in, but ... part of "release early, release often" is
> "share code base updates".  ;-)

No argument about that. Committed.

Martin


From lkcl@samba-tng.org  Mon Mar 12 13:15:46 2001
From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton)
Date: Tue, 13 Mar 2001 00:15:46 +1100
Subject: [XML-SIG] Re: tabs inside attribute values removed
In-Reply-To: <3AA90FD0.B5777324@fourthought.com>
Message-ID: <Pine.SGI.4.05.10103130006500.24229-100000@samba.org>

On Fri, 9 Mar 2001, Jeremy Kloth wrote:

> 
> 
> > i am having to pre-process all text, substituting
> > &#x09; for "\t" as a work-around for this problem.
> > 
> > if this is not performed, then all tabs inside
> > attribute's values, e.g.
> > <node attr="value\tsep\tby\ttabs"/>, are turned into
> > spaces.
> 
> Using PyXML 0.6.4, I didn't see this behavior.
> 
> from xml.dom.ext.reader import Sax2
> doc = Sax2.FromXml('<element attr="a&#x09;tab"/>')
> attr = doc.documentElement.attributes.item(0)
> print repr(attr.value)
> 'a\011tab'

it's the other way round [and this was with 0.6.2]

doc = Sax2.FromXml('<element attr="a\011tab"/>')
attr = doc.documentElement.attributes.attributes['','attr'].value

and should i be using doc.documentElement.attributes['ns','name'].value,
is that okay?

[ just checked this]

it still doesn't work, and it still doesn't work with 0.6.4.

so, yes: i have to pre-process all text, substituting \t with &#x09; which
is _not_ something i want to have to leave in the code, long-term, as you
might imagine!

some of the documents i am parsing are over 2.5mb in size, and other
people may find larger uses (see http://sourceforge.net/projects/pyxsmqll)

yes, i know: i need to move to a Sax model not a DOM one.  first
implementation, and all that :)

all best,

luke

 ----- Luke Kenneth Casson Leighton <lkcl@samba-tng.org> -----

"i want a world of dreams, run by near-sighted visionaries"
"good.  that's them sorted out.  now, on _this_ world..."


From jerome.marant@free.fr  Mon Mar 12 13:30:02 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 12 Mar 2001 14:30:02 +0100
Subject: [XML-SIG] 4DOM
Message-ID: <7zitlfi085.fsf@amboise.ird.idealx.com>

Hi,

  I made a diff between 4DOM in the 4Suite tarball and 4DOM
  in PyXML and I found many differences.

  What kind of changes have been made to it for its inclusion
  into PyXML, and are these changes to be backported to
  its original place?

  Thanks.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From martin@loewis.home.cs.tu-berlin.de  Mon Mar 12 18:42:14 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 12 Mar 2001 19:42:14 +0100
Subject: [XML-SIG] 4DOM
In-Reply-To: <7zitlfi085.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <7zitlfi085.fsf@amboise.ird.idealx.com>
Message-ID: <200103121842.f2CIgEF01274@mira.informatik.hu-berlin.de>

>   I made a diff between 4DOM in the 4Suite tarball and 4DOM
>   in PyXML and I found many differences.

What versions exactly have you been comparing?

>   What kind of changes have been made to it for its inclusion
>   into PyXML and are these changes to be backported to
>   its original place ?

PyXML *is* the original place for 4DOM. Maybe I did not say it loud
enough; here is the first item of the 0.6.4 ANNOUNCEMENT:

	* 4DOM was integrated from 4Suite 0.10.2. 4DOM is now
          maintained as a part of PyXML. A detailed list of changes can
          be found in xml/dom/ChangeLog.

Regards,
Martin


From larsga@garshol.priv.no  Mon Mar 12 20:56:14 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Mar 2001 21:56:14 +0100
Subject: [XML-SIG] [ pyxml-Bugs-407288 ] tabs inside attribute values removed
In-Reply-To: <E14bLnT-0000HR-00@usw-sf-web1.sourceforge.net>
References: <E14bLnT-0000HR-00@usw-sf-web1.sourceforge.net>
Message-ID: <m3ae6qu2oh.fsf@lambda.garshol.priv.no>

* nobody@sourceforge.net
|
| Bugs #407288, was updated on 2001-03-09 04:15
| [...]
| Initial Comment:
|
| i am having to pre-process all text, substituting
| &#x09; for "\t" as a work-around for this problem.
| 
| if this is not performed, then all tabs inside
| attribute's values, e.g.
| <node attr="value\tsep\tby\ttabs"/>, are turned into
| spaces.

This is the correct behaviour for an XML parser, as mandated by the
XML recommendation:

  <URL: http://www.w3.org/TR/REC-xml#AVNormalize >
 
| i am storing python code in an attribute value, so i
| _must_ have my tabs!!! :) :)

Then you must encode them correctly. :-)
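
To illustrate with the current standard library (a sketch using
xml.dom.minidom rather than the 4DOM Sax2 reader discussed in this
thread; the helper names attr_after_parse and encode_attr are made up):
a literal tab in an attribute value comes back as a space, a tab written
as a character reference survives, and xml.sax.saxutils.quoteattr can do
the encoding when the document is generated.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def attr_after_parse(xml_text):
    """Parse a one-element document and return its 'attr' attribute."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.getAttribute("attr")


# Literal tab in the source: normalized to a space by the parser.
literal = attr_after_parse('<node attr="a\tb"/>')

# Tab written as a character reference: preserved.
encoded = attr_after_parse('<node attr="a&#x09;b"/>')


def encode_attr(value):
    """Quote value for use as an attribute, keeping tabs and newlines."""
    # quoteattr escapes &, < and > and chooses the quote character; the
    # mapping asks for character references for whitespace as well.
    return quoteattr(value, {"\t": "&#9;", "\n": "&#10;", "\r": "&#13;"})


source = "if x:\n\treturn 1"
round_tripped = attr_after_parse("<node attr=%s/>" % encode_attr(source))
```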

--Lars M.


From msanborn@Adobe.COM  Tue Mar 13 00:25:08 2001
From: msanborn@Adobe.COM (Michael Sanborn)
Date: Mon, 12 Mar 2001 16:25:08 -0800
Subject: [XML-SIG] Problem installing PyXML-0.6.4 on W2K
Message-ID: <4.3.2.7.2.20010312161802.01ed5ee8@mailsj-v1>

I was pleased to see the binary installer for PyXML, but I'm finding that 
it comes to a screen that asks me to "Select python installation to use:" 
with a blank text pane and a greyed-out text box that I can't type into, so 
I'm stuck. I'm using a freshly installed Python 1.6.1 on d:\Python161, 
running Windows 2000. Anyone run into this problem before?

If all else fails, can I just extract the PyXML-0.6.4.tar.gz files into a 
subdirectory of d:\Python161\Lib and ignore the compiling, since there's 
already a pyexpat.pyd?

Thanks,

Michael Sanborn


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 13 04:38:25 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 13 Mar 2001 05:38:25 +0100
Subject: [XML-SIG] Problem installing PyXML-0.6.4 on W2K
In-Reply-To: <4.3.2.7.2.20010312161802.01ed5ee8@mailsj-v1>
 (msanborn@Adobe.COM)
References: <4.3.2.7.2.20010312161802.01ed5ee8@mailsj-v1>
Message-ID: <200103130438.f2D4cPK00857@mira.informatik.hu-berlin.de>

> I was pleased to see the binary installer for PyXML, but I'm finding that 
> it comes to a screen that asks me to "Select python installation to use:" 
> with a blank text pane and a greyed-out text box that I can't type into, so 
> I'm stuck. I'm using a freshly installed Python 1.6.1 on d:\Python161, 
> running Windows 2000. Anyone run into this problem before?

That is no surprise. The binary installers work for 1.5.2 and 2.0,
respectively. Nobody uses or should use Python 1.6, so I recommend
upgrading to 2.0.

> If all else fails, can I just extract the PyXML-0.6.4.tar.gz files
> into a subdirectory of d:\Python161\Lib and ignore the compiling,
> since there's already a pyexpat.pyd?

No. The pyexpat.pyd of 1.6.1 is probably horribly broken, so PyXML will
not work properly with it.

Regards,
Martin


From frank@quantiva.com  Tue Mar 13 22:08:01 2001
From: frank@quantiva.com (Frank Stolze)
Date: Tue, 13 Mar 2001 17:08:01 -0500 (EST)
Subject: [XML-SIG] SAX parsing
Message-ID: <Pine.LNX.4.30.0103131701530.2381-100000@localhost.localdomain>

Hi,


I'm trying to parse an XML stream, i.e., an "infinitely long" XML
document. I want to process XML entities in real time as they are
being read. That's why I'm using the SAX approach. However, it seems
that both the expat parser in Python 2.0 as well as the xmlproc
parser in the latest PyXML don't even start to parse until they see
an end-of-file.

Is that a "known and intended behavior" (which would be a pity as
it would make them unusable as stream parsers) or am I wrong?


Thanks,
Frank


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 13 22:37:45 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 13 Mar 2001 23:37:45 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
Message-ID: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>

Since a number of bug fixes have been committed to PyXML since 0.6.4,
I plan to release 0.6.5 sometime next week. If you have any pending
patches that you'd like to see, or if you know of bugs that you think
should be (and can be) corrected, please let me know. This will be the
last 0.6.x release, to be followed by 0.7, or by 1.0 if too many
people complain :-)

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 13 22:34:45 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 13 Mar 2001 23:34:45 +0100
Subject: [XML-SIG] SAX parsing
In-Reply-To: <Pine.LNX.4.30.0103131701530.2381-100000@localhost.localdomain>
 (message from Frank Stolze on Tue, 13 Mar 2001 17:08:01 -0500 (EST))
References: <Pine.LNX.4.30.0103131701530.2381-100000@localhost.localdomain>
Message-ID: <200103132234.f2DMYjN02770@mira.informatik.hu-berlin.de>

> I'm trying to parse an XML stream, i.e., an "infinitely long" XML
> document. I want to process XML entities in real time as they are
> being read. That's why I'm using the SAX approach. However, it seems
> that both the expat parser in Python 2.0 as well as the xmlproc
> parser in the latest PyXML don't even start to parse until they see
> an end-of-file.
> 
> Is that a "known and intended behavior" (which would be a pity as
> it would make them unusable as stream parsers) or am I wrong?

There is a SAX extension in use in PyXML, which is the incremental
parser. Not all readers are incremental parsers, but the expat reader
is. Please see xml.sax.xmlreader for details; the parse() function of
that will invoke feed() every now and then, which in turn will result
in content handler events.

If you don't see this, it might be that you have too little data
available. Or, you did something wrong, which is hard to say without
seeing any source code. To get a more reliable behaviour, you can
choose to invoke feed() yourself in a loop, by reading chunks of data
from your stream.
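
A minimal sketch of such a loop, written against the xml.sax module of
the current standard library (the handler class ElementLogger and the
function feed_in_chunks are made-up names, and the chunks stand in for
whatever your stream delivers):

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementLogger(ContentHandler):
    """Records element names as startElement events arrive."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def feed_in_chunks(chunks, handler):
    """Feed chunks to an incremental parser, one at a time.

    Returns a snapshot of handler.seen taken after each feed() call, so
    the caller can see which events were delivered before end of input.
    """
    parser = xml.sax.make_parser()  # the expat reader is incremental
    parser.setContentHandler(handler)
    progress = []
    for chunk in chunks:
        parser.feed(chunk)
        progress.append(list(handler.seen))
    parser.close()  # signals end of input
    return progress


handler = ElementLogger()
progress = feed_in_chunks(
    ['<stream><msg id="1"/>', '<msg id="2"/>', "</stream>"], handler
)
```

The handler is called from inside feed(), so events for complete
elements are available long before close() is reached.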

Regards,
Martin

P.S. If you had expected the parser to read one byte at a time, I'll
have to disappoint you: that would be so inefficient that nobody has
considered it.


From uche.ogbuji@fourthought.com  Wed Mar 14 03:06:37 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 13 Mar 2001 20:06:37 -0700
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Tue, 13 Mar 2001 23:37:45 +0100." <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>
Message-ID: <200103140306.UAA02140@localhost.localdomain>

> Since a number of bug fixes have been committed to PyXML since 0.6.4,
> I plan to release 0.6.5 sometime next week. If you have any pending
> patches that you'd like to see, or if you know of bugs that you think
> should be (and can be) corrected, please let me know. This will be the
> last 0.6.x release, to be followed by 0.7, or by 1.0 if too many
> people complain :-)

I think it makes sense to make 0.7 the first release with 4XPath and 4XSLT 
built in.  Then we can burn it in through an 0.7.x cycle and go 1.0 when we're 
happy with things?

Our plans are to release 4Suite 0.10.3 this week or early next.  Then it's 
testing, testing, testing for a month or so and 1.0 in late April.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jerome.marant@free.fr  Wed Mar 14 08:48:06 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 14 Mar 2001 09:48:06 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: Uche Ogbuji's message of "Tue, 13 Mar 2001 20:06:37 -0700"
References: <200103140306.UAA02140@localhost.localdomain>
Message-ID: <7zn1aopwhl.fsf@amboise.ird.idealx.com>

Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> I think it makes sense to make 0.7 the first release with 4XPath and 4XSLT
> built in.  Then we can burn it in through an 0.7.x cycle and go 1.0 when we're
> happy with things?

  BTW, do you plan to merge 4Suite and PyXML? It seems that a growing number
  of 4Suite components are integrated into PyXML ...
  What is the future of 4Suite?

  Thanks.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From uche.ogbuji@fourthought.com  Wed Mar 14 13:42:05 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Mar 2001 06:42:05 -0700
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: Message from jerome.marant@free.fr (Jérôme Marant)
 of "14 Mar 2001 09:48:06 +0100." <7zn1aopwhl.fsf@amboise.ird.idealx.com>
Message-ID: <200103141342.GAA03805@localhost.localdomain>

> Uche Ogbuji <uche.ogbuji@fourthought.com> writes:
>
> > I think it makes sense to make 0.7 the first release with 4XPath and 4XSLT
> > built in.  Then we can burn it in through an 0.7.x cycle and go 1.0 when we're
> > happy with things?
>
>   BTW, do you plan to merge 4Suite and PyXML?

No.  But I don't think it's a good idea to do so anyway.

For one thing, not all of 4Suite is relevant to PyXML.  For instance, 4ODS
probably wouldn't fit.

But also, I think it has worked quite well for the technology to be incubated
in 4Suite, and the parts that are of broadest use for Python XML users to
migrate to PyXML.  I see 4Suite as a sort of PyXML++ for those who want the
kittin' kaboodle of XML tools.

>   It seems that a growing number
>   of 4Suite components are integrated into PyXML ...

Yes, but in some cases there is more to it than simple migration.  For
instance, we'll be moving 4XPath and 4XSLT to PyXML, but we'll be developing
from scratch a new XSLT implementation that will live in 4Suite 1.1 and higher
as an alternative to 4XSLT.  That way Python will have a mature
implementation, and an improved, but experimental implementation.

>   What is the future of 4Suite?

1.0 probably in late April, which will be mostly what's in CVS now with
bug-fixes.  Then 4Suite 1.0.x is maintained as a bug-fix branch while 4XPath
and 4XSLT are removed from a 1.1 development branch and the new XSLT processor
introduced.

So 4Suite will keep on, although we will move over to PyXML whatever makes
sense and has consensus (there was much discussion about moving 4XPath and
4XSLT in almost a year ago, but the timing makes more sense now).

One note is that since 4XSLT includes PyXML, all this migration should be
relatively transparent to the end user (although it can make for some extra
work for distributors).


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jerome.marant@free.fr  Wed Mar 14 14:40:13 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 14 Mar 2001 15:40:13 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: Uche Ogbuji's message of "Wed, 14 Mar 2001 06:42:05 -0700"
References: <200103141342.GAA03805@localhost.localdomain>
Message-ID: <7z8zm8pg6q.fsf@amboise.ird.idealx.com>

Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> No.  But I don't think it's a good idea to do so anyway.
>
> For one thing, not all of 4Suite is relevant to PyXML.  For instance, 4ODS
> probably wouldn't fit.

  I agree.

> Yes, but in some cases there is more to it than simple migration.  For
...
>
> So 4Suite will keep on, although we will move over to PyXML whatever makes
> sense and has consensus (there was much discussion about moving 4XPath and
> 4XSLT in almost a year ago, but the timing makes more sense now).

  Right.

  I'm the Debian maintainer of the PyXML package and I'm working on
  packaging 4Suite for Debian (BTW, do you agree with it?). So, I have
  to remove PyXML and 4DOM sections as they are provided by the PyXML
  package (4Suite depends on PyXML) and I was wondering whether I'd
  have to remove more and more components :-)

  Thanks for these explanations.

  Cheers,

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From Alexandre.Fayolle@logilab.fr  Wed Mar 14 14:56:19 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Mar 2001 15:56:19 +0100 (CET)
Subject: [XML-SIG] packaging 4Suite
In-Reply-To: <7z8zm8pg6q.fsf@amboise.ird.idealx.com>
Message-ID: <Pine.LNX.4.21.0103141552010.12134-100000@leo.logilab.fr>

On 14 Mar 2001, Jérôme Marant wrote:

>   I'm the Debian maintainer of the PyXML package and I'm working on
>   packaging 4Suite for Debian (BTW, do you agree with it?). So, I have
>   to remove PyXML and 4DOM sections as they are provided by the PyXML
>   package (4Suite depends on PyXML) and I was wondering whether I'd
>   have to remove more and more components :-)

Hmm, would it not be easier to have the 4Suite debian package "provide"
PyXML, and maybe make both packages conflict (so that one would have to
choose between PyXML and 4Suite, knowing that the latter is a strict
superset of the former).

This said, I'm by no means an expert on the Debian policies...

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From jerome.marant@free.fr  Wed Mar 14 15:03:26 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 14 Mar 2001 16:03:26 +0100
Subject: [XML-SIG] Re: packaging 4Suite
In-Reply-To: Alexandre Fayolle's message of "Wed, 14 Mar 2001 15:56:19 +0100 (CET)"
References: <Pine.LNX.4.21.0103141552010.12134-100000@leo.logilab.fr>
Message-ID: <7zk85so0jl.fsf@amboise.ird.idealx.com>

Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> writes:

> Hmm, would it not be easier to have the 4Suite debian package "provide"
> PyXML, and maybe make both packages conflict (so that one would have to
> choose between PyXML and 4Suite, knowing that the latter is a strict
> superset of the former).

  I don't agree. It is clear, according to the 4Suite documentation,
  that 4Suite depends on PyXML and extracting PyXML from 4Suite
  allows this package to use the latest bugfixed PyXML without needing
  4Suite to be updated anytime that PyXML changes.

  Cheers,

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From akuchlin@mems-exchange.org  Wed Mar 14 15:33:17 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 14 Mar 2001 10:33:17 -0500
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Mar 13, 2001 at 11:37:45PM +0100
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>
Message-ID: <20010314103317.C15434@ute.cnri.reston.va.us>

On Tue, Mar 13, 2001 at 11:37:45PM +0100, Martin v. Loewis wrote:
>patches that you'd like to see, or if you know of bugs that you think
>should be (and can be) corrected, please let me know. This will be the

If my suggested fix for bug #407810 in xmlproc is correct, it would be
trivial to fix.  If it's not, this might be more difficult to fix.

        Lengthy comment blocks cause xmlproc to raise a 
        RuntimeError: "maximum recursion depth exceeded" 
        error. The problem is that a group is used to match an 
        individual character, and SRE recurses 
        on group repeats: '([^-]|-[^-])*'. 

        Fix: would '(.*?)--' be an equivalent pattern? 

--amk


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 14 20:47:57 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 14 Mar 2001 21:47:57 +0100
Subject: [XML-SIG] Re: packaging 4Suite
In-Reply-To: <7zk85so0jl.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <Pine.LNX.4.21.0103141552010.12134-100000@leo.logilab.fr> <7zk85so0jl.fsf@amboise.ird.idealx.com>
Message-ID: <200103142047.f2EKlvf01502@mira.informatik.hu-berlin.de>

>   I don't agree. It is clear, according to the 4Suite documentation,
>   that 4Suite depends on PyXML and extracting PyXML from 4Suite
>   allows this package to use the latest bugfixed PyXML without
>   needing 4Suite to be updated anytime that PyXML changes.

That is up to your packaging. In theory, you are right: 4Suite is
meant as a strict superset. In practice, there is more dependence
between the two than we'd like, at least at the moment. 4Suite needs
*at least* the most recent snapshot of PyXML, and I personally cannot
guarantee that future releases of PyXML won't break older 4Suite
releases (although there is a clear intention to be backwards
compatible if possible).

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 14 20:38:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 14 Mar 2001 21:38:31 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <7zn1aopwhl.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <200103140306.UAA02140@localhost.localdomain> <7zn1aopwhl.fsf@amboise.ird.idealx.com>
Message-ID: <200103142038.f2EKcV701477@mira.informatik.hu-berlin.de>

>   BTW, do you plan to merge 4Suite and PyXML? It seems that a
>   growing number of 4Suite components are integrated into PyXML ...
>   What is the future of 4Suite?

Uche has already explained his view, so let me add mine. Personally, I
feel that PyXML "owns" the xml package, and I see my responsibility in
getting all the components in it to work together. That is what makes
integrating xml.xpath and xml.xslt interesting (although there
certainly also is the challenge of doing it in pure Python which makes
it interesting). 

For everything in Ft.*, I won't push integration into PyXML. From the
XML point of view, that means that the Domlettes may never show up in
PyXML - unless Fourthought wants to contribute them. There is other
stuff in 4Suite that clearly does not belong in PyXML, either.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 14 20:43:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 14 Mar 2001 21:43:49 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <7z8zm8pg6q.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <200103141342.GAA03805@localhost.localdomain> <7z8zm8pg6q.fsf@amboise.ird.idealx.com>
Message-ID: <200103142043.f2EKhn201479@mira.informatik.hu-berlin.de>

>   I'm the Debian maintainer of the PyXML package and I'm working on
>   packaging 4Suite for Debian (BTW, do you agree with it?). So, I have
>   to remove PyXML and 4DOM sections as they are provided by the PyXML
>   package (4Suite depends on PyXML) and I was wondering whether I'd
>   have to remove more and more components :-)

At the moment, you have two options:

a) you can declare PyXML as a prerequisite of 4Suite; in that case,
   I'd appreciate it if you'd restrict yourself to released versions
   of PyXML only - no matter how broken they are.

b) you can declare PyXML and 4Suite to be conflicting packages (don't
   know whether this is possible in Debian packaging); your 4Suite
   package would then incorporate a copy of PyXML. If you follow this
   route, you can choose whatever state of PyXML is useful; just
   make sure that either PyXML or 4Suite properly supersedes any
   Python 2 package that might be also available (but I know that
   Debian refuses to offer Python 2 for political reasons)

Regards,
Martin

P.S. No, I don't mean to start a flame war on licensing :-) Python
licensing will hopefully sort out with 2.1.


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 14 21:32:44 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 14 Mar 2001 22:32:44 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <20010314103317.C15434@ute.cnri.reston.va.us> (message from
 Andrew Kuchling on Wed, 14 Mar 2001 10:33:17 -0500)
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <20010314103317.C15434@ute.cnri.reston.va.us>
Message-ID: <200103142132.f2ELWiM02058@mira.informatik.hu-berlin.de>

> If my suggested fix for bug #407810 in xmlproc is correct, it would be
> trivial to fix.  If it's not, this might be more difficult to fix.
> 
>         Lengthy comment blocks cause xmlproc to raise a 
>         RuntimeError: "maximum recursion depth exceeded" 
>         error. The problem is that a group is used to match an 
>         individual character, and SRE recurses 
>         on group repeats: '([^-]|-[^-])*'. 
> 
>         Fix: would '(.*?)--' be an equivalent pattern? 

I must admit that *? was new to me, but it appears to be extremely
useful, and that appears to be the right use for it. IOW, I think your
fix is correct (and probably more efficient in day-to-day use, also).

Regards,
Martin

P.S. Could you take another look at the patches that have been
assigned to you; if not, can you unassign them?

P.P.S. Recently, I could not assign anything to None on SF, so the
last "can" is not only "are you willing to", but also "are you capable
of" :-?


From akuchlin@mems-exchange.org  Wed Mar 14 21:41:45 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 14 Mar 2001 16:41:45 -0500
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <200103142132.f2ELWiM02058@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Wed, Mar 14, 2001 at 10:32:44PM +0100
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <20010314103317.C15434@ute.cnri.reston.va.us> <200103142132.f2ELWiM02058@mira.informatik.hu-berlin.de>
Message-ID: <20010314164145.K15434@ute.cnri.reston.va.us>

On Wed, Mar 14, 2001 at 10:32:44PM +0100, Martin v. Loewis wrote:
>P.S. Could you take another look at the patches that have been
>assigned to you; if not, can you unassign them?

I thought that "[#403408] xml/marshal/wddx.py mods" was being revised
by the author.  The patch is dated Jan. 24, but there are subsequent
discussions about revising them further, and I thought the patch was
on hold pending further changes.  I've added a comment asking Robin if
I should just check in the current patches.  (Annoying thing about
SF's new patch mailings: I have no idea who the notifications are
going to; is Robin even seeing them?)

>P.P.S. Recently, I could not assign anything to None on SF, so the
>last "can" is not only "are you willing to", but also "are you capable
>of" :-?

It does work for me; I unassigned the WDDX patches, and then promptly
assigned them back to me.

--amk


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 14 22:01:38 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 14 Mar 2001 23:01:38 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <20010314164145.K15434@ute.cnri.reston.va.us> (message from
 Andrew Kuchling on Wed, 14 Mar 2001 16:41:45 -0500)
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <20010314103317.C15434@ute.cnri.reston.va.us> <200103142132.f2ELWiM02058@mira.informatik.hu-berlin.de> <20010314164145.K15434@ute.cnri.reston.va.us>
Message-ID: <200103142201.f2EM1cp02263@mira.informatik.hu-berlin.de>

> Annoying thing about SF's new patch mailings: I have no idea who the
> notifications are going to; is Robin even seeing them?

I think so, yes: everybody who ever made a comment to the issue, plus
the submitter, plus the responsible developer gets a copy (that often
meant I get multiple copies - the algorithm appears to play on the
safe side). I agree SF should show *whom* it sends a message to.

Regards,
Martin


From larsga@garshol.priv.no  Wed Mar 14 23:11:14 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 15 Mar 2001 00:11:14 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <20010314103317.C15434@ute.cnri.reston.va.us>
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <20010314103317.C15434@ute.cnri.reston.va.us>
Message-ID: <m3k85st08d.fsf@lambda.garshol.priv.no>

* Andrew Kuchling
| 
| If my suggested fix for bug #407810 in xmlproc is correct, it would
| be trivial to fix.  If it's not, this might be more difficult to
| fix.

I don't think it is correct, but I need to look more closely at it.
I'm in the process of doing so now, but am somewhat hampered by not
having my test suite working.

--Lars M.


From larsga@garshol.priv.no  Thu Mar 15 00:17:52 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 15 Mar 2001 01:17:52 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <20010314103317.C15434@ute.cnri.reston.va.us>
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <20010314103317.C15434@ute.cnri.reston.va.us>
Message-ID: <m3itlbubpr.fsf@lambda.garshol.priv.no>

* Andrew Kuchling
| 
| If my suggested fix for bug #407810 in xmlproc is correct, it would
| be trivial to fix.  If it's not, this might be more difficult to
| fix.

The fix turned out to be wrong, but luckily the problem wasn't very
hard to fix.

I've fixed it now both in my CVS tree and in the PyXML CVS tree.

I've also done most of the hard work in cleaning up the test suite and
making it ready for a move to the PyXML test suite. I hope to be able
to do the rest soon.

--Lars M.


From uche.ogbuji@fourthought.com  Thu Mar 15 00:28:19 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Mar 2001 17:28:19 -0700
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: Message from jerome.marant@free.fr (Jérôme Marant)
 of "14 Mar 2001 15:40:13 +0100." <7z8zm8pg6q.fsf@amboise.ird.idealx.com>
Message-ID: <200103150028.RAA18167@localhost.localdomain>

> > Yes, but in some cases there is more to it than simple migration.  For
> ...
> >
> > So 4Suite will keep on, although we will move over to PyXML whatever makes
> > sense and has consensus (there was much discussion about moving 4XPath and
> > 4XSLT in almost a year ago, but the timing makes more sense now).
>
>   Right.
>
>   I'm the Debian maintainer of the PyXML package and I'm working on
>   packaging 4Suite for Debian (BTW, do you agree with it?).

Absolutely.  I thank you.

>   So, I have
>   to remove PyXML and 4DOM sections as they are provided by the PyXML
>   package (4Suite depends on PyXML) and I was wondering whether I'd
>   have to remove more and more components :-)

I'm sorry the 4Suite/PyXML combo causes headaches for distributors, but as
Martin suggests, I think there are workarounds.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Thu Mar 15 06:25:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 15 Mar 2001 07:25:36 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <m3itlbubpr.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 15 Mar 2001 01:17:52 +0100)
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <20010314103317.C15434@ute.cnri.reston.va.us> <m3itlbubpr.fsf@lambda.garshol.priv.no>
Message-ID: <200103150625.f2F6PaT01170@mira.informatik.hu-berlin.de>

> The fix turned out to be wrong, but luckily the problem wasn't very
> hard to fix.

Thanks for looking into this. Doing a forward string search looks like
the better solution, anyway.
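
The actual xmlproc change is not shown in this thread. As a hypothetical sketch only, a forward string search for the end of a comment could look like the following; the function name `find_comment_end` and the sample data are assumptions made for illustration. The point is that `str.find` scans ahead without any per-character group repetition, so the length of the comment does not affect recursion depth.

```python
def find_comment_end(data, start):
    """Return the index just past the first '--' at or after `start`.

    Returns -1 if no '--' is found (e.g. the comment is unterminated).
    Hypothetical helper, not xmlproc's real code.
    """
    pos = data.find("--", start)
    return -1 if pos == -1 else pos + 2


# A lengthy comment block of the kind described in the bug report.
data = "<!-- " + "x" * 100000 + " -->"
end = find_comment_end(data, 4)      # start searching after "<!--"
print(data[end] == ">")              # a well-formed comment closes here
```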

Regards,
Martin


From jerome.marant@free.fr  Thu Mar 15 09:31:31 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 15 Mar 2001 10:31:31 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: "Martin v. Loewis"'s message of "Wed, 14 Mar 2001 21:43:49 +0100"
References: <200103141342.GAA03805@localhost.localdomain> <7z8zm8pg6q.fsf@amboise.ird.idealx.com> <200103142043.f2EKhn201479@mira.informatik.hu-berlin.de>
Message-ID: <7zu24ve5u4.fsf@amboise.ird.idealx.com>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> At the moment, you have two options:
>
> a) you can declare PyXML as a prerequisite of 4Suite; in that case,
>    I'd appreciate if you'd restrict to released versions of PyXML only
>    - no matter how broken they are.

  This option is the most elegant, IMHO, and the one I chose. That way,
  you avoid bloat by providing strictly separate components, and
  you do not prevent 4Suite users from using the latest bugfixed version
  of PyXML.
  I'm trying to follow what is happening on the list to keep informed,
  and it is my job to make decisions when something is broken: I
  can easily make changes to packages.

  Cheers,

> P.S. No, I don't mean to start a flame war on licensing :-) Python
> licensing will hopefully sort out with 2.1.

  I don't like flamewars either. We are glad to see that this
  problem will be worked out in 2.1 (as it was recently with 1.6.1).
  Until then, we must have multiple versions of packages for both
  1.5.x and 2.0.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From jerome.marant@free.fr  Fri Mar 16 08:50:24 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 16 Mar 2001 09:50:24 +0100
Subject: [XML-SIG] setup.py question
In-Reply-To: "Martin v. Loewis"'s message of "Tue, 13 Mar 2001 23:37:45 +0100"
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>
Message-ID: <7z8zm66qsv.fsf@amboise.ird.idealx.com>

Hi,

  Is there a good reason for installing PyXML as _xmlplus
  for Python 2.0 rather than as xml, as for previous versions?
  This change is breaking applications that use
  import xml.

  Thanks.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From Alexandre.Fayolle@logilab.fr  Fri Mar 16 09:24:29 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Fri, 16 Mar 2001 10:24:29 +0100 (CET)
Subject: [XML-SIG] setup.py question
In-Reply-To: <7z8zm66qsv.fsf@amboise.ird.idealx.com>
Message-ID: <Pine.LNX.4.21.0103161008250.1964-100000@leo.logilab.fr>

On 16 Mar 2001, Jérôme Marant wrote:

> 
> Hi,
> 
>   Is there a good reason for installing PyXML in the _xmlplus
>   for Python 2.0 rather than xml for the previous versions ?
>   This change is breaking applications which are using
>   import xml. 

Using xml would conflict with the core xml package in Python 2.0. There is
a change in the __init__.py of the core xml package which checks for
_xmlplus and uses it if it is found, so this should not break Python 1.5
applications using import xml to import PyXML.
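
The general technique (a package's __init__.py rebinding its own entry in `sys.modules` to an add-on package when one is installed) can be sketched as follows. This is an illustration under assumptions, not the code shipped in Python 2.0: the module names `demo_xml` and `_demo_xmlplus`, the helper `prefer_override`, and the stand-in module objects are all hypothetical.

```python
import importlib
import sys
import types


def prefer_override(package_name, override_name):
    """Rebind `package_name` in sys.modules to `override_name` if it imports.

    Returns True if the override was installed, False otherwise.
    """
    try:
        override = importlib.import_module(override_name)
    except ImportError:
        return False              # no add-on installed: keep the core package
    sys.modules[package_name] = override
    return True


# Stand-ins for the core package and the add-on package.
core = types.ModuleType("demo_xml")
core.origin = "core"
addon = types.ModuleType("_demo_xmlplus")
addon.origin = "add-on"
sys.modules["demo_xml"] = core
sys.modules["_demo_xmlplus"] = addon

replaced = prefer_override("demo_xml", "_demo_xmlplus")

import demo_xml                   # resolved through sys.modules
print(replaced, demo_xml.origin)
```

After the rebinding, code that says `import demo_xml` receives the add-on without changing its import statement, which is the effect described above for `import xml`.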

The issue was discussed in August (a little) and September 2000 (a lot),
and kept the list busy for quite a while. You may want to check the
archives
(http://mail.python.org/pipermail/xml-sig/2000-September/thread.html). The
threads were 'Python Package Name', 'Uniform interface with Python 2.0',
'namespace collision between lib/xml and site-packages/xml'.

Cheers.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From johann@egenetics.com  Fri Mar 16 09:23:12 2001
From: johann@egenetics.com (Johann Visagie)
Date: Fri, 16 Mar 2001 11:23:12 +0200
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <7z8zm8pg6q.fsf@amboise.ird.idealx.com>; from jerome.marant@free.fr on Wed, Mar 14, 2001 at 03:40:13PM +0100
References: <200103141342.GAA03805@localhost.localdomain> <7z8zm8pg6q.fsf@amboise.ird.idealx.com>
Message-ID: <20010316112311.E4464@fling.sanbi.ac.za>

Jérôme Marant on 2001-03-14 (Wed) at 15:40:13 +0100:
> 
>   I'm the Debian maintainer of the PyXML package and I'm working on
>   packaging 4Suite for Debian (BTW, do you agree with it?). So, I have
>   to remove PyXML and 4DOM sections as they are provided by the PyXML
>   package (4Suite depends on PyXML) and I was wondering whether I'd
>   have to remove more and more components :-)

I'm glad to see I'm not the only one having these problems.  :-)  I took over
maintainership of the PyXML port in the FreeBSD ports tree last November.
Since then, we managed to solve some subtle dependency problems caused by
PyXML installing in different locations under Python 2.0 and earlier
versions, but I have yet to face up to the monster that is the proper
integration of 4Suite and PyXML ports.  Currently, therefore, FreeBSD has no
4Suite port.  I hope this will change soon.  :-)

This thread has been most informative, thanks.

-- Johann


From jerome.marant@free.fr  Fri Mar 16 10:02:46 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 16 Mar 2001 11:02:46 +0100
Subject: [XML-SIG] setup.py question
In-Reply-To: Alexandre Fayolle's message of "Fri, 16 Mar 2001 10:24:29 +0100 (CET)"
References: <Pine.LNX.4.21.0103161008250.1964-100000@leo.logilab.fr>
Message-ID: <7zitla58vt.fsf@amboise.ird.idealx.com>

Alexandre Fayolle <Alexandre.Fayolle@logilab.fr> writes:

> Using xml would conflict with the core xml module in Python 2.0. There is
> a change in the __init__.py of the core xml package which checks for
> _xmlplus and uses it if it is found, so this should not break Python 1.5
> application using import xml to import PyXML.

Thanks !

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From jerome.marant@free.fr  Fri Mar 16 10:05:48 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 16 Mar 2001 11:05:48 +0100
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: Johann Visagie's message of "Fri, 16 Mar 2001 11:23:12 +0200"
References: <200103141342.GAA03805@localhost.localdomain> <7z8zm8pg6q.fsf@amboise.ird.idealx.com> <20010316112311.E4464@fling.sanbi.ac.za>
Message-ID: <7zelvy58qr.fsf@amboise.ird.idealx.com>

Johann Visagie <johann@egenetics.com> writes:

> I'm glad to see I'm not the only one having these problems.  :-)  I took over
> maintainership of the PyXML port in the FreeBSD ports tree last November.
> Since then, we managed to solve some subtle dependency problems caused by
> PyXML installing in different locations under Python 2.0 and earlier

  At the moment we are able to install both 1.5 and 2.0 on the same
  Debian system, so we provide 1.5 and 2.0 versions of the
  same package.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From uche.ogbuji@fourthought.com  Fri Mar 16 12:53:13 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 16 Mar 2001 05:53:13 -0700
Subject: [XML-SIG] Preparing for PyXML 0.6.5
References: <200103141342.GAA03805@localhost.localdomain> <7z8zm8pg6q.fsf@amboise.ird.idealx.com> <20010316112311.E4464@fling.sanbi.ac.za>
Message-ID: <3AB20CB9.CBE80F17@fourthought.com>

Johann Visagie wrote:
>
> Jérôme Marant on 2001-03-14 (Wed) at 15:40:13 +0100:
> >
> >   I'm the Debian maintainer of the PyXML package and I'm working on
> >   packaging 4Suite for Debian (BTW, do you agree with it?). So, I have
> >   to remove PyXML and 4DOM sections as they are provided by the PyXML
> >   package (4Suite depends on PyXML) and I was wondering whether I'd
> >   have to remove more and more components :-)
>
> I'm glad to see I'm not the only one having these problems.  :-)  I took over
> maintainership of the PyXML port in the FreeBSD ports tree last November.
> Since then, we managed to solve some subtle dependency problems caused by
> PyXML installing in different locations under Python 2.0 and earlier
> versions, but I have yet to face up to the monster that is the proper
> integration of 4Suite and PyXML ports.  Currently, therefore, FreeBSD has no
> 4Suite port.  I hope this will change soon.  :-)

Actually, this is not true.

See

http://www.4suite.org/FAQ.epy#1.1

Of course the latest version is 0.10.1, but it looks as if Peter was
able to tackle the problems.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From johann@egenetics.com  Fri Mar 16 13:33:16 2001
From: johann@egenetics.com (Johann Visagie)
Date: Fri, 16 Mar 2001 15:33:16 +0200
Subject: [XML-SIG] Preparing for PyXML 0.6.5
In-Reply-To: <3AB20CB9.CBE80F17@fourthought.com>; from uche.ogbuji@fourthought.com on Fri, Mar 16, 2001 at 05:53:13AM -0700
References: <200103141342.GAA03805@localhost.localdomain> <7z8zm8pg6q.fsf@amboise.ird.idealx.com> <20010316112311.E4464@fling.sanbi.ac.za> <3AB20CB9.CBE80F17@fourthought.com>
Message-ID: <20010316153316.A17768@fling.sanbi.ac.za>

Uche Ogbuji on 2001-03-16 (Fri) at 05:53:13 -0700:
> 
> > but I have yet to face up to the monster that is the proper
> > integration of 4Suite and PyXML ports.  Currently, therefore, FreeBSD has no
> > 4Suite port.  I hope this will change soon.  :-)
> 
> Actually, this is not true.
> 
> See
> 
> http://www.4suite.org/FAQ.epy#1.1

Hmm.  This port has not been committed to the FreeBSD ports tree, and is
therefore not an "official" FreeBSD port.  (FreeBSD ports are installed as
part of the OS in /usr/ports, and most FreeBSD users would update their ports
tree regularly via CVSup or similar.)

I now notice that it has been submitted several times, but looking at it I
would guess the reason why it has not been committed is that it suffers from
the same problems Jérôme originally mentioned.  For instance, it does not
attempt peaceful cohabitation with the PyXML port.

-- Johann


From fdrake@acm.org  Fri Mar 16 13:33:59 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 16 Mar 2001 08:33:59 -0500 (EST)
Subject: [XML-SIG] setup.py question
In-Reply-To: <7z8zm66qsv.fsf@amboise.ird.idealx.com>
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>
 <7z8zm66qsv.fsf@amboise.ird.idealx.com>
Message-ID: <15026.5703.555244.547926@cj42289-a.reston1.va.home.com>

Jérôme Marant writes:
 >   Is there a good reason for installing PyXML in the _xmlplus
 >   for Python 2.0 rather than xml for the previous versions ?
 >   This change is breaking applications which are using
 >   import xml.

  Python 2.0 provides an "xml" package already, but PyXML is an
upgrade to that package.  PyXML should be installing as "_xmlplus" for
Python 2.0+, and as "xml" for all older versions of Python.
  Can you detail the combination of releases that breaks for you?
  Thanks!
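
The naming rule just stated can be written down as a small function. This is a hypothetical helper for illustration only; it is not taken from PyXML's setup.py, and the name `pyxml_package_name` is an assumption.

```python
import sys


def pyxml_package_name(version_info=sys.version_info):
    """Return the top-level package name for a given Python version.

    "_xmlplus" for Python 2.0 and later, "xml" for older versions.
    """
    major_minor = tuple(version_info[:2])
    return "_xmlplus" if major_minor >= (2, 0) else "xml"


print(pyxml_package_name((1, 5, 2)))   # a Python 1.5.2 installation
print(pyxml_package_name((2, 0)))      # a Python 2.0 installation
```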


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From jerome.marant@free.fr  Fri Mar 16 14:37:04 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 16 Mar 2001 15:37:04 +0100
Subject: [XML-SIG] setup.py question
In-Reply-To: "Fred L. Drake, Jr."'s message of "Fri, 16 Mar 2001 08:33:59 -0500 (EST)"
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <7z8zm66qsv.fsf@amboise.ird.idealx.com> <15026.5703.555244.547926@cj42289-a.reston1.va.home.com>
Message-ID: <7zbsr14w6n.fsf@amboise.ird.idealx.com>

"Fred L. Drake, Jr." <fdrake@acm.org> writes:
>   Python 2.0 provides an "xml" package already, but PyXML is an
> upgrade to that package.  PyXML should be installing as "_xmlplus" for
> Python 2.0+, and as "xml" for all older versions of Python.
>   Can you detail the combination of releases that breaks for you?
>   Thanks!

  Well, I can see the problem now. It is related to the way the
  interpreter is packaged in Debian: we usually split packages into
  several parts (thematically) in order not to bloat the system.
  For instance, python2-xmlbase contains the core xml library.
  My problem is that I made pyxml conflict with python2-xmlbase
  so that we cannot have two xml implementations at a time.
  As a result, applications break because import xml no longer
  works.

  After reading your remark, would you say that the core xml
  package is mandatory for pyxml? If so, I can safely
  remove the "conflict". If not, I'll have to rename _xmlplus
  to xml.

  Thanks !

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From fdrake@acm.org  Fri Mar 16 15:12:06 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 16 Mar 2001 10:12:06 -0500 (EST)
Subject: [XML-SIG] setup.py question
In-Reply-To: <7zbsr14w6n.fsf@amboise.ird.idealx.com>
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>
 <7z8zm66qsv.fsf@amboise.ird.idealx.com>
 <15026.5703.555244.547926@cj42289-a.reston1.va.home.com>
 <7zbsr14w6n.fsf@amboise.ird.idealx.com>
Message-ID: <15026.11590.178156.375665@localhost.localdomain>

Jérôme Marant writes:
 >   Well, I can see the problem now. It is related to the way the
 >   interpreter is packaged in Debian: we usually split packages in
 >   several parts (thematically) in order not to bloat the system.
 >   For instance, python2-xmlbase contains the core xml library.
 >   My problem is that I made pyxml conflict with python2-xmlbase
 >   so that we cannot have 2 xml implementations at a time.
 >   So, it breaks applications since import xml does not work any
 >   more.

  Hmm... the reason for moving some of it into the core was to ensure
that all installations have at least basic XML support if pyexpat
could compile; is pyexpat part of xmlbase?  (And not all of the xml
package depends on pyexpat; even using another parser, the xml.dom and
xml.sax packages provide the needed exceptions and constants for a
number of the XML APIs.  The node type constants are one example; such
things need to be found in a single location for fully general API
compatibility.)
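
As an illustration of the shared constants, the sketch below uses `xml.dom.Node` and `xml.dom.minidom` as they exist in the current standard library; the sample document and the helper `describe` are made up for this example. Code written against the constants does not need to know which DOM implementation produced the nodes.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString("<greeting>hello</greeting>")
root = doc.documentElement
text = root.firstChild


def describe(node):
    """Classify a node using only the constants from xml.dom.Node."""
    if node.nodeType == xml.dom.Node.ELEMENT_NODE:
        return "element"
    if node.nodeType == xml.dom.Node.TEXT_NODE:
        return "text"
    return "other"


kinds = [describe(root), describe(text), describe(doc)]
print(kinds)
```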

 >   After reading your remark, would you say that the core xml
 >   package is mandatory for pyxml ? If so, I can peacefully
 >   remove the "conflict". If not, I'll have to rename _xmlplus
 >   to xml.

  I'd make xmlbase mandatory for PyXML; it includes the magic needed
to let PyXML take precedence if present.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Fri Mar 16 17:41:05 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 16 Mar 2001 18:41:05 +0100
Subject: [XML-SIG] setup.py question
In-Reply-To: <7zbsr14w6n.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de> <7z8zm66qsv.fsf@amboise.ird.idealx.com> <15026.5703.555244.547926@cj42289-a.reston1.va.home.com> <7zbsr14w6n.fsf@amboise.ird.idealx.com>
Message-ID: <200103161741.f2GHf5I00943@mira.informatik.hu-berlin.de>

>   After reading your remark, would you say that the core xml
>   package is mandatory for pyxml ?

As far as your packaging strategy is concerned, it is. But like Fred, I'd
strongly discourage splitting the core Python distribution. The major
strength of Python is the "batteries included" aspect. So for the
Python core, I'd rather encourage an all-or-nothing position.

Do you have any feedback on how many administrators would choose a
"partial" installation? How could an administrator know what kind of
libraries her users need? Sometimes, offering choices only complicates
matters instead of simplifying them. People will show up on
python-help or python-tutor and ask what happened to the supposed XML
support of Python 2.0, since they did not get it on their system.

Regards,
Martin


From fdrake@acm.org  Fri Mar 16 15:03:22 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 16 Mar 2001 10:03:22 -0500 (EST)
Subject: [XML-SIG] setup.py question
In-Reply-To: <7zbsr14w6n.fsf@amboise.ird.idealx.com>
References: <200103132237.f2DMbjm02774@mira.informatik.hu-berlin.de>
 <7z8zm66qsv.fsf@amboise.ird.idealx.com>
 <15026.5703.555244.547926@cj42289-a.reston1.va.home.com>
 <7zbsr14w6n.fsf@amboise.ird.idealx.com>
Message-ID: <15026.11066.691824.633648@localhost.localdomain>

Jérôme Marant writes:
 >   Well, I can see the problem now. It is related to the way the
 >   interpreter is packaged in Debian: we usually split packages in
 >   several parts (thematically) in order not to bloat the system.
 >   For instance, python2-xmlbase contains the core xml library.
 >   My problem is that i made pyxml conflict with python2-xmlbase
 >   so that we cannot have 2 xml implementations at a time.
 >   So, It breaks applications since import xml does not work any
 >   more.

  Hmm... the reason for moving some of it into the core was to ensure
that all installations have at least basic XML support if pyexpat
could compile; is pyexpat part of xmlbase?  (And not all of the xml
package depends on pyexpat; even using another parser, the xml.dom and
xml.sax packages provide the needed exceptions and constants for a
number of the XML APIs.  The node type constants are one example; such
things need to be found in a single location for fully general API
compatibility.)

 >   After reading your remark, would you say that the core xml
 >   package is mandatory for pyxml ? If so, i can peacefully
 >   remove the "conflict". If not, I'll have to rename _xmlplus
 >   to xml.

  I'd make xmlbase mandatory for PyXML; it includes the magic needed
to let PyXML take precedence if present.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From akuchlin@mems-exchange.org  Sat Mar 17 05:51:42 2001
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Sat, 17 Mar 2001 00:51:42 -0500
Subject: [XML-SIG] ANN: quotation-tools 0.0.3 released
Message-ID: <200103170551.AAA02154@mira.erols.com>

I've made a new release of quotation-tools, a package for processing
QEL.  With this release, I'm finished with hacking on the command-line
for the moment; the next task is going to be a Tkinter GUI.

The package is available from http://www.amk.ca/qel/software.html .

--amk

Changes in version 0.0.3 and version 0.0.2:
	* New scripts: qtmerge for merging several QEL files into one,
	  and fortune2qel to convert fortune's files into QEL.
	* Implemented the QELdb class, which acts as a fast cache for a
	  (potentially large) QEL file.
	* Added docstrings so pydoc can produce some helpful output.
	* Added XML output format to qtformat; this now pretty-prints QEL.
	* Added -c option to qtgrep, to cause it to just print the number
	  of matching quotations for each file searched.
	* Added a CSS1 stylesheet, xml/qel.css, for formatting QEL.
	* Fixed bugs in dealing with the <pre/> element.
	* Fixed bugs in dealing with the <br/> element.


From noreply@sourceforge.net  Sun Mar 18 23:03:08 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 18 Mar 2001 15:03:08 -0800
Subject: [XML-SIG] [ pyxml-Bugs-409605 ] reader.HtmlLib ignores optional starttag
Message-ID: <E14emCa-0000Uv-00@usw-sf-web3.sourceforge.net>

Bugs item #409605, was updated on 2001-03-18 15:03
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=409605&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: reader.HtmlLib ignores optional starttag

Initial Comment:
Given the document

good_html = """
<html>
<P>I prefer (all things being equal)
regularity/orthogonality and logical
syntax/semantics in a language because there is less to
have to remember.
(Of course I <em>know</em> all things are NEVER really
equal!)
<P CLASS=source>Guido van Rossum, 6 Dec 91
<P>The details of that silly code are irrelevant.
<P CLASS=source>Tim Peters, 4 Mar 92
&amp; &lt; &gt; &eacute; &ouml; &nbsp;
</html>
"""

the reader should imply the <body> tag when it sees the
first p element. Instead, it will drop the p element,
as it is not directly allowed inside of the html
element.

Still, the document is valid, so the reader should
build the P elements into the tree. To see the error,
do

from xml.dom.ext.reader import HtmlLib
b = HtmlLib.FromHtml(good_html) 
print b.firstChild.firstChild

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=409605&group_id=6473


From cce@clarkevans.com  Mon Mar 19 11:03:37 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Mon, 19 Mar 2001 06:03:37 -0500 (EST)
Subject: [XML-SIG] Getting namespace aware parser to work...
In-Reply-To: <200103170551.AAA02154@mira.erols.com>
Message-ID: <Pine.LNX.4.21.0103190556410.30527-100000@clarkevans.com>

I'm trying to process the following xml file, with
this python script to strip all elements with a 
given namespace.  I believe that I have a pretty
recent version (0.5.2).  I get the error following...

-----------------------------------------------------------------
test.xml
-----------------------------------------------------------------

<test>
  <one xmlns="baduri">strip</one>
  <two>keep</two>
</test>     

-----------------------------------------------------------------
test.py
-----------------------------------------------------------------

"""Strips a particular namespace from an XML document."""
from xml.sax import saxutils

class StripperFilter(saxutils.XMLFilterBase ):
    """Does the actual stripping"""
    def __init__(self,nmsp):
        """The namespace to strip is nmsp"""
        saxutils.XMLFilterBase.__init__(self)
        self.nmsp = nmsp
        
    def startElementNS(self, name, qname, attrs):
        """Ignores elements and strips attributes of nmsp"""
        if name[0] != self.nmsp:
            #
            # Warning: For efficiency this dives into the
            #          underlying representation of AttributesNSImpl
            #          and deletes attributes to be stripped.
            #
            #  _attrs should be of the form {(ns_uri, lname): value, ...}.
            #  _qnames of the form {(ns_uri, lname): qname, ...}.
            #
            for (ns_uri,lname) in attrs._attrs.keys():
                if self.nmsp == ns_uri: del attrs._attrs[(ns_uri,lname)]
            saxutils.XMLFilterBase.startElementNS(self,name,qname,attrs)


from xml.sax import make_parser
from xml.sax.handler import feature_namespaces

def testStripper():
    parser = make_parser()
    parser.setFeature(feature_namespaces, 1)
    strip = StripperFilter('myuri')
    out = saxutils.XMLGenerator()
    strip.setContentHandler(out)
    parser.setContentHandler(strip)
    parser.parse("c:\\work\\xfld\\test.xml")

if __name__ == '__main__':
    testStripper()

----------------------------------------------------------------------
The error message
----------------------------------------------------------------------
<?xml version="1.0" encoding="iso-8859-1"?>
<test>
  stripTraceback (most recent call last):
  File "<stdin>", line 40, in ?
  File "<stdin>", line 37, in testStripper
  File "F:\Program Files\Python\_xmlplus\sax\expatreader.py", line 43, in
parse
    xmlreader.IncrementalParser.parse(self, source)
  File "F:\Program Files\Python\_xmlplus\sax\xmlreader.py", line 120, in
parse
    self.feed(buffer)
  File "F:\Program Files\Python\_xmlplus\sax\expatreader.py", line 87, in
feed
    self._parser.Parse(data, isFinal)
  File "F:\Program Files\Python\_xmlplus\sax\expatreader.py", line 187, in
end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "F:\Program Files\Python\_xmlplus\sax\saxutils.py", line 259, in
endElementNS
    self._cont_handler.endElementNS(name, qname)
  File "F:\Program Files\Python\_xmlplus\sax\saxutils.py", line 192, in
endElementNS
    qname = self._current_context[name[0]] + ":" + name[1]
TypeError: bad operand type(s) for +


From cce@clarkevans.com  Mon Mar 19 11:07:10 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Mon, 19 Mar 2001 06:07:10 -0500 (EST)
Subject: [XML-SIG] Re: Getting namespace aware parser to work...
In-Reply-To: <Pine.LNX.4.21.0103190556410.30527-100000@clarkevans.com>
Message-ID: <Pine.LNX.4.21.0103190604520.30527-100000@clarkevans.com>

I believe the problem is default namespaces that
do not have a prefix.  The stripper gives the
expected (and, of course incorrect as it's not finished)
output when the test.xml file is changed to:

  <test>
    <strip:one xmlns:strip="baduri">one</strip:one>
    <two>keep</two>
  </test>

So... does the namespace aware code handle the case
when a namespace is not in the lookup table?

Clark
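
A minimal sketch of what the parser reports for the default-namespace
case, assuming a current Python 3 standard library rather than the
PyXML 0.5.2 discussed here.  The class name PrefixRecorder and the
helper record() are made up for illustration.  With namespace
processing on, the default namespace arrives with a prefix of None,
which is the value the "+" in the traceback cannot concatenate.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

TEST_XML = b"""<test>
  <one xmlns="baduri">strip</one>
  <two>keep</two>
</test>"""


class PrefixRecorder(ContentHandler):
    """Hypothetical handler: remembers prefix mappings and element names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.mappings = []   # (prefix, uri) pairs as reported by the parser
        self.elements = []   # (ns_uri, localname) tuples

    def startPrefixMapping(self, prefix, uri):
        self.mappings.append((prefix, uri))

    def startElementNS(self, name, qname, attrs):
        self.elements.append(name)


def record(data):
    """Parse bytes with namespace processing on and return the recorder."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = PrefixRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


recorder = record(TEST_XML)
print(recorder.mappings)   # the default namespace has no prefix string
print(recorder.elements)
```

Any code that builds a qualified name from such a mapping has to
treat the None prefix as "no prefix" instead of joining it with ":".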




From cce@clarkevans.com  Mon Mar 19 11:38:47 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Mon, 19 Mar 2001 06:38:47 -0500 (EST)
Subject: [XML-SIG] (patch) Re: Getting namespace aware parser to work...
In-Reply-To: <Pine.LNX.4.21.0103190604520.30527-100000@clarkevans.com>
Message-ID: <Pine.LNX.4.21.0103190637220.30527-100000@clarkevans.com>

It's not perfect (since it doesn't use a stack for implicit
namespaces), but the errors I was getting should be fixed
by this patch.  Clark

........................
_xmlplus/sax/saxutils.py
.........................

171c171,172
<             name = name[1]
---
>             qname = name[1]
>             self._out.write('<' + qname)
173,175c174,181
<             name = self._current_context[name[0]] + ":" + name[1]
<         self._out.write('<' + name)
<
---
>             prefix = self._current_context[name[0]]
>             if prefix is None:
>                 self._out.write('<%s xmlns="%s"' % (name[1],name[0]) )
>                 qname = name[1]
>             else:
>                 qname = prefix  + ":" + name[1]
>                 self._out.write('<' + qname)
>
177c183,186
<             self._out.write(' xmlns:%s="%s"' % pair)
---
>             if pair[0] is None:
>                 pass
>             else:
>                 self._out.write(' xmlns:%s="%s"' % pair)
181,182c190,194
<             name = self._current_context[name[0]] + ":" + name[1]
<             self._out.write(' %s="%s"' % (name, escape(value)))
---
>             if name[0] is None:
>                 qname = name[1]
>             else:
>                 qname = self._current_context[name[0]] + ":" + name[1]
>             self._out.write(' %s="%s"' % (qname, escape(value)))
192c204,208
<             qname = self._current_context[name[0]] + ":" + name[1]
---
>             prefix = self._current_context[name[0]]
>             if prefix is None:
>                 qname = name[1]
>             else:
>                 qname = prefix + ":" + name[1]


From cce@clarkevans.com  Mon Mar 19 12:16:15 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Mon, 19 Mar 2001 07:16:15 -0500 (EST)
Subject: [XML-SIG] (patch) Re: Getting namespace aware parser to work...
In-Reply-To: <Pine.LNX.4.21.0103190637220.30527-100000@clarkevans.com>
Message-ID: <Pine.LNX.4.21.0103190713390.32426-100000@clarkevans.com>

This is a nicer patch to saxutils.py to fix the default 
namespace handling.  Please excuse the bad python code... I'm 
still less than 100 lines old... so it may have stupid errors.

---------------------
141a142
>         self._default_context = None
171c172
<             name = name[1]
---
>             self._out.write('<' + name[1])
173,175c174,183
<             name = self._current_context[name[0]] + ":" + name[1]
<         self._out.write('<' + name)
<
---
>             prefix = self._current_context[name[0]]
>             if prefix is None:
>                 if self._default_context is None or self._default_context != name[0]:
>                     self._out.write('<%s xmlns="%s"' % (name[1],name[0]) )
>                     self._default_context = name[0]
>                 else:
>                     self._out.write('<' + name[1])
>             else:
>                 self._out.write('<' + prefix  + ":" + name[1])
>
177c185,188
<             self._out.write(' xmlns:%s="%s"' % pair)
---
>             if pair[0] is None:
>                 pass
>             else:
>                 self._out.write(' xmlns:%s="%s"' % pair)
181,182c192,196
<             name = self._current_context[name[0]] + ":" + name[1]
<             self._out.write(' %s="%s"' % (name, escape(value)))
---
>             if name[0] is None:
>                 qname = name[1]
>             else:
>                 qname = self._current_context[name[0]] + ":" + name[1]
>             self._out.write(' %s="%s"' % (qname, escape(value)))
192c206,210
<             qname = self._current_context[name[0]] + ":" + name[1]
---
>             prefix = self._current_context[name[0]]
>             if prefix is None:
>                 qname = name[1]
>             else:
>                 qname = prefix + ":" + name[1]


From cce@clarkevans.com  Mon Mar 19 12:21:07 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Mon, 19 Mar 2001 07:21:07 -0500 (EST)
Subject: [XML-SIG] Namespace Stripper Filter
In-Reply-To: <Pine.LNX.4.21.0103190637220.30527-100000@clarkevans.com>
Message-ID: <Pine.LNX.4.21.0103190718090.32426-100000@clarkevans.com>

Here is my first "real live" python program... anyone who'd 
like to comment for style, please do so as I'm a newbie.
-----------------------------------------------------------------

"""Strips a particular namespace from an XML document."""
from xml.sax import saxutils

class StripperFilter(saxutils.XMLFilterBase ):
    """Does the actual stripping"""
    def __init__(self,nmsp):
        """The namespace to strip is nmsp"""
        saxutils.XMLFilterBase.__init__(self)
        self.nmsp = nmsp
        self.depth = 0
        
    def startElementNS(self, name, qname, attrs):
        """Ignores elements and strips attributes of nmsp"""
        if name[0] != self.nmsp:
            #
            # Warning: For efficiency this dives into the
            #          underlying representation of AttributesNSImpl
            #          and deletes attributes to be stripped.
            #
            #  _attrs should be of the form {(ns_uri, lname): value, ...}.
            #  _qnames of the form {(ns_uri, lname): qname, ...}.
            #
            for (ns_uri,lname) in attrs._attrs.keys():
                if self.nmsp == ns_uri: del attrs._attrs[(ns_uri,lname)]
            self._cont_handler.startElementNS(name,qname,attrs)
        else:
            self.depth = self.depth + 1

    def characters(self, content):
        if self.depth == 0:
            saxutils.XMLFilterBase.characters(self,content)

    def endElementNS(self, name, qname):
        if self.depth > 0:
            self.depth = self.depth - 1
        else:
            self._cont_handler.endElementNS(name,qname)

    def startPrefixMapping(self, prefix, uri):
        if self.nmsp != uri:
            self._cont_handler.startPrefixMapping(prefix, uri)
                
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces

def testStripper():
    parser = make_parser()
    parser.setFeature(feature_namespaces, 1)
    strip = StripperFilter('namespace-to-strip')
    out = saxutils.XMLGenerator()
    strip.setContentHandler(out)
    parser.setContentHandler(strip)
    parser.parse("test.xml")

if __name__ == '__main__':
    testStripper()


From greg@itam.zabrze.pl  Mon Mar 19 13:40:24 2001
From: greg@itam.zabrze.pl (Grzegorz Zegartowski)
Date: Mon, 19 Mar 2001 14:40:24 +0100
Subject: [XML-SIG] Minidom
Message-ID: <3AB60C48.C66BB433@itam.zabrze.pl>

I would like to know how to read XML files with validation...

there's a parse method:
xml.dom.minidom.parse(filename, parser)

What should I put as a parser?

Thanks, Zedd


From stuartd@alerton.com  Mon Mar 19 17:20:33 2001
From: stuartd@alerton.com (Stuart Donaldson)
Date: Mon, 19 Mar 2001 09:20:33 -0800
Subject: [XML-SIG] WBXML?
Message-ID: <A19EEC21DB90D411B40900D0B7B4F8E703F990@alermx.alerton.com>

I'm new to this SIG mailing list, and have looked over the XML-SIG Status
page but could not find any reference to WBXML, the WAP Binary XML standard.

Anyone out there working with this or another form of XML that is optimized
both for space and ease of parsing?

Thanks...
-Stuart-


From akuchlin@mems-exchange.org  Mon Mar 19 19:01:18 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 19 Mar 2001 14:01:18 -0500
Subject: [XML-SIG] iso8601 module: re-creating an original date
Message-ID: <E14f4u6-0001P7-00@ute.cnri.reston.va.us>

I've noticed that xml.utils.iso8601 doesn't provide enough information
to allow parsing and then re-creating a date.  iso8601.parse() takes a
string and returns the value in seconds since the epoch.  There's no
way to tell if the original date string was '2000-01-01' or '2000' or
'2000-01-01T00:00'.  You also can't parse the date manually in the
event you want an mxDateTime instead of just seconds, which means you
can't handle very old or very futuristic dates.

I'd like to add support for being precise and figuring out exactly
what was provided, but we need to discuss the interface a bit.

One possible API: parse_tuple(string) which returns a 9-tuple like the
one from time.gmtime() or time.localtime(), except that fields not
provided are represented by None, not 0.  (This means you can't pass
the tuple to functions like time.mktime() without first converting
None to 0.)  An alternative interface would be to return a dictionary
of fields, or an object with attributes.

Thoughts?

--amk


From fdrake@acm.org  Mon Mar 19 19:15:26 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 19 Mar 2001 14:15:26 -0500 (EST)
Subject: [XML-SIG] iso8601 module: re-creating an original date
In-Reply-To: <E14f4u6-0001P7-00@ute.cnri.reston.va.us>
References: <E14f4u6-0001P7-00@ute.cnri.reston.va.us>
Message-ID: <15030.23246.842826.68806@localhost.localdomain>

Andrew Kuchling writes:
 > One possible API: parse_tuple(string) which returns a 9-tuple like the
 > one from time.gmtime() or time.localtime(), except that fields not
 > provided are represented by None, not 0.  (This means you can't pass

  Con:  This maintains the existing "tuplized" excuse for a structure
-- pure evil, and a pain to work with!

 > the tuple to functions like time.mktime() without first converting
 > None to 0.)  An alternative interface would be to return a dictionary
 > of fields, or an object with attributes.

  I favor an object with attributes, and look forward to your updates
to the module.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations
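
A rough sketch of the "object with attributes" idea, for a small
subset of ISO 8601 only.  Nothing here is the actual
xml.utils.iso8601 interface: the names DateFields and parse_fields
are hypothetical, and the code assumes Python 3.7 or later for
dataclasses.  Fields that the string did not supply stay None, so
'2000', '2000-01-01' and '2000-01-01T00:00' remain distinguishable.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DateFields:
    """Hypothetical result object; None means "not given in the string"."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None
    minute: Optional[int] = None
    second: Optional[int] = None
    tz: Optional[str] = None    # 'Z' or an offset, exactly as written


# Subset only: YYYY, YYYY-MM, YYYY-MM-DD, and a date followed by
# Thh:mm or Thh:mm:ss with an optional Z or +hh:mm / -hh:mm offset.
_ISO_SUBSET = re.compile(r"""
    ^(?P<year>\d{4})
    (?:-(?P<month>\d{2})
       (?:-(?P<day>\d{2})
          (?:T(?P<hour>\d{2}):(?P<minute>\d{2})
             (?::(?P<second>\d{2}))?
             (?P<tz>Z|[+-]\d{2}:\d{2})?
          )?
       )?
    )?$
""", re.VERBOSE)


def parse_fields(text):
    """Split an ISO 8601 string into fields without inventing defaults."""
    match = _ISO_SUBSET.match(text)
    if match is None:
        raise ValueError("unsupported ISO 8601 string: %r" % (text,))
    groups = match.groupdict()
    tz = groups.pop("tz")
    numbers = {key: (int(value) if value is not None else None)
               for key, value in groups.items()}
    return DateFields(tz=tz, **numbers)


print(parse_fields("2000"))
print(parse_fields("2000-01-01T00:00"))
```

A caller that wants seconds since the epoch can still fill the None
fields with defaults; the reverse is not possible once the defaults
have been applied.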


From harris41@msu.edu  Mon Mar 19 21:04:48 2001
From: harris41@msu.edu (Scott Harrison)
Date: Mon, 19 Mar 2001 16:04:48 -0500
Subject: [XML-SIG] error with xhtml strict dtd
Message-ID: <3AB67470.15E08846@msu.edu>

What should be done with this situation below?  And do you have
a mailing list?  I'd like to contribute or at least stay
in touch as to what is going on.  Thanks -Scott

Trying to use pyxml with xhtml (using current cvs version).
xmlproc_val

E:http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd:316:3: xml:space
must have exactly the values 'default' and 'preserve'
E:http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd:326:3: xml:space
must have exactly the values 'default' and 'preserve'
E:http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd:457:3: xml:space
must have exactly the values 'default' and 'preserve'

These are lines 316, 326 and 457 for xhtml1-strict.dtd:
  xml:space (preserve) #FIXED 'preserve'
  xml:space (preserve) #FIXED 'preserve'
  xml:space (preserve) #FIXED 'preserve'

And of course the part of the xmldtd.py code
that is responsible is shown here:

        if name=="xml:space":
            if type(self.type)==types.StringType:
                parser.report_error(2015)
                return

            if len(self.type)!=2:
                error=1
            else:
                if (self.type[0]=="default" and self.type[1]=="preserve") or \
                   (self.type[1]=="default" and self.type[0]=="preserve"):
                    error=0
                else:
                    error=1

            if error:
                parser.report_error(2016)


From harris41@msu.edu  Mon Mar 19 21:39:41 2001
From: harris41@msu.edu (Scott Harrison)
Date: Mon, 19 Mar 2001 16:39:41 -0500
Subject: [XML-SIG] Re: error with xhtml strict dtd
References: <3AB67470.15E08846@msu.edu>
Message-ID: <3AB67C9D.92DFE8CE@msu.edu>

I would recommend this patch:

Index: xml/parsers/xmlproc/xmldtd.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/parsers/xmlproc/xmldtd.py,v
retrieving revision 1.11
diff -r1.11 xmldtd.py
408c408
<             if len(self.type)!=2:
---
>             if (len(self.type)!=2) and (len(self.type)!=1):
409a410,411
> 	    elif len(self.type)==1:
> 		error=0




From martin@loewis.home.cs.tu-berlin.de  Mon Mar 19 21:35:08 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Mar 2001 22:35:08 +0100
Subject: [XML-SIG] Minidom
In-Reply-To: <3AB60C48.C66BB433@itam.zabrze.pl> (message from Grzegorz
 Zegartowski on Mon, 19 Mar 2001 14:40:24 +0100)
References: <3AB60C48.C66BB433@itam.zabrze.pl>
Message-ID: <200103192135.f2JLZ8H01030@mira.informatik.hu-berlin.de>

> I wish to know how to reading xml files with validation...
> 
> there's a parse method:
> xml.dom.minidom.parse(filename, parser)
> 
> What should I put as a parser?

That depends on whether you have only Python 2, or PyXML as well. In
Python 2, no validating parser is included. With PyXML,
xml.sax.sax2exts.XMLValParserFactory.make_parser() will create a
validating SAX parser for you (namely, xmlproc, unless additional
validating parsers have been registered).

Regards,
Martin


From j.lee@spitech.com  Fri Mar 16 10:50:50 2001
From: j.lee@spitech.com (Lee, Junmar)
Date: Fri, 16 Mar 2001 18:50:50 +0800
Subject: [XML-SIG] Help
Message-ID: <D179A0442527D411B2E60050DA8C3D89B1B71F@mail.spiglobe.com>

Hi,

	I wonder if you could help me out.

	I'm new at this so please forgive my ignorance.

	I just downloaded BeOpen-Python-2.0.exe and installed it.

	I then downloaded PythonXML.exe and installed that.

	Then I downloaded PyXML-0.6.4.win32-py2.0.exe and installed it also.

	My query is,  what now?   How do I get the XML parser to run?   I read in the docs that Python/XML has three(3) parsers.   How do I run them?   Can I make them into an EXE for Windows?   How can I do this?

	How can I get an EXE to look at an XML file and its DTD and say if it is well-formed and all those other XML parsing tools?

	Basically,  I was looking for an XML parser written in Python that will run in Windows or DOS.

	Sorry for all the questions and for being ignorant.   I just hope that you'll be able to help me out.


Regards,
Junmar :-)


From martin@loewis.home.cs.tu-berlin.de  Mon Mar 19 22:34:05 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Mar 2001 23:34:05 +0100
Subject: [XML-SIG] error with xhtml strict dtd
In-Reply-To: <3AB67470.15E08846@msu.edu> (message from Scott Harrison on Mon,
 19 Mar 2001 16:04:48 -0500)
References: <3AB67470.15E08846@msu.edu>
Message-ID: <200103192234.f2JMY5H01326@mira.informatik.hu-berlin.de>

> What should be done with this situation below?  

In general, you might submit a bug report to
sourceforge.net/projects/python.

> And do you have a mailing list?

Sure, xml-sig@python.org.

> Trying to use pyxml with xhtml (using current cvs version).
> xmlproc_val
> 
> E:http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd:316:3: xml:space
> must have exactly the values 'default' and 'preserve'

Please have a look at the thread starting at

http://mail.python.org/pipermail/xml-sig/2000-October/003520.html

It appeared to me (and to the author of xmlproc) that XML 1.0 says
that the XHTML DTD is invalid
(http://mail.python.org/pipermail/xml-sig/2000-October/003523.html)

There is an erratum for XML 1.0 that says that this was a mistake in
XML 1.0, which was corrected with
http://www.w3.org/XML/xml-19980210-errata#E81

Lars Marius Garshol (the xmlproc author) indicated in

http://mail.python.org/pipermail/xml-sig/2000-October/003527.html

that he has a fix for this problem; so far, he has not managed to
contribute this fix into PyXML.

You propose the patch

Index: xml/parsers/xmlproc/xmldtd.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/parsers/xmlproc/xmldtd.py,v
retrieving revision 1.11
diff -r1.11 xmldtd.py
408c408
<             if len(self.type)!=2:
---
>             if (len(self.type)!=2) and (len(self.type)!=1):
409a410,411
> 	    elif len(self.type)==1:
> 		error=0

As a procedural note, please always submit unified (-u) or context
(-c) diffs; they are easier to read and also continue to work if the
file is slightly modified.

This particular patch seems incorrect: If there is a single value to
xml:space, it *still* must be either "default" or "preserve"; your
patch does not perform this check. In any case, I still hope that Lars
will contribute his changes later this year.

Regards,
Martin
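
The check described above could look roughly like this.  It is a
standalone sketch, not the xmlproc code and not Lars' fix; the
function name xml_space_decl_ok is made up, and the declared type is
assumed to arrive either as a string such as "CDATA" or as a sequence
of enumerated values, as in the quoted xmldtd.py excerpt.

```python
ALLOWED_XML_SPACE_VALUES = frozenset(("default", "preserve"))


def xml_space_decl_ok(attr_type):
    """True if an xml:space declaration enumerates one or both legal values."""
    if isinstance(attr_type, str):
        # Not an enumerated type at all (the 2015 case in the excerpt).
        return False
    values = set(attr_type)
    # One or two values are fine, but each must be "default" or "preserve".
    return bool(values) and values <= ALLOWED_XML_SPACE_VALUES


print(xml_space_decl_ok(["preserve"]))              # the XHTML DTD case
print(xml_space_decl_ok(["default", "preserve"]))
print(xml_space_decl_ok(["keep"]))
```

Unlike the proposed patch, a single value such as "keep" is still
rejected.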


From martin@loewis.home.cs.tu-berlin.de  Mon Mar 19 22:47:33 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Mar 2001 23:47:33 +0100
Subject: [XML-SIG] Help
In-Reply-To: <D179A0442527D411B2E60050DA8C3D89B1B71F@mail.spiglobe.com>
 (j.lee@spitech.com)
References: <D179A0442527D411B2E60050DA8C3D89B1B71F@mail.spiglobe.com>
Message-ID: <200103192247.f2JMlXA01445@mira.informatik.hu-berlin.de>

> 	I just downloaded BeOpen-Python-2.0.exe and installed it.

That is a good starting point.

> 	I then downloaded PythonXML.exe and installed that.

I don't know what that is - where did you get it?

> 	Then I downloaded PyXML-0.6.4.win32-py2.0.exe and installed it also.

That is also a good thing.

> My query is,  what now?   

You now need to write a Python program that makes use of the packages.

> How do I get the XML parser to run?  I read in the docs that
> Python/XML has three(3) parsers.  How do I run them?

That is a somewhat surprising request. Why do you want to run the
parsers? An XML parser, when run, typically does not do much (*).

To write a program that is a simple XML parser, please try

import sys, xml.sax
xml.sax.parse(sys.argv[1], xml.sax.ContentHandler())

>  Can I make them into an EXE for Windows?   How can I do this?

There are a number of ways to make a Python program into an
executable. These are independent from PyXML; please see the Python
FAQ for details.

> How can I get an EXE to look at an XML file and its DTD and say if
> it is well-formed and all those other XML parsing tools?

The script above invokes a non-validating parser, so it will only tell
you if it is well-formed, not whether it is valid. To run a validating
parser, you need to instantiate xmlproc. The next release of PyXML
will actually include two command line utilities to run xmlproc; they
offer a few more features, though (such as outputting ESIS) -
i.e. they offer some specific processing.
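A minimal sketch of such a well-formedness check, written for a current
Python and using only the xml.sax calls from the script above. The helper
name is_well_formed is hypothetical, not a PyXML function, and the
in-memory documents merely stand in for files. It does not validate
against a DTD.

```python
import io
import xml.sax


def is_well_formed(source):
    """Return True if source (a file name or file object) parses cleanly.

    is_well_formed is a hypothetical helper name, not part of PyXML.
    """
    try:
        # The default ContentHandler ignores every event; we only care
        # whether the parser raises an error while reading the input.
        xml.sax.parse(source, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        print("not well-formed: line %s, column %s: %s"
              % (exc.getLineNumber(), exc.getColumnNumber(),
                 exc.getMessage()))
        return False
    return True


# In-memory documents stand in for files on disk.
good = is_well_formed(io.BytesIO(b"<doc><item/></doc>"))
bad = is_well_formed(io.BytesIO(b"<doc><item></doc>"))
```

Passing a file name such as sys.argv[1] instead of a BytesIO object gives
the command-line behaviour asked about.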

Regards,
Martin

(*) It does perform the well-formedness check, and might even perform
validation. So all you get out of it are ill-formedness and invalidity
errors. If that is all you need, PyXML might not be the appropriate
choice of tool.


From tpassin@home.com  Mon Mar 19 23:16:13 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 19 Mar 2001 18:16:13 -0500
Subject: [XML-SIG] iso8601 module: re-creating an original date
References: <E14f4u6-0001P7-00@ute.cnri.reston.va.us>
Message-ID: <001201c0b0ca$9793c4a0$7cac1218@reston1.va.home.com>

Andrew Kuchling had a very good idea -

> I've noticed that xml.utils.iso8601 doesn't provide enough information
> to allow parsing and then re-creating a date.  iso8601.parse() takes a
> string and returns the value in seconds since the epoch.  There's no
> way to tell if the original date string was '2000-01-01' or '2000' or
> '2000-01-01T00:00'.  You also can't parse the date manually in the
> event you want an mxDateTime instead of just seconds, which means you
> can't handle very old or very futuristic dates.
>
> I'd like to add support for being precise and figuring out exactly
> what was provided, but we need to discuss the interface a bit.
>
> One possible API: parse_tuple(string) which returns a 9-tuple like the
> one from time.gmtime() or time.localtime(), except that fields not
> provided are represented by None, not 0.  (This means you can't pass
> the tuple to functions like time.mktime() without first converting
> None to 0.)  An alternative interface would be to return a dictionary
> of fields, or an object with attributes.
>
I favor a dictionary or an object that may contain or act like one.  In
favor of an object, you could add various conversion methods as it seems
they are needed, and still be backwards compatible with older methods.
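To make the idea concrete, here is a rough sketch of what such an object
could look like. Everything in it is an assumption for discussion: the
names ParsedDate and parse_fields are hypothetical and not part of
xml.utils.iso8601, and only a small subset of ISO 8601 (no time zones,
no week or ordinal dates) is handled. Fields that were not in the
original string stay None, so the original can be re-created.

```python
import re

# Subset handled by this sketch: YYYY, YYYY-MM, YYYY-MM-DD, optionally
# followed by Thh:mm or Thh:mm:ss.  Time zones are deliberately left out.
_PATTERN = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})(?::(?P<second>\d{2}))?)?"
    r")?)?"
)


class ParsedDate:
    """Hypothetical result object: absent fields are None, not 0."""

    FIELDS = ("year", "month", "day", "hour", "minute", "second")

    def __init__(self, **fields):
        for name in self.FIELDS:
            setattr(self, name, fields.get(name))

    def as_dict(self):
        """Dictionary view, for callers who prefer a mapping."""
        return {name: getattr(self, name) for name in self.FIELDS}

    def isoformat(self):
        """Re-create the string with exactly the precision that was parsed."""
        text = "%04d" % self.year
        if self.month is not None:
            text += "-%02d" % self.month
        if self.day is not None:
            text += "-%02d" % self.day
        if self.hour is not None:
            text += "T%02d:%02d" % (self.hour, self.minute)
        if self.second is not None:
            text += ":%02d" % self.second
        return text


def parse_fields(string):
    """Parse a date string into a ParsedDate (hypothetical API)."""
    match = _PATTERN.fullmatch(string)
    if match is None:
        raise ValueError("unsupported date string: %r" % string)
    fields = {key: int(value)
              for key, value in match.groupdict().items()
              if value is not None}
    return ParsedDate(**fields)
```

Conversion methods (to seconds since the epoch, to an mxDateTime, and so
on) could then be added to the object later without changing callers.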

Cheers,

Tom P


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 20 07:17:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Mar 2001 08:17:31 +0100
Subject: [XML-SIG] Getting namespace aware parser to work...
In-Reply-To: <Pine.LNX.4.21.0103190556410.30527-100000@clarkevans.com>
 (cce@clarkevans.com)
References: <Pine.LNX.4.21.0103190556410.30527-100000@clarkevans.com>
Message-ID: <200103200717.f2K7HVG01327@mira.informatik.hu-berlin.de>

> I'm trying to process the following xml file, with
> this python script to strip all elements with a 
> given namespace.  I believe that I have a pretty
> recent version (0.5.2).  I get the error following...

Thanks for your bug report. It would be interesting to find out what
version you are using; 0.5.x is not fairly recent - 0.6.2 would be.

In any case, I cannot reproduce the problem with 0.6.4, and I doubt
anything relevant has changed since 0.6.2 in this respect. What
version of Expat are you using (the one included with PyXML or a
different one)?

Looking at the error you get

>     qname = self._current_context[name[0]] + ":" + name[1]
> TypeError: bad operand type(s) for +

I would really like to know what self._current_context[name[0]] and
name[1] are at this point. I found a problem with default namespaces,
but otherwise, the code appears to be correct.

Regards,
Martin

P.S. Please send patches as unified (-u) or context (-c) diffs.


From larsga@garshol.priv.no  Tue Mar 20 08:11:56 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Mar 2001 09:11:56 +0100
Subject: [XML-SIG] Namespace Stripper Filter
In-Reply-To: <Pine.LNX.4.21.0103190718090.32426-100000@clarkevans.com>
References: <Pine.LNX.4.21.0103190718090.32426-100000@clarkevans.com>
Message-ID: <m3bsqwq2pf.fsf@lambda.garshol.priv.no>

* Clark C. Evans
|
|     def startElementNS(self, name, qname, attrs):
|         """Ignores elements and strips attributes of nmsp"""
|         if name[0] != self.nmsp:
|             #
|             # Warning: For efficiency this dives into the
|             #          underlying representation of AttributesNSImpl
|             #          and deletes attributes to be stripped.
|             #
|             #  _attrs should be of the form {(ns_uri, lname): value, ...}.
|             #  _qnames of the form {(ns_uri, lname): qname, ...}."""
|             #
|             for (ns_uri,lname) in attrs._attrs.keys():
|                 if self.nmsp == ns_uri: del attrs._attrs[(ns_uri,lname)]
|             self._cont_handler.startElementNS(name,qname,attrs)
|         else:
|             self.depth = self.depth + 1

This isn't really a good idea, since there is no guarantee that you
will in fact get AttributesNSImpl instances. The only thing that is
guaranteed is that the objects you get will follow that interface.

It is very likely that many SAX drivers, such as the Jython SAX
driver, will not use this class, but reimplement the interface in a
class specific to themselves.
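A sketch of how the same filter could stay within the interface, written
for a current Python: instead of deleting from attrs._attrs, it builds a
fresh AttributesNSImpl from the values obtained through the public
methods. The class name NamespaceStripper and the endElementNS and
characters methods are my assumptions (the quoted fragment shows only
startElementNS), as is the depth handling for skipped subtrees.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase
from xml.sax.xmlreader import AttributesNSImpl


class NamespaceStripper(XMLFilterBase):
    """Drop elements (with their content) and attributes in one namespace."""

    def __init__(self, nmsp, parent=None):
        XMLFilterBase.__init__(self, parent)
        self.nmsp = nmsp
        self.depth = 0  # > 0 while inside a stripped element

    def startElementNS(self, name, qname, attrs):
        if self.depth or name[0] == self.nmsp:
            self.depth += 1
            return
        # Copy the attributes we keep, using only interface methods.
        kept, qnames = {}, {}
        for key in attrs.getNames():
            if key[0] != self.nmsp:
                kept[key] = attrs.getValue(key)
                qnames[key] = attrs.getQNameByName(key)
        self._cont_handler.startElementNS(
            name, qname, AttributesNSImpl(kept, qnames))

    def endElementNS(self, name, qname):
        if self.depth:
            self.depth -= 1
            return
        self._cont_handler.endElementNS(name, qname)

    def characters(self, content):
        if not self.depth:
            self._cont_handler.characters(content)


class Collector(ContentHandler):
    """Records the start-element events that reach it (demo only)."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append((name, dict(attrs.items())))


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
stripper = NamespaceStripper("urn:strip", parser)
collector = Collector()
stripper.setContentHandler(collector)
stripper.parse(io.BytesIO(
    b'<root xmlns:x="urn:strip" a="1" x:b="2">'
    b'<x:drop>gone<inner/></x:drop><keep>text</keep></root>'))
events = collector.events
```

This costs one extra dictionary per element, but it works with any
driver whose attribute objects follow the interface.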

Otherwise it looked fine to me.

--Lars M.


From larsga@garshol.priv.no  Tue Mar 20 08:13:26 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Mar 2001 09:13:26 +0100
Subject: [XML-SIG] Minidom
In-Reply-To: <200103192135.f2JLZ8H01030@mira.informatik.hu-berlin.de>
References: <3AB60C48.C66BB433@itam.zabrze.pl> <200103192135.f2JLZ8H01030@mira.informatik.hu-berlin.de>
Message-ID: <m3ae6gq2mx.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| With PyXML, xml.sax.sax2exts.XMLValParserFactory.make_parser() will
| create you a validating SAX parser (namely, xmlproc, unless
| additional validating parsers have been registered).

We shouldn't be using sax2exts any more, since that is just a legacy
thing left over from an old SAX 2.0 version. In fact, we should aim to
rip all that stuff out before too long.

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 20 08:26:57 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Mar 2001 09:26:57 +0100
Subject: [XML-SIG] Minidom
In-Reply-To: <m3ae6gq2mx.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 20 Mar 2001 09:13:26 +0100)
References: <3AB60C48.C66BB433@itam.zabrze.pl> <200103192135.f2JLZ8H01030@mira.informatik.hu-berlin.de> <m3ae6gq2mx.fsf@lambda.garshol.priv.no>
Message-ID: <200103200826.f2K8QvC01632@mira.informatik.hu-berlin.de>

> | With PyXML, xml.sax.sax2exts.XMLValParserFactory.make_parser() will
> | create you a validating SAX parser (namely, xmlproc, unless
> | additional validating parsers have been registered).
> 
> We shouldn't be using sax2exts any more, since that is just a legacy
> thing left over from an old SAX 2.0 version. In fact, we should aim to
> rip all that stuff out before too long.

Then, of course, the question is: How do you create a parser that
supports validation?
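One conceivable answer, sketched for a current Python under the
assumption that drivers honour the standard SAX2 validation feature: ask
the parser to turn the feature on and see whether it refuses. The helper
name enable_validation is hypothetical. The expat driver shipped with
Python is non-validating, so it is expected to decline here; a driver
for xmlproc would have to accept the feature for this approach to
replace sax2exts.

```python
import xml.sax
from xml.sax.handler import feature_validation


def enable_validation(parser):
    """Try to switch a SAX2 parser into validating mode.

    Returns True on success and False if the driver does not know the
    feature or cannot support it.  (Hypothetical helper, not a PyXML API.)
    """
    try:
        parser.setFeature(feature_validation, True)
    except (xml.sax.SAXNotRecognizedException,
            xml.sax.SAXNotSupportedException):
        return False
    return True


parser = xml.sax.make_parser()
validating = enable_validation(parser)
print("validating parser available:", validating)
```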

Regards,
Martin


From loewis@informatik.hu-berlin.de  Tue Mar 20 08:51:27 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Tue, 20 Mar 2001 09:51:27 +0100 (MET)
Subject: [XML-SIG] PyXML 0.6.5 is released
Message-ID: <200103200851.JAA14502@pandora.informatik.hu-berlin.de>

Version 0.6.5 of the Python/XML distribution is now available.  It
should be considered a beta release, and can be downloaded from
the following URLs:

http://download.sourceforge.net/pyxml/PyXML-0.6.5.tar.gz
http://download.sourceforge.net/pyxml/PyXML-0.6.5.win32-py1.5.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.5.win32-py2.0.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.5-1.5.2.i386.rpm
http://download.sourceforge.net/pyxml/PyXML-0.6.5-2.0.i386.rpm

Changes in this version, compared to 0.6.4:

	* setup supports two command line options, --with-libexpat and
	  --ldflags to specify an alternative Expat installation.

	* Fourthought has contributed a new type xml.utils.boolean to
	  distinguish boolean from integral values.

	* The scripts xmlproc_parse and xmlproc_val, which allow
	  command-line interaction with xmlproc, are now included.

	* The WDDX marshalling now supports a "strict" and a "loose"
	  mode of operation.

	* minidom now supports the DocumentFragment interface, and
	  correctly sets the ownerDocument property.

	* A SAX exception now retrieves line number information when
	  it is created, not when it is printed.

	* Invoking sax2lib.ValidatingReaderFactory.make_parser creates
	  a reader object that is already set to validating mode.

	* A number of callback errors in the SAX2 xmlproc driver have
	  been corrected.

The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package.  The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.

The package currently contains:

	* XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius
	  Garshol), sgmlop (Fredrik Lundh).

	* SAX interface (Lars Marius Garshol)
	* minidom DOM implementation (Paul Prescod)
	* 4DOM from Fourthought (Uche Ogbuji, Mike Olson)
	* Various utility modules and functions (various people)
	* Documentation and example programs (various people)

The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments, questions,
or bug reports to <xml-sig@python.org>.

For more information about Python and XML, see:
	http://www.python.org/topics/xml/

-- 
Martin v. Löwis               http://www.informatik.hu-berlin.de/~loewis


From frank63@ms5.hinet.net  Tue Mar 20 09:37:19 2001
From: frank63@ms5.hinet.net (Frank Chen)
Date: Tue, 20 Mar 2001 17:37:19 +0800
Subject: [XML-SIG] Re:WBXML
References: <E14fHt1-0000sR-00@mail.python.org>
Message-ID: <000c01c0b121$ea6b1fa0$f5a01ea3@MiTACUser>

>
> I'm new to this SIG mailing list, and have looked over the XML-SIG Status
> page but could not find any reference to WBXML a WAP Binary XML standard.
>
> Anyone out there working with this or another form of XML that is optimized
> both for space and ease of parsing?
>
Maybe you should look at the Java Xerces project regarding WAP. I remember
that there is some work on that.

Frank


From fdrake@acm.org  Tue Mar 20 14:23:06 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 20 Mar 2001 09:23:06 -0500 (EST)
Subject: [XML-SIG] Minidom
In-Reply-To: <m3ae6gq2mx.fsf@lambda.garshol.priv.no>
References: <3AB60C48.C66BB433@itam.zabrze.pl>
 <200103192135.f2JLZ8H01030@mira.informatik.hu-berlin.de>
 <m3ae6gq2mx.fsf@lambda.garshol.priv.no>
Message-ID: <15031.26570.616169.648499@cj42289-a.reston1.va.home.com>

Lars Marius Garshol writes:
 > We shouldn't be using sax2exts any more, since that is just a legacy
 > thing left over from an old SAX 2.0 version. In fact, we should aim to
 > rip all that stuff out before too long.

  Perhaps with Python 2.1 a DeprecationWarning should be issued?

        try:
            import warnings
        except ImportError:
            pass
        else:
            warnings.warn("sax2exts has been deprecated; use...",
                          DeprecationWarning)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 20 16:18:26 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Mar 2001 17:18:26 +0100
Subject: [XML-SIG] xml.xpath and xml.xslt are available
Message-ID: <200103201618.f2KGIQT08759@mira.informatik.hu-berlin.de>

I've added two packages to the XML package, xml.xpath and
xml.xslt. These are heavily based on 4XPath and 4XSLT, but use PyXPath as
the expression parser. In theory, it should be possible to plug them
into a 4Suite installation, or use them stand-alone (without 4Suite).

In practice, much of the test suite passes, but there are still some
issues left. On the plus side, this has the chance of fixing the 4XSLT
bugs related to character sets, as the packages fully support Unicode.

To get an overview of what has been taken literally from 4Suite and
what has been adapted, please have a look at xml/xpath/README.4XPath.
Over the next few months, we will strive to reduce the dependency on a
particular parser, and on Ft.Lib, so that really most of the files
become the same eventually.

If you find any problems, please let me know.

Regards,
Martin


From noreply@sourceforge.net  Tue Mar 20 17:14:14 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 20 Mar 2001 09:14:14 -0800
Subject: [XML-SIG] [ pyxml-Patches-410065 ] Range.surroundContents()
Message-ID: <E14fPi2-000745-00@usw-sf-web1.sourceforge.net>

Patches item #410065, was updated on 2001-03-20 09:14
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=410065&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Range.surroundContents()

Initial Comment:
Hi.

I just started using your lib. First of all: congratulations, good
job. However, the surroundContents method of the Range object is
broken (at least in my version [0.10.2], maybe you fixed it already).

First of all, calling it surround instead of surrond would keep
newbies to Python like me from getting serious problems with
their self-esteem ;-)

The major issue is that you called insertNode after having removed
the Range's contents by calling extractContents, which has to fail,
since then arbitrary siblings of the Range are at self.startOffset,
sometimes None. Here is a fix; maybe there is a more elegant solution,
as I already mentioned I'm a newbie to Python.

Regards
Henrik Motakef

884,885c884,885
<     def surroundContents(self,newParent):
<         """Surround the range with this node"""
---
>     def surrondContents(self,newParent):
>         """Surrond the range with this node"""
916c916
<         df = self.cloneContents()
---
>         df = self.extractContents()
918,922c918
<         newParent.appendChild(df)
<         
<         refNode = self.startContainer.childNodes[self.startOffset]
< 
<         self.startContainer.insertBefore(newParent, refNode)
---
>         self.insertNode(newParent)
924c920
<         self.startContainer.removeChild(newParent.nextSibling)
---
>         newParent.appendChild(df)


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=410065&group_id=6473


From daniel pearson via RT <news-admins@freshmeat.net>  Tue Mar 20 17:24:00 2001
From: daniel pearson via RT <news-admins@freshmeat.net> (daniel pearson via RT)
Date: Tue, 20 Mar 2001 12:24:00 -0500 (EST)
Subject: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML update
Message-ID: <20010320172400.4D3C582FFF@mail.freshmeat.net>

The following notes are in response to a recent freshmeat.net submission:

- Martin v. Löwis <loewis@informatik.hu-berlin.de> has requested ownership of
  the freshmeat listing for Python/XML.  Do you approve of this?

This contribution cannot be processed until you take appropriate action on your
part and get back to us.

Sincerely,
daniel pearson
<news-admins@freshmeat.net>


Note: Make sure you include the prefix '[fm #6671]'
in the subject when replying to this email.


--- Headers Follow ---

>From nobody@freshmeat.net  Tue Mar 20 12:23:59 2001
Return-Path: <nobody@freshmeat.net>
Delivered-To: news-admins@freshmeat.net
Received: from www2.freshmeat.net (freshmeat.net [64.28.67.35])
	by mail.freshmeat.net (Postfix) with ESMTP id CC63182FAE
	for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 12:23:59 -0500 (EST)
Received: by www2.freshmeat.net (Postfix, from userid 65534)
	id E592ED6561; Tue, 20 Mar 2001 12:23:59 -0500 (EST)
To: news-admins@freshmeat.net
Subject: [fm #6671] (news-admins) Submission report - Python/XML update
From: daniel pearson <news-admins@freshmeat.net>
Message-Id: <20010320172359.E592ED6561@www2.freshmeat.net>
Date: Tue, 20 Mar 2001 12:23:59 -0500 (EST)
Sender: nobody@freshmeat.net

-------------------------------------------- Managed by Request Tracker


From stuartd@alerton.com  Tue Mar 20 17:22:30 2001
From: stuartd@alerton.com (Stuart Donaldson)
Date: Tue, 20 Mar 2001 09:22:30 -0800
Subject: [XML-SIG] Re: WBXML
Message-ID: <A19EEC21DB90D411B40900D0B7B4F8E703F995@alermx.alerton.com>

>From: "Frank Chen" <frank63@ms5.hinet.net>
>To: <xml-sig@python.org>
>Date: Tue, 20 Mar 2001 17:37:19 +0800
>Subject: [XML-SIG] Re:WBXML
>>
>> I'm new to this SIG mailing list, and have looked over the XML-SIG Status
>> page but could not find any reference to WBXML a WAP Binary XML standard.
>>
>> Anyone out there working with this or another form of XML that is optimized
>> both for space and ease of parsing?
>>
>
>Maybe you should look for Java Xerces about WAP. I remembered that there are
>some works for that.
>
>Frank

Thus far everything I have found regarding WBXML and most everything for WAP
has been Java based.  But I have a python application that I would like to
incorporate these features in.  And since the entire reason for looking at
WBXML is performance and simplicity, the idea of using a Java layer in
between just doesn't make sense.

-Stuart-


From Rich Salz via RT <news-admins@freshmeat.net>  Tue Mar 20 17:52:40 2001
From: Rich Salz via RT <news-admins@freshmeat.net> (Rich Salz via RT)
Date: Tue, 20 Mar 2001 12:52:40 -0500 (EST)
Subject: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML
Message-ID: <20010320175240.61EF083043@mail.freshmeat.net>

yes!


--- Headers Follow ---

>From rsalz@zolera.com  Tue Mar 20 12:52:40 2001
Return-Path: <rsalz@zolera.com>
Delivered-To: news-admins@freshmeat.net
Received: from zolera.com (unknown [63.142.188.177])
	by mail.freshmeat.net (Postfix) with ESMTP id 2A85C82FFF
	for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 12:52:39 -0500 (EST)
Received: from zolera.com (os390.zolera.com [10.0.1.9])
	by zolera.com (8.9.3/8.9.3) with ESMTP id MAA02243
	for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 12:55:09 -0500
Sender: rsalz@zolera.com
Message-ID: <3AB7997D.623044BB@zolera.com>
Date: Tue, 20 Mar 2001 12:55:09 -0500
From: Rich Salz <rsalz@zolera.com>
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: daniel pearson via RT <news-admins@freshmeat.net>
Subject: Re: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML 
 update
References: <20010320172400.4D3C582FFF@mail.freshmeat.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

-------------------------------------------- Managed by Request Tracker


From Request Tracker <news-admins@freshmeat.net>  Tue Mar 20 17:55:05 2001
From: Request Tracker <news-admins@freshmeat.net> (Request Tracker)
Date: Tue, 20 Mar 2001 12:55:05 -0500 (EST)
Subject: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML update
Message-ID: <20010320175505.DB0F183043@mail.freshmeat.net>

On Tue, Mar 20, 2001 at 12:24:00PM -0500, daniel pearson via RT wrote:
>The following notes are in response to a recent freshmeat.net submission:
>- Martin v. Löwis <loewis@informatik.hu-berlin.de> has requested ownership of
>  the freshmeat listing for Python/XML.  Do you approve of this?

Yes, I approve; Martin has taken over maintenance of Python/XML from me.

--amk


--- Headers Follow ---

>From akuchlin@mems-exchange.org  Tue Mar 20 12:55:05 2001
Return-Path: <akuchlin@mems-exchange.org>
Delivered-To: news-admins@freshmeat.net
Received: from ute.cnri.reston.va.us (cnri44.cnri.reston.va.us [132.151.1.44])
	by mail.freshmeat.net (Postfix) with ESMTP id 1A1B682FFF
	for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 12:55:05 -0500 (EST)
Received: from akuchlin by ute.cnri.reston.va.us with local (Exim 3.20 #1)
	id 14fQLP-0003c7-00
	for news-admins@freshmeat.net; Tue, 20 Mar 2001 12:54:55 -0500
Date: Tue, 20 Mar 2001 12:54:55 -0500
From: Andrew Kuchling <akuchlin@mems-exchange.org>
To: daniel pearson via RT <news-admins@freshmeat.net>
Subject: Re: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML update
Message-ID: <20010320125455.A13770@ute.cnri.reston.va.us>
Reply-To: akuchlin@mems-exchange.org
References: <20010320172400.4D3C582FFF@mail.freshmeat.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2i
In-Reply-To: <20010320172400.4D3C582FFF@mail.freshmeat.net>; from news-admins@freshmeat.net on Tue, Mar 20, 2001 at 12:24:00PM -0500

-------------------------------------------- Managed by Request Tracker


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 20 18:16:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Mar 2001 19:16:15 +0100
Subject: [XML-SIG] Re: WBXML
In-Reply-To: <A19EEC21DB90D411B40900D0B7B4F8E703F995@alermx.alerton.com>
 (message from Stuart Donaldson on Tue, 20 Mar 2001 09:22:30 -0800)
References: <A19EEC21DB90D411B40900D0B7B4F8E703F995@alermx.alerton.com>
Message-ID: <200103201816.f2KIGFo09464@mira.informatik.hu-berlin.de>

> Thus far everything I have found regarding WBXML and most everything
> for WAP has been Java based.  But I have a python application that I
> would like to incorporate these features in.  And since the entire
> reason for looking at WBXML is performance and simplicity, the idea
> of using a Java layer in between just doesn't make sense.

I'm not aware of any WBXML libraries for Python. What kind of support
are you looking for?

For a parser, it would probably be most meaningful if a SAX reader was
implemented; for generating WBXML, an algorithm operating on a DOM
tree is probably most useful.

Are you interested in contributing any code in that respect?

Regards,
Martin


From stuartd@alerton.com  Tue Mar 20 18:54:32 2001
From: stuartd@alerton.com (Stuart Donaldson)
Date: Tue, 20 Mar 2001 10:54:32 -0800
Subject: [XML-SIG] Re: WBXML
Message-ID: <A19EEC21DB90D411B40900D0B7B4F8E703F998@alermx.alerton.com>


>-----Original Message-----
>From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
>Sent: Tuesday, March 20, 2001 10:16 AM
>To: Stuart Donaldson
>Cc: xml-sig@python.org
>Subject: Re: [XML-SIG] Re: WBXML
>
>> Thus far everything I have found regarding WBXML and most everything
>> for WAP has been Java based.  But I have a python application that I
>> would like to incorporate these features in.  And since the entire
>> reason for looking at WBXML is performance and simplicity, the idea
>> of using a Java layer in between just doesn't make sense.
>
>I'm not aware of any WBXML libraries for Python. What kind of support
>are you looking for?
>
>For a parser, it would probably be most meaningful if a SAX reader was
>implemented; for generating WBXML, an algorithm operating on a DOM
>tree is probably most useful.
>
>Are you interested in contributing any code to that respect?
>
>Regards,
>Martin

I'm looking for both reading and generating.  I would certainly be willing
to contribute anything I generate if I decide to go this route.  Currently I
am looking at WBXML as one possible solution, with the hoped for advantage
being an existing code base.  If I have to write it all then much of that
advantage goes out the window.

Is there much interest in a WBXML SAX reader implementation?

-Stuart-


From Thomas B. Passin via RT <news-admins@freshmeat.net>  Tue Mar 20 23:27:33 2001
From: Thomas B. Passin via RT <news-admins@freshmeat.net> (Thomas B. Passin via RT)
Date: Tue, 20 Mar 2001 18:27:33 -0500 (EST)
Subject: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML update
Message-ID: <20010320232733.1188F82FFF@mail.freshmeat.net>

yes

----- Original Message -----
From: "daniel pearson via RT" <news-admins@freshmeat.net>
To: "Python XML-SIG" <xml-sig@python.org>
Sent: Tuesday, March 20, 2001 12:24 PM
Subject: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML
update


> The following notes are in response to a recent freshmeat.net submission:
>
> - Martin v. Löwis <loewis@informatik.hu-berlin.de> has requested ownership of
>   the freshmeat listing for Python/XML.  Do you approve of this?
>
> This contribution cannot be processed until you take appropriate action on your
> part and get back to us.
>
> Sincerely,
> daniel pearson
> <news-admins@freshmeat.net>
>
>
> Note: Make sure you include the prefix '[fm #6671]'
> in the subject when replying to this email.
>
>
> --- Headers Follow ---
>
> >From nobody@freshmeat.net  Tue Mar 20 12:23:59 2001
> Return-Path: <nobody@freshmeat.net>
> Delivered-To: news-admins@freshmeat.net
> Received: from www2.freshmeat.net (freshmeat.net [64.28.67.35])
> by mail.freshmeat.net (Postfix) with ESMTP id CC63182FAE
> for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 12:23:59 -0500 (EST)
> Received: by www2.freshmeat.net (Postfix, from userid 65534)
> id E592ED6561; Tue, 20 Mar 2001 12:23:59 -0500 (EST)
> To: news-admins@freshmeat.net
> Subject: [fm #6671] (news-admins) Submission report - Python/XML update
> From: daniel pearson <news-admins@freshmeat.net>
> Message-Id: <20010320172359.E592ED6561@www2.freshmeat.net>
> Date: Tue, 20 Mar 2001 12:23:59 -0500 (EST)
> Sender: nobody@freshmeat.net
>
> -------------------------------------------- Managed by Request Tracker
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>


--- Headers Follow ---

>From tpassin@home.com  Tue Mar 20 18:27:32 2001
Return-Path: <tpassin@home.com>
Delivered-To: news-admins@freshmeat.net
Received: from femail15.sdc1.sfba.home.com (femail15.sdc1.sfba.home.com [24.0.95.142])
	by mail.freshmeat.net (Postfix) with ESMTP id 4354382FFB
	for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 18:27:32 -0500 (EST)
Received: from cj64132b ([24.18.172.124]) by femail15.sdc1.sfba.home.com
          (InterMail vM.4.01.03.20 201-229-121-120-20010223) with SMTP
          id <20010320232732.XSGT23165.femail15.sdc1.sfba.home.com@cj64132b>
          for <news-admins@freshmeat.net>; Tue, 20 Mar 2001 15:27:32 -0800
Message-ID: <001001c0b195$c26953e0$7cac1218@reston1.va.home.com>
From: "Thomas B. Passin" <tpassin@home.com>
To: "daniel pearson via RT" <news-admins@freshmeat.net>
References: <20010320172400.4D3C582FFF@mail.freshmeat.net>
Subject: Re: [XML-SIG] [fm #6671] (news-admins) Submission report - Python/XML update
Date: Tue, 20 Mar 2001 18:30:32 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="Windows-1252"
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MIMEOLE: Produced By Microsoft MimeOLE V5.50.4522.1200

-------------------------------------------- Managed by Request Tracker


From ravi.nagaraja@wipro.com  Wed Mar 21 09:33:09 2001
From: ravi.nagaraja@wipro.com (Ravi Nagaraja)
Date: Wed, 21 Mar 2001 15:03:09 +0530
Subject: [XML-SIG] Cannot install XML package
Message-ID: <3AB87555.E860D2C9@wipro.com>

Hi,

I downloaded the Python version 2.0 and installed.
I then downloaded the python XML package PyXML-0_6_2_win32-py2_0.exe
I also installed the XML package.
I also downloaded the file - PyXML-0_6_2_tar.gz  and extracted it to a
dir.
When i tried to run the command:   python setup.py build  ,
It stops with an error message : No such command : cl.exe

What other files should i have to install the XML package ?

Thanks and regards
Ravi.N


From guenter.radestock@sap.com  Wed Mar 21 11:05:51 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Wed, 21 Mar 2001 12:05:51 +0100
Subject: [XML-SIG] Error handling in PyExpat
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90ED4@dbwdfx14.wdf.sap-ag.de>

Hello,

I am using PyExpat to parse XML files and sometimes these files are not
correct.  If I find an error in my handler (start_element, end_element or
characters), I raise an exception and abort processing the XML file.  If I
raise the exception myself in the handler, parser.ErrorLineNumber (and other
variables describing the error position) are not available to my code
(ErrorLineNumber contains a random value); that is, in the exception handler
that catches my exception.

It should be possible to detect the exception in the expat parser module and
call set_error() in pyexpat.c if the information is available from expat.  I
could not check the expat documentation right now (sourceforge is currently
unavailable and I don't have it locally) but I hope somebody has thought of
this.  Unfortunately the (C level) handlers are void functions so there must
be another way to tell expat that processing has failed.

I have checked my (between PyXML-0.6.3 and 0.6.4) PyExpat source and the
xmlplus sources for the SAX implementation but did not find the code I am
looking for.  Are there plans to implement this or should I do it myself?
What I need is:

If I raise an exception inside a handler, pyexpat.c.set_error() should be
called (or some other function that gets line number, column number, byte
position etc.).  I am not sure if this should be done for every exception or
only for subclasses of expat.error.

Thanks in advance for any help.

- Guenter


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 21 12:07:21 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 21 Mar 2001 13:07:21 +0100
Subject: [XML-SIG] Cannot install XML package
In-Reply-To: <3AB87555.E860D2C9@wipro.com> (ravi.nagaraja@wipro.com)
References: <3AB87555.E860D2C9@wipro.com>
Message-ID: <200103211207.f2LC7LU01830@mira.informatik.hu-berlin.de>

> I downloaded the Python version 2.0 and installed.
> I then downloaded the python XML package PyXML-0_6_2_win32-py2_0.exe

Did you try to run this file? Also, I'd recommend using 0.6.5 instead
of 0.6.2.

> I also installed the XML package.
> I also downloaded the file - PyXML-0_6_2_tar.gz  and extracted it to a
> dir.
> When i tried to run the command:   python setup.py build  ,
> It stops with an error message : No such command : cl.exe
> 
> What other files should i have to install the XML package ?

To install it from sources, you need a C++ compiler (Visual C++).  To
install the binary distribution (.exe), you need only Python 2.0.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 21 13:58:34 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 21 Mar 2001 14:58:34 +0100
Subject: [XML-SIG] Error handling in PyExpat
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90ED4@dbwdfx14.wdf.sap-ag.de>
 (guenter.radestock@sap.com)
References: <FAFE609CB754D311B60C0008C75D355608C90ED4@dbwdfx14.wdf.sap-ag.de>
Message-ID: <200103211358.f2LDwYK02510@mira.informatik.hu-berlin.de>

> I am using PyExpat to parse XML files and sometimes these files are
> not correct.  If I find an error in my handler (start_element,
> end_element or characters), I raise an exception and abort
> processing the XML file.  If I raise the exception my self in the
> handler, parser.ErrorLineNumber (and other variables describing the
> error position) are not available to my code (ErrorLineNumber
> contains a random value); that is in the exception handler that
> catches my exception.

Yes, expat does not support user-identified error lines. However, it
should be possible to propagate such information with the exception
that you raise.
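A sketch of what propagating the position with the exception could look
like, written for a current Python. It assumes the parser object exposes
CurrentLineNumber and CurrentColumnNumber, as pyexpat does in current
Python releases; the class HandlerError, the function parse and the
element name "forbidden" are made up for illustration.

```python
import xml.parsers.expat


class HandlerError(Exception):
    """Hypothetical application error that remembers where it was raised."""

    def __init__(self, message, line, column):
        Exception.__init__(
            self, "%s (line %d, column %d)" % (message, line, column))
        self.line = line
        self.column = column


def parse(data):
    """Parse data, rejecting <forbidden> elements from inside a handler."""
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        if name == "forbidden":
            # Record the parser's current position in the exception itself,
            # since the Error* attributes only describe expat's own errors.
            raise HandlerError("unexpected element %r" % name,
                               parser.CurrentLineNumber,
                               parser.CurrentColumnNumber)

    parser.StartElementHandler = start_element
    parser.Parse(data, True)


caught = None
try:
    parse(b"<root>\n  <forbidden/>\n</root>")
except HandlerError as exc:
    caught = exc
    print(exc)
```

The code that catches the exception then reads the position from the
exception object rather than from the parser.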

> It should be possible to detect the exception in the expat parser
> module and set call set_error() in pyexpat.c if the information is
> available from expat.

Not sure what you mean. set_error generates a Python exception when
the expat parser has produced an error. That has nothing to do with
errors that callback functions might have found.

> Unfortunately the (C level) handlers are void functions so there
> must be another way to tell expat that processing has failed.

I don't think so. This is C, so there is no means of exception
handling. Once a callback is invoked, it is safe to assume that the
XML in itself is correct. You have to let expat finish parsing before
it returns to you (AFAIK).

Of course, once pyexpat has seen a Python exception, all callbacks are
cleared, so no further events get reported.

> I have checked my (between PyXML-0.6.3 and 0.6.4) PyExpat source and
> the xmlplus sources for the SAX implementation but did not find the
> code I am looking for.  Are there plans to implement this or should
> I do it myself?

In expat proper? Not my plan, certainly. In pyexpat? Don't know how.
If you can come up with some code to do what you want, that would be
good.

> If I raise an exception inside a handler, pyexpat.c.set_error()
> should be called
> (or some other function that gets line number, column number, byte position
> etc.).

flag_error is called in that case; I don't think it should manipulate
the user's exception object.

Regards,
Martin


From guenter.radestock@sap.com  Wed Mar 21 15:57:23 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Wed, 21 Mar 2001 16:57:23 +0100
Subject: [XML-SIG] Error handling in PyExpat
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90ED6@dbwdfx14.wdf.sap-ag.de>

> From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
> Sent: Wednesday, 21 March 2001 14:59
> To: Radestock, Guenter
> Cc: XML-SIG@python.org; Faerber, Franz
> Subject: Re: [XML-SIG] Error handling in PyExpat
>
>
> > I am using PyExpat to parse XML files and sometimes these files are
> > not correct.  If I find an error in my handler (start_element,
> > end_element or characters), I raise an exception and abort
> > processing the XML file.  If I raise the exception myself in the
> > handler, parser.ErrorLineNumber (and other variables describing the
> > error position) are not available to my code (ErrorLineNumber
> > contains a random value); that is in the exception handler that
> > catches my exception.
>
> Yes, expat does not support user-identified error lines. However, it
> should be possible to propagate such information with the exception
> that you raise.

Sorry - I missed it somehow.  ErrorLineNumber gave me numbers outside
the document - probably because I called it only after parsing,
but ErrorByteIndex has the right value, at least before I raise the
exception.  The values will be incorrect in the exception handler
because the parser continues, I guess.  Probably the parsing will
continue, but my handlers will not be called anymore because PyExpat
(not Expat itself) knows about the exception?


> > Unfortunately the (C level) handlers are void functions so there
> > must be another way to tell expat that processing has failed.
>
> I don't think so. This is C, so there is no means of exception
> handling. Once a callback is invoked, it is safe to assume that the
> XML in itself is correct. You have to let expat finish parsing before
> it returns to you (AFAIK).

OK, so there is no way to stop Expat when things go south in the C-level
handler (they could have defined the handlers as int instead of void and
stopped parsing when somebody returned -1 ...).
It seems PyExpat can't do any better this way.

Thanks a lot.

- Guenter

PS: if you would stop calling handlers after a handler has raised an
exception, you could freeze ErrorLine, ErrorColumn and ErrorByteIndex to
the values they had when the (Python) handler returned to you.  But it
seems you don't stop calling handlers.  Probably I should do something
like this in my script.


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 21 16:24:41 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 21 Mar 2001 17:24:41 +0100
Subject: [XML-SIG] Error handling in PyExpat
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90ED6@dbwdfx14.wdf.sap-ag.de>
 (guenter.radestock@sap.com)
References: <FAFE609CB754D311B60C0008C75D355608C90ED6@dbwdfx14.wdf.sap-ag.de>
Message-ID: <200103211624.f2LGOfH03075@mira.informatik.hu-berlin.de>

> Sorry - I missed it somehow.  ErrorLineNumber gave me numbers
> outside the document - probably because I called it only after
> parsing, but ErrorByteIndex has the right value, at least before I
> raise the exception.  The values will be incorrect in the exception
> handler because the parser continues, I guess.  Probably the parsing
> will continue, but my handlers will not be called anymore because
> PyExpat (not Expat itself) knows about the exception?

All correct, AFAICT.

> OK so there is no way to stop Expat when things go south in the C level
> handler (they could have defined handlers int instead of void and stopped
> parsing when somebody returned -1 ...).

It looks like that. You may want to report that as a bug, at
sourceforge.net/projects/expat.

> PS: if you would stop calling handlers after a handler has raised an
> exception, you could freeze ErrorLine, ErrorColumn and ErrorByteIndex
> to the values they had when the (Python) handler returned to you.
> But it seems you don't stop calling handlers.

All handlers are cleared in case of an error, so expat should not call
anything anymore. It will still continue to operate until it runs out
of data, or gets to the end of the document, or finds an XML error.

Freezing the error location would be an option, but might not do what
you expect - it would freeze the location of the last error that expat
found, which is not necessarily related to what the application
considers an error. 

If the real problem was a division by zero, or a NameError because of
a typo in the callback - should that propagate into the state of the
expat object?

What you should do is record the current position in the exception
object. It appears that pyexpat does not support retrieving the
*Current* information - any patch to that effect would be appreciated
(*).
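
A minimal sketch of recording the position in the exception, assuming a
pyexpat that does expose CurrentLineNumber, CurrentColumnNumber and
CurrentByteIndex (later Python releases do). HandlerError and
parse_checked are made-up names, and the "bad" element check stands in
for whatever the application considers an error:

```python
import xml.parsers.expat


class HandlerError(Exception):
    """Application-level error that carries the parse position with it."""

    def __init__(self, message, line, column, byte_index):
        Exception.__init__(self, message)
        self.line = line
        self.column = column
        self.byte_index = byte_index


def parse_checked(data):
    """Parse data; raise HandlerError if a (hypothetical) 'bad' element is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        if name == "bad":
            # Read the position now, while the parser still points at the
            # element that the application objects to.
            raise HandlerError("unexpected element %r" % name,
                               parser.CurrentLineNumber,
                               parser.CurrentColumnNumber,
                               parser.CurrentByteIndex)

    parser.StartElementHandler = start_element
    parser.Parse(data, True)
```

The code that catches HandlerError then reads the position from the
exception and does not have to ask the parser, whose state may have
moved on by then.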

Regards,
Martin

(*) I don't know *why* it does not expose XML_GetCurrentLineNumber
etc; perhaps earlier versions did not support it? That might need some
investigation.


From noreply@sourceforge.net  Thu Mar 22 00:35:31 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 21 Mar 2001 16:35:31 -0800
Subject: [XML-SIG] [ pyxml-Patches-410416 ] Minor C fixes to PyXML 0.6.5
Message-ID: <E14ft4d-0000o3-00@usw-sf-web3.sourceforge.net>

Patches item #410416, was updated on 2001-03-21 16:35
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=410416&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: The Written Word (china) (tww-china)
Assigned to: Nobody/Anonymous (nobody)
Summary: Minor C fixes to PyXML 0.6.5

Initial Comment:
Some of the C functions have a semicolon at the end.
The patch at
ftp://ftp.thewrittenword.com/outgoing/pub/PyXML-0.6.5.patch
fixes them.

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=410416&group_id=6473


From represearch@yahoo.com  Wed Mar 21 19:52:33 2001
From: represearch@yahoo.com (reptile research)
Date: Wed, 21 Mar 2001 19:52:33
Subject: [XML-SIG] (no subject)
Message-ID: <E14fuF6-0000t1-00@mail.python.org>


From alexandre.fayolle@free.fr  Thu Mar 22 11:33:19 2001
From: alexandre.fayolle@free.fr (Alexandre Fayolle)
Date: Thu, 22 Mar 2001 12:33:19 +0100 (MET)
Subject: [XML-SIG] 4DOM compliance potential problem
Message-ID: <985260799.3ab9e2ff426d3@imp.free.fr>

Sourceforge seems to be down right now, so I post this directly to the list.

I am working on an XML-Java course, and the existence of the specified
attribute of the Attr interface was just brought to my attention.

It seems to me that 4DOM does not comply with the spec regarding this point.
OTOH, the intended behaviour seems a real pain to implement (requires access
to the DTD when using validation, since the required info is not available
from a SAX interface).

I'm quite happy with the current implementation, but maybe this
non-compliance should be documented somewhere.

Alexandre 'freezing in London' Fayolle
--
http://alexandre.fayolle.free.fr
http://www.logilab.org
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From martin@loewis.home.cs.tu-berlin.de  Thu Mar 22 13:32:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 22 Mar 2001 14:32:24 +0100
Subject: [XML-SIG] 4DOM compliance potential problem
In-Reply-To: <985260799.3ab9e2ff426d3@imp.free.fr> (message from Alexandre
 Fayolle on Thu, 22 Mar 2001 12:33:19 +0100 (MET))
References: <985260799.3ab9e2ff426d3@imp.free.fr>
Message-ID: <200103221332.f2MDWOK03193@mira.informatik.hu-berlin.de>

> It seems to me that 4DOM does not comply with the spec regarding this
> point.  OTOH, the intended behaviour seems a real pain to implement
> (requires an access to the DTD when using validation, since the
> required info is not available from a SAX interface)

A primary problem is that SAX does not support reporting whether the
information came from the DTD or from the document, see

http://lists.xml.org/archives/xml-dev/200102/msg00761.html

David Megginson has no intent to add it to SAX (or to continue
development of SAX, for that matter).

Even *if* that information was available through SAX, you still need a
validating parser to properly build the DOM tree - a non-validating
parser would not guess that an absent attribute might need to appear
in the tree.

So it appears that the DOM requires a parser to read the DTD. However,
they also write

# XML does not mandate that a non-validating XML processor read and
# process entity declarations made in the external subset or declared
# in external parameter entities.

In turn, I'd say that it is actually a bug in the DOM spec to mandate
that the specified attribute "works" - it should be a three-state
value: yes, no, maybe, and Attr nodes for unspecified but defaulted
attributes should not be mandated.

Regards,
Martin


From uche.ogbuji@fourthought.com  Thu Mar 22 14:25:44 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 22 Mar 2001 07:25:44 -0700
Subject: [XML-SIG] 4DOM compliance potential problem
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Thu, 22 Mar 2001 14:32:24 +0100." <200103221332.f2MDWOK03193@mira.informatik.hu-berlin.de>
Message-ID: <200103221425.HAA01411@localhost.localdomain>

> > It seems to me that 4DOM does not comply with the spec regarding this
> > point.  OTOH, the intended behaviour seems a real pain to implement
> > (requires an access to the DTD when using validation, since the
> > required info is not available from a SAX interface)
> 
> A primary problem is that SAX does not support reporting whether the
> information came from the DTD or from the document, see
> 
> http://lists.xml.org/archives/xml-dev/200102/msg00761.html

Yes.  Lack of info from the low-level parsers has always been the problem here 
(we haven't written a dom.ext.readers.Xmlproc yet).

> So it appears that the DOM requires a parser to read the DTD. However,
> they also write
> 
> # XML does not mandate that a non-validating XML processor read and
> # process entity declarations made in the external subset or declared
> # in external parameter entities.
> 
> In turn, I'd say that it is actually a bug in the DOM spec to mandate
> that the specified attribute "works" - it should be a three-state
> value: yes, no, maybe, and Attr nodes for unspecified but defaulted
> attributes should not be mandated.

We complained to www-dom about this years ago, but all the discussion didn't, 
apparently, lead them to reconsider this.  I must confess that I've tended to 
wave off that particular corner of DOM madness since then.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jh@web.de  Fri Mar 23 09:30:32 2001
From: jh@web.de (Juergen Hermann)
Date: Fri, 23 Mar 2001 10:30:32 +0100
Subject: [XML-SIG] SAX Serializer
Message-ID: <m14gNtw-000tCWC@smtp.web.de>

Hi!

Is there any means in PyXML or other sources to serialize a SAX stream 
(i.e. w/o building an intermediary DOM tree)?


Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/


From martin@loewis.home.cs.tu-berlin.de  Fri Mar 23 11:12:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 23 Mar 2001 12:12:36 +0100
Subject: [XML-SIG] SAX Serializer
In-Reply-To: <m14gNtw-000tCWC@smtp.web.de> (jh@web.de)
References: <m14gNtw-000tCWC@smtp.web.de>
Message-ID: <200103231112.f2NBCaF00792@mira.informatik.hu-berlin.de>

> Is there any means in PyXML or other sources to serialize a SAX stream 
> (i.e. w/o building an intermediary DOM tree)?

Sure. Pass it to a xml.sax.saxutils.XMLGenerator, and save the XML
document.

Not sure what kind of serialization you had in mind; this might
actually be one of the more efficient and compact options (compared
to, say, pickling something).
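
A small sketch of what I mean, written against the xml.sax modules in the
Python standard library; reserialize is a made-up helper name, and any
SAX event source could take the place of parseString here:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reserialize(xml_bytes):
    """Feed SAX events straight into an XMLGenerator; no DOM tree is built."""
    out = io.StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    # The parser is only the event source in this sketch; the generator
    # writes each event to `out` as soon as it arrives.
    xml.sax.parseString(xml_bytes, handler)
    return out.getvalue()
```

Calling reserialize(b'<a b="1">text</a>') returns the document as a
string again, preceded by an XML declaration.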

Regards,
Martin


From akuchlin@mems-exchange.org  Sat Mar 24 03:08:08 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 23 Mar 2001 22:08:08 -0500
Subject: [XML-SIG] iso8601: re-creating an original date II
Message-ID: <20010323220808.A13896@newcnri.cnri.reston.va.us>

I'm about halfway through my proposed course of enhancing iso8601.py to note
which portions of a date were supplied, but clearly it's reinventing the
wheel. The ISO8601Date class needs a converter to and from 9-tuples, seconds
since the epoch, and string format, and it all feels like I'm reimplementing
the C library or mxDateTime -- badly -- so I'm abandoning the effort.
mxDateTime already includes an ISO-8601 class; from the docs it doesn't seem
to support round-tripping, but that could be added, and it's probably less
work than recreating lots of complicated time handling code.
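
To make the round-tripping idea concrete, here is a toy sketch (not the
iso8601.py code, and not mxDateTime). PartialDate is a made-up class that
handles only the YYYY, YYYY-MM and YYYY-MM-DD forms and remembers which
fields were supplied, so the original string can be re-created:

```python
import re

# Only the three extended calendar-date forms; a deliberate simplification.
_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


class PartialDate:
    """A date that records which of year/month/day the source supplied."""

    def __init__(self, year, month=None, day=None):
        self.year, self.month, self.day = year, month, day

    @classmethod
    def parse(cls, text):
        match = _DATE.fullmatch(text)
        if match is None:
            raise ValueError("unsupported ISO 8601 date: %r" % text)
        # Fields missing from the input stay None instead of defaulting to 1.
        fields = [int(g) if g is not None else None for g in match.groups()]
        return cls(*fields)

    def __str__(self):
        parts = ["%04d" % self.year]
        if self.month is not None:
            parts.append("%02d" % self.month)
            if self.day is not None:
                parts.append("%02d" % self.day)
        return "-".join(parts)
```

With this, str(PartialDate.parse(s)) gives back s for each supported
form, which a conversion through a 9-tuple cannot do because the missing
fields get filled in.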

--amk


From mal@lemburg.com  Sat Mar 24 14:21:00 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 24 Mar 2001 15:21:00 +0100
Subject: [XML-SIG] iso8601: re-creating an original date II
References: <20010323220808.A13896@newcnri.cnri.reston.va.us>
Message-ID: <3ABCAD4C.7462AE91@lemburg.com>

Andrew Kuchling wrote:
> 
> I'm about halfway through my proposed course of enhancing iso8601.py to note
> which portions of a date were supplied, but clearly it's reinventing the
> wheel. The ISO8601Date class needs a converter to and from 9-tuples, seconds
> since the epoch, and string format, and it all feels like I'm reimplementing
> the C library or mxDateTime -- badly -- so I'm abandoning the effort.
>
> mxDateTime already includes an ISO-8601 class; from the docs it doesn't seem
> to support round-tripping, but that could be added, and it's probably less
> work than recreating lots of complicated time handling code.

mxDateTime has an ISO 8601 parser, not a special ISO 8601 class.

I am not sure what you mean by "round-tripping" -- could you
explain?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Pages:                           http://www.lemburg.com/python/


From noreply@sourceforge.net  Mon Mar 26 12:37:47 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 26 Mar 2001 04:37:47 -0800
Subject: [XML-SIG] [ pyxml-Bugs-411350 ] 4XSLT xsl:attribute name not required
Message-ID: <E14hWFn-0007UY-00@usw-sf-web2.sourceforge.net>

Bugs item #411350, was updated on 2001-03-26 04:37
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=411350&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: 4XSLT xsl:attribute name not required

Initial Comment:
Version used : 4Suite 0.10.2

Using <xsl:attribute> without a name attribute is
accepted by 4XSLT, whereas name is required by the spec
(XSLT 1.0 §7.1.3)

Sample code

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:transform
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'>

<xsl:template match="/*">
<foo><xsl:attribute>banzai</xsl:attribute></foo>
</xsl:template>
</xsl:transform>

When applied to any well formed document with 4XSLT,
the following output is given:

<?xml version='1.0' encoding='UTF-8'?>
<foo ='banzai'/>

4XSLT should raise an exception.
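
For illustration only, the kind of check meant here can be sketched
outside 4XSLT with the standard library's ElementTree. This is not 4XSLT
code, and attributes_missing_name is a made-up name:

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def attributes_missing_name(stylesheet_bytes):
    """Return the xsl:attribute elements that lack the required name attribute."""
    root = ET.fromstring(stylesheet_bytes)
    tag = "{%s}attribute" % XSL_NS
    return [elem for elem in root.iter(tag) if "name" not in elem.attrib]
```

A processor could raise an exception whenever this list is non-empty for
the stylesheet being loaded.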


Cheers,

Alexandre 'back from the UK' Fayolle

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=411350&group_id=6473


From stuff4gary@hotmail.com  Mon Mar 26 23:43:49 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Mon, 26 Mar 2001 23:43:49
Subject: [XML-SIG] After installing python2.0 what other packages should I intsall for XML?
Message-ID: <F158h8s4KWiGtXsL1c000000508@hotmail.com>

Hello people,

I want to produce a web image database using XML and python.
If anyone has already done this I would be grateful if they could recommend 
what I should install on Win 2000 and learn how to use?

I want to use XML as a way of identifying my images and I want people to be
able to edit my image descriptions from simple forms...  And to be able to
collect them to add comments etc.


Gary




From jsydik@virtualparadigm.com  Tue Mar 27 02:48:51 2001
From: jsydik@virtualparadigm.com (Jeremy J. Sydik)
Date: Mon, 26 Mar 2001 20:48:51 -0600
Subject: [XML-SIG] After installing python2.0 what other packages should I intsall for XML?
In-Reply-To: <F158h8s4KWiGtXsL1c000000508@hotmail.com>
Message-ID: <MMEHLOIJDENFKMFKBPHEEEBECDAA.jsydik@virtualparadigm.com>

It depends on if you've installed Python already or not.  If you have,
the package at pyxml.sourceforge.net should get you up and running.  If
not, I've found that the activestate distribution has been nice to work
with and maintain on Win2K (particularly because of the package database
accessible through their site)



From martin@loewis.home.cs.tu-berlin.de  Tue Mar 27 13:52:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 27 Mar 2001 15:52:31 +0200
Subject: [XML-SIG] Unicode support in xmlproc
Message-ID: <200103271352.f2RDqVx01477@mira.informatik.hu-berlin.de>

I have committed a few changes to xmlproc which make it generate
Unicode strings, and deal with most aspects of character sets in XML
correctly (with respect to the recommendation). In particular, it
honors the encoding attribute of the xml declaration and performs the
optional autodetection of an encoding. Encoding information provided
from a higher level (e.g. MIME content type) is still for further
study (offering a set_input_encoding on the XMLCommonParser might be
appropriate).

On Python 1.5, a fallback procedure is used which only supports a
subset of the character sets (namely, US-ASCII, UTF-8, and Latin-1);
the application then receives UTF-8 encoded byte strings from xmlproc.
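
As a rough illustration of the autodetection step (this is not the code
committed to xmlproc), the sketch below looks at a byte order mark or the
first bytes of the document and then at the encoding declaration.
sniff_encoding is a made-up name, and UCS-4 and EBCDIC input are left out:

```python
import re

_DECL = re.compile(
    rb'''<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']''')


def sniff_encoding(data):
    """Guess the encoding of an XML document from its first bytes."""
    # A byte order mark settles the question first.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # No BOM: "<?" in 16-bit code units gives the byte order away.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Otherwise honour the encoding named in the XML declaration, if any.
    match = _DECL.match(data)
    if match is not None:
        return match.group(1).decode("ascii")
    # XML defaults to UTF-8 when nothing else is known.
    return "utf-8"
```

Encoding information from a higher level, such as a MIME content type,
would have to take precedence over this guess.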

AFAIK, the only missing aspect is proper support for Unicode in tag
and attribute names; XML allows for a quite long list of characters,
and I'm not sure how to best implement that. If anybody has an sre
regular expression that correctly matches the Name production of XML,
please let me know.

This code has seen only little testing, so I'm pretty sure that there
are bugs in it. If you find any problems, please post them to the list
or on SF; ideally, the major problems should be resolved before 0.7 is
released. Unfortunately, running the testsuite with xmlproc as the
default parser does no good: many test cases expect an
IncrementalParser, and drv_xmlproc is not incremental.

Regards,
Martin


From larsga@garshol.priv.no  Tue Mar 27 14:38:02 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 27 Mar 2001 16:38:02 +0200
Subject: [XML-SIG] Unicode support in xmlproc
In-Reply-To: <200103271352.f2RDqVx01477@mira.informatik.hu-berlin.de>
References: <200103271352.f2RDqVx01477@mira.informatik.hu-berlin.de>
Message-ID: <m31yrj46r9.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
|
| AFAIK, the only missing aspect is proper support for Unicode in tag
| and attribute names; XML allows for a quite long list of characters,
| and I'm not sure how to best implement that. If anybody has an sre
| regular expression that correctly matches the Name production of XML,
| please let me know.

The question is also what the performance of that would be. Name
matching is performed very very often, so any changes here strongly
affect the overall performance of xmlproc.

It may also be that we want to use a dictionary of characters for
this. I think several avenues need to be explored here to find the
best approach.
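
To make the two avenues concrete, here is a sketch of both built from the
same range table. The table is only an abbreviated subset of the XML 1.0
character classes (ASCII and Latin-1 letters, ASCII digits, and the
middle dot as the only extender); the real productions are far longer.
All names are made up, and a set stands in for the dictionary:

```python
import re

# Assumption: abbreviated ranges, not the full BaseChar/Digit productions.
LETTER_RANGES = [(0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF)]
DIGIT_RANGES = [(0x30, 0x39)]


def _ranges(ranges):
    """Render (low, high) pairs as the inside of a regex character class."""
    return "".join("\\u%04x-\\u%04x" % pair for pair in ranges)


def _expand(ranges):
    """Expand (low, high) pairs into a set of single characters."""
    return {chr(code) for low, high in ranges for code in range(low, high + 1)}


# Avenue 1: one compiled regular expression.
_START = _ranges(LETTER_RANGES) + "_:"
_REST = _START + _ranges(DIGIT_RANGES) + ".\\-\\u00b7"
NAME_RE = re.compile("[%s][%s]*" % (_START, _REST))


def is_name_regex(text):
    return NAME_RE.fullmatch(text) is not None


# Avenue 2: per-character membership tests against precomputed sets.
START_CHARS = _expand(LETTER_RANGES) | set("_:")
NAME_CHARS = START_CHARS | _expand(DIGIT_RANGES) | set(".-\u00b7")


def is_name_lookup(text):
    return (bool(text) and text[0] in START_CHARS
            and all(char in NAME_CHARS for char in text[1:]))
```

Which of the two is faster would have to be measured on real documents;
the sketch only shows that both can share one table.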
 
| Unfortunately, running the testsuite with xmlproc as the default
| parser does no good: many test cases expect an IncrementalParser,
| and drv_xmlproc is not incremental.

That's probably easy to fix, since xmlproc is incremental.

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 27 16:57:43 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: 27 Mar 2001 18:57:43 +0200
Subject: [XML-SIG] Unicode support in xmlproc
In-Reply-To: <m33dbz46rg.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 27 Mar 2001 16:37:55 +0200)
Message-ID: <200103271711.f2RHBtY02942@mira.informatik.hu-berlin.de>

> The question is also what the performance of that would be. Name
> matching is performed very very often, so any changes here strongly
> affect the overall performance of xmlproc.

That is certainly a problem. I had the hope that the Unicode character
classes of Python 2.0 are related to what a BaseChar is in XML, but
that turned out to be wrong: XML uses Unicode 2.0; the Python tables
are based on Unicode 3.0. Also, many letters have been excluded from
BaseChar which count as letters in Unicode.

> It may also be that we want to use a dictionary of characters for
> this. I think several avenues need to be explored here to find the
> best approach.

Indeed; I'll see what I can come up with.

> That's probably easy to fix, since xmlproc is incremental.

I'll look into that as well.

Regards,
Martin


From Lance_Hill/OLS.OLS@olsinc.net  Tue Mar 27 18:00:59 2001
From: Lance_Hill/OLS.OLS@olsinc.net (Lance_Hill/OLS.OLS@olsinc.net)
Date: Tue, 27 Mar 2001 13:00:59 -0500
Subject: [XML-SIG] Swap images for text elements using xsl?
Message-ID: <OFE83DE0B4.75622DE6-ON85256A1C.0062F790@olsinc.net>

Hi all,

I have a page generated from a database which I would like to use to show
status information. Currently, I am using a table to display the text
wrapped in each tag (generally a Y or N), but I would prefer to use an
image selected depending on the text element in each tag. I had thought
that using a choose/when would work well, but I cannot figure out where to
put it in the included code. Also, can I just substitute an
'<IMG SRC="Picture.gif"/>' for the "Y" or "N", leaving it as text and
letting the browser handle the HTML? I can't seem to get it to even allow
me to switch the text which is displaying. Any assistance would be greatly
appreciated.


<xsl:template match="/">
<HTML>
<BODY>
<table border="1">
<tr>
  <th>New Client Status</th>
...etc.
</tr>


 <xsl:for-each select="status_report/agency">
  <tr>
  <td>
  <xsl:value-of select="new_client_status" />
  </td>
...etc.

  </tr>
  </xsl:for-each>
  </table>
  </BODY>
  </HTML>
  </xsl:template>
  </xsl:stylesheet>


Thanks,
Lance M. Hill


From r.burton@180sw.com  Tue Mar 27 20:19:14 2001
From: r.burton@180sw.com (Ross Burton)
Date: 27 Mar 2001 21:19:14 +0100
Subject: [XML-SIG] Metadata in XBEL
Message-ID: <985724354.4243.0.camel@eddie>

Hi,

I am involved in adding support for XBEL to Galeon, the GNOME Mozilla
Gecko based browser.  Initial export is working, but there are several
issues related to the metadata elements which I would like some
clarification about.

The specification for metadata is vague, <metadata> elements have a
"owner" attribute which should be a URI. But what forms of metadata are
valid? The DTD implies that there can be no children of metadata
elements (the content is EMPTY).

Currently Galeon-specific attributes are exported as follows:

<site ...>
  <info>
    <metadata owner="http://galeon.sourceforge.net/pixmap">
        /home/users/ross/pictures/slashdot.png
    </metadata>
  </info>
</site>

It does seem that the DTD is in error as requiring all metadata to be in
the owner attribute is rather limiting.  But what content is allowed as
children of the metadata element? Just text? Or could the content of a
metadata element be a free-form XML tree?  For example:

<site ...>
  <info>
    <metadata owner="http://galeon.sourceforge.net">
      <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
      <toolbar>true</toolbar>
    </metadata>
  </info>
</site>

This, although allowing more free-form data, is heavier for the
application. We intend to be "polite" to XBEL data and store any unknown
metadata so that it can be written out again - if only text is allowed
this is trivial, otherwise tree fragments have to be stored.

I hope this can be cleared up,
Regards,
Ross Burton


From martin@loewis.home.cs.tu-berlin.de  Tue Mar 27 21:30:51 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 27 Mar 2001 23:30:51 +0200
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <985724354.4243.0.camel@eddie> (message from Ross Burton on 27
 Mar 2001 21:19:14 +0100)
References: <985724354.4243.0.camel@eddie>
Message-ID: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>

> The specification for metadata is vague, <metadata> elements have a
> "owner" attribute which should be a URI. But what forms of metadata are
> valid? The DTD implies that there can be no children of metadata
> elements (the content is EMPTY).
[...]
> It does seem that the DTD is in error as requiring all metadata to be in
> the owner attribute is rather limiting.

That is clearly not the intent, so I'd agree that the DTD is in error.
You'll have to ask Fred Drake to be sure; I *think* the idea was that
metadata has a content model of ANY. The documentation makes it clear
that "owner" is just to tell apart the various sources which may put
metadata into the bookmark list:

      The \element{metadata} element is used as a container for all
      auxillary information related to a node which belongs to a
      single metadata scheme.  The specific contents of
      \element{metadata} is highly dependent on the metadata scheme
      which applies; XML namespaces should be used to identify
      explicit markup used within the element.

So the intent clearly is that content within the metadata element is
possible, and may use XML markup.

That, of course, would mean that a version 1.1 of XBEL needs to be
issued, so perhaps this is the time to think about other pending
improvements.

> This, although allowing more free-form data, is heavier for the
> application. We intend to be "polite" to XBEL data and store any
> unknown metadata so that it can be written out again - if only text
> is allowed this is trivial, otherwise tree fragments have to be
> stored.

You don't necessarily have to store tree fragments; you just need to
find the matching closing tag. I don't know how your parsing
technology works, but it seems that restricting it to text cannot be such
a big simplification: you have to do normal XML parsing, otherwise you
won't properly deal with CDATA sections and other XML "features".
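
One way to keep unknown metadata without holding a tree is to
re-serialize everything between the opening and the matching closing
metadata tag into a string. A rough sketch with the SAX classes from the
Python standard library; MetadataKeeper is a made-up name, namespaces are
ignored, and I don't claim this fits Galeon's parser:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator


class MetadataKeeper(ContentHandler):
    """Collect each <metadata> element as an (owner, serialized XML) pair."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.fragments = []
        self._depth = 0        # nesting depth inside the current <metadata>
        self._buffer = None
        self._writer = None
        self._owner = None

    def startElement(self, name, attrs):
        if self._writer is None and name == "metadata":
            # Start capturing: events go to a generator writing into a buffer.
            self._owner = attrs.get("owner")
            self._buffer = io.StringIO()
            self._writer = XMLGenerator(self._buffer)
        if self._writer is not None:
            self._depth += 1
            self._writer.startElement(name, attrs)

    def endElement(self, name):
        if self._writer is None:
            return
        self._writer.endElement(name)
        self._depth -= 1
        if self._depth == 0:
            # Matching closing tag reached: keep the text, stop capturing.
            self.fragments.append((self._owner, self._buffer.getvalue()))
            self._writer = None

    def characters(self, content):
        if self._writer is not None:
            self._writer.characters(content)
```

The stored strings can be written back verbatim when the bookmark file is
saved, so the application never has to understand their content.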

Regards,
Martin


From dieter@handshake.de  Tue Mar 27 20:57:09 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Tue, 27 Mar 2001 22:57:09 +0200 (CEST)
Subject: [XML-SIG] Unicode support in xmlproc
In-Reply-To: <138016683@toto.iv>
Message-ID: <15040.65189.27587.140130@lindm.dm>

Great!

Dieter


From ross@180sw.com  Tue Mar 27 22:26:40 2001
From: ross@180sw.com (Ross Burton)
Date: Tue, 27 Mar 2001 23:26:40 +0100 (BST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
Message-ID: <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>

On Tue, 27 Mar 2001, Martin v. Loewis wrote:

> > The specification for metadata is vague, <metadata> elements have a
> > "owner" attribute which should be a URI. But what forms of metadata are
> > valid? The DTD implies that there can be no children of metadata
> > elements (the content is EMPTY).
> [...]
> > It does seem that the DTD is in error as requiring all metadata to be in
> > the owner attribute is rather limiting.
> 
> That is clearly not the intent, so I'd agree that the DTD is in error.
> You'll have to ask Fred Drake to be sure; I *think* the idea was that
> metadata has a content model of ANY. The documentation makes it clear
> that "owner" is just to tell apart the various sources which may put
> metadata into the bookmark list:
<snip>
> So the intent clearly is that content within the metadata element is
> possible, and may use XML markup.

Thought so.

> That, of course, would mean that a version 1.1 of XBEL needs to be
> issued, so perhaps this is the time to think about other pending
> improvements.

Less child nodes (such as <title>) and more attributes? :-) Only kidding
but libxml (aka gnome-xml) isn't that good with child nodes. I miss W3C
DOM...

Thanks for the mail, it's confirmed what I thought (and feared... :-)

Ross
-- 
Ross Burton                     Software Engineer
OneEighty Software Ltd          Tel: +44 20 8263 2332
The Lansdowne Building          Fax: +44 20 8263 6314
2 Lansdowne Road                r.burton@180sw.com
Croydon, Surrey CR9 2ER, UK     http://www.180sw.com./
====================================================================
Under the Regulation of Investigatory Powers (RIP) Act 2000 together
with any and all Regulations in force pursuant to the Act OneEighty
Software Ltd reserves the right to monitor any or all incoming or
outgoing communications as provided for under the Act


From cce@clarkevans.com  Wed Mar 28 06:33:11 2001
From: cce@clarkevans.com (Clark C. Evans)
Date: Wed, 28 Mar 2001 01:33:11 -0500 (EST)
Subject: [XML-SIG] Simple wxPython XSLT Testing Tool using MSXML
Message-ID: <Pine.LNX.4.21.0103280128060.13057-100000@clarkevans.com>

Title says it all... I was wondering if Uche or someone
else would replace "TransformStage" so that it used
the FourThought XSLT processor...

One item... the file "msxml" is generated using makepy,
I'm opting for the "very static" approach; this could
be written to use the dynamic methods as detailed on P204
of the Python Programming On Windows.  I'm not using
this method since I have a very small stripped down
"msxml" that only has the items I need for my program.

This is a development tool that I've been using to 
test XSLT.  

Share & Enjoy,

Clark

P.S.  coding comments would be greatly appreciated
      as I'm relatively new to python....

...

import sys, os
import msxml
import pythoncom
from   wxPython.wx         import *
from   wxPython.html       import *
from   wxPython.lib        import wxpTag

class OutputHtmlWindow(wxHtmlWindow):
    def __init__(self, parent, id):
        wxHtmlWindow.__init__(self, parent, id)

    def OnLinkClicked(self, linkinfo):
        self.base_OnLinkClicked(linkinfo)

    def OnSetTitle(self, title):
        self.base_OnSetTitle(title)

class OpenFileLine(wxWindow):
    def __init__(self,parent,id,pos,mode,label):
        wxWindow.__init__(self,parent,id,pos,wxSize(400,30))
        self.mode = mode
        wxStaticText(self, -1, label,
                     wxPoint(0,5),wxSize(75,25),wxALIGN_RIGHT)
        self.text = wxTextCtrl(self,-1,"",wxPoint(75,0),wxSize(200,25))
        wxButton(self,10,"Change",wxPoint(275,0))
        EVT_BUTTON(self, 10, self.OnClick)
        # The bare name below is a no-op expression, so OnSetFocus is
        # never bound to the focus event.
        EVT_SET_FOCUS

    def OnClick(self, event):
        dlg = wxFileDialog(self, "Choose a file", ".", "*.*",
                           "*.xml,*.xslt,*.xsl,*.html,*.xhtml", self.mode)
        if dlg.ShowModal() == wxID_OK:
            if os.path.exists(dlg.GetPath()):
                self.text.SetValue(dlg.GetPath())
        dlg.Destroy()

    def OnSetFocus(self,event):
        wxMessageBox("Focus!")
        self.text.SetFocus()

    def SetValue(self,str):
        self.text.SetValue(str)
        self.text.SetInsertionPoint(0)

    def GetValue(self): return self.text.GetValue()


class OpenFileDlg(wxDialog):
    def __init__(self, parent,mode):
        # Compare the style flags by value; "is" only tests object identity.
        if mode == wxOPEN:
            title = "Open Files"
        elif mode == wxSAVE:
            title = "Save Files"
        else:
            raise TypeError("Expected wxOPEN or wxSAVE")
        wxDialog.__init__(self, parent, -1, title,
                          wxDefaultPosition, wxSize(450, 250))
        self.mode = mode
        self.metaxslt = OpenFileLine(self, -1, wxPoint(20,5), self.mode,
                                     "Meta X&SLT:")
        self.metadata = OpenFileLine(self, -1, wxPoint(20,35), self.mode,
                                     "Meta D&ATA:")
        self.mainxslt = OpenFileLine(self, -1, wxPoint(20,65), self.mode,
                                     "Main &XSLT:")
        self.maindata = OpenFileLine(self, -1, wxPoint(20,95), self.mode,
                                     "Main &DATA:")
        self.output   = OpenFileLine(self, -1, wxPoint(20,125), self.mode,
                                     "&Output:")
        wxButton(self, wxID_OK,     " OK ", wxPoint(75, 175),
                 wxDefaultSize).SetDefault()
        wxButton(self, wxID_CANCEL, " Cancel ", wxPoint(200, 175),
                 wxDefaultSize)

    def ReadNames(self,conf):
        self.metaxslt.SetValue(conf.Read("metaxslt"))
        self.metadata.SetValue(conf.Read("metadata"))
        self.mainxslt.SetValue(conf.Read("mainxslt"))
        self.maindata.SetValue(conf.Read("maindata"))
        self.output.SetValue(conf.Read("output"))

    def WriteNames(self,conf):
        conf.Write("metaxslt",self.metaxslt.GetValue())
        conf.Write("metadata",self.metadata.GetValue())
        conf.Write("mainxslt",self.mainxslt.GetValue())
        conf.Write("maindata",self.maindata.GetValue())
        conf.Write("output",self.output.GetValue())

class MainFrame(wxFrame):
    def __init__(self, parent, ID, title):
        ID_ABOUT = 101
        ID_OPEN  = 102
        ID_SAVE  = 103
        ID_META  = 104
        ID_TRAN  = 105
        ID_PRINT = 106
        ID_EXIT  = 107

        wxFrame.__init__(self, parent, ID, title,
                         wxDefaultPosition, wxSize(600, 400))

        self.CreateStatusBar()
        self.SetStatusText("This is the statusbar")
        menu = wxMenu()
        menu.Append(ID_ABOUT, "&About",
                    "More information about this program")
        menu.Append(ID_OPEN, "&Open\tCtrl+O",
                    "Open files.")
        menu.Append(ID_SAVE, "&Save\tCtrl+S",
                    "Save files." )
        menu.Append(ID_TRAN, "&Transform\tCtrl+T",
                    "Run XSLT Transform")
        menu.Append(ID_META, "&Meta\tCtrl+M",
                    "Show meta frame")
        menu.Append(ID_PRINT, "&Print\tCtrl+P",
                    "Print HTML." )
        menu.AppendSeparator()
        menu.Append(ID_EXIT, "E&xit", "Terminate the program")
        menuBar = wxMenuBar()
        menuBar.Append(menu, "&File");
        self.SetMenuBar(menuBar)
        EVT_MENU(self, ID_ABOUT, self.OnAbout)
        EVT_MENU(self, ID_OPEN,  self.OnOpen)
        EVT_MENU(self, ID_SAVE,  self.OnSave)
        EVT_MENU(self, ID_META,  self.ShowMeta)
        EVT_MENU(self, ID_EXIT,  self.TimeToQuit)
        EVT_MENU(self, ID_PRINT, self.OnPrint)
        EVT_MENU(self, ID_TRAN,  self.OnTransform)

        self.primary    = wxSplitterWindow(self, -1, wxDefaultPosition,
                                           wxDefaultSize, wxSP_3D)
        self.meta       = wxSplitterWindow(self.primary, -1, wxDefaultPosition,
                                           wxDefaultSize, wxSP_3D)
        self.metaxslt  = wxTextCtrl(self.meta, -1, "", wxDefaultPosition,
                                    wxDefaultSize,
                                    wxTE_MULTILINE|wxSUNKEN_BORDER)
        self.metadata  = wxTextCtrl(self.meta, -1, "", wxDefaultPosition,
                                    wxDefaultSize,
                                    wxTE_MULTILINE|wxSUNKEN_BORDER)
        self.meta.SplitHorizontally(self.metaxslt,self.metadata)
        self.meta.SetMinimumPaneSize(100)
        self.meta.Show(0)

        self.secondary  = wxSplitterWindow(self.primary, -1, wxDefaultPosition,
                                           wxDefaultSize, wxSP_3D)

        self.main       = wxSplitterWindow(self.secondary, -1,
                                           wxDefaultPosition, wxDefaultSize,
                                           wxSP_3D)
        self.mainxslt  = wxTextCtrl(self.main, -1, "", wxDefaultPosition,
                                    wxDefaultSize,
                                    wxTE_MULTILINE|wxSUNKEN_BORDER)
        self.maindata  = wxTextCtrl(self.main, -1, "", wxDefaultPosition,
                                    wxDefaultSize,
                                    wxTE_MULTILINE|wxSUNKEN_BORDER)
        self.main.SplitHorizontally(self.mainxslt,self.maindata)
        self.main.SetMinimumPaneSize(100)

        self.out        = wxSplitterWindow(self.secondary, -1,
                                           wxDefaultPosition, wxDefaultSize,
                                           wxSP_3D)
        self.output   = wxTextCtrl(self.out, -1, "", wxDefaultPosition,
                                   wxDefaultSize,
                                   wxTE_MULTILINE|wxSUNKEN_BORDER)
        self.html = OutputHtmlWindow(self.out, -1)
        self.html.SetRelatedFrame(self, "wxXSLT: %s")
        self.html.SetRelatedStatusBar(0)

        self.out.SplitHorizontally(self.output,self.html)
        self.out.SetMinimumPaneSize(100)

        self.primary.SplitVertically(self.meta,self.secondary)
        self.primary.SetMinimumPaneSize(0)
        self.primary.SetSashPosition(0)

        self.secondary.SplitVertically(self.main,self.out)
        self.secondary.SetMinimumPaneSize(50)
        self.secondary.SetSashPosition(300)
        self.main.SetSashPosition(200)
        self.out.SetSashPosition(200)
        self.meta.SetSashPosition(200)

        self.LoadFrames()

    def ShowMeta(self,event):
        if self.meta.IsShown():
            self.meta.Show(0)
            self.primary.SetSashPosition(0)
        else:
            self.meta.Show(1)
            size = self.GetSize()
            self.primary.SetSashPosition(size.width/3)

    def OnAbout(self, event):
        dlg = wxMessageDialog(self, "This program can be used to\n"
                              "test XSLT within a Python Environment.",
                              "About Me", wxOK | wxICON_INFORMATION)
        dlg.ShowModal()
        dlg.Destroy()

    def TimeToQuit(self, event):
        self.Close(true)

    def OnPrint(self, event):
        printer = wxHtmlEasyPrinting()
        printer.PrintFile(self.html.GetOpenedPage())

    def LoadFrames(self):
        conf = wxConfig("PythonXSLTTester")
        def Load(ctl,conf,str,bad):
            try:
                 if conf.Read(str):
                     file = open(conf.Read(str),"r")
                     ctl.SetValue(file.read())
                     file.close()
                 else:
                     ctl.SetValue("")
                 return bad
            except IOError, value:
                return "%s\n%s" % (bad,value)

        bad = Load(self.metaxslt,conf,"metaxslt","")
        bad = Load(self.metadata,conf,"metadata",bad)
        bad = Load(self.mainxslt,conf,"mainxslt",bad)
        bad = Load(self.maindata,conf,"maindata",bad)
        bad = Load(self.output,conf,"output",bad)
        if bad:
            wxMessageBox(bad,"Could Not Open One Or More Files")

    def OnOpen(self,event):
        win = OpenFileDlg(self,wxOPEN)
        conf = wxConfig("PythonXSLTTester")
        win.ReadNames(conf)
        val = win.ShowModal()
        if val == wxID_OK:
            win.WriteNames(conf)
            self.LoadFrames()

    def OnSave(self,event):
        conf = wxConfig("PythonXSLTTester")
        def Save(ctl,conf,str,bad):
            try:
                 if conf.Read(str):
                     val = ctl.GetValue()
                     if val:
                         file = open(conf.Read(str),"w")
                         file.write(val)
                         file.close()
                 return bad
            except IOError, value:
                return "%s\n%s" % (bad,value)

        bad = Save(self.metaxslt,conf,"metaxslt","")
        bad = Save(self.metadata,conf,"metadata",bad)
        bad = Save(self.mainxslt,conf,"mainxslt",bad)
        bad = Save(self.maindata,conf,"maindata",bad)
        bad = Save(self.output,conf,"output",bad)
        if bad:
            wxMessageBox(bad,"Could Not Save One Or More Files")

    def ErrorToString(self,hr,msg,exc,arg):
        ret = "%d: %s" % (hr,msg)
        if exc:
            wcode, source, text, helpFile, helpID, scode = exc
            ret = "%s\nSource: %s\nText: %s" % (ret, source,text)
        return ret

    def OnTransform(self,event):
        if self.meta.IsShown():
            temp = self.TransformStage(self.metaxslt.GetValue(),
                                       self.metadata.GetValue())
            if temp:
                self.mainxslt.SetValue(temp)
            else:
                return
        temp = self.TransformStage(self.mainxslt.GetValue(),
                                   self.maindata.GetValue())
        if temp:
            self.output.SetValue(temp)
            try:
                self.html.SetPage(temp)
            except: pass

    def TransformStage(self,xslt,data):
        try:
            domxslt = msxml.DOMDocument.default_interface(
                msxml.FreeThreadedDOMDocument())
            domdata = msxml.DOMDocument.default_interface(
                msxml.DOMDocument())

            domdata.validateOnParse = 0
            domdata.async = 0
            domdata.preserveWhiteSpace = 1

            domxslt.validateOnParse = 1
            domxslt.async = 0
            domxslt.preserveWhiteSpace = 1

            try:
                domdata.loadXML(data)
            except pythoncom.com_error, ( hr, msg, exc, arg ):
                wxMessageBox("%s\n\n%s\nline:%d col:%d\ntext: %s" %
                             ( self.ErrorToString(hr,msg,exc,arg),
                               domdata.parseError.reason,
                               domdata.parseError.line,
                               domdata.parseError.linepos,
                               domdata.parseError.srcText),
                             "Error Parsing Data")
                return None

            try:
                domxslt.loadXML(xslt)
            except pythoncom.com_error, ( hr, msg, exc, arg):
                # Report the stylesheet's own parse error, not the data's.
                wxMessageBox("%s\n\n%s\nline:%d col:%d\ntext: %s" %
                             ( self.ErrorToString(hr,msg,exc,arg),
                               domxslt.parseError.reason,
                               domxslt.parseError.line,
                               domxslt.parseError.linepos,
                               domxslt.parseError.srcText),
                             "Error Parsing XSLT")
                return None

            templ = msxml.XSLTemplate.default_interface(
                msxml.XSLTemplate())
            templ.stylesheet = domxslt
            proc = templ.createProcessor()
            proc.input = domdata
            proc.transform()
            return proc.output.encode("ISO8859-1")
        except pythoncom.com_error, ( hr, msg, exc, arg):
            wxMessageBox(self.ErrorToString(hr,msg,exc,arg),
                         "Error Transforming")
            return None

class TheApp(wxApp):
    def OnInit(self):
        frame = MainFrame(NULL, -1, "Python XSLT Testing Tool")
        frame.Show(true)
        self.SetTopWindow(frame)
        return true

if __name__ == '__main__':
    app = TheApp(0)
    app.MainLoop()
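
On the request to swap MSXML for another processor: one way to make
that easier might be to keep the engine out of TransformStage and pass
it in as a callable.  What follows is only a minimal sketch in
present-day Python, not part of the posted program.  The names
make_transform_stage, upper_backend and report_error are made up for
illustration, and the toy backend stands in for a real XSLT engine.

```python
def make_transform_stage(backend, report_error):
    """Return a TransformStage-like function.

    backend(xslt_text, data_text) returns the result text and raises an
    exception on failure.  report_error(message, title) replaces the GUI
    message box.  Both signatures are assumptions of this sketch.
    """
    def transform_stage(xslt, data):
        try:
            return backend(xslt, data)
        except Exception as exc:
            report_error(str(exc), "Error Transforming")
            return None
    return transform_stage


def upper_backend(xslt, data):
    # Toy stand-in for an XSLT engine: ignores the stylesheet entirely
    # and upper-cases the input document.
    if not data:
        raise ValueError("empty input document")
    return data.upper()


errors = []
stage = make_transform_stage(
    upper_backend, lambda msg, title: errors.append((title, msg)))
```

With this shape, OnTransform would not need to change when the engine
does; only the callable handed to make_transform_stage would.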


From Olivier.Cayrol@logilab.fr  Wed Mar 28 08:33:06 2001
From: Olivier.Cayrol@logilab.fr (Olivier CAYROL (Logilab))
Date: Wed, 28 Mar 2001 10:33:06 +0200 (CEST)
Subject: [XML-SIG] Swap images for text elements using xsl?
In-Reply-To: <OFE83DE0B4.75622DE6-ON85256A1C.0062F790@olsinc.net>
Message-ID: <Pine.LNX.4.21.0103281028170.2971-100000@sagittarius.logilab.fr>

Hello,

On Tue, 27 Mar 2001 Lance_Hill/OLS.OLS@olsinc.net wrote:

> Currently, I am using a table to display the text wrapped in each tag
> (generally a Y or N), but I would prefer to use an image selected
> depending on the text element in each tag.

A solution is to use an <xsl:choose>, <xsl:when>, <xsl:otherwise>
statement (see example below).

  <xsl:template match="/">
  <HTML>
  <BODY>
  <table border="1">
  <tr>
    <th>New Client Status</th>
  ...etc.
  </tr>

   <xsl:for-each select="status_report/agency">
    <tr>
    <td>

     <xsl:choose>
      <xsl:when test="new-client_status = 'Y'">
       <IMG SRC="image_yes.gif"/>
      </xsl:when>
      <xsl:otherwise>
       <IMG SRC="image_no.gif"/>
      </xsl:otherwise>
     </xsl:choose>

    </td>
  ...etc.

    </tr>
    </xsl:for-each>
    </table>
    </BODY>
    </HTML>
    </xsl:template>
    </xsl:stylesheet>
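
For anyone who would rather do the same swap from Python than in the
stylesheet, here is a minimal sketch using the standard library's
xml.etree.ElementTree.  The sample document is made up to match the
select="status_report/agency" path above, and status_image and
build_rows are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Made-up input shaped like the status_report/agency path in the stylesheet.
SAMPLE = """
<status_report>
  <agency><new-client_status>Y</new-client_status></agency>
  <agency><new-client_status>N</new-client_status></agency>
</status_report>
"""

def status_image(text):
    # Mirrors xsl:choose: 'Y' picks the yes image, anything else the no image.
    return "image_yes.gif" if text == "Y" else "image_no.gif"

def build_rows(xml_text):
    # Build one <tr><td><IMG .../></td></tr> element per <agency>.
    root = ET.fromstring(xml_text)
    rows = []
    for agency in root.findall("agency"):
        tr = ET.Element("tr")
        td = ET.SubElement(tr, "td")
        src = status_image(agency.findtext("new-client_status"))
        ET.SubElement(td, "IMG", SRC=src)
        rows.append(tr)
    return rows

rows = build_rows(SAMPLE)
html_rows = [ET.tostring(tr, encoding="unicode") for tr in rows]
```

As in the stylesheet, the comparison is an exact string match, so a
value with surrounding whitespace falls through to the "no" image.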

Regards,

  O. CAYROL.
_________________________________________________________________________
Olivier CAYROL                                   LOGILAB - Paris (France)
                                                 http://www.logilab.com/
Change your millenium, try NARVAL the Intelligent Personal Assistant.
Changez de millénaire, essayez NARVAL l'Assistant Personnel Intelligent.
_________________________________________________________________________


From fdrake@acm.org  Wed Mar 28 17:29:59 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 28 Mar 2001 12:29:59 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <985724354.4243.0.camel@eddie>
References: <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
Message-ID: <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>

[Adding David Faure to the recipients list.]

Ross Burton writes:
 > I am involved in adding support for XBEL to Galeon, the GNOME Mozilla
 > Gecko based browser.  Initial export is working, but there are several

  Cool!  I've been meaning to play with Galeon; I guess I've just
gotten a better excuse.  ;-)

 > The specification for metadata is vague, <metadata> elements have a
 > "owner" attribute which should be a URI. But what forms of metadata are
 > valid? The DTD implies that there can be no children of metadata
 > elements (the content is EMPTY).

  Here's the problem:  What we want is to be able to say
"ANY-and-we-really-mean-it", not ANY as defined in the DTD language.
That definition tells us that ANY means anything *defined in the DTD*,
which is pretty limited -- this is an inherited SGML wart.  I don't
know how to express what we actually want in the DTD language; if
anyone can tell me, I'd be glad to change the DTD for revision 1.1.
If anyone can tell me how to do it in XSchema, I'd be happy to use
that for the schema language instead of using the DTD language.

 > Currently Galeon-specific attributes are exported as follows:

...ugh!... Don't do that.

 > It does seem that the DTD is in error as requiring all metadata to be in
 > the owner attribute is rather limiting.  But what content is allowed as
 > children of the metadata element? Just text? Or could the content of a
 > metadata element be a free-form XML tree?  For example:
 > 
 > <site ...>
 >   <info>
 >     <metadata owner="http://galeon.sourceforge.net">
 >       <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
 >       <toolbar>true</toolbar>
 >     </metadata>
 >   </info>
 > </site>

  This is *much* better!  It also matches the intent.

 > This, although allowing more free-form data, is heavier for the
 > application. We intend to be "polite" to XBEL data and store any unknown
 > metadata so that it can be written out again - if only text is allowed
 > this is trivial, otherwise tree fragments have to be stored.

  This wasn't hard to do for Grail, which also supported this use.
But Python data types make this pretty trivial as long as I can get
all the interesting parse events.
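
  A minimal sketch of that kind of "polite" round trip, in present-day
Python with xml.etree.ElementTree.  This is not the Grail code.  The
<bookmark> wrapper, the second owner URI and the helper names
split_metadata and rebuild_info are assumptions made for illustration;
only the Galeon owner and its two children come from the example above.

```python
import copy
import xml.etree.ElementTree as ET

GALEON = "http://galeon.sourceforge.net"    # owner URI from the example above

DOC = """<bookmark href="http://slashdot.org/">
  <title>Slashdot</title>
  <info>
    <metadata owner="http://galeon.sourceforge.net">
      <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
      <toolbar>true</toolbar>
    </metadata>
    <metadata owner="http://example.org/other-app">
      <colour depth="2">blue <b>ish</b></colour>
    </metadata>
  </info>
</bookmark>"""

def split_metadata(bookmark, my_owner):
    """Return (mine, foreign): our own <metadata> element (or None) and
    deep copies of every other owner's subtree, kept as parsed."""
    mine, foreign = None, []
    for meta in bookmark.findall("info/metadata"):
        if meta.get("owner") == my_owner:
            mine = meta
        else:
            foreign.append(copy.deepcopy(meta))
    return mine, foreign

def rebuild_info(my_meta, foreign):
    """Build a fresh <info> with our metadata plus the untouched foreign ones."""
    info = ET.Element("info")
    if my_meta is not None:
        info.append(my_meta)
    info.extend(foreign)
    return info

bookmark = ET.fromstring(DOC)
mine, foreign = split_metadata(bookmark, GALEON)
mine.find("toolbar").text = "false"     # the application edits only its own data
new_info = rebuild_info(mine, foreign)
```

  The foreign subtree is never interpreted, only carried along, so
markup and mixed content inside it survive the rewrite.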

Martin v. Loewis writes:
 > That, of course, would mean that a version 1.1 of XBEL needs to be
 > issued, so perhaps this is the time to think about other pending
 > improvements.

  I'm very happy with doing this.  In fact, I've made a couple of
changes to the DTD and documentation based on comments from David
Faure (from the Konqueror development group).
  In particular, I've added the "icon" attribute to the <bookmark/>
and <folder/> elements, and the "toolbar" attribute to the <folder/>
element.  The latter is intended to mark which folder should be used as
the "Personal Toolbar" -- my tentative change allows it to have the
values "yes" or "no", with "no" as the default.  This may need some
reconsideration; I can envision having software that supports multiple
toolbars, but I'm not sure of the best way to encode that
information.  (It may even be appropriate to push that into
application-specific metadata inside the <metadata/> element.)
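
  A minimal sketch of how a reader might consume the tentative
"toolbar" attribute, in present-day Python with
xml.etree.ElementTree.  The sample document and the helper names
is_toolbar and toolbar_folders are made up for illustration.  The
default is applied in code on the assumption that a non-validating
parser will not supply the DTD default.

```python
import xml.etree.ElementTree as ET

DOC = """<xbel>
  <folder toolbar="yes"><title>Personal Toolbar</title></folder>
  <folder><title>News</title></folder>
  <folder toolbar="no" icon="work.png"><title>Work</title></folder>
</xbel>"""

def is_toolbar(folder):
    # Treat a missing attribute as "no", matching the tentative DTD default.
    return folder.get("toolbar", "no") == "yes"

def toolbar_folders(root):
    # Returns a list, so software with several toolbars is not ruled out.
    return [f.findtext("title") for f in root.iter("folder") if is_toolbar(f)]

root = ET.fromstring(DOC)
names = toolbar_folders(root)
```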
  Another idea I've thought about from time to time is of linking to
other bookmark collections, so that a folder-like thing could be used
to refer to another (possibly remote) XBEL document by URI, or to RSS
or other documents that could be used to store bookmarks (possibly
including Netscape-style HTML bookmarks).  I think this would be easy
to support in XBEL, and it only takes software to make it useful. ;)
  Martin, were there other warts you were thinking about?  (Anyone?)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Wed Mar 28 17:35:06 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 28 Mar 2001 12:35:06 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
Message-ID: <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>

Ross Burton writes:
 > Less child nodes (such as <title>) and more attributes? :-) Only kidding
 > but libxml (aka gnome-xml) isn't that good with child nodes. I miss W3C
 > DOM...

  Having joined the ranks of DOM implementors myself, I can only wish
I missed it.  ;-(
  Is the libxml API really that hard to work with, or does it have
implementation limitations that cause it not to work with deeply
nested documents?  I keep meaning to look at it more, but just haven't
had the time.  Is there any reason the GNOME people are writing their
own XML parser instead of using Expat?
  Thanks!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From r.burton@180sw.com  Wed Mar 28 19:06:18 2001
From: r.burton@180sw.com (Ross Burton)
Date: Wed, 28 Mar 2001 20:06:18 +0100
Subject: [XML-SIG] Metadata in XBEL
References: <985724354.4243.0.camel@eddie><200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
Message-ID: <004901c0b7ba$2dbf84a0$2a01a8c0@eddie>

>  > I am involved in adding support for XBEL to Galeon, the GNOME Mozilla
>  > Gecko based browser.  Initial export is working, but there are several

>  > The specification for metadata is vague, <metadata> elements have a
>  > "owner" attribute which should be a URI. But what forms of metadata are
>  > valid? The DTD implies that there can be no children of metadata
>  > elements (the content is EMPTY).
>
>   Here's the problem:  What we want is to be able to say
> "ANY-and-we-really-mean-it", not ANY as defined in the DTD language.
> That definition tells us that ANY means anything *defined in the DTD*,
> which is pretty limited -- this is an inherited SGML wart.  I don't
> know how to express what we actually want in the DTD language; if
> anyone can tell me, I'd be glad to change the DTD for revision 1.1.
> If anyone can tell me how to do it in XSchema, I'd be happy to use
> that for the schema language instead of using the DTD language.

Ah.  I'm not a DTD expert so thought that ANY meant literally anything.

>  > Currently Galeon-specific attributes are exported as follows:
>
> ...ugh!... Don't do that.

Okay.

>  > It does seem that the DTD is in error as requiring all metadata to be in
>  > the owner attribute is rather limiting.  But what content is allowed as
>  > children of the metadata element? Just text? Or could the content of a
>  > metadata element be a free-form XML tree?  For example:
>  >
>  > <site ...>
>  >   <info>
>  >     <metadata owner="http://galeon.sourceforge.net">
>  >       <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
>  >       <toolbar>true</toolbar>
>  >     </metadata>
>  >   </info>
>  > </site>
>
>   This is *much* better!  It also matches the intent.

Right.  I'll change the code soon.  Maybe there should be an example of use
for the metadata elements in XBEL 1.1.

> Martin v. Loewis writes:
>  > That, of course, would mean that a version 1.1 of XBEL needs to be
>  > issued, so perhaps this is the time to think about other pending
>  > improvements.

>   I'm very happy with doing this.  In fact, I've made a couple of
> changes to the DTD and documentation based on comments from David
> Faure (from the Konqueror development group).
>   In particular, I've added the "icon" attribute to the <bookmark/>
> and <folder/> elements, and the "toolbar" attribute to the <folder/>
> element.  The latter is intended to mark which folder should be used as
> the "Personal Toolbar" -- my tentative change allows it to have the
> values "yes" or "no", with "no" as the default.  This may need some
> reconsideration; I can envision having software that supports multiple
> toolbars, but I'm not sure of the best way to encode that
> information.  (It may even be appropriate to push that into
> application-specific metadata inside the <metadata/> element.)

I like those additions...  because they are the main reason Galeon has to
use metadata!  Galeon does allow multiple toolbars to be displayed (thinking
about it, just the one is a limitation really), but I think that a simple
"yes|no" with no as the default is good enough for that.  Pushing that into
metadata is not using the potential of the toolbar attribute.

>   Another idea I've thought about from time to time is of linking to
> other bookmark collections, so that a folder-like thing could be used
> to refer to another (possibly remote) XBEL document by URI, or to RSS
> or other documents that could be used to store bookmarks (possibly
> including Netscape-style HTML bookmarks).  I think this would be easy
> to support in XBEL, and it only takes software to make it useful. ;)

Sounds like the future plans for Gnobog (a GNOME bookmark organiser). At the
moment it just reads/writes Netscape/IE and can re-organise, but they are
planning on moving to a "filesystem" like architecture, where bookmarks are
stored separately to the tree view. This way aliases are taken to the
logical extension and the entire system behaves just like an ext2 filesystem
with hard links everywhere.  Nodes in the tree will be allowed to point at a
bookmark entry, or another (possibly remote) set of folders.

I have no say whatsoever here, but I'm +1 for adding the toolbar and icon
attributes, clarifying the metadata elements and releasing 1.1 of the XBEL
spec. :-)

Regards, and thanks for the mails,
Ross Burton


From martin@loewis.home.cs.tu-berlin.de  Wed Mar 28 20:15:48 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 28 Mar 2001 22:15:48 +0200
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
Message-ID: <200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de>

>   Here's the problem:  What we want is to be able to say
> "ANY-and-we-really-mean-it", not ANY as defined in the DTD language.
> That definition tells us that ANY means anything *defined in the DTD*,
> which is pretty limited -- this is an inherited SGML wart.  I don't
> know how to express what we actually want in the DTD language; if
> anyone can tell me, I'd be glad to change the DTD for revision 1.1.

I think you are right: it cannot be expressed. Looking at the four
options of "Element Valid", none of them applies.

> Martin v. Loewis writes:
>  > That, of course, would mean that a version 1.1 of XBEL needs to be
>  > issued, so perhaps this is the time to think about other pending
>  > improvements.
> 
>   I'm very happy with doing this.  In fact, I've made a couple of
> changes to the DTD and documentation based on comments from David
> Faure (from the Konqueror development group).

So I'd reverse my previous comment: *If* a 1.1 release of XBEL is
issued for good reasons, it should probably show ANY as the element
contents, to avoid confusion; and the documentation should be clear
that any well-formed element (plus text) is accepted as contents.

>   Martin, were there other warts you were thinking about?

None specifically. I was suggesting that any missing features that
came up during the Galeon project should be worth consideration.

Regards,
Martin


From fdrake@acm.org  Wed Mar 28 20:34:41 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 28 Mar 2001 15:34:41 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de>
References: <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
 <200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de>
Message-ID: <15042.19169.974191.457571@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > So I'd reverse my previous comment: *If* a 1.1 release of XBEL is
 > issued for good reasons, it should probably show ANY as the element
 > contents, to avoid confusion; and the documentation should be clear
 > that any well-formed element (plus text) is accepted as contents

  Agreed; I've made this change in my tentative 1.1 DTD and
documentation.

 > None specifically. I was suggesting that any missing features that
 > came up during the Galeon project should be worth consideration.

  Sounds good.  Ross, David: please speak up if you have any
additional considerations for us!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From r.burton@180sw.com  Wed Mar 28 21:41:42 2001
From: r.burton@180sw.com (Ross Burton)
Date: Wed, 28 Mar 2001 22:41:42 +0100
Subject: [XML-SIG] Metadata in XBEL
References: <985724354.4243.0.camel@eddie><200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de><15042.8087.392696.683721@cj42289-a.reston1.va.home.com><200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de> <15042.19169.974191.457571@cj42289-a.reston1.va.home.com>
Message-ID: <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie>

>  > None specifically. I was suggesting that any missing features that
>  > came up during the Galeon project should be worth consideration.
>
>   Sounds good.  Ross, David: please speak up if you have any
> additional considerations for us!

The only features which XBEL was missing for Galeon which required metadata
are:

1) icon for bookmark
2) toolbar for folder
3) notes on bookmark
4) nick-name (shortcut name for typing into location box)
5) add to context menu

Of these 1 and 2 are already in XBEL 1.1.  The question is are 3-5 general
enough to be in the spec?

I think that 3 and 4 possibly are.

Several browsers allow free-form notes to be attached to sites, although
that does overlap somewhat with the metadata tags.  Maybe standard owners
for metadata elements can be defined for optional metadata, so that an owner
of "python.org/xbel/notes" (say) could be used for notes on an item. This
way the data is confined to the <info> node where it belongs, but is still
identified.
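
A minimal sketch of what such a standard owner could look like to an
application, in present-day Python with xml.etree.ElementTree.  The
owner string is only the "(say)" value above, not anything registered,
and find_owned, get_note and set_note are hypothetical names.

```python
import xml.etree.ElementTree as ET

NOTES_OWNER = "python.org/xbel/notes"   # illustrative owner from the text above

def find_owned(item, owner):
    # Return the <metadata> element under <info> with this owner, or None.
    for meta in item.findall("info/metadata"):
        if meta.get("owner") == owner:
            return meta
    return None

def get_note(item):
    meta = find_owned(item, NOTES_OWNER)
    return None if meta is None else (meta.text or "")

def set_note(item, text):
    # Create <info> and the owned <metadata> on demand, then store the note.
    info = item.find("info")
    if info is None:
        info = ET.SubElement(item, "info")
    meta = find_owned(item, NOTES_OWNER)
    if meta is None:
        meta = ET.SubElement(info, "metadata", owner=NOTES_OWNER)
    meta.text = text

bookmark = ET.fromstring(
    '<bookmark href="http://www.python.org/"><title>Python</title></bookmark>')
before = get_note(bookmark)
set_note(bookmark, "check the XML-SIG archives")
after = get_note(bookmark)
```

Any application that knows the owner string can read the note, and one
that does not can still carry the element along untouched.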

Also, some browsers allow short names to be assigned to sites, so that
typing in the short name is sufficient to navigate to the URL.  Of course
nick names are similar to the ID attribute, in that they are both short
names. However, I can see systems where the ID is generated by the system
itself, not the user.

I'm not so convinced about 5 (it adds the selected folder/bookmark to the
default context menu, so it is always available).  Personally I'm not so
convinced of its usefulness, so it should not be in XBEL 1.1.

In summary, I'd like to see 4 in the spec, and possibly 3.  Comments anyone?

Regards,
Ross Burton


From fdrake@acm.org  Wed Mar 28 22:05:55 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 28 Mar 2001 17:05:55 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie>
References: <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
 <200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de>
 <15042.19169.974191.457571@cj42289-a.reston1.va.home.com>
 <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie>
Message-ID: <15042.24643.532405.135455@cj42289-a.reston1.va.home.com>

Ross Burton writes:
 > 3) notes on bookmark

  How is this different from the <desc/> element?  In Grail, this is
filled in by a multi-line type-in box in the bookmark's or folder's
Properties dialog -- it is equivalent to the same feature in
Navigator.

 > 4) nick-name (shortcut name for typing into location box)

  Presumably Galeon supports this.  Anyone else?  I'm not at all sure
I've ever seen it.

 > 5) add to context menu

  This sounds strangely like the "personal toolbar" -- how are they
different?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From srn@coolheads.com  Wed Mar 28 09:10:19 2001
From: srn@coolheads.com (Steven R. Newcomb)
Date: Wed, 28 Mar 2001 03:10:19 -0600
Subject: [XML-SIG] Extreme Markup Languages 2001 Conference
Message-ID: <200103280910.DAA25844@bruno.techno.com>

Reminder: It's time to send in your paper proposal for the Extreme
Markup Languages conference in Montreal next August.  Papers/proposals
are due this weekend.


                Call for Participation
                         for
             Extreme Markup Languages 2001

          NOTE - CONFERENCE DATES AND LOCATION HAVE CHANGED!

Highlights:
   - highly technical peer-reviewed 3.7-day conference preceded by 2
     days of tutorials
   - SGML, XML, Topic Maps, query languages, linking, schemas,
     transformations, inference engines, formatting and behavior, and more
   - Submissions due by March 31, 2001
   - For more information visit www.gca.org


             Extreme Markup Languages 2001
    There's Nothing so Practical as a Good Theory

>From GCA (Alexandria, Va.) - Extreme Markup Languages brings 
>together software developers, markup theorists, information 
>visionaries, and other assorted geeks for formal presentations, 
>poster sessions, question and answer sessions, hallway discussions, 
>arguments and gesticulations in front of flip charts, table-top 
>software demos, coffee, and the cuisine, ambience, and charm of 
>Montréal in August. Extreme conference participants include thought 
>leaders from corporate and academic information management, 
>knowledge engineering, enterprise integration/corporate memory, 
>science, and technical and cultural research.

There will be four types of presentations at Extreme: peer reviewed 
technical papers, late breaking news, posters, and invited keynotes. 
All will be new material, address some aspect of information 
management from a theoretical or practical standpoint, and be 
detailed and rigorous. Come join us to discuss information alchemy: 
making documents into information and data into gold.

   WHEN:      August 12-17, 2001
   WHERE:     Le Centre Sheraton, Montréal, Canada
   SPONSOR:   Graphic Communications Association (GCA)
   Chairs:    Steven R. Newcomb
              B. Tommie Usdin, Mulberry Technologies, Inc.
   Co-Chairs: Deborah A. Lapeyre, Mulberry Technologies, Inc.
              C. M. Sperberg-McQueen, World Wide Web Consortium/MIT
                 Laboratory for Computer Sciences
   WHAT:      Call for Papers, Peer Reviewers, Posters, and Tutorials
   HOW:       Submit full papers or paper proposals to the conference
              secretariat in SGML or XML according to one of the
              submission DTDs and sent via email to: extreme@mulberrytech.com.
              Guidelines for Submission and the DTDs are available by
              email: extreme@mulberrytech.com
              or at http://www.mulberrytech.com/Extreme

              Apply to the Peer Review panel using the form at:
              http://www.mulberrytech.com/Extreme/Peer/

              Submit tutorial proposals according to the instructions
              at: http://www.mulberrytech.com/Extreme/Tutorial

   SCHEDULE:  Peer Review Applications Due. . March 2, 2001
              Tutorial Proposals Due . .  . . March 16, 2001
              Paper Submission Deadline . . . March 31, 2001
              Speakers Notified . . . . . . . May 14, 2001
              Revised Papers Due. . . . . . . June 18, 2001
              Tutorials . . . . . . . . . . . August 12-13, 2001
              Conference  . . . . . . . . . . August 14-17, 2001

QUESTIONS:  Email to Extreme@mulberrytech.com or call Tommie Usdin
              +1 301/315-9631
MORE INFORMATION: For updated information on the program and plans for
             the conference as they develop, see http://www2.gca.org/extreme/


-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

405 Flagler Court
Allen, Texas 75013-2821 USA


From r.burton@180sw.com  Wed Mar 28 22:24:43 2001
From: r.burton@180sw.com (Ross Burton)
Date: Wed, 28 Mar 2001 23:24:43 +0100
Subject: [XML-SIG] Metadata in XBEL
References: <985724354.4243.0.camel@eddie><200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de><15042.8087.392696.683721@cj42289-a.reston1.va.home.com><200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de><15042.19169.974191.457571@cj42289-a.reston1.va.home.com><008b01c0b7d0$1f51a9a0$2a01a8c0@eddie> <15042.24643.532405.135455@cj42289-a.reston1.va.home.com>
Message-ID: <00a301c0b7d5$e433f020$2a01a8c0@eddie>

>  > 3) notes on bookmark
>   How is this different from the <desc/> element?  In Grail, this is
> filled in by a multi-line type-in box in the bookmark's or folder's
> Properties dialog -- it is equivalent to the same feature in
> Navigator.

Erm... D'oh!

>  > 4) nick-name (shortcut name for typing into location box)
>   Presumably Galeon supports this.  Anyone else?  I'm not at all sure
> I've ever seen it.

I thought Netscape did this? Just checked, it doesn't.  Damn.  I'm not doing
well tonight, am I?  Maybe I should get more sleep.  :-)

Hey, ignore that too unless people can quote other browsers which support
this.

>  > 5) add to context menu
>   This sounds strangely like the "personal toolbar" -- how are they
> different?

That was my thought...  Don't think about putting it in the spec, I'll
encode it as Galeon-specific metadata.

Ross


From fdrake@acm.org  Wed Mar 28 22:32:36 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 28 Mar 2001 17:32:36 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <00a301c0b7d5$e433f020$2a01a8c0@eddie>
References: <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
 <200103282015.f2SKFmx04164@mira.informatik.hu-berlin.de>
 <15042.19169.974191.457571@cj42289-a.reston1.va.home.com>
 <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie>
 <15042.24643.532405.135455@cj42289-a.reston1.va.home.com>
 <00a301c0b7d5$e433f020$2a01a8c0@eddie>
Message-ID: <15042.26244.446881.208379@cj42289-a.reston1.va.home.com>

Ross Burton writes:
 > well tonight, am I?  Maybe I should get more sleep.  :-)

  No, you just need a better grade of caffeine!

 > That was my thought...  Don't think about putting it in the spec, I'll
 > encode it as Galeon-specific metadata.

  So I'll presume that the current changes to XBEL are good for
Galeon.  I've not heard from David Faure yet; I'll wait to see if he
chimes in before checking in a new DTD and documentation.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From noreply@sourceforge.net  Thu Mar 29 08:31:24 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 29 Mar 2001 00:31:24 -0800
Subject: [XML-SIG] [ pyxml-Bugs-412141 ] pDomlette fails on cloneNode
Message-ID: <E14iXq0-0007ga-00@usw-sf-web2.sourceforge.net>

Bugs item #412141, was updated on 2001-03-29 00:31
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=412141&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: pDomlette fails on cloneNode

Initial Comment:
Not all classes in pDomlette implement the cloneNode
method. PI, Attributes and Comments are notable
exceptions. However the cloneNode implementation in the
Element class calls cloneNode on all the children of
the current Element, which can result in attribute
errors.

Here's a patch against pDomlette from 4Suite 0.10.2. It
will skip children that are not elements. This
behaviour seems acceptable to me, but should be
documented somewhere if the patch was to be included in
the main distribution. 
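
For illustration only, the behaviour the patch below aims for can be
sketched with the standard library's xml.dom.minidom instead of
pDomlette. The helper name clone_elements_only is hypothetical. One
consequence worth documenting: a nodeType filter of this kind drops
text children as well as comments and processing instructions.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def clone_elements_only(element):
    """Deep-copy an element, keeping its attributes but only element children.

    Hypothetical helper that mirrors the filter used in the patch; it is
    not part of pDomlette or minidom.
    """
    copy = element.cloneNode(False)  # shallow clone: attributes, no children
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            copy.appendChild(clone_elements_only(child))
    return copy


doc = parseString("<a x='1'><!-- note --><b>text</b><?pi data?></a>")
root = doc.documentElement

full = root.cloneNode(True)           # ordinary deep clone keeps every child
filtered = clone_elements_only(root)  # filtered clone keeps only <b>, without its text

print(full.toxml())
print(filtered.toxml())
```
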

--- /home/alf/tmp/pDomlette.py	Thu Mar 29 10:27:23 2001
+++ pDomlette.py	Thu Mar 29 10:11:50 2001
@@ -289,8 +289,9 @@
            
newElement.setAttributeNS(attr.namespaceURI,attr.name,attr.value)
         if deep:
             for c in self.childNodes:
-                nc = c.cloneNode(deep)
-                newElement.appendChild(nc)
+                if c.nodeType == c.ELEMENT_NODE:
+                    nc = c.cloneNode(deep)
+                    newElement.appendChild(nc)
             
         return newElement
     

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=412141&group_id=6473


From Alexandre.Fayolle@logilab.fr  Thu Mar 29 09:04:00 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 29 Mar 2001 11:04:00 +0200 (CEST)
Subject: [XML-SIG] User Friendly and XML
Message-ID: <Pine.LNX.4.21.0103291101260.21238-100000@leo.logilab.fr>

For those of you who do not already know User Friendly (the comic strip),
I advise you to take a look at yesterday's and today's cartoons, which
illustrate the evilness of XML books.

http://ars.userfriendly.org/cartoons/?id=20010328&mode=classic
http://ars.userfriendly.org/cartoons/?id=20010329&mode=classic

Cheers,

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From david@mandrakesoft.com  Thu Mar 29 16:35:54 2001
From: david@mandrakesoft.com (David Faure)
Date: Thu, 29 Mar 2001 17:35:54 +0100
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie>
References: <985724354.4243.0.camel@eddie> <15042.19169.974191.457571@cj42289-a.reston1.va.home.com> <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie>
Message-ID: <200103291635.f2TGZsr27038@faure.worldonline.co.uk>

Hello everyone,

First, I'm glad that Galeon uses XBEL too, I didn't know that.

> 1) icon for bookmark
> 2) toolbar for folder

Those are the two things I asked for in XBEL too. Good to see
it will be in the 1.1 version of the spec. However there is a small
concern about how icons are designated - as usual between
KDE and Gnome, since the same problem exists in the .desktop files.

The way it currently works in Konqueror is the following.
 <bookmark icon="www" href="http://www.kde.org/" >
  <title>KDE Home Page</title>
 </bookmark>

The icon is an attribute of the bookmark element, and of the folder element,
and 
* either the icon name is the base name of a globally
available icon; no extension is written, and no directory either.
The icon loading looks for icons of that name + ".png" or ".xpm",
under the standard (for KDE) directories, e.g. /usr/share/icons/hicolor/16x16/*/
This makes it possible to have a different icon for 8-bit displays
(locolor instead of hicolor), and gives access to different icon sizes.
* or the icon name is like "favicons/www-1.ibm.com" to designate a
"favourite icon" for a given site, which has been stored under the
~/.kde/share/icons/favicons/ directory (with .png appended).
* obviously full paths are supported too.
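
As a rough sketch of the three lookup rules above (not Konqueror's
actual code), the following function maps an icon attribute value to
the candidate locations described. The function name and its
parameters are assumptions made for illustration; only the directory
layout comes from the description.

```python
import posixpath


def icon_candidates(name, kde_home="~/.kde", theme="hicolor", size="16x16"):
    """Return candidate paths (or glob patterns) for an icon attribute value.

    Hypothetical helper; it only builds strings and never touches the disk.
    """
    if name.startswith("/"):
        # Rule 3: a full path is used as-is.
        return [name]
    if name.startswith("favicons/"):
        # Rule 2: per-site favourite icon stored under the user's KDE directory.
        return [posixpath.join(kde_home, "share/icons", name + ".png")]
    # Rule 1: base name of a global icon; try .png, then .xpm, in any category.
    base = posixpath.join("/usr/share/icons", theme, size, "*", name)
    return [base + ext for ext in (".png", ".xpm")]


print(icon_candidates("www"))
print(icon_candidates("www", theme="locolor"))  # 8-bit displays
print(icon_candidates("favicons/www-1.ibm.com"))
```
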

I realize that all this is very hard to standardize !!

The only practical solution is to add the search paths of the other
environment in each, as was done for .desktop files. But that doesn't
constitute a clean spec. I'm afraid I have no solution to offer,
I guess I'm just pointing out that sharing the same attribute might
not be enough for users to use the same bookmark collection with
both browsers.

I saw in another mail on the subject, this piece of XML :
 > <site ...>
  >   <info>
  >     <metadata owner="http://galeon.sourceforge.net">
  >       <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
  >       <toolbar>true</toolbar>
  >     </metadata>
  >   </info>
  > </site>
Is this a concrete case of XML used by Galeon, or is it more like a
theoretical example ?
I'm surprised by <site>, <info> etc. Is that part of XBEL ?
I guess not :)
Surely jumping in the middle of a discussion doesn't help :-)
Anyway, back to the icon issue, this seems to suggest that Galeon
uses full paths ?

> 3) notes on bookmark
That, and many other things associated with bookmarks, will end
up being necessary.
Juergen (who plans to contribute to Konqueror's bookmarks) mentioned
scoring: "to give the site a score (out of 10, for example)... then you could 
search for "linux kde development" with a score >= 7 for example".
Especially useful if merging is done, see end of mail.

Other things that users mentioned were: list of keywords
(still for searches), and, hmm, how often a given bookmark was
visited. Not very important, given that we still don't support the
added/visited/modified dates yet.

> 4) nick-name (shortcut name for typing into location box)
Interesting idea :)
In fact this is possible in Konqueror, but via a separate module 
(the "short-URI filter"), so it's currently unrelated with bookmarks.

One often requested feature, is for merging. For instance, in a company,
there could be a "company-global" set of bookmarks, to be merged with
the user's bookmarks - much like everything else in KDE already has
a global and a local directory, possibly with even more levels (e.g. for
groups of people).
To make that possible, XBEL could have a sort of "include this other
bookmark collection" tag, and it could be up to the application to create
aliases towards those global bookmarks in the user's bookmark file.
Well, that's just one solution - it allows changing the order, removing
a global bookmark, inserting one's own anywhere... but it doesn't notice new
bookmarks in the global collection, unless some timestamp is used.

Another way could be that including another set of bookmarks simply means
that all those bookmarks appear first, then those in the user's file.
This way, changes to the global collection are automatically taken into account,
but it's impossible to modify/remove/reorder/change anything in the global
collection. It's probably much easier to implement too, and has the exact semantics
of a #include. I suggest adding this to XBEL then: a simple 
<include href="file:/path/to/bookmarks/collection.xml">.
There's still the issue of relative paths vs absolute paths, but, well... 
no solution here either :}
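
To make the second, #include-like variant concrete, here is a minimal
sketch using xml.etree.ElementTree. The <include> element is only the
proposal made above, not part of XBEL, and both the function name
expand_includes and the load callback are assumptions for
illustration. Nested includes and relative hrefs are deliberately left
out.

```python
import xml.etree.ElementTree as ET


def expand_includes(root, load):
    """Return a copy of root with included bookmarks first, then the user's own.

    `load` is a hypothetical callback mapping an href to the root element
    of the included collection.
    """
    merged = ET.Element(root.tag, root.attrib)
    included, own = [], []
    for child in root:
        if child.tag == "include":
            included.extend(load(child.get("href")))  # children of the other root
        else:
            own.append(child)
    merged.extend(included + own)
    return merged


GLOBAL = ET.fromstring(
    '<xbel><bookmark href="http://intranet.example/">'
    "<title>Intranet</title></bookmark></xbel>"
)
USER = ET.fromstring(
    '<xbel><include href="file:/path/to/bookmarks/collection.xml"/>'
    '<bookmark href="http://www.kde.org/"><title>KDE Home Page</title></bookmark>'
    "</xbel>"
)

merged = expand_includes(USER, lambda href: GLOBAL)
print([b.get("href") for b in merged])
```

Because the global collection is re-read on every load, changes to it
show up automatically, which is the property claimed for this variant.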


In summary, despite the compatibility problem with icon names (and paths),
I'm very happy if icon="..." and toolbar="yes" are added to XBEL
(given that Konqueror already uses those), I suggest adding an <include>
possibility, and the few other things that are not in XBEL and that might 
be in konqueror one day (keywords, scoring), can certainly be done as 
konq-specific metadata - unless others want to share the same data.

-- 
David FAURE, david@mandrakesoft.com, faure@kde.org
http://perso.mandrakesoft.com/~david/, http://www.konqueror.org/
KDE, Making The Future of Computing Available Today


From r.burton@180sw.com  Thu Mar 29 16:56:57 2001
From: r.burton@180sw.com (Ross Burton)
Date: Thu, 29 Mar 2001 17:56:57 +0100
Subject: [XML-SIG] Metadata in XBEL
References: <985724354.4243.0.camel@eddie> <15042.19169.974191.457571@cj42289-a.reston1.va.home.com> <008b01c0b7d0$1f51a9a0$2a01a8c0@eddie> <200103291635.f2TGZsr27038@faure.worldonline.co.uk>
Message-ID: <005601c0b871$4a632280$1501a8c0@180sw.com>

Hi,

> First, I'm glad that Galeon uses XBEL too, I didn't know that.

Well, the lack of XBEL is what prompted me to start work on it.  There is
now a branch where work by me and Ricardo is slowly but surely progressing.

> The way it currently works in Konqueror is the following.
>  <bookmark icon="www" href="http://www.kde.org/" >
>   <title>KDE Home Page</title>
>  </bookmark>
>
> The icon is an attribute of the bookmark element, and of the folder element,
> and
> * either the icon name is the base name of a globally
> available icon; no extension is written, and no directory either.
> The icon loading looks for icons of that name + ".png" or ".xpm",
> under the standard (for KDE) directories, e.g. /usr/share/icons/hicolor/16x16/*/
> This makes it possible to have a different icon for 8-bit displays
> (locolor instead of hicolor), and gives access to different icon sizes.
> * or the icon name is like "favicons/www-1.ibm.com" to designate a
> "favourite icon" for a given site, which has been stored under the
> ~/.kde/share/icons/favicons/ directory (with .png appended).
> * obviously full paths are supported too.
> I realize that all this is very hard to standardize !!
> The only practical solution is to add the search paths of the other
> environment in each, as was done for .desktop files. But that doesn't
> constitute a clean spec. I'm afraid I have no solution to offer,
> I guess I'm just pointing out that sharing the same attribute might
> not be enough for users to use the same bookmark collection with
> both browsers.

Hmm... I didn't know KDE had lists of icons.  That could be an issue.

> I saw in another mail on the subject, this piece of XML :
>  > <site ...>
>   >   <info>
>   >     <metadata owner="http://galeon.sourceforge.net">
>   >       <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
>   >       <toolbar>true</toolbar>
>   >     </metadata>
>   >   </info>
>   > </site>
> Is this a concrete case of XML used by Galeon, or is it more like a
> theoretical example ?
> I'm surprised by <site>, <info> etc. Is that part of XBEL ?
> I guess not :)

That's the Lack Of Caffeine And Sleep problem again. site == bookmark.
Galeon now (my local copy, anyway) exports its metadata in that form, so
there is only one metadata element owned by Galeon in each bookmark/folder.
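
As a hedged illustration of that layout, the sketch below parses a
bookmark carrying a single Galeon-owned metadata element and pulls its
children out by owner. The href and title values and the helper name
metadata_for are made up for the example; the pixmap and toolbar
elements are the ones quoted above.

```python
import xml.etree.ElementTree as ET

GALEON = "http://galeon.sourceforge.net"

SAMPLE = """\
<bookmark href="http://slashdot.org/">
  <title>Slashdot</title>
  <info>
    <metadata owner="http://galeon.sourceforge.net">
      <pixmap>/home/users/ross/pictures/slashdot.org</pixmap>
      <toolbar>true</toolbar>
    </metadata>
  </info>
</bookmark>
"""


def metadata_for(bookmark, owner):
    """Return {tag: text} for the metadata element with the given owner.

    Hypothetical helper; returns an empty dict when that owner has no data.
    """
    for meta in bookmark.findall("info/metadata"):
        if meta.get("owner") == owner:
            return {child.tag: (child.text or "").strip() for child in meta}
    return {}


bookmark = ET.fromstring(SAMPLE)
print(metadata_for(bookmark, GALEON))
print(metadata_for(bookmark, "http://www.konqueror.org/"))  # no such owner here
```
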

> Surely jumping in the middle of a discussion doesn't help :-)
> Anyway, back to the icon issue, this seems to suggest that Galeon
> uses full paths ?

Yes, it does.

> > 3) notes on bookmark
> That, and many other things associated with bookmarks, will end
> up being necessary.
> Juergen (who plans to contribute to Konqueror's bookmarks) mentioned
> scoring: "to give the site a score (out of 10, for example)... then you could
> search for "linux kde development" with a score >= 7 for example".
> Especially useful if merging is done, see end of mail.
> Other things that users mentioned were: list of keywords
> (still for searches), and, hmm, how often a given bookmark was
> visited. Not very important, given that we still don't support the
> added/visited/modified dates yet.

Nice ideas.  You're not alone with the added/visited/modified dates, BTW.
:-)

> One often requested feature, is for merging. For instance, in a company,
> there could be a "company-global" set of bookmarks, to be merged with
> the user's bookmarks - much like everything else in KDE already has
> a global and a local directory, possibly with even more levels (e.g. for
> groups of people).

I like that idea too, it could be very handy.

> In summary, despite the compatibility problem with icon names (and paths),
> I'm very happy if icon="..." and toolbar="yes" are added to XBEL
> (given that Konqueror already uses those), I suggest to add an <include>
> possibility, and the few other things that are not in XBEL and that might
> be in konqueror one day (keywords, scoring), can certainly be done as
> konq-specific metadata - unless others want to share the same data.

I'm for creating a set of metadata owners which can be considered "standard"
in that they are defined under common grounds in the open.  Not part of the
actual specification (as it's best if that is kept small) but a catalogue of
owners and expected content which would allow sharing of data.  Into this
could be added all of the useful attributes which could be shared without
making XBEL overly complex, such as keywords and scoring.
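
A sketch of what such a shared owner might enable, with every name
invented for the purpose: the owner URI, the <keywords> and <score>
elements, and the search function are all hypothetical, since no such
catalogue exists yet. It reproduces the "keywords with a score >= 7"
query mentioned earlier in the thread.

```python
import xml.etree.ElementTree as ET

SHARED_OWNER = "http://example.org/xbel/shared"  # hypothetical catalogue entry

XBEL = """\
<xbel>
  <bookmark href="http://www.kde.org/">
    <info><metadata owner="http://example.org/xbel/shared">
      <keywords>linux kde development</keywords><score>9</score>
    </metadata></info>
  </bookmark>
  <bookmark href="http://www.example.org/old-kde-notes">
    <info><metadata owner="http://example.org/xbel/shared">
      <keywords>linux kde development</keywords><score>4</score>
    </metadata></info>
  </bookmark>
</xbel>
"""


def search(root, words, min_score):
    """Return hrefs of bookmarks having all the given keywords and a high enough score."""
    wanted = set(words.lower().split())
    hits = []
    for bookmark in root.iter("bookmark"):
        for meta in bookmark.findall("info/metadata"):
            if meta.get("owner") != SHARED_OWNER:
                continue  # ignore metadata owned by other applications
            keywords = set((meta.findtext("keywords") or "").lower().split())
            score = int(meta.findtext("score") or 0)
            if wanted <= keywords and score >= min_score:
                hits.append(bookmark.get("href"))
    return hits


print(search(ET.fromstring(XBEL), "linux kde development", 7))
```
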

Regards,
Ross Burton
---
Ross Burton                     Software Engineer
OneEighty Software Ltd          Tel: +44 20 8263 2332
The Lansdowne Building          Fax: +44 20 8263 6314
2 Lansdowne Road                r.burton@180sw.com
Croydon, Surrey CR9 2ER, UK     http://www.180sw.com./
====================================================================
Under the Regulation of Investigatory Powers (RIP) Act 2000 together
with any and all Regulations in force pursuant to the Act OneEighty
Software Ltd reserves the right to monitor any or all incoming or
outgoing communications as provided for under the Act


From noreply@sourceforge.net  Thu Mar 29 17:31:23 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 29 Mar 2001 09:31:23 -0800
Subject: [XML-SIG] [ pyxml-Bugs-412235 ] xml.xslt.RtfWriter broken
Message-ID: <E14igGZ-0001MD-00@usw-sf-web3.sourceforge.net>

Bugs item #412235, was updated on 2001-03-29 09:31
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=412235&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.xslt.RtfWriter broken

Initial Comment:
When trying to process an XSLT outputting a pDomlette
document fragment, runNode will fail with the following
traceback :

>>> frag = p.runNode(element,1,{},RtfWriter(None,d2))
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 186, in runNode
    baseUri, outputStream)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 244, in execute
    self.writers[-1].startDocument()
AttributeError: startDocument


The problem lies in RtfWriter, which should inherit
from NullWriter (which provides default implementations
for all writer methods). Here's a patch:

--- /tmp/RtfWriter.py   Thu Mar 29 19:25:01 2001
+++ RtfWriter.py        Thu Mar 29 19:26:05 2001
@@ -19,8 +19,9 @@
 from Ft.Lib import pDomlette
 from xml.dom.ext import SplitQName
 from xml.dom import XMLNS_NAMESPACE
+from xml.xslt import NullWriter
 
-class RtfWriter:
+class RtfWriter(NullWriter.NullWriter):
     def __init__(self, outputParams, ownerDoc):
         self.__ownerDoc = ownerDoc
         self.__root = pDomlette.DocumentFragment(ownerDoc)
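
The pattern behind the patch can be shown with toy classes. These are
not the 4Suite classes: the method set and the names BareWriter,
FragmentWriter and run are assumptions chosen to reproduce the same
kind of AttributeError and its fix.

```python
class NullWriter:
    """Base class giving every writer callback a do-nothing default."""

    def startDocument(self):
        pass

    def endDocument(self):
        pass

    def text(self, data):
        pass


class BareWriter:
    """Writer without a base class: any callback it forgets is simply missing."""

    def text(self, data):
        pass


class FragmentWriter(NullWriter):
    """Writer that overrides only what it needs and inherits the rest."""

    def __init__(self):
        self.chunks = []

    def text(self, data):
        self.chunks.append(data)


def run(writer):
    # Stand-in for a processor that calls the full set of callbacks.
    writer.startDocument()
    writer.text("hello")
    writer.endDocument()


try:
    run(BareWriter())
except AttributeError as exc:
    print("failed as in the traceback:", exc)

writer = FragmentWriter()
run(writer)
print(writer.chunks)
```
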


Cheers

Alexandre


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=412235&group_id=6473


From martin@loewis.home.cs.tu-berlin.de  Thu Mar 29 17:39:12 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 29 Mar 2001 19:39:12 +0200
Subject: [XML-SIG] Matching NameChars
Message-ID: <200103291739.f2THdCe01821@mira.informatik.hu-berlin.de>

I have now committed two new modules, utils/xmlchargen.py and
xml/utils/characters.py (generated from the former). These represent
common regular expressions: specifically, expressions for the
productions in sections B and 2.3, Names and Tokens. For each of them,
there is a string constant Foo representing a regular expression, and
a compiled regular expression re_Foo.
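
To show the naming convention only, here is a deliberately simplified
sketch. It covers just the ASCII subset of the Name and NameChar
productions; the real productions include large Unicode ranges, and
the contents of the generated module are not reproduced here.

```python
import re

# ASCII-only approximation of XML 1.0 production [4] NameChar.
NameChar = r"[A-Za-z0-9._:\-]"
# ASCII-only approximation of production [5] Name: (Letter | '_' | ':') NameChar*
Name = r"[A-Za-z_:]" + NameChar + "*"

# Compiled counterparts, following the Foo / re_Foo convention described above.
re_NameChar = re.compile(NameChar)
re_Name = re.compile(Name)

for candidate in ("xsl:template", "_private", "1abc", "a b"):
    print(candidate, bool(re_Name.fullmatch(candidate)))
```
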

I've changed xmlproc to use those. As it turns out, this slows down
parsing of an example document (the XSLT spec) by 3%,
contrary to my earlier (more optimistic) measurements.

Marc-André suggested writing C code to speed this up. So here is a
revised challenge for any prospective contributor: write a C module
that emulates xml.utils.characters, by providing objects with the same
methods as the compiled regular expressions, but faster matching
algorithms. Alternatively, come up with a patch to sre that performs
faster matching when presented with Unicode character classes - that
would help more Python users than the former approach.

Hint: Please have a look at how expat represents the bitmaps, that
appears to be quite efficient. I'd discourage outright copying of
those tables, though - somebody should verify that they are still
correct for XML 1.0 2nd edition.

Regards,
Martin


From noreply@sourceforge.net  Thu Mar 29 17:47:54 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 29 Mar 2001 09:47:54 -0800
Subject: [XML-SIG] [ pyxml-Patches-412237 ] sgmlop returns Unicode
Message-ID: <E14igWY-0001tE-00@usw-sf-web1.sourceforge.net>

Patches item #412237, was updated on 2001-03-29 09:47
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=412237&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmlop returns Unicode

Initial Comment:
This patch enhances sgmlop:

It adds a third parser type (XMLUnicodeParser)
that returns Unicode objects to the application. The 
parser recognizes all 8bit encodings in the XML header 
and decodes the 8bit characters accordingly. The 
encoding defaults to UTF-8. (This could be changed 
easily or made customizable)
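
The general idea (read the encoding from the XML declaration, fall
back to UTF-8, hand back Unicode) can be sketched in pure Python. This
is not sgmlop's API; the function name decode_xml and the regular
expression are assumptions, and only 8-bit, ASCII-compatible encodings
are handled.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def decode_xml(data, default="utf-8"):
    """Decode raw XML bytes to str using the declared encoding, else the default.

    Hypothetical helper for illustration only.
    """
    match = _ENCODING.match(data)
    encoding = match.group(1).decode("ascii") if match else default
    return data.decode(encoding)


latin1 = b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>'
utf8 = "<a>\u00e9</a>".encode("utf-8")

print(decode_xml(latin1))
print(decode_xml(utf8))
```
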


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=412237&group_id=6473


From larsga@garshol.priv.no  Thu Mar 29 21:38:58 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Mar 2001 23:38:58 +0200
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
References: <985724354.4243.0.camel@eddie> 	<200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
Message-ID: <m3vgosmf0t.fsf@lambda.garshol.priv.no>

* Fred L. Drake, Jr.
| 
| Here's the problem: What we want is to be able to say
| "ANY-and-we-really-mean-it", not ANY as defined in the DTD language.
| That definition tells us that ANY means anything *defined in the
| DTD*, which is pretty limited -- this is an inherited SGML wart.  I
| don't know how to express what we actually want in the DTD language;
| if anyone can tell me, I'd be glad to change the DTD for revision
| 1.1.

I would do it like this:

  <!-- Please redefine this in any derived DTDs -->
  <!ENTITY % any "EMPTY">

  <!ELEMENT metadata %any;>

That would allow anyone creating an extended DTD to first define their
elements, then redefine %any;, then refer to the XBEL DTD and have it
interpreted correctly.

ANY would also work, in the sense that another DTD could define the
extra elements and then refer to the XBEL DTD, but I think it would be
too loose, and that a PE is better.
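
The trick relies on the XML 1.0 rule that when an entity is declared
more than once, the first declaration is binding. The toy expander
below demonstrates just that rule; it is not a DTD parser. It collects
all entity declarations before expanding, ignores comments, and uses
plain string concatenation in place of a derived DTD referring to the
XBEL DTD. The pixmap and toolbar element names are borrowed from the
Galeon example purely for illustration.

```python
import re

ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.]+)\s+"([^"]*)"\s*>')
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]+?)\s*>")
PE_REFERENCE = re.compile(r"%([\w.]+);")


def content_models(dtd_text):
    """Map element names to content models with parameter entities expanded."""
    entities = {}
    for name, value in ENTITY_DECL.findall(dtd_text):
        entities.setdefault(name, value)  # first declaration wins
    models = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        models[name] = PE_REFERENCE.sub(lambda m: entities[m.group(1)], model)
    return models


XBEL_PART = """
<!ENTITY % any "EMPTY">
<!ELEMENT metadata %any;>
"""

# A derived DTD declares its elements and %any; first, then pulls in XBEL.
DERIVED = """
<!ELEMENT pixmap (#PCDATA)>
<!ELEMENT toolbar (#PCDATA)>
<!ENTITY % any "(pixmap | toolbar)*">
""" + XBEL_PART

print(content_models(XBEL_PART)["metadata"])
print(content_models(DERIVED)["metadata"])
```
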

--Lars M.


From larsga@garshol.priv.no  Thu Mar 29 21:40:39 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Mar 2001 23:40:39 +0200
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> 	<Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com> <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
Message-ID: <m3u24cmey0.fsf@lambda.garshol.priv.no>

* Fred L. Drake, Jr.
| 
| Having joined the ranks of DOM implementors myself, I can only wish
| I missed it.  ;-(

So do I. In fact, if you're writing a DOM implementation I suggest
that you stop and that we design an API more suitable to whatever it
is we want to do. Python really could use a better tree XML API than
the DOM. Pyxie looks good, but it needs work. JDOM also looks good.

--Lars M.


From fdrake@acm.org  Thu Mar 29 21:46:52 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 29 Mar 2001 16:46:52 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <m3vgosmf0t.fsf@lambda.garshol.priv.no>
References: <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
 <m3vgosmf0t.fsf@lambda.garshol.priv.no>
Message-ID: <15043.44364.217328.30550@cj42289-a.reston1.va.home.com>

Lars Marius Garshol writes:
 > I would do it like this:
 > 
 >   <!-- Please redefine this in any derived DTDs -->
 >   <!ENTITY % any "EMPTY">
 > 
 >   <!ELEMENT metadata %any;>

  I like this much better!  I've named this metadata.mix to be
consistent with other PEs in XBEL, but otherwise used this directly.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Thu Mar 29 21:51:07 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 29 Mar 2001 16:51:07 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <m3u24cmey0.fsf@lambda.garshol.priv.no>
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
 <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
 <m3u24cmey0.fsf@lambda.garshol.priv.no>
Message-ID: <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>

[Removed Ross Burton from the list of recipients; the Python DOM issue
 isn't relevant to galeon.]

Lars Marius Garshol writes:
 > So do I. In fact, if you're writing a DOM implementation I suggest
 > that you stop and that we design an API more suitable to whatever it
 > is we want to do. Python really could use a better tree XML API than
 > the DOM. Pyxie looks good, but it needs work. JDOM also looks good.

  Alas, I'm afraid there was an element of "buzzword compliance" in
the motivation for the implementation I'm involved in.  I'd be very
interested in developing a new API to use instead, though.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From larsga@garshol.priv.no  Thu Mar 29 21:57:54 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Mar 2001 23:57:54 +0200
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> 	<Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com> 	<15042.8394.88965.369473@cj42289-a.reston1.va.home.com> 	<m3u24cmey0.fsf@lambda.garshol.priv.no> <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
Message-ID: <m3n1a4me59.fsf@lambda.garshol.priv.no>

* Fred L. Drake, Jr.
| 
| Alas, I'm afraid there was an element of "buzzword compliance" in
| the motivation for the implementation I'm involved in.  I'd be very
| interested in developing a new API to use instead, though.

Then I think we should put it in the roadmap, unless we have some
people ready to start on it now. I am not able to, since I'll be very
heavily loaded until Easter and (what bliss!!!) on holiday after that.

--Lars M.


From fdrake@acm.org  Thu Mar 29 21:59:24 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 29 Mar 2001 16:59:24 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <m3n1a4me59.fsf@lambda.garshol.priv.no>
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
 <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
 <m3u24cmey0.fsf@lambda.garshol.priv.no>
 <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
 <m3n1a4me59.fsf@lambda.garshol.priv.no>
Message-ID: <15043.45116.937337.669625@cj42289-a.reston1.va.home.com>

Lars Marius Garshol writes:
 > Then I think we should put it in the roadmap, unless we have some
 > people ready to start on it now. I am not able to, since I'll be very
 > heavily loaded until Easter and (what bliss!!!) on holiday after that.

  Sounds good to me.  I won't be able to work on it until after Python
2.1 is out.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From rsalz@zolera.com  Thu Mar 29 23:43:43 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 29 Mar 2001 18:43:43 -0500
Subject: [XML-SIG] Metadata in XBEL
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
 <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
 <m3u24cmey0.fsf@lambda.garshol.priv.no> <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
Message-ID: <3AC3C8AF.28D02DCC@zolera.com>

>  I'd be very
> interested in developing a new API to use instead, though.

I'd rather have the current stuff documented. :)
	/r$


From fdrake@acm.org  Fri Mar 30 00:11:39 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 29 Mar 2001 19:11:39 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <3AC3C8AF.28D02DCC@zolera.com>
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
 <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
 <m3u24cmey0.fsf@lambda.garshol.priv.no>
 <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
 <3AC3C8AF.28D02DCC@zolera.com>
Message-ID: <15043.53051.338378.384859@cj42289-a.reston1.va.home.com>

Rich Salz writes:
 > I'd rather have the current stuff documented. :)

  Are you aware of the DOM documentation in the development version of
the Python docs?  See:

    http://python.sourceforge.net/devel-docs/lib/markup.html


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Fri Mar 30 06:14:03 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 30 Mar 2001 08:14:03 +0200
Subject: [XML-SIG] Documentation
In-Reply-To: <15043.53051.338378.384859@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
 <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
 <m3u24cmey0.fsf@lambda.garshol.priv.no>
 <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
 <3AC3C8AF.28D02DCC@zolera.com> <15043.53051.338378.384859@cj42289-a.reston1.va.home.com>
Message-ID: <200103300614.f2U6E3W01048@mira.informatik.hu-berlin.de>

> Rich Salz writes:
>  > I'd rather have the current stuff documented. :)
> 
>   Are you aware of the DOM documentation in the development version of
> the Python docs?  See:
> 
>     http://python.sourceforge.net/devel-docs/lib/markup.html

There is still a lot of stuff missing, though:
- saxexts/sax2exts. I think Lars Marius claims that these are obsolete,
  but I can't see how to live without them.
- saxlib.{DeclHandler, LexicalHandler}
- saxutils.{ErrorPrinter, ErrorRaiser, Location}
- xml.dom.javadom
- DOM interfaces beyond Core in 4DOM
- xml.dom.ext
- xml.marshal
- xml.utils.qp_xml

Regards,
Martin


From Eugene.Leitl@lrz.uni-muenchen.de  Fri Mar 30 09:12:18 2001
From: Eugene.Leitl@lrz.uni-muenchen.de (Eugene Leitl)
Date: Fri, 30 Mar 2001 11:12:18 +0200 (MET DST)
Subject: [XML-SIG] Documentation
In-Reply-To: <200103300614.f2U6E3W01048@mira.informatik.hu-berlin.de>
Message-ID: <Pine.GSO.4.03.10103301111390.28364-100000@sun1.lrz-muenchen.de>

On Fri, 30 Mar 2001, Martin v. Loewis wrote:

> There is still a lot of stuff missing, though:

Any way XML-RPC will make it into PyXML?


From martin@loewis.home.cs.tu-berlin.de  Fri Mar 30 11:27:14 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 30 Mar 2001 13:27:14 +0200
Subject: [XML-SIG] Documentation
In-Reply-To: <Pine.GSO.4.03.10103301111390.28364-100000@sun1.lrz-muenchen.de>
 (message from Eugene Leitl on Fri, 30 Mar 2001 11:12:18 +0200 (MET
 DST))
References: <Pine.GSO.4.03.10103301111390.28364-100000@sun1.lrz-muenchen.de>
Message-ID: <200103301127.f2UBREH08494@mira.informatik.hu-berlin.de>

> On Fri, 30 Mar 2001, Martin v. Loewis wrote:
> 
> > There is still a lot of stuff missing, though:
> 
> Any way XML-RPC will make it into PyXML?

Through contributions of code, of course. I was talking about missing
documentation, though, not about missing code.

Regards,
Martin


From support@internetdiscovery.com  Fri Mar 30 14:58:57 2001
From: support@internetdiscovery.com (Mike Clarkson)
Date: Fri, 30 Mar 2001 06:58:57 -0800
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <15042.8087.392696.683721@cj42289-a.reston1.va.home.com>
References: <985724354.4243.0.camel@eddie>
 <985724354.4243.0.camel@eddie>
 <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
Message-ID: <3.0.6.32.20010330065857.007c7140@popd.ix.netcom.com>

At 12:29 PM 3/28/01 -0500, you wrote:
>
>[Adding David Faure to the recipients list.]
>  Here's the problem:  What we want is to be able to say
>"ANY-and-we-really-mean-it", not ANY as defined in the DTD language.
>That definition tells us that ANY means anything *defined in the DTD*,
>which is pretty limited -- this is an inherited SGML wart.  I don't
>know how to express what we actually want in the DTD language; if
>anyone can tell me, I'd be glad to change the DTD for revision 1.1.

Isn't the canonical "solution" to this:

<info>
<metadata owner="http://tix.sourceforge.net"><![CDATA[
I'll do what I please, with or without a <DTD>.
]]></metadata>
</info>

That's legal in terms of an ANY definition of <metadata> isn't it?
We do this because we also don't want to parse the contents of the metadata.

Or if the contents of the <desc> tag stores non-conforming HTML, such as
a user-generated description or comment, is that not the same problem:

<desc><![CDATA[
I'll do what I please, <BR> in HTML 2.0 <P>.
]]></desc>

Even if it were conforming XML, we'd want to mask it off anyway
to protect it from being parsed by the DOM; we want to treat it as a
chunk that gets replaced en masse when the user decides to.
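
A minimal sketch of what the CDATA masking above buys you, using the
standard library's xml.dom.minidom as a stand-in for whatever DOM the
application uses (that choice is an assumption, not something the XBEL
tools require). The parser sees no child elements inside <metadata>,
only character data:

```python
from xml.dom import minidom

SOURCE = """<info>
<metadata owner="http://tix.sourceforge.net"><![CDATA[
I'll do what I please, with or without a <DTD>.
]]></metadata>
</info>"""

doc = minidom.parseString(SOURCE)
metadata = doc.getElementsByTagName("metadata")[0]

# Nothing inside the CDATA section is parsed as markup, so the
# <DTD> "tag" never becomes an element node.
element_children = [
    child for child in metadata.childNodes
    if child.nodeType == child.ELEMENT_NODE
]

# The masked content comes back as plain character data.
text = "".join(child.data for child in metadata.childNodes)

print(element_children)
print(text.strip())
```

The flip side is that any consumer that does want structure out of the
chunk has to run it through a parser a second time.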

Mike.


From ken@bitsko.slc.ut.us  Fri Mar 30 15:28:55 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 30 Mar 2001 09:28:55 -0600
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Lars Marius Garshol's message of "29 Mar 2001 23:40:39 +0200"
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com> <15042.8394.88965.369473@cj42289-a.reston1.va.home.com> <m3u24cmey0.fsf@lambda.garshol.priv.no>
Message-ID: <x51yrf1djc.fsf@bitsko.slc.ut.us>

Lars Marius Garshol <larsga@garshol.priv.no> writes:

> * Fred L. Drake, Jr.
> | 
> | Having joined the ranks of DOM implementors myself, I can only wish
> | I missed it.  ;-(
> 
> So do I. In fact, if you're writing a DOM implementation I suggest
> that you stop and that we design an API more suitable to whatever it
> is we want to do. Python really could use a better tree XML API than
> the DOM. Pyxie looks good, but it needs work. JDOM also looks good.

I have one such API implemented in Orchard.  The API is described in
[1], and the Python implementation available from [2].  I also have a
C implementation there as well, but the C <-> Python bridge is not
available yet.

I've briefly mentioned Orchard here a couple of times, but not in the
context of its DOM or SAX APIs, because I'd presumed, apparently
incorrectly, that there'd be little interest in less Java-ish APIs
when Python has very solid SAX and DOM bindings with several
implementations.

Orchard implements the "node based" SAX we've discussed here before,
where the nodes used in SAX are the same nodes used to build a tree.
Implementing a pull-parser with nodes is a minor addition, but hasn't
been specced yet.

Orchard has a "grove-like" feel to it (intentionally) and allows for
compatible subsets (like Common XML) or supersets (like Jonathan
Borden's XSet or parsed-syntax information).

Orchard's XML node semantics are intended to be compatible with DOM,
such that bi-directional wrappers are both possible and shouldn't be
too difficult.  For example, I would like to use the W3C DOM Test
Suite, via a wrapper, to certify the Orchard tree implementation.
Orchard's SAX-like interface has been lightly tested against
Java-style SAX parsers using SAX<->Orchard filters, and is also
intended to be fully compatible.

Orchard grew out of the need to implement this style of SAX and DOM
for Perl's bindings.  Over the last couple of years my writing has
been split fairly evenly between Perl and Python and I wanted to be
able to use this style of API in my Python applications as well.  I
figured even if it were just for myself, I'd be happy ;-)

Let me know what you think,

  -- Ken

[1] <http://casbah.org/~kmacleod/orchard/xml.html>
[2] <http://casbah.org/~kmacleod/orchard/>


From rsalz@zolera.com  Fri Mar 30 17:17:45 2001
From: rsalz@zolera.com (Rich Salz)
Date: Fri, 30 Mar 2001 12:17:45 -0500
Subject: [XML-SIG] Re: Documentation
References: <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de>
 <Pine.LNX.4.21.0103272323310.2124-100000@gallahad.180sw.com>
 <15042.8394.88965.369473@cj42289-a.reston1.va.home.com>
 <m3u24cmey0.fsf@lambda.garshol.priv.no>
 <15043.44619.964310.42196@cj42289-a.reston1.va.home.com>
 <3AC3C8AF.28D02DCC@zolera.com> <15043.53051.338378.384859@cj42289-a.reston1.va.home.com> <200103300614.f2U6E3W01048@mira.informatik.hu-berlin.de>
Message-ID: <3AC4BFB9.F7E7E69F@zolera.com>

> There is still a lot of stuff missing, though:

and, for those of us living on the bleeding edge, xpath and xslt. :)

And just in case it needs to be said, "I mean no disrespect."  I'd much
rather have undocumented code than no code.

Fred's link is very useful -- it shows that there is an overall harness
in which to put the docs.

Probably early next week I'll have XML Canonicalization code ready to
contribute.  And now I know what to write up, too.
	/r$


From larsga@garshol.priv.no  Fri Mar 30 18:23:59 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 30 Mar 2001 20:23:59 +0200
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <3.0.6.32.20010330065857.007c7140@popd.ix.netcom.com>
References: <985724354.4243.0.camel@eddie>  <985724354.4243.0.camel@eddie>  <200103272130.f2RLUpu04246@mira.informatik.hu-berlin.de> <3.0.6.32.20010330065857.007c7140@popd.ix.netcom.com>
Message-ID: <m3u24b2k00.fsf@lambda.garshol.priv.no>

* Mike Clarkson
| 
| Isn't the canonical "solution" to this:
| 
| <info>
| <metadata owner="http://tix.sourceforge.net"><![CDATA[
| I'll do what I please, with or without a <DTD>.
| ]]></metadata>
| </info>

No, this is not very nice, I think, because the information inside
the CDATA marked section will then have to be reparsed. It is much
better to redefine the DTD and just use an extended DTD.

Well-written XBEL applications that don't understand this stuff should
just ignore it, and those that understand it would much prefer to have
the content parsed as part of the document.
 
| That's legal in terms of an ANY definition of <metadata> isn't it?

Yes, it is.

| We do this because we also don't want to parse the contents of the
| metadata.

Well, if you _really_ don't, it works, but...
 
--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Fri Mar 30 19:13:21 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 30 Mar 2001 21:13:21 +0200
Subject: [XML-SIG] New Developer
Message-ID: <200103301913.f2UJDLC02061@mira.informatik.hu-berlin.de>

Please welcome Rich Salz <rsalz@zolera.com> as a PyXML
developer. He'll look into the xml.xslt and xml.xpath packages, as
well as into authoring documentation.

Regards,
Martin


From jtauber@bowstreet.com  Sat Mar 31 00:58:29 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Fri, 30 Mar 2001 19:58:29 -0500
Subject: [XML-SIG] PyTREX as xml.schema.trex?
Message-ID: <C4EE90263CBFD411A0C800B0D0490095CCC5AF@bst-mail02>

Now that PyTREX (http://pytrex.sourceforge.net/) is beta, is there any
interest in making it part of PyXML?

James


From fdrake@acm.org  Sat Mar 31 06:38:34 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 31 Mar 2001 01:38:34 -0500 (EST)
Subject: [XML-SIG] New Developer
In-Reply-To: <200103301913.f2UJDLC02061@mira.informatik.hu-berlin.de>
References: <200103301913.f2UJDLC02061@mira.informatik.hu-berlin.de>
Message-ID: <15045.31594.466578.261510@beowolf.pythonlabs.org>

Martin v. Loewis writes:
 > Please welcome Rich Salz <rsalz@zolera.com> as a PyXML
 > developer. He'll look into the xml.xslt and xml.xpath packages, as
 > well as into authoring documentation.

  Hurray, documentation!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 31 12:29:53 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 31 Mar 2001 14:29:53 +0200
Subject: [XML-SIG] PyTREX as xml.schema.trex?
In-Reply-To: <C4EE90263CBFD411A0C800B0D0490095CCC5AF@bst-mail02> (message from
 James Tauber on Fri, 30 Mar 2001 19:58:29 -0500)
References: <C4EE90263CBFD411A0C800B0D0490095CCC5AF@bst-mail02>
Message-ID: <200103311229.f2VCTrY07215@mira.informatik.hu-berlin.de>

> Now that PyTREX (http://pytrex.sourceforge.net/) is beta, is there any
> interest in making it part of PyXML?

Certainly. I have imported it into /xml/xml/schema/trex.py, as you've
proposed. I also made you a developer, so you can update it as
needed. I have not incorporated the test suite. If you want to ship it
with PyXML, you should import it into xml/test. If you merely want to
provide a copy to PyXML, you can also import it into /test/trex. Or,
you can leave that alone, so that everybody would get the test suite
from the pytrex CVS.

Thanks for the contribution,
Martin


From larsga@garshol.priv.no  Sat Mar 31 13:12:30 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 31 Mar 2001 15:12:30 +0200
Subject: [XML-SIG] xmlproc in PyXML CVS tree
Message-ID: <m3zoe213r5.fsf@lambda.garshol.priv.no>

I have now, finally, moved xmlproc into the PyXML CVS tree, where it
will be maintained from now on. I will no longer maintain it
separately in my own CVS tree, but use the PyXML tree for this.

To this end the xmlproc test suite has been added to the PyXML CVS
tree as a separate top-level project called 'test'. This test suite,
and especially the one named 'oasis' should be used to verify any
changes made to xmlproc to ensure that they do not break anything.

Please note that the test suite is about 20 MB, so it's a substantial
download. 

--Lars M.


From uche.ogbuji@fourthought.com  Sat Mar 31 13:37:14 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 06:37:14 -0700
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Message from David Faure <david@mandrakesoft.com>
 of "Thu, 29 Mar 2001 17:35:54 +0100." <200103291635.f2TGZsr27038@faure.worldonline.co.uk>
Message-ID: <200103311337.GAA06268@localhost.localdomain>

> One often requested feature, is for merging. For instance, in a company,
> there could be a "company-global" set of bookmarks, to be merged with
> the user's bookmarks - much like everything else in KDE already has
> a global and a local directory, possibly with even more levels (e.g. for
> groups of people).

Yes.  I actually implemented an off-line merge earlier, but I think a 
standardized merge indicator would be useful.

> To make that possible, XBEL could have a sort of "include this other
> bookmark collection" tag, and it could be up to the application to create
> aliases towards those global bookmarks in the user's bookmark file.
> Well, that's just one solution - it allows to change the order, to remove
> a global bookmark, to insert its own anywhere... but it doesn't notice new
> bookmarks in the global collection, unless some timestamp is used.
> 
> Another way could be that including another set of bookmarks simply means
> that all those bookmarks appear first, then those in the user's file.
> This way, changes to the global collection are automatically taken into account,
> but it's impossible to modify/remove/reorder/change anything in the global
> collection. It's probably much easier to implement too, and has the exact semantic
> of a #include. I suggest to add this to XBEL then: a simple 
> <include href="file:/path/to/bookmarks/collection.xml">.
> There's still the issue of relative paths vs absolute paths, but, well... 
> no solution here either :}

That should instead be spelled

<merge include:href="file:/path/to/bookmarks/collection.xml"/>

Or such, so that processors that don't have first-class merge support can 
still include the other file through xinclude.
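
A rough sketch of the #include-style semantics discussed above, for
processors that have to do the expansion themselves. It assumes the
plain <include href="..."/> spelling from the quoted proposal, uses
xml.etree.ElementTree, and takes a caller-supplied load() function in
place of real file access; expand_includes, load, and the sample
documents are all hypothetical names made up for illustration:

```python
import xml.etree.ElementTree as ET

GLOBAL_XBEL = """<xbel>
  <bookmark href="http://intranet.example/"><title>Intranet</title></bookmark>
</xbel>"""

USER_XBEL = """<xbel>
  <include href="global.xml"/>
  <bookmark href="http://www.python.org/"><title>Python</title></bookmark>
</xbel>"""


def expand_includes(root, load):
    """Return a copy of root with each top-level <include> replaced
    by the top-level children of the collection it points at."""
    merged = ET.Element(root.tag, dict(root.attrib))
    for child in root:
        if child.tag == "include":
            merged.extend(load(child.get("href")))
        else:
            merged.append(child)
    return merged


def load(href):
    # Stand-in loader: a real one would resolve and fetch the href.
    sources = {"global.xml": GLOBAL_XBEL}
    return list(ET.fromstring(sources[href]))


root = expand_includes(ET.fromstring(USER_XBEL), load)
hrefs = [bookmark.get("href") for bookmark in root.findall("bookmark")]
print(hrefs)
```

With the include placed first, the global bookmarks come out ahead of
the user's own, and changes to the global file are picked up on every
load because nothing is copied into the user's file.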


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sat Mar 31 13:40:21 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 06:40:21 -0700
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Message from "Ross Burton" <r.burton@180sw.com>
 of "Thu, 29 Mar 2001 17:56:57 +0100." <005601c0b871$4a632280$1501a8c0@180sw.com>
Message-ID: <200103311340.GAA06282@localhost.localdomain>

> > In summary, despite the compatibility problem with icon names (and paths),
> > I'm very happy if icon="..." and toolbar="yes" are added to XBEL
> > (given that Konqueror already uses those), I suggest to add an <include>
> > possibility, and the few other things that are not in XBEL and that might
> > be in konqueror one day (keywords, scoring), can certainly be done as
> > konq-specific metadata - unless others want to share the same data.
> 
> I'm for creating a set of metadata owners which can be considered "standard"
> in that they are defined under common grounds in the open.  Not part of the
> actual specification (as it's best if that is kept small) but a catalogue of
> owners and expected content which would allow sharing of data.  Into this
> could be added all of the useful attributes which could be shared without
> making XBEL overly complex, such as keywords and scoring.

Though some might not like it, this sounds like a job for namespaces, and for 
m12n such as that provided by RSS (which is a rocking success, BTW for 
open-source interoperability).  Speaking of RSS, I think XBEL is properly a 
job for RDF.

It could be converted to RDF without a lot of damage to its structure.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sat Mar 31 13:54:35 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 06:54:35 -0700
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Message from Ken MacLeod <ken@bitsko.slc.ut.us>
 of "30 Mar 2001 09:28:55 CST." <x51yrf1djc.fsf@bitsko.slc.ut.us>
Message-ID: <200103311354.GAA06314@localhost.localdomain>

> I have one such API implemented in Orchard.  The API is described in
> [1], and the Python implementation available from [2].  I also have a
> C implementation there as well, but the C <-> Python bridge is not
> available yet.
> 
> I've briefly mentioned Orchard here a couple of times, but not in the
> context of its DOM or SAX APIs, because I'd presumed, apparently
> incorrectly, that there'd be little interest in less Java-ish APIs
> when Python has very solid SAX and DOM bindings with several
> implementations.

Oh come now.  Fred and I are both DOM implementors who have expressed strong 
discontent with the DOM.  I'm all for a better tree API.

However, I'd like one, but *not* based on JDOM, but rather 100% Pythonic.  I 
think to do otherwise is to risk continuing the performance and 
resource-hogging properties of straightforward DOM ports.


> Orchard implements the "node based" SAX we've discussed here before,
> where the nodes used in SAX are the same nodes used to build a tree.
> Implementing a pull-parser with nodes is a minor addition, but hasn't
> been specced yet.

I think a Lisp approach to storing the nodes is an interesting idea, given 
Python's strong list processing.  Basically, just a straightforward 
translation of the parameters of SAX events (plus node-type) into nested 
lists.  Probably not exactly what we'd want for a PyDOM, but an easy straw man 
to build.
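
To make the straw man concrete, here is one possible reading of it:
a SAX ContentHandler that turns events into nested lists of the form
[node-type, name, attributes, children]. The list layout and the
ListBuilder name are assumptions made for this sketch, not an agreed
format:

```python
import xml.sax


class ListBuilder(xml.sax.ContentHandler):
    """Collect SAX events into nested lists.

    Elements become ["element", name, attrs, children];
    character data becomes ["text", content].
    """

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = ["element", name, dict(attrs.items()), []]
        if self._stack:
            self._stack[-1][3].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        children = self._stack[-1][3]
        if children and children[-1][0] == "text":
            # The parser may deliver one run of text in several pieces.
            children[-1][1] += content
        else:
            children.append(["text", content])


builder = ListBuilder()
xml.sax.parseString(b"<a x='1'><b>hi</b></a>", builder)
tree = builder.root
print(tree)
```

Walking such a tree is plain list indexing and recursion, which is
cheap, but there are no parent links and no namespace handling here.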


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jtauber@bowstreet.com  Sat Mar 31 14:01:11 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Sat, 31 Mar 2001 09:01:11 -0500
Subject: [XML-SIG] Metadata in XBEL
Message-ID: <C4EE90263CBFD411A0C800B0D0490095CCC5B1@bst-mail02>

> Oh come now.  Fred and I are both DOM implementors who have 
> expressed strong 
> discontent with the DOM.  I'm all for a better tree API.
> 
> However, I'd like one, but *not* based on JDOM, but rather 
> 100% Pythonic.  I 
> think to do otherwise is to risk continuing the performance and 
> resource-hogging properties of straightforward DOM ports.

Agreed. JDOM came about because DOM's language neutrality led to
inefficiencies for particular languages. JDOM attempted to take advantage of
the specifics of Java. A tree API should take advantage of the specifics of
Python.

James


From uche.ogbuji@fourthought.com  Sat Mar 31 14:03:04 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 07:03:04 -0700
Subject: [XML-SIG] PyTREX as xml.schema.trex?
In-Reply-To: Message from James Tauber <jtauber@bowstreet.com>
 of "Fri, 30 Mar 2001 19:58:29 EST." <C4EE90263CBFD411A0C800B0D0490095CCC5AF@bst-mail02>
Message-ID: <200103311403.HAA06336@localhost.localdomain>

> Now that PyTREX (http://pytrex.sourceforge.net/) is beta, is there any
> interest in making it part of PyXML?

Are you kidding?  Absolutely!

Disclaimer: I haven't had a moment to try it yet, though I hope to soon.  I 
swotted over the TREX specs without the benefit of a friendly implementation 
to play with, and PyTREX should be fun to poke at.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jtauber@bowstreet.com  Sat Mar 31 14:08:45 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Sat, 31 Mar 2001 09:08:45 -0500
Subject: [XML-SIG] PyTREX as xml.schema.trex?
Message-ID: <C4EE90263CBFD411A0C800B0D0490095CCC5B2@bst-mail02>

> > Now that PyTREX (http://pytrex.sourceforge.net/) is beta, 
> is there any
> > interest in making it part of PyXML?
> 
> Certainly. I have imported it into /xml/xml/schema/trex.py, as you've
> proposed. I also made you a developer, so you can update it as
> needed. I have not incorporated the test suite. If you want to ship it
> with PyXML, you should import it into xml/test. If you merely want to
> provide a copy to PyXML, you can also import it into /test/trex. Or,
> you can leave that alone, so that everybody would get the test suite
> from the pytrex CVS.

Thank you!

I think I'll leave the test suite separate for now. I'll probably import it
later, though.

Have people typically maintained a parallel CVS and done separate releases
for a while? i.e. should I make changes to both pytrex/pytrex.py and
/xml/xml/schema/trex.py and continue to do releases from pytrex?

James


From jtauber@bowstreet.com  Sat Mar 31 14:13:47 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Sat, 31 Mar 2001 09:13:47 -0500
Subject: [XML-SIG] PyTREX as xml.schema.trex?
Message-ID: <C4EE90263CBFD411A0C800B0D0490095CCC5B3@bst-mail02>

One more thing...

Hints on how best to whip up some documentation? I would only need to write
up a single page saying how to invoke the validator and interpret the return
object.

James



From jtauber@bowstreet.com  Sat Mar 31 14:14:07 2001
From: jtauber@bowstreet.com (James Tauber)
Date: Sat, 31 Mar 2001 09:14:07 -0500
Subject: [XML-SIG] RDF?
Message-ID: <C4EE90263CBFD411A0C800B0D0490095CCC5B4@bst-mail02>

Where are we with standard RDF support in Python? Any work being done? Any
interest in Dan Krech and I donating the RDF library within Redfoot
(http://redfoot.sourceforge.net/)? Possibly merging with other RDF
implementations.

Uche?

James


From uche.ogbuji@fourthought.com  Sat Mar 31 14:23:59 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 07:23:59 -0700
Subject: [XML-SIG] PyTREX as xml.schema.trex?
In-Reply-To: Message from James Tauber <jtauber@bowstreet.com>
 of "Sat, 31 Mar 2001 09:08:45 EST." <C4EE90263CBFD411A0C800B0D0490095CCC5B2@bst-mail02>
Message-ID: <200103311423.HAA06459@localhost.localdomain>

> Have people typically maintained a parallel CVS and done separate releases
> for a while? i.e. should I make changes to both pytrex/pytrex.py and
> /xml/xml/schema/trex.py and continue to do releases from pytrex?

Yes.  This is how 4DOM, and now 4XSLT and 4XPath have migrated into PyXML 
core.  4DOM had parallel CVS for almost 6 months, and according to current 
schedule 4XSLT/XPath will have parallel CVS for about 2 months (until the 
4Suite 1.0 release ca. June 1).


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sat Mar 31 14:28:50 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 31 Mar 2001 07:28:50 -0700
Subject: [XML-SIG] RDF?
In-Reply-To: Message from James Tauber <jtauber@bowstreet.com>
 of "Sat, 31 Mar 2001 09:14:07 EST." <C4EE90263CBFD411A0C800B0D0490095CCC5B4@bst-mail02>
Message-ID: <200103311428.HAA06470@localhost.localdomain>

> 
> Where are we with standard RDF support in Python? Any work being done? Any
> interest in Dan Krech and I donating the RDF library within Redfoot
> (http://redfoot.sourceforge.net/)? Possibly merging with other RDF
> implementations.

Well, we have 4RDF as well, but there's no reason why we can't have multiple 
RDF implementations.

We could merge implementations, and I really have no problem with that, but 
since the core RDF model is subject to so many interpretations, I think it's 
an especially good idea to have parallel implementations.

The question is: who wants RDF in PyXML?  And would you prefer to start with a 
lightweight solution, or one with all the trimmings?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Sat Mar 31 15:02:29 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 31 Mar 2001 10:02:29 -0500
Subject: [XML-SIG] RDF?
References: <200103311428.HAA06470@localhost.localdomain>
Message-ID: <001c01c0b9f3$9bc5d900$7cac1218@reston1.va.home.com>

Uche Ogbuji
>
> Well, we have 4RDF as well, but there's no reason why we can't have
> multiple RDF implementations.
>
 Yes, yes

> We could merge implementations, and I really have no problem with that,
> but since the core RDF model is subject to so many interpretations, I
> think it's an especially good idea to have parallel implementations.
>
yes again.

> The question is: who wants RDF in PyXML?  And would you prefer to start
> with a lightweight solution, or one with all the trimmings?
>
It's a good question - should PyXML be only the "real" core XML
infrastructure - parsing, DOM, and so on - or should it also cover major
support areas for applications, like RDF?  If not, that sounds like a new
SIG.  That might be a good idea in the long run, but we would need more
keen minds blasting stuff out first.  I think that the SIG shouldn't be
split at this time, so I'd say to keep RDF in it.

And many thanks to James for this work and his generous offer to share it.

Cheers,

Tom P


From ken@bitsko.slc.ut.us  Sat Mar 31 15:03:11 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 31 Mar 2001 09:03:11 -0600
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: Uche Ogbuji's message of "Sat, 31 Mar 2001 06:54:35 -0700"
References: <200103311354.GAA06314@localhost.localdomain>
Message-ID: <x5y9tmx9ow.fsf@bitsko.slc.ut.us>

Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> However, I'd like [a Pythonic DOM], but *not* based on JDOM, but
> rather 100% Pythonic.  I think to do otherwise is to risk continuing
> the performance and resource-hogging properties of straightforward
> DOM ports.

The Orchard API is "near pure" Pythonic, using only objects for nodes,
and arrays and mappings for NodeLists and NamedNodeMaps.  No
DOM-style iterators and manipulation.  Other behaviors that do not
replicate built-in Python features (like normalize()) are retained.

"Near pure" means two things: 1) Orchard uses accessor overrides on
attribute lookups, to make things like element.tag_name map properly
to element.prefix and element.local_name, and 2) in refactoring the
node base class to work across non-XML nodes (like RSS or MPEG),
certain "intrinsic" properties, like Parent and NodeType are moved
into a separate namespace.
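
As an illustration of the first point, an accessor override of this
kind can be written in Python with a property. This is only a sketch
of the idea; the Element class below is hypothetical and is not
Orchard's actual code:

```python
class Element:
    """Hypothetical node: stores prefix and local_name, and derives
    tag_name from them on every lookup."""

    def __init__(self, local_name, prefix=None):
        self.prefix = prefix
        self.local_name = local_name

    @property
    def tag_name(self):
        if self.prefix:
            return f"{self.prefix}:{self.local_name}"
        return self.local_name

    @tag_name.setter
    def tag_name(self, value):
        # Split "prefix:local" on the last colon; no colon means no prefix.
        prefix, sep, local = value.rpartition(":")
        self.prefix = prefix if sep else None
        self.local_name = local


element = Element("creator", prefix="dc")
print(element.tag_name)

element.tag_name = "title"
print(element.prefix, element.local_name)
```

Callers still write element.tag_name as if it were a plain attribute,
and the three names cannot drift out of sync.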

Orchard provides namespaced attributes on all nodes.  Not a necessity
for just XML nodes, but a big win for other formats.  Namespaced
attributes are accessed using a tuple key, accessing the node with
mapping syntax:

  dublin_core = "http://purl.org/dc/elements/1.1/"

  print rss_channel[(dublin_core, 'creator')]

I recently came up with an idea for an Orchard.namespace() function
which creates a name generator, so the above is now more simply:

  DC = Orchard.namespace("http://purl.org/dc/elements/1.1/")

  print rss_channel[DC.creator]

for all XML local-names which are valid Python tokens.
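
For readers who want to see how such a name generator could work,
here is a small sketch. The Namespace and Node classes are stand-ins
invented for this example (they are assumptions, not Orchard's
implementation); the only behavior taken from the description above
is that attribute access yields a (namespace-uri, local-name) tuple
usable as a mapping key:

```python
class Namespace:
    """Hypothetical name generator: NS.local -> (uri, "local")."""

    def __init__(self, uri):
        self._uri = uri

    def __getattr__(self, local_name):
        # Only reached for names not found normally; refuse private
        # names so copying and introspection keep working.
        if local_name.startswith("_"):
            raise AttributeError(local_name)
        return (self._uri, local_name)


class Node:
    """Hypothetical node with mapping-style namespaced attributes."""

    def __init__(self):
        self._attrs = {}

    def __getitem__(self, key):
        return self._attrs[key]

    def __setitem__(self, key, value):
        self._attrs[key] = value


DC = Namespace("http://purl.org/dc/elements/1.1/")

rss_channel = Node()
rss_channel[DC.creator] = "Ken MacLeod"

# The generated key is the same tuple one would write by hand.
creator = rss_channel[("http://purl.org/dc/elements/1.1/", "creator")]
print(creator)
```

Local names that are not valid Python identifiers (for example ones
containing a hyphen) still need the explicit tuple form.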

I've got to run now, so I'll do some more evangelizing later ;-)

  -- Ken


From fdrake@acm.org  Sat Mar 31 16:14:08 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 31 Mar 2001 11:14:08 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <200103311354.GAA06314@localhost.localdomain>
References: <ken@bitsko.slc.ut.us>
 <x51yrf1djc.fsf@bitsko.slc.ut.us>
 <200103311354.GAA06314@localhost.localdomain>
Message-ID: <15046.592.612106.548532@beowolf.pythonlabs.org>

Uche Ogbuji writes:
 > I think a Lisp approach to storing the nodes is an interesting idea, given 
 > Python's strong list processing.  Basically, just a straightforward 
 > translation of the parameters of SAX events (plus node-type) into nested 
 > lists.  Probably not exactly what we'd want for a PyDOM, but an easy straw
 > man to build.

  There is xml.utils.qp_xml, which is very lightweight.  Perhaps that
should be examined more carefully?
  I think one thing we need to consider before we settle on any
particular API is, how much abstraction should we provide, and how
much should we expose the lexical details?  One thing we've found
while working with Zope is that while abstraction is nice, we usually
want to work our transformations in a near surgical manner -- the less
we change about the input, the better.  This is especially important
if we're feeding a WebDAV client, where a human is relatively likely
to want to view or edit the source text we generate.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 31 16:54:57 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 31 Mar 2001 18:54:57 +0200
Subject: [XML-SIG] PyTREX as xml.schema.trex?
In-Reply-To: <C4EE90263CBFD411A0C800B0D0490095CCC5B3@bst-mail02> (message from
 James Tauber on Sat, 31 Mar 2001 09:13:47 -0500)
References: <C4EE90263CBFD411A0C800B0D0490095CCC5B3@bst-mail02>
Message-ID: <200103311654.f2VGsvw08220@mira.informatik.hu-berlin.de>

> Hints on how best to whip up some documentation? I would only need
> to write up a single page saying how to invoke the validator and
> interpret the return object.

I think you've got two options: you can either commit something into
the PyXML www pages (checkout www from pyxml, on commit, a cron job
should pick it up automatically after 6 hours); alternatively, you
could write a section in doc/xml-howto.tex. That will take some more
time to propagate, since I (or Andrew) have to forward this change to
the Python HOWTOs.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 31 16:50:55 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 31 Mar 2001 18:50:55 +0200
Subject: [XML-SIG] PyTREX as xml.schema.trex?
In-Reply-To: <C4EE90263CBFD411A0C800B0D0490095CCC5B2@bst-mail02> (message from
 James Tauber on Sat, 31 Mar 2001 09:08:45 -0500)
References: <C4EE90263CBFD411A0C800B0D0490095CCC5B2@bst-mail02>
Message-ID: <200103311650.f2VGotu08217@mira.informatik.hu-berlin.de>

> Have people typically maintained a parallel CVS and done separate releases
> for a while? i.e. should I make changes to both pytrex/pytrex.py and
> /xml/xml/schema/trex.py and continue to do releases from pytrex?

People maintain stuff in parallel all the time. If you don't want to
lose your hair over it, you better use CVS tags to indicate when
you've copied changes from one tree to the other. That lets you find
out whether there have been independent or overlapping changes in
either tree.

Specifically for PyXML, at least the following pieces had different
homes at some time:
- xmlproc (Lars gave up his own CVS tree just now)
- 4DOM (has been long in the Fourthought CVS)
- pyexpat.c (primarily lives in Python CVS, with copies in PyXML and
  Zope)
- expat (lives in expat CVS, and is updated in PyXML only occasionally)
- xml-howto.tex/xml-ref.tex (lives primarily in the Python HOWTOs)
- the core of xml.sax, and minidom (maintained in Python CVS, copied
  into PyXML - sometimes vice versa)

As for what you should do with PyTREX: that's your own decision. If
you expect to move at a fast pace, I recommend keeping your own
project - it might take some time until PyXML 0.7 is released. There
is then no need to commit every single change into the PyXML copy as
well. I'll give advance warning of a 0.7 release (likely after the
4XSLT issues have been settled).

Regards,
Martin


From fdrake@acm.org  Sat Mar 31 16:59:52 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 31 Mar 2001 11:59:52 -0500 (EST)
Subject: [XML-SIG] PyTREX as xml.schema.trex?
In-Reply-To: <C4EE90263CBFD411A0C800B0D0490095CCC5B3@bst-mail02>
References: <C4EE90263CBFD411A0C800B0D0490095CCC5B3@bst-mail02>
Message-ID: <15046.3336.163741.378763@beowolf.pythonlabs.org>

James Tauber writes:
 > Hints on how best to whip up some documentation? I would only need to write
 > up a single page saying how to invoke the validator and interpret the return
 > object.

  My recommendation is to use the Python LaTeX format.  Not because I
think it's a good format (though I don't think it's particularly bad),
but because it will be easy to integrate with other parts of the
Python documentation.
  I'm starting to move once more on turning all the documentation into
an XML format for authoring, and there will be a tool to convert the
Python LaTeX markup into XML with very little manual intervention.
I'm also starting to actually learn XSLT so I can start making use of
an XML version of the docs.  So there is hope.
  Regarding PyXML documentation, what I'd like to do is to create two
documents (based in part on the existing documentation in PyXML and
the Python Library Reference).  The first document (ok, probably a set
of smaller docs) will give the Python bindings of common XML APIs:
DOM, SAX2, etc.  The second will be reference documentation for
utility modules and implementation-specific extensions of the standard
APIs.  The tutorial HOWTO will remain a separate document.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From akuchlin@mems-exchange.org  Sat Mar 31 18:34:14 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Sat, 31 Mar 2001 13:34:14 -0500
Subject: [XML-SIG] RDF?
In-Reply-To: <001c01c0b9f3$9bc5d900$7cac1218@reston1.va.home.com>; from tpassin@home.com on Sat, Mar 31, 2001 at 10:02:29AM -0500
References: <200103311428.HAA06470@localhost.localdomain> <001c01c0b9f3$9bc5d900$7cac1218@reston1.va.home.com>
Message-ID: <20010331133414.A32562@ute.cnri.reston.va.us>

On Sat, Mar 31, 2001 at 10:02:29AM -0500, Thomas B. Passin wrote:
>It's a good question - about PyXML be only the "real" core xml
>infrastructure - parsing, DOM, and so on, or about major support areas for
>applications like RDF?  If not, that sounds like a new SIG.  That might be a

We've discussed applications on the XML-SIG before; just in the last
few days we've had all that discussion of XBEL, for example.
Discussion of building XML-related things in Python is on-topic for
this SIG, in my view, even if the system isn't a candidate for
inclusion in the PyXML distribution.  

RDF looks like it's going to be common and fundamental enough that
IMHO there should be some support for it in the basic package.  I'd
lean toward making it minimal instead of full-featured, but it doesn't
look like RDF needs that large an API; neither 4RDF nor Redfoot have
very large APIs, though I haven't looked at them very closely.

--amk


From akuchlin@mems-exchange.org  Sat Mar 31 18:36:15 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Sat, 31 Mar 2001 13:36:15 -0500
Subject: [XML-SIG] Documentation
In-Reply-To: <200103311654.f2VGsvw08220@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Mar 31, 2001 at 06:54:57PM +0200
References: <C4EE90263CBFD411A0C800B0D0490095CCC5B3@bst-mail02> <200103311654.f2VGsvw08220@mira.informatik.hu-berlin.de>
Message-ID: <20010331133615.B32562@ute.cnri.reston.va.us>

On Sat, Mar 31, 2001 at 06:54:57PM +0200, Martin v. Loewis wrote:
>could write a section in doc/xml-howto.tex. That will take some more
>time to propagate, since I (or Andrew) have to forward this change to
>the Python HOWTOs.

We should really change that, though; the master copy should live in 
the pyxml CVS on SourceForge, close to the code it documents.

--amk


From fdrake@acm.org  Sat Mar 31 19:21:47 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 31 Mar 2001 14:21:47 -0500 (EST)
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <200103311337.GAA06268@localhost.localdomain>
References: <david@mandrakesoft.com>
 <200103291635.f2TGZsr27038@faure.worldonline.co.uk>
 <200103311337.GAA06268@localhost.localdomain>
Message-ID: <15046.11851.341533.770037@beowolf.pythonlabs.org>

Uche Ogbuji writes:
 > Yes.  I actually implemented an off-line merge earlier, but I think a 
 > standardized merge indicator would be useful.

  To make this meaningful, do we need more discussion of what "merge"
means, or should this be left entirely to clients?  I'm inclined to
think we need a good description of the expected range of application
and motivation, and the rest can be left to specific applications.

 > That should instead be spelled
 > 
 > <merge include:href="file:/path/to/bookmarks/collection.xml"/>
 > 
 > Or such, so that processors that don't have first-class merge support can 
 > still include the other file through xinclude.

  This syntax seems reasonable; I presume we'll want to include some
way to mark multiple <merge/> sources with priorities to determine
"who wins" in the presence of multiple sources for a bookmark; some
applications will present all versions of a bookmark and others will
only want to present one but make the determination based on the
bookmark data.
  I presume this <merge/> element should be allowed in both <xbel/>
and <folder/> elements.  Do we want to do this in XBEL 1.1 or wait for
more experience before adding it?
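
  To make the simplest reading of "merge" concrete, here is a rough
sketch of plain inclusion: each <merge/> found in an <xbel/> or
<folder/> element is replaced by the bookmarks and folders of the
document it names.  Several things here are assumptions, not part of
any XBEL draft: the namespace URI bound to the "include" prefix (the
thread gives none), the function name expand_merges, and the "load"
callback that maps an href to a parsed root element.  It makes no
attempt at priorities or "who wins", and it does not guard against
documents that include each other.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the "include:" prefix; not given in the thread.
INCLUDE_NS = "http://example.org/include"
HREF = "{%s}href" % INCLUDE_NS


def expand_merges(parent, load):
    """Replace <merge/> children of `parent`, recursively, with the
    bookmarks and folders of the documents they point to.

    `load` is a caller-supplied function: href -> root Element.
    No cycle detection and no priority handling is attempted.
    """
    new_children = []
    for child in list(parent):
        if child.tag == "merge":
            included = load(child.get(HREF))
            expand_merges(included, load)  # included files may merge too
            new_children.extend(
                e for e in included if e.tag in ("bookmark", "folder"))
        else:
            expand_merges(child, load)
            new_children.append(child)
    parent[:] = new_children
```

  An application that wants real merging would still have to decide
what to do when the same bookmark turns up in more than one source;
this only shows where such a decision would plug in.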


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From rsalz@zolera.com  Sat Mar 31 19:30:53 2001
From: rsalz@zolera.com (Rich Salz)
Date: Sat, 31 Mar 2001 14:30:53 -0500
Subject: [XML-SIG] RDF?
Message-ID: <200103311930.OAA14630@zolera.com>

I agree that anything having to do with Python implementation of XML
can be discussed here.

As for what should be in PyXML...  I used to be quite sure: core
technology only.  Sure, there were those nagging doubts -- what
is 'core,' other than 'I know it when I see it' -- but I was
pretty confident.

Now, I'm not so sure.  (It was consideration of xmlrpc.)  For the
end-user, what are the reasons to not include a package under PyXML?
I understand the developer issues -- see James's recent thread about
PyTREX, for example -- but what does an end-user (i.e., me :) care?

Hoping to spark some discussion.
	/r$


From tpassin@home.com  Sat Mar 31 19:59:30 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 31 Mar 2001 14:59:30 -0500
Subject: [XML-SIG] RDF?
References: <200103311428.HAA06470@localhost.localdomain> <001c01c0b9f3$9bc5d900$7cac1218@reston1.va.home.com> <20010331133414.A32562@ute.cnri.reston.va.us>
Message-ID: <004201c0ba1d$1945da00$7cac1218@reston1.va.home.com>

Andrew Kuchling said -

> We've discussed applications on the XML-SIG before; just in the last
> few days we've had all that discussion of XBEL, for example.
> Discussion of building XML-related things in Python is on-topic for
> this SIG, in my view, even if the system isn't a candidate for
> inclusion in the PyXML distribution.  
> 
Yes, me too.

Tom P


From martin@loewis.home.cs.tu-berlin.de  Sat Mar 31 20:45:50 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 31 Mar 2001 22:45:50 +0200
Subject: [XML-SIG] RDF?
In-Reply-To: <200103311930.OAA14630@zolera.com> (message from Rich Salz on
 Sat, 31 Mar 2001 14:30:53 -0500)
References: <200103311930.OAA14630@zolera.com>
Message-ID: <200103312045.f2VKjoc09005@mira.informatik.hu-berlin.de>

> Now, I'm not so sure.  (It was consideration of xmlrpc.)  For the
> end-user, what are the reasons to not include a package under PyXML?
> I understand the developer issues -- see James's recent thread about
> PyTREX, for example -- but what does an end-user (i.e., me :) care?

End users will only complain about "too much" if the size of the
distribution grows unacceptably. Since bandwidth and disk space are
going up all the time, this is not a real danger - although a 20MB
test suite for xmlproc probably would have been a little too much.

They will also complain if the stuff that is there does not
work. That, indirectly, limits growth, and means that sometimes things
have to be taken out just because the original author ran away, and
nobody cares to maintain it (or because the original author did not
want to take the blame anymore :-).

Regards,
Martin


From david@mandrakesoft.com  Sat Mar 31 22:21:13 2001
From: david@mandrakesoft.com (David Faure)
Date: Sat, 31 Mar 2001 23:21:13 +0100
Subject: [XML-SIG] Metadata in XBEL
In-Reply-To: <15046.11851.341533.770037@beowolf.pythonlabs.org>
References: <david@mandrakesoft.com> <200103311337.GAA06268@localhost.localdomain> <15046.11851.341533.770037@beowolf.pythonlabs.org>
Message-ID: <200103312221.f2VMLEX02984@faure.worldonline.co.uk>

On Saturday 31 March 2001 20:21, Fred L. Drake, Jr. wrote:
> Uche Ogbuji writes:
>  > Yes.  I actually implemented an off-line merge earlier, but I think a 
>  > standardized merge indicator would be useful.

What's an off-line merge?

>   To make this meaningful, do we need more discussion of what "merge"
> means, or should this be left entirely to clients?  I'm inclined to
> think we need a good description of the expected range of application
> and motivation, and the rest can be left to specific applications.
> 
>  > That should instead be spelled
>  > 
>  > <merge include:href="file:/path/to/bookmarks/collection.xml"/>
>  > 
>  > Or such, so that processors that don't have first-class merge support can 
>  > still include the other file through xinclude.
> 
>   This syntax seems reasonable; I presume we'll want to include some
> way to mark multiple <merge/> sources with priorities to determine
> "who wins" in the presence of multiple sources for a bookmark; some
> applications will present all versions of a bookmark and others will
> only want to present one but make the determination based on the
> bookmark data.
>   I presume this <merge/> element should be allowed in both <xbel/>
> and <folder/> elements.  Do we want to do this in XBEL 1.1 or wait for
> more experience before adding it?

I agree that the whole issue probably needs more thinking if we want to
get it right and devise a complete merging mechanism.

I was simply suggesting an easy solution (including another file) - but that
definitely doesn't go as far as a full merge, with the possibility to
"hide" included bookmarks, etc.

I'm fine with this being left out of XBEL 1.1, and we can come back
to it when someone starts implementing it, or if someone has a mechanism
to suggest.

-- 
David FAURE, david@mandrakesoft.com, faure@kde.org
http://perso.mandrakesoft.com/~david/, http://www.konqueror.org/
KDE, Making The Future of Computing Available Today