From uche.ogbuji@fourthought.com  Tue Jan  2 03:57:47 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 01 Jan 2001 20:57:47 -0700
Subject: [XML-SIG] 4Suite -> gettext
Message-ID: <200101020357.UAA21220@localhost.localdomain>

I started looking into converting 4Suite from my hacked i18n to Python's 
gettext, but it seems this is only supported for Python 2.0.

Unfortunately, as we've discussed here before, we need to maintain support for 
Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?)

So I'm holding off on the changes for now.  If anyone has any tricks for 
straddling Python versions using gettext, please let me know.  Thanks.

I will look next at supporting Martin's factory architecture for 4XPath/4XSLT.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Tue Jan  2 04:08:41 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 1 Jan 2001 23:08:41 -0500
Subject: [XML-SIG] 4Suite -> gettext
References: <200101020357.UAA21220@localhost.localdomain>
Message-ID: <00cf01c07471$b158fee0$7cac1218@reston1.va.home.com>

<uche.ogbuji@fourthought.com> asks -

>
> Unfortunately, as we've discussed here before, we need to maintain support
> for Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?)
>
> So I'm holding off on the changes for now.  If anyone has any tricks for
> straddling Python versions using gettext, please let me know.  Thanks.
>
At least 6 months after Zope switches to 2.0.

Cheers,
Tom P


From gstein@lyra.org  Tue Jan  2 04:16:38 2001
From: gstein@lyra.org (Greg Stein)
Date: Mon, 1 Jan 2001 20:16:38 -0800
Subject: [XML-SIG] 4Suite -> gettext
In-Reply-To: <200101020357.UAA21220@localhost.localdomain>; from uche.ogbuji@fourthought.com on Mon, Jan 01, 2001 at 08:57:47PM -0700
References: <200101020357.UAA21220@localhost.localdomain>
Message-ID: <20010101201638.O10567@lyra.org>

On Mon, Jan 01, 2001 at 08:57:47PM -0700, uche.ogbuji@fourthought.com wrote:
> I started looking into converting 4Suite from my hacked i18n to Python's 
> gettext, but it seems this is only supported for Python 2.0.
> 
> Unfortunately, as we've discussed here before, we need to maintain support for
> Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?)

By "we", do you mean Fourthought, or PyXML?

IIRC, PyXML 0.5.5.1 is for 1.5.2 and the latest is for Python 2.0 only.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From uche.ogbuji@fourthought.com  Tue Jan  2 05:20:50 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 01 Jan 2001 22:20:50 -0700
Subject: [XML-SIG] 4Suite -> gettext
References: <200101020357.UAA21220@localhost.localdomain> <20010101201638.O10567@lyra.org>
Message-ID: <3A516532.5DFA3CA7@fourthought.com>

Greg Stein wrote:
> 
> On Mon, Jan 01, 2001 at 08:57:47PM -0700, uche.ogbuji@fourthought.com wrote:
> > I started looking into converting 4Suite from my hacked i18n to Python's
> > gettext, but it seems this is only supported for Python 2.0.
> >
> > Unfortunately, as we've discussed here before, we need to maintain support for
> > Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?)
> 
> By "we", do you mean Fourthought, or PyXML?

Well, to be clear, PyXML debated it and no firm resolution was
reached, except that Martin added the Unicode support back in.
Fourthought certainly intends to keep supporting 1.5.2.

> IIRC, PyXML 0.5.5.1 is for 1.5.2 and the latest is for Python 2.0 only.

I thought that was the original plan, but that it was decided to
continue supporting Python 1.5.2 in PyXML 0.6.3 and up.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Tue Jan  2 07:49:28 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 2 Jan 2001 08:49:28 +0100
Subject: [4suite] Re: [XML-SIG] 4Suite -> gettext
In-Reply-To: <3A516532.5DFA3CA7@fourthought.com> (message from Uche Ogbuji on
 Mon, 01 Jan 2001 22:20:50 -0700)
References: <200101020357.UAA21220@localhost.localdomain> <20010101201638.O10567@lyra.org> <3A516532.5DFA3CA7@fourthought.com>
Message-ID: <200101020749.IAA00706@loewis.home.cs.tu-berlin.de>

> I thought that was the original plan, but that it was decided to
> continue supporting Python 1.5.2 in PyXML 0.6.3 and up.

Indeed, the rationale being that people using PyXML also want to use
Python 1.5. PyXML 0.5.5.1 is not supported in any sense: nobody even
answers questions related to that release; all users get is a
recommendation to use the latest release. That recommendation would be
meaningless if the latest versions didn't support Python 1.5. There
are even binary distributions of PyXML 0.6 for Python 1.5.2.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Jan  2 07:56:09 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 2 Jan 2001 08:56:09 +0100
Subject: [XML-SIG] Re: [4suite] 4Suite -> gettext
In-Reply-To: <200101020357.UAA21220@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200101020357.UAA21220@localhost.localdomain>
Message-ID: <200101020756.IAA00757@loewis.home.cs.tu-berlin.de>

> So I'm holding off on the changes for now.  If anyone has any tricks
> for straddling Python versions using gettext, please let me know.

How about "degraded functionality":

try:
  import gettext
  def _(msg):
    return gettext.dgettext("4suite", msg)
except ImportError:
  def _(msg):
    return msg

That is, for 1.5, there would be only the English message. That
shouldn't be a major obstacle, since there aren't any translations of
the messages so far, AFAICT.

Regards,
Martin

P.S. On some Linux systems, the above import will even succeed with
1.5, and do the right thing. A gettext module is available as part of
the GNOME package.


From martin@loewis.home.cs.tu-berlin.de  Tue Jan  2 08:00:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 2 Jan 2001 09:00:39 +0100
Subject: [XML-SIG] Preparing 0.6.3
Message-ID: <200101020800.JAA00817@loewis.home.cs.tu-berlin.de>

I'd like to release PyXML 0.6.3 later this week or early next week. If
you have any changes that you'd like to get into the release, this
would be the time to check them in. Remember: there will always be
another release.

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Tue Jan  2 14:49:29 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 2 Jan 2001 15:49:29 +0100 (CET)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <200012231649.JAA02948@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0101021544300.8825-100000@leo.logilab.fr>

Sorry to come back to last week's mail, but I was offline for a while (and
this was really a *good* thing for my mental health). Anyway, a happy new
year to everyone here... 

On Sat, 23 Dec 2000 uche.ogbuji@fourthought.com wrote:

> Seriously, after a quick survey of my code, the only place I import Node is in 
> order to get at the constants.

Yup, I noticed this in 4Suite code, and I kept wondering about the
rationale for doing so, since almost every object you manipulate _is_ a
node, and therefore has access to the class attributes. 
In other words a typical line of code is: 
"if some_node.nodeType == Node.ELEMENT_NODE :"

Is there a difference in performance with:
"if some_node.nodeType == some_node.ELEMENT_NODE :" ?

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From teg@redhat.com  Tue Jan  2 14:53:35 2001
From: teg@redhat.com (Trond Eivind Glomsrød)
Date: 02 Jan 2001 09:53:35 -0500
Subject: [XML-SIG] 4Suite -> gettext
In-Reply-To: <200101020357.UAA21220@localhost.localdomain>
References: <200101020357.UAA21220@localhost.localdomain>
Message-ID: <xuyhf3i0zv4.fsf@halden.devel.redhat.com>

uche.ogbuji@fourthought.com writes:

> I started looking into converting 4Suite from my hacked i18n to Python's 
> gettext, but it seems this is only supported for Python 2.0.

There are modules available for Python 1.5 handling this - we use one
in the installer for Red Hat Linux (which is written in python), and
there is also one which is part of pygnome.

> Unfortunately, as we've discussed here before, we need to maintain support for 
> Python 1.5.2 for a while (BTW, does anyone have any thoughts on how long?)

We're using python 1.5 through the 7 series. We may (or not - right
now, likely not) include a python2 package as well, but it won't be
the primary one.

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From uche.ogbuji@fourthought.com  Tue Jan  2 16:37:33 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 02 Jan 2001 09:37:33 -0700
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Tue, 02 Jan 2001 15:49:29 +0100." <Pine.LNX.4.21.0101021544300.8825-100000@leo.logilab.fr>
Message-ID: <200101021637.JAA01405@localhost.localdomain>

> On Sat, 23 Dec 2000 uche.ogbuji@fourthought.com wrote:
> 
> > Seriously, after a quick survey of my code, the only place I import Node is in 
> > order to get at the constants.
> 
> Yup, I noticed this in 4Suite code, and I kept wondering about the
> rationale for doing so, since almost every object you manipulate _is_ a
> node, and therefore has access to the class attributes. 
> In other words a typical line of code is: 
> "if some_node.nodeType == Node.ELEMENT_NODE :"
> 
> Is there a difference in performance with:
> "if some_node.nodeType == some_node.ELEMENT_NODE :" ?

Nope.  It's all about developer's inertia, AKA cutnpasteitis.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Tue Jan  2 22:34:55 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 2 Jan 2001 23:34:55 +0100
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <Pine.LNX.4.21.0101021544300.8825-100000@leo.logilab.fr> (message
 from Alexandre Fayolle on Tue, 2 Jan 2001 15:49:29 +0100 (CET))
References: <Pine.LNX.4.21.0101021544300.8825-100000@leo.logilab.fr>
Message-ID: <200101022234.f02MYtN07543@mira.informatik.hu-berlin.de>

> > Seriously, after a quick survey of my code, the only place I
> > import Node is in order to get at the constants.

> Yup, I noticed this in 4Suite code

Actually, when editing 4DOM, I found that a number of places use Node
as a base class, so you still need to import the module.

> Is there a difference in performance with:
> "if some_node.nodeType == some_node.ELEMENT_NODE :" ?

Yes, but it should not matter much. If you have an inheritance depth
of 4 (xml.dom.Node, xml.dom.FtNode.Node, xml.dom.Element.Element,
something that derives from Element), then you get 5 dictionary
lookups to find self.ELEMENT_NODE (for the instance, and for each of
the bases).

For Node.ELEMENT_NODE, you get only two (one to find Node, one to find
ELEMENT_NODE); three if you look in FtNode.Node, four if you write
xml.dom.Node.ELEMENT_NODE.

Since dictionary lookups were tuned to be one of the most efficient
operations in Python, and since it is so easy to get many dictionary
lookups in other places, that really shouldn't matter much.

So what counts is clarity, and I have to admit that I find
Node.ELEMENT_NODE clearer than self.ELEMENT_NODE (although either is
obvious if you know the DOM).
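
For anyone who wants to measure rather than count lookups, here is a
rough sketch. The class names are stand-ins that only mimic the
inheritance depth described above (they are not the real 4DOM classes),
and the timings will depend heavily on the interpreter version, so treat
any numbers as indicative at best.

```python
import timeit


class Node:
    ELEMENT_NODE = 1


class FtNode(Node):          # stand-in for xml.dom.FtNode.Node
    pass


class Element(FtNode):       # stand-in for xml.dom.Element.Element
    pass


class MyElement(Element):    # "something that derives from Element"
    def __init__(self):
        self.nodeType = Node.ELEMENT_NODE


node = MyElement()


def via_class():
    # One lookup for the global name Node, one for the attribute.
    return node.nodeType == Node.ELEMENT_NODE


def via_instance():
    # Instance dict first, then each class along the inheritance chain.
    return node.nodeType == node.ELEMENT_NODE


if __name__ == "__main__":
    for func in (via_class, via_instance):
        seconds = timeit.timeit(func, number=100000)
        print("%-12s %.4f s" % (func.__name__, seconds))
```

Both functions return the same result; only the access path differs.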

Regards,
Martin


From Mike.Olson@fourthought.com  Wed Jan  3 06:06:00 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 02 Jan 2001 23:06:00 -0700
Subject: [XML-SIG] PyXPath 1.1
References: <200012270120.SAA02777@localhost.localdomain>
Message-ID: <3A52C148.390819BE@FourThought.com>

uche.ogbuji@fourthought.com wrote:
> >
> > Likely! :-) I briefly skimmed the source and 4suite.org and can't seem
> > to get a good description of what those structures look like, is there
> > a URL I missed?
> 
> There is no such beast.  These were originally intended to be purely internal
> objects.  If we decided to expose them as an API, we'd want to decide on the
> naming (Martin doesn't like the "Parsed" prefixes, I'm +0 on killing them) and
> document them properly.

I'm confused.  This thread originally started as an interface from
multiple lexers into 4XPath (if I remember correctly).  However, the
Parsed* classes in 4XPath are created by the parser (Bison).  This is
why I originally recommended the interface of a token stream to feed
into the parser (currently Bison, but could be replaced with a
Python-only version).

Mike


> 
> For now, your best bet is to have a look at XPath/Parsed* in 4Suite (and also
> check out Xslt/Parsed* for the associated Pattern machine objects).
> 
> > Note also: I'm getting odd URL redirects going to 4suite.{org|com},
> > with URLs being replaced with quoted strings that then won't resolve:
> >
> >   http://www.4suite.org/
> >     --> http://www.4suite.org/"index.epy"
> >
> > This seems to happen on "directory" URLs.
> 
> Hmm.  I looked into this, but I'm not seeing it.  I went as bare-bones as
> possible to avoid user agent artifacts and all that:
> 
> [uogbuji@borgia uogbuji]$ telnet www.4suite.org 80
> Trying 204.144.146.184...
> Connected to dollar.4suite.org.
> Escape character is '^]'.
> GET http://www.4suite.org/ HTTP/1.0
> 
> HTTP/1.1 200 OK
> Date: Wed, 27 Dec 2000 01:14:59 GMT
> Server: Apache/1.3.12 (Unix) mod_snake/0.4.1
> Last-Modified: Thu, 02 Nov 2000 19:07:30 GMT
> ETag: "36f0d-178-3a01bb72"
> Accept-Ranges: bytes
> Content-Length: 376
> Connection: close
> Content-Type: text/html
> 
> <html>
> <head>
>   <meta http-equiv='Content-Type' content='text/html'>
>   <meta http-equiv='Refresh' content='1;URL="index.epy"'>
> </head>
> <body>
> 
>   <TABLE WIDTH="100%" HEIGHT="100%">
>     <TR>
>       <TD ALIGN="CENTER">  <img src="images/4suite-org.gif"/><BR>
>          <FONT SIZE="+1"><A HREF="index.epy">Click to Enter</A></FONT>
>       </TD>
>     </TR>
>   </TABLE>
> 
> </body>
> </html>
> Connection closed by foreign host.
> [uogbuji@borgia uogbuji]$
> 
> As you can see, the meta refresh goes to the relative "index.epy".  I don't
> know how this would cause the effect you mention.  What user agent are you
> using?
> 
> Thanks.

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ken@bitsko.slc.ut.us  Wed Jan  3 11:34:52 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 03 Jan 2001 05:34:52 -0600
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: Mike Olson's message of "Tue, 02 Jan 2001 23:06:00 -0700"
References: <200012270120.SAA02777@localhost.localdomain>
 <3A52C148.390819BE@FourThought.com>
Message-ID: <x71yukswbn.fsf@bitsko.slc.ut.us>

Mike Olson <Mike.Olson@fourthought.com> writes:

> uche.ogbuji@fourthought.com wrote:
> > >
> > > Likely! :-) I briefly skimmed the source and 4suite.org and
> > > can't seem to get a good description of what those structures
> > > look like, is there a URL I missed?
> > 
> > There is no such beast.  These were originally intended to be
> > purely internal objects.  If we decided to expose them as an API,
> > we'd want to decide on the naming (Martin doesn't like the
> > "Parsed" prefixes, I'm +0 on killing them) and document them
> > properly.
> 
> I'm confused.  This thread originally started as an interface from
> > multiple lexers into 4XPath (if I remember correctly).  However, the
> Parsed* classes in 4XPath are created by the parser (Bison).  This
> is why I originally recommended the interface of a token stream to
> feed into the parser (currently Bison, but could be replaced with a
> python only version).

I'm the one who asked for the resulting parse tree, rather than the
token stream.

I would like to use the XPath (already parsed, thanks everyone!) to
traverse other structures, like Py objects (where Py object attribute
names stand in for XML element names).
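
To make the idea concrete, here is a minimal sketch of walking Python
objects with a path of attribute names. It deliberately does not use the
parsed XPath tree (whose structure is the open question in this thread);
it just splits a slash-separated path, so it covers only plain child
steps. The select function and the Pool/Car classes are invented for the
example.

```python
def select(root, path):
    """Follow a slash-separated path of attribute names from root.

    List- or tuple-valued attributes play the role of repeated child
    elements; missing attributes simply contribute no results.
    """
    nodes = [root]
    for name in path.split("/"):
        matches = []
        for node in nodes:
            value = getattr(node, name, None)
            if value is None:
                continue
            if isinstance(value, (list, tuple)):
                matches.extend(value)
            else:
                matches.append(value)
        nodes = matches
    return nodes


class Car:
    def __init__(self, brand, state):
        self.brand = brand
        self.state = state


class Pool:
    def __init__(self, cars):
        # The attribute name "car" stands in for a <car> element name.
        self.car = cars


pool = Pool([Car("Saab", 1), Car("Volvo", 2)])
print(select(pool, "car/brand"))   # ['Saab', 'Volvo']
```

With a real parse tree, each step object would do the work that the loop
body does here.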

  -- Ken


From martin@loewis.home.cs.tu-berlin.de  Wed Jan  3 11:21:48 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 3 Jan 2001 12:21:48 +0100
Subject: [XML-SIG] PyXPath 1.1
In-Reply-To: <3A52C148.390819BE@FourThought.com> (message from Mike Olson on
 Tue, 02 Jan 2001 23:06:00 -0700)
References: <200012270120.SAA02777@localhost.localdomain> <3A52C148.390819BE@FourThought.com>
Message-ID: <200101031121.f03BLmw01813@mira.informatik.hu-berlin.de>

> I'm confused.  This thread originally started as an interface from
> multiple lexers into 4XPath (if I remember correctly).  

It was never clear what the interface was supposed to be an interface
*to*. As the subject still indicates, it started with my announcement
that I have multiple pure-Python lexers and parsers. It may be
reasonable to get an interface to multiple lexers also, but only if
there are actually multiple lexers that are sufficiently different
(e.g. C-based ones, sre-based ones, fast ones, correct ones - assuming
you can't be fast and correct simultaneously).

Note that an interface to XPath could be even higher-level than the
parsing level, since there are multiple independent software blocks
involved in your typical XPath application:
- the XPath lexer (reading streams, generating tokens)
- the XPath parser (reading tokens, generating trees)
- the tree implementation (providing expression trees, offering evaluation)
- the application (evaluating trees)

That gives a total of three potential interfaces. There may be other
things that an application wishes to do with an XPath expression
(e.g. navigating it), which would require more features from the tree
implementation.
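
The layering above can be sketched with a toy. This is emphatically not
XPath: it handles only names separated by "/", and every name in it (lex,
parse, Step, Path, the dict-based document) is invented for illustration.
The point is just where the three interfaces sit: tokens between lexer
and parser, tree objects between parser and tree implementation, and
evaluate() between tree and application.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<SLASH>/)|(?P<NAME>[A-Za-z_][\w-]*))")


def lex(expr):
    """Lexer: string in, (kind, text) tokens out."""
    expr = expr.rstrip()
    pos = 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        yield match.lastgroup, match.group(match.lastgroup)
        pos = match.end()


class Step:
    """Tree implementation: one child step that knows how to evaluate."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, nodes):
        return [child for node in nodes
                for child in node.get("children", [])
                if child["name"] == self.name]


class Path:
    def __init__(self, steps):
        self.steps = steps

    def evaluate(self, context):
        nodes = [context]
        for step in self.steps:
            nodes = step.evaluate(nodes)
        return nodes


def parse(tokens):
    """Parser: tokens in, expression tree out."""
    steps, expect_name = [], True
    for kind, text in tokens:
        if expect_name and kind == "NAME":
            steps.append(Step(text))
            expect_name = False
        elif not expect_name and kind == "SLASH":
            expect_name = True
        else:
            raise SyntaxError("unexpected token %r" % text)
    if expect_name:
        raise SyntaxError("path is empty or ends with '/'")
    return Path(steps)


# Application: evaluates a tree against its own data model (nested dicts).
doc = {"name": "#root", "children": [
    {"name": "pool", "children": [
        {"name": "car", "children": []},
        {"name": "garage", "children": []},
        {"name": "car", "children": []},
    ]},
]}
tree = parse(lex("pool/car"))
print(len(tree.evaluate(doc)))   # 2
```

Swapping in a different lexer or parser only requires that it honour the
token or tree interface on its side.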

> However, the Parsed* classes in 4XPath are created by the parser
> (Bison).  This is why I originally recommended the interface of a
> token stream to feed into the parser (currently Bison, but could be
> replaced with a python only version).

As a matter of fact, PyXML 1.2 creates the Parsed* classes without a
bison parser.

Regards,
Martin


From Mike.Olson@fourthought.com  Wed Jan  3 17:09:50 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 03 Jan 2001 10:09:50 -0700
Subject: [XML-SIG] Specializing DOM exceptions
References: <200101021637.JAA01405@localhost.localdomain>
Message-ID: <3A535CDE.80C448E6@FourThought.com>

uche.ogbuji@fourthought.com wrote:
> 
> > On Sat, 23 Dec 2000 uche.ogbuji@fourthought.com wrote:
> >
> > > Seriously, after a quick survey of my code, the only place I import Node is in
> > > order to get at the constants.
> >
> > Yup, I noticed this in 4Suite code, and I kept wondering about the
> > rationale for doing so, since almost every object you manipulate _is_ a
> > node, and therefore has access to the class attributes.
> > In other words a typical line of code is:
> > "if some_node.nodeType == Node.ELEMENT_NODE :"
> >
> > Is there a difference in performance with:
> > "if some_node.nodeType == some_node.ELEMENT_NODE :" ?
> 
> Nope.  It's all about developer's inertia, AKA cutnpasteitis.


Actually there may be a small performance advantage in doing it the way it
is done.  Looking it up from the instance, it will have to look into
at least 3 dictionaries to find the value, while looking it up from the
class itself it will only have to look into one dictionary.  (though
this theory is untested)

Mike


> 
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Wed Jan  3 17:48:20 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 3 Jan 2001 12:48:20 -0500 (EST)
Subject: [XML-SIG] Specializing DOM exceptions
In-Reply-To: <3A535CDE.80C448E6@FourThought.com>
References: <200101021637.JAA01405@localhost.localdomain>
 <3A535CDE.80C448E6@FourThought.com>
Message-ID: <14931.26084.715041.483820@cj42289-a.reston1.va.home.com>

Mike Olson writes:
 > Actually there may be a small performance advantage in doing it the way it
 > is done.  Looking it up from the instance, it will have to look into
 > at least 3 dictionaries to find the value, while looking it up from the
 > class itself it will only have to look into one dictionary.  (though
 > this theory is untested)

Mike,
  It doesn't quite work like that -- looking it up from the class only
takes one dict lookup *once you have the class*, but you are also
doing one lookup for the class itself, assuming you've imported it
into your module's globals.  So the difference is a single dictionary
lookup for each level of class derivation from Node.  For interned
strings, this is pretty trivial and you can reasonably expect it to
disappear in the wash.
  On the other hand, picking it up from the class does assure you know
the exact access path, and some people think it's more readable.
"from xml.dom import Node" is your friend.  ;-)

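  As a small illustration of the import in use, here is a sketch that
filters child nodes by type with the standard library's minidom. The
element_children helper and the sample document are made up for the
example; Node.ELEMENT_NODE and parseString are the real xml.dom and
xml.dom.minidom names.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def element_children(parent):
    """Return only the element children of parent, skipping text nodes."""
    return [child for child in parent.childNodes
            if child.nodeType == Node.ELEMENT_NODE]


doc = parseString("<pool><car state='1'/>text<car state='2'/></pool>")
names = [elem.tagName for elem in element_children(doc.documentElement)]
print(names)   # ['car', 'car']
```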

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From Mike.Olson@fourthought.com  Wed Jan  3 18:56:40 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 03 Jan 2001 11:56:40 -0700
Subject: [XML-SIG] Specializing DOM exceptions
References: <200101021637.JAA01405@localhost.localdomain>
 <3A535CDE.80C448E6@FourThought.com> <14931.26084.715041.483820@cj42289-a.reston1.va.home.com>
Message-ID: <3A5375E8.B1B0DB41@FourThought.com>

"Fred L. Drake, Jr." wrote:
> 
> Mike Olson writes:
>  > Actually there may be a small performance advantage in doing it the way it
>  > is done.  Looking it up from the instance, it will have to look into
>  > at least 3 dictionaries to find the value, while looking it up from the
>  > class itself it will only have to look into one dictionary.  (though
>  > this theory is untested)
> 
> Mike,
>   It doesn't quite work like that -- looking it up from the class only
> takes one dict lookup *once you have the class*, but you are also
> doing one lookup for the class itself, assuming you've imported it
> into your module's globals.  

Of course, I realized after I saw Martin's response.  That's what I get
for answering email before coffee.


Mike

> So the difference is a single dictionary
> lookup for each level of class derivation from Node.  For interned
> strings, this is pretty trivial and you can reasonably expect it to
> disappear in the wash.
>   On the other hand, picking it up from the class does assure you know
> the exact access path, and some people think it's more readable.
> "from xml.dom import Node" is your friend.  ;-)
> 
>   -Fred

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Thu Jan  4 14:37:08 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 04 Jan 2001 07:37:08 -0700
Subject: [XML-SIG] Python XML topic page
Message-ID: <200101041437.HAA07999@localhost.localdomain>

http://pyxml.sourceforge.net/topics/

Way out of date in general.  I'd like to make a bunch of additions and a few 
corrections.  First of all I wanted to be sure no one minded.  If not, the 
next bit is knowing where it is in the sourceforge source tree.

While I'm noting the fact, python.org is terribly out of date in general 
beyond the first few pages.  I know there are some unfortunate reasons behind 
this, but it's pretty sad.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From akuchlin@mems-exchange.org  Thu Jan  4 15:13:05 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Thu, 4 Jan 2001 10:13:05 -0500
Subject: [XML-SIG] Python XML topic page
In-Reply-To: <200101041437.HAA07999@localhost.localdomain>; from uche.ogbuji@fourthought.com on Thu, Jan 04, 2001 at 07:37:08AM -0700
References: <200101041437.HAA07999@localhost.localdomain>
Message-ID: <20010104101305.A23803@kronos.cnri.reston.va.us>

On Thu, Jan 04, 2001 at 07:37:08AM -0700, uche.ogbuji@fourthought.com wrote:
>Way out of date in general.  I'd like to make a bunch of additions and a few 
>corrections.  First of all I wanted to be sure no one minded.  If not, the 
>next bit is knowing where it is in the sourceforge source tree.

Please do.  The Web pages are in a separate module, 'www', so you'll
have to check that module out from cvs.pyxml.sourceforge.net
separately.

>While I'm noting the fact, python.org is terribly out of date in general 
>beyond the first few pages.  I know there are some unfortunate reasons behind 
>this, but it's pretty sad.

Yes; we should make it a goal to spruce things up through the first
half of 2001 (maybe after 2.1 is released).

--amk


From martin@loewis.home.cs.tu-berlin.de  Thu Jan  4 18:44:44 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 4 Jan 2001 19:44:44 +0100
Subject: [XML-SIG] Python XML topic page
In-Reply-To: <20010104101305.A23803@kronos.cnri.reston.va.us> (message from
 Andrew Kuchling on Thu, 4 Jan 2001 10:13:05 -0500)
References: <200101041437.HAA07999@localhost.localdomain> <20010104101305.A23803@kronos.cnri.reston.va.us>
Message-ID: <200101041844.f04Iii601097@mira.informatik.hu-berlin.de>

> >Way out of date in general.  I'd like to make a bunch of additions and a few 
> >corrections.  First of all I wanted to be sure no one minded.  If not, the 
> >next bit is knowing where it is in the sourceforge source tree.
> 
> Please do.  The Web pages are in a separate module, 'www', so you'll
> have to check that module out from cvs.pyxml.sourceforge.net
> separately.

I'd like to add that a cron job is supposed to re-generate the pages
within 6 hours after the changes have been committed. You can run the
generator manually if you want on pyxml.sourceforge.net, although I
recommend running it locally if you only want to check whether it is
correct; customize doupdate to your needs to do so.

Regards,
Martin


From loewis@informatik.hu-berlin.de  Sun Jan  7 11:22:03 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Sun, 7 Jan 2001 12:22:03 +0100 (MET)
Subject: [XML-SIG] PyXML 0.6.3 is available
Message-ID: <200101071122.MAA15470@pandora.informatik.hu-berlin.de>

Version 0.6.3 of the Python/XML distribution is now available.  It
should be considered a beta release, and can be downloaded from
the following URLs:

http://download.sourceforge.net/pyxml/PyXML-0.6.3.tar.gz
http://download.sourceforge.net/pyxml/PyXML-0.6.3.win32-py1.5.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.3.win32-py2.0.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.3-1.5.2.i386.rpm
http://download.sourceforge.net/pyxml/PyXML-0.6.3-2.0.i386.rpm

Changes in this version, compared to 0.6.2:

        * Include documentation in binary packages as well.

        * Update to Expat 1.2, offer all Python Unicode codecs to
          expat.

        * Support the lexical-handler property in the expat SAX driver.

        * Restructure DOM interfaces to better accommodate multiple
          DOM implementations: provide standard exceptions and symbolic
          constants (including those inside of the Node interface) in
          xml.dom.

        * Improve minidom: validate arguments and raise DOM exceptions,
          correct NamedNodeMap operations, offer cloneNode, splitText,
          DocumentType, DOMImplementation, and correct various other
          errors.

        * Restore xml.unicode for compatibility with PyXML 0.5. This is
          a pure-Python implementation of the iso8859 module, which can
          only convert between ISO-8859-x and UTF-8. Python 2 users
          should use the Unicode type instead of this service.

        * Fix memory leaks in expat parser and pulldom.

The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package.  The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.

The package currently contains:

        * XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius
          Garshol), sgmlop (Fredrik Lundh).

        * SAX interface (Lars Marius Garshol)
        * minidom DOM implementation (Paul Prescod)
        * 4DOM from Fourthought (Uche Ogbuji, Mike Olson)
        * Various utility modules and functions (various people)
        * Documentation and example programs (various people)

The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments, questions,
or bug reports to <xml-sig@python.org>.

For more information about Python and XML, see:
	http://www.python.org/topics/xml/

-- 
Martin v. Löwis               http://www.informatik.hu-berlin.de/~loewis


From noreply@sourceforge.net  Mon Jan  8 15:32:04 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jan 2001 07:32:04 -0800
Subject: [XML-SIG] [Bug #128044] 4DOM is unpickleable
Message-ID: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>

Bug #128044, was updated on 2001-Jan-08 07:31
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: larsga
Assigned to : nobody
Summary: 4DOM is unpickleable

Details: For some reason, when trying to dump 4DOM Document nodes 
with cPickle or pickle under Python 2.0, only the Document 
node is serialized.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128044&group_id=6473


From nobody@sourceforge.net  Tue Jan  9 15:38:50 2001
From: nobody@sourceforge.net (nobody)
Date: Tue, 09 Jan 2001 07:38:50 -0800
Subject: [XML-SIG] [Bug #128172] [4XSLT] strange behaviour of xsl:import
Message-ID: <E14G0rK-0001EV-00@usw-sf-web1.sourceforge.net>

From: noreply@sourceforge.net

Bug #128172, was updated on 2001-Jan-09 07:38
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: ornicar
Assigned to : nobody
Summary: [4XSLT] strange behaviour of xsl:import


Details:  Hello,

  I'm using XSL transformations to turn XML trees into viewable HTML
documents (again!) and I've just found something that looks like a bug in
the 4XSLT engine.

  The attached xml file (carpool.xml) contains data on a pool of cars.
Each <car> node has a 'state' attribute used to know if the car is free or
used or in the garage for maintenance. The 'state' value is a number.

    <car state="1">
      <brand> ... </brand>
      ...
    </car>

  In order to display valuable information on a web page, I use the
attached xslt stylesheet (pool2html.xsl) and I transform the 'state'
numeric value to an understandable string. As this number-to-string
transformation should be used in various stylesheets, I put it in a
named-template stored in a common XSLT stylesheet (pool-comm.xsl). This
common stylesheet is imported at the beginning of pool2html.xsl
stylesheet.

    <xsl:stylesheet ...>
      <xsl:import href="pool-comm.xsl"/>
      ...
      <xsl:template match="car">
        ...
        <xsl:call-template name="state-value"/>
        ...
      </xsl:template>
    </xsl:stylesheet>

  What I expected to get is the attached HTML document called
expected-pool.html. Instead, I got the attached HTML document called
pool.html! 4XSLT wasn't able to call the template named 'state-value'
even though this template is defined in the imported stylesheet
(pool-comm.xsl). Another XSLT engine (e.g. Xalan) is able to call the
template and outputs the expected HTML.

  Stranger still: when I replace the 'xsl:import' with an
'xsl:include', 4XSLT can call the named template and outputs the expected
HTML.
  I read the XSLT spec very carefully and didn't find any possible
explanation for this strange behaviour. Could you let me know whether this is
a bug, or whether there is something I didn't get about how stylesheets
are combined?
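
As far as I read the XSLT 1.0 Recommendation, a named template defined in an imported stylesheet should be callable from the importing one, only with lower import precedence than the importer's own templates. The sketch below models that lookup in plain Python. It is not 4XSLT code: the Stylesheet class and find_named_template are hypothetical names, and included stylesheets are handled in a simplified way (their own imports are searched right after them).

```python
class Stylesheet:
    """Minimal model of a stylesheet: its named templates and what it pulls in."""

    def __init__(self, name, templates=(), imports=(), includes=()):
        self.name = name
        self.templates = set(templates)   # names of templates defined here
        self.imports = list(imports)      # xsl:import targets, in document order
        self.includes = list(includes)    # xsl:include targets


def find_named_template(sheet, template_name):
    """Return the name of the stylesheet supplying template_name, or None."""
    # Templates defined here or pulled in by xsl:include share one precedence.
    if template_name in sheet.templates:
        return sheet.name
    for included in sheet.includes:
        found = find_named_template(included, template_name)
        if found is not None:
            return found
    # Imported sheets have lower precedence; a later import outranks an earlier one.
    for imported in reversed(sheet.imports):
        found = find_named_template(imported, template_name)
        if found is not None:
            return found
    return None


common = Stylesheet("pool-comm.xsl", templates={"state-value"})
via_import = Stylesheet("pool2html.xsl", imports=[common])
via_include = Stylesheet("pool2html.xsl", includes=[common])

print(find_named_template(via_import, "state-value"))
print(find_named_template(via_include, "state-value"))
```

Under this model both variants resolve 'state-value' to pool-comm.xsl, which matches the behaviour reported for the other XSLT engine.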

  Best regards, 

    O. CAYROL.

-------------------------------------------------------
carpool.xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

<!DOCTYPE pool>

<pool>
  <car state="1">
    <brand>Ferrari</brand>
    <type>F40</type>
    <number>459 CBO 75</number>
  </car>
  <car state="2">
    <brand>Porsche</brand>
    <type>911</type>
    <number>347 CQQ 75</number>
  </car>
</pool>
-----------------------------------------------------
pool2html.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:import href="pool-comm.xsl"/>

  <xsl:output method="html" 
              version="4.0" 
              encoding="ISO-8859-1" 
              indent="yes" 
              doctype-public="-//W3C//DTD HTML 4.0//EN"/>

  <xsl:template match="/">
<html>
 <head>
  <title>Cars Pool Management</title>
  <meta http-equiv="content-type" content="text/html"/>
 </head>

 <body bgcolor="#FFFFFF">
  <h1>Cars Pool Management</h1>
  <table border="1" cellpadding="3">
   <tr>
    <td>State</td>
    <td>Brand</td>
    <td>Type</td>
    <td>Registration Number</td>
   </tr>

    <xsl:apply-templates select="pool/car"/>

  </table>
 </body>
</html>
  </xsl:template>

  <xsl:template match="car">
<tr>
 <td>
    <xsl:call-template name="state-value"/>
 </td>
 <td>
    <xsl:value-of select="brand"/>
 </td>
 <td>
    <xsl:value-of select="type"/>
 </td>
 <td>
    <xsl:value-of select="number"/>
 </td>
</tr>
  </xsl:template>

</xsl:stylesheet>
 -------------------------------------------------------
pool-comm.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template name="state-value">
    <xsl:choose>
      <xsl:when test="@state=1">
        <b>Free</b>
      </xsl:when>
      <xsl:when test="@state=2">
        Used
      </xsl:when>
      <xsl:when test="@state=3">
        <i>Getting repaired</i>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
---------------------------------------------------
pool.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
  <head>
    <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=ISO-8859-1'>
    <title>Cars Pool Management</title>
    <meta content='text/html' http-equiv='content-type'>
  </head>
  <body bgcolor='#FFFFFF'>
    <h1>Cars Pool Management</h1>
    <table cellpadding='3' border='1'>
      <tr>
        <td>State</td>
        <td>Brand</td>
        <td>Type</td>
        <td>Registration Number</td>
      </tr>
      <tr>
        <td></td>
        <td>Ferrari</td>
        <td>F40</td>
        <td>459 CBO 75</td>
      </tr>
      <tr>
        <td></td>
        <td>Porsche</td>
        <td>911</td>
        <td>347 CQQ 75</td>
      </tr>
    </table>
  </body>
</html>
----------------------------------------------------
expected-pool.html 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
    <head>
        <title>Cars Pool Management</title>
        <meta http-equiv="content-type" content="text/html">
    </head>
    <body bgcolor="#FFFFFF">
        <h1>Cars Pool Management</h1>
        <table border="1" cellpadding="3">
            <tr>
                <td>State</td><td>Brand</td><td>Type</td><td>Registration Number</td>
            </tr>
            <tr>
                <td><b>Free</b></td><td>Ferrari</td><td>F40</td><td>459 CBO 75</td>
            </tr>
            <tr>
                <td>
        Used
      </td><td>Porsche</td><td>911</td><td>347 CQQ 75</td>
            </tr>
        </table>
    </body>
</html>


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128172&group_id=6473


From matt@virtualspectator.com  Wed Jan 10 05:20:43 2001
From: matt@virtualspectator.com (matt)
Date: Wed, 10 Jan 2001 18:20:43 +1300
Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again
In-Reply-To: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>
Message-ID: <0101101829390Y.00856@localhost.localdomain>

--Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

If this is a bug, I will post it, but I'm not sure it is yet.  Attached are two
files: a test XML document with encoding ISO-8859-1 and a test Python
script.  The problem is that if one uses the pyexpat parser and then renders in
ISO-8859-1, things are OK.  If one uses the drv_xmllib driver, an
error occurs as it tries to translate back to ISO-8859-1.  My guess is that the
transformation of ISO-8859-1 into UTF-8 for character data (which is what
happens when the original document is parsed) is not being done properly in the
drv_xmllib driver.

I have also included an XML document created within the script to show that
that one is in fact OK, and that it is the parser that is doing something wrong,
or me doing something wrong with the parser.

My only reason for using drv_xmllib is that pyexpat still has a memory leak in
it.

I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur.

regards
Matt

--Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC
Content-Type: text/x-java;
  name="test.py"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.py"

ZnJvbSB4bWwuZG9tIGltcG9ydCBpbXBsZW1lbnRhdGlvbgpmcm9tIHhtbC5kb20gaW1wb3J0IGV4
dApmcm9tIHhtbC5kb20uZXh0LnJlYWRlciBpbXBvcnQgU2F4MgoKZHQgPSBpbXBsZW1lbnRhdGlv
bi5jcmVhdGVEb2N1bWVudFR5cGUoJycsJycsJycpCmRvYyA9IGltcGxlbWVudGF0aW9uLmNyZWF0
ZURvY3VtZW50KCcnLCd0ZXN0JyxkdCkKY2RzID0gZG9jLmNyZWF0ZUNEQVRBU2VjdGlvbigiaGVs
bG8iKQpjZHMuZGF0YT0iaGVsbG8gdGhpcyBpcyB0ZXh0IDog6SIKZm4gPSBkb2MuZ2V0RWxlbWVu
dHNCeVRhZ05hbWVOUygnJywnKicpWzBdCmZuLmFwcGVuZENoaWxkKGNkcykKCmV4dC5QcmV0dHlQ
cmludChkb2MsZW5jb2Rpbmc9J0lTTy04ODU5LTEnKQoKZnJvbSB4bWwuc2F4IGltcG9ydCBzYXhl
eHRzCgpkb2MyID0geG1sX2RvbV9vYmplY3QgPSBTYXgyLkZyb21YbWxGaWxlKCd0ZXN0LnhtbCcp
CmRvYzMgPSB4bWxfZG9tX29iamVjdCA9IFNheDIuRnJvbVhtbEZpbGUoJ3Rlc3QueG1sJyxwYXJz
ZXI9c2F4ZXh0cy5YTUxQYXJzZXJGYWN0b3J5Lm1ha2VfcGFyc2VyKCd4bWwuc2F4LmRyaXZlcnMu
ZHJ2X3B5ZXhwYXQnKSkKZG9jNCA9IHhtbF9kb21fb2JqZWN0ID0gU2F4Mi5Gcm9tWG1sRmlsZSgn
dGVzdC54bWwnLHBhcnNlcj1zYXhleHRzLlhNTFBhcnNlckZhY3RvcnkubWFrZV9wYXJzZXIoJ3ht
bC5zYXguZHJpdmVycy5kcnZfeG1sbGliJykpCgpwcmludApwcmludCAibm8gcGFyc2VyIHdhcyBz
ZWxlY3RlZCAuLiBzaG91bGQgZGVmYXVsdCB0byBweWV4cGF0IgpleHQuUHJldHR5UHJpbnQoZG9j
MixlbmNvZGluZz0nSVNPLTg4NTktMScpCgpwcmludApwcmludCAicHlleHBhdCB3YXMgcGFyc2Vy
IHNlbGVjdGVkIgpleHQuUHJldHR5UHJpbnQoZG9jMyxlbmNvZGluZz0nSVNPLTg4NTktMScpCgpw
cmludApwcmludCAiZHJ2X3htbGxpYiB3YXMgcGFyc2VyIHNlbGVjdGVkIgpleHQuUHJldHR5UHJp
bnQoZG9jNCxlbmNvZGluZz0nSVNPLTg4NTktMScpCiMgbm90ZSBpdCBpcyBmaW5lIGlmIHByaW50
ZWQgYXMgVVRGLTggZm9ybWF0Cgo=

--Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC
Content-Type: text/x-c++;
  name="test.xml"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.xml"

PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nSVNPLTg4NTktMSc/Pgo8dGVzdD48IVtDREFU
QVtoZWxsbyB0aGlzIGlzIHRleHQgOiDpXV0+CjwvdGVzdD4K

--Boundary-=_yGgxxpkLoRellNMPapqfWkHOPkMC--


From martin@loewis.home.cs.tu-berlin.de  Wed Jan 10 07:49:56 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 10 Jan 2001 08:49:56 +0100
Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again
In-Reply-To: <0101101829390Y.00856@localhost.localdomain> (message from matt
 on Wed, 10 Jan 2001 18:20:43 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <0101101829390Y.00856@localhost.localdomain>
Message-ID: <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de>

> If this is a bug, I will post it, but I'm not sure it is yet.
> Attached are two files, one a test xml with encoding ISO-8859-1 and
> the other a test python script.  The problem is that if one uses a
> pyexpat parser, and then renders in ISO-8859-1 then things are ok.
> If one uses the drv_xmllib driver, then an error occurs as it tries
> to translate back to ISO-8859-1.  My guess is that the ISO-8859-1
> transformation into UTF-8 for character data (which is what happens
> when the original document is parsed) is not being done properly in
> the drv_xmllib driver.

That's a good guess. drv_xmllib does not implement handle_xml at all,
so it does not know what the encoding is. However, what it *should*
do, at least in Python 2.0, is to produce Unicode objects, not UTF-8
encoded strings.
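
To illustrate the difference, here is a small sketch in present-day Python, where str plays the role of the Unicode object. The sample byte string is made up for the example; the point is only that text decoded with the declared encoding can be rendered back in ISO-8859-1, while raw ISO-8859-1 bytes treated as if they were UTF-8 cannot even be decoded.

```python
# ISO-8859-1 bytes, as they would sit in a file declaring encoding='ISO-8859-1'.
raw = b"hello this is text : \xe9"

# A driver that honours the declared encoding hands out Unicode text.
text = raw.decode("iso-8859-1")

# Unicode text can then be rendered in any suitable encoding.
round_trip = text.encode("iso-8859-1")
as_utf8 = text.encode("utf-8")

# A driver that ignores the declaration passes the raw bytes along; treating
# them as UTF-8 later fails, because a lone 0xE9 byte is not valid UTF-8.
try:
    raw.decode("utf-8")
    misread_failed = False
except UnicodeDecodeError:
    misread_failed = True

print(repr(text), round_trip == raw, misread_failed)
```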

Would you like to look into correcting that?

> My only reason for using drv_xmllib is that pyexpat still has a
> memory leak in it.

Not that I know of, at least not in PyXML 0.6.3.

> I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur.

I'm confused. Where did you get PyXML 1.2 from?

Regards,
Martin


From matt@virtualspectator.com  Wed Jan 10 08:15:09 2001
From: matt@virtualspectator.com (matt)
Date: Wed, 10 Jan 2001 21:15:09 +1300
Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again
In-Reply-To: <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <0101101829390Y.00856@localhost.localdomain> <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de>
Message-ID: <01011021320810.00856@localhost.localdomain>

--Boundary-=_oQHnWnkUEwHsqmGbbuqCLJJiVswM
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

On Wed, 10 Jan 2001, Martin v. Loewis wrote:
> > If this is a bug, I will post it, but I'm not sure it is yet.
> > Attached are two files, one a test xml with encoding ISO-8859-1 and
> > the other a test python script.  The problem is that if one uses a
> > pyexpat parser, and then renders in ISO-8859-1 then things are ok.
> > If one uses the drv_xmllib driver, then an error occurs as it tries
> > to translate back to ISO-8859-1.  My guess is that the ISO-8859-1
> > transformation into UTF-8 for character data (which is what happens
> > when the original document is parsed) is not being done properly in
> > the drv_xmllib driver.
> 
> That's a good guess. drv_xmllib does not implement handle_xml at all,
> so it does not know what the encoding is. However, what it *should*
> do, at least in Python 2.0, is to produce Unicode objects, not UTF-8
> encoded strings.

ahh ... ok.


> 
> Would you like to look into correcting that?
> 

Hmm, that means upgrading to 2.0, which perhaps I should do.  The problem is
that I use 4DOM in some quite heavy Zope products, and I am unconvinced that
Python 2.0 and Zope are stable enough for production environments, and the two
Python versions are too different to split between production and development.
I am starting to figure out how PyXML is stitched together and would love to
contribute somewhere.  Character encoding is a good area.  The other part,
though, is making 4DOM pickleable, which was actually my next little project:
to look at it some more and see where it is not pickleable.  It could be
simple; someone may already have the answer.


> > My only reason for using drv_xmllib is that pyexpat still has a
> > memory leak in it.
> 
> Not that I know of, at least not in PyXML 0.6.3.
> 

On closer inspection of PyXML 0.6.3, the original memory leak from the
parser doing its parsing thing has gone, but there is one that comes purely
from creating a parser.  I used to call FromXML and its derivatives with no
parser defined (ugh!!) and after about 77 loops of this, memory would suddenly
start being eaten.  Anyway, now I just create parsers (the same pyexpat that
Reader defaults to) as members of any class that needs them, so the memory leak
never shows now.  Some of my functions need to open up to 100 XML documents from
files, import nodes, and write out others, so these leaks tend to show up
quickly.
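
For what it's worth, the workaround looks roughly like the sketch below. It uses the standard library's xml.sax rather than the PyXML Reader API, and DocumentLoader and ElementCounter are made-up names for illustration. The idea is just that the parser is created once, as a member of the object, and reused for every document.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts the elements of the document currently being parsed."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startDocument(self):
        self.count = 0          # start fresh for every document

    def startElement(self, name, attrs):
        self.count += 1


class DocumentLoader:
    """Owns a single parser and reuses it instead of creating one per call."""

    def __init__(self):
        self.handler = ElementCounter()
        self.parser = xml.sax.make_parser()   # created once, kept as a member
        self.parser.setContentHandler(self.handler)

    def count_elements(self, xml_bytes):
        self.parser.parse(io.BytesIO(xml_bytes))
        return self.handler.count


loader = DocumentLoader()
documents = [b"<pool><car/><car/></pool>", b"<test>hello</test>"]
counts = [loader.count_elements(doc) for doc in documents]
print(counts)
```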


> > I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur.
> 
> I'm confused. Where did you get PyXML 1.2 from?
> 

On the 5th of January someone said to go get PyXML 1.3 from SourceForge, and I
only found PyXML 1.2 ... which has now changed to 1.3 ... and there are
differences.  I have attached the output of diff PyXML-0.6.2 PyXML-0.6.3 so
you can see.

regards
Matt


> Regards,
> Martin
-- 

--Boundary-=_oQHnWnkUEwHsqmGbbuqCLJJiVswM
Content-Type: text/english;
  name="diff.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="diff.txt"

ZGlmZiBQeVhNTC0wLjYuMi9BTk5PVU5DRSBQeVhNTC0wLjYuMy9BTk5PVU5DRQo2YzYKPCBWZXJz
aW9uIDAuNi4yIG9mIHRoZSBQeXRob24vWE1MIGRpc3RyaWJ1dGlvbiBpcyBub3cgYXZhaWxhYmxl
LiAgSXQKLS0tCj4gVmVyc2lvbiAwLjYuMyBvZiB0aGUgUHl0aG9uL1hNTCBkaXN0cmlidXRpb24g
aXMgbm93IGF2YWlsYWJsZS4gIEl0CjEwLDE0YzEwLDE0CjwgaHR0cDovL2Rvd25sb2FkLnNvdXJj
ZWZvcmdlLm5ldC9weXhtbC9QeVhNTC0wLjYuMi50YXIuZ3oKPCBodHRwOi8vZG93bmxvYWQuc291
cmNlZm9yZ2UubmV0L3B5eG1sL1B5WE1MLTAuNi4yLndpbjMyLXB5MS41LmV4ZQo8IGh0dHA6Ly9k
b3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQvcHl4bWwvUHlYTUwtMC42LjIud2luMzItcHkyLjAuZXhl
CjwgaHR0cDovL2Rvd25sb2FkLnNvdXJjZWZvcmdlLm5ldC9weXhtbC9QeVhNTC0wLjYuMi0xLjUu
Mi5pMzg2LnJwbQo8IGh0dHA6Ly9kb3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQvcHl4bWwvUHlYTUwt
MC42LjItMi4wLmkzODYucnBtCi0tLQo+IGh0dHA6Ly9kb3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQv
cHl4bWwvUHlYTUwtMC42LjMudGFyLmd6Cj4gaHR0cDovL2Rvd25sb2FkLnNvdXJjZWZvcmdlLm5l
dC9weXhtbC9QeVhNTC0wLjYuMy53aW4zMi1weTEuNS5leGUKPiBodHRwOi8vZG93bmxvYWQuc291
cmNlZm9yZ2UubmV0L3B5eG1sL1B5WE1MLTAuNi4zLndpbjMyLXB5Mi4wLmV4ZQo+IGh0dHA6Ly9k
b3dubG9hZC5zb3VyY2Vmb3JnZS5uZXQvcHl4bWwvUHlYTUwtMC42LjMtMS41LjIuaTM4Ni5ycG0K
PiBodHRwOi8vZG93bmxvYWQuc291cmNlZm9yZ2UubmV0L3B5eG1sL1B5WE1MLTAuNi4zLTIuMC5p
Mzg2LnJwbQoxNmMxNgo8IENoYW5nZXMgaW4gdGhpcyB2ZXJzaW9uLCBjb21wYXJlZCB0byAwLjYu
MToKLS0tCj4gQ2hhbmdlcyBpbiB0aGlzIHZlcnNpb24sIGNvbXBhcmVkIHRvIDAuNi4yOgoxOGMx
OAo8IAkqIFN5bmNocm9uaXplIHdpdGggc3RhbmRhcmQgbGlicmFyeSBmcm9tIFB5dGhvbiAyLjAK
LS0tCj4gCSogSW5jbHVkZSBkb2N1bWVudGF0aW9uIGluIGJpbmFyeSBwYWNrYWdlcyBhcyB3ZWxs
LgoyMCwyM2MyMCwyMQo8IAkqIFVwZGF0ZWQgdG8gNERPTSBmcm9tIDRTdWl0ZSAwLjkuMS4gVGhp
cyBjb3JyZWN0cyBtYW55CjwgCWVycm9ycywgc2VlIHRoZSA0U3VpdGUgQ2hhbmdlTG9nIGZvciBk
ZXRhaWxzLiBNb3N0IG5vdGFibHksCjwgCXRoZSBTQVggcmVhZGVyIGludGVyZmFjZSBoYXMgYmVl
biBleHBhbmRlZCB0byBzdXBwb3J0CjwgCWFyYml0cmFyeSBwYXJzZXJzLCBhbmQgYSBQeUV4cGF0
IHJlYWRlciBjbGFzcyB3YXMgYWRkZWQuCi0tLQo+IAkqIFVwZGF0ZSB0byBFeHBhdCAxLjIsIG9m
ZmVyIGFsbCBQeXRob24gVW5pY29kZSBjb2RlY3MgdG8KPiAgICAgICAgICAgZXhwYXQuCjI1YzIz
CjwgCSogQWRkIG1pbmlkb20gZnVuY3Rpb25zOiBub3JtYWxpemUgYW5kIGhhc0F0dHJpYnV0ZS4K
LS0tCj4gICAgICAgICAqIHN1cHBvcnQgdGhlIGxleGljYWwtaGFuZGxlciBwcm9wZXJ0eSBpbiB0
aGUgZXhwYXQgU0FYIGRyaXZlci4KMjdjMjUsMjgKPCAJKiBGaXggYSBudW1iZXIgb2YgbWlub3Ig
YnVncy4KLS0tCj4gCSogUmVzdHJ1Y3R1cmUgRE9NIGludGVyZmFjZXMgdG8gYmV0dGVyIGFjY29t
b2RhdGUgbXVsdGlwbGUKPiAgICAgICAgICAgRE9NIGltcGxlbWVudGF0aW9uczogcHJvdmlkZSBz
dGFuZGFyZCBleGNlcHRpb25zIGFuZCBzeW1ib2xpYwo+ICAgICAgICAgICBjb25zdGFudHMgKGlu
Y2x1ZGluZyB0aG9zZSBpbnNpZGUgb2YgdGhlIE5vZGUgaW50ZXJmYWNlKSBpbgo+ICAgICAgICAg
ICB4bWwuZG9tLgoyOWMzMCw0MAo8IAkqIE1vcmUgdGVzdHMgcGFzcyBub3csIGluIHBhcnRpY3Vs
YXIgdGVzdF9kb20sIGFuZCB0ZXN0L2RvbS90ZXN0LgotLS0KPiAJKiBJbXByb3ZlIG1pbmlkb206
IHZhbGlkYXRlIGFyZ3VtZW50cyBhbmQgcmFpc2UgRE9NIGV4Y2VwdGlvbnMsCj4gICAgICAgICAg
IGNvcnJlY3QgTmFtZU5vZGVNYXAgb3BlcmF0aW9ucywgb2ZmZXIgY2xvbmVOb2RlLCBzcGxpdFRl
eHQsCj4gICAgICAgICAgIERvY3VtZW50VHlwZSwgRE9NSW1wbGVtZW50YXRpb24sIGFuZCBjb3Jy
ZWN0IHZhcmlvdXMgb3RoZXIKPiAgICAgICAgICAgZXJyb3JzLgo+IAo+IAkqIFJlc3RvcmUgeG1s
LnVuaWNvZGUgZm9yIGNvbXBhdGliaWxpdHkgd2l0aCBQeVhNTCAwLjUuIFRoaXMgaXMKPiAgICAg
ICAgICAgYSBwdXJlLVB5dGhvbiBpbXBsZW1lbnRhdGlvbiBvZiB0aGUgaXNvODg1OSBtb2R1bGUs
IHdoaWNoIGNhbgo+ICAgICAgICAgICBvbmx5IGNvbnZlcnQgYmV0d2VlbiBJU08tODg1OS14IGFu
ZCBVVEYtOC4gUHl0aG9uIDIgdXNlcnMKPiAgICAgICAgICAgc2hvdWxkIHVzZSB0aGUgVW5pY29k
ZSB0eXBlIGluc3RlYWQgb2YgdGhpcyBzZXJ2aWNlLgo+IAo+IAkqIEZpeCBtZW1vcnkgbGVha3Mg
aW4gZXhwYXQgcGFyc2VyIGFuZCBwdWxsZG9tLgo0MCw0MWM1MQo8IEdhcnNob2wpLCB4bWxsaWIu
cHkgKFNqb2VyZCBNdWxsZW5kZXIpIHVzaW5nIHRoZSBzZ21sb3AuYyBhY2NlbGVyYXRvcgo8IG1v
ZHVsZSAoRnJlZHJpayBMdW5kaCkuCi0tLQo+IEdhcnNob2wpLCBzZ21sb3AgKEZyZWRyaWsgTHVu
ZGgpLgo0NCw0NmM1NCw1NQo8IAkqIERPTSBpbnRlcmZhY2UgKFN0ZWZhbmUgRmVybWlnaWVyLCBB
Lk0uIEt1Y2hsaW5nKQo8IAkqIDRET00gaW50ZXJmYWNlIGZyb20gRm91cnRob3VnaHQgKFVjaGUg
T2didWppLCBNaWtlIE9sc29uKQo8IAkqIHhtbGFyY2gucHksIGZvciBhcmNoaXRlY3R1cmFsIGZv
cm1zIHByb2Nlc3NpbmcgKEdlaXIgT3ZlIEdy+G5tbykKLS0tCj4gCSogbWluaWRvbSBET00gaW1w
bGVtZW50YXRpb24gKFBhdWwgUHJlc2NvZCkKPiAJKiA0RE9NIGZyb20gRm91cnRob3VnaHQgKFVj
aGUgT2didWppLCBNaWtlIE9sc29uKQpkaWZmIFB5WE1MLTAuNi4yL0NSRURJVFMgUHlYTUwtMC42
LjMvQ1JFRElUUwo3YzcKPCAgICA8aGFja2VyPiBlbGVtZW50IGFyZTogPG5hbWU+LCA8ZW1haWw+
LCA8aG9tZS1wYWdlPiwgPHB1YmxpYy1rZXk+LAotLS0KPiAgICA8eG1sLWhhY2tlcj4gZWxlbWVu
dCBhcmU6IDxuYW1lPiwgPGVtYWlsPiwgPGhvbWUtcGFnZT4sIDxwdWJsaWMta2V5PiwKMjNjMjMs
MjUKPCAgICAgPG5hbWU+IEZyZWQgTC4gRHJha2UgPC9uYW1lPgotLS0KPiAgICAgPG5hbWU+IEZy
ZWQgTC4gRHJha2UsIEpyLiA8L25hbWU+Cj4gICAgIDxlbWFpbD4gZmRyYWtlQGFjbS5vcmcgPC9l
bWFpbD4KPiAgICAgPGhvbWUtcGFnZT4gaHR0cDovL3B5dGhvbi5zdGFyc2hpcC5uZXQvY3Jldy9m
ZHJha2UvIDwvaG9tZS1wYWdlPgo4OWE5Mgo+ICAgICA8dGFzaz4gbWluaWRvbSA8L3Rhc2s+CjEx
NWMxMTgsMTI0CjwgICAgIDx0YXNrPiBBZGQgbm9ybWFsaXplKCkgdG8gbWluaWRvbSA8L3Rhc2s+
Ci0tLQo+ICAgICA8dGFzaz4gQWRkIHZhcmlvdXMgbWluaWRvbSBmZWF0dXJlcyA8L3Rhc2s+Cj4g
ICA8L3htbC1oYWNrZXI+Cj4gCj4gICA8eG1sLWhhY2tlcj4KPiAgICAgPG5hbWU+IEV2Z2VueSBD
aGVya2FzaGluIDwvbmFtZT4KPiAgICAgPGVtYWlsPiBldWdlbmVhaUBpY2MucnUgPC9lbWFpbD4K
PiAgICAgPHRhc2s+IEV4cG9zZSBQeXRob24gY29kZWNzIHRvIHB5ZXhwYXQgPC90YXNrPgpkaWZm
IFB5WE1MLTAuNi4yL0xJQ0VOQ0UgUHlYTUwtMC42LjMvTElDRU5DRQo1YzUKPCBET006Ci0tLQo+
IDRET006CjcsMTFjNwo8IFB5RXhwYXQ6CjwgCjwgLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KPCBDb3B5cmlnaHQgMTk5
MS0xOTk1IGJ5IFN0aWNodGluZyBNYXRoZW1hdGlzY2ggQ2VudHJ1bSwgQW1zdGVyZGFtLAo8IFRo
ZSBOZXRoZXJsYW5kcy4KLS0tCj4gQ29weXJpZ2h0IChjKSAyMDAwIEZvdXJ0aG91Z2h0IEluYywg
VVNBCjE5LDIxYzE1LDE2Mgo8IHN1cHBvcnRpbmcgZG9jdW1lbnRhdGlvbiwgYW5kIHRoYXQgdGhl
IG5hbWVzIG9mIFN0aWNodGluZyBNYXRoZW1hdGlzY2gKPCBDZW50cnVtIG9yIENXSSBvciBDb3Jw
b3JhdGlvbiBmb3IgTmF0aW9uYWwgUmVzZWFyY2ggSW5pdGlhdGl2ZXMgb3IKPCBDTlJJIG5vdCBi
ZSB1c2VkIGluIGFkdmVydGlzaW5nIG9yIHB1YmxpY2l0eSBwZXJ0YWluaW5nIHRvCi0tLQo+IHN1
cHBvcnRpbmcgZG9jdW1lbnRhdGlvbiwgYW5kIHRoYXQgdGhlIG5hbWUgb2YgRm91clRob3VnaHQg
TExDIG5vdCBiZQo+IHVzZWQgaW4gYWR2ZXJ0aXNpbmcgb3IgcHVibGljaXR5IHBlcnRhaW5pbmcg
dG8gZGlzdHJpYnV0aW9uIG9mIHRoZQo+IHNvZnR3YXJlIHdpdGhvdXQgc3BlY2lmaWMsIHdyaXR0
ZW4gcHJpb3IgcGVybWlzc2lvbi4KPiAKPiBGT1VSVEhPVUdIVCBMTEMgRElTQ0xBSU0gQUxMIFdB
UlJBTlRJRVMgV0lUSCBSRUdBUkQgVE8gVEhJUyBTT0ZUV0FSRSwKPiBJTkNMVURJTkcgQUxMIElN
UExJRUQgV0FSUkFOVElFUyBPRiBNRVJDSEFOVEFCSUxJVFkgQU5EIEZJVE5FU1MsCj4gSU4gTk8g
RVZFTlQgU0hBTEwgRk9VUlRIT1VHSFQgQkUgTElBQkxFIEZPUiBBTlkgU1BFQ0lBTCwgSU5ESVJF
Q1QgT1IKPiBDT05TRVFVRU5USUFMIERBTUFHRVMgT1IgQU5ZIERBTUFHRVMgV0hBVFNPRVZFUiBS
RVNVTFRJTkcgRlJPTSBMT1NTIE9GCj4gVVNFLCBEQVRBIE9SIFBST0ZJVFMsIFdIRVRIRVIgSU4g
QU4gQUNUSU9OIE9GIENPTlRSQUNULCBORUdMSUdFTkNFCj4gT1IgT1RIRVIgVE9SVElPVVMgQUNU
SU9OLCBBUklTSU5HIE9VVCBPRiBPUiBJTiBDT05ORUNUSU9OIFdJVEggVEhFCj4gVVNFIE9SIFBF
UkZPUk1BTkNFIE9GIFRISVMgU09GVFdBUkUuCj4gCj4gCj4gUHlFeHBhdCwgU0FYIGxpYnJhcmll
czoKPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLQo+IEJFT1BFTiBQWVRIT04gT1BFTiBTT1VSQ0UgTElDRU5TRSBBR1JF
RU1FTlQgVkVSU0lPTiAxCj4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0KPiAKPiAxLiBUaGlzIExJQ0VOU0UgQUdSRUVNRU5UIGlzIGJldHdlZW4g
QmVPcGVuLmNvbSAoIkJlT3BlbiIpLCBoYXZpbmcgYW4KPiBvZmZpY2UgYXQgMTYwIFNhcmF0b2dh
IEF2ZW51ZSwgU2FudGEgQ2xhcmEsIENBIDk1MDUxLCBhbmQgdGhlCj4gSW5kaXZpZHVhbCBvciBP
cmdhbml6YXRpb24gKCJMaWNlbnNlZSIpIGFjY2Vzc2luZyBhbmQgb3RoZXJ3aXNlIHVzaW5nCj4g
dGhpcyBzb2Z0d2FyZSBpbiBzb3VyY2Ugb3IgYmluYXJ5IGZvcm0gYW5kIGl0cyBhc3NvY2lhdGVk
Cj4gZG9jdW1lbnRhdGlvbiAoInRoZSBTb2Z0d2FyZSIpLgo+IAo+IDIuIFN1YmplY3QgdG8gdGhl
IHRlcm1zIGFuZCBjb25kaXRpb25zIG9mIHRoaXMgQmVPcGVuIFB5dGhvbiBMaWNlbnNlCj4gQWdy
ZWVtZW50LCBCZU9wZW4gaGVyZWJ5IGdyYW50cyBMaWNlbnNlZSBhIG5vbi1leGNsdXNpdmUsCj4g
cm95YWx0eS1mcmVlLCB3b3JsZC13aWRlIGxpY2Vuc2UgdG8gcmVwcm9kdWNlLCBhbmFseXplLCB0
ZXN0LCBwZXJmb3JtCj4gYW5kL29yIGRpc3BsYXkgcHVibGljbHksIHByZXBhcmUgZGVyaXZhdGl2
ZSB3b3JrcywgZGlzdHJpYnV0ZSwgYW5kCj4gb3RoZXJ3aXNlIHVzZSB0aGUgU29mdHdhcmUgYWxv
bmUgb3IgaW4gYW55IGRlcml2YXRpdmUgdmVyc2lvbiwKPiBwcm92aWRlZCwgaG93ZXZlciwgdGhh
dCB0aGUgQmVPcGVuIFB5dGhvbiBMaWNlbnNlIGlzIHJldGFpbmVkIGluIHRoZQo+IFNvZnR3YXJl
LCBhbG9uZSBvciBpbiBhbnkgZGVyaXZhdGl2ZSB2ZXJzaW9uIHByZXBhcmVkIGJ5IExpY2Vuc2Vl
Lgo+IAo+IDMuIEJlT3BlbiBpcyBtYWtpbmcgdGhlIFNvZnR3YXJlIGF2YWlsYWJsZSB0byBMaWNl
bnNlZSBvbiBhbiAiQVMgSVMiCj4gYmFzaXMuICBCRU9QRU4gTUFLRVMgTk8gUkVQUkVTRU5UQVRJ
T05TIE9SIFdBUlJBTlRJRVMsIEVYUFJFU1MgT1IKPiBJTVBMSUVELiAgQlkgV0FZIE9GIEVYQU1Q
TEUsIEJVVCBOT1QgTElNSVRBVElPTiwgQkVPUEVOIE1BS0VTIE5PIEFORAo+IERJU0NMQUlNUyBB
TlkgUkVQUkVTRU5UQVRJT04gT1IgV0FSUkFOVFkgT0YgTUVSQ0hBTlRBQklMSVRZIE9SIEZJVE5F
U1MKPiBGT1IgQU5ZIFBBUlRJQ1VMQVIgUFVSUE9TRSBPUiBUSEFUIFRIRSBVU0UgT0YgVEhFIFNP
RlRXQVJFIFdJTEwgTk9UCj4gSU5GUklOR0UgQU5ZIFRISVJEIFBBUlRZIFJJR0hUUy4KPiAKPiA0
LiBCRU9QRU4gU0hBTEwgTk9UIEJFIExJQUJMRSBUTyBMSUNFTlNFRSBPUiBBTlkgT1RIRVIgVVNF
UlMgT0YgVEhFCj4gU09GVFdBUkUgRk9SIEFOWSBJTkNJREVOVEFMLCBTUEVDSUFMLCBPUiBDT05T
RVFVRU5USUFMIERBTUFHRVMgT1IgTE9TUwo+IEFTIEEgUkVTVUxUIE9GIFVTSU5HLCBNT0RJRllJ
TkcgT1IgRElTVFJJQlVUSU5HIFRIRSBTT0ZUV0FSRSwgT1IgQU5ZCj4gREVSSVZBVElWRSBUSEVS
RU9GLCBFVkVOIElGIEFEVklTRUQgT0YgVEhFIFBPU1NJQklMSVRZIFRIRVJFT0YuCj4gCj4gNS4g
VGhpcyBMaWNlbnNlIEFncmVlbWVudCB3aWxsIGF1dG9tYXRpY2FsbHkgdGVybWluYXRlIHVwb24g
YSBtYXRlcmlhbAo+IGJyZWFjaCBvZiBpdHMgdGVybXMgYW5kIGNvbmRpdGlvbnMuCj4gCj4gNi4g
VGhpcyBMaWNlbnNlIEFncmVlbWVudCBzaGFsbCBiZSBnb3Zlcm5lZCBieSBhbmQgaW50ZXJwcmV0
ZWQgaW4gYWxsCj4gcmVzcGVjdHMgYnkgdGhlIGxhdyBvZiB0aGUgU3RhdGUgb2YgQ2FsaWZvcm5p
YSwgZXhjbHVkaW5nIGNvbmZsaWN0IG9mCj4gbGF3IHByb3Zpc2lvbnMuICBOb3RoaW5nIGluIHRo
aXMgTGljZW5zZSBBZ3JlZW1lbnQgc2hhbGwgYmUgZGVlbWVkIHRvCj4gY3JlYXRlIGFueSByZWxh
dGlvbnNoaXAgb2YgYWdlbmN5LCBwYXJ0bmVyc2hpcCwgb3Igam9pbnQgdmVudHVyZQo+IGJldHdl
ZW4gQmVPcGVuIGFuZCBMaWNlbnNlZS4gIFRoaXMgTGljZW5zZSBBZ3JlZW1lbnQgZG9lcyBub3Qg
Z3JhbnQKPiBwZXJtaXNzaW9uIHRvIHVzZSBCZU9wZW4gdHJhZGVtYXJrcyBvciB0cmFkZSBuYW1l
cyBpbiBhIHRyYWRlbWFyawo+IHNlbnNlIHRvIGVuZG9yc2Ugb3IgcHJvbW90ZSBwcm9kdWN0cyBv
ciBzZXJ2aWNlcyBvZiBMaWNlbnNlZSwgb3IgYW55Cj4gdGhpcmQgcGFydHkuICBBcyBhbiBleGNl
cHRpb24sIHRoZSAiQmVPcGVuIFB5dGhvbiIgbG9nb3MgYXZhaWxhYmxlIGF0Cj4gaHR0cDovL3d3
dy5weXRob25sYWJzLmNvbS9sb2dvcy5odG1sIG1heSBiZSB1c2VkIGFjY29yZGluZyB0byB0aGUK
PiBwZXJtaXNzaW9ucyBncmFudGVkIG9uIHRoYXQgd2ViIHBhZ2UuCj4gCj4gNy4gQnkgY29weWlu
ZywgaW5zdGFsbGluZyBvciBvdGhlcndpc2UgdXNpbmcgdGhlIHNvZnR3YXJlLCBMaWNlbnNlZQo+
IGFncmVlcyB0byBiZSBib3VuZCBieSB0aGUgdGVybXMgYW5kIGNvbmRpdGlvbnMgb2YgdGhpcyBM
aWNlbnNlCj4gQWdyZWVtZW50Lgo+IAo+IAo+IENOUkkgT1BFTiBTT1VSQ0UgTElDRU5TRSBBR1JF
RU1FTlQKPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCj4gCj4gUHl0aG9uIDEu
NiBDTlJJIE9QRU4gU09VUkNFIExJQ0VOU0UgQUdSRUVNRU5UCj4gCj4gSU1QT1JUQU5UOiBQTEVB
U0UgUkVBRCBUSEUgRk9MTE9XSU5HIEFHUkVFTUVOVCBDQVJFRlVMTFkuIEJZIENMSUNLSU5HCj4g
T04gIkFDQ0VQVCIgV0hFUkUgSU5ESUNBVEVEIEJFTE9XLCBPUiBCWSBDT1BZSU5HLCBJTlNUQUxM
SU5HIE9SCj4gT1RIRVJXSVNFIFVTSU5HIFBZVEhPTiAxLjYgU09GVFdBUkUsIFlPVSBBUkUgREVF
TUVEIFRPIEhBVkUgQUdSRUVEIFRPCj4gVEhFIFRFUk1TIEFORCBDT05ESVRJT05TIE9GIFRISVMg
TElDRU5TRSBBR1JFRU1FTlQuCj4gCj4gMS4gVGhpcyBMSUNFTlNFIEFHUkVFTUVOVCBpcyBiZXR3
ZWVuIHRoZSBDb3Jwb3JhdGlvbiBmb3IgTmF0aW9uYWwKPiBSZXNlYXJjaCBJbml0aWF0aXZlcywg
aGF2aW5nIGFuIG9mZmljZSBhdCAxODk1IFByZXN0b24gV2hpdGUgRHJpdmUsCj4gUmVzdG9uLCBW
QSAyMDE5MSAoIkNOUkkiKSwgYW5kIHRoZSBJbmRpdmlkdWFsIG9yIE9yZ2FuaXphdGlvbgo+ICgi
TGljZW5zZWUiKSBhY2Nlc3NpbmcgYW5kIG90aGVyd2lzZSB1c2luZyBQeXRob24gMS42IHNvZnR3
YXJlIGluCj4gc291cmNlIG9yIGJpbmFyeSBmb3JtIGFuZCBpdHMgYXNzb2NpYXRlZCBkb2N1bWVu
dGF0aW9uLCBhcyByZWxlYXNlZCBhdAo+IHRoZSB3d3cucHl0aG9uLm9yZyBJbnRlcm5ldCBzaXRl
IG9uIFNlcHRlbWJlciA1LCAyMDAwICgiUHl0aG9uIDEuNiIpLgo+IAo+IDIuIFN1YmplY3QgdG8g
dGhlIHRlcm1zIGFuZCBjb25kaXRpb25zIG9mIHRoaXMgTGljZW5zZSBBZ3JlZW1lbnQsIENOUkkK
PiBoZXJlYnkgZ3JhbnRzIExpY2Vuc2VlIGEgbm9uZXhjbHVzaXZlLCByb3lhbHR5LWZyZWUsIHdv
cmxkLXdpZGUKPiBsaWNlbnNlIHRvIHJlcHJvZHVjZSwgYW5hbHl6ZSwgdGVzdCwgcGVyZm9ybSBh
bmQvb3IgZGlzcGxheSBwdWJsaWNseSwKPiBwcmVwYXJlIGRlcml2YXRpdmUgd29ya3MsIGRpc3Ry
aWJ1dGUsIGFuZCBvdGhlcndpc2UgdXNlIFB5dGhvbiAxLjYKPiBhbG9uZSBvciBpbiBhbnkgZGVy
aXZhdGl2ZSB2ZXJzaW9uLCBwcm92aWRlZCwgaG93ZXZlciwgdGhhdCBDTlJJJ3MKPiBMaWNlbnNl
IEFncmVlbWVudCBhbmQgQ05SSSdzIG5vdGljZSBvZiBjb3B5cmlnaHQsIGkuZS4sICJDb3B5cmln
aHQgKGMpCj4gMTk5NS0yMDAwIENvcnBvcmF0aW9uIGZvciBOYXRpb25hbCBSZXNlYXJjaCBJbml0
aWF0aXZlczsgQWxsIFJpZ2h0cwo+IFJlc2VydmVkIiBhcmUgcmV0YWluZWQgaW4gUHl0aG9uIDEu
NiBhbG9uZSBvciBpbiBhbnkgZGVyaXZhdGl2ZQo+IHZlcnNpb24gcHJlcGFyZWQgYnkKPiAKPiBM
aWNlbnNlZS4gQWx0ZXJuYXRlbHksIGluIGxpZXUgb2YgQ05SSSdzIExpY2Vuc2UgQWdyZWVtZW50
LCBMaWNlbnNlZQo+IG1heSBzdWJzdGl0dXRlIHRoZSBmb2xsb3dpbmcgdGV4dCAob21pdHRpbmcg
dGhlIHF1b3Rlcyk6ICJQeXRob24gMS42Cj4gaXMgbWFkZSBhdmFpbGFibGUgc3ViamVjdCB0byB0
aGUgdGVybXMgYW5kIGNvbmRpdGlvbnMgaW4gQ05SSSdzCj4gTGljZW5zZSBBZ3JlZW1lbnQuIFRo
aXMgQWdyZWVtZW50IHRvZ2V0aGVyIHdpdGggUHl0aG9uIDEuNiBtYXkgYmUKPiBsb2NhdGVkIG9u
IHRoZSBJbnRlcm5ldCB1c2luZyB0aGUgZm9sbG93aW5nIHVuaXF1ZSwgcGVyc2lzdGVudAo+IGlk
ZW50aWZpZXIgKGtub3duIGFzIGEgaGFuZGxlKTogMTg5NS4yMi8xMDEyLiBUaGlzIEFncmVlbWVu
dCBtYXkgYWxzbwo+IGJlIG9idGFpbmVkIGZyb20gYSBwcm94eSBzZXJ2ZXIgb24gdGhlIEludGVy
bmV0IHVzaW5nIHRoZSBmb2xsb3dpbmcKPiBVUkw6IGh0dHA6Ly9oZGwuaGFuZGxlLm5ldC8xODk1
LjIyLzEwMTIiLgo+IAo+IDMuIEluIHRoZSBldmVudCBMaWNlbnNlZSBwcmVwYXJlcyBhIGRlcml2
YXRpdmUgd29yayB0aGF0IGlzIGJhc2VkIG9uCj4gb3IgaW5jb3Jwb3JhdGVzIFB5dGhvbiAxLjYg
b3IgYW55IHBhcnQgdGhlcmVvZiwgYW5kIHdhbnRzIHRvIG1ha2UgdGhlCj4gZGVyaXZhdGl2ZSB3
b3JrIGF2YWlsYWJsZSB0byBvdGhlcnMgYXMgcHJvdmlkZWQgaGVyZWluLCB0aGVuIExpY2Vuc2Vl
Cj4gaGVyZWJ5IGFncmVlcyB0byBpbmNsdWRlIGluIGFueSBzdWNoIHdvcmsgYSBicmllZiBzdW1t
YXJ5IG9mIHRoZQo+IGNoYW5nZXMgbWFkZSB0byBQeXRob24gMS42Lgo+IAo+IDQuIENOUkkgaXMg
bWFraW5nIFB5dGhvbiAxLjYgYXZhaWxhYmxlIHRvIExpY2Vuc2VlIG9uIGFuICJBUyBJUyIKPiBi
YXNpcy4gQ05SSSBNQUtFUyBOTyBSRVBSRVNFTlRBVElPTlMgT1IgV0FSUkFOVElFUywgRVhQUkVT
UyBPUgo+IElNUExJRUQuIEJZIFdBWSBPRiBFWEFNUExFLCBCVVQgTk9UIExJTUlUQVRJT04sIENO
UkkgTUFLRVMgTk8gQU5ECj4gRElTQ0xBSU1TIEFOWSBSRVBSRVNFTlRBVElPTiBPUiBXQVJSQU5U
WSBPRiBNRVJDSEFOVEFCSUxJVFkgT1IgRklUTkVTUwo+IEZPUiBBTlkgUEFSVElDVUxBUiBQVVJQ
T1NFIE9SIFRIQVQgVEhFIFVTRSBPRiBQWVRIT04gMS42IFdJTEwgTk9UCj4gSU5GUklOR0UgQU5Z
IFRISVJEIFBBUlRZIFJJR0hUUy4KPiAKPiA1LiBDTlJJIFNIQUxMIE5PVCBCRSBMSUFCTEUgVE8g
TElDRU5TRUUgT1IgQU5ZIE9USEVSIFVTRVJTIE9GIFBZVEhPTgo+IDEuNiBGT1IgQU5ZIElOQ0lE
RU5UQUwsIFNQRUNJQUwsIE9SIENPTlNFUVVFTlRJQUwgREFNQUdFUyBPUiBMT1NTIEFTIEEKPiBS
RVNVTFQgT0YgTU9ESUZZSU5HLCBESVNUUklCVVRJTkcsIE9SIE9USEVSV0lTRSBVU0lORyBQWVRI
T04gMS42LCBPUgo+IEFOWSBERVJJVkFUSVZFIFRIRVJFT0YsIEVWRU4gSUYgQURWSVNFRCBPRiBU
SEUgUE9TU0lCSUxJVFkgVEhFUkVPRi4KPiAKPiA2LiBUaGlzIExpY2Vuc2UgQWdyZWVtZW50IHdp
bGwgYXV0b21hdGljYWxseSB0ZXJtaW5hdGUgdXBvbiBhIG1hdGVyaWFsCj4gYnJlYWNoIG9mIGl0
cyB0ZXJtcyBhbmQgY29uZGl0aW9ucy4KPiAKPiA3LiBUaGlzIExpY2Vuc2UgQWdyZWVtZW50IHNo
YWxsIGJlIGdvdmVybmVkIGJ5IGFuZCBpbnRlcnByZXRlZCBpbiBhbGwKPiByZXNwZWN0cyBieSB0
aGUgbGF3IG9mIHRoZSBTdGF0ZSBvZiBWaXJnaW5pYSwgZXhjbHVkaW5nIGNvbmZsaWN0IG9mCj4g
bGF3IHByb3Zpc2lvbnMuIE5vdGhpbmcgaW4gdGhpcyBMaWNlbnNlIEFncmVlbWVudCBzaGFsbCBi
ZSBkZWVtZWQgdG8KPiBjcmVhdGUgYW55IHJlbGF0aW9uc2hpcCBvZiBhZ2VuY3ksIHBhcnRuZXJz
aGlwLCBvciBqb2ludCB2ZW50dXJlCj4gYmV0d2VlbiBDTlJJIGFuZCBMaWNlbnNlZS4gVGhpcyBM
aWNlbnNlIEFncmVlbWVudCBkb2VzIG5vdCBncmFudAo+IHBlcm1pc3Npb24gdG8gdXNlIENOUkkg
dHJhZGVtYXJrcyBvciB0cmFkZSBuYW1lIGluIGEgdHJhZGVtYXJrIHNlbnNlCj4gdG8gZW5kb3Jz
ZSBvciBwcm9tb3RlIHByb2R1Y3RzIG9yIHNlcnZpY2VzIG9mIExpY2Vuc2VlLCBvciBhbnkgdGhp
cmQKPiBwYXJ0eS4KPiAKPiA4LiBCeSBjbGlja2luZyBvbiB0aGUgIkFDQ0VQVCIgYnV0dG9uIHdo
ZXJlIGluZGljYXRlZCwgb3IgYnkgY29weWluZywKPiBpbnN0YWxsaW5nIG9yIG90aGVyd2lzZSB1
c2luZyBQeXRob24gMS42LCBMaWNlbnNlZSBhZ3JlZXMgdG8gYmUgYm91bmQKPiBieSB0aGUgdGVy
bXMgYW5kIGNvbmRpdGlvbnMgb2YgdGhpcyBMaWNlbnNlIEFncmVlbWVudC4KPiAKPiBBQ0NFUFQK
PiAKPiAKPiBDV0kgUEVSTUlTU0lPTlMgU1RBVEVNRU5UIEFORCBESVNDTEFJTUVSCj4gLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQo+IAo+IENvcHlyaWdodCAoYykgMTk5
MSAtIDE5OTUsIFN0aWNodGluZyBNYXRoZW1hdGlzY2ggQ2VudHJ1bSBBbXN0ZXJkYW0sCj4gVGhl
IE5ldGhlcmxhbmRzLiAgQWxsIHJpZ2h0cyByZXNlcnZlZC4KPiAKPiBQZXJtaXNzaW9uIHRvIHVz
ZSwgY29weSwgbW9kaWZ5LCBhbmQgZGlzdHJpYnV0ZSB0aGlzIHNvZnR3YXJlIGFuZCBpdHMKPiBk
b2N1bWVudGF0aW9uIGZvciBhbnkgcHVycG9zZSBhbmQgd2l0aG91dCBmZWUgaXMgaGVyZWJ5IGdy
YW50ZWQsCj4gcHJvdmlkZWQgdGhhdCB0aGUgYWJvdmUgY29weXJpZ2h0IG5vdGljZSBhcHBlYXIg
aW4gYWxsIGNvcGllcyBhbmQgdGhhdAo+IGJvdGggdGhhdCBjb3B5cmlnaHQgbm90aWNlIGFuZCB0
aGlzIHBlcm1pc3Npb24gbm90aWNlIGFwcGVhciBpbgo+IHN1cHBvcnRpbmcgZG9jdW1lbnRhdGlv
biwgYW5kIHRoYXQgdGhlIG5hbWUgb2YgU3RpY2h0aW5nIE1hdGhlbWF0aXNjaAo+IENlbnRydW0g
b3IgQ1dJIG5vdCBiZSB1c2VkIGluIGFkdmVydGlzaW5nIG9yIHB1YmxpY2l0eSBwZXJ0YWluaW5n
IHRvCjI1LDM2YzE2NiwxNzIKPCBXaGlsZSBDV0kgaXMgdGhlIGluaXRpYWwgc291cmNlIGZvciB0
aGlzIHNvZnR3YXJlLCBhIG1vZGlmaWVkIHZlcnNpb24KPCBpcyBtYWRlIGF2YWlsYWJsZSBieSB0
aGUgQ29ycG9yYXRpb24gZm9yIE5hdGlvbmFsIFJlc2VhcmNoIEluaXRpYXRpdmVzCjwgKENOUkkp
IGF0IHRoZSBJbnRlcm5ldCBhZGRyZXNzIGZ0cDovL2Z0cC5weXRob24ub3JnLgo8IAo8IFNUSUNI
VElORyBNQVRIRU1BVElTQ0ggQ0VOVFJVTSBBTkQgQ05SSSBESVNDTEFJTSBBTEwgV0FSUkFOVElF
UyBXSVRICjwgUkVHQVJEIFRPIFRISVMgU09GVFdBUkUsIElOQ0xVRElORyBBTEwgSU1QTElFRCBX
QVJSQU5USUVTIE9GCjwgTUVSQ0hBTlRBQklMSVRZIEFORCBGSVRORVNTLCBJTiBOTyBFVkVOVCBT
SEFMTCBTVElDSFRJTkcgTUFUSEVNQVRJU0NICjwgQ0VOVFJVTSBPUiBDTlJJIEJFIExJQUJMRSBG
T1IgQU5ZIFNQRUNJQUwsIElORElSRUNUIE9SIENPTlNFUVVFTlRJQUwKPCBEQU1BR0VTIE9SIEFO
WSBEQU1BR0VTIFdIQVRTT0VWRVIgUkVTVUxUSU5HIEZST00gTE9TUyBPRiBVU0UsIERBVEEgT1IK
PCBQUk9GSVRTLCBXSEVUSEVSIElOIEFOIEFDVElPTiBPRiBDT05UUkFDVCwgTkVHTElHRU5DRSBP
UiBPVEhFUgo8IFRPUlRJT1VTIEFDVElPTiwgQVJJU0lORyBPVVQgT0YgT1IgSU4gQ09OTkVDVElP
TiBXSVRIIFRIRSBVU0UgT1IKPCBQRVJGT1JNQU5DRSBPRiBUSElTIFNPRlRXQVJFLgotLS0KPiBT
VElDSFRJTkcgTUFUSEVNQVRJU0NIIENFTlRSVU0gRElTQ0xBSU1TIEFMTCBXQVJSQU5USUVTIFdJ
VEggUkVHQVJEIFRPCj4gVEhJUyBTT0ZUV0FSRSwgSU5DTFVESU5HIEFMTCBJTVBMSUVEIFdBUlJB
TlRJRVMgT0YgTUVSQ0hBTlRBQklMSVRZIEFORAo+IEZJVE5FU1MsIElOIE5PIEVWRU5UIFNIQUxM
IFNUSUNIVElORyBNQVRIRU1BVElTQ0ggQ0VOVFJVTSBCRSBMSUFCTEUKPiBGT1IgQU5ZIFNQRUNJ
QUwsIElORElSRUNUIE9SIENPTlNFUVVFTlRJQUwgREFNQUdFUyBPUiBBTlkgREFNQUdFUwo+IFdI
QVRTT0VWRVIgUkVTVUxUSU5HIEZST00gTE9TUyBPRiBVU0UsIERBVEEgT1IgUFJPRklUUywgV0hF
VEhFUiBJTiBBTgo+IEFDVElPTiBPRiBDT05UUkFDVCwgTkVHTElHRU5DRSBPUiBPVEhFUiBUT1JU
SU9VUyBBQ1RJT04sIEFSSVNJTkcgT1VUCj4gT0YgT1IgSU4gQ09OTkVDVElPTiBXSVRIIFRIRSBV
U0UgT1IgUEVSRk9STUFOQ0UgT0YgVEhJUyBTT0ZUV0FSRS4KNDUsNDdjMTgxCjwgc2F4bGliOgo8
IAo8IHNnbWxvcC5jCi0tLQo+IHNnbWxvcC5jOgo1Niw2MmQxODkKPCB4bWxhcmNoOgo8IC0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tCjwgQ29weXJpZ2h0IChDKSAxOTk4IGJ5IEdlaXIgTy4gR3L4bm1vLCBncm92ZUBpbmZv
dGVrLm5vCjwgCjwgRnJlZSBmb3IgY29tbWVyY2lhbCBhbmQgbm9uLWNvbW1lcmNpYWwgdXNlLgo8
IC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tCjwgCjY3YTE5NSwxOTcKPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQo+IAo+IHNldHVwZXh0L2lu
c3RhbGxfZGF0YS5weToKNjhhMTk5LDIxNgo+IFBlcm1pc3Npb24gaXMgaGVyZWJ5IGdyYW50ZWQs
IGZyZWUgb2YgY2hhcmdlLCB0byBhbnkgcGVyc29uIG9idGFpbmluZwo+IGEgY29weSBvZiB0aGlz
IHNvZnR3YXJlIGFuZCBhc3NvY2lhdGVkIGRvY3VtZW50YXRpb24gZmlsZXMgKHRoZQo+ICJTb2Z0
d2FyZSIpLCB0byBkZWFsIGluIHRoZSBTb2Z0d2FyZSB3aXRob3V0IHJlc3RyaWN0aW9uLCBpbmNs
dWRpbmcKPiB3aXRob3V0IGxpbWl0YXRpb24gdGhlIHJpZ2h0cyB0byB1c2UsIGNvcHksIG1vZGlm
eSwgbWVyZ2UsIHB1Ymxpc2gsCj4gZGlzdHJpYnV0ZSwgc3VibGljZW5zZSwgYW5kL29yIHNlbGwg
Y29waWVzIG9mIHRoZSBTb2Z0d2FyZSwgYW5kIHRvCj4gcGVybWl0IHBlcnNvbnMgdG8gd2hvbSB0
aGUgU29mdHdhcmUgaXMgZnVybmlzaGVkIHRvIGRvIHNvLCBzdWJqZWN0IHRvCj4gdGhlIGZvbGxv
d2luZyBjb25kaXRpb25zOgo+ICAKPiBUaGUgYWJvdmUgY29weXJpZ2h0IG5vdGljZSBhbmQgdGhp
cyBwZXJtaXNzaW9uIG5vdGljZSBzaGFsbCBiZSBpbmNsdWRlZAo+IGluIGFsbCBjb3BpZXMgb3Ig
c3Vic3RhbnRpYWwgcG9ydGlvbnMgb2YgdGhlIFNvZnR3YXJlLgo+ICAKPiBUSEUgU09GVFdBUkUg
SVMgUFJPVklERUQgIkFTIElTIiwgV0lUSE9VVCBXQVJSQU5UWSBPRiBBTlkgS0lORCwKPiBFWFBS
RVNTIE9SIElNUExJRUQsIElOQ0xVRElORyBCVVQgTk9UIExJTUlURUQgVE8gVEhFIFdBUlJBTlRJ
RVMgT0YKPiBNRVJDSEFOVEFCSUxJVFksIEZJVE5FU1MgRk9SIEEgUEFSVElDVUxBUiBQVVJQT1NF
IEFORCBOT05JTkZSSU5HRU1FTlQuCj4gSU4gTk8gRVZFTlQgU0hBTEwgVEhFIEFVVEhPUlMgT1Ig
Q09QWVJJR0hUIEhPTERFUlMgQkUgTElBQkxFIEZPUiBBTlkKPiBDTEFJTSwgREFNQUdFUyBPUiBP
VEhFUiBMSUFCSUxJVFksIFdIRVRIRVIgSU4gQU4gQUNUSU9OIE9GIENPTlRSQUNULAo+IFRPUlQg
T1IgT1RIRVJXSVNFLCBBUklTSU5HIEZST00sIE9VVCBPRiBPUiBJTiBDT05ORUNUSU9OIFdJVEgg
VEhFCj4gU09GVFdBUkUgT1IgVEhFIFVTRSBPUiBPVEhFUiBERUFMSU5HUyBJTiBUSEUgU09GVFdB
UkUuCmRpZmYgUHlYTUwtMC42LjIvTUFOSUZFU1QgUHlYTUwtMC42LjMvTUFOSUZFU1QKMTBhMTEK
PiBzZXR1cC5jZmcKNjJjNjMKPCBkZW1vL3hiZWwveGJlbC5kdGQKLS0tCj4gZGVtby94YmVsL3hi
ZWwtMS4wLmR0ZAoxMDhkMTA4CjwgZXh0ZW5zaW9ucy9leHBhdC9NUEwtMV8wLmh0bWwKMTEyLDEx
M2QxMTEKPCBleHRlbnNpb25zL2V4cGF0L2V4cGF0Lm1hawo8IGV4dGVuc2lvbnMvZXhwYXQvZ3Bs
ZWxlY3QuaHRtbAoxMTcsMTE4ZDExNAo8IGV4dGVuc2lvbnMvZXhwYXQveG1scGFyc2UvaGFzaHRh
YmxlLmMKPCBleHRlbnNpb25zL2V4cGF0L3htbHBhcnNlL2hhc2h0YWJsZS5oCjEyMWExMTgKPiBl
eHRlbnNpb25zL2V4cGF0L3htbHRvay9hc2NpaS5oCjE0OWExNDcsMTQ5Cj4gc2V0dXBleHQvX19p
bml0X18ucHkKPiBzZXR1cGV4dC9pbnN0YWxsX2RhdGEucHkKPiB0ZXN0L2VuY190ZXN0LnhtbAox
NTVhMTU2Cj4gdGVzdC90ZXN0X2VuY29kaW5ncy5weQoyMDJhMjA0LDI2Mgo+IHRlc3QvZG9tL2h0
bWwvdGVzdC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9hLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0
X2FwcGxldC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9hcmVhLnB5Cj4gdGVzdC9kb20vaHRtbC90
ZXN0X2Jhc2UucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfYmFzZWZvbnQucHkKPiB0ZXN0L2RvbS9o
dG1sL3Rlc3RfYmxvY2txdW90ZS5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9ib2R5LnB5Cj4gdGVz
dC9kb20vaHRtbC90ZXN0X2JyLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2J1dHRvbi5weQo+IHRl
c3QvZG9tL2h0bWwvdGVzdF9jYXB0aW9uLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2NvbC5weQo+
IHRlc3QvZG9tL2h0bWwvdGVzdF9jb2xsZWN0aW9uLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2Rp
ci5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9kaXYucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfZGwu
cHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfZG9jdW1lbnQucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rf
ZWxlbWVudC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9maWVsZHNldC5weQo+IHRlc3QvZG9tL2h0
bWwvdGVzdF9mb250LnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2Zvcm0ucHkKPiB0ZXN0L2RvbS9o
dG1sL3Rlc3RfZnJhbWUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfZnJhbWVzZXQucHkKPiB0ZXN0
L2RvbS9odG1sL3Rlc3RfaC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9oZWFkLnB5Cj4gdGVzdC9k
b20vaHRtbC90ZXN0X2hyLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2h0bWwucHkKPiB0ZXN0L2Rv
bS9odG1sL3Rlc3RfaHRtbF9kb21faW1wbGVtZW50YXRpb24ucHkKPiB0ZXN0L2RvbS9odG1sL3Rl
c3RfaWZyYW1lLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2ltZy5weQo+IHRlc3QvZG9tL2h0bWwv
dGVzdF9pbnB1dC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9pc2luZGV4LnB5Cj4gdGVzdC9kb20v
aHRtbC90ZXN0X2xhYmVsLnB5Cj4gdGVzdC9kb20vaHRtbC90ZXN0X2xlZ2VuZC5weQo+IHRlc3Qv
ZG9tL2h0bWwvdGVzdF9saS5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9saW5rLnB5Cj4gdGVzdC9k
b20vaHRtbC90ZXN0X21hcC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9tZW51LnB5Cj4gdGVzdC9k
b20vaHRtbC90ZXN0X21ldGEucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfbW9kLnB5Cj4gdGVzdC9k
b20vaHRtbC90ZXN0X29iamVjdC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9vbC5weQo+IHRlc3Qv
ZG9tL2h0bWwvdGVzdF9vcHRncm91cC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9vcHRpb24ucHkK
PiB0ZXN0L2RvbS9odG1sL3Rlc3RfcC5weQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9wYXJhbS5weQo+
IHRlc3QvZG9tL2h0bWwvdGVzdF9wcmUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfcS5weQo+IHRl
c3QvZG9tL2h0bWwvdGVzdF9zY3JpcHQucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rfc2VjdGlvbi5w
eQo+IHRlc3QvZG9tL2h0bWwvdGVzdF9zZWxlY3QucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rfc3R5
bGUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfdGFibGUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3Rf
dGQucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfdGV4dGFyZWEucHkKPiB0ZXN0L2RvbS9odG1sL3Rl
c3RfdGl0bGUucHkKPiB0ZXN0L2RvbS9odG1sL3Rlc3RfdHIucHkKPiB0ZXN0L2RvbS9odG1sL3Rl
c3RfdWwucHkKPiB0ZXN0L2RvbS9odG1sL3V0aWwucHkKMjA0YTI2NQo+IHRlc3Qvb3V0cHV0L3Rl
c3RfZW5jb2RpbmdzCjIzM2EyOTUKPiB4bWwvZG9tL0Z0Tm9kZS5weQoyMzVkMjk2CjwgeG1sL2Rv
bS9Ob2RlLnB5CjQyNmE0ODgsNDkwCj4geG1sL3VuaWNvZGUvX19pbml0X18ucHkKPiB4bWwvdW5p
Y29kZS9pc284ODU5LnB5Cj4geG1sL3VuaWNvZGUvdXRmOF9pc28ucHkKZGlmZiBQeVhNTC0wLjYu
Mi9NQU5JRkVTVC5pbiBQeVhNTC0wLjYuMy9NQU5JRkVTVC5pbgo2MmE2Myw2NAo+IAo+IGluY2x1
ZGUgc2V0dXBleHQvKi5weQpkaWZmIFB5WE1MLTAuNi4yL1JFQURNRSBQeVhNTC0wLjYuMy9SRUFE
TUUKMzVhMzYKPiAJbWluaWRvbQkJCVBhdWwgUHJlc2NvZApkaWZmIFB5WE1MLTAuNi4yL1JFQURN
RS5weWV4cGF0IFB5WE1MLTAuNi4zL1JFQURNRS5weWV4cGF0CjEsMmMxLDIKPCBQeXRob24gRXhw
YXQgd3JhcHBlciBtb2R1bGUsIHZlcnNpb24gb2YgMTktTWF5LTk4CjwgPT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0KPiBQeXRob24gRXhwYXQgd3Jh
cHBlciBtb2R1bGUKPiA9PT09PT09PT09PT09PT09PT09PT09PT09PT0KNCwxMGM0CjwgSWYgeW91
IGhhdmUgZG93bmxvYWRlZCB0aGUgYmluYXJ5IGRpc3RyaWJ1dGlvbiBmb3IgdGhlIG1hY2ludG9z
aCB5b3UKPCBjYW4gc2tpcCB0aGUgImJ1aWxkaW5nIiBzZWN0aW9ucyBhbmQgZ28gc3RyYWlnaHQg
dG8gdGhlICJ1c2luZyIKPCBiaXQuIElmIHlvdSBhcmUgdXNpbmcgYSBtYWNpbnRvc2ggYW5kIGRv
IHdhbnQgdG8gYnVpbGQgZnJvbSBzb3VyY2UgeW91IAo8IHNob3VsZCBnZXQgdGhlIHB5ZXhwYXQu
dGd6IGRpc3RyaWJ1dGlvbiAoU3R1ZmZpdCBFeHBhbmRlciB3aXRoCjwgRXhwYW5kZXIgRW5oYW5j
ZXIgd2lsbCBrbm93IGhvdyB0byB1bnBhY2sgYSBnemlwcGVkIHRhciBmaWxlKS4KPCAJCjwgQnVp
bGRpbmcgdGhlIHB5ZXhwYXQgbW9kdWxlIHVuZGVyIHVuaXgKLS0tCj4gQnVpbGRpbmcgdGhlIHB5
ZXhwYXQgbW9kdWxlCjEzLDMwYzcKPCAtIEJ1aWxkIGxpYmV4cGF0LmEgaW4gZXhwYXQuIFRoaXMg
dmVyc2lvbiBpcyB2ZXJ5IHNsaWdodGx5IGRpZmZlcmVudCAKPCAgIGZyb20gdGhlIG9yaWdpbmFs
IGJ5IEphbWVzIENsYXJrICh0aGUgbGliZXhwYXQuYSB0YXJnZXQgd2FzIGFkZGVkLAo8ICAgYW5k
IGEgZmV3IEMrKyBjb21tZW50cyB3ZXJlIHJlcGxhY2VkIGJ5IEMgY29tbWVudHMpLgo8IC0gRWRp
dCBNYWtlZmlsZS5wcmUuaW4gYW5kIHNldCB5b3VyIGluc3RhbGxkaXIKPCAtIG1ha2UgLWYgTWFr
ZWZpbGUucHJlLmluIFZFUlNJT049MS41LjEgTWFrZWZpbGUKPCAtIG1ha2Ugc2hhcmVkbW9kcwo8
IC0gcHV0IHRoZSBzaGFyZWQgbW9kdWxlIHNvbWV3aGVyZSBpbiB5b3VyIHN5cy5wYXRoCjwgCjwg
KGlmIHlvdSB3YW50IGEgc3RhdGljIFB5dGhvbiBlZGl0IFNldHVwLmluLCBhbmQgcmVwbGFjZSB0
aGUgbGFzdCBsaW5lCjwgd2l0aCAibWFrZSIpLgo8IAo8IEJ1aWxkaW5nIHRoZSBweWV4cGF0IG1v
ZHVsZSBvbiB0aGUgbWFjaW50b3NoCjwgLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0KPCAtIFVucGFjayB0aGUgdmFyaW91cyAuaHF4IHByb2plY3QgZmlsZXMuCjwg
LSBBbGwgdGhlIHByb2plY3RzIGFyZSBsaW5rZWQsIHNvIGJ1aWxkaW5nIHB5ZXhwYXQucHJqIHNo
b3VsZCBidWlsZAo8ICAgZXZlcnl0aGluZy4gSWYgdGhpcyBkb2Vzbid0IHdvcmsgeW91IHdpbGwg
ZmluZCB0aGUgbGlicmFyeQo8ICAgc3VicHJvamVjdHMgdG8gYnVpbGQgaW4gdGhlIGV4cGF0IGZv
bGRlci4KPCAtIFVzZSBFZGl0UHl0aG9uUHJlZnMgdG8gYWRkIHRoZSBjdXJyZW50IGZvbGRlciB0
byBzeXMucGF0aC4KLS0tCj4gVGhlIG1vZHVsZSBpcyBidWlsdCBhcyBwYXJ0IG9mIHJ1bm5pbmcg
c2V0dXAucHkKNDMsNDVjMjAsMjEKPCAJdGhpcyBpcyB0aGUgbGFzdCBiaXQgb2YgZGF0YS4gUmV0
dXJucyB0cnVlIGlmIHBhcnNpbmcKPCAJc3VjY2VlZGVkIChzbyBmYXIpLCBvdGhlcndpc2UgdGhl
IGVycm9yIGF0dHJpYnV0ZXMgaGF2ZQo8IAlpbmZvcm1hdGlvbiBvbiB0aGUgZXJyb3IuCi0tLQo+
IAl0aGlzIGlzIHRoZSBsYXN0IGJpdCBvZiBkYXRhLiBSYWlzZXMgYW4gZXhjZXB0aW9uIGluIGNh
c2Ugb2YKPiAJYW4gZXJyb3IsIHRoZSBlcnJvciBhdHRyaWJ1dGVzIGhhdmUgaW5mb3JtYXRpb24g
b24gdGhlIGVycm9yLgo2Nyw2OWM0Mwo8IFRoaXMgbW9kdWxlIGN1cnJlbnRseSBsaXZlcyBhdAo8
IGZ0cDovL2Z0cC5jd2kubmwvcHViL2phY2svcHl0aG9uL3B5ZXhwYXRzcmMudGd6IChzb3VyY2Up
IGFuZAo8IGZ0cDovL2Z0cC5jd2kubmwvcHViL2phY2svcHl0aG9uL3B5ZXhwYXQuaHF4IChtYWNp
bnRvc2ggYmluYXJ5LW9ubHkpLgotLS0KPiBQbGVhc2UgcmVwb3J0IHByb2JsZW1zIHRvIHhtbC1z
aWdAcHl0aG9uLm9yZy4KZGlmZiBQeVhNTC0wLjYuMi9UT0RPIFB5WE1MLTAuNi4zL1RPRE8KMiw3
ZDEKPCAgICAgICAgICogSW50ZWdyYXRlIHdpZGVzdHJpbmcgc3VwcG9ydCB3aXRoIHRoZSBQeUV4
cGF0IG1vZHVsZSAobWFqb3IgdGhpbmcpCjwgCSogU3dpdGNoIHRvIDRET00ncyBET00gaW1wbGVt
ZW50YXRpb24gCjwgCSogQWRkIFNBWDIgc3VwcG9ydAo8IAkqIERyb3Agd3N0cm9wICYgVW5pY29k
ZTsgUHl0aG9uIDEuNiB3aWxsIGhhbmRsZSB0aGlzCjwgCSogU3BlZWQgdXAgdGhlIGJ1aWxkZXIg
Y2xhc3Mgc29tZWhvdywgYW5kIGRvIHNvbWUgcGVyZm9ybWFuY2UgdGVzdHMKPCAJKiBDaGFuZ2Ug
SFRNTEJ1aWxkZXIgdG8gdXNlIFNBWCBpbnN0ZWFkIG9mIFNHTUxsaWIKMTBkMwo8IAkqIEFkZCBS
RUFETUVzIHRvIGV4aXN0aW5nIGRlbW8gcHJvZ3JhbXMKMTNkNQo8IAkqIHNheGxpYi5BdHRyaWJ1
dGVMaXN0IHNob3VsZCByZWFsbHkgc3VwcG9ydCBhbGwgZGljdGlvbmFyeSBiZWhhdmlvdXIKMTUs
MTZkNgo8IAkqIFVwZGF0ZSB0aGUgV2luZG93cyBETExzIGFuZCBpbnRlZ3JhdGUgQ2hyaXN0aWFu
IFRpc21lcidzIFdJU0UgCjwgCSAgaW5zdGFsbGVyCjIzLDI5ZDEyCjwgCSogQ29udmVydCBhbGwg
dGhlIHJhaXNlIHN0YXRlbWVudCB0byB1c2UgdGhlIGV4Y2VwdGlvbihhcmcpIGZvcm0KPCAKPCAJ
KiBJbXBsZW1lbnQgcmVhZGluZyBvZiBTR01MIGRvY3VtZW50cyBmb3IgRmlsZVJlYWRlcgo8IAo8
IAkqIERvY3VtZW50VHlwZSBjbGFzcyBpcyBtb3N0bHkgdW5maW5pc2hlZDsgd2hhdCBzaG91bGQg
dGhlCjwgaW50ZXJmYWNlIGZvciBjcmVhdGluZyB0aGVtIGxvb2sgbGlrZT8KPCAKMzIsNDFkMTQK
PCAKPCAJKiBXYWxrZXI6IG1lcmdlIHdhbGsoKSBhbmQgd2FsazEoKSBpbnRvIG9uZSBmdW5jdGlv
biAob3IgYXQKPCBsZWFzdCBtYWtlIGl0IG1vcmUgZ2VuZXJpYykKPCAKPCAJKiBYbWxMaW5lYXJp
c2VyOiB3aGF0IHNob3VsZCBpdCBkbyB3aXRoIFBJcyBhbmQgb3RoZXIgc2ltaWxhciB0aGluZ3M/
CjwgCjwgCSogTm9kZUxpc3QgcmV0dXJuZWQgZnJvbSAuZ2V0RWxlbWVudHNCeVRhZ05hbWUgc2hv
dWxkIGJlIGxpdmUuCjwgKEhhcmQsIGFuZCBkb2Vzbid0IHNlZW0gdG8gYmUgdmVyeSB1c2VmdWw7
IEFNSyBkb2Vzbid0IHJlYWxseSBjYXJlLikKPCAKPCAJKiBET00gTGV2ZWwgMiBjaGFuZ2VzCkNv
bW1vbiBzdWJkaXJlY3RvcmllczogUHlYTUwtMC42LjIvV2lzZSBhbmQgUHlYTUwtMC42LjMvV2lz
ZQpDb21tb24gc3ViZGlyZWN0b3JpZXM6IFB5WE1MLTAuNi4yL2J1aWxkIGFuZCBQeVhNTC0wLjYu
My9idWlsZApDb21tb24gc3ViZGlyZWN0b3JpZXM6IFB5WE1MLTAuNi4yL2RlbW8gYW5kIFB5WE1M
LTAuNi4zL2RlbW8KQ29tbW9uIHN1YmRpcmVjdG9yaWVzOiBQeVhNTC0wLjYuMi9kb2MgYW5kIFB5
WE1MLTAuNi4zL2RvYwpDb21tb24gc3ViZGlyZWN0b3JpZXM6IFB5WE1MLTAuNi4yL2V4dGVuc2lv
bnMgYW5kIFB5WE1MLTAuNi4zL2V4dGVuc2lvbnMKQ29tbW9uIHN1YmRpcmVjdG9yaWVzOiBQeVhN
TC0wLjYuMi9tYWMgYW5kIFB5WE1MLTAuNi4zL21hYwpPbmx5IGluIFB5WE1MLTAuNi4zOiBzZXR1
cC5jZmcKZGlmZiBQeVhNTC0wLjYuMi9zZXR1cC5weSBQeVhNTC0wLjYuMy9zZXR1cC5weQo2YzYK
PCBpbXBvcnQgc3lzLCBvcwotLS0KPiBpbXBvcnQgc3lzLCBvcywgc3RyaW5nCjhhOQo+IGZyb20g
c2V0dXBleHQgaW1wb3J0IERhdGFfRmlsZXMsIGluc3RhbGxfRGF0YV9GaWxlcwo0NmE0OCw1MQo+
ICAgICBpZiAncHlleHBhdCcgaW4gc3lzLmJ1aWx0aW5fbW9kdWxlX25hbWVzOgo+ICAgICAgICAg
cHJpbnQgIkVycm9yOiBidWlsdGluIGV4cGF0IGxpYnJhcnkgd2lsbCBjb25mbGljdCB3aXRoIG91
cnMiCj4gICAgICAgICBwcmludCAiUmUtYnVpbGQgcHl0aG9uIHdpdGhvdXQgYnVpbHRpbiBleHBh
dCBtb2R1bGUiCj4gICAgICAgICByYWlzZSBTeXN0ZW1FeGl0CjUzYzU4LDYwCjwgICAgICAgICAg
ICAgICAgICAgZGVmaW5lX21hY3JvcyA9IFsoJ1hNTF9OUycsIE5vbmUpXSwKLS0tCj4gICAgICAg
ICAgICAgICAgICAgZGVmaW5lX21hY3JvcyA9IFsoJ1hNTF9OUycsIE5vbmUpLAo+ICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgKCdYTUxfRFREJyxOb25lKSwKPiAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICgnRVhQQVRfVkVSU0lPTicsJzB4MDEwMjAwJyldLAo2
M2M3MCw3MQo8ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICdleHRlbnNpb25zL2V4cGF0
L3htbHBhcnNlL2hhc2h0YWJsZS5jJywKLS0tCj4gICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgIyBHb25lIGluIDEuMgo+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICMnZXh0ZW5z
aW9ucy9leHBhdC94bWxwYXJzZS9oYXNodGFibGUuYycsCjcxYzc5LDEzMwo8ICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAKLS0tCj4gCj4gCj4gIyBPbiBXaW5kb3dzLCBpbnN0YWxs
IHRoZSBkb2N1bWVudGF0aW9uIGludG8gYSBkaXJlY3RvcnkgeG1sZG9jLCBhbG9uZwo+ICMgd2l0
aCB4bWwvX3htbHBsdXMuIEZvciBSUE1zLCBkb2NzIGFyZSBpbnN0YWxsZWQgaW50byB0aGUgUlBN
IGRvYwo+ICMgZGlyZWN0b3J5IHZpYSBzZXR1cC5jZmcgKHVzdWFsbCAvdXNyL2RvYykuIE9uIGFs
bCBvdGhlciBzeXN0ZW1zLCB0aGUKPiAjIGRvY3VtZW50YXRpb24gaXMgbm90IGluc3RhbGxlZC4K
PiAKPiBkb2MyeG1sZG9jID0gMAo+IGlmIHN5cy5wbGF0Zm9ybSA9PSAnd2luMzInOgo+ICAgICBk
b2MyeG1sZG9jID0gMQo+IAo+ICMgVGhpcyBpcyBhIGZyYWdtZW50IGZyb20gTUFOSUZFU1QuaW4g
d2hpY2ggc2hvdWxkIGNvbnRhaW4gYWxsCj4gIyBmaWxlcyB3aGljaCBhcmUgY29uc2lkZXJlZCBk
b2N1bWVudGF0aW9uIChkb2MsIGRlbW8sIHRlc3QsIHBsdXMgc29tZQo+ICMgdG9wbGV2ZWwgZmls
ZXMpCj4gZG9jZmlsZXM9IiIiCj4gcmVjdXJzaXZlLWluY2x1ZGUgZG9jICouaHRtbCAKPiByZWN1
cnNpdmUtaW5jbHVkZSBkb2MgKi50ZXggCj4gcmVjdXJzaXZlLWluY2x1ZGUgZG9jICoudHh0IAo+
IHJlY3Vyc2l2ZS1pbmNsdWRlIGRvYyAqLmdpZiAKPiByZWN1cnNpdmUtaW5jbHVkZSBkb2MgKi5j
c3MKPiByZWN1cnNpdmUtaW5jbHVkZSBkb2MgKi5hcGkKPiByZWN1cnNpdmUtaW5jbHVkZSBkb2Mg
Ki53ZWIKPiAKPiByZWN1cnNpdmUtaW5jbHVkZSBkZW1vIFJFQURNRSAKPiByZWN1cnNpdmUtaW5j
bHVkZSBkZW1vICoucHkgCj4gcmVjdXJzaXZlLWluY2x1ZGUgZGVtbyAqLnhtbAo+IHJlY3Vyc2l2
ZS1pbmNsdWRlIGRlbW8gKi5kdGQKPiByZWN1cnNpdmUtaW5jbHVkZSBkZW1vICouaHRtbAo+IHJl
Y3Vyc2l2ZS1pbmNsdWRlIGRlbW8gKi5odG0KPiBpbmNsdWRlIGRlbW8vZ2VueG1sL2RhdGEudHh0
Cj4gaW5jbHVkZSBkZW1vL2RvbS9odG1sMmh0bWwKPiBpbmNsdWRlIGRlbW8veGJlbC9kb2MveGJl
bC5iaWIKPiBpbmNsdWRlIGRlbW8veGJlbC9kb2MveGJlbC50ZXgKPiBpbmNsdWRlIGRlbW8veG1s
cHJvYy9jYXRhbG9nLnNvYwo+IAo+IHJlY3Vyc2l2ZS1pbmNsdWRlIHRlc3QgKi5weSAKPiByZWN1
cnNpdmUtaW5jbHVkZSB0ZXN0ICoueG1sCj4gaW5jbHVkZSB0ZXN0L3Rlc3QueG1sLm91dAo+IHJl
Y3Vyc2l2ZS1pbmNsdWRlIHRlc3Qvb3V0cHV0IHRlc3RfKgo+IAo+IGluY2x1ZGUgQU5OT1VOQ0Ug
Cj4gaW5jbHVkZSBDUkVESVRTIAo+IGluY2x1ZGUgTElDRU5DRSAKPiBpbmNsdWRlIFJFQURNRSog
Cj4gaW5jbHVkZSBUT0RPIAo+ICIiIgo+IAo+IGlmIGRvYzJ4bWxkb2M6Cj4gICAgIHhtbGRvY2Zp
bGVzID0gWwo+ICAgICAgICAgRGF0YV9GaWxlcyhjb3B5X3RvID0gJ3htbGRvYycsCj4gICAgICAg
ICAgICAgICAgICAgIHRlbXBsYXRlID0gc3RyaW5nLnNwbGl0KGRvY2ZpbGVzLCJcbiIpLAo+ICAg
ICAgICAgICAgICAgICAgICBwcmVzZXJ2ZV9wYXRoID0gMSkKPiAgICAgICAgIF0KPiBlbHNlOgo+
ICAgICB4bWxkb2NmaWxlcyA9IFtdCjc0YzEzNgo8ICAgICAgICB2ZXJzaW9uID0gIjAuNi4yIiwg
IyBOZWVkcyB0byBtYXRjaCB4bWwvX19pbml0X18udmVyc2lvbl9pbmZvCi0tLQo+ICAgICAgICB2
ZXJzaW9uID0gIjAuNi4zIiwgIyBOZWVkcyB0byBtYXRjaCB4bWwvX19pbml0X18udmVyc2lvbl9p
bmZvCjgyYTE0NSwxNDcKPiAKPiAgICAgICAgIyBPdmVycmlkZSBjZXJ0YWluIGNvbW1hbmQgY2xh
c3NlcyB3aXRoIG91ciBvd24gb25lcwo+ICAgICAgICBjbWRjbGFzcyA9IHsnaW5zdGFsbF9kYXRh
JzppbnN0YWxsX0RhdGFfRmlsZXN9LCAKODRhMTUwLDE1MQo+IAo+ICAgICAgICBkYXRhX2ZpbGVz
ID0geG1sZG9jZmlsZXMsCjg5YzE1Ngo8ICAgICAgICAgICAgICAgICAgICB4bWwoJy5tYXJzaGFs
JyksCi0tLQo+ICAgICAgICAgICAgICAgICAgICB4bWwoJy5tYXJzaGFsJyksIHhtbCgnLnVuaWNv
ZGUnKSwKT25seSBpbiBQeVhNTC0wLjYuMzogc2V0dXBleHQKQ29tbW9uIHN1YmRpcmVjdG9yaWVz
OiBQeVhNTC0wLjYuMi90ZXN0IGFuZCBQeVhNTC0wLjYuMy90ZXN0Ck9ubHkgaW4gUHlYTUwtMC42
LjI6IHZzX2FjY2VwdExpdmVGaWxlSW5mbz9mZGF0ZT0wMDEyMjYmZm5hbWU9dzAwMTIyNl8wMDAw
JmZ0aW1lPTAwMDAmZnR5cGU9dyZmdmVyc2lvbj0wMyZmc2l6ZT00MjA2OQpDb21tb24gc3ViZGly
ZWN0b3JpZXM6IFB5WE1MLTAuNi4yL3dpbmRvd3MgYW5kIFB5WE1MLTAuNi4zL3dpbmRvd3MKQ29t
bW9uIHN1YmRpcmVjdG9yaWVzOiBQeVhNTC0wLjYuMi94bWwgYW5kIFB5WE1MLTAuNi4zL3htbAo=

--Boundary-=_oQHnWnkUEwHsqmGbbuqCLJJiVswM--


From odeckmyn@teaser.fr  Wed Jan 10 11:09:26 2001
From: odeckmyn@teaser.fr (Olivier Deckmyn)
Date: Wed, 10 Jan 2001 12:09:26 +0100
Subject: [XML-SIG] [URGENT] Problem with accent char
Message-ID: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>

Hi all,

Looks like parser modifies my content :(

I have the following "xml" string :
"""
<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
  <Head>
    <Name>GB-OTAN-santé</Name>
    <DateReleased>20010110T105314Z</DateReleased>
    <Source>AFP</Source>
  </Head>
  <NewsLines>
    <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
    <DateLine>LONDRES</DateLine>
  </NewsLines>
</Xafp>
"""

One can notice that there are accents chars (iso-8859-1) inside <Name> or
<HeadLine> tags ; with a well defined encoding value in header...

If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
nodes[0].firstChild.nodeValue) ; the <Headline> tag content becomes :
"""
La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
\303\240 Londres
"""

Looks like there has been a unicode (utf-8 ?) conversion ...

What can I do, not to have this conversion made ? I don't want the parser to
modify my content !!!!

Thanx for your support...

I've tried with py-xml 0.5.1 and 0.6.2

I use python 1.5.2 under FreeBSD 4.2

My imports (might help ?):
from xml import dom
from xml.dom.ext.reader import Sax2
from xml.dom import ext
from xml.dom.Node import Node

Thanx again,

Olivier.

---
We are Micro$oft. You will be assimilated. Resistance is futile.


From matt@virtualspectator.com  Wed Jan 10 11:29:38 2001
From: matt@virtualspectator.com (matt)
Date: Thu, 11 Jan 2001 00:29:38 +1300
Subject: [XML-SIG] [URGENT] Problem with accent char
In-Reply-To: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>
References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>
Message-ID: <0101110034181B.00856@localhost.localdomain>

Have a look through the mailing list ... I asked a whole lot of these questions
earlier ... anyway, comments below :


On Thu, 11 Jan 2001, Olivier Deckmyn wrote:
> Hi all,
> 
> Looks like parser modifies my content :(
> 

good .. it should ... see later


> I have the following "xml" string :
> """
> <?xml version="1.0" encoding="iso-8859-1"?>
> <Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
>   <Head>
>     <Name>GB-OTAN-santé</Name>
>     <DateReleased>20010110T105314Z</DateReleased>
>     <Source>AFP</Source>
>   </Head>
>   <NewsLines>
>     <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
> Londres</HeadLine>
>     <DateLine>LONDRES</DateLine>
>   </NewsLines>
> </Xafp>
> """
> 
> One can notice that there are accents chars (iso-8859-1) inside <Name> or
> <HeadLine> tags ; with a well defined encoding value in header...
> 
> If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
> nodes[0].firstChild.nodeValue) ; the <Headline> tag content becomes :
> """
> La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
> \303\240 Londres
> """
> 
> Looks like there has been a unicode (utf-8 ?) conversion ...
> 

Yes, that is correct, as specified.  All xml parsers should recognise the
encoding set and CONVERT it to unicode ... UTF-8 being the common flavour.


> What can I do, not to have this conversion made ? I don't want the parser to
> modify my content !!!!


It's ok, you can get it back out nicely ....

try the following little function I use :

import cStringIO
from xml.dom import ext

def retPrettyPrint(doc):
    # Serialize the DOM back out as ISO-8859-1 instead of UTF-8.
    t = cStringIO.StringIO()
    ext.PrettyPrint(doc, t, encoding='ISO-8859-1')
    return t.getvalue()
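
For readers on a current Python 3, where cStringIO and xml.dom.ext are not available, roughly the same round trip can be sketched with the standard library's minidom. This is an assumption-laden illustration, not the PyXML API discussed in this thread; the one-element document is a shortened stand-in for the original.

```python
from xml.dom import minidom

# A shortened stand-in for the document in this thread, as ISO-8859-1 bytes.
src = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<Name>GB-OTAN-sant\u00e9</Name>'
).encode("iso-8859-1")

# The parser honours the declared encoding and hands back Unicode text.
doc = minidom.parseString(src)
name_text = doc.documentElement.firstChild.nodeValue

# Asking for an encoding on output yields bytes in that encoding.
out = doc.toprettyxml(indent="  ", encoding="ISO-8859-1")

print(repr(name_text))
print(out)
```

The point is the same as above: the parser normalises to Unicode on the way in, and the caller picks the encoding on the way out.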


regards
Matt


> 
> Thanx for your support...
> 
> I've tried with py-xml 0.5.1 and 0.6.2
> 
> I use python 1.5.2 under FreeBSD 4.2
> 
> My imports (might help ?):
> from xml import dom
> from xml.dom.ext.reader import Sax2
> from xml.dom import ext
> from xml.dom.Node import Node
> 
> Thanx again,
> 
> Olivier.
> 
> ---
> We are Micro$oft. You will be assimilated. Resistance is futile.
> 
> 
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig
-- 
Matt Halstead (PhD)
Research and development
VirtualSpectator
http://www.virtualspectator.com
ph 64-9-9136896


From larsga@garshol.priv.no  Wed Jan 10 13:31:50 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 10 Jan 2001 14:31:50 +0100
Subject: [XML-SIG] [URGENT] Problem with accent char
In-Reply-To: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>
References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K>
Message-ID: <m3wvc35y9l.fsf@lambda.garshol.priv.no>

* Olivier Deckmyn
| 
| One can notice that there are accents chars (iso-8859-1) inside
| <Name> or <HeadLine> tags ; with a well defined encoding value in
| header...
| 
| If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
| nodes[0].firstChild.nodeValue) ; the <Headline> tag content becomes :
| """
| La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
| \303\240 Londres
| """
| 
| Looks like there has been a unicode (utf-8 ?) conversion ...

That is correct.
 
| What can I do, not to have this conversion made ? I don't want the
| parser to modify my content !!!!
 
You can use xmlproc, you can convert back to latin1 yourself, or you
can use Python 2.0, where you'd get Unicode strings.
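
Converting back to Latin-1 by hand, the second option, might look like the sketch below on a current Python 3 (an assumption; the thread itself concerns Python 1.5.2 and 2.0). The byte string is an excerpt of the value reported above, and the variable names are purely illustrative.

```python
# The octal escapes from the report, taken as raw bytes: this is UTF-8.
utf8_bytes = b"La pol\303\251mique loin d'\303\252tre apais\303\251e"

# Decode the UTF-8 bytes to text, then re-encode in the document's
# original encoding.
text = utf8_bytes.decode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

print(repr(text))
print(repr(latin1_bytes))
```

Each accented character takes two bytes in UTF-8 (for example \303\251) and one byte in ISO-8859-1 (\351), which is why the parsed value looked "modified".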

IMHO this is perfectly reasonable behaviour on the part of pyexpat.

--Lars M.


From uche.ogbuji@fourthought.com  Wed Jan 10 20:23:47 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 10 Jan 2001 13:23:47 -0700
Subject: [4suite] Re: [XML-SIG] [URGENT] Problem with accent char
References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K> <m3wvc35y9l.fsf@lambda.garshol.priv.no>
Message-ID: <3A5CC4D3.C933C9AD@fourthought.com>

Lars Marius Garshol wrote:

> | What can I do, not to have this conversion made ? I don't want the
> | parser to modify my content !!!!
> 
> You can use xmlproc, you can convert back to latin1 yourself, or you
> can use Python 2.0, where you'd get Unicode strings.

Bah.  Just to illustrate I prepped the following:

----------------------------------%------------------------------------

from xml.dom.ext.reader import Sax2
from xml.sax.sax2exts import make_parser
p = make_parser("xml.sax.drivers2.drv_xmlproc")
reader = Sax2.Reader(parser=p)

src = """<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
  <Head>
    <Name>GB-OTAN-santé</Name>
    <DateReleased>20010110T105314Z</DateReleased>
    <Source>AFP</Source>
  </Head>
  <NewsLines>
    <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
    <DateLine>LONDRES</DateLine>
  </NewsLines>
</Xafp>
"""

doc = reader.fromString(src)
nodes = doc.getElementsByTagName('HeadLine')
print repr(nodes[0].firstChild.nodeValue)

----------------------------------%------------------------------------

But on the fromString I get

>>> doc = reader.fromString(src)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py",
line 49, in fromString
    rt = self.fromStream(stream, ownerDoc)
  File
"/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py",
line 270, in fromStream
    self.parser.parse(stream)
  File
"/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py",
line 88, in parse
    parser.parse_resource(source.getSystemId()) # FIXME: rest!
AttributeError: getSystemId


Looks as if drv_xmlproc is broken for Sax2.

However, Olivier should be OK since the following works.

----------------------------------%------------------------------------

from xml.dom.ext.reader import Sax
from xml.sax.saxexts import make_parser
p = make_parser("xml.sax.drivers.drv_xmlproc")
reader = Sax.Reader(parser=p)

src = """<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
  <Head>
    <Name>GB-OTAN-santé</Name>
    <DateReleased>20010110T105314Z</DateReleased>
    <Source>AFP</Source>
  </Head>
  <NewsLines>
    <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
    <DateLine>LONDRES</DateLine>
  </NewsLines>
</Xafp>
"""

doc = reader.fromString(src)
nodes = doc.getElementsByTagName('HeadLine')
print repr(nodes[0].firstChild.nodeValue)
----------------------------------%------------------------------------

I get

>>> print repr(nodes[0].firstChild.nodeValue)
"La pol\351mique loin d'\352tre apais\351e par l'annonce de tests
\340\012Londres"

Which is what I think Olivier wants.

Lars,  is the Sax2 problem something you've fixed in your CVS tree?  Any
chance of a quick fix?  (I know you're still swamped).

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Wed Jan 10 21:18:20 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 10 Jan 2001 22:18:20 +0100
Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again
In-Reply-To: <01011021320810.00856@localhost.localdomain> (message from matt
 on Wed, 10 Jan 2001 21:15:09 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <0101101829390Y.00856@localhost.localdomain> <200101100749.f0A7nuY00950@mira.informatik.hu-berlin.de> <01011021320810.00856@localhost.localdomain>
Message-ID: <200101102118.f0ALIKA01226@mira.informatik.hu-berlin.de>

> > Would you like to look into correcting that?
> > 
> 
> Hmm, means upgrading to 2.0, which perhaps I should do.  

Ok, I now had a look at it myself; please try the patch attached.
It generates Unicode objects in Python 2, UTF-8 in 1.5.

> The problem is that I use 4dom in some quite heavy zope products,
> and I am unconvinced that python 2.0 and Zope are stable enough for
> production environments, and too different to have split between
> production and development.

I understand the Zope problems are not resolved yet, so not upgrading
still seems the right thing to do.

> The other part though is making 4Dom pickleable, which was actually
> my next little project, to look at it some more and see where it is
> not pickleable.  Could be simple, someone may already have the
> answer.

I don't know what the state of this is; if you think you can
contribute, just go ahead.

> Having a closer inspection of PyXML 0.6.3, the original memory leak
> from the parser doing its parsing thing has gone, but there is one
> that exists for just purely making a parser.

Can you provide sample code showing the problem? Perhaps I'm not
seeing it because the Python 2 garbage collector collects the cycles.

Also, did you call xml.dom.ext.ReleaseNode? The DOM is full of cycles;
without a cyclic gc, the only way to get rid of them is to explicitly
release them.
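
To illustrate why explicit release matters, here is a toy sketch for a current CPython 3 with the cyclic collector switched off, so that it behaves like reference counting alone. The Node class and the release() helper are hypothetical stand-ins, not the 4DOM classes or xml.dom.ext.ReleaseNode itself.

```python
import gc
import weakref


class Node:
    """Toy tree node: parent and child refer to each other, forming a cycle."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)


def release(node):
    """Break the parent/child cycles below node (hypothetical helper)."""
    for child in node.children:
        release(child)
        child.parent = None
    node.children = []


gc.disable()  # emulate an interpreter without a cyclic garbage collector
try:
    # Without release: the cycle keeps the tree alive after the last name goes.
    leaky = Node("root")
    leaky.append(Node("child"))
    leak_probe = weakref.ref(leaky)
    del leaky
    leaked = leak_probe() is not None

    # With release: reference counting alone is enough to free the tree.
    root = Node("root")
    root.append(Node("child"))
    probe = weakref.ref(root)
    release(root)
    del root
    freed = probe() is None
finally:
    gc.enable()
    gc.collect()  # clean up the deliberately leaked tree

print(leaked, freed)
```

The result relies on CPython's immediate reference counting; other implementations may free objects at different times.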

> > > I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur.
> > 
> > I'm confused. Where did you get PyXML 1.2 from?
> > 
> 
> Someone said go get PyXML 1.3 on the 5th January from SourceForge and I only
> found PyXML 1.2 ..... which has now changed to 1.3 ... and there are
> differences .. I have attached diff PyXML-0.6.2 PyXML-0.6.3 so you can see.

Well, I know well what PyXML 0.6.3 is. I'm just curious as to why you
are calling it 1.3...

Regards,
Martin


From odeckmyn.list@teaser.fr  Thu Jan 11 07:46:16 2001
From: odeckmyn.list@teaser.fr (Olivier Deckmyn)
Date: Thu, 11 Jan 2001 08:46:16 +0100
Subject: [4suite] Re: [XML-SIG] [URGENT] Problem with accent char
References: <018e01c07af5$cbbb4d80$0d00000a@ODECKMYN2K> <m3wvc35y9l.fsf@lambda.garshol.priv.no> <3A5CC4D3.C933C9AD@fourthought.com>
Message-ID: <003701c07ba2$93d5e1c0$0d00000a@ODECKMYN2K>

Hi !!

Thanx you all for your support !

I solved it using the 4T UTF8String class provided in the Unicode package,
found in xml.dom.ext ....

Thanx 4T ;)




From martin@loewis.home.cs.tu-berlin.de  Thu Jan 11 11:29:59 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 11 Jan 2001 12:29:59 +0100
Subject: [XML-SIG] UTF-8 and ISO-8859-1 problems again
In-Reply-To: <01011111124501.00909@localhost.localdomain> (message from matt
 on Thu, 11 Jan 2001 10:59:52 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011021320810.00856@localhost.localdomain> <200101102118.f0ALIKA01226@mira.informatik.hu-berlin.de> <01011111124501.00909@localhost.localdomain>
Message-ID: <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de>

> > > Having a closer inspection of PyXML 0.6.3, the original memory leak
> > > from the parser doing its parsing thing has gone, but there is one
> > > that exists for just purely making a parser.

I found the problem: While I updated the SAX2 driver, I had not
changed the SAX1 driver. With the patch below, I don't get any memory
leak for your example.

There were two problems: first, drv_pyexpat used Python's pyexpat
module, when available, instead of ours; second, it did not attempt to
break reference cycles at the end of parsing.

Regards,
Martin

Index: drv_pyexpat.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/sax/drivers/drv_pyexpat.py,v
retrieving revision 1.11
diff -u -r1.11 drv_pyexpat.py
--- drv_pyexpat.py	2000/10/05 19:32:52	1.11
+++ drv_pyexpat.py	2001/01/11 11:25:28
@@ -14,10 +14,9 @@
 from xml.sax import saxlib,saxutils
 
 try:
-    import pyexpat
+    from xml.parsers import expat
 except ImportError:
-    # pyexpat not built in core installation, use our own
-    from xml.parsers import pyexpat
+    raise SAXReaderNotAvailable("expat not supported",None)
 
 import urllib,types
 
@@ -57,7 +56,7 @@
 
     def parse(self,sysID):
         self.parseFile(urllib.urlopen(sysID),sysID)
-        
+
     def parseFile(self,fileobj,sysID=None):
         self.reset()
         self.sysID=sysID
@@ -71,6 +70,7 @@
         self.parser.Parse("", 1)
             
         self.doc_handler.endDocument()
+        self.close()
 
     # --- Locator methods. Only usable after errors.
 
@@ -90,7 +90,7 @@
 
     def __report_error(self):
         errc=self.parser.ErrorCode
-        msg=pyexpat.ErrorString(errc)
+        msg=expat.ErrorString(errc)
         exc=saxlib.SAXParseException(msg,None,self)
         self.err_handler.fatalError(exc)
 
@@ -113,7 +113,7 @@
 
     def reset(self):
         self.sysID=None
-        self.parser=pyexpat.ParserCreate()
+        self.parser=expat.ParserCreate()
         self.parser.StartElementHandler = self.startElement
         self.parser.EndElementHandler = self.endElement
         self.parser.CharacterDataHandler = self.characters
@@ -125,8 +125,12 @@
             self.__report_error()
 
     def close(self):
+        if self.parser is None:
+            # make sure close is idempotent
+            return
         if self.parser.Parse("", 0) != 1:
             self.__report_error()
+        self.parser = None
         
 # --- An expat driver that uses the lazy map
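
The cycle-breaking part of the patch above can be sketched in isolation.
This is a minimal illustration, not the PyXML driver: it assumes the
Python 3 standard library's xml.parsers.expat, and SketchDriver,
start_element, and parse_string are hypothetical names. The driver owns
the parser and the parser holds a bound method of the driver, which forms
a reference cycle; close() drops the parser and is safe to call twice.

```python
from xml.parsers import expat


class SketchDriver:
    """Hypothetical stand-in for an expat-based driver (not PyXML code)."""

    def __init__(self):
        self.parser = None
        self.elements = []

    def reset(self):
        # driver -> parser -> bound method -> driver is a reference cycle.
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element

    def start_element(self, name, attrs):
        self.elements.append(name)

    def parse_string(self, data):
        self.reset()
        self.parser.Parse(data, True)  # True marks the final chunk
        self.close()

    def close(self):
        if self.parser is None:
            # Already closed; keep close() idempotent.
            return
        # Dropping the parser breaks the cycle described above.
        self.parser = None


driver = SketchDriver()
driver.parse_string("<a><b/></a>")
driver.close()  # second call is a no-op
```

Without the explicit close(), the objects are only reclaimed if a cycle
collector is available and gets to run.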
 

From odeckmyn.list@teaser.fr  Thu Jan 11 16:30:28 2001
From: odeckmyn.list@teaser.fr (Olivier Deckmyn)
Date: Thu, 11 Jan 2001 17:30:28 +0100
Subject: [XML-SIG] Fw: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss
Message-ID: <008e01c07beb$cec80580$0d00000a@ODECKMYN2K>

----- Original Message -----
From: GExpertsDiscuss@egroups.com
To: GExpertsDiscuss@egroups.com
Sent: Thursday, January 11, 2001 3:54 PM
Subject: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss


Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the GExpertsDiscuss
group.

  File        : /Enhance002.zip
  Uploaded by : rschoenaker@hotmail.com
  Description : Latest and greatest Formdrawer. Please test the drawing
                and spawn flames and comments.

You can access this file at the URL

http://www.egroups.com/files/GExpertsDiscuss/Enhance002%2Ezip

To learn more about eGroups file sharing, please visit

http://www.egroups.com/help/files.html


Regards,

rschoenaker@hotmail.com

To unsubscribe from this group, send an email to:
GExpertsDiscuss-unsubscribe@egroups.com


From odeckmyn.list@teaser.fr  Thu Jan 11 16:48:00 2001
From: odeckmyn.list@teaser.fr (Olivier Deckmyn)
Date: Thu, 11 Jan 2001 17:48:00 +0100
Subject: [XML-SIG] Fw: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss
References: <008e01c07beb$cec80580$0d00000a@ODECKMYN2K>
Message-ID: <011a01c07bee$41f17530$0d00000a@ODECKMYN2K>

bad click - Sorry for the noise :(
  ----- Original Message -----
  From: Olivier Deckmyn
  To: xml-sig@python.org
  Sent: Thursday, January 11, 2001 5:30 PM
  Subject: [XML-SIG] Fw: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss


  ----- Original Message -----
  From: GExpertsDiscuss@egroups.com
  To: GExpertsDiscuss@egroups.com
  Sent: Thursday, January 11, 2001 3:54 PM
  Subject: [GExpertsDiscuss] New file uploaded to GExpertsDiscuss


  Hello,

  This email message is a notification to let you know that
  a file has been uploaded to the Files area of the GExpertsDiscuss
  group.

    File        : /Enhance002.zip
    Uploaded by : rschoenaker@hotmail.com
    Description : Latest and greatest Formdrawer. Please test the
                  drawing and spawn flames and comments.

  You can access this file at the URL

  http://www.egroups.com/files/GExpertsDiscuss/Enhance002%2Ezip

  To learn more about eGroups file sharing, please visit

  http://www.egroups.com/help/files.html


  Regards,

  rschoenaker@hotmail.com

  To unsubscribe from this group, send an email to:
  GExpertsDiscuss-unsubscribe@egroups.com


From teg@redhat.com  Thu Jan 11 21:12:53 2001
From: teg@redhat.com (Trond Eivind Glomsrød)
Date: 11 Jan 2001 16:12:53 -0500
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <200101071122.MAA15470@pandora.informatik.hu-berlin.de>
References: <200101071122.MAA15470@pandora.informatik.hu-berlin.de>
Message-ID: <xuyvgrln67e.fsf@halden.devel.redhat.com>

Martin von Loewis <loewis@informatik.hu-berlin.de> writes:

> Version 0.6.3 of the Python/XML distribution is now available.  It
> should be considered a beta release, and can be downloaded from
> the following URLs:
> 
> 	* Restructure DOM interfaces to better accommodate multiple
>           DOM implementations: provide standard exceptions and symbolic
>           constants (including those inside of the Node interface) in
>           xml.dom.
> 
> 	* Improve minidom: validate arguments and raise DOM exceptions,
>           correct NamedNodeMap operations, offer cloneNode, splitText,
>           DocumentType, DOMImplementation, and correct various other
>           errors.

Given this, what is the best way to create RPMs of PyXML and 4Suite
which coexist (no overlapping files)? If the dom directory of PyXML
is included (and the one from 4Suite thus not included), things like
XSLT break. OTOH, PyXML has a couple of extra files (minidom, javadom,
etc.). Would these coexist with the rest of the directory coming from
4Suite?

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From martin@loewis.home.cs.tu-berlin.de  Thu Jan 11 21:59:00 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 11 Jan 2001 22:59:00 +0100
Subject: [XML-SIG] Jython usage survey
Message-ID: <200101112159.f0BLx0J02401@mira.informatik.hu-berlin.de>

To find out usage of PyXML with Jython, and to play with another SF
facility, I created a roughly-two-question survey. Please take a
moment to answer it.  It is available at

http://sourceforge.net/survey/survey.php?group_id=6473&survey_id=11258

Please understand that answering the survey won't have any immediate
effect on PyXML; it's rather an indication of how Jython support should
evolve in the long term.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Jan 11 21:49:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 11 Jan 2001 22:49:49 +0100
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <xuyvgrln67e.fsf@halden.devel.redhat.com> (teg@redhat.com)
References: <200101071122.MAA15470@pandora.informatik.hu-berlin.de> <xuyvgrln67e.fsf@halden.devel.redhat.com>
Message-ID: <200101112149.f0BLnnm02224@mira.informatik.hu-berlin.de>

> Given this, what is the best way to create RPMs of PyXML and 4Suite
> which coexist? (no overlapping files). 

If you are speaking as a Linux distributor now, I think the best
action is to not distribute PyXML 0.6.3 at all, as it does not
cooperate with the current 4Suite release. Instead, I recommend
waiting for 0.6.4, and the next release of 4Suite [as it turns out,
4DOM's xml.dom.ext.reader.Sax won't even use the pyexpat
improvements].

> If the dom directory of PyXML is included (and the one from 4Suite
> thus not included), things like XSLT break. OTOH, PyXML has a couple
> of extra files (minidom, javadom) etc... would these coexist with
> the rest of the directory coming from 4Suite?

If you are just asking as a user who wants to use the current version
of 4Suite and PyXML 0.6.3, then yes, that would be a good
combination. I don't know how many javadom users are out there, so
just including minidom and pulldom might be sufficient. However,
these are strictly necessary in combination with Python 2.0 -
otherwise PyXML would break its contract with Python 2, which is to
offer a proper superset of the Python 2 functionality.

If somebody now wonders why I bothered releasing 0.6.3 at all: I would
not have learned about these problems if I hadn't.

If you really were asking about the long-term coexistence of 4Suite
and PyXML, with regard to the 4DOM overlap: I have good faith that
things will work out to everybody's liking.

Regards,
Martin

(*) To find out more about that question, I just created a survey:
http://sourceforge.net/survey/survey.php?group_id=6473&survey_id=11258


From teg@redhat.com  Thu Jan 11 22:17:14 2001
From: teg@redhat.com (Trond Eivind Glomsrød)
Date: 11 Jan 2001 17:17:14 -0500
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <200101112149.f0BLnnm02224@mira.informatik.hu-berlin.de>
References: <200101071122.MAA15470@pandora.informatik.hu-berlin.de>
 <xuyvgrln67e.fsf@halden.devel.redhat.com>
 <200101112149.f0BLnnm02224@mira.informatik.hu-berlin.de>
Message-ID: <xuypuhtn385.fsf@halden.devel.redhat.com>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> > Given this, what is the best way to create RPMs of PyXML and 4Suite
> > which coexist? (no overlapping files). 
> 
> If you are speaking as a Linux distributor now,

Both this and as a user - we are distributing it (rawhide), but this
is mainly because we'll be using it.

> I think the best action is to not distribute PyXML 0.6.3 at all, as
> it does not cooperate with the current 4Suite release. Instead, I
> recommend waiting for 0.6.4, and the next release of 4Suite [as it
> turns out, 4DOM's xml.dom.ext.reader.Sax won't even use the pyexpat
> improvements].

Noted. I'll wait before updating - we're currently at 0.5.5.1 and 0.10.0

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From uche.ogbuji@fourthought.com  Thu Jan 11 22:39:11 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 11 Jan 2001 15:39:11 -0700
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: Message from teg@redhat.com (Trond Eivind Glomsrød)
 of "11 Jan 2001 16:12:53 EST." <xuyvgrln67e.fsf@halden.devel.redhat.com>
Message-ID: <200101112239.PAA08928@localhost.localdomain>

> Trond Eivind Glomsrød <teg@redhat.com> writes:

> Given this, what is the best way to create RPMs of PyXML and 4Suite
> which coexist? (no overlapping files). If the dom directory of PyXML
> is included (and the one from 4Suite thus not included), things like
> XSLT break. OTOH, PyXML has a couple of extra files (minidom, javadom)
> etc... would these coexist with the rest of the directory coming from
> 4Suite?

We've pretty much had enough.  As of version 0.10.1, PyXML will come bundled
with 4Suite.

I'm desperately hacking at xmlproc, trying to get it to behave with SAX2, and
we have a few other minor PyXML fixes.  4Suite 0.10.1 needs a lot of these
fixes, but I'm not sure a full PyXML 0.6.4 is warranted.

I think at this point, we'll just be sure to run all applicable tests and
bundle the version of PyXML we need.  Of course all credits and attributions
will be maintained.

This should make building your RPMs much simpler.  In fact, if I had more
RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite"
RPMs from the single 4Suite source tar.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Jan 12 06:38:37 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 11 Jan 2001 23:38:37 -0700
Subject: [XML-SIG] Giving up on xmlproc/SAX2 for now
Message-ID: <200101120638.XAA10719@localhost.localdomain>

I've fought with it, but I think I'm running into pretty fundamental problems 
in xmlproc.XMLProcessor.  I'm not sure what it is about driver2.drv_xmlproc 
that brings out these problems, but I'm getting phantom end tags being 
reported and such weirdness.

Hopefully I'll be able to revisit the problem if Lars can't get to it, but for 
now I must turn back to other issues so we can get out 4Suite 0.10.1.

I have fixed quite a few bugs in driver2.drv_xmlproc which I'm about to check 
in.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From larsga@garshol.priv.no  Fri Jan 12 08:30:53 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Jan 2001 09:30:53 +0100
Subject: [XML-SIG] Giving up on xmlproc/SAX2 for now
In-Reply-To: <200101120638.XAA10719@localhost.localdomain>
References: <200101120638.XAA10719@localhost.localdomain>
Message-ID: <m3lmsh18aq.fsf@lambda.garshol.priv.no>

* uche ogbuji
|
| I've fought with it, but I think I'm running into pretty fundamental
| problems in xmlproc.XMLProcessor.  I'm not sure what it is about
| driver2.drv_xmlproc that brings out these problems, but I'm getting
| phantom end tags being reported and such weirdness.

The problem is almost certainly that your application raises an
IndexError in one of its handler methods.  This causes xmlproc's
buffering to get out of whack and will give just the symptoms you
report. 

This is a known weakness of xmlproc that I will fix as soon as I can.
Note that it is non-trivial to fix it without impacting performance
too much.
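
One way to check whether a handler is raising IndexError is to wrap it
and record the exception instead of letting it escape into the parser.
The following is a minimal sketch under stated assumptions: it uses the
Python 3 standard library's xml.sax (expat-based) rather than xmlproc,
and RecordingHandler and BuggyHandler are hypothetical names invented for
the illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RecordingHandler(ContentHandler):
    """Forwards events to a wrapped handler and records any IndexError."""

    def __init__(self, wrapped):
        super().__init__()
        self.wrapped = wrapped
        self.errors = []

    def _call(self, method_name, *args):
        try:
            getattr(self.wrapped, method_name)(*args)
        except IndexError as exc:
            # Keep the exception away from the parser; note where it came from.
            self.errors.append((method_name, exc))

    def startElement(self, name, attrs):
        self._call("startElement", name, attrs)

    def endElement(self, name):
        self._call("endElement", name)

    def characters(self, content):
        self._call("characters", content)


class BuggyHandler(ContentHandler):
    """Hypothetical application handler with an off-by-one bug."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()
        self.stack[-1]  # raises IndexError once the root element is popped


handler = RecordingHandler(BuggyHandler())
xml.sax.parseString(b"<a><b/></a>", handler)
for method_name, exc in handler.errors:
    print("IndexError raised in", method_name)
```

If the recorded list is non-empty, the phantom events are more likely a
symptom of the application bug than of the parser itself.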

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Fri Jan 12 08:19:28 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 12 Jan 2001 09:19:28 +0100
Subject: [XML-SIG] Giving up on xmlproc/SAX2 for now
In-Reply-To: <200101120638.XAA10719@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200101120638.XAA10719@localhost.localdomain>
Message-ID: <200101120819.f0C8JSk00987@mira.informatik.hu-berlin.de>

> I've fought with it, but I think I'm running into pretty fundamental
> problems in xmlproc.XMLProcessor.  I'm not sure what it is about
> driver2.drv_xmlproc that brings out these problems, but I'm getting
> phantom end tags being reported and such weirdness.

Could you please provide a few bug reports for these problems? I'd
like to help, but a general "it is broken" is a bad starting point...

Regards,
Martin


From larsga@garshol.priv.no  Fri Jan 12 08:40:36 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Jan 2001 09:40:36 +0100
Subject: [XML-SIG] [OT] Compiler problems
Message-ID: <m3itnl17uj.fsf@lambda.garshol.priv.no>

Whenever I try to compile anything at all using the Python 2.0 sources
I get this compilation error:

/usr/local/include/python2.0/pyport.h:390: #error "LONG_BIT definition appears wrong for platform (bad gcc config?)."


I'm using a stock RedHat 7.0 Linux system, except that I removed the
gcc 2.96 version that came with it (it caused problems compiling SP)
and replaced it with this:

Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/specs
gcc version 2.95.1 19990816/Linux (release)


Does anyone have any ideas as to what the problem is and how it is
best fixed?

--Lars M.


From loewis@informatik.hu-berlin.de  Fri Jan 12 11:23:32 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 12 Jan 2001 12:23:32 +0100 (MET)
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <200101112239.PAA08928@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200101112239.PAA08928@localhost.localdomain>
Message-ID: <200101121123.MAA15892@pandora.informatik.hu-berlin.de>

> This should make building your RPMs much simpler.  In fact, if I had more 
> RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" 
> RPMs from the single 4Suite source tar.

That probably would not be too desirable; depending on how it is done,
it might not even work. No matter how packaging is done,
xml.dom.minidom should be available in PyXML. In turn,
xml.dom.__init__ must be present to provide Node. In turn,
xml.dom.en_us must also be included.
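
The dependency chain above can be illustrated with a short sketch. It
assumes the xml.dom package of the Python 3 standard library rather than
PyXML 0.6.3, and element_names is a hypothetical helper: code that uses
minidom typically also needs the Node constants from xml.dom itself.

```python
from xml.dom import Node, minidom


def element_names(xml_text):
    """Return the tag names of the root element's direct element children."""
    doc = minidom.parseString(xml_text)
    return [
        child.tagName
        for child in doc.documentElement.childNodes
        # Node.ELEMENT_NODE comes from xml/dom/__init__.py, not from minidom.
        if child.nodeType == Node.ELEMENT_NODE
    ]


print(element_names("<root><a/>text<b/></root>"))
```

If xml/dom/__init__.py were missing from a package split, the first
import would already fail, before minidom was ever reached.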

Regards,
Martin


From akuchlin@mems-exchange.org  Fri Jan 12 15:04:39 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 12 Jan 2001 10:04:39 -0500
Subject: [XML-SIG] [OT] Compiler problems
In-Reply-To: <m3itnl17uj.fsf@lambda.garshol.priv.no>; from larsga@garshol.priv.no on Fri, Jan 12, 2001 at 09:40:36AM +0100
References: <m3itnl17uj.fsf@lambda.garshol.priv.no>
Message-ID: <20010112100439.A27688@kronos.cnri.reston.va.us>

On Fri, Jan 12, 2001 at 09:40:36AM +0100, Lars Marius Garshol wrote:
>
>Whenever I try to compile anything at all using the Python 2.0 sources
>I get this compilation error:
>
>/usr/local/include/python2.0/pyport.h:390: #error "LONG_BIT definition appears wrong for platform (bad gcc config?)."

I believe that it's actually glibc at fault, and the error message in
Python is misleading.  Check at Red Hat for an updated glibc.

--amk


From uche.ogbuji@fourthought.com  Fri Jan 12 15:48:12 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 12 Jan 2001 08:48:12 -0700
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: Message from Martin von Loewis <loewis@informatik.hu-berlin.de>
 of "Fri, 12 Jan 2001 12:23:32 +0100." <200101121123.MAA15892@pandora.informatik.hu-berlin.de>
Message-ID: <200101121548.IAA12133@localhost.localdomain>

> > This should make building your RPMs much simpler.  In fact, if I had more 
> > RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" 
> > RPMs from the single 4Suite source tar.
> 
> That probably would not be too desirable; depending on how it is done,
> it might not even work. No matter how packaging is done,
> xml.dom.minidom should be available in PyXML. In turn,
> xml.dom.__init__ must be present to provide Node. In turn,
> xml.dom.en_us must also be included.

Is this really a problem?  PyXML would be a prereq for 4Suite, and would have 
everything it needs.  The 4Suite RPM would write the additional stuff to the 
xml/dom dir.  This would mandate that we keep at least __init__ and en_us 
in sync.  But right after this release we plan to have a closer look at the 
co-packaging between PyXML and 4Suite, and I don't think all this will be such 
a mess for long.

I've already taken the preliminary step by updating 4Suite's setup.py to 
install PyXML as well if it's there.  See an announcement coming soon on the 
4Suite list.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From teg@redhat.com  Fri Jan 12 15:50:05 2001
From: teg@redhat.com (Trond Eivind Glomsrød)
Date: 12 Jan 2001 10:50:05 -0500
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <200101121548.IAA12133@localhost.localdomain>
References: <200101121548.IAA12133@localhost.localdomain>
Message-ID: <xuyofxcsrbm.fsf@halden.devel.redhat.com>

uche.ogbuji@fourthought.com writes:

> > > This should make building your RPMs much simpler.  In fact, if I had more 
> > > RPM-fu, I could make a spec file that spits out "PyXML-nodom" and "4Suite" 
> > > RPMs from the single 4Suite source tar.
> > 
> > That probably would not be too desirable; depending on how it is done,
> > it might not even work. No matter how packaging is done,
> > xml.dom.minidom should be available in PyXML. In turn,
> > xml.dom.__init__ must be present to provide Node. In turn,
> > xml.dom.en_us must also be included.
> 
> Is this really a problem?  PyXML would be a prereq for 4Suite, and would have 
> everything it needs.  The 4Suite RPM would write to the xml/dom dir the 
> additional stuff.  This would mandate that we keep at least __init__ and en_us 
> in sync.  

Note that files being present in both packages is a pain, especially
if you can't use one or the other, but really need a combination of
the two...

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From larsga@garshol.priv.no  Fri Jan 12 16:12:58 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Jan 2001 17:12:58 +0100
Subject: [XML-SIG] [OT] Compiler problems
In-Reply-To: <20010112100439.A27688@kronos.cnri.reston.va.us>
References: <m3itnl17uj.fsf@lambda.garshol.priv.no> <20010112100439.A27688@kronos.cnri.reston.va.us>
Message-ID: <m33deovjed.fsf@lambda.garshol.priv.no>

* Lars Marius Garshol
|
| Whenever I try to compile anything at all using the Python 2.0 sources
| I get this compilation error:
| 
| /usr/local/include/python2.0/pyport.h:390: #error "LONG_BIT
| definition appears wrong for platform (bad gcc config?)."

* Andrew Kuchling
| 
| I believe that it's actually glibc at fault, and the error message in
| Python is misleading.  Check at Red Hat for an updated glibc.

That was it! Thank you!  I upgraded to glibc-2.2-9 and the problem
just disappeared.

--Lars M.


From uche.ogbuji@fourthought.com  Fri Jan 12 16:48:47 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 12 Jan 2001 09:48:47 -0700
Subject: [XML-SIG] 4Suite-0.10.1beta1 (help please)
Message-ID: <3A5F356F.E6768700@fourthought.com>

I have prepared a beta for the 4Suite 0.10.1 release.  I'd especially
like people to help test it because it's the first release that
incorporates PyXML.

If you do care to test it (not on a production machine, of course),
please nuke your Ft and _xmlplus directories in your Python library
first.  Then simply install using

python setup.py install

And give it a whirl.  Send in your bug reports right away so we can get
them in.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.1beta1.tar.gz

Windows users will need a C compiler.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From loewis@informatik.hu-berlin.de  Fri Jan 12 17:02:26 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 12 Jan 2001 18:02:26 +0100 (MET)
Subject: [XML-SIG] PyXML 0.6.3 is available
In-Reply-To: <200101121548.IAA12133@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200101121548.IAA12133@localhost.localdomain>
Message-ID: <200101121702.SAA05880@pandora.informatik.hu-berlin.de>

> Is this really a problem?  PyXML would be a prereq for 4Suite, and
> would have everything it needs.  The 4Suite RPM would write to the
> xml/dom dir the additional stuff.  This would mandate that we keep
> at least __init__ and en_us in sync.  But right after this release
> we plan to have a closer look at the co-packaging between PyXML and
> 4Suite, and I don't think all this will be such a mess for long.

As a short-term solution, that is fine. I'm just worried about
somebody installing PyXML and not getting 4DOM.

Regards,
Martin


From pg@fluent.com  Fri Jan 12 21:33:49 2001
From: pg@fluent.com (Pankaj Gupta)
Date: Fri, 12 Jan 2001 16:33:49 -0500 (EST)
Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10
Message-ID: <Pine.SOL.3.95.1010112162614.22225A-100000@pgsun>

Hi,

I downloaded PyXML and tried to set it up on my Ultra. It seems the location
of the files expected in Lib/distutils/sysconfig.py is different from the
ones that I have. I have not installed Python in /usr/local, but have it
in my home area. The exception which comes is:

distutils.errors.DistutilsPlatformError: invalid Python installation:
unable to open /usr/local/lib/python2.0/config/Makefile (No such file or
directory)

I tried to find out where this config directory is, but did not find
anything. If anyone can suggest a workaround, it would be very helpful.

Specifically, can't I simply compile the C files in this distribution with
the Python source files and import the .py files once I open the
interpreter?

Thanks,
Pankaj


From akuchlin@mems-exchange.org  Fri Jan 12 22:13:22 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 12 Jan 2001 17:13:22 -0500
Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10
In-Reply-To: <Pine.SOL.3.95.1010112162614.22225A-100000@pgsun>; from pg@fluent.com on Fri, Jan 12, 2001 at 04:33:49PM -0500
References: <Pine.SOL.3.95.1010112162614.22225A-100000@pgsun>
Message-ID: <20010112171322.A5372@kronos.cnri.reston.va.us>

On Fri, Jan 12, 2001 at 04:33:49PM -0500, Pankaj Gupta wrote:
>distutils.errors.DistutilsPlatformError: invalid Python installation:
>unable to open /usr/local/lib/python2.0/config/Makefile (No such file or
>directory)

sysconfig uses the value of sys.prefix and sys.exec_prefix.  What are 
they set to for your Python installation?
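
A quick way to see those values, together with the Makefile path that is
derived from them, is sketched below. It is written for a current
Python 3 interpreter and uses the standard sysconfig module rather than
the distutils.sysconfig of Python 2.0, so treat it as an illustration
only; the helper name describe_install is made up for this example.

```python
import sys
import sysconfig


def describe_install():
    """Collect the locations that the build machinery derives its paths from."""
    return {
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        # Path of the Makefile this interpreter expects to find; the file
        # itself may be missing if Python was never properly installed.
        "makefile": sysconfig.get_makefile_filename(),
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print("%-12s %s" % (key, value))
```

If the reported prefix is /usr/local although Python lives elsewhere,
the interpreter was most likely configured for /usr/local and never
installed there.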

--amk


From martin@loewis.home.cs.tu-berlin.de  Fri Jan 12 22:42:56 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 12 Jan 2001 23:42:56 +0100
Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10
In-Reply-To: <Pine.SOL.3.95.1010112162614.22225A-100000@pgsun> (message from
 Pankaj Gupta on Fri, 12 Jan 2001 16:33:49 -0500 (EST))
References: <Pine.SOL.3.95.1010112162614.22225A-100000@pgsun>
Message-ID: <200101122242.f0CMguH01448@mira.informatik.hu-berlin.de>

> I downloaded PyXML and tried to set it up on my Ultra. It seems the location
> of the files expected in Lib/distutils/sysconfig.py is different from the
> ones that I have. I have not installed Python in /usr/local, but have it
> in my home area. The exception which comes is:
> 
> distutils.errors.DistutilsPlatformError: invalid Python installation:
> unable to open /usr/local/lib/python2.0/config/Makefile (No such file or
> directory)

Can you give a detailed description of how you installed Python (what
commands in what sequence), and how you attempted to install PyXML?
Is there a Python installation in /usr/local/lib/python2.0? If so,
which python binary did you use for setup.py?

> Specifically, can't I simply compile the C files in this
> distribution with the Python source files and import the .py files
> once I open the interpreter?

If you do it right, yes, you can. Please be aware that you need to
give certain defines when compiling the files, and that Python 2.0
comes with its own xml module which is only superseded by PyXML if the
latter is installed in _xmlplus.
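
A small check for which xml package an interpreter actually picks up
might look like the following. It is only a sketch for a current
Python 3: loaded_xml_package is a hypothetical helper, and on an
interpreter without PyXML it simply reports the standard library's xml
package.

```python
import xml


def loaded_xml_package():
    """Return the name and file location of the package bound to 'xml'."""
    return xml.__name__, getattr(xml, "__file__", None)


if __name__ == "__main__":
    name, location = loaded_xml_package()
    print("xml resolves to %s (%s)" % (name, location))
```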

Regards,
Martin


From pg@fluent.com  Fri Jan 12 23:15:39 2001
From: pg@fluent.com (Pankaj Gupta)
Date: Fri, 12 Jan 2001 18:15:39 -0500 (EST)
Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10
In-Reply-To: <200101122242.f0CMguH01448@mira.informatik.hu-berlin.de>
Message-ID: <Pine.SOL.3.95.1010112180737.22850A-100000@pgsun>

Hi,

> > I downloaded PyXML and tried to set it up on my Ultra. It seems the location
> > of the files expected in Lib/distutils/sysconfig.py is different from the
> > ones that I have. I have not installed Python in /usr/local, but have it
> > in my home area. The exception which comes is:
> > 
> > distutils.errors.DistutilsPlatformError: invalid Python installation:
> > unable to open /usr/local/lib/python2.0/config/Makefile (No such file or
> > directory)
> 
> Can you give a detailed description of how you installed Python (what
> commands in what sequence), and how you attempted to install PyXML?
> Is there a Python installation in /usr/local/lib/python2.0? If so,
> which python binary did you use for setup.py?

I am not sure how I installed python. I think I just untarred and
gunzipped the distribution and invoked the Makefile after configuring. I
have the python directory in ~/Python-2.0

As for PyXML, I downloaded it in ~/Python-2.0 and after untarring it, I
used:
	'python setup.py install'
in ~/Python-2.0/PyXML-0.6.3 directory.

I do not have any other Python installed anywhere in the /usr/local area. I
found this path was more or less hardcoded for posix systems in
~/Python-2.0/Lib/distutils/sysconfig.py. Even changing this path didn't
help as I could not find any config sub-directory in Python-2.0.

Thanks,
Pankaj


From martin@loewis.home.cs.tu-berlin.de  Fri Jan 12 23:41:07 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 13 Jan 2001 00:41:07 +0100
Subject: [XML-SIG] build problems with PyXML-0.6.3 on ultra10
In-Reply-To: <Pine.SOL.3.95.1010112180737.22850A-100000@pgsun> (message from
 Pankaj Gupta on Fri, 12 Jan 2001 18:15:39 -0500 (EST))
References: <Pine.SOL.3.95.1010112180737.22850A-100000@pgsun>
Message-ID: <200101122341.f0CNf7S02168@mira.informatik.hu-berlin.de>

> I am not sure how I installed python. I think I just untarred and
> gunzipped the distribution and invoked the Makefile after configuring. I
> have the python directory in ~/Python-2.0

Please have a look at the README file in the python sources. You
should invoke 'make install' to really get a working Python
installation. You probably want to give a --prefix option to
configure.

I believe if you properly install Python, distutils will work properly
as well.

> I do not have any other Python installed anywhere in the /usr/local area. I
> found this path was more or less hardcoded for posix systems in
> ~/Python-2.0/Lib/distutils/sysconfig.py. 

sysconfig.py does not contain the string 'local', so I doubt there is
anything hard coded anywhere. Instead, it uses sys.prefix, which is
the location you gave to configure's --prefix option.

> Even changing this path didn't help as I could not find any config
> sub-directory in Python-2.0.

Yes, that's because 'make install' will create it.

Regards,
Martin


From noreply@sourceforge.net  Sat Jan 13 14:59:38 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 13 Jan 2001 06:59:38 -0800
Subject: [XML-SIG] [Bug #128666] [4S-0.10.1beta2] problem with validating parser
Message-ID: <E14HS9a-00021q-00@usw-sf-web2.sourceforge.net>

Bug #128666, was updated on 2001-Jan-13 06:59
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: [4S-0.10.1beta2] problem with validating parser

Details: Hi there, 

I'm not sure if this is a 4Suite bug or an xmlproc bug. Attempting to
generate a DOM with validate set to 1 fails.

----------------------------------
Sample script:
xml="""<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
<!DOCTYPE preferences SYSTEM 'test.dtd' []>
<preferences/>"""

from xml.dom.ext.reader import Sax2

d = Sax2.FromXml(xml,validate=1)#,catName=catalog)
-----------------------------------
stack trace:
Traceback (innermost last):
  File "catalog_bug.py", line 9, in ?
    d = Sax2.FromXml(xml,validate=1)#,catName=catalog)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 313, in FromXml
    saxHandlerClass, parser)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 306, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 292, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers2/drv_xmlproc.py", line 93, in parse
    parser.flush()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 206, in flush
    self.do_parse()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 93, in do_parse
    self.parse_start_tag()
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 192, in parse_start_tag
    self.report_error(3017)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlproc.py", line 63, in report_error
    EntityParser.report_error(self,number,args)
  File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmlutils.py", line 372, in report_error
    self.err.fatal(msg)
  File "/usr/lib/python1.5/site-packages/xml/sax/drivers2/drv_xmlproc.py", line 215, in fatal
    self._err_handler.fatalError(saxlib.SAXParseException(msg, None, self))
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 260, in fatalError
    raise exception
------------------------------------
test.DTD
<!ELEMENT preferences (part*)>
<!ELEMENT part (entries)>
<!ELEMENT entries (entry+)>
<!ELEMENT entry (value|entries)>
<!ELEMENT value (#PCDATA)>
<!ATTLIST part name CDATA #REQUIRED>
<!ATTLIST entry name CDATA #REQUIRED>


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128666&group_id=6473


From noreply@sourceforge.net  Sat Jan 13 15:09:02 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 13 Jan 2001 07:09:02 -0800
Subject: [XML-SIG] [Bug #128667] XHtmlPrettyPrint fails
Message-ID: <E14HSIg-000252-00@usw-sf-web2.sourceforge.net>

Bug #128667, was updated on 2001-Jan-13 07:09
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: XHtmlPrettyPrint fails

Details: This bug was present in 4S-0.10.0, and it's still there in
0.10.1.

>>> from xml.dom.ext.reader import Sax2
>>> d = Sax2.FromXml('<xhtml/>')
>>> from xml.dom.ext import XHtmlPrettyPrint
>>> XHtmlPrettyPrint(d)
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd"
>
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE xhtml>Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/__init__.py", line 92, in XHtmlPrettyPrint
    Printer.PrintWalker(visitor, root).run()
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 353, in run
    return self.step()
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 349, in step
    self.visitor.visit(self.start_node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 159, in visit
    return self.visitDocument(node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/XHtmlPrinter.py", line 26, in visitDocument
    Printer.PrintVisitor.visitDocument(self,node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 204, in visitDocument
    self.visitNodeList(node.childNodes, exclude=node.doctype)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 175, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 135, in visit
    return self.visitElement(node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/XHtmlPrinter.py", line 65, in visitElement
    self.stream.write(self._newLine + self._indent*self._depth + '<' + string.lower(node.localName))
TypeError: bad operand type(s) for *


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128667&group_id=6473



From martin@mira.cs.tu-berlin.de  Sun Jan 14 00:40:23 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 14 Jan 2001 01:40:23 +0100
Subject: [XML-SIG] Re: [4S-0.10.1beta2] problem with validating parser
In-Reply-To: <E14HS9a-00021q-00@usw-sf-web2.sourceforge.net>
 (noreply@sourceforge.net)
References: <E14HS9a-00021q-00@usw-sf-web2.sourceforge.net>
Message-ID: <200101140040.f0E0eNf19753@mira.informatik.hu-berlin.de>

> I'm not sure if this is a 4Suite bug or an xmlproc bug. Attempting to
> generate a DOM with validate set to 1 fails.

It's a bug in 4DOM, although xmlproc could be more robust (as Lars
Marius already admitted).

The problem is indeed that the XmlDomGenerator produces an index error
in the line

        old_nss, del_nss = self._namespaceStack[-1]

At that point, nothing is on the namespace stack. The reason for that
is that xmlproc uses the namespace interface of the content handler by
default, i.e. it calls startElementNS and endElementNS.

Now, while startElement of the XmlDomGenerator extends the
_namespaceStack, startElementNS doesn't. However, endElementNS invokes
endElement, which tries to remove things from the namespace stack.

If the XmlDomGenerator was designed to always do its own namespace
processing, I suggest that this is explicitly requested from the SAX
parser, by setting xml.sax.handler.feature_namespaces to 0. Then, the
SAX parser *should* never invoke startElementNS; those methods might
be implemented as raising AssertionErrors just to make sure they
aren't.

IOW, the quick fix for this bug is to patch

--- Sax2.py.orig	Sun Jan 14 01:07:31 2001
+++ Sax2.py	Sun Jan 14 01:08:08 2001
@@ -264,6 +264,7 @@
     def __init__(self, validate=0, keepAllWs=0, catName=None,
                  saxHandlerClass=XmlDomGenerator, parser=None):
         self.parser = parser or (validate and sax2exts.XMLValParserFactory.make_parser()) or sax2exts.XMLParserFactory.make_parser()
+        self.parser.setFeature(handler.feature_namespaces, 0)
         if catName:
             #set up the catalog, if there is one
             from xml.parsers.xmlproc import catalog

into 4DOM.
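
The effect of that one-line change can be sketched with the
standard-library SAX parser. This is an illustration for a current
Python 3, not 4DOM's code: PlainNamesHandler and element_names are
hypothetical names, and the guard in startElementNS follows the
AssertionError suggestion made above.

```python
import io
import xml.sax
from xml.sax import handler


class PlainNamesHandler(handler.ContentHandler):
    """Records element names and does its own (here: no) namespace handling."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # With feature_namespaces off, the raw qualified name arrives here.
        self.names.append(name)

    def startElementNS(self, name, qname, attrs):
        # Must never be reached once namespace processing is switched off.
        raise AssertionError("startElementNS called with namespaces disabled")


def element_names(document):
    """Parse a bytes document and return the element names in document order."""
    parser = xml.sax.make_parser()
    # Explicitly request non-namespace mode instead of relying on a default.
    parser.setFeature(handler.feature_namespaces, 0)
    collector = PlainNamesHandler()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(document))
    return collector.names


if __name__ == "__main__":
    print(element_names(b"<a xmlns:p='urn:example'><p:b/></a>"))
```

Requesting the feature explicitly means the handler no longer depends
on whatever default a particular driver happens to choose.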

Regards,
Martin

P.S. As for xmlproc catching IndexErrors, it appears that the only
possible cause for an index error inside do_parse is the assignment to
t.

So why would it hurt to write 

                    try: 
                        t=self.data[self.pos+1] # Optimization
                    except IndexError, e:
                        raise OutOfDataException()

and to remove the outer IndexError? AFAICT, it only costs a
SETUP_EXCEPT/POP_BLOCK pair, which are quite cheap (a function call,
and storing a few variables, no memory allocation).
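
As a self-contained sketch of the same idea in current Python 3 syntax:
the try block wraps only the single index operation, so an IndexError
raised anywhere else is not silently turned into "out of data". The
exception class here is a stand-in defined for the example, and
peek_next is a hypothetical helper, not xmlproc code.

```python
class OutOfDataException(Exception):
    """Stand-in for xmlproc's exception of the same name."""


def peek_next(data, pos):
    """Return the character after position pos, or signal that data ran out."""
    try:
        # Only this one lookup is guarded.
        return data[pos + 1]
    except IndexError:
        raise OutOfDataException() from None


if __name__ == "__main__":
    print(peek_next("<a/>", 0))
```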


From uche.ogbuji@fourthought.com  Sun Jan 14 02:06:57 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 13 Jan 2001 19:06:57 -0700
Subject: [XML-SIG] Re: [4suite] Re: [4S-0.10.1beta2] problem with validating parser
References: <E14HS9a-00021q-00@usw-sf-web2.sourceforge.net> <200101140040.f0E0eNf19753@mira.informatik.hu-berlin.de>
Message-ID: <3A6109C1.9EE4D92@fourthought.com>

"Martin v. Loewis" wrote:
> 
> > I'm not sure if this is a 4Suite bug or an xmlproc bug. Attempting to
> > generate a DOM with validate set to 1 fails.
> 
> It's a bug in 4DOM, although xmlproc could be more robust (as Lars
> Marius already admitted).
> 
> The problem is indeed that the XmlDomGenerator produces an index error
> in the line
> 
>         old_nss, del_nss = self._namespaceStack[-1]
> 
> At that point, nothing is on the namespace stack. The reason for that
> is that xmlproc uses the namespace interface of the content handler by
> default, ie. it calls startElementNS and endElementNS.

A bit of a coincidence.  I discovered this bug (and others in Sax2) a
few hours ago.  Lars's comment about IndexErrors was also my clue.  The
code on my machine now works with xmlproc and SAX2.

It still appears that the fixes I made to drv_xmlproc and xmlproc itself
are valid.  For instance, drv_xmlproc's InputSource management was
broken and xmlproc itself would incorrectly assign the element's
namespace URI to any unprefixed attributes.

I'm currently looking into the minidom and pulldom masking bugs you
mentioned and there should be another beta out today.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche@ogbuji.net  Sun Jan 14 05:26:40 2001
From: uche@ogbuji.net (Uche Ogbuji)
Date: Sat, 13 Jan 2001 22:26:40 -0700
Subject: [XML-SIG] [Fwd: Anyone use Installer with PyXML?]
Message-ID: <3A613890.7E2E3EF4@ogbuji.net>

This is a multi-part message in MIME format.
--------------3B2DC2B12EA665D671B00EFD
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


-- 
Uche Ogbuji
Personal:   uche@ogbuji.net		http://uche.ogbuji.net
Work:       uche.ogbuji@fourthought.com	http://Fourthought.com
--------------3B2DC2B12EA665D671B00EFD
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Path: newsfeed.intelenet.net!news.service.uci.edu!csulb.edu!logbridge.uoregon.edu!newsfeed.mesh.ad.jp!uunet!osa.uu.net!dfw.uu.net!ash.uu.net!news.baymountain.net!not-for-mail
From: "Dan Rolander" <dan.rolander@marriott.com>
Newsgroups: comp.lang.python
Subject: Anyone use Installer with PyXML?
Date: Sat, 13 Jan 2001 11:59:42 -0500
Organization: Baymountain
Message-ID: <mailman.979405324.31216.python-list@python.org>
NNTP-Posting-Host: 63.102.49.30
Mime-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Trace: news.baymountain.net 979405324 12892 63.102.49.30 (13 Jan 2001 17:02:04 GMT)
X-Complaints-To: abuse@baymountain.net
NNTP-Posting-Date: 13 Jan 2001 17:02:04 GMT
To: <python-list@python.org>
Return-Path: <dan.rolander@marriott.com>
Delivered-To: mm+python-list@python.org
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4133.2400
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Errors-To: python-list-admin@python.org
X-BeenThere: python-list@python.org
X-Mailman-Version: 2.0.1 (101270)
Precedence: bulk
List-Help: <mailto:python-list-request@python.org?subject=help>
List-Post: <mailto:python-list@python.org>
List-Subscribe: <http://www.python.org/mailman/listinfo/python-list>,
	<mailto: python-list-request@python.org?subject=subscribe>
List-Id: General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe: <http://www.python.org/mailman/listinfo/python-list>,
	<mailto: python-list-request@python.org?subject=unsubscribe>
List-Archive: <http://www.python.org/pipermail/python-list/>
Errors-To: python-list-admin@python.org
X-BeenThere: python-list@python.org
Xref: newsfeed.intelenet.net comp.lang.python:120524

I have not been able to get Gordon McMillan's installer to work with PyXML.
The Win32 exe's I create cannot import a parser and I'm not sure how to
manually configure the .cfg file.

Has anyone done this?

Dan


--------------3B2DC2B12EA665D671B00EFD--


From uche.ogbuji@fourthought.com  Sun Jan 14 08:16:22 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 14 Jan 2001 01:16:22 -0700
Subject: [XML-SIG] 4Suite-0.10.1beta3
Message-ID: <3A616056.285C4DD4@fourthought.com>

Bis again, please help us test this thoroughly.  The
Sax2/xmlproc problems and the masking of minidom and pulldom appear to be
fixed.  Let us know if it's not so.

On a non-production machine, nuke your Ft and _xmlplus directories in
your Python library.  Then simply install using

python setup.py install

And give it a whirl.  Send in your bug reports right away so we can get
them in.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.1beta3.tar.gz

Windows users will need a C compiler.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Sun Jan 14 18:15:31 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 14 Jan 2001 19:15:31 +0100
Subject: [XML-SIG] 4Suite-0.10.1beta3
In-Reply-To: <3A616056.285C4DD4@fourthought.com> (message from Uche Ogbuji on
 Sun, 14 Jan 2001 01:16:22 -0700)
References: <3A616056.285C4DD4@fourthought.com>
Message-ID: <200101141815.f0EIFVC01536@mira.informatik.hu-berlin.de>

> Bis again, please help us test this thoroughly.

The PyXML test suite fails with it for 'test_dom test_howto
test_minidom test_saxdrivers'. At least test_saxdrivers can be fixed
by using the current PyXML CVS code.

test_howto fails because it now generates an empty

<!DOCTYPE >

in the DOM tests; I'm not sure what change was causing that behaviour.
Can somebody comment whether the line is well-formed? Then we could
regenerate test_howto - although suppressing the empty DOCTYPE
declaration might be a better solution.
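
One way to answer the well-formedness question is to hand the
declaration to a non-validating parser. Below is a minimal sketch using
the expat binding from the standard library, written for a current
Python 3; is_well_formed is a hypothetical helper name. As far as I can
tell, the XML 1.0 grammar requires a name after '<!DOCTYPE', so the
empty form should be rejected.

```python
from xml.parsers import expat


def is_well_formed(document):
    """Return True if expat accepts the complete document, False otherwise."""
    parser = expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(document, True)
    except expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    for sample in (b"<a/>", b"<!DOCTYPE a>\n<a/>", b"<!DOCTYPE >\n<a/>"):
        print(sample, is_well_formed(sample))
```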

test_minidom fails because 4DOM's dom/__init__.py deviates from
PyXML's; minidom passes string arguments into the DOM exceptions.
Perhaps some clarification/agreement is necessary of how exactly the
specific DOM exceptions work; bear in mind that Python 2, PyXML and
4Suite must offer consistent definitions of these classes.

test_dom fails, again, for writing an empty DOCTYPE.

I'd appreciate it if you could run the testsuite just before releasing
4Suite; if you run into any problems, please let me know. To run the
testsuite, run testxml.py.

Regards,
Martin


From noreply@sourceforge.net  Mon Jan 15 10:27:38 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jan 2001 02:27:38 -0800
Subject: [XML-SIG] [Bug #128827] [0.10.1-beta3] cannot Print a validated document
Message-ID: <E14I6rS-0006gV-00@usw-sf-web3.sourceforge.net>

Bug #128827, was updated on 2001-Jan-15 02:27
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: [0.10.1-beta3] cannot Print a validated document

Details: Attempting to use xml.dom.ext.Print() on a validated document
gives the following traceback:

>>> Print(tree)
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE preferencesTraceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/__init__.py", line 65, in Print
    Printer.PrintWalker(visitor, root).run()
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 353, in run
    return self.step()
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 349, in step
    self.visitor.visit(self.start_node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 159, in visit
    return self.visitDocument(node)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 203, in visitDocument
    node.doctype and self.visitDocumentType(node.doctype)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/Printer.py", line 282, in visitDocumentType
    self.stream.write(' PUBLIC %s %s' % public, system)
TypeError: not enough arguments for format string
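
The TypeError follows from operator precedence: '%' binds more tightly
than the comma, so the format string receives only the first value. A
minimal sketch of the failing and the corrected pattern follows, in
current Python 3; both function names are hypothetical, and this is the
general idiom rather than the content of any particular patch.

```python
def broken_public_clause(public, system):
    """Mimic the failing call: '%' sees only 'public'; ',' then builds a tuple."""
    return ' PUBLIC %s %s' % public, system


def doctype_public_clause(public, system):
    """Pass both values to '%' as a single tuple."""
    return ' PUBLIC %s %s' % (public, system)


if __name__ == "__main__":
    print(doctype_public_clause('"-//EXAMPLE//EN"', '"test.dtd"'))
    try:
        broken_public_clause('"-//EXAMPLE//EN"', '"test.dtd"')
    except TypeError as exc:
        print("TypeError:", exc)
```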


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128827&group_id=6473


From noreply@sourceforge.net  Mon Jan 15 13:53:11 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jan 2001 05:53:11 -0800
Subject: [XML-SIG] [Bug #128851] 4xslt (0.10.0) crash
Message-ID: <E14IA4N-0000pX-00@usw-sf-web2.sourceforge.net>

Bug #128851, was updated on 2001-Jan-15 05:53
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: ornicar
Assigned to : nobody
Summary: 4xslt (0.10.0) crash

Details: 
When running the command "4xslt bugnicar-database.xml bugnicar-insert.xslt"
I get a traceback (see below). The error occurs with both 4Suite 0.10.0 and
0.10.1beta3


Traceback (innermost last):
  File "/usr/bin/4xslt", line 5, in ?
    _4xslt.Run(sys.argv)
  File "/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py", line 94, in Run
    topLevelParams=top_level_params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 130, in runUri
    writer, uri, outputStream)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 202, in runNode
    self.applyTemplates(context, None)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 222, in applyTemplates
    self.applyBuiltins(context, mode)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 235, in applyBuiltins
    self.applyTemplates(context, mode)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Processor.py", line 218, in applyTemplates
    found = sty.applyTemplates(context, mode, self, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py", line 353, in applyTemplates
    patternInfo[TEMPLATE].instantiate(context, processor, params)
  File "/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py", line 115, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/LiteralElement.py", line 91, in instantiate
    context = child.instantiate(context, processor)[0]
  File "/usr/lib/python1.5/site-packages/xml/xslt/AttributeElement.py", line 60, in instantiate
    processor.writers[-1].attribute(name, value, namespace)
  File "/usr/lib/python1.5/site-packages/xml/xslt/XmlWriter.py", line 89, in attribute
    self._currElement.attrs[name] = TranslateCdataAttr(value)
AttributeError: 'None' object has no attribute 'attrs'


Here is the XSLT file "bugnicar-insert.xslt":
----------------------------------------------

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="xml" version="1.0" encoding="ISO-8859-1"
indent="yes"/>
<xsl:strip-space elements='*'/>

<!-- Narval prototype
====================================================== -->

<prototype>
<description lang="fr">Insérer un nouveau bug dans la base de données de
Bugnicar.</description>
<description lang="en">Insert a new bug in Bugnicar's
database.</description>
<input><match>bugnicar-task/bug-report</match></input>
<input><match>bugnicar-database</match></input>
<output><match>bugnicar-database</match></output>
</prototype>

<!-- root
================================================================== -->

<xsl:template match='//bug-report'>
    <xsl:variable name='date' select='@date'/>
    <xsl:variable name='sender' select='@sender'/>
    <xsl:variable name='about' select='@about'/>
    <xsl:variable name='description' select='text()'/>
    <xsl:variable name='newid'>
        <xsl:choose>
            <xsl:when test='@new-id'><xsl:value-of
select='@new-id'/></xsl:when>
            <xsl:otherwise>1</xsl:otherwise>
        </xsl:choose>
    </xsl:variable>
</xsl:template>

<xsl:template match='//bugnicar-database'>
    <bugnicar-database>
        <xsl:copy-of select='*'/>
        <bug id='{$newid}' status='open' about='{$about}'
assigned='nobody'>
            <report date='{$date}' from='{$sender}' about='{$about}'>
                <xsl:value-of select='$description'/>
            </report>
        </bug>

        <xsl:attribute name='new-id'>
            <xsl:value-of select='$newid+1'/>
        </xsl:attribute>
    </bugnicar-database>
</xsl:template>

</xsl:transform>


And here is the XML file bugnicar-database.xml:
-------------------------------------------
<bugnicar-database>
</bugnicar-database>


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128851&group_id=6473


From noreply@sourceforge.net  Mon Jan 15 15:00:54 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jan 2001 07:00:54 -0800
Subject: [XML-SIG] [Patch #103240] patch for bug #128827 - Print() fails on validated documents
Message-ID: <E14IB7u-0006tV-00@usw-sf-web1.sourceforge.net>

Patch #103240 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: afayolle
Assigned to : nobody
Summary: patch for bug #128827 - Print() fails on validated documents

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103240&group_id=6473


From noreply@sourceforge.net  Mon Jan 15 15:14:10 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jan 2001 07:14:10 -0800
Subject: [XML-SIG] [Bug #128860] [0.10.1beta3] Sax2 parser ignores keepAllWs option
Message-ID: <E14IBKk-0008UI-00@usw-sf-web3.sourceforge.net>

Bug #128860, was updated on 2001-Jan-15 07:13
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: [0.10.1beta3] Sax2 parser ignores keepAllWs option

Details: When using Sax.FromXmlWhatever with the validate argument set to
TRUE and keepAllWs to FALSE, whitespace at the beginning and ending of text
nodes is not ignored.

4Suite 0.10.0 had the right behaviour. 

Comments in source code seem to indicate that this may be related to
xmlproc (Sax2.py line 156)
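
The behaviour expected from keepAllWs=FALSE can be sketched as a
post-processing pass over the DOM. This is only an illustration, assuming
Python's standard-library xml.dom.minidom rather than the 4Suite Sax2
reader; the helper name strip_whitespace_nodes is hypothetical.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Drop whitespace-only text nodes and trim the remaining ones."""
    # Iterate over a copy, because children are removed while walking.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            trimmed = child.data.strip()
            if trimmed:
                child.data = trimmed
            else:
                node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


doc = minidom.parseString("<a>\n  <b>  hello  </b>\n</a>")
strip_whitespace_nodes(doc)
result = doc.documentElement.toxml()
```

Trimming after the fact is a workaround, not a fix for the reader itself.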

Alexandre Fayolle

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128860&group_id=6473


From uche.ogbuji@fourthought.com  Mon Jan 15 18:28:00 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 15 Jan 2001 11:28:00 -0700
Subject: [XML-SIG] 4Suite 0.10.1beta4
Message-ID: <3A634130.258A62FF@fourthought.com>

Thanks so much to all those who reported bugs in the past betas.  We
have addressed most of these.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.1beta4.tar.gz

This fixes

*  Problems accessing minidom and pulldom (including a distutils
work-around for Alexandre's problem with permissions to write to the
source dir)
*  Problems with DOM HTML, XHTML, Sax2 and printers
*  XSLT bugs
*  etc.

There are still a couple of bugs we want to address before packaging so
expect a release candidate in a few hours.  We'll probably begin final
packaging late afternoon.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From noreply@sourceforge.net  Mon Jan 15 20:59:54 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jan 2001 12:59:54 -0800
Subject: [XML-SIG] [Bug #128924] xmlproc not generating ignorableWhitespace events
Message-ID: <E14IGjK-0002Cp-00@usw-sf-web3.sourceforge.net>

Bug #128924, was updated on 2001-Jan-15 12:59
Here is a current snapshot of the bug.

Project: Python/XML
Category: xmlproc
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: uche
Assigned to : nobody
Summary: xmlproc not generating ignorableWhitespace events

Details: Trying the following using 4DOM

----------------------------------%--------------------------------
import cStringIO
from xml.dom.ext import Print, PrettyPrint
from xml.dom.ext.reader import Sax, Sax2
from xml.sax import sax2exts, saxexts


source_1 = """\
<?xml version = "1.0"?>
<!DOCTYPE ADDRBOOK [
  <!ELEMENT ADDRBOOK (ENTRY*)>
  <!ELEMENT ENTRY (NAME, ADDRESS, PHONENUM*, EMAIL)>
  <!ATTLIST ENTRY
    ID ID #REQUIRED
  >
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT ADDRESS (#PCDATA)>
  <!ELEMENT PHONENUM (#PCDATA)>
  <!ATTLIST PHONENUM
    DESC CDATA #REQUIRED
  >
  <!ELEMENT EMAIL (#PCDATA)>
]>
<ADDRBOOK>
    <ENTRY ID="pa">
        <NAME>Pieter Aaron</NAME>
        <ADDRESS>404 Error Way</ADDRESS>
        <PHONENUM DESC="Work">404-555-1234</PHONENUM>
        <PHONENUM DESC="Fax">404-555-4321</PHONENUM>
        <PHONENUM DESC="Pager">404-555-5555</PHONENUM>
        <EMAIL>pieter.aaron@inter.net</EMAIL>
    </ENTRY>
    <ENTRY ID="en">
        <NAME>Emeka Ndubuisi</NAME>
        <ADDRESS>42 Spam Blvd</ADDRESS>
        <PHONENUM DESC="Work">767-555-7676</PHONENUM>
        <PHONENUM DESC="Fax">767-555-7642</PHONENUM>
        <PHONENUM DESC="Pager">800-SKY-PAGEx767676</PHONENUM>
        <EMAIL>endubuisi@spamtron.com</EMAIL>
    </ENTRY>
    <ENTRY ID="vz">
        <NAME>Vasia Zhugenev</NAME>
        <ADDRESS>2000 Disaster Plaza</ADDRESS>
        <PHONENUM DESC="Work">000-987-6543</PHONENUM>
        <PHONENUM DESC="Cell">000-000-0000</PHONENUM>
        <EMAIL>vxz@magog.ru</EMAIL>
    </ENTRY>
</ADDRBOOK>
"""

p = saxexts.make_parser("xml.sax.drivers.drv_xmlproc")
reader = Sax.Reader(parser=p, keepAllWs=0)
doc = reader.fromString(source_1)
stream = cStringIO.StringIO()
Print(doc, stream=stream)
result = stream.getvalue()

----------------------------------%--------------------------------

No ignorableWhitespace events are generated.

I have checked that drv_xmlproc does not seem to be getting the
handle_ignorable_data events.
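
One way to check where a parser delivers whitespace is a handler that
records both callbacks. The sketch below assumes the standard-library
xml.sax module with its default (non-validating) expat driver, not
drv_xmlproc; the class name WhitespaceRecorder and the shortened DTD are
made up for the illustration. A non-validating parser may well report
such whitespace through characters() instead.

```python
import xml.sax


class WhitespaceRecorder(xml.sax.ContentHandler):
    """Record whitespace reported via either SAX callback."""

    def __init__(self):
        super().__init__()
        self.ignorable = []              # from ignorableWhitespace()
        self.whitespace_only_chars = []  # whitespace-only characters() chunks

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)

    def characters(self, content):
        if not content.strip():
            self.whitespace_only_chars.append(content)


SOURCE = b"""<?xml version="1.0"?>
<!DOCTYPE ADDRBOOK [
  <!ELEMENT ADDRBOOK (ENTRY*)>
  <!ELEMENT ENTRY (#PCDATA)>
]>
<ADDRBOOK>
    <ENTRY>Pieter Aaron</ENTRY>
</ADDRBOOK>
"""

handler = WhitespaceRecorder()
xml.sax.parseString(SOURCE, handler)
print("ignorableWhitespace events:", len(handler.ignorable))
print("whitespace-only characters events:", len(handler.whitespace_only_chars))
```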


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=128924&group_id=6473


From uche.ogbuji@fourthought.com  Mon Jan 15 21:02:10 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 15 Jan 2001 14:02:10 -0700
Subject: [XML-SIG] xmlproc bug, I think
Message-ID: <200101152102.OAA11363@localhost.localdomain>

Relevant to a problem Alexandre Fayolle is having

http://sourceforge.net/bugs/?func=detailbug&bug_id=128924&group_id=6473

I might have time to look into this after the 4Suite release today, but any 
help is appreciated.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From matt@virtualspectator.com  Mon Jan 15 21:45:29 2001
From: matt@virtualspectator.com (matt)
Date: Tue, 16 Jan 2001 10:45:29 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011111124501.00909@localhost.localdomain> <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de>
Message-ID: <01011610521502.00889@localhost.localdomain>

I'm using PyXML 0.6.3 and python 1.5.2.  It seems that CDATA sections are still
not handled correctly.  The following code demonstrates.  


test_xml = """<?xml version='1.0' encoding='ISO-8859-1'?>
<item_20001204_035952>
  <one>
      <caption>a test caption</caption>
  </one>
  <![CDATA[some test data]]>
</item_20001204_035952>"""
 
 
from xml.dom import ext
from xml.dom.ext.reader import Sax2
from xml.sax import saxexts
 
a_parser = saxexts.XMLParserFactory.make_parser('xml.sax.drivers.drv_pyexpat')
 
doc = Sax2.FromXml(test_xml,None,parser=a_parser, validate=0)
ext.PrettyPrint(doc,encoding='ISO-8859-1') 


from this I get the CDATA element returning as a text node
<Text Node at 818a8b0: data = '\0xa  some test data\0xa'> 


regards
Matt


From martin@mira.cs.tu-berlin.de  Tue Jan 16 00:21:05 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 16 Jan 2001 01:21:05 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011610521502.00889@localhost.localdomain> (message from matt
 on Tue, 16 Jan 2001 10:45:29 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011111124501.00909@localhost.localdomain> <200101111129.f0BBTxr00962@mira.informatik.hu-berlin.de> <01011610521502.00889@localhost.localdomain>
Message-ID: <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de>

> I'm using PyXML 0.6.3 and python 1.5.2.  It seems that CDATA
> sections are still not handled correctly.  The following code
> demonstrates.

Can you elaborate why this is incorrect?

Regards,
Martin


From matt@virtualspectator.com  Tue Jan 16 02:25:15 2001
From: matt@virtualspectator.com (matt)
Date: Tue, 16 Jan 2001 15:25:15 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011610521502.00889@localhost.localdomain> <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de>
Message-ID: <01011615263906.00889@localhost.localdomain>

Sorry, the result of the ext.PrettyPrint is :


test_xml = """<?xml version='1.0' encoding='ISO-8859-1'?>
<item_20001204_035952>
<one>
<caption>a test caption</caption>
</one>
some test data
</item_20001204_035952>


the CDATA escaping has disappeared


On Tue, 16 Jan 2001, Martin v. Loewis wrote:
> > I'm using PyXML 0.6.3 and python 1.5.2.  It seems that CDATA
> > sections are still not handled correctly.  The following code
> > demonstrates.
> 
> Can you elaborate why this is incorrect?
> 
> Regards,
> Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 16 07:24:22 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 16 Jan 2001 08:24:22 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011615263906.00889@localhost.localdomain> (message from matt
 on Tue, 16 Jan 2001 15:25:15 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011610521502.00889@localhost.localdomain> <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de> <01011615263906.00889@localhost.localdomain>
Message-ID: <200101160724.f0G7OMc00811@mira.informatik.hu-berlin.de>

> Sorry, the result of the ext.PrettyPrint is :
> 
> 
> test_xml = """<?xml version='1.0' encoding='ISO-8859-1'?>
> <item_20001204_035952>
> <one>
> <caption>a test caption</caption>
> </one>
> some test data
> </item_20001204_035952>
> 
> 
> the CDATA escaping has disappeared

Yes, and why is this incorrect? The two documents are equal.

Regards,
Martin


From ndw@nwalsh.com  Tue Jan 16 07:41:25 2001
From: ndw@nwalsh.com (Norman Walsh)
Date: 16 Jan 2001 14:41:25 +0700
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: matt's message of "Tue, 16 Jan 2001 15:25:15 +1300"
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>
 <01011610521502.00889@localhost.localdomain>
 <200101160021.f0G0L5J01243@mira.informatik.hu-berlin.de>
 <01011615263906.00889@localhost.localdomain>
Message-ID: <877l3w3pwa.fsf@nwalsh.com>

<delurk><![CDATA[
/ matt <matt@virtualspectator.com> was heard to say:
| Sorry, the result of the ext.PrettyPrint is :
[...]
| </one>
| some test data
| </item_20001204_035952>
| 
| the CDATA escaping has disappeared

IMHO, that's the behavior that you should expect. CDATA sections are
an escaping mechanism, but a serializer is free to choose an alternate
escaping mechanism if it chooses.

Note also that CDATA escaping and document encoding are related.  It's
possible to construct documents (if you combine several input sources)
that *cannot* preserve the CDATA escaping and the desired encoding.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Life is an irritation--Tucker Case
http://nwalsh.com/            | (Christopher Moore)
]]></delurk>
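
The link between CDATA and encoding can be sketched as follows, assuming
the standard-library xml.etree.ElementTree parser and a made-up sample
string. A character missing from the target encoding can be written as a
character reference in ordinary text, but inside a CDATA section the same
reference is just literal characters.

```python
import xml.etree.ElementTree as ET

text = "price: 10 \u20ac"  # the euro sign is not part of ISO-8859-1

try:
    text.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# Outside CDATA the serializer can fall back to a character reference.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace").decode("iso-8859-1")

# The reference is resolved in ordinary text but not inside a CDATA section.
as_text = ET.fromstring("<a>" + escaped + "</a>").text
as_cdata = ET.fromstring("<a><![CDATA[" + escaped + "]]></a>").text
```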


From matt@virtualspectator.com  Tue Jan 16 09:34:23 2001
From: matt@virtualspectator.com (matt)
Date: Tue, 16 Jan 2001 22:34:23 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <877l3w3pwa.fsf@nwalsh.com>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011615263906.00889@localhost.localdomain> <877l3w3pwa.fsf@nwalsh.com>
Message-ID: <01011622394806.00912@localhost.localdomain>

I was following the logic that ext.PrettyPrint can write to a stream, and that
it is useful to pick up a document that has escaped data(which may be xml
itself), add some nodes to it, and save it back to the stream expecting the
escaped sections to be still present as escaped sections.  So what I understand
now is that I should either use a serializer that keeps these, or write a DTD
and use that to write my xml back out to file in a more proper way.  Which I
guess is my next question, what is the cleanest method in PyXML for reading in
such a file with CDATA sections, and getting them back out when rewriting?

regards
Matt
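
For comparison, here is a minimal round trip that keeps a CDATA section
while adding a node. It assumes Python's standard-library
xml.dom.minidom, whose builder creates CDATASection nodes; the PyXML
0.6.3 reader discussed in this thread may behave differently. The
element name "note" is invented for the example.

```python
from xml.dom import minidom, Node

SOURCE = """<item>
  <one><caption>a test caption</caption></one>
  <![CDATA[some <test> data]]>
</item>"""

doc = minidom.parseString(SOURCE)

# Add a new fragment without touching the existing content.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("added later"))
doc.documentElement.appendChild(note)

output = doc.documentElement.toxml()
cdata_nodes = [n for n in doc.documentElement.childNodes
               if n.nodeType == Node.CDATA_SECTION_NODE]
print(output)
```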


On Tue, 16 Jan 2001, Norman Walsh wrote:
> <delurk><![CDATA[
> / matt <matt@virtualspectator.com> was heard to say:
> | Sorry, the result of the ext.PrettyPrint is :
> [...]
> | </one>
> | some test data
> | </item_20001204_035952>
> | 
> | the CDATA escaping has disappeared
> 
> IMHO, that's the behavior that you should expect. CDATA sections are
> an escaping mechanism, but a serializer is free to choose an alternate
> escaping mechanism if it chooses.
> 
> Note also that CDATA escaping and document encoding are related.  It's
> possible to construct documents (if you combine several input sources)
> that *cannot* preserve the CDATA escaping and the desired encoding.
> 
>                                         Be seeing you,
>                                           norm
> 
> -- 
> Norman Walsh <ndw@nwalsh.com> | Life is an irritation--Tucker Case
> http://nwalsh.com/            | (Christopher Moore)
> ]]></delurk>
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
-- 


From Wolfgang.Schoeberl@web.de  Tue Jan 16 13:34:26 2001
From: Wolfgang.Schoeberl@web.de (Wolfgang Schoeberl)
Date: Tue, 16 Jan 2001 14:34:26 +0100
Subject: [XML-SIG] Problem with 'Bad Request'
Message-ID: <200101161334.f0GDYQh11472@mailgate4.cinetic.de>

Hi,

this is not a specific xml-Problem, but I hope you will help me though. I've
got a problem with catching errors. More specifically, I would like to catch
'Bad Request', which won't work because of the space. Is it a bug in Python?
Does anybody know a neat trick?

Thanks a lot,
Wolfgang

Here's some more code to describe my problem:

def test1():
    try:
        raise "NoProblem"
    except "NoProblem":
        print "Test1: NoProblem catched" # work fine

def test2():
    try:
        raise "No Problem with blank"
    except "No Problem with blank":
        print "Test2: No Problem with blank catched" #work fine

def raiseNoProblem():
    raise "NoProblem"

def raiseProblemWithBlank():
    raise "Problem with blank"

def test3():
    try:
        raiseNoProblem()
    except "NoProblem":
        print "Test3: NoProblem catched" #work fine

def test4():
    try:
        raiseProblemWithBlank()
    except "Problem with blank":
        print "Test4: No Problem with blank" # does not work :-(
    except "Problem":
        print "'Test4: Problem' catched it"
    except "Problem ":
        print "'Test4: Problem_' catched it"
    except:
        print "Test4: 'Problem with blank' not catched - except catched it"


test1()
test2()
test3()
test4()


From uche.ogbuji@fourthought.com  Tue Jan 16 14:33:35 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 16 Jan 2001 07:33:35 -0700
Subject: [XML-SIG] ANN: 4Suite 0.10.1
Message-ID: <200101161433.HAA25167@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                             4Suite 0.10.1
                      ---------------------------
   Open source tools for standards-based XML, DOM, XPath, XSLT, RDF
       XPointer, XLink and object-database development in Python

                           http://4Suite.org


4Suite is a collection of Python tools for XML processing and object
database management.  An integrated packaging of several formerly
separately-distributed components: 4DOM, 4XPath and 4XSLT, 4RDF, 4ODS,
4XPointer, 4XLink and DbDOM.

News
----

    * PyXML (0.6.3 + fixes) is now built in
    * Implement XInclude
    * DbDom: Implement cloneNode and document fragments
    * XSLT: More thorough test harness
    * XSLT: Support source docs from stdin on 4xslt command line
    * XSLT: Implement unparsed-entity-uri
    * XSLT: Restricted HTML writer output allowed as security tool
    * XPath: Add extension funcs: evaluate,distinct,split,range,if,find
    * DOM: Update to 2000-11-13 level 2 recommendation
    * DOM: Proper SAX2 support for reader
    * DOM: Add native sgmlop reader
    * RDF: Add removeAll to Model
    * Documentation updates and consolidation
    * Domlette reader option to force 8-bit DOM strings even in Python 2.0
    * Organize Reader and URI handler APIs to allow easier customizations
    * Many Python 1.5.2 and 2.0 compatibility fixes
    * Many misc optimizations
    * Many misc bug-fixes
    * 4Suite.org revamped: much heavier use of 4Suite Server features


More info and Obtaining 4Suite
------------------------------

Please see

        http://4Suite.org

From where you can download source, Windows and Linux binaries.

4Suite is distributed under a license similar to that of the
Apache Web Server.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +01 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Jan 16 14:33:59 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 16 Jan 2001 07:33:59 -0700
Subject: [XML-SIG] ANN: 4Suite Server 0.10.1
Message-ID: <200101161433.HAA25292@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                          4Suite Server 0.10.1
                      ----------------------------
         An open source XML data server based on open standards
               implemented using 4Suite and other tools


                  http://FourThought.com/4SuiteServer
                           http://4Suite.org


News
----

  *  Windows support
  *  Smoother installation and configuration
  *  Comprehensive installation HOWTOs
  *  HTTP server support
  *  Raw file support: can serve arbitrary files given mime type
  *  Very experimental SOAP support
  *  Python 2.0 support
  *  More demos
  *  Many optimizations and bug fixes
  * 4Suite.org revamped: much heavier use of 4Suite Server features


4Suite Server is a platform for XML processing.  It features an XML data
repository, a rules-based engine, and XSLT transforms, XPath and
RDF-based indexing and query, XLink resolution and many other XML
services.  It also supports related services such as distributed
transactions and access control lists.  It supports remote,
cross-platform and cross-language access through CORBA, HTTP and other
request protocols to be added shortly.

It's not meant to be a full-blown application server.  It provides
highly-specialized services for XML processing that can be used with
other application servers.

The software is open-source and free to download.  Priority support
and customization is available from Fourthought, Inc.  For more
information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900

The 4Suite Server home page is

http://FourThought.com/4SuiteServer

From where you can download the software itself or an executive summary
thereof, read usage scenarios and find other information.


From martin@mira.cs.tu-berlin.de  Wed Jan 17 00:00:43 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 17 Jan 2001 01:00:43 +0100
Subject: [XML-SIG] Problem with 'Bad Request'
In-Reply-To: <200101161334.f0GDYQh11472@mailgate4.cinetic.de>
 (Wolfgang.Schoeberl@web.de)
References: <200101161334.f0GDYQh11472@mailgate4.cinetic.de>
Message-ID: <200101170000.f0H00hC00944@mira.informatik.hu-berlin.de>

> this is not a specific xml-Problem, but I hope you will help me
> though. I've got a problem with catching errors. More specifically, I
> would like to catch 'Bad Request', which won't work because of the
> space. Is it a bug in Python? Does anybody know a neat trick?

It is not a bug in Python; please look at the description of the
intern builtin to see why that happens (the raise/try specification
presumably also requires identical, not merely equal, strings).
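
The difference between identical and equal strings can be sketched
without string exceptions, which modern Python no longer supports. The
variable names are invented; the sketch assumes CPython, which usually
interns identifier-like literals such as "NoProblem" automatically but
typically not literals containing spaces.

```python
import sys

literal = "Problem with blank"
# Built at run time, so this is a separate string object with equal content.
built = " ".join(["Problem", "with", "blank"])

equal = literal == built        # compares contents
identical = literal is built    # compares object identity

# Interning maps equal strings onto one shared object.
interned_identical = sys.intern(literal) is sys.intern(built)

print(equal, identical, interned_identical)
```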

Anyway, the neat trick is to write

problemWithBlank = "Problem with blank"

def raiseProblemWithBlank():
    raise problemWithBlank

def test4():
    try:
        raiseProblemWithBlank()
    except problemWithBlank:
        print "Test4: No Problem with blank" # now works

test4()

Please note that string exceptions are deprecated; the Pythonic way to
write this code is

class ProblemWithBlank(Exception):
    pass

def raiseProblemWithBlank():
    raise ProblemWithBlank

def test4():
    try:
        raiseProblemWithBlank()
    except ProblemWithBlank:
        print "Test4: No Problem with blank" # now works

test4()

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 16 23:54:14 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 17 Jan 2001 00:54:14 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011622394806.00912@localhost.localdomain> (message from matt
 on Tue, 16 Jan 2001 22:34:23 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011615263906.00889@localhost.localdomain> <877l3w3pwa.fsf@nwalsh.com> <01011622394806.00912@localhost.localdomain>
Message-ID: <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de>

> I was following the logic that ext.PrettyPrint can write to a stream

That assumption is good, it indeed does.

> and that it is useful to pick up a document that has escaped
> data(which may be xml itself), add some nodes to it, and save it
> back to the stream expecting the escaped sections to be still
> present as escaped sections.

That logic is flawed (or, there is no logic in it - that's just an
assertion). Why is that useful? I.e. why would anybody who'll read the
resulting document need to know where exactly the CDATA sections were
located in the original document?

> So what I understand now is that I should either use a serializer
> that keeps these, or write a DTD and use that to write my xml back
> out to file in a more proper way.

I think your understanding is incorrect. It is not possible to write a
serializer that produces the original input by just looking at the DOM
tree, and having a DTD does not help at all, either.

> Which I guess is my next question, what is the cleanest method in
> PyXML for reading in such a file with CDATA sections, and getting
> them back out when rewriting?

The cleanest way is to accept that it is not possible to write the
document back so that it equals the original document on a
byte-by-byte basis.

It is possible to write the document back so that the content is the
same as in the original document; the cleanest way for that is to use
ext.PrettyPrint.

Regards,
Martin

P.S. What you *can* get back is CDATA sections for every text element,
by properly inheriting from the PrettyPrinter. However, this will give
you CDATA sections in places where the original document had none.
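
The effect described in the P.S. can be approximated by rewriting the
tree before printing instead of subclassing the printer. This sketch
assumes the standard-library xml.dom.minidom, not the PyXML
PrettyPrinter; the helper name text_to_cdata is hypothetical.

```python
from xml.dom import minidom, Node


def text_to_cdata(node, doc):
    """Replace every non-blank text node below node with a CDATA section."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            # "]]>" cannot appear inside a CDATA section, so leave such text alone.
            if child.data.strip() and "]]>" not in child.data:
                node.replaceChild(doc.createCDATASection(child.data), child)
        elif child.nodeType == Node.ELEMENT_NODE:
            text_to_cdata(child, doc)


doc = minidom.parseString("<a><b>x &lt; y</b><c>plain</c></a>")
text_to_cdata(doc.documentElement, doc)
output = doc.documentElement.toxml()
print(output)
```

Every text node ends up wrapped, including those that were plain text in
the input.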


From matt@virtualspectator.com  Wed Jan 17 00:42:17 2001
From: matt@virtualspectator.com (matt)
Date: Wed, 17 Jan 2001 13:42:17 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011622394806.00912@localhost.localdomain> <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de>
Message-ID: <0101171357420F.00889@localhost.localdomain>

On Wed, 17 Jan 2001, Martin v. Loewis wrote:
> > I was following the logic that ext.PrettyPrint can write to a stream
> 
> That assumption is good, it indeed does.
> 
> > and that it is useful to pick up a document that has escaped
> > data(which may be xml itself), add some nodes to it, and save it
> > back to the stream expecting the escaped sections to be still
> > present as escaped sections.
> 
> That logic is flawed (or, there is no logic in it - that's just an
> assertion). Why is that useful? I.e. why would anybody who'll read the
> resulting document need to know where exactly the CDATA sections were
> located in the original document?

umm, I actually don't care where the CDATA sections are in the document.  I
thought the most obvious scenario that I was alluding to is that one reads in
an xml document from a file.  Since one has NO interest in parsing the content,
rendering, or interpreting it, but does have an interest in locating a
particular node and adding a new fragment to it, then saving the modified
document, via ext.PrettyPrint (which I am using), to file again, then one
obviously does not want CDATA markers to be removed, because 1) they may not
have written the first document, and 2) they are not trying to interpret it;
this will be done at some later stage, in which case one would use an event
handler xml parser.  Considering DOM is useful for document assembly, I don't
see any flaw in this logic.  You missed the point entirely in that I don't care
where they are in the document.


> 
> > So what I understand now is that I should either use a serializer
> > that keeps these, or write a DTD and use that to write my xml back
> > out to file in a more proper way.
> 
> I think your understanding is incorrect. It is not possible to write a
> serializer that produces the original input by just looking at the DOM
> tree, and having a DTD does not help at all, either.

again you are on the wrong track ... I don't care about order .......


> 
> > Which I guess is my next question, what is the cleanest method in
> > PyXML for reading in such a file with CDATA sections, and getting
> > them back out when rewriting?
> 
> The cleanest way is to accept that it is not possible to write the
> document back so that it equals the original document on a
> byte-by-byte basis.

maybe the following will explain why it is useful ..... which is the hack I use
to get CDATA back into the file again.  Presumably you would think that if you
opened an xml file into a DOM tree, then saved it again, then it would still be
the same "kind" of document, i.e. CDATA nodes would STILL be CDATA nodes.

Yes I assume 1) the node name is unique and 2) that its first child is a
text node ......

def convertTextNodeToCDataNodeByName(doc,name):
    node_list = doc.getElementsByTagNameNS('',name)
    text_node = node_list[0].firstChild
    text_data = retPrettyPrint(text_node)
    new_cdata_node = makeCDataSection(doc,text_data)
    text_node.parentNode.replaceChild(new_cdata_node,text_node)
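
The function above relies on helpers (retPrettyPrint, makeCDataSection)
that are not shown. A self-contained variant might look like the sketch
below, which assumes the standard-library xml.dom.minidom in place of the
PyXML DOM and, unlike the original, converts every matching element
rather than only the first. The function and element names are
illustrative.

```python
from xml.dom import minidom, Node


def convert_text_to_cdata_by_name(doc, name):
    """Turn the leading text node of each <name> element into a CDATA section."""
    converted = 0
    for element in doc.getElementsByTagName(name):
        first = element.firstChild
        # Skip elements that are empty or do not start with plain text.
        if first is not None and first.nodeType == Node.TEXT_NODE:
            element.replaceChild(doc.createCDATASection(first.data), first)
            converted += 1
    return converted


doc = minidom.parseString(
    "<root><payload>a &amp; b</payload><other>text</other></root>")
count = convert_text_to_cdata_by_name(doc, "payload")
output = doc.documentElement.toxml()
print(count, output)
```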
       

> 
> It is possible to write the document back so that the content is the
> same as in the original document; the cleanest way for that is to use
> ext.PrettyPrint.
> 
> Regards,
> Martin
> 
> P.S. What you *can* get back is CDATA sections for every text element,
> by properly inheriting from the PrettyPrinter. However, this will give
> you CDATA sections in places where the original document had none.
-- 

regards
Matt


From martin@mira.cs.tu-berlin.de  Wed Jan 17 07:40:53 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 17 Jan 2001 08:40:53 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <0101171357420F.00889@localhost.localdomain> (message from matt
 on Wed, 17 Jan 2001 13:42:17 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011622394806.00912@localhost.localdomain> <200101162354.f0GNsEa00915@mira.informatik.hu-berlin.de> <0101171357420F.00889@localhost.localdomain>
Message-ID: <200101170740.f0H7era01202@mira.informatik.hu-berlin.de>

> Since one has NO interest in parsing the content, rendering, or
> interpreting it, but does have an interest in locating a particular
> node and adding a new fragment to it, then saving the modified
> document, via ext.PrettyPrint(which I am using), to file again,

I understand you are not interested in parsing the document; if you
build a DOM tree, parsing of the document will happen as a side
effect. You cannot avoid this: this is the only way to get a DOM tree
from a document. So while you are not interested in the parsing, you
should accept that it is done.

> then one obviously does not want CDATA markers to be removed,
> because, 1) they may have not written the first document, and 2)
> they are not trying to interpret it,

Who is "they" here? The CDATA markers? or the users of your tool?

So somebody has not written the document, and that same
person/entity/whatever is not trying to interpret it. Why does it
follow that this person/entity does not want the CDATA markers to be
removed? If that person does not even look at the document, why is
there any harm done by removing the CDATA markers. They have *no*
meaning in the document.

> You missed the point entirely in that I don't care where they are in
> the document.

I assume "they" is the CDATA markers, here. If you don't care where
they are in the document, why is it a problem if there is no CDATA
marker in the output of PrettyPrint?

> maybe the following will explain why it is useful ..... which is the
> hack I use to get CDATA back into the file again.  Presumably you
> would think that if you opened an xml file into a DOM tree, then
> saved it again, then it would still be the same "kind" of document,

That I would think. It should still be the same "kind" of document,
i.e. have the same elements, the elements should have the same
attributes, and elements containing text should still contain the same
text.

> i.e. CDATA nodes would STILL be CDATA nodes.

No, I would not think that. Changing CDATA nodes to text does not
change the document; it is still the same one. Replacing CDATA
fragments with text is the same kind of transformation as replacing
&lt; with &#60; - this does not change the document.
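
This equivalence is easy to check with any conforming parser. The sketch
below assumes the standard-library xml.etree.ElementTree and three
made-up sample documents that differ only in how the "<" is escaped.

```python
import xml.etree.ElementTree as ET

variants = [
    "<a><![CDATA[x < y]]></a>",  # CDATA section
    "<a>x &lt; y</a>",           # predefined entity reference
    "<a>x &#60; y</a>",          # numeric character reference
]

# The parser reports the same character data for all three spellings.
texts = [ET.fromstring(v).text for v in variants]
print(texts)
```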

> Yes I assume 1) the node name is unique and 2) that its first child is a
> text node ......
> 
> def convertTextNodeToCDataNodeByName(doc,name):
>     node_list = doc.getElementsByTagNameNS('',name)
>     text_node = node_list[0].firstChild
>     text_data = retPrettyPrint(text_node)
>     new_cdata_node = makeCDataSection(doc,text_data)
>     text_node.parentNode.replaceChild(new_cdata_node,text_node)

That means you know in advance that you only have a single CDATA
fragment in the original document, you want to produce one in the
output in the same location (i.e. inside the same element as it was in
the original input).

What if there is more than one CDATA section in the original document?
What if there was none?

Regards,
Martin


From matt@virtualspectator.com  Wed Jan 17 09:14:49 2001
From: matt@virtualspectator.com (matt)
Date: Wed, 17 Jan 2001 22:14:49 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <200101170740.f0H7era01202@mira.informatik.hu-berlin.de>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <0101171357420F.00889@localhost.localdomain> <200101170740.f0H7era01202@mira.informatik.hu-berlin.de>
Message-ID: <01011722360200.00860@localhost.localdomain>

ok, so now I am getting somewhere in understanding this .... more comments below

On Wed, 17 Jan 2001, Martin v. Loewis wrote:
> > Since one has NO interest in parsing the content, rendering, or
> > interpreting it, but does have an interest in locating a particular
> > node and adding a new fragment to it, then saving the modifed
> > document, via ext.PrettyPrint(which I am using), to file again,
> 
> I understand you are not interested in parsing the document; if you
> build a DOM tree, parsing of the document will happen as a side
> effect. You cannot avoid this: this is the only way to get a DOM tree
> from a document. So while you are not interested in the parsing, you
> should accept that it is done.

This is where I see the extra step that is necessary, so tell me if I am on the
right track.  A CDATA section that contains xml will be translated by a parser
into a text node that is still valid by virtue of the character references that
it places in place of characters such as "<" ... i.e. &lt;, and that for
example if they wrote some naff xml in an input, e.g. "&&<name><<", this, if
escaped in the original document by CDATA, would be translated into a text node
with "&amp;&amp;&lt;name>&lt;&lt;".  Now if that CDATA was supposed to be xml
as well, but was necessarily hidden for a while so that validation could be
performed further along a processing chain, then I also need to write a
processor to replace the character references, in which case I could possibly
define <!ENTITY> s for such a translation, so that the parser would see <
instead of &lt;


> 
> > then one obviously does not want CDATA markers to be removed,
> > because, 1) they may have not written the first document, and 2)
> > they are not trying to interpret it,
> 
> Who is "they" here? The CDATA markers? or the users of your tool?
> 

many people who pick up a document and modify it and put it back.

> So somebody has not written the document, and that same
> person/entity/whatever is not trying to interpret it. Why does it
> follow that this person/entity does not want the CDATA markers to be
> removed? If that person does not even look at the document, why is
> there any harm done by removing the CDATA markers. They have *no*
> meaning in the document.

Just the above, one wants to take the CDATA at some point and treat it as
either an xml document on its own, or just part of the current xml document. 
The CDATA simply being used to escape sections that could possibly break
validation at earlier points, eg on a server, where there may be no chance of
handling bad xml sections, but that at a later point, eg some client
application, then an exception can be handled nicely, in which case the CDATA
section can now be safely interpreted.  This is where I see I need reverse
translation, and simply cannot directly parse what used to be a CDATA section.


> 
> > You missed the point entirely in that I don't care where they are in
> > the document.
> 
> I assume "they" is the CDATA markers, here. If you don't care where
> they are in the document, why is it a problem if there is no CDATA
> marker in the output of PrettyPrint?

as above


> 
> > maybe the following will explain why it is useful ..... which is the
> > hack I use to get CDATA back into the file again.  Presumably you
> > would think that if you opened an xml file into a DOM tree, then
> > saved it again, then it would still be the same "kind" of document,
> 
> That I would think. It should still be the same "kind" of document,
> i.e. have the same elements, the elements should have the same
> attributes, and elements containing text should still contain the same
> text.
> 
> > i.e. CDATA nodes would STILL be CDATA nodes.
> 
> No, I would not think that. Changing CDATA nodes to text does not
> change the document; it is still the same one. Replacing CDATA
> fragments with text is the same kind of transformation as replacing
> &lt; with &#60; - this does not change the document.
> 
> > Yes I assume 1) the node name is unique and 2) that it's first child is a
> > text node ......
> > 
> > def convertTextNodeToCDataNodeByName(doc,name):
> >     node_list = doc.getElementsByTagNameNS('',name)
> >     text_node = node_list[0].firstChild
> >     text_data = retPrettyPrint(text_node)
> >     new_cdata_node = makeCDataSection(doc,text_data)
> >     text_node.parentNode.replaceChild(new_cdata_node,text_node)
> 
> That means you know in advance that you only have a single CDATA
> fragment in the original document, you want to produce one in the
> output in the same location (i.e. inside the same element as it was in
> the original input).
> 
> What if there is more than one CDATA section in the original document?
> What if there was none?
> 

I already do checking for it being a text node, and the node names that are
searched for are guaranteed to be unique and to be a single child node.


> Regards,
> Martin
-- 


From martin@mira.cs.tu-berlin.de  Wed Jan 17 17:47:19 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 17 Jan 2001 18:47:19 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011722360200.00860@localhost.localdomain> (message from matt
 on Wed, 17 Jan 2001 22:14:49 +1300)
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <0101171357420F.00889@localhost.localdomain> <200101170740.f0H7era01202@mira.informatik.hu-berlin.de> <01011722360200.00860@localhost.localdomain>
Message-ID: <200101171747.f0HHlJU00867@mira.informatik.hu-berlin.de>

> A CDATA section that contains xml 

The entire document is xml; you probably mean

"A CDATA section that contains markup delimiters"

here. A CDATA section, by definition, contains only characters. It
never contains markup.

> will be translated by a parser into a text node that is still valid
> by virtue of the character references that it places in place of
> characters such as "<" ... i.e. &lt;, and that for example if they
> wrote some naff xml in an input , eg "&&<name><<" this, if escaped
> in the original document by CDATA, would be translated into a text
> node with "&amp;&amp;&lt;name>&lt;&lt;".

Not exactly. Character entities will be replaced with their true
characters in the DOM tree, i.e. the CDATA section will appear in the
DOM tree as a text node with its contents; a text containing "&lt;" in
the input will be translated to "<" when creating the DOM tree.

It is the *output* function that does any necessary escaping. So when
the CDATA section contained a literal "<", then, on output, the pretty
printer has the option of generating &lt; or &#60; or a CDATA section.
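
A minimal sketch of this split between tree and output, assuming Python's
standard xml.dom.minidom rather than the PrettyPrint function discussed in
this thread (the element name "body" is only an illustration):

```python
from xml.dom import minidom

# Parse a document whose text uses an entity reference for "<".
doc = minidom.parseString("<body>a &lt; b</body>")
text_node = doc.documentElement.firstChild

# In the DOM tree the text node holds the real character, not the reference.
dom_text = text_node.data

# Escaping is applied again only when the tree is serialized.
serialized = doc.documentElement.toxml()

print(repr(dom_text))
print(serialized)
```

Which escaping mechanism a serializer picks is its own choice; minidom
happens to emit &lt; for text nodes.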

> Now if that CDATA was supposed to be xml as well, but was
> necessarily hidden for a while so that validation could be performed
> further along a processing chain,

It seems you are trying to use XML in a way not supported by any
standard. If you have a CDATA section, it contains characters by
definition; you can't suppose that these characters are markup.

> then I also need to write a processor to replace the character
> references, in which case I could possibly define <!ENTITY> s for
> such a translation, so that the parser would see < instead of &lt;

No. Each conforming XML parser knows that &lt; represents "<" - you
don't need to supply an entity definition for that. It also knows that
"<" cannot be represented as "<" in text; section 2.4 of the
recommendation clearly says

# The ampersand character (&) and the left angle bracket (<) may
# appear in their literal form only when used as markup delimiters, or
# within a comment, a processing instruction, or a CDATA section. ...
# If they are needed elsewhere, they must be escaped using either
# numeric character references or the strings "&amp;" and "&lt;"
# respectively.

So when generating XML, a conforming processor will only emit "<"
outside a CDATA section to mean the markup delimiter.
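
As a small illustration of the escaping rule quoted from section 2.4, here
is a sketch that assumes Python's xml.sax.saxutils.escape helper; the sample
string is made up:

```python
from xml.sax.saxutils import escape

# Character data that contains both markup delimiters.
raw = "if a < b && c"

# escape() replaces "&", "<" and ">" with the predefined entity references,
# so the result can be placed in element content outside a CDATA section.
escaped = escape(raw)

print(escaped)
```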

> Just the above, one wants to take the CDATA at some point and treat
> it as either an xml document on its own, or just part of the current
> xml document.

That is not supported by the XML recommendation. A CDATA section only
contains characters, not markup. So if you treat CDATA sections in any
other way, you violate the XML recommendation.

> The CDATA simply being used to escape sections that could possibly
> break validation at earlier points, eg on a server, where there may
> be no chance of handling bad xml sections, but that at a later
> point, eg some client application, then an exception can be handled
> nicely, in which case the CDATA section can now be safely
> interpreted.  This is where I see I need reverse translation, and
> simply cannot directly parse what used to be a CDATA section.

You need to invent a new markup language for that kind of
processing; XML does not support such a kind of interpretation of a
document.

Regards,
Martin


From matt@virtualspectator.com  Wed Jan 17 21:03:32 2001
From: matt@virtualspectator.com (matt)
Date: Thu, 18 Jan 2001 10:03:32 +1300
Subject: [XML-SIG] CDATA sections still not handled
Message-ID: <01011810040608.00856@localhost.localdomain>

hmm, I'm off track again ....

On Thu, 18 Jan 2001, you wrote:
> > A CDATA section that contains xml 
> 
> The entire document is xml; you probably mean
> 
> "A CDATA section that contains markup delimiters"
> 
> here. A CDATA section, by definition, contains only characters. It
> never contains markup.
> 
> > will be translated by a parser into a text node that is still valid
> > by virtue of the character references that it places in place of
> > characters such as "<" ... i.e. &lt;, and that for example if they
> > wrote some naff xml in an input , eg "&&<name><<" this, if escaped
> > in the original document by CDATA, would be translated into a text
> > node with "&amp;&amp;&lt;name>&lt;&lt;".
> 
> Not exactly. Character entities will be replaced with their true
> characters in the DOM tree, i.e. the CDATA section will appear in the
> DOM tree as a text node with its contents; a text containing "&lt;" in
> the input will be translated to "<" when creating the DOM tree.
> 


This translation obviously happens after validation, since invalid xml like
data in CDATA will never be validated against.  Which is what I want.

> It is the *output* function that does any necessary escaping. So when
> the CDATA section contained a literal "<", then, on output, the pretty
> printer has the option of generating &lt; or &#60; or a CDATA section.
> 
> > Now if that CDATA was supposed to be xml as well, but was
> > necessarily hidden for a while so that validation could be performed
> > further along a processing chain,
> 
> It seems you are trying to use XML in a way not supported by any
> standard. If you have a CDATA section, it contains characters by
> definition; you can't suppose that these characters are markup.

I don't suppose they are, I know they are.

> 
> > then I also need to write a processor to replace the character
> > references, in which case I could possibly define <!ENTITY> s for
> > such a translation, so that the parser would see < instead of &lt;
> 
> No. Each conforming XML parser knows that &lt; represents "<" - you
> don't need to supply an entity definition for that. It also knows that
> "<" cannot be represented as "<" in text; section 2.4 of the
> recommendation clearly says
> 
> # The ampersand character (&) and the left angle bracket (<) may
> # appear in their literal form only when used as markup delimiters, or
> # within a comment, a processing instruction, or a CDATA section. ...
> # If they are needed elsewhere, they must be escaped using either
> # numeric character references or the strings "&amp;" and "&lt;"
> # respectively.
> 
> So when generating XML, a conforming processor will only emit "<"
> outside a CDATA section to mean the markup delimiter.
> 
> > Just the above, one wants to take the CDATA at some point and treat
> > it as either an xml document on its own, or just part of the current
> > xml document.
> 
> That is not supported by the XML recommendation. A CDATA section only
> contains characters, not markup. So if you treat CDATA sections in any
> other way, you violate the XML recommendation.

ummm, here is another confusing part ... the following is from the xml
specification :

2.7 CDATA Sections

[Definition: CDATA sections may occur anywhere character data may occur;
they are used to escape blocks of text containing characters which would
otherwise be recognized as markup. CDATA sections begin with the
string "<![CDATA[" and end with the string "]]>":]


ummm, so can you be clearer about my apparent violation of CDATA by putting xml
like data in it?


> 
> > The CDATA simply being used to escape sections that could possibly
> > break validation at earlier points, eg on a server, where there may
> > be no chance of handling bad xml sections, but that at a later
> > point, eg some client application, then an exception can be handled
> > nicely, in which case the CDATA section can now be safely
> > interpreted.  This is where I see I need reverse translation, and
> > simply cannot directly parse what used to be a CDATA section.
> 
> You need to invent a new markup language for that kind of
> processing; XML does not support such a kind of interpretation of a
> document.


No I don't, because it works fine when the CDATA labels are kept, but you are
also saying that a parser can/should translate the character references
such as "&lt;", and looking at expat, it does, so, well, it seems to work
perfectly fine.  But now I am interested why this is a violation.  A perfectly
acceptable use is that one uses xml to wrap a message, which itself may be xml,
but it is up to the message interpreter later on to figure out if it is valid.


> 
> Regards,
> Martin
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig


regards
Matt
-------------------------------------------------------

-- 
Matt Halstead (PhD)
Research and development
VirtualSpectator
http://www.virtualspectator.com
ph 64-9-9136896


From martin@mira.cs.tu-berlin.de  Wed Jan 17 21:57:18 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 17 Jan 2001 22:57:18 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011810040608.00856@localhost.localdomain> (message from matt
 on Thu, 18 Jan 2001 10:03:32 +1300)
References: <01011810040608.00856@localhost.localdomain>
Message-ID: <200101172157.f0HLvIS01251@mira.informatik.hu-berlin.de>

> This translation obviously happens after validation, since invalid xml like
> data in CDATA will never be validated against.  Which is what I want.

I'm telling you: the data in CDATA are just character text, not
markup. So no matter what text you put in there, it is always
well-formed and valid (unless it violates the document charset).

> > It seems you are trying to use XML in a way not supported by any
> > standard. If you have a CDATA section, it contains characters by
> > definition; you can't suppose that these characters are markup.
> 
> I don't suppose they are, I know they are.

Maybe in your understanding of how your application should work. Not
in XML.

> 2.7 CDATA Sections
> 
> [Definition: CDATA sections may occur anywhere character data may occur;
> they are used to escape blocks of text containing characters which would
> otherwise be recognized as markup. CDATA sections begin with the
> string "<![CDATA[" and end with the string "]]>":]
> 

> ummm, so can you be clearer about my apparent violation of CDATA by
> putting xml like data in it?

It is completely well-formed to put "xml-like" data into a CDATA
section. However, an application that suddenly "turns" those data into
markup by removing the CDATA markers violates XML; it appears that
your application is supposed to operate in such a way.

IOW, the data might look like xml. When they are in a CDATA section,
they are not markup. Trying to see them as markup at some point and
not as markup at some other point means to read something into the XML
standard that is not there.

> > You need to invent a new markup language for that kind of
> > processing; XML does not support such a kind of interpretation of a
> > document.
> 
> 
> No I don't, because it works fine when the CDATA label are kept, but you are
> also saying that a parser can/should translate the character references
> such as "&lt;", and looking at expat, it does, so, well, it seems to work
> perfectly fine.  

To be precise, I'm saying it can. It might choose to generate
roughly the same, or even more, CDATA sections on output as well.

>But now I am interested why this is a violation.  A perfectly
>acceptable use is that one uses xml to wrap a message, which itself
>may be xml, but it is up to the message interpreter later on to
>figure out if it is valid.

It's not a violation to put "xml like" data into a CDATA section, but
they are just plain character data. I said

# So if you treat CDATA sections in any other way, you violate the XML
# recommendation.

*That* is something you cannot expect to work.

Regards,
Martin


From matt@virtualspectator.com  Wed Jan 17 23:11:26 2001
From: matt@virtualspectator.com (matt)
Date: Thu, 18 Jan 2001 12:11:26 +1300
Subject: Fwd: Re: [XML-SIG] CDATA sections still not handled
Message-ID: <01011812115302.00886@localhost.localdomain>

Now I see where you are coming from.  No I don't expect anything to suddenly
see xml where CDATA was and interpret it within the same context of the
document containing this node.  All I am saying is that xml document A holds a
node B.  Node B happens to contain some xml, because that is part of a message
format.  A doesn't need to know about the form of B, except in so far as it is
CDATA and therefore it should not try to validate it as xml if it contains xml
markup, but it will validate the character set, as, yes it is character data.

At some point a process picks up A, searches for node B, extracts it, does NOT
assume it is xml, but will look through it for any xml that exists.  If it
finds some then it validates it ... which means that section will be cut out
and passed to an xml parser.

The important thing that I think I understand is the following :
Any xml in the CDATA section doesn't need to look like xml to the human
reader.  A parser however, when handling a text node may do the following :
a) if the tag CDATA is still there, then call handlers for the start and end of
CDATA sections, and pass the character data (which may contain markup explicitly)
to the character data handler.   b) if the CDATA tags are not there, then it
will/needs to be represented as character references, such as &lt; and one
needs to make sure that it is translated, either by the parser or by the
process reading it into the correct characters before being passed to a stream
for later processing and possibly validation.


On Thu, 18 Jan 2001, you wrote:
> > This translation obviously happens after validation, since invalid xml like
> > data in CDATA will never be validated against.  Which is what I want.
> 
> I'm telling you: the data in CDATA are just character text, not
> markup. So no matter what text you put in there, it is always
> well-formed and valid (unless it violates the document charset).
> 

so what's this then ?

<?xml version='1.0' encoding='ISO-8859-1'?>
<text_20001222_154201>
  <body><![CDATA[some text and possibly some markup <name><<, but we don't
want to validate this yet]]>   </body>
</text_20001222_154201>
                         

looks like markup inside CDATA to me ....  I think you actually mean
"unescaped" character data does not contain markup, eg : &lt; is certainly not
markup.


> > > It seems you are trying to use XML in a way not supported by any
> > > standard. If you have a CDATA section, it contains characters by
> > > definition; you can't suppose that these characters are markup.
> > 
> > I don't suppose they are, I know they are.
> 
> Maybe in your understanding of how your application should work. Not
> in XML.

what would you say to someone wanting to let other people put html formatting
in text node data, but knowing that html is often not written as valid xml,
then escaping it is a safe bet ....


> 
> > 2.7 CDATA Sections
> > 
> > [Definition: CDATA sections may occur anywhere character data may occur;
> > they are used to escape blocks of text containing characters which would
> > otherwise be recognized as markup. CDATA sections begin with the
> > string "<![CDATA[" and end with the string "]]>":]
> > 
> 
> > ummm, so can you be clearer about my apparent violation of CDATA by
> > putting xml like data in it?
> 
> It is completely well-formed to put "xml-like" data into a CDATA
> section. However, an application that suddenly "turns" those data into
> markup by removing the CDATA markers violates XML; it appears that
> your application is supposed to operate in such a way.

Nope, nowhere near what I am trying to do.  A and B are independent.(see
above)


> 
> IOW, the data might look like xml. When they are in a CDATA section,
> they are not markup. Trying to see them as markup at some point and
> not as markup at some other point means to read something into the XML
> standard that is not there.


..... makes my html example look wrong, yet it is a common use for CDATA.

> 
> > > You need to invent a new markup language for that kind of
> > > processing; XML does not support such a kind of interpretation of a
> > > document.
> > 
> > 
> > No I don't, because it works fine when the CDATA label are kept, but you are
> > also saying that a parser can/should translate the character references
> > such as "&lt;", and looking at expat, it does, so, well, it seems to work
> > perfectly fine.  
> 
> To be precise, I'm saying it can. It might choose to generate
> roughly the same, or even more, CDATA sections on output as well.
> 
> >But now I am interested why this is a violation.  A perfectly
> >acceptable use is that one uses xml to wrap a message, which itself
> >may be xml, but it is up to the message interpreter later on to
> >figure out if it is valid.
> 
> It's not a violation to put "xml like" data into a CDATA section, but
> they are just plain character data. I said
> 
> # So if you treat CDATA sections in any other way, you violate the XML
> # recommendation.
> 
> *That* is something you cannot expect to work.
> 


All I originally wanted was for CDATA tags to remain in place so that at some
point, when looking at B, one could actually look for the markup tags.  Now
that I know these are often reverse translated when character data is handled,
then that is fine (I know they are with expat).


regards
Matt


> Regards,
> Martin
-------------------------------------------------------

-- 
Matt Halstead (PhD)
Research and development
VirtualSpectator
http://www.virtualspectator.com
ph 64-9-9136896


From ken@bitsko.slc.ut.us  Wed Jan 17 23:32:09 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 17 Jan 2001 17:32:09 -0600
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: "Martin v. Loewis"'s message of "Wed, 17 Jan 2001 22:57:18 +0100"
References: <01011810040608.00856@localhost.localdomain>
 <200101172157.f0HLvIS01251@mira.informatik.hu-berlin.de>
Message-ID: <x7vgrdsqkm.fsf@bitsko.slc.ut.us>

Matt,

If I understand this thread correctly, it's the common "how do I pass
XML inside XML" question.

CDATA sections are not relevant to this question.  These two XML
fragments are equivalent for all practical purposes:

  <my-tag><![CDATA[Some <tags> &amp; &entities; inside XML]]></my-tag>

  <my-tag>Some &lt;tags> &amp;amp; &amp;entities; inside XML</my-tag>

In both cases your application will see:

  startElement()  with element name 'my-tag'
  characters()    with data "Some <tags> &amp; &entities; inside XML"
  endElement()    with element name 'my-tag'
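
The equivalence can be checked with a short sketch, assuming Python's
standard xml.sax module with its default expat-based parser; the
EventRecorder class and record() helper are names invented for this example:

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record element events and the text seen inside each element."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._chunks = []

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def characters(self, content):
        # A parser may deliver text in several chunks, so collect them.
        self._chunks.append(content)

    def endElement(self, name):
        self.events.append(("characters", "".join(self._chunks)))
        self._chunks = []
        self.events.append(("endElement", name))


def record(xml_text):
    """Parse xml_text and return the list of recorded events."""
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


with_cdata = "<my-tag><![CDATA[Some <tags> &amp; &entities; inside XML]]></my-tag>"
with_refs = "<my-tag>Some &lt;tags> &amp;amp; &amp;entities; inside XML</my-tag>"

events_cdata = record(with_cdata)
events_refs = record(with_refs)
print(events_cdata == events_refs)
```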


That the data "is" XML is also not relevant to this question, it could
be any type of data that contains markup characters.

If you want to "do something with the XML" inside the XML, the easiest
way is to use another instance of a parser to parse the string as XML.
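
A minimal sketch of that second parse, assuming xml.dom.minidom from the
Python standard library; the element names "message", "body" and "name" are
made up for the example:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

outer = minidom.parseString(
    "<message><body><![CDATA[<name>Matt</name>]]></body></message>"
)
body = outer.getElementsByTagName("body")[0]

# Join all text and CDATA children; to the outer document both are
# plain character data.
payload = "".join(child.data for child in body.childNodes)

# The payload is only a string so far.  A second, independent parse
# decides whether it is well-formed XML.
try:
    inner = minidom.parseString(payload)
    inner_name = inner.documentElement.tagName
except ExpatError:
    inner_name = None  # the payload was not well-formed XML

print(payload, inner_name)
```

The outer parse never looks inside the payload, so a malformed payload only
shows up as an error in the second parse, where it can be handled.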

If you are interested in preserving the fact that the original file
used a CDATA section to escape the markup, instead of entities to
escape the markup, I believe SAX2 does provide that information, but
you need to evaluate whether or not that really does what you want.
Besides downplaying CDATA sections, a SAX parser is going to normalize
a lot of other characters from the original file before it passes it
to you, in such a way that you really can't reproduce the original
file.
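
For what it is worth, here is a sketch of how the CDATA boundaries can be
observed through the SAX2 lexical handler property, assuming Python's
xml.sax with the expat driver; the CDataTracker class is a name invented for
this example and the input is made up:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CDataTracker(ContentHandler):
    """Content handler that also serves as the lexical handler."""

    def __init__(self):
        super().__init__()
        self.in_cdata = False
        self.pieces = []  # (was_inside_cdata, text) pairs

    def characters(self, content):
        self.pieces.append((self.in_cdata, content))

    # Lexical-handler callbacks; only the CDATA ones matter here.
    def startCDATA(self):
        self.in_cdata = True

    def endCDATA(self):
        self.in_cdata = False

    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


tracker = CDataTracker()
parser = xml.sax.make_parser()
parser.setContentHandler(tracker)
parser.setProperty(property_lexical_handler, tracker)
parser.parse(io.BytesIO(b"<t>plain <![CDATA[a < b]]></t>"))

cdata_text = "".join(text for inside, text in tracker.pieces if inside)
plain_text = "".join(text for inside, text in tracker.pieces if not inside)
print(repr(plain_text), repr(cdata_text))
```

This only reports where CDATA sections started and ended; other details of
the original file are still lost.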

Does that help?

  -- Ken


From ndw@nwalsh.com  Thu Jan 18 08:08:05 2001
From: ndw@nwalsh.com (Norman Walsh)
Date: 18 Jan 2001 15:08:05 +0700
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011722360200.00860@localhost.localdomain>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>
 <0101171357420F.00889@localhost.localdomain>
 <200101170740.f0H7era01202@mira.informatik.hu-berlin.de>
 <01011722360200.00860@localhost.localdomain>
Message-ID: <87k87t8hrm.fsf@nwalsh.com>

/ matt <matt@virtualspectator.com> was heard to say:
| On Wed, 17 Jan 2001, Martin v. Loewis wrote:
[...]
| > I understand you are not interested in parsing the document; if you
| > build a DOM tree, parsing of the document will happen as a side
| > effect. You cannot avoid this: this is the only way to get a DOM tree
| > from a document. So while you are not interested in the parsing, you
| > should accept that it is done.
| 
| This is where I see the extra step that is necessary, so tell me if
| I am on the right track.

I'm not trying to be pedantic, it just looks that way :-)

| A CDATA section that contains xml will be translated by a parser

A CDATA section cannot contain XML. It contains text, with a
particular form of escaping.

| into a text node that is still valid by virtue of the character
| references that it places in place of characters such as "<"
| ... i.e. &lt;, and that for example if they wrote some naff xml in
| an input , eg "&&<name><<" this, if escaped in the original document
| by CDATA, would be translated into a text node with
| "&amp;&amp;&lt;name>&lt;&lt;".

I think about this in a different way. Parsing a document that contains
<![CDATA[&&<name><<]]> produces an XML information set that includes
a text node that contains the Unicode characters 

  "&" "&" "<" "n" "a" "m" "e" ">" "<" "<"

These characters are not escaped in any way.

If the processor subsequently has reason to serialize the text node
in question, it may use any (or all) of the following mechanisms to
do so:

1. CDATA sections
2. The predefined entities &lt; and &amp;
3. Using numeric character references, &#60; and &#38; (in either
   decimal or hex).

If the document is known to have additional entity declarations associated
with it, these entities may also be used (for example, &gt;).

|  Now if that CDATA was supposed to be
| xml as well, but was necessarily hidden for a while so that
| validation could be performed further along a processing chain, then
| I also need to write a processor to replace the character
| references, in which case I could possibly define <!ENTITY> s for
| such a translation, so that the parser would see < instead of &lt;

There's no easy means to "unescape" these characters in an XML
processor. You can do it with Python, or some other non-XML string
processing language, and you could do it with XSLT using
disable-output-escaping (in some limited circumstances).
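
In Python, a sketch of such string-level unescaping could use
xml.sax.saxutils.unescape. By default it only handles &lt;, &gt; and &amp;
(not numeric character references), and the sample string is made up:

```python
from xml.sax.saxutils import unescape

# Text as it might appear in a serialized document, outside any CDATA section.
escaped = "&amp;&amp;&lt;name&gt;&lt;&lt;"

# unescape() turns the predefined references back into the raw characters.
plain = unescape(escaped)

print(plain)
```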

| many people who pick up a document and modify it and put it back.

Assuming I haven't made any typos, the following serializations of a
text node:

  <![CDATA[&&<name><<]]>
  &amp;&amp;&lt;name>&lt;&lt;
  &amp;&amp;&lt;n&#97;me>&lt;&lt;
  <![CDATA[&&]]>&lt;name><![CDATA[<<]]>

are indistinguishable to an XML processor. It *doesn't matter* what
escaping mechanism you use, unless you are including non-XML
processors.  If you're using non-XML processors, you may care about
the escaping, but XML isn't designed to help you with that problem.
(And you may care about other things that XML can't help you with,
like the serialization order of attributes.)
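
A quick way to check that claim, sketched with Python's
xml.etree.ElementTree (the wrapper element "t" is made up so that each
fragment becomes a complete document):

```python
import xml.etree.ElementTree as ET

serializations = [
    "<![CDATA[&&<name><<]]>",
    "&amp;&amp;&lt;name>&lt;&lt;",
    "&amp;&amp;&lt;n&#97;me>&lt;&lt;",
    "<![CDATA[&&]]>&lt;name><![CDATA[<<]]>",
]

# Wrap each fragment in an element and read back the parsed text.
texts = [ET.fromstring("<t>" + s + "</t>").text for s in serializations]

# All four spellings collapse to one and the same string.
print(set(texts))
```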

| Just the above, one wants to take the CDATA at some point and treat it as
| either an xml document on its own, or just part of the current xml document. 
| The CDATA simply being used to escape sections that could possibly break
| validation at earlier points, eg on a server, where there may be no chance of
| handling bad xml sections, but that at a later point, eg some client
| application, then an exception can be handled nicely, in which case the CDATA
| section can now be safely interpreted.  This is where I see I need reverse
| translation, and simply cannot directly parse what used to be a CDATA section.

Don't do that. I'm serious. You don't say exactly what problem you're
trying to solve, but the solution you're outlining is ugly and
fragile. (IMHO, naturally.)

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Life is a great bundle of little
http://nwalsh.com/            | things.--Oliver Wendell Holmes


From matt@virtualspectator.com  Thu Jan 18 12:20:52 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 01:20:52 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <x7vgrdsqkm.fsf@bitsko.slc.ut.us>
References: <01011810040608.00856@localhost.localdomain> <200101172157.f0HLvIS01251@mira.informatik.hu-berlin.de> <x7vgrdsqkm.fsf@bitsko.slc.ut.us>
Message-ID: <01011901332205.00859@localhost.localdomain>

On Thu, 18 Jan 2001, Ken MacLeod wrote:
> Matt,
> 
> If I understand this thread correctly, it's the common "how do I pass
> XML inside XML" question.

sort of ... but that will answer it too.

> 
> CDATA sections are not relevant to this question.  These two XML
> fragments are equivalent for all practical purposes:
> 
>   <my-tag><![CDATA[Some <tags> &amp; &entities; inside XML]]></my-tag>
> 
>   <my-tag>Some &lt;tags> &amp;amp; &amp;entities; inside XML</my-tag>
> 
> In both cases your application will see:
> 
>   startElement()  with element name 'my-tag'
>   characters()    with data "Some <tags> &amp; &entities; inside XML"
>   endElement()    with element name 'my-tag'
> 

Yes, yes, that is what I have been trying to say.  CDATA just lets it remain
human readable in the original document.  But once through a DOM implementation
and all that is gone, you get the second option back out.  Which is fine w.r.t
parsing down the line, but not much fun when perusing modified documents.


> 
> That the data "is" XML is also not relevant to this question, it could
> be any type of data that contains markup characters.

Yes, I also include program fragments sometimes ..... so that's another good
example.

> 
> If you want to "do something with the XML" inside the XML, the easiest
> way is to use another instance of a parser to parse the string as XML.
> 

Yep, I mentioned that in about my second email: some "other" process will be
the thing that reads this data, "possibly" validating it if it indeed needs
to.
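
A rough sketch of that second step, assuming xml.dom.minidom from the standard
library.  The <message> wrapper, the sample payload and the parse_payload
helper are all invented for the example:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Outer transport document; the payload is hidden from the first parse.
outer = minidom.parseString(
    "<message><![CDATA[<note><to>Matt</to></note>]]></message>"
)

# Text and CDATA section nodes both expose their characters as .data.
payload = "".join(node.data for node in outer.documentElement.childNodes)


def parse_payload(text):
    """Try to parse the payload as XML; return None if it is not well formed."""
    try:
        return minidom.parseString(text)
    except ExpatError:
        return None


inner = parse_payload(payload)
broken = parse_payload("&&<name><<")
```

Only the second parser ever sees the payload as markup, so a bad payload can
be handled there without breaking the outer document.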


> If you are interested in preserving the fact that the original file
> used a CDATA section to escape the markup, instead of entities to
> escape the markup, I believe SAX2 does provide that information, but
> you need to evaluate whether or not that really does what you want.
> Besides downplaying CDATA sections, a SAX parser is going to normalize
> a lot of other characters from the original file before it passes it
> to you, in such a way that you really can't reproduce the original
> file.

Yes, I found that both fortunate and unfortunate.  I now see that if I want my
data to remain clean, in the sense that I can still look at it and read it with
some ease, then I need to write my own reverse-translation method, rewrap those
text data nodes with CDATA tags again, and save that document.
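
One possible shape for that rewrapping step, sketched with xml.dom.minidom.
The rewrap_as_cdata helper and the <message>/<body> element names are
assumptions for illustration, not anything from PyXML:

```python
from xml.dom import minidom


def rewrap_as_cdata(doc, tag_name):
    """Replace the character data of each tag_name element with one CDATA section."""
    for element in doc.getElementsByTagName(tag_name):
        children = list(element.childNodes)
        # Only touch elements that hold nothing but character data.
        if not children or any(
            child.nodeType not in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
            for child in children
        ):
            continue
        text = "".join(child.data for child in children)
        if "]]>" in text:
            # Cannot be written as a single CDATA section; leave it escaped.
            continue
        for child in children:
            element.removeChild(child)
        element.appendChild(doc.createCDATASection(text))


doc = minidom.parseString(
    "<message><body>&amp;&amp;&lt;name>&lt;&lt;</body></message>"
)
rewrap_as_cdata(doc, "body")
readable = doc.toxml()
```

The saved document then carries the original characters inside a CDATA
section again, at the cost of one extra pass before writing.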


> 
> Does that help?

Yes, very much so, it means I WAS on the right track, and that it IS normal to
want to put xml or xml-like data within an xml document and not have it parsed
for well-formedness.  Maybe I am a rare exception, where my translated CDATA,
i.e. in 'entity references', just looks like such a nightmare to read through.
Keeping the original characters speeds debugging of contained data immensely.


> 
>   -- Ken
> 

thanks
regards
Matt


> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
-- 


From matt@virtualspectator.com  Thu Jan 18 09:27:46 2001
From: matt@virtualspectator.com (matt)
Date: Thu, 18 Jan 2001 22:27:46 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <87k87t8hrm.fsf@nwalsh.com>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com>
Message-ID: <01011823353400.00859@localhost.localdomain>

... comments throughout ...


On Thu, 18 Jan 2001, Norman Walsh wrote:
> / matt <matt@virtualspectator.com> was heard to say:
> | On Wed, 17 Jan 2001, Martin v. Loewis wrote:
> [...]
> | > I understand you are not interested in parsing the document; if you
> | > build a DOM tree, parsing of the document will happen as a side
> | > effect. You cannot avoid this: this is the only way to get a DOM tree
> | > from a document. So while you are not interested in the parsing, you
> | > should accept that it is done.
> | 
> | This is where I see the extra step that is necessary, so tell me if
> | I am on the right track.
> 
> I'm not trying to be pedantic, it just looks that way :-)
> 
> | A CDATA section that contains xml will be translated by a parser
> 
> A CDATA section cannot contain XML. It contains text, with a
> particular form of escaping.

Ok, so now I am being pedantic, but this is good, I'm getting a clearer idea of
xml usage, my entry to xml has been recent and only from the building side of
documents, but now that I have to process them heavily it's nice to reason out
these things.

As far as I can see, CDATA can hold anything it wants, within the
constraints of the character encoding set.  Say I formed my own language that
happened to use things like "<" very often, then CDATA seems to give me an
"initial" way to write this in a plain, raw form, without translating it to
entity references first.  This is nice, since your new language section within
the xml document is still human readable.  It won't matter which way you go
from the point of view of the parser, because, for example, expat will
recognize it as character data by virtue of the CDATA escaping, or by the
alternative replacement of all xml markup in that section by entity references.

There is no way around the fact that CDATA allows you to write xml, programming
code, ..... whatever you want inside CDATA.  The parser will NOT try to parse
it.  For all I care, I could have encoded it with BASE64 ..... I don't need it
to be parsed as part of the document.


> 
> | into a text node that is still valid by virtue of the character
> | references that it places in place of characters such as "<"
> | ... i.e. &lt;, and that for example if they wrote some naff xml in
> | an input, e.g. "&&<name><<" this, if escaped in the original document
> | by CDATA, would be translated into a text node with
> | "&amp;&amp;&lt;name>&lt;&lt;".
> 
> I think about this in a different way. Parsing a document that contains
> <![CDATA[&&<name><<]]> produces an XML information set that includes
> a text node that contains the Unicode characters 
> 
>   "&" "&" "<" "n" "a" "m" "e" ">" "<" "<"
> 
> These characters are not escaped in any way.

Nope, not after they have been parsed, but they certainly were when they were
part of the CDATA section in the original document.  As the specification says,
they are used to ESCAPE blocks of text containing characters which would
otherwise be recognized as markup.  More on this below ....


> 
> If the processor subsequently has reason to serialize the text node
> in question, it may use any (or all) of the following mechanisms to
> do so:
> 
> 1. CDATA sections
> 2. The predefined entities &lt; and &amp;
> 3. Using numeric character references, &#60; and &#38; (in either
>    decimal or hex).
> 
> If the document is known to have additional entity declarations associated
> with it, these entities may also be used (for example, &gt;).
> 
> |  Now if that CDATA was supposed to be
> | xml as well, but was necessarily hidden for a while so that
> | validation could be performed further along a processing chain, then
> | I also need to write a processor to replace the character
> | references, in which case I could possibly define <!ENTITY> s for
> | such a translation, so that the parser would see < instead of &lt;
> 
> There's no easy means to "unescape" these characters in an XML
> processor. You can do it with Python, or some other non-XML string
> processing language, and you could do it with XSLT using
> disable-output-escaping (in some limited circumstances).
> 
> | many people who pick up a document and modify it and put it back.
> 
> Assuming I haven't made any typos, the following serializations of a
> text node:
> 
>   <![CDATA[&&<name><<]]>
>   &amp;&amp;&lt;name>&lt;&lt;
>   &amp;&amp;&lt;n&#97;me>&lt;&lt;
>   <![CDATA[&&]]>&lt;name><![CDATA[<<]]>
> 
> are indistinguishable to an XML processor. 

yes, I realize that.
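
For the record, a small check of those four serializations, assuming the
standard xml.etree.ElementTree module and a made-up <t> wrapper element:

```python
import xml.etree.ElementTree as ET

# The four serializations of the same text node listed above.
serializations = [
    "<![CDATA[&&<name><<]]>",
    "&amp;&amp;&lt;name>&lt;&lt;",
    "&amp;&amp;&lt;n&#97;me>&lt;&lt;",
    "<![CDATA[&&]]>&lt;name><![CDATA[<<]]>",
]

# Wrap each one in a throwaway element and read back the parsed text.
texts = [ET.fromstring("<t>" + s + "</t>").text for s in serializations]
all_equal = len(set(texts)) == 1
```

Each entry in texts should come back as the same ten characters, with no
trace of which escaping was used.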

> It *doesn't matter* what
> escaping mechanism you use, unless you are including non-XML
> processors.  If you're using non-XML processors, you may care about
> the escaping, but XML isn't designed to help you with that problem.
> (And you may care about other things that XML can't help you with,
> like the serialization order of attributes.)
> 
> | Just the above, one wants to take the CDATA at some point and treat it as
> | either an xml document on its own, or just part of the current xml document. 
> | The CDATA simply being used to escape sections that could possibly break
> | validation at earlier points, eg on a server, where there may be no chance of
> | handling bad xml sections, but that at a later point, eg some client
> | application, then an exception can be handled nicely, in which case the CDATA
> | section can now be safely interpreted.  This is where I see I need reverse
> | translation, and simply cannot directly parse what used to be a CDATA section.
> 
> Don't do that. I'm serious. You don't say exactly what problem you're
> trying to solve, but the solution you're outlining is ugly and
> fragile. (IMHO, naturally.)

No it's not.  If I put base64 encoded gzip compressed versions of the same
"escaped xml fragments" that I want to hide, then that would seem to make you
happy.  These xml documents are a transport, and when a transport is interpreted
then certain tags may mean do something with the character data of this node.
All seems pretty normal to me.  For example, say one wants to transport html.
Now html is usually really ugly in that it is hardly ever well formed xml.
Escaping it with CDATA is an easy way to hide that, and giving that data to an
html renderer some time later would be fine.  Being in CDATA, it is never
parsed for "well formedness".
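
To make the html case concrete, here is a sketch using only the standard
library.  The transport document and the TagLister class are invented for the
example, and html.parser merely stands in for a real renderer:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Hypothetical stand-in for an html renderer: records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# The payload is not well formed xml, but the transport document is.
transport = (
    "<message id='5335HJSK3'>"
    "<![CDATA[<H1>Hello<br>world<p>unclosed]]>"
    "</message>"
)

payload = ET.fromstring(transport).text

lister = TagLister()
lister.feed(payload)
lister.close()
```

The xml parser only ever treats the payload as text; the lenient html parser
is the one that interprets the tags later on.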

Of course now I understand that a DOM implementation will remove CDATA tags and
replace all character data between them with entity references where
necessary.  If this is then persisted to disk and later parsed with an xml
handler, then the real characters will come back out again in the character
stream for the text node.  So that is fine too, I get back what I put in, and
who cares whether it was xml, or someone's program code.
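
That round trip can be sketched with xml.etree.ElementTree, which also
discards the CDATA wrapper when it builds its tree (the <message> element is
just an example name):

```python
import xml.etree.ElementTree as ET

original = "&&<name><<"

# Parse a document that used CDATA, then write the tree back out.
element = ET.fromstring("<message><![CDATA[" + original + "]]></message>")
serialized = ET.tostring(element, encoding="unicode")

# Parse the serialized form again and compare the character data.
reparsed = ET.fromstring(serialized)
round_trip_ok = reparsed.text == original
```

The serialized string uses entity references rather than CDATA, yet the
reparsed text matches what went in.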

So the conclusion is that CDATA is just a useless feature if you are
parsing it into a DOM tree.  All it gives you is a free way of translating
markup to entity references.  That is nice in that sense, but not so nice that
you have now rendered your previously escaped sections as not very human
readable anymore.  And this can be a problem.  If someone complains that, for
example, their message, which was transported via some transport xml, looked
weird, and all that you had was the raw transport packets on your server, then
if things are still wrapped in nice CDATA tags then you can easily look
through it and find the improper formatting in the message.  However, if the
message has been translated into entity references, then forget it, you may as
well be looking at binary in a hex editor in some instances. 

regards
Matt


> 
>                                         Be seeing you,
>                                           norm
> 
> -- 
> Norman Walsh <ndw@nwalsh.com> | Life is a great bundle of little
> http://nwalsh.com/            | things.--Oliver Wendell Holmes


From jday@csihq.com  Thu Jan 18 16:26:30 2001
From: jday@csihq.com (John Day)
Date: Thu, 18 Jan 2001 11:26:30 -0500
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011823353400.00859@localhost.localdomain>
References: <87k87t8hrm.fsf@nwalsh.com>
 <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>
 <01011722360200.00860@localhost.localdomain>
 <87k87t8hrm.fsf@nwalsh.com>
Message-ID: <4.3.1.0.20010118112124.00cf3810@mail.csihq.com>

At 10:27 PM 1/18/01 +1300, matt wrote:
>weird, and all that you had was the raw transport packets on your server, then
>if things are still wrapped in nice CDATA tags then you can easily look
>through it and find the improper formatting in the message.  However, if the

Matt,

I think most of your problem is caused by viewing CDATA as a kind of markup 
tag. It's not. Your problem is easily solved by inventing some real XML tag 
to wrap around your 'encoded' data, e.g.

<html>  {HTML-encoded-as-CDATA-or-whatever} </html>

Then you won't care how the html is handled but you can still extract all 
of it precisely because it's marked up by 'real' tags.

John Day
Staff Scientist
Computer Science Innovations


From ndw@nwalsh.com  Thu Jan 18 16:54:38 2001
From: ndw@nwalsh.com (Norman Walsh)
Date: 18 Jan 2001 23:54:38 +0700
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011823353400.00859@localhost.localdomain>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net>
 <01011722360200.00860@localhost.localdomain>
 <87k87t8hrm.fsf@nwalsh.com>
 <01011823353400.00859@localhost.localdomain>
Message-ID: <87ae8on6lt.fsf@nwalsh.com>

/ matt <matt@virtualspectator.com> was heard to say:
| happened to use things like "<" very often, then CDATA seems to give me an
| "initial" way to write this in a plain, raw form, without translating it to
| entity references first.

In the interest of technical accuracy, I'll point out that there's nothing
that says a processor is not allowed to use CDATA to escape text. (It might
be an interesting switch on a serializer: use CDATA for any text node that
contains more than 5% entity references or something...)
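
A sketch of what such a switch might look like.  The serialize_text function
and the way the 5% is measured (share of "<" and "&" characters in the text)
are assumptions made for the example, not an existing serializer option:

```python
from xml.sax.saxutils import escape


def serialize_text(text, threshold=0.05):
    """Write text as a CDATA section when escaping would clutter it too much."""
    if not text:
        return ""
    # Count the characters that would otherwise need an entity reference.
    special = sum(text.count(ch) for ch in "<&")
    if special / len(text) > threshold and "]]>" not in text:
        return "<![CDATA[" + text + "]]>"
    # Otherwise fall back to the predefined entities.
    return escape(text)


noisy = serialize_text("&&<name><<")
quiet = serialize_text("plain text with one & sign in a fairly long sentence here")
```

Text that is mostly markup characters comes out as CDATA, while ordinary
prose with the odd ampersand is escaped as usual.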

| > Don't do that. I'm serious. You don't say exactly what problem you're
| > trying to solve, but the solution you're outlining is ugly and
| > fragile. (IMHO, naturally.)
| 
| No it's not.  If I put base64 encoded gzip compressed versions of the same
| "escaped xml fragments" that I want to hide, then that would seem to make you
| happy.  These xml documents are a transport, and when a transport is interpreted
| then certain tags may mean do something with the character data of this node. 
| All seems pretty normal to me.

Ok, perhaps I overstated the case. I should have said something like "in
most cases that's going to be ugly and fragile".

XML isn't particularly good at wrapping up other chunks of XML. Using
CDATA sections is dangerous if there's any chance that the text you're
wrapping up might contain "]]>". For example, if one of the documents
that you're wrapping up has its own CDATA section.
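
One common workaround, sketched here with a made-up to_cdata helper, is to
split every "]]>" in the payload across two adjacent CDATA sections so the
terminator never appears inside one:

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# A payload that already contains its own CDATA section.
inner = "<a><![CDATA[x < y]]></a>"
wrapped = "<wrap>" + to_cdata(inner) + "</wrap>"

# Parsing the wrapper gives the payload back as plain text.
recovered = ET.fromstring(wrapped).text
```

It works, but the result is no longer one tidy CDATA block, which is part of
why this approach gets fragile.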

| through it and find the improper formatting in the message.  However, if the
| message has been translated into entity references, then forget it, you may as
| well be looking at binary in a hex editor in some instances. 

Yes. That's a problem. Maybe you need that special-purpose serializer
I alluded to above.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | It is not impossibilities which fill us
http://nwalsh.com/            | with the deepest despair, but
                              | possibilities which we have failed to
                              | realize.--Robert Mallet


From iron@mso.oz.net  Thu Jan 18 16:54:32 2001
From: iron@mso.oz.net (Mike Orr)
Date: Thu, 18 Jan 2001 08:54:32 -0800
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011823353400.00859@localhost.localdomain>; from matt@virtualspectator.com on Thu, Jan 18, 2001 at 10:27:46PM +1300
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011722360200.00860@localhost.localdomain> <87k87t8hrm.fsf@nwalsh.com> <01011823353400.00859@localhost.localdomain>
Message-ID: <20010118085431.A15316@mso.oz.net>

On Thu, Jan 18, 2001 at 10:27:46PM +1300, matt wrote:
> For example, say one wants to transport html. 
> Now html is usually really ugly in that it is hardly ever well formed xml. 
> Escaping with CDATA it is an easy way to hide that, and giving that data to an
> html renderer some time later would be fine.  Being in CDATA, it is never
> parsed for "well formedness".

I was just about to suggest looking at it this way.  Suppose you have a set
of records and a certain tag contains HTML, which you don't want to
un-CDATA-ize because the (human) editor doesn't want to see or type
&lt;H1&gt;.

Three other questions.  Are there certain tags that will always be CDATA,
or does it differ randomly from document to document?  Do you care
whether your application changes the whitespace outside that CDATA
section, making an "equivalent" document?  Or do you want the
indentation and all to remain exactly as it is?

If you know that a certain tag should always be CDATA, and you're
willing to settle for an "equivalent" document otherwise, then maybe
it doesn't matter that the parser normalizes CDATA on input, 
because you can write it out manually and convert that tag body to CDATA.

If the CDATA sections will be coming in at random and you must leave
the document formatted exactly as it is (minus whatever changes your
application is supposed to be making to it), then perhaps you need a
lower-level parser than full XML.  Perhaps then you'll want to consider
modifying one of the existing XML parser classes or the sgmllib parser
to fit your needs.

-- 
-Mike (Iron) Orr, iron@mso.oz.net  (if mail problems: mso@jimpick.com)
   http://mso.oz.net/     English * Esperanto * Russkiy * Deutsch * Espan~ol


From ndw@nwalsh.com  Thu Jan 18 16:58:58 2001
From: ndw@nwalsh.com (Norman Walsh)
Date: 18 Jan 2001 23:58:58 +0700
Subject: Fwd: Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011812115302.00886@localhost.localdomain>
References: <01011812115302.00886@localhost.localdomain>
Message-ID: <8766jcn6el.fsf@nwalsh.com>

/ matt <matt@virtualspectator.com> was heard to say:
| <?xml version='1.0' encoding='ISO-8859-1'?>
| <text_20001222_154201>
|   <body><![CDATA[some text and possibly some markup <name><<, but we don't
| want to validate this yet]]>   </body>
| </text_20001222_154201>
| 
| looks like markup inside CDATA to me ....  I think you actually mean
| "unescaped" character data does not contain markup, eg : &lt; is certainly not
| markup.

Yes, it looks like markup to you because you're a human being. At
least, I think you are. Maybe you're just an NSA machine that passes
the Turing test, I dunno. Then again, maybe that's all I am, so
never mind. It does not look like markup to the XML processor.

| what would you say to someone wanting to let other people put html formatting
| in text node data, but knowing that html is often not written as valid xml,
| then escaping it is a safe bet ....

I see your point, but I warned you that we were in danger of pedantry. :-)

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Do not seek to follow in the footsteps
http://nwalsh.com/            | of men of old; seek what they
                              | sought.--Matsuo Basho


From matt@virtualspectator.com  Thu Jan 18 20:15:13 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:15:13 +1300
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <4.3.1.0.20010118112124.00cf3810@mail.csihq.com>
References: <87k87t8hrm.fsf@nwalsh.com> <4.3.1.0.20010118112124.00cf3810@mail.csihq.com>
Message-ID: <01011909164601.00874@localhost.localdomain>

On Fri, 19 Jan 2001, John Day wrote:
> 
> At 10:27 PM 1/18/01 +1300, matt wrote:
> >weird, and all that you had was the raw transport packets on your server, then
> >if things are still wrapped in nice CDATA tags then you can easily look
> >through it and find the improper formatting in the message.  However, if the
> 
> Matt,
> 
> I think most of your problem is caused by viewing CDATA as a kind of markup 
> tag. It's not. Your problem is easily solved by inventing some real XML tag 
> to wrap around your 'encoded' data, e.g.
> 
> <html>  {HTML-encoded-as-CDATA-or-whatever} </html>
> 
> Then you won't care how the html is handled but you can still extract all 
> of it precisely because it's marked up by 'real' tags.


I do that already .... I usually wrap all messages with <message> ....
</message>.  I certainly don't use CDATA as an identifier.  Any DOM
implementation that would allow me to do that would be wrong in doing so.

> 
> John Day
> Staff Scientist
> Computer Science Innovations



From matt@virtualspectator.com  Thu Jan 18 20:17:46 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:17:46 +1300
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <20010118085431.A15316@mso.oz.net>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011823353400.00859@localhost.localdomain> <20010118085431.A15316@mso.oz.net>
Message-ID: <01011909403002.00874@localhost.localdomain>

On Fri, 19 Jan 2001, Mike Orr wrote:
> On Thu, Jan 18, 2001 at 10:27:46PM +1300, matt wrote:
> > For example, say one wants to transport html. 
> > Now html is usually really ugly in that it is hardly ever well formed xml. 
> > Escaping with CDATA it is an easy way to hide that, and giving that data to an
> > html renderer some time later would be fine.  Being in CDATA, it is never
> > parsed for "well formedness".
> 
> I was just about to suggest looking at it this way.  If you have a set
> of records and a certain tag contains HTML, which you don't want to 
> un-CDATA-ize because the (human) editor doesn't want to see or type
> &lt;H1&gt; .  

Exactly.


> 
> Three other questions.  Are there certain tags that will always be CDATA,
> or does it differ randomly from document to document?  Do you care
> whether your application changes the whitespace outside that CDATA
> section, making an "equivalent" document?  Or do you want the
> indentation and all to remain exactly as it is?

Hmm, no, in my most common case, whitespace is not an issue, eg: html being
transported, but in some instances keeping the correct whitespace within
messages may be useful .... eg : when it is program code, where this could be
a) critical to preserving scope, or b) again the human readability factor.  In
any case the message is between message tags, eg : <message id='5335HJSK3'> ,
so it doesn't matter if there are numerous CDATA sections within it, which
would be the case if one was to append more data to the message instead of
doing a node replace.

> 
> If you know that a certain tag should always be CDATA, and you're
> willing to settle for an "equivalent" document otherwise, then maybe
> it doesn't matter that the parser normalizes CDATA on input, 
> because you can write it out manually and convert that tag body to CDATA.

That is what I currently do, and it works really well, and preserves my sanity
server side.

> 
> If the CDATA sections will be coming in at random and you must leave
> the document formatted exactly as it is (minus whatever changes your
> application is supposed to be making to it), then perhaps you need a
> lower-level parser than full XML.  Perhaps then you'll want to consider
> modifying one of the existing XML parser classes or the sgmllib parser
> to fit your needs.

That would defeat my intention of using xml from the point of view that it is a
standard.  What you raise is interesting, though.  If I go full circle and
readdress my original question, that "CDATA sections are still not handled",
then I was just wondering: since one gets CDATA begin and end events while
parsing a document that contains a CDATA section, why couldn't the DOM
document still represent it as a CDATA section internally, as it was when
first created?  Furthermore, a parser such as expat will preserve the original
form of the characters that have been escaped, and even convert them if they
happened to be in entity references.  It seems to me that the handling of CDATA
sits at the level of its base class, which is a text node, and that the CDATA
sections are only used to say "don't validate the following, it is ALL
character data".
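
Those begin and end events can be seen directly with Python's
xml.parsers.expat.  This is only a sketch: the text_runs helper and the <m>
element are made up, and a real DOM builder would create nodes instead of
tuples:

```python
import xml.parsers.expat


def text_runs(document):
    """Return (came_from_cdata, text) pairs in document order."""
    runs = []
    state = {"in_cdata": False}

    def start_cdata():
        state["in_cdata"] = True

    def end_cdata():
        state["in_cdata"] = False

    def characters(data):
        # Tag each piece of text with whether a CDATA section is open.
        runs.append((state["in_cdata"], data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = characters
    parser.Parse(document, True)
    return runs


runs = text_runs("<m>a &lt; b <![CDATA[c < d]]></m>")
from_cdata = "".join(text for flag, text in runs if flag)
from_plain = "".join(text for flag, text in runs if not flag)
```

So the information needed to keep a CDATA section as its own node is
available at parse time; whether the tree keeps it is up to the builder.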

> 
> -- 
> -Mike (Iron) Orr, iron@mso.oz.net  (if mail problems: mso@jimpick.com)
>    http://mso.oz.net/     English * Esperanto * Russkiy * Deutsch * Espan~ol
-- 

regards
Matt


From matt@virtualspectator.com  Thu Jan 18 20:47:57 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:47:57 +1300
Subject: thread 2) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <87ae8on6lt.fsf@nwalsh.com>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011823353400.00859@localhost.localdomain> <87ae8on6lt.fsf@nwalsh.com>
Message-ID: <01011909515003.00874@localhost.localdomain>

On Fri, 19 Jan 2001, Norman Walsh wrote:
> / matt <matt@virtualspectator.com> was heard to say:
> | happened to use things like "<" very often, then CDATA seems to give me an
> | "initial" way to write this in a plain, raw form, without translating it to
> | entity references first.
> 
> In the interest of technical accuracy, I'll point out that there's nothing
> that says a processor is not allowed to use CDATA to escape text. (It might
> be an interesting switch on a serializer: use CDATA for any text node that
> contains more than 5% entity references or something...)
> 
> | > Don't do that. I'm serious. You don't say exactly what problem you're
> | > trying to solve, but the solution you're outlining is ugly and
> | > fragile. (IMHO, naturally.)
> | 
> | No it's not.  If I put base64 encoded gzip compressed versions of the same
> | "escaped xml fragments" that I want to hide, then that would seem to make you
> | happy.  These xml documents are a transport, and when a transport is interpreted
> | then certain tags may mean do something with the character data of this node. 
> | All seems pretty normal to me.
> 
> Ok, perhaps I overstated the case. I should have said something like "in
> most cases that's going to be ugly and fragile".
> 
> XML isn't particularly good at wrapping up other chunks of XML. Using
> CDATA sections is dangerous if there's any chance that the text you're
> wrapping up might contain "]]>". For example, if one of the documents
> that you're wrapping up has its own CDATA section.

Is it perhaps cleaner to use xlinks for the message nodes?  I haven't used these
yet, but I gather it would separate transport from message.  Though to maintain
performance a server would have to parse it first to see what to transport in
the same network connection.  


> 
> | through it and find the improper formatting in the message.  However, if the
> | message has been translated into entity references, then forget it, you may as
> | well be looking at binary in a hex editor in some instances. 
> 
> Yes. That's a problem. Maybe you need that special-purpose serializer
> I alluded to above.
> 
>                                         Be seeing you,
>                                           norm
> 
> -- 
> Norman Walsh <ndw@nwalsh.com> | It is not impossibilities which fill us
> http://nwalsh.com/            | with the deepest despair, but
>                               | possibilities which we have failed to
>                               | realize.--Robert Mallet
-- 


From matt@virtualspectator.com  Thu Jan 18 20:53:29 2001
From: matt@virtualspectator.com (matt)
Date: Fri, 19 Jan 2001 09:53:29 +1300
Subject: thread 3) Re: Fwd: Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <8766jcn6el.fsf@nwalsh.com>
References: <01011812115302.00886@localhost.localdomain> <8766jcn6el.fsf@nwalsh.com>
Message-ID: <01011909565004.00874@localhost.localdomain>

On Fri, 19 Jan 2001, Norman Walsh wrote:
> / matt <matt@virtualspectator.com> was heard to say:
> | <?xml version='1.0' encoding='ISO-8859-1'?>
> | <text_20001222_154201>
> |   <body><![CDATA[some text and possibly some markup <name><<, but we don't
> | want to validate this yet]]>   </body>
> | </text_20001222_154201>
> | 
> | looks like markup inside CDATA to me ....  I think you actually mean
> | "unescaped" character data does not contain markup, eg : &lt; is certainly not
> | markup.
> 
> Yes, it looks like markup to you because you're a human being. 

That is exactly my purpose.

> At least, I think you are. Maybe you're just an NSA machine that passes
> the Turing test, I dunno. Then again, maybe that's all I am, so
> never mind. It does not look like markup to the XML processor.

> | what would you say to someone wanting to let other people put html formatting
> | in text node data, but knowing that html is often not written as valid xml,
> | then escaping it is a safe bet ....
> 
> I see your point, but I warned you that we were in danger of pedantry. :-)

The last thing I want is for the xml to become a mess, so pedantry is good. 
Perhaps it will force me to keep these messages separate from the transport and
instead just place references within the document.

> 
>                                         Be seeing you,
>                                           norm
> 
> -- 
> Norman Walsh <ndw@nwalsh.com> | Do not seek to follow in the footsteps
> http://nwalsh.com/            | of men of old; seek what they
>                               | sought.--Matsuo Basho
-- 


From iron@mso.oz.net  Thu Jan 18 21:11:29 2001
From: iron@mso.oz.net (Mike Orr)
Date: Thu, 18 Jan 2001 13:11:29 -0800
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011909403002.00874@localhost.localdomain>; from matt@virtualspectator.com on Fri, Jan 19, 2001 at 09:17:46AM +1300
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011823353400.00859@localhost.localdomain> <20010118085431.A15316@mso.oz.net> <01011909403002.00874@localhost.localdomain>
Message-ID: <20010118131129.A17157@mso.oz.net>

On Fri, Jan 19, 2001 at 09:17:46AM +1300, matt wrote:
> > Perhaps then you'll want to consider
> > modifying one of the existing XML parser classes or the sgmllib parser
> > to fit your needs.
> 
> That would defeat my intention of using xml from the point of view that it is
> a standard.

The purpose of XML is to provide data interchange between diverse
applications.  If your application somehow produces a valid XML file,
that should be enough.  Of course, if your program may be expanded
later by XML programmers, you'll want something familiar enough they
can work with it.  But trying to contort your application to work with
the standard xml modules if they weren't designed for that job may not
be the answer.

> why couldn't the DOM
> document still represent it as a CDATA section internally?  

Do you necessarily need DOM?

-- 
-Mike (Iron) Orr, iron@mso.oz.net  (if mail problems: mso@jimpick.com)
   http://mso.oz.net/     English * Esperanto * Russkiy * Deutsch * Espan~ol


From jeremy.kloth@fourthought.com  Fri Jan 19 03:09:41 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Thu, 18 Jan 2001 20:09:41 -0700
Subject: [XML-SIG] Announcing PyXPath 1.2
References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de>
Message-ID: <3A67AFF5.F0895522@fourthought.com>

"Martin v. Loewis" wrote:
> module XPath{
> 
>   typedef wstring DOMString;
> 
>   const unsigned short ABSOLUTE_LOCATION_PATH = 1;
>   const unsigned short ABBREVIATED_ABSOLUTE_LOCATION_PATH = 2;
>   const unsigned short RELATIVE_LOCATION_PATH = 3;
>   const unsigned short ABBREVIATED_RELATIVE_LOCATION_PATH = 4;
>   const unsigned short STEP_EXPR = 5; // STEP would conflict with Step in case
>   const unsigned short NODE_TEST = 6;
>   const unsigned short NAME_TEST = 7;

>   const unsigned short BINARY_EXPR = 8;
Since there are two basic types of binary expressions, I suggest
splitting this into a BOOLEAN_EXPR and NUMERIC_EXPR.  They do offer
quite different functionality.

>   const unsigned short UNARY_EXPR = 9;
This would be considered a NUMERIC_EXPR.

>   const unsigned short PATH_EXPR = 10;
>   const unsigned short ABBREVIATED_PATH_EXPR = 11; // filter '//' path
>   const unsigned short FILTER_EXPR = 12;
>   const unsigned short VARIABLE_REFERENCE = 13;
>   const unsigned short LITERAL_EXPR = 14;
>   const unsigned short NUMBER_EXPR = 15;
>   const unsigned short FUNCTION_CALL = 16;
> 
>   interface Expr{
>     readonly attribute unsigned short exprType;
>   };
> 
>   interface AbsoluteLocationPath;
>   interface AbbreviatedAbsoluteLocationPath;
>   interface RelativeLocationPath;
>   interface Step;
>   interface AxisSpecifier;
>   interface NodeTest;
>   typedef sequence<Expr> PredicateList, ExprList;
>   interface NameTest;
>   interface BinaryExpr;
>   interface UnaryExpr;
>   interface UnionExpr;
>   interface PathExpr;
>   interface FilterExpr;
>   interface VariableReference;
>   interface Literal;
>   interface Number;
>   interface FunctionCall;
> 
>   interface ExprFactory{
>     AbsoluteLocationPath createAbsoluteLocationPath(in RelativeLocationPath p);
>     AbsoluteLocationPath createAbbreviatedAbsoluteLocationPath(in RelativeLocationPath p);
>     RelativeLocationPath createRelativeLocationPath(in RelativeLocationPath left,
>                                                     in Step right);
>     RelativeLocationPath createAbbreviatedRelativeLocationPath(in RelativeLocationPath left,
>                                                                in Step right);
> 
>     Step createStep(in AxisSpecifier axis, in NodeTest test, in PredicateList predicates);
>     // . is represented as self::node(); .. as parent::node()
>     Step createAbbreviatedStep(in boolean dotdot); // false for .; true for ..
>     // An omitted axisname is created as CHILD; @ is created as ATTRIBUTE
> 
>     AxisSpecifier createAxisSpecifier(in unsigned short name);
> 
>     NodeTest createNodeTest(in unsigned short type);
>     NameTest createNameTest(in DOMString prefix, in DOMString localName);
> 

>     BinaryExpr createBinaryExpr(in unsigned short operator, in Expr left, in Expr right);
> 
>     UnaryExpr createUnaryExpr(in Expr exp);
> 

See above for Binary and Unary expressions.

>     PathExpr createPathExpr(in Expr filter, in Expr path);
>     // filter '//' path
>     PathExpr createAbbreviatedPathExpr(in Expr filter, in Expr path);
> 
>     FilterExpr createFilterExpr(in Expr filter, in Expr predicate);
> 
>     // the name must still contain the leading $
>     VariableReference createVariableReference(in DOMString name);

The name can be a qualified name, so use prefix and localName instead.
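
A minimal sketch of what splitting the name would involve, assuming the
leading $ is still present as in the quoted comment. The helper name
split_qname is hypothetical and is not part of PyXPath or 4XPath:

```python
def split_qname(name):
    """Split '$prefix:local' or '$local' into (prefix, localName).

    Hypothetical helper; prefix is None for an unqualified name.
    """
    if name.startswith("$"):
        name = name[1:]
    prefix, sep, local = name.partition(":")
    if not sep:
        # No colon, so the whole string is the local name.
        return None, prefix
    return prefix, local
```

The same split would apply to the name argument of createFunctionCall.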

> 
>     Literal createLiteral(in DOMString literal);
>     Number createNumber(in DOMString value);
>     FunctionCall createFunctionCall(in DOMString name, in ExprList args);

See createVariableReference

>   };
> 
>   interface Parser{
>     Expr parseLocationPath(in DOMString path); // returns absolute or relative path, or step
>   };

This should probably be parseExpression, since the Expr is the primary
construct.  (See XPath spec - sect 1)

> 
>   interface AbsoluteLocationPath:Expr{
>     /* '/' relative-opt, or '//' relative */
>     readonly attribute Expr relative; // step or relative path

relative may be null  (case of '/')

>   };
> 
>   interface RelativeLocationPath:Expr{
>     readonly attribute Expr left; // step or relative path
>     readonly attribute Step right;
>   };
> 
>   interface Step:Expr{
>     readonly attribute AxisSpecifier axis;
>     readonly attribute NodeTest test;
>     readonly attribute PredicateList predicates;
>   };
> 
>   const unsigned short ANCESTOR = 1;
>   const unsigned short ANCESTOR_OR_SELF = 2;
>   const unsigned short _ATTRIBUTE = 3; // attribute is a keyword
>   const unsigned short CHILD = 4;
>   const unsigned short DESCENDANT = 5;
>   const unsigned short DESCENDANT_OR_SELF = 6;
>   const unsigned short FOLLOWING = 7;
>   const unsigned short FOLLOWING_SIBLING = 8;
>   const unsigned short NAMESPACE = 9;
>   const unsigned short PARENT = 10;
>   const unsigned short PRECEDING = 11;
>   const unsigned short PRECEDING_SIBLING = 12;
>   const unsigned short SELF = 13;

Maybe suffix the types with '_AXIS'?

>   interface AxisSpecifier:Expr{
>     readonly attribute unsigned short name;

Should we use axisType just for consistency?

>   };
> 
>   const unsigned short COMMENT = 1;
>   const unsigned short TEXT = 2;
>   const unsigned short PROCESSING_INSTRUCTION = 3;
>   const unsigned short NODE = 4;

suffix of '_NODE_TEST' ??

>   interface NodeTest:Expr{
>     readonly attribute unsigned short test;

testType ??

>     readonly attribute DOMString literal; // only for PROCESSING_INSTRUCTION
>   };
> 
>   interface NameTest:Expr{
>     readonly attribute DOMString prefix; // may be null
>     readonly attribute DOMString localName; // may be "*"
>   };
> 
>   const unsigned short BINOP_OR = 1;
>   const unsigned short BINOP_AND = 2;
>   const unsigned short BINOP_EQ = 3;
>   const unsigned short BINOP_NEQ = 4;
>   const unsigned short BINOP_LT = 5;
>   const unsigned short BINOP_GT = 6;
>   const unsigned short BINOP_LE = 7;
>   const unsigned short BINOP_GE = 8;
>   const unsigned short BINOP_PLUS = 9;
>   const unsigned short BINOP_MINUS = 10;
>   const unsigned short BINOP_TIMES = 11;
>   const unsigned short BINOP_DIV = 12;
>   const unsigned short BINOP_MOD = 13;
>   const unsigned short BINOP_UNION = 14;

possibly ??_OPERATOR as opposed to BINOP_??

>   interface BinaryExpr:Expr{
>     readonly attribute unsigned short operator;
>     readonly attribute Expr left,right;
>   };

> 
>     UnaryExpr createUnaryExpr(in Expr exp);
> 
See factory functions above.

>   interface PathExpr:Expr{
>     readonly attribute Expr filter;
>     readonly attribute Expr path;
>   };
> 
>   interface FilterExpr:Expr{
>     readonly attribute Expr filter;
>     readonly attribute Expr predicate;
>   };
> 
>   interface VariableReference:Expr{
>     readonly attribute DOMString name;
>   };
> 
>   interface Literal:Expr{
>     readonly attribute DOMString value;
>   };
> 
>   interface Number:Expr{
>     readonly attribute double value;
>   };
> 
>   interface FunctionCall:Expr{
>     readonly attribute DOMString name;
>     readonly attribute ExprList args;
>   };
> 
> };
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jeremy.kloth@fourthought.com  Fri Jan 19 03:18:28 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Thu, 18 Jan 2001 20:18:28 -0700
Subject: [XML-SIG] Announcing PyXPath 1.2
References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de>
Message-ID: <3A67B204.4CC0CABD@fourthought.com>

"Martin v. Loewis" wrote:
> 
> The API is IDL based, which is meant in the same way as in the DOM:
> there is a (yet to be specified) mapping to Python, which roughly
> works that way:
> - global constants are defined in the module xml.xpath.
> - DOMString means Unicode objects, although normal strings should
>   be accepted where possible.
> - attributes are accessed as attributes; _get_ accessor functions
>   are optional.

Should the constants be defined where they are used?

The expression types in the Expr interface,
axis specifier types in the AxisSpecifier interface,
node test types in the NodeTest interface,

This would be similar to node types in Node, filter types in NodeFilter.
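
For comparison, here is how the DOM constants are reached through
xml.dom.Node, next to a sketch of the same layout for XPath. Only
xml.dom.Node is real; the Expr and AxisSpecifier classes and the describe
helper are hypothetical stand-ins, with values copied from the quoted IDL:

```python
from xml.dom import Node

# Existing DOM practice: node type constants live on the Node class.
assert Node.ELEMENT_NODE == 1


class Expr:
    # Hypothetical: expression types scoped to the Expr interface.
    ABSOLUTE_LOCATION_PATH = 1
    RELATIVE_LOCATION_PATH = 3
    STEP_EXPR = 5


class AxisSpecifier(Expr):
    # Hypothetical: axis types scoped to the AxisSpecifier interface.
    ANCESTOR = 1
    ATTRIBUTE = 3
    CHILD = 4


def describe(expr_type):
    """Map an Expr type constant back to its name."""
    for name, value in vars(Expr).items():
        if name.isupper() and value == expr_type:
            return name
    raise ValueError(expr_type)
```

An application would then write Expr.RELATIVE_LOCATION_PATH, which only
works if it can import the class that carries the constant.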

A benefit from this would be helping to avoid circular imports.

xml.xpath -> (the parser) -> ExprFactory -> (constants from xml.xpath)
Sure imports could be done in the functions, but top level imports offer
some speed improvements.  Slight, but every little bit helps.

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Anthony Baxter <anthony@interlink.com.au>  Fri Jan 19 09:43:13 2001
From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter)
Date: Fri, 19 Jan 2001 09:43:13 +0000
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: Message from Norman Walsh <ndw@nwalsh.com>
 of "18 Jan 2001 23:54:38 +0700." <87ae8on6lt.fsf@nwalsh.com>
Message-ID: <200101190943.UAA04290@mbuna.arbhome.com.au>

>>> Norman Walsh wrote
> In the interest of technical accuracy, I'll point out that there's nothing
> that says a processor is not allowed to use CDATA to escape text. (It might
> be an interesting switch on a serializer: use CDATA for any text node that
> contains more than 5% entity references or something...)

That was something that occurred to me when reading this thread - aside
from the file size issue, it's also going to be faster to write out and
read in the documents. Ok, this is assuming a fairly odd slab of text,
but hey, look at the number of tags in your average web page today - 
including slabs of them as text is going to hurt.

The readability is surely only an issue if you're editing the XML
directly in vi, or whatever; I can't see an XML-aware editor leaving
the text as entities...

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From paulp@ActiveState.com  Fri Jan 19 04:01:59 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 18 Jan 2001 20:01:59 -0800
Subject: [XML-SIG] Re: adding the XML to 2.0 to be a mistake?
References: <mailman.979611383.15529.python-list@python.org> <3d8zocuqrd.fsf@kronos.cnri.reston.va.us>
Message-ID: <3A67BC37.213BAACB@ActiveState.com>

Andrew Kuchling wrote:
> 
> John Schmitt <jschmitt@vmlabs.com> writes:
> > Pardon the ignorance, but where is the mistake?  Is it in adding PyXML to
> > 2.0 or is it the way it was done?  Is there no development strategy that
> > makes this less of a burden?  If a previous release of PyXML had been added
> > to 2.0, would you still consider it a mistake?
> 
> Duplicating complex code in two different projects, so that they have
> to be kept in sync manually at the cost of time and effort, is the
> mistake.  

I agree with this. I don't think that minidom should have an existence
independent of Python. The PyXML minidom should be phased out. The only
reason it has not been is that some people still use it with older
versions of Python. But that will always be a problem when code is moved
from an "extension" environment to the standard library.

> Another one is tying a fast-moving project such as PyXML to
> the slower releases of Python; Python 2.0 was released on October 16,
> and there have been two PyXML releases (0.6.2 and 0.6.3) since then.

I don't know what you mean by saying that PyXML is "tied to Python."
PyXML depends on Python, just as PIL and NumPy do.

 Paul Prescod


From martin@mira.cs.tu-berlin.de  Fri Jan 19 09:04:13 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 19 Jan 2001 10:04:13 +0100
Subject: [XML-SIG] Announcing PyXPath 1.2
In-Reply-To: <3A67B204.4CC0CABD@fourthought.com> (message from Jeremy Kloth on
 Thu, 18 Jan 2001 20:18:28 -0700)
References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de> <3A67B204.4CC0CABD@fourthought.com>
Message-ID: <200101190904.f0J94DV00897@mira.informatik.hu-berlin.de>

Hi Jeremy,

Thanks for your comments. I'll study them in detail later.

> Should the constants be defined where they are used?

I think the DOM is proof that this is not desirable. If constants are
defined in an interface, applications have to know the names of the
interface implementation classes. In the case of the DOM, we just
solved this by providing xml.dom.Node in the package, which *just*
contains the node type constants. That, in turn, required renaming
4DOM's Node.py to FtNode.

> The expression types in the Expr interface,
> axis specifier types in the AxisSpecifier interface,
> node test types in the NodeTest interface,

To get a true separation of interface and implementation, the base
package would need to provide xml.xpath.Expr.RELATIVE_LOCATION_PATH -
how else are applications supposed to refer to these constants?

> A benefit from this would be helping to avoid circular imports.
> 
> xml.xpath -> (the parser) -> ExprFactory -> (constants from xml.xpath)
> Sure imports could be done in the functions, but top level imports offer
> some speed improvements.  Slight, but every little bit helps.

I don't see how it would remove circular imports: the constants would
still live in xml.xpath.__init__.py. Also, circular imports are not a
problem per se: __init__ just needs to guarantee that the constants
(and anything else provided to implementations) are defined before
anything originating from an implementation is imported.
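
The ordering rule can be tried out with a throwaway package. This is only
a sketch: the package name xp and its factory module are hypothetical and
are written to a temporary directory, they are not the real xml.xpath:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout:
#   xp/__init__.py  defines a constant, then imports the factory
#   xp/factory.py   imports the constant back from xp (circular)
PACKAGE = {
    "__init__.py": (
        "RELATIVE_LOCATION_PATH = 3\n"
        "from xp import factory  # the constant above is already bound\n"
    ),
    "factory.py": (
        "from xp import RELATIVE_LOCATION_PATH\n"
        "def create():\n"
        "    return RELATIVE_LOCATION_PATH\n"
    ),
}

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "xp"))
for filename, source in PACKAGE.items():
    with open(os.path.join(root, "xp", filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

# The circular import succeeds because the constant is defined first.
xp = importlib.import_module("xp")
result = xp.factory.create()
```

Swapping the two lines of the hypothetical __init__.py would make the
import in factory.py fail, since the constant would not exist yet.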

Perhaps it would be even better *not* to provide
xml.xpath.{parser|factory}, but to require the user to explicitly
specify the implementation to use:

from xml.xpath.FtFactory import factory
from xml.xpath.PyXPath import parser

or, with the "pick an arbitrary one" API

from xml.xpath.anyfactory import factory
from xml.xpath.anyparser import parser
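
One way such an "any" module could choose an implementation is to try a
list of candidates in order. This is a sketch under assumptions: pick_first
is a hypothetical helper, and the candidate names below are stand-ins
chosen so the example runs without any XPath package installed:

```python
import importlib


def pick_first(candidates):
    """Return the first importable module from a list of names."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


# Stand-ins: the first name does not exist, the second is in the stdlib.
parser_module = pick_first(["no_such_xpath_impl", "xml.dom.minidom"])
```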

Regards,
Martin


From larsga@garshol.priv.no  Fri Jan 19 09:15:48 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 19 Jan 2001 10:15:48 +0100
Subject: [XML-SIG] CDATA sections still not handled
In-Reply-To: <87ae8on6lt.fsf@nwalsh.com>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> 	<01011722360200.00860@localhost.localdomain> 	<87k87t8hrm.fsf@nwalsh.com> 	<01011823353400.00859@localhost.localdomain> <87ae8on6lt.fsf@nwalsh.com>
Message-ID: <m3y9w7aomz.fsf@lambda.garshol.priv.no>

* Norman Walsh
| 
| In the interest of technical accuracy, I'll point out that there's
| nothing that says a processor is not allowed to use CDATA to escape
| text. (It might be an interesting switch on a serializer: use CDATA
| for any text node that contains more than 5% entity references or
| something...)

I think giving serializers a switch similar to that used by the XSLT
serializers would be a good idea: a list of elements, the contents of
which will be wrapped in CDATA sections.
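
A rough sketch of such a switch, using xml.dom.minidom for the tree. The
serialize function and its cdata_elements parameter are hypothetical, and
the sketch handles only elements and character data (no attributes,
comments or processing instructions):

```python
from xml.dom import Node, minidom
from xml.sax.saxutils import escape


def serialize(node, cdata_elements=()):
    """Serialize elements and character data only (a sketch).

    Character data directly inside an element named in
    cdata_elements is written as a CDATA section; all other
    character data is escaped in the usual way.
    """
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        if node.parentNode.nodeName in cdata_elements:
            # "]]>" may not appear inside a CDATA section, so an
            # occurrence is split across two adjacent sections.
            safe = node.data.replace("]]>", "]]]]><![CDATA[>")
            return "<![CDATA[" + safe + "]]>"
        return escape(node.data)
    inner = "".join(serialize(child, cdata_elements)
                    for child in node.childNodes)
    return "<%s>%s</%s>" % (node.nodeName, inner, node.nodeName)


doc = minidom.parseString("<doc><code>a &lt; b</code><p>a &lt; b</p></doc>")
output = serialize(doc.documentElement, cdata_elements=["code"])
```

The choice is made per element name at output time, so the tree itself
does not need to remember how the character data was originally written.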

--Lars M.


From larsga@garshol.priv.no  Fri Jan 19 09:27:02 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 19 Jan 2001 10:27:02 +0100
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01011909403002.00874@localhost.localdomain>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011823353400.00859@localhost.localdomain> <20010118085431.A15316@mso.oz.net> <01011909403002.00874@localhost.localdomain>
Message-ID: <m3wvbrao49.fsf@lambda.garshol.priv.no>

* matt@virtualspectator.com
| 
| [...] since one gets CDATA begin and end events while parsing a
| document that contains CDATA section, then why couldn't the DOM
| document still represent it as a CDATA section internally?  

Because it would be a real pain, and would most likely break lots of
applications. If text nodes can suddenly be represented as both text
and cdata nodes, applications that only test for text nodes (and I
assume this is the majority) will be silently losing data.

Furthermore, the normalize method, which many applications use to
ensure that there are no adjacent text nodes in the DOM tree stops
working in the presence of cdata nodes, since these are not
normalized. 
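
The effect on normalize can be reproduced with the xml.dom.minidom that
ships with current Python, which is an assumption here since it may differ
from the DOM implementations discussed in this thread:

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.appendChild(doc.createElement("root"))
root.appendChild(doc.createTextNode("a"))
root.appendChild(doc.createTextNode("b"))
root.appendChild(doc.createCDATASection("c"))
root.appendChild(doc.createTextNode("d"))

root.normalize()

# Adjacent text nodes are merged, but the CDATA node is left alone
# and keeps the text on either side of it apart.
kinds = [(child.nodeName, child.data) for child in root.childNodes]
```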

| Furthermore, a parser such as expat will preserve the original form
| of the characters that have been escaped, and even convert them if
| they happened to be in entity references.  

What are you trying to say here?

| It seems to me that the handling of CDATA sits at the level of its
| base class which is a text node and that the CDATA sections are only
| used to say "don't validate the following, it is ALL character
| data"..

CDATA sections and ordinary 'text'[1] are just two ways to represent
the same thing, and applications should not care which of the two ways
have been used. The distinction between these two ways of representing
character data is information about how the document was put together,
as opposed to information about what is in the document. 

In other words, this issue is really the same as the issues 'white
space in tags is lost', 'I can't tell what character data came from
numeric character references' and so on.

I think your current way of handling it, to control what is
represented as CDATA in the serializer, is the correct way to do it.
One should consider very carefully before adding information of this
sort to the document tree (or event stream), because there is such an
unbelievably awful lot of it that it needs to be handled with the
greatest of care.

I have been thinking lately that it would be an interesting experiment
to make an XML parser with an interface specialized for representing
ALL the lexical information about a document. I guess this could be
done by passing along with every event the list of tokens that made up
that event.

--Lars M.

[1] Correct terminology is really to call it character data. Text, as
    defined by XML, is both markup and character data.


From akuchlin@mems-exchange.org  Fri Jan 19 18:31:26 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 19 Jan 2001 13:31:26 -0500
Subject: [XML-SIG] Re: adding the XML to 2.0 to be a mistake?
In-Reply-To: <3A67BC37.213BAACB@ActiveState.com>; from paulp@activestate.com on Thu, Jan 18, 2001 at 08:01:59PM -0800
References: <mailman.979611383.15529.python-list@python.org> <3d8zocuqrd.fsf@kronos.cnri.reston.va.us> <3A67BC37.213BAACB@ActiveState.com>
Message-ID: <20010119133126.A875@kronos.cnri.reston.va.us>

On Thu, Jan 18, 2001 at 08:01:59PM -0800, Paul Prescod wrote:
>I agree with this. I don't think that minidom should have an existence
>independent of Python. The PyXML minidom should be phased out. The only

That won't work, because the _xmlplus package overrides the xml/
package completely, and therefore has to keep copies of everything in
Python's package, so we're stuck with the duplication.

>I don't know what you mean by saying that PyXML is "tied to Python."
>PyXML depends on Python, just as PIL and NumPy do.

I should have been clearer and said that effectively its release
schedule is tied to Python.  

--amk


From matt@virtualspectator.com  Fri Jan 19 20:19:33 2001
From: matt@virtualspectator.com (matt)
Date: Sat, 20 Jan 2001 09:19:33 +1300
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <m3wvbrao49.fsf@lambda.garshol.priv.no>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011909403002.00874@localhost.localdomain> <m3wvbrao49.fsf@lambda.garshol.priv.no>
Message-ID: <01012009443202.00856@localhost.localdomain>

Sorry to keep this thread going, but now it's getting really interesting ....
and useful.


On Fri, 19 Jan 2001, Lars Marius Garshol wrote:
> * matt@virtualspectator.com
> | 
> | [...] since one gets CDATA begin and end events while parsing a
> | document that contains CDATA section, then why couldn't the DOM
> | document still represent it as a CDATA section internally?  
> 
> Because it would be a real pain, and would most likely break lots of
> applications. If text nodes can suddenly be represented as both text
> and cdata nodes, applications that only test for text nodes (and I
> assume this is the majority) will be silently losing data.

That would mean either the implementation of CDATA is wrong, or the way you
use it is.  Text nodes are base classes of CDATA, so a process that works on
text nodes will implicitly work on CDATA nodes .... which it does, fortunately.
Even if you try a type cast to assert this you should get a valid base class
pointer back .... not that Python on its face worries too much about that.
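
Whether a CDATA node gets picked up depends on how the application tests
for text. A small sketch with current xml.dom.minidom, where CDATASection
is a subclass of Text; the two helper functions are hypothetical:

```python
from xml.dom import Node, minidom

doc = minidom.Document()
root = doc.appendChild(doc.createElement("root"))
root.appendChild(doc.createTextNode("plain "))
root.appendChild(doc.createCDATASection("<raw>"))


def text_by_node_type(element):
    # Tests the nodeType constant only, so CDATA children are skipped.
    return "".join(c.data for c in element.childNodes
                   if c.nodeType == Node.TEXT_NODE)


def text_by_class(element):
    # Tests the class, so CDATASection (a Text subclass) is included.
    return "".join(c.data for c in element.childNodes
                   if isinstance(c, minidom.Text))
```

Code written in the first style is the case where data is silently lost;
code written in the second style behaves the way the inheritance argument
predicts.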

Otherwise I am confused as to what you mean.  It seems to me anyway that
everyone has been trying to make the argument that they are one and the same,
which they are in the interpretation sense.  A parser such as expat handles the
inheritance perfectly since for a CDATA section it will give you CDATA begin
and end events while passing the data itself into character data handlers.

I don't see things breaking anywhere.


> 
> Furthermore, the normalize method, which many applications use to
> ensure that there are no adjacent text nodes in the DOM tree stops
> working in the presence of cdata nodes, since these are not
> normalized. 

Perhaps the specification for normalize on a node's sub-tree is wrong, or, you
expect it to always give you a nice single replacement node.  I think it is
equally wrong to flatly remove all CDATA nodes without giving the user a handle
to keep them.  They serve a useful purpose, and it seems bizarre that the DOM
document builder just throws away the events that tell us we have come across a
CDATA node.  Perhaps it should sit at the level of normalize itself .... pass
an extra optional argument that translates CDATA nodes and therefore includes
them in the merge?


> 
> | Furthermore, a parser such as expat will preserve the original form
> | of the characters that have been escaped, and even convert them if
> | they happened to be in entity references.  
> 
> What are you trying to say here?

That it doesn't matter which way you represent any "hidden" markup, e.g. as
&lt; or as < within a CDATA section: expat will give '<' to the character
data handler.   Which is useful.
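
This can be checked directly against expat through the standard
xml.parsers.expat module. The sketch records the CDATA begin and end
events together with whatever reaches the character data handler:

```python
from xml.parsers import expat

events = []
parser = expat.ParserCreate()
parser.StartCdataSectionHandler = lambda: events.append("start-cdata")
parser.EndCdataSectionHandler = lambda: events.append("end-cdata")
parser.CharacterDataHandler = lambda data: events.append(data)

# The same character written once as an entity reference and once
# inside a CDATA section.
parser.Parse("<a>&lt;<![CDATA[<]]></a>", True)
```

Both spellings arrive at the character data handler as the same '<'; only
the surrounding begin and end events tell them apart.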

> 
> | It seems to me that the handling of CDATA sits at the level of its
> | base class which is a text node and that the CDATA sections are only
> | used to say "don't validate the following, it is ALL character
> | data"..
> 
> CDATA sections and ordinary 'text'[1] are just two ways to represent
> the same thing, and applications should not care which of the two ways
> have been used. The distinction between these two ways of representing
> character data is information about how the document was put together,
> as opposed to information about what is in the document. 
> 
> In other words, this issue is really the same as the issues 'white
> space in tags is lost', 'I can't tell what character data came from
> numeric character references' and so on.
> 
> I think your current way of handling it, to control what is
> represented as CDATA in the serializer, is the correct way to do it.
> One should consider very carefully before adding information of this
> sort to the document tree (or event stream), because there is such an
> unbelievably awful lot of it that it needs to be handled with the
> greatest of care.
> 

But when you build a CDATA section in a DOM document you get a CDATA section
object, which, I assume, should inherit from the Text node class.

> I have been thinking lately that it would be an interesting experiment
> to make an XML parser with an interface specialized for representing
> ALL the lexical information about a document. I guess this could be
> done by passing along with every event the list of tokens that made up
> that event.

What sort of representation?

> 
> --Lars M.
> 
> [1] Correct terminology is really to call it character data. Text, as
>     defined by XML, is both markup and character data.
>

yes .... but since Text nodes inherit character data I just left that alone
......


regards
Matt

 
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig


From dan.rolander@marriott.com  Sat Jan 20 04:39:03 2001
From: dan.rolander@marriott.com (Dan Rolander)
Date: Fri, 19 Jan 2001 23:39:03 -0500
Subject: [XML-SIG] Using Installer with PyXML
Message-ID: <016801c0829a$ea7bf2e0$11260340@yin>

David Bolen has been a tremendous help to me figuring out how to use Gordon
McMillan's Installer 20_3i to create standalone EXEs for Win32 with Python
2.0 and PyXML 0.6.3. We've discovered a couple of things though that I'd
like to point out and perhaps get some explanations on.

In order for Installer to properly discover the required PyXML files, we had
to rename the _xmlplus directory to xml and rename the core xml directory to
something else. According to David...

"The problem here has to be the way that the xml library tree is replacing
itself with the _xmlplus tree from the later PyXML distribution.  While
runtime re-assigns xml to _xmlplus in the __init__ for xml, the import
system used by the installation package can't track that, so it still looks
for the actual module tree it loaded from the Python distribution beneath
the name xml."

So the question is, will this adversely impact normal Python operation, and
is there a better way?

The other question I have is... Why are there two different pyexpat.pyd
files, one as part of the core 2.0 distribution (at only 25 kb) and the
other as part of the PyXML distribution in _xmlplus.parsers (at 124 kb)? I
haven't been able to get the large one to work using Installer, but the
small core file works fine. What is the difference?

Thanks to anybody who can help here,
Dan


From martin@mira.cs.tu-berlin.de  Sat Jan 20 09:55:54 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 20 Jan 2001 10:55:54 +0100
Subject: [XML-SIG] Using Installer with PyXML
In-Reply-To: <016801c0829a$ea7bf2e0$11260340@yin> (dan.rolander@marriott.com)
References: <016801c0829a$ea7bf2e0$11260340@yin>
Message-ID: <200101200955.f0K9tsL00802@mira.informatik.hu-berlin.de>

> "The problem here has to be the way that the xml library tree is replacing
> itself with the _xmlplus tree from the later PyXML distribution.  While
> runtime re-assigns xml to _xmlplus in the __init__ for xml, the import
> system used by the installation package can't track that, so it still looks
> for the actual module tree it loaded from the Python distribution beneath
> the name xml."

I'm not sure I understand the problem. Will the packager refuse (or
forget) to package the xml package, or will it, at runtime, fail to
load it?

If it manages to package both xml and _xmlplus: when loading xml, will
it execute xml/__init__.py? In there, there is an import of _xmlplus.
Will that succeed? If so, what happens to the lines

            import sys
            sys.modules[__name__] = _xmlplus

Will __name__ have a value of "xml"? Will the assignment succeed?

Now, suppose we do

from xml.sax import sax2exts

In normal Python, this will look for sys.modules["xml"] and start from
there. Are you saying the installer does not work that way, or that
even if it starts from there, it still can't figure out to load
_xmlplus.sax?
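
The replacement mechanism itself can be reproduced in isolation. In this
sketch the packages corexml and _plusxml are hypothetical stand-ins for
xml and _xmlplus, written to a temporary directory; it says nothing about
how Installer analyses imports:

```python
import importlib
import os
import sys
import tempfile

# corexml replaces itself with _plusxml when the latter is importable.
FILES = {
    "corexml/__init__.py": (
        "try:\n"
        "    import _plusxml\n"
        "except ImportError:\n"
        "    pass\n"
        "else:\n"
        "    import sys\n"
        "    sys.modules[__name__] = _plusxml\n"
    ),
    "_plusxml/__init__.py": "",
    "_plusxml/sax.py": "ORIGIN = '_plusxml'\n",
}

root = tempfile.mkdtemp()
for relpath, source in FILES.items():
    path = os.path.join(root, *relpath.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

# The interpreter hands back whatever sys.modules holds after the
# package's __init__ has run, and resolves submodules from there.
corexml = importlib.import_module("corexml")
sax = importlib.import_module("corexml.sax")
```

Note that there is no sax.py in the corexml directory on disk, so a tool
that resolves corexml.sax by scanning files rather than by running the
import would not find it.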

> So the question is, will this adversely impact normal Python operation, and
> is there a better way?

No, replacing the Python xml package completely with _xmlplus will
work just fine - except perhaps for the pyexpat difference.

> The other question I have is... Why are there two different pyexpat.pyd
> files, one as part of the core 2.0 distribution (at only 25 kb) and the
> other as part of the PyXML distribution in _xmlplus.parsers (at 124 kb)? I
> haven't been able to get the large one to work using Installer, but the
> small core file works fine. What is the difference?

There are two differences: the one from PyXML contains a number of bug
fixes which are not in Python 2. In addition, it contains a literal
copy of the expat libraries, so that the expat DLLs in the Python core
should not be needed anymore.

When you say "get the large one to work", what exactly have you tried,
and how exactly did it fail?

Regards,
Martin


From dan.rolander@marriott.com  Sat Jan 20 18:48:58 2001
From: dan.rolander@marriott.com (Dan Rolander)
Date: Sat, 20 Jan 2001 13:48:58 -0500
Subject: [XML-SIG] Re: Using Installer with PyXML
Message-ID: <02c401c08311$a66e8f00$11260340@yin>

Hi Martin,

Thanks for responding. Here are the specifics--

When I use a script with the statements:

    from xml.sax import saxexts, saxlib, saxutils

and

    parser = saxexts.make_parser("xml.sax.drivers.drv_pyexpat")

the packager (Gordon McMillan's Installer) is able to find xml.sax.saxutils,
but is not able to find xml.sax.saxexts or xml.sax.saxlib which actually
reside in _xmlplus.sax.  I can force builder.py to include the entire
_xmlplus tree by adding a packages=_xmlplus line to the [APPZLIB] section of
the .cfg file, but the exe still fails because it is looking for xml.sax.*:

    ImportError: cannot import name xml.sax.saxexts

When I rename _xmlplus to xml and then run builder again without specifying
any additional packages, the EXE fails because it can't find an available
parser:

  File "c:\program files\python20\_xmlplus\sax\saxexts.py", line 77, in
make_parser
    xml.sax._exceptions.SAXReaderNotAvailable: No parsers found

If I manually import the entire PyXML tree (now named 'xml') by adding a
packages=xml line to the .cfg file, I get a little farther but now the exe
isn't able to find pyexpat.

    ImportError: cannot import name xml.parsers.pyexpat

I then try to manually import pyexpat by adding xml.parsers.pyexpat to the
misc line in the [MYCOLLECT] section, but finder.py is not able to find it:

      File "D:\DOCUME~1\Dan\Software\Python\INSTAL~1\MEInc\Dist\finder.py",
line 121, in identify
      ValueError: xml.parsers.pyexpat.pyd not found

If I change the .cfg line in [MYCOLLECT] to misc=pyexpat.pyd then the core
\DLLs version of pyexpat.pyd is found and put into the dist directory. Now
when the exe is run I get a Windows error stating that the xmlparse.dll
couldn't be located.

I add xmlparse.dll to the misc= line and then I get an error stating that
the xmltok.dll couldn't be found.

I add xmltok.dll to the misc= line and voila! it works!

I then start to wonder why the exe couldn't find xml.parsers.pyexpat.pyd if
I imported the entire xml tree. I study the builder.log some more and
realize that it only imported .py files and not .pyd files! I tried using
directories= instead of packages= and got the same results.

I re-read Gordon's documentation several times and tried different
combinations of .cfg statements but nothing I tried resulted in a good
import of xml.parsers.pyexpat.

I then replaced the core version of pyexpat.pyd in \DLLs with the PyXML
version and found that I could build a good exe without having to manually
include the xmlparse.dll and xmltok.dll. So my final .cfg file looks like
this:

[MYCOLLECT]
type= COLLECT
name= dist_testsax
bindepends= testsax.py
misc= MYSTANDALONE, pyexpat.pyd
debug = 0
excludes = PyWinTypes20.dll, win32api

[MYSTANDALONE]
type= STANDALONE
name= testsax.exe
script= testsax.py
zlib = APPZLIB
userunw = 0
support = 0
debug = 0

[APPZLIB]
name= testsax.pyz
dependencies= testsax.py
excludes= dospath, posixpath, macpath
directories=xml


Now, for another example...

Another test script has the statement:

    from xml.parsers import pyexpat

and

    parser = pyexpat.ParserCreate()

I start with the _xmlplus directory renamed to xml, because I know that's
necessary, and I build a new standalone installation. This time the pyexpat
file is imported to the dist directory as xml.parsers.pyexpat.pyd but the
exe won't import it:

    ImportError: cannot import name xml.parsers.pyexpat

Renaming the file to pyexpat.pyd does not help.

I add packages=xml to the .cfg file and I still have the same problem.

The only fix I can figure out is to change the import statement to:

    import pyexpat

and that works.


So in summary, my tests lead me to conclude the following...

To use Gordon McMillan's Installer to create standalone executables of
scripts that import modules from the PyXML package, the following must be
done (depending on what modules are actually being used):

1.  Replace the core xml directory with the _xmlplus directory, by renaming
_xmlplus to xml.

2.  Copy the PyXML pyexpat.pyd file from the xml.parsers directory to the
<python_root>\DLLs directory.

3.  If pyexpat is needed, either explicitly import it in your script, or
manually include it in the standalone installation by adding an entry to the
misc line in the COLLECT section of the builder .cfg file.

4.  If importing from xml.sax, manually import the entire PyXML tree (source
files only) by specifying either packages=xml or directories=xml in the PYZ
section of the builder .cfg file.

(I have not even tried using DOM yet, so I'm sure there are more issues
there to be found.)

I am by no means an expert on this, so if anybody understands this better
and can provide simpler workarounds I would appreciate hearing it.

Thanks, and I hope this helps someone!
Dan

----- Original Message -----
From: "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
To: <dan.rolander@marriott.com>
Cc: <gmcm@hypernet.com>; <python-list@python.org>; <xml-sig@python.org>
Sent: Saturday, January 20, 2001 4:55 AM
Subject: Re: [XML-SIG] Using Installer with PyXML


> > "The problem here has to be the way that the xml library tree is replacing
> > itself with the _xmlplus tree from the later PyXML distribution.  While
> > runtime re-assigns xml to _xmlplus in the __init__ for xml, the import
> > system used by the installation package can't track that, so it still looks
> > for the actual module tree it loaded from the Python distribution beneath
> > the name xml."
>
> I'm not sure I understand the problem. Will the packager refuse (or
> forget) to package the xml package, or will it, at runtime, fail to
> load it?
>
> If it manages to package both xml and _xmlplus: when loading xml, will
> it execute xml/__init__.py? In there, there is an import of _xmlplus.
> Will that succeed? If so, what happens to the lines
>
>             import sys
>             sys.modules[__name__] = _xmlplus
>
> Will __name__ have a value of "xml"? Will the assignment succeed?
>
> Now, suppose we do
>
> from xml.sax import sax2exts
>
> In normal Python, this will look for sys.modules["xml"] and start from
> there. Are you saying the installer does not work that way, or that
> even if it starts from there, it still can't figure out to load
> _xmlplus.sax?
>
> > So the question is, will this adversely impact normal Python operation, and
> > is there a better way?
>
> No, replacing the Python xml package completely with _xmlplus will
> work just fine - except perhaps for the pyexpat difference.
>
> > The other question I have is... Why are there two different pyexpat.pyd
> > files, one as part of the core 2.0 distribution (at only 25 kb) and the
> > other as part of the PyXML distribution in _xmlplus.parsers (at 124 kb). I
> > haven't been able to get the large one to work using Installer, but the
> > small core file works fine. What is the difference?
>
> There are two differences: the one from PyXML contains a number of bug
> fixes which are not in Python 2. In addition, it contains a literal
> copy of the expat libraries, so that the expat DLLs in the Python core
> should not be needed anymore.
>
> When you say "get the large one to work", what exactly have you tried,
> and how exactly did it fail?
>
> Regards,
> Martin
>


From martin@mira.cs.tu-berlin.de  Sat Jan 20 22:25:47 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 20 Jan 2001 23:25:47 +0100
Subject: [XML-SIG] Re: Using Installer with PyXML
In-Reply-To: <02c401c08311$a66e8f00$11260340@yin> (dan.rolander@marriott.com)
References: <02c401c08311$a66e8f00$11260340@yin>
Message-ID: <200101202225.f0KMPl200861@mira.informatik.hu-berlin.de>

>     from xml.sax import saxexts, saxlib, saxutils
[...]
> the packager (Gordon McMillan's Installer) is able to find xml.sax.saxutils,
> but is not able to find xml.sax.saxexts or xml.sax.saxlib which actually
> reside in _xmlplus.sax.  

I was going to claim this to be a bug in the installer, but it now
rather seems like an operator error: The installer has no way of
knowing that it ought to load the _xmlplus.sax.saxexts into the
distribution, since there is no import statement for it.

So announcing the full _xmlplus package to it is the right thing to
do.
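
The runtime replacement that hides _xmlplus from a packager can be
sketched roughly as follows. This is only an illustration in current
Python: the module names demo_core and _demo_plus are hypothetical
stand-ins for xml and _xmlplus, and the real xml/__init__.py may differ
in detail.

```python
import sys
import types

# Hypothetical stand-in for the add-on package (_xmlplus in the thread).
plus = types.ModuleType("_demo_plus")
plus.marker = "replacement"
sys.modules["_demo_plus"] = plus

# Hypothetical stand-in for the core package (xml in the thread).
core = types.ModuleType("demo_core")
core.marker = "original"
sys.modules["demo_core"] = core

# In effect what the quoted lines do: rebind the core name to the add-on.
sys.modules["demo_core"] = plus

# A later import consults sys.modules first and receives the add-on module.
import demo_core

print(demo_core.marker)
```

Nothing in the importing script names _demo_plus, which is why a tool
that only reads import statements cannot know the add-on tree is needed.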

> I can force builder.py to include the entire
> _xmlplus tree by adding a packages=_xmlplus line to the [APPZLIB] section of
> the .cfg file, but the exe still fails because it is looking for xml.sax.*:
> 
>     ImportError: cannot import name xml.sax.saxexts

It's not clear what is causing that. It could be a bug in the
installer, or it could be the distribution contains no pyexpat.pyd. In
that case, you'll have to explicitly request inclusion of pyexpat.pyd.

It would be good to check what files are actually included.

> When I rename _xmlplus to xml and then run builder again without specifying
> any additional packages, the EXE fails because it can't find an available
> parser:
> 
>   File "c:\program files\python20\_xmlplus\sax\saxexts.py", line 77, in make_parser
>     xml.sax._exceptions.SAXReaderNotAvailable: No parsers found

No surprise. The installer is looking at import statements, but there
are no import statements for xml.sax.drivers.*; instead, they are
imported by calling __import__ for a computed string. So again, that
is an operator error: everything imported "by magic" must be announced
explicitly to such a packager.
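
A small sketch of why a scanner of import statements misses such
modules. It uses the ast module from current Python rather than
anything from Installer itself, and the driver name in the sample
source is only illustrative.

```python
import ast

# Sample script text; it is parsed, never executed.
SOURCE = '''
import os
from xml.sax import saxutils
driver = __import__("xml.sax.drivers." + "drv_pyexpat")
'''


def static_imports(source):
    """Return the module names visible in import statements only."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.update(node.module + "." + alias.name for alias in node.names)
    return found


names = static_imports(SOURCE)
print(sorted(names))
```

The computed string passed to __import__ never shows up in the result,
so the driver has to be listed by hand in the packager's configuration.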

> If I manually import the entire PyXML tree (now named 'xml') by adding a
> packages=xml line to the .cfg file, I get a little farther but now the exe
> isn't able to find pyexpat.
> 
>     ImportError: cannot import name xml.parsers.pyexpat
> 
> I then try to manually import pyexpat by adding xml.parsers.pyexpat to the
> misc line in the [MYCOLLECT] section, but finder.py is not able to find it:
> 
>       File "D:\DOCUME~1\Dan\Software\Python\INSTAL~1\MEInc\Dist\finder.py", line 121, in identify
>       ValueError: xml.parsers.pyexpat.pyd not found

You did not say *how* you specified it - it might be that Installer
mistook your command as trying to import a module named "pyd" from a
package named "pyexpat" - that is not available.

> If I changed the .cfg line in [MYCOLLECT] to misc=pyexpat.pyd then the core
> \DLLs version of pyexpat.pyd is found and put into the dist directory. Now
> when the exe is run I get a Windows error stating that the xmlparse.dll
> couldn't be located.
> 
> I add xmlparse.dll to the misc= line and then I get an error stating that
> the xmltok.dll couldn't be found.
> 
> I add xmltok.dll to the misc= line and voila! it works!

When you use the pyexpat from PyXML, the difference should be that
xmlparse.dll and xmltok.dll are not required.

> I then replaced the core version of pyexpat.pyd in \DLLs with the
> PyXML version and found that I could build a good exe without having
> to manually include the xmlparse.dll and xmltok.dll.

Not only do you not need to include them manually - they are not
needed at all. Care to write a small howto document for the XML topic
guide?

> I start with the _xmlplus directory renamed to xml, because I know that's
> necessary, and I build a new standalone installation. This time the pyexpat
> file is imported to the dist directory as xml.parsers.pyexpat.pyd but the
> exe won't import it:
> 
>     ImportError: cannot import name xml.parsers.pyexpat

Do you have a traceback for that? All applications should import
xml.parsers.expat, which should have 

from pyexpat import *

so there should be no request to load xml.parsers.pyexpat. Older PyXML
versions had such code, but it should have been wrapped in a try block
that catches ImportError and then falls back to loading pyexpat
unqualified.
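
The fallback described here might look roughly like this. It is a
sketch, not the actual PyXML source; the variable origin exists only to
show which branch was taken.

```python
try:
    # Package-local build (the PyXML location discussed in this thread).
    from xml.parsers import pyexpat
    origin = "xml.parsers.pyexpat"
except ImportError:
    # Fall back to the top-level extension module shipped with Python.
    import pyexpat
    origin = "pyexpat"

parser = pyexpat.ParserCreate()
print(origin, type(parser).__name__)
```

Application code that writes "from xml.parsers import expat" gets this
choice made for it and does not need to repeat the try/except.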

> The only fix I can figure out is to change the import statement to:
> 
>     import pyexpat
> 
> and that works.

As I said, the real solution is to write 

  from xml.parsers import expat

or, if you need to keep the pyexpat name,

 from xml.parsers import expat as pyexpat

> 1.  Replace the core xml directory with the _xmlplus directory, by renaming
> _xmlplus to xml.

I'm not entirely sure *why* this is needed, but it certainly can't hurt.

> 2.  Copy the PyXML pyexpat.pyd file from the xml.parsers directory to the
> <python_root>\DLLs directory.

That is a good idea, yes.

> 3.  If pyexpat is needed, either explicitly import it in your script, or
> manually include it in the standalone installation by adding an entry to the
> misc line in the COLLECT section of the builder .cfg file.

pyexpat should always be included in PyXML applications, so that is
also fine.

> 4.  If importing from xml.sax, manually import the entire PyXML tree (source
> files only) by specifying either packages=xml or directories=xml in the PYZ
> section of the builder .cfg file.

I would guess the same applies when importing DOM stuff - the DOM
readers also use make_parser at some point.

Regards,
Martin


From dan.rolander@marriott.com  Sat Jan 20 23:25:21 2001
From: dan.rolander@marriott.com (Dan Rolander)
Date: Sat, 20 Jan 2001 18:25:21 -0500
Subject: [XML-SIG] Re: Using Installer with PyXML
References: <02c401c08311$a66e8f00$11260340@yin> <200101202225.f0KMPl200861@mira.informatik.hu-berlin.de>
Message-ID: <037201c08338$42ce08a0$11260340@yin>

Thank you for your assistance Martin.  Although your analysis of the problem
("operator error") is close, I would probably more correctly identify it as
operator ignorance.  I'm still trying to figure out how to effectively use
PyXML and build standalone executables with it.  Since the Installer seems
to play by its own rules when it comes to imports, it is especially
challenging.

I have not found the xml-sig documentation, or the python library reference,
to be too helpful for someone new to xml processing, so I bought Sean
McGrath's book "XML Processing with Python" and have found that to be *very*
helpful. But his examples, which I was testing, use references to pyexpat.

I tested your suggestion of using "from xml.parsers import expat" vs.
"import pyexpat" and that works fine, but I'm not sure what the benefit of
using that form is.

I haven't quite grok'd all of this yet, but once I do I would have no
problem with writing a mini-howto.

Thanks again,
Dan


From ole@discus.anu.edu.au  Sun Jan 21 08:28:03 2001
From: ole@discus.anu.edu.au (Ole NIELSEN)
Date: Sun, 21 Jan 2001 19:28:03 +1100 (EST)
Subject: [XML-SIG] Problem Installing PyXML
Message-ID: <Pine.GSO.3.93.1010121192423.15368A-100000@discus>

Dear xml-sig specialist

I have tried to install PyXML on two different machines with the following
problem: "ImportError: cannot import name Extension"
We have Python 1.5.2 installed. I have enclosed a transcript of the
installation below.

Would upgrading to a newer version of Python solve the problem ?

Thank you very much in advance.

Ole Nielsen

TRANSCRIPT:
-------------------------------------------------
capricorn: ole/PyXML-0.6.3/:python setup.py build
Traceback (innermost last):
  File "setup.py", line 8, in ?
    from distutils.core import setup, Extension
ImportError: cannot import name Extension


capricorn: ole/PyXML-0.6.3/:python
Python 1.5.2 (#2, Aug 16 2000, 09:31:06)  [GCC 2.95.1 19990816 (release)] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
-------------------------------------------------------------------------


-------------------------------------------------------------------  
Ole Moller Nielsen             | Email: Ole.Nielsen@anu.edu.au
Computer Sciences Lab, RSISE,  |-----------------------------------
Australian National University | Phone: +61 2 6125 8627 (Direct)
Canberra ACT 0200              | Phone: +61 2 6125 8644 (Secr.)
Australia                      | Fax:   +61 2 6125 8645/8651
-------------------------------------------------------------------
URL:       www.bigfoot.com/~uniomni
-------------------------------------------------------------------


From martin@mira.cs.tu-berlin.de  Sun Jan 21 08:58:40 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 21 Jan 2001 09:58:40 +0100
Subject: [XML-SIG] Problem Installing PyXML
In-Reply-To: <Pine.GSO.3.93.1010121192423.15368A-100000@discus> (message from
 Ole NIELSEN on Sun, 21 Jan 2001 19:28:03 +1100 (EST))
References: <Pine.GSO.3.93.1010121192423.15368A-100000@discus>
Message-ID: <200101210858.f0L8we501034@mira.informatik.hu-berlin.de>

> I have tried to install PyXML on two different machines with the following
> problem: "ImportError: cannot import name Extension"
> We have Python 1.5.2 installed. I have enclosed a transcript of the
> installation below.
> 
> Would upgrading to a newer version of Python solve the problem ?

That, or upgrading to distutils 1.0 (which is probably easier to
achieve).

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Sun Jan 21 08:56:58 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 21 Jan 2001 09:56:58 +0100
Subject: [XML-SIG] Re: Using Installer with PyXML
In-Reply-To: <037201c08338$42ce08a0$11260340@yin> (dan.rolander@marriott.com)
References: <02c401c08311$a66e8f00$11260340@yin> <200101202225.f0KMPl200861@mira.informatik.hu-berlin.de> <037201c08338$42ce08a0$11260340@yin>
Message-ID: <200101210856.f0L8uwj01030@mira.informatik.hu-berlin.de>

> I have not found the xml-sig documentation, or the python library
> reference, to be too helpful for someone new to xml processing, so I
> bought Sean McGrath's book "XML Processing with Python" and have
> found that to be *very* helpful.

I'm glad to hear this. The PyXML documentation is certainly not
targeted at people new to XML at all; it is mostly for people who
know XML and want to learn about XML processing in Python.

BTW, did you have a look at the PyXML tutorial as well?

> But his examples, which I was testing, use references to pyexpat.

Not surprising; the wrapper module was created just before the Python
2.0 release.

> I tested your suggestion of using "from xml.parsers import expat"
> vs.  "import pyexpat" and that works fine, but I'm not sure what the
> benefit of using that form is.

To be independent of the location of the pyexpat module. If you say
"import pyexpat", and you use PyXML, you still won't get the PyXML
version of that module - this lives in xml.parsers.pyexpat.

Regards,
Martin


From larsga@garshol.priv.no  Mon Jan 22 09:27:06 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 22 Jan 2001 10:27:06 +0100
Subject: newthread 1) Re: [XML-SIG] CDATA sections still not handled
In-Reply-To: <01012009443202.00856@localhost.localdomain>
References: <E14FeHE-0004Ms-00@usw-sf-web2.sourceforge.net> <01011909403002.00874@localhost.localdomain> <m3wvbrao49.fsf@lambda.garshol.priv.no> <01012009443202.00856@localhost.localdomain>
Message-ID: <m3itn8ym1h.fsf@lambda.garshol.priv.no>

* Lars Marius Garshol
|
| Because it would be a real pain, and would most likely break lots of
| applications. If text nodes can suddenly be represented as both text
| and cdata nodes, applications that only test for text nodes (and I
| assume this is the majority) will be silently losing data.

* matt@virtualspectator.com
| 
| That would make either the implementation of CDATA wrong, or the way
| you use it. 

Well, both, actually. I think the way CDATA is handled by the DOM is
wrong, in that it pushes lexical information[1] into your face and
forces you to deal with it when in 99% of the cases you do not care at
all.

SAX and expat handle this much better, by telling you about the CDATA
without forcing you to care.  xmllib gets it very wrong.

And since no Python DOMs currently create CDATA nodes and it requires
some extra thought to handle I suspect that the great majority of DOM
applications have no code to handle CDATA nodes appearing instead of
Text nodes.

| Text nodes are base classes of CDATA, so a process that works on text
| nodes will implicitly work on CDATA nodes .... 

Nope, because in most cases you will test for the type of node through
the nodeType attribute, and that has different values for CDATA and
Text.

You can also test via the isinstance function, but that would tie your
application to a specific implementation and would be a very bad idea.
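
A minimal sketch of the data loss in question, using xml.dom.minidom
from current Python and a tree built by hand. The helper collect() is
hypothetical and only for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import Document

# Build <doc>plain <![CDATA[<cdata>]]></doc> by hand.
doc = Document()
root = doc.createElement("doc")
doc.appendChild(root)
root.appendChild(doc.createTextNode("plain "))
root.appendChild(doc.createCDATASection("<cdata>"))


def collect(node, node_types):
    """Join the data of direct children whose nodeType is in node_types."""
    return "".join(
        child.data for child in node.childNodes if child.nodeType in node_types
    )


# Testing only for text nodes silently drops the CDATA content.
text_only = collect(root, (Node.TEXT_NODE,))
# Testing for both node types recovers everything.
both = collect(root, (Node.TEXT_NODE, Node.CDATA_SECTION_NODE))

print(repr(text_only), repr(both))
```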

| Even if you try a type cast to assert this you should get a valid
| base class pointer back .... not that python on its face worries
| too much about that.

:-)
 
| Otherwise I am confused as to what you mean.  It seems to me anyway
| that everyone has been trying to make the argument that they are one
| and the same, which they are in the interpretation sense.

Exactly.

| A parser such as expat handles the inheritance perfectly since for a
| CDATA section it will give you CDATA begin and end events while
| passing the data itself into character data handlers.

This is the way to handle it, yes.
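
For illustration, the expat behaviour can be observed with the
xml.parsers.expat wrapper in current Python. The events list and the
sample document are assumptions made for this sketch.

```python
from xml.parsers import expat

events = []

parser = expat.ParserCreate()
# Coalesce adjacent character data into a single callback where possible.
parser.buffer_text = True
parser.StartCdataSectionHandler = lambda: events.append(("cdata-start",))
parser.EndCdataSectionHandler = lambda: events.append(("cdata-end",))
parser.CharacterDataHandler = lambda data: events.append(("chars", data))

parser.Parse("<doc>A<![CDATA[ <wheee!> ]]>B</doc>", True)

# Character data arrives through one handler, CDATA or not.
text = "".join(event[1] for event in events if event[0] == "chars")
print(events)
print(repr(text))
```

An application that does not care about the CDATA boundary simply
leaves the two CDATA handlers unset and still receives all the text.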
 
| I don't see things breaking anywhere.

Not with expat, but with the DOM and xmllib chances are that
applications written by people who are not fully into XML and the API
they are using will break when CDATA starts appearing.
 
* Lars Marius Garshol
|
| Furthermore, the normalize method, which many applications use to
| ensure that there are no adjacent text nodes in the DOM tree stops
| working in the presence of cdata nodes, since these are not
| normalized. 
 
* matt@virtualspectator.com
|
| Perhaps the specification for normalize on a node's sub-tree is
| wrong, or, you expect it to always give you a nice single
| replacement node.  I think it is equally wrong to flatly remove all
| CDATA nodes without giving the user a handle to keep them.  

Well, I think the whole cake should have been cut up differently.
Text nodes should have a method isCDATA that could be used to check
whether it originally was a CDATA section or not. (Note that this
requires CDATA sections to give rise to separate DOM nodes, but they
tend to do that anyway.)

Normalize would then collapse both text and CDATA, which IMHO is the
only reasonable behaviour for it anyway. It is only useful to simplify
traversal of the tree, but it doesn't achieve that if CDATA nodes are
not normalized.
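
The effect on normalize() can be seen with a hand-built tree in
xml.dom.minidom as shipped with current Python; other DOM
implementations may behave differently.

```python
from xml.dom import Node
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("doc")
doc.appendChild(root)

# Two adjacent text nodes, then a CDATA section, then another text node.
root.appendChild(doc.createTextNode("a"))
root.appendChild(doc.createTextNode("b"))
root.appendChild(doc.createCDATASection("c"))
root.appendChild(doc.createTextNode("d"))

before = len(root.childNodes)
root.normalize()
kinds = [child.nodeType for child in root.childNodes]

# The two leading text nodes merge; the CDATA section still splits the run.
print(before, kinds, repr(root.firstChild.data))
```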

Any user that cares about the CDATA/text distinction will then have to
do without normalize(), but I doubt that they will care much, and in
any case they are a very small minority.

| They serve a useful purpose, and it seems bizarre that the DOM
| document builder just throws away the events that tell us we have
| come across a CDATA node.  Perhaps it should sit at the level of
| normalize itself .... pass an extra optional argument that
| translates CDATA nodes and therefore includes them in the merge?

That is an option, but I don't really like it.  If you keep the CDATA
interface you should really have HexNumericCharacterReference and
DecimalNumericCharacterReference interfaces as well.
 
| That it doesn't matter which way you represent any "hidden" markup
| eg as &lt; or as < within a CDATA section, expat will give '<' to
| the character data handler.  Which is useful.

Uh, no, it's actually wrong, and that's probably why expat doesn't do
it either. :-)
 
* Lars Marius Garshol
|
| I have been thinking lately that it would be an interesting experiment
| to make an XML parser with an interface specialized for representing
| ALL the lexical information about a document. I guess this could be
| done by passing along with every event the list of tokens that made up
| that event.
 
* matt@virtualspectator.com
|
| What sort of representation?

Well, say that you have an event-based interface like SAX, pyexpat or
something else, and that the event for character data is

  character_data(data, raw)

where raw is a list of tokens.  So for the document

  <doc>
  &#65;
  <![CDATA[ wheee! ]]>
  Testing testing.
  </doc>

you get these calls

  character_data('\012  ', ['\012  '])
  character_data('A', ['&#65;'])
  character_data(' wheee! ', ['<!CDATA...'])
  character_data('\012  Test...', ['\012  Test...'])

while for start tags you might get something like

  start_element('doc', {...}, ['<doc', ' ', 'version', '=', '"1.0"', '>'])

--Lars M.


From jerome.marant@free.fr  Mon Jan 22 10:03:02 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 22 Jan 2001 11:03:02 +0100
Subject: [XML-SIG] Problematic use of setupext
Message-ID: <7z4ryr52g9.fsf@amboise.ird.idealx.com>

Hi,

  I'm trying to update the package to the new 0.6.3 version for Debian,
  and it seems that setupext is problematic.

  Whenever I use `python setup.py clean --all', .pyc files are generated
  (__init__.pyc and install_data.pyc). What I would like is to get rid of
  all .pyc and .pyo files in the package, but because of this it seems to be
  impossible. So I'll have to remove them by hand.

  Could anyone explain ?

  Thanks.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From akuchlin@mems-exchange.org  Mon Jan 22 19:46:03 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 22 Jan 2001 14:46:03 -0500
Subject: [XML-SIG] Note on XML for 2.1
Message-ID: <E14Kmuh-0001fD-00@ute.cnri.reston.va.us>

I'm working on a "What's New in 2.1" article, and want to add
a mention of the improvements to the xml package.  Here's my proposed
text; is it accurate?  

\item The PyXML package has gone through a few releases since Python
2.0, and Python 2.1 includes an updated version of the \module{xml}
package.  Some of the noteworthy changes include support for Expat
1.2, the ability for Expat parsers to handle files in any encoding
supported by Python, and various bugfixes for SAX, DOM, and the
\module{minidom} module.

--amk


From martin@mira.cs.tu-berlin.de  Mon Jan 22 22:35:59 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 22 Jan 2001 23:35:59 +0100
Subject: [XML-SIG] Problematic use of setupext
In-Reply-To: <7z4ryr52g9.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <7z4ryr52g9.fsf@amboise.ird.idealx.com>
Message-ID: <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de>

>   Whenever I use `python setup.py clean --all', .pyc files are generated
>   (__init__.pyc and install_data.pyc). What I would like is to get rid of
>   all .pyc and .pyo files in the package, but because of this it seems to be
>   impossible. So I'll have to remove them by hand.
> 
>   Could anyone explain ?

It's not that difficult to explain: setup.py does a straight import of
setupext, which results in pyc files being generated.
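
One possible approach, sketched under the assumption of a later Python
release that provides sys.dont_write_bytecode (it is not available in
the versions discussed in this thread). The module demo_setupext is a
hypothetical stand-in for setupext.

```python
import importlib
import os
import sys
import tempfile

# Ask the interpreter not to write compiled files for subsequent imports.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical helper module standing in for setupext.
    with open(os.path.join(tmp, "demo_setupext.py"), "w") as handle:
        handle.write("VALUE = 1\n")

    sys.path.insert(0, tmp)
    try:
        module = importlib.import_module("demo_setupext")
    finally:
        sys.path.remove(tmp)

    # Anything besides the source file would be a compiled leftover.
    leftovers = [name for name in os.listdir(tmp) if name != "demo_setupext.py"]

print(module.VALUE, leftovers)
```

In a setup.py the flag would have to be set before the import of the
helper module for it to have any effect.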

If you think you can fix this: patches are welcome.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Mon Jan 22 22:57:59 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 22 Jan 2001 23:57:59 +0100
Subject: [XML-SIG] Note on XML for 2.1
In-Reply-To: <E14Kmuh-0001fD-00@ute.cnri.reston.va.us> (message from Andrew
 Kuchling on Mon, 22 Jan 2001 14:46:03 -0500)
References: <E14Kmuh-0001fD-00@ute.cnri.reston.va.us>
Message-ID: <200101222257.f0MMvxh01639@mira.informatik.hu-berlin.de>

> I'm working on a "What's New in 2.1" article, and want to add
> a mention of the improvements to the xml package.  Here's my proposed
> text; is it accurate?  

It certainly is.

Thanks,
Martin


From 935551@ican.net  Tue Jan 23 07:52:23 2001
From: 935551@ican.net (Richard Anthony Hein)
Date: Tue, 23 Jan 2001 02:52:23 -0500
Subject: [XML-SIG] Newbie confused by output ...
Message-ID: <000201c08511$b250ec80$0100a8c0@k6>

Hi everyone,

I am new to Python and am trying to get a hang of the XML libraries
available.  I am having trouble finding tutorials and documentation.

When I finally found some documentation at
http://velocity.activestate.com/docs/ActivePython/lib/expat-example.html, I
tried the example for expat (actually used pyexpat and expat), and have the
following result:

>>> from xml.parsers import expat
>>> def start_element(name, attrs):
... 	print 'Start element:', name, attrs
...
>>> def end_element(name):
... 	print 'End element:', name
...
>>> def char_data(data):
... 	print 'Character data:', repr(data)
...
>>> p = expat.ParserCreate()
>>> p.StartElementHandler = start_element
>>> p.EndElementHandler = end_element
>>> p.CharacterDataHandler = char_data
>>> p.Parse("""<?xml version="1.0"?>
... <parent id="top"><child1 name="Paul">Text goes here</child1>
... <child2 name="Fred">More text</child2>
... </parent>""")
Start element: parent {u'id': u'top'}
Start element: child1 {u'name': u'Paul'}
Character data: u'Text goes here'
End element: child1
Character data: u'\012'
Start element: child2 {u'name': u'Fred'}
Character data: u'More text'
End element: child2
Character data: u'\012'
End element: parent
1

So what are all of those u's doing in there, and why is there a 1 printed?
This was unexpected.

Also, perhaps you can point me towards some helpful tutorials for getting up
to speed with XML processing in Python?

TIA,

Richard Anthony Hein


From larsga@garshol.priv.no  Tue Jan 23 09:05:35 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 23 Jan 2001 10:05:35 +0100
Subject: [XML-SIG] Newbie confused by output ...
In-Reply-To: <000201c08511$b250ec80$0100a8c0@k6>
References: <000201c08511$b250ec80$0100a8c0@k6>
Message-ID: <m3hf2qfxk0.fsf@lambda.garshol.priv.no>

* Richard Anthony Hein
| 
| When I finally found some documentation at
| http://velocity.activestate.com/docs/ActivePython/lib/expat-example.html,
| I tried the example for expat (actually used pyexpat and expat),

There is documentation in the standard library documentation on
python.org, which you can download and also browse online.

| So what are all of those u's doing in there, 

The u prefix means that the string is a Unicode string. In most cases,
this is no different from an ordinary string, except that it can
contain any Unicode character.

| and why is there a 1 printed?

The 1 is the return value of your call to Parse(), meaning that there
were no errors.
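
For comparison, a short sketch of the same session as a script in
current Python 3, where all strings are Unicode and no u prefix is
shown. The events list and the shortened document are assumptions made
for this example.

```python
from xml.parsers import expat

events = []


def start_element(name, attrs):
    events.append(("start", name, attrs))


def end_element(name):
    events.append(("end", name))


parser = expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element

# Capturing the return value keeps an interactive prompt from echoing it.
status = parser.Parse(
    '<?xml version="1.0"?><parent id="top"><child1 name="Paul"/></parent>',
    True,
)

print(status)
print(events[0])
```

Assigning the result of Parse() to a name is also how the stray 1 in
the original session can be avoided.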

| Also, perhaps you can point me towards some helpful tutorials for
| getting up to speed with XML processing in Python?

The standard documentation is the only one I know of.

There is at least one Python XML book listed at

<URL: http://www.amk.ca/bookstore/python.html >

which may be worth looking at.

--Lars M.


From martin@mira.cs.tu-berlin.de  Tue Jan 23 09:18:49 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 23 Jan 2001 10:18:49 +0100
Subject: [XML-SIG] Newbie confused by output ...
In-Reply-To: <000201c08511$b250ec80$0100a8c0@k6> (935551@ican.net)
References: <000201c08511$b250ec80$0100a8c0@k6>
Message-ID: <200101230918.f0N9Ink01207@mira.informatik.hu-berlin.de>

> I am new to Python and am trying to get a hang of the XML libraries
> available.  I am having trouble finding tutorials and documentation.

Please have a look at

http://pyxml.sourceforge.net/topics/

specifically

http://www.python.org/doc/howto/xml/

For reference documentation, use

http://python.sourceforge.net/devel-docs/lib/markup.html

> Character data: u'\012'
> End element: parent
> 1
> 
> So what are all of those u's doing in there, and why is there a 1 printed?
> This was unexpected.

The u indicates that this is a unicode object, not a bytestring
object.  It appears that the feature is not documented in the Python
Reference Manual, see

http://www.python.org/2.0/new-python.html#SECTION000500000000000000000

The 1 means that the Parse function returned with a value of 1, see

http://www.python.org/doc/current/tut/node4.html

Regards,
Martin


From jerome.marant@free.fr  Tue Jan 23 11:23:28 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 23 Jan 2001 12:23:28 +0100
Subject: [XML-SIG] Problematic use of setupext
In-Reply-To: "Martin v. Loewis"'s message of "Mon, 22 Jan 2001 23:35:59 +0100"
References: <7z4ryr52g9.fsf@amboise.ird.idealx.com>
 <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de>
Message-ID: <7zg0ia8qbz.fsf@amboise.ird.idealx.com>

"Martin v. Loewis" <martin@mira.cs.tu-berlin.de> writes:

> It's not that difficult to explain: setup.py does a straight import of
> setupext, which results in pyc files being generated.

  As long as this is how the interpreter behaves, I have no clue how to
  avoid it.  For instance, it would be nice to be able to tell the
  interpreter not to generate pyc files ...

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From larsga@garshol.priv.no  Tue Jan 23 12:36:43 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 23 Jan 2001 13:36:43 +0100
Subject: [XML-SIG] Development roadmap?
Message-ID: <m33deafns4.fsf@lambda.garshol.priv.no>

I think it would make sense for the XML-SIG to create a development
roadmap document that basically outlines

 - tasks that we plan to do
 - who is assigned to what task, if known
 - rough estimate for task completion, when known

I think this would make it easier for ourselves to keep track of what
is going on, make it clearer to the rest of the world what we are
doing, and also help newcomers to find out where they can chip in and
make a contribution.

This would obviously have to be a living document that is constantly
updated to reflect the current state of affairs. I'd be willing to
take on that task, which I guess would consist of writing the first
version of it and updating it when others neglect to do so.

Comments? Thoughts? Opinions?

--Lars M.


From Nicolas.Chauvat@logilab.fr  Tue Jan 23 12:38:46 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Tue, 23 Jan 2001 13:38:46 +0100 (CET)
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <m33deafns4.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.21.0101231335590.28418-100000@aries>

On 23 Jan 2001, Lars Marius Garshol wrote:

> I think this would make it easier for ourselves to keep track of what
> is going on, make it clearer to the rest of the world what we are
> doing, and also help newcomers to find out where they can chip in and
> make a contribution.
>
> Comments? Thoughts? Opinions?

http://www.logilab.org/pygantt/ may help for that.

1. data stored as XML.
2. python script renders data as HTML Gantt diagram.
3. If you prefer, add your own renderer using XSL or by deriving from the basic one.

Hope this helps, though it's probably not the format you were thinking of.

-- 
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)


From larsga@garshol.priv.no  Tue Jan 23 12:51:06 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 23 Jan 2001 13:51:06 +0100
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <Pine.LNX.4.21.0101231335590.28418-100000@aries>
References: <Pine.LNX.4.21.0101231335590.28418-100000@aries>
Message-ID: <m3zogie8jp.fsf@lambda.garshol.priv.no>

* Nicolas Chauvat
| 
| http://www.logilab.org/pygantt/ may help for that.
| [...]
| Hope this helps, though it's probably not the format you were
| thinking to.

I think that for something as simple as this it would really be
overkill.  I don't think we really need something as fancy, and as
tight, as a project plan, merely something that describes the
direction we intend to go in.

Hand-edited HTML would do just fine, I think.

--Lars M.


From uche.ogbuji@fourthought.com  Tue Jan 23 16:58:53 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 23 Jan 2001 09:58:53 -0700
Subject: [XML-SIG] Note on XML for 2.1
In-Reply-To: Message from Andrew Kuchling <akuchlin@mems-exchange.org>
 of "Mon, 22 Jan 2001 14:46:03 EST." <E14Kmuh-0001fD-00@ute.cnri.reston.va.us>
Message-ID: <200101231658.JAA03305@localhost.localdomain>

> I'm working on a "What's New in 2.1" article, and want to add
> a mention of the improvements to the xml package.  Here's my proposed
> text; is it accurate?  
> 
> \item The PyXML package has gone through a few releases since Python
> 2.0, and Python 2.1 includes an updated version of the \module{xml}
> package.  Some of the noteworthy changes include support for Expat
> 1.2, the ability for Expat parsers to handle files in any encoding
> supported by Python, and various bugfixes for SAX, DOM, and the
> \module{minidom} module.

I can't think of anything else.  Sounds good.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ken@bitsko.slc.ut.us  Tue Jan 23 23:25:59 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 23 Jan 2001 17:25:59 -0600
Subject: [XML-SIG] Development roadmap?
In-Reply-To: Lars Marius Garshol's message of "23 Jan 2001 13:36:43 +0100"
References: <m33deafns4.fsf@lambda.garshol.priv.no>
Message-ID: <x73de9q29k.fsf@bitsko.slc.ut.us>

Lars Marius Garshol <larsga@garshol.priv.no> writes:

> I think it would make sense for the XML-SIG to create a development
> roadmap document that basically outlines
  [...]
> I think this would make it easier for ourselves to keep track of what
> is going on, make it clearer to the rest of the world what we are
> doing, and also help newcomers to find out where they can chip in and
> make a contribution.

On a somewhat related note, I've been developing a C-based extension
library that's designed for binding to host languages, called Orchard.
Orchard implements "node-based" SAX (push, and soon, pull) and a DOM
comparable to minidom (minus even a few more gratuitous W3C DOM
methods).  There will (also soon, I hope) be a Python binding for
Orchard.

Orchard's C preprocessor and runtime includes garbage collection,
attribute syntax, dynamic methods, and accessor override methods, to
make binding to languages as simple as possible (think of SWIG in
reverse).  Orchard also supports namespaces as a core feature (making
some XML applications truly simple, see the Python RSS and SOAP
implementations for examples).

Orchard will not initially be Py SAX and DOM compatible, but
compatibility modules are possible.  I mention it mostly because it's
a parallel development going on at the moment.  Source and initial
docs are available at <http://casbah.org/~kmacleod/orchard/>.  Anyone
interested in working on Python support is welcome, drop me an email.

  -- Ken


From tpassin@home.com  Wed Jan 24 01:33:20 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 23 Jan 2001 20:33:20 -0500
Subject: [XML-SIG] Development roadmap?
References: <m33deafns4.fsf@lambda.garshol.priv.no>
Message-ID: <003601c085a5$a28b2560$7cac1218@reston1.va.home.com>

Yes, this would be a good thing to do.

Cheers,

Tom P

Lars Marius Garshol wrote -

> 
> I think it would make sense for the XML-SIG to create a development
> roadmap document that basically outlines
> 
>  - tasks that we plan to do
>  - who is assigned to what task, if known
>  - rough estimate for task completion, when known
> 
> I think this would make it easier for ourselves to keep track of what
> is going on, make it clearer to the rest of the world what we are
> doing, and also help newcomers to find out where they can chip in and
> make a contribution.
> 
> This would obviously have to be a living document that is constantly
> updated to reflect the current state of affairs. I'd be willing to
> take on that task, which I guess would consist of writing the first
> version of it and updating it when others neglect to do so.
> 
> Comments? Thoughts? Opinions?
> 


From martin@mira.cs.tu-berlin.de  Wed Jan 24 07:45:30 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 24 Jan 2001 08:45:30 +0100
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <m33deafns4.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 23 Jan 2001 13:36:43 +0100)
References: <m33deafns4.fsf@lambda.garshol.priv.no>
Message-ID: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de>

> I think it would make sense for the XML-SIG to create a development
> roadmap document that basically outlines
> 
>  - tasks that we plan to do
>  - who is assigned to what task, if known
>  - rough estimate for task completion, when known

It seems that the current PyXML TODO document could be used to hold
that information. The only 'maintenance' I did to that document so
far was to remove obsolete entries - feel free to add stuff back.

If you think this information would be better maintained in a
different format or location (e.g. SF task manager), I suggest that
the TODO file is deleted altogether.

> This would obviously have to be a living document that is constantly
> updated to reflect the current state of affairs. I'd be willing to
> take on that task, which I guess would consist of writing the first
> version of it and updating it when others neglect to do so.

That sounds like your book is complete :-) Anyway, if you are willing
to maintain a roadmap, go right ahead.

As for specific things I plan to do (over the course of the next
months): I'd like to offer XPath and XSLT support in PyXML. I'd also
like to push contributors to contribute updates of their respective
packages :-)

Regards,
Martin


From rob@hooft.net  Wed Jan 24 10:09:04 2001
From: rob@hooft.net (Rob W. W. Hooft)
Date: Wed, 24 Jan 2001 11:09:04 +0100
Subject: [XML-SIG] Problematic use of setupext
In-Reply-To: <7zg0ia8qbz.fsf@amboise.ird.idealx.com>
References: <7z4ryr52g9.fsf@amboise.ird.idealx.com>
 <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de>
 <7zg0ia8qbz.fsf@amboise.ird.idealx.com>
Message-ID: <14958.43456.710851.579062@temoleh.chem.uu.nl>

>>>>> "JM" == Jérôme Marant <jerome.marant@free.fr> writes:

 JM> "Martin v. Loewis" <martin@mira.cs.tu-berlin.de> writes:
 >> It's not that difficult to explain: setup.py does a straight
 >> import of setupext, which results in pyc files being generated.

 JM>   AFAIK, as long as this is how the interpreter behaves, I have
 JM> no clue. For instance, It would be nice to specify the
 JM> interpreter not to generate pyc files ...

How about changing the directory protection to 555 before import?
At least on unix that should prevent .pyc generation.
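
For comparison, a hedged sketch of doing the same thing from inside the
interpreter. It relies on sys.dont_write_bytecode, which exists only in
later Python releases (2.6 and up) and was not available when this was
written. The helper names and the setupext_demo module are hypothetical
stand-ins, not part of PyXML.

```python
import importlib
import os
import sys
import tempfile


def import_without_bytecode(module_name, directory):
    """Import module_name from directory without writing a .pyc file."""
    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True      # suppress bytecode caching
    sys.path.insert(0, directory)
    importlib.invalidate_caches()       # make sure new files are noticed
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)
        sys.dont_write_bytecode = previous


def demo():
    """Import a throwaway module and list what is left in its directory."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "setupext_demo.py")
        with open(path, "w") as handle:
            handle.write("VALUE = 42\n")
        module = import_without_bytecode("setupext_demo", directory)
        return module.VALUE, sorted(os.listdir(directory))
```

Unlike the chmod approach, this leaves directory permissions alone and
is restored to its previous setting once the import is done.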

Rob

-- 
=====   rob@hooft.net          http://www.hooft.net/people/rob/  =====
=====   R&D, Nonius BV, Delft  http://www.nonius.nl/             =====
===== PGPid 0xFA19277D ========================== Use Linux! =========


From jerome.marant@free.fr  Wed Jan 24 10:12:30 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 24 Jan 2001 11:12:30 +0100
Subject: [XML-SIG] Problematic use of setupext
In-Reply-To: rob@hooft.net's message of "Wed, 24 Jan 2001 11:09:04 +0100"
References: <7z4ryr52g9.fsf@amboise.ird.idealx.com>
 <200101222235.f0MMZxX01474@mira.informatik.hu-berlin.de>
 <7zg0ia8qbz.fsf@amboise.ird.idealx.com>
 <14958.43456.710851.579062@temoleh.chem.uu.nl>
Message-ID: <7zk87ljm29.fsf@amboise.ird.idealx.com>

rob@hooft.net (Rob W. W. Hooft) writes:

> How about changing the directory protection to 555 before import?
> At least on unix that should prevent .pyc generation.

  This is one of the possible solutions, but not the most elegant one :-)

  Cheers,

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From larsga@garshol.priv.no  Wed Jan 24 11:29:57 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Jan 2001 12:29:57 +0100
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de>
References: <m33deafns4.fsf@lambda.garshol.priv.no> <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de>
Message-ID: <m38zo1faru.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| It seems that the current PyXML TODO document could be used to hold
| that information. The only 'maintenance' I did to that document so
| far was to remove obsolete entries - feel free to add stuff back.

To be honest I have to admit that I didn't even know that it existed.
Now that I've found it I think that we could start from it, but that
it should be on the web somewhere and also fleshed out somewhat.
 
| If you think this information would be better maintained in a
| different format or location (e.g. SF task manager), I suggest that
| the TODO file is deleted altogether.

I think the main candidates are an HTML file on python.org or
sourceforge.net or the SF task manager.  I don't really have any
strong opinions on either.

The HTML file would probably give a better overview, provide more
information about each task and is also more flexible, but the task
manager is probably easier to keep up to date and more helpful as an
organizational tool.

So I think I prefer the HTML page, but it's not a very strong opinion.
 
| That sounds like your book is complete :-) 

It is.  (Yes yes yes yes YES!!! :-)

| Anyway, if you are willing to maintain a roadmap, go just ahead.

OK, will do, unless people speak up and say they want the SF task
manager instead.
 
| I also like to push contributors to contribute updates of their
| respective packages :-)

Will do, as soon as the problems with my account are fixed, so that I
can commit again. :) (See support request 111946.)

--Lars M.


From tpassin@home.com  Wed Jan 24 13:29:19 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 24 Jan 2001 08:29:19 -0500
Subject: [XML-SIG] Development roadmap?
References: <m33deafns4.fsf@lambda.garshol.priv.no> <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> <m38zo1faru.fsf@lambda.garshol.priv.no>
Message-ID: <001401c08609$a838db60$7cac1218@reston1.va.home.com>

Lars Marius Garshol wrote -

>
> * Martin v. Loewis
> |
> | It seems that the current PyXML TODO document could be used to hold
> | that information. The only 'maintenance' I did to that document so
> | far was to remove obsolete entries - feel free to add stuff back.
>
...
> I think the main candidates are an HTML file on python.org or
> sourceforge.net or the SF task manager.  I don't really have any
> strong opinions on either.
>

Let's do it as an HTML file in the documentation section of the SourceForge
Site.

>
> | That sounds like your book is complete :-)
>
> It is.  (Yes yes yes yes YES!!! :-)
>
Congratulations! I'm very keen to see the result.

Cheers,

Tom P


From uche.ogbuji@fourthought.com  Wed Jan 24 14:13:10 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Wed, 24 Jan 2001 07:13:10 -0700
Subject: [XML-SIG] Development roadmap?
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Wed, 24 Jan 2001 08:45:30 +0100." <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de>
Message-ID: <200101241413.HAA31296@localhost.localdomain>

> That sounds like your book is complete :-) Anyway, if you are willing
> to maintain a roadmap, go just ahead.
> 
> As for specific things I plan to do (over the course of the next
> months): I'd like to offer XPath and XSLT support in PyXML. I also
> like to push contributors to contribute updates of their respective
> packages :-)

I was going to talk about this when I posted the 4Suite road-map, but we've 
agreed to move 4XPath and 4XSLT into PyXML.

Let's get through the current opening of the XPath parser API and then we can
begin the process, preferably not too close to the next release of 4Suite
(scheduled for the second Monday of February).

As for your last sentence, Jeremy was going to update PyXML's 4DOM.  I know he 
has been working on 4Suite.org for the past few days, but he said it's high on 
his to-do list.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From noreply@sourceforge.net  Wed Jan 24 19:32:34 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jan 2001 11:32:34 -0800
Subject: [XML-SIG] [Patch #103408] xml/marshal/wddx.py mods
Message-ID: <E14LVek-0002N0-00@usw-sf-web1.sourceforge.net>

Patch #103408 has been updated. 

Project: pyxml
Category: None
Status: Open
Submitted by: robin900
Assigned to : nobody
Summary: xml/marshal/wddx.py mods

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103408&group_id=6473


From martin@mira.cs.tu-berlin.de  Wed Jan 24 19:55:12 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 24 Jan 2001 20:55:12 +0100
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <m38zo1faru.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 24 Jan 2001 12:29:57 +0100)
References: <m33deafns4.fsf@lambda.garshol.priv.no> <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> <m38zo1faru.fsf@lambda.garshol.priv.no>
Message-ID: <200101241955.f0OJtCP00959@mira.informatik.hu-berlin.de>

> Now that I've found it I think that we could start from it, but that
> it should be on the web somewhere and also fleshed out somewhat.

Since you just got elected maintainer, you can choose any format you
consider appropriate.

> I think the main candidates are an HTML file on python.org or
> sourceforge.net or the SF task manager.  I don't really have any
> strong opinions on either.

I'd suggest a location inside the topic guide then; that already is
CVS-accessible. There is an automatic update procedure so you just
need to cvs commit to publish (if you can stand the 6h delay until the
cron job runs). I understand that maintaining files on python.org is
still possible only for a chosen few.

Again, please remove the TODO file from PyXML when you commit the
first version of your roadmap document. I'll then come up with a
procedure to include the roadmap in the distributions.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Wed Jan 24 19:59:00 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 24 Jan 2001 20:59:00 +0100
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <200101241413.HAA31296@localhost.localdomain>
 (uche.ogbuji@fourthought.com)
References: <200101241413.HAA31296@localhost.localdomain>
Message-ID: <200101241959.f0OJx0J00961@mira.informatik.hu-berlin.de>

> I was going to talk about this when I posted the 4Suite road-map,
> but we've agreed to move 4XPath and 4XSLT into PyXML.

I was hoping you'd say that. Sorry for the little pushing :-)

> Let's get through the current opening of the XPath parser API and we
> can begin the process, preferably if it can be started not too close
> to the next release of 4Suite (scheduled second monday Feb).

Ok, it seems that I still owe some commentary and updates to the
draft...

> As for your last sentence, Jeremy was going to update PyXML's 4DOM.
> I know he has been working on 4Suite.org for the past few days, but
> he said it's high on his to-do list.

That all sounds very good.

Regards,
Martin


From jeremy.kloth@fourthought.com  Wed Jan 24 23:45:05 2001
From: jeremy.kloth@fourthought.com (Jeremy J Kloth)
Date: Wed, 24 Jan 2001 16:45:05 -0700
Subject: [XML-SIG] Re: [4suite] Where is xml.xslt?
References: <000a01c08648$ace3c730$0100a8c0@k6>
Message-ID: <00da01c0865f$b759baa0$1b01a8c0@fourthought.com>

> Actually, it hasn't gone well ... is this the problem you were warning me
> about (I am using Windows 2000 and Python 2.0 btw):
>
[...snip traceback...]
>
> I installed the binary distribution first, before I got your message below
> warning me about the clashes.  So I couldn't do what you suggested.  I am
> downloading the sources right now.
>
> -- Richard A. Hein
>

There is a bug in the HTML DOM implementation that is causing this error.
Below is a patch for the related files.

---PATCH---
diff -u html/HTMLCollection.py devel/Ft/Dom/html/HTMLCollection.py
--- html/HTMLCollection.py Mon Jan 15 13:21:26 2001
+++ devel/Ft/Dom/html/HTMLCollection.py Wed Jan 17 15:17:48 2001
@@ -54,11 +54,16 @@
     def namedItem(self, name):
         found_node = None
         for node in self:
+            # IDs take presedence over NAMEs
             if node.getAttribute('ID') == name:
-                return node
+                found_node = node
+                break
             if not found_node and node.getAttribute('NAME') == name \
-            and node.tagName in HTML_NAME_ALLOWED:
+               and node.tagName in HTML_NAME_ALLOWED:
+                # We found a node with NAME attribute, but we have to wait
+                # until all nodes are done (one might have an ID that matches)
                 found_node = node
+        print 'found:', found_node
         return found_node

diff -u html/HTMLDocument.py devel/Ft/Dom/html/HTMLDocument.py
--- html/HTMLDocument.py Mon Jan 15 21:00:52 2001
+++ devel/Ft/Dom/html/HTMLDocument.py Wed Jan 17 15:36:46 2001
@@ -72,7 +72,7 @@
             elements = self.getElementsByTagName('BODY')
         if elements:
             # Replace the existing one
-            oldBody.parentNode.replaceChild(newBody, elements[0])
+            elements[0].parentNode.replaceChild(newBody, elements[0])
         else:
             # Add it
             self.documentElement.appendChild(newBody)

diff -u html/HTMLElement.py devel/Ft/Dom/html/HTMLElement.py
--- html/HTMLElement.py Mon Jan 15 13:21:16 2001
+++ devel/Ft/Dom/html/HTMLElement.py Wed Jan 17 15:22:06 2001
@@ -58,19 +57,19 @@

     def getAttribute(self, name):
         attr = self.attributes.getNamedItem(string.upper(name))
-        attr and attr.value or ''
+        return attr and attr.value or ''

     def getAttributeNode(self, name):
-        return self.attribute.getNamedItem(string.upper(name))
+        return self.attributes.getNamedItem(string.upper(name))

     def getElementsByTagName(self, tagName):
         return Element.getElementsByTagName(self, string.upper(tagName))

     def hasAttribute(self, name):
-        return self.attribute.getNamedItem(string.upper(name)) is not None
+        return self.attributes.getNamedItem(string.upper(name)) is not None

     def removeAttribute(self, name):
-        attr = set.attributes.getNamedItem(string.upper(name))
+        attr = self.attributes.getNamedItem(string.upper(name))
         attr and self.removeAttributeNode(attr)

     def setAttribute(self, name, value):
@@ -80,6 +79,18 @@
         return value

     ### Helper Functions For Cloning ###
+
+    def _4dom_clone(self, owner):
+        e = self.__class__(owner,
+                           self.tagName)
+        for attr in self.attributes:
+            clone = attr._4dom_clone(owner)
+            if clone.localName is None:
+                e.attributes.setNamedItem(clone)
+            else:
+                self.attributes.setNamedItemNS(clone)
+            clone._4dom_setOwnerElement(self)
+        return e

     def __getinitargs__(self):
         return (self.ownerDocument,

--
Jeremy Kloth                        Consultant
jeremy.kloth@fourthought.com        (303)583-9900 x 105
Fourthought, Inc.                   http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From eugeneai@icc.ru  Thu Jan 25 02:24:46 2001
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Thu, 25 Jan 2001 10:24:46 +0800
Subject: [XML-SIG] What install builder do You use ...
In-Reply-To: <20010123170106.05390EEB0@mail.python.org>
References: <20010123170106.05390EEB0@mail.python.org>
Message-ID: <200101250226.KAA10772@monster.icc.ru>

Hi!

What installer builder program do you use for building .exe packages, e.g. for PyXML?
And where can I find it?

Evgeny


From martin@mira.cs.tu-berlin.de  Thu Jan 25 06:01:39 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 25 Jan 2001 07:01:39 +0100
Subject: [XML-SIG] What install builder do You use ...
In-Reply-To: <200101250226.KAA10772@monster.icc.ru> (message from Evgeny
 Cherkashin on Thu, 25 Jan 2001 10:24:46 +0800)
References: <20010123170106.05390EEB0@mail.python.org> <200101250226.KAA10772@monster.icc.ru>
Message-ID: <200101250601.f0P61dl01216@mira.informatik.hu-berlin.de>

> What installer builder program do you use for building .exe packages?

Distutils. python setup.py bdist_wininst.

> And where can I find it?

It's part of Python 2.0.

Regards,
Martin


From noreply@sourceforge.net  Thu Jan 25 09:22:57 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 25 Jan 2001 01:22:57 -0800
Subject: [XML-SIG] [Bug #130020] 4DOM: cloneNode broken for derived classes
Message-ID: <E14LicL-00027s-00@usw-sf-web3.sourceforge.net>

Bug #130020, was updated on 2001-Jan-25 01:22
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: 4DOM: cloneNode broken for derived classes

Details: I'm just posting this on SF as a reminder. For a description and
possible resolution, please refer to
 http://lists.fourthought.com/pipermail/4suite/2001-January/001199.html
and
http://lists.fourthought.com/pipermail/4suite/2001-January/001200.html


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=130020&group_id=6473


From larsga@garshol.priv.no  Thu Jan 25 09:51:38 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 25 Jan 2001 10:51:38 +0100
Subject: [XML-SIG] Development roadmap?
In-Reply-To: <200101241955.f0OJtCP00959@mira.informatik.hu-berlin.de>
References: <m33deafns4.fsf@lambda.garshol.priv.no> <200101240745.f0O7jUp00846@mira.informatik.hu-berlin.de> <m38zo1faru.fsf@lambda.garshol.priv.no> <200101241955.f0OJtCP00959@mira.informatik.hu-berlin.de>
Message-ID: <m3r91sj6xh.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| Since you just got elected maintainer, you can choose any format you
| consider appropriate.

Will do.  I don't expect to have anything until after the weekend.
 
| I'd suggest a location inside the topic guide then; that already is
| CVS-accessible. There is an automatic update procedure so you just
| need to cvs commit to publish (if you can stand the 6h delay until
| the cron job runs)

This sounds fine to me.
 
| Again, please remove the TODO file from PyXML when you commit the
| first version of your roadmap document.

Will do.

--Lars M.


From noreply@sourceforge.net  Thu Jan 25 15:30:02 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 25 Jan 2001 07:30:02 -0800
Subject: [XML-SIG] [Bug #130049] [4DOM] normalize() fails on DocumentFragments
Message-ID: <E14LoLa-0004Qz-00@usw-sf-web3.sourceforge.net>

Bug #130049, was updated on 2001-Jan-25 07:30
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: afayolle
Assigned to : nobody
Summary: [4DOM] normalize() fails on DocumentFragments

Details: I only tested this on 4Suite 0.10.0, please tell me if this was
fixed in 0.10.1.

Calling normalize() on a document fragment does not recurse into the child
elements (though it does process text nodes that are immediate children of
the DF). The workaround is to iterate through the child nodes of the DF and
call normalize() on each one manually.

Sample code:
from xml.dom.ext.reader import Sax2
d = Sax2.FromXml('<doc/>') # Yes, I know I'm a lazy boy...

df = d.createDocumentFragment()
df.appendChild(d.createElementNS('','foo'))
df.firstChild.appendChild(d.createTextNode('textNode1 '))
df.firstChild.appendChild(d.createTextNode('textNode2 '))

print 'before normalize'
print df.firstChild

df.normalize()

print 'after normalize'
print df.firstChild
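
The workaround described above can be sketched roughly as follows. This
uses the standard library's xml.dom.minidom as a stand-in, on the
assumption that 4DOM is not installed; minidom itself may not show the
bug. The helper names normalize_fragment and build_fragment are made up
for illustration.

```python
from xml.dom import minidom


def normalize_fragment(fragment):
    """Normalize a fragment, then each element child explicitly."""
    fragment.normalize()
    for child in fragment.childNodes:
        # Only elements can hold adjacent text nodes that need merging.
        if child.nodeType == child.ELEMENT_NODE:
            child.normalize()


def build_fragment():
    """Build the same structure as the sample code in this report."""
    doc = minidom.parseString("<doc/>")
    fragment = doc.createDocumentFragment()
    foo = fragment.appendChild(doc.createElement("foo"))
    foo.appendChild(doc.createTextNode("textNode1 "))
    foo.appendChild(doc.createTextNode("textNode2 "))
    return fragment
```

After normalize_fragment(build_fragment()) the foo element should hold a
single merged text node rather than two adjacent ones.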


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=130049&group_id=6473


From noreply@sourceforge.net  Thu Jan 25 17:24:50 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 25 Jan 2001 09:24:50 -0800
Subject: [XML-SIG] [Patch #103417] 4DOM: Patch for normalize()
Message-ID: <E14Lq8g-0001cc-00@usw-sf-web1.sourceforge.net>

Patch #103417 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: jkloth
Assigned to : nobody
Summary: 4DOM: Patch for normalize()

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103417&group_id=6473


From noreply@sourceforge.net  Thu Jan 25 18:19:06 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 25 Jan 2001 10:19:06 -0800
Subject: [XML-SIG] [Patch #103418] 4DOM: Derived class cloning
Message-ID: <E14LqzC-000711-00@usw-sf-web2.sourceforge.net>

Patch #103418 has been updated. 

Project: pyxml
Category: 4Suite
Status: Open
Submitted by: jkloth
Assigned to : nobody
Summary: 4DOM: Derived class cloning

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103418&group_id=6473


From uche.ogbuji@fourthought.com  Fri Jan 26 19:20:44 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 26 Jan 2001 12:20:44 -0700
Subject: [XML-SIG] Can't write to www repo
Message-ID: <200101261920.MAA27435@localhost.localdomain>

I'm trying to commit my changes to the Python XML topic, but it won't let me:


[uogbuji@borgia www]$ cvs commit
cvs commit: Examining .
cvs commit: Examining ht2html
cvs commit: Examining htdocs
cvs commit: Examining htdocs/topics
cvs commit: Examining htdocs/topics/dtds
cvs commit: Examining htdocs/topics/xbel
cvs commit: Examining htdocs/topics/xbel/docs
cvs commit: Examining htdocs/topics/xbel/docs/html
cvs [server aborted]: "commit" requires write access to the repository
cvs commit: saving log message in /tmp/cvsXssgBm
[uogbuji@borgia www]$ cat CVS/
Entries     Repository  Root        
[uogbuji@borgia www]$ cat CVS/Root 
:pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml
[uogbuji@borgia www]$ 

Can someone fix the permissions?  Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Fri Jan 26 22:11:46 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 26 Jan 2001 23:11:46 +0100
Subject: [XML-SIG] Can't write to www repo
In-Reply-To: <200101261920.MAA27435@localhost.localdomain> (message from Uche
 Ogbuji on Fri, 26 Jan 2001 12:20:44 -0700)
References: <200101261920.MAA27435@localhost.localdomain>
Message-ID: <200101262211.f0QMBkk01145@mira.informatik.hu-berlin.de>

> :pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml
> [uogbuji@borgia www]$ 
> 
> Can someone fix the permissions?

It's not a matter of permissions, but of authentication. Please see

http://sourceforge.net/cvs/?group_id=6473

pserver only allows anonymous access - you need ssh/CVS_RSH for
developer access.

Regards,
Martin


From uche.ogbuji@fourthought.com  Fri Jan 26 22:26:13 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 26 Jan 2001 15:26:13 -0700
Subject: [XML-SIG] Can't write to www repo
References: <200101261920.MAA27435@localhost.localdomain> <200101262211.f0QMBkk01145@mira.informatik.hu-berlin.de>
Message-ID: <3A71F985.CC6D9B88@fourthought.com>

"Martin v. Loewis" wrote:
> 
> > :pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml
> > [uogbuji@borgia www]$
> >
> > Can someone fix the permissions?
> 
> It's not a matter of permissions, but of authentication. Please see
> 
> http://sourceforge.net/cvs/?group_id=6473
> 
> pserver only allows anonymous access - you need ssh/CVS_RSH for
> developer access.

I didn't look carefully enough.  All I had to do was take out the
":pserver:" part

[uogbuji@borgia www]$ cvs -d uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml commit

Worked fine.

Thanks.
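
For anyone who hits the same thing, here is a small sketch of how one might
check which access method a CVS/Root string implies. It assumes the usual
CVSROOT layout (":method:user@host:/path", with "ext"/"server" as the
default when a host is given without a method); the function name
cvs_access_method is hypothetical.

```python
def cvs_access_method(root):
    """Return the access method implied by a CVSROOT string."""
    if root.startswith(':'):
        # Explicit form, e.g. ":pserver:user@host:/path".
        return root.split(':')[1]
    if ':' in root:
        # "user@host:/path" with no method: CVS falls back to ext/server,
        # i.e. the remote shell named by CVS_RSH.
        return 'ext'
    # A bare path refers to a repository on the local disk.
    return 'local'


print(cvs_access_method(':pserver:uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml'))
print(cvs_access_method('uche@cvs.pyxml.sourceforge.net:/cvsroot/pyxml'))
```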

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Jan 26 22:39:36 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 26 Jan 2001 15:39:36 -0700
Subject: [XML-SIG] Update Python XML topic
Message-ID: <200101262239.PAA28063@localhost.localdomain>

I've completed and checked in the changes.

I updated the front page, status, software, dom and fourthought pages.

In the software page I updated links and added Pyxie, python davserver, 
soaplib, Lye and redfoot.  Python/XML software authors, please check

http://pyxml.sourceforge.net/topics/software.html

And see whether I'm missing or misrepresenting your work.  I can make any 
additions or fixes.

One question: the PyPointers link goes to

http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/xptr.html

Which gives 404.  Lars, is this something you still want listed?  If so, 
where do I point to?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Fri Jan 26 22:51:41 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 26 Jan 2001 15:51:41 -0700
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: Message from Uche Ogbuji <uche.ogbuji@fourthought.com>
 of "Fri, 26 Jan 2001 15:39:36 MST." <200101262239.PAA28063@localhost.localdomain>
Message-ID: <200101262251.PAA28146@localhost.localdomain>

> I've completed and checked in the changes.
> 
> I updated the front page, status, software, dom and fourthought pages.

Note: the changes won't show up until the page auto-regenerates.  I believe 
someone mentioned a 6-hour interval?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Fri Jan 26 23:39:52 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 27 Jan 2001 00:39:52 +0100
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <200101262239.PAA28063@localhost.localdomain> (message from Uche
 Ogbuji on Fri, 26 Jan 2001 15:39:36 -0700)
References: <200101262239.PAA28063@localhost.localdomain>
Message-ID: <200101262339.f0QNdqI01809@mira.informatik.hu-berlin.de>

> I've completed and checked in the changes.
> 
> I updated the front page, status, software, dom and fourthought pages.

Thanks! Contributions of documentation are often more desirable than
contributions of code :-)

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Fri Jan 26 23:44:28 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 27 Jan 2001 00:44:28 +0100
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <200101262251.PAA28146@localhost.localdomain> (message from Uche
 Ogbuji on Fri, 26 Jan 2001 15:51:41 -0700)
References: <200101262251.PAA28146@localhost.localdomain>
Message-ID: <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de>

> Note: the changes won't show up until the page auto-regenerates.  I believe 
> someone mentioned a 6-hour interval?

Indeed. You can run the generator by invoking doupdate on
shell1.sourceforge.net if you want, but I'd take this as a test case
whether the mechanism still works.

Any advice on a mechanism that performs the update on commit is
highly appreciated. Please note that the specific problem is not to
just execute some script (such a script is in CVSROOT already), but to
have that script properly run on shell1 even though the commit occurs
on cvs.sourceforge.net.

Regards,
Martin


From gstein@lyra.org  Sat Jan 27 02:28:13 2001
From: gstein@lyra.org (Greg Stein)
Date: Fri, 26 Jan 2001 18:28:13 -0800
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Sat, Jan 27, 2001 at 12:44:28AM +0100
References: <200101262251.PAA28146@localhost.localdomain> <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de>
Message-ID: <20010126182812.Y704@lyra.org>

On Sat, Jan 27, 2001 at 12:44:28AM +0100, Martin v. Loewis wrote:
> > Note: the changes won't show up until the page auto-regenerates.  I believe 
> > someone mentioned a 6-hour interval?
> 
> Indeed. You can run the generator by invoking doupdate on
> shell1.sourceforge.net if you want, but I'd take this as a test case
> whether the mechanism still works.
> 
> Anybody advise of a mechanism that performs the update on commit is
> highly appreciated. Please note that the specific problem is not to
> just execute some script (such a script is in CVSROOT already), but to
> have that script properly run on shell1 even though the commit occurs
> on cvs.sourceforge.net.

Maybe start a script which uses HTTP to invoke a CGI script on the web
server? Would that propagate correctly? Have the right permissions?
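
Roughly what I have in mind, as a sketch only: the commit-side script would
make one HTTP request and report what came back. The CGI path
/cgi-bin/doupdate.cgi is hypothetical (no such script exists yet), and
whether the CGI would run with the right permissions is exactly the open
question.

```python
import http.client


def trigger_update(host, port=80, path="/cgi-bin/doupdate.cgi", timeout=30):
    """Request the (hypothetical) update CGI and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        # Read the body so the caller can log whatever the CGI printed.
        return response.status, response.read().decode("utf-8", "replace")
    finally:
        conn.close()
```

A commit hook would call something like trigger_update("pyxml.sourceforge.net")
and treat any status other than 200 as a failed regeneration.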

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From uche.ogbuji@fourthought.com  Sat Jan 27 15:24:40 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 27 Jan 2001 08:24:40 -0700
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Sat, 27 Jan 2001 00:44:28 +0100." <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de>
Message-ID: <200101271524.IAA02505@localhost.localdomain>

> > Note: the changes won't show up until the page auto-regenerates.  I believe 
> > someone mentioned a 6-hour interval?
> 
> Indeed. You can run the generator by invoking doupdate on
> shell1.sourceforge.net if you want, but I'd take this as a test case
> whether the mechanism still works.

I guess it failed the test.

The Web pages are still not updated, so I tried to go to the shell account to 
see what's what.  doupdate yields an error as you can see, and I don't know 
enough about the SF directory layout to figure out how to correct it.

[uogbuji@borgia uogbuji]$ ssh uche@pyxml.sourceforge.net
<SNIP>
Linux usw-cf-linux1 2.2.14-va.4.4-i586 #1 Tue Sep 5 15:18:51 PDT 2000 i686 
unknown

Welcome to usf-sf-shell1.  (orbital generation two)

Any problems : please submit a support request: 
http://sourceforge.net/support/?group_id=1

------------------------------------------------------------

uche@usw-pr-shell1:~$ cd /home/groups/pyxml
uche@usw-pr-shell1:/home/groups/pyxml$ ls
cgi-bin  doupdate  foo	ht2html  htdocs  log
uche@usw-pr-shell1:/home/groups/pyxml$ ./doupdate 
cvs [export aborted]: connect to slayer:2401 failed: Connection refused
./doupdate: cd: www/htdocs: No such file or directory
cp: cannot stat `/var/tmp/www5871/www/*': No such file or directory
uche@usw-pr-shell1:/home/groups/pyxml$ ls
cgi-bin  doupdate  foo	ht2html  htdocs  log
uche@usw-pr-shell1:/home/groups/pyxml$ 


Any ideas?  Looking at the doupdate script I have no idea why it's supposed to 
work, but I'm not sure how to fix it.  I'll keep investigating.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From rnd@onego.ru  Sun Jan 28 07:53:06 2001
From: rnd@onego.ru (Roman Suzi)
Date: Sun, 28 Jan 2001 10:53:06 +0300 (MSK)
Subject: [XML-SIG] I am confused...
Message-ID: <Pine.LNX.4.30.0101281021260.21581-100000@rnd.onego.ru>

Hello,

I've just subscribed to this list and my brief
browsing of archives suggested this is right place to
ask my question.

I maintain journal and newspaper sites (in Russian and Finnish;
for example, http://carelia.onego.ru )
and am thinking about using XML to store articles.

(Now I use custom Python scripts to generate sites)

However, when I made a prototype program
and tried to generate a page with Python XML tools (xml.*,
not 4Suite; I used Python 1.5.2), it was so slow
that I just threw the idea out.

However, XML is a natural format to represent the data
I  store in an ad-hoc format anyway.

So, my main question is:

- are Python XML tools (and which of them?) up to the task of facilitating
site-generation with bearable speed?

And one more less related to the above:

Right now I need to mark up raw material for the articles
by hand, and I want to do it with fewer keystrokes. Typing
tags such as '<author>This And This</author>' is a lot of typing.
How do you solve this?

I am planning to do something like:

a::This And This
h::The Headline
...

and then run custom pre-processor which will store this
in proper format (I hope it will be XML if I find
fast way to deal with it in Python)
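
A rough sketch of such a pre-processor, to show what I mean. The mapping of
"a" to <author> and "h" to <headline>, the <article> root and the <p>
fallback for unmarked lines are all just assumptions for illustration, not
a fixed format.

```python
from xml.sax.saxutils import escape

# Hypothetical shorthand-to-tag mapping; extend as needed.
TAGS = {'a': 'author', 'h': 'headline'}


def shorthand_to_xml(text, root='article'):
    """Turn 'key::value' lines into a small XML document."""
    lines = ['<%s>' % root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition('::')
        if sep and key in TAGS:
            tag = TAGS[key]
            lines.append('  <%s>%s</%s>' % (tag, escape(value.strip()), tag))
        else:
            # Anything without a known prefix is kept as a plain paragraph.
            lines.append('  <p>%s</p>' % escape(line))
    lines.append('</%s>' % root)
    return '\n'.join(lines)


print(shorthand_to_xml("a::This And This\nh::The Headline"))
```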

The other way to do the same is to write a special mode
for Emacs, but I am not very proficient in that, and
I have to consider that if somebody else needs
to add material instead of me, he will not be
happy...

Any ideas?

Thanks!

Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Patience is a virtue, it's just not one of my better virtues" _/


From martin@mira.cs.tu-berlin.de  Sun Jan 28 10:27:41 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 11:27:41 +0100
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <20010126182812.Y704@lyra.org> (message from Greg Stein on Fri,
 26 Jan 2001 18:28:13 -0800)
References: <200101262251.PAA28146@localhost.localdomain> <200101262344.f0QNiS401869@mira.informatik.hu-berlin.de> <20010126182812.Y704@lyra.org>
Message-ID: <200101281027.f0SARfK01418@mira.informatik.hu-berlin.de>

> Maybe start a script which uses HTTP to invoke a CGI script on the web
> server? Would that propagate correctly? Have the right permissions?

Good idea. I've tried, and when I was almost done, I noticed that it
will run as nobody.nobody, thus *not* have the right permissions.

I guess this is a sensible thing from the SF point of view, so I'm
back to square one.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Sun Jan 28 10:32:57 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 11:32:57 +0100
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <200101271524.IAA02505@localhost.localdomain> (message from Uche
 Ogbuji on Sat, 27 Jan 2001 08:24:40 -0700)
References: <200101271524.IAA02505@localhost.localdomain>
Message-ID: <200101281032.f0SAWvf01441@mira.informatik.hu-berlin.de>

> I guess it failed the test.

Yes, a number of things seem to have broken. I corrected the script
so that the hostnames are good now. I also noticed that it is best run
on pyxml.sourceforge.net (which, interestingly, is not the Web server
when you login, but is the Web server when you come through port 80
:-). IOW, it works fine when I run it; I wouldn't mind somebody else
trying to run it.

Furthermore, it seems that the crontabs are not user-readable anymore
- although they seem to contain varying per-user information. So I
can't even be sure that the cron job still runs; I'll ask SF what this
is about.

So again, any proposals (or, even better, attempts to solve this) are
welcome. For the moment, you have to run /home/groups/pyxml/doupdate
manually on pyxml.sourceforge.net after you've committed something.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Sun Jan 28 11:17:28 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 12:17:28 +0100
Subject: [XML-SIG] I am confused...
In-Reply-To: <Pine.LNX.4.30.0101281021260.21581-100000@rnd.onego.ru> (message
 from Roman Suzi on Sun, 28 Jan 2001 10:53:06 +0300 (MSK))
References: <Pine.LNX.4.30.0101281021260.21581-100000@rnd.onego.ru>
Message-ID: <200101281117.f0SBHSL01714@mira.informatik.hu-berlin.de>

> However, when I made a prototype program
> and tried to generate page with Python XML tools (xml.*,
> not 4Suite, I used Python 1.5.2) - it was so slow
> that I just thrown the idea out.

Python 1.5.2 did not come with an xml.* package, so I wonder what
exactly you've been using. Perhaps xmllib? That *is* slow.

> So, my main question is:
> 
> - are Python XML tools (and which of them?) up to the task of facilitating
> site-generation with bearable speed?

That probably depends on many things: what exactly you want to
achieve, and what approximately you consider bearable. I personally
haven't tried myself to produce web sites with PyXML, but I haven't
heard complaints about unbearable speed so far.

I'd be really curious as to what transformations you wanted to
achieve, and how exactly you attempted them. E.g. choice of XML parser
matters significantly; there is a number of alternatives in PyXML.

> Right now I need to markup raw material for the articles
> by hand and I want to do it with less keystrokes. Just
> typing tags for '<author>This And This</author>' is not less typing
> How do you solve this?

Smart editors can help. For example, the psgml mode of Emacs can
perform auto-completion of tags (in particular of closing tags, but
also of opening tags if it sees a DTD).

> I am planning to do something like:
> 
> a::This And This
> h::The Headline
> ...
> 
> and then run custom pre-processor which will store this
> in proper format (I hope it will be XML if I find
> fast way to deal with it in Python)

That sounds also like a reasonable thing to do.

> The other way to do the same is to write special mode for Emacs, but
> I am not very proficient in that and I take into consideration that
> if somebody else will need to add material instead of me he will be
> not happy...

That should favour using XML all the time. People use different
editors, right. However, putting XML into a text editor is
straight-forward. Some people may want to use your Emacs macros for
convenience, but they don't *have* to - they might have some other
smart XML editor they know, and the output will still be XML.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Sun Jan 28 12:43:33 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 13:43:33 +0100
Subject: [XML-SIG] Announcing PyXPath 1.2
In-Reply-To: <3A67AFF5.F0895522@fourthought.com> (message from Jeremy Kloth on
 Thu, 18 Jan 2001 20:09:41 -0700)
References: <200012291557.QAA01457@loewis.home.cs.tu-berlin.de> <3A67AFF5.F0895522@fourthought.com>
Message-ID: <200101281243.f0SChX501988@mira.informatik.hu-berlin.de>

> >   const unsigned short BINARY_EXPR = 8;
> Since there are two basic types of binary expressions, I suggest
> splitting this into a BOOLEAN_EXPR and NUMERIC_EXPR.  They do offer
> quite different functionality.

Sounds good. How does the UNION_OPERATOR fit in?

> >   const unsigned short UNARY_EXPR = 9;
> This would be considered a NUMERIC_EXPR.

How do you represent a '-x' in a NumericExpr object, then?
In particular, how to distinguish 'a-b' and '-a'? The first
is

  createNumericExpr(MINUS_OPERATOR, a, b)

Some options for the second one:

  createNumericExpr(UNARY_MINUS_OPERATOR, a, None)
  createNumericExpr(UNARY_MINUS_OPERATOR, None, a)
  createNumericExpr(MINUS_OPERATOR, a, None)
  createNumericExpr(MINUS_OPERATOR, None, a)

Which one would you prefer?

> >     // the name must still contain the leading $
> >     VariableReference createVariableReference(in DOMString name);
> 
> name can be a qualified name.  use prefix, localname

Ok.

> >     Literal createLiteral(in DOMString literal);
> >     Number createNumber(in DOMString value);
> >     FunctionCall createFunctionCall(in DOMString name, in ExprList args);
> 
> See createVariableReference

Ok.

> >     Expr parseLocationPath(in DOMString path); // returns absolute or relative path, or step
> 
> This should probably be parseExpression, since the Expr is the primary
> construct.  (See XPath spec - sect 1)

Probably. I'm still not certain which start symbol is required in
what applications. For the moment, I dropped parseLocationPath in
favour of parseExpr.

> >   interface AbsoluteLocationPath:Expr{
> >     /* '/' relative-opt, or '//' relative */
> >     readonly attribute Expr relative; // step or relative path
> 
> relative may be null  (case of '/')

Sure. That is implied in all cases where the grammar has option
constructs.

> >   const unsigned short ANCESTOR = 1;
> >   const unsigned short ANCESTOR_OR_SELF = 2;
> >   const unsigned short _ATTRIBUTE = 3; // attribute is a keyword
> >   const unsigned short CHILD = 4;
> >   const unsigned short DESCENDANT = 5;
> >   const unsigned short DESCENDANT_OR_SELF = 6;
> >   const unsigned short FOLLOWING = 7;
> >   const unsigned short FOLLOWING_SIBLING = 8;
> >   const unsigned short NAMESPACE = 9;
> >   const unsigned short PARENT = 10;
> >   const unsigned short PRECEDING = 11;
> >   const unsigned short PRECEDING_SIBLING = 12;
> >   const unsigned short SELF = 13;
> 
> Maybe suffix the types with '_AXIS'?

All of them? Ok.

> >   interface AxisSpecifier:Expr{
> >     readonly attribute unsigned short name;
> 
> Should we use axisType just for consistancy?

In the grammar, the non-terminal collecting them is AxisName, so I'm
not sure what consistency really means here.

> >   const unsigned short COMMENT = 1;
> >   const unsigned short TEXT = 2;
> >   const unsigned short PROCESSING_INSTRUCTION = 3;
> >   const unsigned short NODE = 4;
> 
> suffix of '_NODE_TEST' ??

So we get NODE_NODE_TEST? Try again :-) 

> >   interface NodeTest:Expr{
> >     readonly attribute unsigned short test;
> 
> testType ??

Ok. I guess that also means we get axisType.

[...]
> >   const unsigned short BINOP_UNION = 14;
> 
> possibly ??_OPERATOR as apposed to BINOP_??

Ok.

> >     UnaryExpr createUnaryExpr(in Expr exp);
> > 
> See factory functions above.

Changed (using createNumericExpr(MINUS_OPERATOR, exp, None) instead).
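
To make the convention concrete, here is a minimal sketch of a NumericExpr
whose right operand is None for unary minus. The constant values, the class
layout and the evaluate() method are assumptions for illustration only;
they are not taken from the PyXPath IDL.

```python
# Hypothetical operator codes; the real IDL values may differ.
PLUS_OPERATOR = 1
MINUS_OPERATOR = 2


class NumericExpr:
    def __init__(self, operator, left, right):
        self.operator = operator
        self.left = left
        self.right = right  # None marks a unary expression

    def is_unary(self):
        return self.right is None

    def evaluate(self):
        left = _value(self.left)
        if self.operator == MINUS_OPERATOR:
            if self.right is None:
                return -left  # '-a'
            return left - _value(self.right)  # 'a-b'
        if self.operator == PLUS_OPERATOR:
            return left + _value(self.right)
        raise ValueError("unknown operator: %r" % (self.operator,))


def _value(operand):
    """Evaluate nested expressions; treat anything else as a number."""
    if isinstance(operand, NumericExpr):
        return operand.evaluate()
    return float(operand)


def createNumericExpr(operator, left, right):
    return NumericExpr(operator, left, right)


print(createNumericExpr(MINUS_OPERATOR, 5, 3).evaluate())
print(createNumericExpr(MINUS_OPERATOR, 5, None).evaluate())
```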

I'll release PyXPath 1.3 soon, which will also include a proposal for
integration of XSLT match expressions. Then I'll try to patch
4XPath/4XSLT to use PyXPath. I won't change the attribute names in
4XPath to conform with the IDL, though, at least for the moment.

Regards,
Martin


From rnd@onego.ru  Sun Jan 28 13:05:26 2001
From: rnd@onego.ru (Roman Suzi)
Date: Sun, 28 Jan 2001 16:05:26 +0300 (MSK)
Subject: [XML-SIG] I am confused...
In-Reply-To: <Pine.LNX.4.30.0101281021260.21581-100000@rnd.onego.ru>
Message-ID: <Pine.LNX.4.30.0101281547160.26622-100000@rnd.onego.ru>

(for some reason I have not received replies
from the list in my mailbox - but I'll try
to answer on reading Martin's reply from Web-page)

On Sun, 28 Jan 2001, Roman Suzi wrote:

>- are Python XML tools (and which of them?) up to the task of facilitating
>site-generation with bearable speed?

I remember I was doing queries in the form
"/article/author/name"
- and it was so slow... (0.5 - 1 sec per query on Celeron 400)

In my application I need many such queries to fill
the template - that is why speed was unbearable.

Please, tell me if I did it wrong:

- parsed xml-file
- queried each variable in a template-file from the xml-file
- filled template with values found to produce web-page
  (some variables go to other pages, for example, content page)
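
In code, the scheme looks roughly like the sketch below. It is not my
original program: it assumes the standard library's xml.etree.ElementTree
and string.Template as stand-ins, and the element names, the QUERIES table
and the template text are made up for illustration. Note that ElementTree
paths are relative to the root element, so "/article/author/name" becomes
"author/name".

```python
import xml.etree.ElementTree as ET
from string import Template

ARTICLE = """<article>
  <author><name>Roman Suzi</name></author>
  <headline>The Headline</headline>
</article>"""

# Hypothetical template variables and the paths that fill them.
QUERIES = {'author': 'author/name', 'headline': 'headline'}


def render(xml_text, template_text, queries=QUERIES):
    root = ET.fromstring(xml_text)  # parse the file once
    values = {}
    for name, path in queries.items():
        # One query per template variable, all against the same tree.
        values[name] = root.findtext(path, default='')
    return Template(template_text).substitute(values)


page = render(ARTICLE, '<h1>$headline</h1><p>by $author</p>')
print(page)
```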

I have been trying to learn XML for 2 years already but am
still a newbie in practice.

Anyway, before claiming XML tools for Python are slow I need to recheck
with new versions - if there are no objections to the
above scheme. (And what is the preferable tool for queries?
XPath?)

Is there any on-line tutorial (?) or just example code
to learn how to work efficiently with XML from Python?
(Python is my favorite language while Java is not)
I read code from xml.* but it doesn't give me clues
for real usage.


Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Patience is a virtue, it's just not one of my better virtues" _/


From Alexandre.Fayolle@logilab.fr  Sun Jan 28 15:22:58 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Sun, 28 Jan 2001 16:22:58 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
Message-ID: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr>

Hello,

I'm testing Narval with what is currently in the CVS for 4Suite and
PyXML. I noticed a weird behaviour in 4DOM which is probably
parser-related, so this is why I post here.

If I build a DOM using the default non-validating parser, attributes that
have no namespace are available by specifying an empty string as the
namespace uri parameter to getAttributeNS().

Now, if I build a DOM using the default validating parser, using an empty
string won't do the trick. Instead, I have to use None as the namespace
uri.

I think this is a problem with the sax2 driver for xmlproc, or maybe
xmlproc itself. I'll look into it and submit a patch if I can figure it
out. 
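
In the meantime, code that has to cope with both parsers could try both
spellings of "no namespace". A minimal sketch, using the standard library's
xml.dom.minidom as a stand-in for 4DOM (an assumption); the helper name
get_unqualified_attribute is hypothetical.

```python
from xml.dom import minidom


def get_unqualified_attribute(element, name, default=''):
    """Look up an attribute with no namespace, trying '' and then None."""
    for ns_uri in ('', None):
        if element.hasAttributeNS(ns_uri, name):
            return element.getAttributeNS(ns_uri, name)
    return default


doc = minidom.parseString('<doc id="42"/>')
print(get_unqualified_attribute(doc.documentElement, 'id'))
```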


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From noreply@sourceforge.net  Sun Jan 28 15:22:46 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jan 2001 07:22:46 -0800
Subject: [XML-SIG] [Patch #103470] drv_xmlproc reports None instead of empty ns-uri
Message-ID: <E14MtfC-0000Tx-00@usw-sf-web2.sourceforge.net>

Patch #103470 has been updated. 

Project: pyxml
Category: sax
Status: Open
Submitted by: afayolle
Assigned to : nobody
Summary: drv_xmlproc reports None instead of empty ns-uri

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103470&group_id=6473


From uche.ogbuji@fourthought.com  Sun Jan 28 15:46:34 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 28 Jan 2001 08:46:34 -0700
Subject: [XML-SIG] I am confused...
In-Reply-To: Message from Roman Suzi <rnd@onego.ru>
 of "Sun, 28 Jan 2001 16:05:26 +0300." <Pine.LNX.4.30.0101281547160.26622-100000@rnd.onego.ru>
Message-ID: <200101281546.IAA07482@localhost.localdomain>

> (for some reason I have not received replies
> from the list in my mailbox - but I'll try
> to answer on reading Martin's reply from Web-page)
> 
> On Sun, 28 Jan 2001, Roman Suzi wrote:
> 
> >- are Python XML tools (and which of them?) up to the task of facilitating
> >site-generation with bearable speed?
> 
> I remember I was doing queries in the form
> "/article/author/name"
> - and it was so slow... (0.5 - 1 sec per query on Celeron 400)

What size was the file?  The time you mentioned is in line with using 4XPath on 
a 640KB file, as you can see in this demo:

[uogbuji@borgia uogbuji]$ python
Python 2.0 (#6, Oct 26 2000, 12:04:19) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> f = open("bigxml", "w")
>>> f.write("<article>\n")
>>> for i in range(10000):
...     f.write("<author><name>Uche Ogbuji</name><name>Roman Suzi</name></author>")
... 
>>> f.write("</article>\n")
>>> f.close()              
>>> from Ft.Lib.cDomlette import RawExpatReader
>>> reader = RawExpatReader()
>>> doc = reader.fromUri("bigxml")
>>> from xml.xpath import Evaluate
>>> import time
>>> start = time.time(); result = Evaluate("/article/author/name", contextNode=doc); end = time.time()
>>> print end - start
1.24777603149
>>> len(result)
20000
>>> 

bigxml is 640K once generated.  I don't think that time is unreasonable for a 
query that navigates through that file and extracts 20,000 nodes according to a 
path expression.

If you cut the loop to generate only 100 author elements (6.4K file), the 
XPath only takes 0.018 seconds to execute.
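
If you want to repeat the comparison without 4Suite installed, a similar
measurement can be sketched with the standard library alone. This assumes
xml.etree.ElementTree as a stand-in for cDomlette and 4XPath, so the
timings are not comparable with the numbers above; the function names are
made up for the example.

```python
import time
import xml.etree.ElementTree as ET


def build_document(count):
    """Build the same test document as the demo, in memory."""
    record = "<author><name>Uche Ogbuji</name><name>Roman Suzi</name></author>"
    return "<article>\n" + record * count + "</article>\n"


def time_query(count):
    """Return (number of matches, seconds spent in the query alone)."""
    root = ET.fromstring(build_document(count))
    start = time.perf_counter()
    # Paths are relative to the root <article> element here.
    result = root.findall("author/name")
    elapsed = time.perf_counter() - start
    return len(result), elapsed


for count in (100, 10000):
    matches, elapsed = time_query(count)
    print(count, matches, "%.4f s" % elapsed)
```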

I'm curious to learn more about your data and the Python app you're using.  
You say not 4Suite so I assume you mean the old PyPath that used to come in 
PyXML.

> In my application I need many such queries to fill
> the template - that is why speed was unbearable.
> 
> Please, tell me if I did it wrong:
> 
> - parsed xml-file
> - quered each variable in a template-file from the xml-file
> - filled template with values found to produce web-page
>   (some variables go to other pages, for example, content page)
> 
> I am trying to learn XML for 2 years already but am
> still a newbie in practice.
> 
> Anyway, before claiming XML tools for Python slow I need to recheck
> with new versions - if there are no objections to the
> above scheme. (And what is preferrable tool for queries?
> XPath?)

It depends on the nature of the queries.

> Is there any on-line tutorial (?) or just example code
> to learn how to work efficiently with XML from Python?
> (Python is my favorite language while Java is not)
> I read code from xml.* but it doesn't give me clues
> for real usage.

If you get 4Suite there are some examples in the demo directories.  And you 
can always get help here.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sun Jan 28 16:16:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 28 Jan 2001 09:16:42 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Sun, 28 Jan 2001 16:22:58 +0100." <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr>
Message-ID: <200101281616.JAA07586@localhost.localdomain>

> I'm testing Narval with what is currently in the CVS for 4Suite and
> PyXML. I noticed a weird behaviour in 4DOM which is probably
> parser-related, so this is why I post here.
> 
> If I build a DOM using the default non-validating parser, attributes that
> have no namespace are available by specifying an empty string as the
> namespace uri parameter to getAttributeNS().
> 
> Now, if I build a DOM using the default validating parser, using an empty
> string won't do the trick. Instead, I have to use None as the namespace
> uri.
> 
> I think this is a problem with the sax2 driver for xmlproc, or maybe
> xmlproc itself. I'll look into it and submit a patch if I can figure it
> out. 

Hmm.  I introduced this behavior while fixing another drv_pyexpat bug (default 
namespaces on unprefixed attributes were being returned as the namespace of 
the element).

I thought None was an acceptable NSUri in Python SAX2.  The docs certainly 
seem to think so.  No big deal returning "" instead.  I saw your patch.  Have 
you checked this in, or should I?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Alexandre.Fayolle@logilab.fr  Sun Jan 28 16:42:09 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Sun, 28 Jan 2001 17:42:09 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <200101281616.JAA07586@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0101281731290.24102-100000@leo.logilab.fr>

On Sun, 28 Jan 2001, Uche Ogbuji wrote:

> Hmm.  I introduced this behavior while fixing another drv_pyexpat bug (default 
> namespaces on unprefixes attributes were being returned as the namespace of 
> the element).
> 
> I thought None was an acceptable NSUri in Python SAX2.  The docs certainly 
> seem to think so.  No big deal returning "" instead.  

Well I don't mind having None instead of '', but I'm certainly in favour
of consistency. As long as empty ns uri always show up the same, this is
fine by me. I was assuming None was 'wrong' only because I had always seen
'' before (and all our code uses '').

> I saw your patch.  Have you checked this in, or should I?

I don't think I have write access on the PyXML cvs, since I'm not
registered as a developer on the project, but correct me if I'm wrong.

I tested Narval (including a couple of kludges to work around bug #128860)
with today's cvs snapshot of 4Suite and PyXML plus this patch, and it works
fine, so I'd say the patch is OK as long as no one else is expecting None
as a ns-uri.


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From rnd@onego.ru  Sun Jan 28 19:32:06 2001
From: rnd@onego.ru (Roman Suzi)
Date: Sun, 28 Jan 2001 22:32:06 +0300 (MSK)
Subject: [XML-SIG] I am confused...
In-Reply-To: <200101281546.IAA07482@localhost.localdomain>
Message-ID: <Pine.LNX.4.30.0101282147120.1103-100000@rnd.onego.ru>

On Sun, 28 Jan 2001, Uche Ogbuji wrote:

>> On Sun, 28 Jan 2001, Roman Suzi wrote:
>>
>> >- are Python XML tools (and which of them?) up to the task of facilitating
>> >site-generation with bearable speed?
>>
>> I remember I was doing queries in the form
>> "/article/author/name"
>> - and it was so slow... (0.5 - 1 sec per query on Celeron 400)
>
>What size was the file?  The time you mentioned is in line for using 4XPath on
>a 640KB file, as you can see in this demo:


>1.24777603149
>>>> len(result)
>20000

On my AMD K6-200 this takes more than twice as long, but is still impressive:

python1.5 big.py
2.75321102142

>bigxml is 640K once generated.  I don't think it's unreasonable for processing
>I'm curious to learn more about your data and the Python app you're using.
>You say not 4Suite so I assume you mean the old PyPath that used to come in
>PyXML.

I do not remember the exact name.

>> In my application I need many such queries to fill
>> the template - that is why speed was unbearable.
>>
>> Anyway, before claiming XML tools for Python slow I need to recheck
>> with new versions - if there are no objections to the
>> above scheme. (And what is preferrable tool for queries?
>> XPath?)
>
>It depends on the nature of the queries.

Mostly of the type  shown above. Sometimes with conditions.

>> Is there any on-line tutorial (?) or just example code
>> to learn how to work efficiently with XML from Python?
>> (Python is my favorite language while Java is not)
>> I read code from xml.* but it doesn't give me clues
>> for real usage.
>
>If you get 4Suite there are some examples in the demo directories.  And you
>can always get help here.

Thank you! Your example shows good performance of 4Suite tools.


Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Sunday, January 28, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Patience is a virtue, it's just not one of my better virtues" _/


From dieter@handshake.de  Sun Jan 28 20:25:36 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Sun, 28 Jan 2001 21:25:36 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <278066535@toto.iv>
Message-ID: <14964.32832.866565.161806@lindm.dm>

Uche Ogbuji writes:
 > Hmm.  I introduced this behavior while fixing another drv_pyexpat bug (default 
 > namespaces on unprefixed attributes were being returned as the namespace of 
 > the element).
Is this not correct?

I interpreted the following phrase from the namespace spec
in this direction:

   "Note that default namespaces do not apply directly to attributes."


Dieter


From Mike.Olson@fourthought.com  Sun Jan 28 20:50:10 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 28 Jan 2001 13:50:10 -0700
Subject: [XML-SIG] I am confused...
References: <Pine.LNX.4.30.0101281547160.26622-100000@rnd.onego.ru>
Message-ID: <3A748602.9935FEA2@FourThought.com>

Roman Suzi wrote:
> 
> 
> >- are Python XML tools (and which of them?) up to the task of facilitating
> >site-generation with bearable speed?


http://4suite.org is completely dynamic from XML.  In fact there is one
additional step: we go from a set of RDF statements --> XML and then
render it with XSLT.  This is running on a Celeron 400; sometimes it
gets a bit slow, but usually it is acceptable.

> 
> I remember I was doing queries in the form
> "/article/author/name"
> - and it was so slow... (0.5 - 1 sec per query on Celeron 400)

If you didn't use 4Suite, then what did you use?  I think there was an
XPath implementation in PyXML but I know little about it.  I know 4XPath
performs pretty well.  Going to the site again, there are hundreds of
XPath expressions, but we still get reasonable times.

> 
> In my application I need many such queries to fill
> the template - that is why speed was unbearable.

What is your template?  XSLT?  If not, have you thought of using it?  It
sounds like it was designed to do exactly what you need.

> 
> Please, tell me if I did it wrong:
> 
> - parsed xml-file
> - quered each variable in a template-file from the xml-file
> - filled template with values found to produce web-page
>   (some variables go to other pages, for example, content page)

Again, it sounds like you're doing a lot by hand that is not needed.  You
can do this in XSLT with a simple template like

<xsl:template match='article'>
  <HTML><HEAD><TITLE>Article By <xsl:value-of
select='author/name'/></TITLE></HEAD></HTML>
</xsl:template>

The big advantage is that all of your XPath expressions can be relative
to the current context.  In the above example, the current context is
already the article so you don't need to match on it again.


> 
> I am trying to learn XML for 2 years already but am
> still a newbie in practice.
> 
> Anyway, before claiming XML tools for Python slow I need to recheck
> with new versions - if there are no objections to the
> above scheme. (And what is preferrable tool for queries?
> XPath?)

I'd definitely upgrade to the latest versions.  I'd also consider XSLT.

> 
> Is there any on-line tutorial (?) or just example code
> to learn how to work efficiently with XML from Python?
> (Python is my favorite language while Java is not)
> I read code from xml.* but it doesn't give me clues
> for real usage.

We're working on them.  There are some demos that come with the code, but
no real beginner's tutorial.

Hope this helps,

Mike

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Sun Jan 28 20:57:02 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 21:57:02 +0100
Subject: [XML-SIG] PyXPath 1.3
Message-ID: <200101282057.f0SKv2p08814@mira.informatik.hu-berlin.de>

A new release of PyXPath is now available on

http://www.informatik.hu-berlin.de/~loewis/xml/PyXPath-1.3.tgz

In this release, the IDL is updated according to Jeremy's suggestion,
and to include XSLT pattern support. In addition, the function
pyxpath.CompilePattern was added to support parsing pattern
expressions.

I have updated the grammar for use with Yapps 2. Even though this
generator provides a number of improvements, PyXPath was changed just
so it compiles with Yapps 2; future releases will make use of the
Kleene star and other features where appropriate.

Like previous releases, this requires a 4Suite installation to
represent the expression in objects roughly according to the API.

Regards,
Martin


From uche.ogbuji@fourthought.com  Sun Jan 28 21:07:34 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 28 Jan 2001 14:07:34 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from Dieter Maurer <dieter@handshake.de>
 of "Sun, 28 Jan 2001 21:25:36 +0100." <14964.32832.866565.161806@lindm.dm>
Message-ID: <200101282107.OAA08130@localhost.localdomain>

> Uche Ogbuji writes:
>  > Hmm.  I introduced this behavior while fixing another drv_pyexpat bug (default 
>  > namespaces on unprefixed attributes were being returned as the namespace of 
>  > the element).
> Is this not correct?
> 
> I interpreted the following phrase from the namespace spec
> in this direction:
> 
>    "Note that default namespaces do not apply directly to attributes."

Yes.  And I fixed the driver to meet this.  Prior to my fix, drv_xmlproc was 
returning the default namespace on unprefixed attributes in violation of XML 
Namespaces 1.0, and in particular, the portion you quoted.  Now it returns 
None, or after I check in Alexandre's patch, "".
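
As a minimal sketch of the fixed behaviour, assuming the xml.sax module and
expat driver shipped with a current Python 3 rather than the PyXML drivers
discussed in this thread (AttrNamespaceLogger and attribute_namespaces are
names made up for this illustration), one can log the (uri, localname) pairs
that a namespace-aware SAX parser reports:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class AttrNamespaceLogger(ContentHandler):
    """Record the (uri, localname) name of each element and of its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, names arrive as (uri, localname) pairs.
        self.seen.append((name, list(attrs.getNames())))


def attribute_namespaces(xml_bytes):
    """Parse xml_bytes with namespaces enabled and return the logged names."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = AttrNamespaceLogger()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.seen


doc = b'<doc xmlns="urn:example:default"><item id="1"/></doc>'
for element, attributes in attribute_namespaces(doc):
    print(element, attributes)
```

With the standard library this should report the unprefixed id attribute with
None as its namespace URI, even though the item element itself is in the
default namespace.  Other drivers may report "" instead, which is exactly the
inconsistency under discussion.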


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Sun Jan 28 22:05:11 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 23:05:11 +0100
Subject: [XML-SIG] I am confused...
In-Reply-To: <Pine.LNX.4.30.0101281547160.26622-100000@rnd.onego.ru> (message
 from Roman Suzi on Sun, 28 Jan 2001 16:05:26 +0300 (MSK))
References: <Pine.LNX.4.30.0101281547160.26622-100000@rnd.onego.ru>
Message-ID: <200101282205.f0SM5BB09225@mira.informatik.hu-berlin.de>

> I remember I was doing queries in the form
> "/article/author/name"
> - and it was so slow... (0.5 - 1 sec per query on Celeron 400)

What kind of API did you use? For simple queries like this, a SAX
ContentHandler may be sufficient. Using Uche's bigxml file, you can
try

import time
import xml.sax

class NameRetriever(xml.sax.ContentHandler):
    # Collects the text of every <name> element found inside an <author>.
    def __init__(self):
        self.authors = []
        self.in_author = self.in_name = 0

    def startElement(self, tag, attrs):
        if tag=="author":
            self.in_author = 1
        else:
            if self.in_author and tag == "name":
                self.in_name = 1
                self.txt = ""

    def characters(self,data):
        # May be called several times for one text node, so accumulate.
        if self.in_name:
            self.txt = self.txt+data

    def endElement(self,tag):
        if self.in_name and tag=="name":
            self.authors.append(self.txt)
            self.in_name=0
        elif self.in_author and tag=="author":
            self.in_author=0

h = NameRetriever()
start=time.time();xml.sax.parse("bigxml",handler=h);end = time.time()
print end - start
print len(h.authors)

To my own surprise, this is not as fast as the cDomlette; probably
because the latter links directly with expat, and thus avoids a number
of indirections. Still, it takes only three times as long (0.5s vs
1.4s on my machine), and it will work on any Python 2.0 installation.

> Please, tell me if I did it wrong:
> 
> - parsed xml-file
> - quered each variable in a template-file from the xml-file
> - filled template with values found to produce web-page
>   (some variables go to other pages, for example, content page)

In general, that is ok - except that the description is imprecise. How
did you parse? How did you query? How did you fill the template?

> Anyway, before claiming XML tools for Python slow I need to recheck
> with new versions - if there are no objections to the above
> scheme. (And what is preferrable tool for queries?  XPath?)

It depends. A SAX ContentHandler may do in many cases - although it is
apparently not necessarily faster than XPath over a fast DOM
implementation.

> Is there any on-line tutorial (?) or just example code
> to learn how to work efficiently with XML from Python?

To learn PyXML, there is an online tutorial on the PyXML topic
guide. Learning to work efficiently is probably not something that can
be taught in a tutorial - that is very much a matter of experience.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Sun Jan 28 22:23:24 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 23:23:24 +0100
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <200101281616.JAA07586@localhost.localdomain> (message from Uche
 Ogbuji on Sun, 28 Jan 2001 09:16:42 -0700)
References: <200101281616.JAA07586@localhost.localdomain>
Message-ID: <200101282223.f0SMNO009516@mira.informatik.hu-berlin.de>

> I thought None was an acceptable NSUri in Python SAX2.  The docs
> certainly seem to think so. 

What part of the docs specifically do you refer to, here? I think the
None vs "" business is sufficiently confusing so it needs to be
spelled out explicitly in all places. I do not think that applications
should need to behave polymorphically, accepting either None or "".

For SAX, the only explicit statement I could find is in the Java SAX
spec:

  uri - The Namespace URI, or the empty string if the element has no
  Namespace URI or if Namespace processing is not being performed.
  (http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/ContentHandler.html)

So unless you found documentation that Python has to use None here,
I'd say we have to clarify the SAX API that a missing namespace is
represented as "".

Unfortunately, the DOM specification has that different:

  # Note that because the DOM does no lexical checking, the empty
  # string will be treated as a real namespace URI in DOM Level 2
  # methods. Applications must use the value null as the namespaceURI
  # parameter for methods if they wish to have no namespace.
  (1.1.8 of DOM 2 Core)

This clearly means that a node without namespace has a null
namespaceURI; according to
http://python.sourceforge.net/devel-docs/lib/dom-type-mapping.html,
null maps to None in Python.
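
A small sketch of the DOM side, assuming the xml.dom.minidom that ships with
a current Python 3 (not 4DOM or the minidom of early 2001, which may have
behaved differently); the document and URI are invented for the example:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<doc xmlns="urn:example:default"><item id="1"/></doc>'
)
item = doc.documentElement.firstChild
id_attr = item.getAttributeNode("id")

# The element inherits the default namespace; the unprefixed attribute does not.
print(item.namespaceURI)
print(id_attr.namespaceURI)

# "No namespace" is spelled None when calling the DOM Level 2 NS methods.
print(item.getAttributeNS(None, "id"))
```

Here the attribute node should have a namespaceURI of None, and the lookup
with None as the namespace finds it, in line with the DOM 2 Core wording
quoted above.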

If everybody agrees that this is how it should be, we should document
it as such where appropriate, and fix existing implementations
accordingly.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Sun Jan 28 22:41:16 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 28 Jan 2001 23:41:16 +0100
Subject: [XML-SIG] XSLT parser interface
Message-ID: <200101282241.f0SMfGn09737@mira.informatik.hu-berlin.de>

[This was sent to python-dev by mistake; my apologies - MvL]

Based on my previous IDL interface for XPath parsers, I've defined an
API for a parser that parses XSLT pattern expressions. It is an
extension to the XPath API, so I attach only the additional functions.

Any comments are appreciated.

Martin

module XPath{
  // XSLT exprType values
  const unsigned short PATTERN = 17;
  const unsigned short LOCATION_PATTERN = 18;
  const unsigned short RELATIVE_PATH_PATTERN = 19;
  const unsigned short STEP_PATTERN = 20;

  interface Pattern;
  interface LocationPathPattern;
  interface RelativePathPattern;
  interface StepPattern;

  interface PatternFactory:ExprFactory{
    Pattern createPattern(in LocationPathPattern first);
    // idkey may be null, represents IdKeyPattern
    // if parent is true, it is '/', else '//'
    // rel may be null
    LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
						  boolean parent,
						  in RelativePathPattern rel);
    // if parent is true, it is /, else //
    RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
						  boolean parent,
						  in StepPattern step);
    StepPattern createStepPattern(in AxisSpecifier axis,
				  in NodeTest test,
				  in PredicateList predicates);
  };

  typedef sequence<LocationPathPattern> LocationPathPatterns;
  interface Pattern:Expr{
    readonly attribute LocationPathPatterns patterns;
    void append(in LocationPathPattern pattern);
  };

  interface LocationPathPattern:Expr{
    readonly attribute FunctionCall idkey;
    readonly attribute boolean parent;
    readonly attribute RelativePathPattern relative_pattern;
  };

  interface RelativePathPattern:Expr{
    readonly attribute RelativePathPattern relative;
    readonly attribute boolean parent;
    readonly attribute StepPattern step;
  };

  interface StepPattern:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  interface XSLTParser:Parser{
    Pattern parsePattern(in DOMString pattern);
  };
};


From Alexandre.Fayolle@logilab.fr  Mon Jan 29 08:55:08 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 29 Jan 2001 09:55:08 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <200101282223.f0SMNO009516@mira.informatik.hu-berlin.de>
Message-ID: <Pine.LNX.4.21.0101290941440.24805-100000@leo.logilab.fr>

On Sun, 28 Jan 2001, Martin v. Loewis wrote:

> spelled out explicitly in all places. I do not think that applications
> should need to behave polymorphically, accepting either None or "".

I could not agree more. 
 
<snipped useful ref to specs>
 
> If everybody agrees that this is how it should be, we should document
> it as such where appropriate, and fix existing implementations
> accordingly.

So to sum things up, this means that:

 * the patch to drv_xmlproc should be correct. I believe drv_expat should 
be already fine;
 * 4DOM/minidom/etc. should be updated to use None for the namespace uri;
 * applications using these implementations should be updated. 


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From larsga@garshol.priv.no  Mon Jan 29 09:48:14 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Jan 2001 10:48:14 +0100
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: <200101262239.PAA28063@localhost.localdomain>
References: <200101262239.PAA28063@localhost.localdomain>
Message-ID: <m3zogaheox.fsf@lambda.garshol.priv.no>

* Uche Ogbuji
| 
| One question: the PyPointers link goes to
| 
| http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/xptr.html
| 
| Which gives 404.  Lars, is this something you still want
| listed?  If so, where do I point to?

Just remove it.  That module implements a now obsolete XPointer
syntax that is totally different from the current XPath-based one, and
so really is useless.

--Lars M.


From larsga@garshol.priv.no  Mon Jan 29 09:58:37 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Jan 2001 10:58:37 +0100
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr>
References: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr>
Message-ID: <m3y9vuhe7m.fsf@lambda.garshol.priv.no>

* Alexandre Fayolle
| 
| If I build a DOM using the default non-validating parser, attributes
| that have no namespace are available by specifying an empty string
| as the namespace uri parameter to getAttributeNS().

Actually, I think this is something that is underspecified in both SAX
and the DOM. We need to decide how to represent no namespace URI both
in SAX and the DOM. At the moment I think both different SAX drivers
and 4DOM/minidom disagree here. 4DOM/minidom also disagree in other
parts of their Attributes implementations.

I have, unfortunately, not had time to dig sufficiently into this to
know the exact state of things, but please don't start changing the
code until we have agreed what is the correct behaviour.

My opinion is that names that have no namespace URI should be
represented using None rather than "".

--Lars M.

 
From Alexandre.Fayolle@logilab.fr  Mon Jan 29 10:20:10 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 29 Jan 2001 11:20:10 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <m3y9vuhe7m.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.21.0101291103330.24947-100000@leo.logilab.fr>

On 29 Jan 2001, Lars Marius Garshol wrote:
 
> I have, unfortunately, not had time to dig sufficiently into this to
> know the exact state of things, but please don't start changing the
> code until we have agreed what is the correct behaviour.

Do not worry about that: I just submitted a very quick patch for review,
which enables Narval to work with the cvs HEAD code of 4Suite and PyXML
and not too many kludges in the code to handle both conventions. It is now
up to the PyXML developers to decide whether it will be applied or not. 

I agree that some agreement has to be reached first. And if the agreement
is to use None, I'll change the code in Narval to match this, it's as
simple as that (and a strong 'requires' statement on the download page,
for this decision can break existing code).

Martin pointed out some very interesting parts of the various specs, in
another mail on this thread, which seem to clarify this point very much.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From rnd@onego.ru  Mon Jan 29 10:59:29 2001
From: rnd@onego.ru (Roman Suzi)
Date: Mon, 29 Jan 2001 13:59:29 +0300 (MSK)
Subject: [XML-SIG] I am confused...
In-Reply-To: <200101282205.f0SM5BB09225@mira.informatik.hu-berlin.de>
Message-ID: <Pine.LNX.4.30.0101291329370.10885-100000@rnd.onego.ru>

On Sun, 28 Jan 2001, Martin v. Loewis wrote:

I do not remember if this was what I used for measuring, but
this was another effort of mine to create query mechanisms.
(It doesn't work anymore because xml.dom.utils is gone.)

--------------------

#!/usr/bin/python1.5

print "1. simple"

from xml.dom.utils import FileReader
from xml.dom.core import createDocument
from string import split, index

ELEMENT                 = 1
ATTRIBUTE               = 2
TEXT                    = 3
CDATA_SECTION           = 4
ENTITY_REFERENCE        = 5
ENTITY                  = 6
PROCESSING_INSTRUCTION  = 7
COMMENT                 = 8
DOCUMENT                = 9
DOCUMENT_TYPE           = 10
DOCUMENT_FRAGMENT       = 11
NOTATION                = 12

d = FileReader()
dom = d.readFile('104.xml')

def portr(node):
    typ = node.get_nodeType()
    value = node.get_nodeValue()
    name = node.get_nodeName()
    atts = node.get_attributes()
    par = node.get_parentNode()
    print "t ",   typ, "v ",value, "n ",name, "a ", atts, "p ", par

class strstream:
  def __init__(self, str):
     self.str = str
#     print "strstream init"

  def read(self, n):
     tmp = self.str[:n]
     self.str = self.str[n:]
     return tmp

  def readline(self):
     return self.str

def _normalize_tokens(tl):
    """ rules:
    $,word,$ --> $word$
    """
    rules2 = {
    ("/","/") : "//",
    (".","/") : "./",
    ("!","=") : "$ne$",
    ("<","=") : "$le$",
    (">","=") : "$ge$",
    ("=","~") : "$match$",
    ("!","~") : "$no_match$",
    (";",";") : ";",
    }

    rules1 = {
    "=" : "$eq$",
    "!" : "$lt$",
    "<" : "$lt$",
    ">" : "$gt$",
    }

    ntl = []
    i = 0
    while i < len(tl)-1:
      if rules2.has_key( tuple(tl[i:i+2]) ):
        toapp = rules2[tuple(tl[i:i+2])]
        i = i+2
      else:
        if tl[i] == "$":
          if i+2 < len(tl):
            toapp = tl[i] + tl[i+1] + tl[i+2]
            i = i+3
          else:
            raise "Query error !!!" + `tl`
        else:
          toapp = tl[i]
          i = i+1
      if rules1.has_key( toapp ):
        toapp = rules1[toapp]
      ntl.append( toapp )
    return ntl

def _parse_query(q):
    from shlex import shlex
    #  i1 = index(q, "/")
    lexer = shlex(strstream(q))
    tokens = []
    tt = lexer.get_token()
    while tt:
      tokens.append(tt)
      tt = lexer.get_token()
    return _normalize_tokens(tokens)

def find_all_descendants(node, cond):
    return None     # XXX !!! stub

def find_all_children(node, cond):
    lst = []
    exec(cond)       ### must define condition !!!
    for n in node.get_childNodes():
      if condition(n):
        lst.append(n)
    return lst

class PYQL:
  def __init__(self, file):
    d = FileReader()
    self.dom = d.readFile(file)
    if self.dom.get_nodeType() == DOCUMENT:
      self.docel = self.dom.get_documentElement()


  def query(self, q):
#    return  self._query(self.docel, q)
#     return  _parse_query(q)
    qr = self._query(self.docel, _parse_query(q), self.dom )      # ???
    qel = self.dom.createElement("xql:result")
    if qr:
      qel.appendChild(qr)
    qel.setAttribute("orig", str(q))
    return qel

  def _query(self, node, subq, qrdoc):
#    print subq
    print find_all_children(node,
    """def condition(n): return n.get_nodeName() == "fig" """)
    if subq[0] == "//":
      self._query(node, subq[1:], qrdoc)
    elif subq[0] == "/":
      if subq[1] == node.get_nodeName():
        if len(subq) > 2:
          if subq[2] == "/":
            qel = qrdoc.createElement(node.get_nodeName())
            for a in node.get_attributes().keys():
              qel.setAttribute(a, node.get_attributes()[a].get_nodeValue())
            for node1 in node.get_childNodes():
              q2 = self._query(node1, subq[2:], qrdoc)
#              print "q2: ", q2
              if q2:
                 qel.appendChild(q2)
            if len(qel.get_childNodes())==0:
              del qel
              return None
            else:
              return qel
          else:
            return node
        else:
          return node
      else:
        return None


a = PYQL('104.xml')
#  a.query('$or$ != 1.23E-4          /article/text/topic$')
#  print a.query('/article/text/topic.').toxml()
print a.query('/article/text/figures/fig.').toxml()
#   print a.query('//fig.').toxml()

-----------

It was a naive attempt to write XQL for Python...

>> I remember I was doing queries in the form
>> "/article/author/name"
>> - and it was so slow... (0.5 - 1 sec per query on Celeron 400)
>
>What kind of API did you use? For simple queries like this, a SAX
>ContentHandler may be sufficient. Using Uche's bigxml file, you can
>try

>import xml.sax
>class NameRetriever(xml.sax.ContentHandler):
>    def __init__(self):
>        self.authors = []
>        self.in_author = self.in_name = 0
>
>    def startElement(self, tag, attrs):
>        if tag=="author":
>            self.in_author = 1
>        else:
>            if self.in_author and tag == "name":
>                self.in_name = 1
>                self.txt = ""
>
>    def characters(self,str):
>        if self.in_name:
>            self.txt = self.txt+str
>
>    def endElement(self,tag):
>        if self.in_name and tag=="name":
>            self.authors.append(self.txt)
>            self.in_name=0
>        elif self.in_author and tag=="author":
>            self.in_author=0
>
>h = NameRetriever()
>start=time.time();xml.sax.parse("bigxml",handler=h);end = time.time()
>print end - start
>print len(h.authors)

The above code is what I am trying to avoid.  I want my application to be
completely data-driven, so even "/article/author/name" must not appear in
the program!
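
One way to keep the path out of the program is to pass it in as data.  The
following is only a sketch under assumptions: it uses the xml.sax module of a
current Python 3, it supports nothing but simple absolute paths such as
"/article/author/name" (no predicates, no "//"), and PathCollector and query
are hypothetical names, not part of any library:

```python
import xml.sax


class PathCollector(xml.sax.ContentHandler):
    """Collect the text of every element whose absolute path equals `path`."""

    def __init__(self, path):
        super().__init__()
        self.target = [step for step in path.split("/") if step]
        self.stack = []      # names of the currently open elements
        self.buffer = None   # text pieces of a matching element, else None
        self.results = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.target:
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate the pieces.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self.stack == self.target:
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


def query(xml_bytes, path):
    """Run one path query; the path can come from a template or config file."""
    handler = PathCollector(path)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


sample = b"""<article>
  <author><name>Roman</name></author>
  <author><name>Martin</name></author>
</article>"""
print(query(sample, "/article/author/name"))
```

Each call re-parses the document, so for many queries against one file a DOM
plus XPath (or XSLT) is likely the better fit; the sketch only shows that the
handler itself need not hard-code element names.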

>To my own surprise, this is not as fast as the cDomlette; probably
>because the latter links directly with expat, and thus avoids a number
>of indirections. Still, it takes only three times as long (0.5s vs
>1.4s on my machine), and it will work on any Python 2.0 installation.
>
>> Please, tell me if I did it wrong:
>>
>> - parsed xml-file
>> - quered each variable in a template-file from the xml-file
>> - filled template with values found to produce web-page
>>   (some variables go to other pages, for example, content page)
>
>In general, that is ok - except that the description is imprecise. How
>did you parse? How did you query? How did you fill the template?

My code above answers these questions.

>> Anyway, before claiming XML tools for Python slow I need to recheck
>> with new versions - if there are no objections to the above
>> scheme. (And what is preferrable tool for queries?  XPath?)
>
>It depends. A SAX ContentHandler may do in many cases - although it is
>apparently not necessarily faster than XPath over a fast DOM
>implementation.

>> Is there any on-line tutorial (?) or just example code
>> to learn how to work efficiently with XML from Python?
>
>To learn PyXML, there is an online tutorial on the PyXML topic
>guide. To learn working efficiently is probably not something that can
>be taught in a tutorial - that is much a matter of experience.

Thanks! I shall look there too.

>Regards,
>Martin

Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Monday, January 29, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "The tuna doesn't taste the same without the dolphin." _/


From rnd@onego.ru  Mon Jan 29 13:33:26 2001
From: rnd@onego.ru (Roman Suzi)
Date: Mon, 29 Jan 2001 16:33:26 +0300 (MSK)
Subject: One more ques Re: [XML-SIG] I am confused...
In-Reply-To: <3A748602.9935FEA2@FourThought.com>
Message-ID: <Pine.LNX.4.30.0101291606550.10885-100000@rnd.onego.ru>

On Sun, 28 Jan 2001, Mike Olson wrote:

And one more problem: my texts are far from plain ASCII.
Do I need to convert them to UTF-8 or Unicode before
working with XML+XSLT+XPath?
Do I need Python 2 to implement a site that is neither US-ASCII nor Latin-1?

>Roman Suzi wrote:

I must admit I have never had clearer answers to my
questions than on this list! Even though I formulated my
problem poorly, I received well-targeted answers
which will help me tailor a solution to my problem.

>> In my application I need many such queries to fill
>> the template - that is why speed was unbearable.
>
>What is your template?  XSLT?  If not, have you thought of using it?  It
>sounds like it was designed to do exactly what you need.

My templates are just files with %(var)s-style things inside.
And thank you for mentioning XSLT and referring to a
working site - I will see if this fits my case.
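
For comparison, here is a rough sketch of the %(var)s approach with the
document parsed only once.  It assumes a current Python 3 and uses
xml.etree.ElementTree as a stand-in for the XPath engine (it is not 4XPath);
the variable-to-path mapping, the sample article and the fill function are
all invented for the illustration:

```python
import xml.etree.ElementTree as ET

ARTICLE = """<article>
  <title>Speed of XML tools</title>
  <author><name>Roman Suzi</name></author>
</article>"""

# Hypothetical mapping: template variable -> path relative to <article>.
# In a data-driven application this would be read from a file.
VARIABLES = {"title": "title", "author": "author/name"}

TEMPLATE = "<HTML><HEAD><TITLE>%(title)s by %(author)s</TITLE></HEAD></HTML>"


def fill(template, xml_text, variables):
    """Parse once, look up every variable, then fill the template."""
    root = ET.fromstring(xml_text)
    values = {name: root.findtext(path, default="")
              for name, path in variables.items()}
    return template % values


print(fill(TEMPLATE, ARTICLE, VARIABLES))
```

The point of the design is that all lookups run against one in-memory tree
and are relative to the article element, which is the same idea as the
relative select expressions in the XSLT template.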

>> Please, tell me if I did it wrong:
>>
>> - parsed xml-file
>> - quered each variable in a template-file from the xml-file
>> - filled template with values found to produce web-page
>>   (some variables go to other pages, for example, content page)
>
>Again, it sounds like you're doing a lot by hand that is not needed.  You
>can do this in XSLT with a simple template like

><xsl:template match='article'>
>  <HTML><HEAD><TITLE>Article By <xsl:value-of
>select='author/name'/></TITLE></HEAD></HTML>
></xsl:template>

Wow! If it works as advertized - this is what I need.

Can I also embed some Python statements there to handle
hard cases?


>The big advantage is that all of your XPath expressions can be relative
>to the current context.  In the above example, the current context is
>already the article so you don't need to match on it again.
>>
>> I am trying to learn XML for 2 years already but am
>> still a newbie in practice.
>>
>> Anyway, before claiming XML tools for Python slow I need to recheck
>> with new versions - if there are no objections to the
>> above scheme. (And what is preferrable tool for queries?
>> XPath?)
>
>I'd definitely upgrade to the latest versions.

I did it already.

>I'd also consider XSLT.

>From what you have shown - sure.

>> Is there any on-line tutorial (?) or just example code
>> to learn how to work efficiently with XML from Python?
>> (Python is my favorite language while Java is not)
>> I read code from xml.* but it doesn't give me clues
>> for real usage.
>
>We're working on them.  There are some demos that come with the code, but
>no real beginner's tutorial.

Demos are sometimes more valuable than tutorials.
In fact, I feel a need to reread overviews on XML (XSLT, XPath, AFs etc)
to have better idea what they do before looking at
demos.

>Hope this helps,
>
>Mike

Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Monday, January 29, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "The tuna doesn't taste the same without the dolphin." _/


From tpassin@home.com  Mon Jan 29 14:32:45 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 29 Jan 2001 09:32:45 -0500
Subject: [XML-SIG] problem with empty namespace uri
References: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr> <m3y9vuhe7m.fsf@lambda.garshol.priv.no>
Message-ID: <002f01c08a00$58f295a0$7cac1218@reston1.va.home.com>

Lars Marius Garshol wrote -
>
> My opinion is that names that have no namespace URI should be
> represented using None rather than "".
>
I completely agree with this.  If there is ***no*** namespace, the ns value
should be None.  The empty string should indicate that there is a namespace,
but its value happens to be empty.

Illustrations seem to be like this - someone help me out here, please.

1) No namespace is declared or used in the whole document, but SAX2 is in use.
(ns='')
2) SAX 1 is in use. (ns=None)
3) Namespaces are used in the document, but not in some particular element.
(ns='' for that element)
4) Namespaces are used in the document, but some particular element is a child
of an element that declares a default namespace. (ns=default ns for that
element).

This leaves open the ns for an attribute in an element that declares a default
ns - the old question that comes up over and over.  I don't know the answer.

Maybe tests like this:

if ns:
    # Do  your namespace stuff

wouldn't add that much to the processing time.  They would act the same on
None and '' ns values.  Of course, you could say, then why make a distinction.
Maybe we don't need to.
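
To make the difference between the two kinds of test concrete, here is a small sketch. The function names are made up for illustration and are not part of any SAX or DOM API:

```python
def has_namespace_truthy(ns):
    # Truthiness test: None and "" both count as "no namespace".
    return bool(ns)


def has_namespace_strict(ns):
    # Identity test: only None counts as "no namespace".
    return ns is not None


for ns in (None, "", "http://example.org/ns"):
    print(repr(ns), has_namespace_truthy(ns), has_namespace_strict(ns))
```

Code written with the first style keeps working whichever convention is chosen; code written with the second style depends on drivers and DOMs agreeing on None.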

I thought this had been hashed out and resolved on the list a while ago,
although I don't remember the details.  This would be a perfect subject for
one of those PEP-like pages I proposed a while ago.  I'd like to resurrect
that suggestion, and have this topic be the subject of the first one.  What do
you say?

Cheers,

Tom P


From martin@mira.cs.tu-berlin.de  Mon Jan 29 15:28:25 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 16:28:25 +0100
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <m3y9vuhe7m.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 29 Jan 2001 10:58:37 +0100)
References: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr> <m3y9vuhe7m.fsf@lambda.garshol.priv.no>
Message-ID: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de>

> My opinion is that names that have no namespace URI should be
> represented using None rather than "".

That would be fine if agreed-upon, especially as it is consistent. I
just point out that this would be another deviation from the Java,
which then should be explicitly documented as such.

I agree on your point to agree first, and change the code then :-) I'd
go further to change the documentation before changing the code.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Mon Jan 29 15:41:43 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 16:41:43 +0100
Subject: [XML-SIG] I am confused...
In-Reply-To: <Pine.LNX.4.30.0101291329370.10885-100000@rnd.onego.ru> (message
 from Roman Suzi on Mon, 29 Jan 2001 13:59:29 +0300 (MSK))
References: <Pine.LNX.4.30.0101291329370.10885-100000@rnd.onego.ru>
Message-ID: <200101291541.f0TFfhC00861@mira.informatik.hu-berlin.de>

> The above code is what I avoid to do.  I want my application to be
> completely data-driven, so even "/article/author/name" must not appear in
> the program!

I'll look into your code separately, but I'd like to make two points
here:

a) There is often a trade-off between data-driven and fast
   algorithms. Somebody will probably shoot me for that statement, but
   you should be willing to accept some performance degradation if you
   need it to be very general.

b) In Python, it is often possible to transform a data-driven approach
   into one with explicitly coded decisions, due to the dynamic nature
   of the language. If all else fails, you could generate a program
   from the data.

c) I very much doubt that your *application* really needs to be
   completely data-driven; in any specific installation, there will be
   only a small set of queries. So that seems rather like a "nice to
   have" than a "must have" requirement.

Well, that's three points :-)
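
One way point b) could look in practice, as a minimal sketch only. The helper name compile_path, the use of ElementTree, and the sample document are all assumptions made for illustration; the query string stays in data and is turned into a plain Python function once:

```python
import xml.etree.ElementTree as ET


def compile_path(path):
    """Turn a query string such as "/article/author/name" into a function."""
    steps = [step for step in path.split("/") if step]
    root_name, rest = steps[0], steps[1:]

    def run(root):
        # The first step must name the document element itself.
        if root.tag != root_name:
            return []
        nodes = [root]
        # Each later step selects matching children of the current nodes.
        for step in rest:
            nodes = [child for node in nodes for child in node
                     if child.tag == step]
        return [node.text for node in nodes]

    return run


# The path lives in data (here a dict standing in for a config file).
queries = {"author": "/article/author/name"}
compiled = {key: compile_path(path) for key, path in queries.items()}

doc = ET.fromstring("<article><author><name>Roman</name></author></article>")
print(compiled["author"](doc))
```

The program text never mentions "/article/author/name"; only the data does, and the parsing of the path happens once rather than on every query.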

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Mon Jan 29 15:25:30 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 16:25:30 +0100
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <Pine.LNX.4.21.0101290941440.24805-100000@leo.logilab.fr>
 (message from Alexandre Fayolle on Mon, 29 Jan 2001 09:55:08 +0100
 (CET))
References: <Pine.LNX.4.21.0101290941440.24805-100000@leo.logilab.fr>
Message-ID: <200101291525.f0TFPUe00830@mira.informatik.hu-berlin.de>

> So to sum things up, this means that:
> 
>  * the patch to drv_xmlproc should be correct. I believe drv_expat should 
> be already fine;
>  * 4DOM/minidom/etc. should be updated to use None for the namespace uri;
>  * applications using these implementation should be updated. 

Right.

Martin


From larsga@garshol.priv.no  Mon Jan 29 16:15:53 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Jan 2001 17:15:53 +0100
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de>
References: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr> <m3y9vuhe7m.fsf@lambda.garshol.priv.no> <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de>
Message-ID: <m3elxmgwqu.fsf@lambda.garshol.priv.no>

* Lars Marius Garshol
|
| My opinion is that names that have no namespace URI should be
| represented using None rather than "".

* Martin v. Loewis
| 
| That would be fine if agreed-upon, especially as it is consistent. 

Yup. Tom Passin has said that he agrees; it would be nice if more
people could post their opinions so that we have some idea of who
agrees and who does not.  

I'd hate it if we changed this later on.

| I just point out that this would be another deviation from the Java,
| which then should be explicitly documented as such.

I am aware of this, and agree that it should be documented as a
deviation.
 
| I agree on your point to agree first, and change the code then :-)
| I'd go further to change the documentation before changing the code.

I agree.  I'll also change my book before we change the code.  :-)

--Lars M.


From fdrake@acm.org  Mon Jan 29 16:15:35 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 29 Jan 2001 11:15:35 -0500 (EST)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <m3elxmgwqu.fsf@lambda.garshol.priv.no>
References: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr>
 <m3y9vuhe7m.fsf@lambda.garshol.priv.no>
 <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de>
 <m3elxmgwqu.fsf@lambda.garshol.priv.no>
Message-ID: <14965.38695.931277.109716@cj42289-a.reston1.va.home.com>

Lars Marius Garshol writes:
 > Yup. Tom Passin has said that he agrees; it would be nice if more
 > people could post their opinions so that we have some idea of who
 > agrees and who does not.  

  I'll support the move to use None, and can make the changes to the
documentation in the Python Library Reference.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From ken@bitsko.slc.ut.us  Mon Jan 29 16:45:58 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 29 Jan 2001 10:45:58 -0600
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Lars Marius Garshol's message of "29 Jan 2001 17:15:53 +0100"
References: <Pine.LNX.4.21.0101281621300.23972-100000@leo.logilab.fr>
 <m3y9vuhe7m.fsf@lambda.garshol.priv.no>
 <200101291528.f0TFSPd00832@mira.informatik.hu-berlin.de>
 <m3elxmgwqu.fsf@lambda.garshol.priv.no>
Message-ID: <x7puh6xq61.fsf@bitsko.slc.ut.us>

Lars Marius Garshol <larsga@garshol.priv.no> writes:

> * Lars Marius Garshol
> |
> | My opinion is that names that have no namespace URI should be
> | represented using None rather than "".
> 
> * Martin v. Loewis
> | 
> | That would be fine if agreed-upon, especially as it is consistent. 
> 
> Yup. Tom Passin has said that he agrees; it would be nice if more
> people could post their opinions so that we have some idea of who
> agrees and who does not.  

+1 on None.

  -- Ken


From martin@mira.cs.tu-berlin.de  Mon Jan 29 16:34:20 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 17:34:20 +0100
Subject: [XML-SIG] I am confused...
In-Reply-To: <Pine.LNX.4.30.0101291329370.10885-100000@rnd.onego.ru> (message
 from Roman Suzi on Mon, 29 Jan 2001 13:59:29 +0300 (MSK))
References: <Pine.LNX.4.30.0101291329370.10885-100000@rnd.onego.ru>
Message-ID: <200101291634.f0TGYK401051@mira.informatik.hu-berlin.de>

> I do not remember if this was what I used for measuring, but
> this was another effort of mine to create query mechanisms.
> (It doesn't work anymore due to the lack of xml.dom.utils.)

Thanks. I've ported it to minidom, see the code below. Fortunately,
the DOM implementations follow the official API quite closely these
days, so it is easy to move from one implementation to another.

Using Uche's 640k document, I get the following timings:

minidom: 6.4s
4DOM: 45s
pDomlette: 8.9s

cDomlette fails since it does not support createElement (pDomlette
only has create*NS operations, so I added None as the namespace
everywhere).

Remember, this is the same machine where Uche's cDomlette/XPath query
took 0.5s. So it *does* matter how exactly you approach a certain task
(you can easily get a factor of 90 between solutions). However, if I
had had to guess in advance what the approximate outcome would be
for each of the solutions, I would have been totally wrong.
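
The factor of 90 follows directly from the timings quoted in this message; the arithmetic, spelled out:

```python
# Timings quoted in this message, in seconds, all on the same machine.
timings = {
    "cDomlette/XPath": 0.5,
    "minidom": 6.4,
    "pDomlette": 8.9,
    "4DOM": 45.0,
}

fastest = min(timings.values())
ratios = {name: secs / fastest for name, secs in timings.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print("%-16s %5.1fx" % (name, ratio))
```

4DOM at 45s against cDomlette/XPath at 0.5s gives the factor of 90; minidom and pDomlette fall in between.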

Regards,
Martin

#!/usr/local/bin/python

print "1. simple"

from xml.dom import minidom
from string import split, index

def portr(node):
    typ = node.nodeType
    value = node.nodeValue
    name = node.nodeName
    atts = node.attributes
    par = node.parentNode
    print "t ",   typ, "v ",value, "n ",name, "a ", atts, "p ", par

class strstream:
  def __init__(self, str):
     self.str = str
#     print "strstream init"

  def read(self, n):
     tmp = self.str[:n]
     self.str = self.str[n:]
     return tmp

  def readline(self):
     return self.str

def _normalize_tokens(tl):
    """ rules:
    $,word,$ --> $word$
    """
    rules2 = {
    ("/","/") : "//",
    (".","/") : "./",
    ("!","=") : "$ne$",
    ("<","=") : "$le$",
    (">","=") : "$ge$",
    ("=","~") : "$match$",
    ("!","~") : "$no_match$",
    (";",";") : ";",
    }

    rules1 = {
    "=" : "$eq$",
    "!" : "$lt$",
    "<" : "$lt$",
    ">" : "$gt$",
    }

    ntl = []
    i = 0
    while i < len(tl)-1:
      if rules2.has_key( tuple(tl[i:i+2]) ):
        toapp = rules2[tuple(tl[i:i+2])]
        i = i+2
      else:
        if tl[i] == "$":
          if i+2 < len(tl):
            toapp = tl[i] + tl[i+1] + tl[i+2]
            i = i+3
          else:
            raise "Query error !!!" + `tl`
        else:
          toapp = tl[i]
          i = i+1
      if rules1.has_key( toapp ):
        toapp = rules1[toapp]
      ntl.append( toapp )
    return ntl

def _parse_query(q):
    from shlex import shlex
    #  i1 = index(q, "/")
    lexer = shlex(strstream(q))
    tokens = []
    tt = lexer.get_token()
    while tt:
      tokens.append(tt)
      tt = lexer.get_token()
    return _normalize_tokens(tokens)

def find_all_descendants(node, cond):
    return None     # XXX !!! stub

def find_all_children(node, cond):
    lst = []
    exec(cond)       ### must define condition !!!
    for n in node.childNodes:
      if condition(n):
        lst.append(n)
    return lst

class PYQL:
  def __init__(self, file):
    self.dom = minidom.parse(file)
    self.docel = self.dom.documentElement

  def query(self, q):
    qr = self._query(self.docel, _parse_query(q), self.dom)
    qel = self.dom.createElement("xql:result")
    if qr:
      qel.appendChild(qr)
    qel.setAttribute("orig", str(q))
    return qel

  def _query(self, node, subq, qrdoc):
    #print subq
    #print find_all_children(node,
    #"""def condition(n): return n.nodeName == "fig" """)
    if subq[0] == "//":
      return self._query(node, subq[1:], qrdoc)
    elif subq[0] == "/":
      if subq[1] == node.nodeName:
        if len(subq) > 2:
          if subq[2] == "/":
            qel = qrdoc.createElement(node.nodeName)
            for a in node.attributes.keys():
              qel.setAttribute(a, node.attributes[a].nodeValue)
            for node1 in node.childNodes:
              q2 = self._query(node1, subq[2:], qrdoc)
#              print "q2: ", q2
              if q2:
                 qel.appendChild(q2)
            if len(qel.childNodes)==0:
              del qel
              return None
            else:
              return qel
          else:
            return node
        else:
          return node
      else:
        return None


a = PYQL('bigxml')
#  a.query('$or$ != 1.23E-4          /article/text/topic$')
#  print a.query('/article/text/topic.').toxml()
import time;start=time.time()
res=a.query('/article/author/name.').toxml()
print time.time()-start
print len(res)
#   print a.query('//fig.').toxml()


From Mike.Olson@fourthought.com  Mon Jan 29 18:43:07 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 29 Jan 2001 11:43:07 -0700
Subject: One more ques Re: [XML-SIG] I am confused...
References: <Pine.LNX.4.30.0101291606550.10885-100000@rnd.onego.ru>
Message-ID: <3A75B9BB.7EEAC2F6@FourThought.com>

Roman Suzi wrote:
> 
> On Sun, 28 Jan 2001, Mike Olson wrote:
> 
> And one more problem: my texts are far from plain ASCII.
> Do I need to convert them to utf8 or unicode before
> working with XML+XSLT+XPath?
> Do I need Python-2 to implement non US-ASCII site (and not latin-1)?

It would certainly make life easier, but you should be able to use 1.5.2
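
For comparison, a sketch of how this looks in a current Python 3, where every string is Unicode. This is an assumption about a modern interpreter, not about 1.5.2 or 2.0, and the sample text is made up:

```python
from xml.dom import minidom

# Made-up sample text; any non-ASCII content would do.
text = "Петрозаводск"

# Declare the encoding in the XML declaration and encode to match it.
xml_bytes = (
    '<?xml version="1.0" encoding="utf-8"?><city>%s</city>' % text
).encode("utf-8")

doc = minidom.parseString(xml_bytes)
parsed = doc.documentElement.firstChild.data
print(parsed == text)
```

The parser decodes the bytes according to the declared encoding, so the application only ever sees Unicode text.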

> 
> My templates are just files with %(var)s -style things inside.
> And thank you for mentioning XSLT and referring to a
> working site - I will see if this fits my case.

It sounds like it will.  I think it will help performance as well.  You
can precompile your stylesheets so there is almost no overhead for
loading them.

> 
> >> Please, tell me if I did it wrong:
> >>
> >> - parsed xml-file
> >> - queried each variable in a template-file from the xml-file
> >> - filled template with values found to produce web-page
> >>   (some variables go to other pages, for example, content page)
> >
> >Again, it sounds like you're doing a lot by hand that is not needed.  You
> >can do this in XSLT with a simple template like
> 
> ><xsl:template match='article'>
> >  <HTML><HEAD><TITLE>Article By <xsl:value-of
> >select='author/name'/></TITLE></HEAD></HTML>
> ></xsl:template>
> 
> Wow! If it works as advertized - this is what I need.
> 
> Can I also embed some python sentences there to handle
> hard cases?

What kind of hard cases?  XSLT is a lot more powerful than what I
showed: there are for loops, variables, if statements.  If you do reach
the extent of what XSLT can do, then you can write extension functions
and extension elements in Python.


Cheers,

Mike

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From rnd@onego.ru  Mon Jan 29 19:10:09 2001
From: rnd@onego.ru (Roman Suzi)
Date: Mon, 29 Jan 2001 22:10:09 +0300 (MSK)
Subject: [XML-SIG] I am confused...
In-Reply-To: <200101291541.f0TFfhC00861@mira.informatik.hu-berlin.de>
Message-ID: <Pine.LNX.4.30.0101292051470.17319-100000@rnd.onego.ru>

On Mon, 29 Jan 2001, Martin v. Loewis wrote:

>> The above code is what I avoid to do.  I want my application to be
>> completely data-driven, so even "/article/author/name" must not appear in
>> the program!
>
>I'll look into your code separately, but I'd like to make two points
>here:
>
>a) There is often a trade-off between data-driven and fast
>   algorithms. Somebody will probably shoot me for that statement, but
>   you should be willing to accept some performance degrading if you
>   need it very general.

In C - yes, but in Python - I doubt it.
Data-driven programs are shorter, contain fewer errors
and (IMHO) are faster.

>b) In Python, it is often possible to transform a data-driven approach
>   in one with explicitly coded decisions, due to the dynamic nature
>   of the language. If all else fails, you could generate the a program
>   from the data.

This is true. But this adds more complexity.

>c) I very much doubt that your *application* really needs to be
>   completely data-driven; in any specific installation, there will be
>   only a small set of queries. So that seems rather like a "nice to
>   have" but a "must have" requirement.

I agree. I have this working now - but am not satisfied, because
I like to make changes in one place instead of hunting for
them throughout many places.

My points are (they are driven by laziness ;-)

a) A software solution must be as general as possible
(I think it's a myth that more general solutions are harder,
longer to implement or are much less efficient:
2+2 is not easier than x+y, why hardcode x+x ?;-)

b) One parameter change requires one change in the code
("write everything once")
(if some nontrivial constant repeats in the code in the same role - it's
a variable ;-)

c) Count the total time of a solution: time of programming
+ time of execution. (Not forgetting time of reprogramming!)
(In my case I would rather wait 3 more seconds than make
supporting my solution hell.)

Now I am turning toward XML & co. because it happens to
be a common data model for storing the kind of data I have for
my web-site. Anything else is reinventing the wheel.
However, I want to apply the same design principles
(expressed above) while dealing with XML.

>Well, that's three points :-)

I think this branch of discussion is kinda offtopic.

Probably one day I will write a test for programmers
where there will be questions like:

#. What do you prefer more:

a)
if a == "1":
   b = "5"
elif a == "4":
   b = "20"
# ...
else:
   b = "5000"

b)
b = {"1":"5", "4":"20", ..., "1000":"5000"}[a]

c)
b = str(int(a)*5)

d)
try:
  b = str(int(a)*5)
except:
  b = "5000"

:-)

For now my answer is (d) but there are cases where (c or d)
are not possible - then it will be (b).
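
A runnable sketch of options (b) and (d) side by side. The table here holds only three entries as a stand-in, since the quiz elides the rest, and the function names are made up:

```python
# Option (b): an explicit table, with a default for unknown keys.
# Only three entries are listed; the full table is not given in the quiz.
TABLE = {"1": "5", "4": "20", "1000": "5000"}


def lookup(a, default="5000"):
    return TABLE.get(a, default)


# Option (d): compute the value, falling back when the input is not a number.
def compute(a, default="5000"):
    try:
        return str(int(a) * 5)
    except ValueError:
        return default


for a in ("4", "7", "oops"):
    print(a, lookup(a), compute(a))
```

Catching ValueError rather than using a bare except keeps unrelated errors visible. For "7" the table falls back to the default while the computed version returns "35", which is the practical difference between the two answers.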

>Regards,
>Martin

Sincerely yours, Roman Suzi
-- 
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Monday, January 29, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "The tuna doesn't taste the same without the dolphin." _/


From uche.ogbuji@fourthought.com  Mon Jan 29 19:49:02 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 12:49:02 -0700
Subject: [XML-SIG] New articles up: XML Messaging
Message-ID: <3A75C92E.CE803DE4@fourthought.com>

[Sorry if you get multiple copies of this]

I've started a series of tutorials on IBM developerWorks that will cover
XML messaging.  Python is the implementation language.

The first two parts of it are up.  Neither uses Python yet.  The first
is a background article

http://www-106.ibm.com/developerworks/library/co-tutintro.html

And then comes the first tutorial: on IDL (which is used to
specify/document the XML messaging interfaces)

http://www-105.ibm.com/developerworks/education.nsf/components-onlinecourse-bytitle/19CEA37A7099DFFC862569D50063163C?OpenDocument

The actual tutorials require free registration at dW.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jeremy.kloth@fourthought.com  Mon Jan 29 20:06:06 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Mon, 29 Jan 2001 13:06:06 -0700
Subject: [XML-SIG] problem with empty namespace uri
References: <Pine.LNX.4.21.0101290941440.24805-100000@leo.logilab.fr>
Message-ID: <3A75CD2E.8BE10051@fourthought.com>

Alexandre Fayolle wrote:
> 
> On Sun, 28 Jan 2001, Martin v. Loewis wrote:
> 
> > spelled out explicitly in all places. I do not think that applications
> > should need to behave polymorphically, accepting either None or "".
> 
> I could not agree more.
> 
> <snipped useful ref to specs>
> 
> > If everybody agrees that this is how it should be, we should document
> > it as such where appropriate, and fix existing implementations
> > accordingly.
> 
> So to sum things up, this means that:
> 
>  * the patch to drv_xmlproc should be correct. I believe drv_expat should
> be already fine;
>  * 4DOM/minidom/etc. should be updated to use None for the namespace uri;
>  * applications using these implementation should be updated.
> 

Actually, the DOM spec says that objects created with the non-NS methods
have the null namespaceURI, localName and prefix.  So I would say that
if the parser is running in NS mode, everything is created with the NS
methods.

That would mean that unprefixed attributes would have an '' for the
namespaceURI and prefix.
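
A small check of the non-NS versus NS creation methods, run against the minidom shipped with a current Python 3. This is an assumption about today's standard library and says nothing about how 4DOM behaved at the time:

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "root", None)

# Created with the non-NS method: no namespace information is attached.
plain = doc.createElement("item")

# Created with the NS method: namespace URI, prefix and local name are set.
qualified = doc.createElementNS("http://example.org/ns", "ex:item")

print(plain.namespaceURI, plain.prefix)
print(qualified.namespaceURI, qualified.prefix, qualified.localName)
```

In this implementation the "null" namespace of the DOM spec is represented as None.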

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From dieter@handshake.de  Mon Jan 29 17:53:09 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Mon, 29 Jan 2001 18:53:09 +0100 (CET)
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: <200101282107.OAA08130@localhost.localdomain>
References: <dieter@handshake.de>
 <200101282107.OAA08130@localhost.localdomain>
Message-ID: <14965.44549.716772.938879@lindm.dm>

Uche Ogbuji writes:
 > > Uche Ogbuji writes:
 > >  > Hmm.  I introduced this behavior while fixing another drv_pyexpat bug (default 
 > >  > namespaces on unprefixed attributes were being returned as the namespace of 
 > >  > the element).
 > > Is this not correct?
 > > 
 > > I interpreted the following phrase from the namespace spec
 > > in this direction:
 > > 
 > >    "Note that default namespaces do not apply directly to attributes."
 > 
 > Yes.  And I fixed the driver to meet this.  Prior to my fix, drv_xmlproc was 
 > returning the default namespace on unprefixed attributes in violation of XML 
 > Namespaces 1.0, and in particular, the portion you quoted.  Now it returns 
 > None, or after I check in Alexandre's patch, "".
I interpret this part differently:

  Default namespaces do not apply directly to attributes but
  indirectly via the element they belong to.

  If I have:

     <ns:elem attr=val ...>

  then (at least semantically), "attr" belongs to the same
  namespace as "elem" (the namespace associated with "ns").


I am not sure, whether the application or the parser should make
this namespace association for attributes.
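
For reference, this is what a namespace-aware parse reports for that kind of element in a current Python 3 minidom. It is a sketch of one implementation's behaviour, with a made-up namespace URI, not a ruling on the semantic question:

```python
from xml.dom import minidom

doc = minidom.parseString('<ns:elem xmlns:ns="urn:example" attr="val"/>')
elem = doc.documentElement
attr = elem.getAttributeNode("attr")

# The element carries the namespace bound to its prefix.
print(elem.namespaceURI)

# The unprefixed attribute is reported with no namespace of its own.
print(attr.namespaceURI)
print(elem.getAttributeNS(None, "attr"))
```

Here the parser makes no association between the attribute and the element's namespace; an application that wants that reading has to apply it itself.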


Dieter


From uche.ogbuji@fourthought.com  Mon Jan 29 20:39:48 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 13:39:48 -0700
Subject: [XML-SIG] I am confused...
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Sun, 28 Jan 2001 23:05:11 +0100." <200101282205.f0SM5BB09225@mira.informatik.hu-berlin.de>
Message-ID: <200101292039.NAA11439@localhost.localdomain>

> > I remember I was doing queries in the form
> > "/article/author/name"
> > - and it was so slow... (0.5 - 1 sec per query on Celeron 400)
> 
> What kind of API did you use? For simple queries like this, a SAX
> ContentHandler may be sufficient. Using Uche's bigxml file, you can
> try
> 
> import time
> import xml.sax
> class NameRetriever(xml.sax.ContentHandler):
>     def __init__(self):
>         self.authors = []
>         self.in_author = self.in_name = 0
> 
>     def startElement(self, tag, attrs):
>         if tag=="author":
>             self.in_author = 1
>         else:
>             if self.in_author and tag == "name":
>                 self.in_name = 1
>                 self.txt = ""
> 
>     def characters(self,str):
>         if self.in_name:
>             self.txt = self.txt+str
> 
>     def endElement(self,tag):
>         if self.in_name and tag=="name":
>             self.authors.append(self.txt)
>             self.in_name=0
>         elif self.in_author and tag=="author":
>             self.in_author=0
> 
> h = NameRetriever()
> start=time.time();xml.sax.parse("bigxml",handler=h);end = time.time()
> print end - start
> print len(h.authors)

This one needs to go into the XML HOWTO as an example.  We now have an XPath 
and SAX approach.  It would be easy to add a DOM approach.  I'll try to do it 
with the extra 3 hours the Devil offered me today in exchange for the pinkie 
fingernail of my soul.
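
A DOM version could look roughly like the following. It is a sketch only, written against the minidom in a current Python 3, with a small made-up document standing in for the bigxml file:

```python
from xml.dom import minidom

# Small stand-in for the "bigxml" file used in the thread.
SAMPLE = (
    "<article>"
    "<author><name>Uche</name></author>"
    "<author><name>Martin</name></author>"
    "<text><name>not an author</name></text>"
    "</article>"
)


def author_names(doc):
    """Collect the text of every <name> directly inside an <author>."""
    names = []
    for author in doc.getElementsByTagName("author"):
        for child in author.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                # Join the text nodes, much as the SAX handler
                # concatenates successive characters() calls.
                names.append("".join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE))
    return names


doc = minidom.parseString(SAMPLE)
print(author_names(doc))
```

Unlike the SAX handler, which accepts a name at any depth inside an author, this version only looks at direct children; that is a simplification of the sketch, not a property of the DOM.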

> To my own surprise, this is not as fast as the cDomlette; probably
> because the latter links directly with expat, and thus avoids a number
> of indirections. Still, it takes only three times as long (0.5s vs
> 1.4s on my machine), and it will work on any Python 2.0 installation.

Cool!  I must confess that I would have guessed that SAX was close to 
cDomlette.  Yes, PySAX does add quite a bit of overhead (which was one of the 
motivations for the PyExpat reader and cDomlette), but I would have thought 
that the integration of the processing with the parsing would make up the 
advantage.

Looks as if we might want to consider expanding cDomlette into a full-blown 
mutable DOM, though Mike and I are still discussing the best internal data 
structures.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 20:47:04 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 13:47:04 -0700
Subject: [XML-SIG] XSLT parser interface
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Sun, 28 Jan 2001 23:41:16 +0100." <200101282241.f0SMfGn09737@mira.informatik.hu-berlin.de>
Message-ID: <200101292047.NAA11452@localhost.localdomain>

> [This was sent to python-dev by mistake; my apologies - MvL]
> 
> Based on my previous IDL interface for XPath parsers, I've defined an
> API for a parser that parses XSLT pattern expressions. It is an
> extension to the XPath API, so I attach only the additional functions.
> 
> Any comments are appreciated.
> 
> Martin
> 
> module XPath{
>   // XSLT exprType values
>   const unsigned short PATTERN = 17;
>   const unsigned short LOCATION_PATTERN = 18;
>   const unsigned short RELATIVE_PATH_PATTERN = 19;
>   const unsigned short STEP_PATTERN = 20;

I think we might want to space out these module-level constants a bit to allow 
for user extension.  Or should all extensions use numbers above a certain 
ceiling?

>   interface Pattern;
>   interface LocationPathPattern;
>   interface RelativePathPattern;
>   interface StepPattern;
> 
>   interface PatternFactory:ExprFactory{
>     Pattern createPattern(in LocationPathPattern first);
>     // idkey may be null, represents IdKeyPattern

Minor nit, but it puzzled me for a few seconds.  The comma above should be a 
colon, or just rephrase to

"If idkey is non-null, this is an IdKeyPattern"

>     // if parent is true, it is '/', else '//'
>     // rel may be null
>     LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
> 						  boolean parent,
> 						  in RelativePathPattern rel);
>     // if parent is true, it is /, else //
>     RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
> 						  boolean parent,
> 						  in StepPattern step);
>     StepPattern createStepPattern(in AxisSpecifier axis,
> 				  in NodeTest test,
> 				  in PredicateList predicates);
>   };

Some of these take an approach that's a bit cute (for instance, the boolean 
parent idea), but since it's really a developer-only interface, this should be 
fine.

>   typedef sequence<LocationPathPattern> LocationPathPatterns;
>   interface Pattern:Expr{
>     readonly attribute LocationPathPatterns patterns;
>     void append(in LocationPathPattern pattern);
>   };
> 
>   interface LocationPathPattern:Expr{
>     readonly attribute FunctionCall idkey;
>     readonly attribute boolean parent;
>     readonly attribute RelativePathPattern relative_pattern;
>   };

I forgot whether Expr defines a pprint method.  If not, I think it should.  
This is a *very* handy debugging aid (and required by 4XDebug).
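
To show the kind of output a pprint method might give, here is a hypothetical Python rendering of two of the proposed interfaces. The class layout, the method signature and the output format are all assumptions for illustration, not the 4XPath implementation:

```python
class StepPattern:
    def __init__(self, axis, test, predicates=()):
        self.axis = axis
        self.test = test
        self.predicates = list(predicates)

    def pprint(self, indent=""):
        # Return an indented, one-line description of this step.
        return "%sStepPattern(axis=%r, test=%r, predicates=%r)" % (
            indent, self.axis, self.test, self.predicates)


class RelativePathPattern:
    def __init__(self, relative, parent, step):
        self.relative = relative  # earlier part of the path, or None
        self.parent = parent      # true for "/", false for "//"
        self.step = step

    def pprint(self, indent=""):
        sep = "/" if self.parent else "//"
        lines = ["%sRelativePathPattern(separator=%r)" % (indent, sep)]
        if self.relative is not None:
            lines.append(self.relative.pprint(indent + "  "))
        lines.append(self.step.pprint(indent + "  "))
        return "\n".join(lines)


# Pattern for "author/name", built by hand rather than by a parser.
author = RelativePathPattern(None, True, StepPattern("child", "author"))
pattern = RelativePathPattern(author, True, StepPattern("child", "name"))
print(pattern.pprint())
```

Returning a string rather than printing directly is a design choice of this sketch; it makes the tree dump easy to log or compare in tests.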

>   interface RelativePathPattern:Expr{
>     readonly attribute RelativePathPattern relative;
>     readonly attribute boolean parent;
>     readonly attribute StepPattern step;
>   };
> 
>   interface StepPattern:Expr{
>     readonly attribute AxisSpecifier axis;
>     readonly attribute NodeTest test;
>     readonly attribute PredicateList predicates;
>   };
> 
>   interface XSLTParser:Parser{
>     Pattern parsePattern(in DOMString pattern);
>   };
> };

Other than that, looks great.  Jeremy?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 20:48:07 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 13:48:07 -0700
Subject: [XML-SIG] Update Python XML topic
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "29 Jan 2001 10:48:14 +0100." <m3zogaheox.fsf@lambda.garshol.priv.no>
Message-ID: <200101292048.NAA11470@localhost.localdomain>

> 
> * Uche Ogbuji
> | 
> | One question: the PyPointers link goes to
> | 
> | http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/xptr.html
> | 
> | Which gives 404.  Lars, is this something you still want
> | listed?  If so, where do I point to?
> 
> Just remove it.  That module implements a now obsolete XPointer
> syntax that is totally different from the current XPath-based one, and
> so really is useless.

K.  There were some bugs in the docs I added anyway, so I have some more work 
to do there.  And I get to test Martin's doupdate fixes.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 20:50:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 13:50:42 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "29 Jan 2001 10:58:37 +0100." <m3y9vuhe7m.fsf@lambda.garshol.priv.no>
Message-ID: <200101292050.NAA11485@localhost.localdomain>

> 
> * Alexandre Fayolle
> | 
> | If I build a DOM using the default non-validating parser, attributes
> | that have no namespace are available by specifying an empty string
> | as the namespace uri parameter to getAttributeNS().
> 
> Actually, I think this is something that is underspecified in both SAX
> and the DOM. We need to decide how to represent no namespace URI both
> in SAX and the DOM. At the moment I think both different SAX drivers
> and 4DOM/minidom disagree here. 4DOM/minidom also disagree in other
> parts of their Attributes implementations.
> 
> I have, unfortunately, not had time to dig sufficiently into this to
> know the exact state of things, but please don't start changing the
> code until we have agreed what is the correct behaviour.

Will hold off.  Too bad we don't have a dictator to Pronounce (if we were 
voting for one, I'd probably vote for Martin), but perhaps we're better off 
that way.

If the tide continues in favor of None in the next few days, we'll consider it 
a Group Pronouncement.

> My opinion is that names that have no namespace URI should be
> represented using None rather than "".

+1 for None


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 20:53:08 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 13:53:08 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Sun, 28 Jan 2001 23:23:24 +0100." <200101282223.f0SMNO009516@mira.informatik.hu-berlin.de>
Message-ID: <200101292053.NAA11496@localhost.localdomain>

> > I thought None was an acceptable NSUri in Python SAX2.  The docs
> > certainly seem to think so. 
> 
> What part of the docs specifically do you refer to, here? I think the
> None vs "" business is sufficiently confusing so it needs to be
> spelled out explicitly in all places. I do not think that applications
> should need to behave polymorphically, accepting either None or "".
> 
> For SAX, the only explicit statement I could find is in the Java SAX
> spec:
> 
>   uri - The Namespace URI, or the empty string if the element has no
>   Namespace URI or if Namespace processing is not being performed.
>   (http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/ContentHandler.html)

IIRC, NULLs are more of a hazard in Java, so perhaps we needn't worry about 
this divergence.

> So unless you found documentation that Python has to use None here,
> I'd say we have to clarify the SAX API that a missing namespace is
> represented as "".
> 
> Unfortunately, the DOM specification has that different:
> 
>   # Note that because the DOM does no lexical checking, the empty
>   # string will be treated as a real namespace URI in DOM Level 2
>   # methods. Applications must use the value null as the namespaceURI
>   # parameter for methods if they wish to have no namespace.
>   (1.1.8 of DOM 2 Core)
> 
> This clearly means that a node without namespace has a null
> namespaceURI, according to
> http://python.sourceforge.net/devel-docs/lib/dom-type-mapping.html,
> this maps to None in Python.

Yes.  The DOM used to be very confused, allowing both empty string and null, 
but they cleaned this up, and 4DOM has followed suit.
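
A small sketch of how the None mapping looks in practice, assuming the
minidom that ships with current Python 3 (the 2001 releases under discussion
may behave differently).  The helper name namespace_report and the URI
urn:example:default are made up for illustration.

```python
from xml.dom import minidom


def namespace_report(xml_text, attr_name):
    """Report how minidom exposes namespaces for the root element and one attribute."""
    root = minidom.parseString(xml_text).documentElement
    attr = root.getAttributeNode(attr_name)
    return {
        # Namespace URI of the element itself (None when it has no namespace).
        "element_ns": root.namespaceURI,
        # Namespace URI of the unprefixed attribute.
        "attribute_ns": attr.namespaceURI,
        # Lookup using None as the "no namespace" marker.
        "found_with_none": root.hasAttributeNS(None, attr_name),
        # Lookup using the empty string instead.
        "found_with_empty": root.hasAttributeNS("", attr_name),
    }


report = namespace_report('<root xmlns="urn:example:default" plain="1"/>', "plain")
for key, value in report.items():
    print(key, repr(value))
```

In this implementation the empty string is treated as an ordinary lookup key,
so only None finds the unprefixed attribute.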


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 21:07:10 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 14:07:10 -0700
Subject: One more ques Re: [XML-SIG] I am confused...
In-Reply-To: Message from Roman Suzi <rnd@onego.ru>
 of "Mon, 29 Jan 2001 16:33:26 +0300." <Pine.LNX.4.30.0101291606550.10885-100000@rnd.onego.ru>
Message-ID: <200101292107.OAA11558@localhost.localdomain>

> On Sun, 28 Jan 2001, Mike Olson wrote:
> 
> And one more problem: my texts are far from plain ASCII.
> Do I need to convert them to utf8 or unicode before
> working with XML+XSLT+XPath?
> Do I need Python-2 to implement non US-ASCII site (and not latin-1)?

If you're using anything besides US-ASCII, I *strongly* suggest Python 2.0.

> >> In my application I need many such queries to fill
> >> the template - that is why speed was unbearable.
> >
> >What is your template?  XSLT?  If not, have you thought of using it?  It
> >sounds like it was designed to do exactly what you need.

Pretty much.

> >Again, it sounds like you're doing a lot by hand that is not needed.  You
> >can do this in XSLT with a simple template like
> 
> ><xsl:template match='article'>
> >  <HTML><HEAD><TITLE>Article By <xsl:value-of
> >select='author/name'/></TITLE></HEAD></HTML>
> ></xsl:template>
> 
> Wow! If it works as advertized - this is what I need.
> 
> Can I also embed some python sentences there to handle
> hard cases?

The easiest way to do this is through what's known as extension functions and 
extension elements.  But you might be surprised at how much you can do without 
straying from XSLT.

I would look in these places for inspiration

http://www.ibiblio.org/xml/books/bible/updates/14.html
http://www.zvon.org/xxl/XSLTutorial/Books/Book1/index.html
http://www.jenitennison.com/xslt/index.html
http://www.dpawson.co.uk/xsl/xslfaq.html
http://www.w3schools.com/xsl/

> Demos are sometimes more valuable than tutorials.
> In fact, I feel a need to reread overviews on XML (XSLT, XPath, AFs etc)
> to have better idea what they do before looking at
> demos.

There's some of this in the demo directory of the 4Suite documentation, but 
also see the above links for examples all of which should work with 4XSLT.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 21:11:40 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 14:11:40 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from "Thomas B. Passin" <tpassin@home.com>
 of "Mon, 29 Jan 2001 09:32:45 EST." <002f01c08a00$58f295a0$7cac1218@reston1.va.home.com>
Message-ID: <200101292111.OAA11580@localhost.localdomain>

> Lars Marius Garshol wrote -
> >
> > My opinion is that names that have no namespace URI should be
> > represented using None rather than "".
> >
> I completely agree with this.  If there is ***no*** namespace, the ns value
> should be None.  The empty string should indicate that there is a namespace,
> but its value happens to be empty.
> 
> Illustrations seem to be like this - someone help me out here, please.
> 
> 1) No namespace is declared or used in the whole document, but SAX2 is in use.
> (ns='')

Hmm.  According to XMLNS 1.0, we shouldn't be differentiating this case.  I'd 
say (ns=None)

> 2) SAX 1 is in use. (ns=None)

It's not really applicable to SAX1: no ns-aware interfaces.

> 3) Namespaces are used in the document, but not in some particular element.
> (ns='' for that element)

OK.  Now I'm confused.  I guess you actually propose (ns='') to mean "no 
namespace on this name".

> This leaves open the ns for an attribute in an element that declares a default
> ns - the old question that comes up over and over.  I don't know the answer.

Pretty clear.  The processor should report no namespace.  It is up to the 
application to interpret differently, if it chooses to.


> I thought this had been hashed out and resolved on the list a while ago,
> although I don't remember the details.  This would be a perfect subject for
> one of those PEP-like pages I proposed a while ago.  I'd like to resurrect
> that suggestion, and have this topic be the subject of the first one.  What do
> you say?

I think it's a great idea.  For instance, the XPath API work could have been 
proposed and worked on in PEP fashion.  The problem is getting someone to set 
it up.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 21:15:03 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 14:15:03 -0700
Subject: [XML-SIG] I am confused...
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Mon, 29 Jan 2001 17:34:20 +0100." <200101291634.f0TGYK401051@mira.informatik.hu-berlin.de>
Message-ID: <200101292115.OAA11617@localhost.localdomain>

> > I do not remember if this was what I used for measuring, but
> > this was my another effort to create query-mechanisms
> > (It doesnt work anymore due to lack of xml.dom.utils)
> 
> Thanks. I've ported it to minidom, see the code below. Fortunately,
> the DOM implementations follow the official API quite closely these
> days, so it is easy to move from one implementation to another.

Ain't standardization coool?

> Using Uche's 640k document, I get the following timings:
> 
> minidom: 6.4s
> 4DOM: 45s
> pDomlette: 8.9s

That chunky 4DOM.  Who wrote that anyway?

> cDomlette fails since it does not support createElement (pDomlette
> only has create*NS operations, so I added None as the namespace
> everywhere).

Yeah.  We're still debating adding mutation to cDomlette.  This thread makes 
me inclined to do so.

> Remember, this is the same machine where Uche's cDomlette/XPath query
> took 0.5s. So it *does* matter how exactly you approach a certain task
> (you can easily get a factor of 90 between solutions). However, if I
> had to guess in advance what the approximate outcome would have been
> in each of the solutions, I would have been totally wrong.

So would I.  My guess would have been

cDomlette = 1
SAX = 1.5
pDomlette (pyexpat reader) = 2
4DOM = 10
minidom = 2

As you can see, I was way off as well.
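
For anyone wanting to repeat this kind of comparison, here is a minimal
timing sketch.  It assumes current Python 3 and only exercises minidom; the
synthetic document and the helper names (build_document, author_names,
timed) are invented for illustration and are not the 640k document or the
query code from this thread.

```python
import time
from xml.dom import minidom


def build_document(article_count):
    """Build a synthetic XML document with one author name per article."""
    parts = ["<articles>"]
    for i in range(article_count):
        parts.append(
            "<article><author><name>Author %d</name></author></article>" % i
        )
    parts.append("</articles>")
    return "".join(parts)


def author_names(dom):
    """Collect the text of every article/author/name element."""
    names = []
    for article in dom.getElementsByTagName("article"):
        for author in article.getElementsByTagName("author"):
            for name in author.getElementsByTagName("name"):
                names.append(name.firstChild.data)
    return names


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


dom, parse_seconds = timed(minidom.parseString, build_document(2000))
names, query_seconds = timed(author_names, dom)
print("parse: %.3fs  query: %.3fs  names: %d" % (parse_seconds, query_seconds, len(names)))
```

Timing the parse and the query separately makes it easier to see which part
of a given approach dominates.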


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 21:18:03 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 14:18:03 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from Jeremy Kloth <jeremy.kloth@fourthought.com>
 of "Mon, 29 Jan 2001 13:06:06 MST." <3A75CD2E.8BE10051@fourthought.com>
Message-ID: <200101292118.OAA11636@localhost.localdomain>

> Actually, the DOM spec says that objects created with the non-NS methods
> have the null namespaceURI, localName and prefix.  So I would say that
> if the parser is running in NS mode, everything is created with the NS
> methods.
> 
> That would mean that unprefixed attributes would have an '' for the
> namespaceURI and prefix.

Hmm.  Regardless of what the DOM says (I thought they'd unconfused themselves. 
 I guess I was wrong), I think that we should keep the interface consistent 
between the element and attribute no-ns indicators in *SAX2*.

The readers can conform to the DOM with some trivial extra effort.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Jan 29 21:22:38 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 29 Jan 2001 14:22:38 -0700
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Message from Dieter Maurer <dieter@handshake.de>
 of "Mon, 29 Jan 2001 18:53:09 +0100." <14965.44549.716772.938879@lindm.dm>
Message-ID: <200101292122.OAA11651@localhost.localdomain>

> Uche Ogbuji writes:
>  > > Uche Ogbuji writes:
>  > >  > Hmm.  I introduced this behavior while fixing another drv_pyexpat bug (default 
>  > >  > namespaces on unprefixed attributes were being returned as the namespace of 
>  > >  > the element).
>  > > Is this not correct?
>  > > 
>  > > I interpreted the following phrase from the namespace spec
>  > > in this direction:
>  > > 
>  > >    "Note that default namespaces do not apply directly to attributes."
>  > 
>  > Yes.  And I fixed the driver to meet this.  Prior to my fix, drv_xmlproc was 
>  > returning the default namespace on unprefixed attributes in violation of XML 
>  > Namespaces 1.0, and in particular, the portion you quoted.  Now it returns 
>  > None, or after I check in Alexandre's patch, "".
> I interpret this part differently:
> 
>   Default namespaces do not apply directly to attributes but
>   indirectly via the element they belong to.
> 
>   If I have:
> 
>      <ns:elem attr=val ...>
> 
>   then (at least semantically), "attr" belongs to the same
>   namespace as "elem" (the namespace associated with "ns").

No, I think this much is pretty clear from authoritative discussion, even 
though the XMLNS 1.0 spec is stupidly vague on the matter.  Based on my 
understanding of Tim Bray, James Tauber, etc, unprefixed attributes are 
*syntactically* in no namespace.

It is up to the application to decide that it *semantically* shares the 
namespace of its owner element, and this determination is easy enough to 
make even though it differs from the strict syntax.

Basically, the XMLNS 1.0 processor should return a null namespace for attr in 
your example, but the application is free to say "it's an attribute of elem, so 
I'll treat it as being in the {ns} namespace".
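
To illustrate the processor side of this, here is a sketch using the
xml.sax package from current Python 3 (an assumption; the drivers discussed
in this thread are older and may differ).  The handler class NameRecorder,
the function collect_names and the URI urn:example are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameRecorder(ContentHandler):
    """Record the (uri, localname) pairs reported for elements and attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.attributes = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) tuple in namespace mode.
        self.elements.append(name)
        # Attribute keys are (namespace URI, local name) tuples as well.
        self.attributes.extend(attrs.keys())


def collect_names(xml_bytes):
    """Parse xml_bytes with namespace processing on and return the recorded names."""
    handler = NameRecorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.elements, handler.attributes


elements, attributes = collect_names(b'<ns:elem xmlns:ns="urn:example" attr="val"/>')
print(elements)
print(attributes)
```

With this reader the element is reported in the urn:example namespace and the
unprefixed attribute with None as its namespace, which leaves the semantic
interpretation to the application.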


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ken@bitsko.slc.ut.us  Mon Jan 29 22:10:26 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 29 Jan 2001 16:10:26 -0600
Subject: [XML-SIG] problem with empty namespace uri
In-Reply-To: Uche Ogbuji's message of "Mon, 29 Jan 2001 13:50:42 -0700"
References: <200101292050.NAA11485@localhost.localdomain>
Message-ID: <x7hf2ixb59.fsf@bitsko.slc.ut.us>

Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> If the tide continues in favor of None in the next few days, we'll
> consider it a Group Pronouncement.
> 
> > My opinion is that names that have no namespace URI should be
> > represented using None rather than "".
> 
> +1 for None

Another data point -- the XML Infoset says:

 For Element Information Items[1]:
  [namespace name] The namespace name, if any, of the element type. If
  the element does not belong to a namespace, this property is null.

 For Attribute Information Items[2]:
  [namespace name] The namespace name, if any, of the attribute.
  Otherwise, this property is null.

  -- Ken

[1] <http://www.w3.org/TR/xml-infoset/#infoitem.element>
[2] <http://www.w3.org/TR/xml-infoset/#infoitem.attribute>


From martin@mira.cs.tu-berlin.de  Mon Jan 29 22:16:08 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 23:16:08 +0100
Subject: [XML-SIG] XSLT parser interface
In-Reply-To: <200101292047.NAA11452@localhost.localdomain> (message from Uche
 Ogbuji on Mon, 29 Jan 2001 13:47:04 -0700)
References: <200101292047.NAA11452@localhost.localdomain>
Message-ID: <200101292216.f0TMG8n00920@mira.informatik.hu-berlin.de>

> > module XPath{
> >   // XSLT exprType values
> >   const unsigned short PATTERN = 17;
> >   const unsigned short LOCATION_PATTERN = 18;
> >   const unsigned short RELATIVE_PATH_PATTERN = 19;
> >   const unsigned short STEP_PATTERN = 20;

> I think we might want to space out these module-level constants a
> bit to allow for user extension.

We might want to do so for future revisions of XPath itself, so this
is a good idea.

> Or should all extensions use numbers above a certain ceiling?

This is the general problem with a numeric type identification: you
need UUIDs or otherwise not-conflicting strings (like the IDL
repository IDs). However, this kind of identification appears to be
W3C tradition. So requesting that user extensions use another range
seems reasonable.
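
A rough Python sketch of the "separate range" idea, for illustration only.
The four constants come from the IDL quoted above; the floor value
USER_EXPR_TYPE_BASE and the function register_user_expr_type are
hypothetical and not part of any proposed interface.

```python
# Values taken from the quoted IDL.
PATTERN = 17
LOCATION_PATTERN = 18
RELATIVE_PATH_PATTERN = 19
STEP_PATTERN = 20

# Hypothetical floor: everything below it stays reserved for the core
# interface and its future revisions.
USER_EXPR_TYPE_BASE = 1000

_user_expr_types = {}


def register_user_expr_type(code, name):
    """Register a user-defined exprType code, rejecting reserved or duplicate values."""
    if code < USER_EXPR_TYPE_BASE:
        raise ValueError("codes below %d are reserved" % USER_EXPR_TYPE_BASE)
    if code in _user_expr_types:
        raise ValueError("code %d is already registered" % code)
    _user_expr_types[code] = name
    return code


MY_PATTERN = register_user_expr_type(1000, "MY_PATTERN")
```

This keeps core and extension codes from colliding, but two independent
extensions can still pick the same number, which is the weakness of numeric
identification noted above.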

> Minor nit, but it puzzled me for a few seconds.  The comma above
> should be a colon, or just rephrase to
> 
> "If idkey is non-Null, this is an IdKeyPattern"

Ok.


> Some of these take an approach that's a bit cute (for instance, the
> boolean parent idea), but since it's really a developer-only
> interface, this should be fine.

No, please suggest a more natural interface - I'm no XSLT expert at
all. The XPath tradition seems to be that everything with // is called
"abbreviated", so it would be

    /* rel/step */
    RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
						  in StepPattern step);
    /* rel//step */
    AbbreviatedRelativePathPattern createAbbreviatedRelativePathPattern
                      (in RelativePathPattern rel, in StepPattern step);

but that does not sound much better. I don't mind revising my
implementation, I did so a number of times when coming up with the
interface initially.

BTW, I find the grammar part of XSLT worded much worse than the one in
XPath. E.g. there is no apparent concern for lexical issues, like when
'id' should be considered as an NCName and when it should be the
pseudo-keyword of an IdKeyPattern.

> I forgot whether Expr defines a pprint method.  If not, I think it
> should.

It currently does not have anything except for the bare data
model/abstract syntax. Adding methods would be the next step;
I just added

  DOMString pprint();

to Expr. Evaluation needs more thought - at least for me.

> Other than that, looks great.

Thanks!

Martin


From tpassin@home.com  Tue Jan 30 02:26:20 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 29 Jan 2001 21:26:20 -0500
Subject: [XML-SIG] problem with empty namespace uri
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us>
Message-ID: <006d01c08a64$086567c0$7cac1218@reston1.va.home.com>

We translate "this property is null" to "this python property is None", right?

Cheers,

Tom P

Ken MacLeod wrote -

> > +1 for None
> 
> Another data point -- the XML Infoset says:
> 
>  For Element Information Items[1]:
>   [namespace name] The namespace name, if any, of the element type. If
>   the element does not belong to a namespace, this property is null.
> 
>  For Attribute Information Items[2]:
>   [namespace name] The namespace name, if any, of the attribute.
>   Otherwise, this property is null.
> 


From tpassin@home.com  Tue Jan 30 06:43:57 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 30 Jan 2001 01:43:57 -0500
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com>
Message-ID: <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>

Here is a first draft of a PEP about the value for the namespace uri when it
is "empty".  I modeled the PEP after the Python PEP guidelines.  Of course,
the main Python PEPs are written in ascii and there is a python script to
convert them to html.  But for fun, I created an xml format.  A stylesheet
will follow as I get time.  Other XML PEPs could be written in ascii if the
author wants.

Please comment - all useful material will find its way into the PEP.
Especially, would someone please give the main arguments for using the empty
string instead of None, and also if there are cases where one or the other
should be used, please identify them.

We need to have a home for these things - someone want to start a cvs branch,
or should we just have it in the files section of the SF pyxml pages?  Until
there is a home, we can just keep including it in emails, I think.

If anyone wants to take this one over, feel free, and add your name to the
author list.  Let's try to keep the discussion so that it can fit into the PEP
(as extended, of course).  The idea is that when stabilized, the PEP will be a
permanent record of whatever pyxml has decided and why.

We also need to figure out typical copyright statements (the Python PEP
guidelines call for copyright statements).

Finally, I invite everyone to suggest more topics for other PEPs we may find
helpful.

Cheers,

Tom P

=======================================================================
<?xml version='1.0'?>
<xmlpep>
 <headers>
  <pep_number>xmlpep-1</pep_number>
  <pep_title>Values for Null Or Empty Namespace URIs</pep_title>
  <pep_version>0.10</pep_version>
  <cvs_version_string/>
  <list_of_authors>
   <author name='Thomas B. Passin' email='tpassin@home.com'/>
  </list_of_authors>
  <status>Draft</status>
  <type>Standards Track</type>
  <created>29-Jan-2001</created>
  <history>
   <post date='29-Jan-2001'/>
  </history>
 </headers>
 <abstract>
  This PEP specifies the proper values of the Namespace URI property
  when its value might appear to be either "null", "None", or the
  empty string.

  The XMLPEP, when approved, will apply to all namespace-aware software
  maintained by the pyxml interest group.
 </abstract>
 <specification>
  <para title='Namespace-aware applications'>
   When no namespace has been declared whose scope applies to a
   particular element or attribute, the application MUST report the
   URI of the namespace of the element or attribute as None.

   When a namespace applies but its URI value is empty or null or None,
   the application MUST report the URI of the namespace value as None.
  </para>
  <para title='Namespace-ignorant applications'>
   This requirement does not apply for applications that are not
   namespace-aware.
  </para>
  <para title='Applicability'>
   Applies to all XML processing software maintained by the pyxml
   interest group.
  </para>
 </specification>
 <rationale>
  <para title='Definitive Treatment Needed'>
  This PEP is needed because of continued uncertainty among various pyxml
  developers as to the proper values to use, and because of inconsistency
  among various pyxml products.  Differences between Python, IDL, and Java
  make it difficult to interpret existing W3C Recommendations
  unambiguously in this regard.

  A definitive and consistent treatment is needed so that all the pyxml
  software may be brought into agreement.
  </para>

  <para title='Arguments for "None"'>
   Most references in the Recommendations to the cases in question
   refer to "null" values.  Python offers a data object well adapted to
   indicate such values.  It is the None object.  The None object can
   be tested for exactly as for an empty string:

    <code>if uri:
                          doYourThing()
    </code>

   Alternatively, None can be tested for explicitly, as in:

    <code>if uri is not None:
                          doYourThing()
    </code>

   Thus, None is flexible enough to be useful in this application.
   Should there be some situation in which the use of an empty string
   would be logical or advantageous, it would be clearly distinguishable
   from the normal case where the value is None.

   Future versions of this PEP should specify clearly in what
   situations, if any, an empty string should be used in lieu of
   the None object.
  </para>

 </rationale>
 <reference_implementation>[Should there be a reference here to one
  particular processor, such as xmlproc?]
 </reference_implementation>
 <notes></notes>
 <references></references>
 <copyright>This PEP may be used by anyone.</copyright>
</xmlpep>
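
As a sketch of what the second rule in the draft's specification could look
like in code (assuming current Python 3; the function name
normalize_namespace_uri is hypothetical), a processor might fold every
"empty" value into None before reporting it:

```python
def normalize_namespace_uri(uri):
    """Return None for a missing or empty namespace URI, else the URI unchanged."""
    # Both None and "" are false in a boolean context, so one test covers both.
    return uri if uri else None


def has_namespace(uri):
    """Explicit test suggested by the draft: only None means 'no namespace'."""
    return uri is not None


for raw in (None, "", "urn:example"):
    reported = normalize_namespace_uri(raw)
    print(repr(raw), "->", repr(reported), has_namespace(reported))
```

After normalization, callers can rely on the explicit `is not None` test
without also having to check for the empty string.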


From Alexandre.Fayolle@logilab.fr  Tue Jan 30 08:06:06 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 30 Jan 2001 09:06:06 +0100 (CET)
Subject: [XML-SIG] dom implementations
In-Reply-To: <200101292115.OAA11617@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0101300901040.26510-100000@leo.logilab.fr>

On Mon, 29 Jan 2001, Uche Ogbuji wrote:

> > Using Uche's 640k document, I get the following timings:
> > 
> > minidom: 6.4s
> > 4DOM: 45s
> > pDomlette: 8.9s
> 
> That chunky 4DOM.  Who wrote that anyway?

One thing you have to keep in mind is that 4DOM includes features not
available in other implementations, such as DOM L2 Events: each time you
manipulate nodes, events get propagated up the DOM tree. This is a huge
overhead, but it is so useful when displaying a DOM in a gui...
 
> > cDomlette fails since it does not support createElement (pDomlette
> > only has create*NS operations, so I added None as the namespace
> > everywhere).
> 
> Yeah.  We're still debating adding mutation to cDomlette.  This thread makes 
> me inclined to do so.

This we would consider a good thing. We are considering switching from
4DOM to pDomlette for the kernel of Narval (after 1.0 is released), but
cDomlette would be even better.


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From martin@mira.cs.tu-berlin.de  Mon Jan 29 22:50:00 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 29 Jan 2001 23:50:00 +0100
Subject: One more ques Re: [XML-SIG] I am confused...
In-Reply-To: <3A75B9BB.7EEAC2F6@FourThought.com> (message from Mike Olson on
 Mon, 29 Jan 2001 11:43:07 -0700)
References: <Pine.LNX.4.30.0101291606550.10885-100000@rnd.onego.ru> <3A75B9BB.7EEAC2F6@FourThought.com>
Message-ID: <200101292250.f0TMo0r01083@mira.informatik.hu-berlin.de>

> > And one more problem: my texts are far from plain ASCII.
> > Do I need to convert them to utf8 or unicode before
> > working with XML+XSLT+XPath?
> > Do I need Python-2 to implement non US-ASCII site (and not latin-1)?
> 
> It would certainly make life easier, but you should be able to use 1.5.2

Depending on the exact software package you are going to use, and the
exact encoding that your documents have, it may or may not work. For
example, expat only knows about Latin-1 and UTF-8. In Python 2, it
will have access to the Python codecs, but they are not present in
1.5.2.

If you use drv_xmllib, and later when you produce output, the list of
supported encodings (from xml.unicode) is somewhat longer, but still
limited. E.g. ISO-8859-5 is supported, KOI-8R is not; that would be easy
to add, though.

Since they perform to-utf8 conversion anyway, it is probably best to
recode to UTF-8 for 1.5.2 before parsing. Make sure that the recoding
drops or changes any encoding= attribute in the xml header, though.
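
A minimal sketch of such a recoding step, written for current Python 3
rather than 1.5.2 (so the codecs are available).  The function name
recode_to_utf8 is hypothetical, and the regular expression only handles a
declaration at the very start of the document, which is an assumption.

```python
import re
from xml.dom import minidom

# Matches the XML declaration up to and including its encoding pseudo-attribute.
# Group 1 keeps everything before the encoding so it can be written back.
_DECL_ENCODING = re.compile(
    rb'^(<\?xml[^>]*?)\s+encoding\s*=\s*(["\'])[^"\']*\2'
)


def recode_to_utf8(raw, source_encoding):
    """Recode raw document bytes to UTF-8 and drop the encoding declaration."""
    data = raw.decode(source_encoding).encode("utf-8")
    # Without an encoding declaration, parsers assume UTF-8 by default.
    return _DECL_ENCODING.sub(rb"\1", data, count=1)


greeting = "\u041f\u0440\u0438\u0432\u0435\u0442"  # a Cyrillic word
original = ('<?xml version="1.0" encoding="koi8-r"?><p>%s</p>' % greeting).encode("koi8-r")
converted = recode_to_utf8(original, "koi8-r")
print(converted[:21])
print(minidom.parseString(converted).documentElement.firstChild.data == greeting)
```

Dropping the declaration rather than rewriting it keeps the step independent
of how the original encoding name was spelled.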

Maybe you want to make an entire UTF-8 site :-? Many browsers display
that fine these days, in my experience.

Regards,
Martin


From rnd@onego.ru  Tue Jan 30 08:13:57 2001
From: rnd@onego.ru (Roman Suzi)
Date: Tue, 30 Jan 2001 11:13:57 +0300 (MSK)
Subject: [XML-SIG] I am confused...
In-Reply-To: <200101291634.f0TGYK401051@mira.informatik.hu-berlin.de>
Message-ID: <Pine.LNX.4.30.0101301111590.26463-100000@rnd.onego.ru>

On Mon, 29 Jan 2001, Martin v. Loewis wrote:

>Using Uche's 640k document, I get the following timings:
>
>minidom: 6.4s
>4DOM: 45s
>pDomlette: 8.9s


My computer has only 64M of RAM - so I was not able to measure anything
because the system just dug into swap...
(top showed 33M of memory used by Python... :-( )

>cDomlette fails since it does not support createElement (pDomlette
>only has create*NS operations, so I added None as the namespace
>everywhere).
>
>Remember, this is the same machine where Uche's cDomlette/XPath query
>took 0.5s. So it *does* matter how exactly you approach a certain task
>(you can easily get a factor of 90 between solutions). However, if I
>had to guess in advance what the approximate outcome would have been
>in each of the solutions, I would have been totally wrong.
>
>Regards,
>Martin
>
>#!/usr/local/bin/python
>
>print "1. simple"
>
>from xml.dom import minidom
>from string import split, index
>
>def portr(node):
>    typ = node.nodeType
>    value = node.nodeValue
>    name = node.nodeName
>    atts = node.attributes
>    par = node.parentNode
>    print "t ",   typ, "v ",value, "n ",name, "a ", atts, "p ", par
>
>class strstream:
>  def __init__(self, str):
>     self.str = str
>#     print "strstream init"
>
>  def read(self, n):
>     tmp = self.str[:n]
>     self.str = self.str[n:]
>     return tmp
>
>  def readline(self):
>     return self.str
>
>def _normalize_tokens(tl):
>    """ rules:
>    $,word,$ --> $word$
>    """
>    rules2 = {
>    ("/","/") : "//",
>    (".","/") : "./",
>    ("!","=") : "$ne$",
>    ("<","=") : "$le$",
>    (">","=") : "$ge$",
>    ("=","~") : "$match$",
>    ("!","~") : "$no_match$",
>    (";",";") : ";",
>    }
>
>    rules1 = {
>    "=" : "$eq$",
>    "!" : "$lt$",
>    "<" : "$lt$",
>    ">" : "$gt$",
>    }
>
>    ntl = []
>    i = 0
>    while i < len(tl)-1:
>      if rules2.has_key( tuple(tl[i:i+2]) ):
>        toapp = rules2[tuple(tl[i:i+2])]
>        i = i+2
>      else:
>        if tl[i] == "$":
>          if i+2 < len(tl):
>            toapp = tl[i] + tl[i+1] + tl[i+2]
>            i = i+3
>          else:
>            raise "Query error !!!" + `tl`
>        else:
>          toapp = tl[i]
>          i = i+1
>      if rules1.has_key( toapp ):
>        toapp = rules1[toapp]
>      ntl.append( toapp )
>    return ntl
>
>def _parse_query(q):
>    from shlex import shlex
>    #  i1 = index(q, "/")
>    lexer = shlex(strstream(q))
>    tokens = []
>    tt = lexer.get_token()
>    while tt:
>      tokens.append(tt)
>      tt = lexer.get_token()
>    return _normalize_tokens(tokens)
>
>def find_all_descendants(node, cond):
>    return None     # XXX !!! stub
>
>def find_all_children(node, cond):
>    lst = []
>    exec(cond)       ### must define condition !!!
>    for n in node.childNodes:
>      if condition(n):
>        lst.append(n)
>    return lst
>
>class PYQL:
>  def __init__(self, file):
>    self.dom = minidom.parse(file)
>    self.docel = self.dom.documentElement
>
>  def query(self, q):
>    qr = self._query(self.docel, _parse_query(q), self.dom)
>    qel = self.dom.createElement("xql:result")
>    if qr:
>      qel.appendChild(qr)
>    qel.setAttribute("orig", str(q))
>    return qel
>
>  def _query(self, node, subq, qrdoc):
>    #print subq
>    #print find_all_children(node,
>    #"""def condition(n): return n.nodeName == "fig" """)
>    if subq[0] == "//":
>      self._query(node, subq[1:], qrdoc)
>    elif subq[0] == "/":
>      if subq[1] == node.nodeName:
>        if len(subq) > 2:
>          if subq[2] == "/":
>            qel = qrdoc.createElement(node.nodeName)
>            for a in node.attributes.keys():
>              qel.setAttribute(a, node.attributes[a].nodeValue)
>            for node1 in node.childNodes:
>              q2 = self._query(node1, subq[2:], qrdoc)
>#              print "q2: ", q2
>              if q2:
>                 qel.appendChild(q2)
>            if len(qel.childNodes)==0:
>              del qel
>              return None
>            else:
>              return qel
>          else:
>            return node
>        else:
>          return node
>      else:
>        return None
>
>
>a = PYQL('bigxml')
>#  a.query('$or$ != 1.23E-4          /article/text/topic$')
>#  print a.query('/article/text/topic.').toxml()
>import time;start=time.time()
>res=a.query('/article/author/name.').toxml()
>print time.time()-start
>print len(res)
>#   print a.query('//fig.').toxml()
>


Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Tuesday, January 30, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Give instruction to a wise man and he will be yet wiser." _/


From uche.ogbuji@fourthought.com  Tue Jan 30 08:32:55 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 01:32:55 -0700
Subject: [XML-SIG] Will gettext do?
Message-ID: <200101300832.BAA13572@localhost.localdomain>

OK.  I've hacked away at it with a vengeance, and I nearly have gettext 
working in 4Suite on my computer, but I'm beginning to wonder whether gettext 
is not too brittle a solution.

Basically, I changed the en_US.py files to MessageSource.py files, used the 
following globally:

try:
    import gettext
    gettext.install('4Suite')
except:
    def _(msg):
        return msg

And wrapped all the strings in "_()".  That was all the easy part.

Then came the issue of building this thing.  I ended up checking pygettext.py 
into Ft/admin, and importing the right objects (TokenEater, Options, etc.).  
After a lot of hacking, I got a usable distutils module that could prepare 
4Suite.pot files and put them in the corresponding location in 
site-packages/Ft or whatever.  I verified that all the messages were extracted 
and all that.

Victory, right?

Hell no!

It turns out that the .pot files are useless.  Even Python's gettext module 
requires GNU gettext and the binary .mo files.

So first of all, this seems a non-starter in Windows.

So I wandered off to find out how to make these .mo files.  Never mind that I 
can't bloody get the GNU gettext command-line processor to do anything 
regardless of how many options and environment variables I throw at it.  It 
looks as if even if I get it to work, I'm going to need full access to 
/usr/share/locale on the machine.

So it's also a non-starter unless one can get root to install it.

I assume I'm missing a great deal here, because if not, I don't see how 
pygettext is usable as a general i18n solution.

And I've read and re-read the Python 2.0 gettext docs.  What I can follow of 
it isn't very promising.

Help?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Tue Jan 30 08:38:49 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 30 Jan 2001 09:38:49 +0100
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
In-Reply-To: <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>
Message-ID: <200101300838.f0U8cn701268@mira.informatik.hu-berlin.de>

> Here is a first draft of a PEP about the value for the namespace uri
> when it is "empty".

One more comment: The discussion started with a specific patch for a
SAX driver, and it circled around how things are done in SAX and
DOM. So I think this PEP should explicitly elaborate what specific
parameters in the SAX and DOM APIs are treated in what way.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 30 08:36:23 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 30 Jan 2001 09:36:23 +0100
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
In-Reply-To: <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>
Message-ID: <200101300836.f0U8aN501266@mira.informatik.hu-berlin.de>

>    When a namespace applies but its URI value is empty or null or None,
>    the application MUST report the URI of the namespace value as None.

I'm not sure what this means. In section 2 of REC-xml-names-19990114,
they write

# If the attribute name matches PrefixedAttName, then the NCName gives
# the namespace prefix, used to associate element and attribute names
# with the namespace name in the attribute value in the scope of the
# element to which the declaration is attached. In such declarations,
# the namespace name may not be empty.

In section 4, they say

# The namespace prefix, unless it is xml or xmlns, must have been
# declared in a namespace declaration attribute in either the
# start-tag of the element where the prefix is used or in an an
# ancestor element (i.e. an element in whose content the prefixed
# markup occurs).

So how could it ever happen that "a namespace applies but its URI
value is empty or null or None"?

>    This requirement does not apply for applications that are not
>    namespace-aware.

What exactly does that mean? The XMLNS recommendation specifies what
it means that documents conform to it.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 30 08:44:09 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 30 Jan 2001 09:44:09 +0100
Subject: [XML-SIG] I am confused...
In-Reply-To: <Pine.LNX.4.30.0101301111590.26463-100000@rnd.onego.ru> (message
 from Roman Suzi on Tue, 30 Jan 2001 11:13:57 +0300 (MSK))
References: <Pine.LNX.4.30.0101301111590.26463-100000@rnd.onego.ru>
Message-ID: <200101300844.f0U8i9r01342@mira.informatik.hu-berlin.de>

> My computer has only 64M of RAM - so I was not able to measure anything
> because the system just digs into swap...

That is another good reason to use SAX-based processing: In a
DOM-based approach, you typically need to build an internal
representation of the entire document first. It would still be
possible to work out a data-driven algorithm, but it would be more
limited (e.g. you couldn't go backwards in the document, or perform
multiple subsequent transformations).
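
To make the contrast concrete, here is a minimal sketch of the SAX style using the standard xml.sax module of a current Python 3. The handler class, the helper function, and the sample document are all invented for illustration; the point is only that the handler sees one event at a time and never holds the whole tree.

```python
import xml.sax


class NameCounter(xml.sax.ContentHandler):
    """Count elements by name as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag; no tree is kept in memory.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    handler = NameCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


sample = (b"<article><author><name>A</name></author>"
          b"<author><name>B</name></author></article>")
counts = count_elements(sample)
print(counts["name"])  # prints 2
```

Memory use here depends on what the handler chooses to remember, not on the size of the document.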

Regards,
Martin


From larsga@garshol.priv.no  Tue Jan 30 09:02:50 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 30 Jan 2001 10:02:50 +0100
Subject: [XML-SIG] dom implementations
In-Reply-To: <Pine.LNX.4.21.0101300901040.26510-100000@leo.logilab.fr>
References: <Pine.LNX.4.21.0101300901040.26510-100000@leo.logilab.fr>
Message-ID: <m3u26ho1j9.fsf@lambda.garshol.priv.no>

* Alexandre Fayolle
| 
| One thing you have to keep in mind is that 4DOM includes features not
| available in other implementations, such as DOM L2 Events: each time
| you manipulate nodes, events get propagated up the DOM tree. This is
| a huge overhead, but it is so useful when displaying a DOM in a
| gui...

It is tempting to look into ways of not having to pay this huge
penalty when you don't use that feature.  I've come across similar
problems many times when doing Python programming and wish there were
a general solution.

Features from Aspect-Oriented Programming, or CLOS, would be nice.

--Lars M.


From uche.ogbuji@fourthought.com  Tue Jan 30 09:06:09 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 02:06:09 -0700
Subject: [XML-SIG] XSLT parser interface
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Mon, 29 Jan 2001 23:16:08 +0100." <200101292216.f0TMG8n00920@mira.informatik.hu-berlin.de>
Message-ID: <200101300906.CAA13618@localhost.localdomain>

> > > module XPath{
> > >   // XSLT exprType values
> > >   const unsigned short PATTERN = 17;
> > >   const unsigned short LOCATION_PATTERN = 18;
> > >   const unsigned short RELATIVE_PATH_PATTERN = 19;
> > >   const unsigned short STEP_PATTERN = 20;
> 
> > I think we might want to space out these module-level constants a
> > bit to allow for user extension.
> 
> We might want to do so for future revisions of XPath itself, so this
> is a good idea.
> 
> > Or should all extensions use numbers above a certain ceiling?
> 
> This is the general problem with a numeric type identification: you
> need UUIDs or otherwise not-conflicting strings (like the IDL
> repository IDs). However, this kind of identification appears to be
> W3C tradition. So requesting that user extensions use another range
> seems reasonable.

How about users get numbers above 1000?

> > Some of these take an approach that's a bit cute (for instance, the
> > boolean parent idea), but since it's really a developer-only
> > interface, this should be fine.
> 
> No, please suggest a more natural interface - I'm no XSLT expert at
> all. The XPath tradition seems to be that everything with // is called
> "abbreviated", so it would be

Not quite.  Abbreviated is any abbreviation, "//" just being one (abbr for 
'/descendant-or-self::node()/').
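
As a rough illustration of what expanding abbreviations means, here is a naive sketch in current Python 3. It is not how 4XPath does it: the function names are hypothetical, and it assumes a plain location path with no predicates and no string literals, so a simple split on "/" is safe.

```python
def expand_step(step):
    """Expand one abbreviated XPath step into its unabbreviated form."""
    if step == "":
        # An empty step is what sits between the two slashes of "//".
        return "descendant-or-self::node()"
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        return step  # the axis is already spelled out
    return "child::" + step  # child is the default axis


def expand(path):
    """Expand a simple location path (assumed: no predicates or literals)."""
    absolute = path.startswith("/")
    body = path[1:] if absolute else path
    if body == "":
        return "/"  # the path was just the root
    expanded = "/".join(expand_step(s) for s in body.split("/"))
    return ("/" if absolute else "") + expanded


print(expand("//fig/@id"))
# prints /descendant-or-self::node()/child::fig/attribute::id
```

A real parser would do this on the token stream rather than on the string, which is the only way to get predicates and literals right.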

I think what might be throwing you off is the inconsistent modularization in 
4XPath.  The various Parsed* classes were pretty much thrown into modules at 
random without much consistency, and it makes things like "Abbreviated" take 
on significance that they shouldn't have.

I've wanted to clean this up for a while, but we've always been short on time. 
 I think your confusion and the changes we're making are good enough reason to 
finally neaten things up.

Here are my suggested mods to your interface


  interface PatternFactory:ExprFactory{
    Pattern createPattern(in LocationPathPattern first);
    // idkey or step must be null
    // if left is null, it's an absolute pattern
    LocationPathPattern createLocationPathPattern(in LocationPathPattern left,
                                                  in LocationPathPattern right,
                                                  in StepPattern step,
                                                  in FunctionCall idkey);

    StepPattern createStepPattern(in AxisSpecifier axis,
                                  in NodeTest test,
                                  in PredicateList predicates);
  };

I'm not even sure of this.  I'll want to talk some things over with Mike and 
Jeremy tomorrow.  For one thing, I wonder whether we don't have too many 
"Parsed*" classes.  Some things look as if they could be parameterized in 
combined classes.  For instance, the separation of "Absolute*".


Also, I wonder whether in the general case, the parser should expand 
abbreviations, or whether they should be reported as is to the engine.  My 
first inclination is to make the parser do the expansion.

> but that does not sound much better. I don't mind revising my
> implementation, I did so a number of times when coming up with the
> interface initially.

Yes.  One thing about the messy XPath BNF is that it doesn't suggest a model 
very straightforwardly.

> BTW, I find the grammar part of XSLT worded much worse than the one in
> XPath. E.g. there is no apparent concern for lexical issues, like when
> 'id' should be considered as an NCName and when it should be the
> pseudo-keyword of an IdKeyPattern.

We've often wondered about this part ourselves.  It looks as if they are 
trying to make a distinction between "id" and "key" and other function calls 
at a syntactic layer, when IMO it should be at the semantic layer.  This 
wouldn't be so bad if the XPath parser state machine wasn't already so chaotic.

> > I forgot whether Expr defines a pprint method.  If not, I think it
> > should.
> 
> It currently does not have anything except for the bare data
> model/abstract syntax. Adding methods would be the next step;
> I just added
> 
>   DOMString pprint();
> 
> to Expr. Evaluation needs more thought - at least for me.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Tue Jan 30 09:04:46 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 30 Jan 2001 10:04:46 +0100
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <200101300832.BAA13572@localhost.localdomain> (message from Uche
 Ogbuji on Tue, 30 Jan 2001 01:32:55 -0700)
References: <200101300832.BAA13572@localhost.localdomain>
Message-ID: <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>

> It turns out that the .pot files are useless.  Even Python's gettext
> module requires GNU gettext and the binary .mo files.

Sure, it requires .mo files - but why GNU gettext?

> So I wandered off to find out how to make these .mo files.  Never
> mind that I can't bloody get the GNU gettext command-line processor
> to do anything regardless of how many options and environment
> variables I throw at it.

Do you already have .mo files? The tool to create them is msgfmt, not
gettext (that utility reads .mo files). Of course, there is not much
fun to generate a binary catalog if it has no translations. So you'd
first produce 4Suite.de.po, send it to me, and I send it back to you
filled with German translations.  Then you use msgfmt.

> So first of all, this seems a non-starter in Windows.

Why, again, is that? To format the catalog? Please have a look at
Tools/i18n/msgfmt.py.

> It looks as if even if I get it to work, I'm going to need full
> access to /usr/share/locale on the machine.

gettext will look in /usr/share/locale for catalogs by default, yes -
unless you've called bindtextdomain before that. Of course, if you
know the specific catalog to use, you can also instantiate
gettext.GNUTranslations directly.
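
To show that neither GNU gettext nor /usr/share/locale is strictly needed, here is a minimal sketch for a current Python 3. It builds the bytes of a tiny .mo catalog by hand, in the same spirit as Tools/i18n/msgfmt.py but not copied from it, and loads them with gettext.GNUTranslations. The function name compile_catalog and the sample German string are made up for illustration.

```python
import gettext
import io
import struct


def compile_catalog(messages):
    """Build a little-endian GNU .mo catalog from a {msgid: msgstr} dict."""
    # The empty msgid carries the catalog header; its charset tells
    # gettext how to decode the entries.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(messages)
    keys = sorted(entries, key=lambda k: k.encode("utf-8"))

    ids = b""
    strs = b""
    offsets = []
    for key in keys:
        k = key.encode("utf-8")
        v = entries[key].encode("utf-8")
        offsets.append((len(ids), len(k), len(strs), len(v)))
        ids += k + b"\0"   # strings are NUL-terminated in the file
        strs += v + b"\0"

    key_table = 7 * 4                       # header is seven 32-bit words
    value_table = key_table + 8 * len(keys)
    key_start = value_table + 8 * len(keys)
    value_start = key_start + len(ids)

    # magic, version, entry count, key table, value table, hash size/offset
    out = struct.pack("<Iiiiiii", 0x950412DE, 0, len(keys),
                      key_table, value_table, 0, 0)
    for o1, l1, _o2, _l2 in offsets:
        out += struct.pack("<ii", l1, key_start + o1)
    for _o1, _l1, o2, l2 in offsets:
        out += struct.pack("<ii", l2, value_start + o2)
    return out + ids + strs


mo_bytes = compile_catalog({"Help?": "Hilfe?"})
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(catalog.gettext("Help?"))  # prints Hilfe?
```

In practice one would write these bytes to a file and open that instead of using io.BytesIO; the in-memory form just keeps the sketch self-contained.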

> I assume I'm missing a great deal here

That is my theory also :-)

> And I've read and re-read the Python 2.0 gettext docs.

Which documentation specifically? And what specific passages made you
despair (sp?).

Regards,
Martin


From uche.ogbuji@fourthought.com  Tue Jan 30 09:10:30 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 02:10:30 -0700
Subject: [XML-SIG] I am confused...
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Tue, 30 Jan 2001 09:44:09 +0100." <200101300844.f0U8i9r01342@mira.informatik.hu-berlin.de>
Message-ID: <200101300910.CAA13661@localhost.localdomain>

> > My computer has only 64M of RAM - so I was not able to measure anything
> > because the system just digs into swap...
> 
> That is another good reason to use SAX-based processing: In a
> DOM-based approach, you typically need to build an internal
> representation of the entire document first.

Not in 4Suite 0.10.2 you won't.  DbDOM is undergoing some serious surgery.  It 
will have pretty nifty swapping of nodes in and out of memory courtesy of 4ODS.

Of course, it will be slower than even 4DOM, but as ever, that's the trade-off.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Alexandre.Fayolle@logilab.fr  Tue Jan 30 09:13:40 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 30 Jan 2001 10:13:40 +0100 (CET)
Subject: [XML-SIG] dom implementations
In-Reply-To: <m3u26ho1j9.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.21.0101301007100.26603-100000@leo.logilab.fr>

On 30 Jan 2001, Lars Marius Garshol wrote:

> * Alexandre Fayolle
> | 
> | One thing you have to keep in mind is that 4DOM includes features not
> | available in other implementations, such as DOM L2 Events: each time
> | you manipulate nodes, events get propagated up the DOM tree. This is
> | a huge overhead, but it is so useful when displaying a DOM in a
> | gui...
> 
> It is tempting to look into ways of not having to pay this huge
> penalty when you don't use that feature.  I've come across similar
> problems many times when doing Python programming and wish there were
> a general solution.

I thought about this when I added Events to 4DOM, and finally did not
implement a way to disable them, because I was a bit in a hurry at that
time.

I now see two possible solutions:

 * have the Document instance be aware of existing listeners, and let the
propagation methods query the document before actually propagating
anything (this would completely disable Event propagation if no one is
listening)

 * use the hasFeature method of the DOM implementation to see if we
want DOM L3 events (a bit as is done in the 4DOM test suite where
namespaces are disabled at some point), and let the propagation know if it
is expected to propagate anything.
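
The first option could look roughly like the sketch below (current Python 3). These are toy classes written for this mail, not 4DOM code: the names are hypothetical, and addEventListener is simplified to take only a callable rather than the DOM signature.

```python
class Document:
    """Toy document that tracks how many event listeners are registered."""

    def __init__(self):
        self.listener_count = 0


class Node:
    def __init__(self, owner_document, parent=None):
        self.ownerDocument = owner_document
        self.parentNode = parent
        self._listeners = []

    def addEventListener(self, listener):
        self._listeners.append(listener)
        self.ownerDocument.listener_count += 1

    def removeEventListener(self, listener):
        self._listeners.remove(listener)
        self.ownerDocument.listener_count -= 1

    def dispatch(self, event):
        """Bubble `event` up the tree; return the number of nodes visited."""
        # Ask the document first: with no listeners anywhere, skip the walk.
        if self.ownerDocument.listener_count == 0:
            return 0
        visited = 0
        node = self
        while node is not None:
            visited += 1
            for listener in node._listeners:
                listener(event)
            node = node.parentNode
        return visited


doc = Document()
root = Node(doc)
child = Node(doc, parent=root)
print(child.dispatch("changed"))  # prints 0: nobody is listening

seen = []
root.addEventListener(seen.append)
print(child.dispatch("changed"))  # prints 2: child, then root
```

The cost when nobody listens is one attribute lookup and one comparison per mutation, and users do not have to switch anything off.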

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From uche.ogbuji@fourthought.com  Tue Jan 30 09:16:35 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 02:16:35 -0700
Subject: [XML-SIG] Will gettext do?
References: <200101300832.BAA13572@localhost.localdomain> <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
Message-ID: <3A768673.16ABCE06@fourthought.com>

"Martin v. Loewis" wrote:

> Which documentation specifically? And what specific passages made you
> despair (sp?).

I've got to go to bed (up at bloody 6:00 a.m. when the nipper wakes up),
but I wanted to first point out the culprit that seems to have led me so
far astray

See

http://python.sourceforge.net/devel-docs/lib/node160.html

Which seems to suggest that you need GNU and makes no mention of
msgfmt.py

I read the whole gettext section and I don't think I ever saw msgfmt.py
mentioned.

I even tried checking out

http://www.iro.umontreal.ca/contrib/po-utils/HTML

But the key pages kept timing out.

It does look as if you have answers, so I'll be back at it tomorrow.

Thanks and good night.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Alexandre.Fayolle@logilab.fr  Tue Jan 30 09:23:17 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 30 Jan 2001 10:23:17 +0100 (CET)
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
Message-ID: <Pine.LNX.4.21.0101301021540.26603-100000@leo.logilab.fr>

On Tue, 30 Jan 2001, Martin v. Loewis wrote:

> Do you already have .mo files? The tool to create them is msgfmt, not
> gettext (that utility reads .mo files). Of course, there is not much
> fun to generate a binary catalog if it has no translations. So you'd
> first produce 4Suite.de.po, send it to me, and I send it back to you
> filled with German translations.  Then you use msgfmt.

If you need French translators, I guess you know where to look for them...

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From larsga@garshol.priv.no  Tue Jan 30 10:26:25 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 30 Jan 2001 11:26:25 +0100
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <Pine.LNX.4.21.0101301021540.26603-100000@leo.logilab.fr>
References: <Pine.LNX.4.21.0101301021540.26603-100000@leo.logilab.fr>
Message-ID: <m3lmrtnxny.fsf@lambda.garshol.priv.no>

* Alexandre Fayolle
|
| [to Uche] 
| If you need french translators, I guess you know where to look for
| them...

BTW: xmlproc supports localization of its error messages, using a
home-spun mechanism, which is far less powerful than gettext, but
seems to do the job.

Currently it has error messages in English, Norwegian and Swedish.
Contributions of translations to any other language would be most
welcome. 

All you need to make a translation can be found in 

  xml/parsers/xmlproc/errors.py


Just in case anyone is interested.

--Lars M.


From Alexandre.Fayolle@logilab.fr  Tue Jan 30 11:49:40 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 30 Jan 2001 12:49:40 +0100 (CET)
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <m3lmrtnxny.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.21.0101301244320.26847-101000@leo.logilab.fr>

On 30 Jan 2001, Lars Marius Garshol wrote:

> Currently it has error messages in English, Norwegian and Swedish.
> Contributions of translations to any other language would be most
> welcome. 
> 
> All you need to make a translation can be found in 
> 
>   xml/parsers/xmlproc/errors.py
> 
> 
> Just in case anyone is interested.

You'll find a French translation of the message table attached to this
mail. Comments are welcome.

Just to be sure :
msg 4003 : PE == Parsed Entity  ?

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).

[Attachment: xmlproc_errors_french.py.gz (application/x-gzip, base64-encoded; content omitted)]


From larsga@garshol.priv.no  Tue Jan 30 11:55:12 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 30 Jan 2001 12:55:12 +0100
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <Pine.LNX.4.21.0101301244320.26847-101000@leo.logilab.fr>
References: <Pine.LNX.4.21.0101301244320.26847-101000@leo.logilab.fr>
Message-ID: <m3hf2hntjz.fsf@lambda.garshol.priv.no>

* Alexandre Fayolle
| 
| You'll find a French translation of the message table attached to this
| mail. Comments are welcome.

Great! Thank you!

I am not capable of providing comments as I do not speak French, but
this goes into the CVS tree immediately.
 
| Just to be sure :
| msg 4003 : PE = Parsed Entity  ?

That is correct.
 
--Lars M.


From Nicolas.Chauvat@logilab.fr  Tue Jan 30 11:50:16 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Tue, 30 Jan 2001 12:50:16 +0100 (CET)
Subject: [XML-SIG] dom implementations
In-Reply-To: <m3u26ho1j9.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.21.0101301249360.15235-100000@aries>

On 30 Jan 2001, Lars Marius Garshol wrote:

> [...]
> problems many times when doing Python programming and wish there were
> a general solution.
> 
> Features from Aspect-Oriented Programming, or CLOS, would be nice.

Any pointers ?

-- 
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)


From larsga@garshol.priv.no  Tue Jan 30 12:15:26 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 30 Jan 2001 13:15:26 +0100
Subject: [XML-SIG] dom implementations
In-Reply-To: <Pine.LNX.4.21.0101301249360.15235-100000@aries>
References: <Pine.LNX.4.21.0101301249360.15235-100000@aries>
Message-ID: <m3elxlnsm9.fsf@lambda.garshol.priv.no>

* Lars Marius Garshol
| 
| Features from Aspect-Oriented Programming, or CLOS, would be nice.

* Nicolas Chauvat
| 
| Any pointers ?

AOP:

  <URL: http://aspectj.org/servlets/AJSite >

As far as I can tell this is little more than a cut-down version of
CLOS. That is still interesting, though.


CLOS is the Common Lisp Object System, which is basically the
object-oriented part of Common Lisp. 

  <URL: http://www.cetus-links.org/oo_clos.html >
  <URL: http://www.cs.cmu.edu/~dst/LispBook/index.html >

Paul Graham's 'ANSI Common Lisp' is by far the best introduction to
Common Lisp I have ever read, and it also covers CLOS. IMHO it is also
the best 'learn how to program in this language'-book ever.

--Lars M.


From tpassin@home.com  Tue Jan 30 13:59:36 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 30 Jan 2001 08:59:36 -0500
Subject: [XML-SIG] dom implementations
References: <Pine.LNX.4.21.0101301249360.15235-100000@aries>
Message-ID: <005601c08ac4$e1b78de0$7cac1218@reston1.va.home.com>

Nicolas Chauvat asked -

On 30 Jan 2001, Lars Marius Garshol wrote:

> [...]
> problems many times when doing Python programming and wish there were
> a general solution.
> 
> Features from Aspect-Oriented Programming, or CLOS, would be nice.

> Any pointers ?

See the Aspect Oriented Programming site at 

http://www.parc.xerox.com/csl/projects/aop/

Cheers,

Tom P


From uche.ogbuji@fourthought.com  Tue Jan 30 14:08:24 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 07:08:24 -0700
Subject: [XML-SIG] dom implementations
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Tue, 30 Jan 2001 10:13:40 +0100." <Pine.LNX.4.21.0101301007100.26603-100000@leo.logilab.fr>
Message-ID: <200101301408.HAA14224@localhost.localdomain>

> > It is tempting to look into ways of not having to pay this huge
> > penalty when you don't use that feature.  I've come across similar
> > problems many times when doing Python programming and wish there were
> > a general solution.
> 
> I thought about this when I added Events to 4DOM, and finally did not
> implement a way to disable them, because I was a bit in a hurry at that
> time.
> 
> I now see two possible solutions:
> 
>  * have the Document instance be aware of existing listeners, and let the
> propagation methods query the document before actually propagating
> anything (this would completely disable Event propagation if no one is
> listening)

I like this idea.  Users needn't do anything by default to avoid the events 
slow-down.

Given the listeners, we should also consider having the readers use Element 
constructors directly rather than the factory functions, as long as we can 
accommodate subclasses, as we did for cloneNode.

>  * use the hasFeature method of the DOM implementation to see if we
> want DOM L3 events (a bit as is done in the 4DOM test suite where
> namespaces are disabled at some point), and let the propagation know if it
> is expected to propagate anything.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Jan 30 14:11:54 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 07:11:54 -0700
Subject: [XML-SIG] Will gettext do?
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Tue, 30 Jan 2001 12:49:40 +0100." <Pine.LNX.4.21.0101301244320.26847-101000@leo.logilab.fr>
Message-ID: <200101301411.HAA14240@localhost.localdomain>

Lars wonders if anyone is interested in more xmlproc message file translations 
at

10:26:25 GMT

Alexandre Fayolle submits French translation at

11:49:40 GMT

Is this a great group or what?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Alexandre.Fayolle@logilab.fr  Tue Jan 30 14:57:57 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 30 Jan 2001 15:57:57 +0100 (CET)
Subject: [XML-SIG] dom implementations
In-Reply-To: <200101301408.HAA14224@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0101301518400.27157-100000@leo.logilab.fr>

On Tue, 30 Jan 2001, Uche Ogbuji wrote:

> Given the listeners, we should also consider having the readers use Element 
> constructors directly rather than the factory functions, as long as we can 
> accommodate subclasses, as we did for cloneNode.

I'd be curious to hear about the implementation you have in mind. We
overloaded the factory functions in a custom document, so that they check
the tag and ns of an element (for instance) and instantiate the right
class depending on this, so I'd say we actually need these factory
methods. 
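
In rough outline, the kind of overloaded factory described above looks like the sketch below (current Python 3). The classes here are invented for illustration and are not our actual code; in particular createElementNS is simplified to take a plain tag name instead of a qualified name.

```python
class Element:
    def __init__(self, namespace_uri, tag_name):
        self.namespaceURI = namespace_uri
        self.tagName = tag_name


class FigureElement(Element):
    """Hypothetical application-specific element class."""


class CustomDocument:
    # (namespaceURI, tagName) -> class to instantiate for that element
    element_classes = {(None, "fig"): FigureElement}

    def createElementNS(self, namespace_uri, tag_name):
        # Look up the class by namespace and tag; fall back to plain Element.
        cls = self.element_classes.get((namespace_uri, tag_name), Element)
        return cls(namespace_uri, tag_name)


doc = CustomDocument()
print(type(doc.createElementNS(None, "fig")).__name__)   # prints FigureElement
print(type(doc.createElementNS(None, "para")).__name__)  # prints Element
```

A reader that called the Element constructor directly would bypass this lookup, which is why the factory methods matter to us.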

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From uche.ogbuji@fourthought.com  Tue Jan 30 16:28:47 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 09:28:47 -0700
Subject: [XML-SIG] Will gettext do?
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Tue, 30 Jan 2001 10:04:46 +0100." <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
Message-ID: <200101301628.JAA15183@localhost.localdomain>

OK.  With the recharge given by Martin, some more digging through the bowels 
of msgfmt.py and gettext.py, and a lot more hacking at distutils, I think I'm 
ready to declare partial victory.  I have gettext working on my machine.

The reason the victory is only partial is that I can't see how to generalize 
the procedure when more languages are added.

Here is the procedure I follow in distutils, say for 4DOM.

1. run pygettext to create build/[platform]/_xmlplus/4Suite.po
2. run msgfmt to create the build/[platform]/_xmlplus/4Suite.mo
3. create build/[platform]/_xmlplus/en_US/LC_MESSAGES and move 4Suite.mo there
4. Make distutils copy build/[platform]/_xmlplus/en_US/LC_MESSAGES/4Suite.mo 
to the equivalent directory in the Python lib

The problem is step 3.  I can't see a way (and I read all the way through 
msgfmt.py) to automatically mark the locales whose directories I should 
create.  I basically hard-code the creation of "en_US", and I'd have to 
hard-code "de_DE" and all that when I get the translations.

Maybe this is how it's supposed to be, but it seems odd.

I'll troll about a bit more in Tools/i18n/, but I thought maybe Martin or 
someone has the snap answer.

Anyway, if anyone wants to have an early look, I've synced the anonymous CVS 
with my internal version.  You can now check out 4Suite and see the updates. 
setup.py and admin/DistExt.py have the distutils extensions for gettext and 
Dom, Lib, Rdf, and Xslt have changed __init__ and en_US.py -> 
"MessageSource.py".

Also, I've checked in the .po files for interested translators to work on, 
though I'd wait for a bit because I'm about to post a message on gettext 
translation maintenance.

Anonymous CVS procedures here:

http://lists.fourthought.com/pipermail/4suite/2001-January/001165.html


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Jan 30 16:41:39 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 09:41:39 -0700
Subject: [XML-SIG] on gettext maintenance
Message-ID: <200101301641.JAA15264@localhost.localdomain>

Again I read through all the Python gettext docs, and I might just be 
completely missing something, but I don't see how .po files are to be cleanly 
maintained.

Martin said

"So you'd first produce 4Suite.de.po, send it to me, and I send it back to you
filled with German translations."

BTW, the "4Suite.de.po" part confuses me.  Based on this and the msgid/msgstr 
combos in the code, I'm guessing each language has a .po.  Fine, but again, 
how does this feed into msgfmt.py?  Is a single .mo file created, or one for 
each language?  I see no fields that specify the localization for each .po 
file.

Anyway, so what happens when I change or add messages and all that.  Do I 
simply send brand new .po files to each translation, maybe sending a diff as 
well to make the changes clear?  This seems cumbersome.  Of course, I'm not 
sure what scheme would be smoother.

I must say again that the Python gettext docs are pretty hard to follow, and 
they leave quite a few holes unexplained, besides their occasional bad advice 
("Once you've used pygettext to create your .pot files, you can use the 
standard GNU gettext tools to generate your machine-readable .mo files").

BTW, what's the difference between a .po and .pot file?  If none, why does 
msgfmt.py insist on ".po" when the docs just talk about ".pot"?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Alexandre.Fayolle@logilab.fr  Tue Jan 30 16:54:55 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 30 Jan 2001 17:54:55 +0100 (CET)
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <200101301641.JAA15264@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0101301749050.27329-100000@leo.logilab.fr>

On Tue, 30 Jan 2001, Uche Ogbuji wrote:

> Anyway, so what happens when I change or add messages and all that.  Do I 
> simply send brand new .po files to each translation, maybe sending a diff as 
> well to make the changes clear?  This seems cumbersome.  Of course, I'm not 
> sure what scheme would be smoother.

Well, a diff is a much friendlier way to present this kind of thing than
a brand new file. 

> BTW, what's the difference between a .po and .pot file?  

I think the latter is supposed to be smoked. As for the former, well... I
won't go into the details, and let you decide for yourself. ;o)

Alexandre 'sorry, I could not resist this one' Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From Mike.Olson@fourthought.com  Tue Jan 30 17:45:37 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 30 Jan 2001 10:45:37 -0700
Subject: [XML-SIG] dom implementations
References: <200101301408.HAA14224@localhost.localdomain>
Message-ID: <3A76FDC1.C2591589@FourThought.com>

Uche Ogbuji wrote:
> 
> > > It is tempting to look into ways of not having to pay this huge
> > > penalty when you don't use that feature.  I've come across similar
> > > problems many times when doing Python programming and wish there were
> > > a general solution.
> >
> > I thought about this when I added Events to 4DOM, and finally did not
> > implement a way to disable them, because I was a bit in a hurry at that
> > time.
> >
> > I now see two possible solutions:
> >
> >  * have the Document instance be aware of existing listeners, and let the
> > propagation methods query the document before actually propagating
> > anything (this would completely disable Event propagation if noone is
> > listening)
> 

Or make the document the hub: all events go through the document, and
it either propagates them or it doesn't.  I don't think there is much
overhead in the actual sending of an event.  

Another thought I had was to be able to turn events on and off at
runtime.  E.g., when you read in a document you don't want all of the
events, but after it is read you may...


> >  * use the hasFeature method of the DOM implementation to see if we
> > want DOM L3 events (a bit as is done in the 4DOM test suite where
> > namespaces are disabled at some point), and let the propagation know if it
> > is expected to propagate anything.


Yes, except instead of having the code spread all over, have a
_4dom_propogate on the document that is a no-op if the feature is not
enabled.
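
A minimal sketch of the document-as-hub idea, assuming a simplified DOM. Only
the _4dom_propogate name comes from this message; the events_enabled flag,
addEventListener signature, and Node class are hypothetical:

```python
class Document:
    """Simplified document that acts as the single hub for events."""

    def __init__(self):
        self._listeners = []
        # Hypothetical runtime switch: off while parsing, on afterwards.
        self.events_enabled = False

    def addEventListener(self, listener):
        self._listeners.append(listener)

    def _4dom_propogate(self, event):
        # No-op when events are disabled or nobody is listening, so
        # callers pay only for one method call and one test.
        if not self.events_enabled or not self._listeners:
            return 0
        for listener in self._listeners:
            listener(event)
        return len(self._listeners)


class Node:
    """Simplified node that reports mutations to its owner document."""

    def __init__(self, owner_document):
        self.ownerDocument = owner_document
        self.childNodes = []

    def appendChild(self, child):
        self.childNodes.append(child)
        # The node never decides whether to propagate; the document does.
        self.ownerDocument._4dom_propogate(("DOMNodeInserted", child))
        return child
```

With this shape, a reader can build the tree with events_enabled left off and
switch it on once the document is loaded.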


I'd like the same type of setup for Ranges as well, but I'll wait until
we agree on something before I implement....

Mike

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From teg@redhat.com  Tue Jan 30 19:05:10 2001
From: teg@redhat.com (Trond Eivind =?iso-8859-1?q?Glomsr=F8d?=)
Date: 30 Jan 2001 14:05:10 -0500
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <200101301641.JAA15264@localhost.localdomain>
References: <200101301641.JAA15264@localhost.localdomain>
Message-ID: <xuy7l3csvx5.fsf@halden.devel.redhat.com>

Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> BTW, the "4Suite.de.po" part confuses me.  Based on this and the msgid/msgstr 
> combos in the code, I'm guessing each language has a .po.

Yes... but it's usually just called "de.po" (in this case)

> Fine, but again,  how does this feed into msgfmt.py?

Never used that... only the standard gettext modules

> Is a single .mo file created, or one for 
> each language?

One for each language... the way it usually works, is that the source
package has a file like "de.po". From that, it creates a .mo file
which eventually is installed as 
"/usr/share/locale/de/LC_MESSAGES/4Suite.mo"

> Anyway, so what happens when I change or add messages and all that.  Do I 
> simply send brand new .po files to each translation, 

Take a look at the makefile for e.g. kbdconfig - it's simple, and it
handles this.

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From dieter@handshake.de  Tue Jan 30 19:28:07 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Tue, 30 Jan 2001 20:28:07 +0100 (CET)
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <4590926@toto.iv>
Message-ID: <14967.5575.121696.89293@lindm.dm>

Uche Ogbuji writes:
 > Anyway, so what happens when I change or add messages and all that.  Do I 
 > simply send brand new .po files to each translation, maybe sending a diff as 
 > well to make the changes clear?  This seems cumbersome.  Of course, I'm not 
 > sure what scheme would be smoother.
The extraction routine is smart enough to merge in new
string keys to be translated and mark slightly changed keys
as fuzzy.

The best thing probably is to extract the new and fuzzy keys
and send them.

Dieter


From fdrake@acm.org  Tue Jan 30 20:46:13 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 30 Jan 2001 15:46:13 -0500 (EST)
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <3A768673.16ABCE06@fourthought.com>
References: <200101300832.BAA13572@localhost.localdomain>
 <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
 <3A768673.16ABCE06@fourthought.com>
Message-ID: <14967.10261.74642.924466@cj42289-a.reston1.va.home.com>

Uche Ogbuji writes:
 > http://python.sourceforge.net/devel-docs/lib/node160.html
 > 
 > Which seems to suggest that you need GNU and makes no mention of
 > msgfmt.py
 > 
 > I read the whole gettext section and I don't think I ever saw msgfmt.py
 > mentioned.

  I've forwarded your comments on this to Barry Warsaw, so that he can
update that portion of the documentation.  Thanks!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Tue Jan 30 20:58:30 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 13:58:30 -0700
Subject: [XML-SIG] Will gettext do?
In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org>
 of "Tue, 30 Jan 2001 15:46:13 EST." <14967.10261.74642.924466@cj42289-a.reston1.va.home.com>
Message-ID: <200101302058.NAA15883@localhost.localdomain>

> Uche Ogbuji writes:
>  > http://python.sourceforge.net/devel-docs/lib/node160.html
>  > 
>  > Which seems to suggest that you need GNU and makes no mention of
>  > msgfmt.py
>  > 
>  > I read the whole gettext section and I don't think I ever saw msgfmt.py
>  > mentioned.
> 
>   I've forwarded your comments on this to Barry Warsaw, so that he can
> update that portion of the documentation.  Thanks!

Oh dear.  I've kvetched enough about the Python docs lately that someone's 
going to challenge me to actually do something productive about it one of 
these days.

I should never forget to say that in general, they are very good.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Tue Jan 30 21:40:53 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 30 Jan 2001 16:40:53 -0500 (EST)
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <200101302058.NAA15883@localhost.localdomain>
References: <fdrake@acm.org>
 <14967.10261.74642.924466@cj42289-a.reston1.va.home.com>
 <200101302058.NAA15883@localhost.localdomain>
Message-ID: <14967.13541.550582.65144@cj42289-a.reston1.va.home.com>

Uche Ogbuji writes:
 > Oh dear.  I've kvetched enough about the Python docs lately that someone's 
 > going to challenge me to actually do something productive about it one of 
 > these days.

  Chances are it will be me.  You're always free to submit
patches and bug reports.  ;-)

 > I should never forget to say that in general, they are very good.

  Thank you!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Tue Jan 30 23:27:00 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 16:27:00 -0700
Subject: [XML-SIG] po files ready for translation
Message-ID: <200101302327.QAA16754@localhost.localdomain>

OK.  I think I have it all working.  The real-world test is to get some 
translations in and see if I get my nice German/French/etc messages.

I've synced up CVS so that you can get the latest if you like.

The po files for Lib, Dom, Xslt and Rdf are at

ftp://ftp.fourthought.com/pub/etc/4Suite-po.zip

If anyone translates them, just send them back (attachment to private e-mail 
will do) and I'll check the translations back in.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@mira.cs.tu-berlin.de  Tue Jan 30 23:30:17 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 31 Jan 2001 00:30:17 +0100
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <200101301628.JAA15183@localhost.localdomain> (message from Uche
 Ogbuji on Tue, 30 Jan 2001 09:28:47 -0700)
References: <200101301628.JAA15183@localhost.localdomain>
Message-ID: <200101302330.f0UNUHA00961@mira.informatik.hu-berlin.de>

> Here is the procedure I follow in distutils, say for 4DOM.
> 
> 1. run pygettext to create build/[platform]/_xmlplus/4Suite.po
> 2. run msgfmt to create the build/[platform]/_xmlplus/4Suite.mo
> 3. create build/[platform]/_xmlplus/en_US/LC_MESSAGES and move 4Suite.mo there
> 4. Make distutils copy build/[platform]/_xmlplus/en_US/LC_MESSAGES/4Suite.mo 
> to the equivalent directory in the Python lib
> 
> The problem is step 3.  I can't see a way (and I read all the way through 
> msgfmt.py) to automatically mark the locales whose directories I should 
> create.  I basically hard-code the creation of "en_US", and I'd have to 
> hard-code "de_DE" and all that when I get the translations.
> 
> Maybe this is how it's supposed to be, but it seems odd.
> 
> I'll troll about a bit more in Tools/i18n/, but I thought maybe Martin or 
> someone has the snap answer.

It's my turn to go to bed now :-) but as a snap answer: the common
tradition is to have 4Suite.<lang>.po in the source distribution,
where <lang> is typically fr, de, en (and *not* fr_FR, de_DE, en_US -
unless the German translation for Germany really differs from the one
for, say, Austria). With that, it should not be too difficult to
generate the directories in a loop.
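
A minimal sketch of such a loop, assuming the source tree keeps its catalogs
as <domain>.<lang>.po in one directory. The function names, the regular
expression, and the directory arguments are hypothetical, and the step that
actually compiles each .po into a .mo is left out:

```python
import os
import re

# Matches names such as "4Suite.de.po" or "4Suite.en_US.po".
PO_NAME = re.compile(
    r"^(?P<domain>.+)\.(?P<lang>[A-Za-z]{2,3}(?:_[A-Z]{2})?)\.po$"
)


def catalog_targets(po_dir, build_dir):
    """Return (po_path, mo_path) pairs, one per language found in po_dir."""
    targets = []
    for name in sorted(os.listdir(po_dir)):
        match = PO_NAME.match(name)
        if match is None:
            continue  # not a per-language catalog, e.g. the bare template
        mo_dir = os.path.join(build_dir, match.group("lang"), "LC_MESSAGES")
        mo_path = os.path.join(mo_dir, match.group("domain") + ".mo")
        targets.append((os.path.join(po_dir, name), mo_path))
    return targets


def make_locale_dirs(targets):
    """Create <build_dir>/<lang>/LC_MESSAGES for every target."""
    for _po_path, mo_path in targets:
        os.makedirs(os.path.dirname(mo_path), exist_ok=True)
```

The language list then comes from the file names alone, so adding a
translation means dropping in one more .po file rather than editing the
build script.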

Furthermore, I feel that any <package>/<lang>/LC_MESSAGES/catalog.po
approach is bad (even though it's gettext tradition), so I'd promote
the idea of having <package>.<lang>.mo instead, and keeping the mo
files all in a single directory. Unfortunately, I'm not sure whether
gettext.py supports such a scheme - essentially, you need the module
to tell you what languages to consider in what order, but you may want
to override the resulting file naming scheme.

So *if* you can install into <python prefix>/share/locale, that is
probably best if that also is the platform convention, otherwise, any
scheme that just works should do - even if it creates many extra
directories.

I'd support proposals to enhance gettext.py for easier distribution of
catalogs - in particular on non-Unix platforms, as well as proposals
to enhance distutils to support message catalogs.

Regards,
Martin


From uche.ogbuji@fourthought.com  Tue Jan 30 23:45:54 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 30 Jan 2001 16:45:54 -0700
Subject: [XML-SIG] Will gettext do?
In-Reply-To: Message from "Martin v. Loewis" <martin@mira.cs.tu-berlin.de>
 of "Wed, 31 Jan 2001 00:30:17 +0100." <200101302330.f0UNUHA00961@mira.informatik.hu-berlin.de>
Message-ID: <200101302345.QAA16810@localhost.localdomain>

> It's my turn to go to bed now :-) but as a snap answer: the common
> tradition is to have 4Suite.<lang>.po in the source distribution,
> where <lang> is typically fr, de, en (and *not* fr_FR, de_DE, en_US -
> unless the German translation for Germany really differs from the one
> for, say, Austria). With that, it should not be too difficult to
> generate the directories in a loop.

Well, I do know that en_US could differ from en_GB, but probably not so much 
that it's inconceivable to combine the two.

> Furthermore, I feel that any <package>/<lang>/LC_MESSAGES/catalog.po
> approach is bad (even though it's gettext tradition), so I'd promote
> the idea of having <package>.<lang>.mo instead, and keeping the mo
> files all in a single directory. Unfortunately, I'm not sure whether
> gettext.py supports such a scheme - essentially, you need the module
> to tell you what languages to consider in what order, but you may want
> to override the resulting file naming scheme.

Given that I have a reasonable directory layout for now, I'll leave it as is, 
and we can go about improving it when we've shaken out all the cases.

> So *if* you can install into <python prefix>/share/locale, that is
> probably best if that also is the platform convention, otherwise, any
> scheme that just works should do - even if it creates many extra
> directories.

I'll investigate <python prefix>/share/locale

For now they're in each module's directory itself, which is easy to find 
(__file__), and which I know can be written to on install.
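
A minimal sketch of loading a catalog from a module's own directory with the
standard gettext module, assuming the <lang>/LC_MESSAGES/<domain>.mo layout
described earlier in this thread. The load_catalog helper is hypothetical and
is not the code in 4Suite:

```python
import gettext
import os


def load_catalog(module_file, domain="4Suite", languages=None):
    """Return a translations object for catalogs stored beside module_file.

    gettext looks for <localedir>/<lang>/LC_MESSAGES/<domain>.mo.  With
    fallback=True a missing catalog yields NullTranslations, which returns
    every message unchanged instead of raising an error.
    """
    localedir = os.path.dirname(os.path.abspath(module_file))
    return gettext.translation(
        domain, localedir=localedir, languages=languages, fallback=True
    )


# Typical use inside a package's __init__.py (shown as a comment because
# __file__ is only defined when the code runs as part of a module):
#
#     _ = load_catalog(__file__).gettext
#     print(_("Invalid node type"))
```

When languages is None, gettext picks the language list from the LANGUAGE,
LC_ALL, LC_MESSAGES, and LANG environment variables.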

> I'd support proposals to enhance gettext.py for easier distribution of
> catalogs - in particular on non-Unix platforms, as well as proposals
> to enhance distutils to support message catalogs.

I think I already have the distutils part down.  It generates a "default" po 
in a "generate" phase, and creates and installs the mo in a "build" phase.

See Ft/admin/DistExt.py


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From barry@digicool.com  Tue Jan 30 23:53:25 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 30 Jan 2001 18:53:25 -0500
Subject: [XML-SIG] Will gettext do?
References: <200101300832.BAA13572@localhost.localdomain>
 <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
 <3A768673.16ABCE06@fourthought.com>
Message-ID: <14967.21493.166564.515469@anthem.wooz.org>

    >> Which documentation specifically? And what specific passages
    >> made you despair (sp?).

>>>>> "UO" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

    UO> I've got to go to bed (up at bloody 6:00 a.m. when the nipper
    UO> wakes up), but I wanted to first point out the culprit that
    UO> seems to have led me so far astray

    UO> See

    UO> http://python.sourceforge.net/devel-docs/lib/node160.html

    UO> Which seems to suggest that you need GNU and makes no mention
    UO> of msgfmt.py

Fred Drake's brought this to my attention, since I'm not on the
xml-sig.  I think msgfmt.py was added after the gettext module's
documentation was written, and the docos were never updated when we
added Martin's tool.  I'll go ahead and add some text to the page.

Cheers,
-Barry


From martin@mira.cs.tu-berlin.de  Tue Jan 30 23:41:53 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 31 Jan 2001 00:41:53 +0100
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <200101301641.JAA15264@localhost.localdomain> (message from Uche
 Ogbuji on Tue, 30 Jan 2001 09:41:39 -0700)
References: <200101301641.JAA15264@localhost.localdomain>
Message-ID: <200101302341.f0UNfru01025@mira.informatik.hu-berlin.de>

> "So you'd first produce 4Suite.de.po, send it to me, and I send it
> back to you filled with German translations."
>
> BTW, the "4Suite.de.po" part confuses me.  Based on this and the
> msgid/msgstr combos in the code, I'm guessing each language has a
> .po.  Fine, but again, how does this feed into msgfmt.py?  Is a
> single .mo file created, or one for each language?  I see no fields
> that specify the localization for each .po file.

msgfmt.py will transform <foo>.po into <foo>.mo, as does GNU msgfmt.

I suggest that you download the sources of, say, GNU fileutils, and
have a look at the directory structure. There is a lot of automake
magic as well which you probably want to ignore - just consider the
'po' directory.

> Anyway, so what happens when I change or add messages and all that.
> Do I simply send brand new .po files to each translation, maybe
> sending a diff as well to make the changes clear?  This seems
> cumbersome.  Of course, I'm not sure what scheme would be smoother.

For that, GNU gettext offers the "msgmerge" utility. It will find
messages that didn't change and keep the translation, find messages
that changed slightly and mark the translations as "fuzzy", find new
messages and put empty translation into them, and find messages that
disappeared and put their translations as "obsolete" into comments.
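
A minimal sketch of the four outcomes just listed, assuming catalogs are
modelled as plain dicts. This is not GNU msgmerge's actual algorithm: the
merge_catalog function, the use of difflib for fuzzy matching, and the 0.6
cutoff are all assumptions made for illustration:

```python
import difflib


def merge_catalog(old, new_msgids, cutoff=0.6):
    """Merge old translations into a freshly extracted list of msgids.

    old        -- dict mapping msgid -> msgstr (the previous .po file)
    new_msgids -- list of msgids from the new template
    Returns (merged, obsolete): merged maps msgid -> (msgstr, is_fuzzy),
    obsolete maps dropped msgids to their old translations.
    """
    merged = {}
    used = set()

    # 1. Unchanged messages keep their translation.
    for msgid in new_msgids:
        if msgid in old:
            merged[msgid] = (old[msgid], False)
            used.add(msgid)

    for msgid in new_msgids:
        if msgid in merged:
            continue
        candidates = [key for key in old if key not in used]
        close = difflib.get_close_matches(msgid, candidates, n=1, cutoff=cutoff)
        if close:
            # 2. Slightly changed messages reuse the old text, marked fuzzy.
            merged[msgid] = (old[close[0]], True)
            used.add(close[0])
        else:
            # 3. New messages get an empty translation.
            merged[msgid] = ("", False)

    # 4. Messages that disappeared are kept aside as obsolete.
    obsolete = {key: value for key, value in old.items() if key not in used}
    return merged, obsolete
```

A translator then only needs to visit the entries that are fuzzy or empty.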

Emacs po-mode then offers to navigate between fuzzy and untranslated
messages. It *is* common to clearly label a version of the message
catalog (e.g. 0.10.1a, 0.10.1b, etc), so translators can use diff to
find differences - a good xgettext utility will spit out the msgids in
the same order each time.

Unfortunately, there is no msgmerge.py yet - so you have to use the GNU
tool, or off-load merging with previous revisions to the translators.
Contribution of such a tool would be welcome, of course (I know we are
deep in i18n-sig stuff now).

> BTW, what's the difference between a .po and .pot file?  If none, why does 
> msgfmt.py insist on ".po" when the docs just talk about ".pot"?

There always is a file with just the msgids, and no translations -
that is called .pot, and no .mo file is created from it.

So what you extract is .pot (or, .po template), what translators
produce is .po.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 30 23:52:43 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 31 Jan 2001 00:52:43 +0100
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <xuy7l3csvx5.fsf@halden.devel.redhat.com> (teg@redhat.com)
References: <200101301641.JAA15264@localhost.localdomain> <xuy7l3csvx5.fsf@halden.devel.redhat.com>
Message-ID: <200101302352.f0UNqhL01051@mira.informatik.hu-berlin.de>

> > BTW, the "4Suite.de.po" part confuses me.  Based on this and the msgid/msgstr 
> > combos in the code, I'm guessing each language has a .po.
> 
> Yes... but it's usually just called "de.po" (in this case)

You are right. Although, as a translator, I always get files named,
say, grep-2.4a.de.po, so I forgot that they are renamed to de.po in
the grep distribution. Depending on the exact installation procedure,
4Suite.de.po would work just fine, wouldn't it?

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 30 23:55:56 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 31 Jan 2001 00:55:56 +0100
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <14967.5575.121696.89293@lindm.dm> (message from Dieter Maurer on
 Tue, 30 Jan 2001 20:28:07 +0100 (CET))
References: <14967.5575.121696.89293@lindm.dm>
Message-ID: <200101302355.f0UNtuR01054@mira.informatik.hu-berlin.de>

> The extraction routine is smart enough to merge in new
> string keys to be translated and mark slightly changed keys
> as fuzzy.
> 
> The best thing probably is to extract the new and fuzzy keys
> and send them.

I'm not so sure about this advice. First, it's msgmerge, not xgettext,
that does the fuzzy-marking. Then, in the GNU translation project,
full files are always sent, which allows the translator to review old
translations, and put an updated PO-Revision-Date header into the
catalog.

Regards,
Martin


From martin@mira.cs.tu-berlin.de  Tue Jan 30 23:59:43 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 31 Jan 2001 00:59:43 +0100
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <14967.10261.74642.924466@cj42289-a.reston1.va.home.com>
 (fdrake@acm.org)
References: <200101300832.BAA13572@localhost.localdomain>
 <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
 <3A768673.16ABCE06@fourthought.com> <14967.10261.74642.924466@cj42289-a.reston1.va.home.com>
Message-ID: <200101302359.f0UNxhC01056@mira.informatik.hu-berlin.de>

>  > I read the whole gettext section and I don't think I ever saw msgfmt.py
>  > mentioned.
> 
>   I've forwarded your comments on this to Barry Warsaw, so that he can
> update that portion of the documentation.  Thanks!

I'd like to point out that I wrote msgfmt.py just barely before the
2.0 release, and I'm guilty of providing no documentation whatsoever :-(
(is that a trademark now?)

Regards,
Martin


From teg@redhat.com  Wed Jan 31 00:04:05 2001
From: teg@redhat.com (Trond Eivind =?iso-8859-1?q?Glomsr=F8d?=)
Date: 30 Jan 2001 19:04:05 -0500
Subject: [XML-SIG] on gettext maintenance
In-Reply-To: <200101302352.f0UNqhL01051@mira.informatik.hu-berlin.de>
References: <200101301641.JAA15264@localhost.localdomain>
 <xuy7l3csvx5.fsf@halden.devel.redhat.com>
 <200101302352.f0UNqhL01051@mira.informatik.hu-berlin.de>
Message-ID: <xuyzog8a8p6.fsf@halden.devel.redhat.com>

"Martin v. Loewis" <martin@mira.cs.tu-berlin.de> writes:

> > > BTW, the "4Suite.de.po" part confuses me.  Based on this and the msgid/msgstr 
> > > combos in the code, I'm guessing each language has a .po.
> > 
> > Yes... but it's usually just called "de.po" (in this case)
> 
> You are right. Although, as a translator, I always get files named,
> say, grep-2.4a.de.po, so I forgot that they are renamed to de.po in
> the grep distribution. Depending on the exact installation procedure,
> 4Suite.de.po would work just fine, wouldn't it?

Yes, but it would be .... weird... and nonstandard. 

I suggest taking a look at some simple packages (like kbdconfig,
mouseconfig etc) and their makefiles (unless you want to go the entire
autoconf way).
-- 
Trond Eivind Glomsrød
Red Hat, Inc.


From fdrake@acm.org  Wed Jan 31 06:06:04 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 31 Jan 2001 01:06:04 -0500 (EST)
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <200101302359.f0UNxhC01056@mira.informatik.hu-berlin.de>
References: <200101300832.BAA13572@localhost.localdomain>
 <200101300904.f0U94k201466@mira.informatik.hu-berlin.de>
 <3A768673.16ABCE06@fourthought.com>
 <14967.10261.74642.924466@cj42289-a.reston1.va.home.com>
 <200101302359.f0UNxhC01056@mira.informatik.hu-berlin.de>
Message-ID: <14967.43852.565918.648240@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > I'd like to point out that I wrote msgfmt.py just barely before the
 > 2.0 release, and I'm guilty of providing no documentation whatsoever :-(
 > (is that a trademark now?)

  No, but it may justify a plane ticket to Germany so I can hunt you
down and berate you in person.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From martin@mira.cs.tu-berlin.de  Wed Jan 31 20:59:14 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 31 Jan 2001 21:59:14 +0100
Subject: [XML-SIG] Will gettext do?
In-Reply-To: <m3lmrtnxny.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 30 Jan 2001 11:26:25 +0100)
References: <Pine.LNX.4.21.0101301021540.26603-100000@leo.logilab.fr> <m3lmrtnxny.fsf@lambda.garshol.priv.no>
Message-ID: <200101312059.f0VKxEZ01572@mira.informatik.hu-berlin.de>

> BTW: xmlproc supports localization of its error messages, using a
> home-spun mechanism, which is far less powerful than gettext, but
> seems to do the job.

Have you considered moving to gettext as well?

Regards,
Martin

