From RoD@qnet20.com  Thu Feb  1 23:24:23 2001
From: RoD@qnet20.com (Rod)
Date: Thu, 1 Feb 2001 23:24:23
Subject: [XML-SIG] Diamond x Jungle Carpet Python
Message-ID: <20010202072446.AF667F506@mail.python.org>

I have several Diamond x Jungle Carpet Pythons for SALE.

Make me an offer....

Go to: www.qnet20.com


From mal@lemburg.com  Fri Feb  2 09:25:53 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 02 Feb 2001 10:25:53 +0100
Subject: [XML-SIG] Diamond x Jungle Carpet Python
References: <20010202072446.AF667F506@mail.python.org>
Message-ID: <3A7A7D21.B43614F8@lemburg.com>

Rod wrote:
> 
> I have several Diamond x Jungle Carpet Pythons for SALE.
> 
> Make me an offer....
> 
> Go to: www.qnet20.com

Perhaps we ought to chip in together and buy Guido one of these elegant
Pythons for the conference?!

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/


From eugeneai@icc.ru  Fri Feb  2 09:42:45 2001
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Fri, 2 Feb 2001 17:42:45 +0800
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
Message-ID: <200102020943.RAA23939@monster.icc.ru>

Hi!

I just installed the codecs-aware PyXML-0.6.3 package and found that, at
least for Python 2.0, the package installer should replace Python's
original pyexpat.pyd module (in Python's DLLs folder under Windows), because
that is the one PyXML usually loads (not the new one in the package).
Alternatively, it could remove all old pyexpat.pyd files before installation.
As things stand, PyXML ends up without codecs support.

Thank you for including codecs in the package.

Evgeny

--


From martin@mira.cs.tu-berlin.de  Fri Feb  2 13:33:46 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 2 Feb 2001 14:33:46 +0100
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102020943.RAA23939@monster.icc.ru> (message from Evgeny
 Cherkashin on Fri, 2 Feb 2001 17:42:45 +0800)
References: <200102020943.RAA23939@monster.icc.ru>
Message-ID: <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de>

> I just installed codecs aware pyXML-0.6.3 package and figured out,
> that at least for python 2.0 the package installer should replace
> python's original pyexpat.pyd module (in python's DLLs folder under
> windows), as it is usually loaded by pyXML (no that new in
> package). Or, may be, remove all old pyexpat.pyd before
> installation. This results in pyXML does not support codecs.

Why is that? It should work just fine if you use xml.parsers.expat.

Regards,
Martin


From noreply@sourceforge.net  Fri Feb  2 21:26:58 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 02 Feb 2001 13:26:58 -0800
Subject: [XML-SIG] [Bug #130913] XML processing instruction being output wrong
Message-ID: <E14OnjO-0002RX-00@usw-sf-web2.sourceforge.net>

Bug #130913, was updated on 2001-Feb-02 13:26
Here is a current snapshot of the bug.

Project: Python/XML
Category: SAX
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: nobody
Assigned to : nobody
Summary: XML processing instruction being output wrong

Details: The version="1.0" which is required in the XML processing
instruction is not included when the XmlWrite.startDocument is done.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=130913&group_id=6473


From guido@digicool.com  Sat Feb  3 19:39:54 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 03 Feb 2001 14:39:54 -0500
Subject: [XML-SIG] Minidom bugs/questions
Message-ID: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>

I'm making my first steps into XML, so please forgive me.  I wrote a
simple XML application using a DOM implementation by Digital Creations
folks.  Then I was trapped in a hotel room with my code on a laptop
without a copy of DC's code, but with the Python 2.1a1 release
installed.

Converting my app to use minidom was easy enough, but I found out
about a bunch of differences between the two DOM implementations.  Some
of these are fine with me (e.g. minidom doesn't preserve comments,
doesn't prefix its output with "<?xml version="1.0" ?>" when writing
XML output, minidom returns Unicode strings even for ASCII input).

But others suggest that either the DOM standard isn't very strict or
unambiguous, or one of the implementations has a bug.  Here's the list
of things that I had to fix in my code:

1. The other DOM has a hasAttributes() predicate; minidom is missing
   this and I have to use the more expensive form "if node.attributes".

2. In minidom, Element.getAttribute() and .getAttributeNS() raise
   KeyError for a non-existing attribute; in the other DOM, they return
   "".  (Personally, I'd prefer KeyError or perhaps None, but according
   to Fred, the DOM standard requires "".  Note that this is poorly
   documented -- from the docs for getAttribute*() it's not clear
   *what* is returned in this case.)

3. Note that getAttributeNode() correctly returns None if the attribute
   doesn't exist, but getAttributeNodeNS() looks like it will raise
   KeyError too!

4. In minidom, createDocument() leaves doc.documentElement set to None;
   in the other DOM, doc.documentElement is initialized to an Element
   node created from the second argument to createDocument().  (Again,
   according to Fred, the DOM standard requires the latter.)

5. When writing XML output from a DOM tree that uses namespace
   attributes, minidom doesn't insert the proper "xmlns:<tag>=<URI>"
   attributes.  The other DOM gets this right.  (This is a bit tricky
   to do, although I've figured a good way to do it which I'll gladly
   donate to minidom if it's deemed useful.)

6. When writing XML output from a DOM tree that has a default
   namespace, minidom writes <:tag>...</:tag> instead of
   <tag>...</tag> like the other DOM, and like I would have expected.

Other comments:

7. I noticed that minidom's __getattr__ special-cases requests for an
   attribute whose name begins with _get_, and makes up a lambda on the
   fly.  This suggests that the caller is asking for _get_foo() where
   there is no such method, but there is a foo attribute.  Since
   _get_foo() is a detail of the implementation (I hope), doesn't this
   mean that the implementation is doing something silly?  Shouldn't
   the implementation be fixed rather than accommodated?  Or am I
   missing something?

Here are proposed patches for items 1, 2, 3, 4 and 6 above (fixing 6
turns out to require a patch to pulldom.py).  5 is more work; 7 is a
trivial patch but I expect there's a reason (in which case a comment
would be a nice idea :-).

I'd like some feedback before checking this in...

*** pulldom.py	2001/01/27 08:47:37	1.17
--- pulldom.py	2001/02/03 19:38:26
***************
*** 56,62 ****
              # provide us with the original name. If not, create
              # *a* valid tagName from the current context.
              if tagName is None:
!                 tagName = self._current_context[uri] + ":" + localname
              node = self.document.createElementNS(uri, tagName)
          else:
              # When the tagname is not prefixed, it just appears as
--- 56,66 ----
              # provide us with the original name. If not, create
              # *a* valid tagName from the current context.
              if tagName is None:
!                 prefix = self._current_context[uri]
!                 if prefix:
!                     tagName = prefix + ":" + localname
!                 else:
!                     tagName = localname
              node = self.document.createElementNS(uri, tagName)
          else:
              # When the tagname is not prefixed, it just appears as
***************
*** 66,72 ****
          for aname,value in attrs.items():
              a_uri, a_localname = aname
              if a_uri:
!                 qname = self._current_context[a_uri] + ":" + a_localname
                  attr = self.document.createAttributeNS(a_uri, qname)
              else:
                  attr = self.document.createAttribute(a_localname)
--- 70,80 ----
          for aname,value in attrs.items():
              a_uri, a_localname = aname
              if a_uri:
!                 prefix = self._current_context[a_uri]
!                 if prefix:
!                     qname = prefix + ":" + a_localname
!                 else:
!                     qname = a_localname
                  attr = self.document.createAttributeNS(a_uri, qname)
              else:
                  attr = self.document.createAttribute(a_localname)
*** minidom.py	2001/02/02 19:40:19	1.22
--- minidom.py	2001/02/03 19:38:50
***************
*** 435,444 ****
          Node.unlink(self)
  
      def getAttribute(self, attname):
!         return self._attrs[attname].value
  
      def getAttributeNS(self, namespaceURI, localName):
!         return self._attrsNS[(namespaceURI, localName)].value
  
      def setAttribute(self, attname, value):
          attr = Attr(attname)
--- 435,450 ----
          Node.unlink(self)
  
      def getAttribute(self, attname):
!         try:
!             return self._attrs[attname].value
!         except KeyError:
!             return ""
  
      def getAttributeNS(self, namespaceURI, localName):
!         try:
!             return self._attrsNS[(namespaceURI, localName)].value
!         except KeyError:
!             return ""
  
      def setAttribute(self, attname, value):
          attr = Attr(attname)
***************
*** 457,463 ****
          return self._attrs.get(attrname)
  
      def getAttributeNodeNS(self, namespaceURI, localName):
!         return self._attrsNS[(namespaceURI, localName)]
  
      def setAttributeNode(self, attr):
          if attr.ownerElement not in (None, self):
--- 463,469 ----
          return self._attrs.get(attrname)
  
      def getAttributeNodeNS(self, namespaceURI, localName):
!         return self._attrsNS.get((namespaceURI, localName))
  
      def setAttributeNode(self, attr):
          if attr.ownerElement not in (None, self):
***************
*** 528,533 ****
--- 534,545 ----
      def _get_attributes(self):
          return AttributeList(self._attrs, self._attrsNS)
  
+     def hasAttributes(self):
+         if self._attrs or self._attrsNS:
+             return 1
+         else:
+             return 0
+ 
  class Comment(Node):
      nodeType = Node.COMMENT_NODE
      nodeName = "#comment"
***************
*** 635,640 ****
--- 647,654 ----
                  raise xml.dom.NamespaceErr("illegal use of 'xml' prefix")
              if prefix and not namespaceURI:
                  raise xml.dom.NamespaceErr("illegal use of prefix without namespaces")
+             element = doc.createElementNS(namespaceURI, qualifiedName)
+             doc.appendChild(element)
          doctype.parentNode = doc
          doc.doctype = doctype
          doc.implementation = self

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Mike.Olson@fourthought.com  Sat Feb  3 23:24:47 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sat, 03 Feb 2001 16:24:47 -0700
Subject: [XML-SIG] Updating our servers
Message-ID: <3A7C933F.D34EFEE2@FourThought.com>

Sorry if you get this twice

Just wanted to send a quick message to everyone to say that we are in
the middle of updating our web servers so connections to fourthought.com
and 4suite.org will be spotty for the rest of the weekend.  We hope to
have it all configured and running by the end of the day, and then it
will take a day for name servers to update and point to the new
machines.  In the meantime we will be running on both the new and old
machine so you _should_ be able to get to the site, but errors might pop
up as caches are updated etc.

Sorry for the inconvenience, but this should help performance of these
sites greatly.

Mike


-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Sat Feb  3 23:23:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 4 Feb 2001 00:23:15 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102031939.OAA12187@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Sat, 03 Feb 2001 14:39:54 -0500)
References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>
Message-ID: <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de>

> I'm making my first steps into XML, so please forgive me.  

Hi Guido,

I will forgive, but I will still comment :-)

> Converting my app to use minidom was easy enough, but I found out
> about a bunch of differences between the two DOM implementations.  Some
> of these are fine with me (e.g. minidom doesn't preserve comments,
> doesn't prefix its output with "<?xml version="1.0" ?>" when writing
> XML output, minidom returns Unicode strings even for ASCII input).

Actually, input in XML is always Unicode. If no encoding is specified
in the document, it is treated as UTF-8. If an encoding is specified,
DOM implementations shall transform it into Unicode before giving it
to the user.
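
To illustrate that rule, here is a minimal sketch. It assumes the
xml.dom.minidom module as shipped with current Python 3 releases (string
types differed in the Python 2.x era), and the sample document is made up.
The input bytes declare ISO-8859-1, yet the DOM hands back decoded text.

```python
from xml.dom import minidom

# Bytes in ISO-8859-1; 0xE9 is "e with acute accent" in that encoding.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><name>caf\xe9</name>'

doc = minidom.parseString(raw)
text = doc.documentElement.firstChild.data

# The parser has already decoded the bytes; the DOM exposes Unicode text.
print(type(text).__name__, repr(text))
```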

It is only that older Python versions did not support Unicode; I guess
that's the reason why the Zope one does not comply here.

> 1. The other DOM has a hasAttributes() predicate; minidom is missing
>    this and I have to use the more expensive form "if node.attributes".

Right; that's a bug in minidom: hasAttributes was introduced in "DOM
Level 2".

The original idea of minidom was that it should be "minimal"; clearly
that has not worked out, so we probably should review it carefully to
achieve completeness (with respect to "DOM 2 Core").

> 2. In minidom, Element.getAttribute() and .getAttributeNS() raise
>    KeyError for a non-existing attribute; in the other DOM, they return
>    "".  (Personally, I'd prefer KeyError or perhaps None, but according
>    to Fred, the DOM standard requires "".

Right. To get the KeyError, use .attributes['attrname'], which is a
Python extension to the DOM.
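
A small sketch of the two access styles. It assumes the minidom found in
current Python releases, where getAttribute() follows the DOM rule and
hasAttributes() is available; the element and attribute names are invented
for the example.

```python
from xml.dom import minidom

doc = minidom.parseString('<item id="42"/>')
elem = doc.documentElement

present = elem.getAttribute("id")        # attribute exists
missing = elem.getAttribute("colour")    # DOM rule: empty string, no exception

# Mapping-style access is the Python extension and raises KeyError.
try:
    elem.attributes["colour"]
    raised = False
except KeyError:
    raised = True

print(repr(present), repr(missing), raised, elem.hasAttributes())
```

Code that needs to tell "absent" from "present but empty" can use
hasAttribute() or the mapping form rather than comparing against "".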

> 3. Note that getAttributeNode() correctly returns None if the attribute
>    doesn't exist, but getAttributeNodeNS() looks like it will raise
>    KeyError too!

Yes, that's yet another error.

> 4. In minidom, createDocument() leaves doc.documentElement set to None;
>    in the other DOM, doc.documentElement is initialized to an Element
>    node created from the second argument to createDocument().  (Again,
>    according to Fred, the DOM standard requires the latter.)

That was a surprise to me. After reading the spec and a number of
implementations, I think the requirement is much stronger: You MUST
pass a qualifiedName, only the namespaceURI and the doctype are
optional. 

So your patch is incomplete in this respect; you also need to correct
pulldom to pass meaningful content (with your patch, you could get two
document elements).

It appears to be a common trick to allow null in createDocument, so
that the first element found during parsing can be introduced with
appendChild, but that appears to be non-conforming (somebody please
correct me if I am wrong).

I could try to come up with a separate patch for that issue.
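
For reference, a sketch of the conforming call. It assumes the behaviour of
minidom in current Python releases, where the qualified name passed to
createDocument() becomes the document element; "catalog" is just an example
name.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()

# namespaceURI and doctype may be None; the qualified name is supplied.
doc = impl.createDocument(None, "catalog", None)
root = doc.documentElement

print(root.tagName, len(doc.childNodes))
```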

> 5. When writing XML output from a DOM tree that uses namespace
>    attributes, minidom doesn't insert the proper "xmlns:<tag>=<URI>"
>    attributes.  The other DOM gets this right.  (This is a bit tricky
>    to do, although I've figured a good way to do it which I'll gladly
>    donate to minidom if it's deemed useful.)

Yes, that is certainly desirable; minidom should support namespaces
fully.

> 
> 6. When writing XML output from a DOM tree that has a default
>    namespace, minidom writes <:tag>...</:tag> instead of
>    <tag>...</tag> like the other DOM, and like I would have expected.

Certainly a bug. When writing out namespace declarations, dealing with
the default namespace is really tricky (e.g. when a tree that had
a default namespace is extended with an element with no namespace).
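
The pulldom part of the patch boils down to leaving out the colon when the
prefix is empty. A minimal sketch of that rule follows; make_qname is a
hypothetical helper name, not a function in pulldom.

```python
def make_qname(prefix, localname):
    """Join prefix and local name, omitting the colon for an empty prefix.

    Hypothetical helper; it mirrors the conditional added in the patch.
    """
    if prefix:
        return prefix + ":" + localname
    return localname


print(make_qname("", "tag"), make_qname("svg", "rect"))
```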

> 7. I noticed that minidom's __getattr__ special-cases requests for an
>    attribute whose name begins with _get_, and makes up a lambda on the
>    fly.  This suggests that the caller is asking for _get_foo() where
>    there is no such method, but there is a foo attribute.  Since
>    _get_foo() is a detail of the implementation (I hope)

No, it's actually not. The DOM is defined in terms of CORBA IDL,
unfortunately with a massive use of attributes. Attributes, in CORBA,
map to two functions, _get_<attr> and _set_<attr>; this is also how
the IDL language mapping for Python works.

So the canonical way of using DOM in Python would be to use the _get_
and _set_ methods; a number of Python DOM implementations support that
- although the now-official Python DOM mapping marks these methods as
optional.

Some people might be using this interface, e.g. when they access a DOM
both locally and remotely. Some may use it because they consider
accessor functions cleaner than attribute access. Since it does not
cost anything to have that feature, I'd leave it.
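
A toy sketch of the mechanism, to show why it is cheap: the accessor is
only synthesized when ordinary attribute lookup fails. The Node class below
is a stand-in written for this example and is not minidom's code.

```python
class Node:
    """Toy node; a stand-in for illustration, not minidom's class."""

    def __init__(self, node_name):
        self.nodeName = node_name

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed,
        # so plain attribute access never pays for this branch.
        if name.startswith("_get_"):
            attr = name[len("_get_"):]
            if attr in self.__dict__:
                # Make up an accessor for the existing attribute on the fly.
                return lambda: self.__dict__[attr]
        raise AttributeError(name)


node = Node("para")
print(node.nodeName, node._get_nodeName())
```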

> Here are proposed patches for items 1, 2, 3, 4 and 6 above (fixing 6
> turns out to require a patch to pulldom.py).  

The ones for 1,2,3 and 6 look fine; for the one to 4, see my comments
above.

> 7 is a trivial patch but I expect there's a reason (in which case a
> comment would be a nice idea :-).

It is elaborated at

http://python.sourceforge.net/devel-docs/lib/dom-accessor-methods.html

So referring the reader to the documentation may be appropriate.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sun Feb  4 23:12:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 00:12:15 +0100
Subject: [XML-SIG] Providing a DOMImplementationFactory
Message-ID: <200102042312.f14NCF501636@mira.informatik.hu-berlin.de>

The DOM level 3 draft proposes a mechanism for Java to locate a
DOMImplementation object. In short, Java programs can invoke

   org.w3c.dom.DOMImplementationFactory.getDOMImplementation()

which loads the implementation defined in the property
org.w3c.dom.DOMImplementation. Should Python offer a similar
mechanism? If so, how should it work?

I can think of the following strategy:
- offer two functions, 
  xml.dom.getDOMImplementation([name])
  xml.dom.registerDOMImplementation(name, implementation)

  That is not really a factory, but rather a locator (should that be
  an implementation factory?)

- In getDOMImplementation, use various approaches of returning an
  implementation:
  * if a name was given, and an implementation with that name was
    registered, return it. Well-known names should be published by
    posting to xml-sig@python.org, and subsequently recorded in
    xml.dom.__init__
  * if no name is given, but the PYTHON_DOM environment variable is set,
    this variable names a module which should have an .implementation
    attribute; this is then used. I don't know whether it is good or bad
    that Python does not provide Java-style properties...
  * if no name was given, an attempt to return a "best" implementation
    should be made, where best means "most featureful". Not sure how
    to compute this, though.

- The implementation of xml.dom.__init__ would provide a number of
  pre-registered DOM implementations, which would always include
  minidom and would include 4DOM if PyXML is installed.

- add-on packages (like 4Suite, or Zope) can install .pth files which
  register additional DOM implementations (starting with Python 2.1).

Please comment.

Regards,
Martin


From eugeneai@icc.ru  Mon Feb  5 02:32:13 2001
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Mon, 5 Feb 2001 10:32:13 +0800
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de>
References: <200102020943.RAA23939@monster.icc.ru>
 <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de>
Message-ID: <200102050233.KAA24522@monster.icc.ru>

On Fri, 2 Feb 2001 14:33:46 +0100
"Martin v. Loewis" <martin@mira.cs.tu-berlin.de> wrote:

MVL> > I just installed codecs aware pyXML-0.6.3 package and figured out,
MVL> > that at least for python 2.0 the package installer should replace
MVL> > python's original pyexpat.pyd module (in python's DLLs folder under
MVL> > windows), as it is usually loaded by pyXML (no that new in
MVL> > package). Or, may be, remove all old pyexpat.pyd before
MVL> > installation. This results in pyXML does not support codecs.
MVL> 
MVL> Why is that? It should work just fine if you use xml.parsers.expat.
MVL> 

But in automatic mode (without explicitly asking for it), it does not.

MVL> Regards,
MVL> Martin
MVL> 


--


From uche.ogbuji@fourthought.com  Mon Feb  5 04:46:35 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 04 Feb 2001 21:46:35 -0700
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sun, 04 Feb 2001 00:23:15 +0100." <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de>
Message-ID: <200102050446.VAA29599@localhost.localdomain>

> > Converting my app to use minidom was easy enough, but I found out
> > about a bunch of differences between the two DOM implementations.  Some
> > of these are fine with me (e.g. minidom doesn't preserve comments,
> > doesn't prefix its output with "<?xml version="1.0" ?>" when writing
> > XML output,

minidom should be fixed to put out an XML declaration, preferably with the 
encoding.  This is hardly a burden, and is *highly* recommended XML practice.
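
For comparison, a sketch of what this looks like with the minidom in
current Python releases (an assumption about later versions, not the code
under discussion here): passing an encoding to toxml() yields bytes that
begin with a declaration naming it. The element name and text are made up.

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "greeting", None)
doc.documentElement.appendChild(doc.createTextNode("hello"))

# With an explicit encoding, toxml() returns bytes and names the encoding.
serialized = doc.toxml(encoding="utf-8")
print(serialized)
```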

> > minidom returns Unicode strings even for ASCII input).

> > 1. The other DOM has a hasAttributes() predicate; minidom is missing
> >    this and I have to use the more expensive form "if node.attributes".
> 
> Right; that's a bug in minidom: hasAttributes was introduced in "DOM
> Level 2".
> 
> The original idea of minidom was that it should be "minimal"; clearly
> that has not worked out, so we probably should review it carefully to
> achieve completeness (with respect to "DOM 2 Core").

Well, we should think about exactly what makes minidom "mini".  It's debatable 
whether it is possible to implement all of DOM Level 2 Core and still be 
"mini".  And what about DOM Level 3?

> > 4. In minidom, createDocument() leaves doc.documentElement set to None;
> >    in the other DOM, doc.documentElement is initialized to an Element
> >    node created from the second argument to createDocument().  (Again,
> >    according to Fred, the DOM standard requires the latter.)
> 
> That was a surprise to me. After reading the spec and a number of
> implementations, I think the requirement is much stronger: You MUST
> pass a qualifiedName, only the namespaceURI and the doctype are
> optional. 

Yes.  This is a pain, but it is clearly fundamental to the DOM WG conceptual 
model.

> It appears to be a common trick to allow null in createDocument, so
> that the first element found during parsing can be introduced with
> appendChild, but that appears to be non-conforming (somebody please
> correct me if it is).

I think it is, even though 4DOM does this.  Mike or Jeremy will probably 
remind me if I'm missing something.  From what I see of the readers, we don't 
need this convenience.

> I could try to come up with a separate patch for that issue.
> 
> > 5. When writing XML output from a DOM tree that uses namespace
> >    attributes, minidom doesn't insert the proper "xmlns:<tag>=<URI>"
> >    attributes.  The other DOM gets this right.  (This is a bit tricky
> >    to do, although I've figured a good way to do it which I'll gladly
> >    donate to minidom if it's deemed useful.)
> 
> Yes, that is certainly desirable; minidom should support namespaces
> fully.

Of course if it isn't Level 2 compliant, it needn't do so.  I wouldn't 
consider it unreasonable to have minidom L1 only.  If users want Level 2, they 
install PyXML or other.

> > 6. When writing XML output from a DOM tree that has a default
> >    namespace, minidom writes <:tag>...</:tag> instead of
> >    <tag>...</tag> like the other DOM, and like I would have expected.
> 
> Certainly a bug. When writing out namespace declarations, dealing with
> the default namespace is really tricky (e.g. when a tree that had
> a default namespace is extended with an element with no namespace).

Horrid bug.  Those are invalid XML 1.0 NMTOKENS.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Feb  5 04:50:37 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 04 Feb 2001 21:50:37 -0700
Subject: [XML-SIG] Providing a DOMImplementationFactory
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Mon, 05 Feb 2001 00:12:15 +0100." <200102042312.f14NCF501636@mira.informatik.hu-berlin.de>
Message-ID: <200102050450.VAA29629@localhost.localdomain>

> The DOM level 3 draft proposes a mechanism for Java to locate a
> DOMImplementation object. In short, Java programs can invoke
> 
>    org.w3c.dom.DOMImplementationFactory.getDOMImplementation()
> 
> which loads the implementation defined in the property
> org.w3c.dom.DOMImplementation. Should Python offer a similar
> mechanism? If so, how should it work?
> 
> I can think of the following strategy:
> - offer two functions, 
>   xml.dom.getDOMImplementation([name])
>   xml.dom.registerDOMImplementation(name, implementation)
> 
>   That is not really a factory, but rather a locator (should that be
>   an implementation factory?)

I think it should be a factory, because I've just been thinking about the 
ability to set properties non-globally on DOM implementations.  For instance, 
I think 4DOM should come with the mutation event system disabled unless 
support for this is set as a property.

A factory would be a perfect place to set such properties.

> - In getDOMImplementation, use various approaches of returning an
>   implementation:
>   * if a name was given, and an implementation with that name was
>     registered, return it. Well-known names should be published by
>     posting to xml-sig@python.org, and subsequently recorded in
>     xml.dom.__init__
>   * if no name is given, but the PYTHON_DOM environment variable is set,
>     this variable names a module which should have an .implementation
>     attribute; this is then used. I don't know whether it is good or bad
>     that Python does not provide Java-style properties...
>   * if no name was given, and attempt to return a "best" implementation
>     should be done, where best means "most featureful". Not sure how
>     to compute this, though.
> 
> - The implementation of xml.dom.__init__ would provide a number of
>   pre-registered DOM implementations, which would always include
>   minidom and would include 4DOM if PyXML is installed.
> 
> - add-on packages (like 4Suite, or Zope) can install .pth files which
>   register additional DOM implementations (starting with Python 2.1).

Sounds good enough to try out.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Mon Feb  5 05:38:22 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 5 Feb 2001 00:38:22 -0500
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com>
Message-ID: <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com>

Here is a revised version of the PEP about using None for namespace URIs.
It's extended quite a bit.  I've tried to use the suggestions of Ken MacLeod,
Uche, and Martin, among others, and I've spent a fair amount of time rummaging
through the various Recs.

I was disappointed to learn that the PyXML docs, as well as the 4DOM docs,
don't really say anything about this issue.  So I made a start on reading
through some of the code (not the most recent versions in the CVS tree, but
what I've got from version 0.6.2 and from downloading from Fourthought).

I'd appreciate it if anyone who is quite familiar with the SAX and SAX2 code
could help out and verify whether using None would cause any problems for the
existing code.

The PEP (below) makes for a longish posting, but I didn't want to use an
attachment unless everyone agrees it's OK to do so.  What do you all think
about using attachments for this kind of thing?

Cheers,

Tom P

=============================================
<?xml version='1.0'?>
<xmlpep>
 <headers>
  <pep_number>xmlpep-1</pep_number>
  <pep_title>Values for Null Or Empty Namespace URIs</pep_title>
  <pep_version>0.20</pep_version>
  <cvs_version_string/>
  <list_of_authors>
   <author name='Thomas B. Passin' email='tpassin@home.com'/>
  </list_of_authors>
  <status>Draft</status>
  <type>Standards Track</type>
  <created>29-Jan-2001</created>
  <history>
   <post date='29-Jan-2001'/>
   <post date='4-Feb-2001'/>
  </history>
 </headers>
 <abstract>
  This PEP specifies the proper values of the Namespace URI property
  when its value might otherwise appear to be either "null", "None", or the
  empty string.

  Such Namespace URIs are discussed in SAX[1], DOM2[2], and XML-Namespaces[3].
  These three recommendations do not appear to be in full agreement.  This
  fact, and differences between Java and Python, have led to some confusion
  and some disagreement between various implementations supported by PyXML.
  The language in these three Recommendations is reviewed.

  The recommendation is made to use None as the URI value in all cases where
  no URI applies to an element or attribute.

  The XMLPEP, when approved, will apply to all namespace-aware software
  maintained by the PyXML interest group.
 </abstract>

 <specification>
  <para title='Namespace-aware applications'>
   When no namespace has been declared whose scope applies to a
   particular element or attribute, the application MUST report the
   URI of the namespace of the element or attribute as None.  When there is
   no namespace prefix, the application MUST report the value of the prefix
   as None.
  </para>

  <para title='Namespace-ignorant applications'>
   This requirement does not apply for applications that are not
   namespace-aware.
  </para>

  <para title='Applicability'>
   This requirement applies to all XML processing software maintained by the
   PyXML interest group.
  </para>
 </specification>


 <rationale>
  <para title='Definitive Treatment Needed'>
  This PEP is needed because of continued uncertainty among various PyXML
  developers as to the proper values to use, and because of inconsistency
  among various PyXML products.  Differences between Python, IDL, and Java
  make an unambiguous interpretation difficult.
  </para>

  <para>
  A definitive and consistent treatment is needed so that all the PyXML
  software may be made consistent.
  </para>

  <para title='W3C Namespaces Recommendation'>
   The Namespaces Recommendation recognizes that a namespace URI may
   be given no value - called "empty" in the Recommendation - even
   though a structure for a URI is provided in the document.  Two relevant
   passages are quoted here:

    <quote>Section 2. ...
      [Definition:] If the attribute name matches DefaultAttName,
      then the namespace name in the attribute value is that of the
      default namespace in the scope of the element to which the declaration
      is attached. In such a default declaration, the attribute value
      may be empty.
    </quote>
    <quote>5.2 Namespace Defaulting
      A default namespace is considered to apply to the element where
      it is declared (if that element has no namespace prefix), and to
      all elements with no prefix within the content of that element.
      If the URI reference in a default namespace declaration is empty,
      then unprefixed elements in the scope of the declaration are not
      considered to be in any namespace. Note that default namespaces
      do not apply directly to attributes.

      ...The default namespace can be set to the empty string. This has the
      same effect, within the scope of the declaration, of there being no
      default namespace.
    </quote>
  </para>

  <para>
     The term "empty" is not defined further, but in the context of the
     Recommendation, it must mean a missing string value.  The last
     fragment quoted above suggests, but does not require, that an
     empty string may be returned for an "empty" URI value.

     This has no direct applicability to values returned by implementations,
     since
       1) the word "can" is used, rather than "must", and
       2) the Recommendation seems to apply to XML documents,
          not to implementations.
  </para>

  <para title='W3C DOM Level 2 Recommendation'>
    The W3C DOM Level 2 Recommendation refers to "null" namespaces in
    several places.  The thrust is clear and consistent: a "null" value
    is to be used to indicate a non-existent namespace URI value. Here
    are some relevant extracts from the Recommendation:

     <quote>Note that because the DOM does no lexical checking, the
       empty string will be treated as a real namespace URI in DOM Level 2
       methods. Applications must use the value null as the namespaceURI
       parameter for methods if they wish to have no namespace.
     </quote>
  </para>

  <para>
    The IDL definition for the createAttributeNS() method creates an
    attribute with these characteristics:
     <quote>
        A new Attr object with the following attributes:
Attribute    Value
Node.nodeName    qualifiedName
Node.namespaceURI   namespaceURI
Node.prefix    prefix, extracted from qualifiedName,
                                    or null if there is no prefix
Node.localName    local name, extracted from qualifiedName
Attr.name    qualifiedName
Node.nodeValue    the empty string
     </quote>
  </para>

  <para>For the older, non-NS aware createAttribute() method, the
    Recommendation says
    <quote>...localName, prefix, and namespaceURI set to null. </quote>
  </para>

  <para>This is typical - a "null" is returned if there is no prefix or
    URI.</para>

  <para>It is clear that the IDL specifies the use of "null" for empty
    namespaces, rather than the empty string.  The Java binding does not
    specify any particular value.
  </para>

  <para>
    Thus there seems to be nothing in the DOM Recommendation that suggests
    that empty strings should be used, and there is clear language that
    "null" values should be used.
  </para>

  <para title='SAX2'>
    The SAX2 Java API clearly says that an empty string is to be
    returned.  The following extracts demonstrate this:

    <quote>In SAX2, the startElement and endElement callbacks in a content
      handler look like this:
            public void startElement (String uri, String localName,
                 String qName, Attributes atts)
                 throws SAXException;

            public void endElement (String uri, String localName,
                 String qName)
                 throws SAXException;
      By default, an XML reader will report a Namespace URI and a local name
      for every element, in both the start and end handler. Consider the
      following example:
        <html:hr xmlns:html="http://www.w3.org/1999/xhtml"/>
      With the default SAX2 Namespace processing, the XML reader would report
      a start and end element event with the Namespace URI
      "http://www.w3.org/1999/xhtml" and the local name "hr". The XML
       reader might also report the original qName "html:hr", but that
       parameter might simply be an empty string.
    </quote>

     <quote>
        <h:hello xmlns:h="http://www.greeting.com/ns/" id="a1"
                 h:person="David"/>
        If namespaces is true and namespace-prefixes is true,
        then a SAX2 XML reader will report the following:
           an element with the Namespace URI "http://www.greeting.com/ns/",
           the local name "hello", and the qName "h:hello";
           an attribute with no Namespace URI (empty string),
             no local name (empty string), and the qName "xmlns:h";
           an attribute with no Namespace URI (empty string), the
             local name "id", and the qName "id"; and an attribute
             with the Namespace URI "http://www.greeting.com/ns/",
             the local name "person", and the qName "h:person".
     </quote>
  </para>

  <para title='Discussion of The Three Recommendations'>
    To summarize, the Namespace Recommendation is essentially silent
    on the subject, the DOM clearly specifies "null" values, and SAX2
    clearly specifies the use of empty strings.
  </para>

  <para>

  </para>

  <para title='Arguments Favoring the Use of "None"'>
   The "highest" level Recommendation is presumably the DOM.
   Python offers a data object similar to "null" - the None object.
   In a truth test, None behaves exactly like an empty string (both are
   false):

    <code>if uri:
        doYourThing()
    </code>

   Alternatively, None can be tested for explicitly, as in:

    <code>if uri is not None:
        doYourThing()
    </code>

   Thus, None is flexible enough to be useful for this purpose.
  </para>

  <para>
    Many posts to the PyXML list have favored the use of None,
    although not all.  Either None or the empty string would seem to
    work in this context.  "None" agrees with the DOM Recommendation,
    and would seem (in a mnemonic sense) to suggest the absence of
    a prefix or URI.
  </para>

  <para title='4DOM Handling of None URIs and Prefixes'>
    The 4DOM code will handle a None URI correctly in many places,
     since it uses tests like this typical example:

      <code>
          if namespaceURI and namespaceURI != XML_NAMESPACE:
            # ...
      </code>

    This code works correctly if the namespaceURI is None.
  </para>

  <para>Another test used in 4DOM is as follows:

    <code>def getElementsByTagNameNS(self,namespaceURI,localName):
        root = self.documentElement
        if root == None:
            return implementation.createNodeList([])
        py = root.getElementsByTagNameNS(namespaceURI,localName)
        if namespaceURI == '*' or namespaceURI == root.namespaceURI:
            if localName == '*' or localName == root.localName:
                py.insert(0,root)
        return py
     </code>

    The expression "namespaceURI == '*'" also evaluates correctly when
    the URI is None.
  </para>

  <para>If handling code is consistent throughout 4DOM, then it will handle
     None correctly.
  </para>

  <para title='SAX2'>
   [Need material here]
  </para>

 </rationale>
 <reference_implementation>[Should there be a reference here to one
  particular processor, such as xmlproc?]
 </reference_implementation>
 <notes></notes>
 <references></references>
 <copyright>This PEP may be used by anyone.</copyright>
</xmlpep>
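
The difference between the two tests shown in the draft can be sketched
as follows. This is only an illustration; the classify() helper is made
up here and is not part of any PyXML module. It shows that the truth test
cannot tell None from the empty string, while the explicit test can,
which matters if (as the DOM Recommendation says) the empty string is
treated as a real namespace URI.

```python
def classify(uri):
    """Report how a namespace URI value looks to each style of test."""
    return {
        "truth_test": bool(uri),           # what "if uri:" sees
        "explicit_test": uri is not None,  # what "if uri is not None:" sees
    }


# None and "" agree under the truth test but differ under the explicit one.
for value in (None, "", "http://www.w3.org/1999/xhtml"):
    print(repr(value), classify(value))
```

Code that only ever asks "is there a namespace?" can keep using the truth
test; code that must distinguish "no namespace" from an empty-string URI
needs the explicit comparison with None.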


From Mike.Olson@fourthought.com  Mon Feb  5 06:17:47 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 04 Feb 2001 23:17:47 -0700
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain>
Message-ID: <3A7E458B.E787CD07@FourThought.com>

Uche Ogbuji wrote:
> > The original idea of minidom was that it should be "minimal"; clearly
> > that has not worked out, so we probably should review it carefully to
> > achieve completeness (with respect to "DOM 2 Core").
> 
> Well, we should think about exactly what makes minidom "mini".  It's debatable
> whether it is possible to implement all of DOM Level 2 core and still be
> "mini".  And what about DOm level 3?

I think we should also look at merging minidom and pDomlette.  Both are
supposed to be "mini" and I think they both support about the same sets
of functionality.  No sense keeping both of them around.  I can look at
the differences and try to merge them.

> 
> > It appears to be a common trick to allow null in createDocument, so
> > that the first element found during parsing can be introduced with
> > appendChild, but that appears to be non-conforming (somebody please
> > correct me if it is).
> 
> I think it is, even though 4DOM does this.  Mike or Jeremy will probably
> remind me if I'm missing something.  From what I see of the readers, we don't
> need this convenience.

It was originally there for the readers and to allow a user to create a
document without a document type.  I don't think the readers need this
functionality any more (I'd have to look at all of them).


-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Mike.Olson@fourthought.com  Mon Feb  5 06:19:37 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 04 Feb 2001 23:19:37 -0700
Subject: [XML-SIG] Providing a DOMImplementationFactory
References: <200102042312.f14NCF501636@mira.informatik.hu-berlin.de>
Message-ID: <3A7E45F9.1AF5A19C@FourThought.com>

"Martin v. Loewis" wrote:
> 
> The DOM level 3 draft proposes a mechanism for Java to locate a
> DOMImplementation object. In short, Java programs can invoke
> 
>    org.w3c.dom.DOMImplementationFactory.getDOMImplementation()
> 
> which loads the implementation defined in the property
> org.w3c.dom.DOMImplementation. Should Python offer a similar
> mechanism? If so, how should it work?
> 
> I can think of the following strategy:
> - offer two functions,
>   xml.dom.getDOMImplementation([name])
>   xml.dom.registerDOMImplementation(name, implementation)
> 
>   That is not really a factory, but rather a locator (should that be
>   an implementation factory?)
> 
> - In getDOMImplementation, use various approaches of returning an
>   implementation:
>   * if a name was given, and an implementation with that name was
>     registered, return it. Well-known names should be published by
>     posting to xml-sig@python.org, and subsequently recorded in
>     xml.dom.__init__
>   * if no name is given, but the PYTHON_DOM environment variable is set,
>     this variable names a module which should have an .implementation
>     attribute; this is then used. I don't know whether it is good or bad
>     that Python does not provide Java-style properties...
>   * if no name was given, and attempt to return a "best" implementation
>     should be done, where best means "most featureful". Not sure how
>     to compute this, though.
> 
> - The implementation of xml.dom.__init__ would provide a number of
>   pre-registered DOM implementations, which would always include
>   minidom and would include 4DOM if PyXML is installed.
> 
> - add-on packages (like 4Suite, or Zope) can install .pth files which
>   register additional DOM implementations (starting with Python 2.1).

I like this approach.  It is the one I recommended to Jeremy for the
XPath interface and I wouldn't mind seeing it used for all of the xml
libraries (XPath, XPointer, DOM, etc).

+1 for me

Mike

> 
> Please comment.
> 
> Regards,
> Martin
> 

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Feb  5 06:29:08 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 04 Feb 2001 23:29:08 -0700
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Message from Mike Olson <Mike.Olson@fourthought.com>
 of "Sun, 04 Feb 2001 23:17:47 MST." <3A7E458B.E787CD07@FourThought.com>
Message-ID: <200102050629.XAA29875@localhost.localdomain>

> Uche Ogbuji wrote:
> > > The original idea of minidom was that it should be "minimal"; clearly
> > > that has not worked out, so we probably should review it carefully to
> > > achieve completeness (with respect to "DOM 2 Core").
> > 
> > Well, we should think about exactly what makes minidom "mini".  It's debatable
> > whether it is possible to implement all of DOM Level 2 core and still be
> > "mini".  And what about DOm level 3?
> 
> I think we should also look at merging minidom and pDomlette.  Both are
> supposed to be "mini" and I think they both support about the same sets
> of functionality.  No sense keeping both of them around.  I can look at
> the differences and try to merge them.

Before you start doing this, I think we need to really air the matter out.  It 
wouldn't normally be such a big deal except for the special status of minidom 
(as the default Python DOM).

My sentiments are in favor of the idea.  Probably the biggest issues would be 
the DOM extension interfaces, e.g. PrettyPrint vs. toXML.  Of course DOM Level 
3 should settle that.

This would be a very opportune time for Paul Prescod to make a re-appearance.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 06:57:29 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 07:57:29 +0100
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102050233.KAA24522@monster.icc.ru> (message from Evgeny
 Cherkashin on Mon, 5 Feb 2001 10:32:13 +0800)
References: <200102020943.RAA23939@monster.icc.ru>
 <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru>
Message-ID: <200102050657.f156vTe00831@mira.informatik.hu-berlin.de>

> MVL> Why is that? It should work just fine if you use xml.parsers.expat.
> MVL> 
> 
> But in the automatical mode (without explicit notification) does not.

Can you please elaborate? If one writes

from xml.parsers import expat

it works fine; the PyXML version of pyexpat is used. What is the
automatical mode? What is explicit notification?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 07:21:25 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:21:25 +0100
Subject: [XML-SIG] Providing a DOMImplementationFactory
In-Reply-To: <200102050450.VAA29629@localhost.localdomain> (message from Uche
 Ogbuji on Sun, 04 Feb 2001 21:50:37 -0700)
References: <200102050450.VAA29629@localhost.localdomain>
Message-ID: <200102050721.f157LPo00881@mira.informatik.hu-berlin.de>

> >   xml.dom.getDOMImplementation([name])
> >   xml.dom.registerDOMImplementation(name, implementation)
> 
>
> I think it should be a factory, because I've just been thinking
> about the ability to set properties non-globally on DOM
> implementations.  For instance, I think 4DOM should come with the
> mutation event system disabled unless support for this is set as a
> property.

Ok. So register gets a callable as its second argument then, and
modules which provide an implementation should provide a
getDOMImplementation function (in addition to the implementation
singleton that they may provide for backwards compatibility).
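
A rough sketch of how such a registry of callables might look. Everything
here is an assumption for illustration only: the module-level _registry
dict, the lookup order, and the error handling are not taken from any
existing patch. The only real library call is
xml.dom.minidom.getDOMImplementation().

```python
import importlib
import os

# Hypothetical registry: name -> zero-argument callable returning an
# object that implements the DOMImplementation interface.
_registry = {}


def registerDOMImplementation(name, factory):
    """Register a factory callable under a well-known name."""
    _registry[name] = factory


def getDOMImplementation(name=None):
    """Locate an implementation by name, then environment, then default."""
    if name is not None:
        return _registry[name]()
    module_name = os.environ.get("PYTHON_DOM")
    if module_name:
        # The named module is expected to offer getDOMImplementation().
        module = importlib.import_module(module_name)
        return module.getDOMImplementation()
    # No "best" ranking is attempted here: take the first registered one.
    for factory in _registry.values():
        return factory()
    raise ImportError("no suitable DOM implementation found")


def _minidom_factory():
    # Imported lazily so that registering costs nothing until it is used.
    from xml.dom import minidom
    return minidom.getDOMImplementation()


registerDOMImplementation("minidom", _minidom_factory)
```

Because the registry stores callables rather than singletons, a factory
is free to build a differently configured implementation on each call,
which is what per-use properties would need.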

> Sounds good enough to try out.

Thanks. I'll prepare a patch for PyXML and 2.1b1.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 07:14:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:14:15 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102050446.VAA29599@localhost.localdomain> (message from Uche
 Ogbuji on Sun, 04 Feb 2001 21:46:35 -0700)
References: <200102050446.VAA29599@localhost.localdomain>
Message-ID: <200102050714.f157EFe00856@mira.informatik.hu-berlin.de>

> minidom should be fixed to put out an XML declaration, preferably
> with the encoding.  This is hardly a burden, and is *highly*
> recommended XML practice.

Certainly. I'll look into that once Guido has committed his patches;
there is also a "pretty-print" patch pending that I'll have to commit.

> > The original idea of minidom was that it should be "minimal"; clearly
> > that has not worked out, so we probably should review it carefully to
> > achieve completeness (with respect to "DOM 2 Core").
>
> Well, we should think about exactly what makes minidom "mini".  It's
> debatable whether it is possible to implement all of DOM Level 2
> core and still be "mini".  And what about DOm level 3?

I think the original understanding was that everything that is
"convenience", ie. can be composed from other interfaces, should not
be included. In addition, minidom originally had no DOMImplementation,
you had to know the implementation class names to build a tree.

That approach has failed; people have been contributing bits and
pieces so that what they wanted to use is there. These days, I think
it is mini by only implementing DOM Core. That probably makes it an AA
battery.

[supporting namespaces]
> Of course if it isn't Level 2 compliant, it needn't do so.  I
> wouldn't consider it unreasonable to have minidom L1 only.  If users
> want Level 2, they install PyXML or other.

I'd say that this is a matter of internal consistency. Since the SAX
part in Python supports namespaces, the DOM part should do so as
well. That means L2. It also turns out that what I hope is the larger
half of NS support is already in minidom as of Python 2.0, so ripping
it out would not be sensible.

As for supporting L3, following your advice to not do anything until
the spec nears completion is reasonable. If there is any interest,
providing a standard definition for the enumerations (inside Node3)
would be feasible, if the exact version of the draft is documented in
the code.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 07:25:33 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:25:33 +0100
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
In-Reply-To: <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com>
Message-ID: <200102050725.f157PXD00934@mira.informatik.hu-berlin.de>

> The PEP (below) makes for a longish posting, but I didn't want to
> use an attachment unless everyone agrees it's OK to do so.. What do
> you all think about using attachments for this kind of thing?

While hopefully looking at the actual text later, I think a major
point of the Python PEPs is that they are online even when in draft
status. That way, interested people don't have to react when it is
published, as they can always go to a well-known repository and look
at what's there.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 07:45:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:45:24 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <3A7E458B.E787CD07@FourThought.com> (message from Mike Olson on
 Sun, 04 Feb 2001 23:17:47 -0700)
References: <200102050446.VAA29599@localhost.localdomain> <3A7E458B.E787CD07@FourThought.com>
Message-ID: <200102050745.f157jOm00978@mira.informatik.hu-berlin.de>

> I think we should also look at merging minidom and pDomlette.  Both are
> supposed to be "mini" and I think they both support about the same sets
> of functionality.  No sense keeping both of them around.  I can look at
> the differences and try to merge them.

I never quite understood where the "p" in pDomlette came from. To
date, pDomlette is just 200 lines longer than minidom, so yes, merging
them is a genuine option. Bear in mind that a new Python release is
upcoming, and that the final beta release is probably the last point
to add missing features (i.e. bug corrections with regard to DOM
conformance). It is not inherently wrong to include a more complete
version of minidom with PyXML, but it would be nice if it was stable
after 2.1.

As for the differences, I wonder what to do with the
auto-normalization feature of pDomlette. I can't figure out what
exactly that means: auto-normalization during parsing, or
auto-normalization during insertion of nodes. While I can see that it
is useful, I'm concerned about standards compliance here.

> > > It appears to be a common trick to allow null in createDocument,
> > > so that the first element found during parsing can be introduced
> > > with appendChild, but that appears to be non-conforming
> > > (somebody please correct me if it is [conforming, I meant]).
> >
> > I think it is, even though 4DOM does this.  Mike or Jeremy will
> > probably remind me if I'm missing something.  From what I see of
> > the readers, we don't need this convenience.
>
> It was originally there for the readers and to allow a user to
> create a document with out a document type.  I don't think the
> readers need this functionality any more (I'd have to look at all of
> them).

I feel some misunderstanding here. I'm talking about code like

        if ownerDoc == None:
            dt = implementation.createDocumentType('', '', '')
            self._ownerDoc = implementation.createDocument('', None, dt)
            self._rootNode = self._ownerDoc

(from xml.dom.ext.reader.Sax), in particular about the invocation of
createDocument with a null qualifiedName. I could not find any
permission in the DOM spec for such usage, and Xerces/C++ has code like

something::createDocument(DOMString& uri, DOMString& qualifiedName, DocumentType*dt){
  Document *d = new DocumentImpl(dt);
  d->appendChild(new ElementImpl(uri, qualifiedName));
  return d;
}

I.e. they create an element unconditionally, whereas
4DOM.DOMImplementation creates it only if qualifiedName.
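
For comparison, a reader does not have to create the document up front.
The sketch below is a minimal illustration, not the code in
xml.dom.ext.reader.Sax: the LazyDocBuilder class and the build() helper
are made-up names, and it uses the minidom implementation from the
standard library. It delays createDocument() until the first
startElement event, so a real qualified name is always passed. Note that
it ignores namespaces, and that comments or processing instructions
ahead of the root element would be lost.

```python
import xml.sax
from xml.dom import getDOMImplementation


class LazyDocBuilder(xml.sax.ContentHandler):
    """Hypothetical builder: creates the Document when the root is seen."""

    def __init__(self):
        super().__init__()
        self.document = None
        self._stack = []  # chain of currently open elements

    def startElement(self, name, attrs):
        if self.document is None:
            # First element: create the document with a real root name.
            impl = getDOMImplementation()
            self.document = impl.createDocument(None, name, None)
            node = self.document.documentElement
        else:
            node = self.document.createElement(name)
            self._stack[-1].appendChild(node)
        for key, value in attrs.items():
            node.setAttribute(key, value)
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Text outside the root element is dropped.
        if self._stack:
            text = self.document.createTextNode(content)
            self._stack[-1].appendChild(text)


def build(xml_text):
    """Parse a string and return the resulting Document."""
    handler = LazyDocBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.document
```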

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 07:59:26 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 08:59:26 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102050629.XAA29875@localhost.localdomain> (message from Uche
 Ogbuji on Sun, 04 Feb 2001 23:29:08 -0700)
References: <200102050629.XAA29875@localhost.localdomain>
Message-ID: <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>

> Before you start doing this, I think we need to really air the
> matter out.  It wouldn't normally be such a big deal except for the
> special status of minidom (as the default Python DOM).

> My sentiments are in favor of the idea.  Probably the biggest issues
> would be the DOM extension interfaces, e.g. PrettyPrint vs. toXML.
> Of course DOM Level 3 should settle that.

> This would be a very opportune time for Paul Prescod to make a
> re-appearance.

I agree in all three points, in particular with the last one :-)

On the second point, I think a PEP "standard DOM extensions" would be
good. Even if that is not ready for Python 2.1, it would be desirable
unless L3 supersedes it before. In particular, it should deal with the
following aspects:

- getting an implementation; I think I can provide the proposed
  interface RSN.

- getting a tree from a parser. For SAX parsers, we could publish the
  pulldom contents, which has a standard DOM builder, as long as it is
  provided with a SAX parser and a DOM implementation.  That would not
  cover the "smart" 4Suite DOM builders which directly interact with a
  parser, or do other stuff besides building the tree.

- pretty printing.
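
As a rough illustration of the second item, here is how pulldom can be
used to get DOM subtrees out of a SAX-driven parse. The parseString(),
START_ELEMENT and expandNode() names are from the xml.dom.pulldom module
as shipped in the current standard library; the item_fragments() helper
and its arguments are invented for this sketch.

```python
from xml.dom import pulldom


def item_fragments(xml_text, tag):
    """Return the serialized subtree of every element named `tag`."""
    fragments = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull the rest of this element's content into a DOM subtree.
            events.expandNode(node)
            fragments.append(node.toxml())
    return fragments
```

Only the elements that are expanded get built as trees; everything else
passes by as events.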

Any volunteers who want to draft a proposal? This is the time to get
your own share of fame :-)

Regards,
Martin

P.S. As for people who I'd like to appear or re-appear: Anybody from
digicool interested? Fred's and Guido's comments are always a pleasure
to read, but who is the person or the place I could bombard with
questions about XML-in-Zope?


From eugeneai@icc.ru  Mon Feb  5 08:09:29 2001
From: eugeneai@icc.ru (Evgeny Cherkashin)
Date: Mon, 5 Feb 2001 16:09:29 +0800
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102050657.f156vTe00831@mira.informatik.hu-berlin.de>
References: <200102020943.RAA23939@monster.icc.ru>
 <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru>
 <200102050657.f156vTe00831@mira.informatik.hu-berlin.de>
Message-ID: <200102050810.QAA28414@monster.icc.ru>

On Mon, 5 Feb 2001 07:57:29 +0100
"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> wrote:


MVL> from xml.parsers import expat
MVL> 


Okay. I understood. 

MVL> Regards,
MVL> Martin
MVL> 


--


From tpassin@home.com  Mon Feb  5 12:58:07 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 5 Feb 2001 07:58:07 -0500
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com> <200102050725.f157PXD00934@mira.informatik.hu-berlin.de>
Message-ID: <001401c08f73$491b22a0$7cac1218@reston1.va.home.com>

Martin v. Loewis


> 
> While hopefully looking at the actual text later, I think a major
> point of the Python PEPs is that they are online even when in draft
> status. That way, interested people don't have to react when it is
> published, as they can always go to a well-known repository and look
> what's there.
> 
I agree.  Shouldn't these xmlPEPs go onto the SF site?  Who can set that up?

Cheers,

Tom P


From akuchlin@mems-exchange.org  Mon Feb  5 13:19:28 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 5 Feb 2001 08:19:28 -0500
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, Feb 05, 2001 at 08:59:26AM +0100
References: <200102050629.XAA29875@localhost.localdomain> <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>
Message-ID: <20010205081928.A15233@newcnri.cnri.reston.va.us>

On Mon, Feb 05, 2001 at 08:59:26AM +0100, Martin v. Loewis wrote:
>P.S. As for people who I'd like to appear or re-appear: Anybody from
>digicool interested? Fred's and Guido's comments are always a pleasure
>to read, but who is the person or the place I could bombard with
>questions about XML-in-Zope?

It might be Fred, now; see http://www.advogato.org/person/fdrake/ .

--amk


From guido@digicool.com  Mon Feb  5 15:11:38 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 05 Feb 2001 10:11:38 -0500
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Your message of "Mon, 05 Feb 2001 08:59:26 +0100."
 <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>
References: <200102050629.XAA29875@localhost.localdomain>
 <200102050759.f157xQc01033@mira.informatik.hu-berlin.de>
Message-ID: <200102051511.KAA31888@cj20424-a.reston1.va.home.com>

> P.S. As for people who I'd like to appear or re-appear: Anybody from
> digicool interested? Fred's and Guido's comments are always a pleasure
> to read, but who is the person or the place I could bombard with
> questions about XML-in-Zope?

I'm subscribed again.  Both Fred and I can handle questions about
XML-in-Zope; Fred has more implementation knowledge, I've been more
involved in architectural issues (plus one prototype app).  I'm pretty
excited about the Template Attribute Language and Zope Presentation
Templates, but I'm not sure if it is the right time to describe that
yet.  (I'll know more later this week.)  The ParsedXML stuff is open
though:

  http://www.zope.org/Wikis/DevSite/Projects/ParsedXML/FrontPage

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Mon Feb  5 15:22:23 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 05 Feb 2001 10:22:23 -0500
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: Your message of "Mon, 05 Feb 2001 07:57:29 +0100."
 <200102050657.f156vTe00831@mira.informatik.hu-berlin.de>
References: <200102020943.RAA23939@monster.icc.ru> <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru>
 <200102050657.f156vTe00831@mira.informatik.hu-berlin.de>
Message-ID: <200102051522.KAA31951@cj20424-a.reston1.va.home.com>

> > MVL> Why is that? It should work just fine if you use xml.parsers.expat.
> > MVL> 
> > 
> > But in the automatical mode (without explicit notification) does not.
> 
> Can you please elaborate? If one writes
> 
> from xml.parsers import expat
> 
> it works fine; the PyXML version of pyexpat is used. What is the
> automatical mode? What is explicit notification?

Adding more confusion: I recently got bitten by a really nasty
convention in Zope where you must do "import ZODB" for a side effect
it has.  (It installs a persistency implementation in another module.
If you import that other module before ZODB, it fails with a
mysterious error.)

I would hope that neither the standard xml package nor PyXML repeats
that trick (requiring an import for its side effect).  Factory
functions should be used, and if there's a generic factory function,
it should also have a sensible default.

I don't know enough about the xml package to be able to figure out
whether or not it engages in such tricks, and offer my apologies if
this is already taken care of!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 18:00:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 19:00:49 +0100
Subject: [XML-SIG] Draft PEP for Using None in Namespace URIs
In-Reply-To: <001401c08f73$491b22a0$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <200101292050.NAA11485@localhost.localdomain> <x7hf2ixb59.fsf@bitsko.slc.ut.us> <006d01c08a64$086567c0$7cac1218@reston1.va.home.com> <001201c08a88$056eb2a0$7cac1218@reston1.va.home.com> <00c301c08f35$da9dfec0$7cac1218@reston1.va.home.com> <200102050725.f157PXD00934@mira.informatik.hu-berlin.de> <001401c08f73$491b22a0$7cac1218@reston1.va.home.com>
Message-ID: <200102051800.f15I0n600860@mira.informatik.hu-berlin.de>

> I agree.  Shouldn't these xmlPEPs go onto the SF site?  Who can set that up?

It would be best if you check it into the www project in the CVS,
which will also provide the versioning of the document. At the moment,
you have to run /<somewhere I forgot>/pyxml/doupdate on an SF shell
machine to propagate the content to the Web page; I hope I can restore
the cron job for that.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb  5 18:07:02 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 5 Feb 2001 19:07:02 +0100
Subject: [XML-SIG] Underdeveloped installer of the pyXML-0.6.3
In-Reply-To: <200102051522.KAA31951@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Mon, 05 Feb 2001 10:22:23 -0500)
References: <200102020943.RAA23939@monster.icc.ru> <200102021333.f12DXkg00810@mira.informatik.hu-berlin.de> <200102050233.KAA24522@monster.icc.ru>
 <200102050657.f156vTe00831@mira.informatik.hu-berlin.de> <200102051522.KAA31951@cj20424-a.reston1.va.home.com>
Message-ID: <200102051807.f15I72E00863@mira.informatik.hu-berlin.de>

> I would hope that the standard xml package (nor PyXML) does not repeat
> that trick (requiring an import for its side effect).  Factory
> functions should be used, and if there's a generic factory function,
> it should also have a sensible default.

No, the trick it engages in is that xml/parsers/expat.py reads

from pyexpat import *

Now, PyXML provides a pyexpat copy that sometimes supersedes the one
in Python (if bugs are detected in the Python version at installation
time). It installs xml/parsers/pyexpat.pyd, so that is found in the
package before the builtin module; if it is not present, the builtin
is used.

I'm not sure what Evgeny's concern was, perhaps that a plain

import pyexpat

in the application would not get the PyXML-provided replacement; there
is not much we could do about that.
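
As a rough sketch of that lookup order (not the actual PyXML code), an
application could try a list of candidate modules and take the first one
that imports. The helper name import_first is made up for illustration,
and the sketch is written for a current Python rather than 2.0.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (or its parent package is missing): try the next.
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(names)
    )


# Prefer a PyXML-provided replacement if one is installed,
# otherwise fall back to the pyexpat that ships with Python.
pyexpat = import_first("_xmlplus.parsers.pyexpat", "pyexpat")
print(pyexpat.__name__)
```

An application that wants the replacement would have to opt in this way;
a plain "import pyexpat" still gets the builtin.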

Regards,
Martin


From guido@digicool.com  Mon Feb  5 19:22:09 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 05 Feb 2001 14:22:09 -0500
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Your message of "Sun, 04 Feb 2001 00:23:15 +0100."
 <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de>
References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>
 <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de>
Message-ID: <200102051922.OAA01298@cj20424-a.reston1.va.home.com>

Thanks to Martin and Uche, I've gathered confidence and checked in my
changes to minidom and pulldom in the Python tree.  (Are there also
PyXML versions?  Someone should update them too then.)

I'll leave it to Martin to add code to raise hell when
createDocument() is passed an empty qualified name, and also to change
pulldom to do the right thing when it in fact passes a non-null
qualified name: The code in startDocument() looks like it would insert
two document elements if self._locator is set and its getPublicId()
returns a non-null qualified name.  I don't know how to fix that, or
how common this is.

(My checkin comment has a bug: it claims that hasAttributes() is DOM
level 3, but it is really level 2.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Tue Feb  6 01:31:46 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 6 Feb 2001 02:31:46 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <200102051922.OAA01298@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Mon, 05 Feb 2001 14:22:09 -0500)
References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>
 <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de> <200102051922.OAA01298@cj20424-a.reston1.va.home.com>
Message-ID: <200102060131.f161Vk311008@mira.informatik.hu-berlin.de>

> Are there also PyXML versions?  Someone should update them too then.

At the moment, the Python versions are a couple of revisions
ahead. I'll add the recently-discussed factory finder for DOM
implementations to xml.dom.__init__, and merge all that into PyXML
afterwards.

> I'll leave it to Martin to add code to raise hell when
> createDocument() is passed an empty qualified name, and also to change
> pulldom to do the right thing when it in fact passes a non-null
> qualified name

Done.
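
For illustration, the check can be exercised like this against the
minidom of a current Python. The exact exception class may differ
between versions, so the sketch only assumes that some subclass of
xml.dom.DOMException is raised.

```python
import xml.dom
from xml.dom import minidom

impl = minidom.getDOMImplementation()

try:
    # An empty qualified name cannot name a document element.
    impl.createDocument(None, "", None)
except xml.dom.DOMException as exc:
    rejected = True
    print("rejected:", exc.__class__.__name__)
else:
    rejected = False
    print("accepted (this implementation does not check)")
```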

> The code in startDocument() looks like it would insert two document
> elements if self._locator is set and its getPublicId() returns a
> non-null qualified name.  I don't know how to fix that, or how
> common this is.

I think this code was completely bogus. The author apparently thought
of creating DocumentTypes, in which case publicId and systemId would
be required. However, the SAX locator does not provide that
information (at least not for the DTD; rather for the document itself),
nor were we in the process of creating document types. 

It seems that the processing of the doctype argument is also
incorrect: It should *not* create one given the qualifiedName, at least
I can't find any indication that it should. It MUST set the
ownerDocument, though, which it doesn't. I'm not sure whether the
doctype needs to appear in the childNodes of the Document, can anybody
clarify this?

Regards,
Martin


From mclay@nist.gov  Tue Feb  6 02:03:41 2001
From: mclay@nist.gov (Michael McLay)
Date: Mon, 5 Feb 2001 21:03:41 -0500
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
Message-ID: <0102052103410E.03631@fermi.eeel.nist.gov>

The calls to PyCode_New and PyFrame_New in pyexpat.c need to be updated to 
include the addition of the freevars and cellvars arguments that were added 
to PyCode_New and the closure argument that was added to PyFrame_New.

copying xml/utils/iso8601.py -> build/lib.linux-i586-2.1/_xmlplus/utils
copying xml/utils/qp_xml.py -> build/lib.linux-i586-2.1/_xmlplus/utils
running build_ext
building '_xmlplus.parsers.pyexpat' extension
creating build/temp.linux-i586-2.1
gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS -DXML_DTD 
-DEXPAT_VERSION=0x010200 -Iextensions/expat/xmltok 
-Iextensions/expat/xmlparse -I/usr/local/include/python2.1 -c 
extensions/pyexpat.c -o build/temp.linux-i586-2.1/pyexpat.o
extensions/pyexpat.c: In function `getcode':
extensions/pyexpat.c:266: warning: passing arg 11 of `PyCode_New' makes 
pointer from integer without a cast
extensions/pyexpat.c:266: too few arguments to function `PyCode_New'
extensions/pyexpat.c: In function `call_with_frame':
extensions/pyexpat.c:293: too few arguments to function `PyFrame_New'
error: command 'gcc' failed with exit status 1


according to modsupport.h
   25-Jan-2001  FLD     1010    Parameters added to PyCode_New() and
                                PyFrame_New(); Python 2.1a2

In compile.c 
  PyCodeObject *
  PyCode_New(int argcount, int nlocals, int stacksize, int flags,
	   PyObject *code, PyObject *consts, PyObject *names,
	   PyObject *varnames, PyObject *freevars, PyObject *cellvars,
	   PyObject *filename, PyObject *name, int firstlineno,
	   PyObject *lnotab) 
  {

and frameobject.c
  PyFrame_New(PyThreadState *tstate, PyCodeObject *code, PyObject *globals, 
	    PyObject *locals, PyObject *closure)
  {


From rnd@onego.ru  Tue Feb  6 05:23:26 2001
From: rnd@onego.ru (Roman Suzi)
Date: Tue, 6 Feb 2001 08:23:26 +0300 (MSK)
Subject: [XML-SIG] [OT] locale.py doesn't work? (fwd)
Message-ID: <Pine.LNX.4.30.0102060810310.23118-100000@rnd.onego.ru>

I am sorry for the off-topic post, but I have been unable to reach Martin at
martin@mira.cs.tu-berlin.de for a week now
(connection refused).

Roman.
---------- Forwarded message ----------
Date: Tue, 30 Jan 2001 15:46:02 +0300 (MSK)
From: Roman Suzi <rnd@onego.ru>
To: Martin v. Loewis <martin@mira.cs.tu-berlin.de>
Subject: locale.py doesn't work?

Hello, Martin!

I am trying to use ru_RU.koi8-r locale
(collation, uppercase, etc) but it doesn't work
for unknown reason.

Here is the code:

> cat ./try_locale.py
#!/usr/bin/env python

import locale

import os
# os.environ["LC_ALL"] = "ru_RU.CP1251"

# print locale.getdefaultlocale()

locale.setlocale(locale.LC_ALL,['ru_RU','koi8-r'])
#locale.setlocale(locale.LC_ALL,['ru_RU','KOI8-R'])

print locale.getlocale()

print locale.string.uppercase
print locale.string.lowercase

# End of try_locale.py

> ./try_locale.py
['ru_RU', 'ISO8859-5']
ABCDEFGHIJKLMNOPQRSTUVWXYZ=A1=A2=A3=A4=A5=A6=A7=A8=A9=AA=AB=AC=AE=AF=B0=B1=
=B2=B3=B4=B5=B6=B7=B8=A0=BA=BB=BC=BD=BE=BF=C0=C1=C2=C3=C4=C5=C6=C7=C8=C9=CA=
=CB=CC=CD=CE=CF
abcdefghijklmnopqrstuvwxyz=D0=D1=D2=D3=D4=D5=D6=D7=D8=D9=DA=DB=DC=DD=DE=DF=
=E0=E1=E2=E3=E4=E5=E6=E7=E8=E9=EA=EB=EC=ED=EE=EF=F1=F2=F3=F4=F5=F6=F7=F8=F9=
=FA=FB=FC=FE=FF

- which is wrong, because koi8-r uppercase letters are different.
It is not even ISO8859-5 (I tried with recode):

> ./try_locale.py | recode ISO8859-5..koi8-r
['ru_RU', 'ISO8859-5']
ABCDEFGHIJKLMNOPQRSTUVWXYZ=B3recode: Invalid input in step
`ISO-8859-5..KOI8-R'

- and this is strange.

Am I missing something important?
Or is it a bug in Python 2.0?
(All this in BlackCat Linux 6.2 ~= RH 6.2)

(You are the author of the article "Internationalizing Python"
so probably you could answer this question.)

Sincerely yours, Roman Suzi
-- 
Vote for my design: http://silvermouse.onego.ru/gray.php3?id=0018
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Tuesday, January 30, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Give instruction to a wise man and he will be yet wiser." _/


From uche.ogbuji@fourthought.com  Tue Feb  6 05:29:13 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 05 Feb 2001 22:29:13 -0700
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Tue, 06 Feb 2001 02:31:46 +0100." <200102060131.f161Vk311008@mira.informatik.hu-berlin.de>
Message-ID: <200102060529.WAA18345@localhost.localdomain>

> > The code in startDocument() looks like it would insert two document
> > elements if self._locator is set and its getPublicId() returns a
> > non-null qualified name.  I don't know how to fix that, or how
> > common this is.
> 
> I think this code was completely bogus. The author apparently thought
> of creating DocumentTypes, in which case publicId and systemId would
> be required. However, the SAX locator does not provide that
> information (at least not for the DTD; rather for the document itself),
> nor were we in the process of creating document types. 
> 
> It seems that the processing of the doctype argument is also
> incorrect: It should *not* create one given the qualifiedName, at least
> I can't find any indication that it should. It MUST set the
> ownerDocument, though, which it doesn't. I'm not sure whether the
> doctype needs to appear in the childNodes of the Document, can anybody
> clarify this?

Yes.  The doctype is a child of the Document, along with any comments and PIs 
in the prolog.  This is the main reason for having a documentElement() method.
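
A small sketch of that layout, using the minidom of a current Python as
a stand-in (the 2001 implementations under discussion may have behaved
differently); the sample document is invented for the example.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    "<!DOCTYPE root><!-- prolog comment --><?app hint?><root/>"
)

# Every top-level construct, including the doctype, is a child of the Document.
kinds = [child.nodeType for child in doc.childNodes]
print(kinds)

# documentElement picks out the single element among those children.
print(doc.documentElement.tagName)
print(doc.doctype is doc.childNodes[0])
```

Without documentElement, callers would have to skip over the doctype,
comments and PIs themselves to find the root element.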


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Feb  6 05:30:54 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 05 Feb 2001 22:30:54 -0700
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: Message from Michael McLay <mclay@nist.gov>
 of "Mon, 05 Feb 2001 21:03:41 EST." <0102052103410E.03631@fermi.eeel.nist.gov>
Message-ID: <200102060530.WAA18464@localhost.localdomain>

> The calls to PyCode_New and PyFrame_New in pyexpat.c need to be updated to 
> include the addition of the freevars and cellvars arguments that were added 
> to PyCode_New and  closure that was added to PyFrame_New 
> 
> copying xml/utils/iso8601.py -> build/lib.linux-i586-2.1/_xmlplus/utils
> copying xml/utils/qp_xml.py -> build/lib.linux-i586-2.1/_xmlplus/utils
> running build_ext
> building '_xmlplus.parsers.pyexpat' extension
> creating build/temp.linux-i586-2.1
> gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS -DXML_DTD 
> -DEXPAT_VERSION=0x010200 -Iextensions/expat/xmltok 
> -Iextensions/expat/xmlparse -I/usr/local/include/python2.1 -c 
> extensions/pyexpat.c -o build/temp.linux-i586-2.1/pyexpat.o
> extensions/pyexpat.c: In function `getcode':
> extensions/pyexpat.c:266: warning: passing arg 11 of `PyCode_New' makes 
> pointer from integer without a cast
> extensions/pyexpat.c:266: too few arguments to function `PyCode_New'
> extensions/pyexpat.c: In function `call_with_frame':
> extensions/pyexpat.c:293: too few arguments to function `PyFrame_New'
> error: command 'gcc' failed with exit status 1

Odd.  It does compile for me with Python 2.1a2, but then I'm using PyXML from 
CVS.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Tue Feb  6 08:53:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 6 Feb 2001 09:53:47 +0100
Subject: [XML-SIG] [OT] locale.py doesn't work? (fwd)
In-Reply-To: <Pine.LNX.4.30.0102060810310.23118-100000@rnd.onego.ru> (message
 from Roman Suzi on Tue, 6 Feb 2001 08:23:26 +0300 (MSK))
References: <Pine.LNX.4.30.0102060810310.23118-100000@rnd.onego.ru>
Message-ID: <200102060853.f168rlX01040@mira.informatik.hu-berlin.de>

> I am sorry for offtopic, but I can't contact Martin at
> martin@mira.cs.tu-berlin.de for a week already.
> (connection refused)

Sorry for any confusion this has caused; please use
martin@loewis.home.cs.tu-berlin.de (which *should* be the From:
address in this message).

BTW, i18n-sig@python.org would have been the right list for this kind of
issue.

> print locale.string.uppercase
> print locale.string.lowercase
> 
> # End of try_locale.py
> 
> > ./try_locale.py
> ['ru_RU', 'ISO8859-5']
> ABCDEFGHIJKLMNOPQRSTUVWXYZ[non-ASCII bytes garbled in the archive]
> abcdefghijklmnopqrstuvwxyz[non-ASCII bytes garbled in the archive]
> 
> - which is wrong, because koi8-r uppercase letters are different.

Interesting. Please run the C program

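#include <stdio.h>
#include <locale.h>
#include <ctype.h>

int main(void)
{
	int i;
	/* Report the locale name the C library actually selected. */
	const char *name = setlocale(LC_ALL,"ru_RU");
	printf("%s\n", name ? name : "(setlocale failed)");
	/* Print the decimal value of every byte classified as lowercase. */
	for(i=1;i<256;i++){
		if(islower(i))
			printf("%d, ",i);
	}
	printf("\n");
	return 0;
}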

on your system. It is supposed to print the decimal values of all
lowercase letters. As you'll find, it prints the numeric values of all
letters in string.letters (try map(ord, string.letters) to obtain such
a list in Python).

I get the same results on my Linux installation, which uses glibc
2.1.3. So I'd say it is a bug in the C library; please submit a bug
using the glibcbug script if you agree, or complain to your Linux
distributor. If you find out a solution, please let us know.
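
A Python-side cross-check, sketched for a current Python 3 (so not
something that could have been run at the time): it uses Python's own
codec tables, independent of the C library, to list which high bytes
are lowercase letters under each encoding. The helper name
lowercase_bytes is made up for this sketch.

```python
def lowercase_bytes(codec):
    """Return the byte values 0x80-0xFF that decode to lowercase letters."""
    return [b for b in range(0x80, 0x100)
            if bytes([b]).decode(codec).islower()]


koi8 = lowercase_bytes("koi8_r")
iso5 = lowercase_bytes("iso8859_5")

# Byte 0xC1 is a lowercase letter in KOI8-R but an uppercase one in
# ISO 8859-5, so the two classifications cannot both be right for the
# same data.
print(0xC1 in koi8, 0xC1 in iso5)
print(", ".join(str(b) for b in iso5))
```

If the numbers printed by the C program match the ISO 8859-5 list rather
than the KOI8-R one, that would support the suspicion that the C library
is ignoring the requested encoding.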

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Feb  6 09:44:29 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 6 Feb 2001 10:44:29 +0100
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <0102052103410E.03631@fermi.eeel.nist.gov> (message from Michael
 McLay on Mon, 5 Feb 2001 21:03:41 -0500)
References: <0102052103410E.03631@fermi.eeel.nist.gov>
Message-ID: <200102060944.f169iTe01589@mira.informatik.hu-berlin.de>

> The calls to PyCode_New and PyFrame_New in pyexpat.c need to be updated to 
> include the addition of the freevars and cellvars arguments that were added 
> to PyCode_New and  closure that was added to PyFrame_New 

Thanks for the reminder. The pyexpat copy in 2.1a2 already had these
changes, but in a manner that only worked with the modified API. I have
modified both copies of pyexpat.c to support the 2.1a2 API.

pyexpat.c has good chances of being the Python module with the highest
number of independently-maintained copies; a third copy lives in the
Zope CVS.

People building PyXML might not have noticed the problem, because setup.py
won't build pyexpat if it finds that the Python one is good enough.

Regards,
Martin


From edd@usefulinc.com  Tue Feb  6 10:14:56 2001
From: edd@usefulinc.com (Edd Dumbill)
Date: Tue, 6 Feb 2001 10:14:56 +0000
Subject: [XML-SIG] REMINDER: Days left for O'Reilly Open Source Conference XML CFP
Message-ID: <20010206101456.N25446@usefulinc.com>

A reminder -- just days left to submit a proposal for the XML track at
O'Reilly's Open Source convention this year.  I include the original CFP
below. Please get in touch if you have any questions or need more time.

(sent to XML-DEV, copied to Apache General, Python XML-SIG and Perl-XML
lists)


Call for Participation

XTech 2001 Conference (in co-operation with GCA)

Part of the O'Reilly Open Source Convention

July 23-27, 2001 in San Diego, California
<http://conferences.oreilly.com/oscon2001/>
<http://conferences.oreilly.com/oscon2001/call-xml.html>

The Open Source Convention is a five-day event designed for programmers,
developers, and technical staff involved in Open Source technology and
its applications.  The Convention includes two days of intensely focused
tutorials aimed at novices and experienced users, and three days of
multi-tracked convention sessions, including an XML track, XTech 2001.

The XML program committee invites submissions of tutorials or
convention presentations on pure XML topics, open source XML
applications and the use of XML in open source platforms. Submissions
tailored for open source developers new to XML, as well as those that
highlight the cutting edge of XML technology are sought.

Submissions by marketing staff or with a marketing focus will not be
accepted.

The deadline for tutorial and presentation proposals is February 9, 2001

Further details and guidelines for submission may be found at
<http://conferences.oreilly.com/oscon2001/call-xml.html>

-- Edd Dumbill, XML Track Chair <mailto:edd@oreilly.com>


From mclay@nist.gov  Tue Feb  6 01:31:25 2001
From: mclay@nist.gov (Michael McLay)
Date: Mon, 5 Feb 2001 20:31:25 -0500
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <200102060944.f169iTe01589@mira.informatik.hu-berlin.de>
References: <0102052103410E.03631@fermi.eeel.nist.gov> <200102060944.f169iTe01589@mira.informatik.hu-berlin.de>
Message-ID: <01020520312500.01559@fermi.eeel.nist.gov>

On Tuesday 06 February 2001 04:44, Martin v. Loewis wrote:
>
> People building PyXML might not have noticed the problem, setup.py
> won't build pyexpat if it finds that the Python one is good enough.

When I ran make on 2.1a2 and then ran build, the pyexpat module wasn't built.  
I'm running Redhat linux:

    Linux  2.2.14-5.0 #1 Tue Mar 7 20:53:41 EST 2000 i586 unknown

I suspect it may have failed to build because I did not have expat installed. 
There wasn't a warning to this effect and the instructions in the README file 
did not say I need to have it installed. (The README file still talks about 
editing the Module/Setup file so I think it is probably out of date.)  

The 2.1a2 download page does not reference source code for expat.  It was 
included in the 2.0 download page, http://www.python.org/2.0/.  It was not 
clear to me if the expat source code was included in the 2.1a2 source 
distribution. 

The 2.1 download page, http://www.python.org/2.1/, references the 2.1a1 
release on SourceForge.  (The returned page highlights the older release 
instead of the 2.1a2 release.)  The http://www.python.org/ftp/python/2.1/ 
download location is also referenced on the /2.1/ page.  None of the pages 
reference expat source or mention the need to install expat.


From martin@loewis.home.cs.tu-berlin.de  Tue Feb  6 19:42:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 6 Feb 2001 20:42:36 +0100
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <01020520312500.01559@fermi.eeel.nist.gov> (message from Michael
 McLay on Mon, 5 Feb 2001 20:31:25 -0500)
References: <0102052103410E.03631@fermi.eeel.nist.gov> <200102060944.f169iTe01589@mira.informatik.hu-berlin.de> <01020520312500.01559@fermi.eeel.nist.gov>
Message-ID: <200102061942.f16Jga700919@mira.informatik.hu-berlin.de>

> > People building PyXML might not have noticed the problem, setup.py
> > won't build pyexpat if it finds that the Python one is good enough.
>
> When I ran make on 2.1a2 and then ran build, the pyexpat module
> wasn't built.  I'm running Redhat linux:
> 
>     Linux  2.2.14-5.0 #1 Tue Mar 7 20:53:41 EST 2000 i586 unknown
>
> I suspect it may have failed to build because I did not have expat
> installed.

That is the likely cause, indeed. I was not saying that you did
anything wrong, or that others did anything wrong - I just tried to
explain the differences.

> There wasn't a warning to this effect and the instructions in the
> README file did not say I need to have it installed. (The README
> file still talks about editing the Module/Setup file so I think it
> is probably out of date.)

It is not an error for an extension module not to be built - it just
won't be there when you need it. The autoconfiguration can't know what
modules you meant to be built; instead, it will build everything it
can (sometimes it errs at guessing what it can build; such a case is a
genuine bug).

> The 2.1a2 download page does not reference source code for expat.
> It was included in the 2.0 download page,
> http://www.python.org/2.0/.  It was not clear to me if the expat
> source code was included in the 2.1a2 source distribution.

It wasn't included, and likely will not be. Instead, you need to
install it separately (as you did for 2.0 - the Python download pages
just provided a copy that was known to work).

> The 2.1 download page, http://www.python.org/2.1/, references the
> 2.1a1 release on SourceForge.  (The returned page highlights the
> older release instead of the 2.1a2 release.)  The
> http://www.python.org/ftp/python/2.1/ download location is also
> referenced on the /2.1/ page.  None of the pages reference expat
> source or mention the need to install expat.

It is not needed, at least not more than Tkinter, zlib, BSDDB, OpenGL,
Purify, readline, OpenSSL, or GDBM. Different manual installations will
provide different sets of extension modules, but that is really no
change. It will be the responsibility of packagers (i.e. Windows
installation authors and Linux distributors) to make sure a common set
of extension modules is always available. It is certainly recommended
that pyexpat is in this common set.

Regards,
Martin


From paulp@ActiveState.com  Tue Feb  6 21:08:26 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Tue, 06 Feb 2001 13:08:26 -0800
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050629.XAA29875@localhost.localdomain>
Message-ID: <3A8067CA.3983596@ActiveState.com>

Uche Ogbuji wrote:
> 
> ...
> 
> This would be a very opportune time for Paul Prescod to make a re-appearance.

Your invocation cut my vacation short. Thanks a lot!

I think that minidom should remain as mini as possible. I'll comment on
the other issues later today...

 Paul Prescod


From mclay@nist.gov  Tue Feb  6 10:17:20 2001
From: mclay@nist.gov (Michael McLay)
Date: Tue, 6 Feb 2001 05:17:20 -0500
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <200102061942.f16Jga700919@mira.informatik.hu-berlin.de>
References: <0102052103410E.03631@fermi.eeel.nist.gov> <01020520312500.01559@fermi.eeel.nist.gov> <200102061942.f16Jga700919@mira.informatik.hu-berlin.de>
Message-ID: <01020605172007.01559@fermi.eeel.nist.gov>

On Tuesday 06 February 2001 14:42, Martin v. Loewis wrote:
> > I suspect it may have failed to build because I did not have expat
> > installed.
>
> That is the likely cause, indeed. I was not saying that you did
> anything wrong, or that others did anything wrong - I just tried to
> explain the differences.

Is there a specific version of expat that needs to be installed?  I found no 
reference to where to get the library or which version was required.  I could 
go to freshmeat to look it up, but what if I find a stale reference to an 
old version of the library?  Having a list of URLs for the libraries 
required to build extensions would make building a fully populated module 
library much safer and less error-prone for end users.

>
> > There wasn't a warning to this effect and the instructions in the
> > README file did not say I need to have it installed. (The README
> > file still talks about editing the Module/Setup file so I think it
> > is probably out of date.)
>
> It is not an error for an extension module not to be built - it just
> won't be there when you need it. The autoconfiguration can't know what
> modules you meant to be built; instead, it will build everything it
> can (sometimes it errs at guessing what it can build; such a case is a
> genuine bug).

I found out the pyexpat wasn't built when I tried executing a script that 
required the module.  Fortunately I was still running ground tests on my 
flight control system when the exception was raised:-)

> > The 2.1a2 download page does not reference source code for expat.
> > It was included in the 2.0 download page,
> > http://www.python.org/2.0/.  It was not clear to me if the expat
> > source code was included in the 2.1a2 source distribution.
>
> It wasn't included, and likely will not be. Instead, you need to
> install it separately (as you did for 2.0 - the Python download pages
> just provided a copy that was known to work).

I understand why you don't bundle it with the distribution, I just was 
pointing out that the documentation didn't make it clear that I needed to 
have the expat library installed before PyXML would work.  

> > The 2.1 download page, http://www.python.org/2.1/, references the
> > 2.1a1 release on SourceForge.  (The returned page highlights the
> > older release instead of the 2.1a2 release.)  The
> > http://www.python.org/ftp/python/2.1/ download location is also
> > referenced on the /2.1/ page.  None of the pages reference expat
> > source or mention the need to install expat.
>
> It is not needed, at least not more than Tkinter, zlib, BSDDB, OpenGL,
> Purify, readline, OpenSSL, or GDBM. Different manual installations will
> provide different sets of extension modules, but that is really no
> change. It will be the responsibility of packagers (i.e. Windows
> installation authors and Linux distributors) to make sure a common set
> of extension modules is always available. It is certainly recommended
> that pyexpat is in this common set.

The Linux distributions are not consistent about which modules are built.  I 
assume that the included modules are based on what they need internally and 
maybe what is easy to build.  It would help if it were possible to easily 
identify which modules were not built and what was missing that prevented 
them from being built.  With some additional information it is more likely 
that the maintainers of Linux distributions would add missing libraries so 
the build would be complete.  This would help Python to have a more uniform 
base of preinstalled modules.

That is just speculation on my part.  I would expect the report generation 
could be done automatically by the build process.  The build tool would need 
to track which modules were skipped and generate a report at the end of what 
was not built and why.  To enable the reporting, module maintainers would 
add a dictionary that mapped libraries to a list of URLs where the 
libraries can be retrieved.
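
A minimal sketch of that idea, assuming a current Python. The names
LIBRARY_URLS and skipped_report are hypothetical, and the URLs are
placeholders rather than real download locations.

```python
# Hypothetical table: extension module -> (required library, download URLs).
# The URLs are placeholders, not real download locations.
LIBRARY_URLS = {
    "pyexpat": ("expat", ["https://example.org/expat/"]),
    "zlib": ("zlib", ["https://example.org/zlib/"]),
}


def skipped_report(skipped):
    """Format a report for `skipped`, a dict of module name -> reason."""
    if not skipped:
        return "All extension modules were built."
    lines = ["The following extension modules were NOT built:"]
    for module in sorted(skipped):
        lines.append("  %s: %s" % (module, skipped[module]))
        library, urls = LIBRARY_URLS.get(module, (None, []))
        for url in urls:
            lines.append("    %s is available from %s" % (library, url))
    return "\n".join(lines)


print(skipped_report({"pyexpat": "expat.h not found"}))
```

The build tool would fill in the skipped dictionary as it goes and print
the report once at the end.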


From martin@loewis.home.cs.tu-berlin.de  Tue Feb  6 23:29:38 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 7 Feb 2001 00:29:38 +0100
Subject: [XML-SIG] pyexpat.c does not compile with Python2.1a2
In-Reply-To: <01020605172007.01559@fermi.eeel.nist.gov> (message from Michael
 McLay on Tue, 6 Feb 2001 05:17:20 -0500)
References: <0102052103410E.03631@fermi.eeel.nist.gov> <01020520312500.01559@fermi.eeel.nist.gov> <200102061942.f16Jga700919@mira.informatik.hu-berlin.de> <01020605172007.01559@fermi.eeel.nist.gov>
Message-ID: <200102062329.f16NTcs02332@mira.informatik.hu-berlin.de>

> Is there a specific version of expat that needs to be installed?  I
> found no reference to where to get the library or which version was
> required.

Both expat 1.1 and 1.2 are known to work. Unfortunately, these
releases are not self-identifying, so it is hard to tell which one you
got after you've installed them. With 1.95.1, I think pyexpat behaves
differently in some respects - I don't know whether
that is due to bug fixes, new bugs, or simply changed behaviour; it
has also been a while since I used this version (by accident at that
time).
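
On a current Python, pyexpat does report which expat it was compiled
against, so the check looks roughly like this. Whether these attributes
existed in the releases discussed here is not something this sketch
assumes.

```python
import pyexpat

# Version of the expat library that pyexpat was built against.
print(pyexpat.EXPAT_VERSION)   # a version string
print(pyexpat.version_info)    # the same version as a tuple of integers

major, minor, micro = pyexpat.version_info
if (major, minor) < (1, 95):
    print("pre-1.95 expat")
```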

> Having a list of URLs required to build extensions would make it
> much safer and less error-prone for end users when they are trying
> to build a fully populated module library.

Certainly. The disadvantage of having such a list is that it requires
a volunteer to maintain it. Please have a look at Modules/Setup.dist,
though. As it is not required in the actual build process, it may be
inaccurate.

> I found out the pyexpat wasn't built when I tried executing a script
> that required the module.  Fortunately I was still running ground
> tests on my flight control system when the exception was raised:-)

Yes, packaging and deployment is a hard business, and often requires
expert knowledge of the system being deployed.

> > > The 2.1a2 download page does not reference source code for expat.
> > > It was included in the 2.0 download page,
> > > http://www.python.org/2.0/.  It was not clear to me if the expat
> > > source code was included in the 2.1a2 source distribution.
> >
> > It wasn't included, and likely will not be. Instead, you need to
> > install it separately (as you did for 2.0 - the Python download pages
> > just provided a copy that was known to work).
> 
> I understand why you don't bundle it with the distribution, I just was 
> pointing out that the documentation didn't make it clear that I needed to 
> have the expat library installed before PyXML would work.  

There is some confusion here: PyXML *does* include expat. Python 2.0
does not include it, nor will 2.1.

> If it were possible to easily identify which modules were not built
> and what was missing that prevented them from being built.  With
> some additional information it is more likely that the maintainers
> of Linux distributions would add missing libraries so the build
> would be complete.

In Python 1.5, Modules/Setup* did provide such a complete list, yet
there were still differences - apparently caused by distributors being
unwilling to make the required headers and libraries available on the
build system. So I do not believe your claim that a comprehensive list
would solve this matter.

> That is just speculation on my part.  I would expect the report
> generation could be done automatically by the build process.  The
> build tool would need to track which modules were skipped and
> generate a report at the end of what was not built and why.

It is certainly possible; it just requires a volunteer to implement
such a feature. It would then require cooperation of all contributors
to use the feature properly when they make changes to the build
process.

To officially request a feature, please file a bug report at
sourceforge.net/projects/python. This is also the place where patches
can be contributed.

Regards,
Martin


From Mike.Olson@fourthought.com  Wed Feb  7 00:56:34 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 06 Feb 2001 17:56:34 -0700
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain> <3A7E458B.E787CD07@FourThought.com> <200102050745.f157jOm00978@mira.informatik.hu-berlin.de>
Message-ID: <3A809D42.95B40680@FourThought.com>

"Martin v. Loewis" wrote:
> 
> > I think we should also look at merging minidom and pDomlette.  Both are
> > supposed to be "mini" and I think they both support about the same sets
> > of functionality.  No sense keeping both of them around.  I can look at
> > the differences and try to merge them.
> 
> I never quite understood where the "p" in pDomlette came from. To
> date, pDomlette is just 200 lines longer than minidom, so yes, merging
> them is a genuine option. Bear in mind that a new Python release is
> upcoming, and that the final beta release is probably the last point
> to add missing features (i.e. bug corrections with regard to DOM
> conformance). It is not inherently wrong to include a more complete
> version of minidom with PyXML, but it would be nice if it was stable
> after 2.1.

We've been using pDomlette pretty heavily for quite some time now.  It
is the default DOM in 4XSLT.  That would be one nice benefit of
combining the two: the DOM that ships with Python 2.0 would work with
4XSLT (and 4XPath, 4XLink, et al).

> 
> As for the differences, I wonder what to do with the
> auto-normalization feature of pDomlette. I can't figure out what
> exactly that means: auto-normalization during parsing, or
> auto-normalization during insertion of nodes. While I can see that it
> is useful, I'm concerned about standards compliance here.


It is during parsing.  If you append a text node after another text
node, it will keep them as two separate nodes.  I think this is
standards compliant, though if I recall the spec is a little bit hazy
there...
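
A minimal sketch of the distinction, assuming the standard library's
xml.dom.minidom as a stand-in (pDomlette itself is not shown here, and
its API may differ).  Text nodes appended one after another stay
separate until the caller asks for normalization:

```python
from xml.dom import minidom

# Build a document with a single root element (names are illustrative).
doc = minidom.getDOMImplementation().createDocument(None, "root", None)
root = doc.documentElement

# Appending two text nodes back to back leaves two distinct children.
root.appendChild(doc.createTextNode("Hello, "))
root.appendChild(doc.createTextNode("world"))
children_before = len(root.childNodes)

# normalize() is the DOM call that merges adjacent text nodes on request.
root.normalize()
children_after = len(root.childNodes)
merged_text = root.firstChild.data
```

Normalizing only while parsing leaves later insertions under the
caller's control, which is the behaviour described above.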

> I feel some misunderstanding here. I'm talking about code like
> 
>         if ownerDoc == None:
>             dt = implementation.createDocumentType('', '', '')
>             self._ownerDoc = implementation.createDocument('', None, dt)
>             self._rootNode = self._ownerDoc
> 
> (from xml.dom.ext.reader.Sax), in particular about the invocation of
> createDocument with a null qualifiedName. I could not find any
> permission in the DOM spec for such usage, and Xerces/C++ has code like

It's for the readers.  If you pass in a namespaceURI to createDocument it
will add the root element.  Then we would need special handling code in
start_element to determine if a document element has been added, or if
it even has a document element.
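
A small sketch of that reader pattern, assuming xml.dom.minidom rather
than the xml.dom.ext.reader.Sax code under discussion.  minidom happens
to accept a createDocument call with all three arguments set to None
and returns a document without a root element; that is minidom's
behaviour, not something the DOM spec is claimed to permit.  The
element name and namespace below are made up for illustration:

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()

# All three arguments None: minidom returns a Document with no root element.
doc = impl.createDocument(None, None, None)
has_root_initially = doc.documentElement is not None

# A reader can then attach the root when it sees the first start tag,
# without checking whether createDocument already made one.
root = doc.createElementNS("urn:example", "ex:book")
doc.appendChild(root)
```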

Mike
-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Mike.Olson@fourthought.com  Wed Feb  7 01:18:22 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 06 Feb 2001 18:18:22 -0700
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain> <3A7E458B.E787CD07@FourThought.com> <200102050745.f157jOm00978@mira.informatik.hu-berlin.de> <3A809D42.95B40680@FourThought.com>
Message-ID: <3A80A25E.8BC0A360@FourThought.com>

> > I never quite understood where the "p" in pDomlette came from.


For Python DOM, as opposed to cDomlette, our C DOM.

It's a naming convention I stole from Zope....our eventual idea is to
have Ft.Lib.domlette and at import time decide if it should be "p" or
"c" but we have a fair amount of work to do on our cDomlette first.


Mike

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Wed Feb  7 04:52:12 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 6 Feb 2001 23:52:12 -0500
Subject: [XML-SIG] Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2
Message-ID: <007b01c090c1$bc07f5a0$7cac1218@reston1.va.home.com>

I started trying out some of the demos that come with PyXML and 4Suite -
actually I got them by downloading the source for 4Suite from the 4Suite
server.  I've picked up some bugs.  Most, but not all, are in test or demo
scripts, but some are in actual working modules.  I've only tried out a few
things, so this is not comprehensive at all.

My system is Python 1.5.2 on Windows 98, with PyXML 0.6.2.

First, a number of modules reference the "core" module, which isn't there (any
more, I assume?).  Some are tests, some are not.  I don't have a list at the
moment, but they should be flushed out.

Second, in xml\dom\ext\PyExpat.py, the import pyexpat statement doesn't throw
an ImportError, but a NameError (seems strange).  I fixed it like this (I
used Exception instead of NameError, in case some other version should throw
ImportError as you would expect):
line 27:
try:
    #Python 2.0
    import pyexpat
#except ImportError:   ==>Currently this import throws NameError, not ImportError
except Exception:
    #Python 1.x with PyXML
    from xml.parsers import pyexpat

I don't think this is really the way to fix it, though - there must be some
reason I'm getting an unexpected type of exception, and that is what ought to
be fixed.
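
A narrower variant of the same workaround, offered only as a sketch:
it names the two exception types mentioned above instead of catching
every Exception, so unrelated failures are not hidden.  Why the import
raises NameError is not explained in this thread, and the fallback
path is copied from the snippet above and is not exercised on a
current interpreter:

```python
try:
    # Python 2.0 and later ship pyexpat as a top-level extension module.
    import pyexpat
except (ImportError, NameError):
    # Python 1.x with PyXML kept it under xml.parsers (path taken from
    # the original snippet; not exercised on a modern interpreter).
    from xml.parsers import pyexpat

# ParserCreate() is the module's entry point for building a parser.
parser = pyexpat.ParserCreate()
```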

Finally, in Ft\Xlink\XLinkElements.py, reader.fromUri() is called with an
additional argument which is no longer used in the reader's parent class.  I
fixed it like this, commenting out the extra arg so you can see it:

line 51:
       frag = reader.fromUri(self.href)  #, doc = doc) ==> API doesn't include 'doc' arg

It looks to me like there are a lot of left-over things that haven't gotten
caught yet, and a lot of the tests haven't run for me - DOM seems OK but XLink
and XPointer have given problems.  They look like the kind of things that
wouldn't have been worked on for 0.6.3, but I haven't tried that yet.

I'm not sure who should be putting fixes for these bugs in once they are agreed
on.  I'm still not getting secure access negotiated properly, so it's not
going to be me for a while yet.

Cheers,

Tom P


From tpassin@home.com  Wed Feb  7 04:55:54 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 6 Feb 2001 23:55:54 -0500
Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2
Message-ID: <008101c090c2$40afeba0$7cac1218@reston1.va.home.com>

I forgot to add - a lot of FT's test scripts use "import TestSuite", but
that's now at Ft.Lib.TestSuite, and I had to make a number of corresponding
import changes, too. There were a lot of them in the xpath test directory,
though I didn't make a list yet.

Cheers,

Tom P



From tpassin@home.com  Wed Feb  7 05:16:53 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 7 Feb 2001 00:16:53 -0500
Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2
Message-ID: <008801c090c5$2ed14e80$7cac1218@reston1.va.home.com>

More problems with the XPointer TestParser script.

At line 12, the ReadFromUri() no longer exists. I hacked up a fix as shown
below:

********** XPointerParser **********
Traceback (innermost last):
  File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 67, in ?
    retval = test()
  File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 12, in test
    doc = pDomlette.ReadFromUri('addrbook.xml')
AttributeError: ReadFromUri


##    doc = pDomlette.ReadFromUri('addrbook.xml')  #Original code
    reader = pDomlette.PyExpatReader()  # Need a reader, original call is no more
    doc = reader.fromUri('addrbook.xml')

Now the code runs, but the example being tested still fails -  Apparently the
xpointer expression no longer works.  When I try various plausible variations,
a few of them work, some of them return None (which causes an error), and
some give this same error.

Seems to me that if no match is found for an expression, returning None would
always be appropriate.  It shouldn't raise an exception unless there was
invalid syntax in the xpointer expression.

Here's the trace:

C:>TestParser.py
********** XPointerParser **********
Creating test environment                                              [ OK  ]
Traceback (innermost last):
  File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 69, in ?
    retval = test()
  File "C:\Program Files\Python\Ft\XPointer\test_suite\TestParser.py", line 49, in test
    result = XPointer.SelectNode(doc, frag)
  File "C:\Program Files\Python\Ft\XPointer\__init__.py", line 57, in SelectNode
    return xptr.select(doc, contextNode, nss)
  File "C:\Program Files\Python\Ft\XPointer\ParsedXPointer.py", line 47, in select
    raise XPtrException(XPtrException.SUB_RESOURCE_ERROR)
Ft.XPointer.XPtrException.XPtrException: Expression does not locate a resource


Cheers,

Tom P

p.s. - That's it for tonight, no more posts on this!


From paulp@ActiveState.com  Wed Feb  7 21:02:31 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 07 Feb 2001 13:02:31 -0800
Subject: [XML-SIG] Minidom bugs/questions
References: <200102031939.OAA12187@cj20424-a.reston1.va.home.com>
 <200102032323.f13NNFI01991@mira.informatik.hu-berlin.de> <200102051922.OAA01298@cj20424-a.reston1.va.home.com> <200102060131.f161Vk311008@mira.informatik.hu-berlin.de>
Message-ID: <3A81B7E7.7B55C9A1@ActiveState.com>

"Martin v. Loewis" wrote:
> 
>...
>
> I think this code was completely bogus. The author apparently thought
> of creating DocumentTypes, in which case publicId and systemId would
> be required. However, the SAX locator does not provide that
> information (at least not for the DTD; rather for the document itself),
> nor were we in the process of creating document types.

I do not think that minidom should support any DTD information. I would
advise that you should just remove any code relating to public and
system identifiers. It was not there originally and I don't think it is
useful.

> It seems that the processing of the doctype argument is also
> incorrect: It should *not* create one given the qualifiedName, at least
> I can't find any indication that it should. It MUST set the
> ownerDocument, though, which it doesn't. I'm not sure whether the
> doctype needs to appear in the childNodes of the Document, can anybody
> clarify this?

Yes, the DocumentType would be a child of the Document. But I don't
think we should have doctype at all...leave that to 4dom.

 Paul Prescod


From jeremy.kloth@fourthought.com  Wed Feb  7 21:12:44 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Wed, 07 Feb 2001 14:12:44 -0700
Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python
 1.5.2
References: <008101c090c2$40afeba0$7cac1218@reston1.va.home.com>
Message-ID: <3A81BA4C.9741C89B@fourthought.com>

"Thomas B. Passin" wrote:
> 
> I forgot to add - a lot of FT's test scripts use "import TestSuite", but
> that's now at Ft.Lib.TestSuite, and I had to make a number of corresponding
> import changes, too. There were a lot of them in the xpath test directory,
> though I didn't make a list yet.
> 

Actually, the local import works just fine when the test scripts are run
from the directory where they are installed, the same as the
documentation directory.  During install, that file gets copied into the
directory as well.

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From paulp@ActiveState.com  Wed Feb  7 21:09:57 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 07 Feb 2001 13:09:57 -0800
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de>
Message-ID: <3A81B9A5.CE897F06@ActiveState.com>

"Martin v. Loewis" wrote:
> 

> > .
> > Well, we should think about exactly what makes minidom "mini".  It's
> > debatable whether it is possible to implement all of DOM Level 2
> > core and still be "mini".  And what about DOM Level 3?
> 
> I think the original understanding was that everything that is
> "convenience", ie. can be composed from other interfaces, should not
> be included. In addition, minidom originally had no DOMImplementation,
> you had to know the implementation class names to build a tree.
> 
> That approach has failed; people have been contributing bits and
> pieces so that what they wanted to use is there. These days, I think
> it is mini by only implementing DOM Core. That probably makes it an AA
> battery.

First, would implementing DOM core include entities, notations, document
types, entity references etc.? If so, I think you're increasing the
conceptual load quite a bit.

I also originally wanted minidom to be readonly but yes, that has gone
away also.

> [supporting namespaces]
> > Of course if it isn't Level 2 compliant, it needn't do so.  I
> > wouldn't consider it unreasonable to have minidom L1 only.  If users
> > want Level 2, they install PyXML or other.
> 
> I'd say that this is a matter of internal consistency. Since the SAX
> part in Python supports namespaces, the DOM part should do so as
> well. That means L2. It also turns out that what I hope is the larger
> half of NS support is already in minidom as of Python 2.0, so ripping
> it out would not be sensible.

I put off namespace support as long as I could trying to keep it simple.
The tricky part of doing namespaces "right" is doing movement of nodes
across namespace boundaries right. You've got issues of prefix clashes,
element type renaming and so forth.

Having proper namespace support would not be trivial. Admittedly, I
should not have started adding any namespace support at all until I
had figured out the end-game...is it too late to go back and make it
readonly again? :)

 Paul Prescod


From jeremy.kloth@fourthought.com  Wed Feb  7 21:17:32 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Wed, 07 Feb 2001 14:17:32 -0700
Subject: [XML-SIG] Re: Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python
 1.5.2
References: <008801c090c5$2ed14e80$7cac1218@reston1.va.home.com>
Message-ID: <3A81BB6C.1600F1EE@fourthought.com>

"Thomas B. Passin" wrote:

> 
> Now the code runs, but the example being tested still fails -  Apparently the
> xpointer expression no longer works.  When I try various plausible variations,
> a few of them work, some of them return None (which causes an error) , and
> some give this same error.
> 
> Seems to me that if no match is found for an expression, returning None would
> always be appropriate.  It shouldn't return an exception unless there was
> invalid syntax in the xpointer expression.
> 

According to the XPointer specification, it is an error.
In section 3.4:

[Definition: If a syntactically correct XPointer, suitably escaped,
fails as discussed in 4.3 Schemes, the XPointer has a sub-resource
error.] Note that XPath allows expressions that return empty node-sets
as their results and does not regard this situation as an error. Because
the XPointer language is intended as a specification of document
locations rather than a broader query language, an empty result is an
error.
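
The difference can be sketched in a few lines.  This is an
illustration only: the function names and the SubResourceError class
are hypothetical and are not the Ft.XPointer API, and the "nodes" are
plain strings rather than DOM nodes:

```python
class SubResourceError(Exception):
    """Hypothetical stand-in for an XPointer sub-resource error."""


def xpath_style_select(nodes, predicate):
    # XPath-like behaviour: an empty result is a valid answer.
    return [node for node in nodes if predicate(node)]


def xpointer_style_select(nodes, predicate):
    # XPointer-like behaviour: an empty result is reported as an error.
    result = xpath_style_select(nodes, predicate)
    if not result:
        raise SubResourceError("Expression does not locate a resource")
    return result


names = ["alice", "bob"]
empty_ok = xpath_style_select(names, lambda n: n == "carol")

try:
    xpointer_style_select(names, lambda n: n == "carol")
    raised = False
except SubResourceError:
    raised = True
```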

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Thu Feb  8 01:14:03 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Feb 2001 02:14:03 +0100
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: <3A81B9A5.CE897F06@ActiveState.com> (message from Paul Prescod on
 Wed, 07 Feb 2001 13:09:57 -0800)
References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> <3A81B9A5.CE897F06@ActiveState.com>
Message-ID: <200102080114.f181E3k01764@mira.informatik.hu-berlin.de>

> First, would implementing DOM core include entities, notations, document
> types, entity references etc.? If so, I think you're increasing the
> conceptual load quite a bit.

I think it should include anything that users want to use. Refusing
patches because they extend beyond an originally-set feature set is
not good.

Regards,
Martin


From paulp@ActiveState.com  Thu Feb  8 02:00:48 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 07 Feb 2001 18:00:48 -0800
Subject: [XML-SIG] Minidom bugs/questions
References: <200102050446.VAA29599@localhost.localdomain> <200102050714.f157EFe00856@mira.informatik.hu-berlin.de> <3A81B9A5.CE897F06@ActiveState.com> <200102080114.f181E3k01764@mira.informatik.hu-berlin.de>
Message-ID: <3A81FDD0.DEBCE83C@ActiveState.com>

"Martin v. Loewis" wrote:
> 
> > First, would implementing DOM core include entities, notations, document
> > types, entity references etc.? If so, I think you're increasing the
> > conceptual load quite a bit.
> 
> I think it should include anything that users want to use. Refusing
> patches because they extend beyond an originally-set feature set is
> not good.

Guido does that all of the time. It's part of how we keep things simple!
Every extra feature must be added to the documentation and increases
people's XML-a-phobia by that much more.

 Paul Prescod


From chris@rpgarchive.com  Thu Feb  8 06:00:21 2001
From: chris@rpgarchive.com (Chris Davis)
Date: Thu, 8 Feb 2001 00:00:21 -0600
Subject: [XML-SIG] SAXReaderNotAvailable
Message-ID: <0102080000210J.12796@lab.rpgarchive.com>

I'm sorry to just throw an error out like this, but can anyone tell me what
might be the cause of this exception? I'm trying to help a friend get python20
installed properly. He has expat installed and is running Slackware.

  File "./minidom.py", line 581, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "./minidom.py", line 570, in _doparse
    events = apply(func, args, kwargs)
  File "./pulldom.py", line 244, in parseString
    parser = xml.sax.make_parser()
  File "/usr/lib/python2.0/xml/sax/__init__.py", line 88, in make_parser
    raise SAXReaderNotAvailable("No parsers found", None)
xml.sax._exceptions.SAXReaderNotAvailable: No parsers found

xml dir:

darkstar:/usr/lib/python2.0:>ls -R xml
xml:
__init__.py  __init__.pyc  __init__.pyo  dom/  parsers/  sax/

xml/dom:
__init__.py   __init__.pyo  minidom.pyc  pulldom.py   pulldom.pyo
__init__.pyc  minidom.py    minidom.pyo  pulldom.pyc

xml/parsers:
__init__.py  __init__.pyc  __init__.pyo  expat.py  expat.pyc  expat.pyo

xml/sax:
__init__.py     _exceptions.pyc  expatreader.pyc  handler.pyo   saxutils.pyo
__init__.pyc    _exceptions.pyo  expatreader.pyo  parsers@      xmlreader.py
__init__.pyo    dom@             handler.py       saxutils.py   xmlreader.pyc
_exceptions.py  expatreader.py   handler.pyc      saxutils.pyc  xmlreader.pyo

Thanks

-- 
Chris Davis
chris@rpgarchive.com

RPGArchive	http://rpgarchive.com
OpenRPG	http://openrpg.com


From stefan.marsiske@sysdata.siemens.hu  Thu Feb  8 12:40:21 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Thu, 8 Feb 2001 13:40:21 +0100
Subject: [XML-SIG] sax and entities
Message-ID: <20010208134021.D4340@sysdata.siemens.hu>

hi,

i got a little problem. when i want to load an xml file using sax2, i lose
entities.
in one file (which is actually almost html) i have a "&nbsp;" entity, but once
loaded that entity in the dom tree is already converted to a space. that is
quite unfortunate, because i want to write this dom tree back after a few
changes, but then this &nbsp; is lost... "one in a million..."
how can i force the sax2 reader not to expand entities? or do i miss the point
here?


ciao
-- 
Stefan [http://web.interware.hu/stef] UPDATED:001031
quote: "happy(y2k++)"
gpg-key: http://web.interware.hu/stef/gpg.txt


From larsga@garshol.priv.no  Thu Feb  8 13:31:05 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 08 Feb 2001 14:31:05 +0100
Subject: [XML-SIG] sax and entities
In-Reply-To: <20010208134021.D4340@sysdata.siemens.hu>
References: <20010208134021.D4340@sysdata.siemens.hu>
Message-ID: <m3snlp5mjq.fsf@lambda.garshol.priv.no>

* Marsiske Stefan
| 
| i got a little problem. when i want to load an xml file using sax2,
| i lose entities.

You are quite right that you lose information about which character
data came from character entities, and that this information is not
passed on to the DOM.

The reason this is so is that this information is hardly ever wanted,
and keeping all information of this kind would make the SAX API a lot
more complicated.

| in one file (which is actually almost html) i have a "&nbsp;"
| entity, but once loaded that entity in the dom tree is already
| converted to a space. that is quite unfortunate. because i want to
| write this dom tree back after a few changes, but then this &nbsp;
| is lost...

Well, first of all, it should not be converted to a space, but to the
NBSP character, ISO Latin-1 character 160, U+00A0.  If it is converted
to an NBSP character, you still have it, and it will still be there
when you write your DOM tree back, although in a different form.

If you really want to have it as an '&nbsp;' in your output XML rather
than as an NBSP character you should do something like

  string.replace(text, "\240", "&nbsp;")

when you write the DOM tree out.  Exactly how to do this will depend
on your DOM implementation.


I think it would make very good sense, BTW, for the DOM serializers to
provide some mechanism for doing escapings of this kind when
serializing the DOM.  It might be that you pass a dictionary like

  {"\240" : "&nbsp;"}

or perhaps a function. What say ye, DOM implementors?
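
A rough sketch of what such a mechanism could look like, assuming
xml.dom.minidom for the tree and xml.sax.saxutils.escape for the
escaping.  The serialize function and EXTRA_ESCAPES name are
hypothetical, and only elements and text are handled (no attributes,
comments or doctype):

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Hypothetical extra escapes, in the spirit of the dictionary above.
# "\u00a0" is the NBSP character (written "\240" in the message).
EXTRA_ESCAPES = {"\u00a0": "&nbsp;"}


def serialize(node, extra=EXTRA_ESCAPES):
    """Write a tiny subset of the DOM (elements and text) as a string."""
    if node.nodeType == node.TEXT_NODE:
        # escape() handles &, < and > first, then applies the extra mapping.
        return escape(node.data, extra)
    if node.nodeType == node.ELEMENT_NODE:
        inner = "".join(serialize(child, extra) for child in node.childNodes)
        return "<%s>%s</%s>" % (node.tagName, inner, node.tagName)
    if node.nodeType == node.DOCUMENT_NODE:
        return "".join(serialize(child, extra) for child in node.childNodes)
    return ""  # other node types are ignored in this sketch


# The parser turns &#160; into the NBSP character; the serializer
# writes it back out as an entity reference.
doc = minidom.parseString("<p>a&#160;b &amp; c</p>")
output = serialize(doc)
```

Note that the result only parses as XML again if the nbsp entity is
declared, for example by an (X)HTML DTD.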

--Lars M.


From uche.ogbuji@fourthought.com  Thu Feb  8 17:41:26 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 08 Feb 2001 10:41:26 -0700
Subject: [XML-SIG] sax and entities
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "08 Feb 2001 14:31:05 +0100." <m3snlp5mjq.fsf@lambda.garshol.priv.no>
Message-ID: <200102081741.KAA19184@localhost.localdomain>

> If you really want to have it as an '&nbsp;' in your output XML rather
> than as an NBSP character you should do something like
> 
>   string.replace(text, "\240", "&nbsp;")
> 
> when you write the DOM tree out.  Exactly how to do this will depend
> on your DOM implementation.
> 
> 
> I think it would make very good sense, BTW, for the DOM serializers to
> provide some mechanism for doing escapings of this kind when
> serializing the DOM.  It might be that you pass a dictionary like
> 
>   {"\240" : "&nbsp;"}
> 
> or perhaps a function. What say ye, DOM implementors?

4DOM and 4XSLT already automatically do this for HTML output.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ken@bitsko.slc.ut.us  Thu Feb  8 17:46:38 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 08 Feb 2001 11:46:38 -0600
Subject: [XML-SIG] Minidom bugs/questions
In-Reply-To: "Martin v. Loewis"'s message of "Thu, 8 Feb 2001 02:14:03 +0100"
References: <200102050446.VAA29599@localhost.localdomain>
 <200102050714.f157EFe00856@mira.informatik.hu-berlin.de>
 <3A81B9A5.CE897F06@ActiveState.com>
 <200102080114.f181E3k01764@mira.informatik.hu-berlin.de>
Message-ID: <x7d7ctnk3l.fsf@bitsko.slc.ut.us>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> [Paul Prescod wrote:]
> > First, would implementing DOM core include entities, notations,
> > document types, entity references etc.? If so, I think you're
> > increasing the conceptual load quite a bit.
> 
> I think it should include anything that users want to use. Refusing
> patches because they extend beyond an originally-set feature set is
> not good.

Separating basic usage from extended usage in the documentation would
go a long way towards satisfying both requirements: keeping the
initial conceptual load down while still allowing richer use.

I think what makes minidom lightweight is its implementation and
footprint, mostly due to relaxing many of the requirements of W3C DOM.

  -- Ken


From martin@loewis.home.cs.tu-berlin.de  Thu Feb  8 20:06:41 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Feb 2001 21:06:41 +0100
Subject: [XML-SIG] sax and entities
In-Reply-To: <m3snlp5mjq.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 08 Feb 2001 14:31:05 +0100)
References: <20010208134021.D4340@sysdata.siemens.hu> <m3snlp5mjq.fsf@lambda.garshol.priv.no>
Message-ID: <200102082006.f18K6fO01195@mira.informatik.hu-berlin.de>

> I think it would make very good sense, BTW, for the DOM serializers to
> provide some mechanism for doing escapings of this kind when
> serializing the DOM.  It might be that you pass a dictionary like
> 
>   {"\240" : "&nbsp;"}
> 
> or perhaps a function. What say ye, DOM implementors?

That would make another issue on a "standard Python DOM extensions"
PEP. Unfortunately, so far, nobody has offered to draft one. Once it
is there and agreed, I think it won't be too hard to provide such a
feature in DOM implementations. The purpose of the PEP would be to
present the feature uniformly across DOM implementations, of course.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Feb  8 20:31:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 8 Feb 2001 21:31:15 +0100
Subject: [XML-SIG] SAXReaderNotAvailable
In-Reply-To: <0102080000210J.12796@lab.rpgarchive.com> (message from Chris
 Davis on Thu, 8 Feb 2001 00:00:21 -0600)
References: <0102080000210J.12796@lab.rpgarchive.com>
Message-ID: <200102082031.f18KVFx01306@mira.informatik.hu-berlin.de>

> I'm sorry to just throw an error out like this, but can anyone tell me what 
> might be the cause of this exception? I'm trying to help a friend get python20 
> installed properly. He has expat installed and is running Slackware.  

What do you mean by "has expat installed"? That expat.py is present?
That the expat 1.1 header files and library are present? That the
pyexpat module is present? The last one is required to find a parser.
You need to enable pyexpat in Modules/Setup.
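
A quick way to tell these cases apart from the interpreter, as a
sketch (the two helper names are made up here).  It checks whether the
pyexpat extension module can be imported and whether xml.sax can build
a parser at all:

```python
import xml.sax


def pyexpat_importable():
    """Return True if the C extension module behind expat.py is present."""
    try:
        import pyexpat  # noqa: F401
    except ImportError:
        return False
    return True


def sax_parser_available():
    """Return True if xml.sax can find a working parser driver."""
    try:
        xml.sax.make_parser()
    except xml.sax.SAXReaderNotAvailable:
        return False
    return True


print("pyexpat importable:", pyexpat_importable())
print("SAX parser available:", sax_parser_available())
```

If the first check fails, the presence of xml/parsers/expat.py alone
does not help, since it is only a thin wrapper around pyexpat.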

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Fri Feb  9 08:51:56 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 9 Feb 2001 09:51:56 +0100
Subject: [XML-SIG] Some pyXML Bugs in PyXML 0.6.2 and 4Suite for Python 1.5.2
In-Reply-To: <007b01c090c1$bc07f5a0$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <007b01c090c1$bc07f5a0$7cac1218@reston1.va.home.com>
Message-ID: <200102090851.f198puK01486@mira.informatik.hu-berlin.de>

> First, a number of modules reference the "core" module, which isn't
> there (any more, I assume?).  Some are tests, some are not.  I don't
> have a list at the moment, but they should be flushed out.

That's a known issue; not all of that has been ported to
4DOM. Contributions are welcome.

> Second, in xml\dom\ext\PyExpat.py, the import pyexpat statement
> doesn't throw an ImportError, but a NameError (seems strange).

That code will look different once we get an updated copy of 4DOM.

> I'm not sure who should be putting fixes for these bugs in once they
> are agreed on.  I'm still not getting secure access negotiated
> properly, so it's not going to be me for a while yet.

Patches can be submitted to sourceforge.net/projects/pyxml also.

Regards,
Martin


From fdrake@acm.org  Fri Feb  9 18:09:33 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 9 Feb 2001 13:09:33 -0500 (EST)
Subject: [XML-SIG] Question about namespace declarations
Message-ID: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com>

  I took a look in the namespaces recommendation
(http://www.w3.org/TR/REC-xml-names/), and it makes me think that this
isn't quite right.  I vaguely recall that "xmlns" was supposed to
magically map into that URI, but I don't see it in the
recommendation.  Further, the recommendation says (section 4, in
"Namespace Constraint: Prefix Declared") that the "xmlns" prefix is
not bound to any namespace URI.
  This makes me think that both "xmlns" and "xmlns:*" should be
presented as attributes without namespaces in the DOM.  Can anyone
point to references that extend or override this recommendation?
  Thanks!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From paulp@ActiveState.com  Fri Feb  9 18:27:12 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 09 Feb 2001 10:27:12 -0800
Subject: [XML-SIG] Question about namespace declarations
References: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com>
Message-ID: <3A843680.64D8235B@ActiveState.com>

I am probably missing some context but your reading of the XML
namespaces specification is correct. Minidom does not bind xmlns to
REC-xml-names. Which DOM does?

 Paul Prescod


From paulp@ActiveState.com  Fri Feb  9 18:31:40 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 09 Feb 2001 10:31:40 -0800
Subject: [XML-SIG] Question about namespace declarations
References: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com>
Message-ID: <3A84378C.19B0AFD2@ActiveState.com>

Oops, I just found this:

"Note: In the DOM, all namespace declaration attributes are by
definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/".
These are the attributes whose namespace prefix or qualified name is
"xmlns". Although, at the time of writing, this is not part of the XML
Namespaces specification [Namespaces], it is planned to be incorporated
in a future revision."

http://www.w3.org/TR/DOM-Level-2-Core/core.html
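
For comparison, a short check against the xml.dom.minidom that ships
with current Python 3 releases, which does follow this note (this says
nothing about how minidom behaved at the time of writing).  The
element and namespace names are illustrative:

```python
from xml.dom import XMLNS_NAMESPACE, minidom

doc = minidom.parseString('<root xmlns="urn:default" xmlns:x="urn:x"/>')
root = doc.documentElement

# Namespace declarations show up as ordinary attribute nodes...
default_decl = root.getAttributeNode("xmlns")
prefixed_decl = root.getAttributeNode("xmlns:x")

# ...and both are bound to the DOM Level 2 xmlns namespace URI.
print(XMLNS_NAMESPACE)
print(default_decl.namespaceURI == XMLNS_NAMESPACE)
print(prefixed_decl.namespaceURI == XMLNS_NAMESPACE)
```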

 Paul Prescod


From dsturtevant@comversens.com  Fri Feb  9 18:50:23 2001
From: dsturtevant@comversens.com (Sturtevant, Dean)
Date: Fri, 9 Feb 2001 13:50:23 -0500
Subject: [XML-SIG] problem with saxdemo.py
Message-ID: <CD0BA48D13A9D311A274009027C5B4B101384FD4@intm1.btrd.bostontechnology.com>

Hi - I'm trying to find a simple example of the usage of the xml parser, so
I thought I'd try saxdemo. But there's a problem: saxdemo wants saxexts from
xml/sax, which doesn't exist in the python 2.0 installation. Should I look
to another example? Which one?
- Dean
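
For reference, a minimal sketch of the SAX2-style API in the standard
xml.sax package, which does not need saxexts.  The ElementCounter
class and the sample document are made up for illustration:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Count how often each element name occurs in a document."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once for every start tag the parser encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


handler = ElementCounter()
xml.sax.parseString(
    b"<book><title>t</title><chapter/><chapter/></book>", handler
)
print(handler.counts)
```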


From fdrake@acm.org  Fri Feb  9 18:46:07 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 9 Feb 2001 13:46:07 -0500 (EST)
Subject: [Parsed-XML-Dev] Re: [XML-SIG] Question about namespace declarations
In-Reply-To: <3A84378C.19B0AFD2@ActiveState.com>
References: <14980.12893.917059.581182@cj42289-a.reston1.va.home.com>
 <3A84378C.19B0AFD2@ActiveState.com>
Message-ID: <14980.15087.450232.203602@cj42289-a.reston1.va.home.com>

Paul Prescod writes:
 > "Note: In the DOM, all namespace declaration attributes are by
 > definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/".
 > These are the attributes whose namespace prefix or qualified name is
 > "xmlns". Although, at the time of writing, this is not part of the XML
 > Namespaces specification [Namespaces], it is planned to be incorporated
 > in a future revision."

  Boy is this stuff messy!
  The context is a DOM implementation I'm working on for use in Zope
and some DOM client code Guido is working on.  Our current
implementation does what the DOM Level 2 recommendation says, but
Guido complained because it exposed a bug elsewhere in our DOM when he
tried to insert namespace declarations.
  There's some information about our DOM project at:

	http://dev.zope.org/Wikis/DevSite/Projects/ParsedXML/


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Fri Feb  9 18:52:39 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 09 Feb 2001 11:52:39 -0700
Subject: [XML-SIG] Question about namespace declarations
In-Reply-To: Message from Paul Prescod <paulp@ActiveState.com>
 of "Fri, 09 Feb 2001 10:31:40 PST." <3A84378C.19B0AFD2@ActiveState.com>
Message-ID: <200102091852.LAA18011@localhost.localdomain>

> Oops, I just found this:
> 
> "Note: In the DOM, all namespace declaration attributes are by
> definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/".
> These are the attributes whose namespace prefix or qualified name is
> "xmlns". Although, at the time of writing, this is not part of the XML
> Namespaces specification [Namespaces], it is planned to be incorporated
> in a future revision."
> 
> http://www.w3.org/TR/DOM-Level-2-Core/core.html

Yes.  And this is precisely how 4DOM, pDomlette and cDomlette are implemented. 
 If minidom doesn't have a namespaceURI assigned for namespace declaration 
attributes, it should be fixed.
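
A minimal sketch for checking this behaviour, assuming a present-day
Python 3 standard library rather than the minidom release discussed in
this thread. The sample document and the urn:example URIs are made up
for illustration:

```python
from xml.dom import XMLNS_NAMESPACE, minidom

# Hypothetical sample with one default and one prefixed namespace declaration.
doc = minidom.parseString(
    '<root xmlns="urn:example:default" xmlns:p="urn:example:p"/>'
)
root = doc.documentElement

# Look the declarations up by qualified name and inspect their namespace URI.
default = root.getAttributeNode("xmlns")
prefixed = root.getAttributeNode("xmlns:p")

print(XMLNS_NAMESPACE)
print(default.namespaceURI, prefixed.namespaceURI)
```

If both attribute nodes report the DOM Level 2 xmlns URI, the
implementation follows the note quoted above.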

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From noreply@sourceforge.net  Sat Feb 10 00:29:01 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 09 Feb 2001 16:29:01 -0800
Subject: [XML-SIG] [Bug #131797] failed build on 2.1a2 and 2.0
Message-ID: <E14RNuP-00005Q-00@usw-sf-web3.sourceforge.net>

Bug #131797, was updated on 2001-Feb-09 16:29
Here is a current snapshot of the bug.

Project: Python/XML
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: bcollar
Assigned to : nobody
Summary: failed build on 2.1a2 and 2.0

Details: I get the following output from "python2.1 ./setup.py build", on
debian. The same occurs when using python2.0:

running build_ext
building '_xmlplus.parsers.pyexpat' extension
gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS -DXML_DTD
-DEXPAT_VERSION=0x010200 -Iextensions/expat/xmltok
-Iextensions/expat/xmlparse -I/usr/local/include/python2.1 -c
extensions/pyexpat.c -o build/temp.linux-i586-2.1/pyexpat.o
extensions/pyexpat.c: In function `getcode':
extensions/pyexpat.c:262: warning: passing arg 11 of `PyCode_New' makes
pointer from integer without a cast
extensions/pyexpat.c:262: too few arguments to function `PyCode_New'
extensions/pyexpat.c: In function `call_with_frame':
extensions/pyexpat.c:289: too few arguments to function `PyFrame_New'
error: command 'gcc' failed with exit status 1


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=131797&group_id=6473


From don_wakefield@mentorg.com  Sat Feb 10 02:05:55 2001
From: don_wakefield@mentorg.com (Don Wakefield)
Date: Fri, 9 Feb 2001 18:05:55 -0800 (PST)
Subject: [XML-SIG] Using PyExpat.py
Message-ID: <14980.41475.888529.565845@gargle.gargle.HOWL>

I'm trying to construct a DOM using PyExpat.py. My environment is:

   Python 1.5.2
   PyXML 0.6.2

Here's the simple code. I've added a few lines like

  reader._override = None

to get past errors that I didn't understand until I came to this
point. Now I don't know what more to do...
------------------------------------------
<CODE>
from xml.parsers import pyexpat
from xml.dom.ext.reader import PyExpat

import time

class Cells:
    def __init__(self, filename):
   
        try:
            reader = PyExpat.Reader()
            reader._override = None
            fp = open(filename, 'r')
            xml_dom_object = reader.fromStream(fp)
        except Exception, msg:
            print "Exception caught:", msg
            return

        self.root = xml_dom_object.documentElement

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        start = time.clock()
        x = Cells(sys.argv[1])
        end   = time.clock()
        print "Finished loading:", end - start
        
    else:
        print "Usage: python %s [XML-filename]" % sys.argv[0]
</CODE>

Here is the output of a run:
----------------------------
<343 : /user/donw/src/Demo/bigproto> !334
python_ic Time2.py big.xml
Exception caught: pyexpat
Finished loading: 0.0
----------------------------

Can anybody tell me what I'm doing wrong? The goal here is to use
pyexpat.so to speed the building of the DOM. Thanks for any comments.

-- 
Don Wakefield                              Mentor Graphics Corporation
(503) 685-1262                             8005 S.W. Boeckman Road    
don_wakefield@mentorg.com                  Wilsonville, OR 97070-7777


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 10 07:00:33 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 10 Feb 2001 08:00:33 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <14980.41475.888529.565845@gargle.gargle.HOWL> (message from Don
 Wakefield on Fri, 9 Feb 2001 18:05:55 -0800 (PST))
References: <14980.41475.888529.565845@gargle.gargle.HOWL>
Message-ID: <200102100700.f1A70X701220@mira.informatik.hu-berlin.de>

> I'm trying to construct a DOM using PyExpat.py. My environment is:
> 
>    Python 1.5.2
>    PyXML 0.6.2
[...]
>         try:
>             reader = PyExpat.Reader()
>             reader._override = None
>             fp = open(filename, 'r')
>             xml_dom_object = reader.fromStream(fp)
>         except Exception, msg:
>             print "Exception caught:", msg
>             return
[...]
> Can anybody tell me what I'm doing wrong?

That's hard to say. First, a number of changes have been made since
0.6.2; I can't reproduce your problem. In any case, I recommend letting
the exception propagate instead of printing it this way: it is
much more informative to get a full traceback, and full information
about the exception.
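
A small sketch of the difference, written for a present-day Python 3
interpreter (the thread itself uses 1.5.2 syntax). The load() function
and the undefined name inside it are hypothetical stand-ins for the
failing reader call:

```python
import traceback


def load(path):
    # Stand-in for the failing call: the name below is deliberately undefined.
    return undefined_parser.parse(path)  # noqa: F821


# Printing only the message hides the exception type and its location.
try:
    load("big.xml")
except Exception as exc:
    short = "Exception caught: %s" % exc

# Formatting the traceback keeps the type, the file and the line.
try:
    load("big.xml")
except Exception:
    full = traceback.format_exc()

print(short)
print(full)
```

Dropping the try/except altogether gives the same full report and also
stops the program at the point of failure.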

Regards,
Martin


From don_wakefield@mentorg.com  Sat Feb 10 18:52:20 2001
From: don_wakefield@mentorg.com (Don Wakefield)
Date: Sat, 10 Feb 2001 10:52:20 -0800 (PST)
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102100700.f1A70X701220@mira.informatik.hu-berlin.de>
References: <14980.41475.888529.565845@gargle.gargle.HOWL>
 <200102100700.f1A70X701220@mira.informatik.hu-berlin.de>
Message-ID: <14981.36324.913804.941652@gargle.gargle.HOWL>

>>>>> "Martin" == Martin v Loewis <martin@loewis.home.cs.tu-berlin.de> writes:

Martin> I recommend letting the exception propagate instead of printing
Martin> it this way

Thanks. I tried that, and got the following traceback:

python_ic Timediag.py cs39.xml
Traceback (innermost last):
  File "Timediag.py", line 20, in ?
    x = Cells(sys.argv[1])
  File "Timediag.py", line 12, in __init__
    xml_dom_object = reader.fromStream(fp)
  File "/wv/icdet/python_src/12-19-00/BUILD_AREA/ss6/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 65, in fromStream
    self.initParser()
  File "/wv/icdet/python_src/12-19-00/BUILD_AREA/ss6/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 51, in initParser
    self.parser=pyexpat.ParserCreate()
NameError: pyexpat

But if I start python from the command line, I can do:

<47 : /user/donw/src/Demo/bigproto> python
Python 1.5.2 (#1, Dec 20 2000, 08:50:14)  [GCC 2.9-mentor-98r2p24] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> from xml.parsers import pyexpat
>>> parser=pyexpat.ParserCreate() 
>>> ^D

So my environment is fine. PyExpat.py does not import pyexpat, but I do
in my calling test script:

  from xml.parsers import pyexpat
  from xml.dom.ext.reader import PyExpat

Martin> That's hard to say. First, a number of changes have been made since
Martin> 0.6.2; I can't reproduce your problem.

Note that I've downloaded PyXML-0.6.3 from Sourceforge (haven't
installed it yet) and PyExpat.py in *that* version does not import
pyexpat either. So if you are not able to duplicate the problem with
that version, it must be something deeper...

I'll try installing 0.6.3 and hammer on it for awhile. Note that I'm
able to build a DOM using the lines:

   from xml.dom import ext
   from xml.dom.ext.reader import Sax2
      :
      :
   xml_dom_object = Sax2.FromXmlFile(filename, validate=0)

I'm just hoping to build a DOM with expat to improve performance. Thanks
for any suggestions.

-- 
Don Wakefield                              Mentor Graphics Corporation
(503) 685-1262                             8005 S.W. Boeckman Road    
don_wakefield@mentorg.com                  Wilsonville, OR 97070-7777


From uche.ogbuji@fourthought.com  Sat Feb 10 21:07:40 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 10 Feb 2001 14:07:40 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Don Wakefield <don_wakefield@mentorg.com>
 of "Sat, 10 Feb 2001 10:52:20 PST." <14981.36324.913804.941652@gargle.gargle.HOWL>
Message-ID: <200102102107.OAA00904@localhost.localdomain>

> I'll try installing 0.6.3 and hammer on it for awhile. Note that I'm
> able to build a DOM using the lines:
> 
>    from xml.dom import ext
>    from xml.dom.ext.reader import Sax2
>       :
>       :
>    xml_dom_object = Sax2.FromXmlFile(filename, validate=0)
> 
> I'm just hoping to build a DOM with expat to improve performance. Thanks
> for any suggestions.

I do recommend the upgrade, and 0.6.4 is on its way.

As a forewarning, the 0.6.3 and up way is

from xml.dom.ext.reader import PyExpat     #or Sax2
reader = PyExpat.Reader()
xml_dom_object = reader.fromUri(filename)  #should work for either URL or file

Good luck.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From guido@digicool.com  Sat Feb 10 22:13:23 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 10 Feb 2001 17:13:23 -0500
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Your message of "Sat, 10 Feb 2001 14:07:40 MST."
 <200102102107.OAA00904@localhost.localdomain>
References: <200102102107.OAA00904@localhost.localdomain>
Message-ID: <200102102213.RAA28403@cj20424-a.reston1.va.home.com>

> xml_dom_object = reader.fromUri(filename)  #should work for either URL or file

Let's talk about this comment.  Is it really a good idea to build URL
access right into the API here?  For apps that need this, it's trivial
to write as long as the reader takes an open file object ("stream") as
an alternative to a filename: just call urllib.urlopen(uri) and pass
it as the argument.

Case in point: I found this bit in saxutilx.py:

        if os.path.isfile(sysid):
            basehead = os.path.split(os.path.normpath(base))[0]
            source.setSystemId(os.path.join(basehead, sysid))
            f = open(sysid, "rb")
        else:
            source.setSystemId(urlparse.urljoin(base, sysid))
            f = urllib.urlopen(source.getSystemId())

Now I don't know under which circumstances this gets triggered (the
context is obscure), but I'd say it's a bad idea to just try to open a
URL when a string isn't a local file.  Maybe *you* live in a world
where the network is "always on" (and I do too!), but for plenty of
folks, it's rather annoying to find that their modem starts dialing
out each time they make a typo in a filename.

Besides, the syntax for local filenames and URLs is not the same; the
quoting conventions are different and it's quite possible to find that
the same name could be either a URL or a filename, with vastly
different interpretations.  (See nturl2path.)  Without more context,
it's unclear which syntax should be tried first.  The application
knows this, but the library doesn't.  It's also fine to have an
alternative API that takes a URL instead of a local filename -- but
it's not okay to attempt to overlap the two namespaces.
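
The separation argued for above could look roughly like this. It is
only a sketch assuming Python 3's standard library (urllib.request and
xml.dom.minidom), and the three function names are hypothetical, not
part of PyXML:

```python
import io
import urllib.request
from xml.dom import minidom


def parse_stream(stream):
    """Parse XML from an already-open binary file-like object."""
    return minidom.parse(stream)


def parse_file(path):
    """Treat the argument strictly as a local filename."""
    with open(path, "rb") as f:
        return parse_stream(f)


def parse_url(url):
    """Treat the argument strictly as a URL; the caller opts in to the network."""
    with urllib.request.urlopen(url) as f:
        return parse_stream(f)


# The stream entry point works on anything file-like, no guessing involved.
doc = parse_stream(io.BytesIO(b"<spam><eggs>toast</eggs></spam>"))
print(doc.documentElement.tagName)
```

With this split a mistyped filename fails in parse_file() with an
ordinary file error and never reaches the network.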

--Guido van Rossum (home page: http://www.python.org/~guido/)


From uche.ogbuji@fourthought.com  Sun Feb 11 00:41:24 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 10 Feb 2001 17:41:24 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Guido van Rossum <guido@digicool.com>
 of "Sat, 10 Feb 2001 17:13:23 EST." <200102102213.RAA28403@cj20424-a.reston1.va.home.com>
Message-ID: <200102110041.RAA09847@localhost.localdomain>

> > xml_dom_object = reader.fromUri(filename)  #should work for either URL or file
> 
> Let's talk about this comment.  Is it really a good idea to build URL
> access right into the API here?  For apps that need this, it's trivial
> to write as long as the reader takes an open file object ("stream") as
> an alternative to a filename: just call urllib.urlopen(uri) and pass
> it as the argument.

Yes, but XML's interactions with URI are by no means straightforward.  The 
reason that URIs are built into so many APIs side-by-side with stream APIs 
(and this is the case in all implementations I know of, Python or not) is to 
allow a smooth interface to all the URI complications XML brings about, mainly 
the network of rules for look-up according to base-URI resolution.  Basically, 
in XML just about everything is a URI.  Some implementations (such as PySAX) 
resolve to local file names merely as a convenience to the user.

And, for instance, there is the matter that URIs are a superset of URL, and 
esoterica such as URNs actually do exist in the XML fairy land.

> Case in point: I found this bit in saxutilx.py:
> 
>         if os.path.isfile(sysid):
>             basehead = os.path.split(os.path.normpath(base))[0]
>             source.setSystemId(os.path.join(basehead, sysid))
>             f = open(sysid, "rb")
>         else:
>             source.setSystemId(urlparse.urljoin(base, sysid))
>             f = urllib.urlopen(source.getSystemId())
> 
> Now I don't know under which circumstances this gets triggered (the
> context is obscure), but I'd say it's a bad idea to just try to open a
> URL when a string isn't a local file.  Maybe *you* live in a world
> where the network is "always on" (and I do too!), but for plenty of
> folks, it's rather annoying to find that their modem starts dialing
> out each time they make a typo in a filename.

I think this is a good point in general, but the attitude embodied in many 
XML practices is just this "always on" mentality.  This matter is the subject 
of debate every month or so on XML-DEV.  In fact, there are far more nasty 
implications of XML's URI-happiness than just the modem dialing example.

But I must say: unless urllib is broken, I don't see why this would cause any 
modem dialing in any environment other than Windows, where unfortunately drive 
specifiers look like URL schemes.

And even in Windows, why would this cause dialing in a case other than when 
someone has ill-advisedly set up a share drive called http: or ftp:?
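
To illustrate the drive-specifier clash, a small sketch assuming
Python 3's urllib.parse (not the urllib of this thread's era). The
Windows path is a hypothetical example:

```python
from urllib.parse import urlsplit

candidates = [
    "http://spam.com/hello%20world",  # a real URL
    "C:\\data\\cs39.xml",             # a Windows path (hypothetical)
    "cs39.xml",                       # a relative local filename
]

# Ask the URL parser which scheme, if any, it sees in each string.
schemes = {name: urlsplit(name).scheme for name in candidates}

for name, scheme in schemes.items():
    print("%-32s scheme=%r" % (name, scheme))
```

The drive letter is read as a one-letter scheme, while the bare
filename has no scheme at all.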

> Besides, the syntax for local filenames and URLs is not the same;

I didn't know there was any universal syntax for local filenames.

> the quoting conventions are different and it's quite possible to find that
> the same name could be either a URL or a filename, with vastly
> different interpretations.

I don't see where this is a problem.  If someone wants file "hello\\ world" on 
his local drive, he can just specify it as so, and if someone wants 
"http://spam.com/hello%20world", he can just specify it as so.  If he tries to 
resolve "http://spam.com/hello\\ world", he should get a malformed URL error 
from his user agent or library.

The solution is to use URL quoting if you want a URL, and your local quoting 
convention if you want a local file.
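
As a sketch of the two conventions, assuming Python 3's urllib.parse
and reusing the "hello world" name from above:

```python
from urllib.parse import quote, unquote

local_name = "hello world"

# URL quoting turns the space into a percent escape.
url_form = quote(local_name)

# Unquoting recovers the original local spelling.
round_trip = unquote(url_form)

print(url_form)
print(round_trip)
```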

> (See nturl2path.)

Ah.  I don't claim to be able to speak intelligently about Windows NT.

> Without more context,
> it's unclear which syntax should be tried first. The application
> knows this, but the library doesn't.  It's also fine to have an
> alternative API that takes a URL instead of a local filename -- but
> it's not okay to attempt to overlap the two namespaces.

Actually, the library does know.  There is very little about XML that has 
anything to do with file names.  Pretty much everything is a URI.  In most 
cases, the library's trying to resolve a file name first is merely a 
convenience to the user so that he doesn't need to deal with URI arcana for 
local resources, say by typing "file:" before every path.  If anything is to be 
done, I'd say this convenience should be taken away.  But I don't see a 
problem big enough to warrant doing so.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From don_wakefield@mentorg.com  Sun Feb 11 01:45:20 2001
From: don_wakefield@mentorg.com (Don Wakefield)
Date: Sat, 10 Feb 2001 17:45:20 -0800 (PST)
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102102107.OAA00904@localhost.localdomain>
References: <don_wakefield@mentorg.com>
 <14981.36324.913804.941652@gargle.gargle.HOWL>
 <200102102107.OAA00904@localhost.localdomain>
Message-ID: <14981.61104.276292.699921@gargle.gargle.HOWL>

>>>>> "Uche" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

Uche> I do recommend the upgrade, and 0.6.4 is on its way.

I installed 0.6.3, and immediately encountered several problems. Part of
this may be my freshness to Python. My environment may not be complete
in some way. First things first:

 - Using the code supplied by Uche below, I got a complaint of 'os' not
   being visible within PyExpat.py. It isn't imported there, and my
   importing it into my calling script didn't help. I had to add the
   'import os' to the top of PyExpat.py to eliminate this error.

- The next problem was this:

  > python Timediag.py cs39.xml
  Traceback (innermost last):
    File "Timediag.py", line 19, in ?
      x = Cells(sys.argv[1])
    File "Timediag.py", line 11, in __init__
      xml_dom_object = reader.fromUri(filename)
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 80, in fromUri
      rt = self.fromStream(stream, doc,stripElements)
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 64, in fromStream
      if not self._override:
  AttributeError: _override

  And indeed this variable is not defined in PyExpat.py. I added a line to my own script thusly:

    reader._override = None

  This eliminated that error, allowing me to move on to the next one.

- Here is the next traceback:

  > python Timediag.py cs39.xml
  Traceback (innermost last):
    File "Timediag.py", line 20, in ?
      x = Cells(sys.argv[1])
    File "Timediag.py", line 12, in __init__
      xml_dom_object = reader.fromUri(filename)
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 80, in fromUri
      rt = self.fromStream(stream, doc,stripElements)
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 65, in fromStream
      self.initParser()
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 51, in initParser
      self.parser=pyexpat.ParserCreate()
  NameError: pyexpat

  Based on my experience with 'os', I placed the line 'from xml.parsers import pyexpat'
  directly into PyExpat.py. Now that error has gone away...

- I ran again, and got:

  > python Timediag.py cs39.xml
  Traceback (innermost last):
    File "Timediag.py", line 20, in ?
      x = Cells(sys.argv[1])
    File "Timediag.py", line 12, in __init__
      xml_dom_object = reader.fromUri(filename)
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 81, in fromUri
      rt = self.fromStream(stream, doc,stripElements)
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 67, in fromStream
      self.initState(doc, stripElements)
  TypeError: too many arguments; expected 2, got 3

  Checking in PyExpat.py, I indeed discovered that the caller was
  supplying stripElements, while the method did not take such an
  argument. I.e.:

  class Reader:
        :
     def initState(self, doc=None):
        :
     def fromStream(self, stream, doc=None, stripElements=None):
        if not self._override:
            self.initParser()
        self.initState(doc, stripElements)

  I've stopped here.

Admittedly some of these problems may stem from my limited understanding
of Python modules and/or namespaces. But the checking of an undefined
variable, and the calling of a method with more than the defined number
of arguments, leads me to believe that I've somehow managed to pick up a
corrupted/scrambled version of PyXML-0.6.3.tar.gz. Is this possible? Or
do 0.6.2 and 0.6.3 just work better with Python 2.0? I'm currently stuck
with 1.5.2, so I hope not.
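
On the namespace question, the 'os' and 'pyexpat' errors above are
what Python does by design: each module has its own global namespace,
so an import in the calling script is not visible inside another
module. A minimal sketch, assuming Python 3 and using a made-up
in-memory module named helper in place of PyExpat.py:

```python
import os
import types

# Build a stand-in module whose source uses `os` without importing it.
helper = types.ModuleType("helper")
source = "def exists(path):\n    return os.path.exists(path)\n"
exec(source, helper.__dict__)

# The caller has imported os, but helper's own namespace has not.
try:
    helper.exists(".")
    error = None
except NameError as exc:
    error = str(exc)
print("before:", error)

# Equivalent of adding "import os" at the top of the helper module.
helper.os = os
print("after:", helper.exists("."))
```

So a module that uses a name it never imports is broken regardless of
what the caller imports.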

Uche> As a forewarning, the 0.6.3 and up way is

Uche> from xml.dom.ext.reader import PyExpat     #or Sax2
Uche> reader = PyExpat.Reader()
Uche> xml_dom_object = reader.fromUri(filename)  #should work for either URL or file

By the way, thanks for all the friendly advice so far. I've noticed that
this list has more traffic by far relating to development work than
questions like mine, so I hope this isn't an intrusion.

-- 
Don Wakefield                              Mentor Graphics Corporation
(503) 685-1262                             8005 S.W. Boeckman Road    
don_wakefield@mentorg.com                  Wilsonville, OR 97070-7777


From uche.ogbuji@fourthought.com  Sun Feb 11 04:19:12 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 10 Feb 2001 21:19:12 -0700
Subject: [XML-SIG] 4Suite 0.10.2 beta 1
Message-ID: <200102110419.VAA19771@localhost.localdomain>

The 4Suite and 4SS 0.10.2 releases are about a week behind schedule.  Fingers 
crossed for Monday or Tuesday.  Great stuff in the offing, though.

I've posted a beta.  Source only for now, but Windows binaries should be along 
soon.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b1.tar.gz

Please help us find and squash the remaining bugs.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sun Feb 11 04:33:09 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 10 Feb 2001 21:33:09 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Don Wakefield <don_wakefield@mentorg.com>
 of "Sat, 10 Feb 2001 17:45:20 PST." <14981.61104.276292.699921@gargle.gargle.HOWL>
Message-ID: <200102110433.VAA22469@localhost.localdomain>

> 
> >>>>> "Uche" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:
> 
> Uche> I do recommend the upgrade, and 0.6.4 is on its way.
> 
> I installed 0.6.3, and immediately encountered several problems. Part of
> this may be my freshness to Python. My environment may not be complete
> in some way. First things first:

[Tale of woes snipped]

Ouch!  I don't use PyXML standalone, but even so I would have imagined screams 
from every quarter if 0.6.3 was really so broken.  I suspect something might 
have gone wrong with your installation.  I'd suggest either using

python setup.py install -f

To force file overwrites or just blow away the _xmlplus directory in your 
Python library and reinstall.

Here are the results I get with Python 2.1a2 and 4Suite 0.10.2beta1 (which 
includes an updated PyXML).  Should be the same results with Python 1.5 or 2.0.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b1.tar.gz

[uogbuji@borgia uogbuji]$ cat test.xml 
<spam>
  <eggs>toast</eggs>
</spam>
[uogbuji@borgia uogbuji]$ python
Python 2.1a2 (#1, Feb  3 2001, 14:38:13) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.dom.ext.reader import PyExpat
>>> from xml.dom.ext import Print
>>> reader = PyExpat.Reader()
>>> xml_dom_object = reader.fromUri('test.xml')
>>> Print(xml_dom_object)
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE spam><spam>
  <eggs>toast</eggs>
</spam>>>> 
>>> 

Huh?  Where'd that broken doctype come from?  Looks as if I found my own first 
beta bug.

Anyway, in general, you can see that the PyExpat reader works in 4Suite 
0.10.2beta1

Note that if your need is for speed and your pattern is just parse and read, 
you might want to consider cDomlette (in 4Suite only) which is *very* fast, 
but read-only:

[uogbuji@borgia uogbuji]$ python
Python 2.1a2 (#1, Feb  3 2001, 14:38:13) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Ft.Lib import cDomlette
>>> reader = cDomlette.RawExpatReader()
>>> xml_dom_object = reader.fromUri('test.xml')
>>> from xml.dom.ext import Print
>>> Print(xml_dom_object)
<?xml version='1.0' encoding='UTF-8'?><spam>
  <eggs>toast</eggs>
</spam>>>> 
>>> 

Hmm.  Interesting BTW: no broken doctype.  My guess is that the PyExpat reader 
is inserting an incomplete DocumentType node, but again, this seems to be 
unrelated to your problems with PyXML 0.6.3.

> Uche> As a forewarning, the 0.6.3 and up way is
> 
> Uche> from xml.dom.ext.reader import PyExpat     #or Sax2
> Uche> reader = PyExpat.Reader()
> Uche> xml_dom_object = reader.fromUri(filename)  #should work for either URL or file
> 
> By the way, thanks for all the friendly advice so far. I've noticed that
> this list has more traffic by far relating to development work than
> questions like mine, so I hope this isn't an intrusion.

Not even close.  Your messages are *right* on-topic, and highly appreciated.  
We love to hear all the field-testing reports we can.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From don_wakefield@mentorg.com  Sun Feb 11 18:35:58 2001
From: don_wakefield@mentorg.com (Don Wakefield)
Date: Sun, 11 Feb 2001 10:35:58 -0800 (PST)
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102110433.VAA22469@localhost.localdomain>
References: <don_wakefield@mentorg.com>
 <14981.61104.276292.699921@gargle.gargle.HOWL>
 <200102110433.VAA22469@localhost.localdomain>
Message-ID: <14982.56206.740790.679411@gargle.gargle.HOWL>

>>>>> "Uche" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

Uche> [Tale of woes snipped]

Uche> [...] I'd suggest either using

Uche> python setup.py install -f

Uche> To force file overwrites or just blow away the _xmlplus directory in your 
Uche> Python library and reinstall.

Here's an interesting discrepancy. I don't *have* an _xmlplus directory
in my Python library. I instead have, starting from PYTHONHOME:
lib/python1.5/site-packages/xml. When I installed PyXML-0.6.3, I mv'ed
xml away, and sure enough, 'python setup.py install --prefix=$MYDIR' put
a new xml directory there, not _xmlplus.

Just for chuckles, I tried it your way, and still only got an xml
directory. Running your test case with test.xml, I get the following
after the '-f' install:

  <37 : /user/donw/src/Demo/bigproto> python
  Python 1.5.2 (#1, Feb 10 2001, 16:25:02)  [GCC 2.9-mentor-98r2p24] on sunos5
  Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
  >>> from xml.dom.ext.reader import PyExpat
  >>> from xml.dom.ext import Print
  >>> reader = PyExpat.Reader()
  >>> xml_dom_object = reader.fromUri('test.xml') 
  Traceback (innermost last):
    File "<stdin>", line 1, in ?
    File "/user/donw/src/local/ss5/obj/lib/python1.5/site-packages/xml/dom/ext/reader/PyExpat.py", line 76, in fromUri
      if os.path.exists(uri):
  NameError: os
  >>> ^D

So I'm seeing the same cascade of problems... At least, PyExpat.py
doesn't look any different from my last try at an install...

I'll work on this more during the week. I'm beginning to think that I'm
missing stuff anyway. Does PyXML require anything in the Python
environment other than what comes by default? For instance, PyExpat.py
has the following line (fragment quoted):

   raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR...

But to the best of my knowledge, I don't have any Ft module anywhere in
my Python install... Could this be part of the problem?
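
One way to check whether a package such as Ft is present at all,
sketched for a present-day Python 3 (importlib did not exist in 1.5.2,
where a plain "import Ft" at the prompt answers the same question).
The is_installed() helper is hypothetical:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("xml", "Ft"):
    print(name, is_installed(name))
```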

Uche> Note that if your need is for speed and your pattern is just parse and read, 
Uche> you might want to consider cDomlette (in 4Suite only) which is *very* fast, 
Uche> but read-only:

Thanks. Some of my usages will be read-only, so I'll try this out
(probably on Monday, since Sundays are busy and I'm fighting a cold ;^)~ ).

Uche> Not even close.  Your messages are *right* on-topic, and highly appreciated.  
Uche> We love to hear all the field-testing reports we can.

Thanks for the encouragement!

-- 
Don Wakefield                              Mentor Graphics Corporation
(503) 685-1262                             8005 S.W. Boeckman Road    
don_wakefield@mentorg.com                  Wilsonville, OR 97070-7777


From guenter.radestock@sap.com  Mon Feb 12 10:36:06 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Mon, 12 Feb 2001 11:36:06 +0100
Subject: [XML-SIG] windows installer for XML package failing on Windows 95
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90E9F@dbwdfx14.wdf.sap-ag.de>

Hello,

I tried to install the XML package onto a Windoze 95 box a few days ago
and it does not work.  The installer crashes without unpacking source
or opening any window.  This may be a distutils issue.


First: I can successfully unpack the executable with winzip and move
the package directory into Python20/Lib.  This seems to work, but I
am not sure if I should also patch any existing files.  Is there a script
inside the installer that I should run after unpacking?  I did not
find a setup.py; the source package won't help me because I would
have to install a compiler for the extensions, right?


Second: To get the problem (distutils or not?) fixed, I have observed
the following:

1. The installer crashes only on this one Libretto 50ct Laptop with
Windows 95, second edition.  I have successfully used it on other
Windows computers.

2. Before installing the XML package, I first removed Python 1.5.2, then
removed the TCL/TK that came with 1.5.2, then installed Python 2.0.  I did
not have Python 1.5.2 on the other systems I installed the package on.
I also have an older version of Winzip (I don't remember exactly which) on the
Libretto - can the Winzip DLL be the source of my problem?

- Guenter


From guenter.radestock@sap.com  Mon Feb 12 10:47:13 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Mon, 12 Feb 2001 11:47:13 +0100
Subject: [XML-SIG] Parsing DTDs
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EA0@dbwdfx14.wdf.sap-ag.de>

Hello,

in a current project, I want to parse simple DTDs and generate a kind
of recursive descent parser from them.  I have built a few of these
parsers and they work well.  I wanted to use a DTD parser from the
XML utilities to do the DTD parsing.

Looking into it, I have some problems - maybe someone who knows
the utilities better can give me a hint about what to do next.

1. I have not seen much documentation for the XML package.  Is anybody
currently working on documentation?  Is there any way to extract
documentation from the classes?

2. There is a DTD parser inside xmlproc.  This seems to be pretty closely
coupled to the validating XML parser.  At first sight it looks like it
gets very low level DTD events and generates finite state automata
objects among other things used to validate XML later on.  It looks
like there is no intermediate representation of the DTD that can (or should)
be used for other purposes than validating XML.  Is this correct?  Have
I looked at the wrong piece of code (i.e. is there something in the
4suite package I could use)?

Thanks in advance for any help.
- Guenter


From larsga@garshol.priv.no  Mon Feb 12 11:05:31 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Feb 2001 12:05:31 +0100
Subject: [XML-SIG] Parsing DTDs
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90EA0@dbwdfx14.wdf.sap-ag.de>
References: <FAFE609CB754D311B60C0008C75D355608C90EA0@dbwdfx14.wdf.sap-ag.de>
Message-ID: <m3y9vcb1qc.fsf@lambda.garshol.priv.no>

* Guenter Radestock
| 
| 2. There is a DTD parser inside xmlproc. 

Yup. It is documented at 
  <URL: http://www.garshol.priv.no/download/software/xmlproc/ >

This documentation should also be in the XML-SIG CVS.

| This seems to be pretty closely coupled to the validating XML
| parser.

It is not. The DTD API consists of two parts: an event-based parser
and an object structure for representing DTDs that also implements
the application interface of the event-based parser.

The event-based parser is not tied to the validating XML parser at
all. The DTD structure needs a reference to the event-based parser to
produce error messages. This is a weakness of the current design, but
shouldn't really cause any problems for your application.

| At first sight it looks like it gets very low level DTD events and
| generates finite state automata objects among other things used to
| validate XML later on.  It looks like there is no intermediate
| representation of the DTD that can (or should) be used for other
| purposes than validating XML.  Is this correct?

Yes, it is.  Look at the xmldtd module.  That contains the object
structure that is built by the parser.  The finite state automata are
used by the ElementType objects, and are hidden within their
interface.  You can get access to the information in them, but not the
automata themselves.

Do let me know if you have problems with the interface in any way.

--Lars M.


From Alexandre.Fayolle@logilab.fr  Mon Feb 12 11:15:05 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 12 Feb 2001 12:15:05 +0100 (CET)
Subject: [XML-SIG] Parsing DTDs
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90EA0@dbwdfx14.wdf.sap-ag.de>
Message-ID: <Pine.LNX.4.21.0102121205320.1144-100000@leo.logilab.fr>

On Mon, 12 Feb 2001, Radestock, Guenter wrote:

> 2. There is a DTD parser inside xmlproc.  This seems to be pretty closely
> coupled to the validating XML parser.  At first sight it looks like it
> gets very low level DTD events and generates finite state automata
> objects among other things used to validate XML later on.  It looks
> like there is no intermediate representation of the DTD that can (or should)
> be used for other purposes than validating XML.  Is this correct?  Have
> I looked at the wrong piece of code (i.e. is there something in the
> 4suite package I could use)?

You can access a DTD object that gets generated from the parsing. The
following sample code comes from the xmltools utility set that uses the
DTD information to generate contextual menus for an XML editor. There is
extensive API documentation on Lars Marius Garshol's page
(http://www.garshol.priv.no/download/software/xmlproc/)


-------------------------8<-------------------------------------
from xml.parsers.xmlproc.dtdparser import DTDParser
from xml.parsers.xmlproc.xmldtd import CompleteDTD

def parse_dtd_file(dtd_file,dtd_obj=None):
    parser = DTDParser()
    dtd = dtd_obj or CompleteDTD(parser)
    parser.set_dtd_consumer(dtd)
    parser.set_dtd_object(dtd)
    parser.parse_resource(dtd_file)
    parser.deref()
    return dtd

def getElementsName(child,dtd,list=None):
    """
    A recursive function that extracts the allowed element names from
    the complex output tuple of ElementType.get_content_model, e.g.

        (',', [('caption', '?'),
               ('|', [('col', '*'), ('colgroup', '*')], ''),
               ('thead', '?'), ('tfoot', '?'),
               ('|', [('tbody', '+'), ('tr', '+')], '')],
         '')

    (the allowed elements of the HTML tag <table>).

    child -- the complex tuple to be processed
    dtd   -- the dtd object from which the elements have been read
    list  -- the list in which the element names will be stored
    Returns the list.
    """
    templist = list or []
    # processes the case of child is None (occurs when element content
    # is specified to be ANY)
    if child is None:
        # the return list is set to all of the elements declared in the
        # DTD
        templist = dtd.get_elements()
    else :
        # if the penultimate element of the complex tuple is a list,
        # then we have to recursively process each element of the list.
        if type(child[-2])==type([]):
            for c in child[-2]:
                templist =  getElementsName(c,dtd,templist)
        # if the penultimate element of the complex tuple is a tuple,
        # then we have to recursively process this last tuple.
        elif type(child[-2])==type(()):
            templist = getElementsName(child[-2],dtd,templist)
        # else the penultimate element of the complex tuple is a string
        # containing an allowed element name. We just have to append it
        # to the return list.
        else:
            templist.append(child[-2])
    return templist

------------------------------8<----------------------------------------
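
The content-model walk above can also be sketched without xmlproc. This
is a minimal, self-contained Python 3 version that assumes only the tuple
shape shown in the docstring: groups are (separator, [children],
modifier) and leaves are (name, modifier). The names element_names and
TABLE_MODEL are made up for this illustration and are not part of
xmlproc or xmltools.

```python
# Assumed shape, taken from the docstring above:
#   group: (separator, [child, ...], modifier)
#   leaf:  (element_name, modifier)
TABLE_MODEL = (',', [('caption', '?'),
                     ('|', [('col', '*'), ('colgroup', '*')], ''),
                     ('thead', '?'), ('tfoot', '?'),
                     ('|', [('tbody', '+'), ('tr', '+')], '')],
               '')


def element_names(model, all_elements=()):
    """Return the element names mentioned in a content-model tuple."""
    # None stands for a content model of ANY: every declared element
    # is allowed, so the caller has to supply that list.
    if model is None:
        return list(all_elements)
    head = model[-2]
    if isinstance(head, list):
        # A group: collect the names of every child in order.
        names = []
        for child in head:
            names.extend(element_names(child, all_elements))
        return names
    if isinstance(head, tuple):
        # A single nested model.
        return element_names(head, all_elements)
    # A leaf: the penultimate item is the element name itself.
    return [head]


print(element_names(TABLE_MODEL))
```

Unlike the original, this variant builds a fresh list on every call
instead of threading an accumulator through the recursion.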


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From akuchlin@mems-exchange.org  Mon Feb 12 16:26:51 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 12 Feb 2001 11:26:51 -0500
Subject: [XML-SIG] XML icons
Message-ID: <20010212112651.B3637@thrak.cnri.reston.va.us>

A minor thing that could be added to the PyXML distribution would be
icons for representing downloadable XML content on Web pages.  In
April/May 1999, there was an xml-dev discussion of this, and numerous
candidates were submitted: 
           http://www.iol.ie/~alank/xml/icons.htm

A vote was held so people could choose their favorites, but that page now 
returns a 404:
        http://users.javanet.com/~sbrown/icons.html

Does anyone recall the results?  Or should we just pick some set of
graphics and ask the designer's permission to include them?  

--amk


From Alexandre.Fayolle@logilab.fr  Mon Feb 12 16:55:21 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 12 Feb 2001 17:55:21 +0100 (CET)
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <14982.56206.740790.679411@gargle.gargle.HOWL>
Message-ID: <Pine.LNX.4.21.0102121746480.16428-100000@leo.logilab.fr>

On Sun, 11 Feb 2001, Don Wakefield wrote:

> 
> Uche> To force file overwrites or just blow away the _xmlplus directory in your 
> Uche> Python library and reinstall.
> 
> Here's an interesting discrepancy. I don't *have* an _xmlplus directory
> in my Python library. I instead have, starting from PYTHONHOME:

Uche was wrong there. He forgot you were using Python 1.5.2. The _xmlplus
directory is only used on Python 2.0, to avoid a name conflict with the
xml package in the standard library.

> I'll work on this more during the week. I'm beginning to think that I'm
> missing stuff anyway. Does PyXML require anything in the Python
> environment other than what comes by default? For instance, PyExpat.py
> has the following line (fragment quoted):
> 
>    raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR...
> 
> But to the best of my knowledge, I don't have any Ft module anywhere in
> my Python install... Could this be part of the problem?

This sounds like a 4Suite problem. I won't attempt to solve it, but I can
give you an idea of what's going on. To the best of my knowledge, xml.dom
comes from 4Suite, an XML library from Fourthought. This small part of the
library is also part of PyXML. When used within 4Suite, it uses several
other modules in Ft.* (Ft stands for Fourthought). Periodically, changes
from the 4Suite CVS repository are committed to the PyXML CVS repository.
And sometimes changes that were not intended to get there slip in ;o)

I think this is what happened in this case.


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From guenter.radestock@sap.com  Mon Feb 12 17:26:29 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Mon, 12 Feb 2001 18:26:29 +0100
Subject: Expat crashing Python (was: RE: [XML-SIG] Parsing DTDs)
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EA4@dbwdfx14.wdf.sap-ag.de>

> > like there is no intermediate representation of the DTD 
> that can (or should)
> > be used for other purposes than validating XML.  Is this 
> correct?  Have
> > I looked at the wrong piece of code (i.e. is there something in the
> > 4suite package I could use)?
> 
> You can access a DTD object that gets generated from the parsing. The
> following sample code comes from the xmltools utility set 
> that uses the
> DTD information to generate contextual menus for an XML 
> editor. There is
> extensive API documentation on Lars Marius Garshol's page
> (http://www.garshol.priv.no/download/software/xmlproc/)
> 

Thanks a lot for the quick help.  It works perfectly well now.

There seems to be a problem in pyexpat.  It crashes, when I feed it
a file with an incorrect XML prefix, something like:

<?xml version="1.0" encoding="iso-8859-1" tralala="123"?>

or

<?xml encoding="iso-8859-1" ?>

I can reproduce this under Windows 2000, Python 2.0 (bombs out of python
with a memory error):

---
from xml.parsers import expat
po = expat.ParserCreate('ISO-8859-1')
po.Parse("""<?xml encoding="iso-8859-1" ?><test></test>""", 1)
---

***thinking a little***

Trying outside emacs, I see a stack trace before it bombs out, so I
inserted an exception handler:

---
from xml.parsers import expat
po = expat.ParserCreate('ISO-8859-1')
exc = None
try:
    po.Parse("""<?xml encoding="iso-8859-1" ?><test></test>""", 1)
except exc, arg:
    global xxx
    xxx = (exc, arg)
---

and now I get:

---
Traceback (most recent call last):
  File "C:\perforce\workplace\ims\dev\python-api\python\xml\expattest.py", line 9, in ?

SystemError: 'finally' pops bad exception
---

Seems to be a problem of some exception handler in the Expat module.


From Alexandre.Fayolle@logilab.fr  Mon Feb 12 17:38:11 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 12 Feb 2001 18:38:11 +0100 (CET)
Subject: [XML-SIG] Re: Expat crashing Python
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90EA4@dbwdfx14.wdf.sap-ag.de>
Message-ID: <Pine.LNX.4.21.0102121836000.16634-100000@leo.logilab.fr>

On Mon, 12 Feb 2001, Radestock, Guenter wrote:

> There seems to be a problem in pyexpat.  It crashes, when I feed it
> a file with an incorrect XML prefix, something like:

I think this has been fixed, or else the bug does not show up on Linux.

On a redhat 6.2 box, with python 1.5.2 and 4Suite0.10.2b1 (and whatever
version of PyXML comes bundled with it), I get:

Traceback (innermost last):
  File "<stdin>", line 2, in ?
xml.parsers.expat.error: syntax error: line 1, column 6


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From guenter.radestock@sap.com  Mon Feb 12 18:06:56 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Mon, 12 Feb 2001 19:06:56 +0100
Subject: [XML-SIG] RE: Expat crashing Python
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EA5@dbwdfx14.wdf.sap-ag.de>

> -----Original Message-----
> From: Alexandre Fayolle [mailto:Alexandre.Fayolle@logilab.fr]
> Sent: Montag, 12. Februar 2001 18:38
> To: Radestock, Guenter
> Cc: 'XML-SIG@python.org'
> Subject: Re: Expat crashing Python
> 
> 
> On Mon, 12 Feb 2001, Radestock, Guenter wrote:
> 
> > There seems to be a problem in pyexpat.  It crashes, when I feed it
> > a file with an incorrect XML prefix, something like:
> 
> I think this has been fixed, or else the bug does not show up 
> on Linux.
> 
> On a redhat 6.2 box, with python 1.5.2 and 4Suite0.10.2b1 
> (and whatever
> version of PyXML comes bundled with it), I get:
> 
> Traceback (innermost last):
>   File "<stdin>", line 2, in ?
> xml.parsers.expat.error: syntax error: line 1, column 6

Thanks again.  I tried to reproduce it, too, under Linux (SuSE 7), Windows Me
and another (NT) system without the XML package.  I could not reproduce it
on any of those systems.  Mine is Win2000.  Hope I can find out what the
problem is, or at least reproduce it.


From Alexandre.Fayolle@logilab.fr  Mon Feb 12 19:33:18 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 12 Feb 2001 20:33:18 +0100 (CET)
Subject: [XML-SIG] [ANN] VCalsSax : VCal parser with SAX API
Message-ID: <Pine.LNX.4.21.0102122031130.17379-100000@leo.logilab.fr>

We have just released vcalsax, which provides a vcal file parser with
a SAX API. It is thus possible to view such a file as a DOM tree, to
manipulate it as if it were XML data, and then to store it back in the
native format using an XSL Transformation or some other scheme.

It is easy to integrate vcalsax with the PyXML and 4Suite tools.

VCal is the file format used by many calendar programs, including
KOrganiser and Evolution.

http://www.logilab.org/vcalsax/
ftp://ftp.logilab.org/pub/vcalsax/


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From uche.ogbuji@fourthought.com  Mon Feb 12 21:30:40 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 12 Feb 2001 14:30:40 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Don Wakefield <don_wakefield@mentorg.com>
 of "Sun, 11 Feb 2001 10:35:58 PST." <14982.56206.740790.679411@gargle.gargle.HOWL>
Message-ID: <200102122130.OAA18163@localhost.localdomain>

> 
> >>>>> "Uche" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:
> 
> Uche> [Tale of woes snipped]
> 
> Uche> [...] I'd suggest either using
> 
> Uche> python setup.py install -f
> 
> Uche> To force file overwrites or just blow away the _xmlplus directory in your 
> Uche> Python library and reinstall.
> 
> Here's an interesting discrepancy. I don't *have* an _xmlplus directory
> in my Python library. I instead have, starting from PYTHONHOME:
> lib/python1.5/site-packages/xml. When I installed PyXML-0.6.3, I mv'ed
> xml away, and sure enough, 'python setup.py install --prefix=$MYDIR' put
> a new xml directory there, not _xmlplus.

Sorry, I forgot that in Python 1.5 it is the xml dir not the _xmlplus dir.

> I'll work on this more during the week. I'm beginning to think that I'm
> missing stuff anyway. Does PyXML require anything in the Python
> environment other than what comes by default? For instance, PyExpat.py
> has the following line (fragment quoted):
> 
>    raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR...
> 
> But to the best of my knowledge, I don't have any Ft module anywhere in
> my Python install... Could this be part of the problem?

Hmm.  I thought this was removed from PyXML 0.6.3.  The Ft module is part of 
4Suite.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From MichaelDyck@home.com  Tue Feb 13 08:42:01 2001
From: MichaelDyck@home.com (Michael Dyck)
Date: Tue, 13 Feb 2001 00:42:01 -0800
Subject: [XML-SIG] problems with PyXML 0.6.3
Message-ID: <3A88F359.991E26FD@home.com>

I downloaded PyXML-0.6.3.win32-py2.0.exe and ran it.
Here are some comments:

The first time I ran it, it installed into my existing _xmlplus directory,
which left some old files, which confused python.
Shouldn't the installer remove or rename the existing _xmlplus dir first?

xmldoc/README says it's "v0.6.2"

xmldoc/README could note that if you've just run an installer,
you don't have to do any of the "python setup.py ..." commands.
(At least, I *think* you don't have to.)

xmldoc/test:
    Either xmldoc/README or (new file) xmldoc/test/README should tell you
    how to run the tests in this dir (`python testxml.py -g', I think),
    and how to interpret what happens.  Similarly for subdirs.
    Maybe tests should be run automatically on installation.

    I had 2 tests fail:
    test test_sax crashed --
        exception.SystemError : 'finally' pops bad exception
    test test_saxdrivers crashed --
        exceptions.IOError : [Errno url error] unknown url type: 'c'

xmldoc/test/dom:
    When I tried `python test.py', I got "Error in syntax" right away.

When I ran one of my DOM programs, I got this exception:
        from xml.dom.Node import Node
    ImportError: No module named Node
When I tried removing the ".Node" from the import statement, the program ran
as before, so apparently that is the fix, but shouldn't this be noted fairly
prominently in xmldoc/README or xmldoc/README.dom?

xmldoc/doc/4DOM/index.html has links to ../PACKAGES.html and ../README.html,
which do not exist.

-Michael Dyck


From guenter.radestock@sap.com  Tue Feb 13 10:10:12 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Tue, 13 Feb 2001 11:10:12 +0100
Subject: [XML-SIG] problems with PyXML 0.6.3
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EA7@dbwdfx14.wdf.sap-ag.de>

>     I had 2 tests fail:
>     test test_sax crashed --
>         exception.SystemError : 'finally' pops bad exception
>     test test_saxdrivers crashed --
>         exceptions.IOError : [Errno url error] unknown url type: 'c'

The first is the same problem I tried to reproduce yesterday.  It
happens only on Windows NT or Windows 2000 with the installed
XML package.  I only wish I had some time to look into the coding
(I will try).  Downgrading to an older version of the expat extension
may help; the one supplied with Python2.0 does not have the problem.

The problem is that whenever you parse incorrect XML, Python may crash
instead of just raising an exception (very unfortunate e.g. in an
http server process).


From guenter.radestock@sap.com  Tue Feb 13 10:28:53 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Tue, 13 Feb 2001 11:28:53 +0100
Subject: [XML-SIG] Bug: XML Prolog from xml.sax.writer should contain version
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EA8@dbwdfx14.wdf.sap-ag.de>

The xml.sax.writer in 0.6.3 (and previous) will output a prolog like

<?xml encoding="iso-8859-1"?>

This is incorrect according to the XML 1.0 specification and will
not be parsed by expat.  When outputting an encoding, the writer
must also state the XML version number.  I changed the code in
sax/writer.py (unfortunately, I don't have diff available here):

    def startDocument(self):
        if self.__syntax.pic == "?>":
            lit = self.__syntax.lit
            s = '%sxml version="1.0" encoding%s%siso-8859-1%s' % (
                self.__syntax.pio, self.__syntax.vi, lit, lit)
            if self.__standalone:
                s = '%s standalone%s%s%s%s' % (
                    s, self.__syntax.vi, lit, self.__standalone, lit)
            self._write("%s%s\n" % (s, self.__syntax.pic))

Could somebody please fix this on SourceForge so that it will be OK in
the next release?

- Guenter
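
The rule behind this fix (version first, then the optional encoding and
standalone pseudo-attributes) can be sketched independently of
xml.sax.writer. This is a minimal Python 3 illustration; xml_declaration
is a hypothetical helper and is not part of PyXML.

```python
def xml_declaration(encoding=None, standalone=None, version="1.0"):
    """Build an XML declaration that always carries a version."""
    # version is mandatory and must come first.
    parts = ['version="%s"' % version]
    if encoding:
        parts.append('encoding="%s"' % encoding)
    if standalone is not None:
        # standalone takes the literal values "yes" or "no".
        parts.append('standalone="%s"' % ("yes" if standalone else "no"))
    return "<?xml %s?>" % " ".join(parts)


print(xml_declaration("iso-8859-1"))
print(xml_declaration("iso-8859-1", standalone=True))
```

Keeping the version unconditional means a caller cannot produce the
encoding-only prolog described above.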


From don_wakefield@mentorg.com  Tue Feb 13 18:28:21 2001
From: don_wakefield@mentorg.com (Don Wakefield)
Date: Tue, 13 Feb 2001 10:28:21 -0800 (PST)
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102122130.OAA18163@localhost.localdomain>
References: <don_wakefield@mentorg.com>
 <14982.56206.740790.679411@gargle.gargle.HOWL>
 <200102122130.OAA18163@localhost.localdomain>
Message-ID: <14985.31941.949839.528973@gargle.gargle.HOWL>

>>>>> "Uche" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

>> [...] When I installed PyXML-0.6.3 [...]

>> raise Ft.Lib.FtException(Ft.Lib.Error.XML_PARSE_ERROR...
>> 
>> But to the best of my knowledge, I don't have any Ft module anywhere
>> in my Python install [...]

Uche> Hmm.  I thought this was removed from PyXML 0.6.3.  The Ft module
Uche> is part of 4Suite.

Since I downloaded the 0.6.3 tarball from Sourceforge, and PyExpat.py
contains that line, there must have been a merge error...

-- 
Don Wakefield                              Mentor Graphics Corporation
(503) 685-1262                             8005 S.W. Boeckman Road    
don_wakefield@mentorg.com                  Wilsonville, OR 97070-7777


From MichaelDyck@home.com  Wed Feb 14 04:02:15 2001
From: MichaelDyck@home.com (Michael Dyck)
Date: Tue, 13 Feb 2001 20:02:15 -0800
Subject: [XML-SIG] problems with PyXML 0.6.3
References: <FAFE609CB754D311B60C0008C75D355608C90EA7@dbwdfx14.wdf.sap-ag.de>
Message-ID: <3A8A0347.FE109E3C@home.com>

"Radestock, Guenter" wrote:
> 
> >     I had 2 tests fail:
> >     test test_sax crashed --
> >         exception.SystemError : 'finally' pops bad exception
> >     test test_saxdrivers crashed --
> >         exceptions.IOError : [Errno url error] unknown url type: 'c'
> 
> The first is the same problem I tried to reproduce yesterday.  It
> happens only on Windows NT or Windows 2000 with the installed
> XML package.

I'm using Windows 95, so you can add that to the list.

-Michael Dyck


From MichaelDyck@home.com  Wed Feb 14 08:42:08 2001
From: MichaelDyck@home.com (Michael Dyck)
Date: Wed, 14 Feb 2001 00:42:08 -0800
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
Message-ID: <3A8A44E0.FEBD419C@home.com>

When I "import" a node from one document into another, it loses attributes.

To reproduce:
-------------
    from xml.dom import Document
    from xml.dom.ext.reader.Sax import FromXml
    from xml.dom.ext import PrettyPrint

    doc1 = FromXml("<foo a1='1' a2='2' a3='3'/>")
    original_node = doc1.documentElement
    PrettyPrint( original_node )

    doc2 = Document.Document( None )
    imported_node = doc2.importNode( original_node, deep=1 )
    PrettyPrint( imported_node )
-------------
prints out:
    <foo a2='2' a3='3' a1='1'/>
    <foo a1='1'/>

This happened with Python 2.0, and also happens with PyXML 0.6.3.
(I'm on Windows 95, if that makes a difference.)

I think the problem is somewhere near Element.__setstate__'s call to
setNamedItemNS.

If someone could provide a fix or workaround, I would appreciate it.

-Michael Dyck
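
As a reference point for the expected behaviour, the same experiment can
be written against xml.dom.minidom from a current Python 3 standard
library. This is a different DOM implementation from the 4DOM code in
PyXML 0.6.3, so it only illustrates what importNode should do, not where
the bug is; attribute_names is a hypothetical helper.

```python
from xml.dom import minidom

doc1 = minidom.parseString("<foo a1='1' a2='2' a3='3'/>")
original_node = doc1.documentElement

# A second, empty document that will own the imported copy.
doc2 = minidom.Document()
imported_node = doc2.importNode(original_node, True)  # True = deep copy


def attribute_names(element):
    """Return the element's attribute names in sorted order."""
    return sorted(element.attributes.keys())


print(attribute_names(original_node))
print(attribute_names(imported_node))
```

Both lists should contain a1, a2 and a3, and the imported node should
belong to doc2 rather than doc1.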


From jere.kahanpaa@helsinki.fi  Wed Feb 14 11:09:51 2001
From: jere.kahanpaa@helsinki.fi (Jere =?iso-8859-1?Q?Kahanp=E4=E4?=)
Date: Wed, 14 Feb 2001 13:09:51 +0200
Subject: [XML-SIG] Unicode support problems in parsers
Message-ID: <3A8A677F.B227C56B@helsinki.fi>

Dear XML/Python-gurus,

I've encountered a slight problem while using the otherwise quite
excellent PyXML package (version 0.6.2, IIRC). One of my functions
iterates through a long list of long XML files with varying encodings,
which makes it quite sensitive to both memory use and Unicode issues.
I'm using the DOM interface and read the XML data using

import xml.dom.ext.reader.Sax2
f = open('myfile')
doc = xml.dom.ext.reader.Sax2.FromXMLStream(f)
f.close()

Unfortunately the default parser seems to have serious memory
management problems: the total amount of used memory grows by 1-2
megabytes for each processed file. A forced garbage collection (this is
Py2.0) doesn't help at all. The most obvious solution was to use a
different parser - we needed a validating parser anyhow. And adding the
keyword 'validate=1' to the 'FromXMLStream' call did indeed solve the
memory leak bug. However, an even more serious problem was now
encountered: the default *validating* parser returns normal Python
strings, while the default parser returns Unicode strings, as any
sensible XML-processing tool should do. This behaviour causes any amount
of trouble elsewhere in the code: the PrettyPrinter, for example, doesn't
work at all with normal strings containing non-ASCII chars.

I don't have the names of the parsers with problems right here, but the
test runs were done on a Linux box with PyXML 0.6.2.

Yours
	Jere Kahanpää
	jere.kahanpaa@helsinki.fi
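
The behaviour asked for here, the same Unicode text regardless of the
file's declared encoding, can be illustrated with xml.dom.minidom from a
current Python 3 standard library. This is a sketch of the expectation
only; it does not use the PyXML readers discussed above, and
text_of_root and the two sample documents are made up for the example.

```python
from xml.dom import minidom

# The same one-element document in two different encodings.
LATIN1_DOC = (b'<?xml version="1.0" encoding="iso-8859-1"?>'
              b'<name>Kahanp\xe4\xe4</name>')
UTF8_DOC = ('<?xml version="1.0" encoding="utf-8"?>'
            '<name>Kahanp\u00e4\u00e4</name>').encode("utf-8")


def text_of_root(raw_bytes):
    """Parse raw_bytes and return the root element's text content."""
    doc = minidom.parseString(raw_bytes)
    try:
        return doc.documentElement.firstChild.data
    finally:
        # Break the DOM's internal reference cycles once we are done,
        # which helps when many large files are processed in a loop.
        doc.unlink()


print(text_of_root(LATIN1_DOC) == text_of_root(UTF8_DOC))
```

Calling unlink() after each file is the minidom way to release a tree
promptly; whether it would help with the leak reported above is not
something this sketch can show.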


From Alexandre.Fayolle@logilab.fr  Wed Feb 14 12:57:37 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Feb 2001 13:57:37 +0100 (CET)
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: <3A8A44E0.FEBD419C@home.com>
Message-ID: <Pine.LNX.4.21.0102141356100.2571-100000@sagittarius.logilab.fr>

On Wed, 14 Feb 2001, Michael Dyck wrote:

> When I "import" a node from one document into another, it loses attributes.

This is a known bug in the 4DOM version that shipped with PyXML 0.6.3. It
has been fixed in the CVS.

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From uche.ogbuji@fourthought.com  Wed Feb 14 14:20:01 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 07:20:01 -0700
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: Message from "Radestock, Guenter" <guenter.radestock@sap.com>
 of "Tue, 13 Feb 2001 11:10:12 +0100." <FAFE609CB754D311B60C0008C75D355608C90EA7@dbwdfx14.wdf.sap-ag.de>
Message-ID: <200102141420.HAA01615@localhost.localdomain>

> >     I had 2 tests fail:
> >     test test_sax crashed --
> >         exception.SystemError : 'finally' pops bad exception
> >     test test_saxdrivers crashed --
> >         exceptions.IOError : [Errno url error] unknown url type: 'c'
> 
> The first is the same problem I tried to reproduce yesterday.  It
> happens only on Windows NT or Windows 2000 with the installed
> XML package.  I only wish I had some time to look into the coding
> (I will try).  Downgrading to an older version of the expat extension
> may help; the one supplied with Python2.0 does not have the problem.

I think this may be the sort of problem Guido was pointing to this weekend.  
My guess is that you specified "c:\foo\bar.xml" as the file to parse, and the
software checked and saw that that file did not exist, and then tried to
interpret it as a URL.

So as usual, it seems the BDFL is right, but not for the reasons he originally 
gave.

So can we think of a better algorithm than the current "check for file, and if
it doesn't exist, just blindly toss it to urllib"?

I personally think it's more important to be able to interpret things as URLs
than to interpret things as file names.

Maybe a flag named "force_file_interpretation" or the like is in order.

This problem affects 4Suite as well.
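
One possible shape for such a check, sketched in current Python 3: look
at the scheme part of the argument and treat a one-letter scheme as a
Windows drive letter rather than a URL. The function name is_url and the
force_file parameter are assumptions made for this illustration, not an
existing PyXML or 4Suite API.

```python
from urllib.parse import urlsplit


def is_url(source, force_file=False):
    """Guess whether `source` names a URL rather than a local file."""
    if force_file:
        # The caller has said explicitly that this is a file name.
        return False
    scheme = urlsplit(source).scheme
    # "c:\foo\bar.xml" splits into the scheme "c"; a single letter is
    # far more likely to be a drive than a real URL scheme.
    return len(scheme) > 1


for candidate in ("http://example.org/doc.xml",
                  "c:\\foo\\bar.xml",
                  "bar.xml"):
    print(candidate, is_url(candidate))
```

This avoids touching the file system at all, so a missing file is
reported as a missing file instead of being handed to urllib.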


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Wed Feb 14 14:22:56 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 07:22:56 -0700
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: Message from Michael Dyck <MichaelDyck@home.com>
 of "Wed, 14 Feb 2001 00:42:08 PST." <3A8A44E0.FEBD419C@home.com>
Message-ID: <200102141422.HAA01626@localhost.localdomain>

> When I "import" a node from one document into another, it loses attributes.
> 
> To reproduce:
> -------------
>     from xml.dom import Document
>     from xml.dom.ext.reader.Sax import FromXml
>     from xml.dom.ext import PrettyPrint
> 
>     doc1 = FromXml("<foo a1='1' a2='2' a3='3'/>")
>     original_node = doc1.documentElement
>     PrettyPrint( original_node )
> 
>     doc2 = Document.Document( None )
>     imported_node = doc2.importNode( original_node, deep=1 )
>     PrettyPrint( imported_node )
> -------------
> prints out:
>     <foo a2='2' a3='3' a1='1'/>
>     <foo a1='1'/>
> 
> This happened with Python 2.0, and also happens with PyXML 0.6.3.
> (I'm on Windows 95, if that makes a difference.)
> 
> I think the problem is somewhere near Element.__setstate__'s call to
> setNamedItemNS.
> 
> If someone could provide a fix or workaround, I would appreciate it.

I think Jeremey fixed this in 4Suite, and we'll be checking this into PyXML.  
Hopefully, based on all the problems reported lately, there will soon be a 
PyXML 0.6.4.

After today's 4Suite release (yes, we're in final packaging.  Hooray!) we'll 
be removing 4DOM from the package so it lives completely in PyXML.  This 
should accelerate maintenance.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From jh@web.de  Wed Feb 14 14:39:16 2001
From: jh@web.de (Juergen Hermann)
Date: Wed, 14 Feb 2001 15:39:16 +0100
Subject: [XML-SIG] Python SOAP Implementations
Message-ID: <m14T35Q-000x5HC@smtp.web.de>

Hi!

I know of two SOAP implementations for Python:
 * soaplib.py by PythonWare, more or less beta software
 * Scarab - they WANT to implement SOAP; there's already a module named
   SOAP.py, but they're seemingly not ready yet

Any other implementations you know of?


Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/


From fdrake@acm.org  Wed Feb 14 14:36:09 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 14 Feb 2001 09:36:09 -0500 (EST)
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: <200102141422.HAA01626@localhost.localdomain>
References: <MichaelDyck@home.com>
 <3A8A44E0.FEBD419C@home.com>
 <200102141422.HAA01626@localhost.localdomain>
Message-ID: <14986.38873.297269.693330@cj42289-a.reston1.va.home.com>

Uche Ogbuji writes:
 > After today's 4Suite release (yes, we're in final packaging.
 > Hooray!) we'll be removing 4DOM from the package so it lives
 > completely in PyXML.  This should accelerate maintenance.

  Excellent news!
  Are you planning to update the PyXML CVS as soon as 4Suite is
released?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From Alexandre.Fayolle@logilab.fr  Wed Feb 14 14:48:42 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Feb 2001 15:48:42 +0100 (CET)
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: <200102141420.HAA01615@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr>

On Wed, 14 Feb 2001, Uche Ogbuji wrote:

> So can we think of a better algorithm than the current "check for file, and if 
> it doesn't exist, just blindly toss it to urllib"?

If running windows, and the second character of the 'url' is a colon,
replace it with a pipe and prepend file: to the url?

> This problem affects 4Suite as well.

I had to use a similar hack when generating a CATALOG file for Narval, for
use with xmlproc, since urllib would choke on C:\fooo\dtd_base\, and whine
until it got C|\fooo\dtd_base\

Maybe what we need is a new function in os.path or similar that would
perform the file -> URL conversion described above. This would ease the
work of application writers. I, for one, would be much more at ease if I
knew that no implicit assumptions are made on what I pass. If the API
requires an URI/URL, then this is what it should get. 
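
A minimal sketch of the heuristic described above, written in present-day
Python 3 rather than the Python 2.0 of this thread. The name name_to_url and
the on_windows flag are hypothetical, chosen only for illustration; no such
function exists in os.path.

```python
import os
import re


def name_to_url(name, on_windows=(os.name == "nt")):
    # Hypothetical helper: apply the drive-letter heuristic from the text.
    # A letter followed by a colon in the first two positions looks like a
    # Windows drive, so swap the colon for a pipe and prepend "file:".
    if on_windows and re.match(r"[A-Za-z]:", name):
        return "file:" + name[0] + "|" + name[2:]
    # Anything else is assumed to be a URL already and is passed through.
    return name


print(name_to_url(r"C:\fooo\dtd_base", on_windows=True))
print(name_to_url("http://www.logilab.com/", on_windows=True))
```

The flag is a parameter so that the behaviour can be tried on any platform;
a real implementation would still have to decide what to do with relative
paths and UNC names, which this sketch ignores.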

Opinions?


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From Alexandre.Fayolle@logilab.fr  Wed Feb 14 14:52:54 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Feb 2001 15:52:54 +0100 (CET)
Subject: [XML-SIG] Python SOAP Implementations
In-Reply-To: <m14T35Q-000x5HC@smtp.web.de>
Message-ID: <Pine.LNX.4.21.0102141549410.2624-100000@sagittarius.logilab.fr>

On Wed, 14 Feb 2001, Juergen Hermann wrote:

> Hi!
> 
> I know of two SOAP implementations for Python:
>  * soaplib.py by PythonWare, more or less beta software

We're using this in Narval. It works well. However it chokes on unicode
strings, so beware if you're planning to use it with python2.0

>  * Scarab - the WANT to implement SOAP, there's already a module named
> SOAP.py, but they're seemingly not ready yet

Do you have a URL for this one?

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From Alexandre.Fayolle@logilab.fr  Wed Feb 14 14:54:44 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Feb 2001 15:54:44 +0100 (CET)
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: <200102141422.HAA01626@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0102141553030.2624-100000@sagittarius.logilab.fr>

On Wed, 14 Feb 2001, Uche Ogbuji wrote:

> > If someone could provide a fix or workaround, I would appreciate it.
> 
> I think Jeremey fixed this in 4Suite, and we'll be checking this into PyXML.  
> Hopefully, based on all the problems reported lately, there will soon be a 
> PyXML 0.6.4.

Well, I thought it was in PyXML CVS. Sorry for the misinformation. It is
most certainly fixed in 4Suite. 

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From fdrake@acm.org  Wed Feb 14 15:10:17 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 14 Feb 2001 10:10:17 -0500 (EST)
Subject: [XML-SIG] Python SOAP Implementations
In-Reply-To: <Pine.LNX.4.21.0102141549410.2624-100000@sagittarius.logilab.fr>
References: <m14T35Q-000x5HC@smtp.web.de>
 <Pine.LNX.4.21.0102141549410.2624-100000@sagittarius.logilab.fr>
Message-ID: <14986.40921.565960.792080@cj42289-a.reston1.va.home.com>

Alexandre Fayolle writes:
 > On Wed, 14 Feb 2001, Juergen Hermann wrote:
 > >  * Scarab - the WANT to implement SOAP, there's already a module named
 > > SOAP.py, but they're seemingly not ready yet
 > 
 > Do you have a URL for this one?


	http://casbah.org/Scarab/

  I don't see a date on the page stating when this was last updated,
and the casbah.org pages seem old.  (Ken MacLeod, can you inform us on
this?  Or add dates to the Web pages?)  The front page at casbah.org
doesn't contain a link to Scarab, and the "Casbah Glossary" link is
broken (which is one place I'd expect to see a reference to Scarab).
  There might be more information in the download package.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From rsalz@caveosystems.com  Wed Feb 14 15:25:50 2001
From: rsalz@caveosystems.com (Rich Salz)
Date: Wed, 14 Feb 2001 10:25:50 -0500
Subject: [XML-SIG] problems with PyXML 0.6.3
References: <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr>
Message-ID: <3A8AA37E.36AD9DF6@caveosystems.com>

> If running windows, and the second character of the 'url' is a colon,
> replace it with a pipe and prepend file: to the url?

Yes, it *IS* really gross, but internal windows code does this; I've
seen it, as part of a DCOM port (monikers, anyone?).  The test is
	if (isalpha(name[0]) && name[1] == ':') ...
actually, it might be isupper not isalpha, I can't recall.
	/r$


From fdrake@acm.org  Wed Feb 14 15:47:21 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 14 Feb 2001 10:47:21 -0500 (EST)
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr>
References: <200102141420.HAA01615@localhost.localdomain>
 <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr>
Message-ID: <14986.43145.945513.277893@cj42289-a.reston1.va.home.com>

Alexandre Fayolle writes:
 > Maybe what we need is a new function in os.path or similar that would
 > perform the file -> URL conversion described above. This would ease the
 > work of application writers. I, for one, would be much more at ease if I
 > knew that no implicit assumptions are made on what I pass. If the API
 > requires an URI/URL, then this is what it should get. 

  I started to write a response saying "take a look at
urllib.pathname2url()", but upon thinking more about it and chatting
with Guido on the topic, have concluded that that's not the right
response.  Aside from urllib.pathname2url() being undocumented.  ;)
  What we decided was that while the "XML world" uses URIs for system
identifiers, it still doesn't make a lot of sense for the Python APIs
to hide the distinction between URLs and filenames (and URNs, if
you're using those).  What it comes down to is that there is no way to
ensure proper conversion from a filename to a URL for an arbitrary
system, and the application will generally need to know the difference
anyway.
  There are two places which need to feed data to an XML parser: the
public API which tells it to start parsing, and the internal entity
management.  The latter can either be disabled (or non-existent), or
should allow the application to provide an entity manager which can do
whatever makes sense with regard to opening network resources.
  From this, it is reasonable to infer that we should be able to
provide data to the parser by passing it a string and/or a file
object.  Anything which opens a file based on a filename or URL is a
convenience method, and the URL and filename forms should be
distinct.  (And let's face it: while urllib may be a convenient entity
manager, it's not an efficient one!)
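
  As a rough illustration of "a string and/or a file object", here is a
sketch using the xml.sax module from present-day Python 3 (not the PyXML
0.6.3 API under discussion).  The TagCounter handler is a made-up example
class; the point is only that neither call involves a filename or URL.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Illustrative handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


data = b"<doc><item/><item/></doc>"

# Feed the parser a string (bytes) directly.
from_string = TagCounter()
xml.sax.parseString(data, from_string)

# Feed the parser an already-open file-like object.
from_file = TagCounter()
xml.sax.parse(io.BytesIO(data), from_file)

print(from_string.count, from_file.count)
```

  The application decides how the bytes were obtained (local file, network,
cache), which keeps the URL-versus-filename question out of the parser.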


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From ken@bitsko.slc.ut.us  Wed Feb 14 17:30:52 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 14 Feb 2001 11:30:52 -0600
Subject: [XML-SIG] Python SOAP Implementations
In-Reply-To: "Fred L. Drake, Jr."'s message of "Wed, 14 Feb 2001 10:10:17 -0500 (EST)"
References: <m14T35Q-000x5HC@smtp.web.de>
 <Pine.LNX.4.21.0102141549410.2624-100000@sagittarius.logilab.fr>
 <14986.40921.565960.792080@cj42289-a.reston1.va.home.com>
Message-ID: <x71yt1kw8j.fsf@bitsko.slc.ut.us>

"Fred L. Drake, Jr." <fdrake@acm.org> writes:

> Alexandre Fayolle writes:
>  > On Wed, 14 Feb 2001, Juergen Hermann wrote:
>  > >  * Scarab - the WANT to implement SOAP, there's already a module named
>  > > SOAP.py, but they're seemingly not ready yet
>  > 
>  > Do you have a URL for this one?
> 
> 
> 	http://casbah.org/Scarab/
> 
>   I don't see a date on the page stating when this was last updated,
> and the casbah.org pages seem old.  (Ken MacLeod, can you inform us
> on this?  Or add dates to the Web pages?)  The front page at
> casbah.org doesn't contain a link to Scarab, and the "Casbah
> Glossary" link is broken (which is one place I'd expect to see a
> reference to Scarab).
>   There might be more information in the download package.

The Scarab comm library went on hold when I went to rewrite some of
the underlying code (Casbah as a whole went on hold quite a bit
earlier :(.  That underlying code has resulted in Orchard[1] which has
a couple of features that not uncoincidentally make working with SOAP
(in particular, XML Namespaces) a *lot* easier.

The Orchard/Python implementation includes a new SOAP client[2,3]
module that we're using successfully with Apache SOAP.  This module
supports both SOAP pickling and RPC over HTTP.  Note: I just found a
bug last week: we're not encoding &<>"'.  Doh!  A SOAP server would be
similarly easy, but we haven't penciled it in yet.
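
For the five characters named above, a sketch of one way to encode them
using the standard library's xml.sax.saxutils.escape in present-day Python.
This is not the Orchard code; soap_escape is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def soap_escape(text):
    # escape() handles &, < and > itself; the two quote characters are
    # passed as extra entities so the result is safe in attribute values too.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


print(soap_escape("a & b < c > \"d\" 'e'"))
```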

The pure Python implementation of Orchard was written as an API
prototype.  Eventually it will go away in favor of the Mostly-C
bridge, and SOAP will be ported to Mostly-C (can you say *screaming
fast*? ;-).  SOAP encoding will be the standard XML pickling format
for Orchard, so it will be a heavily used module, with corresponding
levels of maintenance and support.

Although we're not expecting to maintain the pure Python
implementation moving forward, this implementation is well tested so I
can recommend using it until the Mostly-C bridge is available.

  -- Ken

[1] <http://casbah.org/~kmacleod/orchard/>
[2] <http://casbah.org/~kmacleod/orchard/soap.html>
[3] <http://casbah.org/~kmacleod/orchard/SOAP.py>


From uche.ogbuji@fourthought.com  Wed Feb 14 18:40:46 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 11:40:46 -0700
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org>
 of "Wed, 14 Feb 2001 09:36:09 EST." <14986.38873.297269.693330@cj42289-a.reston1.va.home.com>
Message-ID: <200102141840.LAA02710@localhost.localdomain>

> 
> Uche Ogbuji writes:
>  > After today's 4Suite release (yes, we're in final packaging.
>  > Hooray!) we'll be removing 4DOM from the package so it lives
>  > completely in PyXML.  This should accelerate maintenance.
> 
>   Excellent news!
>   Are you planning to update the PyXML CVS as soon as 4Suite is
> released?

Yep.  And you'll be happy to know isSameNode() is implemented (but not 
documented).
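
A small sketch of what isSameNode() answers, using xml.dom.minidom from
present-day Python 3 as a stand-in (an assumption: the 4DOM implementation
discussed here is not shown, only the same DOM method name).

```python
from xml.dom import minidom

doc = minidom.parseString("<root><child/></root>")
root = doc.documentElement

# True only when both references denote the very same node object.
same = root.isSameNode(doc.documentElement)
different = root.isSameNode(root.firstChild)

print(same, different)
```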


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Wed Feb 14 18:43:01 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 11:43:01 -0700
Subject: [XML-SIG] Python SOAP Implementations
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Wed, 14 Feb 2001 15:52:54 +0100." <Pine.LNX.4.21.0102141549410.2624-100000@sagittarius.logilab.fr>
Message-ID: <200102141843.LAA02730@localhost.localdomain>

> On Wed, 14 Feb 2001, Juergen Hermann wrote:
> 
> > Hi!
> > 
> > I know of two SOAP implementations for Python:
> >  * soaplib.py by PythonWare, more or less beta software
> 
> We're using this in Narval. It works well. However it chokes on unicode
> strings, so beware if you're planning to use it with python2.0

Note, /F says they are working on soaplib 0.9.5.  Sounds as if they have a 
hurdle or two, but it will probably emerge soon.

I imagine, based on his involvement with Python/Unicode, that the next release 
will have better Unicode support.

Just in case, I'd send him e-mail mentioning the need.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Wed Feb 14 18:41:41 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 14 Feb 2001 13:41:41 -0500 (EST)
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: <200102141840.LAA02710@localhost.localdomain>
References: <fdrake@acm.org>
 <14986.38873.297269.693330@cj42289-a.reston1.va.home.com>
 <200102141840.LAA02710@localhost.localdomain>
Message-ID: <14986.53605.447049.324038@cj42289-a.reston1.va.home.com>

Uche Ogbuji writes:
 > Yep.  And you'll be happy to know isSameNode() is implemented (but not 
 > documented).

  Even better!  I'm not worried about the 4Suite documentation since
the Python DOM API spec (in the Python Library Reference under
"xml.dom") covers it.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Wed Feb 14 18:57:47 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 11:57:47 -0700
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org>
 of "Wed, 14 Feb 2001 13:41:41 EST." <14986.53605.447049.324038@cj42289-a.reston1.va.home.com>
Message-ID: <200102141857.LAA02816@localhost.localdomain>

> 
> Uche Ogbuji writes:
>  > Yep.  And you'll be happy to know isSameNode() is implemented (but not 
>  > documented).
> 
>   Even better!  I'm not worried about the 4Suite documentation since
> the Python DOM API spec (in the Python Library Reference under
> "xml.dom") covers it.  ;-)

I don't see it.  At least not in the Node interface docs.

I think we should carefully mark this, since it could possibly change or even 
go away before DOM Level 3 goes gold.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From Alexandre.Fayolle@logilab.fr  Wed Feb 14 19:01:34 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Feb 2001 20:01:34 +0100 (CET)
Subject: [XML-SIG] Python SOAP Implementations
In-Reply-To: <m14T35Q-000x5HC@smtp.web.de>
Message-ID: <Pine.LNX.4.21.0102142000550.933-100000@leo.logilab.fr>

On Wed, 14 Feb 2001, Juergen Hermann wrote:

> Hi!
> 
> I know of two SOAP implementations for Python:
> Any other implementations you know of?

http://python.scripting.com/directory/13/soap/implementations lists a few
things under that topic, but it looks like it's mostly java and perl
stuff.


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From Alexandre.Fayolle@logilab.fr  Wed Feb 14 19:04:17 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 14 Feb 2001 20:04:17 +0100 (CET)
Subject: [XML-SIG] Python SOAP Implementations
In-Reply-To: <200102141843.LAA02730@localhost.localdomain>
Message-ID: <Pine.LNX.4.21.0102142002400.945-100000@leo.logilab.fr>

On Wed, 14 Feb 2001, Uche Ogbuji wrote:

> > We're using this in Narval. It works well. However it chokes on unicode
> > strings, so beware if you're planning to use it with python2.0
> 
> Note, /F says they are working on soaplib 0.9.5.  Sounds as if they have a 
> hurdle or two, but it will probably emerge soon.
> 
> I imagine, based on his involvement with Python/Unicode that the next release 
> will have better UNicode support.
> 
> Just in case, I'd send him e-mail mentioning the need.

I'm pretty sure that he's aware of this: it is explicitly mentioned on
http://www.pythonware.com/products/soap/profile.htm :

"soaplib.py only supports 8-bit character sets. Future versions will add
support for arbitrary character sets (but only under Python 1.6)."


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From fdrake@acm.org  Wed Feb 14 19:35:17 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 14 Feb 2001 14:35:17 -0500 (EST)
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: <200102141857.LAA02816@localhost.localdomain>
References: <fdrake@acm.org>
 <14986.53605.447049.324038@cj42289-a.reston1.va.home.com>
 <200102141857.LAA02816@localhost.localdomain>
Message-ID: <14986.56821.158358.561035@cj42289-a.reston1.va.home.com>

Uche Ogbuji writes:
 > I don't see it.  At least not in the Node interface docs.

  It's in the CVS version, so it becomes part of the "official" API in
Python 2.1.

 > I think we should carefully mark this, since it could possibly
 > change or even go away before DOM Level 3 goes gold.

  That's not necessarily the case for the Python bindings, but a note
about it shouldn't hurt.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Wed Feb 14 19:44:46 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 12:44:46 -0700
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: Message from Alexandre Fayolle <Alexandre.Fayolle@logilab.fr>
 of "Wed, 14 Feb 2001 15:48:42 +0100." <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr>
Message-ID: <200102141944.MAA02965@localhost.localdomain>

> On Wed, 14 Feb 2001, Uche Ogbuji wrote:
> 
> > So can we think of a better algorithm than the current "check for file, and if 
> > it doesn't exist, just blindly toss it to urllib"?
> 
> If running windows, and the second character of the 'url' is a colon,
> replace it with a pipe and prepend file: to the url?
> 
> > This problem affects 4Suite as well.
> 
> I had to use a similar hack when generating a CATALOG file for Narval, for
> use with xmlproc, since urllib would choke on C:\fooo\dtd_base\, and whine
> until it got C|\fooo\dtd_base\
> 
> Maybe what we need is a new function in os.path or similar that would
> perform the file -> URL conversion described above. This would ease the
> work of application writers. I, for one, would be much more at ease if I
> knew that no implicit assumptions are made on what I pass. If the API
> requires an URI/URL, then this is what it should get. 

Here's what Tom Passim suggested to us a while back

"""
- Handle "file:" with no slashes because rightly or wrongly they're often
used.
- For Windows, allow constructions like

    file:///c|...

even though it isn't in the rfc, because this form too is used a lot (Who
started it, Netscape or Tim BL??)(The rfc doesn't require or suggest
replacing a colon with a bar).
- For Windows, treat file:///c:\.... as an opaque url and just use the
embedded path literally.
- For Windows, treat file:///c:/... as a parsable path starting at c:\, or at
least replace the forward with back slashes.
- Make sure that file://localhost/ acts the same as file:///   because the rfc
says to do so.

- What have I missed? Something for the Mac?
"""

I meant to implement these heuristics for 4Suite, but I forgot.
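
For discussion, the heuristics quoted above might be sketched roughly as
follows in present-day Python 3.  This is not 4Suite code; the function name
file_url_to_windows_path is hypothetical, percent-decoding is left out, and
hosts other than localhost are not handled.

```python
import re


def file_url_to_windows_path(url):
    # Hypothetical helper implementing the quoted heuristics for Windows.
    if not url.lower().startswith("file:"):
        raise ValueError("not a file: URL: %r" % (url,))
    rest = url[len("file:"):]
    # file://localhost/ must act the same as file:///
    if rest.lower().startswith("//localhost/"):
        rest = rest[len("//localhost"):]
    elif rest.startswith("///"):
        rest = rest[2:]
    # Otherwise: "file:" with no slashes, taken as written.
    # Accept c:, c| and an optional leading slash before the drive letter.
    match = re.match(r"/?([A-Za-z])[:|](.*)$", rest)
    if match:
        drive, tail = match.groups()
        # Forward slashes become backslashes; an embedded c:\... path is
        # already in native form and passes through unchanged.
        return drive + ":" + tail.replace("/", "\\")
    return rest


for example in (
    "file:///c|/temp/a.xml",
    "file:///c:\\temp\\a.xml",
    "file:///c:/temp/a.xml",
    "file://localhost/c:/temp/a.xml",
    "file:d:\\temp\\a.xml",
):
    print(example, "->", file_url_to_windows_path(example))
```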

Any comments?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From rvprasad@cis.ksu.edu  Wed Feb 14 21:00:25 2001
From: rvprasad@cis.ksu.edu (Venkatesh Prasad Ranganath)
Date: 14 Feb 2001 15:00:25 -0600
Subject: [XML-SIG] Re: DOM creation
In-Reply-To: Uche Ogbuji's message of "Wed, 14 Feb 2001 14:12:30 GMT"
References: <m3bss8k70l.fsf@boss.dreamsoft.com> <3A8A9254.C4F9FDC@ogbuji.net>
Message-ID: <m3k86t9dzq.fsf@boss.dreamsoft.com>

The following message is a courtesy copy of an article
that has been posted to comp.lang.python as well.

>>>>> "Uche" == Uche Ogbuji <uche@ogbuji.net> writes:

    Uche> Venkatesh Prasad Ranganath wrote:
    >> I have a question on how DOM for a XML document conforming to DOM 2
    >> should be constructed?
    >> 
    >> Now if there are no namespaces specified in the document then should
    >> attributes be added to DOM using setAttributeNS('', Name, Value) or
    >> setAttribute(Name, Value)?
    >> 
    >> The problem I am facing is when reading in a XML document with no
    >> explicit namespace specified in it through PyXML the attributes are added
    >> to the DOM using setAttributeNS with an empty NameSpace.  So, I wanted to
    >> clarify if this is a problem with PyXML or is this how other DOM
    >> Constructors work.

    Uche> This is correct behavior.  Of course, if you use the
    Uche> xml.dom.ext.reader.Sax reader, you get a tree with no namespace
    Uche> specifiers at all, which is also correct.

    Uche> If you plan to migrate to namespaces in future, or to mix namespace
    Uche> with non-namespace behavior, I'd suggest sticking to the PyExpat and
    Uche> Sax2 readers and using the DOM Level 2 methods (with appended "NS").

If this is the case then should or shouldn't set/getAttribute() in DOM2
"intelligently" assume empty namespace ('')?  Also, is there any specification
on construction of DOM for XML documents?  Or does the DOM specs available at
W3C describe the construction process?  If so, can somebody tell me in which
section?

    >> waiting for reply,

    Uche> This reminds me.  I'm not sure I sent a reply to your last enquiry.

    Uche> I had a few questions, such as which Reader you were trying to use (It
    Uche> looked as if you didn't paste all of your example code in).

    Uche> However, I'd suggest trying the latest 4Suite 0.10.2 beta that was
    Uche> announced (since I noticed you're using XPath), and see if you still
    Uche> have those problems.  If so, please copy follow-ups to
    Uche> xml-sig@python.org, which I check more regularly than this newsgroup.

I guess it works fine with 0.10.2b.

Thankx

-- 
Venkatesh Prasad Ranganath


From fdrake@acm.org  Wed Feb 14 21:17:35 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 14 Feb 2001 16:17:35 -0500 (EST)
Subject: [XML-SIG] Re: DOM creation
In-Reply-To: <m3k86t9dzq.fsf@boss.dreamsoft.com>
References: <m3bss8k70l.fsf@boss.dreamsoft.com>
 <3A8A9254.C4F9FDC@ogbuji.net>
 <m3k86t9dzq.fsf@boss.dreamsoft.com>
Message-ID: <14986.62959.130541.7707@cj42289-a.reston1.va.home.com>

Venkatesh Prasad Ranganath writes:
 > If this is the case then should or shouldn't set/getAttribute() in
 > DOM2 "intelligently" assume empty namespace ('')?

  The level 1 methods should ignore the namespaceURI attribute and use
only the nodeName attribute when matching against existing nodes, and
new nodes created via setAttribute() should have a namespaceURI of
None (the Python way to spell the "empty" namespace).
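
  A short sketch of that behaviour, using xml.dom.minidom from present-day
Python 3 as a stand-in for the DOM implementations discussed here (an
assumption; PyXML's own classes may differ in detail).  The namespace URI
and attribute names are made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<root/>")
root = doc.documentElement

# Level 1 call: no namespace is involved at all.
root.setAttribute("plain", "1")
# Level 2 call: explicit namespace URI and qualified name.
root.setAttributeNS("http://example.com/ns", "ex:qualified", "2")

plain = root.getAttributeNode("plain")
qualified = root.getAttributeNode("ex:qualified")

print(plain.namespaceURI)       # the "empty" namespace, spelled None
print(qualified.namespaceURI)
# The Level 1 getter matches on the node name only.
print(root.getAttribute("ex:qualified"))
```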

 > Also, is there any specification on construction of DOM for XML
 > documents?  Or does the DOM specs available at W3C describe the
 > construction process?  If so, can somebody tell me in which
 > section?

  I presume you mean from a string or file containing marked text
rather than programmatically via DOM node constructors and
tree-manipulation methods.  There is some effort being made in the DOM
Level 3 working drafts which covers this, but that's still pretty raw.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Wed Feb 14 21:52:06 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 14:52:06 -0700
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: Message from Uche Ogbuji <uche.ogbuji@fourthought.com>
 of "Wed, 14 Feb 2001 12:44:46 MST." <200102141944.MAA02965@localhost.localdomain>
Message-ID: <200102142152.OAA03320@localhost.localdomain>

> > On Wed, 14 Feb 2001, Uche Ogbuji wrote:

> Here's what Tom Passim suggested to us a while back

My apologies to Tom Passin.  I also spelled Jeremy's name wrongly today.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Wed Feb 14 23:01:54 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 14 Feb 2001 16:01:54 -0700
Subject: [XML-SIG] bug in xml.dom.Document.importNode?
In-Reply-To: Message from Michael Dyck <MichaelDyck@home.com>
 of "Wed, 14 Feb 2001 00:42:08 PST." <3A8A44E0.FEBD419C@home.com>
Message-ID: <200102142301.QAA03637@localhost.localdomain>

> When I "import" a node from one document into another, it loses attributes.

> If someone could provide a fix or workaround, I would appreciate it.

This is another bug fixed in 4DOM CVS:

     from xml.dom import Document
     from xml.dom.ext.reader import PyExpat
     from xml.dom.ext import PrettyPrint
     reader = PyExpat.Reader()
     doc1 = reader.fromString("<foo a1='1' a2='2' a3='3'/>")
     original_node = doc1.documentElement
     PrettyPrint( original_node )
     doc2 = Document.Document( None )
     imported_node = doc2.importNode( original_node, deep=1 )
     PrettyPrint(imported_node)

prints

<foo a1='1' a2='2' a3='3'/>

We're wrapping up the 4Suite release, which will have the fix, then we'll 
check in to PyXML CVS to propagate the fix there.
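
For readers without 4DOM at hand, roughly the same check can be sketched
with xml.dom.minidom in present-day Python 3.  This is an assumption-laden
stand-in, not the 4DOM fix itself: it only shows what a correct importNode()
should preserve.

```python
from xml.dom import minidom

doc1 = minidom.parseString("<foo a1='1' a2='2' a3='3'/>")
original_node = doc1.documentElement

# An empty target document that will own the imported copy.
doc2 = minidom.Document()
imported_node = doc2.importNode(original_node, True)

# All three attributes should survive the import.
print(imported_node.toxml())
```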


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tpassin@home.com  Thu Feb 15 01:02:49 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 14 Feb 2001 20:02:49 -0500
Subject: [XML-SIG] problems with PyXML 0.6.3
References: <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr>
Message-ID: <001901c096eb$04a9ed20$7cac1218@reston1.va.home.com>

This file: business is trickier than it seems, because the RFC is ambiguous
for file: urls.  A pipe character isn't in the rfc at all even though it's
used by some of the browsers.

I strongly suggest that, when a local file is intended, one should use the
file: scheme.  That way, the application doesn't have to guess and it won't
try a spurious url if the file isn't found.  The way it's done in this example
is just asking for continuous trouble, as I guess we're seeing now.

I think we should come to an agreement with the maintainer of the urllib about
the allowed forms for file: schemes.  It's mainly on Windows (and, perhaps,
Macs) that there would be a problem.  My preferred forms are these, for a file
at d:\temp\python\thefile.xml -

1) file:///d:/temp/python/thefile.xml

2) file:///d:\temp\python\thefile.xml

Both of these comply fully with the rfc.  2) is an "opaque" form - no further
parsing would be done by the url processor, it would just pass it to the os.
1) is what you get according to the rfc when you want the url processor to be
able to parse out the path parts.  The processor is supposed to know to
replace slashes by backslashes if appropriate for the os.

Either 1) or 2) would also work for files on a network file system, if you put
the host name in there -

file://host/temp/python/thefile.xml

1) would be more portable, and is my preference.  The processor should be able
to handle both, however.  For backwards compatibility, form 3) should also be
accepted, I suppose:

3) file:d:\temp\python\thefile.xml

This could be negotiated, though.
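
As a rough illustration, form 1) as written above could be produced from a
Windows path like this in present-day Python 3.  The helper name
windows_path_to_file_url is hypothetical, and only drive-letter paths are
considered.

```python
from urllib.parse import quote


def windows_path_to_file_url(path):
    # Hypothetical helper: backslashes become slashes, and characters that
    # are unsafe in a URL are percent-encoded ("/" and ":" are kept as is).
    return "file:///" + quote(path.replace("\\", "/"), safe="/:")


print(windows_path_to_file_url(r"d:\temp\python\thefile.xml"))
```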

Let's agree on this and get it working right!

Cheers,

Tom P


Alexandre Fayolle wrote -

> On Wed, 14 Feb 2001, Uche Ogbuji wrote:
>
> > So can we think of a better algorithm than the current "check for file,
> > and if it doesn't exist, just blindly toss it to urllib"?
> If running windows, and the second character of the 'url' is a colon,
> replace it with a pipe and prepend file: to the url?
>
> > This problem affects 4Suite as well.
>
> I had to use a similar hack when generating a CATALOG file for Narval, for
> use with xmlproc, since urllib would choke on C:\fooo\dtd_base\, and whine
> until it got C|\fooo\dtd_base\
>
> Maybe what we need is a new function in os.path or similar that would
> perform the file -> URL conversion described above. This would ease the
> work of application writers. I, for one, would be much more at ease if I
> knew that no implicit assumptions are made on what I pass. If the API
> requires an URI/URL, then this is what it should get.
>
> Opinions?


From tpassin@home.com  Thu Feb 15 03:40:52 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Wed, 14 Feb 2001 22:40:52 -0500
Subject: [XML-SIG] problems with PyXML 0.6.3
References: <Pine.LNX.4.21.0102141534450.2624-100000@sagittarius.logilab.fr> <001901c096eb$04a9ed20$7cac1218@reston1.va.home.com>
Message-ID: <001e01c09701$19653b00$7cac1218@reston1.va.home.com>

Sorry, for style 1) I meant this instead:

 1) file:///d/temp/python/thefile.xml

Using this style, the root of the path would be d/, and you don't need the
colon.

> I think we should come to an agreement with the maintainer of the urllib
> about the allowed forms for file: schemes.  It's mainly on Windows (and,
> perhaps, Macs) that there would be a problem.  My preferred forms are these,
> for a file at d:\temp\python\thefile.xml -
>
> 1) file:///d:/temp/python/thefile.xml
>
> 2) file:///d:\temp\python\thefile.xml
>


From uche.ogbuji@fourthought.com  Thu Feb 15 07:22:41 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 15 Feb 2001 00:22:41 -0700
Subject: [XML-SIG] Murphy strikes again
Message-ID: <200102150722.AAA00484@localhost.localdomain>

Our final tests turned up some more work needed in ODS and elsewhere.

Since we're trying to be extra-cautious with this release, and the others on 
the road to 1.0, we decided to hold off for a little more trouble-shooting and 
testing.

Unfortunately we have other obligations in the morning, so it could be until 
Friday before the next release.  I'll try to post another beta with today's 
fixes tomorrow morning.

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From l.szyster@ibm.net  Thu Feb 15 12:36:10 2001
From: l.szyster@ibm.net (Laurent Szyster)
Date: Thu, 15 Feb 2001 13:36:10 +0100
Subject: [XML-SIG] Python SOAP Implementations
References: <m14T35Q-000x5HC@smtp.web.de>
Message-ID: <3A8BCD3A.7D9BF7CC@ibm.net>

Juergen Hermann wrote:
> 
> I know of two SOAP implementations for Python:
>  * soaplib.py by PythonWare, more or less beta software
>  * Scarab - the WANT to implement SOAP, there's already a module named
> SOAP.py, but they're seemingly not ready yet
> 
> Any other implementations you know of?
> 

I wrote a small SOAP server prototype for a customer,
using Medusa, pyexpat and a simplistic DOM based on
qp_xml.py (from Greg Stein).

But I did not implement a SOAP library (something that
instantiates objects from an XML stream and the reverse).

The technique used for processing a SOAP request is to
pass a simple DOM instance to a function (actually, call
the __call__ method of the DOM instance), along with a
file-like instance where to "print" the response SOAP
envelope.

class SOAP_request (DOM.DOM):

    def __call__ (dom, stdout):

It's then up to this function to walk down the tree for
parameters, do what the procedure must do and output a
SOAP envelope response.
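
A minimal sketch of that calling convention in present-day Python
follows. It is not the unpublished prototype: xml.dom.minidom stands in
for the qp_xml-based DOM, and the class name SOAPRequest, the "add"
procedure and its parameters are all made up for illustration.

```python
import io
from xml.dom import minidom

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical request: an "add" procedure with two integer parameters.
REQUEST = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s">'
    '<SOAP-ENV:Body><add><a>2</a><b>3</b></add></SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>' % SOAP_ENV
)


def element_children(node):
    """Return only the element children of a node, skipping text nodes."""
    return [n for n in node.childNodes if n.nodeType == n.ELEMENT_NODE]


class SOAPRequest:
    """Stand-in for the DOM-based request object (name is an assumption)."""

    def __init__(self, xml_text):
        self.doc = minidom.parseString(xml_text)

    def __call__(self, stdout):
        # Walk down the tree: Envelope -> Body -> procedure -> parameters.
        body = self.doc.getElementsByTagNameNS(SOAP_ENV, "Body")[0]
        call = element_children(body)[0]
        total = sum(int(p.firstChild.data) for p in element_children(call))
        # "Print" the response envelope to the file-like object.
        stdout.write(
            '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s"><SOAP-ENV:Body>'
            '<%sResponse><return>%d</return></%sResponse>'
            '</SOAP-ENV:Body></SOAP-ENV:Envelope>'
            % (SOAP_ENV, call.localName, total, call.localName)
        )


response = io.StringIO()
SOAPRequest(REQUEST)(response)
```

The request object only needs something with a write() method, so the
same code can serve a socket, a file or an in-memory buffer.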

I cannot publish this prototype code, but I'm ready to
share my experience with the Apache SOAP toolkit.


Laurent Szyster


From guenter.radestock@sap.com  Thu Feb 15 14:34:35 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Thu, 15 Feb 2001 15:34:35 +0100
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EAE@dbwdfx14.wdf.sap-ag.de>

I have tried to find the problem in the expat parser module that
comes with PyXML-0.6.3 and that leads to Python crashes on
Windows when an exception is thrown while parsing incorrect 
stuff like:

from xml.parsers import expat
import sys
po = expat.ParserCreate('ISO-8859-1')
po.Parse(u'<?xml encoding="iso-8859-1" ?><test></test>', 1)

(The xml version is missing in the above example)
I have found the following:

1. the problems will go away if you remove the 
_xmlplus/parsers/pyexpat.pyd extension.  Then the extension
supplied with Python2.0 will be used.  Because this has fewer
features, things like SAX2 will probably not work any more,
but xml.parsers.expat will be usable as well as features of
the XML package that do not require expat.

2. in the file pyexpat.c, the variable "ErrorObject" is not
initialized (there is a test for null in the init method
of the module).  This is clearly a bug, but unfortunately
not the (only) source of the problem.  ErrorObject should
be declared as:

static PyObject *ErrorObject = NULL;

3. Inserting debug prints into the function xmlparse_Parse()
shows that the pointer ErrorObject gets destroyed while
parsing the incorrect XML.  It does not get destroyed when
correct XML is parsed.

4. If I put the line

static int *willNotBeUsed;

immediately after the declaration of ErrorObject, the module
becomes more stable - it did not crash anymore with my tests.
This cannot be the solution, though.  I have no idea right now
how to get this straight and little experience in debugging,
and would appreciate it a lot if somebody else could look into
this.  This may be a problem with expat itself and not the
module?
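
For comparison, the behaviour one would expect from the example above
is an ordinary, catchable exception. A small sketch in present-day
Python, assuming the parser rejects the declaration because the version
is missing (the helper name parse_error is made up):

```python
from xml.parsers import expat


def parse_error(data):
    """Parse data in one go; return the expat error text, or None if it parsed."""
    parser = expat.ParserCreate('ISO-8859-1')
    try:
        parser.Parse(data, True)
    except expat.ExpatError as exc:
        # exc.code is expat's numeric error code; ErrorString maps it to text.
        return expat.ErrorString(exc.code)
    return None


bad = b'<?xml encoding="iso-8859-1" ?><test></test>'
good = b'<?xml version="1.0" encoding="iso-8859-1" ?><test></test>'
```

Calling parse_error(bad) should hand back an error description, while
parse_error(good) should return None.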

- Guenter


From noreply@sourceforge.net  Thu Feb 15 16:46:39 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 15 Feb 2001 08:46:39 -0800
Subject: [XML-SIG] [Bug #132541] xml.dom.WrongDocumentErr is missing redefinition of __init__
Message-ID: <E14TRYF-0007kB-00@usw-sf-web3.sourceforge.net>

Bug #132541, was updated on 2001-Feb-15 08:46
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : nobody
Summary: xml.dom.WrongDocumentErr is missing redefinition of __init__

Details: xml.dom.WrongDocumentErr (defined in xml/dom/__init__.py) is
missing the following line:

  __init__ = DOMException._derived_init

Trying to raise xml.dom.WrongDocumentErr(<exception description>) will
therefore fail.
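
The pattern in question can be sketched as follows. This is a
simplified stand-in, not the actual contents of xml/dom/__init__.py:
only the names DOMException, _derived_init and WrongDocumentErr come
from the report, and the value 4 is WRONG_DOCUMENT_ERR as defined by
the DOM specification.

```python
WRONG_DOCUMENT_ERR = 4  # exception code defined by the DOM Level 1 spec


class DOMException(Exception):
    """Simplified base class: takes a numeric code and a message."""

    def __init__(self, code, msg=''):
        Exception.__init__(self, msg)
        self.code = code
        self.msg = msg

    def _derived_init(self, msg=''):
        # Subclasses carry their code as a class attribute, so callers
        # only pass the description.
        DOMException.__init__(self, self.code, msg)


class WrongDocumentErr(DOMException):
    code = WRONG_DOCUMENT_ERR
    __init__ = DOMException._derived_init  # the line the report says is missing


err = WrongDocumentErr("node belongs to another document")
```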


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132541&group_id=6473


From uche.ogbuji@fourthought.com  Thu Feb 15 18:16:44 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 15 Feb 2001 11:16:44 -0700
Subject: [XML-SIG] Python Web Services Column
Message-ID: <200102151816.LAA09197@localhost.localdomain>

Also wanted to note that Mike and I are now columnists on the Web Services 
Zone of IBM developerWorks:

http://www-106.ibm.com/developerworks/webservices/

The column is called 

"The Python Web services developer"

First installment is at

http://www-106.ibm.com/developerworks/library/ws-pyth1.html?dwzone=ws

Blurb:

"Python's motto has always been "batteries included," referring to the large 
array of standard libraries and facilities that come with the language 
installation. This article presents an overview and survey of tools and 
facilities available for Web services development in Python. This includes 
built-in Python features and third-party open-source tools."

Unfortunately, we only mentioned Ken MacLeod's Scarab, and not Orchard.  
Didn't know any better.  Next time.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Thu Feb 15 19:53:23 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 15 Feb 2001 20:53:23 +0100
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90EAE@dbwdfx14.wdf.sap-ag.de>
 (guenter.radestock@sap.com)
References: <FAFE609CB754D311B60C0008C75D355608C90EAE@dbwdfx14.wdf.sap-ag.de>
Message-ID: <200102151953.f1FJrNM02423@mira.informatik.hu-berlin.de>

> I have tried to find the problem in the expat parser module that
> comes with PyXML-0.6.3 and that leads to Python crashes on
> Windows when an exception is thrown while parsing incorrect 
> stuff like

I believe this bug is fixed on both Python CVS and PyXML CVS: an array
should have 257 instead of 256 elements.

You can either take the corrected version from the CVS, or wait for
0.6.4.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Thu Feb 15 20:05:44 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 15 Feb 2001 21:05:44 +0100
Subject: [XML-SIG] windows installer for XML package failing on Windows 95
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90E9F@dbwdfx14.wdf.sap-ag.de>
 (guenter.radestock@sap.com)
References: <FAFE609CB754D311B60C0008C75D355608C90E9F@dbwdfx14.wdf.sap-ag.de>
Message-ID: <200102152005.f1FK5iK02500@mira.informatik.hu-berlin.de>

> I tried to install the XML package onto a Windoze 95 box a few days
> ago and it does not work.  The installer crashes without unpacking
> source or opening any window.  This may be a distutils issue.

It certainly sounds like one. I recommend contacting the author of the
bdist_wininst command, Thomas Heller, or posting a message to the
distutils SIG. I believe I've used distutils 1.0 to create the
installer.

> First: I can successfully unpack the executable with winzip and move
> the package directory into Python20/Lib.  This seems to work, but I
> am not sure if I should also patch any existing files.  Is there a
> script inside the installer that I should run after unpacking?

No, nothing. The installer does not support any post-processing, AFAIK.

> I did not find a setup.py; the source package won't help me because
> I would have to install a compiler for the extensions, right?

Right.

> 1. The installer crashes only on this one Libretto 50ct Laptop with
> Windows 95, second edition.  I have successfully used it on other
> Windows computers.

Unfortunately, this is a multi-level bootstrapping. The installer GUI
might use some Windows DLLs or Windows API in the wrong way. However,
the installer itself is compressed with an auto-uncompression program,
which might also fail.

> 2. Before installing the XML package, I first removed Python 1.5.2, then
> removed the TCL/TK that came with 1.5.2, then installed Python 2.0.  I did
> not have Python 1.5.2 on the other systems I installed the package on.
> I also have an older (don't remember the exact) version of Winzip on the
> Libretto - can the Winzip DLL be the source of my problem?

Unlikely. The installer has the InfoZip library statically linked.

Regards,
Martin


From karl@digicool.com  Fri Feb 16 00:31:21 2001
From: karl@digicool.com (Karl Anderson)
Date: 15 Feb 2001 16:31:21 -0800
Subject: [XML-SIG] Python IDL mapping reference?
Message-ID: <m1zofnxycm.fsf@localhost.localdomain>

I can't find the Python IDL mapping reference that IIRC used to be in
the xml-sig area.  Could someone send me an URL?

-- 
Karl Anderson                          karl@digicool.com


From martin@loewis.home.cs.tu-berlin.de  Fri Feb 16 07:11:17 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 16 Feb 2001 08:11:17 +0100
Subject: [XML-SIG] Python IDL mapping reference?
In-Reply-To: <m1zofnxycm.fsf@localhost.localdomain> (message from Karl
 Anderson on 15 Feb 2001 16:31:21 -0800)
References: <m1zofnxycm.fsf@localhost.localdomain>
Message-ID: <200102160711.f1G7BHm00848@mira.informatik.hu-berlin.de>

> I can't find the Python IDL mapping reference that IIRC used to be in
> the xml-sig area.  Could someone send me an URL?

Not sure where it was supposed to be in the xml-sig area, but the
OMG-adopted Python language mapping is at

http://cgi.omg.org/cgi-bin/doc?ptc/00-04-08

Regards,
Martin


From guenter.radestock@sap.com  Fri Feb 16 08:47:24 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Fri, 16 Feb 2001 09:47:24 +0100
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EAF@dbwdfx14.wdf.sap-ag.de>

> -----Original Message-----
> From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
> Sent: Donnerstag, 15. Februar 2001 20:53
> To: Radestock, Guenter
> Cc: XML-SIG@python.org
> Subject: Re: [XML-SIG] broken expat module in PyXML-0.6.3
> 
> 
> > I have tried to find the problem in the expat parser module that
> > comes with PyXML-0.6.3 and that leads to Python crashes on
> > Windows when an exception is thrown while parsing incorrect 
> > stuff like
> 
> I believe this bug is fixed on both Python CVS and PyXML CVS: an array
> should have 257 instead of 256 elements.
> 
> You can either take the corrected version from the CVS, or wait for
> 0.6.4.

Thanks a lot.  I got the corrected file from CVS.  Unfortunately, it
does not compile (revision 1.31 of pyexpat.c) because
my_StartElementHandler() is defined twice (from a macro and literally).
I deleted one definition (the literal one at the top of the file) and
it seems the problem has gone away.

- Guenter


From jh@web.de  Fri Feb 16 09:28:31 2001
From: jh@web.de (Juergen Hermann)
Date: Fri, 16 Feb 2001 10:28:31 +0100
Subject: [XML-SIG] Python IDL mapping reference?
In-Reply-To: <200102160711.f1G7BHm00848@mira.informatik.hu-berlin.de>
Message-ID: <m14ThBn-000uh7C@smtp.web.de>

On Fri, 16 Feb 2001 08:11:17 +0100, Martin v. Loewis wrote:

>> I can't find the Python IDL mapping reference that IIRC used to be 
>Not sure where it was supposed to be in the xml-sig area, but the
>OMG-adopted Python language mapping is at
>
>http://cgi.omg.org/cgi-bin/doc?ptc/00-04-08

Great, so far we only had two URLs with the original spec and the 
corrections for it. BTW, Martin, is there anything you are NOT involved
with? ;)


Ciao, Jürgen

--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/


From noreply@sourceforge.net  Fri Feb 16 11:36:46 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 16 Feb 2001 03:36:46 -0800
Subject: [XML-SIG] [Bug #132683] DOMImplementation.hasFeature('Core', None) returns 0
Message-ID: <E14TjBu-0003bs-00@usw-sf-web2.sourceforge.net>

Bug #132683, was updated on 2001-Feb-16 03:36
Here is a current snapshot of the bug.

Project: Python/XML
Category: 4Suite
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: mjpieters
Assigned to : nobody
Summary: DOMImplementation.hasFeature('Core', None) returns 0

Details: The following is wrong:

> python 
Python 1.5.2 (#0, Dec 27 2000, 14:53:01)  [GCC 2.95.2 20000220 (Debian
GNU/Linux)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> from xml.dom import implementation
>>> implementation.hasFeature('Core', None)
0
>>>

The spec says that any DOM implementation compliant with the DOM API should
at least implement the 'Core' feature set; PyXML certainly does, so the
call to hasFeature should succeed.
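
One way to get the expected result is to treat a missing version as
"any version", which is how the DOM spec describes hasFeature. A rough
sketch, with a made-up feature table and function name rather than the
PyXML implementation:

```python
# Hypothetical table of supported features and versions (lower-case names).
FEATURES = {
    'core': ('1.0', '2.0'),
    'xml': ('1.0', '2.0'),
}


def has_feature(feature, version=None):
    """Return 1 if the feature is supported, 0 otherwise.

    Feature names compare case-insensitively; a version of None or ''
    means that any supported version is acceptable.
    """
    versions = FEATURES.get(feature.lower())
    if versions is None:
        return 0
    if not version:
        return 1
    return int(version in versions)
```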

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132683&group_id=6473


From Dan.Rolander@marriott.com  Fri Feb 16 19:03:58 2001
From: Dan.Rolander@marriott.com (Rolander, Dan)
Date: Fri, 16 Feb 2001 14:03:58 -0500
Subject: [XML-SIG] windows installer for XML package failing on Windows 95
Message-ID: <6176E3D8E36FD111B58900805FA7E0F80CCF63A9@mcnc-mdm1-ex01>

There are two possible problems. Either you're missing MSVCRT.DLL, or you need
to update COMCTL32.DLL. The latter is probably the problem, because if
you're missing the first dll you'll get a warning telling you that, but if
you have an older version of comctl32.dll the installer will crash (I had
this same problem). You can get the update from
http://www.microsoft.com/msdownload/ieplatform/ie/comctrlx86.asp. There are
installers for IE 4.01 and IE 5.0.

HTH,
Dan

-----Original Message-----
From: Martin v. Loewis [mailto:martin@loewis.home.cs.tu-berlin.de]
Sent: Thursday, February 15, 2001 3:06 PM
To: guenter.radestock@sap.com
Cc: XML-SIG@python.org
Subject: Re: [XML-SIG] windows installer for XML package failing on
Windows 95


> I tried to install the XML package onto a Windoze 95 box a few days
> ago and it does not work.  The installer crashes without unpacking
> source or opening any window.  This may be a distutils issue.

It certainly sounds like one. I recommend contacting the author of the
bdist_wininst command, Thomas Heller, or posting a message to the
distutils SIG. I believe I've used distutils 1.0 to create the
installer.

> First: I can successfully unpack the executable with winzip and move
> the package directory into Python20/Lib.  This seems to work, but I
> am not sure if I should also patch any existing files.  Is there a
> script inside the installer that I should run after unpacking?

No, nothing. The installer does not support any post-processing, AFAIK.

> I did not find a setup.py; the source package won't help me because
> I would have to install a compiler for the extensions, right?

Right.

> 1. The installer crashes only on this one Libretto 50ct Laptop with
> Windows 95, second edition.  I have successfully used it on other
> Windows computers.

Unfortunately, this is a multi-level bootstrapping. The installer GUI
might use some Windows DLLs or Windows API in the wrong way. However,
the installer itself is compressed with an auto-uncompression program,
which might also fail.

> 2. Before installing the XML package, I first removed Python 1.5.2, then
> removed the TCL/TK that came with 1.5.2, then installed Python 2.0.  I did
> not have Python 1.5.2 on the other systems I installed the package on.
> I also have an older (don't remember the exact) version of Winzip on the
> Libretto - can the Winzip DLL be the source of my problem?

Unlikely. The installer has the InfoZip library statically linked.

Regards,
Martin


From jeremy.kloth@fourthought.com  Fri Feb 16 20:23:39 2001
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Fri, 16 Feb 2001 13:23:39 -0700
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
References: <FAFE609CB754D311B60C0008C75D355608C90EAF@dbwdfx14.wdf.sap-ag.de>
Message-ID: <3A8D8C4B.3B16F3D5@fourthought.com>

"Radestock, Guenter" wrote:
> 
> Unfortunately, it does not compile (revision 1.31 of pyexpat.c) because
> my_StartElementHandler() is defined twice (from a macro and literally).
> I deleted one definition (the literal one at the top of the file) and 
> it seems the problem has gone away.
> 

Is the literal handler there for Expat 1.95?  If so, we should probably
have a #if..#endif around it for that version.

-- 
Jeremy Kloth                             Consultant
jeremy.kloth@fourthought.com             (303)583-9900 x 105
Fourthought, Inc.                        http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Fri Feb 16 20:35:41 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 16 Feb 2001 15:35:41 -0500 (EST)
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
In-Reply-To: <3A8D8C4B.3B16F3D5@fourthought.com>
References: <FAFE609CB754D311B60C0008C75D355608C90EAF@dbwdfx14.wdf.sap-ag.de>
 <3A8D8C4B.3B16F3D5@fourthought.com>
Message-ID: <14989.36637.359790.864097@cj42289-a.reston1.va.home.com>

Jeremy Kloth writes:
 > Is the literal handler there for Expat 1.95?  If so, we should probably
 > have a #if..#endif around it for that version.

  Actually, the literal handler should be there for all versions, and
the macro-ized version should be removed.  I'll get this fixed.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Fri Feb 16 20:37:08 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 16 Feb 2001 15:37:08 -0500 (EST)
Subject: [XML-SIG] broken expat module in PyXML-0.6.3
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90EAF@dbwdfx14.wdf.sap-ag.de>
References: <FAFE609CB754D311B60C0008C75D355608C90EAF@dbwdfx14.wdf.sap-ag.de>
Message-ID: <14989.36724.278682.884483@cj42289-a.reston1.va.home.com>

Radestock, Guenter writes:
 > is defined twice (from a macro and literally).  I deleted one definition
 > (the literal one at the top of the file) and it seems the problem has
 > gone away.

  Try removing the macro-ized version; it doesn't have all the
features of the first implementation.  I'll make corrections in CVS
shortly.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From larsga@garshol.priv.no  Sat Feb 17 14:04:37 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 Feb 2001 15:04:37 +0100
Subject: [XML-SIG] Roadmap document - finally!
Message-ID: <m3ofw1v216.fsf@lambda.garshol.priv.no>

After going through lots of trouble with mail servers and crashed disk
drives I've now written the roadmap document (twice) and posted it at
(once):

  <URL: http://pyxml.sourceforge.net/topics/roadmap.html >

Please have a look at it and tell me what you think. 

I haven't yet added any links to it, but will do so as soon as it is
accepted by the group.

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 15:55:58 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 16:55:58 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <m3ofw1v216.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 17 Feb 2001 15:04:37 +0100)
References: <m3ofw1v216.fsf@lambda.garshol.priv.no>
Message-ID: <200102171555.f1HFtw208907@mira.informatik.hu-berlin.de>

> Please have a look at it and tell me what you think. 

It looks good to me. On the pyexpat lexical handler: Uche already
contributed such support, which reports comments and CDATA. Do you
think you can talk pyexpat into reporting more than that? Would that
require some minimal expat version to work?
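
For reference, the two kinds of lexical events mentioned above can be
observed directly on a pyexpat parser. A small sketch in present-day
Python; the helper name and the event tuples are only for illustration:

```python
from xml.parsers import expat


def lexical_events(data):
    """Collect comment and CDATA-boundary events reported by expat."""
    events = []
    parser = expat.ParserCreate()
    parser.CommentHandler = lambda text: events.append(('comment', text))
    parser.StartCdataSectionHandler = lambda: events.append(('start-cdata',))
    parser.EndCdataSectionHandler = lambda: events.append(('end-cdata',))
    parser.Parse(data, True)
    return events


events = lexical_events(b'<doc><!-- note --><![CDATA[x < y]]></doc>')
```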

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 15:44:51 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 16:44:51 +0100
Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2
In-Reply-To: <E14U7ty-00076n-00@usw-pr-cvs1.sourceforge.net> (message from
 Lars Marius Garshol on Sat, 17 Feb 2001 05:59:54 -0800)
References: <E14U7ty-00076n-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de>

Hi Lars,

Thanks for maintaining the roadmap.

>    <li>Re-indent everything to 4-space indents

I'll do that. It is actually done for most of the code that does not
have an explicit owner, only 4DOM and xmlproc still need to go through
reindent.py.

>  <li>Move development to the PyXML CVS tree. This includes moving
>  the test suite.
>  <li>Release version 0.80 with updates for XML 1.0 2nd edition
>  compliance, better validator independence (from parser), better
>  location reporting and base sysid handling, Unicode support and
>  improved convenience APIs. It may be that there will be several
>  releases on the road to 0.80.

Can you give an estimate point in time for completion of these items?
Or perhaps just the first one?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 16:05:37 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 17:05:37 +0100
Subject: [XML-SIG] Unicode support problems in parsers
In-Reply-To: <3A8A677F.B227C56B@helsinki.fi> (message from Jere
 =?ISO-8859-1?Q?Kahanp=E4=E4?= on Wed, 14 Feb 2001 13:09:51 +0200)
References: <3A8A677F.B227C56B@helsinki.fi>
Message-ID: <200102171605.f1HG5bL09016@mira.informatik.hu-berlin.de>

> Unfortunately the default parser seems to have serious memory
> management problems: the total amount of used memory grows by 1-2
> megabytes for each processed file. A forced garbage collection (this
> is Py2.0) doesn't help at all.

pyexpat in 0.6.2 had a number of memory leaks, most of which got fixed
in 0.6.3, although some are only fixed in the CVS. So if you take the
pyexpat.c from CVS, things should look much better.

There were two problems: the SAX reader created cyclic garbage (which
it shouldn't), and pyexpat would not participate in garbage
collection, which caused cycles involving Parser objects not to be
collected.
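
The second problem can be illustrated in pure Python. In this sketch
the Parser and Handler classes are invented stand-ins: the two objects
refer to each other, so reference counting alone never frees them, and
only the cycle collector can. An extension type has to opt in to that
collector explicitly, which is the part pyexpat was missing.

```python
import gc
import weakref


class Handler:
    def __init__(self, parser):
        self.parser = parser          # handler -> parser


class Parser:
    def __init__(self):
        self.handler = Handler(self)  # parser -> handler: a reference cycle


def make_cycle():
    """Create a parser/handler cycle and return a weak reference to it."""
    return weakref.ref(Parser())


gc.disable()           # keep the automatic collector out of the demonstration
ref = make_cycle()
alive_before = ref() is not None   # still alive: refcounts never reach zero
gc.collect()                       # the cycle collector finds and frees it
alive_after = ref() is not None
gc.enable()
```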

> However, an even more serious problem was now encountered; the
> default *validating* parser returns normal Python string, while the
> default parser returns Unicode strings as any sensible
> XML-processing tool should do.

Yes, this is a known problem with xmlproc in the Python CVS, I hope
Lars Marius will contribute an updated version soon.

> This behaviour does cause any amount of trouble elsewhere in the code:
> The PrettyPrinter, for example, doesn't work at all with normal
> strings with non-ascii chars.

Which, in turn, is a bug in the pretty printer - since we are
attempting backwards compatibility with 1.5.2, it *should* support
plain strings.

> I don't have the names of the parsers with problems right here, but
> the test runs were done on a Linux box with PyXML 0.6.2.

Sorry for the inconvenience. If you need a fix right away, I suggest
you either use the PyXML CVS, or the 4Suite 0.10.2 beta, which has
many of the components updated. If you can wait somewhat longer - I
hope that I can release PyXML 0.6.4 in the near future.

Regards,
Martin


From larsga@garshol.priv.no  Sat Feb 17 16:14:13 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 Feb 2001 17:14:13 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <200102171555.f1HFtw208907@mira.informatik.hu-berlin.de>
References: <m3ofw1v216.fsf@lambda.garshol.priv.no> <200102171555.f1HFtw208907@mira.informatik.hu-berlin.de>
Message-ID: <m3g0hduw16.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| On the pyexpat lexical handler: Uche already contributed such
| support, which reports comments and CDATA. Do you think you can talk
| pyexpat into reporting more than that?

It should be able to support reporting of entity boundaries, at least.

| Would that require some minimal expat version to work?

The current expat version should be sufficient for entity boundaries.
I forget whether the LexicalHandler contains anything more. If it does
and if that requires anything special from expat I'll raise the issue
at that point. I haven't got all this stuff in my head now, so I can't
say anything more yet.

--Lars M.


From larsga@garshol.priv.no  Sat Feb 17 16:19:00 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 Feb 2001 17:19:00 +0100
Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2
In-Reply-To: <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de>
References: <E14U7ty-00076n-00@usw-pr-cvs1.sourceforge.net> <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de>
Message-ID: <m3elwxuvt7.fsf@lambda.garshol.priv.no>

Hi Martin,

* Martin v. Loewis
| 
| Thanks for maintaining the roadmap.

No problem. :)
 
* Lars Marius Garshol
|
|  <li>Re-indent everything to 4-space indents
 
* Martin v. Loewis
|
| I'll do that.

OK. Should I remove it from the list or leave it there until you've
done it?

* Lars Marius Garshol
|
| <li>Move development to the PyXML CVS tree. This includes moving
| the test suite.
| <li>Release version 0.80 with updates for XML 1.0 2nd edition
| compliance, better validator independence (from parser), better
| location reporting and base sysid handling, Unicode support and
| improved convenience APIs. It may be that there will be several
| releases on the road to 0.80.
 
* Martin v. Loewis
|
| Can you give an estimate point in time for completion of these items?
| Or perhaps just the first one?

The first one I hope to do very soon. I would have done it already had
not my laptop crashed and taken some of this work with it. As it is I
am not sure how much I need to do over, but this is the first XML-SIG
related thing I'll do[1], and it shouldn't take too long.

Getting all of version 0.80 done will take several months, I expect,
mostly because I'll be taking a lot of time off from all kinds of work.

Since I have yet to provide an accurate estimate of this kind of
thing I won't try to be more specific.

--Lars M.

[1] Provided I can resist the temptation to implement Rick Jelliffe's
    Hook schema language.


From tpassin@home.com  Sat Feb 17 16:46:54 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 17 Feb 2001 11:46:54 -0500
Subject: [XML-SIG] Roadmap document - finally!
References: <m3ofw1v216.fsf@lambda.garshol.priv.no>
Message-ID: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>

Lars Marius Garshol wrote -

>
> After going through lots of trouble with mail servers and crashed disk
> drives I've now written the roadmap document (twice) and posted it at
> (once):
>
>   <URL: http://pyxml.sourceforge.net/topics/roadmap.html >
>
> Please have a look at it and tell me what you think.
>
 Thanks, Lars, for doing this.  It's a big service.

I'd like to suggest a few things, and see what people think.  First of all, I
think we need to address testing and especially regression testing.  From
reading various posts lately, it seems like a lot of things pop up, get fixed
in some version on the cvs tree, and later on, who knows which version has
what fixed, or how to prevent it from popping up again.

We would benefit from a good test suite that is easy to run, self-evaluates
the results, contains plenty of regression tests, and makes it easy to add
tests.  Although I know that no one (including me) wants to spend time on
this, once it's accomplished, we should be able to improve the quality of the
results while spending less effort on testing and bug fixing.

I suggest we look at using pyUnit for this.  I only looked at it for a few
minutes, but it looks promising.  It might make sense to use the OASIS parser
test cases as a part of the test suite.
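
As a rough idea of what such a regression test could look like with
PyUnit (the unittest module), here is a sketch in present-day Python.
The test names are invented, and the case is modelled on the
missing-version declaration reported on this list earlier.

```python
import unittest
from xml.parsers import expat


class ExpatRegressionTest(unittest.TestCase):
    """Each fixed bug gets a test so that it cannot silently come back."""

    def parse(self, data):
        parser = expat.ParserCreate()
        parser.Parse(data, True)

    def test_declaration_without_version_is_an_error(self):
        # Must raise a normal exception rather than crash the interpreter.
        with self.assertRaises(expat.ExpatError):
            self.parse(b'<?xml encoding="iso-8859-1" ?><test></test>')

    def test_well_formed_document_parses(self):
        self.parse(b'<?xml version="1.0"?><test></test>')


def run_suite():
    """Run the tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExpatRegressionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The runner evaluates the results itself, and adding a test for a new
bug report is a matter of adding one more method.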

Second, I think the road map should include directions for future work.
What's in there now is mostly finishing up on current work.  What might we
want to get into?  One thing is to keep the standard tools up with newer
versions of existing W3C Recs.  This would include DOM 3, and the new releases
of xpath, xslt, and xpointer.  We did this for SAX2, and surely we will
want/need to do the same for the other key recs.  Let's sketch out these
intents in the Roadmap.

Next in the way of future directions would be important new Recs.  Xml Schemas
would seem to be a prime candidate.  Is anyone working or wanting to work on
py-xml-schemas?  Can we get some of Henry Thompson's code?  What about an API
for xml schemas? Can we take the lead in that? Or do we not want to (or no one
is personally interested?).  Let's get it into the Roadmap.

Then there are the non-standards things.  Is pyXml going to do anything with
RDF? Topic maps? What else?  Into the roadmap, even if there is no one to work
on such projects at the moment.

Finally, let's add some direction for some of the other efforts that keep
popping up, like miniDOM.  How will it fit into the picture?  We've been
talking about it recently.  Into the roadmap, I say!

I apologise for the length of this post, but there is a lot to think about
here!

Cheers,

Tom Passin


From uche.ogbuji@fourthought.com  Sat Feb 17 16:51:36 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 17 Feb 2001 09:51:36 -0700
Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics
 roadmap.ht,1.1,1.2
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Sat, 17 Feb 2001 16:44:51 +0100." <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de>
Message-ID: <200102171651.JAA05661@localhost.localdomain>

> Hi Lars,
> 
> Thanks for maintaining the roadmap.
> 
> >    <li>Re-indent everything to 4-space indents
> 
> I'll do that. It is actually done for most of the code that does not
> have an explicit owner, only 4DOM and xmlproc still need to go through
> reindent.py.

Our internal standard is 4-space indents as well, so 4DOM should be a simple 
enough task.

Speaking of: any point mentioning the full and permanent merging of 4DOM into 
the PyXML core?  Then again, it will happen too soon to bother adding it now.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From larsga@garshol.priv.no  Sat Feb 17 16:57:10 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 Feb 2001 17:57:10 +0100
Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics   roadmap.ht,1.1,1.2
In-Reply-To: <200102171651.JAA05661@localhost.localdomain>
References: <200102171651.JAA05661@localhost.localdomain>
Message-ID: <m3ae7luu1l.fsf@lambda.garshol.priv.no>

* Uche Ogbuji
| 
| Speaking of: any point mentioning the full and permanent merging of
| 4DOM into the PyXML core?  Then again, it will happen too soon to
| bother adding it now.

Seems like you answered your own question. :-)

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 16:46:33 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 17:46:33 +0100
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: <3A88F359.991E26FD@home.com> (message from Michael Dyck on Tue,
 13 Feb 2001 00:42:01 -0800)
References: <3A88F359.991E26FD@home.com>
Message-ID: <200102171646.f1HGkXm09267@mira.informatik.hu-berlin.de>


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 16:23:41 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 17:23:41 +0100
Subject: [XML-SIG] Bug: XML Prolog from xml.sax.writer should contain version
In-Reply-To: <FAFE609CB754D311B60C0008C75D355608C90EA8@dbwdfx14.wdf.sap-ag.de>
 (guenter.radestock@sap.com)
References: <FAFE609CB754D311B60C0008C75D355608C90EA8@dbwdfx14.wdf.sap-ag.de>
Message-ID: <200102171623.f1HGNfe09147@mira.informatik.hu-berlin.de>

> please anybody fix this on sourceforge so it will be OK in the
> next release.

This was fixed in revision 1.4 of writer.py.

Regards,
Martin


From uche.ogbuji@fourthought.com  Sat Feb 17 17:06:35 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 17 Feb 2001 10:06:35 -0700
Subject: [XML-SIG] Hook
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "17 Feb 2001 17:19:00 +0100." <m3elwxuvt7.fsf@lambda.garshol.priv.no>
Message-ID: <200102171706.KAA06679@localhost.localdomain>

> [1] Provided I can resist the temptation to implement Rick Jelliffe's
>     Hook schema language.

Ah.  You too?  I'm also quite intrigued by Hook.  Interesting to see how such 
an extremely minimalist schema language will hold up to real-world cases.

In case anyone is wondering what Hook is, here is a complete schema for XHTML 
Basic.

<hook:order
  xmlns:hook="http://www.ascc.net/xml/hook"
  targetNamespace="http://www.w3.org/1999/xhtml"
>
  html head  [ title; meta. link. base. ]   
  body [ a br. blockquote caption; div  dl; form h1; h2; h3; h4; h5; h6;  
        img. ol; p; pre; table; ul; ]  
  [ tr;  dt; dd; li;  input; label; select; textarea; ]  [ td option. ]
  [ abbr acronym address cite code dfn em kbd q samp span strong var object; ] 
  param 
</hook:order>

Me, I like.  See http://www.ascc.net/xml/hook.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 17:16:00 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 18:16:00 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <14981.36324.913804.941652@gargle.gargle.HOWL> (message from Don
 Wakefield on Sat, 10 Feb 2001 10:52:20 -0800 (PST))
References: <14980.41475.888529.565845@gargle.gargle.HOWL>
 <200102100700.f1A70X701220@mira.informatik.hu-berlin.de> <14981.36324.913804.941652@gargle.gargle.HOWL>
Message-ID: <200102171716.f1HHG0B09368@mira.informatik.hu-berlin.de>

> So my environment is fine. PyExpat.py does not import pyexpat, but I do
> in my calling test script:
> 
>   from xml.parsers import pyexpat
>   from xml.dom.ext.reader import PyExpat

That does not matter. An import is always local to the module, so if
you import it into __main__, it still won't be in PyExpat - so there
is a clear bug in PyExpat.
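
A minimal sketch of the point about module-local imports, written for a
current Python rather than 2.0. The module name "buggy" and its dumps()
helper are hypothetical stand-ins for PyExpat.py, and json merely plays
the role of pyexpat:

```python
import types

# Hypothetical stand-in for a library module that forgot its own import.
buggy = types.ModuleType("buggy")
exec("def dumps(obj):\n    return json.dumps(obj)\n", buggy.__dict__)

import json  # imported by the calling script only

try:
    buggy.dumps([1, 2])
    outcome = "worked"
except NameError:
    # The name json exists in the caller's namespace, not in buggy's.
    outcome = "NameError"

# The fix belongs inside the module itself: give it its own import.
exec("import json\n", buggy.__dict__)
fixed = buggy.dumps([1, 2])
```

The caller's import never reaches the other module's namespace, so the
only real fix is an import statement inside the module that uses the name.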

> Note that I've downloaded PyXML-0.6.3 from Sourceforge (haven't
> installed it yet) and PyExpat.py in *that* version does not import
> pyexpat either. So if you are not able to duplicate the problem with
> that version, it must be something deeper...

Sorry for the confusion. I had some 4Suite release installed, not
PyXML 0.6.3, which indeed has this bug.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 17:06:21 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 18:06:21 +0100
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: <3A88F359.991E26FD@home.com> (message from Michael Dyck on Tue,
 13 Feb 2001 00:42:01 -0800)
References: <3A88F359.991E26FD@home.com>
Message-ID: <200102171706.f1HH6Ls09345@mira.informatik.hu-berlin.de>

Hi Michael,

Thanks for your comments.

> Shouldn't the installer remove or rename the existing _xmlplus dir
> first?

Unfortunately, the installer is based on distutils, which does not
provide such a capability. Patches are welcome, of course.

> xmldoc/README says it's "v0.6.2"

Thanks, it will read 0.6.4 in the next release.

> xmldoc/README could note that if you've just run an installer,
> you don't have to do any of the "python setup.py ..." commands.

Ok, I added such a comment.

> xmldoc/test:
>     Either xmldoc/README or (new file) xmldoc/test/README should tell you
>     how to run the tests in this dir (`python testxml.py -g', I think),
>     and how to interpret what happens.  Similarly for subdirs.
>     Maybe tests should be run automatically on installation.

Not sure about that. Perhaps I should add a note that the tests should
*not* be run, unless you know what you are doing. Contributions of more
elaborate documentation would be welcome, of course.

>     I had 2 tests fail:
>     test test_sax crashed --
>         exception.SystemError : 'finally' pops bad exception

That is a serious bug of pyexpat in 0.6.3 on Windows, which basically
means that the Windows distribution is useless. It was subsequently
fixed with the pyexpat.c in the Python and PyXML CVS.

>     test test_saxdrivers crashed --
>         exceptions.IOError : [Errno url error] unknown url type: 'c'

Not sure about this one. It might be a problem with drive letters and
urllib.
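
A small sketch of how a drive letter can be mistaken for a URL scheme.
It uses urllib.parse and pathlib from a current Python, not the urllib
of Python 2.0, and the path is made up for illustration:

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit

path = r"c:\temp\doc.xml"   # hypothetical Windows file name

# Everything before the first colon is taken as the scheme, so the
# drive letter ends up looking like an unknown URL type.
scheme = urlsplit(path).scheme

# Converting the path to a file: URL first avoids the ambiguity.
file_url = PureWindowsPath(path).as_uri()
file_scheme = urlsplit(file_url).scheme
```

Whether this is what happens inside test_saxdrivers is only a guess, as
noted above; the sketch just shows that the misreading is possible.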

> xmldoc/test/dom:
>     When I tried `python test.py', I got "Error in syntax" right away.

I hope that we'll get an update to this code soon, so there is
probably no need to investigate it further.

> When I ran one of my DOM programs, I got this exception:
>         from xml.dom.Node import Node
>     ImportError: No module named Node

Yes, the xml.dom.Node module is gone. Why did you need to import it?
If it was to get at the node type constants, they live on the Node
class in the xml.dom package now.
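
As a sketch, the changed import looks like this. It is shown against the
xml.dom package of a current Python standard library, which also exposes
the constants on a Node class; the sample document is made up:

```python
# Old style, no longer available:
#     from xml.dom.Node import Node
from xml.dom import Node
from xml.dom import minidom


def is_element(node):
    """Return True if the DOM node is an element node."""
    return node.nodeType == Node.ELEMENT_NODE


doc = minidom.parseString("<a>text</a>")
root = doc.documentElement
```

Code that only reads constants such as Node.ELEMENT_NODE or
Node.TEXT_NODE needs no other change.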

> When I tried removing the ".Node" from the import statement, the
> program ran as before, so apparently that is the fix, but shouldn't
> this be noted fairly prominently in xmldoc/README or
> xmldoc/README.dom?

Contributions of documentation are welcome. I'd rather not maintain a
change log of all API changes; having the current state of the API
documented somewhere would be good, though.

> xmldoc/doc/4DOM/index.html has links to ../PACKAGES.html and
> ../README.html, which do not exist.

Again, with the next 4DOM update, this might look completely
different.

Regards,
Martin


From uche.ogbuji@fourthought.com  Sat Feb 17 17:19:09 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 17 Feb 2001 10:19:09 -0700
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: Message from "Thomas B. Passin" <tpassin@home.com>
 of "Sat, 17 Feb 2001 11:46:54 EST." <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>
Message-ID: <200102171719.KAA07503@localhost.localdomain>

> We would benefit from a good test suite that is easy to run, self-evaluates
> the results, contains plenty of regression tests, and makes it easy to add
> tests.  Although I know that no one (including me) wants to spend time on
> this, once it's accomplished, we should be able to improve the quality of the
> results while spending less effort on testing and bug fixing.
> 
> I suggest we look at using pyUnit for this.  I only looked at it for a few
> minutes, but it looks promising.  It might make sense to use the OASIS parser
> test cases as a part of the test suite.

Looks as if PyUnit is about to be elevated to The True Python Unit Testing 
System, so I guess this makes sense.

> Second, I think the road map should include directions for future work.
> What's in there now is mostly finishing up on current work.  What might we
> want to get into?  One thing is to keep the standard tools up with newer
> versions of existing W3C Recs.  This would include DOM 3,

On its way.

> and the new releases
> of xpath,

None yet.

> xslt,

I'm still conducting a Jihad against XSLT 1.1 on xsl-list (and the 
xsl-editors@w3.org list).  Hopefully I can get them to ditch xsl:script.  
Looks as if I have quite a bit of support, but who ever knows what the W3C 
will do?

> and xpointer.

4XPointer in 0.10.2 is about 90% there.  A bit of work left on points and 
ranges.

> We did this for SAX2, and surely we will
> want/need to do the same for the other key recs.  Let's sketch out these
> intents in the Roadmap.
> 
> Next in the way of future directions would be important new Recs.  Xml Schemas
> would seem to be a prime candidate.  Is anyone working or wanting to work on
> py-xml-xchemas?

Eww!  XSchemas got cooties!  I'm not touching it.  I'd rather see if Lars 
comes up with anything on Hook.

But I know, I know, someone will have to implement XSchemas for maximum Python 
Buzzworthiness.

> Can we get some of Henry Thompson's code?  What about an API
> for xml schemas? Can we take the lead in that? Or do we not want to (or no one
> is personally interested?).  Let's get it into the Roadmap.
> 
> Then there are the non-standards things.  Is pyXml going to do anything with
> RDF?

There is 4RDF.  Does PyXML really need to dupe the effort?  4RDF is a *very* 
advanced RDF implementation, even though I say so myself.

See

http://www.xml.com/2000/10/11/rdf/index.html

> Topic maps?

I think Lars and Geir are manning this fort.

> What else?  Into the roadmap, even if there is no one to work
> on such projects at the moment.

Off-head:

XQL has finally awoken from its funk
Experimental parser-level XInclude and XML:Base support maybe
A low-level Infoset API would be interesting
Schematron implemented in Python rather than XSLT
RELAX
TREX
UDDI
WebDAV client services


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Sat Feb 17 17:31:00 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 17 Feb 2001 10:31:00 -0700
Subject: [XML-SIG] 4Suite Beta 3 (pretty much release candidate)
Message-ID: <200102171731.KAA08295@localhost.localdomain>

4Suite is pretty much all done.  4SS was the reason the release didn't go out 
on Friday.  We'll be in to finish the job tomorrow.

Meanwhile, here is a version with the ODS fixes I mentioned a few days ago and 
other fixes.  The only changes I expect between this one and the final are 
l10n changes based on discussions with Martin and Alexandre, so please help us 
keep off the brown paper bag.

Thanks.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b3.tar.gz


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 17:21:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 18:21:24 +0100
Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2
In-Reply-To: <m3elwxuvt7.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 17 Feb 2001 17:19:00 +0100)
References: <E14U7ty-00076n-00@usw-pr-cvs1.sourceforge.net> <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> <m3elwxuvt7.fsf@lambda.garshol.priv.no>
Message-ID: <200102171721.f1HHLO109490@mira.informatik.hu-berlin.de>

> OK. Should I remove it from the list or leave it there until you've
> done it?

Please leave it as a reminder.

> The first one I hope to do very soon. I would have done it already had
> not my laptop crashed and taken some of this work with it. As it is I
> am not sure how much I need to do over, but this is the first XML-SIG
> related thing I'll do[1], and it shouldn't take too long.
> 
> Getting all of version 0.80 done will take several months, I expect,
> mostly because I'll be taking a lot of time off from all kinds of work.
> 
> Since I have yet to provide an accurate estimate of this kind of
> thing I won't try to be more specific.

Thanks. This is accurate enough. I'm looking forward to the
integration of the current xmlproc then, since I'd like to look into
generating Unicode strings in xmlproc myself, unless this is already done.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Sat Feb 17 17:48:38 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 17 Feb 2001 18:48:38 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>
 (tpassin@home.com)
References: <m3ofw1v216.fsf@lambda.garshol.priv.no> <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>
Message-ID: <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de>

> We would benefit from a good test suite that is easy to run, self-evaluates
> the results, contains plenty of regression tests, and makes it easy to add
> tests.  Although I know that no one (including me) wants to spend time on
> this, once it's accomplished, we should be able to improve the quality of the
> results while spending less effort on testing and bug fixing.

I'd like to point out that PyXML already has such a thing. I run it
regularly before building releases, and won't produce a release that
has new test failures. Of course, additions to this test suite are
infrequent.

> I suggest we look at using pyUnit for this.  I only looked at it for
> a few minutes, but it looks promising.  It might make sense to use
> the OASIS parser test cases as a part of the test suite.

Currently, the PyXML test suite uses regrtest for many tests; 4DOM has
its own framework. Could you please say what is wrong with these
frameworks? It seems that we don't really need a new framework; we
need more tests.

Of course, if somebody would contribute additional tests, requiring a
new framework would be acceptable if we can bundle the framework with
PyXML.

> Second, I think the road map should include directions for future work.

I'd avoid maintaining a pure wishlist. Additions to the roadmap should
include commitments of individual contributors to actually contribute;
ideally with a commitment to contribute at a specific time in the
future (which may well be several months from now).

Otherwise, people will think that they will get something soon, only
to find out that they did not get it two years from now.

> Xml Schemas would seem to be a prime candidate.  Is anyone working
> or wanting to work on py-xml-xchemas?  Can we get some of Henry
> Thompson's code?  What about an API for xml schemas? Can we take the
> lead in that? Or do we not want to (or no one is personally
> interested?).  Let's get it into the Roadmap.

These are good questions. Without answers, I'd like to avoid giving
the impression that any work on this is actually done.

E.g. if somebody stands up and offers to define an XML Schema API,
that would be a good thing to add to the roadmap, since it gives
people a contact point, and may keep discussion alive.

> Then there are the non-standards things.  Is pyXml going to do
> anything with RDF? Topic maps? What else?  Into the roadmap, even if
> there is no one to work on such projects at the moment.

Please, no. Maybe I misunderstand the purpose of this document. If so,
can you please explain what its purpose is?

> Finally, let's add some direction for some of the other efforts that keep
> popping up, like miniDOM.  How will it fit into the picture.  We've been
> talking about it recently.  Into the roadmap, I say!

I think the direction of minidom should be best documented in the
minidom documentation. If anybody can provide a specific patch against
the minidom documentation, I'm sure there is interest in discussing
that. When that is documented, it could give a clear guideline for the
maintenance of the package.

Regards,
Martin


From tpassin@home.com  Sat Feb 17 19:12:10 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 17 Feb 2001 14:12:10 -0500
Subject: [XML-SIG] Roadmap document - finally!
References: <m3ofw1v216.fsf@lambda.garshol.priv.no> <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de>
Message-ID: <003401c09915$878e8080$7cac1218@reston1.va.home.com>

Martin v. Loewis wrote

>
> I'd avoid maintaining a pure wishlist. Additions to the roadmap should
> include commitments of individual contributors to actually contribute;
> ideally with a commitment to contribute at a specific time in the
> future (which may well be several months from now).
>
> Otherwise, people will think that they will get something soon, only
> to find out that they did not get it two years from now.
...
I see the roadmap as more of a guide than a wishlist.  To the extent that "we"
have an idea of where we'd like to go, it should get into the roadmap.  If
there are some projects that have no contributor right now, the roadmap would
show that there is a hole.  The "Documentation" item in the current Roadmap is
an example.  Perhaps someone will decide to fill it.  The wish-list things I
see as different (although there is probably no clear line).  A roadmap like
this could also help people coordinate things, since some things  might need
to happen before others.

> Please, no. Maybe I misunderstand the purpose of this document. If so,
> can you please explain what its purpose is?
>

Maybe "roadmap" isn't the best term, then.  Lars might want to say what he
thought it was going to be, since he's the one who posted it.

Regards,

Tom P


From fdrake@acm.org  Sat Feb 17 19:05:38 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 17 Feb 2001 14:05:38 -0500 (EST)
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de>
References: <m3ofw1v216.fsf@lambda.garshol.priv.no>
 <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>
 <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de>
Message-ID: <14990.52098.942491.452239@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > Of course, if somebody would contribute additional tests, requiring a
 > new framework would be acceptable if we can bundle the framework with
 > PyXML.

  This might be a good time to note that some of us at Digital
Creations (mostly Martijn Pieters) have created a DOM test suite that
can test for DOM Level 1 & 2 compliance of the "Core" and "XML"
features (so far); we hope to make this a standard test for Python DOM
implementations.
  The XML crew at DC will have to talk about how to make the suite
readily available, but I hope it won't be too far off.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From uche.ogbuji@fourthought.com  Sat Feb 17 19:39:43 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sat, 17 Feb 2001 12:39:43 -0700
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: Message from "Fred L. Drake, Jr." <fdrake@acm.org>
 of "Sat, 17 Feb 2001 14:05:38 EST." <14990.52098.942491.452239@cj42289-a.reston1.va.home.com>
Message-ID: <200102171939.MAA16669@localhost.localdomain>

> 
> Martin v. Loewis writes:
>  > Of course, if somebody would contribute additional tests, requiring a
>  > new framework would be acceptable if we can bundle the framework with
>  > PyXML.
> 
>   This might be a good time to note that some of us at Digital
> Creations (mostly Martijn Pieters) have created a DOM test suite that
> can test for DOM Level 1 & 2 compliance of the "Core" and "XML"
> features (so far); we hope to make this a standard test for Python DOM
> implementations.
>   The XML crew at DC will have to talk about how to make the suite
> readily available, but I hope it won't be too far off.

Lars already has such a beast.  Does your test suite incorporate or work with 
his?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From rob@pangolin.org.uk  Sat Feb 17 19:47:49 2001
From: rob@pangolin.org.uk (rob)
Date: Sat, 17 Feb 2001 19:47:49 +0000
Subject: [XML-SIG] possible bug with xml dom events
Message-ID: <20010217194749.A7835@samantha.inRobsRoom>

hi,

I couldn't see this mentioned anywhere, so I thought I'd mention it: if you
change a CDATA node using "node.nodeValue = xxxx", no DOMCharacterDataModified
event is generated. If you do "node.data = xxxxx", the event is generated
properly.

Is this a bug or a feature? From reading the stuff at w3c I expected
"node.nodeValue = xxxx " to generate an event
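
A minimal sketch of the expected behaviour, assuming nodeValue and data
are meant to be two names for the same content. The class and its
listener list are hypothetical and are not 4DOM's actual code:

```python
class CharacterDataSketch:
    """Hypothetical node whose data and nodeValue share one setter."""

    def __init__(self, data):
        self._data = data
        self.listeners = []   # callables taking (event_type, old, new)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        old, self._data = self._data, value
        for listener in self.listeners:
            listener("DOMCharacterDataModified", old, value)

    # Alias: assigning to nodeValue goes through the same setter.
    nodeValue = data


events = []
node = CharacterDataSketch("before")
node.listeners.append(lambda kind, old, new: events.append((kind, old, new)))
node.data = "via data"
node.nodeValue = "via nodeValue"
```

With the alias in place both assignments notify the listener, which is
the behaviour expected above.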

nb I'm using Python 2.0 and PyXML 0.6.3

rob


From MichaelDyck@home.com  Sat Feb 17 21:33:24 2001
From: MichaelDyck@home.com (Michael Dyck)
Date: Sat, 17 Feb 2001 13:33:24 -0800
Subject: [XML-SIG] problems with PyXML 0.6.3
References: <3A88F359.991E26FD@home.com> <200102171706.f1HH6Ls09345@mira.informatik.hu-berlin.de>
Message-ID: <3A8EEE24.75A025C4@home.com>

"Martin v. Loewis" wrote:
> 
> Michael Dyck wrote:
> > Shouldn't the installer remove or rename the existing _xmlplus dir
> > first?
> 
> Unfortunately, the installer is based on distutils, which does not
> provide such a capability.

In that case, it might be nice for the download page (or message) to advise
the user to do it before running the installer.

> > xmldoc/test:
> >     Either xmldoc/README or (new file) xmldoc/test/README should tell you
> >     how to run the tests in this dir (`python testxml.py -g', I think),
> >     and how to interpret what happens.  Similarly for subdirs.
> >     Maybe tests should be run automatically on installation.
> 
> Not sure about that. Perhaps I should add a note that the tests should
> *not* be run, unless you know what you are doing.

Yeah, perhaps. But I think there should still be instructions somewhere.
Otherwise the only way to *become* someone who knows what they're doing wrt
tests is to read the code. Or maybe that's sufficient.

> > When I ran one of my DOM programs, I got this exception:
> >         from xml.dom.Node import Node
> >     ImportError: No module named Node
> 
> Yes, the xml.dom.Node module is gone. Why did you need to import it?
> If it was to get at the node type constants, they live on the Node
> class in the xml.dom package now.

Yup.

> > When I tried removing the ".Node" from the import statement, the
> > program ran as before, so apparently that is the fix, but shouldn't
> > this be noted fairly prominently in xmldoc/README or
> > xmldoc/README.dom?
> 
> Contributions of documentation are welcome. I'd rather not maintain a
> change log of all API changes; having the current state of the API
> documented somewhere would be good, though.

Well, it wouldn't have to be a log of *all* changes. What I'm really
concerned about are the non-backwards-compatible changes.

> > xmldoc/doc/4DOM/index.html has links to ../PACKAGES.html and
> > ../README.html, which do not exist.
> 
> Again, with the next 4DOM update, this might look completely
> different.

Will that be in PyXML 0.6.4?

-Michael


From larsga@garshol.priv.no  Sun Feb 18 15:44:32 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 18 Feb 2001 16:44:32 +0100
Subject: [XML-SIG] pysp released
Message-ID: <m3elwwhu73.fsf@lambda.garshol.priv.no>

I've now put an experimental release of pysp on

  <URL: http://www.garshol.priv.no/download/software/pysp/ >


pysp is a wrapper for the SP SGML parser which can be used to develop
SGML processing applications.

No SAX driver is provided yet, since I'm not entirely certain where to
put it. I think it should be distributed with pysp, but if there are
any other opinions on this I'd like to hear them.

--Lars M.


From larsga@garshol.priv.no  Sun Feb 18 16:29:12 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 18 Feb 2001 17:29:12 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>
References: <m3ofw1v216.fsf@lambda.garshol.priv.no> <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com>
Message-ID: <m3ae7khs4n.fsf@lambda.garshol.priv.no>

Tom: thank you for this posting. You managed to start a discussion of
lots of issues that I've wanted to see discussed for quite a while now.

* Thomas B. Passin
| 
| I'd like to suggest a few things, and see what people think.  First
| of all, I think we need to address testing and especially regression
| testing.  From reading various posts lately, it seems like a lot of
| things pop up, get fixed in some version on the cvs tree, and later
| on, who knows which version has what fixed, or how to prevent it
| from popping up again.

I certainly agree with this. As you can see from the roadmap I plan to
improve the SAX test suite to ensure that it is well tested.

xmlproc already has a good test suite. I don't believe anything more
is needed there. 

javadom has an acceptable test suite, which, BTW, can be applied to
any Python DOM implementation. Doing this might be a good idea. The
test suite could be larger, but for something as seemingly little-used
as javadom it probably is not worthwhile.
 
| I suggest we look at using pyUnit for this.  I only looked at it for
| a few minutes, but it looks promising.  It might make sense to use
| the OASIS parser test cases as a part of the test suite.
 
This is what test_javadom uses and it worked very well for that test
case. This also has the benefit that PyUnit is already in the
package. :)
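
For readers who have not seen PyUnit, here is a minimal sketch of a
self-evaluating regression test. It uses the unittest module (the name
under which PyUnit later entered the standard library) and the standard
library's XMLGenerator as a stand-in for xml.sax.writer; the test case
name is made up:

```python
import io
import unittest
from xml.sax.saxutils import XMLGenerator


class PrologRegressionTest(unittest.TestCase):
    """Guards against the XML prolog losing its version attribute."""

    def test_prolog_contains_version(self):
        out = io.StringIO()
        generator = XMLGenerator(out, encoding="utf-8")
        generator.startDocument()
        generator.endDocument()
        self.assertIn('version="1.0"', out.getvalue())


# Run the case and collect the result instead of eyeballing output.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrologRegressionTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Each fixed bug can get one such method, so a later regression shows up
as a failing test rather than as a new bug report.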

For some test suites, however, PyUnit is not suitable. The xmlproc
tests use a homespun set of scripts because most of them parse an XML
document and produce some output that is then compared with the output
from a baseline run. PyUnit is not very suitable for this. (There are
some API tests, however, that are tested with PyUnit.)
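
The baseline approach can be sketched roughly as follows. This is not
the xmlproc test scripts themselves; minidom stands in for the parser
under test, and render() and compare_with_baseline() are hypothetical
names:

```python
import difflib
from xml.dom import minidom


def render(xml_text):
    """Parse a document and return the output that gets compared."""
    return minidom.parseString(xml_text).documentElement.toxml()


def compare_with_baseline(xml_text, baseline):
    """Return diff lines; an empty list means the output is unchanged."""
    actual = render(xml_text)
    return list(difflib.unified_diff(
        baseline.splitlines(), actual.splitlines(),
        "baseline", "actual", lineterm=""))


# The baseline would normally be a file written by a known-good run.
baseline = render("<doc><p>hello</p></doc>")
unchanged = compare_with_baseline("<doc><p>hello</p></doc>", baseline)
regressed = compare_with_baseline("<doc><p>goodbye</p></doc>", baseline)
```

A real harness would loop over a directory of documents and stored
baselines; the diff lines give the tester something to read when a case
fails.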

So the question is, I guess, what is there that needs to be improved
about the current test suite? The SAX tests for sure. Do we need a
description of how to run it and how to add new tests? Does the suite
need tighter integration?

| Second, I think the road map should include directions for future
| work.  What's in there now is mostly finishing up on current work.
| What might we want to get into?  One thing is to keep the standard
| tools up with newer versions of existing W3C Recs.  This would
| include DOM 3, and the new releases of xpath, xslt, and xpointer.
| We did this for SAX2, and surely we will want/need to do the same
| for the other key recs.  Let's sketch out these intents in the
| Roadmap.

I agree with this, though I also agree with Martin that it might be
confusing if we do this. So if we do, let's make sure that the text
leaves no doubt that these are wishes for the future rather than
planned work.
 
| Next in the way of future directions would be important new Recs.
| Xml Schemas would seem to be a prime candidate.  Is anyone working
| or wanting to work on py-xml-xchemas? 

Not me. If I wanted to do something like this I'd start with Hook,
RELAX and TREX, in that order.

Other than that I agree. If we can agree that we want it it might be
useful to list it as an open task.
 
| Then there are the non-standards things.  Is pyXml going to do
| anything with RDF? Topic maps? What else?  Into the roadmap, even if
| there is no one to work on such projects at the moment.

I think RDF and topic maps are both outside the scope of the XML-SIG.
Neither is really an XML standard.
 
| Finally, let's add some direction for some of the other efforts that
| keep popping up, like miniDOM.  How will it fit into the picture.
| We've been talking about it recently.  Into the roadmap, I say!
 
If there is anything that needs to be done about minidom, then, yes, I
think it should go in.

| I apologise for the length of this post, but there is a lot to think
| about here!

There sure is. :)

--Lars M.


From larsga@garshol.priv.no  Sun Feb 18 16:34:16 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 18 Feb 2001 17:34:16 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <003401c09915$878e8080$7cac1218@reston1.va.home.com>
References: <m3ofw1v216.fsf@lambda.garshol.priv.no> <000b01c09901$3c5fc420$7cac1218@reston1.va.home.com> <200102171748.f1HHmcr09590@mira.informatik.hu-berlin.de> <003401c09915$878e8080$7cac1218@reston1.va.home.com>
Message-ID: <m38zn4hrw7.fsf@lambda.garshol.priv.no>

* Thomas B. Passin
| 
| Maybe "roadmap" isn't the best term, then.  Lars might want to say
| what he thought it was going to be, since he's the one who posted
| it.

My idea was to have a single document that people could look at to see
where XML-SIG development is headed, what is going on and what may
show up in the future. Martin is of course right that adding pure
wishlist items may turn out to be disinformation, but I guess that can
be avoided by putting a warning in the document.

--Lars M.


From larsga@garshol.priv.no  Sun Feb 18 16:45:52 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 18 Feb 2001 17:45:52 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <200102171719.KAA07503@localhost.localdomain>
References: <200102171719.KAA07503@localhost.localdomain>
Message-ID: <m37l2ohrcv.fsf@lambda.garshol.priv.no>

* Tom Passin
|
| Second, I think the road map should include directions for future work.
| What's in there now is mostly finishing up on current work.  What might we
| want to get into?  One thing is to keep the standard tools up with newer
| versions of existing W3C Recs.  This would include DOM 3,

* Uche Ogbuji
| 
| On its way.

Should I add a 4DOM section and note that DOM 3 support is in the
pipeline?
 
| But I know, I know, someone will have to implement XSchemas for
| maximum Python Buzzworthiness.

Henry Thompson has already done this for us. As far as I understand
his implementation can be used with any parser, so we may want to make
a SAX filter that can do schema validation based on his stuff. Does
anyone have opinions on this?
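
To make the idea concrete, here is a rough sketch of what a checking
SAX filter could look like. It does not use XSV; the allowed-element
check is a deliberately trivial stand-in for real schema validation, and
the class name is made up. It is written against xml.sax as shipped with
a current Python:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase


class AllowedElementsFilter(XMLFilterBase):
    """Records elements outside an allowed set, then passes events on."""

    def __init__(self, parent, allowed):
        super().__init__(parent)
        self.allowed = set(allowed)
        self.violations = []

    def startElement(self, name, attrs):
        if name not in self.allowed:
            self.violations.append(name)
        # Forward the event so downstream handlers still see everything.
        super().startElement(name, attrs)


checker = AllowedElementsFilter(xml.sax.make_parser(), {"html", "body"})
checker.parse(io.BytesIO(b"<html><body><blink/></body></html>"))
```

Because the filter sits between any SAX parser and the application's
handlers, the same approach would work with whichever parser is in use.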
 
| [Topic maps]
| I think Lars and Geir are manning this fort.

Geir Ove is probably not going to work much more on tmproc. At least
not in the near future. (He's got a commercial Java implementation to
worry about.) I'll probably have to add SAX 2.0 and XTM support at
some stage, but those holding their breaths waiting for this do so at
their peril.
 
| Off-head:
| 
| XQL has finally awoken from its funk

Would be interesting to see an implementation based on DbDom.

| Experimental parser-level XInclude and XML:Base support maybe

I would say that this belongs in SAX filters.  This is planned for
saxtools. 

| A low-level Infoset API would be interesting

Personally I would prefer to see a nice tree-based XML API. My
personal opinion is that the DOM stinks and needs replacement.  Sean
McGrath's xTree looks far better, in my opinion.

| Schematron implemented in Python rather than XSLT
| RELAX
| TREX

Yes.

| UDDI
| WebDAV client services

Maybe, though probably not in the XML-SIG package.

--Lars M.


From larsga@garshol.priv.no  Sun Feb 18 16:47:17 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 18 Feb 2001 17:47:17 +0100
Subject: [XML-SIG] Re: [XML-checkins] CVS: www/htdocs/topics roadmap.ht,1.1,1.2
In-Reply-To: <200102171721.f1HHLO109490@mira.informatik.hu-berlin.de>
References: <E14U7ty-00076n-00@usw-pr-cvs1.sourceforge.net> <200102171544.f1HFipq08749@mira.informatik.hu-berlin.de> <m3elwxuvt7.fsf@lambda.garshol.priv.no> <200102171721.f1HHLO109490@mira.informatik.hu-berlin.de>
Message-ID: <m366i8hrai.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| Please leave it as a reminder.

Me do.
 
| Thanks. This is accurate enough. I'm looking forward to the
| integration of the current xmlproc then, since I'd like to look into
| generating Unicode strings in xmlproc myself, unless this is already
| done.

It is not done, and we are now two people looking forward to this.
I've been itching to do this ever since the first Python 2.0 beta.
We'll see who gets there first. :-)

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Sun Feb 18 21:13:02 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 18 Feb 2001 22:13:02 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <m37l2ohrcv.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 18 Feb 2001 17:45:52 +0100)
References: <200102171719.KAA07503@localhost.localdomain> <m37l2ohrcv.fsf@lambda.garshol.priv.no>
Message-ID: <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de>

> Henry Thompson has already done this for us. As far as I understand
> his implementation can be used with any parser, so we may want to
> make a SAX filter that can do schema validation based on his
> stuff. Does anyone have opinions on this?

Assuming you are talking about XSV
(http://dev.w3.org/cvsweb/xmlschema/), I had a short look at this once
when I studied XPath. Unless I'm missing something obvious, it seems
that the XPath support in it is quite incomplete. E.g. where is the
evaluation of binary operators, or function calls to the builtin
functions?

Apart from that, I find the implementation strategy for XPath, well,
interesting...

I can't comment on the schema validation itself, as I don't understand
that spec at all (I haven't even read it).

> | A low-level Infoset API would be interesting
> 
> Personally I would prefer to see a nice tree-based XML API. My
> personal opinion is that the DOM stinks and needs replacement.  Sean
> McGrath's xTree looks far better, in my opinion.

XSV also has a file called XMLInfoset.py. I'm not sure how that
integrates with a parser; you may need to use LT XML.

Regards,
Martin


From Nicolas.Chauvat@logilab.fr  Mon Feb 19 09:25:59 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Mon, 19 Feb 2001 10:25:59 +0100 (CET)
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <m38zn4hrw7.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.21.0102191024590.32648-100000@aries>

On 18 Feb 2001, Lars Marius Garshol wrote:

> My idea was to have a single document that people could look at to see
> where XML-SIG development is headed, what is going on and what may
> show up in the future. Martin is of course right that adding pure
> wishlist items may turn out to be disinformation, but I guess that can
> be avoided by putting a warning in the document.

FWIW, I'm voting +1 on that.

-- 
Nicolas Chauvat

http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)


From larsga@garshol.priv.no  Mon Feb 19 10:06:03 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 19 Feb 2001 11:06:03 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de>
References: <200102171719.KAA07503@localhost.localdomain> <m37l2ohrcv.fsf@lambda.garshol.priv.no> <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de>
Message-ID: <m33ddbj8c4.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| Assuming you are talking about XSV
| (http://dev.w3.org/cvsweb/xmlschema/), 

I was.

| Apart from that, I find the implementation strategy for XPath, well,
| interesting...

How so? You've made me curious now. :)
 
| XSV also has a file called XMLInfoset.py. I'm not sure how that
| integrates with a parser; you may need to use LT XML.

It doesn't integrate directly.  XMLInfoset.py is just the data
structure. The LTXMLInfoset.py module has the code for using LTXML to
build a data structure. As far as I can tell no other parsers are
used, but it seems that layer.py is the place to look to integrate
them.

It also seems that a SAX filter may be difficult, because from what I
can tell one needs to build the entire tree before validating.

--Lars M.


From mj@digicool.com  Mon Feb 19 10:48:27 2001
From: mj@digicool.com (Martijn Pieters)
Date: Mon, 19 Feb 2001 11:48:27 +0100
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: <200102171939.MAA16669@localhost.localdomain>; from uche.ogbuji@fourthought.com on Sat, Feb 17, 2001 at 12:39:43PM -0700
References: <fdrake@acm.org> <200102171939.MAA16669@localhost.localdomain>
Message-ID: <20010219114827.B28553@zopatista.com>

--XsQoSWH+UP9D9v3l
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sat, Feb 17, 2001 at 12:39:43PM -0700, Uche Ogbuji wrote:
> >   This might be a good time to note that some of us at Digital
> >   Creations (mostly Martijn Pieters) have created a DOM test suite
> >   that can test for DOM Level 1 & 2 compliance of the "Core" and "XML"
> >   features (so far); we hope to make this a standard test for Python
> >   DOM implementations.
> >
> >   The XML crew at DC will have to talk about how to make the suite
> >   readily available, but I hope it won't be too far off.
> 
> Lars already has such a beast.  Does your test suite incorporate or work
> with his?

I cannot find any references to Lars' test suite; so I don't know if it
will work with his.

Maybe a small overview of what our suite does may help:

- We use PyUnit; the whole Zope testing framework is based on it.

- The suite tests only for DOM compliance, nothing implementation specific
  should be in there. There are some python binding tests, we may want to
  move those out.

- The tests are organized by interface; the test classes follow the same
  inheritance structure as the interfaces in the DOM specs. So the
  CDATASection interface tests inherit the Text interface tests, which in
  turn inherit the Node interface tests. This has made the tests far more
  complete.

- The test suites are further organised by feature set and compliance
  level. There are separate files for Core level 1 and Core level 2 tests,
  and the same for the XML tests. Adding tests for a different DOM feature
  is trivial.

- The "Core" feature is almost fully tested now; only some
  NO_MODIFICATION_ALLOWED and default attribute situations aren't tested
  for yet.

- The "XML" feature tests are still missing Entity and Notation Node
  tests; adding these is my next priority.

- I have made a first go at tests for the "Traversal" feature; only the
  DocumentTraversal interface is tested.

- DOMString and text manipulating interface methods are not tested beyond
  ASCII text due to an implementation limitation of ParsedXML.DOM. So the
  suite does not check whether an implementation treats text correctly
  when multi-byte UTF-16 characters are involved.

- Currently, about 650 tests will be run on a DOM supporting all the
  features we can test for.
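
The interface-based organisation described in the list above can be
sketched roughly as follows. This is only an illustration built on the
standard library's minidom and unittest for a current Python 3; the class
names and the make_node hook are made up for the example and are not
taken from the actual suite.

```python
import unittest
from xml.dom import Node, minidom


class NodeTests(unittest.TestCase):
    """Checks that apply to every Node; subclasses override make_node()."""

    def setUp(self):
        self.doc = minidom.Document()
        self.node = self.make_node()

    def make_node(self):
        # Hypothetical hook: returns the node under test.
        return self.doc.createElement("spam")

    def test_owner_document(self):
        self.assertIs(self.node.ownerDocument, self.doc)


class TextTests(NodeTests):
    """Text interface tests; the Node tests above are inherited and re-run."""

    def make_node(self):
        return self.doc.createTextNode("eggs")

    def test_data(self):
        self.assertEqual(self.node.data, "eggs")


class CDATASectionTests(TextTests):
    """CDATASection inherits both the Text and the Node tests."""

    def make_node(self):
        return self.doc.createCDATASection("eggs")

    def test_node_type(self):
        self.assertEqual(self.node.nodeType, Node.CDATA_SECTION_NODE)
```

Because each subclass only swaps the node under test, every check written
for a base interface is automatically exercised against the derived ones.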

To obtain the tests, you'll have to do a CVS checkout from cvs.zope.org:

  % cvs -d :pserver:anonymous@cvs.zope.org:/cvs-repository login
  (Logging in to anonymous@cvs.zope.org)
  CVS Password: anonymous                 # So the password is 'anonymous'

  % cvs -z7 -d :pserver:anonymous@cvs.zope.org:/cvs-repository checkout \
            -d DOMTests Products/DC/ParsedXML/test/domapi

To test a DOM implementation, you need to pass in your DOMImplementation
object, and a parsing method that will create a DOM tree for a given XML
string. The latter is used to create Notation, Entity and default Attr
Nodes, which you can't produce with the current DOM API.

I attached a sample script which tests the PyXML DOM; it assumes you made
a stand-alone checkout of the tests as described above into a DOMTests
directory on the Python path. It requires a patched PyXML that will return
true on DOMImplementation.hasFeature('Core', '2.0') (fixed in the
Fourthought CVS, I believe). See bug #132683 on SourceForge (now closed).

When running the tests, there are three that trigger an infinite loop in
the PyXML 0.6.3 suite. When a test seems to take too long, a keyboard
interrupt will cause PyUnit to skip to the next test (and log a traceback
on KeyboardInterrupt for the offending test).

-- 
Martijn Pieters
| Software Engineer  mailto:mj@digicool.com
| Digital Creations  http://www.digicool.com/
| Creators of Zope   http://www.zope.org/
---------------------------------------------

--XsQoSWH+UP9D9v3l
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="test_PyXMLDOM.py"

#!/usr/bin/env python
from xml.dom import implementation
from xml.dom.ext.reader.Sax2 import Reader
from DOMTests import DOMImplementationTestSuite

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

def Sax2ParseString(self, xml):
    file = StringIO(xml)
    return Reader().fromStream(file)

def test_suite():
    """Create a test suite for a DOM implementation."""
    return DOMImplementationTestSuite(implementation, Sax2ParseString)

if __name__ == '__main__':
    import unittest
    unittest.TextTestRunner().run(test_suite())

--XsQoSWH+UP9D9v3l--


From martin@loewis.home.cs.tu-berlin.de  Mon Feb 19 20:08:40 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Feb 2001 21:08:40 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <m33ddbj8c4.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 19 Feb 2001 11:06:03 +0100)
References: <200102171719.KAA07503@localhost.localdomain> <m37l2ohrcv.fsf@lambda.garshol.priv.no> <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> <m33ddbj8c4.fsf@lambda.garshol.priv.no>
Message-ID: <200102192008.f1JK8em01092@mira.informatik.hu-berlin.de>

> | Appart from that, I find the implementation strategy for XPath, well,
> | interesting...
> 
> How so?

Well, try to understand

  def parse(self,str):
    disjuncts=map(lambda s:string.split(s,'/'),string.split(str,'|'))
    return map(lambda d,ss=self:map(lambda p,s=ss:s.patBit(p),
                                    d),
               disjuncts)

where patBit will return things like

  return lambda e,y=None,s=self,a=part,ns=ns:s.attrs(e,a,ns,y)
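
For comparison, the splitting step can be written with explicit loops.
This is a sketch of the same idea in current Python, not a patch to XSV:
pat_bit below is a stand-in for XSV's patBit method and merely returns
the step text instead of building a matcher.

```python
def pat_bit(step):
    # Placeholder for XSV's patBit, which would build a matcher for one step.
    return step


def parse(pattern):
    """Split 'a/b|c/d' into a list of alternatives, each a list of steps."""
    alternatives = []
    for disjunct in pattern.split("|"):
        steps = [pat_bit(step) for step in disjunct.split("/")]
        alternatives.append(steps)
    return alternatives
```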

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb 19 20:59:18 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Feb 2001 21:59:18 +0100
Subject: [XML-SIG] problems with PyXML 0.6.3
In-Reply-To: <3A8EEE24.75A025C4@home.com> (message from Michael Dyck on Sat,
 17 Feb 2001 13:33:24 -0800)
References: <3A88F359.991E26FD@home.com> <200102171706.f1HH6Ls09345@mira.informatik.hu-berlin.de> <3A8EEE24.75A025C4@home.com>
Message-ID: <200102192059.f1JKxIL01502@mira.informatik.hu-berlin.de>

> In that case, it might be nice for the download page (or message) to advise
> the user to do it before running the installer.

Ok, the next release will do this in the installer.

> Yeah, perhaps. But I think there should still be instructions
> somewhere.  Otherwise the only way to *become* someone who knows
> what they're doing wrt tests is to read the code. Or maybe that's
> sufficient.

Again: contributions are welcome. I personally won't change the status
quo in this respect.

> Well, it wouldn't have to be a log of *all* changes. What I'm really
> concerned about are the non-backwards-compatible changes.

Same issue: I'd be happy if there was any documentation describing the
current API in detail; I cannot find the time to produce a detailed
report of what has changed between releases - especially if these
packages are updated by third-party contributors.

It is much easier if people that run into problems report them, and
ask for help in porting to a new release. If many people are affected,
and no easy transition is possible, API breakage should be considered
a bug and fixed in a subsequent release, instead of being documented.

> > Again, with the next 4DOM update, this might look completely
> > different.
> 
> Will that be in PyXML 0.6.4?

Probably yes; I hope that the 4DOM integration will happen RSN.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon Feb 19 21:11:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Feb 2001 22:11:47 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102102213.RAA28403@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Sat, 10 Feb 2001 17:13:23 -0500)
References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com>
Message-ID: <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de>

> > xml_dom_object = reader.fromUri(filename) #should work for either
> > URL or file

> Let's talk about this comment.  Is it really a good idea to build URL
> access right into the API here?

I can't find out whether this has been settled. Did you propose to
drop the support for URLs in the API, or the one for local files?

We just had a report where urllib apparently decided to use "c" as the
protocol name; I'm not entirely sure what the exact cause was.

> Case in point: I found this bit in saxutilx.py:
> 
>         if os.path.isfile(sysid):
>             basehead = os.path.split(os.path.normpath(base))[0]
>             source.setSystemId(os.path.join(basehead, sysid))
>             f = open(sysid, "rb")
>         else:
>             source.setSystemId(urlparse.urljoin(base, sysid))
>             f = urllib.urlopen(source.getSystemId())
> 
> Now I don't know under which circumstances this get triggered (the
> context is obscure)

prepare_input_source is invoked by every parser when processing the
argument to .parse(), so the common usage is

  p = make_parser()
  p.setContentHandler(something)
  p.parse(filename)

Instead of filename, you can have URLs, stream, and InputSource
objects (the Java API only supports InputSource here).
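
As a small illustration of passing something other than a filename, the
sketch below hands .parse() an in-memory byte stream, using the xml.sax
module of a current Python 3 standard library. TagCollector and
collect_tags are made-up names for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TagCollector(ContentHandler):
    """Records the name of every element the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def collect_tags(data):
    """Parse XML held in a bytes object; no filename or URL is involved."""
    handler = TagCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # A stream argument is used as-is, so nothing is opened by name.
    parser.parse(io.BytesIO(data))
    return handler.tags
```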

> but I'd say it's a bad idea to just try to open a URL when a string
> isn't a local file.  Maybe *you* live in a world where the network
> is "always on" (and I do too!), but for plenty of folks, it's rather
> annoying to find that their modem starts dialing out each time they
> make a typo in a filename.

But would the modem actually start dialling? Wouldn't it rather
determine that the protocol is "file" and then report that the file is
missing? So I think it would either report an unknown url type, or an
ENOENT. What kind of typo did you think of?

> The application knows this, but the library doesn't.  It's also fine
> to have an alternative API that takes a URL instead of a local
> filename -- but it's not okay to attempt to overlap the two
> namespaces.

The application can always make sure that the right thing is processed
by opening it itself, and then passing that to the parser.

Regards,
Martin


From larsga@garshol.priv.no  Mon Feb 19 21:31:27 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 19 Feb 2001 22:31:27 +0100
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: <200102192008.f1JK8em01092@mira.informatik.hu-berlin.de>
References: <200102171719.KAA07503@localhost.localdomain> <m37l2ohrcv.fsf@lambda.garshol.priv.no> <200102182113.f1ILD2U01322@mira.informatik.hu-berlin.de> <m33ddbj8c4.fsf@lambda.garshol.priv.no> <200102192008.f1JK8em01092@mira.informatik.hu-berlin.de>
Message-ID: <m37l2mz7f4.fsf@lambda.garshol.priv.no>

* Martin v. Loewis
| 
| Well, try to understand
| 
|   def parse(self,str):
|     disjuncts=map(lambda s:string.split(s,'/'),string.split(str,'|'))
|     return map(lambda d,ss=self:map(lambda p,s=ss:s.patBit(p),
|                                     d),
|                disjuncts)
| 
| where patBit will return things like
| 
|   return lambda e,y=None,s=self,a=part,ns=ns:s.attrs(e,a,ns,y)

I see what you mean.  Interesting, indeed.  :-)

--Lars M.


From guido@digicool.com  Mon Feb 19 21:49:20 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 19 Feb 2001 16:49:20 -0500
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Your message of "Mon, 19 Feb 2001 22:11:47 +0100."
 <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de>
References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com>
 <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de>
Message-ID: <200102192149.QAA24348@cj20424-a.reston1.va.home.com>

> > > xml_dom_object = reader.fromUri(filename) #should work for either
> > > URL or file
> 
> > Let's talk about this comment.  Is it really a good idea to build URL
> > access right into the API here?
> 
> I can't find out whether this has been settled. Did you propose to
> drop the support for URLs in the API, or the one for local files.

I'd like to drop support for URLs; I don't think the typical computer
is sufficiently networked to make this work well.

> We just had a report where urllib apparently decided to use "c" as the
> protocol name; I'm not entirely sure what the exact cause was.

That's the ambiguity between local filenames and URLs.  You have to
decide whether filenames passed to APIs are in local filename space or
in URL space, and not try to guess based on what the name looks like.
On the Mac, all absolute filenames look like foo:bar or
foo:bar:bletch, so there you have even less to work with.
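
The "c" protocol report is easy to reproduce with a generic URL parser.
A minimal sketch, assuming a current Python 3 where the parser lives in
urllib.parse; guessed_scheme is a made-up helper name.

```python
from urllib.parse import urlparse


def guessed_scheme(name):
    """Return the URL scheme a generic URL parser sees in name ('' if none)."""
    return urlparse(name).scheme


# A Windows drive letter and a Mac-style path both look like URL schemes.
print(guessed_scheme(r"c:\temp\doc.xml"))   # c
print(guessed_scheme("foo:bar:bletch"))     # foo
print(guessed_scheme("doc.xml"))            # (empty string)
```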

> > Case in point: I found this bit in saxutilx.py:
> > 
> >         if os.path.isfile(sysid):
> >             basehead = os.path.split(os.path.normpath(base))[0]
> >             source.setSystemId(os.path.join(basehead, sysid))
> >             f = open(sysid, "rb")
> >         else:
> >             source.setSystemId(urlparse.urljoin(base, sysid))
> >             f = urllib.urlopen(source.getSystemId())
> > 
> > Now I don't know under which circumstances this get triggered (the
> > context is obscure)
> 
> prepare_input_source is invoked by every parser when processing the
> argument to .parse(), so the common usage is
> 
>   p = make_parser()
>   p.setContentHandler(something)
>   p.parse(filename)
> 
> Instead of filename, you can have URLs, stream, and InputSource
> objects (the Java API only supports InputSource here).

I would suggest to have separate APIs depending on the argument type,
e.g. p.parseFile(filename), p.parseURL(url),
p.parseStream(InputSource), p.parseString(text).  (And no, Java
overloading wouldn't help much here, since three out of four APIs have
string arguments.)
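
A rough sketch of what such separate entry points could look like as thin
wrappers over xml.sax in a current Python 3. The function names
parse_file, parse_url and parse_string are hypothetical and follow the
suggestion above; they are not part of PyXML or the standard library.

```python
import xml.sax
from urllib.request import urlopen


def parse_file(filename, handler):
    """Treat filename strictly as a local path; never as a URL."""
    with open(filename, "rb") as stream:
        xml.sax.parse(stream, handler)


def parse_url(url, handler):
    """Treat url strictly as a URL; this may touch the network."""
    with urlopen(url) as stream:
        xml.sax.parse(stream, handler)


def parse_string(text, handler):
    """Parse XML text (str or bytes) that is already in memory."""
    xml.sax.parseString(text, handler)
```

Since each wrapper opens its input itself, a mistyped filename fails in
open() with an ordinary file error and is never retried as a URL.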

> > but I'd say it's a bad idea to just try to open a URL when a string
> > isn't a local file.  Maybe *you* live in a world where the network
> > is "always on" (and I do too!), but for plenty of folks, it's rather
> > annoying to find that their modem starts dialing out each time they
> > make a typo in a filename.
> 
> But would the modem actually start dialling? Wouldn't it rather
> determine that the protocol is "file" and the report that the file is
> missing? So I think it would either report an unknown url type, or an
> ENOENT. What kind of typo did you think of?

Maybe I was thinking of another case (not involving PyXML) that was
reported to me third hand, where a filename containing a colon on
Windows (using Cygwin tools) ended up being interpreted as Unix rcp
filename syntax, and the system was doing a host lookup on the part
before the colon -- that really does make the modem dial!

> > The application knows this, but the library doesn't.  It's also fine
> > to have an alternative API that takes a URL instead of a local
> > filename -- but it's not okay to attempt to overlap the two
> > namespaces.
> 
> The application can always make sure that the right thing is processed
> by opening it itself, and then passing that to the parser.

Sure, and if a string is given, it should be assumed to be a local
filename unless the API name has "URL" in it.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From uche.ogbuji@fourthought.com  Mon Feb 19 22:06:44 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 19 Feb 2001 15:06:44 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Guido van Rossum <guido@digicool.com>
 of "Mon, 19 Feb 2001 16:49:20 EST." <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
Message-ID: <200102192206.PAA15107@localhost.localdomain>

> > > > xml_dom_object = reader.fromUri(filename) #should work for either
> > > > URL or file
> > 
> > > Let's talk about this comment.  Is it really a good idea to build URL
> > > access right into the API here?
> > 
> > I can't find out whether this has been settled. Did you propose to
> > drop the support for URLs in the API, or the one for local files.
> 
> I'd like to drop support for URLs; I don't think the typical computer
> is sufficiently networked to make this work well.

In this case, the typical computer user will have a great deal of trouble 
using any XML application in any language.  Almost all of them use URIs as 
basis, and for good reason.  Special support for local files is almost 
universally a mere convenience.

Most XML processing specifications mandate that the URI of the XML entity that 
contains an infoset node is used as the basis for further processing.  To me, 
this argues strongly for dropping local files rather than URIs if we must 
choose.  Some XML specs would be very difficult to implement properly if the 
low-level tools became file-system-only readers.

> > We just had a report where urllib apparently decided to use "c" as the
> > protocol name; I'm not entirely sure what the exact cause was.
> 
> That's the ambiguity between local filenames and URLs.  You have to
> decide whether filenames passed to APIs are in local filename space or
> in URL space, and not try to guess based on what the name looks like.
> On the Mac, all absolute filenames look like foo:bar or
> foo:bar:bletch, so there you have even less to work with.

The Mac people should have spoken to the IETF a decade ago when URLs emerged, 
or a bit later when URIs came out.  I suspect, again, that if this is the case, 
they suffer much more pain in XML processing than is inflicted on them by 
PyXML.

> > > Case in point: I found this bit in saxutilx.py:
> > > 
> > >         if os.path.isfile(sysid):
> > >             basehead = os.path.split(os.path.normpath(base))[0]
> > >             source.setSystemId(os.path.join(basehead, sysid))
> > >             f = open(sysid, "rb")
> > >         else:
> > >             source.setSystemId(urlparse.urljoin(base, sysid))
> > >             f = urllib.urlopen(source.getSystemId())
> > > 
> > > Now I don't know under which circumstances this get triggered (the
> > > context is obscure)
> > 
> > prepare_input_source is invoked by every parser when processing the
> > argument to .parse(), so the common usage is
> > 
> >   p = make_parser()
> >   p.setContentHandler(something)
> >   p.parse(filename)
> > 
> > Instead of filename, you can have URLs, stream, and InputSource
> > objects (the Java API only supports InputSource here).
> 
> I would suggest to have separate APIs depending on the argument type,
> e.g. p.parseFile(filename), p.parseURL(url),
> p.parseStream(InputSource), p.parseString(text).  (And no, Java
> overloading wouldn't help much here, since three out of four APIs have
> string arguments.)

Sure, one can add a parseFile, but what do you do with

<?xml version='1.0'?>
<!DOCTYPE spam [
  <!ENTITY foo SYSTEM 'foo.bar'>
]>
<spam>&foo;</spam>

URI or file?

Note that this is a trick question, and the "trick" is *exactly* my point.

> > > but I'd say it's a bad idea to just try to open a URL when a string
> > > isn't a local file.  Maybe *you* live in a world where the network
> > > is "always on" (and I do too!), but for plenty of folks, it's rather
> > > annoying to find that their modem starts dialing out each time they
> > > make a typo in a filename.
> > 
> > But would the modem actually start dialling? Wouldn't it rather
> > determine that the protocol is "file" and the report that the file is
> > missing? So I think it would either report an unknown url type, or an
> > ENOENT. What kind of typo did you think of?
> 
> Maybe I was thinking of another case (not involving PyXML) that was
> reported to me third hand, where a filename containing a colon on
> Windows (using Cygwin tools) ended up being interpreted as Unix rcp
> filename syntax, and the system was doing a host lookup on the part
> before the colon -- that really does make the modem dial!

Yes, but that does sound like a bug elsewhere.

> > > The application knows this, but the library doesn't.  It's also fine
> > > to have an alternative API that takes a URL instead of a local
> > > filename -- but it's not okay to attempt to overlap the two
> > > namespaces.
> > 
> > The application can always make sure that the right thing is processed
> > by opening it itself, and then passing that to the parser.
> 
> Sure, and if a string is given, it should be assumed to be a local
> filename unless the API name has "URL" in it.

It's not all that easy, as evidenced by my example above.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Feb 19 22:22:15 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 19 Feb 2001 15:22:15 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Uche Ogbuji <uche.ogbuji@fourthought.com>
 of "Mon, 19 Feb 2001 15:06:44 MST." <200102192206.PAA15107@localhost.localdomain>
Message-ID: <200102192222.PAA16128@localhost.localdomain>

> Sure, one can add a parseFile, but what do you do with
> 
> <?xml version='1.0'?>
> <!DOCTYPE spam [
>   <!ENTITY foo SYSTEM 'foo.bar'>
> ]>
> <spam>&foo;</spam>
> 
> URI or file?
> 
> Note that this is a trick question, and the "trick" is *exactly* my point.

On re-reading, it seems as if I'm trying to be coy, but I'm not.

My point is that "foo.bar" must be evaluated against the base URI of the 
entity in which it is contained.  Here we have no choice of letting the user 
say "parseFile" or "parseUri".

The same trap is all over the place:

<foo xml:base='python.org'/>

<xsl:import href='style-lib.xslt'/>

<xsl:include href='cool-template.xslt'/>

<spam>
<xinclude:include href='wannabe-entity.xml'/>
</spam>

<rdf:Description about='inline-web-page' dc:creator='Uche Ogbuji'/>

<mylink xlink:type="simple"
        xlink:href="destination.xml"
        xlink:actuate="onLoad"
        xlink:show="replace"
        attribute="value"
/>

Basically, if you want to play with XML, you have to play with URI.  There's 
not much for it.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From guido@digicool.com  Mon Feb 19 22:34:16 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 19 Feb 2001 17:34:16 -0500
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Your message of "Mon, 19 Feb 2001 15:06:44 MST."
 <200102192206.PAA15107@localhost.localdomain>
References: <200102192206.PAA15107@localhost.localdomain>
Message-ID: <200102192234.RAA24747@cj20424-a.reston1.va.home.com>

> > I'd like to drop support for URLs; I don't think the typical computer
> > is sufficiently networked to make this work well.
> 
> In this case, the typical computer user will have a great deal of trouble 
> using any XML application in any language.  Almost all of them use URIs as 
> basis, and for good reason.  Special support for local files are almost 
> universally a mere convenience.
> 
> Most XML processing specifications mandate that the URI of the XML
> entity that contains an infoset node is used as the basis for
> further processing.  To me, this argues strongly for dropping local
> files rather than URIs if we must choose.  Some XML specs would be
> very difficult to implement properly if the low-level tools became
> file-system-only readers.

Can you give more details of how this is used?  I've got very limited
XML experience, and so far it all falls in the category of "here's a
file; give me a DOM tree for it" or "here's a DOM tree, write it to a
file".  There are no URLs anywhere.  Sometimes instead of a file it'll
be text data read from or written to a database.  But no URLs.

> The Mac people should have spoken to the IETF a decade ago when URLs
> emerged, or a bit later when URIs came out.  I suspect, again that
> if this is the case, they suffer much more pain in XML processing
> than is inflicted on them by PyXML.

That's a pretty intolerant attitude you're displaying there.  They
need not suffer at all if at all times it is clear whether a name is a
URL or a filename.  It's trying to fold the two namespaces into one
that I'm fighting here.

> > I would suggest to have separate APIs depending on the argument type,
> > e.g. p.parseFile(filename), p.parseURL(url),
> > p.parseStream(InputSource), p.parseString(text).  (And no, Java
> > overloading wouldn't help much here, since three out of four APIs have
> > string arguments.)
> 
> Sure, one can add a parseFile, but what do you do with
> 
> <?xml version='1.0'?>
> <!DOCTYPE spam [
>   <!ENTITY foo SYSTEM 'foo.bar'>
> ]>
> <spam>&foo;</spam>
> 
> URI or file?
> 
> Note that this is a trick question, and the "trick" is *exactly* my point.

So explain the trick.  I don't know enough XML to understand what it
means.  I don't even know which thing you are asking about!  spam?
foo?  foo.bar?  &foo;?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Mon Feb 19 22:38:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 19 Feb 2001 23:38:24 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102192206.PAA15107@localhost.localdomain> (message from Uche
 Ogbuji on Mon, 19 Feb 2001 15:06:44 -0700)
References: <200102192206.PAA15107@localhost.localdomain>
Message-ID: <200102192238.f1JMcOF06853@mira.informatik.hu-berlin.de>

> Most XML processing specifications mandate that the URI of the XML
> entity that contains an infoset node is used as the basis for
> further processing.

I agree. The XML recommendation is quite clear about this:

# The SystemLiteral is called the entity's system identifier. It is a
# URI, which may be used to retrieve the entity.

So in XML, a system identifier is an URI, even though in SGML, it is
system dependent (as the name suggests). It goes on

# Unless otherwise provided by information outside the scope of this
# specification (...), relative URIs are relative to the location of
# the resource within which the entity declaration occurs. A URI might
# thus be relative to the document entity, to the entity containing
# the external DTD subset, or to some other external parameter entity.

So if a document was downloaded from
http://www.python.org/xml/foo.xml, and you encounter a system identifier
of "../bar/bar.dtd", it MUST be interpreted as
http://www.python.org/bar/bar.dtd.
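
The resolution rule can be checked with urljoin from a current Python 3
standard library. A minimal sketch; resolve_system_id is a made-up name.

```python
from urllib.parse import urljoin


def resolve_system_id(base_uri, system_id):
    """Resolve a (possibly relative) system identifier against its base URI."""
    return urljoin(base_uri, system_id)


print(resolve_system_id("http://www.python.org/xml/foo.xml", "../bar/bar.dtd"))
# http://www.python.org/bar/bar.dtd
```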

Regards,
Martin


From uche.ogbuji@fourthought.com  Mon Feb 19 22:48:04 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 19 Feb 2001 15:48:04 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from Guido van Rossum <guido@digicool.com>
 of "Mon, 19 Feb 2001 17:34:16 EST." <200102192234.RAA24747@cj20424-a.reston1.va.home.com>
Message-ID: <200102192248.PAA17821@localhost.localdomain>

> > > I'd like to drop support for URLs; I don't think the typical computer
> > > is sufficiently networked to make this work well.
> > 
> > In this case, the typical computer user will have a great deal of trouble 
> > using any XML application in any language.  Almost all of them use URIs as 
> > basis, and for good reason.  Special support for local files are almost 
> > universally a mere convenience.
> > 
> > Most XML processing specifications mandate that the URI of the XML
> > entity that contains an infoset node is used as the basis for
> > further processing.  To me, this argues strongly for dropping local
> > files rather than URIs if we must choose.  Some XML specs would be
> > very difficult to implement properly if the low-level tools became
> > file-system-only readers.
> 
> Can you give more details of how this is used?  I've got very limited
> XML experience, and so far it all falls in the category of "here's a
> file; give me a DOM tree for it" or "here's a DOM tree, write it to a
> file".  There are no URLs anywhere.  Sometimes instead of a file it'll
> be text data read from or written to a database.  But no URLs.

Sorry.

Basically, it's what you do with the DOM, and especially how attributes, 
system identifiers and other such creatures are interpreted.

Basically, calling parseFile or parseUri on the top-level document is typically 
only a small part of the usage pattern in any XML processor.  Other functions 
such as stylesheet processing, XIncludes, xml:base, RDF, and pretty much 
anything else get these strings and are *required* to interpret them as URIs.

If they were originally interpreted purely as files, then all the points of 
confusion you pointed out are immediately compounded as the system tries to 
reconcile the relative URIs against the "base URI" which is actually a file 
system file.

This is actually a problem that I have seen people run into far more often 
than any worries about computers not having network connections.  I've been 
sorely tempted to remove file support just because it eliminates confusion 
with the large body of XML processing that requires relative URI normalization 
and resolution.
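
The confusion can be illustrated with a minimal sketch, assuming modern
Python 3 and its urllib.parse module (not the urlparse module of this era).
The helper name resolve and all paths and host names are hypothetical:

```python
from urllib.parse import urljoin, urlsplit


def resolve(base, reference):
    """Resolve a possibly relative reference against a base, using URI rules."""
    return urljoin(base, reference)


# With a real URI as the base, relative references resolve as expected.
http_base = "http://example.org/books/hub.xml"      # hypothetical location
print(resolve(http_base, "chapter1.xml"))
print(resolve(http_base, "../dtd/book.dtd"))

# A bare Windows path is not a URI: the drive letter is parsed as if it
# were a URI scheme, so URI resolution has nothing sensible to work with.
windows_path = r"C:\books\hub.xml"                  # hypothetical path
print(urlsplit(windows_path).scheme)

# Expressing the same file as a file: URI first removes the ambiguity.
file_base = "file:///C:/books/hub.xml"
print(resolve(file_base, "chapter1.xml"))
```

The point of the sketch is only that the base must already be a URI before
any relative reference is resolved against it.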

> > The Mac people should have spoken to the IETF a decade ago when URLs
> > emerged, or a bit later when URIs came out.  I suspect, again that
> > if this is the case, they suffer much more pain in XML processing
> > than is inflicted on them by PyXML.
> 
> That's a pretty intolerant attitude you're displaying there.  They
> need not suffer at all if at all times it is clear whether a name is a
> URL or a filename.  It's trying to fold the two namespaces into one
> that I'm fighting here.

Not my intention.  My point is that I can't imagine PyXML is an outstanding 
problem for XML developers on a platform that uses colons as path separators.

It's a purely technical argument.  I don't know a thing about the Mac.

> > > I would suggest to have separate APIs depending on the argument type,
> > > e.g. p.parseFile(filename), p.parseURL(url),
> > > p.parseStream(InputSource), p.parseString(text).  (And no, Java
> > > overloading wouldn't help much here, since three out of four APIs have
> > > string arguments.)
> > 
> > Sure, one can add a parseFile, but what do you do with
> > 
> > <?xml version='1.0'?>
> > <!DOCTYPE spam [
> >   <!ENTITY foo SYSTEM 'foo.bar'>
> > ]>
> > <spam>&foo;</spam>
> > 
> > URI or file?
> > 
> > Note that this is a trick question, and the "trick" is *exactly* my point.
> 
> So explain the trick.  I don't know enough XML to understand what it
> means.  I don't even know which thing you are asking about!  spam?
> foo?  foo.bar?  &foo;?


"foo.bar".  I think I explained it better in my succeeding message.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From larsga@garshol.priv.no  Mon Feb 19 22:49:03 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 19 Feb 2001 23:49:03 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com>               <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
Message-ID: <m3wvamxp9c.fsf@lambda.garshol.priv.no>

* Guido van Rossum
| 
| I'd like to drop support for URLs; I don't think the typical
| computer is sufficiently networked to make this work well.

Dropping support for URLs is not really an option when dealing with XML.
The XML recommendation states clearly that all system identifiers[1]
are URIs in XML.

What this really means is that we have two cases to deal with:

  - XML software is provided a reference to an XML document
  - XML document references as used internally by XML software and
    also as passed back out to client software

In the second case the references must be URIs, since it is a
deep-seated assumption in the entire XML family of specifications that
all such references will be URIs. This is especially clear in the case
of entity references (as Uche illustrated), but most other XML
specifications are equally clear on this point, such as the infoset,
XSLT, XBase and so on.

Of course, in the first case there is no reason why it shouldn't be
allowed to pass file names into the APIs to have them converted into
URIs there. In fact, I think there is very good reason to do so, since
my experience with the Java tools that require URIs has been fairly
painful. (Who remembers the precise syntax for file URIs on all kinds
of platforms anyway?)
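
Converting a file name into a URI inside the API need not burden the
caller.  A minimal sketch, assuming modern Python 3 and its pathlib module;
the helper name filename_to_uri is hypothetical and not part of PyXML:

```python
from pathlib import Path


def filename_to_uri(filename):
    """Turn a local file name (relative or absolute) into a file: URI."""
    # resolve() makes the path absolute; as_uri() applies the platform's
    # file: URI syntax and percent-escapes characters such as spaces.
    return Path(filename).resolve().as_uri()


# The file does not need to exist for the conversion to work.
uri = filename_to_uri("my book.xml")
print(uri)
```

The exact output depends on the current directory and the platform, which
is rather the point: the library, not the user, remembers the syntax.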

Outlawing URIs, however, is not really an option.

[1] What most people would call 'references to external resources',
    usually files.

| I would suggest to have separate APIs depending on the argument
| type, e.g. p.parseFile(filename), p.parseURL(url),
| p.parseStream(InputSource), p.parseString(text).

That may be a better option than to have a single function/method, but
that is really separate from the issue of whether to allow URIs or
not.

--Lars M.


From larsga@garshol.priv.no  Mon Feb 19 23:03:51 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Feb 2001 00:03:51 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102192234.RAA24747@cj20424-a.reston1.va.home.com>
References: <200102192206.PAA15107@localhost.localdomain> <200102192234.RAA24747@cj20424-a.reston1.va.home.com>
Message-ID: <m3u25qxoko.fsf@lambda.garshol.priv.no>

* Uche Ogbuji
|
| Most XML processing specifications mandate that the URI of the XML
| entity that contains an infoset node is used as the basis for
| further processing.  To me, this argues strongly for dropping local
| files rather than URIs if we must choose.  Some XML specs would be
| very difficult to implement properly if the low-level tools became
| file-system-only readers.

* Guido van Rossum
| 
| Can you give more details of how this is used? 

The simplest example is perhaps

  <!DOCTYPE doc [
    <!ENTITY chapter1 SYSTEM "chapter1.xml">
    <!ENTITY chapter2 SYSTEM "chapter2.xml">
    <!ENTITY chapter3 SYSTEM "chapter3.xml">
    <!-- ... -->
  ]>
  <doc>
  <title>The Meaning of Life</title>

  <part><title>Life, the Universe and Everything</title>
  &chapter1;
  &chapter2;
  &chapter3;
  <!-- ... -->
  </part>
  </doc>

This XML document is really a hub document for a book, which contains
metadata about the book (the title), the part structure and references
to each chapter. The chapters, however, reside in files.  

The XML recommendation says clearly that the bit after the 'SYSTEM'
must be a URI, and that it is turned into an absolute URI by being
resolved against the base URI of the document.

With the XML Base specification you can put attributes named
'xml:base' into your documents to locally change the base URI in a
part of the document. This then interacts with other XML
specifications that allow URI references to appear in the contents of
the document. The XML syntax for topic maps is one example of this.
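
A minimal sketch of the resolution described above, assuming modern Python 3
and urllib.parse.  The document URI, the xml:base value and the helper name
effective_base are hypothetical, and the sketch ignores everything about
XML Base except the nesting of base URIs:

```python
from urllib.parse import urljoin


def effective_base(document_uri, xml_base_values):
    """Apply nested xml:base values, outermost first, to the document URI."""
    base = document_uri
    for value in xml_base_values:
        # An xml:base value may itself be relative to the enclosing base.
        base = urljoin(base, value)
    return base


document_uri = "http://example.org/books/hub.xml"   # hypothetical

# The system identifier "chapter1.xml" from the hub document's DOCTYPE
# is resolved against the base URI of the document that declares it.
chapter_uri = urljoin(document_uri, "chapter1.xml")
print(chapter_uri)

# A URI reference in the content of an element carrying xml:base="part2/"
# is resolved against the locally changed base instead.
local_base = effective_base(document_uri, ["part2/"])
figure_uri = urljoin(local_base, "figure.png")
print(figure_uri)
```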

This does not mean that we can't have a parseFile method, but that the
file name given must be converted into a URI before the XML system
starts using it.

| I've got very limited XML experience, and so far it all falls in the
| category of "here's a file; give me a DOM tree for it" or "here's a
| DOM tree, write it to a file".  There are no URLs anywhere.
| Sometimes instead of a file it'll be text data read from or written
| to a database.  But no URLs.

That is probably the most common use case in the near future, but not
everyone uses XML like that and the entire family of standards assumes
that the basic framework is that of the web. 

Quite a few XML applications work across the network and really rely
on it being possible to parse remote documents (RSS perhaps being the
most famous), and I think this will only be more common in the future.
And in any case it works just fine already. :-)
 
[larsga@pc36 dist]$ python xvcmd.py http://www.w3.org/TR/2000/REC-xml-20001006.xml 
xmlproc version 0.70

Parsing 'http://www.w3.org/TR/2000/REC-xml-20001006.xml'
W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:2:9: Attribute 'id'
defined more than once
W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:3:9: Attribute 'role'
defined more than once
W:http://www.w3.org/XML/1998/06/xmlspec-v21.dtd:414:9: Attribute
'diff' defined more than once
E:http://www.w3.org/TR/2000/REC-xml-20001006.xml:2816:76: Actual value
of attribute 'xmlns:xlink' does not match fixed value

Parse complete, 1 error(s) and 3 warning(s)


--Lars M.


From tpassin@home.com  Mon Feb 19 23:27:20 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 19 Feb 2001 18:27:20 -0500
Subject: [XML-SIG] Using PyExpat.py
References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com>              <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de>  <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
Message-ID: <002801c09acb$823dca20$7cac1218@reston1.va.home.com>

This discussion highlights why I've said several times that you should use
file:/// if you mean a file on your local machine.  I've used a few
commandline tools where you actually had to write that (I forget which ones).
I was annoyed at first, but soon got used to it.  As soon as you do insist on
using file:///, distinctions about local files go away, and it becomes the
responsibility of the url handler code to figure out where to go to get that
particular resource.  Also, you can get files on network file systems with no
extra work, as in file://yourcomputer/...

It's a convenience to let the code try to figure it out from a bare filename.
But all that code should do is to translate a bare absolute local file
reference to the file:/// scheme, then hand it off.
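
The "translate, then hand off" step might look like the following minimal
sketch, assuming modern Python 3.  The function name normalize_reference is
hypothetical, and treating a one-letter scheme as a drive letter is an
assumption of this sketch rather than a rule from any specification:

```python
import os
from pathlib import Path
from urllib.parse import urlsplit


def normalize_reference(ref):
    """Return a URI for ref: URIs pass through, absolute paths become file: URIs."""
    scheme = urlsplit(ref).scheme
    if len(scheme) > 1:
        # Looks like a real URI scheme (http, file, ftp, ...): hand it off
        # unchanged.  A one-letter "scheme" is assumed to be a drive letter.
        return ref
    if os.path.isabs(ref):
        # A bare absolute local file name: translate to the file: scheme.
        return Path(ref).as_uri()
    # Anything else is a relative name and needs a base URI to mean anything.
    raise ValueError("relative reference needs a base URI: %r" % ref)


print(normalize_reference("http://example.org/doc.xml"))
print(normalize_reference(os.path.abspath("doc.xml")))
```

After this step the URL handler only ever sees URIs, so it alone decides
where to fetch the resource from.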

Cheers,

Tom P


From fredrik@effbot.org  Mon Feb 19 23:38:04 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Tue, 20 Feb 2001 00:38:04 +0100
Subject: [XML-SIG] Using PyExpat.py
References: <200102192206.PAA15107@localhost.localdomain>
Message-ID: <00ba01c09acd$020db5c0$e46940d5@hagrid>

Uche Ogbuji wrote:
> > > I can't find out whether this has been settled. Did you propose to
> > > drop the support for URLs in the API, or the one for local files.
> > 
> > I'd like to drop support for URLs; I don't think the typical computer
> > is sufficiently networked to make this work well.
> 
> In this case, the typical computer user will have a great deal of trouble 
> using any XML application in any language.  Almost all of them use URIs as 
> basis, and for good reason.  Special support for local files are almost 
> universally a mere convenience.
> 
> Most XML processing specifications mandate that the URI of the XML entity that 
> contains an infoset node is used as the basis for further processing.  To me, 
> this argues strongly for dropping local files rather than URIs if we must 
> choose.  Some XML specs would be very difficult to implement properly if the 
> low-level tools became file-system-only readers.

is the code Guido quoted taken from a utility function (e.g. a standard
input handler), or is it part of the core library:

        if os.path.isfile(sysid):
            basehead = os.path.split(os.path.normpath(base))[0]
            source.setSystemId(os.path.join(basehead, sysid))
            f = open(sysid, "rb")
        else:
            source.setSystemId(urlparse.urljoin(base, sysid))
            f = urllib.urlopen(source.getSystemId())

if the latter, I hope you realize that this can be abused in all sorts of
interesting ways...
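
One way to read this concern (an interpretation, not something stated in
the message itself): a document from an untrusted source can name any local
file or URL as an external entity, and a resolver like the one quoted will
open it.  A minimal sketch, assuming modern Python 3 and the standard
xml.sax module rather than the PyXML code under discussion, of parsing with
external general entities switched off.  The class and function names and
the entity target are hypothetical:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    """Parse xml_bytes, refusing to fetch external general entities."""
    parser = xml.sax.make_parser()
    # With this feature off, the parser does not open the system identifier.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


DOC = b"""<?xml version='1.0'?>
<!DOCTYPE spam [
  <!ENTITY foo SYSTEM 'file:///etc/passwd'>
]>
<spam>before &foo; after</spam>"""

text = parse_without_external_entities(DOC)
print(repr(text))
```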

Cheers /F


From uche.ogbuji@fourthought.com  Mon Feb 19 23:40:43 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 19 Feb 2001 16:40:43 -0700
Subject: [XML-SIG] ANN: 4Suite and 4Suite Server 0.10.2
Message-ID: <200102192340.QAA21305@localhost.localdomain>

Fourthought, Inc. (http://Fourthought.com) announces the release of

                 4Suite 0.10.2 and 4Suite Server 0.10.2
                      ----------------------------
          Open source XML processing tools and an XML data server

                           http://4Suite.org
                  http://Fourthought.com/4SuiteServer


4Suite News
-----------

	* ODS: optimized back end
	* ODS: Better collection support
	* ODS: DBM and Oracle driver fixes
	* XSLT: format-number overhaul
	* XPath: C boolean extension implemented for performance
	* XPath: Added extension functs search-re, base-uri
	* RDF: serialization fixes
	* RDF: shelve (DBM) driver
	* Localization support
	* Friendlier error messages
	* URI handling fixes
	* Many misc bug-fixes

4Suite Server News
------------------

	* Many usability improvements
	* omniNotify: Removed our implementation of Event Channel and
	              replaced with omniNotify
	* TxFactory: Rewrote to avoid common race conditions
	* Strobe: (formerly Reaper) Added a test harness
	* UserServer:  Moved many user specific things out of the common IDL
	* UserServer:  Added a test harness
	* RdfServer:  Now uses system exceptions for common exception cases.
	* RdfServer:  Added a test harness
	* XmlServer:  Allow Raw files
	* XmlServer:  Now uses the standard system exceptions
	* XmlServer:  Added a proper test harness
	* XmlServer:  Added XSLT-based API to 4SS
	* MetaUserServer:  Completed the implementation
	* MetaUserServer:  Added a proper test harness
	* MetaXmlServer:  Completed the implementation
	* MetaXmlServer:  Added a proper test harness
	* HTTPListener:  Added a test harness
	* HTTPListener:  XSLT support
	* HTTPListener:  Custom handler support
	* webDAV:  Incorporated pydav into 4SS
	* webDAV:  Finished initial implementation
	* All:  Renamed interfaces (where appropriate) to follow
	        Create/Fetch/Update/Delete naming convention.
	* All:  Added command-line tools
	* All:  Added console
	* All:  Added populate script to bootstrap useful resources
	* All:  More comprehensive documentation
	* All:  Many, many fixes and optimizations

4Suite is a collection of Python tools for XML processing and object
database management.  It provides support for XML parsing, several
transient and persistent DOM implementations, XPath expressions,
XPointer, XSLT transforms, XLink, RDF and ODMG object databases.

4Suite Server is a platform for XML processing.  It features an XML data
repository, a rules-based engine, and XSLT transforms, XPath and
RDF-based indexing and query, XLink resolution and many other XML
services.  It also supports related services such as distributed
transactions and access control lists.  Along with basic console and
command-line management, it supports remote, cross-platform and
cross-language access through CORBA, WebDAV, HTTP and other request
protocols to be added shortly.

4Suite Server is not meant to be a full-blown application server.  It
provides highly-specialized services for XML processing that can be used
with other application servers.

All the software is open-source and free to download.  Priority support
and customization is available from Fourthought, Inc.  For more
information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900

More info and Obtaining 4Suite and 4Suite Server
------------------------------------------------

Please see

        http://4Suite.org
        http://Fourthought.com/4SuiteServer

from where you can download source, Windows and Linux binaries.

4Suite is distributed under a license similar to that of the
Apache Web Server.


From uche.ogbuji@fourthought.com  Mon Feb 19 23:43:24 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 19 Feb 2001 16:43:24 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from "Thomas B. Passin" <tpassin@home.com>
 of "Mon, 19 Feb 2001 18:27:20 EST." <002801c09acb$823dca20$7cac1218@reston1.va.home.com>
Message-ID: <200102192343.QAA21554@localhost.localdomain>

> This discussion highlights  why I've said several times that you should use
> file:/// if you mean a file on your local machine.

Agreed.  That's why I was saying I sometimes had a mind to banish regular file 
names.  People can always use "file:" if they need to.  Sometimes I think the 
extra typing is worth the minimized confusion.

> I've used a few
> commandline tools where you actually had to write that (I forget which ones).
> I was annoyed at first, but soon got used to it.

That was another point I was trying to make: PyXML is hardly unique in this.  
URI is the native form for most compliant XML processors.

> As soon as you do insist on
> using file:///, distinctions about local files go away, and it becomes the
> responsibility of the url handler code to figure out where to go to get that
> particular resource.  Also, you can get files on network file systems with no
> extra work, as in file://yourcomputer/...

Yes.

> It's a convenience to let the code try to figure it out from a bare filename.
> But all that code should do is to translate a bare absolute local file
> reference to the file:/// scheme, then hand it off.

Agreed.  I still think the algorithm you posted, and your follow-up, make most 
sense.

It's just a matter of implementing it.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Mon Feb 19 23:45:51 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 19 Feb 2001 16:45:51 -0700
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Message from "Fredrik Lundh" <fredrik@effbot.org>
 of "Tue, 20 Feb 2001 00:38:04 +0100." <00ba01c09acd$020db5c0$e46940d5@hagrid>
Message-ID: <200102192345.QAA21718@localhost.localdomain>

> Uche Ogbuji wrote:

> > Most XML processing specifications mandate that the URI of the XML entity that 
> > contains an infoset node is used as the basis for further processing.  To me, 
> > this argues strongly for dropping local files rather than URIs if we must 
> > choose.  Some XML specs would be very difficult to implement properly if the 
> > low-level tools became file-system-only readers.
> 
> is the code Guido quoted taken from a utility function (e.g. a standard
> input handler), or is it part of the core library:
> 
>         if os.path.isfile(sysid):
>             basehead = os.path.split(os.path.normpath(base))[0]
>             source.setSystemId(os.path.join(basehead, sysid))
>             f = open(sysid, "rb")
>         else:
>             source.setSystemId(urlparse.urljoin(base, sysid))
>             f = urllib.urlopen(source.getSystemId())
> 
> if the latter, I hope you realize that this can be abused in all sorts of
> interesting ways...

I forgot who it was on XML-DEV who said that XML is a dream for malicious 
network abusers.

I'm not arguing whether or not it's a good thing that XML is so URI-happy.  
I'm just stating the fact.

As for your precise question, Guido said it came from saxutils.py


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Mon Feb 19 23:53:44 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 00:53:44 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <00ba01c09acd$020db5c0$e46940d5@hagrid> (fredrik@effbot.org)
References: <200102192206.PAA15107@localhost.localdomain> <00ba01c09acd$020db5c0$e46940d5@hagrid>
Message-ID: <200102192353.f1JNrim07455@mira.informatik.hu-berlin.de>

> is the code Guido quoted taken from a utility function (e.g. a standard
> input handler), or is it part of the core library:
> 
>         if os.path.isfile(sysid):
>             basehead = os.path.split(os.path.normpath(base))[0]
>             source.setSystemId(os.path.join(basehead, sysid))
>             f = open(sysid, "rb")
>         else:
>             source.setSystemId(urlparse.urljoin(base, sysid))
>             f = urllib.urlopen(source.getSystemId())
> 
> if the latter, I hope you realize that this can be abused in all sorts of
> interesting ways...

That is part of xml.sax.saxutils.prepare_input_source. I don't see all
the ways in which this can be abused, though; can you elaborate?

Regards,
Martin


From larsga@garshol.priv.no  Tue Feb 20 07:48:32 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Feb 2001 08:48:32 +0100
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: <20010219114827.B28553@zopatista.com>
References: <fdrake@acm.org> <200102171939.MAA16669@localhost.localdomain> <20010219114827.B28553@zopatista.com>
Message-ID: <m3d7cdpzfz.fsf@lambda.garshol.priv.no>

* Martijn Pieters
| 
| I cannot find any references to Lars' test suite; so I don't know if it
| will work with his.

I think Uche is referring to test/test_javadom.py in the PyXML
package.  It's not very big, and it sounds like you've probably
covered what it does already. It also uses PyUnit.
 
| - The suite tests only for DOM compliance, nothing implementation specific
|   should be in there. There are some python binding tests, we may want to
|   move those out.

I don't think they should be. The Python extensions are more a part of
the interface than some of the W3C-defined stuff, I would say.
 
| - DOMString and text manipulating interface methods are not tested beyond
|   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
|   implementations will not be tested if text is correctly treated when
|   multi-byte UTF-16 characters are involved.

By "multi-byte UTF-16 characters" I assume you mean Unicode characters
outside the BMP that are represented using two surrogates?
 

But this test suite really sounds like an excellent piece of work. It
would be great if we could start using it in the PyXML package, and
also if some scheme could be worked out so that both groups could
easily contribute to the package.

--Lars M.


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 07:50:37 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 08:50:37 +0100
Subject: [XML-SIG] Preparing for 0.6.4
Message-ID: <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de>

I'm going to release PyXML 0.6.4 later this week or early next
week. If you have any pending changes that you want to integrate,
please let me know, or commit them yourself.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 07:56:09 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 08:56:09 +0100
Subject: [XML-SIG] Pending patches
Message-ID: <200102200756.f1K7u9A01449@mira.informatik.hu-berlin.de>

There are a number of patches pending on SF which need review, in
particular:

4DOM:    103418, 103417
wddx:    103408
xmlproc: 103470

I'd appreciate it if the owners of these modules could review the patches
and accept or reject them. If you think you ought to review them but
cannot do so in any foreseeable future, please let me know.

Thanks,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 08:02:12 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 09:02:12 +0100
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: <m3d7cdpzfz.fsf@lambda.garshol.priv.no> (message from Lars Marius
 Garshol on 20 Feb 2001 08:48:32 +0100)
References: <fdrake@acm.org> <200102171939.MAA16669@localhost.localdomain> <20010219114827.B28553@zopatista.com> <m3d7cdpzfz.fsf@lambda.garshol.priv.no>
Message-ID: <200102200802.f1K82Cn01522@mira.informatik.hu-berlin.de>

> | - DOMString and text manipulating interface methods are not tested beyond
> |   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
> |   implementations will not be tested if text is correctly treated when
> |   multi-byte UTF-16 characters are involved.
> 
> By "multi-byte UTF-16 characters" I assume you mean Unicode characters
> outside the BMP that are represented using two surrogates?

I rather read that as "Unicode characters outside row 0",
i.e. non-Latin-1 - although problems likely occur for "multibyte UTF-8
characters", i.e. non-ASCII.

> But this test suite really sounds like an excellent piece of work.

I definitely agree.

> It would be great if we could start using it in the PyXML package,
> and also if some scheme could be worked out so that both groups
> could easily contribute to the package.

I'm not sure it needs to be incorporated in PyXML; getting our DOM
implementations to pass and then run them regularly as regression
tests should be sufficient.

An official feedback procedure (patch submission address, or CVS write
access) would be good, though. I actually don't know how people
contribute to Zope - although I could probably find out with little
research.

Regards,
Martin


From jerome.marant@free.fr  Tue Feb 20 08:42:56 2001
From: jerome.marant@free.fr (Jérôme Marant)
Date: 20 Feb 2001 09:42:56 +0100
Subject: [XML-SIG] Preparing for 0.6.4
In-Reply-To: "Martin v. Loewis"'s message of "Tue, 20 Feb 2001 08:50:37 +0100"
References: <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de>
Message-ID: <7z1yst219r.fsf@amboise.ird.idealx.com>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> I'm going to release PyXML 0.6.4 later this week or early next
> week. If you have any pending changes that you want to integrate,
> please let me know, or commit them yourself.

  Yes, please.

  There is a missing #!/usr/bin/env python in demo/xbel/xbel2html.py

  Please also make sure that the right version number appears in the
  README file.

  Thanks.

-- 
Jérôme Marant <jerome.marant@free.fr>

http://jerome.marant.free.fr


From larsga@garshol.priv.no  Tue Feb 20 09:00:23 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 20 Feb 2001 10:00:23 +0100
Subject: [XML-SIG] SAX: Names with no namespace
Message-ID: <m3zofhohjs.fsf@lambda.garshol.priv.no>

We had a discussion earlier about how to represent the namespace URI
of names that are not in any namespace, and this discussion was never
properly concluded.

The alternatives seem to be None and '', and the question is which to
choose. I see that the Java version of SAX has chosen '', but I think
this is in large part because anything else would be very inconvenient
because of the way Java and Java SAX are put together.

Personally, I am leaning toward None, since that seems to me the best
way to represent a missing namespace URI. That is also my only
argument in favour.
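
For comparison, the xml.sax module in today's Python standard library
reports None for the namespace URI in this case.  A minimal sketch, assuming
modern Python 3; the class and function names and the namespace URI are
hypothetical:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Record the (namespace URI, local name) pair of every element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)


def element_names(xml_bytes):
    """Parse xml_bytes in namespace mode and return the element name pairs."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


names = element_names(b'<doc xmlns:a="urn:example:a"><a:item/></doc>')
print(names)   # [(None, 'doc'), ('urn:example:a', 'item')]
```

With None, a test such as "uri is None" distinguishes "no namespace" from
any string value, which is the argument made above.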

Does anyone else have any opinions on this?

--Lars M.


From ken@bitsko.slc.ut.us  Tue Feb 20 14:06:29 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 20 Feb 2001 08:06:29 -0600
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: Lars Marius Garshol's message of "18 Feb 2001 17:45:52 +0100"
References: <200102171719.KAA07503@localhost.localdomain>
 <m37l2ohrcv.fsf@lambda.garshol.priv.no>
Message-ID: <x7ofvxih3u.fsf@bitsko.slc.ut.us>

Lars Marius Garshol <larsga@garshol.priv.no> writes:

> | A low-level Infoset API would be interesting
> 
> Personally I would prefer to see a nice tree-based XML API. My
> personal opinion is that the DOM stinks and needs replacement.  Sean
> McGrath's xTree looks far better, in my opinion.

Orchard[1] exposes *just* the infoset in the simplest possible way[2]
(that is, an element's attributes is a mapping, contents are
sequences, other attributes are simple values).

Orchard's nodes differ from DOM nodes in that they have no navigation
methods or attributes (firstChild, nextSibling) or DOM-special
manipulation (insertBefore, replaceChild) -- depending solely on
Python's standard mapping and sequence interface.  Orchard also uses a
(URI, LocalName) tuple for supporting XML Namespaces, instead of
additional *NS methods.  Like Python's DOM binding, Orchard uses
normal attribute accessors instead of (or in addition to) get/set
methods.

Essentially the whole API (the XML node attributes for common XML
nodes), in language-neutral form, less a few convenience methods like
getElementsByTagName(), load(), and save(), is attached below.

On a quick re-review, Pyxie's xTree also has navigation methods (Up,
Down, HasUp).  I would be very interested to find out if people have a
preference for navigation methods vs. using the mappings and sequences
directly.  Again, Orchard nodes use direct access, no navigation
methods.

Like Pyxie's xDispatch (and discussed here earlier[3,4]), Orchard uses
node-based events/dispatch (SAX).  Event handlers, pull modules, or
dispatch functions all use the same node types as trees do.

"But Wait!!  That's not all!"  :-)

As a last note, the C optimization is well underway.  Orchard/Mostly-C
is about 3-10x faster than pure Python/Perl while still retaining
attribute accessors (with overrides), garbage collection, and no
problems with cycles.  Current status is that we have a pure Python
prototype of the Orchard APIs, and the Python binding is scheduled for
early post-1.0 (as always, volunteers can change that!).  We have
ported Matt Sergeant's XPath step evaluator to C as an example of C
optimization for higher language modules[5].

  -- Ken

[1] <http://casbah.org/~kmacleod/orchard/>
[2] <http://casbah.org/~kmacleod/orchard/quick.html#XMLNodes>
[3] <http://mail.python.org/pipermail/xml-sig/2000-February/001905.html>
[4] <http://mail.python.org/pipermail/xml-sig/2000-February/001907.html>
[5] <http://casbah.org/~kmacleod/orchard/xpath.moc.txt>

Orchard's common XML nodes:

      document  element         attribute       characters
      --------  --------------  --------------  ----------
      contents  name            name            data
      root      attributes      value
                contents        namespace-uri*
                namespace-uri*  local-name*
                local-name*     prefix*
                prefix*

      * Available when namespace processing is enabled (the default).

    The `contents' property of a document or element node is a list of
    the nodes within that document or element.  The `name' of an
    element or attribute node is the name of the element/attribute,
    including prefix, if any.

    The `root' of a document is the root element of the document.

    An element's `attributes' is a container indexed by the
    attribute's `name' property.  The `value' of an attribute is the
    normalized, string value of the attribute.

    The `data' of a characters node is XML text.

    *** XML Namespaces

    If an XML document uses XML Namespaces, the following additional
    properties are available on element and attribute nodes.

    `namespace-uri' is the XML Namespace URI string.  `local-name' is
    the local-name portion of the element name (the element name without
    the prefix).  `prefix' is the prefix portion of the element name
    (the element name without the local-name).

    The `attributes' container is indexed also by the
    namespace-uri/local-name pair of each attribute. When accessing
    documents using XML Namespaces, you should only use the
    namespace-uri/local-name indexes for attributes.

    XML Namespace processing is used by default if the document uses
    XML Namespaces.
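
The node shapes listed above map naturally onto plain Python containers.
A minimal sketch, assuming modern Python 3 dataclasses; the class names are
hypothetical illustrations and are not Orchard's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Characters:
    data: str


@dataclass
class Attribute:
    name: str
    value: str
    namespace_uri: Optional[str] = None
    local_name: Optional[str] = None
    prefix: Optional[str] = None


@dataclass
class Element:
    name: str
    attributes: dict = field(default_factory=dict)   # a plain mapping
    contents: list = field(default_factory=list)     # a plain sequence
    namespace_uri: Optional[str] = None
    local_name: Optional[str] = None
    prefix: Optional[str] = None


@dataclass
class Document:
    contents: list = field(default_factory=list)

    @property
    def root(self):
        """The first element among the document's contents, if any."""
        for node in self.contents:
            if isinstance(node, Element):
                return node
        return None


para = Element(name="para", contents=[Characters("Hello")])
doc = Document(contents=[Element(name="doc", contents=[para])])

# No insertBefore or firstChild: ordinary list operations do the work.
doc.root.contents.insert(0, Element(name="title"))

# Attributes are indexed by name and by (namespace-uri, local-name).
lang = Attribute(name="x:lang", value="en", namespace_uri="urn:example:x",
                 local_name="lang", prefix="x")
para.attributes[lang.name] = lang
para.attributes[(lang.namespace_uri, lang.local_name)] = lang

print([node.name for node in doc.root.contents])
```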


From uche.ogbuji@fourthought.com  Tue Feb 20 14:07:25 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 20 Feb 2001 07:07:25 -0700
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "20 Feb 2001 08:48:32 +0100." <m3d7cdpzfz.fsf@lambda.garshol.priv.no>
Message-ID: <200102201407.HAA14271@localhost.localdomain>

> | - DOMString and text manipulating interface methods are not tested beyond
> |   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
> |   implementations will not be tested if text is correctly treated when
> |   multi-byte UTF-16 characters are involved.
> 
> By "multi-byte UTF-16 characters" I assume you mean Unicode characters
> outside the BMP that are represented using two surrogates?

I wonder if that's what Martijn means.  I've read that most Java 
implementations have trouble with characters outside the BMP.  I wonder if 
Python handles these properly.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Feb 20 14:07:57 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 20 Feb 2001 07:07:57 -0700
Subject: [XML-SIG] Preparing for 0.6.4
In-Reply-To: Message from "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 of "Tue, 20 Feb 2001 08:50:37 +0100." <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de>
Message-ID: <200102201407.HAA14282@localhost.localdomain>

> I'm going to release PyXML 0.6.4 later this week or early next
> week. If you have any pending changes that you want to integrate,
> please let me know, or commit them yourself.

You probably noticed that Jeremy updated 4DOM.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From uche.ogbuji@fourthought.com  Tue Feb 20 14:32:02 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 20 Feb 2001 07:32:02 -0700
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: Message from Lars Marius Garshol <larsga@garshol.priv.no>
 of "20 Feb 2001 10:00:23 +0100." <m3zofhohjs.fsf@lambda.garshol.priv.no>
Message-ID: <200102201432.HAA14350@localhost.localdomain>

> 
> We had a discussion earlier about how to represent the namespace URI
> of names that are not in any namespace, and this discussion was never
> properly concluded.

I thought it was.

> The alternatives seem to be None and '', and the question is which to
> choose. I see that the Java version of SAX has chosen '', but I think
> this is in large part because anything else would be very inconvenient
> because of the way Java and Java SAX are put together.
> 
> Personally, I am leaning toward None, since that seems to me the best
> way to represent a missing namespace URI. That is also my only
> argument in favour.

Well, in the end I don't think there was a single dissension against "None",
so I'd call it a group Pronouncement.

For those looking for these threads in the archive, note that it came up twice 
recently.  Look for the "DOM documentation update" subject line back in 
November/December and the "problem with empty namespace uri" subject in 
January.
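
As an illustration of the None convention, here is a minimal sketch
using the xml.sax module from a current Python 3 standard library
(an assumption; it is not the code under discussion in this thread).
The NameCollector class and the element_names() helper are hypothetical
names introduced only for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Record the (namespace URI, local name) pair of every element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) tuple.
        self.names.append(name)


def element_names(xml_bytes):
    """Parse xml_bytes with namespace processing and return the names seen."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.feed(xml_bytes)
    parser.close()
    return handler.names


names = element_names(b'<doc><p xmlns="urn:example:ns"/></doc>')
for uri, local in names:
    print(repr(uri), local)
```

The <doc> element is in no namespace, so its URI is reported as None
rather than '', while <p> carries the declared default namespace.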


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From guido@digicool.com  Tue Feb 20 14:36:37 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 20 Feb 2001 09:36:37 -0500
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: Your message of "Tue, 20 Feb 2001 07:07:25 MST."
 <200102201407.HAA14271@localhost.localdomain>
References: <200102201407.HAA14271@localhost.localdomain>
Message-ID: <200102201436.JAA27994@cj20424-a.reston1.va.home.com>

> > | - DOMString and text manipulating interface methods are not tested beyond
> > |   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
> > |   implementations will not be tested if text is correctly treated when
> > |   multi-byte UTF-16 characters are involved.
> > 
> > By "multi-byte UTF-16 characters" I assume you mean Unicode characters
> > outside the BMP that are represented using two surrogates?
> 
> I wonder if that's what Martijn means.  I've read that most Java 
> implementations have trouble with characters outside the BMP.  I wonder if 
> Python handles these properly.

Depends on what you call properly.  Can you elaborate on what you
would call proper treatment here?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Tue Feb 20 14:41:57 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 20 Feb 2001 09:41:57 -0500
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Your message of "19 Feb 2001 23:49:03 +0100."
 <m3wvamxp9c.fsf@lambda.garshol.priv.no>
References: <200102102107.OAA00904@localhost.localdomain> <200102102213.RAA28403@cj20424-a.reston1.va.home.com> <200102192111.f1JLBl301555@mira.informatik.hu-berlin.de> <200102192149.QAA24348@cj20424-a.reston1.va.home.com>
 <m3wvamxp9c.fsf@lambda.garshol.priv.no>
Message-ID: <200102201441.JAA28044@cj20424-a.reston1.va.home.com>

> * Guido van Rossum
> | 
> | I'd like to drop support for URLs; I don't think the typical
> | computer is sufficiently networked to make this work well.

[Lars]
> Dropping support for URLs is not really an option when dealing with XML.
> The XML recommendation states clearly that all system identifiers[1]
> are URIs in XML.
> 
> What this really means is that we have two cases to deal with:
> 
>   - XML software is provided a reference to an XML document
>   - XML document references as used internally by XML software and
>     also as passed back out to client software
> 
> In the second case the references must be URIs, since it is a
> deep-seated assumption in the entire XML family of specifications that
> all such references will be URIs. This is especially clear in the case
> of entity references (as Uche illustrated), but most other XML
> specifications are equally clear on this point, such as the infoset,
> XSLT, XBase and so on.

OK, I understand.

> Of course, in the first case there is no reason why it shouldn't be
> allowed to pass file names into the APIs to have them converted into
> URIs there. In fact, I think there is very good reason to do so, since
> my experience with the Java tools that require URIs has been fairly
> painful. (Who remembers the precise syntax for file URIs on all kinds
> of platforms anyway?)

OK.  That's useful information.

> Outlawing URIs, however, is not really an option.

OK, I also understand that.

> [1] What most people would call 'references to external resources',
>     usually files.
> 
> | I would suggest to have separate APIs depending on the argument
> | type, e.g. p.parseFile(filename), p.parseURL(url),
> | p.parseStream(InputSource), p.parseString(text).
> 
> That may be a better option than to have a single function/method, but
> that is really separate from the issue of whether to allow URIs or
> not.

OK, so let's focus on this then: APIs must be clear in whether they
accept a URI or a filename, and not guess based on the form of the
string.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Tue Feb 20 14:51:32 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 20 Feb 2001 09:51:32 -0500
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: Your message of "Mon, 19 Feb 2001 15:48:04 MST."
 <200102192248.PAA17821@localhost.localdomain>
References: <200102192248.PAA17821@localhost.localdomain>
Message-ID: <200102201451.JAA28102@cj20424-a.reston1.va.home.com>

[Uche]
> Sorry.

Accepted. :-)

> Basically, it's what you do with the DOM, and especially how attributes, 
> system identifiers and other such creatures are interpreted.
> 
> Basically, parseFile or parseUri in a top-level URI is typically
> only a small cross-section of the usage pattern in any XML
> processor.  Other functions such as Stylesheet processing,
> XIncludes, xml:base, RDF, and pretty much anything else, gets these
> strings and are *required* to interpret these as URIs.
> 
> If they were originally interpreted purely as files, then all the
> points of confusion you pointed out are immediately compounded as
> the system tries to reconcile the relative URIs against the "base
> URI" which is actually a file system file.
> 
> This is actually a problem that I have seen people run into far more
> often than any worries about computers not having network
> connections.  I've been sorely tempted to remove file support just
> because it eliminates confusion with the large body of XML
> processing that requires relative URI normalization and resolution.

OK, I think I understand the issues a bit better now.  When XML docs
contain references to other things, they typically use (absolute or
relative) URL references.  I'm guessing that this means that the
separator is always "/" and the parent directory is always represented
by "..".  Fine.

But I still maintain that the API used by the application should be
clear and explicit about whether it is naming a local file or a URI.
Then parseFile(f) can call parseURI("file:" + f) [1] internally and
parseURI can set the proper base URI.  [1]: don't take this literally;
reality is more complicated than tacking "file:" onto the front.  On
non-Unix platforms, use macurl2path.pathname2url(f) on the Mac, and
nturl2path on DOS/Windows.
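
The split described above might look roughly like the following sketch.
It assumes a current Python 3 standard library; parse_file() and
parse_uri() are hypothetical names, pathlib's Path.as_uri() stands in
for the platform-specific pathname2url helpers, and minidom is used
only so that there is something to parse with.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen
from xml.dom import minidom


def parse_uri(uri):
    """Parse the document at uri; the argument is always treated as a URI."""
    with urlopen(uri) as stream:
        return minidom.parse(stream)


def parse_file(filename):
    """Parse a local file; the argument is always treated as a file name."""
    # Path.as_uri() applies the platform's path-to-URI rules, so callers
    # never have to build a file: URI by hand.
    return parse_uri(Path(filename).resolve().as_uri())


# Usage: write a small document to a temporary file and parse it by name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<note>hello</note>")
    doc = parse_file(path)

root_name = doc.documentElement.tagName
print(root_name)
```

Neither function inspects the form of its string argument, so a name
such as "c:foo.xml" is never mistaken for a URI with scheme "c".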

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Tue Feb 20 14:55:41 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 20 Feb 2001 09:55:41 -0500
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: Your message of "Tue, 20 Feb 2001 07:32:02 MST."
 <200102201432.HAA14350@localhost.localdomain>
References: <200102201432.HAA14350@localhost.localdomain>
Message-ID: <200102201455.JAA28149@cj20424-a.reston1.va.home.com>

> > We had a discussion earlier about how to represent the namespace URI
> > of names that are not in any namespace, and this discussion was never
> > properly concluded.
> 
> I thought it was.
> 
> > The alternatives seem to be None and '', and the question is which to
> > choose. I see that the Java version of SAX has chosen '', but I think
> > this is in large part because anything else would be very inconvenient
> > because of the way Java and Java SAX are put together.
> > 
> > Personally, I am leaning toward None, since that seems to me the best
> > way to represent a missing namespace URI. That is also my only
> > argument in favour.
> 
> > Well, in the end I don't think there was a single dissension against "None",
> so I'd call it a group Pronouncement.
> 
> For those looking for these threads in the archive, note that it
> came up twice recently.  Look for the "DOM documentation update"
> subject line back in November/December and the "problem with empty
> namespace uri" subject in January.

Which reminds me.  I've been told that getAttribute() and
getAttributeNS() are supposed to return "" for a non-existent
attribute, and that if you want to know whether the attribute was
really there, you should use getAttributeNode() etc.  Again, that may
be a good design for Java or IDL, but is it right for Python?  I'd
much rather see None used as it was intended!
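
For comparison, this is how the distinction looks in the
xml.dom.minidom shipped with a current Python 3 (an assumption; the
thread predates it). The get_attribute_or_none() helper is hypothetical
and only shows one way to get None-style behaviour on top of the DOM
calls.

```python
from xml.dom import minidom

doc = minidom.parseString('<item id="42"/>')
item = doc.documentElement

present = item.getAttribute("id")        # value of an existing attribute
missing = item.getAttribute("colour")    # DOM-style result for a missing one
node = item.getAttributeNode("colour")   # None when the attribute is absent


def get_attribute_or_none(element, name):
    """Hypothetical helper: None instead of "" for a missing attribute."""
    return element.getAttribute(name) if element.hasAttribute(name) else None


print(repr(present), repr(missing), node)
print(get_attribute_or_none(item, "colour"))
```

With getAttribute() alone, a missing attribute and an attribute whose
value is the empty string are indistinguishable; hasAttribute() or
getAttributeNode() is needed to tell them apart.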

--Guido van Rossum (home page: http://www.python.org/~guido/)


From uche.ogbuji@fourthought.com  Tue Feb 20 15:47:59 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 20 Feb 2001 08:47:59 -0700
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: Message from Ken MacLeod <ken@bitsko.slc.ut.us>
 of "20 Feb 2001 08:06:29 CST." <x7ofvxih3u.fsf@bitsko.slc.ut.us>
Message-ID: <200102201547.IAA14655@localhost.localdomain>

> Lars Marius Garshol <larsga@garshol.priv.no> writes:
> 
> > | A low-level Infoset API would be interesting
> > 
> > Personally I would prefer to see a nice tree-based XML API. My
> > personal opinion is that the DOM stinks and needs replacement.  Sean
> > McGrath's xTree looks far better, in my opinion.
> 
> Orchard[1] exposes *just* the infoset in the simplest possible way[2]
> (that is, an element's attributes is a mapping, contents are
> sequences, other attributes are simple values).
> 
> Orchard's nodes differ from DOM nodes in that they have no navigation
> methods or attributes (firstChild, nextSibling) or DOM-special
> manipulation (insertBefore, replaceChild) -- depending solely on
> Python's standard mapping and sequence interface.  Orchard also uses a
> (URI, LocalName) tuple for supporting XML Namespaces, instead of
> additional *NS methods.  Like Python's DOM binding, Orchard uses
> normal attribute accessors instead of (or in addition to) get/set
> methods.

Wow.  Sounds very clean and Pythonic.  I'll have to dig.


> "But Wait!!  That's not all!"  :-)
> 
> As a last note, the C optimization is well underway.  Orchard/Mostly-C
> is about 3-10x faster than pure Python/Perl while still retaining
> attribute accessors (with overrides), garbage collection, and no
> problems with cycles.  Current status is that we have a pure Python
> prototype of the Orchard APIs, and the Python binding is scheduled for
> early post-1.0 (as always, volunteers can change that!).  We have
> ported Matt Sergeant's XPath step evaluator to C as an example of C
> optimization for higher language modules[5].

How is the memory footprint?


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From fdrake@acm.org  Tue Feb 20 16:13:40 2001
From: fdrake@acm.org (Fred L. Drake)
Date: Tue, 20 Feb 2001 11:13:40 -0500
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: <m3zofhohjs.fsf@lambda.garshol.priv.no>
Message-ID: <web-1490474@digicool.com>

On 20 Feb 2001 10:00:23 +0100
 Lars Marius Garshol <larsga@garshol.priv.no> wrote:
 > We had a discussion earlier about how to represent the
 > namespace URI
 > of names that are not in any namespace, and this
 > discussion was never
 > properly concluded.

  Actually, I had thought we *had* decided, and None was the
consensus.
  Anyway, I still favor None.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From ken@bitsko.slc.ut.us  Tue Feb 20 17:25:05 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 20 Feb 2001 11:25:05 -0600
Subject: [XML-SIG] Roadmap document - finally!
In-Reply-To: Uche Ogbuji's message of "Tue, 20 Feb 2001 08:47:59 -0700"
References: <200102201547.IAA14655@localhost.localdomain>
Message-ID: <x7vgq5gtce.fsf@bitsko.slc.ut.us>

Uche Ogbuji <uche.ogbuji@fourthought.com> writes:

> > "But Wait!!  That's not all!"  :-)
> > 
> > As a last note, the C optimization is well underway.
> > Orchard/Mostly-C is about 3-10x faster than pure Python/Perl while
> > still retaining attribute accessors (with overrides), garbage
> > collection, and no problems with cycles.  Current status is that
> > we have a pure Python prototype of the Orchard APIs, and the
> > Python binding is scheduled for early post-1.0 (as always,
> > volunteers can change that!).  We have ported Matt Sergeant's
> > XPath step evaluator to C as an example of C optimization for
> > higher language modules.
> 
> How is the memory footprint?

The core runtime, liborchard.so, is 129472 bytes (i386 Linux) and
requires Boehm-Demers-Weiser libgc.so, which is 74212 bytes.  It also
supports the expat 1.95.1 .so, but so should everyone else ;-).

The data footprint is still very small because the runtime is not
maintaining a lot of metainformation yet on classes.

The current "fast/small" DOM is running about 8x XML file size with
slots for XML Namespaces (XML Rec 159357bytes, 1246003bytes in
memory).

  -- Ken


From uche.ogbuji@fourthought.com  Tue Feb 20 18:54:34 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 20 Feb 2001 11:54:34 -0700
Subject: [XML-SIG] DC DOM tests (Was: Roadmap document - finally!)
In-Reply-To: Message from Guido van Rossum <guido@digicool.com>
 of "Tue, 20 Feb 2001 09:36:37 EST." <200102201436.JAA27994@cj20424-a.reston1.va.home.com>
Message-ID: <200102201854.LAA15786@localhost.localdomain>

> > > | - DOMString and text manipulating interface methods are not tested beyond
> > > |   ASCII text due to an implementation limitation of ParsedXML.DOM. So,
> > > |   implementations will not be tested if text is correctly treated when
> > > |   multi-byte UTF-16 characters are involved.
> > > 
> > > By "multi-byte UTF-16 characters" I assume you mean Unicode characters
> > > outside the BMP that are represented using two surrogates?
> > 
> > I wonder if that's what Martijn means.  I've read that most Java 
> > implementations have trouble with characters outside the BMP.  I wonder if 
> > Python handles these properly.
> 
> Depends on what you call properly.  Can you elaborate on what you
> would call proper treatment here?

Sure.  I admit it's hearsay, but I thought I'd read that because Java Unicode 
is or was underspecified, that there was the possibility of transposition of 
the high-surrogate with the low-surrogate character between Java 
implementations or platforms.

Now I don't exactly write XML dissertations on "Hello Kitty" <g>, so I'm not 
likely to run into this myself, but I was wondering whether Python handles 
surrogate blocks appropriately across platforms and implementations (I guess 
including CPython -> JPython).


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 18:38:55 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 19:38:55 +0100
Subject: [XML-SIG] Preparing for 0.6.4
In-Reply-To: <200102201407.HAA14282@localhost.localdomain> (message from Uche
 Ogbuji on Tue, 20 Feb 2001 07:07:57 -0700)
References: <200102201407.HAA14282@localhost.localdomain>
Message-ID: <200102201838.f1KIctu00927@mira.informatik.hu-berlin.de>

> You probably noticed that Jeremy updated 4DOM.

Yes, thanks indeed for that update. That was actually what initiated
the release procedure :-)

Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 18:32:08 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 19:32:08 +0100
Subject: [XML-SIG] Preparing for 0.6.4
In-Reply-To: <7z1yst219r.fsf@amboise.ird.idealx.com> (jerome.marant@free.fr)
References: <200102200750.f1K7obi01443@mira.informatik.hu-berlin.de> <7z1yst219r.fsf@amboise.ird.idealx.com>
Message-ID: <200102201832.f1KIW8T00923@mira.informatik.hu-berlin.de>

>   There is a missing #!/usr/bin/env python in demo/xbel/xbel2html.py

Thanks! changed in my local sandbox.

>   Please also make sure that the right version number appears in the
>   README file.

That should be alright already...

Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 18:53:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 19:53:47 +0100
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: <web-1490474@digicool.com> (fdrake@acm.org)
References: <web-1490474@digicool.com>
Message-ID: <200102201853.f1KIrlw00957@mira.informatik.hu-berlin.de>

>   Actually, I had thought we *had* decided, and None was the
> consensus.

That is also my recollection - there is even a PEP document somewhere;
you can get a copy from the archives, or from Tom Passin.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 18:42:16 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 19:42:16 +0100
Subject: [XML-SIG] DC DOM tests
In-Reply-To: <200102201407.HAA14271@localhost.localdomain> (message from Uche
 Ogbuji on Tue, 20 Feb 2001 07:07:25 -0700)
References: <200102201407.HAA14271@localhost.localdomain>
Message-ID: <200102201842.f1KIgGU00930@mira.informatik.hu-berlin.de>

> I wonder if that's what Martijn means.  I've read that most Java
> implementations have trouble with characters outside the BMP.  I
> wonder if Python handles these properly.

Not sure what "properly" would be:

>>> s=unichr(0xD000)+unichr(0xD800)
>>> s
u'\ud000\ud800'
>>> len(s)
2

Do I even use them in the right order here? It can store them, and
reproduce what was stored. Apart from that, it does not special-case
for surrogates at all.
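
For reference, a surrogate pair is a high surrogate (U+D800..U+DBFF)
followed by a low surrogate (U+DC00..U+DFFF); U+D000 is an ordinary BMP
character. The sketch below shows a real pair. It assumes Python 3
syntax (not the unichr() of the session above), and utf16_code_units()
is a hypothetical helper name.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


# U+10000 is the first code point beyond the BMP.
units = utf16_code_units("\U00010000")
high, low = units

# A well-formed pair is a high surrogate followed by a low surrogate.
is_pair = 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF

# Recombine the two code units into the original code point.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print([hex(u) for u in units], is_pair, hex(code_point))
```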

Regards,
Martin

P.S. I really think Python should have used a 32-bit wide character
representation instead.


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 18:52:14 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 19:52:14 +0100
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102201451.JAA28102@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Tue, 20 Feb 2001 09:51:32 -0500)
References: <200102192248.PAA17821@localhost.localdomain> <200102201451.JAA28102@cj20424-a.reston1.va.home.com>
Message-ID: <200102201852.f1KIqEO00955@mira.informatik.hu-berlin.de>

> But I still maintain that the API used by the application should be
> clear and explicit about whether it is naming a local file or a URI.

I agree in principle. When it comes to changing existing API, I'd
hesitate to break existing code. If such breakage is planned, it ought
to be carried out sooner rather than later.

The specific case in question is the parse() method in the SAX2 API
(*); I'd argue it needs a PEP and/or your direct order to change
it. Deprecating it in the documentation is a different matter - that
still could be done after 2.1.

Regards,
Martin

(*) in turn, minidom should see a corresponding change.


From martin@loewis.home.cs.tu-berlin.de  Tue Feb 20 18:47:03 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 20 Feb 2001 19:47:03 +0100
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: <200102201455.JAA28149@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Tue, 20 Feb 2001 09:55:41 -0500)
References: <200102201432.HAA14350@localhost.localdomain> <200102201455.JAA28149@cj20424-a.reston1.va.home.com>
Message-ID: <200102201847.f1KIl3d00953@mira.informatik.hu-berlin.de>

> Which reminds me.  I've been told that getAttribute() and
> getAttributeNS() are supposed to return "" for a non-existent
> attribute, and that if you want to know whether the attribute was
> really there, you should use getAttributeNode() etc.  Again, that may
> be a good design for Java or IDL, but is it right for Python?  I'd
> much rather see None used as it was intended!

I'd have to check again, but I think the current DOM spec is painfully
clear about null and empty strings, and it is also clear that a null
string ought to be None, and an empty string ought to be "". So there
is not much choice - except for developing a dislike towards the
entire DOM (which I wouldn't do just because of that problem).

Regards,
Martin


From fdrake@acm.org  Tue Feb 20 19:37:50 2001
From: fdrake@acm.org (Fred L. Drake)
Date: Tue, 20 Feb 2001 14:37:50 -0500
Subject: [XML-SIG] Using PyExpat.py
In-Reply-To: <200102201852.f1KIqEO00955@mira.informatik.hu-berlin.de>
Message-ID: <web-1491642@digicool.com>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
wrote:
 > it. Deprecating it in the documentation is a different
 > matter - that still could be done after 2.1.

  Actually, I'd expect to see the documentation updated as
soon as possible, and any new APIs added immediately.  This
would allow people to migrate away from the old way as early
as possible, and expose any bugs that are introduced with
the changes to the code.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From dieter@handshake.de  Tue Feb 20 19:03:12 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Tue, 20 Feb 2001 20:03:12 +0100 (CET)
Subject: [XML-SIG] Preparing for 0.6.4
In-Reply-To: <321438041@toto.iv>
Message-ID: <14994.49008.154170.999605@lindm.dm>

--Multipart_Tue_Feb_20_20:03:12_2001-1
Content-Type: text/plain; charset=US-ASCII

Martin v. Loewis writes:
 > I'm going to release PyXML 0.6.4 later this week or early next
 > week. If you have any pending changes that you want to integrate,
 > please let me know, or commit them yourself.
I hit a strange bug in "expat.c" (still 0.6.2) last Sunday:

  "expat" reported "no element found".

  The problem only occurred during parsing of an external entity.

  It was caused by a buffer switch inside a CDATA section
  (in the external entity).
  When "expat.c" left the CDATA, it chose "contentProcessor"
  as "processor" rather than "externalEntityContentProcessor".
  When it reached the end of the external entity,
  "contentProcessor" found an inconsistent state and threw the
  "no element found" exception.

I have appended a patch. I am not sure whether it is still
necessary for 0.6.3.


Dieter

----------------------------------------------------------------------

--Multipart_Tue_Feb_20_20:03:12_2001-1
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="xmlparse.pat"
Content-Transfer-Encoding: 7bit

--- :xmlparse.c	Fri Sep 24 04:18:38 1999
+++ xmlparse.c	Sat Feb 17 22:47:31 2001
@@ -301,6 +301,7 @@
   void (*m_unknownEncodingRelease)(void *);
   PROLOG_STATE m_prologState;
   Processor *m_processor;
+  Processor *m_beforeCdataProcessor;
   enum XML_Error m_errorCode;
   const char *m_eventPtr;
   const char *m_eventEndPtr;
@@ -360,6 +361,7 @@
 #define ns (((Parser *)parser)->m_ns)
 #define prologState (((Parser *)parser)->m_prologState)
 #define processor (((Parser *)parser)->m_processor)
+#define beforeCdataProcessor (((Parser *)parser)->m_beforeCdataProcessor)
 #define errorCode (((Parser *)parser)->m_errorCode)
 #define eventPtr (((Parser *)parser)->m_eventPtr)
 #define eventEndPtr (((Parser *)parser)->m_eventEndPtr)
@@ -1384,6 +1386,9 @@
     case XML_TOK_CDATA_SECT_OPEN:
       {
 	enum XML_Error result;
+
+	beforeCdataProcessor= processor;
+
 	if (startCdataSectionHandler)
   	  startCdataSectionHandler(handlerArg);
 #if 0
@@ -1731,8 +1736,8 @@
 {
   enum XML_Error result = doCdataSection(parser, encoding, &start, end, endPtr);
   if (start) {
-    processor = contentProcessor;
-    return contentProcessor(parser, start, end, endPtr);
+    /* processor = contentProcessor; */
+    return processor(parser, start, end, endPtr);
   }
   return result;
 }
@@ -1767,6 +1772,9 @@
     *eventEndPP = next;
     switch (tok) {
     case XML_TOK_CDATA_SECT_CLOSE:
+
+      processor= beforeCdataProcessor;
+
       if (endCdataSectionHandler)
 	endCdataSectionHandler(handlerArg);
 #if 0


--Multipart_Tue_Feb_20_20:03:12_2001-1--


From tpassin@home.com  Wed Feb 21 01:02:56 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 20 Feb 2001 20:02:56 -0500
Subject: [XML-SIG] SAX: Names with no namespace
References: <web-1490474@digicool.com> <200102201853.f1KIrlw00957@mira.informatik.hu-berlin.de>
Message-ID: <002301c09ba2$08727560$7cac1218@reston1.va.home.com>

Martin v. Loewis wrote -

> >   Actually, I had thought we *had* decided, and None was the
> > consensus.
>
> That is also my recollection - there is even a PEP document somewhere;
> you can get a copy from the archives, or from Tom Passin.
>
I don't recall that anyone actually declared that it was decided, but almost
everyone who posted on this issue agreed that using "None" is the way to go.
I propose that we do declare that it has been decided - Martin, are you
willing to be the temporary benevolent dictator on this?

Here's a copy of the draft PEP:

=============================================
<?xml version='1.0'?>
<xmlpep>
 <headers>
  <pep_number>xmlpep-1</pep_number>
  <pep_title>Values for Null Or Empty Namespace URIs</pep_title>
  <pep_version>0.20</pep_version>
  <cvs_version_string/>
  <list_of_authors>
   <author name='Thomas B. Passin' email='tpassin@home.com'/>
  </list_of_authors>
  <status>Draft</status>
  <type>Standards Track</type>
  <created>29-Jan-2001</created>
  <history>
   <post date='29-Jan-2001'/>
   <post date='4-Feb-2001'/>
  </history>
 </headers>
 <abstract>
  This PEP specifies the proper values of the Namespace URI property
  when its value might otherwise appear to be either "null", "None", or the
  empty string.

  Such Namespace URIs are discussed in SAX[1], DOM2[2], and XML-Namespaces[3].
  These three recommendations do not appear to be in full agreement.  This
  fact, and differences between Java and Python, have led to some confusion and
  some disagreement between various implementations supported by PyXML.  The
  language in these three Recommendations is reviewed.

  The recommendation is made to use None as the URI value in all cases where
  no URI applies to an element or attribute.

  The XMLPEP, when approved, will apply to all namespace-aware software
  maintained by the pyxml interest group.
 </abstract>

 <specification>
  <para title='Namespace-aware applications'>
   When no namespace has been declared whose scope applies to a
   particular element or attribute, the application MUST report the
   URI of the namespace of the element or attribute as None.  When there is no
   namespace prefix, the application MUST report the value of the prefix as
   None.
  </para>

  <para title='Namespace-ignorant applications'>
   This requirement does not apply for applications that are not
   namespace-aware.
  </para>

  <para title='Applicability'>
   This requirement applies to all XML processing software maintained by the
   PyXML interest group.
  </para>
 </specification>


 <rationale>
  <para title='Definitive Treatment Needed'>
  This PEP is needed because of continued uncertainty among various PyXML
  developers as to the proper values to use, and because of inconsistency
  among various PyXML products.  Differences between Python, IDL, and Java
  make it hard to arrive at a single unambiguous interpretation.
  </para>

  <para>
  A definitive and consistent treatment is needed so that all the PyXML
  software may be made consistent.
  </para>

  <para title='W3C Namespaces Recommendation'>
   The Namespaces Recommendation recognizes that a namespace URI may
   be given no value - called "empty" in the Recommendation - even
   though a structure for a URI is provided in the document.  Two relevant
   passages are quoted here:

    <quote>Section 2. ...
      [Definition:] If the attribute name matches DefaultAttName,
      then the namespace name in the attribute value is that of the
      default namespace in the scope of the element to which the declaration
      is attached. In such a default declaration, the attribute value
      may be empty.
    </quote>
    <quote>5.2 Namespace Defaulting
      A default namespace is considered to apply to the element where
      it is declared (if that element has no namespace prefix), and to
      all elements with no prefix within the content of that element.
      If the URI reference in a default namespace declaration is empty,
      then unprefixed elements in the scope of the declaration are not
      considered to be in any namespace. Note that default namespaces
      do not apply directly to attributes.

      ...The default namespace can be set to the empty string. This has the
      same effect, within the scope of the declaration, of there being no
      default namespace.
    </quote>
  </para>

  <para>
     The term "empty" is not defined further, but in the context of the
     Recommendation, it must mean a missing string value.  The last
     fragment quoted above suggests, but does not require, that an
     empty string may be returned for an "empty" URI value.

     This has no direct applicability to values returned by implementations,
     since
       1) the word "can" is used, rather than "must", and
       2) the Recommendation seems to apply to XML documents,
          not to implementations.
  </para>

  <para title='W3C DOM Level 2 Recommendation'>
    The W3C DOM Level 2 Recommendation refers to "null" namespaces in
    several places.  The thrust is clear and consistent: a "null" value
    is to be used to indicate a non-existent namespace URI value. Here
    are some relevant extracts from the Recommendation:

     <quote>Note that because the DOM does no lexical checking, the
       empty string will be treated as a real namespace URI in DOM Level 2
       methods. Applications must use the value null as the namespaceURI
       parameter for methods if they wish to have no namespace.
     </quote>
  </para>

  <para>
    The IDL definition for the createAttributeNS() method creates an
    attribute with these characteristics:
     <quote>
        A new Attr object with the following attributes:
Attribute    Value
Node.nodeName    qualifiedName
Node.namespaceURI   namespaceURI
Node.prefix    prefix, extracted from qualifiedName,
                                    or null if there is no prefix
Node.localName    local name, extracted from qualifiedName
Attr.name    qualifiedName
Node.nodeValue    the empty string
     </quote>
  </para>

  <para>For the older, non-NS aware createAttribute() method, the
Recommendation says
    <quote>...localName, prefix, and namespaceURI set to null. </quote>
  </para>

  <para>This is typical - a "null" is returned if there is no prefix or
    URI.</para>

  <para>It is clear that the IDL specifies the use of "null" for empty
    namespaces, rather than the empty string.  The Java binding does not
    specify any particular value.
  </para>

  <para>
    Thus there seems to be nothing in the DOM Recommendation that suggests
    that empty strings should be used, and there is clear language that
    "null" values should be used.
  </para>

  <para title='SAX2'>
    The SAX2 Java API clearly says that an empty string is to be
    returned.  The following extracts demonstrate this:

    <quote>In SAX2, the startElement and endElement callbacks in a content
      handler look like this:
            public void startElement (String uri, String localName,
                 String qName, Attributes atts)
                 throws SAXException;

            public void endElement (String uri, String localName,
                 String qName)
                 throws SAXException;
      By default, an XML reader will report a Namespace URI and a local name
      for every element, in both the start and end handler. Consider the
      following example:
        <html:hr xmlns:html="http://www.w3.org/1999/xhtml"/>
      With the default SAX2 Namespace processing, the XML reader would report
      a start and end element event with the Namespace URI
      "http://www.w3.org/1999/xhtml" and the local name "hr". The XML
       reader might also report the original qName "html:hr", but that
       parameter might simply be an empty string.
    </quote>

     <quote>
        <h:hello xmlns:h="http://www.greeting.com/ns/" id="a1"
                 h:person="David"/>
        If namespaces is true and namespace-prefixes is true,
        then a SAX2 XML reader will report the following:
           an element with the Namespace URI "http://www.greeting.com/ns/",
           the local name "hello", and the qName "h:hello";
           an attribute with no Namespace URI (empty string),
             no local name (empty string), and the qName "xmlns:h";
           an attribute with no Namespace URI (empty string), the
             local name "id", and the qName "id"; and an attribute
             with the Namespace URI "http://www.greeting.com/ns/",
             the local name "person", and the qName "h:person".
     </quote>
  </para>

  <para title='Discussion of The Three Recommendations'>
    To summarize, the Namespace Recommendation is essentially silent
    on the subject, the DOM clearly specifies "null" values, and SAX2
    clearly specifies the use of empty strings.
  </para>

  <para title='Arguments Favoring the Use of "None"'>
   The "highest" level Recommendation is presumably the DOM.
   Python offers a data object similar to "null" - the None object.
   The None object can be tested for exactly as for an empty string:

    <code>if uri:
              doYourThing()
    </code>

   Alternatively, None can be tested for explicitly, as in:

    <code>if uri is not None:
                  doYourThing()
    </code>

   Thus, None is flexible enough to be useful for this purpose.
  </para>
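
  <para>
   As a hedged illustration of the two tests above (a minimal sketch; the
   helper name describe_namespace is hypothetical and not part of PyXML),
   the truth test treats None and the empty string alike, while the
   identity test tells them apart:

```python
def describe_namespace(uri):
    """Classify a namespace URI value (hypothetical helper, for illustration)."""
    if uri is None:
        # Identity test: only the None object lands here.
        return "no namespace"
    if not uri:
        # Truth test: the empty string is false, just like None.
        return "empty string"
    return "namespace " + uri

# Both values are false in a plain truth test ...
both_false = (not None) and (not "")

# ... but only one of them is None.
labels = [describe_namespace(u)
          for u in (None, "", "http://www.w3.org/1999/xhtml")]
```

   Code that only ever uses the truth test does not care which convention
   is chosen; code that uses the identity test does.
  </para>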

  <para>
    Many posts to the PyXML list have favored the use of None,
    although not all.  Either None or the empty string would seem to
    work in this context.  "None" agrees with the DOM Recommendation,
    and would seem (in a mnemonic sense) to suggest the absence of
    a prefix or URI.
  </para>

  <para title='4DOM Handling of None URIs and Prefixes'>
    The 4DOM code will handle a None URI correctly in many places,
     since it uses tests like this typical example:

      <code>
          if namespaceURI and namespaceURI != XML_NAMESPACE:
            # ...
      </code>

    This code works correctly if the namespaceURI is None.
  </para>

  <para>Another test used in 4DOM is as follows:

    <code>def getElementsByTagNameNS(self,namespaceURI,localName):
        root = self.documentElement
        if root == None:
            return implementation.createNodeList([])
        py = root.getElementsByTagNameNS(namespaceURI,localName)
        if namespaceURI == '*' or namespaceURI == root.namespaceURI:
            if localName == '*' or localName == root.localName:
                py.insert(0,root)
        return py
     </code>

    The expression "namespaceURI == '*'" also evaluates correctly when
    the URI is None.
  </para>

  <para>If handling code is consistent throughout 4DOM, then it will handle
     None correctly.
  </para>

  <para title='SAX2'>
   [Need material here]
  </para>

 </rationale>
 <reference_implementation>[Should there be a reference here to one
  particular processor, such as xmlproc?]
 </reference_implementation>
 <notes></notes>
 <references></references>
 <copyright>This PEP may be used by anyone.</copyright>
</xmlpep>


From jeremy.kloth@fourthought.com  Wed Feb 21 01:17:23 2001
From: jeremy.kloth@fourthought.com (Jeremy J Kloth)
Date: Tue, 20 Feb 2001 18:17:23 -0700
Subject: [XML-SIG] 4DOM and PyXML
Message-ID: <006201c09ba4$0b9aa580$1b01a8c0@fourthought.com>

The code for 4DOM now exists solely in the PyXML CVS tree.
This should prevent any future feature clashes.

Happy hacking...
--
Jeremy Kloth                        Consultant
jeremy.kloth@fourthought.com        (303)583-9900 x 105
Fourthought, Inc.                   http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From ken@bitsko.slc.ut.us  Wed Feb 21 15:01:58 2001
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 21 Feb 2001 09:01:58 -0600
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: "Martin v. Loewis"'s message of "Tue, 20 Feb 2001 19:47:03 +0100"
References: <200102201432.HAA14350@localhost.localdomain>
 <200102201455.JAA28149@cj20424-a.reston1.va.home.com>
 <200102201847.f1KIl3d00953@mira.informatik.hu-berlin.de>
Message-ID: <x7n1bggjvd.fsf@bitsko.slc.ut.us>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de> writes:

> [Guido van Rossum writes:]
> > Which reminds me.  I've been told that getAttribute() and
> > getAttributeNS() are supposed to return "" for a non-existent
> > attribute, and that if you want to know whether the attribute was
> > really there, you should use getAttributeNode() etc.  Again, that
> > may be a good design for Java or IDL, but is it right for Python?
> > I'd much rather see None used as it was intended!
> 
> I'd have to check again, but I think the current DOM spec is
> painfully clear about null and empty strings, and it is also clear that
> a null string ought to be None, and an empty string ought to be
> "". So there is not much choice - except for developing a dislike
> towards the entire DOM (which I wouldn't do just because of that
> problem).

Yes, the only place I see "" should be returned is in the two methods
getAttribute() and getAttributeNS(): "The Attr value as a string, or
the empty string if that attribute does not have a specified or
default value".

That does seem odd, and unfortunate, but this would be one of the
little places I'd rather adhere to the spec and not have any
Python-specific documentation to the contrary, rather than note the
difference and emphasize it wherever it might be an issue.
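
To illustrate the behaviour under discussion, here is a minimal sketch
using the minidom that ships with current Python (not 4DOM; the element
and attribute names are made up for the example). getAttribute() gives
"" both for an attribute that is present but empty and for one that is
missing, so getAttributeNode() or hasAttribute() is needed to tell the
two apart:

```python
from xml.dom.minidom import parseString

# Made-up document: "note" is present but empty, "nosuch" does not exist.
doc = parseString('<item id="a1" note=""/>')
elem = doc.documentElement

present_but_empty = elem.getAttribute("note")   # attribute exists, value is ""
missing = elem.getAttribute("nosuch")           # attribute absent, also ""

# The Attr node (or hasAttribute) distinguishes the two cases.
note_node = elem.getAttributeNode("note")       # an Attr object
missing_node = elem.getAttributeNode("nosuch")  # None
```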

  -- Ken


From martin@loewis.home.cs.tu-berlin.de  Wed Feb 21 09:18:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 21 Feb 2001 10:18:36 +0100
Subject: [XML-SIG] 4DOM and PyXML
In-Reply-To: <006201c09ba4$0b9aa580$1b01a8c0@fourthought.com>
 (jeremy.kloth@fourthought.com)
References: <006201c09ba4$0b9aa580$1b01a8c0@fourthought.com>
Message-ID: <200102210918.f1L9IaQ01213@mira.informatik.hu-berlin.de>

> The code for 4DOM now exists solely in the PyXML CVS tree.
> This should prevent any future feature clashes.

Thanks a lot. This should reduce the troubles of distributors
(primarily of Linux distributions) where 4Suite and PyXML had an
overlap of files.

Regards,
Martin


From larsga@garshol.priv.no  Wed Feb 21 14:09:55 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 21 Feb 2001 15:09:55 +0100
Subject: [XML-SIG] SAX: Names with no namespace
In-Reply-To: <web-1490474@digicool.com>
References: <web-1490474@digicool.com>
Message-ID: <m3r90si0uk.fsf@lambda.garshol.priv.no>

* Fred L. Drake
| 
| Actually, I had thought we *had* decided, and None was the consensus.

Then I bow to those with better memories than mine. :-)

--Lars M.


From paulp@ActiveState.com  Wed Feb 21 21:50:44 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 21 Feb 2001 13:50:44 -0800
Subject: [XML-SIG] Encoding autodetection
Message-ID: <3A943834.3C0473C@ActiveState.com>

Is there Python code around to do the encoding autodetection? I started
to write it and then thought I would check first...

-- 
Vote for Your Favorite Python & Perl Programming  
Accomplishments in the first Active Awards! 
http://www.ActiveState.com/Awards


From martin@loewis.home.cs.tu-berlin.de  Wed Feb 21 22:02:01 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 21 Feb 2001 23:02:01 +0100
Subject: [XML-SIG] Encoding autodetection
In-Reply-To: <3A943834.3C0473C@ActiveState.com> (message from Paul Prescod on
 Wed, 21 Feb 2001 13:50:44 -0800)
References: <3A943834.3C0473C@ActiveState.com>
Message-ID: <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de>

> Is there Python code around to do the encoding autodetection? I started
> to write it and then thought I would check first...

Not that I know of.

Regards,
Martin


From stefan.marsiske@sysdata.siemens.hu  Thu Feb 22 10:11:07 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Thu, 22 Feb 2001 11:11:07 +0100
Subject: [XML-SIG] cloning nodes
Message-ID: <20010222111107.O14235@sysdata.siemens.hu>

hi all,

once again i ran into a problem, this may be my fault, or a bug (which may
already have been fixed), anyhow here it is:

when i want to clone a dom node (and all its subnodes), the cloned node
doesn't contain the attributes for elements. it seems to me that cloneNode
doesn't clone Attribute type nodes. 

i'm using 4suite 0.10.1, is this a bug which is fixed in 0.10.2 or am i
missing something?

ciao
-- 
Stefan [http://web.interware.hu/stef] UPDATED:001031
quote: "happy(y2k++)"
gpg-key: http://web.interware.hu/stef/gpg.txt


From uche.ogbuji@fourthought.com  Thu Feb 22 15:17:31 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 22 Feb 2001 08:17:31 -0700
Subject: [XML-SIG] cloning nodes
In-Reply-To: Message from Marsiske Stefan - 3244 <stefan.marsiske@sysdata.siemens.hu>
 of "Thu, 22 Feb 2001 11:11:07 +0100." <20010222111107.O14235@sysdata.siemens.hu>
Message-ID: <200102221517.IAA01939@localhost.localdomain>

> hi all,
> 
> once again i ran into a problem, this may be my fault, or a bug (which may
> already have been fixed), anyhow here it is:
> 
> when i want to clone a dom node (and all its subnodes), the cloned node
> doesn't contain the attributes for elements. it seems to me that cloneNode
> doesn't clone Attribute type nodes. 
> 
> i'm using 4suite 0.10.1, is this a bug which is fixed in 0.10.2 or am i
> missing something?

Yes.  It's a bug in 0.10.1 that was fixed in 0.10.2.
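
For anyone who wants to check their own installation, a quick test along
these lines shows whether a deep clone keeps element attributes. This is
only a sketch using the minidom from the standard library (not 4DOM), and
the sample document is invented:

```python
from xml.dom.minidom import parseString

# Invented sample: attributes on both the parent and the child element.
doc = parseString('<list kind="todo"><item done="no">write tests</item></list>')
original = doc.documentElement

# Passing True requests a deep clone (the element and all its subnodes).
copy = original.cloneNode(True)

copied_kind = copy.getAttribute("kind")
copied_done = copy.firstChild.getAttribute("done")
copied_text = copy.firstChild.firstChild.data
```

If either attribute comes back as the empty string, the clone has lost it.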


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From paulp@ActiveState.com  Sat Feb 24 00:04:37 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 23 Feb 2001 16:04:37 -0800
Subject: [XML-SIG] Encoding autodetection
References: <3A943834.3C0473C@ActiveState.com> <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de>
Message-ID: <3A96FA95.93EB1888@ActiveState.com>

"Martin v. Loewis" wrote:
> 
> > Is there Python code around to do the encoding autodetection? I started
> > to write it and then thought I would check first...
> 
> Not that I know of.

Thanks anyways. I've written the code now. Would it be useful to anyone
else out there?

-- 
Vote for Your Favorite Python & Perl Programming  
Accomplishments in the first Active Awards! 
http://www.ActiveState.com/Awards


From larsga@garshol.priv.no  Sat Feb 24 10:05:58 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Feb 2001 11:05:58 +0100
Subject: [XML-SIG] Encoding autodetection
In-Reply-To: <3A96FA95.93EB1888@ActiveState.com>
References: <3A943834.3C0473C@ActiveState.com> <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de> <3A96FA95.93EB1888@ActiveState.com>
Message-ID: <m3snl4fla1.fsf@lambda.garshol.priv.no>

* Paul Prescod
| 
| Thanks anyways. I've written the code now. Would it be useful to
| anyone else out there?

xmlproc could use it.  When the Unicode support is added it will need
to do the same thing.

I guess it could also be useful as a utility in some cases, such as in
a web server.

--Lars M.


From paulp@ActiveState.com  Sat Feb 24 19:36:13 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Sat, 24 Feb 2001 11:36:13 -0800
Subject: [XML-SIG] Encoding autodetection
References: <3A943834.3C0473C@ActiveState.com> <200102212202.f1LM21W01145@mira.informatik.hu-berlin.de> <3A96FA95.93EB1888@ActiveState.com> <m3snl4fla1.fsf@lambda.garshol.priv.no>
Message-ID: <3A980D2D.ACA5F980@ActiveState.com>

Lars Marius Garshol wrote:
> 
> * Paul Prescod
> |
> | Thanks anyways. I've written the code now. Would it be useful to
> | anyone else out there?
> 
> xmlproc could use it.  When the Unicode support is added it will need
> to do the same thing.

Yeah, that's where I looked first.

> I guess it could also be useful as a utility in some cases, such as in
> a web server.

I'll include it here for the record. If anyone wants to do anything with
it they can. It is hereby in the public domain. 

In response to a question I got privately: it will detect any encoding
that has a reasonable resemblance to an ASCII superset (e.g. UTF-8, ISO
8859-*, Shift-JIS) or to a 2 byte Unicode encoding (big or little
endian, with or without BOM). EBCDIC and 4-byte encodings are not
tested.

import codecs, encodings

"""Komodo will hand this library a buffer and ask it to either convert
it or auto-detect the type."""

# None represents a potentially variable byte. "##" in the XML spec... 
autodetect_dict={ # bytepattern     : ("name",              
                (0x00, 0x00, 0xFE, 0xFF) : ("ucs4_be"),        
                (0xFF, 0xFE, 0x00, 0x00) : ("ucs4_le"),
                (0xFE, 0xFF, None, None) : ("utf_16_be"), 
                (0xFF, 0xFE, None, None) : ("utf_16_le"), 
                (0x00, 0x3C, 0x00, 0x3F) : ("utf_16_be"),
                (0x3C, 0x00, 0x3F, 0x00) : ("utf_16_le"),
                (0x3C, 0x3F, 0x78, 0x6D): ("utf_8"),
                (0x4C, 0x6F, 0xA7, 0x94): ("EBCDIC")
                 }

def autoDetectXMLEncoding(buffer):
    """ buffer -> encoding_name
    The buffer should be at least 4 bytes long.
        Returns None if encoding cannot be detected.
        Note that encoding_name might not have an installed
        decoder (e.g. EBCDIC or Shift-JIS)
    """
    # a more efficient implementation would not decode the whole
    # buffer at once but otherwise we'd have to decode a character at
    # a time looking for the quote character...that's a pain

    encoding = "utf_8" # according to the XML spec, this is the default
                          # this code successively tries to refine the default
                          # whenever it fails to refine, it falls back to
                          # the last place encoding was set.
    bytes = (byte1, byte2, byte3, byte4) = tuple(map(ord, buffer[0:4]))
    enc_info = autodetect_dict.get(bytes, None)

    if not enc_info:
        # try autodetection again, removing potentially variable bytes
        bytes = (byte1, byte2, None, None)
        enc_info = autodetect_dict.get(bytes)

        
    if enc_info:
        encoding = enc_info # we've got a guess... these are
                                     #the new defaults

        # try to find a more precise encoding using xml declaration
        secret_decoder_ring = codecs.lookup(encoding)[1]
        (decoded,length) = secret_decoder_ring(buffer) 
        first_line = decoded.split("\n")[0]
        if first_line and first_line.startswith(u"<?xml"):
            encoding_pos = first_line.find(u"encoding")
            if encoding_pos!=-1:
                # look for double quote
                quote_pos=first_line.find('"', encoding_pos) 

                if quote_pos==-1:                 # look for single quote
                    quote_pos=first_line.find("'", encoding_pos) 

                if quote_pos>-1:
                    quote_char,rest=(first_line[quote_pos],
                                     first_line[quote_pos+1:])
                    encoding=rest[:rest.find(quote_char)]

    return encoding

##### Testing code 

big_teststrs = (u"<?xml version='1.0' encoding='%s'?><abc>\u2222</abc>",
                u'<?xml version="1.0" encoding="%s"?><abc>\u2222</abc>')

big_encodings = [
    #name           BOM prefix
    ("utf-16"   ,           None),  # this one already has a BOM prefix
    ("utf-8"    ,           None), 
    ("utf-16-le",           None), 
    ("utf-16-be",           None),
    ("utf-16-le",           codecs.BOM_LE), 
    ("utf-16-be",           codecs.BOM_BE),
    ("MBCS"     ,           None)]

little_teststrs = (u"<?xml version='1.0' encoding='%s'?><abc>q</abc>",
                u'<?xml version="1.0" encoding="%s"?><abc>q</abc>')
little_encodings = [
    ("ASCII"     ,          None),
    ("Latin-1"   ,          None),
    ("ISO 8859-1",          None)]

default_teststrs = ("<a>%s</a>", "<?xml version='1.0'?><a>%s</a>",
                    '<?xml version="1.0"?><a>%s</a>')

xml_default_encodings = [
    ("utf_8"    ,           None), 
    ("utf_16_le",           codecs.BOM_LE), 
    ("utf_16_be",           codecs.BOM_BE)]

def _assertSame(expr1,expr2):
    if expr1 != expr2:
        raise AssertionError, (expr1, "!=", expr2)

def testDetect(teststrs, test_encodings):
    for (encoding, bom) in test_encodings:
        for teststr in teststrs:
            data = (teststr % encoding).encode(encoding)
            if bom:
                data = bom + data
            _assertSame(autoDetectXMLEncoding(data), encoding)

def test():
    teststr=u"\u2222\u2323\u4343"
    testDetect(big_teststrs, big_encodings)
    testDetect(little_teststrs, little_encodings)
    testDetect(default_teststrs, xml_default_encodings)

if __name__=="__main__":
    test()
    print "All tests succeeded"
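
For readers on a current Python 3, the same idea can be sketched more
compactly. This is not the code above but a minimal reworking under stated
assumptions: the function name sniff_xml_encoding and the 200-byte head
limit are invented for the example, UTF-32 and UTF-8 byte-order marks are
checked in addition to the UTF-16 ones, and EBCDIC is left out. It guesses
a codec from the first bytes, decodes only the head of the buffer, and
prefers the name given in the XML declaration when there is one:

```python
import codecs
import re

# (prefix, codec name, bytes to skip).  BOMs are skipped before decoding;
# the 4-byte UTF-32 marks must be tried before the 2-byte UTF-16 ones.
_PREFIXES = [
    (codecs.BOM_UTF32_BE, "utf_32_be", 4),
    (codecs.BOM_UTF32_LE, "utf_32_le", 4),
    (codecs.BOM_UTF8, "utf_8", 3),
    (codecs.BOM_UTF16_BE, "utf_16_be", 2),
    (codecs.BOM_UTF16_LE, "utf_16_le", 2),
    (b"\x00<\x00?", "utf_16_be", 0),   # "<?" in UTF-16 without a BOM
    (b"<\x00?\x00", "utf_16_le", 0),
]

# Matches encoding="..." or encoding='...' inside a leading XML declaration.
_DECLARED = re.compile(
    r"""^<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._\-]*)\1""")


def sniff_xml_encoding(data):
    """Return a best-guess encoding name for the bytes of an XML document."""
    encoding, skip = "utf_8", 0          # UTF-8 is the XML default
    for prefix, name, bom_len in _PREFIXES:
        if data.startswith(prefix):
            encoding, skip = name, bom_len
            break
    # Decode only the head of the buffer; 200 is a multiple of 2 and 4.
    head = data[skip:skip + 200].decode(encoding, "replace")
    match = _DECLARED.match(head)
    return match.group(2) if match else encoding
```

As with the original, the returned name is whatever the declaration says,
so it may not have an installed decoder.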


-- 
Vote for Your Favorite Python & Perl Programming  
Accomplishments in the first Active Awards! 
http://www.ActiveState.com/Awards


From guenter.radestock@sap.com  Sun Feb 25 18:25:44 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Sun, 25 Feb 2001 19:25:44 +0100
Subject: [XML-SIG] Whitespace handling in XMLWriter
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EC0@dbwdfx14.wdf.sap-ag.de>

I am using xml.sax.writer to produce nicely indented XML output from Python.
The xmlwriter is pretty good at indentation and formatting, but for some
tags, I would like to have whitespace preserved.  I did not see a way
to tell this via the doctype info.  Right now I am using the following:


import sys
import xml.sax.writer

class OutputWriter:
    def __init__(self, fo=sys.stdout):
        self.fo = fo
        self.containers = []
        self.docinfo = xml.sax.writer.XMLDoctypeInfo()
        
        saxout = self.saxout = xml.sax.writer.PrettyPrinter(
            self.fo, dtdinfo=self.docinfo, endtagindentation=-2)
        # put a print here to see how (slow) output is generated.
        # there should not be a visible delay between the message
        # printed and the log output of the http server.
        #print '### starting xml output'
        saxout.startDocument()

    def pcdata_tag(self, name, s):
        s = '%s' % s
        self.saxout.startElement(name)
        self.saxout.characters(s, 0, len(s))
        self.saxout.endElement(name)

    def start_tag(self, name):
        if not name in self.containers:
            # needed to make pretty printing work (the pretty
            # printer needs to know where whitespace is allowed
            # in the output)
            self.containers.append(name)
            self.docinfo.add_element_container(name)
        self.saxout.startElement(name, {})

    def end_tag(self, name):
        self.saxout.endElement(name)

    def comment(self, text):
        text = ' ' + text + ' '
        self.saxout.comment(text, 0, len(text))

    def close(self):
        self.saxout.endDocument()
        self.fo.flush()

This works the way I want only for short content (there is no whitespace
inserted before and after).  Passing longer strings, possibly with
whitespace, to pcdata_tag will reformat, changing the internal and
external whitespace contained in my text.

Is there a way to do this with the current xmlwriter or is this
missing right now?

- Guenter


From guenter.radestock@sap.com  Sun Feb 25 19:23:13 2001
From: guenter.radestock@sap.com (Radestock, Guenter)
Date: Sun, 25 Feb 2001 20:23:13 +0100
Subject: [XML-SIG] Whitespace handling in XMLWriter
Message-ID: <FAFE609CB754D311B60C0008C75D355608C90EC2@dbwdfx14.wdf.sap-ag.de>


> -----Original Message-----
> From: Radestock, Guenter 
> Sent: Sonntag, 25. Februar 2001 19:26
> To: 'XML-SIG@python.org'
> Subject: [XML-SIG] Whitespace handling in XMLWriter
> 
> 
> I am using xml.sax.writer to produce nicely indented XML 
> output from Python.
> The xmlwriter is pretty good at indentation and formatting, 
> but for some
> tags, I would like to have whitespace preserved.  I did not see a way
> to tell this via the doctype info.  
> 

diving in a little deeper I have found that when I call

   self.docinfo.add_attribute_defn(tagname, 'xml:space', None, None,
                                   'preserve')

just before opening the tag tagname, it will preserve the whitespace the
way I want it to.  Now two little problems remain:

1. the tag itself will not be indented.  The place the tag is put
into the output should not have anything to do with how its content
is formatted?

2. the pretty printer does not behave properly when formatting empty
tags (a linefeed is missing after the empty tag).

You can see both in the output fragment below:

    <SearchDocList>
      <Document>
<DocID>hfk107_6_1010_1010140254.nitf</DocID>
<RankValue>0.00093607098097</RankValue>
<LAISO>de</LAISO>
        <IndexList>
          <IndexLoc>
            <Index>
<IndexID>dpa_german8de</IndexID>
<LAISO/>            </Index>
          </IndexLoc>
        </IndexList>

- Guenter


From loewis@informatik.hu-berlin.de  Mon Feb 26 08:41:50 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 26 Feb 2001 09:41:50 +0100 (MET)
Subject: [XML-SIG] PyXML 0.6.4 is released
Message-ID: <200102260841.JAA09898@pandora>

Version 0.6.4 of the Python/XML distribution is now available.  It
should be considered a beta release, and can be downloaded from
the following URLs:

http://download.sourceforge.net/pyxml/PyXML-0.6.4.tar.gz
http://download.sourceforge.net/pyxml/PyXML-0.6.4.win32-py1.5.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.4.win32-py2.0.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.4-1.5.2.i386.rpm
http://download.sourceforge.net/pyxml/PyXML-0.6.4-2.0.i386.rpm

Changes in this version, compared to 0.6.3:

	* 4DOM was integrated from 4Suite 0.10.2. 4DOM is now
          maintained as a part of PyXML. A detailed list of changes can
          be found in xml/dom/ChangeLog.

	* minidom now supports the standard methods isSameNode and
          hasAttributes, and the extension toprettyxml. A number of
	  bugs have been fixed

	* A DOM implementation registration is now available (functions
	  getDOMImplementation and registerDOMImplementation in xml.dom).

	* If expat 1.95.x is available on the system, this is used instead
	  of the included expat copy; it will then offer additional handlers.

	* A pyexpat parser can now return the attributes ordered, and
	  restrict the attribute list to the specified attributes.

	* The xmllib SAX1 driver now generates Unicode strings in
          Python 2.

	* The xml.unicode emulation was extended to support bidirectional
	  conversion, and to support a few more aliases.
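
As a rough illustration of three of the items above, the following sketch
uses the descendants of these modules in today's standard library
(xml.dom, xml.dom.minidom and xml.parsers.expat). It assumes nothing about
PyXML 0.6.4 itself, and the element and attribute names are invented:

```python
from xml.dom import getDOMImplementation
from xml.parsers import expat

# Ask the registry for a DOM implementation (minidom unless configured otherwise).
impl = getDOMImplementation()
doc = impl.createDocument(None, "catalog", None)
root = doc.documentElement
book = root.appendChild(doc.createElement("book"))
book.setAttribute("id", "b1")

# The standard methods and the pretty-printing extension named above.
has_attrs = (root.hasAttributes(), book.hasAttributes())
same = book.isSameNode(root.firstChild)
pretty = doc.toprettyxml(indent="  ")

# pyexpat: report attributes as a flat [name, value, ...] list in document order.
seen = []
parser = expat.ParserCreate()
parser.ordered_attributes = True
parser.StartElementHandler = lambda name, attrs: seen.append((name, attrs))
parser.Parse('<book z="1" a="2"/>', True)
```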

The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package.  The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.

The package currently contains:

	* XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius
	  Garshol), sgmlop (Fredrik Lundh).

	* SAX interface (Lars Marius Garshol)
	* minidom DOM implementation (Paul Prescod)
	* 4DOM from Fourthought (Uche Ogbuji, Mike Olson)
	* Various utility modules and functions (various people)
	* Documentation and example programs (various people)

The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments, questions,
or bug reports to <xml-sig@python.org>.

For more information about Python and XML, see:
	http://www.python.org/topics/xml/

-- 
Martin v. Löwis               http://www.informatik.hu-berlin.de/~loewis


From stefan.marsiske@sysdata.siemens.hu  Mon Feb 26 10:44:42 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Mon, 26 Feb 2001 11:44:42 +0100
Subject: [XML-SIG] 4Suite installation problems
Message-ID: <20010226114442.A14235@sysdata.siemens.hu>

hi all,

yesterday i decided to upgrade to 4Suite-0.10.2 at home. at work here
(solaris) i've been using it since the release, and it's very nice.
just one thing: in XHtmlPrinter there's one line which has to be changed: 
on line 12 in python2.0/site-packages/_xmlplus/dom/ext/XHtmlPrinter.py

     self.notations = doctype and doctype.notation or []

needs an "s" after notation...

but! on my home system (linux) i've had a lot of trouble, the installation
script didn't update a lot of files, and so a lot of missing functions/attributes
were the result. somehow 4Suite and PyXML together seem to screw up. i needed
to copy a lot of files by hand. and i played a lot with the PyXML-0.6.3
installation, the 4Suite installation, and the PyXML tree included with
4Suite, but in the end i worked it out. unfortunately i didn't document this,
so i can't really tell you what, and why went wrong.
but i can remember one error:
it said that DOMImplementation is missing _4dom_importfile() function or
something similar, so i found out which package is carrying this particular
implementation and copied it by hand.

so is my system screwed, or the installation procedure? or do 4Suite and PyXML
clash?

bye
-- 
Stefan [http://web.interware.hu/stef] UPDATED:001031
quote: "happy(y2k++)"
gpg-key: http://web.interware.hu/stef/gpg.txt


From Alexandre.Fayolle@logilab.fr  Mon Feb 26 10:56:56 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 26 Feb 2001 11:56:56 +0100 (CET)
Subject: [XML-SIG] 4Suite installation problems
In-Reply-To: <20010226114442.A14235@sysdata.siemens.hu>
Message-ID: <Pine.LNX.4.21.0102261153320.10718-100000@leo.logilab.fr>

On Mon, 26 Feb 2001, Marsiske Stefan - 3244 wrote:

> but! on my home system (linux) i've had a lot of trouble, the installation
> script didn't update a lot of files, and so a lot of missing functions/attributes
> were the result. 

The 4Suite guys are the ones who'd really be able to answer your question,
but in the meantime, you may want to use 'python setup.py install -f' to
force the overwriting of all the files. I had a similar problem, and the
-f option was helpful.

In the last resort, maybe manually erasing site-packages/Ft,
site-packages/xml or site-packages/_xmlplus and running setup.py install
could solve your problem. 

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From stefan.marsiske@sysdata.siemens.hu  Mon Feb 26 10:56:56 2001
From: stefan.marsiske@sysdata.siemens.hu (Marsiske Stefan - 3244)
Date: Mon, 26 Feb 2001 11:56:56 +0100
Subject: [XML-SIG] 4Suite installation problems
In-Reply-To: <Pine.LNX.4.21.0102261153320.10718-100000@leo.logilab.fr>; from Alexandre.Fayolle@logilab.fr on Mon, Feb 26, 2001 at 11:56:56AM +0100
References: <20010226114442.A14235@sysdata.siemens.hu> <Pine.LNX.4.21.0102261153320.10718-100000@leo.logilab.fr>
Message-ID: <20010226115656.D14235@sysdata.siemens.hu>

On Mon, Feb 26, 2001 at 11:56:56AM +0100, Alexandre Fayolle wrote:
> On Mon, 26 Feb 2001, Marsiske Stefan - 3244 wrote:
> 
> > but! on my home system (linux) i've had a lot of trouble, the installation
> > script didn't update a lot of files, and so a lot of missing functions/attributes
> > were the result. 
> 
> The 4Suite guys are the ones who'd really be able to answer your question,
> but in the meantime, you may want to use 'python setup.py install -f' to
> force the overwriting of all the files. I had a similar problem, and the
> -f option was helpful.

i tried the -f and it didn't work...

> In the last resort, maybe manually erasing site-packages/Ft,
> site-packages/xml or site-packages/_xmlplus and running setup.py install
> could solve your problem. 
maybe, but i tracked down the missing sources by hand and copied them.
actually the problem is solved, i just sent this here, so others don't have
to struggle that much... 
---end quoted text---

-- 
Stefan [http://web.interware.hu/stef] UPDATED:001031
quote: "happy(y2k++)"
gpg-key: http://web.interware.hu/stef/gpg.txt


From uche.ogbuji@fourthought.com  Mon Feb 26 13:47:20 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 26 Feb 2001 06:47:20 -0700
Subject: [XML-SIG] 4Suite installation problems
In-Reply-To: Message from Marsiske Stefan - 3244 <stefan.marsiske@sysdata.siemens.hu>
 of "Mon, 26 Feb 2001 11:56:56 +0100." <20010226115656.D14235@sysdata.siemens.hu>
Message-ID: <200102261347.GAA20296@localhost.localdomain>

> On Mon, Feb 26, 2001 at 11:56:56AM +0100, Alexandre Fayolle wrote:
> > On Mon, 26 Feb 2001, Marsiske Stefan - 3244 wrote:
> > 
> > > but! on my home system (linux) i've had a lot of trouble, the installation
> > > script didn't update a lot of files, and so a lot of missing functions/attributes
> > > were the result. 
> > 
> > The 4Suite guys are the ones who'd really be able to answer your question,
> > but in the meantime, you may want to use 'python setup.py install -f' to
> > force the overwriting of all the files. I had a similar problem, and the
> > -f option was helpful.
> 
> i tried the -f and it didn't work...
> 
> > In the last resort, maybe manually erasing site-packages/Ft,
> > site-packages/xml or site-packages/_xmlplus and running setup.py install
> > could solve your problem. 
> maybe, but i tracked down the missing sources by hand and copied them.
> actually the problem is solved, i just sent this here, so others don't have
> to struggle that much... 

Any clash between 4Suite 0.10.2 and PyXML 0.6.3 is a bug.  You said the 
problem is fixed, but if you have any more specifics, it would be great if you
posted them here.  The problems you originally posted looked like 
straightforward "-f needed" problems, but you said this didn't work for you.

I should note that I'm not sure that 4Suite 0.10.2 and PyXML 0.6.4 won't 
clash.  Most likely, one would end up with the older revision 4DOM from 
0.10.2, rather than the updated revision in PyXML 0.6.4 that contains some 
bug-fixes.  The next 4Suite release will only use the DOM in PyXML.
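
One way to check for this kind of clash is to ask Python which file each
package is actually loaded from. The following is only a minimal sketch in
present-day Python (not the 1.5/2.0 interpreters discussed in this thread);
the helper name module_origin is made up for illustration, and the package
names are simply the ones mentioned above.

```python
import importlib


def module_origin(name):
    """Return the file a module is loaded from, or None if it is unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules have no __file__, hence the getattr default.
    return getattr(module, "__file__", None)


# Package names taken from the thread; most will be absent on a modern system.
for name in ("xml.dom", "xml.dom.ext", "_xmlplus", "Ft"):
    print(name, "->", module_origin(name))
```

If xml.dom resolves to a path under _xmlplus, the PyXML copy is in use; a
path under Ft or a stale site-packages/xml directory would point to leftover
files from an older install.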


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From mclay@nist.gov  Mon Feb 26 02:25:17 2001
From: mclay@nist.gov (Michael McLay)
Date: Sun, 25 Feb 2001 21:25:17 -0500
Subject: [XML-SIG] Version number question on PyXML 0.6.4
In-Reply-To: <200102260841.JAA09898@pandora>
References: <200102260841.JAA09898@pandora>
Message-ID: <01022521251706.28858@fermi.eeel.nist.gov>

On Monday 26 February 2001 03:41, Martin von Loewis wrote:
> Version 0.6.4 of the Python/XML distribution is now available.  It
> should be considered a beta release, and can be downloaded from
> the following URLs:


I'm beginning to think someone from the Enlightenment window manager project 
has been given control of the version numbering for PyXML.  Version numbers 
are arbitrary, but some people will mistakenly read the low number on PyXML 
as an indication of unstable and immature software.  Based on the improved 
level of integration of this latest release the version number should have at 
least been bumped to a 0.7.0 release number.   What needs to be 
added/finished before the number can be bumped to 1.0?


From Alexandre.Fayolle@logilab.fr  Mon Feb 26 15:35:08 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 26 Feb 2001 16:35:08 +0100 (CET)
Subject: [XML-SIG] How to build a DOM from an HTML file?
Message-ID: <Pine.LNX.4.21.0102261626250.11146-100000@leo.logilab.fr>

Hello,

I'm trying to parse HTML documents into DOMs, using the 4DOM version that
comes with 4Suite 0.10.2

I first tried xml.dom.ext.reader.HtmlSax.HtmlDomGenerator with a
xml.dom.ext.reader.Sax.Reader but it seems to be broken (see
bug #404272). Then I tried xml.dom.ext.reader.HtmlLib.FromHtmlUrl which
uses the Sgmlop parser. However, this parser looks only partially
implemented (it chokes on doctype directives, for example, which means
that pages which probably contain the most valid HTML won't be parsed).

What is the current preferred way to do this?
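
For comparison only, here is a rough sketch of the same task using nothing
but today's standard library (html.parser plus xml.dom.minidom). It is not
4DOM and not what was available in 2001; the class name HtmlDomBuilder, the
helper html_to_fragment and the short VOID_TAGS list are assumptions made
for this illustration. It accepts doctype declarations and simply skips them.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# A few elements that never have content or an end tag in HTML (not exhaustive).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class HtmlDomBuilder(HTMLParser):
    """Hypothetical helper: collect parsed HTML into a minidom fragment."""

    def __init__(self):
        super().__init__()
        self.doc = Document()
        self.fragment = self.doc.createDocumentFragment()
        self.stack = [self.fragment]  # currently open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        element = self.doc.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].appendChild(self.doc.createTextNode(data))

    def handle_decl(self, decl):
        pass  # doctype declarations are accepted and skipped


def html_to_fragment(text):
    builder = HtmlDomBuilder()
    builder.feed(text)
    builder.close()
    return builder.fragment


frag = html_to_fragment(
    '<!DOCTYPE html><html><body><p class="x">Hi<br>there</p></body></html>'
)
print(frag.firstChild.toxml())
```

The builder returns a document fragment rather than a document, so input
with several top-level elements does not trip over the single-root rule.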

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From nobody@sourceforge.net  Mon Feb 26 12:27:12 2001
From: nobody@sourceforge.net (nobody)
Date: Mon, 26 Feb 2001 04:27:12 -0800
Subject: [XML-SIG] [ pyxml-Bugs-404272 ] HtmlDomGenerator constructor bug
Message-ID: <E14XMkC-0004ll-00@usw-sf-web3.sourceforge.net>

Artifact #404272, was updated on 2001-02-26 04:27
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=404272&group_id=6473

Category: 4Suite
Group: None
Status: Open
Priority: 5
Submitted By: Alexandre Fayolle
Assigned to: Nobody/Anonymous
Summary: HtmlDomGenerator constructor bug

Initial Comment:
The HtmlDomGenerator's constructor expects 2 arguments,
an owner document and a keepAllWs flag, whereas all
other similar classes only expect the keepAllWs flag. 

The result is that when the constructor is invoked by
the Reader class, the flag is passed as the owner
document, which in turn breaks the constructor:

>>> from xml.dom.ext.reader.Sax import Reader
>>> from xml.dom.ext.reader.HtmlSax import HtmlDomGenerator
>>> r = Reader(saxHandlerClass = HtmlDomGenerator)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax.py", line 124, in __init__
    self.handler = saxHandlerClass(keepAllWs)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/HtmlSax.py", line 39, in __init__
    self._rootNode = self._ownerDoc.createDocumentFragment()
AttributeError: 'int' object has no attribute 'createDocumentFragment'
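
The mismatch can be reproduced without 4DOM installed. The sketch below
uses stand-in classes that only mimic the constructor signatures described
in this report; none of it is the real 4DOM code, and the adapter function
html_generator_factory is a hypothetical workaround, not a fix that has
been applied upstream.

```python
class Document:
    """Stand-in for a DOM document (hypothetical, for illustration only)."""

    def createDocumentFragment(self):
        return []


class XmlDomGenerator:
    """Like the other generators: takes only the keepAllWs flag."""

    def __init__(self, keepAllWs=0):
        self._keepAllWs = keepAllWs


class HtmlDomGenerator:
    """Like the reported class: wants an owner document first."""

    def __init__(self, ownerDoc, keepAllWs=0):
        self._ownerDoc = ownerDoc
        self._keepAllWs = keepAllWs
        self._rootNode = self._ownerDoc.createDocumentFragment()


class Reader:
    """Stand-in for the Reader: always calls saxHandlerClass(keepAllWs)."""

    def __init__(self, saxHandlerClass=XmlDomGenerator, keepAllWs=0):
        self.handler = saxHandlerClass(keepAllWs)


# The flag (an int) lands in the ownerDoc slot, as in the traceback above.
try:
    Reader(saxHandlerClass=HtmlDomGenerator)
    failed = False
except AttributeError as exc:
    failed = True
    print("reproduced:", exc)


# Possible workaround: an adapter exposing the one-argument signature.
def html_generator_factory(keepAllWs=0):
    return HtmlDomGenerator(Document(), keepAllWs)


reader = Reader(saxHandlerClass=html_generator_factory)
```

The cleaner fix would presumably be to give HtmlDomGenerator the same
signature as its sibling classes, but that is for the maintainers to decide.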




From akuchlin@mems-exchange.org  Tue Feb 27 15:11:19 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 10:11:19 -0500
Subject: [XML-SIG] Maintaining catalogs
Message-ID: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us>

For a project, I'd like to install a DTD on the system and
automatically add its public identifier to the catalog.  Is there a
standard place to put SGML/XML catalogs on Unix systems?
/usr/(local)?/lib/sgml?  /etc/sgml/?

--amk


From gregor@hoffleit.de  Tue Feb 27 15:24:34 2001
From: gregor@hoffleit.de (Gregor Hoffleit)
Date: Tue, 27 Feb 2001 16:24:34 +0100
Subject: [XML-SIG] Maintaining catalogs
In-Reply-To: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Tue, Feb 27, 2001 at 10:11:19AM -0500
References: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us>
Message-ID: <20010227162434.C20349@mediasupervision.de>

On Tue, Feb 27, 2001 at 10:11:19AM -0500, Andrew Kuchling wrote:
> For a project, I'd like to install a DTD on the system and
> automatically add its public identifier to the catalog.  Is there a
> standard place to put SGML/XML catalogs on Unix systems?
> /usr/(local)?/lib/sgml?  /etc/sgml/?

Debian has a package sgml-base that sets up some infrastructure for managing
SGML files. All SGML description files live in /usr/lib/sgml. The catalog
file is /etc/sgml.catalog, /usr/lib/sgml/catalog is a symlink pointing to
the real file /etc/sgml.catalog.

sgml-base contains a tool, install-sgmlcatalog, that's used to add entries
to and remove them from the catalog file. The README (see below) contains
an example of how that's supposed to be done.

    Gregor


Guidelines for SGML packages
============================

Package dependencies
--------------------

All SGML packages that provide a DTD or entity description file have
to depend on "sgml-base". This package installs the "install-sgmlcatalog"
script and provides the necessary directory structure.


The SGML Description Files
--------------------------

The location of SGML description files (DTD's, entities, etc.) is
/usr/lib/sgml . All DTD's should be installed in /usr/lib/sgml/dtd ,
all entity description files should go into /usr/lib/sgml/entities .


The SGML Catalog
----------------

The SGML catalog file is /etc/sgml.catalog , but should be referred to
through the symbolic link /usr/lib/sgml/catalog . Furthermore, all
path specifications given in the SGML catalog have to be relative
to /usr/lib/sgml .

Please don't modify the SGML catalog directly in the postinst/postrm
scripts of your package--you should use the install-sgmlcatalog script
for that.

Here is a simple example: Consider the package "foo" which provides the
DTD foo.dtd and an entity description file "foo-general". The package
will install the following files:

	/usr/lib/sgml/dtd/foo.dtd
        /usr/lib/sgml/entities/foo-general
        /usr/lib/foo/sgml.catalog
        
The sgml.catalog file will look like this:

	DOCTYPE foodoc            dtd/foo.dtd
	ENTITY %foo-general       entities/foo-general

That's the postinst script:

	#!/bin/sh
        install-sgmlcatalog --install /usr/lib/foo/sgml.catalog foo

and the postrm script:

	#!/bin/sh
        install-sgmlcatalog --remove foo

Please check the install-sgmlcatalog(8) manpage for details.
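
To illustrate the relative-path rule, here is a small Python sketch that
reads entries like the ones in the example catalog and resolves each path
against /usr/lib/sgml. It is not part of sgml-base; the function name
parse_catalog is made up, and the sketch only understands simple
three-field entries (keyword, name, path), not comments or the other
catalog forms.

```python
import posixpath
import shlex

SGML_ROOT = "/usr/lib/sgml"  # base directory named in the guidelines


def parse_catalog(text, root=SGML_ROOT):
    """Return (keyword, name, absolute_path) for each three-field entry."""
    entries = []
    for line in text.splitlines():
        fields = shlex.split(line)  # keeps quoted public identifiers whole
        if len(fields) != 3:
            continue  # blank lines and forms this sketch does not handle
        keyword, name, path = fields
        entries.append((keyword, name, posixpath.join(root, path)))
    return entries


CATALOG = """
DOCTYPE foodoc            dtd/foo.dtd
ENTITY %foo-general       entities/foo-general
"""

for entry in parse_catalog(CATALOG):
    print(entry)
```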


Feedback
--------

Please email me with bugs, suggestions, or criticism regarding these guidelines.

--
May 8, 1997
Christian Schwarz <schwarz@debian.org>
    

From akuchlin@mems-exchange.org  Tue Feb 27 15:31:50 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 10:31:50 -0500
Subject: [XML-SIG] Maintaining catalogs
In-Reply-To: <20010227162434.C20349@mediasupervision.de>; from gregor@mediasupervision.de on Tue, Feb 27, 2001 at 04:24:34PM +0100
References: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us> <20010227162434.C20349@mediasupervision.de>
Message-ID: <20010227103150.B17362@ute.cnri.reston.va.us>

On Tue, Feb 27, 2001 at 04:24:34PM +0100, Gregor Hoffleit wrote:
>sgml-base contains a tool, install-sgmlcatalog, that's used to add entries
>to and remove them from the catalog file. The README (see below) contains
>an example of how that's supposed to be done.

Redhat 7.0 has something similar, though annoyingly the script is called 
install-catalog instead.

--amk


From akuchlin@mems-exchange.org  Tue Feb 27 16:23:42 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 11:23:42 -0500
Subject: [XML-SIG] DTD design: include categorization, or use RDF?
Message-ID: <E14Xmuc-0004ah-00@ute.cnri.reston.va.us>

I'm revisiting and extending my quotation DTD this week, hence my
suddenly asking a bunch of questions here.  I'm wondering about
categorization.  A common application would be to group quotations
into categories.  I can add a category element or attribute, but then
someone comes along who wants to sort quotes by newsgroup, or by date,
or by some other wacky thing.  I can invent a general syntax, but
that's just reinventing RDF badly, so RDF seems like the obvious
course.

Question: is it better to embed RDF annotations in a single file, or
to encourage maintaining an RDF index in a separate file, as a gloss
on the original file.  In other words, I'm wondering about:

<quotation id="foo">
  <rdf:Description about="...#foo">
    <someschema:Author>Author's Name</someschema:Author>
  </rdf:Description>
   ...
</quotation>

   versus:

<quotation id="foo">
   ...
</quotation>

  and in some other file have:

  <rdf:Description about="...#foo">
    <someschema:Author>Author's Name</someschema:Author>
  </rdf:Description>

The first form has only one file, but I'm wondering if it will
complicate the task of modifying the file programmatically too much.
(I'd really like to write a Tkinter program for maintaining a
collection, which means that the data will have to be round-tripped
from XML to Python objects and back again.  Hopefully people here will
have application experience doing this sort of thing.)

--amk


From tpassin@home.com  Tue Feb 27 22:25:35 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 27 Feb 2001 17:25:35 -0500
Subject: [XML-SIG] DTD design: include categorization, or use RDF?
References: <E14Xmuc-0004ah-00@ute.cnri.reston.va.us>
Message-ID: <000601c0a10c$3506ee20$7cac1218@reston1.va.home.com>

Andrew Kuchling asks


> Question: is it better to embed RDF annotations in a single file, or
> to encourage maintaining an RDF index in a separate file, as a gloss
> on the original file.  In other words, I'm wondering about:
>

I encourage you to use a separate file.  This is because we're going to want
more tools for working with third-party data, I think, and you will be
furthering that if you choose to use a separate file.

It really depends on whether you see the annotations as being something
separate, and if you might like to apply the same idea to some other data that
you don't control.
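
As a rough illustration of the separate-file approach, the sketch below
keeps the quotations and the RDF gloss as two independent documents and
joins them on the fragment identifier in the about attribute. It uses
Python's xml.etree.ElementTree; the element names follow Andrew's example,
while the file name quotes.xml, the example.org schema namespace and the
function names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

QUOTES = """<quotations>
  <quotation id="foo">First quotation</quotation>
  <quotation id="bar">Second quotation</quotation>
</quotations>"""

# Kept in a separate file in practice; the schema namespace is made up.
ANNOTATIONS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/someschema#">
  <rdf:Description about="quotes.xml#foo">
    <s:Author>Author's Name</s:Author>
  </rdf:Description>
</rdf:RDF>"""


def index_annotations(rdf_text):
    """Map fragment id -> {property local name: text}."""
    index = {}
    for desc in ET.fromstring(rdf_text).iter("{%s}Description" % RDF_NS):
        about = desc.get("about") or desc.get("{%s}about" % RDF_NS) or ""
        fragment = about.partition("#")[2]
        props = index.setdefault(fragment, {})
        for prop in desc:
            props[prop.tag.rsplit("}", 1)[-1]] = (prop.text or "").strip()
    return index


def annotated_quotes(quotes_text, rdf_text):
    """Pair every quotation with whatever the gloss says about it."""
    index = index_annotations(rdf_text)
    return [
        (q.get("id"), q.text, index.get(q.get("id"), {}))
        for q in ET.fromstring(quotes_text).iter("quotation")
    ]


for row in annotated_quotes(QUOTES, ANNOTATIONS):
    print(row)
```

Because the quotation file is never rewritten, an editing tool can
round-trip it without having to understand the RDF at all.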

Cheers,

Tom P


From akuchlin@mems-exchange.org  Tue Feb 27 23:03:44 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 27 Feb 2001 18:03:44 -0500
Subject: [XML-SIG] DTD design: include categorization, or use RDF?
In-Reply-To: <000601c0a10c$3506ee20$7cac1218@reston1.va.home.com>; from tpassin@home.com on Tue, Feb 27, 2001 at 05:25:35PM -0500
References: <E14Xmuc-0004ah-00@ute.cnri.reston.va.us> <000601c0a10c$3506ee20$7cac1218@reston1.va.home.com>
Message-ID: <20010227180344.D15343@ute.cnri.reston.va.us>

On Tue, Feb 27, 2001 at 05:25:35PM -0500, Thomas B. Passin wrote:
>It really depends on whether you see the annotations as being something
>separate, and if you might like to apply the same idea to some other data that
>you don't control.

Author and source seem critical, and therefore suitable as part of
the DTD, but additional categorizations seem less important and
application-dependent.

--amk


From martin@loewis.home.cs.tu-berlin.de  Wed Feb 28 00:14:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 28 Feb 2001 01:14:47 +0100
Subject: [XML-SIG] Maintaining catalogs
In-Reply-To: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us> (message from Andrew
 Kuchling on Tue, 27 Feb 2001 10:11:19 -0500)
References: <E14XlmZ-0004XC-00@ute.cnri.reston.va.us>
Message-ID: <200102280014.f1S0El901398@mira.informatik.hu-berlin.de>

> For a project, I'd like to install a DTD on the system and
> automatically add its public identifier to the catalog.  Is there a
> standard place to put SGML/XML catalogs on Unix systems?
> /usr/(local)?/lib/sgml?  /etc/sgml/?

I believe the standard location is below /usr/share/sgml. Not sure how
it is supposed to work; it seems that a tool should look at all files
matching CATALOG.* in that directory.

In addition, I have a number of subdirectories in /usr/share/sgml,
e.g. OASIS, W3C, James_Clark, Normal_Walsh, etc. They seem to
correspond to the public identifiers; eg. "-//OASIS//DTD DocBook
V3.1//EN" can be found in /usr/share/sgml/OASIS/dtd/DocBook_V3.1.
However, these files are referred to in the CATALOG.* files, so that
seems to be the primary resource.

In addition, nsgmls honors the SGML_CATALOG_FILES environment variable;
if this is not set, the documentation says it uses a system-dependent
default list of catalog files.

There is something called "open catalogs", but I'm not certain how
much that actually specifies.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed Feb 28 00:50:00 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 28 Feb 2001 01:50:00 +0100
Subject: [XML-SIG] Catalogs and LSB
Message-ID: <200102280050.f1S0o0D01871@mira.informatik.hu-berlin.de>

I just found that the Linux Standards Base addendum R003 specifies
locations for catalogs; they say that centralized catalogs must reside
in /etc/sgml, end in .cat, and only contain CATALOG declarations.

It goes on to say that /etc/sgml/catalog is the central catalog, and
managed by means of the install-catalog utility. It seems that Redhat
provides that utility, but that this utility manages
/usr/lib/sgml/CATALOG (and puts new catalog files into /usr/lib/sgml).

Debian apparently puts the central catalog into /etc/sgml.catalog, and
the individual catalogs into /usr/lib/sgml; they have a corresponding
install-sgmlcatalog utility.

It would probably be worthwhile writing a library that locates the
central catalog, or individual catalogs, in a best-effort manner. If
we were to set a precedent, it would probably be best to stick to the
LSB proposal, regardless of whether Debian and Redhat differ from that,
and even though only Caldera appears to implement it fully.
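
A best-effort lookup along those lines might start out like the sketch
below. It is only an outline: the function name find_catalogs and the
order of the candidate list are assumptions, the paths are the ones
mentioned in this thread, and splitting SGML_CATALOG_FILES on the
platform's path separator is assumed rather than checked against any
specification.

```python
import os

# Candidate locations gathered from this thread, LSB proposal first.
CENTRAL_CATALOGS = [
    "/etc/sgml/catalog",      # LSB addendum
    "/etc/sgml.catalog",      # Debian
    "/usr/lib/sgml/CATALOG",  # Red Hat
]


def find_catalogs(environ=None, exists=os.path.exists):
    """Return existing catalog files, preferring SGML_CATALOG_FILES."""
    environ = os.environ if environ is None else environ
    listed = environ.get("SGML_CATALOG_FILES", "")
    found = [p for p in listed.split(os.pathsep) if p and exists(p)]
    if found:
        return found
    # Fall back to the well-known central catalogs that actually exist.
    return [p for p in CENTRAL_CATALOGS if exists(p)]


print(find_catalogs())
```

The exists parameter is there so the search order can be exercised without
touching the real file system.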

Regards,
Martin


From akuchlin@mems-exchange.org  Wed Feb 28 05:49:08 2001
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Wed, 28 Feb 2001 00:49:08 -0500
Subject: [XML-SIG] QEL 2.0 DTD
Message-ID: <200102280549.AAA01205@mira.erols.com>

First stab at a Quotation Exchange Language Web page:

http://www.amk.ca/qel/

Take a look at the QEL 2.0 DTD and offer any comments.  Now to work on
the software...

--amk


From eric2461@caramail.com  Wed Feb 28 17:24:20 2001
From: eric2461@caramail.com (RICO)
Date: Wed, 28 Feb 2001 18:24:20 +0100
Subject: [XML-SIG] Free: the best of the web!
Message-ID: <200102281809.f1SI9HO07583@bacho.adi.fr>


From Alexandre.Fayolle@logilab.fr  Wed Feb 28 18:42:58 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 28 Feb 2001 19:42:58 +0100 (CET)
Subject: [XML-SIG] [off topic] getting XML on the net
Message-ID: <Pine.LNX.4.21.0102281935390.19215-100000@leo.logilab.fr>

This is off topic, but I thought some of you might be interested. I've
just learned this from http://www.scripting.com.

Google can deliver results as XML: 
wget http://www.google.com/xml?q=narval

It is possible to get NASDAQ stock quotes in XML too:
wget "http://quotes.nasdaq.com/quote.dll?page=xml&mode=stock&symbol=AAPL"

I'm pretty sure I'll take some time to set up a couple of Narval recipes
to take advantage of this. 

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).


From xml-sig@teleo.net  Wed Feb 28 19:24:32 2001
From: xml-sig@teleo.net (xml-sig@teleo.net)
Date: Wed, 28 Feb 2001 11:24:32 -0800
Subject: [XML-SIG] DTD design: include categorization, or use RDF?
In-Reply-To: <E14Xmuc-0004ah-00@ute.cnri.reston.va.us>
References: <E14Xmuc-0004ah-00@ute.cnri.reston.va.us>
Message-ID: <0102281124320Y.04301@quadra.teleo.net>

On Tuesday 27 February 2001 08:23, Andrew Kuchling wrote:
> I'm revisiting and extending my quotation DTD this week, hence my
> suddenly asking a bunch of questions here.  I'm wondering about
> categorization.  A common application would be to group quotations
> into categories.  I can add a category element or attribute, but then
> someone comes along who wants to sort quotes by newsgroup, or by date,
> or by some other wacky thing.  I can invent a general syntax, but
> that's just reinventing RDF badly, so RDF seems like the obvious
> course.


Have you considered Topic Maps, as a possible alternative to RDF?

http://xmlcoverpages.org/topicMaps.html

There's now an Open Source TM engine in Python:
http://ontopia.net/software/tmproc/