From boud2@rempt.xs4all.nl  Tue Feb  1 06:45:00 2000
From: boud2@rempt.xs4all.nl (Boudewijn Rempt (KDE test user))
Date: Tue, 1 Feb 2000 07:45:00 +0100 (CET)
Subject: [XML-SIG] DevDay results
In-Reply-To: <38961207.EF6835F2@prescod.net>
Message-ID: <Pine.LNX.4.21.0002010741340.18121-100000@calcifer.valdyas.org>

(I ought to introduce myself first - while I'm not working
on the XML modules myself, I've spent the past month working
on an XML editor for KDE, using Python.)

On Mon, 31 Jan 2000, Paul Prescod wrote:

> uche.ogbuji@fourthought.com wrote:
> > 
> 
> > My vote would be to bundle SAX and Expat, which will do for many uses.  If
> > they need more sophisticated XML, they can download the XML package to get
> > DOM, XPath, XSLT, etc.
> 
> My concern is that I don't consider the DOM "advanced". Hell, Visual
> Basic and Javascript programmers can't even spell SAX but they all use
> the DOM. If a new user asked me which to learn first, I'd say "the DOM"
> because any semi-competent newbie can find their way around a tree(?) to
> get the information they need whereas being smart enough to buffer the
> right information in the right order takes a little more algorithmic
> forethought ('scuse me).
> 

As far as I'm concerned, the DOM is absolutely basic. At least, it's what
I turned to immediately when I started writing my editor. If a DOM isn't
included in the standard XML package, would it be allowable to include
it in every application that needs one?


From uche.ogbuji@fourthought.com  Tue Feb  1 15:02:37 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 01 Feb 2000 08:02:37 -0700
Subject: [XML-SIG] DevDay results
In-Reply-To: Your message of "Tue, 01 Feb 2000 07:45:00 +0100."
 <Pine.LNX.4.21.0002010741340.18121-100000@calcifer.valdyas.org>
Message-ID: <200002011502.IAA04939@localhost.localdomain>

> (I ought to introduce myself first - while I'm not working
> on the XML modules myself, I've spent the past month working
> on an XML editor for KDE, using Python.)

[snip]

> As far as I'm concerned, the DOM is absolutely basic. At least, it's what
> I turned to immediately when I started writing my editor. If a DOM isn't
> included in the standard XML package, would it be allowable to include
> it in every application that needs one?

I doubt anyone would disagree that the core of the DOM is basic, but as I've 
already witnessed elsewhere, if you got all these people together, there would 
be no easy consensus on what constitutes that core.

We all seem to be agreed that 4DOM is (and even PyDOM would have been) too 
bulky to be bundled with Python.  Most have also expressed that some DOM 
interface would be good for bundling with Python.  Perhaps you can bring your 
fresh perspective to the question of exactly how we go about this.

If you don't mind, take a look at the xml-sig thread beginning at:

http://www.python.org/pipermail/xml-sig/1999-April/002712.html

and Paul's final proposal at:

http://www.python.org/pipermail/xml-sig/1999-April/002763.html

Except for the hard-core DOM-haters, most of us liked Paul's proposal, and it 
is only time that has prevented us from building in a conversion layer from 
4DOM to miniDOM.

I think we should review Paul's proposal in the face of DOM Level 2, and come 
up with a miniDOM which _can_ be bundled with Python, knowing that miniDOM 
code could be easily migrated to 4DOM if bigger guns are needed.

You'll also see a lot of discussion of qp_xml in that thread.  qp_xml is nice 
and lightweight, but my resistance to it (and others) is that it doesn't 
follow the XML Infoset, which, rightly or wrongly, makes many concessions to 
DOM.  I'm sure we have all written our own quick and effective XML APIs (Mike 
and I have written our share).  We moved to DOM, warts and all, because 
standardization and intellectual cohesiveness are more important than memory 
and processor footprint for a general API.


-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From paul@prescod.net  Tue Feb  1 20:14:58 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 01 Feb 2000 12:14:58 -0800
Subject: [XML-SIG] DevDay results
References: <200002011502.IAA04939@localhost.localdomain>
Message-ID: <38973EC2.704AD82B@prescod.net>

Are congratulations in order yet, Uche?

uche.ogbuji@fourthought.com wrote:
> 
> ...
> I doubt anyone would disagree that the core of the DOM is basic, but as I've
> already witnessed elsewhere, if you got all these people together, there would
> be no easy consensus on what constitutes that core.

And the "core" would likely not be sufficient to support an XML editor.

I am most interested in enabling XML->Foo transformations that require
walking around the DOM tree. On the other hand, walking around the DOM
tree without something like XPath is a little painful so maybe just
providing the DOM is not so useful....decisions, decisions.
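
The trade-off discussed in this thread (finding your way around a tree
versus buffering the right SAX events in the right order) can be sketched
roughly as follows. This assumes the xml.dom.minidom and xml.sax modules
that ship with current Python, neither of which was bundled when this was
written; the sample document and all function names are invented for
illustration.

```python
import xml.sax
from xml.dom import minidom

# Invented sample document, used only for this sketch.
SAMPLE = ("<book><chapter><title>One</title></chapter>"
          "<chapter><title>Two</title></chapter></book>")


def titles_via_dom(text):
    """Walk the finished tree and read each <title>'s text child."""
    doc = minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


class TitleCollector(xml.sax.ContentHandler):
    """SAX version: the handler must track where it is and buffer text."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_via_sax(text):
    """Stream the document through the collector and return its titles."""
    handler = TitleCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list for SAMPLE; the DOM version is shorter
because the tree already holds the state that the SAX handler has to keep
by hand.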

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Ivory towers are no longer in order. We need ivory 
networks. Today, sitting quietly and thinking is the 
world's greatest generator of wealth and prosperity."
 - http://www.bespoke.org/viridian/print.asp?t=140


From boud2@rempt.xs4all.nl  Tue Feb  1 20:45:21 2000
From: boud2@rempt.xs4all.nl (Boudewijn Rempt (KDE test user))
Date: Tue, 1 Feb 2000 21:45:21 +0100 (CET)
Subject: [XML-SIG] DevDay results
In-Reply-To: <38973EC2.704AD82B@prescod.net>
Message-ID: <Pine.LNX.4.21.0002012144010.14326-100000@calcifer.valdyas.org>


On Tue, 1 Feb 2000, Paul Prescod wrote:

> Are congratulations in order yet, Uche?
> 
> uche.ogbuji@fourthought.com wrote:
> > 
> > ...
> > I doubt anyone would disagree that the core of the DOM is basic, but as I've
> > already witnessed elsewhere, if you got all these people together, there would
> > be no easy consensus on what constitutes that core.
> 
> And the "core" would likely not be sufficient to support an XML editor.
> 

I was getting nicely underway with what I had - of course, I was
only trying to build a simple editor, only showing the tree,
node attributes and text nodes. But then, I'm very much a layman
when it comes to these issues.


From uche.ogbuji@fourthought.com  Tue Feb  1 23:31:41 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 01 Feb 2000 16:31:41 -0700
Subject: [XML-SIG] DOM in Python 1.6
Message-ID: <200002012331.QAA06963@localhost.localdomain>

Did Guido set a timetable for Python 1.6?  What deadlines are we facing if we 
want to try to get a lightweight DOM into Python 1.6?

-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From steve@renlabs.com  Tue Feb  1 23:33:07 2000
From: steve@renlabs.com (Steven Work)
Date: 01 Feb 2000 15:33:07 -0800
Subject: [XML-SIG] Please resolve external parameter entity references
In-Reply-To: "Boudewijn Rempt's message of "Tue, 1 Feb 2000 21:45:21 +0100 (CET)"
Message-ID: <87g0vcwlak.fsf@solano.in.renlabs.com>

May I weigh in on the feature list question?  For many purposes the
core XML processor should resolve external parameter entity
references; expat currently doesn't.  W3C only *requires* this of a
validating parser, and that appears to be expat's justification for
skipping them.  I'd like to argue that a good general-use
*non-validating* parser should do it too, at least optionally; and I
don't think it would bloat code measurably or slow things down any
when there are no external parameter entity references, or when the
option is turned off.

Why does this matter?  Here's one example.

I find myself logging (accumulating) information in XML-derived
formats pretty frequently these days.  The only way I know to do this
in a strictly append-only and atomic way is this:

1. Start with a (unchanging) top-level document like "log.xml" here:

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE log SYSTEM "log.dtd" [
  <!ENTITY % log.decls SYSTEM "log.decls">
  <!ENTITY   log.ents    SYSTEM "log.ents">
  %log.decls;
  ]>
  <log>
  &log.ents;
  </log>

2. For each "thing" to log, do these steps in order:

  a. Write a well-formed chunk of XML, valid within a <log> element, to
     a uniquely-named new file.

  b. Append something like this to "log.decls":

       <!ENTITY unique-name SYSTEM "unique-name">

  c. Append something like this to "log.ents":

       &unique-name;

If you can assume the writes in 2b and 2c are atomic (happen to
completion without other writes to the same file intervening; for
small writes on most systems this is an OK assumption) then "log.xml"
remains valid at all times -- no need for locks or other interprocess
communications to avoid scrambling the data, even with many processes
writing data "simultaneously."
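
Steps 2a to 2c might look roughly like this in Python. It is a minimal
sketch, assuming that a single small write() on a descriptor opened with
O_APPEND is atomic on the target system (the same assumption made above);
the function names and the "entry-0001" style of file name are invented.

```python
import os
import tempfile


def append_line(path, line):
    """Append one line with a single write() on an O_APPEND descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, (line + "\n").encode("utf-8"))
    finally:
        os.close(fd)


def log_chunk(directory, name, chunk):
    """Write the chunk (2a), declare its entity (2b), then reference it (2c)."""
    # 2a: mode "x" refuses to overwrite, so the name really must be unique.
    with open(os.path.join(directory, name), "x", encoding="utf-8") as fh:
        fh.write(chunk)
    # 2b: the declaration must exist before anything refers to it.
    append_line(os.path.join(directory, "log.decls"),
                '<!ENTITY %s SYSTEM "%s">' % (name, name))
    # 2c: only now does the entry become part of the log's content.
    append_line(os.path.join(directory, "log.ents"), "&%s;" % name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_chunk(tmp, "entry-0001", "<entry>first</entry>")
        with open(os.path.join(tmp, "log.ents")) as fh:
            print(fh.read())
```

The order matters: a reader that sees the reference in "log.ents" can rely
on the declaration and the chunk file already being in place.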

But to process "log.xml" I have to fall back from the very-fast expat,
usually to an ESIS parser chewing the data stream from nsgmls in a
separate process (validating xmlproc works too but it's even slower).
These systems don't need validating parsers, but to my knowledge
the XML developer community hasn't built any good non-validating
parsers that don't just ignore external parameter entity references.

Only they can't ignore them entirely (Section 5.1 of the W3C
recommendation requires a non-validating parser to notice when it has
chosen NOT to read an external parameter entity, so it can know at
what point it is absolved of its responsibility to process entity
declarations or attribute-list declarations that come later).  So
there's essentially no speed cost to having the *option* of reading
external parameter entities, and choosing *not* to.  And you're
already expanding internally-declared parameter entities, so it won't
add a measurable amount of code to do so from another file.
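
For readers on current Python: the expat binding in today's standard
library exposes this as an option, via SetParamEntityParsing() and an
ExternalEntityRefHandler. The sketch below assumes that modern API, not the
expat release discussed here; the in-memory EXTERNAL table stands in for
files on disk, and the document and entity names are invented.

```python
import xml.parsers.expat as expat

# Invented document whose only entity declaration lives in an external
# parameter entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE log [
<!ENTITY % decls SYSTEM "log.decls">
%decls;
]>
<log>&greeting;</log>"""

# Stand-in for the file system: system identifier -> entity content.
EXTERNAL = {"log.decls": b'<!ENTITY greeting "hello">'}


def parse(doc, resolve):
    """Return the character data seen, with or without reading %decls;."""
    chunks = []
    parser = expat.ParserCreate()
    mode = (expat.XML_PARAM_ENTITY_PARSING_ALWAYS if resolve
            else expat.XML_PARAM_ENTITY_PARSING_NEVER)
    parser.SetParamEntityParsing(mode)

    def external_ref(context, base, system_id, public_id):
        # Parse the external entity with a sub-parser tied to this parser.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(EXTERNAL[system_id], True)
        return 1  # non-zero tells expat to carry on

    parser.ExternalEntityRefHandler = external_ref
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return "".join(chunks)
```

With resolve=True the declaration in "log.decls" is read and &greeting;
expands to its text. With resolve=False the parser skips %decls; and then
treats the unknown &greeting; as a skipped entity rather than an error,
which is the Section 5.1 behaviour described above.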

I think I'm talking myself into patching expat.  Would some kind soul
please point out flaws in the above, so I can save myself the trouble?
-- 
Steven Work
Renaissance Labs
steve@renlabs.com
360 647-1833


From paul@prescod.net  Wed Feb  2 02:00:46 2000
From: paul@prescod.net (Paul Prescod)
Date: Tue, 01 Feb 2000 18:00:46 -0800
Subject: [XML-SIG] Please resolve external parameter entity references
References: <87g0vcwlak.fsf@solano.in.renlabs.com>
Message-ID: <38978FCE.86BC3FB2@prescod.net>

The current experimental (beta) version of expat resolves parameter
entities. This "test version" is now about 6 months old and was last
updated in October. I have no way of knowing if there are any known bugs
in it.

http://www.oasis-open.org/cover/news1999Q2.html

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Ivory towers are no longer in order. We need ivory 
networks. Today, sitting quietly and thinking is the 
world's greatest generator of wealth and prosperity."
 - http://www.bespoke.org/viridian/print.asp?t=140


From wunder@infoseek.com  Wed Feb  2 17:12:21 2000
From: wunder@infoseek.com (Walter Underwood)
Date: Wed, 02 Feb 2000 09:12:21 -0800
Subject: [XML-SIG] Please resolve external parameter entity
 references
In-Reply-To: <87g0vcwlak.fsf@solano.in.renlabs.com>
References: <"Boudewijn Rempt's message of "Tue, 1 Feb 2000 21:45:21 +0100 (CET)">
Message-ID: <4.3.0.33.1.20000202090220.00cd5650@corp.infoseek.com>

At 03:33 PM 2/1/00 -0800, Steven Work wrote:

>I find myself logging (accumulating) information in XML-derived
>formats pretty frequently these days.  The only way I know to do this
>in a strictly append-only and atomic way is this:

This exact problem has been discussed a few times on xml-dev.
The way to break out of it is to look at each log entry as
a separate document, rather than try to make the entire log
one document.

To separate the documents in the log, use a character not allowed
in XML. Formfeed is a fine choice, since it even means "next page"
which is pretty close to the semantics wanted. Using a character
that doesn't appear in XML means that even if a partial write
is made to the file, the log can be re-sync'ed at the beginning
of the next log entry. So the penalty for non-atomic writes is
lessened (from "partial write wrecks the whole file" to "partial
write wrecks one entry").

So an entry looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log SYSTEM "log.dtd">
<log>
<date>2000-01-02T08:30:22</date>
<user>wunder</user>
<action>logon</action>
</log>
[formfeed]

But the xml declaration and doctype are optional, so the 
space-conscious logger can do this:

<log>
<date>2000-01-02T08:30:22</date>
<user>wunder</user>
<action>logon</action>
</log>
[formfeed]

Or even lose the ignorable whitespace and put it all on one line.
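
Reading such a log back could be sketched like this. It assumes current
Python's xml.etree.ElementTree for parsing each entry; the function name
and the sample data (including the simulated partial write) are invented.

```python
import xml.etree.ElementTree as ET

FORMFEED = b"\f"  # not a legal XML 1.0 character, so it cannot occur in an entry


def read_entries(data):
    """Yield one parsed <log> element per intact entry; skip damaged ones."""
    for raw in data.split(FORMFEED):
        raw = raw.strip()
        if not raw:
            continue  # empty tail after the final separator
        try:
            yield ET.fromstring(raw)
        except ET.ParseError:
            continue  # a partial write costs only this one entry


SAMPLE = (
    b"<log><date>2000-01-02T08:30:22</date><user>wunder</user>"
    b"<action>logon</action></log>\f"
    b"<log><date>2000-01-02T08:31:0\f"  # simulated partial write
    b"<log><date>2000-01-02T08:35:10</date><user>wunder</user>"
    b"<action>logoff</action></log>\f"
)
```

Iterating over read_entries(SAMPLE) yields the first and third entries and
silently drops the truncated one in between.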

wunder
--
Walter R. Underwood
Senior Staff Engineer
Infoseek Software
GO Network, part of The Walt Disney Company
wunder@infoseek.com
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946


From hansv@net4all.be  Thu Feb  3 16:43:07 2000
From: hansv@net4all.be (hansv@net4all.be)
Date: Thu, 3 Feb 2000 17:43:07 +0100
Subject: [XML-SIG] XML on the Mac
Message-ID: <9E725728E7D0D311B24D00508B6A05430A34BC@PDC>

Hi,

is there anybody who could point me in the right direction to install the
Python-XML package on a mac.

I have no Codewarrior or any other compiler, so binaries would be
appreciated. I'm planning on using the 4DOM package.

Any help is appreciated,

Hans


From uche.ogbuji@fourthought.com  Fri Feb  4 02:57:22 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 03 Feb 2000 19:57:22 -0700
Subject: [XML-SIG] XML on the Mac
In-Reply-To: Your message of "Thu, 03 Feb 2000 17:43:07 +0100."
 <9E725728E7D0D311B24D00508B6A05430A34BC@PDC>
Message-ID: <200002040257.TAA01589@localhost.localdomain>

> is there anybody who could point me in the right direction to install the
> Python-XML package on a mac.
> 
> I have no Codewarrior or any other compiler, so binaries would be
> appreciated. I'm planning on using the 4DOM package.

4DOM should work just fine with xmlproc and the SAX package, both of which are 
pure Python, are developed by Lars Marius Garshol, and can be obtained from 
his web site independently from the XML package.  See

http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/index.html

-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From dwallace@udel.edu  Fri Feb  4 17:45:04 2000
From: dwallace@udel.edu (Dave Wallace)
Date: Fri, 04 Feb 2000 12:45:04 -0500
Subject: [XML-SIG] XML-SIG and 4DOM
Message-ID: <389B1020.19D48865@udel.edu>

Hello,
I am beginning a project that will be using Python to manipulate a
series of HTML and XML documents.  My first thought was of course to
check out the xml-sig here, but I also see that there is another
implementation.  Everyone seems to be co-existing well enough, but I am
confused as to which I should be using, are the xml-sig tools ready for
use? Is there a comparison of the two somewhere?

Dave.

--
   *************************************
  *  Dave Wallace (dwallace@udel.edu) *
 * MIS-TRG, University of Delaware   *
*************************************


From uche.ogbuji@fourthought.com  Fri Feb  4 18:52:17 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Fri, 04 Feb 2000 11:52:17 -0700
Subject: [XML-SIG] XML-SIG and 4DOM
In-Reply-To: Your message of "Fri, 04 Feb 2000 12:45:04 EST."
 <389B1020.19D48865@udel.edu>
Message-ID: <200002041852.LAA04248@localhost.localdomain>

> I am beginning a project that will be using Python to manipulate a
> series of HTML and XML documents.  My first thought was of course to
> check out the xml-sig here, but I also see that there is another
> implementation.  Everyone seems to be co-existing well enough, but I am
> confused as to which I should be using, are the xml-sig tools ready for
> use? Is there a comparison of the two somewhere?

The XML SIG has actually adopted 4DOM, 4XSLT and 4XPath, and once we sort out 
some details such as the packaging, they will be in the xml-sig distribution.  
I would say it's pretty "safe" to just go ahead with 4DOM except for the point 
that as it is now packaged, you would use it like this:

import Ft.Dom ...

While as part of the xml-sig distro it will probably be

import xml.dom ...

At least for a while, we shall probably maintain a version using the old 
packaging, and if the needs of Fourthought and its clients diverge from the 
sig, there will probably be closely parallel versions for a while.  The key 
thing is that if your code depends on the "Ft.Dom" form, you needn't worry 
about having to hack it all into the "xml.dom" form for a while.
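
Code that has to run under either packaging could pick whichever package
is installed. This is only a sketch: import_first is an invented helper,
and it assumes nothing beyond the two package names mentioned above
(xml.dom is the one that exists in current Python's standard library).

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of %s could be imported" % ", ".join(names))


# Prefer the original packaging, fall back to the xml-sig one.
dom = import_first("Ft.Dom", "xml.dom")
```

The rest of the program then refers to `dom` and does not care which name
was actually imported.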

Hopefully this didn't just confuse you further.

-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From hansv@net4all.be  Mon Feb  7 10:45:56 2000
From: hansv@net4all.be (hansv@net4all.be)
Date: Mon, 7 Feb 2000 11:45:56 +0100
Subject: [XML-SIG] Problem using 4DOM for xml parsing
Message-ID: <9E725728E7D0D311B24D00508B6A05430A34BE@PDC>

Hi,

I can't seem to get 4DOM for xml parsing working for me, when I try the demo
"python dom_from_xml_file.py addr_book1.xml" (I ran it with a script passing
"read_xml_from_file('Ft/Dom/demo/addr_book1.xml')" from idle). I get the
following errors.

Traceback (innermost last):
  File "E:\Python\Tools\idle\ScriptBinding.py", line 131, in run_module_event
    execfile(filename, mod.__dict__)
  File "E:\Python\Ft\Dom\demo\dom_from_xml_file.py", line 22, in ?
    read_xml_from_file('Ft/Dom/demo/addr_book1.xml')
  File "E:\Python\Ft\Dom\demo\dom_from_xml_file.py", line 7, in read_xml_from_file
    xml_dom_object = Sax.FromXmlFile(fileName, validate=0)
  File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 155, in FromXmlFile
    rv = FromXmlStream(fp,ownerDocument,validate,keepAllWs,catName,saxHandlerClass)
  File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 135, in FromXmlStream
    parser.parseFile(stream)
  File "E:\Python\xml\sax\drivers\pylibs.py", line 32, in parseFile
    self.feed(buf)
  File "E:\Python\xml\sax\drivers\drv_xmllib.py", line 68, in feed
    xmllib.XMLParser.feed(self,data)
  File "E:\Python\Lib\xmllib.py", line 149, in feed
    self.goahead(0)
  File "E:\Python\Lib\xmllib.py", line 240, in goahead
    k = self.parse_starttag(i)
  File "E:\Python\Lib\xmllib.py", line 609, in parse_starttag
    self.finish_starttag(nstag, attrdict, method)
  File "E:\Python\Lib\xmllib.py", line 646, in finish_starttag
    self.unknown_starttag(tagname, attrdict)
  File "E:\Python\xml\sax\drivers\drv_xmllib.py", line 24, in unknown_starttag
    self.doc_handler.startElement(tag,saxutils.AttributeMap(attributes))
  File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 71, in startElement
    self.__completeTextNode()
  File "E:\Python\Ft\Dom\Ext\Reader\Sax.py", line 51, in __completeTextNode
    self.__nodeStack[-1].appendChild(new_text)
  File "E:\Python\Ft\Dom\Document.py", line 223, in appendChild
    return Node.appendChild(self,newChild)
  File "E:\Python\Ft\Dom\Node.py", line 225, in appendChild
    self._4dom_validateNode(newChild)
  File "E:\Python\Ft\Dom\Node.py", line 298, in _4dom_validateNode
    raise DOMException(HIERARCHY_REQUEST_ERR)

I get similar errors with Python on Mac. I'm a newbie to Python and probably
forgot to install something.
Could you please send me a list of things I should have installed to get
this working.

Any help is appreciated,

Hans verschooten


From mmc@r-l.de  Mon Feb  7 19:55:23 2000
From: mmc@r-l.de (Morten M. Christensen)
Date: Mon, 07 Feb 2000 11:55:23 -0800
Subject: [XML-SIG] Installing the XML Toolkit on Windows ?
Message-ID: <389F232B.952509C1@r-l.de>

Hi,

Is there a precompiled version of the XML Toolkit that one can use for
Windows?

Thanks in advance!

Cheers,
Morten Christensen


From paul@prescod.net  Mon Feb  7 20:50:11 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 14:50:11 -0600
Subject: [XML-SIG] PyExpat update
Message-ID: <389F3003.FE2DCA77@prescod.net>

I did some work on pyexpat over the weekend. Modulo bugs I have
introduced, I think that my changes so far have all been backwards
compatible. I list my new features at the bottom of this message.

Before I release, I want some xml-sig opinions on things I would like to
change that are NOT backwards compatible.

1. Attributes would be returned as a mapping {key:value, key:value} and
not a list [key,value,key,value]. Obviously this will break code that
expected the list.
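
For comparison, here is how the two attribute shapes look with the expat
binding in current Python, where a dictionary is the default and the flat
list is available through the ordered_attributes flag. The helper names
are invented; pairs_to_mapping is the kind of shim that code written for
the list form would need.

```python
import xml.parsers.expat as expat


def pairs_to_mapping(flat):
    """Turn [key, value, key, value, ...] into {key: value, ...}."""
    return dict(zip(flat[0::2], flat[1::2]))


def first_start_tag_attrs(document, ordered):
    """Return the attributes expat reports for the first start tag."""
    seen = []
    parser = expat.ParserCreate()
    parser.ordered_attributes = ordered  # True: flat list, False: dict
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(document, True)
    return seen[0]


flat = first_start_tag_attrs('<a x="1" y="2"/>', ordered=True)
mapping = first_start_tag_attrs('<a x="1" y="2"/>', ordered=False)
```

Here pairs_to_mapping(flat) gives the same dictionary as mapping.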

2. Errors will be returned as strings, not integers. You can check for
string equality using "==". The intention is not that you would hard-code
strings into your code, but would rather use pre-defined string
constants: 

foo = parser.Parse( data )
if foo is pyexpat.unclosed_token:
        print "Oops:"+pyexpat.unclosed_token

(IIRC, Python is smart about checking for pointer equality before string
equality, right?)
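
On the "is" versus "==" question, a small sketch in current Python may
help. The constant below is an invented stand-in for a module-level error
string. CPython's string comparison does check for the same object first,
so "==" costs little when the constant itself is passed around, and it
still gives the right answer for an equal string that came from somewhere
else.

```python
UNCLOSED_TOKEN = "unclosed token"  # stand-in for a module-level constant


def is_unclosed(result):
    """Compare by value; true for any string with the same characters."""
    return result == UNCLOSED_TOKEN


# A string with the same value that did not come from the constant, for
# example one rebuilt from pieces or read back from a log file.
rebuilt = " ".join(["unclosed", "token"])

same_value = is_unclosed(rebuilt)          # True: the characters match
same_object = rebuilt is UNCLOSED_TOKEN    # identity is not guaranteed here
```

An "is" test only works while every caller passes the very same object
around, which is why "==" is the safer spelling for string constants.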

3. There will be no list of exceptions in the module's interface. Here's
what it looks like now:

>>> import pyexpat
>>> for name in dir( pyexpat ):
...     if name[0:3]=="XML":
...         print name, getattr( pyexpat, name )
...
XML_ERROR_ASYNC_ENTITY 13
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF 16
XML_ERROR_BAD_CHAR_REF 14
XML_ERROR_BINARY_ENTITY_REF 15
XML_ERROR_DUPLICATE_ATTRIBUTE 8
XML_ERROR_INCORRECT_ENCODING 19
XML_ERROR_INVALID_TOKEN 4
XML_ERROR_JUNK_AFTER_DOC_ELEMENT 9
XML_ERROR_MISPLACED_XML_PI 17
XML_ERROR_NONE 0
XML_ERROR_NO_ELEMENTS 3
XML_ERROR_NO_MEMORY 1
XML_ERROR_PARAM_ENTITY_REF 10
XML_ERROR_PARTIAL_CHAR 6
XML_ERROR_RECURSIVE_ENTITY_REF 12
XML_ERROR_SYNTAX 2
XML_ERROR_TAG_MISMATCH 7
XML_ERROR_UNCLOSED_TOKEN 5
XML_ERROR_UNDEFINED_ENTITY 11
XML_ERROR_UNKNOWN_ENCODING 18

I would rather move all of these to an "errors" dictionary so they don't
clutter up the main module namespace (after converting them to strings
instead of integers).

-----------------

Here are the new features I have already added.

 * more handlers:

StartElement,
EndElement,
ProcessingInstruction,
CharacterData,
UnparsedEntityDecl,
NotationDecl,
StartNamespaceDecl,
EndNamespaceDecl,
Comment,
StartCdataSection,
EndCdataSection,
Default,

 * new error handling:

setjmp/longjmp is gone
exceptions are propagated properly even on Windows
I believe the new code is thread-safe.

 * ParseFile:

now possible to parse an open file or file-like object.

 * bug fixes:

setattr throws a proper exception when you do a bad assignment
setjmp/longjmp works on Windows

 * new bugs:

???
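
A short sketch of ParseFile together with a few of the handlers listed
above, written against the expat binding in current Python (where handlers
are set as attributes named <Name>Handler and ParseFile expects a binary
file-like object). The function name and the sample input are invented.

```python
import io
import xml.parsers.expat as expat


def trace_events(stream):
    """Parse a binary file-like object and record the handler calls made."""
    events = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name)))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CommentHandler = lambda data: events.append(("comment", data))
    parser.ProcessingInstructionHandler = (
        lambda target, data: events.append(("pi", target)))
    parser.ParseFile(stream)  # reads from the object until end of input
    return events


events = trace_events(io.BytesIO(b"<a><!--hi--><?go now?><b/></a>"))
```

Any object with a read() method that returns bytes will do, so an open
file, a socket wrapper or an in-memory buffer can all be parsed the same
way.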

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From paul@prescod.net  Mon Feb  7 20:54:58 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 14:54:58 -0600
Subject: [XML-SIG] Pyexpat error handling
Message-ID: <389F3122.D1E9BF57@prescod.net>

I'd like to improve the error handling in one case but am not sure how.

>>> from pyexpat import ParserCreate
>>> p=ParserCreate()
>>> p.StartElementHandler=lambda x:x
>>> p.ParseFile( open( "../hamlet.xml" ) )
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: too many arguments; expected 1, got 2

You see how it looks like it was the ParseFile that had too many
arguments but really it was the call to the callback. I'm not sure of
the best way to make this more clear. Perhaps add a bogus traceback
entry???
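
One alternative to patching the traceback is to reject a badly shaped
callback when it is installed, before any parsing starts. The sketch below
assumes current Python's inspect module and expat binding;
set_checked_handler and the HANDLER_ARITY table are invented for
illustration and cover only three handlers.

```python
import inspect
import xml.parsers.expat as expat

# Number of arguments expat passes to a few handlers.
HANDLER_ARITY = {
    "StartElementHandler": 2,   # name, attributes
    "EndElementHandler": 1,     # name
    "CharacterDataHandler": 1,  # data
}


def set_checked_handler(parser, attribute, func):
    """Install func, failing early if it cannot accept expat's arguments."""
    expected = HANDLER_ARITY[attribute]
    try:
        # bind() raises TypeError when the argument count does not fit.
        inspect.signature(func).bind(*range(expected))
    except TypeError as exc:
        raise TypeError("%s must accept %d argument(s): %s"
                        % (attribute, expected, exc)) from None
    setattr(parser, attribute, func)


p = expat.ParserCreate()
set_checked_handler(p, "StartElementHandler", lambda name, attrs: None)
```

Calling set_checked_handler(p, "StartElementHandler", lambda x: x) raises a
TypeError that names the handler, so the mistake is reported at the line
that made it rather than at the ParseFile call.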

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From paul@prescod.net  Mon Feb  7 21:21:47 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 15:21:47 -0600
Subject: [XML-SIG] Re: PyExpat update
References: <389F1CD5.102E6757@prescod.net> <14495.8295.588575.301610@weyr.cnri.reston.va.us>
Message-ID: <389F376B.2F77F810@prescod.net>

I'll take this to xml-sig where I meant to post in the first place.

"Fred L. Drake, Jr." wrote:
> 
> Paul Prescod writes:
>  > 1. Attributes would be returned as a mapping {key:value, key:value} and
>  > not a list [key,value,key,value]. Obviously this will break code that
>  > expected the list.
> 
>   This is good.
> 
>  > 2. Errors will be returned as strings, not integers. You can check for
>  > string equality using "==". The intention is not that you would hard-code
>  > strings into your code, but would rather use pre-defined string
>  > constants:
> 
>   Please explain *why* you need this change; could the constants not
> still be numbers?  (I'm not saying they *should* be numbers, just
> trying to understand the rationale for the change.)

Well, why use numbers? The numbers are meaningless. Strings are at least
meaningful for some percentage of the world.

>  > foo = parser.Parse( data )
>  > if foo is pyexpat.unclosed_token:
>  >      print "Oops:"+pyexpat.unclosed_token
> 
>   Are the strings the error messages or some sort of identifier?  If
> they're IDs, this code fragment doesn't make sense.  If they're
> messages, you tie the C component to a specific (human) language.

They are both messages and identifiers. As you can see above they can be
used as "dumb" identifiers (just like the integers) and they can be used
as strings if you happen to want to output English error messages (which
will be the case in the vast majority of situations just because most
programmers are too lazy/busy to localize).

>   My inclination is to stick with IDs (numeric or string) and map that
> to natural language in the application.

If you want to map in your application, you can do that. If you want to
print out the string, you can do that too. Think of them as IDs that
have a __str__ that happens to be English readable. Oh, and they happen
to be implemented as Python strings. :)

>  > 3. There will be no list of exceptions in the module's interface. Here's
>  > what it looks like now:
> ...
>  > I would rather move all of these to an "errors" dictionary so they don't
>  > clutter up the main module namespace (after converting them to strings
>  > instead of integers).
> 
>   So what's the dictionary look like?  I imagine something like:
> 
> errors = {
>     "XML_ERROR_SYNTAX": "Syntax error!",
>     ...
> }
> 
> or are the integers still there?

No integers.

On second thought, instead of a dictionary I'll use an instance so that
you can say 

if rv == errors.XML_ERROR_SYNTAX:
  ...
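
The "instance instead of a dictionary" idea can be sketched in current
Python with types.SimpleNamespace. The message table is hypothetical and
its wording is illustrative only; describe() is an invented caller.

```python
from types import SimpleNamespace

# Hypothetical message table; the wording is illustrative only.
_MESSAGES = {
    "XML_ERROR_SYNTAX": "syntax error",
    "XML_ERROR_UNCLOSED_TOKEN": "unclosed token",
}

# One attribute per error, so callers write errors.XML_ERROR_SYNTAX.
errors = SimpleNamespace(**_MESSAGES)


def describe(rv):
    """Report a result value using attribute-style constants."""
    if rv == errors.XML_ERROR_SYNTAX:
        return "Oops: " + rv
    return "other: " + rv
```

A misspelled attribute such as errors.XML_ERROR_SYNTAXX fails at once with
AttributeError, and the module namespace itself stays uncluttered.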

>  > setattr throws a proper exception when you do a bad assignment
>  > setjmp/longjmp works on Windows
> 
>   So is setjmp/longjmp still used, or not?

No. I meant to say that handler error reporting now works on Windows.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From paul@prescod.net  Mon Feb  7 21:30:10 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 15:30:10 -0600
Subject: [XML-SIG] Re: PyExpat update
References: <389F1CD5.102E6757@prescod.net> <3daelcg4s9.fsf@amarok.cnri.reston.va.us>
Message-ID: <389F3962.43AF77A3@prescod.net>

"Andrew M. Kuchling" wrote:
> 
> I'd really much rather write:
> 
> if foo is pyexpat.UNCLOSED_TOKEN:
>      print 'Oops:', pyexpat.errors[ foo ]
> 
> That makes it clear that UNCLOSED_TOKEN is a constant.  (Losing the
> 'XML_' prefix from all the errors is definitely a good idea; losing
> the 'ERROR_' prefix might be not.  The above might be clearer if it
> was pyexpat.ERROR_UNCLOSED_TOKEN or UNCLOSED_TOKEN_ERROR.)

Upper case is one issue. Naming is a second. A third is whether the
referent is an integer or a string. In your example above you make no
use of the fact that it is an integer and it could just as easily be a
string. The only thing making an integer does is force an extra list
lookup in the common case of wanting to report the string error.

if foo is pyexpat.UNCLOSED_TOKEN:
     print 'Oops:', foo

> I have no problem with cluttering the module's namespace with error
> constants, if that's the only reason for the change.  How would you
> code error checks with an 'errors' dictionary?

Well I've been thinking it should be an instance instead of a dictionary

if foo is pyexpat.errors.UNCLOSED_TOKEN:
    print 'Ooops:', foo

Note that there are NO occurrences of dependence on the English
"spelling" of these messages in the code. 

If you want to localize for Spanish then you just do:

spanish_errors={pyexpat.errors.UNCLOSED_TOKEN: "Something in Spanish", 
		...}

if foo is pyexpat.errors.UNCLOSED_TOKEN:
    print 'Ooops:', spanish_errors[ foo ]
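
The same localization pattern can be tried against the expat binding in
current Python, whose xml.parsers.expat.errors module holds the English
messages as string constants. This is a sketch rather than the interface
proposed in this thread: parse_error_message is an invented helper and the
Spanish wording is illustrative.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors

# Translations keyed by the English constant; the Spanish is illustrative.
SPANISH = {
    errors.XML_ERROR_TAG_MISMATCH: "etiqueta no coincidente",
    errors.XML_ERROR_UNCLOSED_TOKEN: "token sin cerrar",
}


def parse_error_message(document, table):
    """Return a localized message for a parse failure, or None on success."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        english = expat.ErrorString(exc.code)  # numeric code -> message
        return table.get(english, english)     # fall back to English
    return None
```

For example, parse_error_message("<a></b>", SPANISH) looks up the
mismatched-tag message, while an error missing from the table is reported
in English.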

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From paul@prescod.net  Mon Feb  7 23:12:22 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 17:12:22 -0600
Subject: [XML-SIG] Re: PyExpat update
References: <389F1CD5.102E6757@prescod.net> <mark.949963518@declan>
Message-ID: <389F5156.F69D9CC9@prescod.net>

Mark C Favas wrote:
> 
> Is there any chance that pyexpat could handle DTDs and thus default values for
> attributes (I believe there was a test version of expat that added this
> capability...) 

Yes, it makes sense to use the version of expat with external subset
support.

> I'd like to use the SAX interface to that to speed parsing up -
> I currently use the validating xmlproc part of the PyXML-0.5.3 package.

Pyexpat should meet your needs soon.

> >setjmp/longjmp is gone
> >setjmp/longjmp works on Windows
> 
> Umm - did setjmp/longjmp come back?

No, just a think-o. Error reporting from handlers now works on windows.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From uche.ogbuji@fourthought.com  Tue Feb  8 00:10:27 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 07 Feb 2000 17:10:27 -0700
Subject: [XML-SIG] Pyexpat error handling
In-Reply-To: Your message of "Mon, 07 Feb 2000 14:54:58 CST."
 <389F3122.D1E9BF57@prescod.net>
Message-ID: <200002080010.RAA09581@localhost.localdomain>

> >>> from pyexpat import ParserCreate
> >>> p=ParserCreate()
> >>> p.StartElementHandler=lambda x:x
> >>> p.ParseFile( open( "../hamlet.xml" ) )
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
> TypeError: too many arguments; expected 1, got 2
> 
> You see how it looks like it was the ParseFile that had too many
> arguments but really it was the call to the callback. I'm not sure of
> the best way to make this more clear. Perhaps add a bogus traceback
> entry???

If I'm following correctly, we often run into this problem with C/Python 
call-backs.  We usually pass back an error-code that we can use to generate a 
custom exception.  I suppose adding a traceback entry would be another 
approach.  I'm curious as to how well that would work.

-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From paul@prescod.net  Tue Feb  8 01:15:11 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 19:15:11 -0600
Subject: [XML-SIG] Pyexpat error handling
References: <200002080010.RAA09581@localhost.localdomain>
Message-ID: <389F6E1F.F1539944@prescod.net>

uche.ogbuji@fourthought.com wrote:
> 
> If I'm following correctly, we often run into this problem with C/Python
> call-backs.  We usually pass back an error-code that we can use to generate a
> custom exception.  

Okay, but can you distinguish the TypeError generated from a bad arglist
from a regular TypeError in the code? If it actually got into the code
then you have a decent traceback and I'd rather not blow it away.

> I suppose adding a traceback entry would be another
> approach.  I'm curious as to how well that would work.

Me too.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From paul@prescod.net  Tue Feb  8 01:24:21 2000
From: paul@prescod.net (Paul Prescod)
Date: Mon, 07 Feb 2000 19:24:21 -0600
Subject: [XML-SIG] Installer
Message-ID: <389F7045.CAE6C5D9@prescod.net>

People are really nervous about installing the xml package on Windows.
Why don't we ask Christian Tismer to keep his Windows installer up to
date for us and then link to it from the Python XML web page?

ftp://ftp.pns.cc/pub/xml/PythonXML.EXE

While I am at it, how hard would it be to add Python as a "special
topic" along with JPython, Tkinter and (???) Emacs support on the main
python.org page?

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/


From larsga@garshol.priv.no  Tue Feb  8 08:19:18 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 08 Feb 2000 09:19:18 +0100
Subject: [XML-SIG] Installing the XML Toolkit on Windows ?
In-Reply-To: <389F232B.952509C1@r-l.de>
References: <389F232B.952509C1@r-l.de>
Message-ID: <m3hffkw1h5.fsf@lambda.garshol.priv.no>

* Morten M. Christensen
| 
| Is there a recompiled version of the XML Toolkit that one can use
| for Windows?

Depends on what you mean by the XML Toolkit, but in the standard
XML-SIG package there are precompiled versions of the C tools for
Windows. 

--Lars M.


From fdrake@acm.org  Tue Feb  8 14:38:45 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 Feb 2000 09:38:45 -0500 (EST)
Subject: [XML-SIG] Re: PyExpat update
In-Reply-To: <389F376B.2F77F810@prescod.net>
References: <389F1CD5.102E6757@prescod.net>
 <14495.8295.588575.301610@weyr.cnri.reston.va.us>
 <389F376B.2F77F810@prescod.net>
Message-ID: <14496.10869.954432.391776@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > Well, why use numbers? The numbers are meaningless. Strings are at least
 > meaningful for some percentage of the world.

Paul,
  If they are identifiers, they are meaningless regardless.  They can
only be used as messages if they are natural language, which doesn't
appeal to me.
  As long as they're identifiers, I think it's fine for them to be
strings; I really am not *advocating* the use of numbers.  I do think
that API changes to a known-working module need to be justified in
some way.

 > They are both messages and identifiers. As you can see above they can be
 > used as "dumb" identifiers (just like the integers) and they can be used
 > as strings if you happen to want to output English error messages (which
 > will be the case in the vast majority of situations just because most
 > programmers are too lazy/busy to localize).

  What I'm disturbed by is the conflation of use.  I'd rather see some 
identifier be used and let the user take care of *all* messages
provided to the user.  A "default" set of English messages can (and
should) be provided, but it's better to ask the client code to perform 
some transformation (dictionary lookup, whatever the guise); this
allows better flexibility both for application writers and for future
maintainers of the pyexpat module.

 > On second thought, instead of a dictionary I'll use an instance so that
 > you can say 
 > 
 > if rv == errors.XML_ERROR_SYNTAX:
 >   ...

  That's a bit nicer.  I'm not sure that the namespace needs to be
separated from the module namespace, but I don't object, either.
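
A sketch of the separation argued for above: the parser reports an
identifier and client code looks up the message. It assumes the
`xml.parsers.expat.errors` submodule as found in later Python releases
(its `messages` and `codes` tables are not something this thread had
settled on), and `classify` is a hypothetical helper:

```python
from xml.parsers import expat
from xml.parsers.expat import errors


def classify(data):
    """Return (code, message) for a parse failure, or None on success."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError as exc:
        # exc.code is the numeric identifier; the text is looked up
        # separately, so an application could swap in its own table here.
        return exc.code, errors.messages[exc.code]
    return None
```

An application that wants localized output would replace the
`errors.messages` lookup with its own dictionary keyed on the same codes.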


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From guglielmetti@dynabits.com  Thu Feb 10 09:32:34 2000
From: guglielmetti@dynabits.com (guglielmetti@dynabits.com)
Date: Thu, 10 Feb 2000 10:32:34 +0100
Subject: [XML-SIG] XBEL tool and remard about its DTD
Message-ID: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>

I wrote an XBEL export template for the excellent Compass bookmarks manager
(on MS Windows...)(http://www.softgauge.com/compass/) You can download this
file (XBEL.TPL) from my page http://membres.tripod.fr/Guglielmetti/files/.

By the way, the problem I have is that the XBEL DTD at
http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
"european" accented characters such as éàîö... I think it should, since
billions of people on this Earth use a language other than
English...

Philippe Guglielmetti    http://i.am/goulu/   "C'est de la folie,  mais
Courtines 16                 goulu@i.am        avec de la méthode"
1242 Satigny (GE)         +41 22 753 4138
Suisse                      ICQ 30265921      (Hamlet, Acte 2 Scene 2)




From fdrake@acm.org  Thu Feb 10 15:04:31 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 Feb 2000 10:04:31 -0500 (EST)
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <14498.54143.475911.873929@weyr.cnri.reston.va.us>


goulu@i.am writes:
 > I wrote an XBEL export template for the excellent Compass bookmarks manager
 > (on MS Windows...)(http://www.softgauge.com/compass/) You can download this
 > file (XBEL.TPL) from my page http://membres.tripod.fr/Guglielmetti/files/.

Philippe,
  Cool!  Do you mind if I provide a link to this from the XBEL pages
on python.org?

 > By the way, the problem I have is that XBEL DTD at
 > http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
 > "european" accented characters such as éàîö... I think it should as long as
 > some billions people on this Earth will use a different language than
 > English...

  This I don't understand; what's missing that needs to be added to
the DTD?  This is XML, so the character set is Unicode.
  Now, if it's the *tools* that don't support a wide range of
encodings, that I do understand.  I think this will be fixed when
Python provides direct support for Unicode in the core.  I'll fix them
myself if I have to!


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From fdrake@acm.org  Thu Feb 10 16:17:00 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 Feb 2000 11:17:00 -0500 (EST)
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
References: <14498.54143.475911.873929@weyr.cnri.reston.va.us>
 <000f01bf73de$ab0ad110$bfc0e6c2@HYDRE>
 <14498.57205.913845.298240@weyr.cnri.reston.va.us>
 <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <14498.58492.684421.784894@weyr.cnri.reston.va.us>

goulu@i.am writes:
 > I wrote an XBEL export template for the excellent Compass bookmarks manager
 > (on MS Windows...)(http://www.softgauge.com/compass/) You can download this
 > file (XBEL.TPL) from my page http://membres.tripod.fr/Guglielmetti/files/.

Fred L. Drake, Jr. writes:
 >   Do you have a URL for Compass?

  Sheesh, I can't read today.  Nevermind....  ;)


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From wunder@infoseek.com  Thu Feb 10 17:32:02 2000
From: wunder@infoseek.com (Walter Underwood)
Date: Thu, 10 Feb 2000 09:32:02 -0800
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <14498.54143.475911.873929@weyr.cnri.reston.va.us>
References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
 <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <4.3.0.40.1.20000210093037.00d65b60@corp.infoseek.com>

At 10:04 AM 2/10/00 -0500, Fred L. Drake, Jr. wrote:

>  > By the way, the problem I have is that XBEL DTD at
>  > http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
>  > "european" accented characters such as éàîö... I think it should as long as
>  > some billions people on this Earth will use a different language than
>  > English...
>
>   This I don't understand; what's missing that needs to be added to
>the DTD?  This is XML, so the character set is Unicode.

Some DTDs (like XHTML) provide entities for those characters,
like &ouml;. The characters are supported, but you may need
to enter them as numeric references.

wunder
--
Walter R. Underwood
Senior Staff Engineer
Infoseek Software
GO Network, part of The Walt Disney Company
wunder@infoseek.com
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946


From paul@prescod.net  Thu Feb 10 19:25:55 2000
From: paul@prescod.net (Paul Prescod)
Date: Thu, 10 Feb 2000 11:25:55 -0800
Subject: [XML-SIG] XBEL tool and remard about its DTD
References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <38A310C3.A346049@prescod.net>

> goulu@i.am wrote:
> 

> By the way, the problem I have is that XBEL DTD at
> http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
> "european" accented characters such as éàîö... I think it should as
> long as some billions people on this Earth will use a different
> language than English...

A DTD cannot prohibit you from using a non-English character. You can
either do it directly with a Unicode text editor or you can use
&#some_unicode_number; syntax.
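
As a small sketch of the second option, Python's `xmlcharrefreplace`
error handler can produce the numeric references mechanically. The
function name is hypothetical:

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a decimal numeric reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The result is pure ASCII, so it can be read by a parser regardless of
which encoding the document declares.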

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world." 
        - from "Advent of the Algorithm" David Berlinski
	http://www.opengroup.com/mabooks/015/0151003386.shtml


From Finlay.Thompson@MCS.VUW.AC.NZ  Fri Feb 11 01:21:18 2000
From: Finlay.Thompson@MCS.VUW.AC.NZ (Finlay Thompson)
Date: Fri, 11 Feb 2000 14:21:18 +1300
Subject: [XML-SIG] Swig, Xerces and python,
Message-ID: <00021114295502.02347@delta.mcs.vuw.ac.nz>

Hi there,

I'm just at the stage of choosing tools and I would greatly appreciate advice:

The task is to upgrade an existing and busy internet news site. The problem is
that the existing formatting is all in Perl and very rigid. The idea is to
create an XPath interface onto the existing database, leave all the publishing
software intact, and provide an XML front end for the graphic people to work
with. 

The existing system is running on a FreeBSD server with Apache and lots of Perl.
My experience, and that of others in our group, is with Python, so we want to
use Python tools.

After looking at the xml.apache.org site I had the idea of running the Xerces
C++ XML parser, which already supports DOM and SAX and more, through SWIG to
produce a Python interface. 

What do people think? Does anyone know what is
wrong with Xerces (apart from not having a Python interface)?

Finlay.


From gstein@lyra.org  Fri Feb 11 03:57:47 2000
From: gstein@lyra.org (Greg Stein)
Date: Thu, 10 Feb 2000 19:57:47 -0800 (PST)
Subject: [XML-SIG] Swig, Xerces and python,
In-Reply-To: <00021114295502.02347@delta.mcs.vuw.ac.nz>
Message-ID: <Pine.LNX.4.10.10002101954290.4541-100000@nebula.lyra.org>

If Xerces is not a requirement (it appears that you have only recently
decided to use it), then I might recommend Expat and the PyExpat module.
You'll have your XML parsing and Python interface as quick as you can
install them :-)

Using the XML-SIG release, you'll also have a DOM to work with. SAX
operates inside of there, constructing the DOM -- you won't really need to
worry about it (unless you want to skip the DOM).
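
To make the "SAX constructs the tree" remark concrete, here is a rough
sketch using the `xml.sax` package as found in later Python standard
libraries (an assumption relative to the XML-SIG release discussed here).
The `TreeBuilder` class and the dictionary node layout are hypothetical and
far simpler than a real DOM builder:

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Collect SAX events into nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()), "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # A parser may deliver text in several chunks; this sketch simply
        # appends each non-blank chunk as its own child.
        if content.strip():
            self._stack[-1]["children"].append(content)


def build_tree(data):
    """Parse `data` (bytes) and return the root node dictionary."""
    handler = TreeBuilder()
    xml.sax.parseString(data, handler)
    return handler.root
```

Code that only needs the finished tree never touches the handler, which is
the sense in which SAX can be ignored unless the DOM is being skipped.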

Fourthought has got a DOM and, IIRC, an XPath implementation. I think it
is all in Python, but there may be some remaining C cruft. You'll have to
follow up on that.

Anyhow... in a nutshell: I think there are ample alternatives without
going and fooling around with a C++ XML Parser, SWIG, and developing your
own Python/C Extension.

Cheers,
-g

On Fri, 11 Feb 2000, Finlay Thompson wrote:
> Hi there,
> 
> I'm just at the stage of choosing tools and I would greatly appreciate advice:
> 
> The task is to upgrade an existing and busy internet news site. The problem is
> that the existing formatting is all in Perl and very rigid. The idea is to
> create an XPath interface onto the existing database, leave all the publishing
> software intact, and provide an XML front end for the graphic people to work
> with. 
> 
> The existing system is running on a FreeBSD server with Apache and lots of Perl.
> My experience, and that of others in our group, is with Python, so we want to
> use Python tools.
> 
> After looking at the xml.apache.org site I had the idea of running the Xerces
> C++ XML parser, which already supports DOM and SAX and more, through SWIG to
> produce a Python interface. 
> 
> What do people think? Does anyone know what is
> wrong with Xerces (apart from not having a Python interface)?
> 
> Finlay.

-- 
Greg Stein, http://www.lyra.org/


From larsga@garshol.priv.no  Mon Feb 14 08:08:38 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 14 Feb 2000 09:08:38 +0100
Subject: [XML-SIG] XBEL tool and remard about its DTD
In-Reply-To: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
References: <000601bf73ab$36871ed0$4ae7e6c2@HYDRE>
Message-ID: <m3d7q0ury1.fsf@lambda.garshol.priv.no>

* goulu@i.am
| 
| By the way, the problem I have is that XBEL DTD at
| http://www.python.org/topics/xml/dtds/xbel-1.0.dtd does not support
| "european" accented characters such as éàîö... I think it should as
| long as some billions people on this Earth will use a different
| language than English...

I guess the problem you've run into is that you've produced something
like this:

<doc>
Here is an accented char: é.
</doc>


That this causes a problem has nothing to do with the XBEL DTD, but
rather with the fact that conforming XML parsers must assume that this
document is UTF-8-encoded, and your 'é.' is not a legal UTF-8 byte
sequence, hence the problems.

So if you do like this instead, everything should be fine (provided
I've guessed correctly what your problem is):

<?xml version="1.0" encoding="iso-8859-1"?>
<doc>
Here is an accented char: é.
</doc>
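
The difference between the two documents above can be checked with a
short sketch. It assumes the `xml.etree.ElementTree` module from later
Python standard libraries, and `try_parse`, `BODY` and `DECL` are
hypothetical names:

```python
import xml.etree.ElementTree as ET

# The same document body, stored as ISO-8859-1 bytes (e-acute is byte 0xE9).
BODY = "<doc>Here is an accented char: \u00e9.</doc>".encode("iso-8859-1")
DECL = b'<?xml version="1.0" encoding="iso-8859-1"?>\n'


def try_parse(data):
    """Return the root element's text, or None if the bytes are rejected."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None
```

Without the declaration the parser treats the bytes as UTF-8 and rejects
the lone 0xE9; with it, the same bytes decode to the intended character.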


--Lars M.


From gstein@lyra.org  Tue Feb 15 03:01:17 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 14 Feb 2000 19:01:17 -0800 (PST)
Subject: [XML-SIG] Re: qp_xml check-in
In-Reply-To: <200001241745.MAA13251@amarok.cnri.reston.va.us>
Message-ID: <Pine.LNX.4.10.10002141857550.7924-100000@nebula.lyra.org>

On Mon, 24 Jan 2000, Andrew M. Kuchling wrote:
> You don't seem to have checked in qp_xml.py into the XML-SIG's CVS
> tree.   Going to?  (And have you decided between xml.parsers and
> xml.utils ?)

I've checked this into xml.utils, along with an update to CREDITS and
LICENCE. I've got doc due to Fred, so qp_xml doc will be deferred for a
bit; I left a marker in TODO.

Since it isn't truly a parser, it made a bit more sense under utils.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From FightHunger@4mycommunity.com  Fri Feb 18 10:50:47 2000
From: FightHunger@4mycommunity.com (Fight Hunger)
Date: Fri, 18 Feb 2000 02:50:47 -0800
Subject: [XML-SIG] Every Click Counts
Message-ID: <C7FE7AC45981D311B4830090278D0AFB0BCC25@pdc4myc.4mycommunity.com>

Every 3.6 seconds someone in the world dies of hunger -- 75% of these deaths
are children under 5.

Make A FREE DONATION to fight hunger by visiting
http://www.4mycommunity.com/online/ent/wfp.asp?tag=sc217ei106716 and
clicking on one of our "Every Click Counts" links. 

Each click buys a hungry person 1.5 cups of a staple food.

* It costs you nothing 
* We don't ask you for any personal information
* All donations go to The United Nations World Food Programme
(http://www.wfp.org)


Thank you,

FightHunger@4MyCommunity.com 

P.S. If you think this is a good idea, please pass this message to a friend.

P.P.S. You can also support 300,000 schools and churches via "Every Click
Counts" or online shopping. See http://www.4MyCommunity.com for details.

To be removed from this mailing list, please reply with the word
"Unsubscribe" in the subject.




From paul@prescod.net  Fri Feb 18 15:34:03 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 07:34:03 -0800
Subject: [XML-SIG] DOM and Proxies
Message-ID: <38AD666B.E5C3E20C@prescod.net>

I propose that for Python 1.6 we define a generic Proxy mechanism with
the following properties:

You wrap an object by calling Proxy( object ). For an object to be
proxy-wrappable it must have an __unlink__ method. The object that you
pass to the original Proxy() call is called the LemmingLeader.

Proxies proxy all method calls, field accesses and tp_...methods. 

When a field is accessed or a method called, the proxy looks at the returned
object. If it is proxy-wrappable (e.g. a DOM or grove node), it is
wrapped. If it isn't (e.g. an integer), it isn't. 

Proxied objects have "families". All objects in a family live for the
same length of time. Families are expected to be completely internally
linked. There is one proxy "family" for every LemmingLeader (created
through an explicit call to the proxy method) (e.g. one per DOM).

There is a hidden "proxy family object" -- it is used only for its
refcount and its reference to the patriarch (the LemmingLeader). When a
proxy generates a proxy, it passes a reference to the family object.

When all proxies go away (the user is no longer interested in the object
family) the family object calls the LemmingLeader's __unlink__ method
which is presumed to unlink the object and recursively unlink and thus
destroy all children.

Proxies have a __realnode__ method to get back the real, real node. If
you hold a real reference to a real node and throw away the last proxy
then you will find that everything in that node's family except the node
is gone.

All of the proxy stuff is implemented in C so that it is very efficient.
Proxied objects can be implemented in C or Python.

We would use this class for both xml.dom and a minidom in the standard
library. It would also be usable from Pyxie, groves, easysax, and
anywhere else that reference counting of cyclic objects is necessary.
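
The mechanism described above can be sketched in pure Python. This is only
an illustration under stated assumptions: the proposal calls for a C
implementation, the `_Family`, `Node` and `demo` names are hypothetical,
the proxy is read-only, objects inside returned lists are not wrapped, and
the timing relies on CPython's reference counting:

```python
class _Family:
    """Hidden family object: exists only to be reference-counted."""

    def __init__(self, leader):
        self.leader = leader

    def __del__(self):
        # The last proxy has gone away, so let the leader break its cycles.
        self.leader.__unlink__()


class Proxy:
    """Read-only proxy; wraps any returned object that has __unlink__."""

    def __init__(self, obj, _family=None):
        self._obj = obj
        self._family = _Family(obj) if _family is None else _family

    def __realnode__(self):
        return self._obj

    def _wrap(self, value):
        if hasattr(value, "__unlink__"):
            return Proxy(value, self._family)   # same family as this proxy
        return value

    def __getattr__(self, name):
        value = getattr(self._obj, name)
        if callable(value):
            return lambda *args, **kw: self._wrap(value(*args, **kw))
        return self._wrap(value)


class Node:
    """Toy tree node with parent/child cycles and an __unlink__ method."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def first_child(self):
        return self.children[0]

    def __unlink__(self):
        for child in self.children:
            child.__unlink__()
        self.children = []
        self.parent = None


def demo():
    root = Node("root")
    Node("child", root)
    leader = Proxy(root)
    child = leader.first_child()          # wrapped, same family
    names = (child.name, child.parent.name)
    del leader                            # family survives through `child`
    still_linked = len(root.children) == 1
    del child                             # last proxy gone: __unlink__ runs
    return names, still_linked, root.children
```

In `demo`, the tree stays linked while any proxy of the family exists and
is unlinked as soon as the last one is dropped, even though the real `root`
node is still referenced directly.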

Opinions? We could actually sneak this class into a C-coded minidom
library (built directly on top of expat) for use by anyone who knows it
is there.

No, I am not volunteering to do it -- at least not for another several
weeks.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world." 
        - from "Advent of the Algorithm" David Berlinski
	http://www.opengroup.com/mabooks/015/0151003386.shtml


From paul@prescod.net  Fri Feb 18 15:53:56 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 07:53:56 -0800
Subject: [XML-SIG] Minidom proposal
Message-ID: <38AD6B14.F4F7E84E@prescod.net>

I propose the following interface for a module that would go
into Python 1.6 (excuse my IDLish shorthand)

class Node:
	[List of Node] childNodes
	Node parent

class Document(Node):
	Element documentElement

class Attribute(Node):
	string namespaceURI
	string prefix
	string localName
	string value
	Element ownerElement

class Element(Node):
	string tagname
	# check what the DOM does with namespaces
	{Dictionary of Name->Value} attributes
	getElementsByTagName(string tagname) -> [List of Node]
	getElementsByTagNameNS(string namespaceURI,
	                       string localName) -> [List of Node]
	string namespaceURI
	string prefix
	string localName

class Comment(Node):
	string data

class ProcessingInstruction(Node):
	string target
	string data

class Text(Node):
	string data

All properties could be read-write but there would be no special cut and
paste/clone methods.
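
For a feel of how such an interface reads in use, here is a sketch against
the `xml.dom.minidom` module found in later Python standard libraries. That
module is an assumption here, not the interface proposed above, and it
spells some names differently (`parentNode` and `tagName` rather than
`parent` and `tagname`). The `summarize` helper is hypothetical:

```python
from xml.dom.minidom import parseString


def summarize(xml_text):
    """Return a small summary of the document's root element."""
    doc = parseString(xml_text)
    root = doc.documentElement
    summary = {
        "tag": root.tagName,
        "attributes": dict(root.attributes.items()),
        "acts": [act.getAttribute("n")
                 for act in root.getElementsByTagName("act")],
        "parents_ok": all(child.parentNode is root
                          for child in root.childNodes),
    }
    doc.unlink()   # explicit cycle-breaking, since nodes point at parents
    return summary
```

Everything the caller needs is reached by walking attributes of the tree,
with no event buffering required.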

Opinions?

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world." 
        - from "Advent of the Algorithm" David Berlinski
	http://www.opengroup.com/mabooks/015/0151003386.shtml


From paul@prescod.net  Fri Feb 18 15:54:03 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 07:54:03 -0800
Subject: [XML-SIG] DOM and Proxies
Message-ID: <38AD6B1B.C57AD16B@prescod.net>

I propose that for Python 1.6 we define a generic Proxy mechanism with
the following properties:

You wrap an object by calling Proxy( object ). For an object to be
proxy-wrappable it must have an __unlink__ method. The object that you
pass to the original Proxy() call is called the LemmingLeader.

Proxies proxy all method calls, field accesses and tp_...methods. 

When a field is accessed or a method called, the proxy looks at the returned
object. If it is proxy-wrappable (e.g. a DOM or grove node), it is
wrapped. If it isn't (e.g. an integer), it isn't. 

Proxied objects have "families". All objects in a family live for the
same length of time. Families are expected to be completely internally
linked. There is one proxy "family" for every LemmingLeader (created
through an explicit call to the proxy method) (e.g. one per DOM).

There is a hidden "proxy family object" -- it is used only for its
refcount and its reference to the patriarch (the LemmingLeader). When a
proxy generates a proxy, it passes a reference to the family object.

When all proxies go away (the user is no longer interested in the object
family) the family object calls the LemmingLeader's __unlink__ method
which is presumed to unlink the object and recursively unlink and thus
destroy all children.

Proxies have a __realnode__ method to get back the real, real node. If
you hold a real reference to a real node and throw away the last proxy
then you will find that everything in that node's family except the node
is gone.

All of the proxy stuff is implemented in C so that it is very efficient.
Proxied objects can be implemented in C or Python.

We would use this class for both xml.dom and a minidom in the standard
library. It would also be usable from Pyxie, groves, easysax, and
anywhere else that reference counting of cyclic objects is necessary.

Opinions? I would actually sneak this class into a C-coded minidom
library (built directly on top of expat) for use by anyone who knows it
is there.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world." 
        - from "Advent of the Algorithm" David Berlinski
	http://www.opengroup.com/mabooks/015/0151003386.shtml


From akuchlin@mems-exchange.org  Fri Feb 18 18:56:46 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 18 Feb 2000 13:56:46 -0500 (EST)
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: <38AD6B1B.C57AD16B@prescod.net>
References: <38AD6B1B.C57AD16B@prescod.net>
Message-ID: <14509.38382.968129.719917@amarok.cnri.reston.va.us>

Paul Prescod writes:
>Opinions? I would actually sneak this class into a C-coded minidom
>library (built directly on top of expat) for use by anyone who knows it
>is there.

Open question: is the proxy mechanism still useful if a garbage
collection mechanism for collecting cycles gets into 1.6?  (Neil
Schemenauer is working on something, but it's too early to tell if
it'll get into 1.6; perhaps the cost will be too high.)

If cyclic trash was collected, would you still need a proxy mechanism?
Maybe you'd use it for performance reasons, to save the GC some work,
making less trash for it to scan through, but then you're losing a
tiny bit of performance from the extra indirection on every access to
the object.  My concern is simply to avoid spending time building
something that turns out to be unneeded.
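
To show what "cyclic trash" means for a parent/child tree, here is a sketch
that assumes the `gc` module as it exists in later Python releases (not
something available to this thread). The `Node` class and function name are
hypothetical:

```python
import gc
import weakref


class Node:
    """Tree node whose parent and children lists form reference cycles."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def cycle_survives_refcounting():
    """Return (alive_after_del, alive_after_collect) for a small tree."""
    gc.disable()                      # keep the automatic collector away
    try:
        root = Node()
        Node(root)                    # root <-> child reference cycle
        probe = weakref.ref(root)
        del root                      # refcount alone cannot free the pair
        alive_after_del = probe() is not None
        gc.collect()                  # the cycle collector can
        alive_after_collect = probe() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_collect
```

With plain reference counting the tree outlives its last external
reference, which is the leak the proxy proposal addresses by hand.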

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
    "What's that awful noise?"
    "I beg your pardon... "Awful noise"? A good way to talk about my singing!"
    "No, Doctor, not that awful noise -- the other one!"
    -- Barbara and the Doctor, in "The Chase"


From ken@bitsko.slc.ut.us  Fri Feb 18 20:07:42 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 18 Feb 2000 14:07:42 -0600
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: "Andrew M. Kuchling"'s message of Fri, 18 Feb 2000 13:56:46 -0500 (EST)
References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us>
Message-ID: <x57lg2p94h.fsf@bitsko.slc.ut.us>

"Andrew M. Kuchling" <akuchlin@mems-exchange.org> writes:

> Paul Prescod writes:
> >Opinions? I would actually sneak this class into a C-coded minidom
> >library (built directly on top of expat) for use by anyone who knows it
> >is there.
> 
> Open question: is the proxy mechanism still useful if a garbage
> collection mechanism for collecting cycles gets into 1.6?  (Neil
> Schemenauer is working on something, but it's too early to tell if
> it'll get into 1.6; perhaps the cost will be too high.)

Probably not needed purely for GC reasons.

> Maybe you'd use it for performance reasons, to save the GC some work,
> making less trash for it to scan through, but then you're losing a
> tiny bit of performance from the extra indirection on every access to
> the object.  My concern is simply to avoid spending time building
> something that turns out to be unneeded.

Probably not a performance boost either; a GC would still likely scan
all the objects, and using proxies would actually add more objects to
be scanned.

> If cyclic trash was collected, would you still need a proxy mechanism?

I'd like to offer up a different reason for using proxies: to remove
the concept of "ownership" from fragments of the tree so that they can
be shared by multiple processing steps.

It's not clear to me why the DOM processing model has such a strict
concept of "owning document".  To a lesser extent, a lot of data
models use parent references because the data is inherently hierarchic
but ignore the usefulness of being able to share tree fragments
between different trees.

I have found proxies to be very good at providing the illusion of
heritage while in reality allowing fragments to be shared among trees.
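
One way to picture this, as a sketch only: the stored nodes carry no parent
pointer, and a thin proxy remembers the path by which a node was reached.
The `Fragment` and `View` names are hypothetical and not taken from any
existing package:

```python
class Fragment:
    """A tree node with children but no parent pointer, so it can be shared."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class View:
    """Proxy that remembers how a fragment was reached, supplying `parent`."""

    def __init__(self, node, parent=None):
        self.node = node
        self.parent = parent

    @property
    def name(self):
        return self.node.name

    @property
    def children(self):
        # Each child view records this view as its parent.
        return [View(child, self) for child in self.node.children]
```

The same `Fragment` can then sit under two documents at once, and each
`View` of it reports the parent appropriate to the tree it was reached from.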

  -- Ken


From ken@bitsko.slc.ut.us  Fri Feb 18 20:12:24 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 18 Feb 2000 14:12:24 -0600
Subject: [XML-SIG] Minidom proposal
In-Reply-To: Paul Prescod's message of Fri, 18 Feb 2000 07:53:56 -0800
References: <38AD6B14.F4F7E84E@prescod.net>
Message-ID: <x566vmp8wn.fsf@bitsko.slc.ut.us>

Paul Prescod <paul@prescod.net> writes:

> I propose the following interface for a module that would go
> into Python 1.6 (excuse my IDLish shorthand)

All looks good to me.

> class Element(Node):
> 	string tagname
> 	# check what the DOM does with namespaces
> 	{Dictionary of Name->Value} attributes

To clarify, this dictionary follows the earlier proposal that
attributes are keyed by (namespaceURI, localName) tuples, correct?

  -- Ken

P.S. I wish Perl could do that gracefully.  :-/


From fdrake@acm.org  Fri Feb 18 22:36:32 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 Feb 2000 17:36:32 -0500 (EST)
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: <x57lg2p94h.fsf@bitsko.slc.ut.us>
References: <38AD6B1B.C57AD16B@prescod.net>
 <14509.38382.968129.719917@amarok.cnri.reston.va.us>
 <x57lg2p94h.fsf@bitsko.slc.ut.us>
Message-ID: <14509.51568.114533.714616@weyr.cnri.reston.va.us>

Ken MacLeod writes:
 > I'd like to offer up a different reason for using proxies: to remove
 > the concept of "ownership" from fragments of the tree so that they can
 > be shared by multiple processing steps.

  I like this!  This also requires proxies to work cleanly, as far as
I can tell.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From fdrake@acm.org  Fri Feb 18 23:26:31 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 Feb 2000 18:26:31 -0500 (EST)
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: <38ADAD92.BE9948FB@prescod.net>
References: <38AD6B1B.C57AD16B@prescod.net>
 <14509.38382.968129.719917@amarok.cnri.reston.va.us>
 <x57lg2p94h.fsf@bitsko.slc.ut.us>
 <14509.51568.114533.714616@weyr.cnri.reston.va.us>
 <38ADAD92.BE9948FB@prescod.net>
Message-ID: <14509.54567.163149.694183@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > Insofar as this is minidom and provides minimal support for moving
 > things around, cloning them and so forth, I wouldn't put in proxies just
 > to get object reuse. In the full PyDOM they would be more appropriate.

  I was thinking more of the general case, DOM or otherwise.  I think
it would be really nice to have this sort of proxy available in a
"high performance" implementation.  The reality is that several
variants might be needed (with various support for mappings,
sequences, etc.), but that's a detail symptomatic of the type/class
dichotomy and not a long-term issue.  It may not be realistic to share 
one implementation, and may not be worth the C code if not.
  But to support "sharable sub-hierarchies" as Ken described, we would 
need to use some sort of proxy solution.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From Mike.Olson@Fourthought.com  Mon Feb 21 08:45:43 2000
From: Mike.Olson@Fourthought.com (Mike Olson)
Date: Mon, 21 Feb 2000 01:45:43 -0700
Subject: [XML-SIG] Minidom proposal
References: <38AD6B14.F4F7E84E@prescod.net> <x566vmp8wn.fsf@bitsko.slc.ut.us>
Message-ID: <38B0FB36.955F2C9A@Fourthought.com>

--------------7AC834847477473E2914850B
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Ken MacLeod wrote:

> Paul Prescod <paul@prescod.net> writes:
>
> > I propose the following interface for a module that would go
> > into Python 1.6 (excuse my IDLish shorthand)
>
> All looks good to me.
>
> > class Element(Node):
> >       string tagname
> >       # check what the DOM does with namespaces
> >       {Dictionary of Name->Value} attributes
>
> To clarify, this dictionary follows the earlier proposal that
> attributes are keyed by (namespaceURI, localName) tuples, correct?
>

What if we are in a non-namespace-aware system?  Should the key be
(None,localName) or just localName?

Mike


>
>   -- Ken
>
> P.S. I wish Perl could do that gracefully.  :-/
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

--
Mike Olson
Senior Consultant Fourthought, Inc.
http://www.fourthought.com http://www.opentechnology.com
720-304-0152


--------------7AC834847477473E2914850B--


From ken@bitsko.slc.ut.us  Mon Feb 21 16:21:14 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 21 Feb 2000 10:21:14 -0600
Subject: [XML-SIG] Proposal: Marrying SAX2 and DOM
Message-ID: <x5ema6h6h1.fsf@bitsko.slc.ut.us>

As SAX2 comes near to being finalized, I'd like to make a proposal for
the Python binding that could make SAX2/Python a lot simpler.  SAX2
adds support for specifying "features" that the parser supports.  Many
of these features require additional properties to be made available
to handlers.  In the Java binding these additional properties are only
available through "callbacks" to the parser.

What I would like to propose is that the Python SAX2 binding pass
objects, specifically DOM-conformant objects, as a single parameter
rather than using both positional parameters and callbacks.

Benefits:

  * Will allow additional properties to be passed to handlers in a
    straightforward way, making parser extensions and filters much
    simpler to use and implement.

  * Using SAX to traverse a DOM becomes much easier: each SAX event
    simply passes the DOM node itself, rather than requiring a
    domNode() callback on the "parser".

Drawbacks:

  * A wider gap between the Java binding and the Python binding.

  * Creating objects for each event is a performance hit.

The parser would most likely use a DOMFactory specific to the type of
DOM objects the user would want, MiniDOM, PyDOM, etc.  If the parse is
being used simply to create a DOM tree, then the DOM objects passed in
the events can be used to create the tree (by just appending children
to their parent).
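
The idea can be sketched as follows in present-day Python. The adapter
sits on top of the standard xml.sax callbacks rather than inside a
parser, and Element, TreeBuilder, ObjectEvents and the
start_element/end_element method names are hypothetical stand-ins, not
part of any SAX2 binding.

```python
import xml.sax


class Element:
    """Stand-in for a DOM-like node; not the MiniDOM or PyDOM class."""

    def __init__(self, tagName, attributes):
        self.tagName = tagName
        self.attributes = attributes
        self.childNodes = []


class TreeBuilder:
    """Handler that receives one object per event and links nodes together."""

    def __init__(self):
        self.root = None
        self._stack = []

    def start_element(self, element):
        if self._stack:
            # Building the tree is just appending the child to its parent.
            self._stack[-1].childNodes.append(element)
        else:
            self.root = element
        self._stack.append(element)

    def end_element(self, element):
        self._stack.pop()

    def characters(self, text):
        self._stack[-1].childNodes.append(text)


class ObjectEvents(xml.sax.ContentHandler):
    """Adapts positional SAX callbacks to the single-object style."""

    def __init__(self, handler, factory=Element):
        super().__init__()
        self._handler = handler
        self._factory = factory   # plays the role of the DOM factory
        self._open = []

    def startElement(self, name, attrs):
        element = self._factory(name, dict(attrs.items()))
        self._open.append(element)
        self._handler.start_element(element)

    def endElement(self, name):
        self._handler.end_element(self._open.pop())

    def characters(self, content):
        self._handler.characters(content)


builder = TreeBuilder()
xml.sax.parseString(b'<doc id="1"><p>hi</p></doc>', ObjectEvents(builder))
```

Swapping the factory argument is where a different node class would be
plugged in.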

This pattern has been used in the Perl SAX binding and I've found it
to be extremely convenient.  I would propose using DOM nodes for SAX2
(Java) altogether for the same reasons, but I think Java's strict
typing would be very prohibitive to this sort of idea.

Comments?

  -- Ken


From gstein@lyra.org  Fri Feb 18 23:53:29 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 18 Feb 2000 15:53:29 -0800 (PST)
Subject: [XML-SIG] Minidom proposal
In-Reply-To: <38AD6B14.F4F7E84E@prescod.net>
Message-ID: <Pine.LNX.4.10.10002181544300.8706-100000@nebula.lyra.org>

On Fri, 18 Feb 2000, Paul Prescod wrote:
>...
> class Attribute(Node):
> 	string namespaceURI
> 	string prefix
> 	string localName
> 	string value
> 	element ownerElement

Attribute is a subclass of Node, which has a parent. Why not use the
parent for the owner?

> class Element(Node):
> 	string tagname
> 	# check what the DOM does with namespaces
> 	{Dictionary of Name->Value} attributes
> 	GetElementsByTagName( tagname ) -> List[Node]:
> 	getElementsByTagNameNS(
> 		DOMString namespaceURI, 
>                 DOMString localName) -> NodeList

GetElementsByTagName* should have matching capitalization.

DOMString??
NodeList -> List[Node]

> 	string namespaceURI
> 	string prefix
> 	string localName

Isn't this a duplicate of tagname? Why have both?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From larsga@garshol.priv.no  Mon Feb 21 08:23:36 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 21 Feb 2000 09:23:36 +0100
Subject: [XML-SIG] SAX 2.0, again
Message-ID: <m3hff3ezg7.fsf@lambda.garshol.priv.no>


Some weeks ago David Megginson released a SAX 2.0 beta in Java, and
this release appears to be quite close to the final form of SAX 2.0.
I've started working on translating this release into Python, but
there are some general design issues that need to be thought through
before this can be completed.


### XML names

The first problem is that of how to represent XML names. SAX 2.0 can
handle namespaces, and so we must somehow represent namespace-names.
I can see several different ways of doing this, all with their
advantages and disadvantages, and would very much like to hear the
opinion of the XML-SIG on this.

The alternatives I've thought of are

 - use (uri, localpart) tuple for namespace-names, simple strings for
   ordinary names

 - use (uri, localpart, rawname) for namespace-names, simple strings
   for ordinary names; rawname must be communicated out of band
   somehow

 - use XMLName objects for names, regardless of kind. If these were
   made immutable and drivers used hashtables of these this might not
   be too inefficient.

 - use separate parameters for uri, localpart and rawname, letting
   some of these be None depending on what was in the document and
   what the parser supports.


### Driver maintenance

Given that SAX 2.0 is larger than SAX 1.0 and also supports various
possibilities for extensions, writing a good and complete SAX 2.0
driver can be quite a bit of work. If any parser writers or others
feel like contributing to this work by writing and maintaining
drivers, then please feel encouraged to do so.

If nobody does write drivers, I will do it, but it will probably take
longer and they may not be as complete.


### Unicode support

Python 1.6 will have Unicode support, and so we should make PySAX 2.0
Unicode-ready. The main part of this is really adding the InputSource
object to the library, since this allows applications to feed byte or
character streams to the parser in a convenient way.

The question is: how will this distinction look in Python 1.6? Will
there be one? How should we relate to it? 

Could we do it simply by using file-like objects with different
semantics? 


### easySAX vs Pyxie

What should we do with this? Should we try to turn Pyxie into what we
envisioned easySAX to be, or should we maintain two such libraries? I
see advantages and disadvantages to both approaches.

One idea I've had for easySAX is something inspired by John Aycock's
Spark parser generator, that one could write SAX document handlers
with three kinds of special methods: start-element, end-element and
element content methods. These could use the 's_', 'e_' and 'c_'
prefixes, respectively.

Unlike in xmllib, though, the names of these methods would have no
significance beyond the prefix. Instead, the documentation string
could contain very simple XPath expressions to be used to dispatch
events onto the various methods.

This should allow us to write easySAX applications that look somewhat
like this (self.out is some XML generator class which may or may not
be part of easySAX):

class MyHandler(GenericEZSAXHandler):

  def s_doc(self, attrs):
    ' document '
    self.out.write_template("top")

  def c_sec_title(self, contents, attrs):
    ' section / title '
    self.out.make_element('h1', contents)

  def c_subsec_title(self, contents, attrs):
    ' subsection / title '
    self.out.make_element('h2', contents)

  def e_doc(self):
    ' document '
    self.out.write_template("bottom")


I'm fairly confident that a layer on top of SAX 2.0 to enable such
easySAX applications could be made fairly fast and it should be pretty
easy to implement as well. (I've made an early sketch of this.)

The only question is what to do with namespace-names. Perhaps the
application could declare constant namespace prefixes to be used in
the documentation strings in its constructor.
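
A rough sketch of how such a layer might dispatch, written against the
xml.sax module of present-day Python rather than the SAX 2.0 binding
under discussion. It assumes the docstring is a plain '/'-separated
path matched against the innermost open elements, and that content
methods receive only the element's direct text; both are
simplifications, and this GenericEZSAXHandler is illustrative only.

```python
import xml.sax


class GenericEZSAXHandler(xml.sax.ContentHandler):
    """Dispatches on docstring paths such as ' section / title '."""

    def __init__(self):
        super().__init__()
        self._rules = {"s_": [], "e_": [], "c_": []}
        for attr in dir(self):
            if attr[:2] in self._rules:
                method = getattr(self, attr)
                if callable(method) and method.__doc__:
                    steps = method.__doc__.split("/")
                    path = tuple(step.strip() for step in steps)
                    self._rules[attr[:2]].append((path, method))
        self._stack = []   # (name, attrs, text_chunks) per open element

    def _matching(self, kind):
        names = tuple(entry[0] for entry in self._stack)
        for path, method in self._rules[kind]:
            # A rule matches when its path equals the tail of the stack.
            if names[-len(path):] == path:
                yield method

    def startElement(self, name, attrs):
        self._stack.append((name, dict(attrs.items()), []))
        for method in self._matching("s_"):
            method(self._stack[-1][1])

    def characters(self, content):
        if self._stack:
            self._stack[-1][2].append(content)

    def endElement(self, name):
        _, attrs, chunks = self._stack[-1]
        for method in self._matching("c_"):
            method("".join(chunks), attrs)
        for method in self._matching("e_"):
            method()
        self._stack.pop()


class MyHandler(GenericEZSAXHandler):
    """Same shape as the example above, recording calls in a list."""

    def __init__(self):
        super().__init__()
        self.out = []

    def s_doc(self, attrs):
        ' document '
        self.out.append("top")

    def c_sec_title(self, contents, attrs):
        ' section / title '
        self.out.append(("h1", contents))

    def c_subsec_title(self, contents, attrs):
        ' subsection / title '
        self.out.append(("h2", contents))

    def e_doc(self):
        ' document '
        self.out.append("bottom")


handler = MyHandler()
xml.sax.parseString(
    b"<document><section><title>Intro</title>"
    b"<subsection><title>Detail</title></subsection>"
    b"</section></document>",
    handler,
)
```

After the parse, handler.out holds "top", the h1 and h2 entries in
document order, and "bottom".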


--Lars M.


From paul@prescod.net  Fri Feb 18 20:30:13 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 12:30:13 -0800
Subject: [XML-SIG] Minidom proposal
References: <38AD6B14.F4F7E84E@prescod.net> <x566vmp8wn.fsf@bitsko.slc.ut.us>
Message-ID: <38ADABD5.E0E02CBA@prescod.net>

> To clarify, this dictionary follows the earlier proposal that
> attributes are keyed by (namespaceURI, localName) tuples, correct?

Good question. I think that for simplicity we should index attributes
both with tuples AND with a simple attribute name. I don't want to mess
up node["href"] just to support the much less common
node[("http://www.w3.org/TR/xlink","href")].

This is especially the case since attributes do not "namespace default."
So namespaced attributes will actually be relatively rare.
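
A small sketch of what dual indexing could look like, in present-day
Python. AttributeMap and its set() method are hypothetical, and the
rule that a plain string key means "the attribute with no namespace"
is an assumption of this sketch, not something the proposal settles.

```python
XLINK = "http://www.w3.org/TR/xlink"   # URI as written in the message above


class AttributeMap:
    """Attributes stored by (namespaceURI, localName), readable by either key."""

    def __init__(self):
        self._by_name = {}

    def set(self, namespace_uri, local_name, value):
        self._by_name[(namespace_uri, local_name)] = value

    def __getitem__(self, key):
        if isinstance(key, tuple):
            return self._by_name[key]
        # Assumption: a plain string selects the un-namespaced attribute.
        return self._by_name[(None, key)]


attrs = AttributeMap()
attrs.set(None, "href", "a.html")
attrs.set(XLINK, "href", "b.html")
```

With this, attrs["href"] yields "a.html" and attrs[(XLINK, "href")]
yields "b.html".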

>   -- Ken
> 
> P.S. I wish Perl could do that gracefully.  :-/

You'd better get used to saying that. <0.9 wink>

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"The calculus and the rich body of mathematical analysis to which it
gave rise made modern science possible, but it was the algorithm that
made possible the modern world." 
        - from "Advent of the Algorithm" David Berlinski
	http://www.opengroup.com/mabooks/015/0151003386.shtml


From paul@prescod.net  Fri Feb 18 20:37:38 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 12:37:38 -0800
Subject: [XML-SIG] DOM and Proxies
References: <38AD6B1B.C57AD16B@prescod.net>
 <14509.38382.968129.719917@amarok.cnri.reston.va.us>
 <x57lg2p94h.fsf@bitsko.slc.ut.us> <14509.51568.114533.714616@weyr.cnri.reston.va.us>
Message-ID: <38ADAD92.BE9948FB@prescod.net>

"Fred L. Drake, Jr." wrote:
> 
>   I like this!  This also requires proxies to work cleanly, as far as
> I can tell.
> 
>   -Fred

Insofar as this is minidom and provides minimal support for moving
things around, cloning them and so forth, I wouldn't put in proxies just
to get object reuse. In the full PyDOM they would be more appropriate.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself


From tpassin@idsonline.com  Mon Feb 21 17:25:29 2000
From: tpassin@idsonline.com (THOMAS PASSIN)
Date: Mon, 21 Feb 2000 12:25:29 -0500
Subject: [XML-SIG] SAX 2.0, again
References: <m3hff3ezg7.fsf@lambda.garshol.priv.no>
Message-ID: <002101bf7c90$a8f18a80$5da4fea9@tompassin>

Lars Marius Garshol wrote:
>
>
> Some weeks ago David Megginson released a SAX 2.0 beta in Java, and
> this release appears to be quite close to the final form of SAX 2.0.
> I've started working on translating this release into Python, but
> there are some general design issues that need to be thought through
> before this can be completed.
>
>
> ### XML names
>
> The first problem is that of how to represent XML names. SAX 2.0 can
> handle namespaces, and so we must somehow represent namespace-names.

I think we should make it as easy as possible to use either namespace-style
names or ordinary names, so both can be used in the same way as far as
possible.  The application shouldn't have to figure out the structure before
it can even extract the value.  So I don't think the xml name should be a
tuple if it has a declared namespace but a string if there is no namespace.

With this in mind, how about

((prefix,localpart),uri)

If namespaces were not being used, prefix and uri would be None (or possibly
the empty string).  This allows the use of alternative values for the prefix
(so you could, for example, use xslt:template for xsl:template if you wanted
to, which is the way it is supposed to work), and you could check the uri
value anytime you needed to learn the exact namespace.  localpart would
always be a string.

Also, if you had a document containing several prefixes for the same
namespace, you could easily use the localpart and uri, rather than the
prefix.

I don't recall how it shook out on XML-DEV, but there were a number of posts
that said it was important to keep the actual prefix value, and this
approach would do that.

BTW, "uri" doesn't actually need to be a uri, any unique string will do.

> I can see several different ways of doing this, all with their
> advantages and disadvantages, and would very much like to hear the
> opinion of the XML-SIG on this.
>
> The alternatives I've thought of are
>
>  - use (uri, localpart) tuple for namespace-names, simple strings for
>    ordinary names
>
>  - use (uri, localpart, rawname) for namespace-names, simple strings
>    for ordinary names; rawname must be communicated out of band
>    somehow
>
>  - use XMLName objects for names, regardless of kind. If these were
>    made immutable and drivers used hashtables of these this might not
>    be too inefficient.
>
>  - use separate parameters for uri, localpart and rawname, letting
>    some of these be None depending on what was in the document and
>    what the parser supports.
>
<snip/>

Tom Passin


From ken@bitsko.slc.ut.us  Mon Feb 21 19:37:46 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 21 Feb 2000 13:37:46 -0600
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: Lars Marius Garshol's message of "21 Feb 2000 09:23:36 +0100"
References: <m3hff3ezg7.fsf@lambda.garshol.priv.no>
Message-ID: <x54sb2uz1x.fsf@bitsko.slc.ut.us>

Lars Marius Garshol <larsga@garshol.priv.no> writes:

> ### XML names
> 
> The first problem is that of how to represent XML names. SAX 2.0 can
> handle namespaces, and so we must somehow represent namespace-names.
> I can see several different ways of doing this, all with their
> advantages and disadvantages, and would very much like to hear the
> opinion of the XML-SIG on this.
> 
> The alternatives I've thought of are
> 
>  - use (uri, localpart) tuple for namespace-names, simple strings for
>    ordinary names
> 
>  - use (uri, localpart, rawname) for namespace-names, simple strings
>    for ordinary names; rawname must be communicated out of band
>    somehow
> 
>  - use XMLName objects for names, regardless of kind. If these were
>    made immutable and drivers used hashtables of these this might not
>    be too inefficient.
> 
>  - use separate parameters for uri, localpart and rawname, letting
>    some of these be None depending on what was in the document and
>    what the parser supports.

The proposal I made earlier (passing objects instead of positional
parameters) is another solution.  From my proposal and Paul's miniDOM
proposal earlier, start_element would be passed an Element object:

class Element(Node):
        string tagName
        {Dictionary of Name->Value} attributes
        string namespaceURI
        string prefix
        string localName

I believe tagName is the raw name and the remaining three are set
depending on whether NS processing is turned on.  For attributes to be
a dictionary and support both NS and no-NS processing, I like (uri,
localName) for NS and (None, tagName) for no-NS.
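
As a sketch of that convention in present-day Python: make_element and
its prefix_map argument are hypothetical helpers, the Element class is
a stand-in for the miniDOM one, and namespace declarations are assumed
to have been removed from the attributes already.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    tagName: str
    namespaceURI: Optional[str] = None
    prefix: Optional[str] = None
    localName: Optional[str] = None
    attributes: dict = field(default_factory=dict)


def make_element(raw_name, raw_attrs, prefix_map=None):
    """Build an Element; prefix_map=None means NS processing is off."""
    if prefix_map is None:
        # No-NS mode: keep the raw name, key attributes as (None, name).
        attrs = {(None, name): value for name, value in raw_attrs.items()}
        return Element(tagName=raw_name, attributes=attrs)

    prefix, _, local = raw_name.rpartition(":")
    prefix = prefix or None
    uri = prefix_map.get(prefix)   # a None key would hold the default namespace
    attrs = {}
    for name, value in raw_attrs.items():
        a_prefix, _, a_local = name.rpartition(":")
        # Unprefixed attributes do not pick up the default namespace.
        a_uri = prefix_map.get(a_prefix) if a_prefix else None
        attrs[(a_uri, a_local)] = value
    return Element(raw_name, uri, prefix, local, attrs)


HTML = "http://www.w3.org/TR/REC-html40"   # example URI, chosen arbitrarily
XLINK = "http://www.w3.org/TR/xlink"

with_ns = make_element("html:a", {"href": "x", "xlink:href": "y"},
                       {"html": HTML, "xlink": XLINK})
without_ns = make_element("html:a", {"href": "x"})
```

Either way the attribute keys are always 2-tuples, so handler code can
index them the same way in both modes.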

> ### Unicode support
> 
> Python 1.6 will have Unicode support, and so we should make PySAX 2.0
> Unicode-ready. The main part of this is really adding the InputSource
> object to the library, since this allows applications to feed byte or
> character streams to the parser in a convenient way.

Adding InputSource may not be necessary if there was a method
parseCharFile() to specify character streams.

> ### easySAX vs Pyxie
> 
> What should we do with this? Should we try to turn Pyxie into what we
> envisioned easySAX to be, or should we maintain two such libraries? I
> see advantages and disadvantages to both approaches.
> 
> One idea I've had for easySAX is something inspired by John Aycock's
> Spark parser generator, that one could write SAX document handlers
> with three kinds of special methods: start-element, end-element and
> element content methods. These could use the 's_', 'e_' and 'c_'
> prefixes, respectively.

> I'm fairly confident that a layer on top of SAX 2.0 to enable such
> easySAX applications could be made fairly fast and it should be pretty
> easy to implement as well. (I've made an early sketch of this.)

If I understand correctly, yes, having a SAX filter that calls
tag-based methods should be really easy.

I think the part I don't understand about easySAX and Pyxie (and it's
probably from not having the opportunity to use them) is: why isn't
the SAX binding already this easy?

  -- Ken


From paul@prescod.net  Sat Feb 19 16:12:21 2000
From: paul@prescod.net (Paul Prescod)
Date: Sat, 19 Feb 2000 08:12:21 -0800
Subject: [XML-SIG] Minidom proposal
References: <Pine.LNX.4.10.10002181544300.8706-100000@nebula.lyra.org>
Message-ID: <38AEC0E5.95F54819@prescod.net>

Greg Stein wrote:
> 
> Attribute is a subclass of Node, which has a parent. Why not use the
> parent for the owner?

This is a common debate in the XML world. Attributes are not considered
"children" of elements so it is somewhat weird to call the owner
"parent". You're my parent but I'm not your child. Given that the
argument could go either way we might as well do it the way that PyDOM
and 4DOM currently do (AFAIK).

> GetElementsByTagName* should have matching capitalization.

True.

> DOMString??

Cut and paste error. For Python, DOMString is just PyString --
especially since we will soon have Unicode.

> NodeList -> List[Node]

Right.

> Isn't this a duplicate of tagname? Why have both?

tagname = html:a or a
localname = a (always)

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself


From mrnolta@princeton.edu  Tue Feb 22 09:23:10 2000
From: mrnolta@princeton.edu (Michael Nolta)
Date: Tue, 22 Feb 2000 04:23:10 -0500 (EST)
Subject: [XML-SIG] installation problem
Message-ID: <Pine.LNX.4.10.10002220418370.1567-100000@ophelia.princeton.edu>

I'm having problems installing. It can't find the file

	/usr/lib/python1.5/config/Makefile

which it needs to make sedscript. I'm using RedHat 6.1, and there's no
config/ directory in /usr/lib/python1.5.

-Mike

---

VERSION=`python -c "import sys; print sys.version[:3]"`; \
installdir=`python -c "import sys; print sys.prefix"`; \
exec_installdir=`python -c "import sys; print sys.exec_prefix"`; \
make -f ./Makefile.pre.in VPATH=. srcdir=. \
	VERSION=$VERSION \
	installdir=$installdir \
	exec_installdir=$exec_installdir \
	Makefile
make[1]: Entering directory `/scr1/build/PyXML-0.5.2/extensions'
make[1]: *** No rule to make target `/usr/lib/python1.5/config/Makefile',
needed by `sedscript'.  Stop.

---


From hannu@tm.ee  Tue Feb 22 09:45:42 2000
From: hannu@tm.ee (Hannu Krosing)
Date: Tue, 22 Feb 2000 11:45:42 +0200
Subject: [XML-SIG] installation problem
References: <Pine.LNX.4.10.10002220418370.1567-100000@ophelia.princeton.edu>
Message-ID: <38B25AC6.13963FA@tm.ee>

Michael Nolta wrote:
> 
> I'm having problems installing. It can't find the file
> 
>         /usr/lib/python1.5/config/Makefile
> 
> which it needs to make sedscript. I'm using RedHat 6.1, and there's no
> config/ directory in /usr/lib/python1.5.

install python-devel-*.rpm

-------
Hannu


From larsga@garshol.priv.no  Tue Feb 22 16:06:12 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 22 Feb 2000 17:06:12 +0100
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: <002101bf7c90$a8f18a80$5da4fea9@tompassin>
References: <m3hff3ezg7.fsf@lambda.garshol.priv.no> <002101bf7c90$a8f18a80$5da4fea9@tompassin>
Message-ID: <m3hff12pe3.fsf@lambda.garshol.priv.no>

* THOMAS PASSIN
| 
| I think we should make it as easy as possible to use either
| namespace-style names or ordinary names, so both can be used in the
| same way as far as possible.  

Agreed. This has to be the overall goal.

| The application shouldn't have to figure out the structure before it
| can even extract the value.  So I don't think the xml name should be
| a tuple if it has a declared namespace but a string if there is no
| namespace.

This is a valid point, unless we can work around the problem somehow.

| With this in mind, how about
| 
| ((prefix,localpart),uri)

For performance and convenience it would be better to do this as

  (prefix, localpart, uri)

but I agree that this is better than

  (uri, localpart, rawname)

since you rarely want the rawname anyway, and when you want it you can
get it from the prefix + localpart.
 
The only problem I have with this is that it means that names with
different prefixes do not compare as equal. This is why I would prefer
to have the prefix reported somewhere else. (Any good ideas for where?)
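
One possible answer, sketched in present-day Python: carry the prefix
on a name object but leave it out of comparison and hashing. XMLName
here is a hypothetical class written for illustration, not an existing
API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class XMLName:
    """Immutable name; the prefix rides along but is ignored by == and hash()."""

    uri: Optional[str]
    localpart: str
    prefix: Optional[str] = field(default=None, compare=False)

    @property
    def rawname(self):
        if self.prefix:
            return f"{self.prefix}:{self.localpart}"
        return self.localpart


XSL = "http://www.w3.org/1999/XSL/Transform"
a = XMLName(XSL, "template", "xsl")
b = XMLName(XSL, "template", "xslt")
```

Here a == b holds and both hash alike, so either can be used to look
up the other in a dictionary, while a.rawname and b.rawname still
differ.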

| I don't recall how it shook out on XML-DEV, but there were a number
| of posts that said it was important to keep the actual prefix value,
| and this approach would do that.

I think it was needed for the DOM, and it's also part of the lexical
information that one sometimes needs, so there are definitely reasons
to keep it. The question is where.
 
| BTW, "uri" doesn't actually need to be a uri, any unique string will
| do.
 
Perhaps, but it doesn't really matter to us. :-)

--Lars M.


From larsga@garshol.priv.no  Tue Feb 22 15:59:11 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 22 Feb 2000 16:59:11 +0100
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: <x54sb2uz1x.fsf@bitsko.slc.ut.us>
References: <m3hff3ezg7.fsf@lambda.garshol.priv.no> <x54sb2uz1x.fsf@bitsko.slc.ut.us>
Message-ID: <m3itzh2pps.fsf@lambda.garshol.priv.no>

* Ken MacLeod
| 
| [representing XML names]
| The proposal I made earlier (passing objects instead of positional
| parameters) is another solution.

Yeah, I saw that after I'd posted and the list email had been fixed
again. I've looked at that proposal and want to think it through a
little before I say anything about it.

| Adding InputSource may not be necessary if there was a method
| parseCharFile() to specify character streams.

We still need to be able to return something from EntityResolver that
the parser can read from correctly, and I think InputSource is the way
to go. It's a very simple class anyway, and would be implemented only
once (in the SAX library).
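
For illustration, this is roughly how an InputSource is handed to a
parser with the xml.sax package in present-day Python's standard
library, which postdates this thread. The system identifier and the
NameCollector handler are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class NameCollector(ContentHandler):
    """Records element names in the order they are opened."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# The source wraps a byte stream; the parser works out the encoding.
source = InputSource("memory:example")   # hypothetical system identifier
source.setByteStream(io.BytesIO(b"<doc><p/></doc>"))

parser = xml.sax.make_parser()
collector = NameCollector()
parser.setContentHandler(collector)
parser.parse(source)
```

An entity resolver can return the same kind of object, which is the
case described above.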
 
| [easySAX vs Pyxie]
| 
| If I understand correctly, yes, having a SAX filter that calls
| tag-based methods should be really easy.

It would, and we already have that. What I was thinking of was using
documentation strings to do the dispatching instead, since this would
give us more advanced dispatching.

You could do things like

  def c_beep(self, contents, attrs):
     ' section / title '
     self.out.make_element('h1', contents)

| I think the part I don't understand about easySAX and Pyxie (and
| it's probably from not having the opportunity to use them) is: why
| isn't the SAX binding already this easy?

It's a good question. The main reason is that I wanted something very
simple that could be implemented by parser libraries without too much
fuss and also something that could easily be put on top of databases,
converters and other kinds of tools to produce XML output.

Similarly, I wanted it to be possible to make competing toolkits for
making XML processing simple on top of a standard parser API so that
it would be trivially easy for all these toolkits to support all XML
parsers (and other XML generators) available in Python.

--Lars M.


From tpassin@idsonline.com  Wed Feb 23 02:29:25 2000
From: tpassin@idsonline.com (THOMAS PASSIN)
Date: Tue, 22 Feb 2000 21:29:25 -0500
Subject: [XML-SIG] SAX 2.0, again
References: <m3hff3ezg7.fsf@lambda.garshol.priv.no> <002101bf7c90$a8f18a80$5da4fea9@tompassin> <m3hff12pe3.fsf@lambda.garshol.priv.no>
Message-ID: <002d01bf7da5$d1ee6280$5c2a08d1@idsonline.com>

Lars Marius Garshol wrote, replying to my post:

<snip/>
> | The application shouldn't have to figure out the structure before it
> | can even extract the value.  So I don't think the xml name should be
> | a tuple if it has a declared namespace but a string if there is no
> | namespace.
>
> This is a valid point, unless we can work around the problem somehow.
>
> | With this in mind, how about
> |
> | ((prefix,localpart),uri)
>
> For performance and convenience it would be better to do this as
>
>   (prefix, localpart, uri)
>
> but I agree that this is better than
>
>   (uri, localpart, rawname)
>
> since you rarely want the rawname anyway, and when you want it you can
> get it from the prefix + localpart.
>
> The only problem I have with this is that it means that names with
> different prefixes do not compare as equal. This is why I would prefer
> to have the prefix reported somewhere else. (Any good ideas for where?)
>
OK, what about (prefix,(localpart,uri)).  Then we compare  names with
names_compare=(name1[1]==name2[1]).  Since names are the same by definition
if the localpart and namespace are identical, this should work fine.  And
the prefix is still there, tagging along for the ride.  As for performance,
you know far more about Python performance than I.  But maybe some
analysis... say we are processing 10,000 elements using SAX with some
typical kind of element processing methods.  What fraction of the total
processing time would be lost by using this structure and name test instead
of some optimized structure?  If the loss might be, say, 5%, I say don't
worry about it one little bit.  If it's 25% of the ***overall*** processing
time, probably that is too much.

Who can shed some reasonably definitive light on this?
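
A rough way to start measuring, sketched with the timeit module of
present-day Python. This times only the comparison itself, not a full
SAX run, so it can bound the cost but not give the overall percentage;
the iteration count and the sample name are arbitrary choices.

```python
import timeit

XSL = "http://www.w3.org/1999/XSL/Transform"

# Nested form: (prefix, (localpart, uri)); compare on element [1] only.
nested_a = ("xsl", ("template", XSL))
nested_b = ("xslt", ("template", XSL))

# Flat form without a prefix, compared directly.
flat_a = ("template", XSL)
flat_b = ("template", XSL)

nested = timeit.timeit(lambda: nested_a[1] == nested_b[1], number=200_000)
flat = timeit.timeit(lambda: flat_a == flat_b, number=200_000)

print(f"nested: {nested:.4f}s  flat: {flat:.4f}s  for 200,000 comparisons")
```

Dividing either figure by the time of a complete parse of a
representative document would give the fraction asked about.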

Regards,

Tom Passin


From paul@prescod.net  Fri Feb 18 20:21:07 2000
From: paul@prescod.net (Paul Prescod)
Date: Fri, 18 Feb 2000 12:21:07 -0800
Subject: [XML-SIG] DOM and Proxies
References: <38AD6B1B.C57AD16B@prescod.net> <14509.38382.968129.719917@amarok.cnri.reston.va.us>
Message-ID: <38ADA9B3.B5C01BE1@prescod.net>

"Andrew M. Kuchling" wrote:
> 
> Open question: is the proxy mechanism still useful if a garbage
> collection mechanism for collecting cycles gets into 1.6?  (Neal
> Schemenauer is working on something, but it's too early to tell if
> it'll get into 1.6; perhaps the cost will be too high.)

No, if the cycle-reaper gets into 1.6 then I wouldn't bother with the
proxies. The mere fact that I've spent too much of my life thinking up
cycle-avoidance mechanisms suggests that we should give Neal's patch a
high priority (scuse the pun)! Obviously the proxy has the benefit of
not slowing down anything that doesn't use it.
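
For comparison, one cycle-avoidance mechanism available in present-day
Python is a weak parent reference via the standard weakref module,
which did not exist when this was written. The Node class below is a
hypothetical sketch, not minidom or PyDOM code.

```python
import weakref


class Node:
    """Tree node whose parent link is weak, so no reference cycle forms."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent = None   # weak reference, or None for a root

    @property
    def parent(self):
        # Returns None for a root or once the parent has been freed.
        return self._parent() if self._parent is not None else None

    def append(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)
        return child


root = Node("doc")
leaf = root.append(Node("p"))
```

The trade-off is the one noted above: only code that uses such a class
pays for the extra indirection on each parent access.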

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself


From harri.pasanen@trema.com  Wed Feb 23 17:12:56 2000
From: harri.pasanen@trema.com (Harri Pasanen)
Date: Wed, 23 Feb 2000 18:12:56 +0100
Subject: [XML-SIG] small installation problem
Message-ID: <38B41518.3548D87B@trema.com>

I installed PyXML-0.5.3 on Solaris 2.7 following the README
instructions.


python setup.py install

failed at first, because of missing 
/usr/local/lib/python1.5/site-packages/ directory.
That directory does not appear to be created when Python 1.5.2 is
installed from the tar-ball.

After manually creating the directory, the install went through without
complaints.

Regards,

-Harri


From fdrake@acm.org  Wed Feb 23 17:24:36 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 23 Feb 2000 12:24:36 -0500 (EST)
Subject: [XML-SIG] small installation problem
In-Reply-To: <38B41518.3548D87B@trema.com>
References: <38B41518.3548D87B@trema.com>
Message-ID: <14516.6100.748263.302872@weyr.cnri.reston.va.us>

Harri,
  This is (in part) a distutils issue.  The distutils package should
always create this directory if it doesn't exist.  Raw Python
installations should not create it, since adding it to the search path 
would slow down the module search.
  Greg, I don't know if you're reading the XML-SIG list, so I'm adding 
the distutils list to the list of recipients.


Harri Pasanen writes:
 > I installed PyXML-0.5.3 on Solaris 2.7 following the README
 > instructions.
 > 
 > 
 > python setup.py install
 > 
 > failed at first, because of missing 
 > /usr/local/lib/python1.5/site-packages/ directory.
 > That directory does not appear to be created when Python 1.5.2 is
 > installed from the tar-ball.
 > 
 > After manually creating the directory, the install went through without
 > complaints.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From gward@python.net  Thu Feb 24 02:53:56 2000
From: gward@python.net (Greg Ward)
Date: Wed, 23 Feb 2000 21:53:56 -0500
Subject: [Distutils] Re: [XML-SIG] small installation problem
In-Reply-To: <14516.6100.748263.302872@weyr.cnri.reston.va.us>; from Fred L. Drake, Jr. on Wed, Feb 23, 2000 at 12:24:36PM -0500
References: <38B41518.3548D87B@trema.com> <14516.6100.748263.302872@weyr.cnri.reston.va.us>
Message-ID: <20000223215356.A3815@beelzebub>

On 23 February 2000, Fred L. Drake, Jr. said:
>   This is (in part) a distutils issue.  The distutils package should
> always create this directory if it doesn't exist.

Fred is correct -- Distutils should create any directories it needs to
install files.  (In fact, it creates any directories it needs to do
anything.)

In fact, I've just tested this: both my current development version (on
Linux) and the 0.1.3 release (Solaris 2.6) work peachy keen.  If I
remove or rename my site-packages directory, Distutils recreates it *as
long as I have permission to write in $prefix/lib/python1.5*.
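
The behaviour described above (create whatever directories are missing,
provided the parent is writable) can be sketched with the standard library.
This is only an illustration, not Distutils' own code; the helper name
ensure_dir and the temporary stand-in for $prefix are assumptions made for
the example.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Hypothetical stand-in for $prefix; a real install would use the
# interpreter's own prefix instead of a temporary directory.
prefix = tempfile.mkdtemp()
site_packages = ensure_dir(
    os.path.join(prefix, "lib", "python1.5", "site-packages"))
created = os.path.isdir(site_packages)
```

Calling ensure_dir a second time on the same path is harmless, which is the
property an installer wants. A missing write permission on the parent would
surface as an OSError from os.makedirs.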

Harri Pasanen writes:
> python setup.py install
> 
> failed at first, because of missing 
> /usr/local/lib/python1.5/site-packages/ directory.
> That directory does not appear to be created when Python 1.5.2 is
> installed from the tar-ball.

Since I can't reproduce the bug, I'm going to need more information.
Could you supply an exact transcript of the session where Distutils
failed to create the site-packages directory?  (I'm guessing there's a
traceback that will reveal useful information.)

Also, what version of Distutils did you use?

        Greg
-- 
Greg Ward - Linux bigot                                 gward@python.net
http://starship.python.net/~gward/
Whatever became of eternal truth?


From Tony.McDonald@newcastle.ac.uk  Thu Feb 24 07:11:39 2000
From: Tony.McDonald@newcastle.ac.uk (Tony.McDonald@newcastle.ac.uk)
Date: Thu, 24 Feb 2000 07:11:39 +0000
Subject: [XML-SIG] Compiled mac version of pyexpat anywhere?
Message-ID: <MD2.0d02.1000224071138@black29.ncl.ac.uk>

Can someone point me to a source for a pyexpat library for the Mac that will
work with the latest Python version (1.5.2fc)? With the library available at
the Mac Python site, I keep getting 'ImportError: PythonCore: An import
library was too new for a client.' messages.

As the pystones benchmark indicates that my iMac is *twice* (4300 vs 2400
pystones) as fast as our Sun iron at work, I'd like to try and get this
working! :)

cheers
tone


From bradmars@yahoo.com  Thu Feb 24 21:54:40 2000
From: bradmars@yahoo.com (Bradley Marshall)
Date: Thu, 24 Feb 2000 13:54:40 -0800 (PST)
Subject: [XML-SIG] Returning data from DocumentHandler
Message-ID: <20000224215440.23382.rocketmail@web222.mail.yahoo.com>

Hey guys,

How do I return data from a DocumentHandler?  I am
using sax to build a data structure from xml files.  I
want to do something like :

class DocHandler(DocumentHandler):
....

    def endDocument(self):
	return self.data

Then I'm calling it like:

dh = docHandler()
p = parser()
p.setDocumentHandler(dh)
data = p.parseFile(file)
p.close()

but if I do :

print data

I get:
None

If I do all my manipulations in endDocument(), it's
fine, but I'd like to separate those functionalities.

Thanks a lot,
Brad Marshall

__________________________________________________
Do You Yahoo!?
Talk to your friends online with Yahoo! Messenger.
http://im.yahoo.com


From akuchlin@mems-exchange.org  Thu Feb 24 22:19:50 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 24 Feb 2000 17:19:50 -0500 (EST)
Subject: [XML-SIG] Returning data from DocumentHandler
In-Reply-To: <20000224215440.23382.rocketmail@web222.mail.yahoo.com>
References: <20000224215440.23382.rocketmail@web222.mail.yahoo.com>
Message-ID: <14517.44678.636035.291431@amarok.cnri.reston.va.us>

Bradley Marshall writes:
>class DocHandler(DocumentHandler):
>....
>    def endDocument(self):
>	return self.data
>
>dh = docHandler()
>p = parser()
>p.setDocumentHandler(dh)
>data = p.parseFile(file)
>p.close()

The parseFile() method doesn't return anything, so it'll always be
None.  In the Java version of SAX, the parse() method is declared as
void, in other words.  Why not just access the attribute .data of your
DocHandler class?  You can also add an accessor method,
.getWhateverData(), to your class, if you prefer accessor methods to
attributes.
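
A minimal sketch of that suggestion follows. It assumes the xml.sax module
from today's standard library (ContentHandler rather than the SAX 1
DocumentHandler used in the question), and the accessor name get_data and
the sample document are invented for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DocHandler(ContentHandler):
    """Collects element names; the result lives on the handler itself."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.data = []

    def startElement(self, name, attrs):
        self.data.append(name)

    def get_data(self):
        # Optional accessor, for those who prefer methods to attributes.
        return self.data


dh = DocHandler()
# The parse call itself returns nothing useful ...
result = xml.sax.parseString(b"<a><b/><c/></a>", dh)
# ... so the collected data is read back from the handler afterwards.
data = dh.get_data()
```

The handler is just an ordinary object that outlives the parse, so anything
it accumulated during the callbacks is still there once parsing finishes.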

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Perhaps God made cats so that man might have the pleasure of fondling the
tiger...
    -- Robertson Davies, _The Diary of Samuel Marchbanks_


From uche.ogbuji@fourthought.com  Sun Feb 27 08:04:11 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 27 Feb 2000 01:04:11 -0700
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: Your message of "21 Feb 2000 09:23:36 +0100."
 <m3hff3ezg7.fsf@lambda.garshol.priv.no>
Message-ID: <200002270804.BAA04277@localhost.localdomain>

> The first problem is that of how to represent XML names. SAX 2.0 can
> handle namespaces, and so we must somehow represent namespace-names.
> I can see several different ways of doing this, all with their
> advantages and disadvantages, and would very much like to hear the
> opinion of the XML-SIG on this.
> 
> The alternatives I've thought of are
> 
>  - use (uri, localpart) tuple for namespace-names, simple strings for
>    ordinary names

This is how names are indexed in 4DOM.  However, it can cause some odd problems 
if namespace-aware code is mixed with non-ns code.

>  - use (uri, localpart, rawname) for namespace-names, simple strings
>    for ordinary names; rawname must be communicated out of band
>    somehow

I do think it is very important to at least keep track of the prefix, even 
though we'd admonish users not to attach semantic value to them.

>  - use XMLName objects for names, regardless of kind. If these were
>    made immutable and drivers used hashtables of these this might not
>    be too inefficient.

What interface do you have in mind?  What hashing approach?  Simple string 
hashing for string names, and maybe some concatenation into a single string 
for namespace names?

>  - use separate parameters for uri, localpart and rawname, letting
>    some of these be None depending on what was in the document and
>    what the parser supports.


-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From uche.ogbuji@fourthought.com  Sun Feb 27 08:15:57 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 27 Feb 2000 01:15:57 -0700
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: Your message of "Mon, 21 Feb 2000 12:25:29 EST."
 <002101bf7c90$a8f18a80$5da4fea9@tompassin>
Message-ID: <200002270815.BAA04321@localhost.localdomain>

> > The first problem is that of how to represent XML names. SAX 2.0 can
> > handle namespaces, and so we must somehow represent namespace-names.
> 
> I think we should make it as easy as possible to use either namespace-style
> names or ordinary names, so both can be used in the same way as far as
> possible.  The application shouldn't have to figure out the structure before
> it can even extract the value.  So I don't think the xml name should be a
> tuple if it has a declared namespace but a string if there is no namespace.
> 
> With this in mind, how about
> 
> ((prefix,localpart),uri)
> 
> If namespaces were not being used, prefix and uri would be None (or possibly
> the empty string).

It would have to be the former, to avoid confusion with default namespaces and 
null NS in an NS-aware system.

> This allows the use of alternative values for the prefix
> (so you could, for example, use xslt:template for xsl:template if you wanted
> to, which is the way it is supposed to work), and you could check the uri
> value anytime you needed to learn the exact namespace.  localpart would
> always be a string.

This is pretty much essential.

> Also, if you had a document containing several prefixes for the same
> namespace, you could easily use the localpart and uri, rather than the
> prefix.

The prefix shouldn't be used except for convenient uniformity from input to 
output, and for the few W3C-sanctioned cases such as XPath name tests.
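
To illustrate the point that the (uri, localpart) pair identifies a name and
the prefix does not, here is a small sketch using the xml.sax module from
today's standard library in namespace mode, where element names are reported
as (uri, localname) tuples. The class name NameCollector and the sample
document are assumptions made for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XSL = "http://www.w3.org/1999/XSL/Transform"


class NameCollector(ContentHandler):
    """Records the (uri, localname) pair of every element start event."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)


# Two different prefixes bound to the same namespace URI.
doc = ('<root xmlns:xsl="%s" xmlns:xslt="%s">'
       '<xsl:template/><xslt:template/></root>' % (XSL, XSL)).encode("utf-8")

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
collector = NameCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(doc))
```

Both template elements come out with the same tuple even though the document
spells them with different prefixes, and the unqualified root element is
reported with None in the URI slot.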

> I don't recall how it shook out on XML-DEV, but there were a number of posts
> that said it was important to keep the actual prefix value, and this
> approach would do that.

I was a champion of that on XML-DEV, for the above reasons.

> BTW, "uri" doesn't actually need to be a uri, any unique string will do.

Actually, it does have to be a URI or it is in contradiction of the spec 
(although they didn't go the natural step to make URI conformance a formal 
namespace constraint, they do have pretty conclusive wording to that effect in 
section 1).


-- 
Uche Ogbuji
Fourthought, Inc., IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software-engineering, project-management, knowledge-management
http://Fourthought.com		http://OpenTechnology.org


From tpassin@idsonline.com  Sun Feb 27 17:20:24 2000
From: tpassin@idsonline.com (THOMAS PASSIN)
Date: Sun, 27 Feb 2000 12:20:24 -0500
Subject: [XML-SIG] SAX 2.0, again
References: <200002270815.BAA04321@localhost.localdomain>
Message-ID: <001901bf8146$f04041a0$b92a08d1@idsonline.com>

<uche.ogbuji@fourthought.com> wrote

<snip qty="most"/>

> > BTW, "uri" doesn't actually need to be a uri, any unique string will do.
>
> Actually, it does have to be a URI or it is in contradiction of the spec
> (although they didn't go the natural step to make URI conformance a formal
> namespace constraint, they do have pretty conclusive wording to that effect
> in section 1).
>
Actually I mis-spoke slightly.  I really meant it doesn't have to look like
a regular ***URL***.  I was thinking that the "scheme" of a URI could be
blank, but checking the RFC I see it has to have at least one letter plus
the ":".  The rest of it can just be a string (modulo using legal
characters, etc.).  The namespace spec specifically says
"It is not a goal that it be directly usable for retrieval of a schema (if
any exists). "  So it doesn't have to be any existing URL or even an
existing scheme, as long as it is unique.
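
As a rough sketch of the syntactic point only: the generic URI syntax
requires a scheme that starts with a letter, continues with letters, digits,
"+", "-" or ".", and ends with ":". The check below tests just that much. It
does not validate the rest of the URI, and the function name and sample
strings are made up for illustration.

```python
import re

# scheme = letter followed by letters, digits, "+", "-" or ".", then ":"
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def has_scheme(name):
    """Return True if name begins with something shaped like a URI scheme."""
    return SCHEME_RE.match(name) is not None


looks_like_urn = has_scheme("urn:x-example:bookmarks")
private_scheme = has_scheme("x-private:any-unique-string")
bare_string = has_scheme("just a unique string")
empty_scheme = has_scheme(":no-scheme-here")
```

A name can pass this test without being retrievable, or even using a known
scheme, which is the distinction being drawn above.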

Regards,

Tom Passin


From josh@shock.pobox.com  Sun Feb 27 22:59:58 2000
From: josh@shock.pobox.com (Josh Marcus)
Date: Sun, 27 Feb 2000 17:59:58 -0500
Subject: [XML-SIG] XML database options
Message-ID: <20000227175958.A22001@shock.pobox.com>

Can I ask a slightly off-topic question?

Recently, I've been implementing an XML server
that is capable of storing XML documents in a
relational database in such a way that the
documents can be quickly queried.
I found an interesting paper that compares
the performance of alternative mapping
schemes ("A Performance Evaluation of
Alternative Mapping Schemes for Storing
XML Data in a Relational Database") and
decided to follow the scheme their experimentation
found most efficient -- with a few changes
to trade storage space for speed. 

I was just wondering:
  
   o  Is there an open source application that
      I can use for this?  
   o  If not, is there conventional wisdom
      regarding how one might go about storing
      and querying XML data (short of buying a
      commercial oo-db)?

Thanks,
--j


From jack@oratrix.nl  Sun Feb 27 23:28:15 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 28 Feb 2000 00:28:15 +0100
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: Message by uche.ogbuji@fourthought.com ,
 Sun, 27 Feb 2000 01:15:57 -0700 , <200002270815.BAA04321@localhost.localdomain>
Message-ID: <20000227232820.25A81D71F2@oratrix.oratrix.nl>

Sjoerd's mods to xmllib (which I don't think are publicly available,
but they might be in the CVS archive) use a single string ns+' '+attr.

This has the advantage of being pretty easy to use: it doesn't matter
much whether you check for an attribute "foo" or an attribute "myns
bar". The only addition you would need would be an optional mapping of 
external namespaces, i.e. there'd have to be a way to specify that if
a certain namespace was used in a document you'd like to see it with a 
specific name in the parser regardless of what is used in the document.
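
A sketch of the single-string scheme and of the optional mapping described
above. This is not the actual xmllib code; the function names, the
preferred-name table and the example namespace URI are all hypothetical.

```python
def make_key(ns, attr):
    """Return 'ns attr' for a namespaced name, or plain 'attr' without one."""
    if ns:
        return ns + " " + attr
    return attr


def normalize(uri, attr, preferred):
    """Replace a namespace URI by the application's preferred short name."""
    return make_key(preferred.get(uri, uri), attr)


# The application says: whatever prefix the document uses for this
# namespace, report it to me as "myns".
preferred = {"http://example.org/my-namespace": "myns"}

plain = normalize("", "foo", preferred)
mapped = normalize("http://example.org/my-namespace", "bar", preferred)
unmapped = normalize("http://example.org/other", "baz", preferred)
```

With such a table the application can test for "myns bar" as easily as for
"foo", independent of the prefixes chosen by the document author.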
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From m.favas@per.dem.csiro.au  Mon Feb 28 03:16:49 2000
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Mon, 28 Feb 2000 11:16:49 +0800
Subject: [XML-SIG] Patches for PyXML-0.5.3 re single arg to list.append
Message-ID: <38B9E8A1.2F27364F@per.dem.csiro.au>

Recently, the CVS version of Python has been changed to flag as an error
usages of list.append() with more than one argument. (Previously,
multiple args were silently converted to a tuple.) The following two
patches fix this type of append() use for the released PyXML-0.5.3
(apologies if already fixed in XML CVS). Both occur in
xml/parsers/xmlproc.
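
For readers who have not met the change, a short sketch of the difference in
a current Python, where the multi-argument form raises TypeError. The
variable names are illustrative only.

```python
cont_list = []

# Correct: append takes exactly one argument, so the pair is passed
# as an explicit tuple.
cont_list.append(("name", ""))

# The old spelling with two arguments is now an error.
try:
    cont_list.append("name", "")
except TypeError as exc:
    error = exc
else:
    error = None
```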

*** dtdparser.py.orig   Mon Feb 28 10:31:17 2000
--- dtdparser.py        Mon Feb 28 10:33:00 2000
***************
*** 598,604 ****
                  self.scan_to(">")
                  
              self.skip_ws()
!             cont_list.append(self.get_match(reg_name),"")
  
          if sep=="|" and not self.now_at("*"):
              self.report_error(3005,"*")
--- 598,604 ----
                  self.scan_to(">")
                  
              self.skip_ws()
!             cont_list.append((self.get_match(reg_name),""))
  
          if sep=="|" and not self.now_at("*"):
              self.report_error(3005,"*")


*** xmlutils.py.orig    Mon Feb 28 10:31:26 2000
--- xmlutils.py Mon Feb 28 10:32:15 2000
***************
*** 406,414 ****
      # --- Internal methods
  
      def _push_ent_stack(self):
!         self.ent_stack.append(self.get_current_sysid(),self.data,self.pos,\
!                               self.line,self.last_break,self.datasize,\
!                               self.last_upd_pos,self.block_offset,self.final)
  
      def _pop_ent_stack(self):
          (self.current_sysID,self.data,self.pos,self.line,self.last_break,\
--- 406,414 ----
      # --- Internal methods
  
      def _push_ent_stack(self):
!         self.ent_stack.append((self.get_current_sysid(),self.data,self.pos,
!                                self.line,self.last_break,self.datasize,
!                                self.last_upd_pos,self.block_offset,self.final))
  
      def _pop_ent_stack(self):
          (self.current_sysID,self.data,self.pos,self.line,self.last_break,\

Cheers,
	Mark


-- 
Email - m.favas@per.dem.csiro.au       Postal - Mark C Favas
Phone - +61 8 9333 6268, 041 892 6074           CSIRO Exploration & Mining
Fax   - +61 8 9333 6121                         Private Bag No 5
                                                Wembley, Western Australia 6913


From fdrake@acm.org  Mon Feb 28 20:33:34 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 Feb 2000 15:33:34 -0500 (EST)
Subject: [XML-SIG] Patches for PyXML-0.5.3 re single arg to list.append
In-Reply-To: <38B9E8A1.2F27364F@per.dem.csiro.au>
References: <38B9E8A1.2F27364F@per.dem.csiro.au>
Message-ID: <14522.56222.262861.886569@weyr.cnri.reston.va.us>

Mark Favas writes:
 > Recently, the CVS version of Python has been changed to flag as an error
 > usages of list.append() with more than one argument. (Previously,
 > multiple args were silently converted to a tuple.) The following two
 > patches fix this type of append() use for the released PyXML-0.5.3

Mark,
  Thanks!  I've just checked this in.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From rehankhwaja@yahoo.com  Mon Feb 28 21:49:38 2000
From: rehankhwaja@yahoo.com (Rehan Khwaja)
Date: Mon, 28 Feb 2000 13:49:38 -0800 (PST)
Subject: [XML-SIG] xslt stylesheet for xbel
Message-ID: <20000228214938.17798.qmail@web114.yahoomail.com>

i've made an xslt stylesheet for transforming an xbel
document into a collapsing/expanding tree.

the dhtml for the collapsing/expanding stuff works in
Internet Explorer, at least.

is anybody interested in this?  i'd like to post it
somewhere if possible.

thanks,
rehan khwaja
rehankhwaja@yahoo.com


From fdrake@acm.org  Mon Feb 28 21:57:14 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 Feb 2000 16:57:14 -0500 (EST)
Subject: [XML-SIG] xslt stylesheet for xbel
In-Reply-To: <20000228214938.17798.qmail@web114.yahoomail.com>
References: <20000228214938.17798.qmail@web114.yahoomail.com>
Message-ID: <14522.61242.974334.713338@weyr.cnri.reston.va.us>

Rehan Khwaja writes:
 > i've made an xslt stylesheet for transforming an xbel
 > document into a collapsing/expanding tree.

  Cool!  I played with one a while back for display, but was just
learning XSL (there wasn't a "T" back then!) and wasn't very pleased
with the result.

 > is anybody interested in this?  i'd like to post it
 > somewhere if possible.

  I'd love to see it.  I can add it to the xbel directory in the PyXML 
package if you think others will be interested.


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From gstein@lyra.org  Mon Feb 28 23:13:48 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 28 Feb 2000 15:13:48 -0800 (PST)
Subject: [XML-SIG] URI schemes (was: SAX 2.0, again)
In-Reply-To: <001901bf8146$f04041a0$b92a08d1@idsonline.com>
Message-ID: <Pine.LNX.4.10.10002281512360.10607-100000@nebula.lyra.org>

On Sun, 27 Feb 2000, THOMAS PASSIN wrote:
> <uche.ogbuji@fourthought.com> wrote
> 
> <snip qty="most"/>
> 
> > > BTW, "uri" doesn't actually need to be a uri, any unique string will do.
> >
> > Actually, it does have to be a URI or it is in contradiction of the spec
> > (although they didn't go the natural step to make URI conformance a formal
> > namespace constraint, they do have pretty conclusive wording to that effect
> > in section 1).
> >
> Actually I mis-spoke slightly.  I really meant it doesn't have to look like
> a regular ***URL***.  I was thinking that the "scheme" of a URI could be
> blank, but checking the RFC I see it has to have at least one letter plus
> the ":".  The rest of it can just be a string (modulo using legal
> characters, etc.).  The namespace spec specifically says
> "It is not a goal that it be directly usable for retrieval of a schema (if
> any exists). "  So it doesn't have to be any existing URL or even an
> existing scheme, as long as it is unique.

Minor nit:
   For it to be called a URI, the scheme must be registered with the IANA.

If you just willy-nilly use arbitrary, unregistered schemes, then you *do*
run the chance that it is not unique.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From rehankhwaja@yahoo.com  Tue Feb 29 00:26:35 2000
From: rehankhwaja@yahoo.com (Rehan Khwaja)
Date: Mon, 28 Feb 2000 16:26:35 -0800 (PST)
Subject: [XML-SIG] xslt stylesheet for xbel
Message-ID: <20000229002635.23704.qmail@web111.yahoomail.com>

--0-596516649-951783995=:21007
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

ok - here it is, with a few of my bookmarks that i
tested it with.

i know that the dhtml it produces doesn't work in
navigator :(  i'd like to know of other bugs,
otherwise enjoy.


>  > is anybody interested in this?  i'd like to post
> it
>  > somewhere if possible.
> 
>   I'd love to see it.  I can add it to the xbel
> directory in the PyXML 
> package if you think others will be interested.
> 

that sounds great :o)

cheers,
rehan
--0-596516649-951783995=:21007
Content-Type: application/x-zip-compressed; name="xbel-xsl.zip"
Content-Transfer-Encoding: base64
Content-Description: xbel-xsl.zip
Content-Disposition: attachment; filename="xbel-xsl.zip"

[Base64-encoded contents of xbel-xsl.zip omitted]

--0-596516649-951783995=:21007--


From larsga@garshol.priv.no  Tue Feb 29 07:21:44 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Feb 2000 08:21:44 +0100
Subject: [XML-SIG] SAX 2.0 names
Message-ID: <m37lfowk1j.fsf@lambda.garshol.priv.no>

I've done some more thinking about this now, and this is the result:

Element and attribute type names in XML have the following properties:

 - a namespace URI
 - a local name
 - a raw name (SAX 2.0b2 is now reporting this instead of the prefix)


The essential operations on these names are:

 - comparison
 - indexing (that is, using them as keys in a dictionary)
 - decomposition (which includes partial comparison, where you check
   only the namespace or local name of the name)


After the discussions we've had so far, these are the best
alternatives for representations I can think of:

 - as objects (with __cmp__, __hash__, get_uri, get_local_name and
   get_rawname methods)

   - requires a bit of machinery in drivers to be effective
   - all operations will be slow
   - a natural way to model this

 - as strings (of the form 'uri localname', with the rawname in a
   separate parameter)

   - comparison and indexing will be fast, especially with interned
     names 
   - decomposition will be slow and awkward
   - feels kind of like a hack

 - as tuples (of the form ('uri', 'localname'), with the rawname in a
   separate parameter)

   - all operations are convenient
   - comparison and indexing may not be as fast as with strings
   - a natural way to model this
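
The three candidates can be put side by side in a few lines. This is a
sketch in current Python (so __eq__ stands in for __cmp__), and the class
name Name and the sample namespace are assumptions for illustration, not a
proposed API.

```python
class Name:
    """Object form: compares and hashes on (uri, localname) only."""

    def __init__(self, uri, localname, rawname):
        self.uri, self.localname, self.rawname = uri, localname, rawname

    def __eq__(self, other):
        return (isinstance(other, Name)
                and (self.uri, self.localname) == (other.uri, other.localname))

    def __hash__(self):
        return hash((self.uri, self.localname))


URI = "http://www.w3.org/1999/XSL/Transform"

as_object = Name(URI, "template", "xsl:template")
as_string = URI + " " + "template"      # 'uri localname'
as_tuple = (URI, "template")            # ('uri', 'localname')

# Indexing: all three forms work as dictionary keys.
handlers = {as_object: "object", as_string: "string", as_tuple: "tuple"}

# Decomposition: a split for the string, plain unpacking for the tuple,
# attribute access for the object.
uri_from_string, local_from_string = as_string.split(" ", 1)
uri_from_tuple, local_from_tuple = as_tuple
uri_from_object, local_from_object = as_object.uri, as_object.localname
```

Note that the object form ignores the raw name when comparing, so a name
written with a different prefix still finds the same dictionary entry.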


I have to go to a meeting before too long, but I'll try to make two
benchmarks to compare the performance of the different representations.

--Lars M.


From larsga@garshol.priv.no  Tue Feb 29 07:23:36 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Feb 2000 08:23:36 +0100
Subject: [XML-SIG] SAX 2.0, again
In-Reply-To: <200002270815.BAA04321@localhost.localdomain>
References: <200002270815.BAA04321@localhost.localdomain>
Message-ID: <m366v8wjyf.fsf@lambda.garshol.priv.no>

* THOMAS PASSIN
|
| Also, if you had a document containing several prefixes for the same
| namespace, you could easily use the localpart and uri, rather than
| the prefix.

* uche ogbuji
| 
| The prefix shouldn't be used except for convenient uniformity from
| input to output, and for the few W3C-sanctioned cases such as XPath
| name tests.

Agreed. The prefix is just a lexical detail, only useful for
roundtripping. 

--Lars M.


From tpassin@idsonline.com  Tue Feb 29 12:57:54 2000
From: tpassin@idsonline.com (THOMAS PASSIN)
Date: Tue, 29 Feb 2000 07:57:54 -0500
Subject: [XML-SIG] SAX 2.0, again
References: <200002270815.BAA04321@localhost.localdomain> <m366v8wjyf.fsf@lambda.garshol.priv.no>
Message-ID: <001901bf82b4$98c2f160$3415b0cf@idsonline.com>

Lars Marius Garshol > * THOMAS PASSIN
> |
> | Also, if you had a document containing several prefixes for the same
> | namespace, you could easily use the localpart and uri, rather than
> | the prefix.
> 
> * uche ogbuji
> | 
> | The prefix shouldn't be used except for convenient uniformity from
> | input to output, and for the few W3C-sanctioned cases such as XPath
> | name tests.
> 
> Agreed. The prefix is just a lexical detail, only useful for
> roundtripping. 
> 
Agreed here, too.

Tom Passin


From fdrake@acm.org  Tue Feb 29 15:31:05 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 29 Feb 2000 10:31:05 -0500 (EST)
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To: <m37lfowk1j.fsf@lambda.garshol.priv.no>
References: <m37lfowk1j.fsf@lambda.garshol.priv.no>
Message-ID: <14523.58937.766784.166817@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
 >  - as objects (with __cmp__, __hash__, get_uri, get_local_name and
 >    get_rawname methods)
 > 
 >    - requires a bit of machinery in drivers to be effective
 >    - all operations will be slow
 >    - a natural way to model this

  If the objects are implemented as a C/Java extension type, it should
be plenty fast.  A 100% Pure Python implementation can be a fallback
if the extension isn't available.

 >  - as strings (of the form 'uri localname', with the rawname in a
 >    separate parameter)
 > 
 >    - comparison and indexing will be fast, especially with interned
 >      names 
 >    - decomposition will be slow and awkward
 >    - feels kind of like a hack

  Very much.

 >  - as tuples (of the form ('uri', 'localname'), with the rawname in a
 >    separate parameter)
 > 
 >    - all operations are convenient
 >    - comparison and indexing may not be as fast as with strings
 >    - a natural way to model this

  And the convenient tuple unpacking could also be provided using the
object approach; the objects can easily implement the sequence
protocol.
  I'd be willing to write a C implementation of the object version if
that's the API we decide on, but I'd also be fine with the third
option.
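
A pure-Python sketch of what "objects implementing the sequence protocol"
could look like, written for a current Python. The class below is only an
illustration of the idea and not the C type offered above; the attribute
names and the example namespace are assumptions.

```python
class SAXName:
    """Name object that also behaves like a (uri, localname) pair."""

    def __init__(self, uri, localname, rawname=None):
        self._parts = (uri, localname)
        self.rawname = rawname

    # Sequence protocol: enough for len(), indexing and tuple unpacking.
    def __len__(self):
        return 2

    def __getitem__(self, index):
        return self._parts[index]

    def __eq__(self, other):
        if isinstance(other, (SAXName, tuple)):
            return self._parts == tuple(other)
        return NotImplemented

    def __hash__(self):
        return hash(self._parts)


name = SAXName("http://example.org/ns", "item", "ex:item")
uri, localname = name          # unpacks like the tuple representation
table = {name: "handler"}
```

Because the object hashes and compares like the underlying pair, code
written against the tuple form keeps working, while the raw name stays
available as an extra attribute.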


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From larsga@garshol.priv.no  Tue Feb 29 16:21:06 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 29 Feb 2000 17:21:06 +0100
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To: <14523.58937.766784.166817@weyr.cnri.reston.va.us>
References: <m37lfowk1j.fsf@lambda.garshol.priv.no> <14523.58937.766784.166817@weyr.cnri.reston.va.us>
Message-ID: <m3r9dwq8st.fsf@lambda.garshol.priv.no>

* Lars Marius Garshol
|
| - as objects (with __cmp__, __hash__, get_uri, get_local_name and
|   get_rawname methods)
| 
|   - requires a bit of machinery in drivers to be effective
|   - all operations will be slow
|   - a natural way to model this

* Fred L. Drake, Jr.
| 
| If the objects are implemented as a C/Java extension type, it should
| be plenty fast.  A 100% Pure Python implementation can be a fallback
| if the extension isn't available.

Hmmm.  That might be the way to go.  I still wonder about the speed,
though. 
 
| And the convenient tuple unpacking could also be provided using the
| object approach; the objects can easily implement the sequence
| protocol.

Good idea. This makes objects even more attractive.

| I'd be willing to write a C implementation of the object version if
| that's the API we decide on, but I'd also be fine with the third
| option.

Hmmm.  Let's chew on this a little more and hear some more opinions
before deciding. 

I did the benchmark I spoke of, and the results indicate that the
performance differences are very small between strings and tuples.
Also, how you put together the strings influences the speed a
bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium
II with plenty of RAM and MHz.


[larsga@pc-larsga python]$ python sax2bench.py 
Pure parsing time: 28.73

---Generic:
__main__.NamespaceFilterString  30.25
__main__.NamespaceFilterInternedString  30.85
 __main__.NamespaceFilterTuple  30.15

---Specific:
__main__.NamespaceFilterString  30.71
__main__.NamespaceFilterInternedString  30.7
 __main__.NamespaceFilterTuple  29.67


# A simple benchmark of various ways to represent namespace-names and
# how this affects performance.
    
# ==================== NAMESPACEFILTER

# This is xmlproc's normal namespace filter, but modified to use
# different name representations

import string
from xml.parsers.xmlproc import xmlapp

# --- Name objects

class SAXName:

    def __init__(self, uri, localname, rawname):
        self.__uri = uri
        self.__localname = localname
        self.__rawname = rawname
        self.__hash = hash(uri) + hash(localname)

    def get_uri(self):
        return self.__uri

    def get_localname(self):
        return self.__localname

    def get_rawname(self):
        return self.__rawname

    def __cmp__(self, other): # NB! Does not sort properly
        # __cmp__ must return 0 for equal objects and non-zero otherwise
        if self.__hash == hash(other) and isinstance(other, SAXName) and \
           self.__uri == other.get_uri() and \
           self.__localname == other.get_localname():
            return 0
        else:
            return 1

    def __hash__(self):
        return self.__hash

# --- ParserFilter

class ParserFilter(xmlapp.Application):
    "A generic parser filter class."

    def __init__(self):
        xmlapp.Application.__init__(self)
        self.app=xmlapp.Application()

    def set_application(self,app):
        "Sets the application to report events to."
        self.app=app
        
    # --- Methods inherited from xmlapp.Application
        
    def set_locator(self,locator):
        xmlapp.Application.set_locator(self,locator)
        self.app.set_locator(locator)
    
    def doc_start(self):
        self.app.doc_start()
        
    def doc_end(self):
        self.app.doc_end()
	
    def handle_comment(self,data):
        self.app.handle_comment(data)

    def handle_start_tag(self,name,attrs):
        self.app.handle_start_tag(name,attrs)

    def handle_end_tag(self,name):
        self.app.handle_end_tag(name)
    
    def handle_data(self,data,start,end):
        self.app.handle_data(data,start,end)

    def handle_ignorable_data(self,data,start,end):
        self.app.handle_ignorable_data(data,start,end)
    
    def handle_pi(self,target,data):
        self.app.handle_pi(target,data)

    def handle_doctype(self,root,pubID,sysID):
        self.app.handle_doctype(root,pubID,sysID)
    
    def set_entity_info(self,xmlver,enc,sddecl):
        self.app.set_entity_info(xmlver,enc,sddecl)

# --- NamespaceFilter
        
class NamespaceFilterGeneric(ParserFilter):
    """An xmlproc application that processes qualified names and reports them
    as 'URI local-part' names. It reports errors through the error reporting
    mechanisms of the parser."""   

    def __init__(self,parser):
        ParserFilter.__init__(self)
        self.ns_map={}       # Current prefix -> URI map
        self.ns_stack=[]     # Pushed for each element, used to maintain ns_map
        self.rep_ns_attrs=0  # Report xmlns-attributes?
        self.parser=parser

    def set_report_ns_attributes(self,action):
        "Tells the filter whether to report or delete xmlns-attributes."
        self.rep_ns_attrs=action
        
    # --- Overridden event methods
        
    def handle_start_tag(self,name,attrs):
        old_ns={} # Reset ns_map to these values when we leave this element
        del_ns=[] # Delete these prefixes from ns_map when we leave element

        # attrs=attrs.copy()   Will have to do this if more filters are made

        # Find declarations, update self.ns_map and self.ns_stack
        for (a,v) in attrs.items():
            if a[:6]=="xmlns:":
                prefix=a[6:]
                if string.find(prefix,":")!=-1:
                    self.parser.report_error(1900)

                if v=="":
                    self.parser.report_error(1901)
            elif a=="xmlns":
                prefix=""
            else:
                continue

            if self.ns_map.has_key(prefix):
                old_ns[prefix]=self.ns_map[prefix]
            else:
                del_ns.append(prefix)

            if prefix=="" and v=="":
                del self.ns_map[prefix]
            else:
                self.ns_map[prefix]=v

            if not self.rep_ns_attrs:
                del attrs[a]

        self.ns_stack.append((old_ns,del_ns))
        
        # Process elem and attr names
        name=self._process_name(name)
        for (a,v) in attrs.items():
            del attrs[a]
            attrs[self._process_name(a)]=v
        
        # Report event
        self.app.handle_start_tag(name,attrs)

    def handle_end_tag(self,name):
        name=self._process_name(name)

        # Clean up self.ns_map and self.ns_stack
        (old_ns,del_ns)=self.ns_stack[-1]
        del self.ns_stack[-1]

        self.ns_map.update(old_ns)
        for prefix in del_ns:
            del self.ns_map[prefix]        
            
        self.app.handle_end_tag(name)

class NamespaceFilterString(NamespaceFilterGeneric):
    
    def _process_name(self,name):
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name 
                
            try:
                #return string.join([self.ns_map[n[0]],n[1]]) (slowest)
                #return "%s %s" % (self.ns_map[n[0]],n[1])    (slower)
                return self.ns_map[n[0]] + " " + n[1]
            except KeyError:
                self.parser.report_error(1902)
                return name
        elif self.ns_map.has_key("") and name!="xmlns":
            return "%s %s" % (self.ns_map[""],name)
        else:
            return name

class NamespaceFilterInternedString(NamespaceFilterGeneric):
    
    def _process_name(self,name):
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name 
                
            try:
                #return intern(string.join([self.ns_map[n[0]],n[1]])) (slowest)
                #return intern("%s %s" % (self.ns_map[n[0]],n[1]))  (slower)
                return intern(self.ns_map[n[0]] + " " + n[1])
            except KeyError:
                self.parser.report_error(1902)
                return name
        elif self.ns_map.has_key("") and name!="xmlns":
            return intern("%s %s" % (self.ns_map[""],name))
        else:
            return name

class NamespaceFilterTuple(NamespaceFilterGeneric):

    def _process_name(self,name):
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name 
                
            try:
                return (self.ns_map[n[0]], n[1])
            except KeyError:
                self.parser.report_error(1902)
                return (None, name)
        elif self.ns_map.has_key("") and name!="xmlns":
            return (self.ns_map[""], name)
        else:
            return (None, name)

class NamespaceFilterObject(NamespaceFilterGeneric):
    
    def __init__(self, parser):
        NamespaceFilterGeneric.__init__(self, parser)
        self.__objs = {}
    
    def _process_name(self,name): # FIXME: implement!
        n=string.split(name,":")
        if len(n)>2:
            self.parser.report_error(1900)
            return name
        elif len(n)==2:
            if n[0]=="xmlns":
                return name 
                
            try:
                return (self.ns_map[n[0]], n[1])
            except KeyError:
                self.parser.report_error(1902)
                return (None, name)
        elif self.ns_map.has_key("") and name!="xmlns":
            return (self.ns_map[""], name)
        else:
            return name
        
# ==================== GENERIC BENCHMARK

class GenericStats(xmlapp.Application):

    def __init__(self):
        self.__elemtypes = {}
        self.__attrtypes = {}

    def handle_start_tag(self, name, attrs):
        try:
            self.__elemtypes[name] = self.__elemtypes[name] + 1
        except KeyError:
            self.__elemtypes[name] = 1

        for (attr, value) in attrs.items():
            try:
                self.__attrtypes[attr] = self.__attrtypes[attr] + 1
            except KeyError:
                self.__attrtypes[attr] = 1

# ==================== SPECIFIC BENCHMARK

apt_airport = intern("http://www.megginson.com/exp/ns/airports# Airport")
apt_latitude = intern("http://www.megginson.com/exp/ns/airports# latitude")

apt_uri = "http://www.megginson.com/exp/ns/airports#"
apt_len = len(apt_uri)

rdf_uri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf_len = len(rdf_uri)

apt_airport2 = ("http://www.megginson.com/exp/ns/airports#", intern("Airport"))
apt_latitude2 = ("http://www.megginson.com/exp/ns/airports#", intern("latitude"))

class SpecificStatsString(xmlapp.Application):

    def __init__(self):
        self.__airports = 0
        self.__with_coords = 0
        self.__apt_elems = 0
        self.__rdf_elems = 0

    def handle_start_tag(self, name, attrs):
        if name == apt_airport:
            self.__airports = self.__airports + 1

        elif name == apt_latitude:
            self.__with_coords = self.__with_coords + 1

        if name[:apt_len] == apt_uri:
            self.__apt_elems = self.__apt_elems + 1

        elif name[:rdf_len] == rdf_uri:
            self.__rdf_elems = self.__rdf_elems + 1

class SpecificStatsTuple(xmlapp.Application):

    def __init__(self):
        self.__airports = 0
        self.__with_coords = 0
        self.__apt_elems = 0
        self.__rdf_elems = 0

    def handle_start_tag(self, name, attrs):
        if name == apt_airport2:
            self.__airports = self.__airports + 1

        elif name == apt_latitude2:
            self.__with_coords = self.__with_coords + 1

        if name[0] == apt_uri:
            self.__apt_elems = self.__apt_elems + 1

        elif name[0] == rdf_uri:
            self.__rdf_elems = self.__rdf_elems + 1
            
# ==================== MAIN PROGRAM

from xml.parsers.xmlproc import xmlproc
import time

p = xmlproc.XMLProcessor()
start = time.clock()
p.set_application(NamespaceFilterTuple(p))
p.parse_resource("airports.rdf")
used = time.clock() - start

print "Pure parsing time:", used

print
print "---Generic:"
for filter in [NamespaceFilterString, NamespaceFilterInternedString,
               NamespaceFilterTuple]:
    p = xmlproc.XMLProcessor()
    nsfilter = filter(p)
    nsfilter.set_application(GenericStats())
    p.set_application(nsfilter)

    start = time.clock()
    p.parse_resource("airports.rdf")
    used = time.clock() - start

    print "%30s\t%s" % (filter, used)

print
print "---Specific:"
for (Filter, App) in [(NamespaceFilterString, SpecificStatsString),
                      (NamespaceFilterInternedString, SpecificStatsString),
                      (NamespaceFilterTuple, SpecificStatsTuple)]:
    p = xmlproc.XMLProcessor()
    nsfilter = Filter(p)
    nsfilter.set_application(App())
    p.set_application(nsfilter)

    start = time.clock()
    p.parse_resource("airports.rdf")
    used = time.clock() - start

    print "%30s\t%s" % (Filter, used)


#--Lars M.


From fdrake@acm.org  Tue Feb 29 16:31:35 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 29 Feb 2000 11:31:35 -0500 (EST)
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To: <m3r9dwq8st.fsf@lambda.garshol.priv.no>
References: <m37lfowk1j.fsf@lambda.garshol.priv.no>
 <14523.58937.766784.166817@weyr.cnri.reston.va.us>
 <m3r9dwq8st.fsf@lambda.garshol.priv.no>
Message-ID: <14523.62567.356663.65659@weyr.cnri.reston.va.us>

Lars Marius Garshol writes:
 > Hmmm.  That might be the way to go.  I still wonder about the speed,
 > though. 

  If the C extension is actually available, it should be about the
same as building a tuple; perhaps a *little* faster, but the difference
would come out in the wash.

 > Hmmm.  Let's chew on this a little more and hear some more opinions
 > before deciding. 

  Agreed; I won't have time to write a bunch of new C code for a
couple of weeks anyway.

 > I did the benchmark I spoke of, and the results indicate that the
 > performance differences are very small between strings and tuples.
 > Also, how you put together the strings influences the speed a
 > bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium
 > II with plenty of RAM and MHz.

  Looks good!  As for string construction, "%s %s" % (uri, localpart)
requires 1 malloc() more for the new string than just creating the
tuple, and uri + " " + localpart would require the same number of
malloc() calls, but slightly more data copying when uri isn't "".
Very close, but both require the extra malloc() compared to just using 
a tuple.
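
To make the three constructions concrete, here is a minimal sketch that
builds the same name each way and times it.  It is written for current
Python 3 rather than the 1.5.2 used in this thread, and the function
names are made up for the example; the timings it prints say nothing
about malloc() counts in any particular interpreter.

```python
import timeit

uri = "http://www.megginson.com/exp/ns/airports#"
localpart = "Airport"

def as_format():
    # String built with the "%s %s" formatting operator
    return "%s %s" % (uri, localpart)

def as_concat():
    # String built with the + operator
    return uri + " " + localpart

def as_tuple():
    # Tuple that simply refers to the two existing strings
    return (uri, localpart)

if __name__ == "__main__":
    # Time only the construction, nothing else
    for fn in (as_format, as_concat, as_tuple):
        seconds = timeit.timeit(fn, number=200000)
        print("%-10s %.4f s for 200000 calls" % (fn.__name__, seconds))
```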


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From tpassin@idsonline.com  Tue Feb 29 20:44:09 2000
From: tpassin@idsonline.com (THOMAS PASSIN)
Date: Tue, 29 Feb 2000 15:44:09 -0500
Subject: [XML-SIG] SAX 2.0 names
References: <m37lfowk1j.fsf@lambda.garshol.priv.no><14523.58937.766784.166817@weyr.cnri.reston.va.us><m3r9dwq8st.fsf@lambda.garshol.priv.no> <14523.62567.356663.65659@weyr.cnri.reston.va.us>
Message-ID: <001801bf82f5$bba340e0$4d15b0cf@idsonline.com>

Fred L. Drake, Jr. wrote
>
> Lars Marius Garshol writes:
>  > Hmmm.  That might be the way to go.  I still wonder about the speed,
>  > though.
>
>   If the C extension is actually available, it should be about the
> same as building a tuple; perhaps a *little* faster, but the difference
> would come out in the wash.
>
>  > Hmmm.  Let's chew on this a little more and hear some more opinions
>  > before deciding.
>
>   Agreed; I won't have time to write a bunch of new C code for a
> couple of weeks anyway.
>
>  > I did the benchmark I spoke of, and the results indicate that the
>  > performance differences are very small between strings and tuples.
>  > Also, how you put together the strings influences the speed a
>  > bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium
>  > II with plenty of RAM and MHz.
>
>   Looks good!  As for string construction, "%s %s" % (uri, localpart)
> requires 1 malloc() more for the new string than just creating the
> tuple, and uri + " " + localpart would require the same number of
> malloc() calls, but slightly more data copying when uri isn't "".
> Very close, but both require the extra malloc() compared to just using
> a tuple.
>
In earlier posts I suggested tuples.  Fred and Lars' posts seem to be saying
that tuples shouldn't cause a big performance hit, and that could possibly
be finessed anyway.  Have I summarized what you have said correctly, Fred
and Lars?

Then I think we should go with tuples, because
1) They are easy for a non-expert Python programmer to understand and work
with,
2) they capitalize on a Python strength (nice data structures),
3) an expert can make them perform even better with extension modules, and
4) as Fred said, if the extension module were not available one could fall
back to a 100% Python implementation with practically no changes to existing
code.
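
Point 4 can be sketched with the usual try/except import idiom.  This is
only an illustration in current Python 3: the module name "_saxname" and
the function make_name are hypothetical, not part of any existing
package.

```python
try:
    # Hypothetical C extension; the module name "_saxname" is an assumption
    from _saxname import make_name
except ImportError:
    def make_name(uri, localpart):
        """Pure-Python fallback: a name is simply a (uri, localpart) tuple."""
        return (uri, localpart)

# Calling code is the same whichever implementation was imported
uri, localpart = make_name("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")
```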

Regards,
Tom P


From fdrake@acm.org  Tue Feb 29 21:17:31 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 29 Feb 2000 16:17:31 -0500 (EST)
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To: <001801bf82f5$bba340e0$4d15b0cf@idsonline.com>
References: <m37lfowk1j.fsf@lambda.garshol.priv.no>
 <14523.58937.766784.166817@weyr.cnri.reston.va.us>
 <m3r9dwq8st.fsf@lambda.garshol.priv.no>
 <14523.62567.356663.65659@weyr.cnri.reston.va.us>
 <001801bf82f5$bba340e0$4d15b0cf@idsonline.com>
Message-ID: <14524.14187.795572.189167@weyr.cnri.reston.va.us>

THOMAS PASSIN writes:
 > In earlier posts I suggested tuples.  Fred and Lars' posts seem to be saying
 > that tuples shouldn't cause a big performance hit, and that could possibly
 > be finessed anyway.  Have I summarized what you have said correctly, Fred
 > and Lars?

  That's my interpretation.

 > Then I think we should go with tuples, because
 > 1) They are easy for a non-expert Python programmer to understand and work
 > with,
 > 2) they capitalize on a Python strength (nice data structures),
 > 3) an expert can make them perform even better with extension modules, and
 > 4) as Fred said, if the extension module were not available one could fall
 > back to a 100% Python implementation with practically no changes to existing
 > code.

  The "object" I imagine has three attributes: namespace URI,
localpart, and prefix.  It would unpack to two values: URI &
localpart, and comparisons would only operate on those two as well.
  The advantage is that we get the prefix for those who want it,
single object comparisons, and no extraneous parameters to the call.
I don't think *this* is available using the non-object approaches.
Whether the objects are extension types or classes is irrelevant to
this.
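
A minimal sketch of such an object, written for current Python 3 rather
than 1.5.2.  The class name QName and its attribute names are
assumptions made for the example, not a proposed API.

```python
class QName:
    """Hypothetical name object with URI, localpart, and prefix."""

    __slots__ = ("uri", "localpart", "prefix")

    def __init__(self, uri, localpart, prefix=None):
        self.uri = uri
        self.localpart = localpart
        self.prefix = prefix

    # Sequence protocol: the object behaves like the pair (uri, localpart)
    def __len__(self):
        return 2

    def __getitem__(self, index):
        return (self.uri, self.localpart)[index]

    # Comparison and hashing use the URI and localpart only, never the prefix
    def __eq__(self, other):
        if isinstance(other, (QName, tuple)):
            return (self.uri, self.localpart) == tuple(other)
        return NotImplemented

    def __hash__(self):
        return hash((self.uri, self.localpart))


name = QName("http://www.megginson.com/exp/ns/airports#", "Airport", prefix="apt")
uri, localpart = name   # unpacks to two values; the prefix stays on the object
```

Because the hash matches that of the plain tuple, a dictionary keyed on
these objects can also be looked up with a (uri, localpart) tuple.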


  -Fred

--
Fred L. Drake, Jr.	  <fdrake at acm.org>
Corporation for National Research Initiatives


From gstein@lyra.org  Tue Feb 29 21:34:47 2000
From: gstein@lyra.org (Greg Stein)
Date: Tue, 29 Feb 2000 13:34:47 -0800 (PST)
Subject: [XML-SIG] SAX 2.0 names
In-Reply-To: <m3r9dwq8st.fsf@lambda.garshol.priv.no>
Message-ID: <Pine.LNX.4.10.10002291329380.10607-100000@nebula.lyra.org>

I'm all for using tuples. If somebody wants extended capabilities through
the use of objects, then they can use them on top of tuples. If you start
with objects, then you've set a minimum. As Thomas said, using tuples is
simple, clean, and Pythonic.

KISS

On 29 Feb 2000, Lars Marius Garshol wrote:
>...
> I did the benchmark I spoke of, and the results indicate that the
> performance differences are very small between strings and tuples.
> Also, how you put together the strings influences the speed a
> bit. Benchmark run with Python 1.5.2 on Debian GNU/Linux on a Pentium
> II with plenty of RAM and MHz.
> 
> [larsga@pc-larsga python]$ python sax2bench.py 
> Pure parsing time: 28.73
> 
> ---Generic:
> __main__.NamespaceFilterString  30.25
> __main__.NamespaceFilterInternedString  30.85
>  __main__.NamespaceFilterTuple  30.15
> 
> ---Specific:
> __main__.NamespaceFilterString  30.71
> __main__.NamespaceFilterInternedString  30.7
>  __main__.NamespaceFilterTuple  29.67

The reason they seem small is because the "benchmark" is bogus. You have a
HUGE constant factor. Just look at the thing: hundreds of lines. Classes
here and there, function calls over that way, etc.

If you want to truly benchmark the varieties, then initialize a number of
sample objects and time their *usage*. Alternatively, you can time their
*construction* from some fake data.

As it is, your test has *way* too much noise in it to provide adequate
information about the performance of the alternatives.
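
A minimal sketch of the kind of isolated test meant here, in current
Python 3 rather than 1.5.2.  The sample names are built once, up front,
and only their *usage* as dictionary keys is timed; the helper name
count_names and the sample data are made up for the example.

```python
import timeit

URI = "http://www.megginson.com/exp/ns/airports#"
LOCALS = ["Airport", "latitude", "longitude", "name"]

# Sample names are initialized once, outside the timed code
string_names = [URI + " " + local for local in LOCALS]
tuple_names = [(URI, local) for local in LOCALS]

def count_names(names, repeat=1000):
    """Usage test: tally each name in a dictionary."""
    counts = {}
    for _ in range(repeat):
        for name in names:
            counts[name] = counts.get(name, 0) + 1
    return counts

if __name__ == "__main__":
    for label, names in (("string", string_names), ("tuple", tuple_names)):
        seconds = timeit.timeit(lambda: count_names(names), number=100)
        print("%-8s %.4f s" % (label, seconds))
```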


And besides... performance isn't everything. The use of tuples is clean
and straightforward. That counts for quite a lot. The fact that it
appears they are more efficient (based on your rough test) is just another
wonderful boon for them.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From ken@bitsko.slc.ut.us  Fri Feb 18 23:04:27 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 18 Feb 2000 17:04:27 -0600
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: Paul Prescod's message of Fri, 18 Feb 2000 12:37:38 -0800
References: <38AD6B1B.C57AD16B@prescod.net> 		<14509.38382.968129.719917@amarok.cnri.reston.va.us> 		<x57lg2p94h.fsf@bitsko.slc.ut.us> <14509.51568.114533.714616@weyr.cnri.reston.va.us> <38ADAD92.BE9948FB@prescod.net>
Message-ID: <x5snyqnmdg.fsf@bitsko.slc.ut.us>

Paul Prescod <paul@prescod.net> writes:

> "Fred L. Drake, Jr." wrote:
> > 
> >   I like this!  This also requires proxies to work cleanly, as far as
> > I can tell.
> 
> Insofar as this is minidom and provides minimal support for moving
> things around, cloning them and so forth, I wouldn't put in proxies
> just to get object reuse. In the full PyDOM they would be more
> appropriate.

AFAIK, "real" DOM doesn't support object reuse anyway, so compliant
DOM code wouldn't need proxies for that reason.  I was thinking more
of the general case, which a mini-dom could optionally support
where a full DOM really wouldn't or shouldn't according to spec.

So, I still like proxies for data (especially grove-like data), but if
1.6 doesn't need 'em for DOM, I'm OK.

  -- Ken


From ken@bitsko.slc.ut.us  Fri Feb 18 21:11:38 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 18 Feb 2000 15:11:38 -0600
Subject: [XML-SIG] DOM and Proxies
In-Reply-To: Paul Prescod's message of Fri, 18 Feb 2000 07:34:03 -0800
References: <38AD666B.E5C3E20C@prescod.net>
Message-ID: <x5zosynrlh.fsf@bitsko.slc.ut.us>

Paul Prescod <paul@prescod.net> writes:

> Proxied objects have "families". All objects in a family live for
> the same length of time. Families are expected to be completely
> internally linked. There is one proxy "family" for every
> LemmingLeader (created through an explicit call to the proxy method)
> (e.g. one per DOM).

I kinda got lost in this.  What's the need for the "family" and
LemmingLeader?  In my usages of proxies, a reference to the root node
is almost always kept somewhere for the life of the tree, so the tree
never gets collected until that reference is released.  All of the
proxies that are generated from the tree are usually just temporary.
Even if the root proxy is created implicitly (not by the user), it
still holds the only reference to the root of the tree; when the
root proxy is no longer referenced, the tree goes away.

This may be what you said, but it sounded like quite a bit more was
being described.

  -- Ken


From ken@bitsko.slc.ut.us  Tue Feb 29 19:50:56 2000
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 29 Feb 2000 13:50:56 -0600
Subject: C extension (was Re: [XML-SIG] SAX 2.0 names)
In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 29 Feb 2000 11:31:35 -0500 (EST)"
References: <m37lfowk1j.fsf@lambda.garshol.priv.no> <14523.58937.766784.166817@weyr.cnri.reston.va.us> <m3r9dwq8st.fsf@lambda.garshol.priv.no> <14523.62567.356663.65659@weyr.cnri.reston.va.us>
Message-ID: <x5og8ziy8v.fsf_-_@bitsko.slc.ut.us>

"Fred L. Drake, Jr." <fdrake@acm.org> writes:

> Lars Marius Garshol writes:
>  > Hmmm.  That might be the way to go.  I still wonder about the
>  > speed, though.
> 
>   If the C extension is actually available, it should be about the
> same as building a tuple; perhaps a *little* faster, but the
> difference would come out in the wash.

Speaking of C extensions, I've started some work on a C library
similar to what was discussed here a few months back: the ability to
capture attribute (property) access in an efficient way and support
generated values, parent proxies, and inherited properties (SVG comes
to mind).  The core is very grove-like and the implementation is
strongly influenced by Objective-C and Python, even though I started
it to implement solutions for my Perl modules.  Even though the core
is grove-like, I definitely want to be able to support a DOM layer
over it for those who prefer DOM.
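
To give a feel for what capturing attribute access buys, here is a
minimal pure-Python sketch (current Python 3, not the C library itself).
The class name GroveNode and the property names are made up for the
example; it shows an inherited property looked up through the parent
chain and a generated value computed on access.

```python
class GroveNode:
    """Hypothetical node with inherited and generated properties."""

    def __init__(self, parent=None, **props):
        self.parent = parent
        self.props = props

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        if name.startswith("_") or name in ("parent", "props"):
            raise AttributeError(name)
        # Inherited property: walk up the parent chain until a value is found
        node = self
        while node is not None:
            if name in node.props:
                return node.props[name]
            node = node.parent
        raise AttributeError(name)

    @property
    def depth(self):
        # Generated value: computed on each access, never stored
        return 0 if self.parent is None else self.parent.depth + 1


svg = GroveNode(fill="black", stroke="none")
group = GroveNode(parent=svg, fill="red")
circle = GroveNode(parent=group)
# circle.fill comes from group, circle.stroke from svg
```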

The core of the library has no intentional Perl-isms in it, and I would
really like to have a Python co-developer work with me so we can share
the resources developing it.  I will/would have made a Python binding
for it asap, but it'd be really nice if it happened earlier in the
development.  I'm just finishing up the core data model and will check
the source into my CVS server as soon as that's complete.  I'll
crosspost between both lists as developments occur.  The "basic"
integration of the core library and Perl only took a couple of hours;
I would expect about the same for Python.

  -- Ken