[XML-SIG] dom building, sax, and namespaces

Sylvain Thenault Sylvain.Thenault@logilab.fr
Fri, 25 Jan 2002 14:15:44 +0100 (CET)


On Fri, 25 Jan 2002, Andrew Dalke wrote:

> Me:
> > > Please correct me if I'm wrong.  Doesn't XMLGenerator convert the SAX
> > > events to a text stream?
> 
> Sylvain Thenault wrote:
> > No, XMLGenerator produces a DOM tree using the 4DOM implementation.
[snip grep suite]

Oops, sorry, I was wrong and you were right. I was thinking of
xml.dom.ext.reader.Sax2.XmlDomGenerator from PyXML, which builds a DOM
tree from SAX events.
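
To make the distinction concrete, here is a minimal sketch that uses only
the Python standard library rather than PyXML or 4DOM. XMLGenerator (from
xml.sax.saxutils) writes the SAX events back out as text, while a small
handler can collect the same events into a DOM tree. The DomBuilder class,
the helper functions and the sample record are made up for illustration;
this is not the XmlDomGenerator code.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.saxutils import XMLGenerator

SAMPLE = b'<record><dbid type="primary">P1</dbid></record>'  # made-up input


class DomBuilder(xml.sax.ContentHandler):
    """Hypothetical handler that turns SAX events into a minidom tree."""

    def startDocument(self):
        self.document = minidom.Document()
        self.stack = [self.document]  # innermost open node is last

    def startElement(self, name, attrs):
        node = self.document.createElement(name)
        for key, value in attrs.items():
            node.setAttribute(key, value)
        self.stack[-1].appendChild(node)
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        self.stack[-1].appendChild(self.document.createTextNode(content))


def sax_to_text(data):
    """Serialize the SAX events back to XML text with XMLGenerator."""
    out = io.StringIO()
    xml.sax.parseString(data, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()


def sax_to_dom(data):
    """Collect the same SAX events into a DOM tree instead."""
    handler = DomBuilder()
    xml.sax.parseString(data, handler)
    handler.document.normalize()  # merge text nodes split across callbacks
    return handler.document


text = sax_to_text(SAMPLE)
doc = sax_to_dom(SAMPLE)
print(text)
print(doc.documentElement.tagName)
```

The sketch ignores namespaces, comments and processing instructions, which
a real reader has to handle.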

> I went to download 4Suite from CVS.  I followed the CVS login commands
> at ftp://ftp.fourthought.com/pub/cvs-snapshots/ but it says
> 
> josiah> cvs -z3 -d:pserver:anonymous@cvs.4suite.org:/var/local/cvsroot co \
>  -R STABLE FT
> cvs server: cannot find module `STABLE' - ignored
> cvs server: cannot find module `FT' - ignored
> cvs [checkout aborted]: cannot expand modules
> josiah>

The module name for the latest 4Suite version is now "4Suite".

> > I recommend reading the W3C XPath Recommendation:
> > http://www.w3.org/TR/xpath
> 
> I have tried.  I find it hard going.  Something hasn't yet clicked
> on how the data model works.  That's also why I'm having problems
> working with the DOM.  I know it will come with practice, but it's
> annoying in the meanwhile.
[snip] 
> > Maybe you should consider using a database that can be queried with
> > XPath? (A database seems better suited to your amount of data.)
> 
> I have a spectrum of solutions based on the technology I'm developing.
> The one I'm focusing on now is a BerkeleyDBM solution for simple
> id -> flat file record lookups.  This fits in very closely with
> existing solutions,
> except for the use of XPath as the query language.
> 
> The next step is to use an XML database.  This is a harder sell for now
> because no one I know in this field uses an XML database -- most are
> using relational databases.  And once I mention "database server" they
> start fretting about getting a database manager, or they say their
> Oracle person has no experience with XML databases.

You could read http://www.rpbourret.com/xml/XMLDatabaseProds.htm to see a
list of native XML databases and relational databases that can be queried
with XPath.
 
> By having an intermediate solution for simple searches, it's an easier
> path to having a more complex database, since the API for existing
> tasks stays the same.
> 
> 
> > If bioformat:dbid is always a child of bioformat:record,
> > //bioformat:record/bioformat:dbid[@type='primary'] should be faster (fewer
> > solutions to explore).
> > The same applies if bioformat:record is always a child of your
> > root element.
> 
> It isn't.  Here's the data flow
> 
>                                             [existing flat file]
>                                                     |
>                                                    \/
>   [format definition as]--> parser generator --> [parser]
>   [a regular expression]                            |
>                                                    \/
>                                              [SAX events in Python]
>                                                /   |    \
>                                          Special  DOM   Database
>                                          purpose
>                                          handlers

I was talking about applying XPath to a DOM tree using an XPath compiler,
as in PyXML and 4Suite.
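
As a rough illustration of the two query shapes on an in-memory tree, here
is a sketch that uses the standard library's xml.etree.ElementTree, which
supports only a subset of XPath, in place of the PyXML or 4Suite engines.
The namespace URI, the sample document and the variable names are
assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; the real bioformat namespace is not given here.
NS = {"bioformat": "http://example.org/bioformat"}

DOC = """<root xmlns:bioformat="http://example.org/bioformat">
  <bioformat:record>
    <bioformat:dbid type="primary">P12345</bioformat:dbid>
    <bioformat:dbid type="secondary">Q99999</bioformat:dbid>
  </bioformat:record>
  <wrapper>
    <bioformat:record>
      <bioformat:dbid type="primary">P67890</bioformat:dbid>
    </bioformat:record>
  </wrapper>
</root>"""

root = ET.fromstring(DOC)

# Descendant search only: every primary dbid anywhere below the root.
anywhere = root.findall(".//bioformat:dbid[@type='primary']", NS)

# Descendant search for the record, then a plain child step to the dbid.
direct = root.findall(
    ".//bioformat:record/bioformat:dbid[@type='primary']", NS
)

print([e.text for e in anywhere])
print([e.text for e in direct])
```

Both queries select the same elements as long as every dbid is a direct
child of a record; the second form just states that assumption in the
path.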

> The structure of the SAX events is the same as the original file,
> since I'm only adding markup.  The format definition which produces
> the markup may include other intermediate elements, and I don't know
> what those might be beforehand.  I can define some structure to it
> all, but mostly of the sort that
> 
>   "feature_location" must be a descendent of "feature"
> 
> I can make no assertions as to how far that descendency is.  Hence
> my liberal use of '//'s.
> 
> > XPath is rather easy to understand after a quick look at the
> > documentation. To use it on an XML document, you also have to
> > know the document structure.
> 
> I insist that I have read it several times, read Kay's book, and read
> the 'Python&XML' book on the topic.  I don't find it easy.  For example,
> I definitely found SQL easier, in that I could understand it by looking
> at a few examples, rather than needing to read the documentation first.
> Now, I know the SQL data model is less complicated than XML's, but SQL
> is what all my potential customers are used to using, so at the very
> least I have to convince them that the complications are worth it.

So why don't you store your data in a standard relational database and
query it using SQL, if you and your customers find it easier?
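
For comparison, the "primary id" lookup might look like this in SQL. This
is a minimal sketch using Python's sqlite3 module with an in-memory
database; the table layout, column names and sample rows are invented for
illustration and would need to match your real records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per identifier attached to a record.
conn.execute("CREATE TABLE dbid (record_id INTEGER, type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO dbid VALUES (?, ?, ?)",
    [
        (1, "primary", "P12345"),
        (1, "secondary", "Q99999"),
        (2, "primary", "P67890"),
    ],
)


def primary_ids(connection):
    """Return the primary identifier of every record, ordered by record."""
    rows = connection.execute(
        "SELECT value FROM dbid WHERE type = ? ORDER BY record_id",
        ("primary",),
    )
    return [value for (value,) in rows]


print(primary_ids(conn))
```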
  
-- 
Sylvain Thenault

  LOGILAB           http://www.logilab.org