[XML-SIG] dom building, sax, and namespaces
Sylvain Thenault
Sylvain.Thenault@logilab.fr
Fri, 25 Jan 2002 14:15:44 +0100 (CET)
On Fri, 25 Jan 2002, Andrew Dalke wrote:
> Me:
> > > Please correct me if I'm wrong. Doesn't XMLGenerator convert the SAX
> > > events to a text stream?
>
> Sylvain Thenault wrote:
> > No, XMLGenerator produces a DOM tree using the 4DOM implementation.
[snip grep suite]
Oops, sorry: I was wrong and you were right. I was talking about
xml.dom.ext.reader.Sax2.XmlDomGenerator from PyXML.
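For anyone following along: a SAX-to-DOM builder is essentially a content handler that keeps a stack of open nodes and appends one new node per event. The sketch below uses Python's standard xml.sax and xml.dom.minidom, not PyXML, and does not reproduce XmlDomGenerator's API; the class and function names and the namespace URI in the usage example are made up for illustration.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import feature_namespaces


class DomBuilder(xml.sax.ContentHandler):
    """Build a minidom Document from namespace-aware SAX events (a sketch)."""

    def __init__(self):
        super().__init__()
        impl = minidom.getDOMImplementation()
        # Empty document: the root element arrives with the first startElementNS.
        self.document = impl.createDocument(None, None, None)
        self._stack = [self.document]  # currently open nodes, innermost last

    def startElementNS(self, name, qname, attrs):
        uri, localname = name
        # The stdlib expat driver passes qname=None, so fall back to the
        # local name; the original prefix is not preserved in that case.
        element = self.document.createElementNS(uri, qname or localname)
        for attr_name, value in attrs.items():
            attr_uri, _ = attr_name
            element.setAttributeNS(attr_uri, attrs.getQNameByName(attr_name), value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElementNS(self, name, qname):
        self._stack.pop()

    def characters(self, content):
        parent = self._stack[-1]
        # A Document node cannot hold text, so skip text outside the root.
        if parent is not self.document:
            parent.appendChild(self.document.createTextNode(content))


def parse_to_dom(data: bytes) -> minidom.Document:
    """Parse XML bytes with namespace processing on and return the DOM."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = DomBuilder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.document
```

Because the handler only sees SAX events, the same class would work whether the events come from an XML parser or from another event source such as a generated parser.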
> I went to download 4Suite from CVS. I followed the CVS login commands
> at ftp://ftp.fourthought.com/pub/cvs-snapshots/ but it says
>
> josiah> cvs -z3 -d:pserver:anonymous@cvs.4suite.org:/var/local/cvsroot co \
>   -R STABLE FT
> cvs server: cannot find module `STABLE' - ignored
> cvs server: cannot find module `FT' - ignored
> cvs [checkout aborted]: cannot expand modules
> josiah>
The new module name for the latest 4Suite version is "4Suite".
> > I recommend reading the W3C XPath recommendation:
> > http://www.w3.org/TR/xpath
>
> I have tried. I find it hard going. Something hasn't yet clicked
> on how the data model works. That's also why I'm having problems
> working with the DOM. I know it will come with practice, but it's
> annoying in the meanwhile.
[snip]
> > Maybe you should consider using a database that can be queried with
> > XPath? (A database seems better suited to your amount of data.)
>
> I have a spectrum of solutions based on the technology I'm developing.
> The one I'm focusing on now is a BerkeleyDBM solution for simple
> id -> flat file record lookups. This fits in very closely with existing solutions,
> except for the use of XPath as the query language.
>
> The next step is to use an XML database. This is a harder sell for now
> because no one I know in this field uses an XML database -- most are
> using relational databases. And once I mention "database server" they
> start fretting about getting a database manager, or they say their
> Oracle person has no experience with XML databases.
You could read http://www.rpbourret.com/xml/XMLDatabaseProds.htm for a
list of native XML databases and relational databases that can be queried
with XPath.
> By having an intermediate solution for simple searches, it's an easier
> path to having a more complex database, since the API for existing
> tasks stays the same.
>
>
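The intermediate "id -> flat file record" lookup described above can be quite small: index each record's byte offset and length once, then answer each query with one key lookup and one seek. A minimal sketch, assuming a hypothetical flat-file layout (records end with a "//" line and carry their id on an "ID " line). Python 3's standard library no longer bundles a Berkeley DB module, so the generic stdlib dbm stands in for it; build_index, lookup and demo are illustrative names, not an existing API.

```python
import dbm
import os
import tempfile


def build_index(flatfile_path, index_path, id_prefix=b"ID "):
    """Map each record id to the byte offset and length of its record.

    Hypothetical layout: each record ends with a b"//" line and carries
    its id on a line starting with id_prefix.
    """
    with open(flatfile_path, "rb") as f, dbm.open(index_path, "n") as db:
        start, record_id = 0, None
        for line in iter(f.readline, b""):
            if line.startswith(id_prefix):
                record_id = line[len(id_prefix):].strip()
            if line.rstrip(b"\r\n") == b"//":
                end = f.tell()  # offset just past the terminator line
                if record_id is not None:
                    db[record_id] = b"%d %d" % (start, end - start)
                start, record_id = end, None


def lookup(flatfile_path, index_path, record_id):
    """Return the raw bytes of one record: one index read plus one seek."""
    with dbm.open(index_path, "r") as db:
        offset, length = map(int, db[record_id].split())
    with open(flatfile_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo():
    """Index a two-record file in a temporary directory and fetch one record."""
    with tempfile.TemporaryDirectory() as tmp:
        flat = os.path.join(tmp, "records.dat")
        index = os.path.join(tmp, "records.idx")
        with open(flat, "wb") as f:
            f.write(b"ID P1\nSQ ACGT\n//\nID P2\nSQ TTGA\n//\n")
        build_index(flat, index)
        return lookup(flat, index, b"P2")
```

Here demo() returns b"ID P2\nSQ TTGA\n//\n". Keeping lookup() as the only entry point is what lets the storage behind it change later without changing callers.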
> > If bioformat:dbid is always a child of bioformat:record,
> > //bioformat:record/bioformat:dbid[@type='primary'] should be faster (fewer
> > nodes to explore).
> > The same applies if bioformat:record is always a child of your
> > root element.
>
> It isn't. Here's the data flow
>
>                                          [existing flat file]
>                                                    |
>                                                   \/
> [format definition as]--> parser generator --> [parser]
> [a regular expression]                             |
>                                                   \/
>                                         [SAX events in Python]
>                                            /       |       \
>                                       Special     DOM    Database
>                                       purpose
>                                       handlers
I was talking about applying XPath to a DOM tree using an XPath compiler,
as in PyXML and 4Suite.
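As an illustration of the two query shapes (not of PyXML's or 4Suite's XPath API), here is a sketch using the limited XPath subset in Python's standard xml.etree.ElementTree. The namespace URI and the sample document are invented. ElementTree evaluates ".//" by walking every descendant, whereas a fixed child path only inspects the root's direct children; that is the intuition behind "fewer nodes to explore", though real speed differences depend on the engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real bioformat namespace may differ.
NS = {"bf": "urn:example:bioformat"}

DOC = """\
<bf:root xmlns:bf="urn:example:bioformat">
  <bf:record>
    <bf:dbid type="primary">P1</bf:dbid>
    <bf:dbid type="secondary">X9</bf:dbid>
  </bf:record>
  <bf:record>
    <bf:dbid type="primary">P2</bf:dbid>
  </bf:record>
</bf:root>
"""

root = ET.fromstring(DOC)

# Like //bf:record/bf:dbid[@type='primary']: searches records at any depth.
anywhere = root.findall(".//bf:record/bf:dbid[@type='primary']", NS)

# Like /bf:root/bf:record/bf:dbid[@type='primary']: only looks at the root's
# children, so it is correct only if records always sit directly under root.
fixed = root.findall("./bf:record/bf:dbid[@type='primary']", NS)

print([e.text for e in anywhere])  # ['P1', 'P2']
print([e.text for e in fixed])     # ['P1', 'P2']
```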
> The structure of the SAX events is the same as the original file,
> since I'm only adding markup. The format definition which produces
> the markup may include other intermediate elements, and I don't know
> what those might be beforehand. I can define some structure to it
> all, but mostly of the sort that
>
>   "feature_location" must be a descendant of "feature"
>
> I can make no assertions about how deep that nesting is. Hence
> my liberal use of '//'s.
>
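When the only known rule is that feature_location sits somewhere below feature, the descendant axis ('//' in XPath, Element.iter() in ElementTree) is the natural tool, and the same rule can be checked after parsing. A minimal sketch with the standard library's ElementTree; the element layout in the example (including the "qualifiers" wrapper) is invented, orphan_locations is a hypothetical helper, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def orphan_locations(root, container="feature", target="feature_location"):
    """Return every <target> element with no <container> ancestor, at any depth.

    Tags are compared literally, so namespaced documents would need
    '{uri}local' names here.
    """
    orphans = []

    def walk(elem, inside_container):
        if elem.tag == target and not inside_container:
            orphans.append(elem)
        # An element is not its own ancestor, so update the flag afterwards.
        inside_container = inside_container or elem.tag == container
        for child in elem:
            walk(child, inside_container)

    walk(root, False)
    return orphans


doc = ET.fromstring(
    "<record>"
    "<feature><qualifiers><feature_location>1..10</feature_location></qualifiers></feature>"
    "<feature_location>99</feature_location>"
    "</record>"
)

# Like feature//feature_location: Element.iter() searches every depth.
first_feature = doc.find("feature")
print([loc.text for loc in first_feature.iter("feature_location")])  # ['1..10']
print([e.text for e in orphan_locations(doc)])  # ['99']
```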
> > XPath is fairly easy to understand after a quick look at the
> > documentation. To use it on an XML document, you also have to know
> > the document structure.
>
> I insist that I have read it several times, read Kay's book, and read
> the 'Python&XML' book on the topic. I don't find it easy. For example,
> I definitely found SQL easier, in that I could understand it by looking
> at a few examples, rather than needing to read the documentation first.
> Now, I know the SQL data model is less complicated than XML's, but SQL
> is what all my potential customers are used to using, so at the very
> least I have to convince them that the complications are worth it.
So why don't you store your data in a standard relational database and
query it using SQL, if you and your customers find it easier?
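To make the comparison concrete, here is a sketch of the primary-id query from earlier in the thread expressed in SQL. It uses Python's built-in sqlite3 as a stand-in for a server database such as Oracle, and the table layout is a hypothetical flattening of the dbid elements. Note that it only works because the schema fixes where each value lives, which is exactly what the unpredictable intermediate elements make hard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbid (
    record_id INTEGER NOT NULL,
    type      TEXT    NOT NULL,   -- e.g. 'primary', 'secondary'
    value     TEXT    NOT NULL
);
CREATE INDEX dbid_lookup ON dbid (type, value);
""")

conn.executemany(
    "INSERT INTO dbid (record_id, type, value) VALUES (?, ?, ?)",
    [(1, "primary", "P1"), (1, "secondary", "X9"), (2, "primary", "P2")],
)

# Roughly the SQL counterpart of //bioformat:record/bioformat:dbid[@type='primary']
rows = conn.execute(
    "SELECT record_id, value FROM dbid WHERE type = ? ORDER BY record_id",
    ("primary",),
).fetchall()
print(rows)  # [(1, 'P1'), (2, 'P2')]
```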
--
Sylvain Thenault
LOGILAB http://www.logilab.org