XML question, DOM or SAX?

Suchandra Thapa ssthapa at classes.cs.uchicago.edu
Fri Oct 18 19:57:55 EDT 2002


Markus von Ehr <markus.vonehr at ipm.fhg.de> wrote:
> no, I don't have rgb values, inside one spec I have about 1000
> values representing spectral lines.
> I don't want to enclose every value inside something like
><red>
></red>
> the efficiency would be very bad...

    From a programming perspective it would be easiest to have something like
<spectrum attribute1="..." attribute2="..."> if at all possible.  Parsing with
the SAX parser is a lot easier if you don't have to worry about character
data.  If that's not feasible then something like:
<spectra>
  <spectrum>
    <line value1="..." value2="..." value3="..." />
    <line value1="..." value2="..." value3="..." />
    <line value1="..." value2="..." value3="..." />
    ...
  </spectrum>
  ...
</spectra>
would be almost as good.  Your current schema has some problems since the data
will be presented as one big string of whitespace-separated values that you
have to split yourself (remember that an XML parser passes element text through
with its spaces and newlines intact, and a SAX parser may deliver that text in
several chunks).  If you're looking for the most efficient solution, why not
use raw data separated by some type of record marker, e.g.:
spectrum
1 2 3
1 2 3 
...

spectrum
1 2 3
1 2 3
...
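
    A minimal sketch of reading that kind of raw format in Python, assuming
that a line holding only the word "spectrum" starts a new record and that every
other non-blank line holds whitespace-separated numbers.  The function name
read_spectra and the sample data are made up for illustration:

```python
def read_spectra(lines):
    """Yield one spectrum at a time as a list of rows of floats."""
    current = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # blank lines between records carry no data
        if text == "spectrum":
            # record marker: hand back the finished record, start a new one
            if current is not None:
                yield current
            current = []
        elif current is not None:
            # data row; rows appearing before the first marker are ignored
            current.append([float(v) for v in text.split()])
    if current is not None:
        yield current


sample = """spectrum
1 2 3
4 5 6

spectrum
7 8 9
"""
spectra = list(read_spectra(sample.splitlines()))
print(spectra)
```

Since it is a generator, passing an open file object instead of
sample.splitlines() keeps only one spectrum in memory at a time.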

    Also you should be aware that the DOM parser takes a lot more memory than
the SAX parser since the DOM parser generates a tree structure of the document
in memory.  If your documents are going to be large then the DOM parser may
need a significant amount of memory.
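
    As a rough sketch of the SAX route for the <line .../> layout, here is a
handler built on the standard xml.sax module.  It only keeps the numbers it
extracts and never builds a tree of the document.  The class name
SpectrumHandler, the FIELDS tuple and the sample document are assumptions for
illustration; the attribute names value1..value3 are placeholders, as in the
layout they come from:

```python
import xml.sax

# Hypothetical attribute names, matching the placeholder layout.
FIELDS = ("value1", "value2", "value3")


class SpectrumHandler(xml.sax.ContentHandler):
    """Collect the attributes of each <line/>, grouped by <spectrum>."""

    def __init__(self):
        super().__init__()
        self.spectra = []

    def startElement(self, name, attrs):
        if name == "spectrum":
            # a new spectrum begins: start an empty list of lines
            self.spectra.append([])
        elif name == "line" and self.spectra:
            # all data lives in attributes, so no characters() callback is needed
            self.spectra[-1].append([float(attrs[f]) for f in FIELDS])


doc = b"""<spectra>
  <spectrum>
    <line value1="1" value2="2" value3="3" />
    <line value1="4" value2="5" value3="6" />
  </spectrum>
  <spectrum>
    <line value1="7" value2="8" value3="9" />
  </spectrum>
</spectra>"""

handler = SpectrumHandler()
xml.sax.parseString(doc, handler)
print(handler.spectra)
```

For a document on disk, xml.sax.parse(filename, handler) reads it as a stream
in the same way.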
    
-- 
----------------------------------------------------------------------------
Suchandra Thapa                       
s-thapa-11 at NOSPAMalumni.uchicago.edu  
----------------------------------------------------------------------------


