XML help needed

Paul Boddie paul at boddie.net
Mon Nov 26 09:55:16 EST 2001


Martin von Loewis <loewis at informatik.hu-berlin.de> wrote in message news:<j4d725n9ua.fsf at informatik.hu-berlin.de>...
> "Duncan Smith" <buzzard at urubu.freeserve.co.uk> writes:
> 
> > Thanks for the reply.  Basically I have an XML file containing data
> > that I need as various Python objects (strings, lists, arrays etc.).
> > I don't need to make changes to XML files, although I might want to
> > write data to new XML files of the same type (same DTD).
> 
> I see. In that case, I recommend to use plain print statements to
> generate the XML (or other means of string processing, like collecting
> pieces of the document in a list). This is easier than generating a
> DOM or SAX in-memory representation, and then serializing it.

Typically, I would agree that this is the case, although it can limit
your flexibility later on in cases where you might, for example, want
to generate documents in a "random access" fashion. Another danger is
that documents which are not well-formed can be produced (through
unescaped characters or unclosed tags, for example), whereas the
xml.dom package takes care of most of this for you.
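
A minimal sketch of the well-formedness point, assuming a modern Python
with the standard library's xml.dom.minidom and xml.sax.saxutils. The
element name "item" and the sample value are hypothetical, chosen only
to show what happens when data contains markup characters:

```python
from xml.dom.minidom import Document, parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape

# Hypothetical data containing characters that are special in XML.
value = "fish & chips <large>"


def is_well_formed(text):
    """Return True if the text parses as an XML document."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False


# Plain string building: nothing escapes the ampersand or the bracket.
naive = "<item>%s</item>" % value

# String building from a list of pieces, escaping the data by hand.
pieces = ["<item>", escape(value), "</item>"]
escaped = "".join(pieces)

# Building the element through the DOM: text is escaped on output.
doc = Document()
item = doc.createElement("item")
item.appendChild(doc.createTextNode(value))
doc.appendChild(item)
via_dom = item.toxml()

for label, text in [("naive", naive), ("escaped", escaped), ("dom", via_dom)]:
    print(label, is_well_formed(text))
```

The string approach works as long as every piece of data is escaped;
the DOM approach makes that someone else's job, at the cost of building
the tree in memory first.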

> > (I don't really want to have to learn about XSL, XSLT, X-path, SAX,
> > DOM, Pyxie etc. unless I know it's going to be useful.)
> 
> Depending on the exact processing job, XSLT may be indeed useful. If
> you know that the output only depends on the input, then writing a
> style sheet that does the transformation, without writing a line of
> Python, might be possible.

But for anything more than really simple transformations, I'd think
that the terminology of the DOM, if nothing else, would be unavoidable
in XSLT (and with the terminology arguably comes most of the
complexity). XSLT is nice for certain situations, but it's not
something I would instinctively reach for; then again, I've possibly
done some fairly bizarre things with it.

For beginners, xml.dom.minidom seems to be a good place to start, at
least from PyXML 0.6.6 onwards - there just aren't as many obscurities
in knowing how to parse documents as there are for 4DOM (or something
like the javax.xml package). It's also much faster than 4DOM as far as
I can tell.
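
As a rough illustration of how little is needed to get data out of a
document with xml.dom.minidom, here is a sketch that reads a string and
a list of numbers into ordinary Python objects. It assumes a modern
Python; the document layout, the element names and the text_of helper
are all invented for the example and are not taken from the original
poster's files:

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element names are invented for illustration.
SOURCE = """<dataset>
  <name>trial-1</name>
  <values>
    <value>1.5</value>
    <value>2.0</value>
    <value>3.25</value>
  </values>
</dataset>"""


def text_of(element):
    """Concatenate the text nodes directly inside an element."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


doc = parseString(SOURCE)

# A single string taken from the first <name> element.
name = text_of(doc.getElementsByTagName("name")[0])

# A list of floats, one per <value> element, in document order.
values = [float(text_of(v)) for v in doc.getElementsByTagName("value")]

print(name, values)
```

For a file on disk, xml.dom.minidom.parse takes a filename or file
object in place of the string given to parseString.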

[...]

> > Martin, you were bang on with the 'invalid XML' thing.  Cheers.
> 
> That's a phrase (to be bang on) that is beyond my English
> understanding. I take it to mean something good, in absence of a clear
> understanding :-)

If someone was "bang on" about something then it usually means that
they were accurate; another similar term is "spot on". Like a lot of
English slang, the meaning of a phrase can change dramatically (and
amusingly) if one makes it a verb ("to bang on" could mean "to
endlessly repeat the same thing"), omits or changes the preposition
("to bang"), or introduces things that the verb refers to... :-)

Paul


