Python & XML & DTD (warning: noob attack!)

Andrew Clover and-google at doxdesk.com
Fri Jan 30 09:53:17 EST 2004


Peter Hansen <peter at engcorp.com> wrote:

> Unfortunately I don't know of any way you could generate the DTD again

It is possible to preserve the internal subset in DOM Level 3. You can
read it from the property DocumentType.internalSubset, and it will be
included in documents serialised by an LSSerializer.

It is not, however, possible to write to internalSubset (it is read-only),
and you can't create a new DocumentType object with a non-empty
internalSubset, for some reason. So the only standard way to copy an
internalSubset is to make the new document by parsing something with the
same value, e.g.:

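  # oldDocument is an existing DOM Level 3 Document (e.g. from pxdom)
  dtd = oldDocument.doctype.internalSubset or ''  # None if there is no subset
  impl = oldDocument.implementation
  parser = impl.createLSParser(1, None)  # 1 == MODE_SYNCHRONOUS, no schema type
  source = impl.createLSInput()
  source.stringData = '<!DOCTYPE x [%s]><x/>' % dtd
  newDocument = parser.parse(source)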
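If pxdom isn't to hand, the same parse-a-wrapper trick can be roughly sketched with the standard library's xml.dom.minidom. minidom is not a DOM Level 3 implementation and, as far as I know, does not reliably fill in DocumentType.internalSubset when parsing, so this sketch assumes you already have the DTD text as a string. The helper name copy_with_internal_subset and the default root name "x" are hypothetical, chosen only for illustration.

```python
from xml.dom import minidom


def copy_with_internal_subset(dtd, root_name="x"):
    """Hypothetical helper: build a fresh minidom Document whose DOCTYPE
    carries the given internal-subset text, by parsing a tiny wrapper
    document (no DOM Level 3 LSParser involved)."""
    # An empty internal subset "[]" is still well-formed XML.
    source = '<!DOCTYPE %s [%s]><%s/>' % (root_name, dtd or '', root_name)
    return minidom.parseString(source)


doc = copy_with_internal_subset('<!ENTITY greet "hello">')
print(doc.doctype.name)  # x
# General entities declared in the subset show up on the new doctype.
greet = doc.doctype.entities.getNamedItem('greet')
print(greet.childNodes[0].data)  # hello
```

Note that minidom only records general entities and notations from the subset; element and attribute-list declarations are not exposed as nodes, so this is a way to get the declarations *into* a document, not to inspect them all.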

> I've never seen a package which supports what you ask for

Plug time: the only package I know of to support DOM Level 3 is my own:

  http://www.doxdesk.com/software/py/pxdom.html

Currently this is based on the November 2003 Candidate Recommendation (CR)
spec; there have been a number of fixes and changes to Level 3
functionality since, but I'm waiting for W3C to publish the next draft
(presumably Proposed Recommendation) before releasing 1.0.

> Also, aren't DTDs sort of considered either obsolete or at least
> vastly inferior to the newer approaches such as XML Schema, or both?

Certainly they have their drawbacks: they're namespace-ignorant, not
flexible enough for some purposes, and they're a legacy bag on the side of
XML rather than something built on top of it in XML syntax.

Still, they're well-understood and widely supported, and simpler to learn
than Schema at least.

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/
