How to get an XML DOM while offline?

Paul Boddie paul at boddie.org.uk
Wed Mar 19 13:19:22 EDT 2008


On 19 Mar, 16:27, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> william tanksley wrote:
> > I want to parse my iTunes Library xml. All was well, until I unplugged
> > and left for the train (where I get most of my personal projects
> > done). All of a sudden, I discovered that apparently the presence of a
> > DOCTYPE in the iTunes XML makes xml.dom.minidom insist on accessing
> > the Internet... So suddenly I was unable to do any work.

The desire to connect to the Internet for DTDs is documented in the
following bug:

http://bugs.python.org/issue2124

However, I can't reproduce the problem using xml.dom.minidom.parse or
xml.dom.minidom.parseString with plain XHTML, although I may be missing
whatever it is that triggers the retrieval of the DTD.

> > I don't want to modify the iTunes XML; iTunes rewrites it too often.
> > How can I prevent xml.dom.minidom from dying when it can't access the
> > Internet?
>
> > Is there a simpler way to read the iTunes XML? (It's merely a plist,
> > so the format is much simpler than general XML.)
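
On the plist question: a minimal sketch, assuming a Python new enough to
have plistlib.loads (3.4 or later) and assuming the library file has a
top-level "Tracks" dictionary whose values carry a "Name" key. The helper
names parse_library and track_names are made up for illustration.

```python
import plistlib


def parse_library(data):
    """Parse plist bytes into plain Python dicts, lists and strings."""
    return plistlib.loads(data)


def track_names(library):
    """Return track names, assuming a top-level "Tracks" dictionary."""
    tracks = library.get("Tracks", {})
    return [track.get("Name") for track in tracks.values()]


if __name__ == "__main__":
    # Build a tiny stand-in document; plistlib writes the usual DOCTYPE.
    sample = plistlib.dumps({"Tracks": {"1": {"Name": "Song A"}}})
    print(track_names(parse_library(sample)))
```

For a file on disk, plistlib.load accepts a file object opened in binary
mode, which avoids touching the XML that iTunes keeps rewriting.
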
>
> Normally, this should be solved using an entity-handler that prevents the
> remote fetching. I presume the underlying implementation of a SAX-parser
> does use one, but you can't override that (at least I didn't find anything
> in the docs)
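
For what it's worth, the entity-handler idea can be sketched with the SAX
API directly (not with minidom). This is only an illustration: the class
names OfflineResolver and KeyCollector and the function collect_keys are
invented here, and as far as I know recent Python releases leave external
general entities switched off by default, in which case the resolver may
never be consulted at all.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver
from xml.sax.xmlreader import InputSource


class OfflineResolver(EntityResolver):
    """Answer every external entity request with an empty stream."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        # Hand back an empty byte stream so nothing is fetched remotely.
        source.setByteStream(io.BytesIO(b""))
        return source


class KeyCollector(ContentHandler):
    """Collect the text of every <key> element (plist-style input)."""

    def __init__(self):
        super().__init__()
        self.keys = []
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "key":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "key" and self._buffer is not None:
            self.keys.append("".join(self._buffer))
            self._buffer = None


def collect_keys(xml_bytes):
    """Parse bytes with the offline resolver installed; return key names."""
    parser = xml.sax.make_parser()
    parser.setEntityResolver(OfflineResolver())
    handler = KeyCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.keys
```
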

There's a lot of complicated stuff in the xml.dom package, but I found
that the DOMBuilder class (in xml.dom.xmlbuilder) probably contains
the options which switch such behaviour on or off. That said, I've
hardly ever used the most formal DOM classes to parse XML in Python
(where you get the DOM implementation and then create other factory
classes - it's all very "Java" in nature), so I either never knew or
have forgotten the precise incantation.

> The most pragmatic solution would be to rip the doctype out using simple
> string methods and/or regexes.

Maybe, but an example fragment of the XML might help us diagnose the
problem, ideally with some commentary from the people who wrote the
xml.dom software in the first place.
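
In the meantime, the pragmatic route could look something like the sketch
below. It assumes the document is already in memory as text and that the
DOCTYPE is a simple one (optionally with an internal subset that contains
no "]" characters); the names strip_doctype and parse_without_doctype are
made up for illustration. The file on disk is left alone, so iTunes
rewriting it doesn't matter.

```python
import re
from xml.dom import minidom

# Matches <!DOCTYPE ...> including an optional [ ... ] internal subset.
# This is a heuristic, not a full XML-aware matcher.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*(?:\[.*?\])?\s*>", re.DOTALL)


def strip_doctype(xml_text):
    """Return xml_text with the first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


def parse_without_doctype(xml_text):
    """Build a minidom document from text after dropping the DOCTYPE."""
    return minidom.parseString(strip_doctype(xml_text))
```

Note that any entities declared in the DTD would no longer be defined
after this, so it only suits documents which don't rely on them.
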

Paul


