xml encoding in minidom

Martin v. Loewis martin at v.loewis.de
Tue Apr 9 16:09:01 EDT 2002


joakim.storck at home.se (Joakim Storck) writes:

> I've been playing around with xml.dom.minidom, but I have some
> problems with the document encoding. What I'm trying to do is
> basically to take data from a web form using the cgi module,
> then insert it into an XML document using minidom. However, the
> data contains Swedish characters (å, ä, ö), which mess things up
> more than I could have imagined.

The easiest thing should be to use the XML default encoding, UTF-8.
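
A minimal sketch of that approach, assuming a current Python 3 where
strings are already Unicode. The element name and the sample value are
made up for illustration; in a CGI script the value would come from the
form data.

```python
import xml.dom.minidom

# Hypothetical form value containing Swedish characters.
name = "Åsa Öberg"

doc = xml.dom.minidom.Document()
root = doc.createElement("person")  # element name is an assumption
root.appendChild(doc.createTextNode(name))
doc.appendChild(root)

# Serialize to bytes in UTF-8, the XML default encoding.
data = doc.toxml(encoding="utf-8")
```

The bytes in data can be written to a file opened in binary mode and
parsed again by any XML parser without extra configuration.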

> Since minidom has no way of setting the document encoding directly, I
> tried the following:
> 
> doc = xml.dom.minidom.Document()
> pi = doc.createProcessingInstruction('xml', 'version="1.0" encoding="ISO-8859-1"')
> doc.appendChild(pi)

Notice that the <?xml ...?> header (the XML declaration) is *not* a
processing instruction, although it looks like one.
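
One way to see the difference, sketched under the assumption of a
current Python 3: parse a document that has both an XML declaration and
a genuine processing instruction, then list the processing-instruction
nodes minidom created. The "app" target and the sample document are
invented for this illustration.

```python
import xml.dom.minidom
from xml.dom import Node

# Sample input: an XML declaration, a real PI, and an empty root element.
source = b'<?xml version="1.0" encoding="ISO-8859-1"?><?app hint="x"?><root/>'
doc = xml.dom.minidom.parseString(source)

# Collect the targets of all processing-instruction nodes at document level.
pi_targets = [
    node.target
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
```

Only the "app" instruction shows up in pi_targets; the declaration is
consumed by the parser and does not become a node in the tree.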

> There is also some kind of problem with the minidom toxml() method.
> From what I've read in other postings it has to do with the file
> encoding, which seems to be set to UTF-8 by default.

That is not the default file encoding; it is the default encoding
mandated by the XML recommendation. minidom does not support emission
of XML in any other encoding.

> I'm no expert, neither at xml nor Python, but it seems I'm not the
> only one who experienced these problems.

It might be easiest if you stop fighting the defaults. If you insist
on generating XML documents in Latin-1, you need to write the XML
header explicitly, then call writexml() on the root node.
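
A minimal sketch of that Latin-1 route, assuming a current Python 3.
The helper name to_latin1 and the element name are hypothetical, and
the choice to replace characters outside Latin-1 with numeric character
references is one option among several.

```python
import io
import xml.dom.minidom


def to_latin1(doc):
    """Return doc as Latin-1 bytes with a hand-written XML header."""
    buf = io.StringIO()
    # Serialize only the root element; the header is written by hand.
    doc.documentElement.writexml(buf)
    header = '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    # Characters outside Latin-1 become numeric character references.
    return (header + buf.getvalue()).encode("iso-8859-1", "xmlcharrefreplace")


doc = xml.dom.minidom.Document()
root = doc.createElement("form")  # element name is an assumption
root.appendChild(doc.createTextNode("å ä ö"))
doc.appendChild(root)

result = to_latin1(doc)
```

Writing result to a file opened in binary mode keeps the bytes
consistent with the encoding named in the header.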

Regards,
Martin
