accent letters in xml

"Martin v. Löwis" martin at v.loewis.de
Fri May 2 06:49:21 EDT 2003


Alessio Pace wrote:

> Hi, I wrote a config file for an application in xml format.
> Some xml.dom.minidom.Text node must contain accents (they are real words)
> and so I write them in the default way: &agrave; &egrave; and so on...

Notice that these entities are predefined in HTML, but not predefined
in XML. So your document is not well-formed, as it does not refer to a
DOCTYPE that declares these entities.

I recommend that you use the UTF-8 representation of these characters,
instead of entity references. If this is not possible, you may use
character references (i.e. &#224; &#232; and so on).
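
To illustrate the two recommended spellings, here is a minimal sketch written for a current Python 3 rather than the Python 2.3 discussed in this thread. The helper names (element_text, parses) and the sample word are made up for the example. It shows that literal UTF-8 characters and numeric character references parse to the same text with xml.dom.minidom, while the HTML entity name is rejected when no DOCTYPE declares it.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def element_text(data):
    """Parse XML bytes and return the text directly inside the root element."""
    root = minidom.parseString(data).documentElement
    return "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)


def parses(data):
    """Return True if minidom accepts the document, False on a parse error."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


# Literal accented character, stored as UTF-8 bytes.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><word>citt\u00e0</word>'.encode("utf-8")

# The same word spelled with a numeric character reference (pure ASCII).
charref_doc = b"<word>citt&#224;</word>"

# The HTML entity name, with no DOCTYPE that declares it.
html_entity_doc = b"<word>citt&agrave;</word>"

print(element_text(utf8_doc) == element_text(charref_doc))
print(parses(html_entity_doc))
```

The character-reference form has the advantage that the file stays plain ASCII, so it survives editors and tools that mishandle encodings.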

If this is still not possible, you need to process skipped entities in 
the XML SAX handler. This is quite tricky, since you need to interact
with the DOM building process, and the standard DOM builder of Python 
2.3 does not even use SAX.
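
As a rough sketch of the skipped-entity approach, again for a current Python 3 and not the 2.3 API: the handler below only collects text and does not build a DOM, which is the hard part mentioned above. The class and function names are hypothetical. It also assumes the document carries a DOCTYPE with an external subset that the parser does not read (config.dtd is a placeholder name). To my understanding, the expat-based SAX driver calls skippedEntity only in that situation, and otherwise stops with an "undefined entity" error.

```python
import io
import xml.sax
from html.entities import name2codepoint
from xml.sax.handler import ContentHandler, feature_external_ges


class EntityAwareTextHandler(ContentHandler):
    """Collect character data, substituting HTML entity names the parser skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        codepoint = name2codepoint.get(name)
        if codepoint is not None:
            # Known HTML entity name: substitute the real character.
            self.chunks.append(chr(codepoint))
        else:
            # Unknown name: keep the reference visible instead of dropping it.
            self.chunks.append("&%s;" % name)


def extract_text(data):
    """Parse XML bytes with SAX and return all text, with skipped entities resolved."""
    parser = xml.sax.make_parser()
    # Do not try to fetch the external DTD; its entities are then reported as skipped.
    parser.setFeature(feature_external_ges, False)
    handler = EntityAwareTextHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


# "config.dtd" is a placeholder; it is never opened in this sketch.
doc = (b'<!DOCTYPE config SYSTEM "config.dtd">'
       b"<config><word>citt&agrave;</word></config>")
print(extract_text(doc))
```

Using the HTML name table from html.entities is a choice made for this example; a real application would map only the names its own DTD is meant to declare.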

Regards,
Martin




