Unicode strings -> xml.dom.minidom Text elements?

Martin v. Loewis martin at v.loewis.de
Mon Oct 21 13:31:21 EDT 2002


Patrick Surry <Patrick.Surry at quadstone.com> writes:

> and am stuffing it into an xml.dom.minidom Text() element.  But when I
> serialize the document with doc.writexml(), it turns into:
> 
> <text>ABC?DEF</text>

I find that hard to believe. Are you sure it really puts a question
mark in there? Or is it just that your email program is not capable of
sending GREEK CAPITAL LETTER SIGMA?

Have you, by any chance, modified sys.setdefaultencoding?

> This seems to be because writexml() effectively does 
> 
> writer.write('%s' % a)
> 
> making the unicode character turn into a '?'

Extremely unlikely. Can you show a complete program that demonstrates
this problem?

> Am I doing something dumb and/or is there a workaround I could use
> other than writing my own XML unicode character escaper...

As a starting point, I recommend that you refrain from setting the
default encoding to "mbcs". As the next step, I recommend that you try
to save the XML document in UTF-8.

As it is, writexml is not capable of escaping characters itself. So
you will find that writexml gives you a Unicode string, which you need
to encode as UTF-8 yourself.

Depending on where exactly you got writexml from, you may find that it
has an encoding= parameter. It still won't produce character
references, though.

Regards,
Martin



More information about the Python-list mailing list