Unicode strings -> xml.dom.minidom Text elements?
Martin v. Loewis
martin at v.loewis.de
Mon Oct 21 13:31:21 EDT 2002
Patrick Surry <Patrick.Surry at quadstone.com> writes:
> and am stuffing it into an xml.dom.minidom Text() element. But when I
> serialize the document with doc.writexml(), it turns into:
>
> <text>ABC?DEF</text>
I find that hard to believe. Are you sure it really puts a question
mark in there? Or is it just that your email program is not capable of
sending GREEK CAPITAL LETTER SIGMA?
Have you, by any chance, modified sys.setdefaultencoding?
> This seems to be because writexml() effectively does
>
> writer.write('%s' % a)
>
> making the unicode character turn into a '?'
Extremely unlikely. Can you show a complete program that demonstrates
this problem?
> Am I doing something dumb and/or is there a workaround I could use
> other than writing my own XML unicode character escaper...
As a starting point, I recommend that you refrain from setting the
default encoding to "mbcs". As the next step, I recommend that you try
to save the XML document in UTF-8.
As it is, writexml is not capable of escaping characters itself. So
you will find that writexml gives you a Unicode string, which you need
to encode as UTF-8 yourself.
Depending on where exactly you got writexml from, you may find that it
has an encoding= parameter. It still won't produce character
references, though.
Regards,
Martin
More information about the Python-list
mailing list