minidom and encoding problem
Martin v. Loewis
martin at v.loewis.de
Fri Jun 7 02:05:37 EDT 2002
ehab_teima at hotmail.com (Ehab Teima) writes:
> > This is a bug in your code. You must not insert (byte) strings in a DOM
> > tree; always use Unicode objects.
>
> I do not have control over the sent text.
[I assume that the "sent text" is also the one that you pass to
createTextNode].
Even if you don't have that control, you still need to know what
encoding it uses. If you don't know the encoding, you cannot put it
into XML documents.
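To illustrate why the encoding has to be known, here is a minimal sketch.
It is written for current Python 3, where bytes.decode() plays the role
that unicode() played in Python 2. The byte value 0x95 is only an assumed
example (it is the bullet character in Windows code page 1252); the
actual bytes in the file may differ.

```python
# Assumed sample: a single byte as it might appear in text pasted from Word.
raw = b"\x95"

# The same byte means different things under different encodings.
as_cp1252 = raw.decode("cp1252")    # U+2022 BULLET
as_latin1 = raw.decode("latin-1")   # U+0095, a C1 control character

# Under UTF-8 the lone byte is not valid at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Nothing in the bytes themselves says which of these readings is the
intended one, so the encoding has to come from outside the data.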
> The issue started when some bullets were copied from a word document
> and pasted into a file and the whole file was passed to my
> classes. I could not find a way to convert this text to UTF-8 or
> anything else.
You don't need to convert it to UTF-8; you need to convert it to
Unicode objects. You can use the unicode() builtin to do that.
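A minimal sketch of that conversion, under stated assumptions: the input
is taken to be cp1252 (a common encoding for text pasted from Word, but
only a guess here), and the element name "note" is hypothetical. The code
is written for current Python 3, where the decoding step is spelled
raw.decode("cp1252"); in the Python 2 of this thread the equivalent is
unicode(raw, "cp1252").

```python
from xml.dom import minidom

# Assumed sample input: bytes as received, with a cp1252 bullet (0x95).
raw = b"\x95 first item"

# Decode to a Unicode string before it goes anywhere near the DOM.
# Python 2 spelling: text = unicode(raw, "cp1252")
text = raw.decode("cp1252")

doc = minidom.Document()
root = doc.createElement("note")   # hypothetical element name
doc.appendChild(root)
root.appendChild(doc.createTextNode(text))

# Choose the output encoding only when serializing the document.
xml_bytes = doc.toxml(encoding="utf-8")
```

The point of the design is that the DOM tree only ever holds Unicode
text; byte encodings appear at the edges, once when reading the input
and once when writing the document out.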
> Is there a way to prevent this from happening?
What is "this", and why do you want to prevent it from happening?
Regards,
Martin