minidom and encoding problem

Martin v. Loewis martin at v.loewis.de
Fri Jun 7 02:05:37 EDT 2002


ehab_teima at hotmail.com (Ehab Teima) writes:

> > This is a bug in your code. You must not insert (byte) strings in a DOM
> > tree; always use Unicode objects.
> 
> I do not have control over the sent text. 

[I assume that the "sent text" is also the one that you pass to
createTextNode].

Even if you don't have that control, you still need to know what
encoding it uses. If you don't know the encoding, you cannot put it
into XML documents.
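
To illustrate why the encoding has to be known: the same byte turns
into different characters depending on the codec. This is only a
sketch, written in Python 3 syntax, where bytes.decode() plays the
role that unicode() plays in Python 2. The byte value 0x95 is an
illustrative choice of mine, not something taken from your data.

```python
raw = b"\x95"  # illustrative byte, not taken from the original data

# The same byte, read under two different single-byte encodings.
as_cp1252 = raw.decode("cp1252")   # U+2022 BULLET
as_latin1 = raw.decode("latin-1")  # U+0095, a C1 control character

# A lone 0x95 is not valid UTF-8 at all, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Nothing in the bytes themselves says which reading is the intended
one; that information has to come from whoever produced the text.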

> The issue started when some bullets were copied from a Word document
> and pasted into a file and the whole file was passed to my
> classes. I could not find a way to convert this text to UTF-8 or
> anything else.

You don't need to convert it to UTF-8; you need to convert it to
Unicode objects. You can use the unicode() builtin to do that, passing
the encoding of the byte string as the second argument.
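
A minimal sketch of that conversion, under two assumptions of mine:
that the pasted bytes are in Windows-1252 (a guess, since the text
came from Word), and that the element name "list" stands in for
whatever your classes actually build. It is written in Python 3
syntax; the Python 2 spelling of the decode step is given in a
comment.

```python
from xml.dom.minidom import Document

# Assumption: the incoming bytes are Windows-1252 encoded.
raw = b"\x95 first item"  # hypothetical sample input

# Python 2: text = unicode(raw, "cp1252")
# Python 3 equivalent:
text = raw.decode("cp1252")

doc = Document()
root = doc.createElement("list")  # hypothetical element name
doc.appendChild(root)

# Only the decoded (Unicode) text goes into the DOM tree.
root.appendChild(doc.createTextNode(text))

# Pick the output encoding when serializing, not before.
xml_bytes = doc.toxml(encoding="utf-8")
```

The point of the design is that the DOM tree only ever holds Unicode
text; UTF-8 (or any other encoding) comes into play only when the
document is written out.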

> Is there a way to prevent this from happening?

What is "this", and why do you want to prevent it from happening?

Regards,
Martin


