XML and UnicodeError

Pinke Panke dev at null.oo
Tue Oct 5 05:46:21 EDT 2004


Hello Paul,

thank you for your answer.

> The solution is to use Unicode throughout.

I thought so, but it did not seem easy enough to me.


>   1. Let minidom provide you with Unicode values.

Yes, I assume this is the default behaviour of the minidom parser.
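To check that assumption, a minimal sketch like the following should do. The XML snippet and the element names are made up for illustration and are not my real file.

```python
from xml.dom import minidom

# Hypothetical document standing in for the real XML file (UTF-8 bytes).
xml_bytes = u'<page><headline>Caf\u00e9</headline><text>Hello</text></page>'.encode('utf-8')

doc = minidom.parseString(xml_bytes)

# firstChild is the text node inside <headline>; .data holds its text.
headline = doc.getElementsByTagName('headline')[0].firstChild.data

# type(u'') is `unicode` on Python 2 and `str` on Python 3,
# so this check asks "is it Unicode text?" on either version.
text_type = type(u'')
is_unicode = isinstance(headline, text_type)
```

If is_unicode comes out true, the parser has already decoded the UTF-8 bytes and I do not have to decode the values myself.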


>   2. Convert any other text to Unicode as soon as possible.

Ok, i.e.

headline = structure[0] # is unicode
pagetext = structure[1] # is unicode
fill = "bar".decode('utf-8') # let's make it unicode
foo = headline + fill + pagetext # foo is unicode, too

?

>   3. Manipulate only Unicode values - don't mix them up with
>      plain strings.

It makes sense, but I need some string concatenations. E.g. I set
default values in the Python script and try to concatenate them with
XML values.
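What I have in mind is roughly the sketch below, assuming the defaults can simply be written as Unicode literals. All names and values here are hypothetical stand-ins, not my real script.

```python
# Assumed stand-ins for values read from the XML file (already Unicode).
headline = u'Caf\u00e9'
pagetext = u'Hello'

# Default written as a Unicode literal (u'...') instead of a plain string.
DEFAULT_SEPARATOR = u' - '

# A default that arrives as bytes is decoded once, right where it enters.
raw_footer = b' (draft)'
default_footer = raw_footer.decode('utf-8')

# Every operand is Unicode, so no implicit conversion has to happen.
foo = headline + DEFAULT_SEPARATOR + pagetext + default_footer
```

If that is right, the concatenation itself is not the problem, only the mixing of plain strings with Unicode values.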

But now I think the safest way would be to move all plain strings
from the Python script into a second XML file and read them from
there, because after parsing they would be Unicode. Right?

Or would saving the Python script as UTF-8 make the difference?

>   4. Serialise to your chosen encoding only when preparing
>      output.

Every string concatenation in my script is preparing output.
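If I read point 4 correctly, the concatenations could stay in Unicode and only the final write would encode. A minimal sketch of that reading, with made-up values and an in-memory buffer standing in for my real output file:

```python
import io

# Hypothetical Unicode pieces, e.g. XML values and defaults.
parts = [u'Caf\u00e9', u' - ', u'Hello']

# Joining keeps everything as Unicode; nothing is encoded yet.
page = u''.join(parts)

# Encode exactly once, at the point where bytes leave the program.
buffer = io.BytesIO()  # stands in for a real output file
buffer.write(page.encode('utf-8'))
output_bytes = buffer.getvalue()
```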

I am looking forward to your answer.

Martin


