small inconsistency in ElementTree (1.2.6)

Damjan gdamjan at gmail.com
Sat Dec 10 11:17:48 EST 2005


>>> ascii strings and unicode strings are perfectly interchangeable, with
>>> some minor exceptions.
>>
>> It's not only translate, it's decode too...
>
> why would you use decode on the strings you get back from ET ?

Long story... some time ago, when computers didn't support charsets,
people invented so-called "cyrillic fonts", i.e. fonts that have Cyrillic
glyphs mapped onto the Latin positions. Since our Cyrillic alphabet has 31
characters, some characters in said fonts were mapped to { or ~ etc.
Of course this "solution" is awful, but it was the only one at the time.

So I'm writing a Python script that takes an OpenDocument file written
with such a font and converts it to UTF-8...
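
Roughly, the remapping step could look like the sketch below. The mapping
table is hypothetical (the real one depends on which legacy font was used),
the function name is made up for illustration, and the code is written for
Python 3, where every text string is already unicode.

```python
# Hypothetical mapping: the real layout depends on the legacy font in use.
LEGACY_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "b": "\u0431",  # CYRILLIC SMALL LETTER BE
    "v": "\u0432",  # CYRILLIC SMALL LETTER VE
    "{": "\u0448",  # CYRILLIC SMALL LETTER SHA
    "~": "\u0447",  # CYRILLIC SMALL LETTER CHE
}

# str.translate wants a table keyed by code point; str.maketrans builds one.
TABLE = str.maketrans(LEGACY_TO_CYRILLIC)


def legacy_to_unicode(text):
    """Replace font-mapped Latin characters with real Cyrillic letters."""
    return text.translate(TABLE)


converted = legacy_to_unicode("baba{~")
utf8_bytes = converted.encode("utf-8")  # what would be written back out
```

Characters missing from the table pass through unchanged, so digits and
whitespace survive the conversion.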

ps. I use translate now, but I was making a general note that unicode and
string objects are not 100% interchangeable. translate, encode and decode
are especially problematic.
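
To make the translate case concrete: in Python 2, str.translate expects a
256-character table while unicode.translate expects a mapping keyed by
ordinals. The sketch below shows the same split using Python 3's bytes and
str types, which is an assumption about the interpreter, not what the
original script ran on.

```python
# Byte strings: translate() takes a 256-byte lookup table.
byte_table = bytes.maketrans(b"{~", b"[^")
byte_result = b"a{b~".translate(byte_table)

# Text strings: translate() takes a mapping from code points to replacements.
text_table = {ord("{"): "\u0448", ord("~"): "\u0447"}
text_result = "a{b~".translate(text_table)

# The two table formats are not interchangeable: a dict is rejected
# by the byte-string method.
try:
    b"a{b~".translate(text_table)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

So code that calls translate has to know which of the two types it was
handed before it can even build the table.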

Anyway, I wrap the output of ET in unicode() now... I don't see
another, better solution.
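
A minimal sketch of that wrapping idea, written for Python 3 and the
standard-library xml.etree.ElementTree rather than ElementTree 1.2.6. The
helper name as_text and its encoding parameter are made up for
illustration. On Python 3 the parser already returns str for every text
node, so the helper mostly just guards against None and stray bytes.

```python
import xml.etree.ElementTree as ET


def as_text(value, encoding="ascii"):
    """Return value as a text string, decoding it first if it is bytes."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


root = ET.fromstring("<doc><p>plain</p><p>\u0448\u0447</p><p/></doc>")

# Every item ends up as the same type, whatever the element contained.
texts = [as_text(p.text) for p in root.iter("p")]
```

With everything normalised to one type at the boundary, the rest of the
script can call translate and encode without checking what it got.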



