small inconsistency in ElementTree (1.2.6)
Damjan
gdamjan at gmail.com
Sat Dec 10 11:17:48 EST 2005
>>> ascii strings and unicode strings are perfectly interchangeable, with
>>> some minor exceptions.
>>
>> It's not only translate, it's decode too...
>
> why would you use decode on the strings you get back from ET ?
Long story... Some time ago, when computers didn't support character
sets, people invented so-called "Cyrillic fonts", i.e. fonts that have
Cyrillic glyphs mapped onto the Latin positions. Since our Cyrillic
alphabet has 31 characters, some characters in those fonts were mapped
to { or ~, etc. Of course this "solution" is awful, but it was the only
one at the time.
So I'm writing a Python script that takes an OpenDocument file and
translates it to UTF-8...
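A minimal Python 3 sketch of that translate step. The mapping is hypothetical: the post does not say which Latin positions the font reused for which letters, so `FONT_TO_CYRILLIC` holds only a few illustrative pairs. Each reused code point is mapped to the Cyrillic letter the font displayed there, and the result is encoded as UTF-8.

```python
# Hypothetical partial mapping: the actual "Cyrillic font" layout is not
# given in the post, so these pairs are illustrative only.
FONT_TO_CYRILLIC = str.maketrans({
    "a": "\u0430",  # а
    "b": "\u0431",  # б
    "v": "\u0432",  # в
    "g": "\u0433",  # г
    "{": "\u0448",  # ш (a punctuation slot reused for a letter)
    "~": "\u0447",  # ч
})


def font_text_to_utf8(text: str) -> bytes:
    """Replace reused Latin positions with real Cyrillic, then encode as UTF-8.

    Characters absent from the mapping pass through unchanged.
    """
    return text.translate(FONT_TO_CYRILLIC).encode("utf-8")


print(font_text_to_utf8("{a~").decode("utf-8"))  # prints шач
```

`str.maketrans` builds the ordinal-keyed dict that Python 3's `str.translate` expects. In the Python 2 of this thread, that dict form belonged to `unicode.translate`, while `str.translate` wanted a 256-character table, which is one concrete shape of the str/unicode mismatch discussed here.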
P.S. I use translate now, but I was making a general note that unicode
and str objects are not 100% interchangeable; translate, encode and
decode are especially problematic.
Anyway, I wrap the output of ET in unicode() now... I don't see another,
better solution.
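A Python 3 sketch of applying such a translation to every text node of a parsed tree, roughly what a converter for an OpenDocument `content.xml` would do. The `TABLE` pairs and the `translate_tree` helper are hypothetical illustrations, not part of ElementTree. Under Python 3, ElementTree always returns `str` for `.text` and `.tail`, so no `unicode()`-style wrapping is needed before calling `translate`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny mapping; the real font layout is unknown.
TABLE = str.maketrans({"{": "\u0448", "~": "\u0447", "a": "\u0430"})


def translate_tree(root: ET.Element, table: dict) -> ET.Element:
    """Apply `table` to the .text and .tail of every element, in place."""
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.translate(table)
        if elem.tail:
            elem.tail = elem.tail.translate(table)
    return root


root = ET.fromstring("<p>{a<b>~</b>a</p>")
translate_tree(root, TABLE)
print(ET.tostring(root, encoding="unicode"))
```

Handling `.tail` as well as `.text` matters because ElementTree stores text that follows a child element on that child, not on the parent.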