Hebrew in IDLE and Eclipse (Windows)

iu2 israelu at elbit.co.il
Tue Jan 22 16:55:36 EST 2008


On Jan 17, 10:35 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> ...
>   print lines[0].decode("<a-encoding>").encode("<c-encoding>")
> ...
> Regards,
> Martin

Ok, I've got the solution, but I still have a question.

Recall:
When I read data using SQL I got a byte sequence like this:
\x88\x89\x85
But when I entered Hebrew words directly in the print statement (or as
a dictionary key)
I got this:
\xe8\xe9\xe5

Now, scanning the encodings package I discovered that cp1255 maps
'\u05d9' to \xe9
while cp856 maps '\u05d9' to \x89,
so transforming \x88\x89\x85 to \xe8\xe9\xe5 is done by

s.decode('cp856').encode('cp1255')

ending up with the pattern you suggested.
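
For reference, here is a minimal sketch of that round trip. It is written
with Python 3 bytes objects, which is an assumption on my part (the code in
this thread is Python 2 style, where s is a plain str). The variable names
are only illustrative.

```python
# Bytes as they came back from the SQL query (cp856, a DOS Hebrew code page).
raw_from_sql = b"\x88\x89\x85"

# Step 1: decode to Unicode text using the encoding the bytes are really in.
text = raw_from_sql.decode("cp856")

# Step 2: encode to the encoding the console/editor expects (Windows Hebrew).
as_typed = text.encode("cp1255")

print([hex(ord(ch)) for ch in text])  # code points of the decoded letters
print(as_typed)                       # same bytes as typing the word directly
```

The intermediate Unicode string is the important part: the two byte
sequences differ, but both stand for the same three Hebrew letters.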

My question is: is there a way to deduce cp856 and cp1255 from the
string itself? Is there a function that does this? (It would make the
transformation more robust.)
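
To make the question concrete, here is a rough sketch of the kind of
guessing I have in mind. It is only a heuristic, not an existing library
function: the helper name plausible_hebrew_codecs and the candidate list are
hypothetical, and it assumes the text is known to be Hebrew. It tries each
candidate codec and keeps those where every non-ASCII character decodes to a
Hebrew letter.

```python
# Hebrew letters alef..tav occupy U+05D0..U+05EA.
HEBREW_FIRST, HEBREW_LAST = "\u05d0", "\u05ea"


def plausible_hebrew_codecs(data, candidates=("cp856", "cp1255")):
    """Return the candidate codecs under which `data` reads as Hebrew.

    Hypothetical helper: the candidate list is an assumption and must be
    supplied by the caller; nothing here inspects the database or console.
    """
    matches = []
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            # Some byte is undefined in this code page, so rule it out.
            continue
        non_ascii = [ch for ch in text if ord(ch) > 127]
        # Require at least one non-ASCII character, all of them Hebrew letters.
        if non_ascii and all(HEBREW_FIRST <= ch <= HEBREW_LAST for ch in non_ascii):
            matches.append(name)
    return matches


print(plausible_hebrew_codecs(b"\x88\x89\x85"))
print(plausible_hebrew_codecs(b"\xe8\xe9\xe5"))
```

This can return more than one name, or none at all, so it narrows the
choice rather than settling it.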

I don't know how IDLE guessed cp856, but it must have done so.
(Perhaps because it uses Tcl, and maybe Tcl guesses the encoding
automatically?)

thanks
iu2





