Hebrew in idle and eclipse (Windows)

"Martin v. Löwis" martin at v.loewis.de
Wed Jan 23 04:17:54 EST 2008


> Recall:
> When I read data using sql I got a sequence like this:
> \x88\x89\x85
> But when I entered Hebrew words directly in the print statement (or as
> a dictionary key)
> I got this:
> \xe8\xe9\xe5
> 
> Now, scanning the encoding module I discovered that cp1255 maps
> '\u05d9' to \xe9
> while cp856 maps '\u05d9' to \x89,
> so transforming \x88\x89\x85 to \xe8\xe9\xe5 is done by

Hebrew Windows apparently uses cp1255 (aka windows-1255) as
the "ANSI code page", used in all GUI APIs, and cp856 as the
"OEM code page", used in terminal windows - and, for some reason,
in MS SQL.
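
To make the two code pages concrete, here is a small sketch that encodes the same three Hebrew letters both ways. The letters (tet, yod, vav) are my choice; I picked them because they correspond to the byte sequences quoted above. The u"" and b"" prefixes are only there so that the snippet runs unchanged on newer Python versions as well.

```python
# Three Hebrew letters (tet, yod, vav), written as escapes so the
# encoding of the source file itself plays no role.
text = u"\u05d8\u05d9\u05d5"

# Same text, two different byte sequences.
ansi_bytes = text.encode("cp1255")  # the "ANSI code page" on Hebrew Windows
oem_bytes = text.encode("cp856")    # the "OEM code page" on Hebrew Windows

print(repr(ansi_bytes))
print(repr(oem_bytes))
```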

> My question is, is there a way I can deduce cp856 and cp1255 from the
> string itself?

That's not possible. You have to know where the string comes from
to know what the encoding is.
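
A quick way to see why: the bytes themselves decode without any error under both code pages, they just come out as different text. The following is only an illustration using the byte sequence quoted above; nothing in the data says which reading is the intended one.

```python
data = b"\x88\x89\x85"

# Both decodes succeed; neither codec can tell you it is the wrong one.
as_oem = data.decode("cp856")    # read as the OEM code page
as_ansi = data.decode("cp1255")  # read as the ANSI code page

print(repr(as_oem))
print(repr(as_ansi))
print(as_oem == as_ansi)
```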

In the specific case, if the string comes out of MS SQL, it apparently
has cp856 (but I'm sure you can specify the client encoding somewhere
in SQL Server, or in pymssql).
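
Once you know the source, the usual approach is to decode to Unicode right where the data enters the program, and encode again only when some consumer needs a specific code page. A minimal sketch, assuming the driver really hands back cp856 byte strings; the helper name and the sample row are made up for illustration and are not part of pymssql:

```python
def db_bytes_to_unicode(raw, source_encoding="cp856"):
    """Decode a byte string whose origin, and hence encoding, is known.

    The cp856 default is an assumption about this particular setup.
    """
    return raw.decode(source_encoding)

# Hypothetical row, standing in for what cur.fetchall() might return.
row = (b"\x88\x89\x85",)

name = db_bytes_to_unicode(row[0])  # Unicode text from here on
ansi = name.encode("cp1255")        # only if a consumer needs ANSI bytes

print(repr(name))
print(repr(ansi))
```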

> I don't know how IDLE guessed cp856, but it must have done it.

I don't know why you think it did. You said you entered \xe9 directly
into the source code in IDLE, so
a) this is windows-1255, not cp856, and
b) IDLE just *used* windows-1255 (i.e. the ANSI code page), it did
   not guess it.

If you are claiming that the program

import pymssql

con = pymssql.connect(host='192.168.13.122', user='sa', password='', database='tempdb')
cur = con.cursor()
cur.execute('select firstname, lastname from [users]')
lines = cur.fetchall()
print repr(lines[0])

does different things depending on whether it is run in IDLE or in a
terminal window - I find that hard to believe. IDLE/Tk has nothing to
do with that. It's the *repr* that you are printing, i.e. all escaping
has been done before IDLE/Tk even sees the text. So it must have been
pymssql that returns different data in each case.
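
To illustrate the point about repr: the repr of a byte string consists of plain ASCII characters only, so whatever displays it has no encoding decision left to make. A small sketch, with the byte sequence from above standing in for a value returned by the database:

```python
data = b"\x88\x89\x85"

# repr() turns every non-ASCII byte into a \xNN escape before
# anything is handed to the output window.
shown = repr(data)

print(shown)
print(all(ord(ch) < 128 for ch in shown))
```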

It could be that the DB-API does such things, see

http://msdn2.microsoft.com/en-us/library/aa937147(SQL.80).aspx

Apparently, they do the OEMtoANSI conversion when you run a console
application (i.e. python.exe), whereas they don't convert when running
a GUI application (pythonw.exe).

I'm not quite sure how they find out whether the program is a console
application or not; the easiest thing to do might be to turn the
autoconversion off on the server.


Regards,
Martin


