handling unicode data

Wed Jul 5 07:14:21 EDT 2006

Martin v. Löwis wrote:
> Filipe wrote:
> > term = row[1]
> > print repr(term)
> >
> > output I got in Pyscripter's interpreter window:
> > 'Fran\x87a'
> >
> > output I got in the command line:
> > 'Fran\xd8a'
> >
> > I'd expect "print" to behave differently according to the console's
> > encoding, but does this mean this happens with repr() too?
>
> repr always generates ASCII bytes. They are not affected by the
> console's encoding. If you get different output, it really means
> that the values are different (check ord(row[1][4]) to be sure)

They do, in fact, output different values. The value output by
pyscripter was "135" (0x87), while the value output in the command line
was "216" (0xd8). I can't understand why, though, because the script
being run is precisely the same in both environments.
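
As a sanity check on those numbers, here is a minimal sketch of how the two repr() strings map to the byte values 135 and 216. It is written in Python 3 syntax (the script below is Python 2), and the names pyscripter_bytes, console_bytes and fifth_byte are made up for illustration; the byte strings are simply copied from the repr() output quoted above.

```python
# Byte strings copied from the two repr() outputs quoted above.
pyscripter_bytes = b"Fran\x87a"   # as shown in the pyscripter window
console_bytes = b"Fran\xd8a"      # as shown in the command line


def fifth_byte(raw):
    """Return the integer value of the byte at index 4.

    In Python 3, indexing a bytes object already yields an int;
    the Python 2 equivalent is ord(raw[4]).
    """
    return raw[4]


if __name__ == "__main__":
    for raw in (pyscripter_bytes, console_bytes):
        value = fifth_byte(raw)
        # repr() contains only ASCII characters, so it prints the same
        # way whatever encoding the console uses.
        print(repr(raw), value, hex(value))
```

If the two reprs differ, the underlying bytes differ, which is what the ord() check confirms.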

> What is the precise sequence of statements that you used to
> set the "row" variable?

The complete script follows:
-----------------------------------------------------------------------
import sys
import locale
print getattr(sys.stdout,'encoding',None)
print locale.getdefaultlocale()[1]
print sys.getdefaultencoding()

import pymssql
mssqlConnection = pymssql.connect(host='localhost', user='sa',
                                  password='password', database='TestDB')
cur = mssqlConnection.cursor()
query="Select ID, Term from TestTable where ID = 204;"
cur.execute(query)
row = cur.fetchone()
results = []
while row is not None:
    term = unicode(row[1], "cp850")
    print ord(row[1][4])
    print ord(term[4])
    print term
    results.append(term)
    row = cur.fetchone()
cur.close()
mssqlConnection.close()
print results
-----------------------------------------------------------------------

The values output were, in pyscripter:
None
cp1252
ascii
135
231
França
[u'Fran\xe7a']

and in the command line:
cp850
cp1252
ascii
216
207
FranÏa
[u'Fran\xcfa']
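
For reference, the decoded values in both runs are what cp850 gives for the two different raw bytes. The sketch below reproduces that in Python 3 syntax (the script above is Python 2). The names RAW_PYSCRIPTER, RAW_CONSOLE and decode_report are hypothetical, and the byte strings are assumed from the ord() values printed above rather than read from the database.

```python
# Raw byte strings assumed from the ord() values printed above.
RAW_PYSCRIPTER = b"Fran\x87a"   # byte 135 at index 4
RAW_CONSOLE = b"Fran\xd8a"      # byte 216 at index 4


def decode_report(raw, encoding="cp850"):
    """Decode raw bytes and return (text, code point of the fifth character)."""
    text = raw.decode(encoding)
    return text, ord(text[4])


if __name__ == "__main__":
    for label, raw in (("pyscripter", RAW_PYSCRIPTER), ("console", RAW_CONSOLE)):
        text, code_point = decode_report(raw)
        # ascii() keeps the output printable on any console encoding.
        print(label, raw[4], code_point, ascii(text))
```

Byte 0x87 is the cp850 encoding of "ç", so only the bytes received in pyscripter decode to the expected word. The bytes received in the command line decode to "Ï" under the same codec, which suggests that they already differ before unicode() is called.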

regards,
Filipe