handling unicode data

Filipe fcorreia at gmail.com
Tue Jul 4 10:43:06 EDT 2006


Martin v. Löwis wrote:
> Filipe wrote:
> > ---- output -------------------------------------------
> > u'Fran\xd8a'
> > FranØa
> > --------------------------------------------------------
> >
> > What do you think? Might it be Pymssql doing something wrong?
>
> I think the data in your database is already wrong. Are you
> sure the value in question is really "França" in the database?
>

Yes, I'm pretty sure. There's an application that was built to run on
top of this database, and it correctly reads and writes data to the DB. I
also used SQL Server's Query Analyzer to select the data, and it
displayed fine.

I've done some more tests and I think I'm very close to finding what
the problem is. The tests I had done before were executed from the
Windows command line. I tried printing the following (row[1] is a value
I selected from the database) in two distinct environments: from within
an IDE (Pyscripter) and from the command line:

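import sys
import locale
# encodings the interpreter reports for this environment
print getattr(sys.stdout,'encoding',None)
print locale.getdefaultlocale()[1]
print sys.getdefaultencoding()
# a byte string literal, for comparison
term = "Fran\x87a"
print repr(term)
# row is a row fetched from the database through pymssql (query not shown)
term = row[1]
print repr(term)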

output I got in Pyscripter's interpreter window:
None
cp1252
ascii
'Fran\x87a'
'Fran\x87a'

output I got in the command line:
cp1252
cp1252
ascii
'Fran\x87a'
'Fran\xd8a'
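
As a quick check of what these two byte values stand for, here is a
minimal sketch. It assumes cp850 (a common OEM code page for Western
European consoles) and cp1252 are the code pages in play, which the
output above does not confirm. The variable names are only for
illustration, and the b"" prefix needs Python 2.6 or later.

```python
# Bytes taken from the two repr() outputs above.
raw_ide = b"Fran\x87a"      # value seen in Pyscripter
raw_console = b"Fran\xd8a"  # value seen on the command line

# Assumption: the first value is cp850 data, the second cp1252 data.
as_cp850 = raw_ide.decode("cp850")        # 0x87 -> U+00E7 (c with cedilla)
as_cp1252 = raw_console.decode("cp1252")  # 0xD8 -> U+00D8 (O with stroke)

# Print the code point of the fifth character instead of the character
# itself, so the result does not depend on the console encoding.
print(hex(ord(as_cp850[4])))
print(hex(ord(as_cp1252[4])))
```

Under that assumption the first value reads as "França" and the second
as "FranØa", which matches the output I posted earlier.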

I'd expect "print" to behave differently according to the console's
encoding, but does this mean repr() is affected too? If so, in which
way?
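
On the repr() question, a minimal sketch of what I would check. It
assumes repr() of a byte string only escapes the non-ASCII bytes as \xNN
and never looks at sys.stdout.encoding (again the b"" prefix needs
Python 2.6 or later; on older versions a plain "..." literal is the
same thing).

```python
import sys

term = b"Fran\x87a"   # same bytes as the literal in the test above
shown = repr(term)    # escaped description of the bytes

# Reported encoding of stdout; may be None when stdout is not a console.
print(getattr(sys.stdout, "encoding", None))
print(shown)

# Every character of the repr() result is plain ASCII, so there is
# nothing left for a console encoding to reinterpret.
print(all(ord(ch) < 128 for ch in shown))
```

If that holds, the two different repr() results above would mean the two
runs really received different bytes from the database driver, not that
the console displayed the same bytes differently.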

thanks,
Filipe



