the unicode saga continues...

Ethan Furman ethan at stoneleaf.us
Sat Nov 14 00:33:53 EST 2009


So I've added Unicode support to my dbf package, but I also have some 
rather large programs that aren't ready to make the switch yet.  So 
as a workaround I added a (rather lame) option to re-encode the 
unicode data that was decoded from the dbf table back into a byte 
string in a chosen encoding.

Here's the fun part:  in figuring out what the option should be for use 
with my system, I tried some tests...

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit 
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> print u'\xed'
í
 >>> print u'\xed'.encode('cp437')
í
 >>> print u'\xed'.encode('cp850')
í
 >>> print u'\xed'.encode('cp1252')
φ
 >>> import locale
 >>> locale.getdefaultlocale()
('en_US', 'cp1252')

My confusion is the discrepancy between my apparent code page (cp1252) 
and the character u'\xed', which is absolutely an i with an accent; yet 
when I encode it with cp1252 and print it, I get an o with a line.
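
One way to narrow this down is to look at the raw bytes each encoding 
produces, rather than at whatever the console chooses to draw for them. 
The following is only a sketch, written in Python 3 syntax (the session 
above is 2.5); show_bytes is a hypothetical helper name, and the idea 
that the console renders bytes with cp437 instead of the cp1252 that 
locale reports is an assumption, not something the session confirms.

```python
import unicodedata

# Hypothetical helper (not part of any library): hex of the bytes that
# each codec produces for a single character.
def show_bytes(ch, codecs=("cp437", "cp850", "cp1252")):
    return {name: ch.encode(name).hex() for name in codecs}

encoded = show_bytes("\xed")
print(encoded)

# Assumption: the console draws bytes using cp437, whatever the locale says.
# Decode the cp1252 byte as cp437 to see which glyph that would produce.
as_cp437 = b"\xed".decode("cp437")
print(unicodedata.name(as_cp437))
```

If the assumption holds, cp437 and cp850 both yield byte a1 while cp1252 
yields byte ed, and byte ed read as cp437 is GREEK SMALL LETTER PHI, 
which would match the odd character printed above.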

Can anybody clue me in to what's going on here?

~Ethan~


