Puzzled by code pages

Adam Tauno Williams awilliam at whitemice.org
Fri May 14 20:27:18 EDT 2010


I'm trying to process OpenStep plist files in Python.  I have a parser
that works, but only for strict ASCII.  However, plist files may contain
accented characters, equivalent to ISO-8859-2 (I believe).  For example,
I read in this line:

>>> handle = open('file.txt', 'rb')
>>> data = handle.read()
>>> handle.close()
>>> data
'    "skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield" = NSFileName;\n'

What is the correct way to decode this data into Unicode strings so I
can work with them, and then write the output back as ISO-8859-?
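Going only by the bytes above, `\xc3\xa0` is exactly the two-byte UTF-8 encoding of à (U+00E0), which suggests the file may be UTF-8 rather than ISO-8859-2. Assuming that, a minimal Python 3 sketch decodes with UTF-8 to get Unicode text and encodes only on output. It writes ISO-8859-1 (Latin-1) rather than ISO-8859-2, because ISO-8859-2 has no code point for à. The helper name `convert`, its default encodings, and the scratch file names are hypothetical.

```python
import io
import os
import tempfile


def convert(source, target, src_encoding='utf-8', dst_encoding='iso8859-1'):
    """Decode `source` into Unicode text, then re-encode it into `target`.

    Hypothetical helper; both encodings are assumptions about the plist data.
    """
    # newline='' hands back the original line endings untranslated.
    with io.open(source, 'r', encoding=src_encoding, newline='') as handle:
        text = handle.read()
    # With the default errors='strict', a character the target code page
    # cannot represent raises UnicodeEncodeError instead of being lost.
    with io.open(target, 'w', encoding=dst_encoding, newline='') as handle:
        handle.write(text)
    return text


# The line from the post, written as raw bytes into a scratch directory.
sample = (b'    "skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield"'
          b' = NSFileName;\n')
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file.txt')
dst = os.path.join(workdir, 'file-latin1.txt')
with open(src, 'wb') as handle:
    handle.write(sample)

text = convert(src, dst)
with open(dst, 'rb') as handle:
    latin1_bytes = handle.read()

print(repr(text))          # the accented letter is now one character, U+00E0
print(repr(latin1_bytes))  # and one byte, 0xe0, in the Latin-1 output

# ISO-8859-2 has no U+00E0, so encoding to it fails loudly.
try:
    '\u00e0'.encode('iso8859-2')
except UnicodeEncodeError as exc:
    print('iso8859-2 cannot encode it:', exc.reason)
```

In the Python 2 session shown in the post, the equivalent steps are `data.decode('utf-8')` and `text.encode('iso8859-1')`; `io.open` with an `encoding` argument is also available in Python 2.6 and later.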

I can read the file with codecs as ISO-8859-2, but the result still
doesn't look correct:

>>> import codecs
>>> handle = codecs.open('file.txt', 'rb', encoding='iso8859-2')
>>> data = handle.read()
>>> handle.close()
>>> data
u'    "skyp4_filelist_10201/localit\u0102\xa0 termali_sortfield" = NSFileName;\n'
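The `\u0102\xa0` pair is what ISO-8859-2 makes of the two bytes `\xc3\xa0`: in that code page 0xC3 is Ă (U+0102) and 0xA0 is a no-break space, whereas UTF-8 reads the same pair as the single character à (U+00E0). A short Python 3 check, assuming those two bytes are the ones that follow `localit` in the file:

```python
# The two bytes that follow "localit" in the file.
raw = b'\xc3\xa0'

as_latin2 = raw.decode('iso8859-2')  # two characters: U+0102 and U+00A0
as_utf8 = raw.decode('utf-8')        # one character: U+00E0

print([hex(ord(ch)) for ch in as_latin2])  # -> ['0x102', '0xa0']
print([hex(ord(ch)) for ch in as_utf8])    # -> ['0xe0']

# A wrong single-byte decode can be undone by re-encoding with the same code
# page and decoding with the right one; it works here because both bytes are
# defined in ISO-8859-2, so the round trip is lossless.
repaired = as_latin2.encode('iso8859-2').decode('utf-8')
print(repaired == as_utf8)  # -> True
```

The repair line only undoes this particular mix-up; decoding the file as UTF-8 in the first place avoids it.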




