encoding problem CP850 to ISO_8859_9
v.wehren
v.wehren at home.nl
Thu May 23 08:27:51 EDT 2002
When trying to convert the encoding of a file originally in CP850 to
ISO_8859_9 (Latin9), some mappings cause a UnicodeError, which is
rather unexpected, especially since the characters are available in both
encodings (unlike most of the "box drawing" stuff and so on, where a "maps
to <undefined>" is logical). The offending characters (plus the values they
should eventually be mapped to) are:
0xD0    # 0xD0 in CP850 should map to 0xF0 = LATIN SMALL LETTER ETH in Latin9
0xD1    # 0xD1 in CP850 should map to 0xD0 = LATIN CAPITAL LETTER ETH in Latin9
0xE7    # 0xE7 in CP850 should map to 0xFE = LATIN SMALL LETTER THORN in Latin9
0xE8    # 0xE8 in CP850 should map to 0xDE = LATIN CAPITAL LETTER THORN in Latin9
0xEC    # 0xEC in CP850 should map to 0xFD = LATIN SMALL LETTER Y WITH ACUTE in Latin9
0xED    # 0xED in CP850 should map to 0xDD = LATIN CAPITAL LETTER Y WITH ACUTE in Latin9
Each of these values raises an unexpected UnicodeError:
>>> oem2iso = unicode('\xD0', 'CP850')
>>> oem = unicode('\xD0', 'CP850')
>>> iso = oem.encode('iso_8859_9')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in ?
    iso = oem.encode('iso_8859_9')
  File "C:\PYTHON22\lib\encodings\iso8859_9.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: charmap encoding error: character maps to <undefined>
Should the encoding_map be enhanced? Or am I missing something...
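One way to narrow this down is to check, character by character, which target codecs can encode the six values at all. The following is only a sketch in present-day Python 3 (the session above is Python 2.2, where unicode() did what bytes.decode() does now). The helper names encodable and survey are made up for illustration, and the choice of latin_1 and iso8859_15 as comparison codecs is my own assumption, not something the codec documentation suggests.

```python
import unicodedata

# The six CP850 byte values that trigger the error.
CP850_BYTES = [0xD0, 0xD1, 0xE7, 0xE8, 0xEC, 0xED]

# Target codecs to compare (assumption: these three are the interesting ones).
TARGETS = ["latin_1", "iso8859_9", "iso8859_15"]


def encodable(char, codec):
    """Return the encoded byte value, or None if the codec has no mapping."""
    try:
        return char.encode(codec)[0]
    except UnicodeEncodeError:
        return None


def survey(byte_values=CP850_BYTES, targets=TARGETS):
    """Decode each byte from CP850 and try to encode it with every target."""
    rows = []
    for value in byte_values:
        char = bytes([value]).decode("cp850")
        results = {codec: encodable(char, codec) for codec in targets}
        rows.append((value, char, results))
    return rows


if __name__ == "__main__":
    for value, char, results in survey():
        name = unicodedata.name(char)
        cells = ", ".join(
            "%s=%s" % (codec, "n/a" if byte is None else "0x%02X" % byte)
            for codec, byte in results.items()
        )
        print("CP850 0x%02X  U+%04X %s: %s" % (value, ord(char), name, cells))
```

If the codec tables follow the published ISO 8859 parts, I would expect iso8859_9 to report "n/a" for all six characters, while latin_1 and iso8859_15 give the target values listed above. ISO 8859-9 is usually labelled Latin-5 and assigns those positions to Turkish letters, whereas the name Latin-9 normally refers to ISO 8859-15, so the label used above may be part of the problem.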
Regards..
vincent wehren
(vincent at visualtrans.de)