encoding problem CP850 to ISO_8859_9

v.wehren v.wehren at home.nl
Thu May 23 08:27:51 EDT 2002


When trying to convert the encoding of a file originally in CP850 to
ISO_8859_9 (Latin9), some mappings cause a UnicodeError. This is rather
unexpected, especially since the characters are available in both
encodings (unlike most of the "box drawing" characters and the like, where
"maps to <undefined>" is logical). The offending characters (plus the
values they should eventually be mapped to) are:

             CP850   Latin9   Character
             0xD0    0xF0     LATIN SMALL LETTER ETH
             0xD1    0xD0     LATIN CAPITAL LETTER ETH
             0xE7    0xFE     LATIN SMALL LETTER THORN
             0xE8    0xDE     LATIN CAPITAL LETTER THORN
             0xEC    0xFD     LATIN SMALL LETTER Y WITH ACUTE
             0xED    0xDD     LATIN CAPITAL LETTER Y WITH ACUTE
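To double-check the CP850 side of this table, a short sketch (written for
Python 3, whereas the session below used Python 2.2) can decode each byte
and look up the Unicode character name with the standard unicodedata
module. The helper name cp850_names is only illustrative.

```python
import unicodedata

# The six CP850 bytes listed above
CP850_BYTES = [0xD0, 0xD1, 0xE7, 0xE8, 0xEC, 0xED]

def cp850_names(codes):
    """Decode each CP850 byte; return (byte, code point, Unicode name)."""
    rows = []
    for b in codes:
        ch = bytes([b]).decode("cp850")  # single byte -> single character
        rows.append((b, ord(ch), unicodedata.name(ch)))
    return rows

for b, cp, name in cp850_names(CP850_BYTES):
    print(f"0x{b:02X} -> U+{cp:04X} {name}")
```

For these six characters the Unicode code point happens to equal the byte
value listed in the Latin9 column.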

Each of these values raises an unexpected UnicodeError, for example:

>>> oem2iso = unicode('\xD0', 'CP850')
>>> oem = unicode('\xD0', 'CP850')
>>> iso = oem.encode('iso_8859_9')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in ?
    iso = oem.encode('iso_8859_9')
  File "C:\PYTHON22\lib\encodings\iso8859_9.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: charmap encoding error: character maps to <undefined>
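One way to narrow this down is to try each decoded character against more
than one target codec. The sketch below assumes Python 3 and its standard
codec names iso8859_9 and iso8859_15; the helper try_encode is
hypothetical. Note that ISO 8859-9 is Latin-5 (Turkish), while Latin-9 is
ISO 8859-15. In ISO 8859-9 the slots 0xD0, 0xDD, 0xDE, 0xF0, 0xFD and
0xFE hold the Turkish letters G/g with breve, capital I with dot, dotless i
and S/s with cedilla instead of the Icelandic letters, so a charmap error
is what that codec would be expected to report here.

```python
# The six CP850 bytes listed above
CP850_BYTES = [0xD0, 0xD1, 0xE7, 0xE8, 0xEC, 0xED]

def try_encode(codes, target):
    """Decode each CP850 byte, then encode the character into `target`.

    Returns a dict mapping each CP850 byte to the target byte, or to None
    when the target codec has no slot for that character.
    """
    result = {}
    for b in codes:
        ch = bytes([b]).decode("cp850")
        try:
            result[b] = ch.encode(target)[0]
        except UnicodeEncodeError:  # Python 3 subclass of UnicodeError
            result[b] = None
    return result

for codec in ("iso8859_9", "iso8859_15"):
    for src, dst in try_encode(CP850_BYTES, codec).items():
        target = "undefined" if dst is None else f"0x{dst:02X}"
        print(f"{codec}: CP850 0x{src:02X} -> {target}")
```

In Python 3, iso8859_15 can also be spelled latin9, and iso8859_9 can be
spelled latin5.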

Should the encoding_map be enhanced, or am I missing something?

Regards,

vincent wehren
(vincent at visualtrans.de)

More information about the Python-list mailing list