Python 3.0b2 cannot map '\u12b'

Terry Reedy tjreedy at udel.edu
Mon Sep 1 02:27:54 EDT 2008



Tim Roberts wrote:
> josh logan <dear.jay.logan at gmail.com> wrote:
>> I am using Python 3.0b2.
>> I have an XML file that has the unicode character '\u012b' in it,
>> which, when parsed, causes a UnicodeEncodeError:
>>
>> 'charmap' codec can't encode character '\u012b' in position 26:
>> character maps to <undefined>
>>
>> This happens even when I assign this character to a reference in the
>> interpreter:
>>
>> Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> s = '\u012b'
>>>>> s
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "C:\Python30\lib\io.py", line 1428, in write
>>    b = encoder.encode(s)
>>  File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
>>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
>> UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
>>
>> Is this a known issue, or am I doing something wrong?
> 
> Both.  U+012B is the Latin lower-case i with macron (i with a bar instead
> of a dot).  That character does not exist in the 8-bit character set CP437,
> which you are trying to use.
> 
> If you choose an 8-bit character set that includes i-with-macron, then it
> will work.  UTF-8 would be a good choice.  It's in ISO-8859-10.

I doubt the OP 'chose' cp437.  Why is Python using cp437 even when the 
default encoding is utf-8?

On WinXP
 >>> import sys
 >>> sys.getdefaultencoding()
'utf-8'
 >>> s='\u012b'
 >>> s
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Program Files\Python30\lib\io.py", line 1428, in write
     b = encoder.encode(s)
   File "C:\Program Files\Python30\lib\encodings\cp437.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
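
The traceback suggests the console writer uses its own encoding, separate 
from what sys.getdefaultencoding() reports.  A minimal sketch of that 
distinction, assuming a current Python 3 (not checked against 3.0b2); the 
helper name can_encode is mine, not part of any library:

```python
import sys

ch = '\u012b'  # LATIN SMALL LETTER I WITH MACRON

def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# cp437 is the codec named in the traceback; ISO-8859-10 and UTF-8 are
# the alternatives mentioned earlier in the thread.
results = {enc: can_encode(ch, enc) for enc in ('cp437', 'iso8859_10', 'utf-8')}
print(results)

# These two values are independent: the first is the default for
# str.encode()/bytes.decode(), the second is what writes to the
# console actually go through.
print('default encoding:', sys.getdefaultencoding())
print('stdout encoding: ', sys.stdout.encoding)
```

If the last line prints cp437 (or another codec without i-macron) while 
the default encoding is utf-8, that would match the behaviour shown above.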

To put it another way, how can one 'choose' utf-8 for display to screen?
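
One possible workaround, sketched here under assumptions rather than 
offered as a fix: wrap the binary output stream in a text writer whose 
error handler escapes unencodable characters instead of raising.  The 
helper name make_writer is hypothetical, and io.BytesIO stands in for a 
cp437 console so the result can be inspected.  This does not make the 
console display the glyph; it only avoids the exception.

```python
import io

def make_writer(binary_stream, encoding, errors='backslashreplace'):
    """Wrap a binary stream in a text writer that escapes characters
    the target encoding cannot represent (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors=errors, newline='\n')

# Stand-in for a console limited to cp437.
fake_console = io.BytesIO()
out = make_writer(fake_console, 'cp437')
out.write('i macron: \u012b\n')
out.flush()

shown = fake_console.getvalue()
print(shown)

# ascii() gives a similar escaped form without touching the stream.
escaped = ascii('\u012b')
print(escaped)
```

On a real console, the same idea would presumably be 
make_writer(sys.stdout.buffer, sys.stdout.encoding), assuming 
sys.stdout.buffer is available in the running interpreter.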

Using IDLE, display works fine.

IDLE 3.0b2
 >>> s='\u012b'
 >>> s
'ī' # i macron
 >>> import sys
 >>> sys.getdefaultencoding()
'utf-8'

I ran across this in a different context and mentioned it on the bug 
tracker, but the Windows interpreter seems broken here.

I will send this in UTF-8 so the i-macron will hopefully show up.

tjr



