Problem with Unicode char in Python 3.3.0

Terry Reedy tjreedy at udel.edu
Tue Jan 8 03:40:47 EST 2013


On 1/7/2013 8:12 AM, Terry Reedy wrote:
> On 1/7/2013 7:57 AM, Franck Ditter wrote:
>
>> <<< print('\U0001d11e')
>> Traceback (most recent call last):
>>    File "<pyshell#1>", line 1, in <module>
>>      print('\U0001d11e')
>> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
>> in position 0: Non-BMP character not supported in Tk
>
> The message comes from printing to a tk text widget (the IDLE shell),
> not from creating the 1 char string. c = '\U0001d11e' works fine. When
> you have problems with creating and printing unicode, *separate*
> creating from printing to see where the problem is. (I do not know if
> the brand new Tcl/Tk 8.6 is any better.)
>
> The windows console also chokes, but with a different message.
>
>  >>> c='\U0001d11e'
>  >>> print(c)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
>      return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
> in position 0: character maps to <undefined>
>
> Yes, this is very annoying, especially in Win 7.
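A minimal sketch of the "separate creating from printing" advice above. It creates the string first, then checks separately whether a given output encoding can represent it. The helper name `diagnose` is hypothetical, and `'cp437'` stands in for the Windows console code page seen in the traceback. This does not reproduce the IDLE/Tk error, which is raised inside Tk rather than by a Python codec. The f-string requires Python 3.6 or later.

```python
import unicodedata

# Step 1: create the string. No output device is involved, so this step
# cannot raise UnicodeEncodeError.
c = '\U0001d11e'
print(len(c), unicodedata.name(c))  # 1 MUSICAL SYMBOL G CLEF (on 3.3+)

# Step 2: "printing" is really encoding to whatever the output device accepts.
def diagnose(text, encoding):
    """Hypothetical helper: encode text for an output device.

    Reports the first character the encoding cannot handle, then falls back
    to backslash escapes instead of raising UnicodeEncodeError.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        bad = exc.object[exc.start:exc.end]
        # ascii() keeps the message itself printable on any console.
        print(f"{encoding} cannot encode {ascii(bad)}")
        return text.encode(encoding, errors="backslashreplace")

print(diagnose(c, 'utf-8'))   # b'\xf0\x9d\x84\x9e'
print(diagnose(c, 'cp437'))   # reports the failure, then returns escaped bytes
```

Separating the steps this way shows that the string itself is fine. Only the final encode step, performed by the console or the GUI, fails.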

The above is in 3.3, in which '\U0001d11e' is actually translated to a
length-1 string. In 3.2 and earlier, on narrow builds (such as the Windows
builds), that literal is translated to a length-2 string: a surrogate pair
whose two code units each lie in the BMP.
On printing, the pair of surrogates got translated to the square box used
for all characters for which the font has no glyph: 𝄞. When cut and
pasted, it shows in this mail composer as a weird music sign with
peculiar behavior. Below are 3 hyphens, 3 spaces, the pasted character,
3 spaces, and 3 hyphens, though the character may disappear:
---   𝄞   ---
So 3.3 is the first Windows version to get the UnicodeEncodeError on 
printing.
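A sketch of the length difference described above. In 3.3 and later (PEP 393, flexible string representation) the string holds one code point. A narrow 3.2-or-earlier build instead stored the two UTF-16 code units computed below with the standard surrogate formula. The helper `utf16_code_units` is hypothetical and only reproduces that arithmetic. Run it on 3.3 or later.

```python
def utf16_code_units(ch):
    """Hypothetical helper: UTF-16 code units a narrow build would store for ch."""
    cp = ord(ch)
    if cp < 0x10000:                 # BMP character: a single code unit
        return [cp]
    cp -= 0x10000                    # 20 bits remain
    high = 0xD800 + (cp >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return [high, low]

c = '\U0001d11e'
units = utf16_code_units(c)
print(len(c), [hex(u) for u in units])   # 1 ['0xd834', '0xdd1e']

# Cross-check against the built-in UTF-16 codec (big-endian, no BOM).
assert c.encode('utf-16-be') == b''.join(u.to_bytes(2, 'big') for u in units)
```

On a narrow build, `len('\U0001d11e')` reported 2 because it counted these two code units rather than one character.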

-- 
Terry Jan Reedy




