Newbie question about text encoding

Terry Reedy tjreedy at udel.edu
Sat Mar 7 01:11:01 EST 2015


On 3/6/2015 11:20 AM, Rustom Mody wrote:

> =========
> pp = "💩"
> print (pp)
> =========
> Try open it in idle3 and you get (at least I get):
>
> $ idle3 ff.py
> Traceback (most recent call last):
>    File "/usr/bin/idle3", line 5, in <module>
>      main()
>    File "/usr/lib/python3.4/idlelib/PyShell.py", line 1562, in main
>      if flist.open(filename) is None:
>    File "/usr/lib/python3.4/idlelib/FileList.py", line 36, in open
>      edit = self.EditorWindow(self, filename, key)
>    File "/usr/lib/python3.4/idlelib/PyShell.py", line 126, in __init__
>      EditorWindow.__init__(self, *args)
>    File "/usr/lib/python3.4/idlelib/EditorWindow.py", line 294, in __init__
>      if io.loadfile(filename):
>    File "/usr/lib/python3.4/idlelib/IOBinding.py", line 236, in loadfile
>      self.text.insert("1.0", chars)
>    File "/usr/lib/python3.4/idlelib/Percolator.py", line 25, in insert
>      self.top.insert(index, chars, tags)
>    File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 81, in insert
>      self.addcmd(InsertCommand(index, chars, tags))
>    File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 116, in addcmd
>      cmd.do(self.delegate)
>    File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 219, in do
>      text.insert(self.index1, self.chars, self.tags)
>    File "/usr/lib/python3.4/idlelib/ColorDelegator.py", line 82, in insert
>      self.delegate.insert(index, chars, tags)
>    File "/usr/lib/python3.4/idlelib/WidgetRedirector.py", line 148, in __call__
>      return self.tk_call(self.orig_and_operation + args)
> _tkinter.TclError: character U+1f4a9 is above the range (U+0000-U+FFFF) allowed by Tcl
>
> So who/what is broken?

Tcl.  A possible workaround is for Idle to translate "💩" to
"\U0001f4a9" (10 chars) before sending it to tk.
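
A minimal sketch of that translation, assuming only that any character above U+FFFF should be replaced by its 10-character \U escape. The name escape_non_bmp is hypothetical and is not part of idlelib; this is an illustration of the idea, not what Idle actually does.

```python
def escape_non_bmp(text):
    """Replace each character above U+FFFF with its \\UXXXXXXXX escape.

    Hypothetical helper, not part of idlelib.  BMP characters pass
    through unchanged; everything else becomes 10 ASCII characters.
    """
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in text
    )
```

With this, the file content pp = "💩" would reach tk as pp = "\U0001f4a9", which stays inside the range Tcl accepts, at the cost of no longer showing the glyph itself.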

But some perspective.  In the console interpreter:

 >>> print("\U0001f4a9")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Programs\Python34\lib\encodings\cp437.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4a9' in position 0: character maps to <undefined>

So what is broken?  Windows Command Prompt.
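
The failure above comes from encoding the string to the console's code page, so it can be reproduced on any platform by encoding to cp437 directly. The sketch below assumes cp437 (the code page in the traceback) and uses a hypothetical helper name, console_bytes. It also shows that the standard "backslashreplace" error handler yields the same escaped form instead of raising.

```python
def console_bytes(text, encoding="cp437", errors="strict"):
    """Encode text as a console stream using this code page would.

    Hypothetical helper for illustration; cp437 is assumed because it
    is the code page named in the traceback.
    """
    return text.encode(encoding, errors)

# Strict encoding fails, as in the console session.
try:
    console_bytes("\U0001f4a9")
    outcome = "encoded"
except UnicodeEncodeError as exc:
    outcome = exc.reason

# An error handler that escapes instead of raising.
fallback = console_bytes("\U0001f4a9", errors="backslashreplace")
```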

More perspective.  tk/Idle *will* print *something* for every BMP char.
Command Prompt will not.  It does not even do UCS-2 correctly.  So
which is more broken?  Windows Command Prompt.  Who has perhaps
1,000,000 times more resources, Microsoft or the tcl/tk group?  I think
we all know.

-- 
Terry Jan Reedy




