Newbie question about text encoding

Rustom Mody rustompmody at gmail.com
Sun Mar 8 00:25:32 EST 2015


On Saturday, March 7, 2015 at 11:41:53 AM UTC+5:30, Terry Reedy wrote:
> On 3/6/2015 11:20 AM, Rustom Mody wrote:
> 
> > =========
> > pp = "💩"
> > print (pp)
> > =========
> > Try opening it in idle3 and you get (at least I get):
> >
> > $ idle3 ff.py
> > Traceback (most recent call last):
> >    File "/usr/bin/idle3", line 5, in <module>
> >      main()
> >    File "/usr/lib/python3.4/idlelib/PyShell.py", line 1562, in main
> >      if flist.open(filename) is None:
> >    File "/usr/lib/python3.4/idlelib/FileList.py", line 36, in open
> >      edit = self.EditorWindow(self, filename, key)
> >    File "/usr/lib/python3.4/idlelib/PyShell.py", line 126, in __init__
> >      EditorWindow.__init__(self, *args)
> >    File "/usr/lib/python3.4/idlelib/EditorWindow.py", line 294, in __init__
> >      if io.loadfile(filename):
> >    File "/usr/lib/python3.4/idlelib/IOBinding.py", line 236, in loadfile
> >      self.text.insert("1.0", chars)
> >    File "/usr/lib/python3.4/idlelib/Percolator.py", line 25, in insert
> >      self.top.insert(index, chars, tags)
> >    File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 81, in insert
> >      self.addcmd(InsertCommand(index, chars, tags))
> >    File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 116, in addcmd
> >      cmd.do(self.delegate)
> >    File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 219, in do
> >      text.insert(self.index1, self.chars, self.tags)
> >    File "/usr/lib/python3.4/idlelib/ColorDelegator.py", line 82, in insert
> >      self.delegate.insert(index, chars, tags)
> >    File "/usr/lib/python3.4/idlelib/WidgetRedirector.py", line 148, in __call__
> >      return self.tk_call(self.orig_and_operation + args)
> > _tkinter.TclError: character U+1f4a9 is above the range (U+0000-U+FFFF) allowed by Tcl
> >
> > So who/what is broken?
> 
> tcl
> The possible workaround is for Idle to translate "💩" to "\U0001f4a9" 
> (10 chars) before sending it to tk.
> 
> But some perspective.  In the console interpreter:
> 
>  >>> print("\U0001f4a9")
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "C:\Programs\Python34\lib\encodings\cp437.py", line 19, in encode
>      return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4a9' in position 0: character maps to <undefined>
> 
> So what is broken?  Windows Command Prompt.
> 
> More perspective.  tk/Idle *will* print *something* for every BMP char. 
>   Command Prompt will not.  It does not even do ucs-2 correctly. So 
> which is more broken?  Windows Command Prompt.  Who has perhaps 
> 1,000,000 times more resources, Microsoft? or the tcl/tk group?  I think 
> we all know.

Thanks Terry for the perspective.
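
For anyone following along, the workaround Terry mentions (turning "💩" into the 10-char "\U0001f4a9" before it reaches tk) might look roughly like the sketch below. The name escape_non_bmp is made up for illustration; it is not an existing Idle function, and where Idle would actually call such a thing is not something I know.

```python
def escape_non_bmp(text):
    """Replace every character above U+FFFF with its \\UXXXXXXXX escape.

    Hypothetical helper, not part of idlelib. BMP characters pass
    through unchanged, so ordinary source text is not affected.
    """
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in text
    )


if __name__ == "__main__":
    # The file from the earlier message, with the escape applied.
    print(escape_non_bmp('pp = "\U0001f4a9"'))
```

This assumes Python 3.3 or later, where one str element is always one code point.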

From my side:

No complaints about python or tcl (or idle -- it's actually neater than emacs,
if only emacs was not burnt into my nervous system)

Even unicode -- only marginal complaints.
I wrote http://blog.languager.org/2015/02/universal-unicode.html
precisely to say that unicode is a wonderful thing and one should be
enthusiastic about it.
[You got that better than anyone else who has spoken -- Thanks]

Xah's pages are way more comprehensive than mine.
But comprehensive can be a negative -- ultimately the unicode standard is
the most comprehensive and correspondingly impenetrable without a compass.

The only very minor complaint I would make is:
If idle is unable to deal with SMP chars (anything above U+FFFF), and this is
known and unlikely to change (until Tk changes), why not put up a dialog of the sort:
SMP char on line <nn>
SMP support currently unimplemented -- Sorry

instead of a backtrace?

[As I said just a suggestion]
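
To make the suggestion concrete, here is a minimal sketch of the check I have in mind, done on the file's text before it is handed to tk. The names find_non_bmp and non_bmp_message are invented for this example and are not idlelib APIs; how the message would actually be shown (dialog, status bar) is left out. Note it flags every char above U+FFFF, not only those in the SMP proper.

```python
def find_non_bmp(text):
    """Yield (line_number, column, char) for each character above U+FFFF.

    Line numbers start at 1, columns at 0. Hypothetical helper.
    """
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ord(ch) > 0xFFFF:
                yield lineno, col, ch


def non_bmp_message(text):
    """Return a short user-facing message for the first hit, or None."""
    for lineno, _col, ch in find_non_bmp(text):
        return ("SMP char U+{:04X} on line {}\n"
                "SMP support currently unimplemented -- Sorry"
                ).format(ord(ch), lineno)
    return None


if __name__ == "__main__":
    source = 'pp = "\U0001f4a9"\nprint (pp)\n'
    print(non_bmp_message(source))
```

An editor could run something like this right after reading the file and skip the insert into the text widget when the result is not None.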


