Newbie question about text encoding
Rustom Mody
rustompmody at gmail.com
Sun Mar 8 00:25:32 EST 2015
On Saturday, March 7, 2015 at 11:41:53 AM UTC+5:30, Terry Reedy wrote:
> On 3/6/2015 11:20 AM, Rustom Mody wrote:
>
> > =========
> > pp = "💩"
> > print (pp)
> > =========
> > Try open it in idle3 and you get (at least I get):
> >
> > $ idle3 ff.py
> > Traceback (most recent call last):
> > File "/usr/bin/idle3", line 5, in <module>
> > main()
> > File "/usr/lib/python3.4/idlelib/PyShell.py", line 1562, in main
> > if flist.open(filename) is None:
> > File "/usr/lib/python3.4/idlelib/FileList.py", line 36, in open
> > edit = self.EditorWindow(self, filename, key)
> > File "/usr/lib/python3.4/idlelib/PyShell.py", line 126, in __init__
> > EditorWindow.__init__(self, *args)
> > File "/usr/lib/python3.4/idlelib/EditorWindow.py", line 294, in __init__
> > if io.loadfile(filename):
> > File "/usr/lib/python3.4/idlelib/IOBinding.py", line 236, in loadfile
> > self.text.insert("1.0", chars)
> > File "/usr/lib/python3.4/idlelib/Percolator.py", line 25, in insert
> > self.top.insert(index, chars, tags)
> > File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 81, in insert
> > self.addcmd(InsertCommand(index, chars, tags))
> > File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 116, in addcmd
> > cmd.do(self.delegate)
> > File "/usr/lib/python3.4/idlelib/UndoDelegator.py", line 219, in do
> > text.insert(self.index1, self.chars, self.tags)
> > File "/usr/lib/python3.4/idlelib/ColorDelegator.py", line 82, in insert
> > self.delegate.insert(index, chars, tags)
> > File "/usr/lib/python3.4/idlelib/WidgetRedirector.py", line 148, in __call__
> > return self.tk_call(self.orig_and_operation + args)
> > _tkinter.TclError: character U+1f4a9 is above the range (U+0000-U+FFFF) allowed by Tcl
> >
> > So who/what is broken?
>
> tcl
> The possible workaround is for Idle to translate "💩" to "\U0001f4a9"
> (10 chars) before sending it to tk.
>
> But some perspective. In the console interpreter:
>
> >>> print("\U0001f4a9")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Programs\Python34\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4a9'
> in position 0: character maps to <undefined>
>
> So what is broken? Windows Command Prompt.
>
> More perspective. tk/Idle *will* print *something* for every BMP char.
> Command Prompt will not. It does not even do ucs-2 correctly. So
> which is more broken? Windows Command Prompt. Who has perhaps
> 1,000,000 times more resources, Microsoft? or the tcl/tk group? I think
> we all know.
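[For what it's worth, the workaround Terry describes could look roughly like the sketch below. The name escape_non_bmp is made up for illustration and is not part of Idle or tkinter; it only shows the translation step, not where Idle would hook it in.]

```python
def escape_non_bmp(text):
    """Replace every character above U+FFFF with its 10-char \\UXXXXXXXX escape.

    Hypothetical helper, not an Idle or tkinter function.
    """
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in text
    )


# The one-line file from the earlier post, written with an escape so that
# this snippet itself stays ASCII-only.
source = 'pp = "\U0001f4a9"'
print(escape_non_bmp(source))
```

BMP characters pass through untouched, so only the characters Tcl rejects are rewritten.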
Thanks Terry for the perspective.
From my side:
No complaints about python or tcl (or idle -- it's actually neater than emacs,
if only emacs were not burnt into my nervous system)
Even unicode -- only marginal complaints.
I wrote http://blog.languager.org/2015/02/universal-unicode.html
precisely to say that unicode is a wonderful thing and one should be
enthusiastic about it.
[You got that better than anyone else who has spoken -- Thanks]
Xah's pages are way more comprehensive than mine.
But comprehensive can be a negative -- ultimately the unicode standard is
the most comprehensive and correspondingly impenetrable without a compass.
The only very minor complaint I would make is:
If idle is unable to deal with SMP-chars and this is known and unlikely to change
(until TK changes), why not put up a dialog of the sort:
SMP char on line <nn>
SMP support currently unimplemented -- Sorry
instead of a backtrace?
[As I said just a suggestion]
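To make the suggestion concrete, here is a rough sketch of the check such a dialog would need. Everything in it is an assumption on my part: find_non_bmp_lines and format_warning are invented names, not idlelib functions, and the test is "above U+FFFF" (the limit in the Tcl error message) rather than the SMP alone.

```python
def find_non_bmp_lines(text):
    """Return (line_number, code_point) pairs for characters above U+FFFF.

    Hypothetical helper; line numbers start at 1.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 0xFFFF:
                hits.append((lineno, ord(ch)))
    return hits


def format_warning(hits):
    """Build the text of the proposed dialog from the hits found above."""
    lines = ["SMP char on line {} (U+{:04X})".format(n, cp) for n, cp in hits]
    lines.append("SMP support currently unimplemented -- Sorry")
    return "\n".join(lines)


sample = 'pp = "\U0001f4a9"\nprint (pp)\n'
hits = find_non_bmp_lines(sample)
if hits:
    print(format_warning(hits))
```

A real version would presumably run this before the text is handed to the Tk widget and show the message in a dialog instead of printing it.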