[issue1542677] IDLE shell gives different len() of unicode strings compared to Python shell

Santiago Gala report at bugs.python.org
Sun Apr 12 11:02:32 CEST 2009


Santiago Gala <sgala at apache.org> added the comment:

Updating the components, as the error surfaces in the compile builtin.
The compile builtin works when given unicode, but fails when given a
utf8 (local input encoding) string.

Rather than adding a "coding" string to compile, my guess is that
compile should be fixed or fed a unicode string. See the effects on the
shell:

>>> print len('à')
2
>>> print len(u'à')
1
>>> exec compile("print len('à')",'test', 'single')
2
>>> exec compile("print len(u'à')",'test', 'single')
2
>>> exec compile("print len('à')".decode("utf8"),'test', 'single')
2
>>> exec compile("print len(u'à')".decode("utf8"),'test', 'single')
1
>>> 

So the error disappears when the string fed to exec compile is properly
decoded to unicode.
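
The counts in the transcript can be reproduced outside of compile. The following is a minimal sketch in Python 3 syntax (the transcript is Python 2), and the variable names are only illustrative. It assumes, without having checked the compile source, that the result of 2 for the unicode literal comes from the two UTF-8 bytes being read one character per byte, as a Latin-1 decode would do.

```python
# Python 3 sketch; the transcript above is Python 2.
text = "à"                  # one code point, U+00E0
raw = text.encode("utf-8")  # two bytes: 0xC3 0xA0

# What len('à') counts for a Python 2 byte string: bytes.
byte_count = len(raw)

# What len(u'à') counts when the input was decoded properly: code points.
char_count = len(text)

# Assumed failure mode: the UTF-8 bytes are read one character per byte.
misread_count = len(raw.decode("latin-1"))

print(byte_count, char_count, misread_count)  # prints: 2 1 2
```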

In idlelib there is an attempt to encode the input to
IOBinding.encoding, but IOBinding.encoding is broken here, as
locale.nl_langinfo(locale.CODESET) gives 'ANSI_X3.4-1968', which looks
up as 'ascii', while locale.getpreferredencoding() gives 'UTF-8' (as it
should).
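
The lookup described above can be checked with the standard codecs and locale modules. This is a minimal sketch in Python 3 syntax. The helper can_encode is hypothetical, and the two locale values printed at the end depend on the machine and on whether setlocale has been called, so they may not match the ones quoted here.

```python
import codecs
import locale

# The name reported above is a registered alias of ASCII.
reported = "ANSI_X3.4-1968"
canonical = codecs.lookup(reported).name


def can_encode(text, encoding):
    """Hypothetical helper: True if text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeError:
        return False


ascii_ok = can_encode("à", reported)  # ASCII cannot represent 'à'
utf8_ok = can_encode("à", "UTF-8")

# What this machine reports; values depend on the environment.
if hasattr(locale, "nl_langinfo"):  # not available on every platform
    print("nl_langinfo(CODESET):", locale.nl_langinfo(locale.CODESET))
print("getpreferredencoding():", locale.getpreferredencoding())
print(canonical, ascii_ok, utf8_ok)
```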


If I comment out the whole attempt, IDLE works (for this test; not fully
tested):

sgala at marlow ~ $ diff -u /tmp/PyShell.py /usr/lib64/python2.6/idlelib/PyShell.py
--- /tmp/PyShell.py	2009-04-12 11:01:01.000000000 +0200
+++ /usr/lib64/python2.6/idlelib/PyShell.py	2009-04-12 10:59:16.000000000 +0200
@@ -592,14 +592,14 @@
         self.more = 0
         self.save_warnings_filters = warnings.filters[:]
         warnings.filterwarnings(action="error", category=SyntaxWarning)
-        if isinstance(source, types.UnicodeType):
-            import IOBinding
-            try:
-                source = source.encode(IOBinding.encoding)
-            except UnicodeError:
-                self.tkconsole.resetoutput()
-                self.write("Unsupported characters in input\n")
-                return
+        #if isinstance(source, types.UnicodeType):
+        #    import IOBinding
+        #    try:
+        #        source = source.encode(IOBinding.encoding)
+        #    except UnicodeError:
+        #        self.tkconsole.resetoutput()
+        #        self.write("Unsupported characters in input\n")
+        #        return
         try:
             # InteractiveInterpreter.runsource() calls its runcode() method,
             # which is overridden (see below)


>>> print len('á')
2
>>> print len(u'á')
1
>>> print 'á'
á
>>> print u'á'
á
>>> 
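
For reference, the effect of the block commented out in the diff can be imitated in isolation. This is a rough sketch in Python 3 syntax, not the idlelib code: encode_source is a hypothetical stand-in, and returning None stands in for the "Unsupported characters in input" branch.

```python
def encode_source(source, encoding):
    """Hypothetical stand-in for the commented-out block: encode or give up."""
    if isinstance(source, str):
        try:
            return source.encode(encoding)
        except UnicodeError:
            # The idlelib code writes "Unsupported characters in input"
            # and returns at this point.
            return None
    return source


line = "print(len('à'))"

# With an encoding that resolves to ASCII, the input is rejected.
as_ascii = encode_source(line, "ascii")

# With UTF-8, the compiler receives bytes, one byte longer than the text.
as_utf8 = encode_source(line, "utf-8")

print(as_ascii, len(line), len(as_utf8))
```

Either way the compiler no longer sees the original unicode string, which is the step the diff removes.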


Now using Python 2.6.1 (r261:67515, Apr 10 2009, 14:34:00) on x86_64

----------
components: +Interpreter Core

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1542677>
_______________________________________

