[Python-Dev] Unicode input issues

M.-A. Lemburg mal@lemburg.com
Mon, 10 Apr 2000 17:32:17 +0200


Guido van Rossum wrote:
> 
> Thinking about entering Japanese into raw_input() in IDLE more, I
> thought I figured a way to give Takeuchi a Unicode string when he
> enters Japanese characters.
> 
> I added an experimental patch to the readline method of the PyShell
> class: if the line just read, when converted to Unicode, has fewer
> characters but still compares equal (and no exceptions happen during
> this test) then return the Unicode version.
> 
> This doesn't currently work because the built-in raw_input() function
> requires that the readline() call it makes internally returns an 8-bit
> string.  Should I relax that requirement in general?  (I could also
> just replace __builtin__.[raw_]input with more liberal versions
> supplied by IDLE.)
> 
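The test described above can be sketched roughly as follows. This is only an illustration in Python 3 terms, where the line read is a bytes object and str is the Unicode type; maybe_decode is a hypothetical helper, not the actual IDLE patch. Since bytes and str never compare equal in Python 3, the "still compares equal" check is replaced here by a round-trip check, which is an assumption on my part.

```python
def maybe_decode(raw, encoding="utf-8"):
    """Return text if raw looks like multi-byte encoded input, else raw.

    Hypothetical helper: 'raw' is the bytes line just read.
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # "No exceptions happen during this test" failed: keep the bytes.
        return raw
    # Fewer characters than bytes means at least one multi-byte sequence;
    # the round trip stands in for the original equality comparison.
    if len(text) < len(raw) and text.encode(encoding) == raw:
        return text
    return raw


ascii_line = maybe_decode(b"abc")
japanese_line = maybe_decode("\u65e5\u672c\u8a9e".encode("utf-8"))
broken_line = maybe_decode(b"\xff\xfe")
```

Pure ASCII input has the same length before and after decoding, so it stays an 8-bit string, which matches the intent of returning Unicode only when it is actually needed.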
> I also discovered that the built-in unicode() function is not
> idempotent: unicode(unicode('a')) returns u'\000a'.  I think it should
> special-case this and return u'a'!

Good idea. I'll fix this in the next round.
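
The special case amounts to passing Unicode input through untouched. A minimal sketch, written in Python 3 terms where str is the Unicode type and bytes is the 8-bit string; to_text is a hypothetical name used only for illustration, not the built-in:

```python
def to_text(value, encoding="utf-8"):
    """Convert value to text; text input is returned unchanged."""
    if isinstance(value, str):
        # Already Unicode: converting again must be a no-op.
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


once = to_text(b"a")
twice = to_text(to_text(b"a"))
```

With the pass-through branch in place, applying the conversion twice gives the same result as applying it once.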
 
> Finally, I believe we need a way to discover the encoding used by
> stdin or stdout.  I have to admit I know very little about the file
> wrappers that Marc wrote -- is it easy to get the encoding out of
> them? 

I'm not sure what you mean: the name of the input encoding?
Currently, only the names of the encoding and decoding functions
can be queried.
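
One way such a query could look, sketched in Python 3 terms: the stream carries an "encoding" attribute and callers fall back to a default when it is missing. Both the attribute convention and the stream_encoding helper are assumptions made for illustration, not something the current wrappers provide.

```python
import io


def stream_encoding(stream, default="ascii"):
    """Return the encoding name a stream advertises, or a default.

    Hypothetical helper: relies on an 'encoding' attribute being present.
    """
    name = getattr(stream, "encoding", None)
    return name if name else default


# A text wrapper that knows its encoding, and a raw byte stream that does not.
wrapped = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
raw = io.BytesIO()

wrapped_name = stream_encoding(wrapped)
raw_name = stream_encoding(raw)
```

A replacement for sys.stdin, such as the one IDLE installs, would only need to set that attribute to take part.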

> IDLE should probably emulate this, as its encoding is clearly
> UTF-8 (at least when using Tcl 8.1 or newer).

It should be possible to redirect sys.stdin/stdout using
the codecs.EncodedFile wrapper. In a quick test, though, raw_input()
does not seem to use the redirected sys.stdin: it returns the raw
Latin-1 bytes, while sys.stdin.read() returns the recoded UTF-8 data:

>>> import sys
>>> from codecs import EncodedFile
>>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1')
>>> s = raw_input()
äöü
>>> s
'\344\366\374'
>>> s = sys.stdin.read()
äöü
>>> s
'\303\244\303\266\303\274\012'
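
The recoding done by read() can be reproduced without a terminal. The sketch below is in Python 3 terms and uses an in-memory io.BytesIO in place of the real stdin, which is an assumption made so the example is self-contained:

```python
import io
from codecs import EncodedFile

# Stand-in for stdin: the three characters a-umlaut, o-umlaut, u-umlaut
# plus a newline, stored as Latin-1 bytes.
source = io.BytesIO("\u00e4\u00f6\u00fc\n".encode("latin-1"))

# data_encoding='utf-8' is what the caller sees,
# file_encoding='latin-1' is what the underlying file contains.
recoded = EncodedFile(source, "utf-8", "latin-1")

data = recoded.read()
```

The bytes in data are the UTF-8 form of the input, the same sequence as in the sys.stdin.read() result above.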

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/