[Python-Dev] Unicode input issues

M.-A. Lemburg mal@lemburg.com
Mon, 10 Apr 2000 18:01:52 +0200


Guido van Rossum wrote:
> 
> > > Finally, I believe we need a way to discover the encoding used by
> > > stdin or stdout.  I have to admit I know very little about the file
> > > wrappers that Marc wrote -- is it easy to get the encoding out of
> > > them?
> >
> > I'm not sure what you mean: the name of the input encoding ?
> > Currently, only the names of the encoding and decoding functions
> > are available to be queried.
> 
> Whatever is helpful for a module or program that wants to know what
> kind of encoding is used.
> 
> > > IDLE should probably emulate this, as its encoding is clearly
> > > UTF-8 (at least when using Tcl 8.1 or newer).
> >
> > It should be possible to redirect sys.stdin/stdout using
> > the codecs.EncodedFile wrapper. Some tests show that raw_input()
> > doesn't seem to use the redirected sys.stdin though...
> >
> > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1')
> > >>> s = raw_input()
> > äöü
> > >>> s
> > '\344\366\374'
> > >>> s = sys.stdin.read()
> > äöü
> > >>> s
> > '\303\244\303\266\303\274\012'

The latter is the "correct" output, BTW.
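
For anyone who wants to reproduce the recoding without a terminal, here is
a minimal sketch. It assumes a current Python 3, where the wrapper works on
bytes, and it uses an in-memory io.BytesIO as a stand-in for sys.stdin; the
helper name read_recoded is made up for illustration.

```python
import codecs
import io

def read_recoded(raw_bytes):
    """Read Latin-1 bytes through EncodedFile and return them as UTF-8."""
    # Stand-in for the terminal: a byte stream holding Latin-1 input.
    fake_stdin = io.BytesIO(raw_bytes)
    # data_encoding ('utf-8') is what the caller sees; file_encoding
    # ('latin-1') is what the underlying stream actually contains.
    wrapped = codecs.EncodedFile(fake_stdin, 'utf-8', 'latin-1')
    return wrapped.read()

# The three Latin-1 bytes for the umlauts a, o, u, plus a newline.
result = read_recoded(b'\xe4\xf6\xfc\n')
print(result)
```

The result is the byte string written in octal above: each Latin-1
character becomes a two-byte UTF-8 sequence.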
 
> This deserves more looking into.  The code for raw_input() in
> bltinmodule.c certainly *tries* to use sys.stdin.  (I think that
> because your EncodedFile object is not a real stdio file object, it
> will take the second branch, near the end of the function; this calls
> PyFile_GetLine() which attempts to call readline().)
> 
> Aha!  It actually seems that your read() and readline() are
> inconsistent!

They are, because I haven't yet found a way to implement
readline() without buffering read-ahead data. The only way
I can think of to implement it without buffering would be
to read one char at a time, which is much too slow.
 
Buffering is hard to implement right when assuming that
streams are stacked... every level would have its own
buffering scheme and mixing .read() and .readline()
wouldn't work too well. Anyway, I'll give it a try...
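
One possible shape for a buffered readline(), as a rough sketch only and not
what the codecs module does: read in chunks, decode incrementally, and let
.read() and .readline() draw from the same buffer so that mixing them stays
consistent on a single level. The class name BufferedLineReader and the
chunk_size parameter are hypothetical; the sketch assumes a current Python 3
and does nothing about the stacked-streams problem.

```python
import codecs
import io

class BufferedLineReader:
    """Hypothetical wrapper: decode a byte stream, share one read-ahead buffer."""

    def __init__(self, stream, encoding, chunk_size=4096):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.chunk_size = chunk_size
        self.buffer = ""      # decoded text read ahead but not yet returned
        self.eof = False

    def _fill(self):
        # Pull one chunk; the incremental decoder keeps any partial
        # multi-byte sequence until the next chunk completes it.
        chunk = self.stream.read(self.chunk_size)
        if chunk:
            self.buffer += self.decoder.decode(chunk)
        else:
            self.buffer += self.decoder.decode(b"", final=True)
            self.eof = True

    def readline(self):
        # Read ahead until the buffer holds a full line or input ends.
        while "\n" not in self.buffer and not self.eof:
            self._fill()
        end = self.buffer.find("\n") + 1
        if end == 0:          # no newline left: return the remainder
            end = len(self.buffer)
        line, self.buffer = self.buffer[:end], self.buffer[end:]
        return line

    def read(self):
        # Return everything left, starting with the read-ahead data.
        while not self.eof:
            self._fill()
        data, self.buffer = self.buffer, ""
        return data

# A tiny chunk size forces chunk boundaries inside multi-byte characters.
source = io.BytesIO("äöü\nrest\nmore".encode("utf-8"))
reader = BufferedLineReader(source, "utf-8", chunk_size=3)
first_line = reader.readline()
remainder = reader.read()
```

The read-ahead data has already been consumed from the underlying stream,
so anything else reading that stream directly will miss it, which is
exactly the stacking problem described above.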

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/