[Python-3000] New io system and binary data

Bill Janssen janssen at parc.com
Wed Sep 19 19:56:38 CEST 2007


GvR wrote:
> I wouldn't do the assignments you propose
> though, since that might surprise other code which expects text files.

But presumably that code wouldn't be used in that same program.

This really isn't a UTF-8 problem.  It is the problem with file opens
defaulting to "text" mode instead of "binary" mode rearing its ugly
head again.

Bill
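Bill's point about the text-mode default can be illustrated with a minimal sketch. It assumes a present-day Python 3 interpreter (the py3k branch discussed in this thread predates the released API). GIF_LIKE is made-up sample data shaped like the start of a GIF file, and read_both_ways is a hypothetical helper. Opening with "rb" returns the bytes untouched. Opening in the default text mode decodes them, which fails on the non-UTF-8 byte 0xf7.

```python
import os
import tempfile

# Made-up sample bytes shaped like the start of a GIF file. The byte 0xf7
# can never start a UTF-8 sequence, so decoding these bytes as UTF-8 fails.
GIF_LIKE = b"GIF89a\x10\x00\x10\x00\xf7\x00\x00"


def read_both_ways(path):
    """Hypothetical helper: read path once in binary and once in text mode.

    Returns (raw_bytes, error), where error is the UnicodeDecodeError
    raised by the text-mode read, or None if decoding succeeded.
    """
    # Binary mode ("rb"): no decoding, the bytes come back untouched.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode (the default for open()): the bytes are decoded. UTF-8 is
    # requested explicitly to mirror the thread; without it, the default
    # encoding depends on the platform and locale settings.
    error = None
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        error = exc
    return raw, error


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(GIF_LIKE)
    try:
        raw, error = read_both_ways(tmp.name)
        print(raw == GIF_LIKE)        # True: binary read is byte-for-byte identical
        print(type(error).__name__)   # UnicodeDecodeError: text read cannot decode
    finally:
        os.remove(tmp.name)
```

The explicit "rb" is the whole difference: the same file, opened without it, goes through a decoding step that binary data cannot survive.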

> Changing the mode between text and binary is not feasible (since it
> would have to change the class). But it is perfectly acceptable to use
> sys.std{in,out}.buffer if you need to write a binary transparent
> filter. Of course you'll be dealing with bytes at that point so the
> usual cautions apply. I wouldn't do the assignments you propose
> though, since that might surprise other code which expects text files.
> 
> --Guido
> 
> On 9/19/07, Christian Heimes <lists at cheimes.de> wrote:
> > Today I stumbled over another problem that is related to the Unicode and
> > OS string topic. The new io system, or to be more precise the
> > implicit conversion of input and output data to UTF-8, makes it
> > impossible to pipe binary data through Python 3.0.
> >
> > For example, a user wants to write a filter for binary data such as images
> > in Python. With Python 2.5 the input and output data isn't implicitly
> > converted:
> >
> > # stdredirect.py
> > # simple stupid example
> > import sys
> > sys.stdout.write(sys.stdin.read())
> >
> > $ chmod 755 stdredirect.py
> > $ cat ./Mac/Demo/html.icons/python.gif | python2.5 stdredirect.py >out.gif
> > $ diff ./Mac/Demo/html.icons/python.gif out.gif
> >
> > But Python 3.0 is using TextIOWrapper for stdin, stdout and stderr:
> >
> > $ cat ./Mac/Demo/html.icons/python.gif | ./python stdredirect.py >out.gif
> > Traceback (most recent call last):
> >   File "./stdredict.py", line 4, in <module>
> >     sys.stdout.write(sys.stdin.read())
> >   File "/home/heimes/dev/python/py3k/Lib/io.py", line 1225, in read
> >     res += decoder.decode(self.buffer.read(), True)
> >   File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
> >     (result, consumed) = self._buffer_decode(data, self.errors, final)
> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13: invalid data
> >
> > An easy workaround for the problem is:
> >
> > sys.stdout = sys.stdout.buffer
> > sys.stdin = sys.stdin.buffer
> >
> > I recommend that the problem and the fix get documented. Maybe stdin,
> > stdout and stderr should get a method that disables the implicit
> > conversion, like setMode("b") / setMode("t").
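Guido's suggestion, reading and writing through sys.stdin.buffer and sys.stdout.buffer rather than reassigning sys.stdin and sys.stdout, can be sketched as a binary-transparent filter. This assumes present-day Python 3, where the standard text streams expose their underlying binary stream as .buffer. The names copy_binary, main and CHUNK_SIZE are hypothetical, and the chunk size is arbitrary.

```python
import sys

# Arbitrary read size for this sketch; any positive value works.
CHUNK_SIZE = 64 * 1024


def copy_binary(src, dst, chunk_size=CHUNK_SIZE):
    """Copy every byte from binary stream src to binary stream dst.

    No decoding or encoding takes place, so arbitrary binary data
    (for example a GIF image) passes through unchanged. Returns the
    number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def main():
    # Read and write through the binary layers underneath the text streams
    # instead of reassigning sys.stdin / sys.stdout, so any other code that
    # expects text streams still gets them.
    copy_binary(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Saved as stdredirect.py and ending with a call to main() (conventionally under an if __name__ == "__main__": guard), it should behave like the Python 2.5 example above: python3 stdredirect.py < python.gif > out.gif, after which diff reports no differences. The standard library's shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer) performs the same chunked copy.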

