[Python-3000] Draft PEP for New IO system

Tue Feb 27 00:48:13 CET 2007

On 2/26/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> > for Python 3000.
>
> Thanks for doing this! Generally, it looks pretty good.

Agreed. I made some changes to the published doc; you may want to refresh it.

> > Additionally, it defines a few other methods:
> >
> >     (should these "is_" functions be attributes instead?
> > "file.readable == True")
> >
> >     .is_readable()
> [snip]
> >     .is_writable()
> [snip]
> >     .is_seekable()
> [snip]

These are now .readable() etc.

> > Additionally, the abstract base class provides one member variable:
> >
> >     .raw
> [snip]
>
> I gather that the reason for methods instead of attributes is that
> it's easier to delegate to a method than it is to an attribute?  That
> is::
>
>     def is_readable(self):
>         return self.raw.is_readable()
>
> is easier to write than::
>
>     @property
>     def readable(self):
>         return self.raw.readable
>
> If that's the motivation, I'd assume that we'd want a ``get_raw()``
> method instead of the ``.raw`` attribute.  FWLIW, as a user, I'd
> rather just work with attributes.

No, the difference in API styles has more to do with the fact that
readable() etc. *may* require actual work to be done to come up with a
value (seekable() in particular may require trying an lseek() syscall
to see if it works).
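
To illustrate why a method fits better than an attribute here, a
minimal sketch follows. The RawFile class is hypothetical and is not
the class from the draft PEP; it assumes a POSIX system, where seeking
on a pipe fails with an OSError.

```python
import os
import tempfile


class RawFile:
    """Hypothetical raw file wrapper (illustration only, not the PEP's class)."""

    def __init__(self, fd):
        self._fd = fd

    def seekable(self):
        # The answer is not stored anywhere: we have to ask the OS.
        # A relative seek of 0 bytes does not move the file position,
        # but it fails on descriptors that cannot seek (pipes, sockets).
        try:
            os.lseek(self._fd, 0, os.SEEK_CUR)
        except OSError:
            return False
        return True


# A regular (temporary) file supports seeking.
with tempfile.TemporaryFile() as tmp:
    file_is_seekable = RawFile(tmp.fileno()).seekable()

# A pipe does not (assumption: POSIX semantics).
read_end, write_end = os.pipe()
try:
    pipe_is_seekable = RawFile(read_end).seekable()
finally:
    os.close(read_end)
    os.close(write_end)

print(file_is_seekable, pipe_is_seekable)
```

A plain attribute would suggest a cheap lookup; a call makes it visible
that a syscall may happen behind it.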

> > TextIOBase class implementations additionally provide the following methods:
> >
> >     .readline(self)
> >        Read until newline or EOF and return the line.
> >
> >     .readlinesiter()
> >        Returns an iterator that returns lines from the file (which
> > happens to be 'self').
> >
> >     .next()
> >        Same as readline()
> >
> >     .__iter__()
> >        Same as readlinesiter()
>
> If they do the same thing, why do we want them?  I gather that the
> next()/readline() duplication is for backwards compatibility, but why
> the __iter__()/readlinesiter() duplication?

Right. readlinesiter() is gone.

> > Another way to do it is as follows (we should pick one or the other):
> >
> >     .__init__(self, buffer, encoding=None, newline=None)
> >
> >        Same as above but if newline is not None use that as the
> > newline pattern (for reading and writing), and if newline is not set
> > attempt to find the newline pattern from the file and if we can't for
> > some reason use the system default newline pattern.
>
> I like this API better, but I'm not certain I understand the proposal.

Me neither. I'll think about this some more.

>  If I call::
>
>     TextIOWrapper(buffer, newline='\n')
>
> does that mean that any '\r\n' strings in the file will appear as
> '\n'?  Likewise, if I call::
>
>     TextIOWrapper(buffer, newline='\r\n')
>
> does that mean that any bare '\n' strings will appear as '\r\n'?  If
> not, how do I get universal newline support with this API?  (FWLIW,
> I'd be happy with the you-only-see-newlines-like-you-asked-for-them
> semantics above.)
>
> > Another implementation, StringIO, creates a file-like TextIO
> > implementation without an underlying Buffer I/O object.  While similar
> > functionality could be provided by wrapping a BytesIO object in a
> > Buffered I/O object in a TextIOWrapper, the String I/O object allows
> > for much greater efficiency as it does not need to actually perform
> > encoding and decoding.
>
> Sorry, I didn't understand this part. The StringIO won't have to do
> encoding/decoding when ``.next()`` is called?

The idea is that this should work like StringIO.py in Python 2.x when
you only write unicode strings to it. It will then store everything as
Unicode strings and the seek positions count characters, not bytes.
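
A small sketch of the character-versus-byte distinction, using the io
module as it ended up in current CPython rather than anything in the
draft itself. It assumes the default C implementation of io.StringIO,
where tell() and seek() work in characters; the variable names are
made up for the example.

```python
import io

text = "h\u00e9llo"  # five characters; the second needs two bytes in UTF-8

# Text buffer: nothing is encoded, positions count characters.
text_buffer = io.StringIO()
text_buffer.write(text)
char_position = text_buffer.tell()

# Byte buffer holding the same text encoded as UTF-8: positions count bytes.
byte_buffer = io.BytesIO()
byte_buffer.write(text.encode("utf-8"))
byte_position = byte_buffer.tell()

# Seeking to position 2 in the text buffer skips two characters,
# regardless of how many bytes they would occupy once encoded.
text_buffer.seek(2)
rest = text_buffer.read()

print(char_position, byte_position, rest)
```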

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)