[Python-3000] Draft PEP for New IO system

Daniel Stutzbach daniel at stutzbachenterprises.com
Wed Feb 28 14:39:33 CET 2007


Note:  to make my answers true, I had to change the Non-blocking I/O
part of the PEP so that .read(), .write(), and .readinto() all return
None if no data is available from a non-blocking object.  Previously
it had specified that .readinto() would return 0, but I realized this
would be ambiguous with an EOF condition.
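
To illustrate the three outcomes a caller has to tell apart, here is a
minimal sketch.  ToyNonBlockingRaw and classify are hypothetical names
made up for this example, and the in-memory queue only stands in for a
real non-blocking descriptor; it is not the PEP's implementation.

```python
class ToyNonBlockingRaw:
    """Hypothetical in-memory stand-in for a non-blocking raw object."""

    def __init__(self):
        self._chunks = []            # data that has "arrived" so far
        self._peer_closed = False    # True once no more data will arrive

    def feed(self, data):
        """Simulate data arriving from the other end."""
        if data:
            self._chunks.append(data)

    def finish(self):
        """Simulate the other end closing the stream."""
        self._peer_closed = True

    def read(self, n):
        if self._chunks:
            data = self._chunks[0]
            head, rest = data[:n], data[n:]
            if rest:
                self._chunks[0] = rest
            else:
                self._chunks.pop(0)
            return head              # possibly fewer than n bytes
        if self._peer_closed:
            return b""               # zero-length result: end-of-file
        return None                  # nothing available right now


def classify(result):
    """Distinguish 'no data yet' from end-of-file from real data."""
    if result is None:
        return "would block"
    if len(result) == 0:
        return "eof"
    return "data"


raw = ToyNonBlockingRaw()
results = [classify(raw.read(4))]        # nothing has arrived yet
raw.feed(b"abcdef")
results.append(classify(raw.read(4)))    # first four bytes
raw.finish()
results.append(classify(raw.read(4)))    # the remaining two bytes
results.append(classify(raw.read(4)))    # queue empty and peer closed
```

If the "no data" case returned 0 or an empty object instead of None,
the first and last calls above would be indistinguishable.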

I'll work on fleshing out the PEP with answers to these questions
within a couple hours.

On 2/28/07, Giovanni Bajo <rasky at develer.com> wrote:
>  > Raw I/O
>  >
>  >    .read(n: int) -> bytes
>  >    .readinto(b: bytes) -> int
>  >    .write(b: bytes) -> int
>
> What are the requirements here?
>
> - Can read()/readinto() return *less* bytes than specified?

Yes.

> - Can read() return a 0-sized byte object (=no data available)?

A 0-sized byte object indicates end-of-file.

> - Can read() return *more* bytes than specified (think of a datagram socket or
> a decompressing stream)?

No.  For a Raw I/O object, any such extra bytes are either buffered in
the kernel or lost.  For a Buffered I/O object, extra bytes are
buffered.

> - Can readinto() read *less* bytes than specified?

For a Raw I/O object, yes.  For a Buffered I/O object in non-blocking
mode, yes.  For a Buffered I/O object in blocking mode, no.

> - Can readinto() read zero bytes?

Only on end-of-file.

> - Should read()/readinto() raise EOFError?

On EOF, they return a length-0 object or 0 instead.  If the user tries
to read again *after* hitting EOF, then an EOFError is raised.

> - Can write() write less bytes than specified?

For a Raw I/O or non-blocking Buffered I/O object, yes.  For a
blocking Buffered I/O object, no.

> - Can write() write zero bytes?

Only if requested by the user. ;)

One exception to a few of the answers above: a zero-byte
read/readinto/write can occur on a non-blocking object, but in that
case the functions return None to distinguish it from an EOF condition.
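
Since a raw write() may accept fewer bytes than it was given, a caller
that needs everything written has to loop.  A minimal sketch follows;
ToyShortWriter and write_all are hypothetical names for illustration,
and the None branch just raises instead of waiting for the object to
become writable.

```python
class ToyShortWriter:
    """Hypothetical raw-like object that accepts a few bytes per call."""

    def __init__(self, max_per_call):
        self.max_per_call = max_per_call
        self.received = bytearray()

    def write(self, b):
        accepted = b[:self.max_per_call]
        self.received += accepted
        return len(accepted)         # may be less than len(b)


def write_all(raw, data):
    """Call raw.write() until all of data is accepted; return call count."""
    view = memoryview(data)
    calls = 0
    while len(view) > 0:
        written = raw.write(view)
        calls += 1
        if written is None:
            # Non-blocking object that cannot take data right now.  A real
            # caller would wait until it is writable; this sketch gives up.
            raise RuntimeError("write would block")
        view = view[written:]        # drop the part that was accepted
    return calls


writer = ToyShortWriter(max_per_call=3)
calls = write_all(writer, b"hello world")
```

A blocking Buffered I/O object would do this looping internally, which
is why it never reports a short write.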

> Please, see also the examples at the end of the mail before providing an answer :)
>
>  >    .seek(pos: int, whence: int = 0) -> None
>  >    .tell() -> int
>  >    .truncate(n: int = None) -> None
>  >    .close() -> None
>
> Why should this very low-level basic type define *two* read methods? Assuming
> that readinto() is the most primitive, can we have the ABC RawIOBase provide a
> default read() method that calls readinto?

> Yes, I think readable/writeable/seekable/fileno *perfectly* match the good
> usage of attributes/properties. They all provide a value without any
> side-effect and that can be computed without doing O(n)-style computations.

Unfortunately, seekable() may need to call .seek() to figure it out.
I favor calling .seek() (or using stat()) once when constructing the
object and storing the value (since we'll almost certainly need to do
this anyway to figure out what kind of Buffered I/O object to use).
If we do that, then we can make these attributes.
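
A minimal sketch of probing once at construction time, assuming a
POSIX-style file descriptor.  ProbedRaw is a hypothetical name, and
os.lseek() is used here as the probe; nothing in the PEP fixes this as
the mechanism.

```python
import os
import tempfile


class ProbedRaw:
    """Hypothetical raw wrapper that decides seekability once, up front."""

    def __init__(self, fd):
        self.fd = fd
        try:
            # Seeking 0 bytes from the current position does not move
            # the file offset; it only tells us whether seeking works.
            os.lseek(fd, 0, os.SEEK_CUR)
            self.seekable = True
        except OSError:
            self.seekable = False


with tempfile.TemporaryFile() as f:
    file_is_seekable = ProbedRaw(f.fileno()).seekable

read_end, write_end = os.pipe()
try:
    pipe_is_seekable = ProbedRaw(read_end).seekable
finally:
    os.close(read_end)
    os.close(write_end)
```

After construction, reading .seekable is a plain attribute lookup with
no system call and no side effect.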

> Now for some real example. Let's say I'm given a readable RawIOBase object.
> I'm told that it's a foobar-compressed utf-8 text-file. I have this API available:
>
>      class Foobar:
>         # initialize decompressor
>         __init__()
>
>         # feed compressed bytes and get uncompressed bytes.
>         # The uncompressed data can be smaller, equal or larger
>         # than the compressed data
>         decompress(bytes) -> bytes
>
>         # finish decompression and get tail
>         flush() -> bytes
>
>
> This is basically similar to the way zlib.decompress/flush works. I would like
> to wrap the readable RawIOBase object in a way that I obtain a textual
> file-like with readline() etc.

The easy way to do this is for the zlib decompressor to wrap the
RawIOBase object in an appropriate BufferedIOBase object first.  Then
read() can be called with no argument and return as many bytes as are
available.  It sounds like you want to force RawIOBase objects to have
a buffer, too, which defeats the point of having layers.  Most
use-cases will want to use a BufferedIOBase object to buffer the bytes
coming out of the raw object.  In a few cases though, it really is
useful to get down to the system-call level.  Part of the motivation
for reworking the I/O interface is to make this possible.
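
To make the layering concrete, here is a rough sketch written against
the io module as it exists in current Python, with zlib standing in for
the hypothetical Foobar decompressor.  DecompressingRaw is a made-up
name, and the BytesIO object merely plays the part of the buffered
source; treat this as one possible arrangement, not the PEP's design.

```python
import io
import zlib


class DecompressingRaw(io.RawIOBase):
    """Hypothetical layer that decompresses bytes pulled from `source`."""

    def __init__(self, source, chunk_size=4096):
        self._source = source            # any object with read(n) -> bytes
        self._chunk_size = chunk_size
        self._decomp = zlib.decompressobj()
        self._pending = b""              # decompressed bytes not yet returned
        self._finished = False

    def readable(self):
        return True

    def readinto(self, b):
        # Refill until there is something to return or input is exhausted.
        while not self._pending and not self._finished:
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._pending = self._decomp.decompress(chunk)
            else:
                self._pending = self._decomp.flush()
                self._finished = True
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n                         # 0 signals end-of-file


compressed = zlib.compress("first line\nsecond line\n".encode("utf-8"))
source = io.BytesIO(compressed)          # stand-in for a buffered source

# Stack the layers: decompress, then buffer, then decode to text.
text = io.TextIOWrapper(
    io.BufferedReader(DecompressingRaw(source, chunk_size=8)),
    encoding="utf-8",
)
lines = [text.readline(), text.readline(), text.readline()]
```

The decompressing layer never returns more bytes than requested; any
surplus output stays in its own pending buffer until the next call.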

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

