[Python-Dev] Fuzziness in io module specs - PEP update proposition

Sun Sep 20 14:57:45 CEST 2009

Pascal Chambon wrote:
> Hello
> 
> After weighing things up, here is what I have come up with. Comments 
> and issue reports are more than welcome, of course. The exception 
> thingy is not yet addressed.
> 
> Regards,
> Pascal
> 
> 
> *Truncate and file pointer semantics*
> 
> Rationale :
> 
> The current implementation of truncate() always moves the file pointer to 
> the new end of file.
> 
> This behaviour is useful for compatibility when the file has been 
> reduced and the file pointer is now past its end, since some platforms 
> might require 0 <= filepointer <= filesize.
> 
> However, there are several arguments against these semantics:
> 
>     * Most common standards (POSIX, Win32…) allow the file pointer to be
>       past the end of file, and define the behaviour of other stream
>       methods in this case
>     * In many cases there is no reason to move the file pointer when
>       truncating (if we’re extending the file, or reducing it
>       without going beneath the file pointer)
>     * Making 0 <= filepointer <= filesize a global rule of the Python IO
>       module doesn’t seem possible, since it would require changing
>       the semantics of other methods (e.g. seek() would have to
>       raise exceptions or silently disobey when asked to move the
>       file pointer past the end of file), and would lead to incoherent
>       situations when concurrently accessing files without locking (what
>       if another process truncates to 0 bytes the file you’re writing?)
> 
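
To make the first bullet concrete, here is a minimal sketch of seeking past
the end of file. It assumes a current CPython 3 on a POSIX system;
seek_past_eof_demo is a name made up for this illustration.

```python
import tempfile

def seek_past_eof_demo():
    """Return (position, data read past EOF, final file contents)."""
    with tempfile.TemporaryFile() as f:   # binary read/write mode
        f.write(b"abc")
        position = f.seek(10)             # beyond the 3-byte file: allowed
        past_eof = f.read(4)              # nothing to read there
        f.seek(10)
        f.write(b"X")                     # writing here extends the file
        f.seek(0)
        contents = f.read()               # the gap reads back as zero bytes
    return position, past_eof, contents
```

Under those assumptions the seek is accepted, the read past the end returns
an empty bytes object, and the later write leaves a zero-filled gap.
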
> So here are the proposed semantics, which match established conventions:
> 
> *RawIOBase.truncate(n: int = None) -> int*
> 
> *(same for BufferedIOBase.truncate(pos: int = None) -> int)*
> 
> Resizes the file to the size specified by the positive integer n, or by 
> the current filepointer position if n is None.
> 
The new size could be positive or zero.

> The file must be opened with write permissions.
> 
> If the file was previously larger than n, the extra data is discarded. 
> If the file was previously shorter than n, its size is increased, and 
> the extended area appears as if it were zero-filled.
> 
> In any case, the file pointer is left unchanged, and may point beyond 
> the end of file.
> 
> Note: trying to read past the end of file returns an empty string, and 
> trying to write past the end of file extends it by zeroing the gap. On 
> rare platforms that don’t support file pointers beyond the end of 
> file, all these behaviours shall be emulated by storing the “wanted” 
> file pointer position internally (silently extending the file, if 
> necessary, when a write operation occurs).
> 
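
The proposed truncate() semantics can be sketched as follows. This assumes a
current CPython 3 on a POSIX system, where truncate() already leaves the
position alone; truncate_demo is a hypothetical helper, not part of the
proposal.

```python
import tempfile

def truncate_demo():
    """Return (position after shrinking, position after extending, contents)."""
    with tempfile.TemporaryFile() as f:   # binary read/write mode
        f.write(b"abcdef")
        f.seek(5)
        f.truncate(3)                     # shrink to below the file pointer
        after_shrink = f.tell()           # pointer now sits past the end
        f.truncate(8)                     # extend the file again
        after_extend = f.tell()
        f.seek(0)
        contents = f.read()               # extended area reads as zero bytes
    return after_shrink, after_extend, contents
```

In both calls the file pointer stays where it was, and only the file size
changes.
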
>  
> 
> *Proposition of doc update*
> 
> *RawIOBase*.read(n: int) -> bytes
> 
> Read up to n bytes from the object and return them. Fewer than n bytes 
> may be returned if the operating system call returns fewer than n bytes. 
> If 0 bytes are returned, and n was not 0, this indicates end of file. If 
> the object is in non-blocking mode and no bytes are available, the call 
> returns None.
> 
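
A short sketch of how a caller might rely on these read() rules. It assumes
a POSIX system and a Python recent enough to have os.set_blocking();
read_until_eof and pipe_demo are illustrative names only.

```python
import os

def read_until_eof(raw, chunk_size=4):
    """Collect data from a raw stream until read() reports end of file."""
    chunks = []
    while True:
        chunk = raw.read(chunk_size)
        if chunk is None:
            # Non-blocking stream with no bytes available right now.
            break
        if not chunk:
            # 0 bytes for a non-zero request: end of file.
            break
        chunks.append(chunk)              # may be shorter than chunk_size
    return b"".join(chunks)

def pipe_demo():
    """Return (data read up to EOF, result of an empty non-blocking read)."""
    r, w = os.pipe()
    os.write(w, b"hello world")
    os.close(w)                           # closing the writer lets read() hit EOF
    with open(r, "rb", buffering=0) as raw:   # buffering=0 gives a raw FileIO
        data = read_until_eof(raw)

    r, w = os.pipe()
    os.set_blocking(r, False)
    with open(r, "rb", buffering=0) as raw:
        nothing_yet = raw.read(4)         # writer still open, no data yet
    os.close(w)
    return data, nothing_yet
```

The loop treats a short read as normal, an empty result as end of file, and
None as "nothing available yet".
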
> *RawIOBase*.readinto(b: bytes) -> int
> 
> Read up to len(b) bytes from the object and store them in b, returning 
> the number of bytes read. Like .read, fewer than len(b) bytes may be 
> read, and 0 indicates end of file if len(b) is not 0. None is returned if a 
> non-blocking object has no bytes available. The length of b is never 
> changed.
> 
I thought 'bytes' was immutable!

If you're going to read into a list or array, it would be nice to also
be able to give the start index and either the end index or the count
(start defaults to 0, end defaults to len).
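
One way to get that effect without changing the signature, sketched here
under the assumption that the target is a mutable buffer such as a
bytearray: pass readinto() a memoryview slice. readinto_slice is a
hypothetical helper, not an existing io function.

```python
import io

def readinto_slice(stream, buf, start=0, end=None):
    """Read into buf[start:end] and return the number of bytes read."""
    if end is None:
        end = len(buf)
    # Slicing a memoryview shares memory with buf instead of copying it.
    view = memoryview(buf)[start:end]
    return stream.readinto(view)

def slice_demo():
    """Return (bytes read, buffer contents) for a read into buf[2:6]."""
    buf = bytearray(b"..........")        # mutable, unlike bytes
    count = readinto_slice(io.BytesIO(b"abc"), buf, start=2, end=6)
    return count, bytes(buf)
```

Only the selected window of the buffer is written to, and the buffer keeps
its original length.
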