[Python-Dev] Why do we flush before truncating?

Sat Sep 6 12:53:36 EDT 2003

>    http://www.python.org/sf/801631
> 
> gives a failing program on Windows, paraphrased:
> 
> f = file('test.dat', 'wb')
> f.write('1234567890')   # 10 bytes
> f.close()
> 
> f = file('test.dat','rb+')
> f.read(5)
> print f.tell()  # prints 5, as expected
> 
> f.truncate()    # leaves the file at 10 bytes
> print f.tell()  # prints 10
> 
> 
> The problem is that fileobject.c's file_truncate() calls fflush() before
> truncating.  The C standard says that the effect of calling fflush() is
> undefined if the most recent operation on a stream opened for update was an
> input operation.  The stream is indeed opened for update here, and the most
> recent operation performed by the *user* was indeed a read.  It so happens
> that MS's fflush() changes the file position then.  But the user didn't call
> fflush(), Python did, so we can't blame the user for relying on undefined
> behavior here.
> 
> The problem can be repaired inside file_truncate() by seeking back to the
> original file position after the fflush() call -- but the original file
> position isn't always available now, so I'd also have to add another call to
> _portable_ftell() before the fflush() to find it.
> 
> So that gets increasingly complicated.  Much simpler would be to remove this
> block of code (which does fix the test program's problem on Windows, by
> simply getting rid of the undefined operation):
> 
> 	/* Flush the file. */
> 	Py_BEGIN_ALLOW_THREADS
> 	errno = 0;
> 	ret = fflush(f->f_fp);
> 	Py_END_ALLOW_THREADS
> 	if (ret != 0)
> 		goto onioerror;
> 
> I don't understand why we're flushing the file.  ftruncate() isn't a
> standard C function, so the standard sheds no light on why we might be doing
> that.  AFAICT, POSIX/SUS doesn't give a reason to flush either:
> 
>   http://www.opengroup.org/onlinepubs/007904975/functions/ftruncate.html

ftruncate() is not a standard C function; it's a standard Unix system
call.  It works on a file descriptor (i.e. a small int), not on a
stream (i.e. a FILE *).  The fflush() call is necessary if the last
call was a write, because in that case the stream's buffer may contain
data that the OS file descriptor doesn't have yet.
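
To illustrate the point about stream buffers, here is a minimal sketch,
not the code in fileobject.c.  The helper name truncate_at_current_pos()
is made up for this example, and it assumes a POSIX system and that the
most recent operation on the stream was a write (or a seek), so that
calling fflush() is well defined:

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() and ftruncate() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>              /* ftruncate(); POSIX only */

/* Hypothetical helper: cut the file off at the stream's current position.
 * Assumes the last operation on fp was NOT a read, so fflush() is defined.
 * Returns 0 on success, -1 on error. */
static int truncate_at_current_pos(FILE *fp)
{
    long pos;

    assert(fp != NULL);

    /* Push buffered output down to the file descriptor first; until
     * this happens the OS has not seen the data in the stdio buffer. */
    if (fflush(fp) != 0)
        return -1;

    pos = ftell(fp);
    if (pos < 0)
        return -1;

    /* ftruncate() works on the descriptor, not on the FILE *. */
    return ftruncate(fileno(fp), (off_t)pos);
}
```

Without the fflush(), bytes still sitting in the stream's buffer could be
written out after the truncation and make the file grow again.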

But ftruncate() is irrelevant here, because on Windows it is never
called; there's a huge #ifdef MS_WINDOWS block containing
Windows-specific code, starting with the comment

	/* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
	   so don't even try using it. */

and the ftruncate() call is made in the #else part.

It also looks like the MS_WINDOWS-specific code block *does* attempt
to record the current file position and seek back to it -- however, it
does this after fflush() has already messed with the position.  So
perhaps the right solution would be to move the fflush() call into the
#else part, and inside the MS_WINDOWS part do something
Windows-specific instead of calling fflush() to ensure the buffer is
flushed.

I just realized that I have always worked under the assumption that
fflush() after a read is a no-op; I checked the C89 standard and it
says the behavior is undefined.  (I must have picked up that
misunderstanding from some platform-specific man page.)  This can be
fixed by doing an ftell() followed by an fseek() call to the same
position; the fseek() is required to flush the buffer if there was
unwritten output data in it, and it is always allowed.
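
The ftell()/fseek() idea can be sketched roughly as follows.  This is
only an illustration under my own assumptions, not a patch for
file_truncate(); the helper name sync_stream_position() is made up:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: put a stream opened for update into a state where
 * either a read or a write may follow, whatever the previous operation
 * was, without calling fflush().
 * Returns the current file position, or -1L on error. */
static long sync_stream_position(FILE *fp)
{
    long pos;

    assert(fp != NULL);

    /* Remember where the stream currently is. */
    pos = ftell(fp);
    if (pos < 0)
        return -1L;

    /* Seeking to the same spot is a file positioning call, which is
     * permitted after both input and output on an update stream. */
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1L;

    return pos;
}
```

Applied to the test program from the bug report, calling this after the
read of 5 bytes should leave the position at 5, so a truncation done at
that point would act on the position the user expects.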

--Guido van Rossum (home page: http://www.python.org/~guido/)