[Python-Dev] Why do we flush before truncating?
Guido van Rossum
guido at python.org
Sat Sep 6 12:53:36 EDT 2003
> http://www.python.org/sf/801631
>
> gives a failing program on Windows, paraphrased:
>
> f = file('test.dat', 'wb')
> f.write('1234567890') # 10 bytes
> f.close()
>
> f = file('test.dat','rb+')
> f.read(5)
> print f.tell() # prints 5, as expected
>
> f.truncate() # leaves the file at 10 bytes
> print f.tell() # prints 10
>
>
> The problem is that fileobject.c's file_truncate() calls fflush() before
> truncating. The C standard says that the effect of calling fflush() is
> undefined if the most recent operation on a stream opened for update was an
> input operation. The stream is indeed opened for update here, and the most
> recent operation performed by the *user* was indeed a read. It so happens
> that MS's fflush() changes the file position then. But the user didn't call
> fflush(), Python did, so we can't blame the user for relying on undefined
> behavior here.
>
> The problem can be repaired inside file_truncate() by seeking back to the
> original file position after the fflush() call -- but the original file
> position isn't always available now, so I'd also have to add another call to
> _portable_ftell() before the fflush() to find it.
>
> So that gets increasingly complicated. Much simpler would be to remove this
> block of code (which does fix the test program's problem on Windows, by
> simply getting rid of the undefined operation):
>
>     /* Flush the file. */
>     Py_BEGIN_ALLOW_THREADS
>     errno = 0;
>     ret = fflush(f->f_fp);
>     Py_END_ALLOW_THREADS
>     if (ret != 0)
>         goto onioerror;
>
> I don't understand why we're flushing the file. ftruncate() isn't a
> standard C function, so the standard sheds no light on why we might be doing
> that. AFAICT, POSIX/SUS doesn't give a reason to flush either:
>
> http://www.opengroup.org/onlinepubs/007904975/functions/ftruncate.html

ftruncate() is not a standard C function; it's a standard Unix system
call. It works on a file descriptor (i.e. a small int), not on a
stream (i.e. a FILE *). The fflush() call is necessary if the last
call was a write, because in that case the stream's buffer may contain
data that the OS file descriptor doesn't have yet.
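
To illustrate the point about buffered data, here is a minimal sketch of the flush-then-truncate order on a POSIX system. The helper name truncate_after_write() is made up for illustration and is not the fileobject.c code; it assumes the last operation on the stream was a write (or a seek), so that fflush() is well defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the fileobject.c code: cut a stream off at
   its current position.  Assumes POSIX and that the last operation on
   fp was a write (or a seek), so that fflush() is well defined. */
static int truncate_after_write(FILE *fp)
{
    off_t pos;

    assert(fp != NULL);
    if (fflush(fp) != 0)        /* hand buffered output to the descriptor */
        return -1;
    pos = ftello(fp);           /* position as the stream sees it */
    if (pos < 0)
        return -1;
    /* ftruncate() works on the int descriptor, not on the FILE *. */
    return ftruncate(fileno(fp), pos);
}
```

Without the fflush() call, the descriptor could be truncated first and the buffered bytes written out afterwards, which would grow the file again.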

But ftruncate() is irrelevant, because on Windows, it is never called;
there's a huge #ifdef MS_WINDOWS block containing Windows specific
code, starting with the comment

    /* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
       so don't even try using it. */

and the ftruncate() call is made in the #else part.

It also looks like the MS_WINDOWS specific code block *does* attempt
to record the current file position and seek back to it -- however it
does this after fflush() has already messed with it. So perhaps
moving the fflush() call into the #else part and doing something
Windows-specific instead of calling fflush() to ensure the buffer is
flushed inside the MS_WINDOWS part would be the right solution.

I just realized that I have always worked under the assumption that
fflush() after a read is a no-op; I just checked the 89 std and it
says it is undefined. (I must have picked up that misunderstanding
from some platform-specific man page.) This can be fixed by doing an
ftell() followed by an fseek() call; this is required to flush the
buffer if there was unwritten output data in the buffer, and is always
allowed.
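
A minimal sketch of that ftell()/fseek() idea, using only standard C. The helper name sync_update_stream() is hypothetical, and this is an illustration of the technique rather than a proposed patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: bring a stream opened for update into a known
   state without calling fflush() after a read.  Returns 0 on success. */
static int sync_update_stream(FILE *fp)
{
    long pos;

    assert(fp != NULL);
    pos = ftell(fp);            /* remember the current position */
    if (pos < 0)
        return -1;
    /* Seek to the same spot.  A file positioning call is permitted
       after either a read or a write on an update stream. */
    return fseek(fp, pos, SEEK_SET);
}
```

With the test program from the bug report, calling this after the 5-byte read should leave the position at 5 instead of letting it drift to 10. Note that ftell() returns a long, so a real fix would presumably need something like the _portable_ftell() mentioned in the quoted message for large files.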
--Guido van Rossum (home page: http://www.python.org/~guido/)