[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

Jeffrey Kintscher report at bugs.python.org
Thu May 30 03:53:26 EDT 2019


Jeffrey Kintscher <websurfer at surf2c.net> added the comment:

I did some tracing through bpo-36411.py using pdb and lldb.

The problem is that the file position variables stored in the buffered file object get out of sync with the kernel in the write operation after the seek(0) call. The buffered object updates its position as if the write were relative to the last seek position, even though the kernel appends the data at the end of the file. This is reflected in the incorrect values returned by tell (3 instead of 9, 6 instead of 12). The second seek operation flushes the buffer to disk and synchronizes the position variables with the kernel. The data is always written correctly to disk because the kernel maintains its own file position and handles the append operations internally.

Now that I have a handle on what is happening, I will dig into the buffered writes to find the root cause and hopefully a solution.

A workaround is to call f.seek(0, io.SEEK_CUR) before f.tell() to force the file object to synchronize its position variables with the kernel. There is a performance hit as f.seek() will flush dirty buffers to disk each time it is called, though it probably won't be noticed on a file system buffered by the kernel.
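A minimal sketch of that workaround follows. The helper name synced_tell, the scratch file, and the byte values are hypothetical and only serve the illustration; the one call taken from the description above is f.seek(0, io.SEEK_CUR) followed by f.tell().

```python
import io
import os
import tempfile


def synced_tell(f):
    """Resync the buffered object with the kernel, then report the position.

    seek(0, io.SEEK_CUR) moves zero bytes relative to the current position,
    so it only flushes any dirty buffer and refreshes the position variables.
    """
    f.seek(0, io.SEEK_CUR)
    return f.tell()


def demo_workaround():
    # Hypothetical scratch file, used only for this illustration.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abc")
        with open(path, "a+b") as f:
            f.seek(0)
            f.write(b"def")          # the kernel appends despite the seek(0)
            plain = f.tell()         # may be out of sync on affected versions
            synced = synced_tell(f)  # position after the forced resync
        return plain, synced
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_workaround())
```

Because every call flushes the buffer, a helper like this is probably best reserved for the places where the position actually matters rather than used in place of every f.tell().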

Here is what I found line by line (only relevant details are included):

1. The first open() creates the file and initializes the internal position variables to reflect the beginning of the file.
2. The first f.write() stores three bytes in the buffer and updates the internal position variables accordingly.
3. f.tell() correctly returns 3.
4. f.close() flushes the buffer to disk and closes the file.
5. When open() is called a second time with append mode, it calls lseek() to get the file position from the kernel and sets the internal position variables accordingly.
6. f.tell() correctly returns 3 to reflect the 3 bytes already in the file.
7. f.write() stores three bytes into the beginning of the buffer and updates the internal position variables accordingly.
8. f.tell() returns 6 as the correct position.
9. f.seek(0) flushes the dirty buffer to disk, calls lseek() so that the kernel updates the file position, and resets the internal position variables to reflect the beginning of the file.
10. f.write() stores three bytes into the beginning of the buffer, but the position variables are updated as if the write is relative to the beginning of the file. The position variables are now out of sync with the kernel.
11. f.tell() reflects this discrepancy by returning 3 instead of 9.
12. f.write() stores three bytes into the buffer starting at offset 3 and updates the position variables accordingly.
13. f.tell() returns 6 instead of 12.
14. f.seek(0, io.SEEK_CUR) flushes the 6 bytes in the dirty buffer to disk. The flush operation correctly appends the 6 bytes to the disk file regardless of the position variables because the kernel handles the append operation internally based upon its own file position variable. It then calls lseek() to determine the new position and updates the position variables accordingly. 
15. f.tell() returns the correct position.
16. The final f.seek(0) calls lseek() to move the file position back to the beginning of the file and updates the position variables, but doesn't flush anything to disk because the buffer is clean.
17. f.read() reads the entire file from the beginning.
18. f.close() closes the file. No flush operation is performed because the buffer is clean.
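
For reference, the sequence above can be replayed with something like the following. This is a reconstruction from the step list, not the original bpo-36411.py: the file modes ("wb", then "a+b"), the byte values, and the function name replay_steps are assumptions. The tell() values at steps 11 and 13 are the ones that come out wrong on affected versions, so they are recorded rather than asserted.

```python
import io
import os
import tempfile


def replay_steps():
    """Replay steps 1-18 and return ({step: tell value}, final file contents)."""
    fd, path = tempfile.mkstemp()   # hypothetical scratch file
    os.close(fd)
    tells = {}
    try:
        f = open(path, "wb")        # step 1 (mode is an assumption)
        f.write(b"abc")             # step 2
        tells[3] = f.tell()         # step 3
        f.close()                   # step 4

        f = open(path, "a+b")       # step 5 (append + read, binary)
        tells[6] = f.tell()         # step 6
        f.write(b"def")             # step 7
        tells[8] = f.tell()         # step 8
        f.seek(0)                   # step 9
        f.write(b"ghi")             # step 10
        tells[11] = f.tell()        # step 11: reported as 3, kernel is at 9
        f.write(b"jkl")             # step 12
        tells[13] = f.tell()        # step 13: reported as 6, kernel is at 12
        f.seek(0, io.SEEK_CUR)      # step 14: flush and resync
        tells[15] = f.tell()        # step 15
        f.seek(0)                   # step 16
        data = f.read()             # step 17
        f.close()                   # step 18
        return tells, data
    finally:
        os.remove(path)


if __name__ == "__main__":
    tells, data = replay_steps()
    for step in sorted(tells):
        print("step", step, "tell() ->", tells[step])
    print("file contents:", data)
```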

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36411>
_______________________________________

