Bug 3.11.x behavioral, open file buffers not flushed til file closed.

Sun Mar 5 20:50:17 EST 2023

On 3/5/23 19:02, Cameron Simpson wrote:
> On 05Mar2023 10:38, aapost <aapost at idontexist.club> wrote:
>> Additionally (not sure if this still applies):
>> flush() does not necessarily write the file’s data to disk. Use 
>> flush() followed by os.fsync() to ensure this behavior.
> 
> Yes. You almost _never_ need or want this behaviour. A database tends to 
> fsync at the end of a transaction and at other critical points.
> 
> However, once you've `flush()`ed the file the data are then in the hands 
> of the OS, to get to disc in a timely but efficient fashion. Calling 
> fsync(), like calling flush(), affects writing _efficiency_ by depriving 
> the OS (or for flush(), the Python I/O buffering system) of the opportunity 
> to bundle further data efficiently. It will degrade the overall performance.
> 
> Also, fsync() need not expedite the data getting to disc. It is equally 
> valid that it just blocks your programme _until_ the data have gone to 
> disc. In practice it probably does expedite things slightly, but the real 
> world effect is that your programme will gratuitously block anyway, when 
> it could just get on with its work, secure in the knowledge that the OS 
> has its back.
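
For anyone finding this thread later, here is a minimal sketch of the flush()/fsync() distinction described above. The function names and the file name are made up for illustration, and the "before" size assumes CPython's default buffering on a regular file:

```python
import os
import tempfile


def write_durably(path, data):
    """Write text, then block until the OS reports it has reached the disk."""
    with open(path, "w") as f:
        f.write(data)
        f.flush()             # Python's buffer -> the OS
        os.fsync(f.fileno())  # OS cache -> disk; this call blocks


def demo_flush_visibility():
    """Return the file size seen by the OS before and after flush()."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.txt")  # hypothetical file name
        with open(path, "w") as f:
            f.write("hello")
            before = os.path.getsize(path)  # data still sits in Python's buffer
            f.flush()
            after = os.path.getsize(path)   # now the OS has all 5 bytes
    return before, after
```

Only `write_durably()` pays the cost of waiting for the disk; `flush()` alone is enough for another process to see the data.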
> 
> flush() is for causality - ensuring the data are on their way so that 
> some external party _will_ see them rather than waiting forever for data 
> which are lurking in the buffer.  If that external party, for you, is an 
> end user tailing a log file, then you might want to flush() at the end 
> of every line.  Note that there is a presupplied line-buffering mode you 
> can choose which will cause a file to flush like that for you 
> automatically.
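
A small sketch of the line-buffering mode mentioned above, for the record. As far as I can tell it is selected with `buffering=1` in text mode; the file name is hypothetical, and `newline="\n"` is only there so the byte counts are the same on every platform:

```python
import os
import tempfile


def demo_line_buffering():
    """Record the on-disk size after each write to a line-buffered file."""
    sizes = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")  # hypothetical log file
        # buffering=1 asks for line buffering (text mode only).
        with open(path, "w", buffering=1, newline="\n") as log:
            log.write("first line\n")
            sizes.append(os.path.getsize(path))  # complete line: flushed already
            log.write("partial")
            sizes.append(os.path.getsize(path))  # no newline yet: still buffered
            log.write(" line\n")
            sizes.append(os.path.getsize(path))  # newline arrived: flushed
    return sizes
```

So someone tailing the file sees each line as soon as it is complete, without any explicit flush() calls in the code.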
> 
> So when you flush is a policy decision which you can make either during 
> the programme flow or to a less flexible degree when you open the file.
> 
> As an example of choosing-to-flush, here's a little bit of code in a 
> module I use for writing packet data to a stream (eg a TCP connection):
> https://github.com/cameron-simpson/css/blob/00ab1a8a64453dc8a39578b901cfa8d1c75c3de2/lib/python/cs/packetstream.py#L624
> 
> Starting at line 640: `if Q.empty():` it optionally pauses briefly to 
> see if more packets are coming on the source queue. If another arrives, 
> the flush() is _skipped_, and the decision to flush made again after the 
> next packet is transcribed. In this way a busy source of packets can 
> write maximally efficient data (full buffers) as long as there's new 
> data coming from the queue, but if the queue is empty and stays empty 
> for more than `grace` seconds we flush anyway so that the receiver 
> _will_ still see the latest packet.
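
To check my understanding of the flush policy described above, here is a rough sketch. It is not the code from cs/packetstream.py; the function name, the `None` end-of-stream marker and the default `grace` value are my own assumptions:

```python
import queue


def drain_with_grace(q, out, grace=0.05):
    """Copy byte packets from q to out, flushing only when the queue idles.

    A None item marks end of stream (an assumption of this sketch).
    Returns how many flushes were triggered by an idle queue.
    """
    idle_flushes = 0
    item = q.get()
    while item is not None:
        out.write(item)
        try:
            # Wait up to `grace` seconds for another packet; if one is
            # already queued this returns at once and the flush is skipped.
            item = q.get(timeout=grace)
        except queue.Empty:
            # Queue stayed empty: flush so the receiver sees the data.
            out.flush()
            idle_flushes += 1
            item = q.get()  # now block until the next packet arrives
    out.flush()  # final flush at end of stream
    return idle_flushes
```

With a busy queue the buffer fills naturally and nothing is flushed early; with a quiet one the receiver waits at most about `grace` seconds for the latest packet.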
> 
> Cheers,
> Cameron Simpson <cs at cskk.id.au>

Thanks for the details. And yes, the quote above was from a 
non-official doc, without a version reference, that several forum posts 
were citing, with no explanation of why they make the suggestion or how 
important it is (for the uninformed trying to parse it, the suggestion 
could be there for any reason, such as Python lacking something that has 
since been fixed, or who knows). Thanks.