[Python-Dev] Ext4 data loss

Gisle Aas gisle at activestate.com
Thu Mar 12 09:06:26 CET 2009


On Mar 11, 2009, at 22:43 , Cameron Simpson wrote:

> On 11Mar2009 10:09, Joachim K?nig <him at online.de> wrote:
>> Guido van Rossum wrote:
>>> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes  
>>> <lists at cheimes.de> wrote:
>>>> [...]
>>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54 
>>>> .
>>>> [...]
>>> If I understand the post properly, it's up to the app to call  
>>> fsync(),
>>> and it's only necessary when you're doing one of the rename  
>>> dances, or
>>> updating a file in place. Basically, as he explains, fsync() is a  
>>> very
>>> heavyweight operation; I'm against calling it by default anywhere.
>>>
>> To me, the flaw seem to be in the close() call (of the operating
>> system). I'd expect the data to be
>> in a persistent state once the close() returns. So there would be no
>> need to fsync if the file gets closed anyway.
>
> Not really. On the whole, flush() means "the object has handed all  
> data
> to the OS".  close() means "the object has handed all data to the OS
> and released the control data structures" (OS file descriptor release;
> like the OS, the python interpreter may release python stuff later  
> too).
>
> By contrast, fsync() means "the OS has handed filesystem changes to  
> the
> disc itself". Really really slow, by comparison with memory. It is  
> Very
> Expensive, and a very different operation to close().

...and at least on OS X there is one level more where you actually  
tell the
disc to flush its buffers to permanent storage with:

    fcntl(fd, F_FULLSYNC)

The fsync manpage says:

      Note that while fsync() will flush all data from the host to the  
drive
      (i.e. the "permanent storage device"), the drive itself may not  
physi-
      cally write the data to the platters for quite some time and it  
may be
      written in an out-of-order sequence.

      Specifically, if the drive loses power or the OS crashes, the  
application
      may find that only some or none of their data was written.  The  
disk
      drive may also re-order the data so that later writes may be  
present,
      while earlier writes are not.

      This is not a theoretical edge case.  This scenario is easily  
reproduced
      with real world workloads and drive power failures.

      For applications that require tighter guarantees about the  
integrity of
      their data, Mac OS X provides the F_FULLFSYNC fcntl.  The  
F_FULLFSYNC
      fcntl asks the drive to flush all buffered data to permanent  
storage.
      Applications, such as databases, that require a strict ordering  
of writes
      should use F_FULLFSYNC to ensure that their data is written in  
the order
      they expect.  Please see fcntl(2) for more detail.

It's not obvious what level of syncing is appropriate to automatically  
happen
from Python so I think it's better to let the application deal with it.

--Gisle



More information about the Python-Dev mailing list