[Python-Dev] Ext4 data loss

Adam Olsen rhamph at gmail.com
Fri Mar 13 05:14:41 CET 2009


On Tue, Mar 10, 2009 at 2:11 PM, Christian Heimes <lists at cheimes.de> wrote:
> Multiple blogs and news sites are swamped with a discussion about ext4
> and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the issue
> at
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>
>
> Python's file type doesn't use fsync() and may be the victim of the very
> same issue, too. Should we do anything about it?

It's a kernel defect and we shouldn't touch it.

Traditionally you were hooped regardless of what you did, just with
smaller windows.  Did you want to lose your file 50% of the time or
only 10% of the time?  Heck, 1% of the time you lose the *entire*
filesystem.

Along came journaling file systems.  They guarantee the filesystem
itself stays intact, but not your file.  Still, if you hedge your bets
it's a fairly small window.  In fact if you kill performance you can
eliminate the window: write to a new file, fsync it so the data is
actually on disk, then rename it over the old file, relying on the
journal to keep the rename itself atomic.  Few people do that though,
due to the insane performance loss.
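
For concreteness, here is a rough sketch of that write/flush/rename
sequence in Python.  The name atomic_write is made up for illustration
and is not an existing API; the sketch assumes a POSIX system and a
Python recent enough to have os.replace (os.rename behaves the same way
on POSIX).

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes); hypothetical helper, POSIX only."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> kernel
            os.fsync(tmp.fileno())  # kernel -> disk (the slow part)
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        # Something failed before the rename finished: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory too, so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that mkstemp creates the file with owner-only permissions, so a
real version would probably want to copy the old file's mode as well.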

What we really want is the filesystem equivalent of a memory barrier.
We don't need the file to be saved *now*, just so long as it gets saved
before the rename does.  Unfortunately the filesystem APIs don't touch on this,
as they were designed when losing the entire filesystem was
acceptable.  What we need is a heuristic to make them work in this
scenario.  Lo and behold ext3's data=ordered did just that!

Personally, I consider journaling to be a joke without that.  Without
it, journaling still has other justifications, but not this critical
one.  Yet the ext4
developers didn't see it that way, so it was sacrificed to new
performance improvements (delayed allocation).

Linux 2.6.30 has patches lined up that will fix this use case, making
sure the file is written before the rename.  We don't have to touch it.

Of course if you're planning to use the file without renaming then you
probably do need an explicit fsync, and an API for that might help
after all.  That's a different problem though, and has always existed.
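
As a sketch of what the no-rename case looks like today: the file
object's flush() only hands data to the kernel, so you need os.fsync on
the descriptor as well.  The helper name append_durably is invented
here for illustration; it is not something Python provides.

```python
import os


def append_durably(path, line):
    """Append one line and wait for it to reach the disk (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # Python's buffer -> kernel
        os.fsync(f.fileno())  # kernel -> disk; blocks until done
```

Each call pays the full cost of a sync, so this only makes sense for
data you really cannot afford to lose.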


-- 
Adam Olsen, aka Rhamphoryncus

