[Python-Dev] Ext4 data loss

Adam Olsen rhamph at gmail.com
Fri Mar 13 05:14:41 CET 2009


On Tue, Mar 10, 2009 at 2:11 PM, Christian Heimes <lists at cheimes.de> wrote:

> Multiple blogs and news sites are swamped with a discussion about ext4 and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the issue at https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>
> Python's file type doesn't use fsync() and may be the victim of the very same issue, too. Should we do anything about it?

It's a kernel defect and we shouldn't touch it.

Traditionally you were hooped regardless of what you did, just with smaller windows. Did you want to lose your file 50% of the time or only 10% of the time? Heck, 1% of the time you lose the entire filesystem.

Along came journaling file systems. They guarantee the filesystem itself stays intact, but not your file. Still, if you hedge your bets it's a fairly small window. In fact, if you're willing to kill performance you can eliminate the window: write to a new file, flush all the buffers with fsync(), then rely on the journaling filesystem for the rename. Few people do that, though, due to the insane performance loss.
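For reference, the write-flush-rename sequence described above can be sketched in Python roughly as follows. This is only an illustration, not anything Python's file type does on its own: the helper name atomic_write is made up for this example, and os.replace() assumes Python 3.3 or later (older code would call os.rename(), which behaves the same way on POSIX).

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) using
    write-to-temp, fsync, rename. Hypothetical helper for illustration."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> kernel
            os.fsync(tmp.fileno())  # kernel -> storage (the slow part)
        # Only after the data is flushed do we swap the name over.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

The os.fsync() call is where the performance cost comes from. Dropping it gives the cheaper pattern that depends on the filesystem ordering the data before the rename.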

What we really want is the filesystem equivalent of a simple memory barrier. We don't need the file to be saved now, only that it gets saved before the rename does. Unfortunately the filesystem APIs don't touch on this, as they were designed when losing the entire filesystem was acceptable. What we need is a heuristic to make them work in this scenario. Lo and behold ext3's data=ordered did just that!

Personally, I consider journaling to be a joke without that. It has different justifications, but not this critical one. Yet the ext4 developers didn't see it that way, so it was sacrificed to new performance improvements (delayed allocation).

Linux 2.6.30 has patches lined up that will fix this use case, making sure the file is written before the rename. We don't have to touch it.

Of course if you're planning to use the file without renaming then you probably do need an explicit fsync and an API for that might help after all. That's a different problem though, and has always existed.
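As a rough sketch of that no-rename case, here is what an explicit fsync looks like with the calls Python already provides. The function name append_durably is invented for this example. Note that os.fsync() only asks the OS to push the data to the device, and how far that guarantee goes varies by platform.

```python
import os


def append_durably(path, line):
    """Append one line to `path` and ask the OS to push it to storage.
    Hypothetical helper for illustration."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # Python's buffer -> kernel
        os.fsync(f.fileno())  # kernel -> storage device
```

Without the flush() call the data may still be sitting in Python's own buffer when fsync runs, so the order of the two calls matters.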

-- Adam Olsen, aka Rhamphoryncus


