how to remove oldest files up to a limit efficiently

linuxnow at gmail.com
Tue Jul 8 18:18:23 EDT 2008


I need to maintain a filesystem where I'll keep only the most recently
used (MRU) files; least recently used (LRU) ones have to be removed to
leave space for newer ones. The filesystem in question is a clustered
fs (glusterfs) which is very slow on "find" operations. To add
complexity, there are more than 10^6 files spread over 2 levels: 16³
dirs with a roughly equal number of files inside each.

My first idea was to "os.walk" the filesystem, find the oldest files
and remove them until I get below the threshold. But a full walk
proves to be too slow.
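Roughly what I had in mind is below; the mount point and the size
limit are made-up placeholders, and everything is collected in memory
and sorted in one go:

import os

MOUNT_POINT = "/mnt/gluster"      # placeholder path
TARGET_BYTES = 500 * 1024 ** 3    # placeholder: keep at most ~500 GB

def free_up(root=MOUNT_POINT, target=TARGET_BYTES):
    entries = []
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            entries.append((st.st_atime, st.st_size, path))
            total += st.st_size
    # oldest atime first
    entries.sort()
    for atime, size, path in entries:
        if total <= target:
            break
        try:
            os.remove(path)
            total -= size
        except OSError:
            pass

The stat() of every file over glusterfs is what makes this so slow.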

My second thought was to run find -atime several times to remove the
oldest files first, repeating the process with a more recent atime
until the threshold is reached. Again, this needs several walks
through the fs.
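Something along these lines is what I meant, checking the free space
between passes (the mount point, the free-space target and the 5-day
step are all invented for the example):

import os
import subprocess

MOUNT_POINT = "/mnt/gluster"   # placeholder
MIN_FREE = 0.10                # placeholder: stop once 10% of the fs is free

def enough_free(path=MOUNT_POINT, min_free=MIN_FREE):
    st = os.statvfs(path)
    return float(st.f_bavail) / st.f_blocks >= min_free

def purge_by_age(path=MOUNT_POINT):
    # start with files unused for 30 days, then 25, 20, ... each pass
    for days in range(30, 0, -5):
        if enough_free(path):
            return
        subprocess.call(["find", path, "-type", "f",
                         "-atime", "+%d" % days, "-delete"])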

Then I thought about tmpwatch, but like find it needs a cutoff date to
start removing.

The ideal way would be to keep a sorted list of files by atime,
probably in a cache, something like updatedb.
This list could also be built based only on the diratime of the first
level of dirs, seeking them in order and so on, but it still seems
expensive to get even this first level of dirs sorted.
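To make the idea concrete, a rough sketch of such a cache using sqlite
as the store is below; the database location and the function names
are invented, and the application would have to call record_access()
itself whenever it touches a file, since I can't afford to rescan the
fs:

import os
import sqlite3

DB_PATH = "/var/cache/mru-index.db"   # invented location

def open_index(db_path=DB_PATH):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS files (
                        path  TEXT PRIMARY KEY,
                        atime REAL,
                        size  INTEGER)""")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_atime ON files(atime)")
    return conn

def record_access(conn, path):
    # called whenever the application reads or writes a file
    st = os.stat(path)
    conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                 (path, st.st_atime, st.st_size))
    conn.commit()

def remove_oldest(conn, bytes_to_free):
    # fetch candidates first, oldest atime first, then delete
    rows = conn.execute(
        "SELECT path, size FROM files ORDER BY atime").fetchall()
    freed = 0
    for path, size in rows:
        if freed >= bytes_to_free:
            break
        try:
            os.remove(path)
            freed += size
        except OSError:
            pass
        conn.execute("DELETE FROM files WHERE path = ?", (path,))
    conn.commit()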

Any suggestions on how to do this efficiently?


