how to remove oldest files up to a limit efficiently

linuxnow at gmail.com
Wed Jul 9 18:21:09 EDT 2008


On Jul 9, 7:08 pm, Terry Reedy <tjre... at udel.edu> wrote:
> Dan Stromberg wrote:
> > On Tue, 08 Jul 2008 15:18:23 -0700, linux... at gmail.com wrote:
>
> >> I need to maintain a filesystem where I'll keep only the most recently
> >> used (MRU) files; the least recently used (LRU) ones have to be removed
> >> to make room for newer ones. The filesystem in question is a clustered
> >> fs (glusterfs) which is very slow on "find" operations. To add
> >> complexity, there are more than 10^6 files in 2 levels: 16³ dirs with
> >> the files distributed evenly among them.
>
> >> Any suggestions of how to do it effectively?
>
> > os.walk once.
>
> > Build a list of all files in memory.
>
> > Sort them by whatever time you prefer - you can get times from os.stat.
>
> Since you do not need all 10**6 files sorted, you might also try the
> heapq module. The entries in the heap would be (time, fileid) pairs.
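
A minimal sketch of that suggestion (illustrative only: the function name
is made up, it keys on st_mtime rather than atime, and heapq.nsmallest
keeps just the n oldest candidates in memory instead of sorting the full
list):

import heapq
import os

def oldest_files(root, n):
    """Return (mtime, path) pairs for the n oldest files under root."""
    def entries():
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished between walk and stat
                yield (st.st_mtime, path)
    # nsmallest holds at most n entries at a time instead of
    # sorting all 10**6 of them.
    return heapq.nsmallest(n, entries())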

I'll look into it: sorting the dirs by atime and feeding the files inside
to the heap until I can remove enough of them would probably work very
efficiently. Roughly like the sketch below.

Thanks
Pau


