[Tutor] Sorting of files based on filesize

C Smith smichr at hotmail.com
Tue Apr 12 08:40:43 CEST 2005


--request for a method of sorting disk files based on size so as to 
fill backup disks--

You may want to check out the karp.py routine posted at
http://aspn.activestate.com/ASPN/Mail/Message/python-Tutor/749797

Right now it is coded to split N numbers into 2 groups that have sums 
as nearly identical as possible. The set of numbers it is coded to test 
are are the first 50 square roots of integers; you would replace these 
with a list of N file sizes that are close to 2X a single disk capacity 
and it would tell you how to split up the group into two 
nearly-identically sized groups.  It's "very, very, very clever, and 
runs in an eyeblink" says Tim Peters. You might even want to use Tim's 
greedy "knapsack filler" approach that he initially proposed as part of 
the thread above which will try random selections from a list of values 
and keep track of the one that came closest to a target value. Your 
target value would be 2X the storage limit and then you could use the 
karp routine to split it nicely. Tim's greedy approach is at
http://mail.python.org/pipermail/tutor/2001-August/008075.html


An alternative to breaking your list of file sizes into sublists that 
are close to 2X your disk capacity would be to generalize the algorithm 
to break the numbers into M groups rather than 2 groups...but I'm not 
sure how easy that will be.

You're going to love using the karp for this problem :-)

/c



More information about the Tutor mailing list