[Mailman-Developers] Issues with archiving directory and OS limitations

Brad Knowles brad at stop.mail-abuse.org
Tue Oct 25 02:28:40 CEST 2005


At 9:52 AM +1000 2005-10-25, AE Somerville wrote:

>  Problem manifests as an inability of the create list process being able to
>  make the archiving directories. The number appears to be when the directory
>  count approaches 32,000 separate directories.

	Most *nix OSes have problems with too many files (or 
subdirectories) within a single directory.  Frequently, you 
start seeing problems at much lower numbers, like 1000 or 10,000.

>  My temp solution:
>
>  I have altered Site.py line 52 to add the list name again into the path for
>  the archives. This halved the number of directories in the
>  /var/mailman/archives/private/ level and pushed the extra directories into
>  their own named sub directory. Now we can create new lists again (in our
>  situation we have the list population updated daily and the lists themselves
>  are added/deleted as required)

	This just pushes the horizon out.  This doesn't solve the 
fundamental problem.  IMO, you're better off doing a quick MD5 hash 
of the listname and then slicing off the first (or last) few 
characters of the hash, then incorporating that into the path name.
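
	A minimal sketch of that idea in Python, for illustration only.  
This is not Mailman code and not a patch to Site.py.  The helper name 
hashed_archive_dir, the nchars parameter, and the case folding are 
all my assumptions.  The root path is the one from the quoted 
message.  It uses hashlib from the current standard library.

```python
import hashlib
import os

# Root path as quoted in the original report; adjust for your site.
ARCHIVE_ROOT = "/var/mailman/archives/private"


def hashed_archive_dir(listname, nchars=3, root=ARCHIVE_ROOT):
    """Return root/<hash prefix>/<listname> (hypothetical helper)."""
    # Assumption: list names are compared case-insensitively, so fold
    # case before hashing to keep "Foo" and "foo" in the same bucket.
    name = listname.lower()
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # The first nchars hex digits pick one of 16**nchars buckets.
    return os.path.join(root, digest[:nchars], name)


print(hashed_archive_dir("mailman-developers"))
```

	MD5 is used here only to spread names evenly, not for security, 
so any reasonably well distributed hash would do.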

	If you use hex characters instead of some other base, that's 
roughly a factor of sixteen reduction in the number of 
subdirectories/files for each character of hash.  In practice, you'll 
get birthday collisions more frequently than you'd like, so count it 
as something closer to a factor of four to eight reduction.


	With this technique, it doesn't take too many hash characters to 
greatly reduce the problem to a much more manageable size.  Just 
three characters of a reasonably well distributed hash will result in 
no more than 4096 hash subdirectories at the parent, and probably 
something close to a factor of 64 to 512 reduction in the number of 
grandchild subdirectories/files within each hash subdirectory.
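
	If you would rather measure than estimate, a rough sketch like 
the following counts how many buckets get used and how full the 
largest one is.  The synthetic names ("list00000" and so on) and the 
function name bucket_stats are assumptions for illustration.  Feed it 
your real list names to get numbers that mean something for your 
site.

```python
import hashlib
from collections import Counter


def bucket_stats(names, nchars):
    """Return (buckets used, size of the largest bucket)."""
    counts = Counter(
        hashlib.md5(n.encode("utf-8")).hexdigest()[:nchars] for n in names
    )
    return len(counts), max(counts.values())


# Synthetic stand-ins for roughly 32,000 real list names.
names = ["list%05d" % i for i in range(32000)]

for nchars in (1, 2, 3):
    used, largest = bucket_stats(names, nchars)
    print("%d hex chars: %d buckets used, largest holds %d lists"
          % (nchars, used, largest))
```

	The size of the largest bucket is the number that matters, 
since that is the directory that hits the limit first.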

	If you go with base-32 instead, two base-32 characters would be 
no more than 1024 files in a single directory, and probably close to 
a factor of six to 32 reduction in the number of grandchild 
subdirectories/files per hash subdirectory.

	Base-64 would let you get two characters creating no more than 
4096 hash subdirectories, and you can see the numbers above for the 
likely reduction in the number of grandchild subdirectories/files.
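
	A sketch of how the other bases might be produced, again with 
hypothetical names (hash_prefix and its parameters are mine).  Two 
caveats I am sure of: the standard base-64 alphabet includes "/", 
which cannot appear in a path component, so this uses the URL-safe 
variant.  Base-64 is also case-sensitive, so it will not behave on a 
case-insensitive filesystem.

```python
import base64
import hashlib


def hash_prefix(listname, base=32, nchars=2):
    """Return the first nchars characters of the MD5 in the given base."""
    raw = hashlib.md5(listname.encode("utf-8")).digest()
    if base == 16:
        text = raw.hex()
    elif base == 32:
        # Alphabet is A-Z and 2-7; lowercased here for tidier paths.
        text = base64.b32encode(raw).decode("ascii").lower()
    elif base == 64:
        # URL-safe alphabet uses "-" and "_" in place of "+" and "/".
        text = base64.urlsafe_b64encode(raw).decode("ascii")
    else:
        raise ValueError("unsupported base: %r" % (base,))
    return text[:nchars]


for base, nchars in ((16, 3), (32, 2), (64, 2)):
    print(base, hash_prefix("mailman-developers", base, nchars))
```

	Note that a URL-safe base-64 prefix can begin with "-", which 
is awkward on the command line, so base-32 or hex may be the more 
comfortable choice.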


	If you need to, you can take the hashing another level.  It all 
depends on how cramped you are for space in your filenames, because 
there are also inode and iname caching issues to consider.
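
	Taking it another level could look something like this.  The 
function name nested_archive_dir and the levels and chars_per_level 
parameters are assumptions for illustration.  Each extra level costs 
chars_per_level + 1 characters of path length, which is the trade-off 
mentioned above.

```python
import hashlib
import os


def nested_archive_dir(root, listname, levels=2, chars_per_level=2):
    """Return root/<h1>/<h2>/.../<listname> using successive hash slices."""
    digest = hashlib.md5(listname.encode("utf-8")).hexdigest()
    # Consecutive, non-overlapping slices of the hex digest, one per level.
    parts = [
        digest[i * chars_per_level:(i + 1) * chars_per_level]
        for i in range(levels)
    ]
    return os.path.join(root, *(parts + [listname]))


print(nested_archive_dir("/var/mailman/archives/private", "mailman-developers"))
```

	With two levels of two hex characters each, no hash directory 
holds more than 256 hash subdirectories, and the lists are spread 
over up to 65,536 leaf directories.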

-- 
Brad Knowles, <brad at stop.mail-abuse.org>