[Mailman-Developers] Scrubber.py confusion, 2.1b3

Barry A. Warsaw barry@python.org
Mon, 12 Aug 2002 20:00:04 -0400


>>>>> "MM" == Michael Meltzer <mjm@michaelmeltzer.com> writes:

    MM> I've been going over some of the Scrubber.py code, and two
    MM> things are standing out for me

Cool, someone's looking at it :)

    MM> 1) A lot of work was done to make the filename unique in
    MM> "save_attachment"; it looks like a straight bug that the url
    MM> returned does not include the "extra" part. It looks to me
    MM> like the last line should be

    MM> url = baseurl + 'attachments/%s/%s' % (msgdir, filename + extra)

It's certainly true that extra is never used once calculated.  That
can't be useful. :)

    MM> Frankly, I think the forming of the name could be better,
    MM> like filenamebase + "-" + counter + "." + ext, but that's
    MM> more of a feature request

That was the intent, but the code's broken.
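
A minimal sketch of that naming scheme, for illustration only.  This
is not the Scrubber.py code: unique_filename() and attachment_url()
are hypothetical helpers, and the only names taken from the thread are
baseurl, msgdir and the 'attachments/%s/%s' pattern.  The point is
that the url is built from the name actually stored on disk, so the
counter can never get lost.

```python
import os


def unique_filename(directory, filename):
    """Return a name derived from `filename` that is unused in `directory`.

    Hypothetical helper: collisions become base-1.ext, base-2.ext, ...
    """
    base, ext = os.path.splitext(filename)
    candidate = filename
    counter = 0
    # Keep bumping the counter until the candidate name is free.
    while os.path.exists(os.path.join(directory, candidate)):
        counter += 1
        candidate = '%s-%d%s' % (base, counter, ext)
    return candidate


def attachment_url(baseurl, msgdir, stored_name):
    """Build the url from the stored (already unique) file name."""
    return baseurl + 'attachments/%s/%s' % (msgdir, stored_name)
```

Note that the exists-then-create check above is not atomic, so two
processes scrubbing into the same directory at once would still need
a lock or an exclusive create.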

    MM> 2) It looks like this code is doing directory abuse: it looks
    MM> like an unlimited number of file names will be placed in one
    MM> directory, like 2^32. This is not good for system
    MM> performance; even with the latest dirhash methods in the
    MM> operating system, this will become a linear screech very
    MM> quickly for file creates and file exists. Been there and
    MM> killed the patient that way. It's hard to spot until you ramp
    MM> the systems up. I am playing around with adding two more
    MM> time-based directories to the system,
    MM> "attachments/YYYYMM/DD/". BTW, that's what made spotting
    MM> bug #1 so easy :-)

I agree that the directory calculation is broken.  It actually looks
like a message with two attachments will end up in two different
subdirs in archives/private/listname/attachments.  That wasn't the
intent.  The idea was that each message would have a separate subdir
in attachments and all its attachments would end up there.  So you'd
only be in trouble on very high volume lists.  2**32 at 1000 msgs /
day gives you about 11k years of running room.  If you were paranoid
about 2**16 directories, then you might care about adding another
level of directories.
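
The running-room arithmetic can be checked with a few lines of
Python.  This is only a back-of-the-envelope sketch: the function
name is made up for illustration, and it assumes one new subdir per
message at a constant 1000 msgs / day.

```python
def running_room_days(namespace_size, msgs_per_day):
    """Days until `namespace_size` per-message subdirs are used up."""
    return namespace_size / msgs_per_day


# 2**32 possible subdirs at 1000 msgs / day, converted to years.
years_32 = running_room_days(2 ** 32, 1000) / 365.25

# The paranoid case: only 2**16 subdirs at the same rate, in days.
days_16 = running_room_days(2 ** 16, 1000)

print('2**32: %.0f years' % years_32)
print('2**16: %.1f days' % days_16)
```

At that rate the 2**32 case lasts well over eleven thousand years,
while a 2**16 limit would be reached in a little over two months,
which is where another level of directories starts to matter.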

I'll work on fixing the code, and see how easy it is to add, or change
to, the date-based directories.
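
As a rough sketch of the date-based layout Michael describes
("attachments/YYYYMM/DD/"), something like the following might work.
The function and its parameters are hypothetical, not existing
Mailman code, and it assumes the per-message subdir (msgdir) is still
computed elsewhere and simply nested under the date.

```python
import os
from datetime import datetime


def dated_attachment_dir(archive_root, when, msgdir):
    """Return <archive_root>/attachments/YYYYMM/DD/<msgdir>.

    Hypothetical helper: `when` is the date used for bucketing, and
    `msgdir` is the per-message subdirectory name.
    """
    return os.path.join(
        archive_root,
        'attachments',
        when.strftime('%Y%m'),   # e.g. 200208
        when.strftime('%d'),     # e.g. 12
        msgdir,
    )


# Example: a message archived on the date of this post.
path = dated_attachment_dir('archives/private/listname',
                            datetime(2002, 8, 12), 'msg0001')
print(path)
```

With this layout no single directory grows without bound: the top
level gains one entry per month, each month holds at most 31 days,
and each day holds only that day's messages.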

Thanks,
-Barry