[Mailman-Developers] Scrubber.py confusion, 2.1b3
Barry A. Warsaw
barry@python.org
Mon, 12 Aug 2002 20:00:04 -0400
>>>>> "MM" == Michael Meltzer <mjm@michaelmeltzer.com> writes:
MM> I've been going over some of the Scrubber.py code and two things
MM> stand out for me
Cool, someone's looking at it :)
MM> 1) A lot of work went into making the filename unique in
MM> "save_attachment". It looks like a straight bug that the url
MM> returned does not include the "extra" part; it looks to me
MM> like the last line should be
MM>     url = baseurl + 'attachments/%s/%s' % (msgdir, filename + extra)
It's certainly true that extra is never used once calculated. That
can't be useful. :)
MM> Frankly I think the forming of the name could be better, like
MM> filenamebase + "-" + counter + "." + ext, but that's more of a
MM> feature request
That was the intent, but the code's broken.
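For illustration, here is a minimal sketch of the "filenamebase-counter.ext"
naming described above. It is not the actual Scrubber.py code: the helper
names unique_filename and attachment_url are hypothetical, and only baseurl
and msgdir come from the quoted line. The point is that the URL is built
from the name that was actually stored.

```python
import os


def unique_filename(directory, filename):
    """Return a name not yet present in `directory`.

    Hypothetical helper: tries `filename` first, then base-1.ext,
    base-2.ext, ... until an unused name is found.
    """
    base, ext = os.path.splitext(filename)
    candidate = filename
    counter = 0
    while os.path.exists(os.path.join(directory, candidate)):
        counter += 1
        candidate = '%s-%d%s' % (base, counter, ext)
    return candidate


def attachment_url(baseurl, msgdir, stored_name):
    """Build the URL from the name the file was really saved under."""
    return baseurl + 'attachments/%s/%s' % (msgdir, stored_name)
```

Note that checking for existence and then creating the file is not atomic,
so a real implementation would need to guard against two processes picking
the same name.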
MM> 2) It looks like this code is doing directory abuse: it looks
MM> like an unlimited number of file names will be placed in one
MM> directory, like 2^32. This is not good for system
MM> performance; even with the latest dirhash methods in the
MM> operating system, this will become a linear search very
MM> quickly for file creates and file-exists checks. Been there and
MM> killed the patient that way. It's hard to spot until you ramp
MM> the systems up. I am playing around with adding two more
MM> time-based directories to the system, "attachments/YYYYMM/DD/".
MM> BTW, that's what made spotting bug #1 so easy :-)
I agree that the directory calculation is broken. It actually looks
like a message with two attachments will end up in two different
subdirs in archives/private/listname/attachments. That wasn't the
intent. The idea was that each message would have a separate subdir
in attachments and all its attachments would end up there. So you'd
only be in trouble on very high volume lists. 2**32 at 1000 msgs /
day gives you about 11k years of running room. If you were paranoid
about 2**16 directories, then you might care about adding another
level of directories.
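The running-room arithmetic can be checked with a few lines of Python. This
is only a back-of-the-envelope sketch: the 1000 msgs/day rate is the figure
used above, one subdirectory per message is assumed, and the function name
is made up for the example.

```python
MSGS_PER_DAY = 1000  # rate assumed in the estimate above


def days_of_room(n_dirs, per_day=MSGS_PER_DAY):
    """Days until n_dirs per-message subdirectories are used up."""
    return n_dirs / float(per_day)


years_32 = days_of_room(2 ** 32) / 365.25  # roughly 11.8 thousand years
days_16 = days_of_room(2 ** 16)            # roughly 65.5 days
```

So a 2**32 limit is no practical concern at that volume, while a 2**16
limit would be reached in about two months.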
I'll work on fixing the code, and see how easy it is to add, or change
to, a date-based directory layout.
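As a rough sketch of what a date-based layout might look like, assuming the
"attachments/YYYYMM/DD/" scheme suggested above with a per-message subdir
underneath it. The function name, its parameters, and the choice of UTC are
assumptions for the example, not existing Mailman code.

```python
import os
import time


def dated_attachment_dir(archive_root, msg_time, msgdir):
    """Return <archive_root>/attachments/YYYYMM/DD/<msgdir>.

    msg_time is seconds since the epoch; UTC is used so the path does
    not depend on the server's local timezone.
    """
    tm = time.gmtime(msg_time)
    return os.path.join(
        archive_root,
        'attachments',
        time.strftime('%Y%m', tm),  # e.g. '200208'
        time.strftime('%d', tm),    # e.g. '12'
        msgdir,
    )
```

With this layout, no single directory holds more than one day's worth of
message subdirs, and all attachments of one message still land together.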
Thanks,
-Barry