[Mailman-Developers] Possible bug in Mailman arch when doing incremental archiving

fc lists fclists at pr-z.info
Fri Dec 13 14:06:26 CET 2013


Hi,

I think I have encountered a bug in the Mailman *arch* process when doing
incremental archiving out of band from an mbox file.
I am running mailman-2.1.9 (I am working on upgrading to 2.1.17, but if I
have this right, the bug is still present in that version).

For various reasons I need to run archiving out of band from the normal
Mailman pipeline.
So I set Mailman to archive only to the mbox file, and then I regularly run
the *arch* command with the "-s" and "-e" options to define the range of
each incremental run.

The issue I have been having is that every time the *arch* command is
called after the first "wipe" run, I get a duplicate of every attachment
file that was previously created from messages already archived.

I followed this into the code and I think I found the issue, but I wanted
to confirm here that I did not miss anything else or go down the wrong
road. (I am not a developer whatsoever.)

In lib/mailman/Mailman/Mailbox.py (line 79 onward in my copy), an
*archfactory* is defined for the mbox file that will be opened:

    def _archfactory(mailbox):

It returns the parser function that runs the scrubber, which extracts the
attachments from the email.

The factory is called on each extracted message every time the *next()*
method is called on the mbox object.

In lib/mailman/Mailman/pipermail.py, where the actual logic of
"arch -s X -e Y" is defined, the code looks like this (line 555 in my copy):

    def processUnixMailbox(self, input, start=None, end=None):
        mbox = ArchiverMailbox(input, self.maillist)
        if start is None:
            start = 0
        counter = 0
        while counter < start:
            try:
                m = mbox.next()
            except Errors.DiscardMessage:
                continue
            if m is None:
                return
            counter += 1


When start is greater than counter, the loop calls the *next()* method on
the mbox object to skip ahead to the requested starting point. Each of
those calls triggers the archfactory, which extracts the attachments from
messages that were already archived, and that is what creates the
duplicate attachments.
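
To make the mechanism concrete, here is a minimal Python 3 toy model of the behaviour described above. It is not Mailman's code: ToyMailbox, scrubbing_factory, process and written_attachments are hypothetical names, with ToyMailbox standing in for ArchiverMailbox, scrubbing_factory for _archfactory, and appending to a list standing in for writing attachment files. The Errors.DiscardMessage handling is left out for brevity.

```python
# Toy model with hypothetical names, not Mailman's actual classes.

class ToyMailbox:
    """Returns messages one at a time, passing each through self.factory."""

    def __init__(self, raw_messages, factory):
        self._messages = iter(raw_messages)
        self.factory = factory

    def next(self):
        raw = next(self._messages, None)  # builtin next() on the iterator
        if raw is None:
            return None  # end of mailbox, signalled with None as in the quoted loop
        return self.factory(raw)  # the factory runs on EVERY message read


written_attachments = []  # stands in for attachment files on disk


def scrubbing_factory(raw):
    """Stands in for _archfactory: parsing has a side effect."""
    written_attachments.append("attachment-of-" + raw)
    return raw


def process(messages, start=0):
    """Mimics the skip loop of processUnixMailbox, then archives the rest."""
    mbox = ToyMailbox(messages, scrubbing_factory)
    counter = 0
    while counter < start:
        if mbox.next() is None:  # skipped messages still go through the factory
            return
        counter += 1
    while mbox.next() is not None:
        pass  # real code would archive the message here


mbox_contents = ["msg1", "msg2"]
process(mbox_contents)            # first run archives msg1 and msg2
mbox_contents.append("msg3")
process(mbox_contents, start=2)   # incremental run meant to add only msg3
print(written_attachments)
# ['attachment-of-msg1', 'attachment-of-msg2',
#  'attachment-of-msg1', 'attachment-of-msg2', 'attachment-of-msg3']
```

The second run writes the attachments of msg1 and msg2 again, purely because the skip loop reads them through the side-effecting factory.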

My "test fix" is to set the mbox object's factory to the default
rfc822.Message while iterating over the mbox until the start is reached,
and then set the factory back to _archfactory. With that change, everything
seems to work as expected.
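
The test fix can be sketched against a similar toy model. This is a minimal Python 3 sketch, not a patch for Mailman: ToyMailbox, scrubbing_factory, plain_factory and process are hypothetical names, and plain_factory stands in for rfc822.Message. It assumes, as with the legacy Python 2 mailbox classes that ArchiverMailbox builds on, that the factory lives in a plain `factory` attribute that can be reassigned between calls to next().

```python
# Toy model with hypothetical names, not Mailman's actual classes.

class ToyMailbox:
    """Returns messages one at a time, passing each through self.factory."""

    def __init__(self, raw_messages, factory):
        self._messages = iter(raw_messages)
        self.factory = factory

    def next(self):
        raw = next(self._messages, None)
        if raw is None:
            return None
        return self.factory(raw)


written_attachments = []  # stands in for attachment files on disk


def scrubbing_factory(raw):
    """Stands in for _archfactory: parsing writes an attachment."""
    written_attachments.append("attachment-of-" + raw)
    return raw


def plain_factory(raw):
    """Stands in for rfc822.Message: parses only, no side effects."""
    return raw


def process(messages, start=0):
    mbox = ToyMailbox(messages, scrubbing_factory)
    counter = 0
    saved_factory = mbox.factory
    mbox.factory = plain_factory  # skip already-archived messages without scrubbing
    try:
        while counter < start:
            if mbox.next() is None:
                return
            counter += 1
    finally:
        mbox.factory = saved_factory  # restore the scrubbing factory
    while mbox.next() is not None:
        pass  # real code would archive the message here


mbox_contents = ["msg1", "msg2"]
process(mbox_contents)
mbox_contents.append("msg3")
process(mbox_contents, start=2)
print(written_attachments)
# ['attachment-of-msg1', 'attachment-of-msg2', 'attachment-of-msg3']
```

The try/finally restores the scrubbing factory even if the skip loop returns early or raises. One thing that may be worth checking in the real code: the quoted skip loop does not count messages that raise Errors.DiscardMessage, so if _archfactory can raise it while rfc822.Message does not, swapping factories could shift which message the start offset lands on.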

Again, I am not a developer, and I wanted to double-check this here before
I go on and put it live on my systems.

Thanks
Francesco Ciocchetti
