[Mailman-Developers] Re: [Mailman-Users] Poking and prodding the archiver

Barry A. Warsaw barry@digicool.com
Mon, 9 Jul 2001 22:28:19 -0400


[Note: this discussion is more appropriate for mailman-developers, so
I've changed the Cc: -baw]

>>>>> "PS" == Phil Stracchino <alaric@babcom.com> writes:

    PS> I've looked at some length through the code for the archiver
    PS> now, and although I still don't understand python, I've
    PS> figured out enough of what the archiver is doing to see that
    PS> it's apparently intentional that the path to mbox archives is
    PS> .../mailman/archives/private/list.mbox/list.mbox.

Yes, and this is for security reasons, as explained in the comment in
Archiver.py (see InitVars()).  The comment is slightly out of date in
that the file under listname.mbox/ is also called listname.mbox.
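
To make the layout concrete, here is a minimal sketch of how such a
path is put together.  This is not the code in Archiver.py; the
function name and the install prefix are hypothetical and only
illustrate the doubled final element.

```python
import os


def mbox_archive_path(prefix, listname):
    """Build <prefix>/archives/private/<list>.mbox/<list>.mbox.

    Hypothetical helper for illustration only; it is not taken from
    Mailman's Archiver.py.
    """
    # The directory is named after the list, with a .mbox suffix...
    mbox_dir = os.path.join(prefix, "archives", "private", listname + ".mbox")
    # ...and the mbox file inside it carries the same name again.
    return os.path.join(mbox_dir, listname + ".mbox")
```

For example, mbox_archive_path("/usr/local/mailman", "mylist") yields a
path whose last two elements are both mylist.mbox.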

    PS> What I haven't been able to figure out is *why* the code is
    PS> written to duplicate the last element in the pathname;

See above.
    
    PS> nor why it is that the archiver is written in such a way that
    PS> it attempts to access this mbox archive directory with its
    PS> duplicated final pathname element even when mbox archives are
    PS> disabled, and fails if it doesn't exist.

If this is true (and I haven't tested it), then it's most likely just
old lurking bugs.  The archiver/Pipermail stuff is the most neglected
part of the codebase.  People keep threatening to help rewrite it, but
so far nothing's materialized, and I have little time or energy to
devote to the Pipermail side.
    
    PS> I find this behavior even more curious in light of the fact
    PS> that newlist apparently creates archives/private/list.mbox
    PS> when it sets up the list, but does not create the
    PS> archives/private/list.mbox/list.mbox without the existence of
    PS> which the archiver fails.

Do you mean the archiver fails or that the web access to the archiver
fails?  Certainly not the former (unless I misunderstand) because it
works for me, and for loads of other people.  It's a known buglet that
the Pipermail URL doesn't work until the first message is posted to the
list.

    PS> I've applied the following patch to my HyperArch.py file
    PS> (patch also attached separately):

[patch deleted]

    PS> I don't know what impact this has on mbox archives, but for
    PS> me, it makes the HTML archiver work.

Hmm, odd.  What I think will break is private archives.  If you toggle
an archive to private, I seem to remember that you can craft a URL to
trick the web server into vending an archive page for you directly,
instead of forcing you to go through authentication with the
private.py CGI.

    PS> I would welcome comment, any explanations for the curious
    PS> state of unsatisfied and illogical dependencies described
    PS> above, and any advice on fixing anything that this patch
    PS> breaks.  It's still a mystery to me why the archiver should
    PS> even *care* whether or not the mbox archive directory exists,
    PS> when mbox archives are disabled in the master configuration
    PS> anyway.

It probably shouldn't, but then Mailman probably shouldn't support
ARCHIVE_TO_MBOX=0.  Archiving to the mbox is about as fast as it gets,
since it is just a file append, and it's /incredibly/ handy to have
that .mbox file around (even as large as it can get), in case you want
to regenerate your archive, or you want to migrate to a different
external archiver.
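
As a rough illustration of why the mbox side is cheap, here is a
sketch that appends one message to an mbox file using the mailbox
module from the Python standard library.  This is not Mailman's
archiving code; the function name is made up, and it assumes the
message arrives as a plain string.

```python
import email
import mailbox


def append_to_mbox(mbox_path, raw_message):
    """Append one message to the mbox file at mbox_path.

    Illustrative sketch only, not Mailman's code.  The file is created
    if it does not exist yet.
    """
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        # add() writes the message at the end of the file; nothing
        # already stored in the mbox is rewritten.
        box.add(email.message_from_string(raw_message))
        box.flush()
    finally:
        box.unlock()
        box.close()
```

Because every post lands in the same file in arrival order, that file
can later be fed to a different archiver or used to rebuild the HTML
archive from scratch.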

-Barry