[Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3

Fri Mar 16 06:41:58 CET 2012

There's an aspect of the scrubber that isn't going to work in a Mailman 3
world where we have multiple, possibly external, archivers and especially
where we don't have such tight integration with Pipermail (or Pipermail at all
<wink>).

We can still scrub messages of unwanted content types, but we can't save those
parts on the file system and calculate a URL into Pipermail to display them.

I can think of a few ways to handle this.

The easiest thing to do, and what I will probably do in my
'death-to-pipermail' branch, is to simply scrub out the unwanted parts *after*
a copy of the message is sent to the archive queue, but *before* the message
is sent to the digest, usenet, and outgoing queues.
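
To make the ordering concrete, here is a minimal sketch using only the
standard library's email package.  The queue dictionary and the scrub() and
dispatch() functions are hypothetical stand-ins, not Mailman's actual
pipeline, and this scrub() only looks at top-level parts.

```python
import copy
from email.message import EmailMessage


def scrub(msg, allowed=("text/plain",)):
    """Return a copy of msg without disallowed top-level parts.

    Simplification: nested multiparts are not descended into.
    """
    scrubbed = copy.deepcopy(msg)
    if scrubbed.is_multipart():
        kept = [part for part in scrubbed.get_payload()
                if part.get_content_type() in allowed]
        scrubbed.set_payload(kept)
    return scrubbed


def dispatch(msg, queues):
    """Queue the unscrubbed message for archiving, then scrub for the rest."""
    # The archivers get the original, so each can apply its own policy.
    queues["archive"].append(copy.deepcopy(msg))
    # Everything that goes out to members is scrubbed afterwards.
    scrubbed = scrub(msg)
    for name in ("digest", "usenet", "outgoing"):
        queues[name].append(scrubbed)


def demo():
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content("plain body")
    msg.add_alternative("<p>html body</p>", subtype="html")
    queues = {name: [] for name in ("archive", "digest", "usenet", "outgoing")}
    dispatch(msg, queues)
    return queues
```

Here the archive queue keeps both the text/plain and text/html parts, while
the digest, usenet, and outgoing queues only see the text/plain part.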

This makes sense because with a model of external archiving, those archivers
may make different decisions about which parts of the original message to
remove or display.  We can still include a little blurb saying that a part
was scrubbed out, and since the messages can have the pre-calculated url to
the message in one or more archivers, the user is always free to just click on
the url to see the full message, displayed with whatever policy the archiver
is configured with.
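
As a rough illustration of the blurb idea, the sketch below replaces each
unwanted part with a short text note that points at the archived message.  It
assumes the pre-calculated url is carried in an Archived-At header; the
function names, the blurb wording, and the example url are all made up for
illustration.

```python
from email.message import EmailMessage


def blurb_for(content_type, archived_at=None):
    """Build the replacement text for one scrubbed part."""
    lines = [f"{content_type} part scrubbed."]
    if archived_at:
        # Archived-At values are wrapped in angle brackets; drop them.
        url = str(archived_at).strip().strip("<>")
        lines.append(f"View the full message at: {url}")
    return "\n".join(lines) + "\n"


def replace_with_blurbs(msg, allowed=("text/plain",)):
    """Replace every disallowed leaf part of msg, in place, with a blurb."""
    archived_at = msg.get("Archived-At")
    for part in list(msg.walk()):
        if part.is_multipart() or part.get_content_type() in allowed:
            continue
        # set_content() clears the old payload and Content-* headers first.
        part.set_content(blurb_for(part.get_content_type(), archived_at))
    return msg


def demo():
    msg = EmailMessage()
    # Hypothetical archiver url, for illustration only.
    msg["Archived-At"] = "<http://example.com/archive/HYPOTHETICAL>"
    msg.set_content("plain body")
    msg.add_attachment(b"\x00\x01", maintype="application",
                       subtype="octet-stream", filename="blob.bin")
    return replace_with_blurbs(msg)
```

With more than one archiver there could be several such urls; this sketch
only reads the first Archived-At header it finds.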

One possibility is to save the scrubbed part inside the core and provide a url
to the REST API for accessing this attachment.  This can't be inserted into
the scrubbed message directly though because this would be a non-public url to
the resource, and it would have to be proxied by the web ui.  We need better
configuration for integrating the web ui with the core anyway (e.g. to
calculate the url to the user's options page), so this could be part of that.
The interactions are trickier though because you would then have to inform the
web ui that there's a new attachment it should proxy.

The other, more elaborate option is to define an IScrubber interface, or
alternatively a "primary" IArchiver, that the message can pass through, which
would give it an opportunity to provide urls for each of the parts that will
be scrubbed out.  This is trickier because there can really be only one such
thing defined in the system.  I think it would be confusing if you received a
message that had something like this:

    text/html part scrubbed, view it at one of the following:
    http://example.com/attachments/foo.html
    http://example.org/some/extra/path/bar.html
    http://another.archive.example.net/whatever/baz.html

Besides, this may be nearly impossible to do without in-band communication
with that external archiver, which is exactly what the RFC 5064 Archived-At
header + message-id-hash was supposed to avoid.  I think we definitely don't
want to have to force such in-band communications to occur in order to scrub
messages of unwanted parts.
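
For context, the point of the message-id-hash is that the url can be computed
locally, without asking the archiver anything.  The sketch below assumes the
hash is the base32-encoded SHA-1 digest of the Message-ID with its angle
brackets removed, and that an archiver exposes messages under a base url
followed by that hash.  The url layout and both function names are
assumptions for illustration.

```python
import hashlib
from base64 import b32encode


def message_id_hash(message_id):
    """Return the base32-encoded SHA-1 digest of a Message-ID.

    Assumption: surrounding angle brackets are not part of the hashed text.
    """
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters.
    return b32encode(digest).decode("ascii")


def archived_at(base_url, message_id):
    """Build a hypothetical archive url from a base url and a Message-ID."""
    return f"{base_url.rstrip('/')}/{message_id_hash(message_id)}"
```

Because both functions depend only on the Message-ID, the core can stamp the
url into the message before any archiver has seen it.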

For now, I'm going to try to implement sending an unscrubbed copy of the
message to the archivers and, for the copy sent to the list members, just
scrubbing the unwanted parts without trying to link to them.  The nice
side-effect of this is that it makes the scrubber *way* simpler!

Any other suggestions?

Cheers,
-Barry
