[Mailman-Developers] Protecting email addresses from spam harvesters

Tue, 26 Feb 2002 00:56:45 -0500

> You can argue that "barry at zope.com" isn't obfuscated enough, and
> you might be right.  I'm against any image or JavaScript approach to
> protecting these because I really do want to keep Mailman's web
> interface as pedestrian as possible.  In principle I don't mind if
> JavaScript or images are used, but they should never be the only way
> to navigate a Mailman site.  Mailman must degrade gracefully for
> browsers that either don't support these features or have them
> disabled.  I'd do the same with cookies if I could figure out how to
> do low-frustration-factor authentication without them.

>>>>> "JRA" == Jay R Ashworth <jra@baylink.com> writes:

    JRA> And, of course, if it *will* degrade, then address-snarfers
    JRA> will figure out how to *make* it degrade, so it's not worth
    JRA> doing in the first place, at least not for *that* reason.

Agreed.
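
For reference, the text-only obfuscation being debated is about as simple
as it gets.  A minimal sketch, assuming a hypothetical helper name and a
deliberately loose address pattern (this is not Pipermail's actual code):

```python
import re

# Loose pattern for illustration only; real address syntax is far richer.
_ADDR_RE = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")


def obscure_addresses(text):
    """Rewrite user@dom.ain as 'user at dom.ain' (hypothetical helper)."""
    return _ADDR_RE.sub(r"\1 at \2", text)
```

It needs no JavaScript, images, or cookies, which is the appeal, and a
harvester can undo it with one regular expression, which is the weakness.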

> MM3 will likely integrate admin addresses and list memberships into an
> object called a "roster" (essentially just a list of email addresses).
> This will let us define a pipeline for each roster, which could
> include a spam filter that performs an action based on some criteria
> (e.g. drop it, reject it, mark a header, etc.).  So we can do more
> protection on the -owner address than we can do now (without
> hacking).
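
A rough sketch of the roster idea, purely as an illustration: every name
here (Roster, process, spam_filter, the action strings, the dict standing
in for a message) is an assumption, since no MM3 design exists yet.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A stage inspects a message (a plain dict in this sketch) and returns an
# action string to stop processing, or None to pass it to the next stage.
Stage = Callable[[dict], Optional[str]]


@dataclass
class Roster:
    """Hypothetical roster: a list of addresses plus its own pipeline."""
    name: str
    addresses: List[str] = field(default_factory=list)
    pipeline: List[Stage] = field(default_factory=list)

    def process(self, msg: dict) -> str:
        # The first stage that returns an action decides the message's fate.
        for stage in self.pipeline:
            action = stage(msg)
            if action is not None:
                return action
        return "deliver"


def spam_filter(msg: dict) -> Optional[str]:
    # Assumed criterion: drop anything scored above 5.0.
    return "drop" if msg.get("spam_score", 0.0) > 5.0 else None


owners = Roster("test-owner", ["owner@example.com"], [spam_filter])
```

With this shape, the -owner roster can carry a stricter pipeline than the
membership roster without touching the rest of the delivery code.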

    JRA> I do see one problem here, and I don't know if you already
    JRA> address it below.  [ looks ] You don't; it's this: if the
    JRA> list-owner addresses go through the MM machinery, as well,
    JRA> then they too can die if MM crashes the wrong way.

    JRA> This implies, as I believe has already been discussed, that
    JRA> the *server* admin address must be publicly accessible, not
    JRA> be piped into MailMan at all, and preferably, should actually
    JRA> not even be handled by the same machine...  ("Single point of
    JRA> failure")

Well, what machine it's handled by isn't Mailman's business, but you
do have a point.  Until recently, I recommended that you install
aliases `mailman' and `mailman-owner', but now I recommend that
`mailman' be an actual list, and it is this list that things like
password reminders appear to come from.  Also, if the site list gets a
bounce, it'll check all the existing lists for a match against the
bouncing address.
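
The bounce check amounts to something like the following sketch.  The
function name and the dict-of-members data structure are assumptions
made for illustration, not Mailman's real internals:

```python
def lists_for_bouncer(rosters, bouncing_address):
    """Return the sorted names of all lists that include the address.

    `rosters` maps a list name to an iterable of member addresses.
    Addresses are compared case-insensitively, which is a simplification.
    """
    target = bouncing_address.strip().lower()
    return sorted(
        name
        for name, members in rosters.items()
        if target in {m.lower() for m in members}
    )
```

The site list can then hand the bounce to each matching list's own
bounce handling.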

You make the valid point that if the Mailman system were to break,
you'd have no way to contact the site administrator, save for typical
aliases like postmaster.  It seems like you want:

1. A non-list, plain alias to contact the human in case of emergency

2. Some place that password reminders come from.  Since this will be
   receiving bounces, it ought to be a real list.

3. A site-wide list of maintainers of the site who can take care of
   normal operations (e.g. panicky unsubscription requests).

Perhaps #3 can be the same as #1 for those sites that have a
collaborative management arrangement.  So the question is, what do we
call the alias and what do we call the list?  I have definitely seen
people try to send mail commands to `mailman@python.org' and from my
Majordomo days, this seems like a reasonable thing to (eventually)
implement.  Is it sufficient to recommend that postmaster@ point to a
real human, not a list, and leave mailman@dom.ain a normal list?

If not, I'd still opt for `mailman' to be the site list, and add
something like mailman-panic to be a human address.  Or perhaps make
mailman-owner pipe both to the wrapper and to postmaster.  I dunno,
I'm open to suggestions.

> Mailman should avoid getting deeply into the spam detection and
> prevention business, except for some really really basic stuff
> (probably not much more or less than it does now).  It should
> integrate well with external spam detection programs like SpamAssassin
> or commercial equivalents.  E.g. if we always send the message through
> SA, and the message gets some score, we could decide to hold messages
> below say 5.0 on the Spamster Scale, discard anything above 5.0, etc.

    JRA> That sounds good, and if there isn't already a "plugin" API
    JRA> for that, we ought to give some thought to that...

Agreed.  I just have no idea what a reasonable API would be, although
we're planning on doing some experiments with SA on {python,zope}.org
to see what might make sense.
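
As a strawman for those experiments, here is a minimal sketch of a
score-based policy.  It assumes the external filter has already run and
left its score in an X-Spam-Status header as `hits=N' or `score=N'; that
header format, the function names, and the 2.0 hold threshold are all
assumptions, not a proposed Mailman API.

```python
import re
from email import message_from_string

# Assumed header shape: "X-Spam-Status: Yes, hits=7.2 required=5.0 ..."
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(msg):
    """Return the score from X-Spam-Status, or None if there is none."""
    match = _SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def action_for(msg, hold_at=2.0, discard_at=5.0):
    """Map a message to 'accept', 'hold', or 'discard' by its score."""
    score = spam_score(msg)
    if score is None or score < hold_at:
        return "accept"
    if score <= discard_at:
        return "hold"
    return "discard"


spam = message_from_string("X-Spam-Status: Yes, hits=7.2 required=5.0\n\nbuy now\n")
```

Keeping the policy down to a score and a couple of thresholds means
Mailman never has to know how the score was computed.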

> #4 is interesting too.  I'm not against putting the raw archive behind
> a Turing test, since I suspect that very few people will ever want
> it.  It means that we won't be able to write an automated wget-ish
> script to do off-site backups, but so be it.

    JRA> Is there a difference between raw and private that I'm
    JRA> missing?  Do you mean the mbox format files?

Yup.  raw == mbox.

> - Someone needs to step up and "own" Pipermail if any of these
>   problems are going to be fixed, or if the obfuscation is going to
>   happen.

    JRA> Not much danger of that, is there?

Not so far. :(

> - Remember that Pipermail itself is completely optional.  An API is
>   defined between Mailman and the archiver and that's all the
>   interaction they have.  Maybe the API needs to be more elaborate to
>   support obfuscation.  It definitely needs some changes if we ever
>   want to add some of the features I'd like to add (but that's
>   off-topic here).

    JRA> Well, that's probably the best point yet: this isn't
    JRA> *MailMan's* problem, except to the extent that we "recommend"
    JRA> Piper as our archiver.

I don't know if I recommend it; in fact, I try to dis-recommend it.
Still, I think we do more good than harm in distributing an archiver
that works out of the box.  And the advantage of Pipermail is that for
really really critical problems, we /can/ go in and hack on it.  I'm
torn, but still come down on the side of including Pipermail, even
with all its warts.

> - I'll note that one of the early design decisions for Pipermail was
>   that public archives should be vended directly from the file system
>   for performance reasons.  That decision may not be appropriate for
>   today's operations.  Certainly maintaining two static versions of
>   the pages isn't feasible, so I think you have to vend one or the
>   other (probably the obfuscated version) from a cgi.

    JRA> No, but the performance reasons aren't as much of an issue
    JRA> now...

Nope.

-Barry