[Mailman-Developers] Protecting email addresses from spam harvesters
Jay R. Ashworth
jra@baylink.com
Mon, 25 Feb 2002 13:14:56 -0500
On Mon, Feb 25, 2002 at 12:27:23PM -0500, Barry A. Warsaw wrote:
> I /think/ I've caught up on this thread, but I'm sure I've missed a
> bunch. As I see it there are really these issues to protecting email
> addresses in Mailman:
>
> 1) list admin addresses
> 2) public archives
> 3) private archives
> 4) raw archive
> 5) list rosters
I believe you've synopsized it correctly, yes.
> For #1, MM2.1 changes what gets included at the bottom of list pages.
> The admin's personal address is no longer included in the link's text
> or in mailto: href. In the mailto: you'll see something like
> mylist-owner@dom.ain and in the text you'll see something like "barry
> at zope.com". I see no point in trying to obscure the former -- or
> put it behind a web form -- because it's easily guessed given a probe
> of existing lists, as is every other list-related email address. More
> on protecting the -owner from spam below. I claim that the
> guessability is a feature, btw.
Concur.
> You can argue that "barry at zope.com" isn't obfuscated enough, and
> you might be right. I'm against any image or JavaScript approach to
> protecting these because I really do want to keep Mailman's web
> interface as pedestrian as possible. In principle I don't mind if
> JavaScript or images are used, but they should never be the only way
> to navigate a Mailman site. Mailman must degrade gracefully for
> browsers that either don't support these features or have them
> disabled. I'd do the same with cookies if I could figure out how to
> do low-frustration-factor authentication without them.
And, of course, if it *will* degrade, then address-snarfers will figure
out how to *make* it degrade, so it's not worth doing in the first
place, at least not for *that* reason.
> (Aside: I really really hate websites that are only viewable with
> JavaScript on, and I often send a friendly ADA-ish noodge to webmaster
> when I find such beasts, although it rarely does any good).
Hear hear!
> MM3 will likely integrate admin addresses and list memberships into an
> object called a "roster" (essentially just a list of email addresses).
> This will let us define a pipeline for each roster, which could
> include a spam filter that performs an action based on some criteria
> (e.g. drop it, reject it, mark a header, etc.). So we can do more
> protection on the -owner address than we can do now (without
> hacking).
I do see one problem here, and I don't know if you already address it
below. [ looks ] You don't; it's this: if the list-owner addresses go
through the MM machinery, as well, then they too can die if MM crashes
the wrong way.
This implies, as I believe has already been discussed, that the
*server* admin address must be publicly accessible, not be piped into
Mailman at all, and preferably, should actually not even be handled by
the same machine... ("Single point of failure")
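
To make the roster-with-pipeline idea concrete, here is a minimal sketch in Python. Every name in it (`Message`, `Roster`, `process`, the two filters) is hypothetical and not taken from any Mailman release; it only assumes that a roster is a list of addresses plus an ordered list of filters, each of which may stop processing with an action or let the message continue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    # Hypothetical stand-in for a parsed mail message.
    sender: str
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


# A filter returns an action name to stop the pipeline, or None to continue.
Filter = Callable[[Message], Optional[str]]


@dataclass
class Roster:
    # "Essentially just a list of email addresses", plus its own pipeline.
    addresses: List[str]
    pipeline: List[Filter] = field(default_factory=list)

    def process(self, msg: Message) -> str:
        for step in self.pipeline:
            action = step(msg)
            if action is not None:
                return action
        return "deliver"


def drop_empty(msg: Message) -> Optional[str]:
    # Example "drop it" criterion: a message with no body text.
    return "drop" if not msg.body.strip() else None


def mark_if_shouting(msg: Message) -> Optional[str]:
    # Example "mark a header" criterion: an all-caps body.
    if msg.body.isupper():
        msg.headers["X-Suspect"] = "yes"
    return None


owners = Roster(["mylist-owner@dom.ain"], [drop_empty, mark_if_shouting])
```

Note that this sketch does nothing about the single-point-of-failure concern: anything routed through `process` is only reachable while the list software itself is up.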
> Rosters and the improved user database will allow us to
> actually equate admin email addresses with Real Names, so you could
> conceivably see something like
>
> List run by <a href="mailto:mylist-owner@dom.ain">Barry Warsaw</a>
>
> at the bottom of the pages. You'd be within your rights to argue that
> end users never even need know who admins the list, but I think it
> helps to avoid the "faceless droid" syndrome.
Concur *strongly*.
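
The footer Barry describes could be generated along these lines. This is only a sketch: `owner_footer` and its parameters are made-up names, and the one design point it illustrates is that the href carries the guessable -owner alias while the link text carries the Real Name, so the admin's personal address never appears on the page.

```python
from html import escape


def owner_footer(listname: str, domain: str, real_name: str) -> str:
    # Build the guessable list-owner alias rather than using a personal address.
    owner = f"{listname}-owner@{domain}"
    return (
        f'List run by <a href="mailto:{escape(owner, quote=True)}">'
        f"{escape(real_name)}</a>"
    )
```

Called as `owner_footer("mylist", "dom.ain", "Barry Warsaw")`, it reproduces the link quoted above.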
> Mailman should avoid getting deeply into the spam detection and
> prevention business, except for some really really basic stuff
> (probably not much more or less than it does now). It should
> integrate well with external spam detection programs like SpamAssassin
> or commercial equivalents. E.g. if we always send the message through
> SA, and the message gets some score, we could decide to hold messages
> below say 5.0 on the Spamster Scale, discard anything above 5.0, etc.
That sounds good, and if there isn't already a "plugin" API for that,
we ought to give some thought to that...
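
As a rough illustration of what such a hook might do, the sketch below reads the score an external filter has already written into the message and maps it to an action. The function names and the `hold_at` lower bound are assumptions added for the example; the 5.0 discard threshold is the figure from the quoted text. The regular expression accepts both `hits=` and `score=` in the `X-Spam-Status` header, since SpamAssassin versions have differed on that label.

```python
import re
from email import message_from_string

# Matches "score=7.2" or "hits=7.2" inside an X-Spam-Status header value.
_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message: str):
    """Return the score recorded by the external filter, or None if absent."""
    msg = message_from_string(raw_message)
    match = _SCORE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def disposition(score, hold_at=2.0, discard_at=5.0):
    """Map a score to an action; hold_at is an assumed lower bound."""
    if score is None or score < hold_at:
        return "accept"
    if score > discard_at:
        return "discard"
    return "hold"
```

Keeping the scoring outside and only interpreting a header is one way to stay out of the spam detection business while still acting on the result.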
> As for #2, I'd go for the low-tech approach of simply discarding the
> hostname part of the email address in all public archives. Certainly
> this is easy in the headers, and we'll have to decide whether we're
> going to expend the resources to do body searches for email addresses,
> and obfuscate those as well. If people want to make contacts based on
> some public archive message, they can email the list. Until we've got
> web-posting, I don't think it matters if they lose the full email
> address in the public archives.
Well, personally, I don't ever assume that someone who posted a message
a year ago with 95% of the answer to my question is even *on* the list
anymore -- a situation I don't think you thought of -- but...
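
For reference, the low-tech approach for #2 amounts to something like the following sketch. The pattern is deliberately loose and is an assumption of this example, not Pipermail's code; real address syntax is far more permissive, and a body scan like this will miss some addresses and mangle some non-addresses.

```python
import re

# Loose illustration only: local part, "@", then a dotted hostname.
_ADDR = re.compile(r"\b([A-Za-z0-9._%+-]+)@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def strip_hostnames(text: str) -> str:
    """Keep the local part of each address and discard the hostname."""
    return _ADDR.sub(r"\1", text)
```

The same function can be run over header lines and over body text; whether the body pass is worth the cost is the open question above.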
> As for #3, I don't mind not obscuring the email addresses since a
> login will be required. If we think we don't trust the current
> private archive login procedures to be secure against bots, then we
> can fix that, but I don't see it as a high priority.
Concur.
> #4 is interesting too. I'm not against putting the raw archive behind
> a turing-test, since I suspect that very few people will ever want
> it. It means that we won't be able to write an automated wget-ish
> script to do off-site backups, but so be it.
Is there a difference between raw and private that I'm missing? Do you
mean the mbox format files?
> Things to note for #'s 2-4:
>
> - The Pipermail implementation has lots of well-known problems. I'm
> personally not willing to spend a lot of time fixing them, and I
> still recommend Real Sites use a Real Archiver. I've just thrown
> the majority of the email obfuscation problems over the fence into
> someone else's back yard <wink>.
:-)
> - Adding public archive obfuscation is fine and dandy for new messages
> added to the archives but what about all the existing archived
> messages? Re-running Pipermail (i.e. bin/arch) to regenerate from
> scratch has two significant drawbacks. 1) Message URLs can change,
> especially if you also fix broken From_ delimiters, and that in turn
> breaks bookmarks, 2) on large mboxes, you simply can't do bin/arch
> because of memory problems.
See above. :-)
> - Someone needs to step up and "own" Pipermail if any of these
> problems are going to be fixed, or if the obfuscation is going to
> happen.
Not much danger of that, is there?
> - Remember that Pipermail itself is completely optional. An API is
> defined between Mailman and the archiver and that's all the
> interaction they have. Maybe the API needs to be more elaborate to
> support obfuscation. It definitely needs some changes if we ever
> want to add some of the features I'd like to add (but that's
> off-topic here).
Well, that's probably the best point yet: this isn't *Mailman's*
problem, except to the extent that we "recommend" Pipermail as our
archiver.
> - I'll note that one of the early design decisions for Pipermail was
> that public archives should be vended directly from the file system
> for performance reasons. That decision may not be appropriate for
> today's operations. Certainly maintaining two static versions of
> the pages isn't feasible, so I think you have to vend one or the
> other (probably the obfuscated version) from a cgi.
No, but the performance reasons aren't as much of an issue now...
> Nobody's even mentioned #5, which are available publicly via the
> "Visit Subscriber List" button, or the email command "who" to the
> -request address. If I were a spam harvester, I wouldn't even bother
> with scanning the archives if either of these were publicly
> enabled. When you turn them off, especially the former, just remember
> that you've now made it much harder for Joe User to unsubscribe
> themselves. Catch 22.
<chuckle>
Not enough experience in the field, or I'd probably have mentioned that
already.
Cheers,
-- jra
--
Jay R. Ashworth jra@baylink.com
Member of the Technical Staff Baylink RFC 2100
The Suncoast Freenet The Things I Think
Tampa Bay, Florida http://baylink.pitas.com +1 727 647 1274
"If you don't have a dream; how're you gonna have a dream come true?"
-- Captain Sensible, The Damned (from South Pacific's "Happy Talk")