[Mailman-Developers] RE: Mailman developer?

Thu, 31 Jan 2002 00:15:43 -0500

We can use GIFs, but we can't use the LZW compression option. As the GIFs
we produce will be small, this is hardly a problem. At any rate, any of
GIF, PNG or JPEG will do.

The advantage of using encryption rather than hashing is that the email
renderer needs to know nothing about the system except what it is given
in the querystring (and the key, of course). Using md5 hashes would be
more secure, but would require more information to be shared between the
pipermail cgi and the renderemail cgi; that is, a mapping of hashes to
email addresses. Frankly, I can't see email harvesters going to the
trouble of cracking encryption - no matter what kind.
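
To make the trade-off concrete, here is a minimal sketch of both
approaches. The rotor module is not available in current Python
releases, so this uses a stand-in: a toy XOR keystream derived with
hashlib. It is obfuscation, not strong cryptography. All function names
here are hypothetical and not part of pipermail.

```python
import hashlib

# Shared secret known to both the archive renderer and the image CGI.
KEY = b"the quick brown fox jumped over the lazy dog"


def _keystream(key, length):
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_address(address, key=KEY):
    """Reversible token: the image CGI needs only the key to recover it."""
    data = address.encode("utf-8")
    stream = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).hex()


def decrypt_token(token, key=KEY):
    """Inverse of encrypt_address()."""
    data = bytes.fromhex(token)
    stream = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).decode("utf-8")


def hash_address(address, table):
    """One-way token: the image CGI must also have access to `table`."""
    digest = hashlib.md5(address.encode("utf-8")).hexdigest()
    table[digest] = address
    return digest
```

With the first pair of functions the querystring carries everything the
renderer needs; with hash_address() the table has to be stored somewhere
both CGIs can read.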

I can think of an alternative technique for preventing email harvesting;
one that preserves the clickability of email addresses. It involves the
use of javascript. Using it, an email link would look something like
this:

<script language="javascript">
	email = decode("92731602eba1aa4f506604f8c3671ed83ea9");
	document.write("<a href='mailto:"+email+"'>"+email+"</a>");
</script>

decode() is some kind of decrypting function. Possibly something as
simple as a substitution cipher.
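
As a rough illustration of how simple such a cipher could be, here is a
sketch in Python of the encoding the archive side would apply, together
with the inverse that a javascript decode() would have to mirror. The
alphabet and the rotation of 17 are arbitrary assumptions made for this
example.

```python
import string

# Characters we expect in an email address (an assumption for this sketch;
# anything outside this set passes through unchanged).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "@._-+"

# A rotated copy of the alphabet serves as the substitution table.
SHIFTED = ALPHABET[17:] + ALPHABET[:17]

ENCODE_TABLE = str.maketrans(ALPHABET, SHIFTED)
DECODE_TABLE = str.maketrans(SHIFTED, ALPHABET)


def encode(address):
    """Substitute every known character with its rotated counterpart."""
    return address.translate(ENCODE_TABLE)


def decode(text):
    """Undo encode(); the browser-side decode() would do the same mapping."""
    return text.translate(DECODE_TABLE)
```

A harvester that scans for the usual address pattern would not match
the encoded string, although one that executes javascript would still
see the address.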

The disadvantage of this technique is that email addresses wouldn't be
accessible to people using a non-javascript capable browser, and browser
variations would tend to make this less reliable than the image viewing
technique. For example, I have found that Netscape is less than reliable
when it comes to document.write().

One solution to the non-javascript browser issues would be to use the
<noscript> tag to deliver the email address as an image.

<noscript>
<img src="/render-email.py?92731602eba1aa4f506604f8c3671ed83ea9">
</noscript>
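
Putting the two fragments together, the archive renderer might emit
both for each address. The following is only a sketch: address_markup()
is a hypothetical helper, and it assumes the token is a lowercase hex
string like the one above.

```python
import re


def address_markup(token, render_url="/render-email.py"):
    """Return the script block plus the <noscript> image fallback."""
    # Refuse anything that is not plain hex, so the token can be placed
    # inside the javascript string and the URL without further escaping.
    if not re.fullmatch(r"[0-9a-f]+", token):
        raise ValueError("token must be a lowercase hex string")
    return (
        '<script language="javascript">\n'
        '\temail = decode("%s");\n'
        '\tdocument.write("<a href=\'mailto:"+email+"\'>"+email+"</a>");\n'
        '</script>\n'
        '<noscript>\n'
        '<img src="%s?%s">\n'
        '</noscript>' % (token, render_url, token)
    )
```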

One problem with all this is that pipermail renders HTML as it stores
emails, rather than as they are viewed. The issue here is that the
original emails are lost; if we encrypt (or hash) email addresses, then
losing the key (or hash->email mapping) implies also losing the
addresses. There are some comments in the pipermail/HyperMail source
code about rendering to html on viewing rather than storage, but
converting to this scheme would imply a backwards compatibility issue:
how to import emails already rendered to html.

A backwards compatible solution would be to render to html twice - once
on storage and again on viewing. The viewing renderer would be
responsible for encrypting/obfuscating email addresses.
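
A minimal sketch of what the viewing renderer could do, assuming the
stored html contains plain mailto: links like the one quoted below. The
regular expression, the function names and the hex "token" are all
placeholders; a real version would use whatever encryption is chosen
and would need to cope with pipermail's actual markup.

```python
import re

# Matches <a href="mailto:ADDRESS">anything</a> in already-rendered html.
MAILTO_LINK = re.compile(
    r'<a\s+href="mailto:([^"]+)"\s*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def hex_token(address):
    """Placeholder for the real encryption step (this is NOT encryption)."""
    return address.encode("utf-8").hex()


def obfuscate_stored_html(html, make_token=hex_token):
    """Second rendering pass: swap each mailto link for an image tag."""
    def replace(match):
        token = make_token(match.group(1))
        return '<img src="/render-email.py?%s">' % token

    return MAILTO_LINK.sub(replace, html)
```

Because this pass runs at viewing time, the stored archive keeps the
original addresses, and losing the key loses nothing.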

I'm going to join the mailman-developers list with a temporary email
address to get some more input on this issue.

-----Original Message-----
From: Barry A. Warsaw [mailto:barry@zope.com] 
Sent: Wednesday, 30 January 2002 21:18
To: Damien Morton
Subject: Re: Mailman developer?

>>>>> "DM" == Damien Morton <Damien.Morton@acm.org> writes:

    DM> 	I'm under the impression that you are one of the main
    DM> developers and/or maintainers of Mailman. I hope you don't
    DM> mind me writing to you.

Nope.

    DM> 	I notice that Mailman obfuscates email addresses to a
    DM> certain extent, but replacing the @ symbol with %40 or
    DM> &atmark; is hardly sufficient. An even vaguely intelligent
    DM> email harvester will see through this.

True.

    DM> 	The feature I'm proposing is to render out all email
    DM> addresses in the archive as GIFs. I would have pipermail
    DM> render out <img> tags whose src is an encrypted version of the
    DM> email address. A companion CGI script would decrypt the email
    DM> address and render it out as a GIF image using PIL or
    DM> somesuch.

Of course, because Mailman is a GNU project, we can't use gifs, but pngs
or jpegs would work just as well.  IIRC, PIL can generate either of
those formats.

    | Instead of rendering this:
    | <a href="mailto:your.email@address">your.email@address</a>

    DM> You'd render this instead: <img
    DM> src="/render-email.py?92731602eba1aa4f506604f8c3671ed83ea9">

    DM> 	This example uses the simple rotor encryption that
    DM> comes with python and the key is "the quick brown fox jumped
    DM> over the lazy dog"

I usually use the md5 module to generate unique keys in such situations.

    DM> 	I've been looking at pipermail, and it is
    DM> _ugly_.

Tell me about it!  About its only saving grace is that it's reasonably
well integrated and it's all in Python.  Other than that... ;)

There has been lots of discussion over the years about ditching that
code and doing it right, so I'd suggest poring over the
mailman-developers archives (yes, you'll miss being able to search it
;).  It's a lot of work though, so it currently languishes for lack of a
motivated champion.

    DM> Nonetheless, I'm fairly sure I can add this functionality
    DM> easily. I'm running w2k, however, and I see that Mailman isn't
    DM> really meant for w2k.

Correct.

    DM> As I would be working on the pipermail part of mailman only,
    DM> it might be easier to get only that component up and running
    DM> under Windows. Not sure if there's a sample archive that comes
    DM> with mailman, but... any suggestions welcome.

Nope, but you can of course grab the raw mbox for any archive, or for a
month of messages.  Note that that does point to another source of
leaked addresses, one that won't be directly affected by your idea.
However I could see hiding raw mbox access behind a cgi POST which
should effectively stop today's harvesters.

    DM> 	The downsides of this functionality are that it might
    DM> incur a performance penalty and that it eliminates the
    DM> clickable mailto: functionality presently there.

System-wide caching should alleviate the performance hit (modulo cgi
overhead).  The loss of clickable mailto: would be a drag, although I
don't know how much it would be missed in practice.

    DM> 	As far as functionality goes, I imagine that the bulk
    DM> of any mailman bandwidth will be from spiders, and these are
    DM> unlikely to traverse an image source link. Secondly, simple
    DM> caching should be very easy to implement.

    DM> 	As far as the clickable mailto functionality goes, I
    DM> have two suggestions. The first is that the
    DM> render-emails-as-images functionality could be a personal
    DM> preference of the sender of the email, and the second is that
    DM> that preference could be overridden by acquiring a cookie
    DM> through some bot and spider proof mechanism. I like the 'type
    DM> what you read in the image above' mechanism for detecting
    DM> humans.

I wouldn't make it an option of the sender, but of the list (or maybe
just of the site).

Anyway, it's a neat idea.  Mailman-developers would be the best place to
discuss it, but that does present a bit of a catch 22 for you. ;)

From a practical standpoint, MM2.1 will likely go to beta this weekend,
meaning feature freeze.  However I encourage you to follow through and
work out some patches, if you'd be willing to assign copyright to them
to the FSF eventually.  Post any patches on Mailman's SF project page
and that will let other interested parties download it and test it out,
etc.

Cheers,
-Barry