[Mailman-Developers] [GSOC 2014]Approach towards the Full anonymization project

Stephen J. Turnbull stephen at xemacs.org
Wed Feb 26 17:23:12 CET 2014


Rajeev S writes:

 > As mentioned, here is my approach towards the full anonymization
 > project.

AFAICS as far as described it will provide the outcomes you describe.

However, I don't understand the use case here.  Most approaches use a
single secret ID for each user.  This is not just a matter of
convenience for the developer, but a requirement in some cases.  That
is, the list members, although anonymous "in the real world", build
trust relationships with each other in the list environment.  "Full
anonymity" is in any case difficult to achieve, as word choice, topic
choice, grammatical construction, time of writing, etc. fall into
patterns over time.

Also, given your model of address-per-post, I'm again unclear on the
use-case for off-list communication via the list server.

Finally, there are several sets of details that you don't mention here
but that need to be discussed in your plan (even if you don't propose
to implement them now, you need to ensure that you don't make them
difficult to implement later!).  First, there's the question of how the
proposed off-list messaging is going to be handled.  Those pseudo-
random addresses are going to need to be made valid addresses to the
host's MTA.  That's MTA-dependent, and also will likely have security
implications.  To be sure that people don't inadvertently reveal their
identities just by hitting "R", those messages should be anonymized as
well, so that any such replies have to go through the server too.  Of
course it's going to be impossible to prevent people from exchanging
email addresses in the body of the text, but in that case it's really
not your problem any more.

The second is cleaning up the rest of a post.  The incoming trace
headers typically identify the sender quite precisely.  Quite likely
you'll want to nuke everything that isn't required by the RFCs, in
fact.  You probably also should try to do something about .sigs,
Message-ID (which is required), Date (also required, and which often
gives timezone information) and other automatically added text.
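
To make the header cleanup concrete, here is a minimal sketch in
Python using only the standard library "email" package.  It is not
Mailman's Handler API: the names scrub(), strip_signature() and KEEP,
and the choice of which headers to keep, are assumptions made for
illustration.  It drops every header not on a small allowlist,
replaces From, Date and Message-ID with server-generated values, and
cuts the body at the conventional "-- " signature delimiter.

```python
from email import message_from_string
from email.message import Message
from email.utils import formatdate, make_msgid

# Assumption: the only original headers worth passing through.
KEEP = {"subject", "mime-version", "content-type",
        "content-transfer-encoding"}


def strip_signature(body):
    """Cut the body at the conventional "-- " signature delimiter."""
    lines = body.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.rstrip("\r\n") == "-- ":
            return "".join(lines[:i])
    return body


def scrub(raw, anon_from, list_domain):
    """Return a copy of the post with identifying headers removed."""
    original = message_from_string(raw)
    clean = Message()
    for name, value in original.items():
        # Trace headers (Received, X-Mailer, ...) are simply not copied.
        if name.lower() in KEEP:
            clean[name] = value
    clean["From"] = anon_from
    clean["Reply-To"] = anon_from            # replies go via the server
    clean["Date"] = formatdate(usegmt=True)  # hides the poster's timezone
    clean["Message-ID"] = make_msgid(domain=list_domain)
    payload = original.get_payload()
    if isinstance(payload, str):
        # Only handles unencoded single-part text; base64 or multipart
        # bodies would need to be decoded and walked part by part.
        payload = strip_signature(payload)
    clean.set_payload(payload)
    return clean
```

An allowlist is used rather than a blocklist because new identifying
headers show up all the time, and anything not explicitly known to be
harmless is then dropped by default.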

Third, what about authentication for incoming posts?  Do you care if
people spoof addresses?  I'm not sure this has any meaning in the
one-shot address environment you propose, but that again is going to
depend on the use case.

Fourth, you need to think about security for the encryption key and
EmailMapper table, as well as any archives (you need to clean up
archived posts before they go to the archive -- this is probably just
a matter of where your Handler goes in the pipeline).

 >    - Introduce a new model EmailMapper with attributes
 >       - ForeginKey to Address / User
 >       - seed, A 40 bit hash,unique
 >       - nuses, number of times this hash is used,max 5 or 10
 >    - The approach is to encrypt the seed nuses times, with encryption
 >      algorithms like AES, each time the email ID is displayed.
 >    - The email ID is displayed as <nuses><encrypted seed>
 >    - The email is decrypted nuses times to find the parent seed and
 >      thereby point to the exact email address.
 >    - A new seed should be generated for the user after a fixed
 >      number of attempts,say 5 or 10,as the repeated encryption
 >      routines can slow down the system.
 > 
 > The outcomes
 > 
 >    - Everytime the user sends a message,his from address changes.
 >    - At the same time, each of the from addresses point to the same user.
 >    - The sender can use any stored address he has,like in the mail
 >      contacts,repeatedly, to contact with a user,as it has nuses
 >      attached with it.

So you need to store the addresses forever.  How big might these
tables grow?  Could that be a problem?

Did you consider using the "seed" as a "salt" instead?  Ie,
regenerating the seed each time, adjoining it to the address, and
encrypting the combination?  That would allow you to avoid storing a database
of addresses.  Of course if the encryption were compromised, all the
old posts could be identified, whereas in your scheme the EmailMapper
table also needs to be compromised to get addresses.
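
A rough sketch of the salt idea in Python, to show the shape of it.
Everything here is an assumption for illustration: the function names
make_alias() and resolve_alias(), the token layout, and above all the
cipher.  To stay within the standard library it uses an HMAC-SHA256
keystream as a stand-in; it is not AES, and a real implementation
should use a vetted authenticated cipher instead.

```python
import base64
import hashlib
import hmac
import secrets

SALT_LEN = 8   # bytes of fresh randomness per alias (assumed size)
TAG_LEN = 8    # truncated MAC so forged aliases are rejected


def _keystream(key, salt, n):
    """Stand-in keystream: HMAC-SHA256 in counter mode (NOT AES)."""
    out = b""
    counter = 0
    while len(out) < n:
        block = b"enc" + salt + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:n]


def _tag(key, salt, cipher):
    mac = hmac.new(key, b"mac" + salt + cipher, hashlib.sha256)
    return mac.digest()[:TAG_LEN]


def make_alias(key, address):
    """Return a fresh one-shot local part that encodes `address`."""
    salt = secrets.token_bytes(SALT_LEN)   # regenerated for every post
    plain = address.encode("utf-8")
    stream = _keystream(key, salt, len(plain))
    cipher = bytes(a ^ b for a, b in zip(plain, stream))
    blob = salt + _tag(key, salt, cipher) + cipher
    return base64.b32encode(blob).decode("ascii").rstrip("=").lower()


def resolve_alias(key, token):
    """Recover the real address from an alias; no lookup table needed."""
    padded = token.upper() + "=" * (-len(token) % 8)
    blob = base64.b32decode(padded)
    salt = blob[:SALT_LEN]
    tag = blob[SALT_LEN:SALT_LEN + TAG_LEN]
    cipher = blob[SALT_LEN + TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, salt, cipher)):
        raise ValueError("alias was not issued by this server")
    stream = _keystream(key, salt, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")
```

Two calls to make_alias() with the same key and address give different
tokens, and resolve_alias() maps both back to the same address using
only the key.  One practical catch: the token grows with the length of
the real address, so long addresses may not fit in the 64-octet limit
on the local part of an email address.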

Regards, and good luck with your proposal!




