[Mailman-Users] Private archives available via Internet

Richard Barrett R.Barrett at ftel.co.uk
Thu Mar 28 19:54:06 CET 2002


At 08:47 28/03/2002 -0700, you wrote:
>It has come to our attention that mailman archives are available via the
>Internet even though mailman archives are deemed as private for mailing
>list members viewing only.

What URL paths is the search engine returning?

Do they point to the web server delivering your Mailman web GUI? This is 
not such a dumb question as it might appear. It is entirely possible for a 
list subscriber to feed their incoming mail from the list into their own 
archive, and for that archive to be what the search engines are indexing: 
potentially the same content as your Mailman archive, just at a different 
location/URL.

If the URLs being returned point to the web server delivering your Mailman 
web GUI, do they begin with the public archive alias (default /pipermail/) 
or the private script alias (default /mailman/private/) or some other path?

This could give some clue as to how a search engine's indexer gained access 
to the private mail archives you are concerned about.
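
As a rough illustration of the triage above, here is a minimal Python 
sketch that sorts the URLs returned by a search engine into those three 
groups. The two prefixes are the Mailman defaults mentioned above; the 
function name and the lists.example.org host are made up for the example.

```python
from urllib.parse import urlsplit

# Default Mailman URL prefixes named in this message; adjust them if
# your web server is configured with different aliases.
PUBLIC_PREFIX = "/pipermail/"
PRIVATE_PREFIX = "/mailman/private/"


def classify_archive_url(url):
    """Return 'public', 'private-cgi' or 'other' based on the URL path."""
    path = urlsplit(url).path
    if path.startswith(PUBLIC_PREFIX):
        return "public"
    if path.startswith(PRIVATE_PREFIX):
        return "private-cgi"
    return "other"


if __name__ == "__main__":
    # Hypothetical search results, for illustration only.
    for hit in (
        "http://lists.example.org/pipermail/mylist/2002-March/000123.html",
        "http://lists.example.org/mailman/private/mylist/2002-March/000123.html",
        "http://lists.example.org/archives/mylist/000123.html",
    ):
        print(classify_archive_url(hit), hit)
```

Anything that comes back as 'other' deserves the closest look, since it 
is reached through neither of the standard Mailman paths.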

Any list that was created as private and has stayed private ever since 
can only be accessed over HTTP, on a _properly_ configured system, via 
Mailman's CGI script $prefix/Mailman/Cgi/private.py. This script requires 
a subscribed member's e-mail address and the associated password before 
it allows access to the list archive.

If the web server concerned is mis-configured, so that it can serve the 
pages directly from the private archive storage in the file system via 
some other URL path, rather than the proper CGI path of 
/mailman/private/<listname>/..., that could well be what is causing your 
problem.
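
One way to look for that kind of mis-configuration is sketched below in 
Python, assuming an Apache-style configuration: it scans the text for 
Alias directives whose target lies inside the private archive directory. 
The function name and the /usr/local/mailman prefix are assumptions for 
the example, and the check is far from complete (it ignores DocumentRoot, 
symlinks, rewrite rules and so on).

```python
import posixpath
import shlex


def aliases_exposing_private(config_text, private_root):
    """Return (url_path, target) pairs for Alias lines pointing into private_root."""
    root = posixpath.normpath(private_root)
    hits = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parts = shlex.split(line)
        except ValueError:
            # Unbalanced quotes: not a line this simple scan can judge.
            continue
        # Only plain "Alias <url-path> <file-path>" lines are examined.
        if len(parts) == 3 and parts[0].lower() == "alias":
            target = posixpath.normpath(parts[2])
            if target == root or target.startswith(root + "/"):
                hits.append((parts[1], parts[2]))
    return hits


if __name__ == "__main__":
    # Hypothetical configuration fragment, for illustration only.
    sample = (
        "Alias /pipermail/ /usr/local/mailman/archives/public/\n"
        'Alias /oldarchive/ "/usr/local/mailman/archives/private/"\n'
        "ScriptAlias /mailman/ /usr/local/mailman/cgi-bin/\n"
    )
    print(aliases_exposing_private(sample, "/usr/local/mailman/archives/private"))
```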

If the lists concerned were public at some time, then the indexer could 
have accessed them at that time, but the URL paths returned by the search 
engine would be of the form /pipermail/<listname>/... and following the 
link should fail now that the list is private.

>If you do a search on Google or any other search engine you can find any
>message that was posted to the mailing list.  This is a problem for our
>private mailing lists.

But can you access the actual archive mail file via the URL returned by the 
search without having a valid member id and associated password?

The source of your problem will hinge in part on how the search engine 
indexers are crawling your web site. Is it pure "arm's length" HTTP access?

One of the problems with indexing Mailman private list archives to provide 
legitimate search facilities is the cookie authentication scheme that the 
$prefix/Mailman/Cgi/private.py script uses to control access. The indexers 
for some search engines are not programmed to handle this type of 
authentication. For instance, to set up search of private list archives 
with the htdig search engine, one has to index them in the file space, i.e. 
the indexer has to access the archive files through the filing system, and 
one has to give the indexer a rule for mapping the file space paths back to 
the URLs that are to be returned in subsequent search results.
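
To make the mapping rule concrete, here is a minimal Python sketch of the 
idea. It is not htdig's own configuration syntax, just an illustration of 
turning a file space path into the private CGI URL; the function name, the 
/usr/local/mailman prefix and the lists.example.org host are assumptions.

```python
import posixpath


def file_path_to_private_url(file_path, archive_root, base_url):
    """Map a file under archive_root to its URL under base_url."""
    rel = posixpath.relpath(file_path, archive_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%s is not inside %s" % (file_path, archive_root))
    return base_url.rstrip("/") + "/" + rel


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(file_path_to_private_url(
        "/usr/local/mailman/archives/private/mylist/2002-March/000123.html",
        "/usr/local/mailman/archives/private",
        "http://lists.example.org/mailman/private/",
    ))
```

The point to notice is that the indexer reads the files without any 
authentication at all, so whatever it stores is only as private as the 
index itself.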

Is it possible that such an access path has been set up on your system for 
indexing private archives and that the index information has 'leaked' onto 
a publicly available search engine?

I see you have a search facility on your site. How is this implemented? 
Could this be the source of the leakage from the private mail archives to 
other search engines? How does your site search facility (it appears to be 
delivered by http://search.atomz.com/search/) do its indexing?

Also, I see your site makes use of PHP. No criticism intended, but it 
means the tools to drive a coach and horses through Mailman's attempts at 
archive security are ready to hand.

>Is there  a way to ensure that this is not available?  Also how do you

Yes:

1. Configure your Mailman and associated web server correctly.

2. Control the setup of any local archive search facility so that the 
information it holds does not leak to outside search engines.

3. Add a restriction on access to /mailman/ to your site's robots.txt. Yes, 
I know it is only advisory, but some search engine crawlers do honor it.
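
For the robots.txt point, a small Python sketch using the standard 
library's urllib.robotparser shows the sort of rule meant and how a 
well-behaved crawler would interpret it. The robots.txt content, the 
"ExampleBot" agent name and the lists.example.org host are illustrative 
assumptions, not taken from your site.

```python
from urllib.robotparser import RobotFileParser

# An example robots.txt that asks all crawlers to stay out of /mailman/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mailman/
"""


def is_allowed(robots_txt, url, user_agent="ExampleBot"):
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://lists.example.org/mailman/private/mylist/",
        "http://lists.example.org/index.html",
    ):
        print(is_allowed(ROBOTS_TXT, url), url)
```

This only keeps out crawlers that choose to obey the file; it is no 
substitute for points 1 and 2.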


>get read of messages in the archives?

I assume you meant "get rid of messages in the archives". If so, yes:

1. Edit the list's raw mailbox file, 
$prefix/archives/private/<listname>.mbox/<listname>.mbox, to remove the 
offending messages.

2. Rebuild the archive using the command $prefix/bin/arch <listname>
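
If you would rather not edit the mailbox file by hand, step 1 can be 
sketched with Python's standard mailbox module. This is a minimal sketch 
only: the function name is made up, it assumes you know the Message-ID 
headers of the messages to drop, and you should work on a backup copy 
while no mail is being delivered to the list.

```python
import mailbox


def remove_messages(mbox_path, message_ids):
    """Delete messages whose Message-ID is in message_ids; return the count removed."""
    wanted = set(message_ids)
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        # Collect the keys first so the mailbox is not changed while iterating.
        doomed = [key for key, msg in box.iteritems()
                  if msg.get("Message-ID") in wanted]
        for key in doomed:
            box.remove(key)
        box.flush()  # write the changes back to the mbox file
    finally:
        box.unlock()
        box.close()
    return len(doomed)
```

After that, rebuild the archive with the arch command as in step 2.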

>Nancy Montano