[Mailman-Users] Private archives available via Internet
Richard Barrett
R.Barrett at ftel.co.uk
Thu Mar 28 19:54:06 CET 2002
At 08:47 28/03/2002 -0700, you wrote:
>It has come to our attention that mailman archives are available via the
>Internet even though mailman archives are deemed as private for mailing
>list members viewing only.
What are the URL paths being returned by the search engine?
Do they point to the web server delivering your Mailman web GUI? This is
not such a dumb question as it might appear. It is entirely possible for a
list subscriber to direct their incoming mail from the list to their own
archive, and it may be this source that the search engines are referencing:
potentially the same content as your Mailman archive, just at a different
location/URL.
If the URLs being returned point to the web server delivering your Mailman
web GUI, do they begin with the public archive alias (default /pipermail/)
or the private script alias (default /mailman/private/) or some other path?
This could give some clue as to how a search engine's indexer gained access
to the private mail archives you are concerned about.
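As a rough aid for sorting the URLs the search engine returns, here is a minimal Python sketch. It assumes the default aliases mentioned above (/pipermail/ and /mailman/private/); the host name lists.example.org and the function name are purely illustrative and not part of Mailman.

```python
from urllib.parse import urlparse

# Default aliases named in this message; your site may use different ones.
PUBLIC_PREFIX = "/pipermail/"
PRIVATE_PREFIX = "/mailman/private/"


def classify_archive_url(url, mailman_host):
    """Return a rough label saying where a search result points."""
    parts = urlparse(url)
    if (parts.hostname or "").lower() != mailman_host.lower():
        return "other-host"      # e.g. a subscriber's own archive
    if parts.path.startswith(PUBLIC_PREFIX):
        return "public-alias"    # list is, or once was, public
    if parts.path.startswith(PRIVATE_PREFIX):
        return "private-cgi"     # should demand a login when followed
    return "other-path"          # possibly served straight from the file system


if __name__ == "__main__":
    # Hypothetical search results, for illustration only.
    for u in (
        "http://lists.example.org/pipermail/mylist/2002-March/000001.html",
        "http://lists.example.org/mailman/private/mylist/2002-March/000001.html",
        "http://elsewhere.example.net/archive/mylist/msg00001.html",
    ):
        print(u, "->", classify_archive_url(u, "lists.example.org"))
```

Results labelled "other-path" or "other-host" are the ones most worth a closer look.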
Any list that was created as private and has stayed private ever since
can only be accessed using HTTP, on a _properly_ configured system, via
Mailman's CGI script in $prefix/Mailman/Cgi/private.py. This script
requires a subscribed list member's e-mail address and associated password
before it allows access to the list archive.
If the web server concerned was mis-configured, so that it could serve the
pages directly from the private archive storage through the file system via
some other URL path, rather than the proper CGI path of
/mailman/private/<listname>/..., this could give a clue as to what is
causing your problem.
If the lists concerned were at some time public then the indexer could have
accessed them at that time but the URL paths returned by the search engine
would be of the form /pipermail/<listname>/... and following the link
should now fail if the list is now private.
>If you do a search on Google or any other search engine you can find any
>message that was posted to the mailing list. This is a problem for our
>private mailing lists.
But can you access the actual archive mail file via the URL returned by the
search without having a valid member id and associated password?
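One way to test this without a browser is to fetch the URL with no cookies or credentials and look at what comes back. The sketch below is only a heuristic under an assumption of mine: that a protected page either returns a non-200 status or contains a password field in a login form. The function names are illustrative, not Mailman's.

```python
import urllib.error
import urllib.request


def fetch_anonymously(url, timeout=10):
    """Fetch url with no cookies or credentials; return (status, body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 401, 403, 404 and similar arrive here as exceptions.
        return err.code, err.read().decode("utf-8", "replace")


def looks_exposed(status, body):
    """Heuristic only: a 200 reply without a password field was probably
    served without any login step."""
    if status != 200:
        return False
    return 'type="password"' not in body.lower()
```

Calling `looks_exposed(*fetch_anonymously(url))` on each URL returned by the search engine should give a quick first impression; anything reported as exposed deserves a manual check.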
The source of your problem will hinge in part on how the search engine
indexers are crawling your web site. Is it pure 'arm's length' HTTP access?
One of the problems with indexing Mailman private list archives to provide
legitimate search facilities is the cookie authentication scheme used by the
$prefix/Mailman/Cgi/private.py script to control access. The indexers for
some search engines are not programmed to handle this type of
authentication. For instance, with the htdig search engine, in order to set
up search of private list archives one has to do the indexing of them in
the file space i.e. the indexer has to access the archive files through the
filing system, and provide the indexer with a rule for mapping the file
space paths back to the URLs that are to be returned in subsequent search
results.
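To illustrate the kind of mapping rule I mean, here is a small Python sketch. It is not htdig configuration syntax, just the idea: the archive root directory and the URL base are assumed example values, and the function name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def file_path_to_url(file_path, archive_root, url_base):
    """Map a file under archive_root to the URL a search result should show.

    Raises ValueError if file_path is not inside archive_root.
    """
    rel = PurePosixPath(file_path).relative_to(PurePosixPath(archive_root))
    return url_base.rstrip("/") + "/" + quote(rel.as_posix())


if __name__ == "__main__":
    # Example values only; substitute your own $prefix and host name.
    print(file_path_to_url(
        "/usr/local/mailman/archives/private/mylist/2002-March/000123.html",
        "/usr/local/mailman/archives/private",
        "http://lists.example.org/mailman/private",
    ))
```

The point is that the indexer reads the files directly, while the URLs it hands back go through the private CGI script and so still require a login. If the index data itself is exposed, that protection is bypassed.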
Is it possible that such an access path has been set up on your system for
indexing private archives and that the index information has 'leaked' onto
a publicly available search engine?
I see you have a search facility on your site. How is this implemented?
Could this be the source of the leakage from the private mail archives to
other search engines? How does your site search facility (it appears to be
delivered by http://search.atomz.com/search/) do its indexing?
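Also, I see your site makes use of PHP. No criticism intended, but it puts
the tools to drive a coach and horses through Mailman's attempts at archive
security ready to hand: a script running with the web server's privileges
may be able to read the private archive files straight from the file system.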
>Is there a way to ensure that this is not available? Also how do you
Yes:
1. configure your mailman and associated web server correctly
2. control the setup of any local archive search facility you set up to
ensure the information it holds does not leak to outside search engines.
3. add a restriction on access for /mailman/ to your site's robots.txt. Yes,
I know it is only advisory, but some search engine crawlers do honor it.
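For point 3, a quick way to check that a robots.txt actually blocks the paths you care about is Python's standard urllib.robotparser module. The robots.txt content below is a minimal example of my own, not taken from your site.

```python
from urllib.robotparser import RobotFileParser

# Minimal example rule set: ask all crawlers to stay out of /mailman/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mailman/
"""


def is_blocked(robots_text, path, user_agent="*"):
    """Return True if robots_text asks user_agent not to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for p in ("/mailman/private/mylist/", "/pipermail/mylist/"):
        print(p, "blocked" if is_blocked(ROBOTS_TXT, p) else "allowed")
```

Remember this only tells you what a well-behaved crawler would do; it is no substitute for points 1 and 2.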
>get read of messages in the archives?
I assume you meant "get rid of messages in the archives". If so yes:
1. Edit the list's raw mailbox file,
$prefix/archives/private/<listname>.mbox/<listname>.mbox, to remove the
offending messages.
2. Rebuild the archive using the command $prefix/bin/arch <listname>
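If you would rather not edit the mailbox file by hand, step 1 can be sketched with Python's standard mailbox module. This assumes you can identify the offending messages by their Message-ID headers; the function name is hypothetical. Work on a backup copy, and preferably while no new mail is being archived.

```python
import mailbox


def remove_messages(mbox_path, unwanted_ids):
    """Delete messages whose Message-ID is in unwanted_ids.

    Returns the number of messages removed.
    """
    box = mailbox.mbox(mbox_path, create=False)
    box.lock()
    removed = 0
    try:
        # Copy the keys first so we can remove while looping.
        for key in list(box.keys()):
            msg_id = box[key].get("Message-ID", "").strip()
            if msg_id in unwanted_ids:
                box.remove(key)
                removed += 1
        box.flush()              # write the changes back to disk
    finally:
        box.close()              # also releases the lock
    return removed
```

After that, rebuild the archive as in step 2 so the HTML pages no longer contain the removed messages.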
>Nancy Montano
>
>
>--
>Nancy M. Montano || 224 Cruz Alta Rd, #F || Taos, NM 87571
>Webmaster/Content Coord || nmontano at laplaza.org ||
>http://www.laplaza.org
>
>La Plaza Telecommunity || [V] 505-758-1836 || [F] 505-751-1812
>
>"Aprender es avanzar"
>
>
>
>------------------------------------------------------
>Mailman-Users mailing list
>Mailman-Users at python.org
>http://mail.python.org/mailman/listinfo/mailman-users
>Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py