[Mailman-Users] Search by Message-ID, preserving Cc for direct recipients

Mark Sapiro mark at msapiro.net
Wed May 15 21:33:27 CEST 2013


On 05/14/2013 10:17 AM, Jed Brown wrote:
> I would like to be able to search the archives of a mailman list using
> the Message-ID, ideally using a stable URL like
> 
>   http://mid.gmane.org/${message_id}
>   http://mail-archive.com/search?l=mid&q=${message_id}
> 
> but preferably on our own host as we're not currently mirrored and would
> rather link to our own archives when referencing an old discussion on
> the list.  Our current archives (e.g., [1]) are searched using htdig,
> but it doesn't seem to support query by Message-ID.  Your wiki page [2]
> also suggests Swish, MnoGoSearch, and Namazu.  Can any of these search
> by Message-ID, or is our best bet to get indexed by mail-archive.com and
> direct people there?


The Message-ID of the post is in the HTML page containing the post, but
it is only in an In-Reply-To= fragment of a mailto: URL that isn't
indexed by htdig. Also, it's URL-encoded, so <, > and @ are %3C, %3E and
%40 respectively. The actual Message-ID: headers are in the periodic
*.txt files.
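
To illustrate the encoding described above, here is a minimal Python sketch using only the standard library. The function names and the example Message-ID are made up for illustration. It shows the form a Message-ID takes once it is percent-encoded inside a URL, which is the form a search of the HTML pages would have to match.

```python
from urllib.parse import quote, unquote


def encode_message_id(message_id: str) -> str:
    """Percent-encode a Message-ID as it would appear inside a URL.

    safe="" forces '<', '>' and '@' to be encoded as well.
    """
    return quote(message_id, safe="")


def decode_message_id(encoded: str) -> str:
    """Reverse the percent-encoding to recover the original Message-ID."""
    return unquote(encoded)


# Hypothetical Message-ID, used only as an example.
print(encode_message_id("<abc123@example.com>"))  # %3Cabc123%40example.com%3E
```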

This leads to a few possibilities: teaching htdig to index the .txt
files (may be tricky; I just spent a couple of minutes looking at this
and didn't see how to do it), changing the noindex start and end tags in
the list's archives/private/LIST/htdig/LIST.conf file so that everything
in the HTML files, including the URL-encoded Message-ID, is indexed, or
writing a separate CGI search script to search the .txt files for the
Message-ID.
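
As a rough sketch of the last option, the core of such a search script might look like the Python below. This is not an existing Mailman tool. The function names and the archive_dir argument are hypothetical, and it assumes the Message-ID: header appears unfolded and unobscured on a single line in each .txt file. A real CGI script would still need to map a hit back to the URL of the archived post and validate its input.

```python
import os
from typing import Iterator, Tuple


def normalize(message_id: str) -> str:
    """Strip whitespace and angle brackets so '<a@b>' and 'a@b' compare equal."""
    return message_id.strip().strip("<>").strip()


def find_message_id(archive_dir: str, message_id: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, line number) for each Message-ID: line matching message_id.

    archive_dir is a hypothetical directory holding the periodic .txt files.
    """
    wanted = normalize(message_id)
    for name in sorted(os.listdir(archive_dir)):
        if not name.endswith(".txt"):
            continue  # only the periodic text archives are of interest
        path = os.path.join(archive_dir, name)
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                # Header names are case-insensitive.
                if line.lower().startswith("message-id:"):
                    value = line.split(":", 1)[1]
                    if normalize(value) == wanted:
                        yield path, lineno
```

Note that this simple scan would also match a line beginning with "Message-ID:" inside a message body (for example, quoted headers), so the results should be treated as candidates rather than exact hits.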

Or use mail-archive.com, which is probably simplest.


> Second question: Why are direct recipients dropped from the Cc header of
> the copy sent via the list?  This seems partially addressed in the
> archives [3], but I think it's important for high-volume lists when
> people filter conversations based on whether they are a direct
> recipient.  Is there an option somewhere to keep Cc headers intact
> without changing other behavior?
> 
> [1] http://lists.mcs.anl.gov/pipermail/petsc-dev/
> [2] http://wiki.list.org/display/DOC/How+do+I+make+the+archives+searchable
> [3] http://mail.python.org/pipermail/mailman-developers/2006-May/018777.html


I've learned a lot in the last 7 years ;)

The reason is to keep the Cc: list from growing excessively long in long
threads involving many people (see the subsequent post(s) in that thread).

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

