[Mailman-Users] Searchable archives

Richard Barrett R.Barrett at ftel.co.uk
Fri Sep 20 13:51:35 CEST 2002


On Thursday 19 September 2002 19:34, G. Armour Van Horn wrote:
> Richard,
>
> Perhaps you could take a minute to sketch the advantages of the options
> mentioned. The FAQ tells me that there are three but gives no clue as to
> establishing a preference among MnoGoSearch, HT:Dig, and Pipermail itself.
>
> I have never tried to set any of these up, and the only one I recall
> reading about here is HT:Dig. Since I have a couple of lists that probably
> are candidates for a search feature I'd like to hear how the contenders
> compare.
>

I have to admit that I cannot speak to using anything other than Htdig for 
searching list archives produced using Mailman's internal archiver, which is 
Pipermail.

Why did I opt for Htdig? We were already using it to provide search for some 
existing web material on our intranet. It is Open Source, has a good 
reputation, is being actively developed and is readily available, with RPMs 
included in the Red Hat and SuSE Linux distributions I use. 

I was young, naive and lazy when I decided to use Mailman's internal 
archiver. It was available without any effort as part of MM, and I needed to 
get a new list manager with archiving up and running fairly quickly. I do not 
regret the decision but, certainly in MM 2.0.x, the archiver handles mail 
attachments poorly. MM 2.1b3 improves things, but I can understand why people 
use external archivers for their lists. I'm considering using MHonArc, as I 
am told it is better, but I cannot get it high enough up the priority list to 
do the real work involved in a thorough trial installation.

My Htdig/MM integration patches were produced so that, once MM has been 
patched and installed alongside a vanilla Htdig installation, the patched MM 
code does pretty much everything that needs to be done without further manual 
intervention. Setup is a one-time job, mainly to tell Mailman where Htdig is 
installed. You also have to make one symbolic link in the file system so that 
Htdig can reach the htdig conf files for the MM list archives. The patched MM 
code builds per-list htdig config files to control indexing and search of 
each list's archives. It also provides a per-list search form on each 
archived list's TOC page and preserves access control over private list 
archives. Lists can move from private to public archives, or vice versa, 
without any intervention: they remain searchable, and searches honour their 
new access status.

The #444884 patch includes cron scripts for regular reindexing of lists and 
some useful maintenance scripts. It also allows the indexing and searching to 
be done on a machine other than the MM host, as long as that machine has NFS 
access to the mail archives.

If all your lists are public and you are happy without independent per-list 
search, then any search engine can be configured to access and index the 
list archives.
 
I'm sure the other search engine candidates are perfectly viable, and 
external archivers like MHonArc may offer advantages over MM's Pipermail 
archives. When I have time I'll look at a more generic integration; until 
then, others will have to speak to those alternatives.

Of the two patches I cited, #444879 is generic if you are using MM's internal 
archiver and applies regardless of which search engine you use. Its purpose 
is to embed configurable strings in the archived HTML pages to influence 
search engine indexing in ways that should, or at least may, improve the 
quality of the search results subsequently returned. #444879 is a necessary 
precursor to the #444884 patch.
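As an illustration of the kind of string such a patch could embed: htdig skips 
any text enclosed between <!--htdig_noindex--> and <!--/htdig_noindex--> 
comments, so wrapping the navigation links repeated on every archive page in 
that pair keeps them out of the index. Whether #444879 uses exactly these 
markers is an assumption here, and wrap_noindex is a hypothetical helper.

```python
# Marker pair that htdig treats as "do not index the enclosed text".
# Whether #444879 emits exactly these strings is an assumption.
NOINDEX_START = "<!--htdig_noindex-->"
NOINDEX_END = "<!--/htdig_noindex-->"


def wrap_noindex(html_fragment, start=NOINDEX_START, end=NOINDEX_END):
    """Surround an archive HTML fragment so the indexer skips it.

    The markers are parameters because the embedded strings are meant
    to be configurable, e.g. for a search engine with other conventions.
    """
    return start + html_fragment + end


nav = '<a href="date.html">[ date ]</a> <a href="thread.html">[ thread ]</a>'
print(wrap_noindex(nav))
```

Message bodies stay outside the markers, so searches match the mail text 
rather than link labels that appear on every page.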

If you download the #444884 patch and apply it to a test unpacking of the MM 
.tar.gz (or just read the patch file as text), you will find that the patch 
adds a file called INSTALL.htdig-mm to the top level of the MM build 
directory. This file gives a lot of detail about installing and setting up 
the MM/Htdig integration supported by the patch. 

Thus far I've been able to keep my patches up to date as MM moves along. I do 
not know how many MM installations use them. Probably more than 10, but when 
I asked about this I did not get that many responses, so maybe they are not 
as useful to others as they are for our mainly company-internal mailing lists.

> Van
>
> Richard Barrett wrote:
> > There is a FAQ entry on this topic:
> >
> > http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.011.htp
> >
> > which also refers to a couple of patches I maintain to integrate htdig
> > with MM. If you decide to use these make sure you download and apply the
> > correct patch version for the MM version you are running. These patches
> > will handle indexing/search of both public archives and private archives,
> > with privacy access control being maintained for the latter:
> >
> > http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103
> >
> > http://sourceforge.net/tracker/index.php?func=detail&aid=444884&group_id=103&atid=300103




