[Mailman-Users] Search Archive Feature?

Richard Barrett R.Barrett at ftel.co.uk
Fri Oct 26 14:07:24 CEST 2001


Apologies to other subscribers for these lengthy exchanges over what I 
assume is a peripheral topic for most Mailman users. Mark, I suggest we 
take this off list unless you object.

Mark

At 15:54 25/10/2001 -0400, Mark T. Valites wrote:
>Richard,
>
>In patch #44484,
>
>to sym-link to "../private/conf_name" instead of mlist.archive_dir joined 
>with 'archives/htdig', should
>HTDIG_CONF_LINK_DIR = '../archives/htdig'?

Unfortunately I do not think we can adopt this suggestion as it stands. If 
you check the patch in more detail you will see that it adds the following 
fragment of code inside a function called setup_htdig at line 258 in the 
file Mailman/Archiver/HyperArch.py:

         # we need a symlink so that htdig will be able to find the config file
         conf_file_link = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name)
         # if this link is left over from a previous list with the same name
         # (unlikely I know) we'll whack it first and recreate
         if os.path.exists(conf_file_link) and os.path.islink(conf_file_link):
             os.unlink(conf_file_link)
         os.symlink(conf_file, conf_file_link)

As there are no guarantees as to the pwd when the script executes, the 
python variable HTDIG_CONF_LINK_DIR's value has to be an absolute path in 
the filesystem for this code to be reliable. When I next update the patch 
I'll be explicit in saying the value assigned to HTDIG_CONF_LINK_DIR should 
be a full filesystem path as seen from the machine running the Mailman 
code. You might notice if you looked in your Mailman build directory after 
applying the patch that Mailman/Defaults.py says:

     HTDIG_CONF_LINK_DIR = os.path.join(PREFIX, 'archives/htdig')

When you run make install, PREFIX is translated to the full path to the 
Mailman installation directory. In other words, this python variable is 
automatically set to the correct value by the installation process, which is 
why setting some other value for it is not discussed in the additions to the 
INSTALL document made by the patch. If you are going to override what is in 
Defaults.py in mm_cfg.py then you must put a full path as the value, not a 
relative path as you have apparently done. I'll look at adding an extra note 
to that effect to the patch for the INSTALL file.
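
To illustrate the point about absolute paths, here is a minimal sketch. It 
is not part of the patch: the helper name make_conf_link and the temporary 
directories are made up for the example, and only the os and os.path calls 
are real. It refuses a relative link directory and otherwise creates the 
link the same way as the fragment above. It tests os.path.islink() on its 
own because os.path.exists() follows symlinks and so reports false for a 
link whose target has gone.

```python
import os
import tempfile


def make_conf_link(link_dir, conf_file, conf_name):
    """Create link_dir/conf_name pointing at conf_file (hypothetical helper)."""
    # A relative link_dir would be resolved against whatever the current
    # working directory happens to be when the script runs.
    if not os.path.isabs(link_dir):
        raise ValueError("link dir must be an absolute path: %r" % link_dir)
    conf_file_link = os.path.join(link_dir, conf_name)
    # islink() is true even for a dangling link, so a stale link left
    # behind by a deleted list is removed as well.
    if os.path.islink(conf_file_link):
        os.unlink(conf_file_link)
    os.symlink(conf_file, conf_file_link)
    return conf_file_link


# Stand-ins for $prefix/archives/htdig and a list's htdig conf file.
base = tempfile.mkdtemp()
link_dir = os.path.join(base, "archives", "htdig")
os.makedirs(link_dir)
conf_file = os.path.join(base, "listname.conf")
with open(conf_file, "w"):
    pass

link = make_conf_link(link_dir, conf_file, "listname.conf")
print(os.path.islink(link), os.readlink(link) == conf_file)

try:
    make_conf_link("../archives/htdig", conf_file, "listname.conf")
except ValueError as exc:
    print("rejected:", exc)
```

In mm_cfg.py the equivalent override would be something like the line 
below, assuming Mailman is installed under /home/mailman:

     HTDIG_CONF_LINK_DIR = '/home/mailman/archives/htdig'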

>That would take care of my one problem with two different machines both 
>with mailman installed.
>
>Also, I noticed you released a new patch the day before yesterday.   It 
>appears to create a new, separate dir for what it does.  What exactly does 
>it add/change/do?

I'm unaware of the new, separate dir you are referring to. When I diff the 
original patch htdig-2.0.6.patch and the revised htdig-2.0.6-0.2.patch and 
omit the differences resulting from top level directory names and 
modification dates of files, I find only two changes:

1. in Mailman/Archiver/HyperArch.py: this is an entirely cosmetic patch:

210c210
< +                    lastrun = time.asctime(last_rundig)
---
 > +                    lastrun = time.strftime("%A, %d %b %Y %H:%M:%S %Z", last_rundig)

2. in cron/nightly_htdig: this corrects a bug but one which can only 
manifest in particular circumstances when a list's archive is first being 
indexed.

645a646
 > +        archive = HyperArch.HyperArchive(mlist)
648d648
< +            archive = HyperArch.HyperArchive(mlist)

Come back to me if you think I'm in error here.
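
As a quick illustration of change 1 (a sketch, not taken from the patch), 
the snippet below formats one fixed time tuple both ways. The epoch value 
is only a stand-in for last_rundig, which I am assuming is a time tuple of 
the kind time.asctime() accepts.

```python
import time

# A fixed time tuple so the output is reproducible; in the patch this
# would be the time of the last rundig run.
last_rundig = time.gmtime(0)

# Old form: fixed-layout string with abbreviated day and month names.
old_style = time.asctime(last_rundig)

# New form: full weekday name, day before month, and a time zone name.
new_style = time.strftime("%A, %d %b %Y %H:%M:%S %Z", last_rundig)

print(old_style)  # Thu Jan  1 00:00:00 1970
print(new_style)
```

The second line starts "Thursday, 01 Jan 1970 00:00:00"; what %Z adds 
after that depends on the platform.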

>Mark T. Valites wrote:
>
>>Richard, I got a good one for you...
>>
>>Below in this email, you mentioned that you could get the patches to work 
>>if they ran on a different machine than the one that the actual archives

That's reading rather more into what my patch added to the INSTALL document 
than I really intended. After patching it said:

           You must have htdig installed and it must be able to access
           $prefix/archives directly i.e. either htdig must be running
           on the same machine as Mailman or it must be able to reach
           the archives via NFS.

You, not entirely unreasonably, have read that to imply that the patch was 
intended to support htdig running on a machine other than the machine 
delivering the web interface for Mailman.

While I had hoped at one point during development of the integration to 
achieve this, I found there were problems associated with retaining privacy 
when providing search access to private list archives. The problem is in 
fact the same for providing access to private list archives when the htdig 
integration is not patched in. While solving that I stopped considering the 
type of solution you want to adopt - described below.

Anyway, the upshot is that I didn't make full provision for having htdig 
run on a different machine, either for building indexes or for doing index 
searches. I think it's going to take more than adjusting some of the Mailman 
configuration variables to produce a full solution.

It looks from our weather forecast that it's going to be a wet weekend so 
I'll see if I can come up with a patch to provide a complete solution over 
the weekend. I'll email you when I have posted a patch for this.

Richard


>>lived on.  Unfortunately, I think I may have to go there... Htdig does 
>>not compile under gcc-3.0.X, and it is difficult to get going under 
>>solaris 8 from what I have read and been replied to on the htdig mailing 
>>list. So, I have my ultra 5 desktop set up so that it exports the whole 
>>mailman dir to a linux box at the far end of the room.  The linux box has 
>>a successful build of htdig, and another mailman test site (with 
>>searchable archives working!).  I mounted the mailman export from the 
>>ultra 5 into /mnt/ultra on the linux box.  From there, I added the stuff 
>>to mm_cfg.py on the sun (/mnt/ultra/Mailman/mm_cfg.py on the linux box):
>>
>>HTDIG_MAILMAN_DIR = 'htdig-ultra'
>>HTRUNDIG_PATH = '/usr/local/htdig/bin/rundig' # (on the linux box)
>>USE_HTDIG = 1,
>>after previously creating a symlink on the linux box from 
>>/mnt/ultra/archives/htdig to /usr/local/htdig/conf/htdig-ultra (on the 
>>linux box)
>>
>>The main archives webpage on the sun picks up the patches, and displays 
>>the search box, like it should.  However, after I enter a query, I get an 
>>error saying:
>>
>>requested URL /cgi-bin/htsearch was not found on the server.
>>I figure this comes from the fact htdig never was installed on the sun, 
>>and that this was getting picked up from the linux box install of htdig.
>>
>>In addition, the link that gets created from the nightly_htdig cron job 
>>(I think it comes from there, I could very well be wrong) points to
>>/home/mailman/archives/private/listname/htdig/listname.conf.  Problem is, 
>>on the linux box, that directory is not correct, and it points to the 
>>local mailman install. To correct this, I think that link should instead 
>>point to ../private/listname/htdig/listname.conf
>>
>>Granted, that will only take care of one of my problems.  (How)/can I get 
>>the query to perform correctly on the sun machine with no htdig 
>>installed? or did I read too much into what you said?
>>
>>
>>
>>Richard Barrett wrote:
>>
>>>Mark
>>>
>>>At 14:56 19/10/2001 -0400, Mark T. Valites wrote:
>>>
>>>>I've been looking at implementing these two patches into a new mailman 
>>>>install I am doing here.  The documentation seems to be missing though. 
>>>>I'm able to patch the mailman source, but I'm a bit lost as to what to 
>>>>do after that.  While this isn't a mailing list for these 
>>>>patches, I was wondering if anyone has had any experience installing 
>>>>these patches, ideas of where I can find more info on them, and maybe if 
>>>>anyone knows if they can be added to an existing mailman install after 
>>>>it has already been implemented and used.
>>>
>>>
>>>
>>>Read the INSTALL file in $build/INSTALL which is patched by the patches. 
>>>It should now contain the following text:
>>>
>>>         - Setting up Mailman list archive search using
>>>           Mailman-htdig (http://www.htdig.org) integration
>>>
>>>           You must have htdig installed and it must be able to access
>>>           $prefix/archives directly i.e. either htdig must be running
>>>           on the same machine as Mailman or it must be able to reach
>>>           the archives via NFS.
>>>
>>>           Next, establish where htdig expects to find its configuration
>>>           files. For instance with my copy of Redhat 6.2 Secure Web Server rpms
>>>           this is /etc. With Suse Linux 6.4 rpms this is /opt/www/htdig/conf.
>>>           Create a symbolic link in this directory to $prefix/archives/htdig.
>>>
>>>           For instance:
>>>
>>>               ln -s /home/mailman/archives/htdig /etc/htdig-mailman
>>>
>>>           You need to set up the following Mailman configuration parameters
>>>           in $prefix/Mailman/mm_cfg.py:
>>>
>>>           HTDIG_MAILMAN_LINK - string variable. Set this to the name of the
>>>           symlink you just created. For instance:
>>>
>>>               HTDIG_MAILMAN_LINK = 'htdig-mailman'
>>>
>>>           HTDIG_RUNDIG_PATH - string variable. The path to the script which
>>>           runs to build htdig search indices. For example:
>>>
>>>               HTDIG_RUNDIG_PATH = '/usr/bin/rundig'
>>>
>>>           USE_HTDIG = 1 - python truth value activates Mailman use of htdig
>>>
>>>       Lists with archives will be searchable via the list's TOC page from
>>>       now on.
>>>
>>>       Notes:
>>>
>>>       1. By default, rundig is run for each archived list every day (if
>>>          new posts to the list have been archived since rundig was last
>>>          run) by cron executing the script $prefix/cron/nightly_htdig.
>>>          Change Mailman's crontab entries if you want to change this
>>>          interval.
>>>
>>>       2. Until a list has some archived postings and nightly_htdig has
>>>          been run once, then searches of a list's archive will fail with
>>>          a complaint about missing db files. To overcome this, post a
>>>          message to the list and run nightly_htdig by hand from the
>>>          command line under the mailman UID.
>>>
>>>If you need any further info, please contact me. Best of luck
>>>
>>>Richard
>>>
>>>
>>>>Richard Barrett wrote:
>>>>
>>>>>The following patches integrate the htdig (http://www.htdig.org/) 
>>>>>search engine with Mailman.
>>>>>
>>>>>http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103
>>>>>
>>>>>http://sourceforge.net/tracker/index.php?func=detail&aid=444884&group_id=103&atid=300103
>>>>>
>>>>>
>>>>>At 09:52 03/10/2001 -0400, The Berean wrote:
>>>>>
>>>>>>Thanks for the answers to my previous question, they were very much
>>>>>>appreciated.  I just have one more:
>>>>>>
>>>>>>Does Mailman provide a feature for searching archives?  I know it can list
>>>>>>and store archives, but I haven't seen anything about searching them.  If
>>>>>>not, would a CGI search script do the job?  If it can, what would be the
>>>>>>best CGI script to use?  I have an Entropy search script preinstalled on my
>>>>>>server, but it's designed to search my entire website, not strictly the
>>>>>>archives.  I noticed Python.org has a feature for searching the archives by
>>>>>>Inktomi, but that looks like it's going to cost a couple of amputations.
>>>>>>Thanks for any help!
>>>>>>
>>>>>>Frank Pagano (The Berean)
>>>>>>Owner of C-SQUAD (http://www.c-squad.org)
>>>>>>---------------------------------------------------------------------------
>>>>>>E-mail: Berean at C-Squad.org
>>>>>>Personal Fax Number: (630) 214-9076
>>>>>>ICQ#: 98723297
>>>>>>AOL Screen Name: Berean333
>>>>>>Yahoo! Messenger ID: tseh_dek
>>>>>>******************************************
>>>>>>"Those who would give up essential liberty to purchase a
>>>>>>little temporary safety deserve neither liberty nor safety."
>>>>>>--Benjamin Franklin, 1759
>>>>>>
>>>>>>
>>>>>>------------------------------------------------------
>>>>>>Mailman-Users maillist  -  Mailman-Users at python.org
>>>>>>http://mail.python.org/mailman/listinfo/mailman-users
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Mark T. Valites
>>>>Unix Systems Analyst
>>>>124b South Hall
>>>>SUNY Geneseo
>>>>Geneseo, NY 14454
>>>>(716) 245-5577
>>>>
>>
>
>
>
>




