[Mailman-Users] Mailman & Htdig integration (with external archiver)
Richard Barrett
R.Barrett at ftel.co.uk
Tue Jan 15 17:09:09 CET 2002
At 14:33 15/01/2002 +0100, Sasa Janiska wrote:
>On Today, -0000, Richard Barrett wrote:
>
>Hi Richard!
>Thank you very much for your reply.
>
> > This is a straight htdig configuration issue. At the minimum you will have
> > to add start_url directives to htdig's conf file for each of the list
> > archives or ensure that links from one of the start_url directives in
> > htdig's conf file eventually lead to each of the list archives. You will
> > also have to have some sort of cron job to rebuild htdig's search indices
> > regularly (probably daily) to include new archived material.
>
>That's easy.
>
> > The following patches can be applied to Mailman 2.0.8 (and earlier
> > versions of 2.0.x) to integrate htdig with Mailman and provide search of
> > archives generated by the internal (pipermail) archiver.
>
>Do you have something ready for V2.1?
I have already posted on sourceforge versions of the patch for MM 2.1a3 and
MM 2.1cvs. The latter is for the MM CVS as of the date and time noted in the
posting, but it may need updating depending on what has changed in CVS
since then. It is my intention to publish a version of the patch for
the beta and final versions of MM 2.1 as soon as I can after they are
available. Just check sourceforge for Mailman patches 444879 and 444884 and
read the notes I post with each patch file.
> > The patches are not of direct relevance if you have opted to use an
> > external archiver.
>
>If pipermail can do the job, it isn't necessary. I am thinking about an
>external archiver seeing that pipermail is no longer maintained ..
In the context of Mailman I think it can be said that pipermail is still
being maintained. MM contains its own copy of pipermail code in python and
if you search the developer archives you will see there is ongoing work and
discussion about its future. The archiver will certainly be enhanced for and
maintained through MM 2.1, albeit the enhancements may not be that major. Do
you do python? Maybe you could make a contribution!
> > The benefit of the integration of htdig with Mailman archives generated by
> > pipermail is that it provides per list search facilities with a search form
> > on each list's archive TOC page and uses Mailman's security mechanism for
> > limiting access to private mail archives via search responses; in fact you
> > can only access a private list archive's search form if you are authorised
> > to access the list. The patches also automatically build htdig config
> > files for each archived list and provide cron scripts for maintaining
> > htdig's search indices.
>
>That's very important to limit access for private list archives.
>Actually, only students should have access to the mailing lists, and
>only for those courses they are enrolled in.
If you go with the external archiver I guess you will have to apply
authentication and access control via the web server used to access the
archives produced. You may want to consider how to automate keeping
the access control data for each private list's archives, in whatever format
the web server uses, in step with the subscription information held by MM.
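
As a minimal sketch of that kind of automation, assuming the web server
reads an Apache-style group file (one "group: member member ..." line per
group) and that you have already extracted each private list's subscriber
addresses from MM by some means, something like the following Python could
regenerate the file. The function name, list names and addresses are
hypothetical and only for illustration.

```python
def render_group_file(subscribers):
    """Build group-file text from a {list_name: addresses} mapping.

    Assumes the Apache AuthGroupFile layout: 'group: member member ...'.
    Addresses are lower-cased and de-duplicated so the output is stable.
    """
    lines = []
    for list_name in sorted(subscribers):
        members = sorted({addr.lower() for addr in subscribers[list_name]})
        lines.append("%s: %s" % (list_name, " ".join(members)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical data; in practice this would come from the list manager.
    example = {
        "physics101": {"Bob@example.edu", "alice@example.edu"},
        "chem200": {"carol@example.edu"},
    }
    print(render_group_file(example), end="")
```

If this were run from cron, it would probably be wise to write the output
to a temporary file and rename it into place, so the web server never reads
a half-written file.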
As an aside, the htdig/MM integration I produced uses per list search forms
embedded in the list archive TOC page in association with per list htdig
config files and per list search indexes. The primary reason is that this
authenticates the user before the search is done and prevents
unauthorised users from seeing links and synopsis information which
they are not entitled to access.
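
To illustrate the per list idea (this is not the code from my patches, just
a sketch under stated assumptions), each list gets its own htdig config
whose crawl is confined to that list's archive and whose index lives in its
own directory. The sketch below uses the htdig attributes database_dir,
start_url and limit_urls_to; the function name, URL layout and directory
paths are hypothetical.

```python
import posixpath


def render_htdig_conf(list_name, archive_base_url, db_root):
    """Return a minimal per-list htdig config as text.

    archive_base_url and db_root are assumed, illustrative values; a real
    config would carry many more attributes than the three shown here.
    """
    archive_url = "%s/%s/" % (archive_base_url.rstrip("/"), list_name)
    return "\n".join([
        # Separate index directory per list, so indexes never mix.
        "database_dir: " + posixpath.join(db_root, list_name),
        # Start crawling at this list's archive and never leave it.
        "start_url: " + archive_url,
        "limit_urls_to: " + archive_url,
    ]) + "\n"


if __name__ == "__main__":
    print(render_htdig_conf("physics101",
                            "http://lists.example.edu/pipermail",
                            "/var/lib/htdig"), end="")
```

Because each index only ever contains one list's material, the access check
can be made once, on the search form, rather than on every result.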
The approach I adopted helps overcome a problem with having search indexes
that contain information about both private and public data. If you have
this you have to do one of the following:
1. if you are serious about security, use your own search script to run the
search engine's search and then filter the results returned by it to remove
links and their associated synopsis information which the user is not
authorised to see. The problem with this is that if you have a large search
space then checking all the returned results is going to be demanding of
system performance.
2. if you don't mind people reading snippets of data they are not
authorised to see in the synopsis returned with each link,
you let the user see all the results returned. Having aroused their
interest you then annoy them by refusing to let them follow one of the links
that the search just returned to them.
My approach sidesteps both these issues reasonably neatly but I'm sure
there are a dozen other ways of achieving the same objectives using any
combination of list manager/archiver/search engine.
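
For what it is worth, option 1 above might look roughly like the following
Python sketch. It assumes the search results arrive as a list of dicts with
"url" and "excerpt" keys and that the list name is the first path component
after /pipermail/; both of those, and the function names, are assumptions
for illustration rather than anything htdig or MM provides.

```python
from urllib.parse import urlparse


def list_name_from_url(url, prefix="/pipermail/"):
    """Return the list name embedded in an archive URL, or None."""
    path = urlparse(url).path
    if not path.startswith(prefix):
        return None
    name = path[len(prefix):].split("/", 1)[0]
    return name or None


def filter_results(results, public_lists, user_lists):
    """Keep only results from public lists or lists the user may read.

    Results whose list cannot be determined are dropped (fail closed), so
    neither the link nor its excerpt reaches an unauthorised user.
    """
    allowed = set(public_lists) | set(user_lists)
    return [r for r in results if list_name_from_url(r["url"]) in allowed]
```

Every result has to be checked, which is where the performance cost
mentioned in option 1 comes from when the result set is large.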
>I'll definitely try with your suggestion.
>
>Since Pipermail is no longer developed, have you thought about a patch
>for external archivers like Mhonarc or Hypermail?
I'm looking at producing a more generalised patch to simplify producing
closer integrations of other search engines with Mailman archives. I guess
it might be worth expanding my thinking to generalise to mail archives
produced by other archivers and searching them with different search engines.
>Sincerely,
>Sasa