[Mailman-Users] Indexing mail right after delivery

Mark Sapiro mark at msapiro.net
Fri Feb 26 19:15:13 CET 2010


On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
> On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <mark at msapiro.net>
> wrote:
> 
>> Cedric Jeanneret wrote:
>>> 
>>> I'm trying to create a xapian[1] indexer for our mailing list. As
>>> mailman is written in Python and there are Python bindings for
>>> xapian, I guess I can maybe create a plugin for that. My first
>>> question is: is there already such a thing? I searched on the
>>> net, but nothing turned up. My second one: can we create a plugin
>>> for mailman, and if so, where should I go for documentation? There
>>> seems to be nothing in the wiki
>>> (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
>>>
> Just to explain why I'd like to do that: we already have a xapian search
> engine here, indexing a file server, Request Tracker queues and
> MoinMoin wikis, so we'd like to aggregate all our content in one app for
> searching.
>> 
>> 
>> This will be quite doable with Mailman 3, which is still in
>> development.
>> 
>> There are problems trying to do this in Mailman 2.1.x. There is a 
>> plugin capability of sorts in the form of custom handlers that can
>> be added to the incoming message processing pipeline. See the FAQ
>> at <http://wiki.list.org/x/l4A9>. However, archiving is
>> asynchronous with incoming message processing, so it is not
>> possible for a custom handler to know the URL that will ultimately
>> retrieve the message from the archive.
>> 
>> A different approach which might be workable is to use the 
>> PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If
>> you set
>> 
>> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' 
>> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
>> 
>> in mm_cfg.py, then that script will be invoked to do the archiving.
>> The script in turn could invoke the standard pipermail archiving
>> process and then invoke xapian to index the archived message.
>> 
> 
> 
> Hello again,
> 
> Just one question: what do mlist, msg and msgdata stand for? As I read
> it, I have to create my module and define a "process(mlist, msg, msgdata)"
> function inside it, so I'd like to know what those objects are. I
> discovered that mlist stands for a Mailman.MailList.MailList('list-name'),
> but for the others, it's a bit hard to find...


Only custom handlers need to define process(mlist, msg, msgdata). That
is the entry point to the handler, and it is passed three objects:

mlist is the Mailman.MailList.MailList() instance for the current list

msg is a Mailman.Message.Message() (subclass of email.Message.Message)
    instance for the current message

msgdata is a dictionary of the message metadata accumulated so far.

The important thing is that these are passed in as arguments to the
handler's process() function.
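
To make the calling convention concrete, here is a minimal sketch of a handler module's shape. It is standalone Python 3, not Mailman code: FakeList is a hypothetical stand-in for Mailman.MailList.MailList (only its internal_name() method is mimicked), the SEEN list is a hypothetical place to record what the handler saw, and the metadata key in the demo is made up.

```python
import email

# Hypothetical in-memory record of what the handler saw; a real handler
# would more likely write to a file or queue for a separate indexer.
SEEN = []


def process(mlist, msg, msgdata):
    """Handler entry point: the list, the message, and its metadata dict."""
    SEEN.append({
        'list': mlist.internal_name(),
        'message_id': msg.get('Message-ID', ''),
        'subject': msg.get('Subject', ''),
        # msgdata is a plain dict; only its keys are recorded here.
        'metadata_keys': sorted(msgdata),
    })


class FakeList(object):
    """Hypothetical stand-in for Mailman.MailList.MailList, for testing only."""

    def __init__(self, name):
        self._name = name

    def internal_name(self):
        return self._name


if __name__ == '__main__':
    demo = email.message_from_string(
        'Message-ID: <1@example.com>\nSubject: hi\n\nbody\n')
    process(FakeList('mylist'), demo, {'example_key': 1})
    print(SEEN[0])
```

The handler never creates these objects itself; it only reads (or modifies) what the pipeline hands it.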

In your case, you are writing a script rather than a handler, and it
will be invoked as follows.

Suppose that

PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'

It will be invoked in a pipe similar to

cat raw_message | /path/to/myarch.py HOST LIST

i.e. the command string, with %(hostname)s and %(listname)s replaced by
the actual host name and list name of the list, will be invoked and the
message piped to it.
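
The placeholders use Python's %(name)s syntax, so the substitution amounts to %-formatting with a dictionary. A small sketch, assuming that behaviour and a hypothetical host lists.example.com and list mylist; shlex.split() is used only as an approximation of how the shell splits the command into the script's sys.argv:

```python
import shlex

# The command template from the example, with both placeholders
# written in the %(name)s form.
TEMPLATE = '/path/to/myarch.py %(hostname)s %(listname)s'


def expand(template, hostname, listname):
    """Substitute the placeholders using Python %-formatting with a dict."""
    return template % {'hostname': hostname, 'listname': listname}


# Hypothetical host and list names.
cmd = expand(TEMPLATE, 'lists.example.com', 'mylist')

# Approximate the shell's word splitting to see what the script's
# sys.argv would contain: [script, host name, list name].
argv = shlex.split(cmd)
print(cmd)
print(argv)
```

With this template the host name lands in sys.argv[1] and the list name in sys.argv[2].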

So, it could begin something like:

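#! /usr/bin/env python
# Point the line above at the Python interpreter Mailman runs under.
import sys
sys.path.insert(0, '/path/to/mailman/bin')
# The above line can be skipped if myarch.py is in Mailman's
# bin directory.
import paths    # Mailman's bin/paths.py; puts Mailman's modules on sys.path

import email
from Mailman import MailList
from Mailman import Message

# The raw message arrives on stdin; parse it into a Mailman Message.
msg = email.message_from_file(sys.stdin, Message.Message)
# Per the command template above, sys.argv[1] is the host name and
# sys.argv[2] is the list name. Open the list locked.
mlist = MailList.MailList(sys.argv[2], lock=True)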
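
The parsing step can be tried on its own outside Mailman. A minimal sketch in Python 3 using only the standard library: io.StringIO stands in for sys.stdin, the message is made up, and index_fields() is a hypothetical helper showing one possible choice of fields to hand to an indexer. It also shows that the parser keeps a leading Unix "From " line as the envelope and that as_string(unixfrom=True) writes it back out; in the Python 2 email package that Mailman 2.1 runs on, str(msg) is as_string(unixfrom=True), so str(msg) yields text that can be read as a one-message Unix mailbox.

```python
import email
import io

# A made-up raw message as it might arrive on the script's stdin,
# beginning with the Unix "From " envelope line of an mbox record.
RAW = (
    "From alice@example.com Fri Feb 26 19:15:13 2010\n"
    "From: Alice <alice@example.com>\n"
    "Subject: Indexing test\n"
    "Message-ID: <test-1@example.com>\n"
    "\n"
    "Hello, list.\n"
)


def index_fields(msg):
    """Hypothetical choice of fields an indexer might store."""
    return {
        'message_id': msg.get('Message-ID', ''),
        'subject': msg.get('Subject', ''),
        'body': msg.get_payload(),
    }


# io.StringIO stands in for sys.stdin.
msg = email.message_from_file(io.StringIO(RAW))
fields = index_fields(msg)

# The parser keeps the envelope line separately; as_string(unixfrom=True)
# writes it back out in front of the headers.
print(msg.get_unixfrom())
print(msg.as_string(unixfrom=True).splitlines()[0])
print(fields['subject'])
```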


At this point, you have a list object (locked) and a message object. You
might think you could just do

mlist.ArchiveMail(msg)

to archive the mail to the listname.mbox file and the pipermail archive,
but that wouldn't work, because that method would simply invoke the
external archiver (this script) again. You also don't need to worry about
the listname.mbox file, because ArchiveMail() has already written the
message there before invoking the external archiver. What you need is

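from Mailman.Archiver import HyperArch
from cStringIO import StringIO
# Wrap the message text in a file-like object so pipermail can read it
# as a one-message Unix mailbox.
f = StringIO(str(msg))
h = HyperArch.HyperArchive(mlist)
h.processUnixMailbox(f)
h.close()
f.close()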

This is what the ArchiveMail() method itself does when no external
archiver is configured. You still have the mlist and msg objects, and you
need to save and unlock the list at some point:

mlist.Save()
mlist.Unlock()

and the message is now in the pipermail archive and can be indexed.
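
Because the list is opened locked, it is worth making sure the lock is released even if archiving fails. A minimal sketch of that ordering, assuming hypothetical archive and index callables standing in for the HyperArchive calls and the xapian indexing code, and a hypothetical FakeList that records calls to the Save() and Unlock() methods used above:

```python
class FakeList(object):
    """Hypothetical stand-in for a locked MailList; records which calls were made."""

    def __init__(self):
        self.calls = []

    def Save(self):
        self.calls.append('Save')

    def Unlock(self):
        self.calls.append('Unlock')


def archive_then_index(mlist, msg, archive, index):
    """Archive msg, then index it, always releasing the list lock.

    archive and index are hypothetical callables standing in for the
    pipermail archiving steps and the xapian indexing code.
    """
    try:
        archive(mlist, msg)
        mlist.Save()
    finally:
        # Runs whether or not archive() raised.
        mlist.Unlock()
    # Index only after the lock is released; the indexer does not
    # need the list lock.
    index(msg)
```

If archive() raises, Save() is skipped but Unlock() still runs and the exception propagates, so the message is not indexed. Indexing after the unlock is a design choice in this sketch, not something Mailman requires.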

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


