[Mailman-Users] Indexing mail right after delivery

Cédric Jeanneret cedric.jeanneret at camptocamp.com
Sat Feb 27 18:03:17 CET 2010


On Fri, Feb 26, 2010 at 7:15 PM, Mark Sapiro <mark at msapiro.net> wrote:
> On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
>> On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <mark at msapiro.net>
>> wrote:
>>
>>> Cedric Jeanneret wrote:
>>>>
>>>> I'm trying to create a Xapian[1] indexer for our mailing list. As
>>>> Mailman is written in Python and there are Python bindings for
>>>> Xapian, I think I could write a plugin for that. My first
>>>> question: does such a thing already exist? I searched the
>>>> net but found nothing. My second: can we write plugins
>>>> for Mailman and, if so, where can I find documentation? There
>>>> seems to be nothing in the wiki
>>>> (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
>>>>
>>>>
>>>>
>>>> Just to explain why I'd like to do that: we already have a xapian search
>>>> engine in here, indexing a fileserver, request tracker queues and
>>>> moinmoin wikis... so we'd like to aggregate all our stuff in one app for
>>>> searching.
>>>
>>>
>>> This will be quite doable with Mailman 3 which is still in
>>> development.
>>>
>>> There are problems trying to do this in Mailman 2.1.x. There is a
>>> plugin capability of sorts in the form of custom handlers that can
>>> be added to the incoming message processing pipeline. See the FAQ
>>> at <http://wiki.list.org/x/l4A9>. However, archiving is
>>> asynchronous with incoming message processing, so it is not
>>> possible for a custom handler to know the URL that will ultimately
>>> retrieve the message from the archive.
>>>
>>> A different approach which might be workable is to use the
>>> PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If
>>> you set
>>>
>>> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py'
>>> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
>>>
>>> in mm_cfg.py, then that script will be invoked to do the archiving.
>>> The script in turn could invoke the standard pipermail archiving
>>> process and then invoke xapian to index the archived message.
>>>
>>
>>
>> Hello again,
>>
>> Just one question: what do mlist, msg, and msgdata stand for? As I
>> understand it, I have to create my module and define a
>> "process(mlist, msg, msgdata)" function inside it, so I'd like to
>> know what those objects are. I found that mlist is a
>> Mailman.MailList.MailList('list-name') instance, but the others
>> are harder to track down...
>
>
> Only custom handlers need to define process(mlist, msg, msgdata). That
> is the entry point to the handler, and three objects are passed to it:
>
> mlist is the Mailman.MailList.MailList() instance for the current list
>
> msg is a Mailman.Message.Message() (subclass of email.Message.Message)
>    instance for the current message
>
> msgdata is a dictionary of the message metadata accumulated so far.
>
> The important thing is these are passed in as arguments to the handler
> process() function.
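
For illustration, a minimal sketch of what such a handler module might look like. The 'logpost_summary' key is hypothetical, and the sketch assumes only that mlist offers internal_name(), that msg supports the usual email header lookup, and that msgdata behaves as a plain dictionary:

```python
# Sketch of a Mailman 2.1 custom handler module (illustrative only).
# The 'logpost_summary' key is a hypothetical name, not part of Mailman.

def process(mlist, msg, msgdata):
    """Entry point the pipeline calls once per incoming message."""
    # mlist: the list object; internal_name() gives the list's name.
    listname = mlist.internal_name()
    # msg: an email message object, so headers can be looked up by name.
    msgid = msg.get('message-id', '<no message-id>')
    # msgdata: a dictionary of metadata that handlers may read and extend.
    msgdata['logpost_summary'] = '%s: %s' % (listname, msgid)
```

A handler like this only inspects or annotates the message; anything it stores in msgdata is visible to handlers that run later in the pipeline.
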
>
> In your case, you are defining a module which is going to be invoked
> like the following.
>
> Suppose that
>
> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'
>
> It will be invoked in a pipe similar to
>
> cat raw_message | /path/to/myarch.py HOST LIST
>
> i.e. the command string with %(hostname)s and %(listname)s replaced by
> the actual host name and list name of the list will be invoked and the
> message piped to it.
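
The substitution and pipe described above can be sketched with plain Python string formatting and subprocess. This approximates the behaviour and is not Mailman's own code; the function names expand_archiver_command and pipe_message are hypothetical:

```python
import subprocess

def expand_archiver_command(template, hostname, listname):
    """Fill the %(hostname)s and %(listname)s placeholders in the template."""
    return template % {'hostname': hostname, 'listname': listname}

def pipe_message(command, raw_message):
    """Run the command through the shell and feed the message to its stdin.

    raw_message must be bytes on Python 3. Returns the exit status.
    """
    proc = subprocess.Popen(command, shell=True, stdin=subprocess.PIPE)
    proc.communicate(raw_message)
    return proc.returncode

if __name__ == '__main__':
    template = '/path/to/myarch.py %(hostname)s %(listname)s'
    print(expand_archiver_command(template, 'lists.example.com', 'mylist'))
```

Inside the script, the host name then arrives as sys.argv[1] and the list name as sys.argv[2].
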
>
> So, it could begin something like:
>
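> #!/usr/bin/env python
> import sys
> sys.path.insert(0, 'path/to/mailman/bin')
> # The above line can be skipped if myarch.py is in Mailman's
> # bin directory.
> import paths  # adjusts sys.path so the Mailman package can be imported
>
> import email
> from Mailman import MailList
> from Mailman import Message
>
> # The raw message arrives on stdin; parse it as a Mailman Message.
> msg = email.message_from_file(sys.stdin, Message.Message)
> # argv[1] is the host name, argv[2] the list name (see command above).
> mlist = MailList.MailList(sys.argv[2], lock=True)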
>
>
> At this point, you have a list object (locked) and a message object. You
> might think you could just do
>
> mlist.ArchiveMail(msg)
>
> to archive the mail to the listname.mbox file and the pipermail archive,
> but that wouldn't quite work because that method would re-invoke the
> external archiver. Also, you don't need to worry about the listname.mbox
> file because the ArchiveMail() method already did that before invoking
> the external archiver, so what you would need is
>
> from Mailman.Archiver import HyperArch
> from cStringIO import StringIO
> f = StringIO(str(msg))
> h = HyperArch.HyperArchive(mlist)
> h.processUnixMailbox(f)
> h.close()
> f.close()
>
> This is what the ArchiveMail() method would do. You still have the
> mlist and msg objects, and you need to save and unlock the list at some
> point:
>
> mlist.Save()
> mlist.Unlock()
>
> and the message is now in the pipermail archive and can be indexed.
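
As a rough sketch of the indexing step, the script could pull out the text to index using only the standard email package. Handing these fields to Xapian is left out here; the function name and the dictionary keys are hypothetical choices for illustration:

```python
import email

def extract_index_fields(raw_text):
    """Return the headers and plain-text body an indexer would want."""
    msg = email.message_from_string(raw_text)
    body_parts = []
    # walk() visits the message itself and every MIME sub-part.
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        payload = part.get_payload(decode=True)  # undo base64/quoted-printable
        if payload is None:
            continue
        # Assumes the declared charset is one Python knows; falls back to ASCII.
        charset = part.get_content_charset() or 'ascii'
        body_parts.append(payload.decode(charset, 'replace'))
    return {
        'message_id': msg.get('message-id', ''),
        'subject': msg.get('subject', ''),
        'body': u'\n'.join(body_parts),
    }
```

The returned dictionary could then be passed to whatever indexer is in use, keyed by message ID so that a re-delivered message replaces its earlier entry.
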
>
> --
> Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
>
>

Wow, thanks a lot. With all this I'll be able to do what I want!

I'll post everything as soon as it's done, hopefully next week :).

Thanks again.

Best regards,

C.

