[Mailman-Users] Indexing mail right after delivery

Cedric Jeanneret cedric.jeanneret at camptocamp.com
Tue Mar 2 12:41:35 CET 2010


On Fri, 26 Feb 2010 10:15:13 -0800
Mark Sapiro <mark at msapiro.net> wrote:

> On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
> > On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <mark at msapiro.net>
> > wrote:
> > 
> >> Cedric Jeanneret wrote:
> >>> 
> >>> I'm trying to create a xapian[1] indexer for our mailing list. As
> >>> mailman is written in Python and there are python bindings for
> >>> xapian, I guess I can maybe create a plugin for that. My first
> >>> question is: is there already such a thing? I searched on the
> >>> net, but nothing turned up. My second one: can we create a plugin
> >>> for mailman, and if so, where can I find some documentation? There
> >>> seems to be nothing in the wiki
> >>> (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
> >>>
> >>>
> >>> 
> >>> Just to explain why I'd like to do that: we already have a xapian
> >>> search engine in here, indexing a fileserver, request tracker queues
> >>> and moinmoin wikis... so we'd like to aggregate all our stuff in one
> >>> app for searching.
> >> 
> >> 
> >> This will be quite doable with Mailman 3 which is still in
> >> development.
> >> 
> >> There are problems trying to do this in Mailman 2.1.x. There is a 
> >> plugin capability of sorts in the form of custom handlers that can
> >> be added to the incoming message processing pipeline. See the FAQ
> >> at <http://wiki.list.org/x/l4A9>. However, archiving is
> >> asynchronous with incoming message processing, so it is not
> >> possible for a custom handler to know the URL that will ultimately
> >> retrieve the message from the archive.
> >> 
> >> A different approach which might be workable is to use the 
> >> PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If
> >> you set
> >> 
> >> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' 
> >> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
> >> 
> >> in mm_cfg.py, then that script will be invoked to do the archiving.
> >> The script in turn could invoke the standard pipermail archiving
> >> process and then invoke xapian to index the archived message.
> >> 
> > 
> > 
> > Hello again,
> > 
> > Just one question: what do mlist, msg, msgdata stand for? As I read
> > it, I have to create my module and define a "process(mlist, msg,
> > msgdata)" function inside it, so I'd like to know what those objects
> > are. I discovered that mlist stands for a
> > Mailman.MailList.MailList('list-name'), but for the others, it's a
> > bit hard to find...
> 
> 
> Only custom handlers need to define process(mlist, msg, msgdata). That
> is the entry point to the handler, and three objects are passed to it:
> 
> mlist is the Mailman.MailList.MailList() instance for the current list
> 
> msg is a Mailman.Message.Message() (subclass of email.Message.Message)
>     instance for the current message
> 
> msgdata is a dictionary of the message metadata accumulated so far.
> 
> The important thing is these are passed in as arguments to the handler
> process() function.
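
As a rough illustration of that entry point, here is a minimal sketch of a handler module. It only runs outside Mailman because it uses stand-ins: email.message.Message in place of Mailman.Message.Message, a hypothetical StandInList class in place of the real MailList, and made-up msgdata keys ('seen_subject', 'seen_list'). None of these names come from Mailman itself.

```python
from email.message import Message


class StandInList:
    """Hypothetical stand-in for Mailman.MailList.MailList (not the real class)."""

    def __init__(self, name):
        self.name = name


def process(mlist, msg, msgdata):
    """Handler entry point: receives the list, the message and its metadata."""
    # msgdata is a plain dictionary, so a handler can read or add keys.
    # 'seen_subject' and 'seen_list' are made-up keys for this sketch.
    msgdata['seen_subject'] = msg.get('Subject', '')
    msgdata['seen_list'] = mlist.name


# Simulate what the pipeline would do: build the three objects and call process().
msg = Message()
msg['Subject'] = 'test'
msgdata = {}
process(StandInList('toto'), msg, msgdata)
```

In a real handler the three arguments are supplied by Mailman, so only the process() function itself would be needed.
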
> 
> In your case, you are defining a module which is going to be invoked
> like the following.
> 
> Suppose that
> 
> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'
> 
> It will be invoked in a pipe similar to
> 
> cat raw_message | /path/to/myarch.py HOST LIST
> 
> i.e. the command string, with %(hostname)s and %(listname)s replaced by
> the actual host name and list name of the list, will be invoked and the
> message piped to it.
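
The substitution and the pipe described here can be sketched roughly as follows. This is only an illustration under assumptions, not Mailman's own code: the template holds just the two placeholders, a small inline Python snippet (CHILD_CODE) plays the role of myarch.py, and the host and list names are invented.

```python
import subprocess
import sys
from email.message import EmailMessage

# Hypothetical argument template; the real setting would start with the
# path to the archiver script.
ARG_TEMPLATE = '%(hostname)s %(listname)s'

# Stand-in for myarch.py: reads the message from stdin and echoes what it got.
CHILD_CODE = (
    "import sys, email\n"
    "msg = email.message_from_file(sys.stdin)\n"
    "print(sys.argv[1], sys.argv[2], msg['Subject'])\n"
)


def build_args(template, hostname, listname):
    """Replace the %(hostname)s and %(listname)s placeholders."""
    return (template % {'hostname': hostname, 'listname': listname}).split()


def pipe_to_archiver(raw_message, args):
    """Run the stand-in script with args and feed it the message on stdin."""
    result = subprocess.run(
        [sys.executable, '-c', CHILD_CODE] + args,
        input=raw_message, capture_output=True, text=True, check=True)
    return result.stdout.strip()


msg = EmailMessage()
msg['Subject'] = 'hello'
msg.set_content('body')
args = build_args(ARG_TEMPLATE, 'lists.example.org', 'toto')
output = pipe_to_archiver(msg.as_string(), args)
```

Note that with this argument order the host name arrives as sys.argv[1] and the list name as sys.argv[2].
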
> 
> So, it could begin something like:
> 
> #!/usr/bin/env python
> import sys
> sys.path.insert(0, '/path/to/mailman/bin')
> # The above line can be skipped if myarch.py is in Mailman's
> # bin directory.
> import paths
> 
> import email
> from Mailman import MailList
> from Mailman import Message
> 
> msg = email.message_from_file(sys.stdin, Message.Message)
> # sys.argv[1] is the host name; sys.argv[2] is the list name.
> mlist = MailList.MailList(sys.argv[2], lock=True)
> 
> 
> At this point, you have a list object (locked) and a message object. You
> might think you could just do
> 
> mlist.ArchiveMail(msg)
> 
> to archive the mail to the listname.mbox file and the pipermail archive,
> but that wouldn't quite work because that method would re-invoke the
> external archiver. Also, you don't need to worry about the listname.mbox
> file because the ArchiveMail() method already did that before invoking
> the external archiver, so what you would need is
> 
> from Mailman.Archiver import HyperArch
> from cStringIO import StringIO
> f = StringIO(str(msg))
> h = HyperArch.HyperArchive(mlist)
> h.processUnixMailbox(f)
> h.close()
> f.close()
> 
> Which is what the ArchiveMail() method would do. Now you still have the
> mlist and msg objects, and you need to save and unlock the list at some
> point
> 
> mlist.Save()
> mlist.Unlock()
> 
> and the message is now in the pipermail archive and can be indexed.
> 
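
The "save and unlock at some point" step is easy to skip if the archiving code raises before reaching it. One possible precaution is a try/finally around the locked section. The sketch below uses a stand-in class rather than a real MailList, so StandInList and archive_with_lock are hypothetical names; only the Lock(), Save() and Unlock() method names mirror the ones used above.

```python
class StandInList:
    """Hypothetical stand-in for a MailList; it only records what was called."""

    def __init__(self, name):
        self.name = name
        self.locked = False
        self.saved = False

    def Lock(self):
        self.locked = True

    def Save(self):
        self.saved = True

    def Unlock(self):
        self.locked = False


def archive_with_lock(mlist, archive_step):
    """Run archive_step(mlist) under the list lock and always release it."""
    mlist.Lock()
    try:
        archive_step(mlist)
        # Save only when the archiving step finished without an exception.
        mlist.Save()
    finally:
        # Runs on success and on failure, so the lock is never left behind.
        mlist.Unlock()
```

With a real list object, archive_step would hold the HyperArchive calls, and nothing that needs the lock should run after archive_with_lock() returns.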

Hello again,

I'm having some trouble with my code. Following what Mark said, I've written this:

#!/usr/bin/env python
import sys
sys.path.insert(0,'/usr/lib/mailman')

import syslog

syslog.syslog('begin script')

import email
from Mailman import MailList
from Mailman import Message
## archive part
from Mailman.Archiver import HyperArch
from cStringIO import StringIO

maillist = sys.argv[2]
hostname = sys.argv[1]

msg = email.message_from_file(sys.stdin, Message.Message)
syslog.syslog(maillist)

mlist = MailList.MailList(maillist, lock=True)

syslog.syslog('processing archiver')
## let archive it
f = StringIO(str(msg))
h = HyperArch.HyperArchive(mlist)
h.processUnixMailbox(f)
h.close()
f.close()
mlist.Save()
mlist.Unlock()

mlist.ArchiveMail(msg)

syslog.syslog('processing indexer')
### coming soon

syslog.syslog('exiting - all ok')
sys.exit(0)

The syslog calls are for debugging purposes only.

When I send an email to my mailing list, I get this kind of error:

Mar 02 12:38:33 2010 (28380) toto.lock lifetime has expired, breaking
Mar 02 12:38:33 2010 (28380)   File "/var/lib/mailman/scripts/driver", line 250, in <module>
Mar 02 12:38:33 2010 (28380)     run_main()
Mar 02 12:38:33 2010 (28380)   File "/var/lib/mailman/scripts/driver", line 110, in run_main
Mar 02 12:38:33 2010 (28380)     main()
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/Cgi/admin.py", line 167, in main
Mar 02 12:38:33 2010 (28380)     mlist.Lock()
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
Mar 02 12:38:33 2010 (28380)     self.__lock.lock(timeout)
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/LockFile.py", line 306, in lock
Mar 02 12:38:33 2010 (28380)     important=True)
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Mar 02 12:38:33 2010 (28380)     traceback.print_stack(file=logf)

This block is repeated over and over in my /var/log/mailman/locks.

It seems I have a problem with the lockfile...

Any ideas?

Thank you!



-- 
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanneret at camptocamp.com  |  PSE-A / EPFL