[Mailman-Developers] Google Summer of Code: Integration of Search Code

Shayan Md mdoshayan at gmail.com
Thu Mar 29 20:55:50 CEST 2012


On Wed, Mar 28, 2012 at 6:59 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> On Wed, Mar 28, 2012 at 4:21 AM, Terri Oda <terri at zone12.com> wrote:
>
> >> Looks like the archiver for mm3 is still in the development stage. As far
> >> as I understand, the searcher depends on the archiver, right? Not
> >> completely, but somewhat. I am not sure if the searcher can be implemented
> >> without the archiver. If possible, I can implement it for mm3 also.
> >
> > Searcher and archiver are interdependent *if* we want to share caches and
> > data stores, which we probably do for any installation with larger archives
> > where storing 2 copies vs 4 of each message would make a difference. Plus,
> > many archive views may be basically searches: "messages in the last month",
> > "messages which are replies to messageid $foo", etc.
>
> Actually, as far as I can see, the summary/search/index/retrieval
> functions depend only on the API for the message store.  If you
> want, you can split this into the database layer and a presentation
> layer, of course.  However, the database layer is surely going to
> have its own schema optimized for the kinds of retrieval its
> designer considers important.  Even if the designer emphasizes
> threads, she is *not* going to try to store messages in
> thread order or anything like that.  Rather, any reasonable store
> will be message-ID-addressable.
>
> The only tricky issue is that we *do* have to worry about
> message-ID collisions of truly different messages and about
> messages without message IDs, especially for converted
> historical archives.  So the API needs to be able to deal
> with these issues, probably by returning a set or sequence
> of messages.
>
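To make the "return a set or sequence" idea concrete, here is a minimal
sketch in Python. The class and method names (MessageStore, add, get,
without_id) are my own invention for illustration, not an existing Mailman
API, and the store is just an in-memory dict.

```python
from collections import defaultdict
from email import message_from_string


class MessageStore:
    """Toy in-memory store addressable by Message-ID (hypothetical API)."""

    def __init__(self):
        self._by_id = defaultdict(list)  # Message-ID -> list of messages
        self._no_id = []                 # messages that carry no Message-ID

    def add(self, msg):
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            # Converted historical archives may contain such messages.
            self._no_id.append(msg)
        else:
            self._by_id[msg_id.strip()].append(msg)

    def get(self, msg_id):
        """Return every message stored under msg_id: zero, one, or several."""
        return list(self._by_id.get(msg_id, []))

    def without_id(self):
        """Return the messages that had no Message-ID header."""
        return list(self._no_id)


store = MessageStore()
store.add(message_from_string("Message-ID: <a@example.org>\n\nfirst body\n"))
store.add(message_from_string("Message-ID: <a@example.org>\n\nother body\n"))
store.add(message_from_string("Subject: no id here\n\nbody\n"))
```

Because get() always returns a list, a caller has to handle a collision
(two truly different messages under one ID) explicitly instead of silently
getting only one of them.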
> Oh, and we probably ought to have a more general notion
> of retrievable "object" rather than just messages, as some
> archive/retrieval backends may store some types of MIME
> part separately.  Hopefully these would be presented to
> us as MIME parts with external bodies and content IDs.
>
> I would guess she'll probably store messages in
> YY-MM/MSGID (or, as git does, in "unpacked"
> XX/YYYYYYYY... format, where XX are the first two digits
> of the hash ID, and YY... are the remaining ones).  But it
> could easily be backed by an IMAP store or something
> more specialized; we don't really care as long as it's
> object-ID-addressable.
>
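For reference, the XX/YYYY... layout could be sketched like this in Python.
This is only an assumption about how such a store might name its files: the
function name object_path is hypothetical, and I hash the raw message bytes
with plain SHA-1, which is not exactly what git does (git hashes a header
plus the content).

```python
import hashlib
from pathlib import PurePosixPath


def object_path(raw_message: bytes) -> PurePosixPath:
    """Map raw message bytes to a two-level XX/YYYY... relative path."""
    digest = hashlib.sha1(raw_message).hexdigest()  # 40 hex digits
    # First two digits name the directory, the remaining 38 name the file.
    return PurePosixPath(digest[:2]) / digest[2:]


path = object_path(b"Message-ID: <a@example.org>\n\nbody\n")
```

The same bytes always map to the same path, so the hash works as the object
ID, and two different messages that happen to share a Message-ID still end
up in different files.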

Assuming that we have something like this (an object-ID-addressable store;
if I am not wrong, Mailman 3 makes this possible, but it is not yet
implemented because it is part of the archiver), is it overambitious to plan
to implement an indexer/searcher for Mailman 3, a REST API for that
searcher, an extension of the client to use this API, and a Django search
form on top of the client API? All of this would be independent of the
archiver, because the only part shared with the archiver is message
retrieval. If we implement the whole searcher and the rest of the client
code now, then later, when the archiver is implemented, its message
retrieval code can be used in the searcher. When the archiver is completely
mature, maybe we can even merge them together. Is this possible? Or does
this plan have any nonsensical parts?
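
To show what I mean by keeping the searcher independent of the archiver,
here is a rough sketch in Python. It is a toy inverted index, not a proposal
for the real implementation, and the names (Searcher, fetch, index, search)
are made up. The only thing it needs from the archiver side is a callable
that turns an object ID into text.

```python
import re
from collections import defaultdict


class Searcher:
    """Toy inverted index; the interface is hypothetical."""

    def __init__(self, fetch):
        # fetch: callable mapping an object ID to the message text.
        # Today it can be a dict lookup; later, the archiver's retrieval code.
        self._fetch = fetch
        self._index = defaultdict(set)  # term -> set of object IDs

    def index(self, object_id):
        """Fetch one object and record which terms it contains."""
        text = self._fetch(object_id)
        for term in re.findall(r"\w+", text.lower()):
            self._index[term].add(object_id)

    def search(self, query):
        """Return the IDs of objects containing every term in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._index.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._index.get(term, set())
        return result


# Stand-in for the archiver: a plain dict of object ID -> text.
fake_store = {
    "id1": "Archiver design for Mailman",
    "id2": "Search design notes",
}
searcher = Searcher(fake_store.__getitem__)
for object_id in fake_store:
    searcher.index(object_id)
```

Swapping fake_store.__getitem__ for the real retrieval function would be the
only change needed once the archiver exists, at least in this simplified
picture.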


> And that's all we want to say about the archiver and the
> associated message-retrieval logic, I think.  (In fact, it occurs to
> me that maybe we should say "RFC 3501" and be done with
> it.  I don't mean that we necessarily implement IMAP protocol
> per se, but some subset of its functionality probably is what we
> need from an archiver.)
>
> Then the schema-specific stuff will use hash IDs to represent
> message objects in a portable but schema-specific way.  As
> it's schema-specific, I don't really see how data structures
> can be shared by different searchers.
>
> So I would say not to worry about the archiver side at all.  If
> large installations want to implement specialized message-
> retrieval, bully for them.  But we can go with simple backends,
> maildir, mbox, and maybe IMAP, I think.
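
For the simple mbox case, Python's standard mailbox module is probably
enough to start with. A minimal sketch, assuming a linear scan is acceptable
for small archives (the helper name find_by_message_id is hypothetical, and
a real backend would keep an index instead of scanning):

```python
import mailbox


def find_by_message_id(mbox_path, msg_id):
    """Return all messages in the mbox whose Message-ID equals msg_id."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Messages without a Message-ID header compare as "" and never match.
        return [m for m in box if m.get("Message-ID", "").strip() == msg_id]
    finally:
        box.close()
```

It returns a list rather than a single message, so duplicate Message-IDs in
an old archive show up as several results.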

