[Mailman-Developers] Speaking about kitties (or archivers)

Tue Apr 24 20:12:21 CEST 2012

On Mon, Apr 23, 2012 at 06:20:18PM -0400, Barry Warsaw wrote:
> Thanks for posting this Pierre-Yves!
> 
> On Apr 23, 2012, at 08:17 PM, Pierre-Yves Chibon wrote:
> 
> >archive-core (store the emails and expose them through an API) -->
> >archivers/stats/NNTP
> >
> >The questions are then:
> >- how do we store the emails ?
> >- how do we expose the API ?
> >- how to make it such that it becomes easy to extend ? (ie: the stats
> >module wants to read the db, but probably also to store information on
> >it)
> 
> Sharing is good, but it's also important to remember that any specific system
> may or may not have a local archiver.  I could certainly imagine a site that
> only archives on M-A or Gmane and doesn't waste the space to archive locally.
> 
> I think we've pretty much come to agreement that the core itself doesn't need
> a full copy of all the messages after it's sent them, but of course, the
> "prototype" archiver could be used to keep a local copy of everything in a
> maildir.  That could be shared at the lower level (maildir) or through some
> kind of API in minikitty.
> 
I've been thinking about this and I'm in mild disagreement.  I think that
a mailing list system should give people an archive-store which is accessible
behind a generalized API.  That may be a non-local archiver if it's still
possible to implement the API.  That archive-store should be pluggable (the
storage could be SQL, mongodb, or remote) but having the store be accessible
is important.

The store may be accessible via a REST API but I'm not certain that it's the
correct level to deal with when talking about it in this context.  The
current mailman3 doesn't have an API for plugging in archivers via REST...
it has an API for plugging in archivers via python.  That may be the correct
level to be looking at this.
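
To make the python-level option concrete, here is a minimal sketch of what
a pluggable archive-store interface could look like.  Everything in it is an
assumption for illustration: ArchiveStore, MemoryStore, make_message and the
method names are hypothetical and are not part of mailman3.

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage


class ArchiveStore(ABC):
    """Hypothetical archive-store interface (not an existing mailman3 API)."""

    @abstractmethod
    def add_message(self, list_name, msg):
        """Store msg for list_name and return its Message-ID as a string."""

    @abstractmethod
    def get_message(self, list_name, message_id):
        """Return the stored message, or None if it is unknown."""

    @abstractmethod
    def message_ids(self, list_name):
        """Return the Message-IDs stored for list_name, in arrival order."""


class MemoryStore(ArchiveStore):
    """Toy backend; an SQL, mongodb or remote backend would subclass the same way."""

    def __init__(self):
        self._lists = {}  # list name -> {message id -> message}

    def add_message(self, list_name, msg):
        message_id = str(msg["Message-ID"])
        self._lists.setdefault(list_name, {})[message_id] = msg
        return message_id

    def get_message(self, list_name, message_id):
        return self._lists.get(list_name, {}).get(message_id)

    def message_ids(self, list_name):
        # dicts preserve insertion order, so this is arrival order
        return list(self._lists.get(list_name, {}))


def make_message(message_id, subject, body):
    """Build a small message for trying the store out."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

An archive front-end would only ever talk to the ArchiveStore methods, so
swapping MemoryStore for another backend would not touch the front-end code.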

Now the important part -- why an archive store is more integral than the
current architecture makes it out to be...

One way to look at this is conceptually.  Mailman2 is what I've come to
think of as a complete mailing list system.  By contrast mailman3-core is
only a mailing list manager.  Mailman3 contains the information necessary to
send messages to an address and have those messages disseminated to a wider
audience.  By itself, this is just fancy management of email aliases.
Mailing lists seem to be something more than this.  In addition to being
management of where email is sent, they're also repositories of knowledge on
a particular subject.  This is the role filled by archives.

One could also look at it from a sysadmin standpoint.  Suppose a sysadmin
wants to deploy mailman3 with archives and wants to have a forum-like
interface, an nntp interface, a standard archives interface, and a REST
interface to the archives.  Are they going to want to set up four different
storage technologies for those, import the generic archives into all four of
those, and then maintain and update the storage technologies to keep them
safe and secure?  Will they want to buy warrantied storage for all of them?
I think that they'll be happier if the design of our system could
consolidate those.

A different way to look at this is from a programmer's standpoint.  Many of
the interfaces to archives that we're talking about are going to share common
needs.  They need access to the email messages.  They need to know how the
email messages thread together.  They're going to want to search the
messages.  Under the current scheme, programmers will be creating very
similar code to access the email messages in their particular store even if
they all choose to use the same underlying storage technology.
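
As an illustration of the kind of code every front-end would otherwise
rewrite, here is a rough sketch of threading and searching done once over
a set of messages.  The function names are hypothetical, and the threading
is deliberately simplified: it only looks at In-Reply-To, where a real
archiver would probably also consult the References header.

```python
from collections import defaultdict
from email.message import EmailMessage


def make_message(message_id, subject, in_reply_to=None):
    """Build a small message for the example."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = subject
    if in_reply_to is not None:
        msg["In-Reply-To"] = in_reply_to
    return msg


def build_thread_index(messages):
    """Return (roots, children) computed from In-Reply-To headers.

    roots is a list of Message-IDs that start a thread; children maps
    a Message-ID to the IDs of its direct replies.  A reply whose parent
    is not in the given messages is treated as a thread root.
    """
    known = {str(m["Message-ID"]) for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        message_id = str(m["Message-ID"])
        parent = m["In-Reply-To"]
        parent_id = str(parent).strip() if parent is not None else None
        if parent_id in known:
            children[parent_id].append(message_id)
        else:
            roots.append(message_id)
    return roots, dict(children)


def search_subjects(messages, term):
    """Return Message-IDs whose Subject contains term, ignoring case."""
    term = term.lower()
    return [
        str(m["Message-ID"])
        for m in messages
        if term in str(m["Subject"] or "").lower()
    ]
```

If this lived behind the store's API, a forum-like UI and an nntp gateway
could both ask for threads and search results without each reimplementing
them against the raw storage.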

At the beginning I said that I was only in mild disagreement... where does
the qualifier come in?  I think that what we have with mailman3 right now is
something like this:

[mailman3 core] -- maintenance of the list metadata, sending and receiving,
                   provides a REST API
    [Web UIs] -- web ui to the Core functions
    [Archivers] -- mailing list storage and user interface to those stored
                   messages.

I think we should look into something a little more symmetrical:

[mailman3 core] -- maintenance of list metadata, sending and receiving,
                   provides a REST API
    [Web UIs] -- web ui to Core functions
    [Archive-stores] -- stores the messages sent to the mailing lists.
                        Provides a (REST?) API to apps built on top of it
    [Archiver UIs] -- web ui, nntp interface, REST API (if not implemented
                      at the storage layer), etc to the archive-store

By splitting the archive storage from the archive UI, similarly to how
mailman3-core is split from the web ui, we can allow a sysadmin to choose one
archive-storage for all of the archive front-ends that they run on their
systems.
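
A small sketch of that split, with one store object feeding two independent
front-ends.  The class and function names (ListStore, text_index,
json_index) are made up for the example, and the JSON output is only
a stand-in for whatever a REST front-end would really return.

```python
import json


class ListStore:
    """Stand-in archive-store: list name -> [(message id, subject), ...]."""

    def __init__(self):
        self._data = {}

    def add(self, list_name, message_id, subject):
        self._data.setdefault(list_name, []).append((message_id, subject))

    def entries(self, list_name):
        # Return a copy so front-ends cannot modify the stored data
        return list(self._data.get(list_name, []))


def text_index(store, list_name):
    """Front-end 1: a plain-text index, one message per line."""
    return "\n".join(
        f"{message_id}  {subject}"
        for message_id, subject in store.entries(list_name)
    )


def json_index(store, list_name):
    """Front-end 2: the same index serialized as JSON."""
    return json.dumps(
        [
            {"message_id": message_id, "subject": subject}
            for message_id, subject in store.entries(list_name)
        ]
    )
```

Both front-ends read from the same store, so the sysadmin has one storage
technology to back up and secure no matter how many UIs are deployed.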

Question: Why have multiple stores?  The big reason is that archives are
being much more rapidly developed right now.  So I anticipate that people
are going to be working on different storage technology with different
tradeoffs.  One storage might be faster.  Another might be more generally
available.  We'll have to reexamine this in the future.  It's possible that
we'll find one storage system that is perfect for all cases.  It's also
possible that we'll find all storage solutions have tradeoffs in which case
we'll likely want to support third-party stores forever.

Question: This is all dangling off of the archiver interface for mailman3
anyway so how can we affect the outcome?  Well, in some ways people can
create anything they want in there so we can't enforce a solution.  However,
if we think that it's desirable, we can certainly document this (maybe with
an interface if we go the python route for that layer of API or with
a specification of what the REST API should look like for that.)  We can
also enhance our current archivers to provide the API that we come up with.
I have a feeling that the prototype archiver with maildir will be a little
slow but if it provides the API and comments on the separation between
core, storage, and archive UI it gives people a starting point for
creating their own.
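
For a sense of what a maildir-backed store might look like, here is
a minimal sketch using the mailbox module from the Python standard library.
MaildirStore and its methods are hypothetical and are not the prototype
archiver's actual code.  The lookup is a linear scan over the whole maildir,
which is the sort of thing that would make such a store slow on large
archives.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def fresh_maildir_path():
    """Return a path that does not exist yet, inside a new temp directory.

    mailbox.Maildir(create=True) only creates the tmp/new/cur
    subdirectories when the maildir directory itself does not exist.
    """
    return os.path.join(tempfile.mkdtemp(), "archive")


class MaildirStore:
    """Hypothetical archive-store keeping every message in one maildir."""

    def __init__(self, path):
        self._box = mailbox.Maildir(path, create=True)

    def add_message(self, msg):
        """Write msg to the maildir and return its Message-ID."""
        self._box.add(msg)
        return str(msg["Message-ID"])

    def get_message(self, message_id):
        """Return the message with this Message-ID, or None.

        Scans every stored message; there is no index.
        """
        for stored in self._box:
            if stored["Message-ID"] == message_id:
                return stored
        return None
```

A faster store could keep the same two methods and add an index from
Message-ID to maildir key, without the front-ends noticing the change.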

Question: Where do we start?  I think that we'll either succeed or fail very
quickly by trying to define what the API between archive-store and
archiver-ui should look like.  We'll either be able to agree on a common set
of features there (from which we'll be able to go forth and create our own
archive-storage plugins) or we'll decide that we all need/want to do
different things that no common API can address.  If there's no common API
definition then we won't be able to do any of the rest of this so there
won't be any sense continuing down that path.

-Toshio