[Mailman-Developers] Speaking about kitties (or archivers)

Sat Jun 2 03:33:50 CEST 2012

On Apr 24, 2012, at 11:12 AM, Toshio Kuratomi wrote:

>I've been thinking about this and I'm in mild disagreement.  I think that
>a mailing list system should give people an archive-store which is accessible
>behind a generalized API.

I'm warming up to this.

The IArchiver interface is generic enough to support both internal and
external archivers.  If there are deficiencies in either, we can fix the API,
as long as both use cases are supportable, in a manner similar to
IArchiver.permalink() returning None if the archiver doesn't support stable
URLs.
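
To make the permalink() convention concrete, here is a minimal sketch in plain
Python.  It uses abc rather than the real zope.interface declaration, and the
class names, method signatures, and the archive.example.com URL are
assumptions for illustration, not the actual Mailman API.

```python
import hashlib
from abc import ABC, abstractmethod
from email.message import Message
from typing import Optional


class Archiver(ABC):
    """Hypothetical stand-in for an IArchiver-style interface."""

    name = "abstract"

    @abstractmethod
    def archive_message(self, list_name: str, msg: Message) -> None:
        """Hand one message to the archiver."""

    def permalink(self, list_name: str, msg: Message) -> Optional[str]:
        """Return a stable URL, or None if the archiver has none."""
        return None


class LocalArchiver(Archiver):
    """Keeps messages in memory; it has no web UI, so no permalinks."""

    name = "local"

    def __init__(self):
        self.stored = []

    def archive_message(self, list_name, msg):
        self.stored.append((list_name, msg))


class ExternalArchiver(Archiver):
    """Pretends to forward to an external service that has stable URLs."""

    name = "external"
    base_url = "https://archive.example.com"  # assumed, illustrative only

    def archive_message(self, list_name, msg):
        pass  # a real implementation would send the message elsewhere

    def permalink(self, list_name, msg):
        digest = hashlib.sha1(msg["Message-ID"].encode("utf-8")).hexdigest()
        return f"{self.base_url}/{list_name}/{digest}"
```

Callers only need to check for None, which is what lets internal and external
archivers sit behind the same interface.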

(A known omission from the current IArchiver API is that there's no way to
access attachments.  Does anybody have good ideas about that?)

>The store may be accessible via a REST API but I'm not certain that it's the
>correct level to deal with when talking about it in this context.  The
>current mailman3 doesn't have an API for plugging in archivers via REST...
>it has an API for plugging in archivers via python.  That may be the correct
>level to be looking at this.

From a systems perspective, yes.  Archivers must be enabled system-wide via
the config file, but I think we should allow individual lists to opt in to or
out of system-enabled archivers.  I'm on the fence as to what to do about the
prototype archiver, which is beginning to seem much more like the default
archiver-core, i.e. sans UI.
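
One way the system-wide versus per-list split could work, as a rough sketch:
the function name, the preference mapping, and the "default" flag below are
all hypothetical.  The only rule taken from the text above is that a list can
choose among archivers the site has enabled, never beyond them.

```python
def archivers_for_list(system_enabled, list_prefs, default=True):
    """Return the archiver names a single list should use.

    system_enabled: set of names enabled in the site config.
    list_prefs: per-list mapping of name -> bool (opt in / opt out).
    default: what to assume when the list has expressed no preference.
    """
    return {
        name
        for name in system_enabled
        if list_prefs.get(name, default)
    }


# The site enables two archivers; the list opts out of one of them.  Its
# request for "gmane" is ignored because the site has not enabled it.
site = {"prototype", "mhonarc"}
prefs = {"mhonarc": False, "gmane": True}
chosen = archivers_for_list(site, prefs)
```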

>I think we should look into something a little more symmetrical:
>
>[mailman3 core] -- maintenance of list metadata, sending and receiving,
>                   provides a REST API
>    [Web UIs] -- web ui to Core functions
>    [Archive-stores] -- stores the messages sent to the mailing lists.
>                        Provides a (REST?) API to apps built on top of it
>    [Archiver UIs] -- web ui, nntp interface, REST API (if not implemented
>                      at the storage layer), etc to the archive-store

This is compelling.

>Question: Why have multiple stores?  The big reason is that archives are
>being much more rapidly developed right now.  So I anticipate that people
>are going to be working on different storage technology with different
>tradeoffs.  One storage might be faster.  Another might be more generally
>available.  We'll have to reexamine this in the future.  It's possible that
>we'll find one storage system that is perfect for all cases.  It's also
>possible that we'll find all storage solutions have tradeoffs in which case
>we'll likely want to support third-party stores forever.

I always envisioned the core's storage being splittable into three main
partitions.  One would be the list-centric data, another would be the
user-centric data, and the third would be the message-centric data.  If you
look carefully for example, you'll see that there are no direct foreign key
references between members and the mailing lists they're associated with.
This link is by fqdn listname, *not* mailinglist table ids.  This is
deliberate.

(It's entirely possible the implementation doesn't actually allow these three
partitions to be stored in completely separate places.  I'd consider that a
bug.)
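
A small sketch of what linking by name rather than by table id buys you.  The
table and column names here are made up for the example and are not the real
schema; the point is only that the two partitions can live in separate
databases because nothing in one holds a foreign key into the other.

```python
import sqlite3

# Two independent databases stand in for two storage partitions.
lists_db = sqlite3.connect(":memory:")
users_db = sqlite3.connect(":memory:")

lists_db.execute(
    "CREATE TABLE mailinglist (id INTEGER PRIMARY KEY, fqdn_listname TEXT UNIQUE)"
)
# The member row records the list by its fqdn listname, not by mailinglist.id.
users_db.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, address TEXT, fqdn_listname TEXT)"
)

lists_db.execute(
    "INSERT INTO mailinglist (fqdn_listname) VALUES (?)", ("dev@example.com",)
)
users_db.executemany(
    "INSERT INTO member (address, fqdn_listname) VALUES (?, ?)",
    [
        ("bart@example.com", "dev@example.com"),
        ("anne@example.com", "dev@example.com"),
        ("cris@example.com", "users@example.com"),
    ],
)


def members_of(fqdn_listname):
    """Look up members using only the user partition."""
    rows = users_db.execute(
        "SELECT address FROM member WHERE fqdn_listname = ? ORDER BY address",
        (fqdn_listname,),
    )
    return [row[0] for row in rows]
```

The cost is that referential integrity has to be enforced in application code
rather than by the database.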

OTOH, I don't think it makes sense for the core to rely on more than one ORM.
For now, that's Storm.

(I'm slightly lying here because the technology that shows the most promise
for supporting schema migrations is Alembic which is based on a stripped down
version of SQLAlchemy.  But migrations are probably a completely off-line
operation.)

>Question: This is all dangling off of the archiver interface for mailman3
>anyway so how can we affect the outcome?  Well, in some ways people can
>create anything they want in there so we can't enforce a solution.  However,
>if we think that it's desirable, we can certainly document this (maybe with
>an interface if we go the python route for that layer of API or with
>a specification of what the REST API should look like for that.)  We can
>also enhance our current archivers to provide the API that we come up with.
>I have a feeling that the prototype archiver with maildir will be a little
>slow but if it provides the API and comments about separation between
>core, storage, and archive UI it gives people a starting point to
>creating their own.

Some IArchiver implementations will be purely external archivers.  I like that
we can have a Mail Archive implementation, or potentially a Gmane
implementation.  Those are very different from a MHonArc implementation, which
is again different from the prototype (default? built-in? always-enabled?)
archiver.  Having a common API for all of these simplifies the parts of the
core that send messages to the archives, but what happens once the data is
inserted into the different archivers is another question.

Remember too that archiver speed is less important, since that doesn't live in
the critical path for message delivery.  There is a handler that basically
copies the message to the archiver queue, and there's a separate runner that
dequeues those messages and sends them off to the individual archivers, via
the IArchiver interface.  So I think the performance of message insertion isn't
something we should worry about for now.
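
The handler/runner split described above can be sketched roughly like this.
An in-process queue.Queue stands in for the real on-disk queue, and every name
here (to_archive_handler, run_archive_queue, RecordingArchiver) is
hypothetical.

```python
import logging
import queue
from email.message import Message

log = logging.getLogger(__name__)

# Stand-in for the archiver queue; the delivery path only ever writes to it.
archive_queue = queue.Queue()


def to_archive_handler(list_name, msg):
    """Delivery-path step: enqueue a copy and return immediately."""
    archive_queue.put((list_name, msg))


def run_archive_queue(archivers):
    """Runner step: drain the queue and fan out to each archiver.

    Returns the number of messages dequeued.
    """
    processed = 0
    while True:
        try:
            list_name, msg = archive_queue.get_nowait()
        except queue.Empty:
            return processed
        for archiver in archivers:
            try:
                archiver.archive_message(list_name, msg)
            except Exception:
                # One slow or broken archiver must not block the others.
                log.exception("archiver %r failed", archiver)
        processed += 1


class RecordingArchiver:
    """Trivial archiver used to show the flow."""

    def __init__(self):
        self.seen = []

    def archive_message(self, list_name, msg):
        self.seen.append((list_name, msg["Subject"]))
```

Because delivery only pays for the enqueue, a slow archiver delays archiving
but not the mail itself.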

>Question: Where do we start?  I think that we'll either succeed or fail very
>quickly by trying to define what the API between archive-store and
>archiver-ui should look like.  We'll either be able to agree on a common set
>of features there (from which we'll be able to go forth and create our own
>archive-storage plugins) or we'll decide that we all need/want to do
>different things that no common API can address.  If there's no common API
>definition then we won't be able to do any of the rest of this so there
>won't be any sense continuing down that path.

Places to start:

* Look at the IArchiver interface and try to figure out whether it's complete
  from a message-insertion POV.  Maybe in that case, we don't care about
  attachments since the archiver will do whatever it wants with them.

* Look at the IMessageStore API.  Is this complete?  IOW, could you build a
  purely Python-level archiver like HyperKitty on top of this API?  Here's
  where proper attachment handling would probably be necessary.

* How would you want to expose the IMessageStore interface into the REST API?
  My sense is that you could probably take a fairly straightforward
  translation of IMessageStore into REST and *that* would be what you'd build
  the various archiver UIs on top of.  The REST layer needs to answer questions
  such as batching, which matter for efficient transfer of data over HTTP but
  not for direct Python calls.

* Should threading information be part of the IMessageStore, or a separate
  interface?  If the prototype archiver becomes the default implementation for
  the IMessageStore, it probably needs to grow a lot more functionality to
  support threading information.

The way I'm seeing it is that IArchiver is the interface for getting messages
*into* the IMessageStore.  The IMessageStore is the interface for making
Python-level queries needed to get the raw messages out of the system, and a
REST API is how you publish this data for the various UI consumers.
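
As a rough illustration of that layering, here is a sketch with an in-memory
store, an archiver that feeds it, and a batched JSON view of the kind a REST
resource might return.  None of these names or the JSON shape come from the
real IMessageStore or REST API; they are assumptions made for the example.

```python
import hashlib
import json
from email.message import Message


class InMemoryMessageStore:
    """Hypothetical message store keyed by a hash of the Message-ID."""

    def __init__(self):
        self._by_hash = {}  # dicts preserve insertion order

    def add(self, msg):
        digest = hashlib.sha1(msg["Message-ID"].encode("utf-8")).hexdigest()
        self._by_hash.setdefault(digest, msg)  # ignore duplicates
        return digest

    def get(self, digest):
        return self._by_hash.get(digest)

    def page(self, start, count):
        keys = list(self._by_hash)[start:start + count]
        return [(key, self._by_hash[key]) for key in keys]


class StoreArchiver:
    """Archiver whose only job is getting messages into the store."""

    def __init__(self, store):
        self.store = store

    def archive_message(self, list_name, msg):
        self.store.add(msg)


def messages_as_json(store, start=0, count=10):
    """Batched view: what a REST layer could serialize for a UI."""
    entries = [
        {"id": key, "subject": msg["Subject"]}
        for key, msg in store.page(start, count)
    ]
    return json.dumps({"start": start, "count": len(entries), "entries": entries})
```

The batching parameters live only in the JSON layer; direct Python callers can
keep using get() and page() without them.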

Cheers,
-Barry