[Mailman-Developers] Login / User Identification Issues in MM3

Thu Jul 12 20:31:51 CEST 2012

Richard Wackerbarth writes:

 > There seem to be two fundamental design strategies being discussed.
 > One of them has a monolithic data store and the other has a
 > distributed store.
 > Barry has expressed some reservations about overloading a
 > monolithic data store with data extraneous to the fundamental
 > mission of message handling.
 > 
 > I have expressed concern about requiring any implementation to
 > maintain related data in a split format.
 > I recognize that there will be cases where this is necessary (for
 > example the Launchpad case as described by Barry in another
 > message). But, as he notes, such implementations tend to be
 > "brittle". Especially where there are multiple components which can
 > alter the data.  But, unless it is a constraint external to MM, I
 > do not believe that such a restriction should be introduced.

I don't think that anybody is considering requiring a monolithic
store, in the sense of putting everything into a single backend DBMS,
because we all agree that it should be possible to take member lists
from an external database and augment them with Mailman-specific
properties that we may not be permitted to store in the external
database.

(Aside: I don't think we should assume that external databases are
necessarily read-only.  For example, I can imagine informal
organizations that would allow Mailman to add new subscribers to the
member directory, or sales organizations that would allow people to
subscribe to product announcement lists and automatically add them
to the CRM database.)

What I propose as a requirement is that any data added to any
databases used by Mailman be accessible via standard Python
introspection techniques.  (In principle and by default, that is;
Mailman already hides some data from some interfaces, such as the
member list.)  For example, if we use the "user as Python object"
model, then the introspection method would simply be the 'dir'
function.  Other possibilities would be to have components register
mutators and accessors for "their" data.
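To make the two models concrete, here is a minimal Python sketch under
stated assumptions.  Every name in it (User, public_attributes,
PropertyRegistry, the digest_mode and crm_id properties, and the crm
dict standing in for an external database) is hypothetical and not part
of Mailman.  The first half shows discovery with dir() on a "user as
Python object"; the second shows components registering an accessor and
a mutator for data they own, so the full set of properties can still be
enumerated.

```python
# Hypothetical names throughout; this is an illustration, not a Mailman API.
from typing import Any, Callable, Dict, List, Tuple


class User:
    """A plain Python object; components may attach extra attributes."""

    def __init__(self, address: str) -> None:
        self.address = address


def public_attributes(obj: object) -> List[str]:
    """Model 1: introspect with dir(), skipping private and dunder names."""
    return [name for name in dir(obj) if not name.startswith("_")]


class PropertyRegistry:
    """Model 2: components register accessors and mutators for their data."""

    def __init__(self) -> None:
        # property name -> (owning component, getter, setter)
        self._props: Dict[str, Tuple[str, Callable, Callable]] = {}

    def register(self, name: str, owner: str,
                 getter: Callable[[User], Any],
                 setter: Callable[[User, Any], None]) -> None:
        if name in self._props:
            raise ValueError(f"property {name!r} already registered")
        self._props[name] = (owner, getter, setter)

    def names(self) -> List[str]:
        """Enumerate every registered property, whoever owns it."""
        return sorted(self._props)

    def get(self, user: User, name: str) -> Any:
        return self._props[name][1](user)

    def set(self, user: User, name: str, value: Any) -> None:
        self._props[name][2](user, value)


user = User("anne@example.com")
user.digest_mode = "mime"       # attached by a hypothetical digest component
print(public_attributes(user))  # ['address', 'digest_mode']

crm: Dict[str, Any] = {}        # stand-in for an external CRM database
registry = PropertyRegistry()
registry.register("crm_id", "crm-bridge",
                  getter=lambda u: crm.get(u.address),
                  setter=lambda u, v: crm.__setitem__(u.address, v))
registry.set(user, "crm_id", 42)
print(registry.names(), registry.get(user, "crm_id"))  # ['crm_id'] 42
```

In both models the data stays discoverable by default, even though the
registry lets the owning component decide where the value is actually
kept.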

 > There is also an issue of what the term "core" means. Perhaps you
 > have been referring to a distribution package. I have been
 > referring to one component of such a package, in particular that
 > component which interacts with the MTAs and redistributes
 > messages.

I find that highly unintuitive.  The core is the set of functions that
are essential.  The "distribution package" description is a
heuristic.  I.e., "these are the functions that would completely stop
the show if you installed Mailman and discovered any one of them was
not present."

 > I consider the processing of administrative messages to be a
 > separate component. And I consider the storage of configuration
 > information to be yet another component. In my view, each component
 > extends only as far as its parts interact with the same private
 > data representation.

I don't think that's a useful definition, to be honest.  On the one
hand, most functions have local variables, but surely that doesn't
make them components by themselves.  On the other, pretty much
everything in Mailman interacts with mailing lists in one way or
another, but surely none of us thinks of Mailman as a one-component
application.

I think of "component" as a concept that belongs to the art of
programming, rather than one with a technical definition.  A component is
any body of code and content that is a convenient unit of creation,
maintenance, and administration.  Of course issues of coherence and
coupling will help determine what is "convenient", but I don't think
they're sufficient in themselves.

 > I do mean the latter. But, if the real underlying database is a
 > RDBMS, then, within the "black box", these queries probably should
 > be implemented by translating them to real SQL queries and passing
 > those to the RDBMS.

Sure.  But this is more likely if we have a good ORM (which is a more
Pythonic way of thinking about things) as an interface to the RDBMS,
and all of that is wrapped in a convenient, powerful API that allows
the programmer to delegate data persistence to some component of
Mailman.
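A minimal sketch of what delegating persistence might look like,
assuming a hypothetical MemberStore facade.  To stay self-contained it
uses the standard library's sqlite3 module directly in place of an ORM;
the only point is that callers express a query as Python keyword
arguments and the black box turns it into one parameterized SQL query
for the RDBMS.

```python
# Hypothetical MemberStore; sqlite3 stands in for an ORM-backed RDBMS.
import sqlite3
from typing import List, Tuple


class MemberStore:
    """Callers never write SQL or hold a database connection."""

    _COLUMNS = ("address", "list_name", "role")

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE member (address TEXT, list_name TEXT, role TEXT)")

    def add(self, address: str, list_name: str, role: str = "member") -> None:
        with self._conn:  # commits the insert on success
            self._conn.execute(
                "INSERT INTO member VALUES (?, ?, ?)",
                (address, list_name, role))

    def find(self, **criteria: str) -> List[Tuple[str, str, str]]:
        """Translate keyword criteria into a single parameterized query."""
        unknown = set(criteria) - set(self._COLUMNS)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        sql = "SELECT address, list_name, role FROM member"
        if criteria:
            # Column names come from the whitelist; values are bound as ?.
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in criteria)
        sql += " ORDER BY address"
        return self._conn.execute(sql, tuple(criteria.values())).fetchall()


store = MemberStore()
store.add("anne@example.com", "ann")
store.add("bart@example.com", "ann", role="owner")
store.add("anne@example.com", "dev")
print(store.find(list_name="ann", role="owner"))
# [('bart@example.com', 'ann', 'owner')]
```

Swapping the body of find() for an ORM query would not change any
caller, which is the property that matters here.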

 > First, we seem to have a different conceptual model of MM.
 > I view that which is being called "core", not as a single entity,
 > but a collection of components, most of which are critical to the
 > operation of the system.

That's not what you said above; above you restrict it to the message
routing and distribution component.  I believe that is the definition
you have been pretty consistently using throughout the thread.  No?

Anyway, I find this one very close to my own thinking.

 > > You started this thread with the observation that various
 > > components are keeping data in different places, and that this
 > > data is often redundant but not synced or inaccessible.  To me
 > > this suggests a design principle: a single conceptual database
 > > managed by a core component (i.e., one that is present in every
 > > Mailman 3 system).
 > 
 > Yes, that is how I started the thread. However, you misinterpret
 > the requirement for a monolithic database.

I think you're misinterpreting my words, actually, though I'm open to
correction by a third party.  By a "single conceptual database", I
mean that there is a single API for accessing persistent Mailman data,
and that you don't have to specify a connection to a database to
access data.  The implementation knows where all the data is stored,
whether that happens to be a single humongous ZODB, or a heterogeneous
array of LDAP, SQL, and flatfile data stores.
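One way to read "single conceptual database" in code is a facade that
owns the routing decisions, so callers ask for a field without ever
naming a connection.  This is a sketch under assumptions: MailmanData,
DictBackend and the field names are hypothetical, and DictBackend merely
stands in for the LDAP, SQL, or flatfile stores mentioned above.

```python
# Hypothetical names; DictBackend stands in for LDAP/SQL/flatfile stores.
from typing import Any, Dict, Protocol


class Backend(Protocol):
    def load(self, key: str, field: str) -> Any: ...
    def store(self, key: str, field: str, value: Any) -> None: ...


class DictBackend:
    """In-memory stand-in for one real data store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._rows: Dict[str, Dict[str, Any]] = {}

    def load(self, key: str, field: str) -> Any:
        return self._rows.get(key, {}).get(field)

    def store(self, key: str, field: str, value: Any) -> None:
        self._rows.setdefault(key, {})[field] = value


class MailmanData:
    """One API for all persistent data; callers never name a backend."""

    def __init__(self, default: Backend) -> None:
        self._default = default
        self._routes: Dict[str, Backend] = {}

    def route(self, field: str, backend: Backend) -> None:
        """Site configuration: declare which store holds a field."""
        self._routes[field] = backend

    def _backend_for(self, field: str) -> Backend:
        return self._routes.get(field, self._default)

    def get(self, key: str, field: str) -> Any:
        return self._backend_for(field).load(key, field)

    def set(self, key: str, field: str, value: Any) -> None:
        self._backend_for(field).store(key, field, value)


ldap = DictBackend("ldap")        # external member directory
local = DictBackend("local-sql")  # Mailman-specific properties
data = MailmanData(default=local)
data.route("display_name", ldap)

ldap.store("anne@example.com", "display_name", "Anne Person")  # pre-existing
data.set("anne@example.com", "delivery_mode", "digest")
print(data.get("anne@example.com", "display_name"))   # Anne Person
print(data.get("anne@example.com", "delivery_mode"))  # digest
```

The routing table is set once by whoever deploys the site, rather than
negotiated among components at run time.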

 > Certainly a monolithic database would be one way to accomplish DRY
 > storage of the data, but it can also be accomplished in a
 > distributed manner. What I am suggesting is that in a distributed
 > system, no component of the system has the right to demand that it
 > have the exclusive right to be the keeper of certain shared
 > data. But, further, that any component taking on that
 > responsibility should also be responsible for the storage of any
 > related items.

I question whether the pain of having an (explicitly) distributed
system is worth the gain.  As you've explained it here, I see it as
setting us up for a situation where each component (including
components that are substitutes performing the same conceptual
function) will make their own decisions about what to store and where,
and what is private and what is public, so that components will
continually need to negotiate with each other over who has authority
and responsibility for certain data.

From the responses of several people in this thread, I strongly
suspect that most implementers will decide that most of the data they
use is not interesting to other modules and make it private, rather
than spend the effort needed to generalize.  So I think the costs will
be higher and the amount of shared data lower than for a system where
one component is responsible for all connections to databases.

 > I would agree only if you drop the "non-core". Each component may
 > have "private" data. But that data cannot include any data that
 > needs to be exposed by the API.

And how do we know what "needs" to be exposed?  We don't.

I'm sure we can make a killer MLM with a distributed database and each
component storing private data.  What I don't think we can do that
way is make an MLM that's capable of killing web fora and Usenet, too.  I
think it's worth the extra effort to keep things general.