[Mailman-Developers] Login / User Identification Issues in MM3

Thu Jul 12 18:51:12 CEST 2012

... taken out of sequence ...
>> That is because you have not followed the principles and allow
>> "someone else" to provide that service.
> 
> True.  (I wish you'd stop using "you" in this kind of statement; it
> isn't true, I didn't code any of this.  And it doesn't matter who

I apologize for my terminology. Rather than using the second person, I should use the third. However, nowhere do I mean "you" to reflect on you personally. I tend to use personal pronouns as shorthand for "the argument/proposal/design that, in the context of the message, is being described by a particular person", as distinguished from one which another person has described or advocated. Similarly, if I were to use "Barry" in that context, I would be referring to an idea that he has described. I do not necessarily know whether any given methodology is the preferred approach of the individual describing it.

There seem to be two fundamental design strategies being discussed.
One of them has a monolithic data store and the other has a distributed store.
Barry has expressed some reservations about overloading a monolithic data store with data extraneous to the fundamental mission of message handling.

I have expressed concern about requiring any implementation to maintain related data in a split format.
I recognize that there will be cases where this is necessary (for example, the Launchpad case as described by Barry in another message). But, as he notes, such implementations tend to be "brittle", especially where there are multiple components which can alter the data. Unless it is a constraint external to MM, I do not believe that such a restriction should be introduced.

There is also an issue of what the term "core" means. Perhaps you have been referring to a distribution package. I have been referring to one component of such a package, in particular that component which interacts with the MTAs and redistributes messages. I consider the processing of administrative messages to be a separate component. And I consider the storage of configuration information to be yet another component. In my view, each component extends only as far as its parts interact with the same private data representation.

On Jul 12, 2012, at 2:05 AM, Stephen J. Turnbull wrote:

> Richard Wackerbarth writes:
>> I don't pretend to know just what our users will want to add. But
>> they should be allowed to write an SQL-type description of their
>> needs and they shouldn't "muck" with the inner workings of the
>> message handling schema to do so.
> 
> So by "SQL-type" do you mean they *must* have access to the RDBMS so
> they can actually write SQL, or just that the provided interface needs
> to allow queries with logical operators?  The "-type" suggests you
> mean the latter.

I do mean the latter. But, if the real underlying database is an RDBMS, then, within the "black box", these queries probably should be implemented by translating them to real SQL queries and passing those to the RDBMS.
There has been a lot of work, by many far more qualified than we, to handle such details within the RDBMS. We should not attempt to reinvent their wheel.
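
A minimal sketch of that translation, assuming an SQLite backend and a made-up `member` table. The query classes (`Eq`, `And`, `Or`), the column names, and `find_members` are hypothetical illustrations, not part of any Mailman interface. The caller composes logical operators; only the "black box" ever sees SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical query nodes; none of these names come from Mailman.
@dataclass(frozen=True)
class Eq:
    field: str
    value: object


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Eq, And, Or]

# Only whitelisted column names may appear in generated SQL.
ALLOWED_FIELDS = {"address", "list_name", "digest"}


def to_sql(q: Query) -> Tuple[str, List[object]]:
    """Translate a query tree into a parameterized WHERE clause."""
    if isinstance(q, Eq):
        if q.field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {q.field}")
        return f"{q.field} = ?", [q.value]
    if isinstance(q, And):
        op = "AND"
    elif isinstance(q, Or):
        op = "OR"
    else:
        raise TypeError(f"not a query node: {q!r}")
    left_sql, left_args = to_sql(q.left)
    right_sql, right_args = to_sql(q.right)
    return f"({left_sql} {op} {right_sql})", left_args + right_args


def find_members(conn: sqlite3.Connection, q: Query) -> List[str]:
    """Run the translated query; values travel as bound parameters."""
    where, args = to_sql(q)
    rows = conn.execute(
        f"SELECT address FROM member WHERE {where} ORDER BY address", args
    )
    return [row[0] for row in rows]


def demo() -> List[str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member (address TEXT, list_name TEXT, digest INTEGER)"
    )
    conn.executemany(
        "INSERT INTO member VALUES (?, ?, ?)",
        [
            ("anne@example.com", "dev", 0),
            ("bart@example.com", "dev", 1),
            ("cris@example.com", "users", 1),
        ],
    )
    q = And(
        Eq("list_name", "dev"),
        Or(Eq("digest", 1), Eq("address", "anne@example.com")),
    )
    return find_members(conn, q)
```

The same query tree could be handed to a non-SQL backend, which would evaluate it however it sees fit.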

> But you've also suggested the former.

I have suggested it as an alternative implementation. I do so only because that strategy exposes a powerful resource and avoids the burden of adding a new mechanism in order to meet the requirement for customized extensions and access that need to interact efficiently with the data that MM needs to have maintained.

>  I don't like the idea of direct
> access to the underlying database, because there isn't necessarily
> going to be just one, and it may be that Mailman needs certain kinds
> of access to component DBs (eg, updating email addresses) but the
> organization would like to have access controls on them based on
> another component database (authorized admins, say).  Also, we're not
> in a position to require that all databases be kept in, say,
> Postgresql.  They may not even be RDBMSes (LDAP member databases,
> sendmail alias files).  So we need a layer of abstraction.

I agree that we need the abstraction.
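
To make the idea concrete, here is a rough sketch of such a layer, assuming a single hypothetical `Roster` interface with two stand-in backends. `AliasTextRoster` parses a simplified "name: addr1, addr2" alias syntax (no continuation lines or includes), and `DirectoryRoster` is only a dictionary pretending to be an LDAP-like directory. None of these names exist in Mailman.

```python
from typing import Dict, Iterable, List, Protocol, Set


class Roster(Protocol):
    """Hypothetical interface: anything that can list a list's members."""

    def members(self, list_name: str) -> List[str]: ...


class AliasTextRoster:
    """Reads 'name: addr1, addr2' lines (simplified alias-file syntax)."""

    def __init__(self, text: str) -> None:
        self._lists: Dict[str, List[str]] = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            name, _, targets = line.partition(":")
            self._lists[name.strip()] = [
                t.strip() for t in targets.split(",") if t.strip()
            ]

    def members(self, list_name: str) -> List[str]:
        return list(self._lists.get(list_name, []))


class DirectoryRoster:
    """Stand-in for a directory service; a real one would query a server."""

    def __init__(self, groups: Dict[str, Set[str]]) -> None:
        self._groups = groups

    def members(self, list_name: str) -> List[str]:
        return sorted(self._groups.get(list_name, ()))


def recipients(backends: Iterable[Roster], list_name: str) -> List[str]:
    """Merge members from every backend, dropping case-insensitive repeats."""
    seen: Set[str] = set()
    result: List[str] = []
    for backend in backends:
        for address in backend.members(list_name):
            key = address.lower()
            if key not in seen:
                seen.add(key)
                result.append(address)
    return result
```

The message handler would call only `recipients`; which stores sit behind it is a deployment decision.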

>> I don't see user passwords providing much direct use in the mail
>> distribution system.
> 
> I don't understand what you're thinking.

First, we seem to have a different conceptual model of MM.
I view that which is being called "core" not as a single entity, but as a collection of components, most of which are critical to the operation of the system.
Among those components, I distinguish message routing and distribution, configuration storage, and processing of administrative messages.

The first is what I have been calling the message handling. It interacts with the MTAs, maintains queues of partially completed work, etc.

The second is critical in that it provides the customization information which causes each mailing list to be distinctive.
It can be further subdivided into structural configuration (the location of various interfaces, the parameters defining the lists, etc.), rosters of subscribers, and subscription preferences.

The third component implements the processing of messages which are designed to alter the state of the configuration storage and/or the state of messages queued in the message handler. I do not consider this element "critical" in that the messages which it will process can be queued and handled later, or the component can be omitted entirely in a system that utilizes a webUI as an alternative access to handle those administrative functions.
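
The "queued and handled later" property can be sketched as follows. This is an illustration only: `AdminQueue`, the command tuple, and the roster dictionary are hypothetical, and a real system would persist the queue rather than hold it in memory.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple

# (action, list_name, address); a hypothetical command shape.
Command = Tuple[str, str, str]


class AdminQueue:
    """Holds administrative commands until something chooses to apply them."""

    def __init__(self) -> None:
        self._pending: Deque[Command] = deque()

    def submit(self, action: str, list_name: str, address: str) -> None:
        # Accepting a command does not touch the configuration store.
        self._pending.append((action, list_name, address))

    def drain(self, rosters: Dict[str, Set[str]]) -> int:
        """Apply all pending commands in order; return how many were applied."""
        applied = 0
        while self._pending:
            action, list_name, address = self._pending.popleft()
            members = rosters.setdefault(list_name, set())
            if action == "subscribe":
                members.add(address)
            elif action == "unsubscribe":
                members.discard(address)
            else:
                continue  # unknown commands are dropped in this sketch
            applied += 1
        return applied
```

Message distribution keeps reading `rosters` throughout; it neither knows nor cares whether the changes arrived by mail or through a web UI.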

> You started this thread with
> the observation that various components are keeping data in different
> places, and that this data is often redundant but not synced or
> inaccessible.  To me this suggests a design principle: a single
> conceptual database managed by a core component (i.e., one that is
> present in every Mailman 3 system).

Yes, that is how I started the thread. However, you misinterpret that as a requirement for a monolithic database.
Certainly a monolithic database would be one way to accomplish DRY storage of the data, but it can also be accomplished in a distributed manner. What I am suggesting is that in a distributed system, no component of the system has the right to demand that it have the exclusive right to be the keeper of certain shared data. I am further suggesting that any component taking on that responsibility should also be responsible for the storage of any related items.

> The implementation of that database may very well include multiple
> database systems (eg, the organization's LDAP directory, a Postgresql
> database for the tables related to list configurations, and an MTA
> alias file for the list addresses).
Agreed.

>  However, these need to be managed
> via a single common API, and the data must not be private to any
> non-core component.
I would agree only if you drop the "non-core". Each component may have "private" data. But that data cannot include any data that needs to be exposed by the API.

> The fact that some data are not useful to all components seems to me
> to be a red herring.  The point of a DBMS in general is that you can
> flexibly access only the data you need for the job at hand, in a form
> optimized for the job at hand.
This is the reason that utilizing, and exposing, the database engine is an attractive way to implement the storage.

>>> So what?  This extension needs to be done *somewhere*; you aren't
>>> going to be able to avoid it by throwing it out of the core.
>> 
>> No, but I will "compartmentalize" it.
> 
> You mean "as a single entity in the distribution of core components",
> or "as per-component entities containing what each component needs"?
> 
>> No, I am suggesting that either you implement the functionality by
>> specifying that some particular structure be set in a standard
>> database (and provide a reference implementation of doing so) or
> 
> I think that's a non-starter.  We are not in a position to specify
> that there even *be* a standard database backing our API, unless we're
> willing to push the redundancy/inaccessibility problems to the next
> higher level by copying databases from organizational sources
> *outside* of Mailman *into* Mailman-only databases.  I consider that
> unacceptable; use of external databases for subscriber lists is a
> high-frequency RFE, and it would be *way* higher if it weren't for the
> extremely high quality of MM-U participants, most of whom check the
> FAQ/tracker and notice that there already is an RFE on file.  AIUI,
> Barry does too.

I agree. I consider the ability to store "rosters" and/or user information in databases which are not under the control of MM to be a design requirement. But, going along with that, use of such external storage also negates MM's responsibility to provide management for that data.

>> Further, "each non-core module will do it differently and
>> incompatibly" is a red herring. There MUST be a SPECIFICATION of
>> the interface and EVERY implementation MUST meet those
>> REQUIREMENTS. What ever else it does will not affect any other part
>> of the system.
> 
> Have you ever told a baby to stop sucking their thumb, and use the
> pacifier?  You have to pull the thumb out to get the point across.  In
> the same way, there's going to have to be one implementation, and that
> implementation will be distributed with the core.  Otherwise there
> WILL be a SPECIFICATION of the interface and EVERY implementation WILL
> meet those REQUIREMENTS (except where the implementer finds it
> inconvenient), and we're back where we started.

There is going to have to be one REFERENCE implementation and that implementation will be sufficient to get a minimal system operational.  But, because that implementation will not meet the operational needs of most users, there will be alternate implementations. You cannot stop that. You can only hope that those implementations will meet the specification.
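
One way to turn "hope" into something checkable is to ship a conformance check alongside the specification. The following is a sketch under assumed requirements: the `UserStore` interface, its three methods, and the rules enforced by `check_conformance` are all invented for illustration and do not describe any actual Mailman specification.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional


class UserStore(ABC):
    """Hypothetical specification of a user store."""

    @abstractmethod
    def put(self, address: str, display_name: str) -> None: ...

    @abstractmethod
    def get(self, address: str) -> Optional[str]: ...

    @abstractmethod
    def delete(self, address: str) -> bool: ...


class ReferenceUserStore(UserStore):
    """Reference implementation: enough to get a minimal system running."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def put(self, address: str, display_name: str) -> None:
        self._names[address] = display_name

    def get(self, address: str) -> Optional[str]:
        return self._names.get(address)

    def delete(self, address: str) -> bool:
        return self._names.pop(address, None) is not None


class ForgetfulStore(ReferenceUserStore):
    """An alternative implementation that skips the inconvenient part."""

    def delete(self, address: str) -> bool:
        return True  # claims success but removes nothing


def check_conformance(make_store: Callable[[], UserStore]) -> List[str]:
    """Return the violated requirements; an empty list means conforming."""
    failures: List[str] = []
    store = make_store()
    store.put("anne@example.com", "Anne")
    if store.get("anne@example.com") != "Anne":
        failures.append("get must return what put stored")
    if store.get("nobody@example.com") is not None:
        failures.append("get must return None for unknown addresses")
    if store.delete("anne@example.com") is not True:
        failures.append("delete must return True for a stored address")
    if store.get("anne@example.com") is not None:
        failures.append("get must return None after delete")
    if store.delete("anne@example.com") is not False:
        failures.append("delete must return False for unknown addresses")
    return failures
```

Running `check_conformance(ReferenceUserStore)` yields no failures, while `check_conformance(ForgetfulStore)` reports the two requirements it breaks. Whatever else an implementation does beyond the checked behavior remains its own business.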