[Mailman-Developers] Login / User Identification Issues in MM3

Thu Jul 12 09:05:52 CEST 2012

Richard Wackerbarth writes:

 > As an example, suppose that I want to have an "intelligent" ToDo
 > indicator.  As a minimum, I need to be able to get from the data
 > store a list of MLs that have pending requests AND for which I am
 > authorized to do that work.  Typically, this would be some kind of
 > join.

OK.  But in my head, Python is a dynamic language, and we should be
able to use the ORM to dynamically revise the DB schema, and access
such complexly specified data.
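
For concreteness, the join in question could look something like the
sketch below.  It uses sqlite3 from the standard library only to stay
self-contained; the table names, column names, and the function
lists_needing_attention() are hypothetical and are not Mailman's
schema or API.

```python
import sqlite3


def lists_needing_attention(conn, moderator_email):
    """Return names of lists that have pending requests AND that this
    moderator is authorized to handle (hypothetical schema)."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.list_name
          FROM pending_request AS p
          JOIN moderator AS m ON m.list_name = p.list_name
         WHERE m.email = ?
         ORDER BY p.list_name
        """,
        (moderator_email,),
    )
    return [name for (name,) in rows]


# Tiny in-memory data set, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pending_request (list_name TEXT, request TEXT);
    CREATE TABLE moderator (list_name TEXT, email TEXT);
    INSERT INTO pending_request VALUES ('dev', 'held message');
    INSERT INTO pending_request VALUES ('users', 'subscription');
    INSERT INTO moderator VALUES ('dev', 'anne@example.com');
    INSERT INTO moderator VALUES ('announce', 'anne@example.com');
    """
)
```

Whether such a query is written as SQL or composed through an ORM is
exactly the question that follows.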

 > I don't pretend to know just what our users will want to add. But
 > they should be allowed to write an SQL-type description of their
 > needs and they shouldn't "muck" with the inner workings of the
 > message handling schema to do so.

So by "SQL-type" do you mean they *must* have access to the RDBMS so
they can actually write SQL, or just that the provided interface needs
to allow queries with logical operators?  The "-type" suggests you
mean the latter.

But you've also suggested the former.  I don't like the idea of direct
access to the underlying database, because there isn't necessarily
going to be just one, and it may be that Mailman needs certain kinds
of access to component DBs (eg, updating email addresses) but the
organization would like to have access controls on them based on
another component database (authorized admins, say).  Also, we're not
in a position to require that all databases be kept in, say,
PostgreSQL.  They may not even be RDBMSes (LDAP member databases,
sendmail alias files).  So we need a layer of abstraction.
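
A minimal sketch of what such a layer might look like, assuming a
single hypothetical method addresses(list_name) that every backend
provides.  The class and function names here are invented for
illustration, and the alias parsing handles only the simplest
"name: addr1, addr2" form.

```python
class AliasFileMembers:
    """Backend reading sendmail-style alias text: 'name: addr1, addr2'."""

    def __init__(self, text):
        self._aliases = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, _, targets = line.partition(":")
            self._aliases[name.strip()] = [
                t.strip() for t in targets.split(",") if t.strip()
            ]

    def addresses(self, list_name):
        return list(self._aliases.get(list_name, []))


class DictMembers:
    """Backend standing in for an RDBMS or LDAP directory."""

    def __init__(self, table):
        self._table = table

    def addresses(self, list_name):
        return list(self._table.get(list_name, []))


def all_addresses(stores, list_name):
    """Merge the answers of every backend; callers never see which
    backend supplied which address."""
    found = set()
    for store in stores:
        found.update(store.addresses(list_name))
    return sorted(found)
```

The caller of all_addresses() depends only on the shared method, not
on where or how each backend keeps its data.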

 > > The point is that the message distribution agent is
 > > mission-critical; if it goes down you are well-and-truly screwed.
 > > If the web UI goes down, it might not even be noticed for weeks.
 > 
 > I don't buy that.  If you advertise a subscribe URL, or any other
 > function, that is just as much a "mission critical" component as
 > any other.

We'll have to agree to disagree.

 > I don't see user passwords providing much direct use in the mail
 > distribution system.

I don't understand what you're thinking.  You started this thread with
the observation that various components are keeping data in different
places, and that this data is often either redundant but not synced,
or inaccessible.  To me this suggests a design principle: a single
conceptual database managed by a core component (i.e., one that is
present in every Mailman 3 system).

The implementation of that database may very well include multiple
database systems (eg, the organization's LDAP directory, a PostgreSQL
database for the tables related to list configurations, and an MTA
alias file for the list addresses).  However, these need to be managed
via a single common API, and the data must not be private to any
non-core component.

The fact that some data are not useful to all components seems to me
to be a red herring.  The point of a DBMS in general is that you can
flexibly access only the data you need for the job at hand, in a form
optimized for the job at hand.

 > > So what?  This extension needs to be done *somewhere*; you aren't
 > > going to be able to avoid it by throwing it out of the core.
 > 
 > No, but I will "compartmentalize" it.

You mean "as a single entity in the distribution of core components",
or "as per-component entities containing what each component needs"?

 > No, I am suggesting that either you implement the functionality by
 > specifying that some particular structure be set in a standard
 > database (and provide a reference implementation of doing so) or

I think that's a non-starter.  We are not in a position to specify
that there even *be* a standard database backing our API, unless we're
willing to push the redundancy/inaccessibility problems to the next
higher level by copying databases from organizational sources
*outside* of Mailman *into* Mailman-only databases.  I consider that
unacceptable; use of external databases for subscriber lists is a
high-frequency RFE, and it would be *way* higher if it weren't for the
extremely high quality of MM-U participants, most of whom check the
FAQ/tracker and notice that there already is an RFE on file.  AIUI,
Barry does too.

 > that you specify REST interfaces that implement the appropriate
 > functions and REQUIRE that all components manipulate that data ONLY
 > through those interfaces.
 > 
 > The REST interface is not a single entity, but a collection of
 > components that inter-operate.

This makes no sense to me.  I see the architecture as

          +--------------+             +-------+
          |   Message    |             |       |
          | Distribution |  . . . . .  | WebUI |
          +--------------+             +-------+
                 \            |            /
                  \           |           /
                   \          |          /
                  +-----------------------+
                  |       REST API        |
                  +-----------------------+
                    /         |         \
                   /          |          \
                  /           |           \
          +------------+             +------------+
          | Subscriber |  . . . . .  |   Social   |
          |    List    |             | Networking |
          |            |             |    Data    |
          +------------+             +------------+

where the "MD" component may perceive a member in terms of only
subscriber data (i.e., something on the order of (FullName, Email,
BounceCount)), while the "WUI" component might be interested in
something like (Avatar, FullName, Email, IsATroll).  (Of course the
lower ellipsis also includes a site config DB and a list config DB.)

To my mind a Pythonic base REST API would return MemberObjects with
appropriate properties, and the properties would be turned into DB
queries on access.
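
A rough sketch of that idea, under the assumption of a hypothetical
backing store with a fetch(email, field) call.  FakeDB exists only so
the example runs; it counts lookups to show that nothing is fetched
until a property is read.

```python
class FakeDB:
    """Hypothetical stand-in for the real store; counts lookups served."""

    def __init__(self, rows):
        self.rows = rows  # email -> dict of fields
        self.lookups = 0

    def fetch(self, email, field):
        self.lookups += 1
        return self.rows[email][field]


class MemberObject:
    """Member whose properties become store lookups on access."""

    def __init__(self, db, email):
        self._db = db
        self.email = email  # the key itself needs no lookup

    @property
    def full_name(self):
        return self._db.fetch(self.email, "full_name")

    @property
    def bounce_count(self):
        return self._db.fetch(self.email, "bounce_count")


db = FakeDB({"anne@example.com": {"full_name": "Anne Person",
                                  "bounce_count": 0}})
member = MemberObject(db, "anne@example.com")
```

Creating the object costs no lookup; each property read costs one,
which is the case the next paragraph addresses.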

For performance-critical cases there would be a separate .query()
method on MemberObjects that would look up a vector of attributes in
one DB query.  Also a .select() method on the MailmanDB object which
would return a list of MemberObjects with specified properties,
optionally as a (MemberObject, *values_of_requested_properties) tuple
or dict.
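
One possible shape for those two methods, again only as a sketch.  The
signatures, the "where" predicate, and the dict-backed MailmanDB are
assumptions made for illustration; a real implementation would push
the selection down into the underlying database.

```python
class MailmanDB:
    """Hypothetical in-memory store; counts the queries it serves."""

    def __init__(self, rows):
        self._rows = rows  # email -> dict of properties
        self.queries = 0

    def fetch_many(self, email, fields):
        # One round trip for a whole vector of attributes.
        self.queries += 1
        row = self._rows[email]
        return tuple(row[f] for f in fields)

    def select(self, *fields, where=None):
        """Return (MemberObject, *values) tuples for matching members."""
        self.queries += 1
        result = []
        for email, row in sorted(self._rows.items()):
            if where is None or where(row):
                values = (row[f] for f in fields)
                result.append((MemberObject(self, email), *values))
        return result


class MemberObject:
    def __init__(self, db, email):
        self._db = db
        self.email = email

    def query(self, *fields):
        """Look up several attributes in a single store query."""
        return self._db.fetch_many(self.email, fields)


db = MailmanDB({
    "anne@example.com": {"full_name": "Anne Person", "bounce_count": 0},
    "bart@example.com": {"full_name": "Bart Person", "bounce_count": 3},
})
```

For example, db.select("full_name", where=lambda r: r["bounce_count"] > 0)
yields one tuple per bouncing member, each pairing the MemberObject
with the requested value.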

 > Further, "each non-core module will do it differently and
 > incompatibly" is a red herring. There MUST be a SPECIFICATION of
 > the interface and EVERY implementation MUST meet those
 > REQUIREMENTS. Whatever else it does will not affect any other part
 > of the system.

Have you ever told a baby to stop sucking their thumb, and use the
pacifier?  You have to pull the thumb out to get the point across.  In
the same way, there's going to have to be one implementation, and that
implementation will be distributed with the core.  Otherwise there
WILL be a SPECIFICATION of the interface and EVERY implementation WILL
meet those REQUIREMENTS (except where the implementer finds it
inconvenient), and we're back where we started.

 > That is because you have not followed the principles and allow
 > "someone else" to provide that service.

True.  (I wish you'd stop using "you" in this kind of statement; it
isn't true, I didn't code any of this.  And it doesn't matter who
did.)  Announcing principles isn't going to help enough, though.
Python operates on the basis of "consenting adults" and can't force
anybody to write a program in a particular way.  Unless the API is
actually provided in *every* Mailman 3 distribution, and is well-
enough designed to be TOOWTDI, implementers will work around it.

 > >> I view your argument as the message handler claiming "I'm special!
 > > 
 > > It is.  First, it is mission-critical; nothing else is.
 > 
 > And the underlying RDBMS, the MTA, etc. are not?

This confounds levels of architecture.

 > This is my objection. IF some particular data is exposed, then it
 > should be maintained by one handler, without back doors.  If that
 > handler is local, then the interface need not serialize the data
 > and transmit it, but the access should be isomorphic to doing so.

That's not an objection, that's a somewhat more precise restatement of
what I wrote.

 > Credentials should be kept in a separate box. And that box should
 > be kept wherever it best fits in the overall data flow.

Precisely.  Since databases will be needed by all components when
present, they should be kept in or with a component that will always
be present.  That's what "core" means.

 > From a design perspective, it should be easy to place it anywhere
 > the installer desires.

No.  That exposes an implementation detail.  As far as installers are
concerned, the database *is* the API.  Where it is located is none of
their business.

There will need to be a little leakage here, because admins will want
to link the Mailman DB to existing organizational DBs.  So the
possibility of specifying an existing external database needs to be
considered.  But this is only slightly more than the amount of
information required to configure Mailman's own PostgreSQL or MySQL
database, and these are not going to be "placed" by a Mailman admin,
but rather configured and accessed from a provided installation
(whether by the user organization, or by an OS distro).  So I don't
see a need to make a big distinction here, except that the "own"
database will have a schema designed for Mailman, but an external
database will need some kind of "adapter" to match schema.
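
As a sketch of what such an adapter could amount to in the simplest
case, assuming the external source hands back dict-like rows and that
the mismatch is only in field names.  SchemaAdapter and the field names
are hypothetical; "cn" and "mail" are used here merely as typical
directory attribute names.

```python
class SchemaAdapter:
    """Present rows from an external source under Mailman-style names."""

    def __init__(self, rows, field_map):
        self._rows = rows
        self._field_map = field_map  # our field name -> their field name

    def members(self):
        for row in self._rows:
            # Keep only the mapped fields, renamed to our vocabulary.
            yield {ours: row[theirs]
                   for ours, theirs in self._field_map.items()}


# Rows as an organizational directory might return them (illustrative).
external_rows = [
    {"cn": "Anne Person", "mail": "anne@example.com", "dept": "QA"},
]
adapter = SchemaAdapter(external_rows,
                        {"full_name": "cn", "email": "mail"})
```

The admin's configuration would then consist of little more than the
connection details plus the field map.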

 > For distribution, a reference implementation of EVERY interface
 > should be included.

I don't see how that's possible in your design, since you propose to
allow components to implement their own databases.

 > And substituting a different implementation should be as simple as
 > excluding the distribution version and dropping in the alternate.

Sure, but is there a reason why this might be difficult?  ISTM that
Python's orientation to duck-typing will make this happen naturally.
(I don't mean to ignore the possibility of problems, but if you have
something specific in mind we can be careful to avoid that in the
design process.)