[Mailman-Developers] Login / User Identification Issues in MM3

Thu Jul 12 04:05:43 CEST 2012

Thanks for starting this discussion.  Since the thread's already long, I'm
just going to answer randomly with my own thoughts.

One thing I have a real problem with is defining the database query layer as
the interface between components.  To me that just unacceptably ties us to a
specific database, and/or a specific protocol.  For example, I do not want to
*require* Postgres in order to run Mailman, or to integrate *a* web ui with
the core.  I just think that as convenient as that might seem today, it will
lock us into a system design we're going to regret somewhere down the road.

So let's say for the moment that we agree that all the user data should live
in one place.  I don't have a problem with that conceptually, and I actually
don't care whether that's part of the core or in a separate component.  The
other problem I have is extending the core's data model to include things it
doesn't care about.  When you realize all of that has to be documented and
tested, that just seems like it's adding lots of extra baggage to the core.

For example, today you might want Twitter and Facebook ids in that database.
Five years ago maybe you also wanted an AIM id in there.  Do you today?  Will
you still want Google+ ids in there, or BrowserIDs, or OpenIDs five years from
now?  Yet, if it's part of the core's data model, we have to support it, test
it, document it, go through deprecation cycles, etc. etc.

One of the important design decisions I made was using Zope interfaces to
formally define the touch points between the different components of the
system.  This isn't just for the fun of it; instead, it gives us great
implementation flexibility.

For example, if you need to know what email addresses a user has registered,
you access that through the IUser interface.  Rosters are another great
example of where you access things through the IRoster interface and nothing
else.  Nothing except the implementation of that even cares that they are
implemented as queries and don't exist as *real* objects in the system.  They
can return whatever they want, as long as they conform to the IUser or
whatever interface.
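
To make that concrete, here is a minimal sketch of two unrelated
implementations sitting behind one interface.  It uses typing.Protocol from
the standard library as a stand-in for a Zope interface, and the single
addresses() method is an assumption for illustration, not the real IUser
definition.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IUser(Protocol):
    """Stand-in for the core's IUser interface (method name is assumed)."""

    def addresses(self) -> list:
        ...


class StoredUser:
    """A 'real' object that holds its addresses in memory."""

    def __init__(self, addresses):
        self._addresses = list(addresses)

    def addresses(self):
        return list(self._addresses)


class QueryUser:
    """Computes its addresses on demand from (user_id, email) rows."""

    def __init__(self, user_id, rows):
        self._user_id = user_id
        self._rows = rows

    def addresses(self):
        return [email for (uid, email) in self._rows if uid == self._user_id]


def registered_addresses(user: IUser) -> list:
    # The caller relies only on the interface, never on the concrete class.
    return sorted(user.addresses())
```

registered_addresses() behaves the same for either class, which is the
flexibility being argued for here.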

This all might lead to inefficiencies, but I don't think that matters right
now.  It probably will some day, but let's worry about that if and when we
need to.  What we care about now is the *flexibility* and the *stability* of
the system.

For the sake of argument, let's say that all the user information should be
stored in Postorius.  What kind of changes would be needed in the core to keep
its view of the user world in sync with Postorius's view of the world?  No
matter how you slice it, you are going to have two separate processes that
need to be kept in sync.

You actually could, as I think Richard advocates, just expose the SQL queries
to both processes.  You would in theory have to only re-implement a handful of
interfaces to keep the rest of the system humming.  IOW, when the IUserManager
needs to look up a user by their email address, instead of running a query
against the local SQLite database, you would run it against the Postorius
database.  But - and here's the key thing - you would *still* return some
object that implements the IUser interface.  If you do that, you've localized
the changes you have to make to the core and everything else Just Works
(again, in theory ;).
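
A rough sketch of that idea, with loud caveats: the table layout, the
ExternalUserManager class, and the get_user() method are all hypothetical,
and an in-memory SQLite database stands in for whatever database Postorius
would really own.  The only point is that the lookup runs against the
external store yet still hands back an object with the shape callers expect.

```python
import sqlite3


class User:
    """Minimal object satisfying an assumed IUser-like contract."""

    def __init__(self, user_id, display_name, addresses):
        self.user_id = user_id
        self.display_name = display_name
        self.addresses = addresses


class ExternalUserManager:
    """Hypothetical user manager backed by an external database."""

    def __init__(self, connection):
        self._conn = connection

    def get_user(self, email):
        # Find the owning user by one of their addresses.
        row = self._conn.execute(
            "SELECT u.id, u.display_name FROM users u "
            "JOIN addresses a ON a.user_id = u.id WHERE a.email = ?",
            (email,),
        ).fetchone()
        if row is None:
            return None
        user_id, display_name = row
        emails = [
            r[0]
            for r in self._conn.execute(
                "SELECT email FROM addresses WHERE user_id = ? ORDER BY email",
                (user_id,),
            )
        ]
        # Callers only ever see this object, not the query behind it.
        return User(user_id, display_name, emails)


def make_demo_db():
    """Build a throwaway database with an assumed, illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT);
        CREATE TABLE addresses (email TEXT PRIMARY KEY, user_id INTEGER);
        INSERT INTO users VALUES (1, 'Anne Person');
        INSERT INTO addresses VALUES ('anne@example.com', 1);
        INSERT INTO addresses VALUES ('aperson@example.org', 1);
        """
    )
    return conn
```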

One of the things I've tried to do, with unknown success because nobody's
tried it, is to keep in mind three broad slices of data: the user data, the
list data, and the message data.  So for example, an IMember associates an
IAddress with an IMailingList.  The standard implementation of that doesn't
use a foreign key between the `member` table and the `mailinglist` table.
Instead it uses the fqdn_listname, i.e. a string.  What that *should* mean is
that you could move the user data anywhere you want and not have to also move
the list data and message store data.
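
As an illustration only (the column names and the lists_for() helper are
assumptions, not the actual schema), here is how a string key lets the member
rows and the list rows live in two completely separate databases:

```python
import sqlite3

# Two independent stores: one for user data, one for list data.
user_db = sqlite3.connect(":memory:")
list_db = sqlite3.connect(":memory:")

# The member row names its list by string; there is no foreign key.
user_db.execute("CREATE TABLE member (address TEXT, fqdn_listname TEXT)")
list_db.execute(
    "CREATE TABLE mailinglist (fqdn_listname TEXT PRIMARY KEY, description TEXT)"
)

user_db.executemany(
    "INSERT INTO member VALUES (?, ?)",
    [
        ("anne@example.com", "users@example.com"),
        ("anne@example.com", "dev@example.com"),
    ],
)
list_db.executemany(
    "INSERT INTO mailinglist VALUES (?, ?)",
    [("dev@example.com", "Developers"), ("users@example.com", "Users")],
)


def lists_for(address):
    """Resolve a member's lists by name across the two stores."""
    names = [
        r[0]
        for r in user_db.execute(
            "SELECT fqdn_listname FROM member WHERE address = ? "
            "ORDER BY fqdn_listname",
            (address,),
        )
    ]
    result = []
    for name in names:
        row = list_db.execute(
            "SELECT description FROM mailinglist WHERE fqdn_listname = ?",
            (name,),
        ).fetchone()
        if row is not None:
            result.append((name, row[0]))
    return result
```

The cost is that nothing enforces referential integrity for you, so a
dangling list name has to be handled in code, as lists_for() does by skipping
it.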

There *should* be enough hooks in the system already for a system
administrator to say "I want to use Postorius, so I must enable the Postorius
IUserManager implementation".  For global objects like this, we use the Zope
Component Architecture (ZCA), so in a Postorius-owns-the-world scenario, what
has to happen is that

    usermanager = getUtility(IUserManager)

must return the PostoriusUserManager instance and not the SQLite based
UserManager instance.  Once you've done that, you have to change *nothing*
else in the system because everything talks to that object through the
interface, and as long as that keeps its contract, the rest of the system
should, again, Just Work.
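
Here is a toy version of that lookup, written with a plain dict so it runs
without any dependencies.  provide_utility() and get_utility() are stand-ins
that only mimic the spirit of zope.component's provideUtility() and
getUtility(); the two manager classes and their backend attribute are
hypothetical.

```python
_utilities = {}


def provide_utility(component, interface):
    """Register the single global component for an interface."""
    _utilities[interface] = component


def get_utility(interface):
    """Return whatever component is currently registered."""
    try:
        return _utilities[interface]
    except KeyError:
        raise LookupError(
            "no utility registered for " + interface.__name__
        ) from None


class IUserManager:
    """Marker standing in for the real interface."""


class SQLiteUserManager:
    backend = "sqlite"


class PostoriusUserManager:
    backend = "postorius"


def describe_backend():
    # Client code asks for the interface, never for a concrete class.
    return get_utility(IUserManager).backend
```

Swapping the registration is the only change; describe_backend() itself is
untouched:

    provide_utility(PostoriusUserManager(), IUserManager)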

I have no idea whether the above will be easy or not, since nobody's tried
it.  But the system design should allow you to do it this way, and I would be
very open to the right hooks, fixes, and extensions to make this possible.
I hope you can see how this approach lets someone run Mailman in many
different configurations, from a core-only system, to Postorius, to a system
where the entire user database lives in ZODB or is already locked up in a
proprietary closed database.

There is another approach of course, which may end up being simpler, if more
brittle.  You could just try to keep the two databases in sync.  It doesn't
matter too much which is the master, you just have to decide.  This is
essentially how Launchpad's integration with Mailman 2 works.  Launchpad is
the master database and whenever something in that database changes that could
affect Mailman, that information is communicated to the Mailman server.  The
details are mostly unimportant, and yes, it does work.  It's been brittle in
the past, but now with enough logging, monitoring, and fail-safes it works
great.

How would you keep these two in sync?  First, if something changes in the
core, the idea is that an event is triggered.  Other components of the system
watch for those events and react to the ones they care about.  For example,
let's say a user changes their password via email command.  Once the core acts
on that change, it will trigger a PasswordChangeEvent which has a reference to
the user effecting that change.  If Postorius were the master database for
passwords, we'd have to add a little event subscriber which, when it got a
PasswordChangeEvent, then talked to Postorius to make that change.  Or maybe
it updated the shared user data component, or made the appropriate SQL UPDATE
call.  The key thing, again, is that the core just fires the
PasswordChangeEvent, and other things react to it.
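
A bare-bones sketch of that flow follows.  The subscribers list and notify()
function are simplified stand-ins for an event system, and everything apart
from the PasswordChangeEvent name (the User class, change_password(), the
external_passwords dict playing the role of the master database) is made up
for illustration.

```python
subscribers = []


def notify(event):
    """Deliver an event to every registered subscriber."""
    for subscriber in list(subscribers):
        subscriber(event)


class PasswordChangeEvent:
    """Carries a reference to the user whose password changed."""

    def __init__(self, user):
        self.user = user


class User:
    def __init__(self, email, password):
        self.email = email
        self.password = password


# Stand-in for the external master database; a real subscriber would
# make a network or SQL call here, and would not handle raw passwords.
external_passwords = {}


def push_password_to_master(event):
    # Subscribers see every event and ignore the ones they don't care about.
    if isinstance(event, PasswordChangeEvent):
        external_passwords[event.user.email] = event.user.password


subscribers.append(push_password_to_master)


def change_password(user, new_password):
    """What the core does: act on the change, then fire the event."""
    user.password = new_password
    notify(PasswordChangeEvent(user))
```

Note that change_password() knows nothing about the master database; adding
or removing the subscriber is the whole integration.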

Alternatively, let's say a user changes their password through the web ui.  I
think this case is already covered, because the way to keep that in sync with
the core is to make the appropriate REST call, probably PATCHing the user's
password.
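
For illustration, such a request could be built like this with nothing but
the standard library.  The base URL, the /users/<email> path, and the
"password" field name are assumptions here, so check them against the actual
REST API before relying on them.  The function only builds the request; it
does not send it.

```python
import urllib.parse
import urllib.request


def build_password_patch(base_url, user_email, new_password):
    """Build (but do not send) a PATCH request for a user's password."""
    # Quote the address so '@' is safe inside the URL path.
    quoted = urllib.parse.quote(user_email, safe="")
    url = base_url + "/users/" + quoted
    body = urllib.parse.urlencode({"password": new_password}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending it would be a matter of passing the result to
urllib.request.urlopen(), along with whatever authentication the server
expects.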

Very likely we don't have enough events defined to cover all the actions that
the core must take (e.g. through email commands).  But events are easy to add
and again, I'm not opposed to adding any events which make the integration
easier.

It's also likely that the REST API doesn't yet cover every bit of information
Postorius wants to get into or out of the core.  Again, it's easy to extend
the REST API, so let's fill in what's missing.

I hope this lays out the basic design constraints that I want to follow.
Maybe it sparks some thoughts about different possibilities.

Cheers,
-Barry