[Mailman-Developers] Update 2.0 - Creating threads to handle dynamic list

Thu Jun 4 19:28:44 CEST 2015

Pranjal Yadav writes:

 > Blog: prany.github.io
 > code: giitlab.com/godricglow/mailman

Sorry about the long brain dump.  Pick some of the issues to address
now and save the rest for later.

I took a look at your blog.  First I reviewed your proposal blog from
April 29 (http://prany.github.io/gsoc15.html), and had the following
thoughts.

1.  I don't understand why the "X-Original-To" header field is checked
    after "To".  I imagine there is a reason, but as this header is
    non-standard, something needs to be said about its presumed
    semantics and when you would expect it to appear, as well as what
    MTAs implement it (I don't think Mailman itself does, at least not
    in our upstream version).

2.  I also wonder about the deviation from normal Mailman semantics
    which I believe will pick up the address from the envelope sender.
    (Of course there's an option to refuse posts where the message
    headers do not contain the list as an addressee, but that option
    isn't always selected.)

    For Systers, I'm sure that these semantics are fine, but in
    general I would think many list admins would expect dlist
    semantics to be as close as possible to ordinary semantics.

3.  I'm not sure about the wisdom of having a separate named pipeline
    for dlists.  I agree with you that the actual pipeline should
    include different handlers rather than trying to incorporate the
    dlist logic in existing handlers.  (Of course somebody may want to
    refactor to exploit commonalities in the code, but IMO your best
    strategy is to copy the whole calculate-recipients and decorate
    Handlers and modify them as convenient.)  However, Mailman already
    allows per-list (2.x allowed per-message! dunno if 3 does)
    pipelines.

    So I'm thinking that maybe we should put the standard pipelines in
    variables (a sort of pipeline registry), and *always* attach a
    pipeline attribute to each list.  To prevent a list from changing
    the global objects for everybody, we could make pipelines tuples
    rather than lists.  (We would need to run that by Barry first,
    though.)

4.  Since Thread.base_message_id is supposed to be a UUID, that (or a
    hash) would be a reasonable primary key for the Thread.  This
    would lead to an interesting extension that any message could
    become a new dlist.  Don't know if that would be useful, but it
    might be (often threads split).

Now, about the progress report blog of May 31
(http://prany.github.io/threads-for-dynamic-sublists.html).  First a
few stylistic comments.

5.  You start by saying that messages "are not kept in a structure."
    It's not obvious what you mean by that.  Obviously there are
    message and metadata classes defined in Mailman.  I guess you mean
    that you want a history of the messages which might simply be a
    list of message-ids, or their hashes.  However, your Thread class
    in the proposal doesn't seem to keep any history.

6.  I disagree with you about the use of the word "conversation."

    A *conversation* is an ambiguously defined collection of messages
    that have various commonalities -- a time span, a topic, and a
    group of participants, at least.

    A *thread* is a collection of messages that can be arranged in a
    DAG (or even a tree) by using the reply-to relationship (which may
    be implemented using the In-Reply-To and/or References fields, but
    I'm referring to the human concept of "reply" here).

    A *dlist* is a collection of messages defined by having a
    particular mailbox among their addressees.

    As I see it, your goal is to *implement* conversations (or maybe a
    better word is "facilitate") using d-lists.  I have no objection
    at all to using conversation in that sense.  However, it's
    important to recognize that what a human thinks of as a
    conversation is quite different from what a machine can implement
    as a thread or a dlist.

    Also, while you have the right to use whatever definitions you
    want, I think most people who are well-versed in email semantics
    and implementations will agree with the definition of thread I
    gave, and also agree that dlists are something quite different.
    That means you will have a difficult time communicating if you use
    the words "conversation", "thread", and "dlist" as rough synonyms.

7.  I don't understand what you mean by "to make threads which are
    meaningful rather than grouping them visually."  I sort of
    understand what you mean by "visual grouping", but in fact all
    MUAs I know of also implement navigation based on the thread
    structure (normally with a depth-first traversal, and the visual
    display also corresponds to that order).  On the other hand,
    "meaningful threads" has no particular meaning to me.

8.  Under "Requirement" you write

    [W]e aim to use this concept [of threading] only for those mailing
    lists which are initially 'dlist_enabled' or simply saying dynamic
    list enabled via mailing list object's attribute.

    but as described in 6, threads are quite a different concept from
    dlists.[1]

9.  Under "Code" you give a simple list of modules to modify, but I
    think you should describe in more detail what you propose to do.

10. The link to your repo is broken!  (It's prefixed by your blog for
    some reason.)

Some comments about the code (I'm looking at "git diff master dlist").

11. Throughout, I think you should use "dlist" rather than "thread"
    for naming classes and other identifiers.

11. In dlist.py, your __all__ variable is wrong, I believe (the dlist
    module has no dlist attribute).

12. class DlistCommands probably should derive from the main Mailman
    commands class.

13. DlistCommands.__init__() should have an else: statement that gives
    a useful error message.  Ditto the "ignored" part of __iter__().

14. In threads.py, the stub methods in class IThread should raise
    NotImplemented or something like that.

15. In message.py, the decision to add msgdata as an attribute to the
    Message model really needs justification, since the original
    design carefully separates the two.

16. Isn't the assignment to thread_id dead?  Also, use .get() here,
    not try ... except.  And *never* use a bare except, not even in
    very early prototype code.  Defining self-documenting Exceptions
    is trivial.  The problem is that msgdata[] could invoke arbitrary
    code which could raise things you can't imagine.

17. In lmtp.py you're going to need to deal with dlist names, as well
    as +new and friends.

Overall I think it's a reasonable first cut, but some things seem to
be a little careless (like the __all__ lists).

Footnotes: 
[1] I actually don't think implementing threads as such would be very
difficult.  Simply keep a history of Message-IDs (or their hashes) and
allow users to "subscribe" to the replies to arbitrary messages (and
to replies to the replies, of course), alternatively unsubscribe (what
in a traditional MUA would be "killing" the thread).  The hard part
would be the UI, since most users would not be capable of, let alone
happy with, specifying message IDs to commands.  I do have some ideas
about that, though.