[Mailman-Developers] Update 2.0 - Creating threads to handle dynamic list

Tue Jun 9 14:58:14 CEST 2015

Thanks for the detailed review. Replies inline!

Stephen J. Turnbull wrote:
> Pranjal Yadav writes:
>
>   > Blog: prany.github.io
>   > code: giitlab.com/godricglow/mailman
>
> Sorry about the long brain dump.  Pick some of the issues to address
> now and save the rest for later.
>
> I took a look at your blog.  First I reviewed your proposal blog from
> April 29 (http://prany.github.io/gsoc15.html), and had the following
> thoughts.
>
> 1.  I don't understand why the "X-Original-To" header field is checked
>      after "To".  I imagine there is a reason, but as this header is
>      non-standard, something needs to be said about its presumed
>      semantics and when you would expect it to appear, as well as what
>      MTAs implement it (I don't think Mailman itself does, at least not
>      in our upstream version).
I agree with you, I was checking for "X-Original-To:" due to some 
confusion, I
read /mailman/config/schema and misunderstood the last part where it says
subsequent methods will be re-written with second headers and adding to
the confusion was systers implementation which too checked for both. will
check for "To:" in the header.
> 3.  I'm not sure about the wisdom of having a separate named pipeline
>      for dlists.  I agree with you that the actual pipeline should
>      include different handlers rather than trying to incorporate the
>      dlist logic in existing handlers.  (Of course somebody may want to
>      refactor to exploit commonalities in the code, but IMO your best
>      strategy is to copy the whole calculate-recipients and decorate
>      Handlers and modify them as convenient.)  However, Mailman already
>      allows per-list (2.x allowed per-message! dunno if 3 does)
>      pipelines.
>
>      So I'm thinking that maybe we should put the standard pipelines in
>      variables (a sort of pipeline registry), and *always* attach a
>      pipeline attribute to each list.  To prevent a list from changing
>      the global objects for everybody, we could make pipelines tuples
>      rather than lists.  (We would need to run that by Barry first,
>      though.)
I'm not sure how this would work so I will need a bit more explanation for
this comment. AFAIK all mailing lists are attached with a pipeline from
Base pipeline, Virgin-pipeline, Owner-pipeline or Posting-pipeline, mostly
all messages attach to the 'built-in' or simply default-posting-pipeline
initially. So how exactly should I create those registries from pipelines?
Adding another attribute to the mailing lists will be easy but replacing
lists as pipelines tuples is a pretty complex thought. We'll run it with 
Barry
as you said and then I will post the necessary changes.
>
> 4.  Since Thread.base_message_id is supposed to be a UUID, that (or a
>      hash) would be a reasonable primary key for the Thread.  This
>      would lead to an interesting extension that any message could
>      become a new dlist.  Don't know if that would be useful, but it
>      might be (often threads split).
>
> Now, about the progress report blog of May 31
> (http://prany.github.io/threads-for-dynamic-sublists.html).  First a
> few stylistic comments.
>
> 5.  You start by saying that messages "are not kept in a structure."
>      It's not obvious what you mean by that.  Obviously there are
>      message and metadata classes defined in Mailman.  I guess you mean
>      that you want a history of the messages which might simply be a
>      list of message-ids, or their hashes.  However, your Thread class
>      in the proposal doesn't seem to keep any history.
By "structure" I simple meant any form of order/arrangement that can
be achieved, using history is one good way of doing so. However I simply
planned to make threads to do so.
>
> 6.  I disagree with you about the use of the word "conversation."
>
>      A *conversation* is an ambiguously defined collection of messages
>      that have various commonalities -- a time span, a topic, and a
>      group of participants, at least.
>
>      A *thread* is a collection of messages that can be arranged in a
>      DAG (or even a tree) by using the reply-to relationship (which may
>      be implemented using the In-Reply-To and/or References fields, but
>      I'm referring to the human concept of "reply" here).
>
>      A *dlist* is a collection of messages defined by having a
>      particular mailbox among their addressees.
>
>      As I see it, your goal is to *implement* conversations (or maybe a
>      better word is "facilitate") using d-lists.  I have no objection
>      at all to using conversation in that sense.  However, it's
>      important to recognize that what a human thinks of as a
>      conversation is quite different from what a machine can implement
>      as a thread or a dlist.
>
>      Also, while you have the right to use whatever definitions you
>      want, I think most people who are well-versed in email semantics
>      and implementations will agree with the definition of thread I
>      gave, and also agree that dlists are something quite different.
>      That means you will have a difficult time communicating if you use
>      the words "conversation", "thread", and "dlist" as rough synonyms.
I strongly agree with the definitions and I accept misusing the
term "conversation" which I wrote to convey an idea as to how
threads are understood in day to day life. I will make sure I don't
confuse with all these terms again.
>
> 7.  I don't understand what you mean by "to make threads which are
>      meaningful rather than grouping them visually."  I sort of
>      understand what you mean by "visual grouping", but in fact all
>      MUAs I know of also implement navigation based on the thread
>      structure (normally with a depth-first traversal, and the visual
>      display also corresponds to that order).  On the other hand,
>      "meaningful threads" has no particular meaning to me.
>
> 8.  Under "Requirement" you write
>
>      [W]e aim to use this concept [of threading] only for those mailing
>      lists which are initially 'dlist_enabled' or simply saying dynamic
>      list enabled via mailing list object's attribute.
>
>      but as described in 6, threads are quite a different concept from
>      dlists.[1]
Thanks for pointing this out, I will use above definitions to answer this.
*Dlist* is a collection of messages defined by having a particular mailbox
among their addressees, so when I say a list should be dlist enabled, its
merely a check that the list is good to be a Dlist, now the collection 
of new
messages inside this newly formed Dlist will be arranged in an order using
Reply-To relationship and thus forming a *Thread*.
>
> 9.  Under "Code" you give a simple list of modules to modify, but I
>      think you should describe in more detail what you propose to do.
I will include relevant part of the code with details.
> 10. The link to your repo is broken!  (It's prefixed by your blog for
>      some reason.)
>
> Some comments about the code (I'm looking at "git diff master dlist").
>
> 11. Throughout, I think you should use "dlist" rather than "thread"
>      for naming classes and other identifiers.
Agreed and changed!
>
> 11. In dlist.py, your __all__ variable is wrong, I believe (the dlist
>      module has no dlist attribute).
Thanks again for pointing this out , I will not repeat this.
>
> 12. class DlistCommands probably should derive from the main Mailman
>      commands class.
>
> 13. DlistCommands.__init__() should have an else: statement that gives
>      a useful error message.  Ditto the "ignored" part of __iter__().
I haven't yet started working on "Errors" and that is on my list for 
this week
so I think this won't be an issue next time.
>
> 14. In threads.py, the stub methods in class IThread should raise
>      NotImplemented or something like that.
Same as above.
>
> 15. In message.py, the decision to add msgdata as an attribute to the
>      Message model really needs justification, since the original
>      design carefully separates the two.
Looking into this, I had no clue about the original design and I should
be more careful with what I choose to use, lesson learnt.
> 16. Isn't the assignment to thread_id dead?  Also, use .get() here,
>      not try ... except.  And *never* use a bare except, not even in
>      very early prototype code.  Defining self-documenting Exceptions
>      is trivial.  The problem is that msgdata[] could invoke arbitrary
>      code which could raise things you can't imagine.
Sure, I will use .get() here, I used a bare except as this was an early
prototype which you correctly pointed out.
>
> 17. In lmtp.py you're going to need to deal with dlist names, as well
>      as +new and friends.
>
> Overall I think it's a reasonable first cut, but some things seem to
> be a little careless (like the __all__ lists).
>
>
> Footnotes:
> [1] I actually don't think implementing threads as such would be very
> difficult.  Simply keep a history of Message-IDs (or their hashes) and
> allow users to "subscribe" to the replies to arbitrary messages (and
> to replies to the replies, of course), alternatively unsubscribe (what
> in a traditional MUA would be "killing" the thread).  The hard part
> would be the UI, since most users would not be capable of, let alone
> happy with, specifying message IDs to commands.  I do have some ideas
> about that, though.
>