[Email-SIG] API for email threading library?

Fri Jan 6 04:22:10 CET 2012

David, thanks for the follow-up.

R. David Murray <rdmurray at bitdance.com> wrote:

> On Thu, 05 Jan 2012 20:21:08 -0500, Barry Warsaw <barry at python.org> wrote:
> > On Jan 05, 2012, at 09:55 AM, Bill Janssen wrote:
> > 
> > >Folks, I'm working on an implementation of RFC 5256 email threading,
> > >designed so that it could fit as a submodule in the "email" package, if
> > >such a thing was ever seen to be useful.
> > 
> > I really like the idea of threading support being included in the email
> > package.  (I admit that I don't have time right now to read the RFC.)  My
> > general thoughts are that the actual messages needn't be included in the
> > thread collection, but perhaps just Message-IDs.  That would allow an
> > application to store the actual message objects anywhere they want, and would
> > reduce space requirements of the thread collection.
> 
> I don't have time to read the RFC either :(.  But from a skim of the
> first bits, my immediate reaction is that the best thing to do is to break
> everything down into as many discrete components as practical (pluggable
> thread storage, thread construction (which presumably takes duck typed
> Message objects containing at least the relevant headers) with different
> subclasses or plugins for the different sorting algorithms, thread query,
> etc) and keep them as decoupled as possible.  That would give a server
> implementer the greatest flexibility.

That sounds good to me, too.  Let me think about pluggable thread
persistence a bit more -- a pluggable design might work better there
than subtypes, which is the path I've been going down.  The key question
is what we would want to be able to do with a re-vivified thread store.
If we want to be able to add new messages to it, we need to have access
to the "five headers" of each of the messages, either by saving them or
by having access to the message store.  If not, we can just save the
message-IDs.  (It would be nice if we could use fixed-size hashes of the
message IDs instead of strings, but that would require a message store
which understood that concept.)
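
As a rough illustration of the fixed-size-hash idea, a Message-ID could
be reduced to a short digest that both stores use as a key.  This is only
a sketch: the helper name message_id_key and the 16-byte digest size are
hypothetical choices for the example, not part of RFC 5256 or the email
package.

```python
import hashlib


def message_id_key(message_id, digest_size=16):
    """Return a fixed-size bytes key for a Message-ID (hypothetical helper).

    Surrounding whitespace and angle brackets are dropped first, so
    "<abc@example.com>" and "abc@example.com" map to the same key.
    """
    normalized = message_id.strip().strip("<>")
    return hashlib.blake2b(
        normalized.encode("utf-8"), digest_size=digest_size
    ).digest()


# A message store that "understood that concept" would index by the key.
messages_by_key = {message_id_key("<abc@example.com>"): "the message object"}
```

The catch is the one noted above: the digest cannot be turned back into
the original Message-ID, so the message store has to support lookup by
the same key.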

On the other hand, if we're adding a message, presumably we also have
access to the message store, and could retrieve the "five headers"
therefrom given the message-id -- though that might be an expensive
operation for large message stores.

Interesting set of metadata requirements on the pluggable design, both
for the thread store and the message store.
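
To make the trade-off concrete, here is a minimal sketch of a thread
store that persists only Message-IDs and asks a separate message store
for the headers when it needs them.  Everything in it is an assumption
made for illustration: the class names, the method names, and in
particular the choice of Message-ID, In-Reply-To, References, Subject and
Date as the "five headers", which this thread does not enumerate.

```python
from collections import namedtuple
from email import message_from_string

# Assumed set of "five headers"; not taken from the discussion itself.
ThreadHeaders = namedtuple(
    "ThreadHeaders", "message_id in_reply_to references subject date")


def headers_of(msg):
    """Extract the assumed headers from a duck-typed message object."""
    return ThreadHeaders(
        message_id=msg["Message-ID"],
        in_reply_to=msg["In-Reply-To"],
        references=(msg["References"] or "").split(),
        subject=msg["Subject"],
        date=msg["Date"],
    )


class DictMessageStore:
    """Hypothetical message store mapping Message-ID to message object."""

    def __init__(self):
        self._messages = {}

    def put(self, msg):
        self._messages[msg["Message-ID"]] = msg

    def headers(self, message_id):
        # For a large on-disk store this lookup could be the expensive part.
        return headers_of(self._messages[message_id])


class IdOnlyThreadStore:
    """Hypothetical thread store that saves nothing but Message-IDs."""

    def __init__(self, message_store):
        self._message_store = message_store
        self._ids = []

    def add(self, message_id):
        self._ids.append(message_id)

    def headers(self):
        # Headers are fetched on demand rather than persisted here.
        return [self._message_store.headers(mid) for mid in self._ids]


# Usage: two messages, the second replying to the first.
store = DictMessageStore()
store.put(message_from_string(
    "Message-ID: <a@example.com>\nSubject: hi\n\nbody\n"))
store.put(message_from_string(
    "Message-ID: <b@example.com>\nIn-Reply-To: <a@example.com>\n"
    "References: <a@example.com>\nSubject: Re: hi\n\nbody\n"))

threads = IdOnlyThreadStore(store)
threads.add("<a@example.com>")
threads.add("<b@example.com>")
reply = threads.headers()[1]
```

A store that saved the headers itself would have the same add/headers
surface but no dependency on the message store, at the cost of the extra
space mentioned earlier in the thread.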

> You'll probably want to noodle on the various APIs and make some
> concrete (but not fully fleshed out) proposals for discussion.  That's
> the procedure that seemed to work best when we were working on the
> email6 API.

Think of this as the noodling :-).

> PS: If you implement the 'base subject' algorithm I bet we can get
> agreement to check that right in to email.utils before 3.3 :)

I have working code for all of this; right now I'm expanding the test
suite and looking at performance and API optimizations, not to mention
PEP8-ification.
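
For anyone who hasn't read the RFC, the "base subject" step looks roughly
like the sketch below.  It is a deliberately simplified illustration, not
the working code mentioned above and not a complete implementation of
the RFC 5256 rules: it assumes the subject is already decoded to a str,
its regular expressions are looser than the RFC's grammar, and the name
base_subject is hypothetical.

```python
import re

# Simplified patterns; the RFC 5256 grammar is stricter than these.
_BLOB = r"\[[^\[\]]*\]\s*"                      # e.g. "[Email-SIG] "
_LEADER = re.compile(
    r"^\s*(?:%s)*(?:re|fwd?)\s*(?:%s)?:\s*" % (_BLOB, _BLOB), re.I)
_LEADING_BLOB = re.compile(r"^\s*%s" % _BLOB)
_TRAILER = re.compile(r"(?:\s|\(fwd\))+$", re.I)
_FWD_WRAPPER = re.compile(r"^\[fwd:(.*)\]$", re.I | re.S)


def base_subject(subject):
    """Return (base, is_reply_or_forward) for an already-decoded subject."""
    text = " ".join(subject.split())            # collapse whitespace runs
    is_reply = False
    changed = True
    while changed:
        changed = False
        # Strip trailing "(fwd)" markers and whitespace.
        trimmed = _TRAILER.sub("", text)
        if trimmed != text:
            if "(fwd)" in text[len(trimmed):].lower():
                is_reply = True
            text, changed = trimmed, True
        # Strip a leading "Re:", "Fw:" or "Fwd:", with any blobs around it.
        match = _LEADER.match(text)
        if match:
            text, is_reply, changed = text[match.end():], True, True
            continue
        # Strip a leading "[blob]" only if something would remain.
        match = _LEADING_BLOB.match(text)
        if match and text[match.end():].strip():
            text, changed = text[match.end():], True
            continue
        # Unwrap "[fwd: ...]" and process the contents again.
        match = _FWD_WRAPPER.match(text)
        if match:
            text, is_reply, changed = match.group(1).strip(), True, True
    return text, is_reply
```

With this sketch, the subject of this very thread, with or without a
leading "Re:", reduces to "API for email threading library?".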

Bill