[Mailman-Developers] Parsing and Rendering rfc2822

Barry Warsaw barry at python.org
Wed Jul 5 19:30:28 EDT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Jul 4, 2006, at 3:44 PM, emf wrote:

> Here's where I'm at, grouped functionally:
>
> * Need to convert rfc2822 to xml/html
>
> I haven't found anything substantial via searching. My next step is to
> go spelunking in MailManager code and other python-webmail packages. If
> anyone knows good trees in this forest, please clue me in.

You might also poke around www.divmod.org and the Chandler project.   
I believe both of them are in a similar space and there may be  
components that can be borrowed from those projects to do this kind  
of thing.

> * Want to provide feeds (rss/atom/YourMommasSyntaxFormat)
>
> Right this second I'm planning on using pyfeed [1]; is there anything
> else I should consider?
>
> [1] http://home.blarg.net/~steveha/pyfeed.html
>
> * mbox thread indexing on messages
>
> I plan on using [2] to generate mbox thread indexes for rapid navigation
> of lists. Any suggestions for more robust variants would be welcome;
> feedback on how to handle threading for message-id-less messages would
> also be welcome.
>
> [2] http://benno.id.au/code/archiver/jwzthreading.py

I haven't looked at either package, but JWZ does have the  
authoritative word on threading, AFAICT.

The question of Message-IDs is the interesting one because it's  
clearly the most natural unique identifier to use.  However, while  
the RFC says it /should/ be unique, there's actually no guarantee  
that it /is/ unique, or even present.  What should Mailman do if it  
sees a Message-ID collision?  What should it do if there is no  
Message-ID?
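
To make the two problem cases concrete, here is a minimal sketch in
Python that scans a batch of messages and reports how many have no
Message-ID and which Message-IDs occur more than once.  The function
name audit_message_ids is hypothetical and only for illustration; it
detects the cases, it does not decide what to do about them.

```python
from collections import Counter
from email import message_from_string


def audit_message_ids(messages):
    """Return (missing_count, sorted list of colliding Message-IDs).

    Hypothetical helper: `messages` is any iterable of
    email.message.Message objects.
    """
    counts = Counter()
    missing = 0
    for msg in messages:
        # A header that is absent or empty counts as "no Message-ID".
        message_id = str(msg.get("Message-ID", "")).strip()
        if message_id:
            counts[message_id] += 1
        else:
            missing += 1
    duplicates = sorted(mid for mid, n in counts.items() if n > 1)
    return missing, duplicates


if __name__ == "__main__":
    sample = [
        message_from_string("Message-ID: <a@example.com>\n\none\n"),
        message_from_string("Message-ID: <a@example.com>\n\ntwo\n"),
        message_from_string("Subject: no id here\n\nthree\n"),
    ]
    print(audit_message_ids(sample))
```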

It's been argued that Mailman should clobber any Message-ID it sees  
and just overwrite it with one it can guarantee is unique (for any  
specific installation, that is).  I don't like that solution because  
of the negative effects on threading when that message is received  
through Mailman by other systems or applications.  It's also been
suggested that we just don't worry about it, because the chances of
collisions in practice are very small.  Yeah, okay, but what if it
/does/ happen?  Seems like you'd still have to munge Message-ID in
that case.

It's also been suggested that Mailman assign its own unique
identifier whenever it forwards a message to a list membership.  I
like this the best, and it would have to be called something like
X-List-Message-ID or some such.  I've long favored a solution where the
X-List-Message-ID could be calculated from components of the message
that would have a high probability of being immutable, even if a copy
of the message was received out-of-band from Mailman.  IOW, if an
archiver received a message before Mailman could calculate the
X-List-Message-ID (or if it received it after some intermediate tool
stripped that header), it could perform the same calculation and
would end up with the same URL, probably using List-Archive in that
calculation.

I'm thinking something along the lines of sha1 hashing Message-ID and
perhaps Date.  RFC 2822 section 3.6 says that the only required headers
are the origination date (Date:) and originator address fields (From:
and possibly Sender: and Reply-To:).  Those seem like good candidates to
base a hash on, but fields like Subject and the body of the message  
are probably unusable.  Then again, we have to watch out for  
originator header munging. :/
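
As a rough sketch of that idea in Python, assuming the hash covers
Message-ID and Date and optionally the List-Archive value: the function
name list_message_id, the newline-joined input, and the hex digest
output are all illustrative assumptions, not a settled format.

```python
import hashlib
from email import message_from_string


def list_message_id(msg, list_archive=None):
    """Derive a candidate X-List-Message-ID value from stable headers.

    Hypothetical helper: hashes Message-ID and Date (and, if given,
    the List-Archive value) with SHA-1.  Subject and body are left
    out on purpose because lists commonly rewrite them.
    """
    parts = [
        str(msg.get("Message-ID", "")).strip(),
        str(msg.get("Date", "")).strip(),
    ]
    if list_archive:
        parts.append(list_archive.strip())
    # Join with a newline so adjacent fields cannot run together.
    data = "\n".join(parts).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    original = message_from_string(
        "Message-ID: <1@example.com>\n"
        "Date: Wed, 05 Jul 2006 19:30:28 -0400\n"
        "Subject: hello\n\nbody\n"
    )
    print(list_message_id(original))
```

Anyone holding a copy with the same Message-ID and Date would compute
the same value, whether or not the copy passed through Mailman.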

> * full-text indexing
>
> pylucene seems to be the obvious choice; anything else I should
> consider? Anyone know of good pylucene/web UI glue code out there?

Just keep in mind that of course, Mailman is GPL so anything we  
bundle has to be GPL-compatible.

- -Barry



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iQCVAwUBRKxLlXEjvBPtnXfVAQJsBwP/ZvTAHqZaBGA8MR0PH6Oq5c8QCqUBX2tr
JjRDT89wQYJqE+dY3tDMrPIytJBKiE50n9usfURzlikq517NG79hYfMMYfZM550K
Ua3a9oBrBTzLW+SpEUaM8KT+QakqkDNY3ro6e3KnqUhl3MwwRii9X178m7pYAHRH
Sc87Ps/1r8Y=
=OH6a
-----END PGP SIGNATURE-----
