[Mailman-Developers] Google Summer of Code: Integration of Search Code
Barry Warsaw
barry at list.org
Thu Mar 29 01:07:47 CEST 2012
On Mar 28, 2012, at 10:29 AM, Stephen J. Turnbull wrote:
>The only tricky issue is that we *do* have to worry about message-ID
>collisions of truly different messages and about messages without message
>IDs, especially for converted historical archives. So the API needs to be
>able to deal with these issues, probably by returning a set or sequence of
>messages.
Mailman 3 itself requires unique Message-IDs. IIRC, the Mail Archive guys
found a very very low collision rate over millions of messages, and I think
all such cases were basically spam. The LMTP runner doesn't yet reject
duplicates, but it should (LP: #967951).
>I would guess she'll probably store messages in YY-MM/MSGID, or as git does
>in "unpacked" XX/YYYYYYYY... format (where XX are the first two digits of the
>hash ID, and YY... are the remaining ones). But it could easily be backed by
>an IMAP store or something more specialized; we don't really care as long as
>it's object-ID-addressable.
Don't forget too that the LMTP runner automatically adds the X-Message-ID-Hash
header, which is a Base32 encoding of the SHA1 hash of the Message-ID contents
(without the angle brackets). This hash could be used as well.
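
For illustration, here is a minimal Python 3 sketch of that hash as described
above. It is not the LMTP runner's actual code: the function names are
hypothetical, and encoding the Message-ID as UTF-8 before hashing is an
assumption. The second helper shows how such a hash could be split into the
git-style XX/YYYY... layout mentioned in the quoted text.

```python
import hashlib
from base64 import b32encode


def message_id_hash(message_id):
    """Return the Base32-encoded SHA1 of a Message-ID, minus angle brackets.

    Hypothetical helper; the UTF-8 encoding step is an assumption.
    """
    msgid = message_id.strip()
    # Drop the surrounding angle brackets if both are present.
    if msgid.startswith('<') and msgid.endswith('>'):
        msgid = msgid[1:-1]
    digest = hashlib.sha1(msgid.encode('utf-8')).digest()
    # A SHA1 digest is 20 bytes, so the Base32 form is 32 characters
    # with no '=' padding.
    return b32encode(digest).decode('ascii')


def sharded_path(hash_id):
    """Split a hash into a git-style 'XX/YYYY...' relative path."""
    return '{}/{}'.format(hash_id[:2], hash_id[2:])


if __name__ == '__main__':
    h = message_id_hash('<example-id@example.com>')
    print(h)
    print(sharded_path(h))
```

Because the brackets are stripped first, '<id@host>' and 'id@host' hash to the
same value, so an archiver could look a message up from either form.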
-Barry