[Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input

Barry Warsaw barry at list.org
Tue Apr 24 01:53:22 CEST 2012


On Apr 20, 2012, at 01:19 PM, Jeff Breidenbach wrote:

>1) Terri is exactly right. The reason for including list identity as
>part of the hash calculation is for cross-posted messages. An
>archiving service shows context. Here's the message AND the thread it
>fits into, AND information about the list it travelled over AND the
>ability to search that list further. Archives need to know the list to
>provide context.

Agreed, but I think you'll get all that information anyway, without it being
expressed in the hash.  You'll get a full copy of the posted message, so
you'll get the Message-ID, To header (i.e. the posting address), List-Post (if
there is one), List-Id, etc.
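
As a rough illustration of that point, the list-identifying headers can be read straight out of the delivered message with Python's standard `email` package. The sample message and the `list_context` helper below are hypothetical and exist only for this sketch.

```python
from email import message_from_string

# Hypothetical sample message; every header value here is made up.
RAW = """\
From: jeff@example.com
To: mailman-developers@python.org
Message-ID: <4F90E3A1.1234@example.com>
List-Id: Mailman mailing list developers <mailman-developers.python.org>
List-Post: <mailto:mailman-developers@python.org>
Subject: Re: permalinks

Body text.
"""

def list_context(raw: str) -> dict:
    """Return the headers an archiver could use to place a message in context."""
    msg = message_from_string(raw)
    # msg.get() returns None when a header is absent (e.g. no List-Post at all).
    return {name: msg.get(name) for name in ("Message-ID", "To", "List-Post", "List-Id")}
```

Calling `list_context(RAW)` returns all four values, and a message without a List-Post header simply yields `None` for that key, without the hash needing to carry anything.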

>2) The reason mail-archive.com uses List-Post and not List-Id in the
>calculation is because every list, RFC 2369 compliant or not, has a
>concept of a posting address. It is a natural idea, easy to think of and
>understand. Hence all mail-archive.com archives are keyed off the
>posting address. It would be technically possible (but an architectural
>pain) for mail-archive.com to calculate using List-Id. We'd probably
>not bother and instead store whatever was calculated by Mailman and
>placed in the Archived-At: header. Okay, I'll admit my prejudice. I've
>always found List-Id annoying, and wish that it didn't exist.

Note that the message you receive may not have a useful List-Post header at
all!  From RFC 2369:

3.4. List-Post

   The List-Post field describes the method for posting to the list.
   This is typically the address of the list, but MAY be a moderator, or
   potentially some other form of submission. For the special case of a
   list that does not allow posting (e.g., an announcements list), the
   List-Post field may contain the special value "NO".

(I think neither Mailman 2 nor Mailman 3 does this right.  See LP: #987563.)
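
To show why List-Post is a shaky hash input, here is a minimal sketch of how an archiver might extract a posting address from the header, assuming the value forms given in RFC 2369 (angle-bracketed URLs, optional trailing comment, or the special value "NO"). The `posting_address` helper is hypothetical and is not how Mailman 2 or 3 handles the header.

```python
import re
from typing import Optional

# RFC 2369 list header values are URLs enclosed in angle brackets.
_BRACKETED_URL = re.compile(r"<([^>]*)>")

def posting_address(value: Optional[str]) -> Optional[str]:
    """Return the mailto: address from a List-Post value, or None.

    None covers three cases: no header, the special value "NO", or no mailto URL.
    """
    if value is None:
        return None
    # Ignore any trailing comment such as "(posting not allowed on this list)".
    if value.split("(", 1)[0].strip().upper() == "NO":
        return None
    for url in _BRACKETED_URL.findall(value):
        if url.lower().startswith("mailto:"):
            # Keep only the address; discard any "?subject=..." query part.
            return url[len("mailto:"):].split("?", 1)[0]
    return None
```

Note that for a moderated list this returns the moderator's address rather than the list's, and for an announcements list it returns nothing, which is exactly the ambiguity the RFC text above allows.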

>3) As long as things are changing, I want to mention that these URLs
>feel too long. SHA-1 is a 160-bit hash that consumes 32 URL characters. I
>think trimming to a 64-bit (13 character) hash is plenty. According to
>Wikipedia's collision tables, with the shorter hash we'd expect to get
>our first collision after archiving 5 billion messages. That's 50 times
>the current corpus size of public archival services like Gmane. And it
>isn't like an occasional hash collision is a big deal or a security
>problem. http://en.wikipedia.org/wiki/Birthday_attack
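
The figures quoted above can be checked with a little arithmetic: base32 carries 5 bits per character, and the standard birthday approximation gives the number of random hashes n at which a collision becomes likely with probability p, n ≈ sqrt(2 · 2^bits · ln(1/(1 − p))). The helper names below are illustrative only.

```python
import math

def base32_chars(bits: int) -> int:
    """Characters needed to base32-encode `bits` bits (5 bits per char, no padding)."""
    return math.ceil(bits / 5)

def birthday_bound(bits: int, p: float = 0.5) -> float:
    """Approximate count of random `bits`-bit hashes before a collision has probability p."""
    return math.sqrt(2 * 2**bits * math.log(1 / (1 - p)))

for bits in (64, 80, 160):
    print(bits, base32_chars(bits), f"{birthday_bound(bits):.3g}")
```

For 64 bits this gives 13 characters and roughly 5.06 billion messages at the 50% point, consistent with the estimate quoted above; 80 bits gives 16 characters and roughly 1.29 trillion messages.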

Let's say we take the lower 80 bits of the SHA-1 hash.  After base32 encoding,
that leaves us with 16 characters.  Of course, we could also use the full
160-bit SHA-1 hash and keep only the last X characters of its base32 encoding.
I'm all in favor of a shorter URL, but someone with better Maths-Fu will have
to propose a specific algorithm that adequately trades off collisions for
human-friendliness.  Also, note the implications of increased collisions on
the whole argument, which I brought up in my previous message.
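
The two truncation options can be sketched side by side. This assumes, purely for illustration, that the hash input is the Message-ID concatenated with the List-Post address and encoded as UTF-8; the input layout and both function names are hypothetical, not the scheme Mailman adopted, and the truncation width is exactly the open question.

```python
import base64
import hashlib

def _digest(message_id: str, list_post: str) -> bytes:
    # Hypothetical input layout: plain concatenation, UTF-8 encoded.
    return hashlib.sha1((message_id + list_post).encode("utf-8")).digest()

def permalink_hash(message_id: str, list_post: str, bits: int = 80) -> str:
    """Option 1: keep the low-order `bits` bits, then base32-encode (padding stripped)."""
    if bits <= 0 or bits > 160 or bits % 8:
        raise ValueError("bits must be a whole number of bytes between 8 and 160")
    low = _digest(message_id, list_post)[-(bits // 8):]  # "lower" bits = trailing bytes
    return base64.b32encode(low).decode("ascii").rstrip("=")

def permalink_hash_truncated(message_id: str, list_post: str, chars: int = 16) -> str:
    """Option 2: base32-encode the full 160-bit digest, then keep the last `chars` characters."""
    return base64.b32encode(_digest(message_id, list_post)).decode("ascii")[-chars:]
```

For 80 bits the two options produce the same string, because base32 works in 40-bit groups and 80 bits is exactly two of them; for widths that do not fall on a 40-bit boundary, such as 64 bits, they will generally differ, so whichever rule is chosen has to be specified precisely for Mailman and the archiver to agree.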

>3b) For that matter, a sequence number would also do the trick, but I
>can understand that this is much more dangerous; it is easy for a
>sequence number to get reset and cause all hell to break loose.

It would also be nearly impossible to preserve the zeroth principle, that
Mailman and the archiver can agree on the permalink for a message with no
communication between them.

>4) I'm really not that picky. Our archival service could deal with all
>sorts of URLs, including the ones Terri was trying to avoid, such as
>http://example.com/archiver/listname.example.com/$hash
>In fact, we've found that lots of small, per-list databases have speed
>and reliability advantages over big global databases. But I also like
>short URLs. Bottom line, please don't let these comments delay or
>derail forward progress.

No worries!  We'll hash (pun intended ;) this out in plenty of time before 3.0
final.  With Richard's suggestion of a version number, we could even roll out
updates in future versions, although by then it would probably be more of a
PITA for you than for us. :)

Cheers,
-Barry

