[Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 24 09:41:41 CEST 2012


On Tue, Apr 24, 2012 at 1:40 PM, Jeff Breidenbach <jeff at jab.org> wrote:
>>Is 4 bytes too short?
>
> Four characters is only about a million combinations. First collision
> is 50% likely at 1200 messages, and multi-million message databases
> are completely screwed.
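
For reference, the quoted figures can be checked with a short sketch. It
assumes the 4-character hash is drawn uniformly from a 32-symbol (base32)
alphabet, which is what gives "about a million" combinations; the function
names are made up for illustration and are not Mailman code.

```python
import math

# Assumption: 4 characters from a 32-symbol alphabet, uniformly distributed.
BUCKETS = 32 ** 4  # 1,048,576 possible hash values


def collision_probability(n_messages, n_buckets):
    """Probability that at least two of n_messages share a hash value.

    Uses the exact birthday product, accumulated in log space so it stays
    numerically stable for large n_messages.
    """
    if n_messages > n_buckets:
        return 1.0
    log_no_collision = sum(
        math.log1p(-i / n_buckets) for i in range(n_messages)
    )
    return 1.0 - math.exp(log_no_collision)


def messages_for_even_odds(n_buckets):
    """Approximate archive size at which a first collision is 50% likely.

    Standard birthday-bound approximation: sqrt(2 * ln(2) * N).
    """
    return math.sqrt(2.0 * math.log(2.0) * n_buckets)


if __name__ == "__main__":
    print(messages_for_even_odds(BUCKETS))
    print(collision_probability(1200, BUCKETS))
```

Under those assumptions the break-even point lands just above 1200
messages, in line with the figure quoted above.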

If we're willing to impose disambiguation on the user (and to require
the UI to find and report all matching messages), then the questions
to me would be:

0. Assume a 10 million message archive.
1. What percentage of permalinks need another click?
2. What percentage of permalinks will result in a list of more than 10 matches?

Rationale for 0: 10 related lists X 20 years X 365 days X 100
messages/day.  I can imagine people wanting to index into such a
corpus.
Rationale for 1: Obvious, I hope.
Rationale for 2: Maybe I'm just getting old, but that's the number of
lines I can comfortably scan at a glance.  Substitute whatever value
of "10" suits you, I guess.
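
A rough way to put numbers on questions 1 and 2 is sketched below. It is
only an estimate under stated assumptions: hash values are uniform over a
32-symbol alphabet, the number of *other* messages sharing a given
message's hash is treated as Poisson-distributed, and cross-posts are
ignored. The names `poisson_cdf` and `permalink_stats` are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def permalink_stats(n_messages, hash_chars, alphabet_size=32, max_list=10):
    """Estimate the two percentages asked about above, as fractions.

    Returns (p_click, p_long):
      p_click: chance a permalink matches at least one other message,
               so the user needs another click to disambiguate.
      p_long:  chance the list of matches has more than max_list entries.
    """
    buckets = alphabet_size ** hash_chars
    # Expected number of other messages landing on the same hash value.
    lam = (n_messages - 1) / buckets
    p_click = 1.0 - math.exp(-lam)
    # The list holds the message itself plus the others, so "more than
    # max_list matches" means at least max_list other messages.
    p_long = 1.0 - poisson_cdf(max_list - 1, lam)
    return p_click, p_long


if __name__ == "__main__":
    for chars in (4, 5, 6, 8):
        p_click, p_long = permalink_stats(10_000_000, chars)
        print(chars, round(100 * p_click, 3), round(100 * p_long, 3))
```

With 4 characters and 10 million messages, this model says essentially
every permalink needs another click and close to half of them return more
than 10 matches; by 6 characters the first figure drops below 1%.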

Note that, like Barry, I'm assuming disambiguation will be needed for
cross-posts in any case.  What do others think?



>
> Bottom line: how big a database do we expect to have, and amongst
> those messages, how many collisions are considered acceptable?
>
> -Jeff
>
> PS. These numbers assume a well balanced hash. This paper suggests
> SHA-1 is pretty good in non-adversarial situations, but I'm not an
> expert.  http://cseweb.ucsd.edu/~mihir/papers/balance.html