binary file compare...

Adam Olsen rhamph at gmail.com
Thu Apr 16 15:03:43 EDT 2009


On Apr 16, 8:59 am, Grant Edwards <invalid at invalid> wrote:
> On 2009-04-16, Adam Olsen <rha... at gmail.com> wrote:
> > I'm afraid you will need to back up your claims with real files.
> > Although MD5 is a smaller, older hash (128 bits, so you only need
> > 2**64 files to find collisions),
>
> You don't need quite that many to have a significant chance of
> a collision.  With "only" something on the order of 2**61
> files, you still have about a 1% chance of a collision.

Aye, 2**64 is nearer the middle of the curve (about a 39% chance of a
collision).  You can still go either way.  What's important is the order
of magnitude required.


> For "a few million files" (we'll say 4e6), the probability of a
> collision is so close to 0 that it can't be calculated using
> double-precision IEEE floats.

≈ 0.000000000000000000000000023509887

Or 42535296000000000000000000 to 1.

Or 42 trillion trillion to 1.
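
For anyone who wants to check that arithmetic without a calculator, here is a
rough sketch using exact rational arithmetic.  It assumes the first-order
birthday approximation p ~= n*(n-1)/(2*d), which should be fine while n is tiny
compared with sqrt(d); the names n, d and p are just illustrative.

```python
from fractions import Fraction

n = 4 * 10**6      # "a few million files"
d = 2**128         # number of possible MD5 digests

# First-order birthday approximation, reasonable while n*(n-1) is far below d.
# Fraction keeps the result exact, so nothing underflows or cancels.
p = Fraction(n * (n - 1), 2 * d)

print(float(p))      # probability of at least one collision
print(round(1 / p))  # odds against a collision, as an integer
```

The result should come out on the order of 2.35e-26, in line with the figures
above.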


> Here's the Python function I'm using:
>
> from math import exp
>
> def bp(n, d):
>     return 1.0 - exp(-n*(n-1.)/(2.*d))
>
> I haven't spent much time studying the numerical issues of the
> way that the exponent is calculated, so I'm not entirely
> confident in the results for "small" n values such that
> p(n) == 0.0.

Try using Qalculate.  I always resort to it for things like this.
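
If you'd rather stay in Python, one possible workaround (my assumption about
what is going wrong, not something I've tested against your setup) is that
1.0 - exp(-x) cancels to 0.0 once x drops below about 1e-16.  The result itself
is easily representable as a double; it's the subtraction that loses it.
math.expm1 computes exp(x) - 1 without that cancellation, so a sketch of the
same function would be:

```python
from math import expm1

def bp(n, d):
    """Approximate birthday collision probability for n items among d values."""
    # -expm1(-x) equals 1 - exp(-x) mathematically, but stays accurate
    # when x is tiny instead of rounding to 0.0.
    return -expm1(-n * (n - 1.0) / (2.0 * d))

print(bp(4e6, 2.0**128))      # small but nonzero
print(bp(2.0**61, 2.0**128))  # a bit under 1%
```

The formula is unchanged from the one quoted above; only the way the final
subtraction is evaluated differs.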


