Generate unique ID for URL
Johannes Bauer
dfnsonfsduifb at gmx.de
Wed Nov 14 06:29:05 EST 2012
On 14.11.2012 02:39, Roy Smith wrote:
> The next step is to reduce the number of bits you are encoding. You
> said in another post that "1 collision in 10 million hashes would be
> tolerable". So you need:
>
>>>> math.log(10*1000*1000, 2)
> 23.25349666421154
>
> 24 bits worth of key.
Nope :-)
> Base64 encoded, that's only 4 characters.
> Actually, I probably just proved that I don't really understand how
> probabilities work, so maybe what you really need is 32 or 48 or 64
> bits.
:-))
When doing these calculations, it's important to keep the birthday
paradox in mind (this is kind of counter-intuitive): The chance of a
collission raises tremendously when we're looking for *any* arbitrary
two hashes colliding within a certain namespace. The probability you've
calculated is the pre-image probability (which you also again need to
multiply with a factor of two, because when trying to collide one given
hash, in the mean case you'll only have to search *half* the namespace
before finding a collision).
There are three things you need to know before you can give an estimate:
1. The permissible probability of a collision (1e-7 in this case)
2. The hash size
3. The worst-case number of elements in the namespace
You neglected 3 completely -- but knowing this is really important. This
becomes obvious when looking at the extreme cases: Let's say you have a
hash of arbitrary size, but only hash one element. The chance of a
collision is *always* zero. Or look at a hash of size 2^n. Then put 2^n
+ 1 elements in the namespace. The chance of a collision is *always* one.
Doing the calculations (formulas can be found on wikipedia on the site
of the birthday phaenomenon), you can come up with these following
bitlenghts of the hash with a 1e-7 probability of collision in respect
to the worst-case number of elements
10k elements: 49 bit
100k elements: 56 bit
1e6 elements: 63 bit
100e6 elements: 76 bit
1e9 elements: 83 bit
1e12 elements: 102 bit
Best regards,
Johannes
--
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
- Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1 at speranza.aioe.org>
More information about the Python-list
mailing list