Random Number

Paul Rubin http
Sat Mar 27 16:44:14 EST 2004


talktojamesblair at yahoo.com (james blair) writes:
> I am generating a random number using
> random.randint(1,10000000000)

At least in Python 2.2, 10000000000 is not a valid arg to randint, which
has to take two int (not long) values.  10000000000 won't fit in 32 bits.
I'm not sure about Python 2.3 or 2.4.

> Whats the possibility that the numbers generated will be same when
> generated by 100 users at the same time?

There's two levels where you have to think about that questions:

1) Is the underlying generator biased enough to make such a collision
more likely than pure chance?  That would be a failure of what's
called the "poker test" (Knuth vol. 2) and Python's PRNG is supposed
to be good about that, so you shouldn't get extra (or too few)
collisions that way.

2) What's the likelihood of such a collision if the numbers from the
generator are really random?  Let N be the number of distinct values,
in this case N= approx. 10**10.  Let k be how many numbers you draw,
i.e. in this case k=100.  Let the random numbers be a[1],a[2],...,a[N].
If there's no collision, it means that for every i,j with i,j in 1,2,...,k,
a[i] != a[j].  There are k**2/2 such (unordered) pairs, and for any i,j,
a[i]!=a[j] with probability 1-1/N.  So the chance of all the pairs
being unequal is (1-1/N)**(k**2/2).  That's about exp(-k**2/(2*N)),
or about 1 in 2 million for your parameters.  The case N=365, k=23 is the
famous "birthday paradox".  It says that if you have 23 people in a
room, there's a better-than-even chance that some two of them have the
same birthday.  In general, if you have a set of random numbers with N
possible values, you can expect to see a collision after seeing
O(sqrt(N)) numbers.  Those collisions are called "birthday collisions"
after the birthday paradox.

> Whats the best method to generate random numbers so that they are most
> likely unique??

If they're random, there is a chance that they will collide, and the
only way to make that less likely is to use a bigger range.



More information about the Python-list mailing list