Sorting NaNs

Fri Jun 8 13:45:58 EDT 2018

On Fri, 8 Jun 2018 02:15:02 +0000 (UTC), Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 20:43:10 +0000, Peter Pearson wrote:
[snip]
>> 
>> But gosh, if there are only 2**32 different "random" floats, then you'd
>> have about a 50% chance of finding a collision among any set of 2**16
>> samples.  Is that really tolerable?
>
> Why wouldn't it be? It would be shocking if a sufficiently large sequence 
> of numbers contained no collisions at all: that would imply the values 
> were very much NON random.

[snip]

> . . . I understand that Python's Mersenne Twister implementation 
> is based on 64-bit ints.

OK, I'll relax, particularly since Michael Lamparski's experiment
strongly indicates that random floats are drawn from a population much
larger than 2**16.

You're completely correct, of course, in noting that an absence of
collisions would condemn the random-number generator just as badly
as an excess.  What bothered me was my feeling that a "reasonable
observer" would expect the random-float population to be much larger
than 2**32, and the probably-collision-free sample size to be accordingly
much larger than 2**16, which is, after all, small enough to appear
in many applications.

Exactly what the "reasonable observer" would expect that population to
be, I don't know.  To a mathematician, there's zero chance of collision
in any finite sample of real numbers, or even just rational numbers; but
I don't think anybody would expect that from a computer.  When I picture
the diligent software engineer asking himself, "Wait, how large can I
make this sample before I'll start seeing collisions," I imagine his
first guess is going to be the size of a float's mantissa.

What applications would have to worry about colliding floats?  I
don't know.  I'm coming from cryptology, where worrying about such
things becomes a reflex.

-- 
To email me, substitute nowhere->runbox, invalid->com.