xor: how come so slow?

Sun Oct 19 04:16:07 EDT 2008

On Sun, 19 Oct 2008 04:38:04 +0000, Tim Roberts wrote:

> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> wrote:
>>
>>On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>>
>>> Is piece really meant to be random? If so, your create_random_block
>>> function isn't achieving much--xoring random data together isn't going
>>> to produce anything more exciting than less random data than you
>>> started with.
>>
>>Hmmm... why do you say that xoring random data with other random data
>>produces less randomness than you started with?
>>
>>I'm not saying that you're wrong, and certainly it is pointless since
>>you're not going to improve on the randomness of /dev/urandom without a
>>lot of work. But less random?
> 
> For those who got a bit lost here, I'd would point out that Knuth[1] has
> an excellent chapter on random numbers that includes a detailed
> discussion of this effect.  His net takeaway is that most of the things
> people do to increase randomness actually have exactly the opposite
> effect.

I don't doubt it at all. But xoring random data with more random data? 
I'm guessing that if the two sources of data are independent and from the 
same distribution, then xoring them is pointless but not harmful. Here's 
a rough-and-ready test which suggests there's little harm in it:

>>> import os, math
>>> def rand_data(size):
...     return [ord(c) for c in os.urandom(size)]
...
>>> def mean(data):
...     return sum(data)/len(data)
...
>>> def stdev(data):
...     return math.sqrt( mean([x**2 for x in data]) - mean(data)**2 )
...
>>> A = rand_data(1000)  # good random data
>>> B = rand_data(1000)  # more good random data
>>> AB = [a^b for (a,b) in zip(A, B)]  # is this still good random data?
>>> assert len(AB) == len(A) == len(B)
>>>
>>> mean(A), stdev(A)
(126, 73.91887445030531)
>>> mean(B), stdev(B)
(128, 74.242844773082339)
>>> mean(AB), stdev(AB)
(129, 74.39085965358916)

Note: I wouldn't take the above terribly seriously. Mean and standard 
deviation alone are terrible measures of the randomness of data. But this 
does suggest that any deviation from uniform randomness will be quite 
subtle.

-- 
Steven