xor: how come so slow?

Sun Oct 19 14:41:02 EDT 2008

Steven D'Aprano wrote:

> On Sun, 19 Oct 2008 04:38:04 +0000, Tim Roberts wrote:
>
>> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> wrote:
>>>
>>>On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>>>
>>>> Is piece really meant to be random? If so, your create_random_block
>>>> function isn't achieving much--xoring random data together isn't going
>>>> to produce anything more exciting than less random data than you
>>>> started with.
>>>
>>>Hmmm... why do you say that xoring random data with other random data
>>>produces less randomness than you started with?
>>>
>>>I'm not saying that you're wrong, and certainly it is pointless since
>>>you're not going to improve on the randomness of /dev/urandom without a
>>>lot of work. But less random?
>> 
>> For those who got a bit lost here, I'd would point out that Knuth[1] has
>> an excellent chapter on random numbers that includes a detailed
>> discussion of this effect.  His net takeaway is that most of the things
>> people do to increase randomness actually have exactly the opposite
>> effect.
>
> I don't doubt it at all. But xoring random data with more random data? 
> I'm guessing that if the two sources of data are independent and from the 
> same distribution, then xoring them is pointless but not harmful. Here's 
> a rough-and-ready test which suggests there's little harm in it:
>
>
>>>> import os, math
>>>> def rand_data(size):
> ...     return [ord(c) for c in os.urandom(size)]
> ...
>>>> def mean(data):
> ...     return sum(data)/len(data)
> ...
>>>> def stdev(data):
> ...     return math.sqrt( mean([x**2 for x in data]) - mean(data)**2 )
> ...
>>>> A = rand_data(1000)  # good random data
>>>> B = rand_data(1000)  # more good random data
>>>> AB = [a^b for (a,b) in zip(A, B)]  # is this still good random data?
>>>> assert len(AB) == len(A) == len(B)
>>>>
>>>> mean(A), stdev(A)
> (126, 73.91887445030531)
>>>> mean(B), stdev(B)
> (128, 74.242844773082339)
>>>> mean(AB), stdev(AB)
> (129, 74.39085965358916)
>
>
> Note: I wouldn't take the above terribly seriously. Mean and standard 
> deviation alone are terrible measures of the randomness of data. But this 
> does suggest that any deviation from uniform randomness will be quite 
> subtle.
>
>
>

Operations like 'and' and 'or' will tend to destroy randomness.  'and'
tends to the 0-string and 'or' tends to the 1-string.  I feel like 'xor'
should be safe (like Steven), but is the proof merely the half-and-half
split of the truth table?