SHA-based subclass for random module

Josiah Carlson jcarlson at nospam.uci.edu
Wed Mar 24 02:45:31 EST 2004


> If the generator is something like /dev/urandom or Win32's 
> CryptGenRandom, you don't want the repeated system call overhead of 
> dipping into it 32 bits at a time.
> 
> (I made the mistake of getting a byte at a time out of CryptGenRandom 
> once, for a few hundred bytes of RSA padding, and it was a big 
> performance hit).

Do like a standard file read does, buffer some extra bytes for the 
future, 4k should be sufficient for most needs.


> But you don't need getrandbits().
[snip]
> I argue that getrandstring() makes more sense than getrandbits() in both 
> the API and SPI.
> 
> In the API, getrandstring() could be used for cryptographic keys, for 
> example.  In contrast, getrandbits() is not a useful API function, since 
> the API already offers randint().
> 
> As for the SPI, currently there is random() and getrandbits().  These 
> require every new generator to be able to produce floats and 
> arbitrary-sized longs.
> 
> Most generators are oriented towards producing strings of bits or bytes 
> (M-T produces a sequence of 32-bit values, for example), so it would be 
> easier for them to implement getrandstring() and let the base class 
> handle conversion.

As provided, using bits or bytes, it is easy to convert either way. 
While there are sources that are better suited for generating bytes 
(which you provide), there are entropy producers that give bits at a 
time.  If I remember correctly, there is a serial port, USB, processor 
and even chipset entropy generators that all produce bits (some faster 
than others).

Personally, I find bits far more useful, because while I can do binary 
operations on long integers, strings don't have |,^,&.


>> On a relatively unrelated note, I'm leaning toward trying to get the 
>> benefits of both SHA and MT with the following, though I lack the 
>> heavy mathematical background to prove whether it is better than either.

> I agree with Paul, there's no need to combine them.  The only benefit of 
> MT vs. a SHA-based generator is speed.

I could have sworn the other was that MT had cycle that seemed to be 
provably longer than any pure SHA based generator.

I will also point out that Raymond had offered a generator using MD5 
(which Paul gave us a link to showing that it was all but broken) that 
really just used MD5 as a bit twiddler to take 128 bits of MT to get 53 
bits of output.  The bit of code I offered merely goes a step further 
and uses SHA as a bit twiddler to take 160 bits of MT and 104 bits of 
previous state to get 53 bits of output.

Taking 160 bits of MT reduces the cycle to 1/5 of its original period, 
but I would expect that the output of the SHA digest to not cycle for 
some nontrivial number of MT cycles, unless SHA and MT are internally 
doing something quite similar (I don't believe this is the case).

  - Josiah



More information about the Python-list mailing list