SHA-based subclass for random module

Trevor Perrin trevp_spam at trevp.net
Wed Mar 24 01:21:03 EST 2004


Josiah Carlson wrote:

> 
>> This would be easy to implement with Mersenne Twister too.  Though it 
>> would involve some rewriting.
>>
>> Do you think this is worthwhile?
> 
> 
> I don't know, but I'm leaning towards not really.
> 
>     def getrandstring(self, n):
>         out = [struct.pack('>L', self.getrandbits(32)) \
>                for i in xrange(0, n, 4)]
>         out.extend([chr(self.getrandbits(4)) for i in xrange(n%4)])
>         return ''.join(out)
> 
> You have your getrandstring for any class that provides getrandbits, and 
> is relatively optimized to deal with very long output strings.

If the generator is something like /dev/urandom or Win32's 
CryptGenRandom, you don't want the repeated system call overhead of 
dipping into it 32 bits at a time.

(I made the mistake of getting a byte at a time out of CryptGenRandom 
once, for a few hundred bytes of RSA padding, and it was a big 
performance hit).


>     def getrandbits(self, n):
>         out = 0
>         for i in xrange(0, n+16, 16):
>             out = out << 16 + struct.unpack('>H', self.getrandstring(2))
>     return out >> (16-(n%16))
> 
> Now you have getrandbits for any method that provides getrandstring,

But you don't need getrandbits().

Let's distinguish between the API and the SPI (service provider 
interface).  API is what users access, and the SPI is what new 
generators implement.

I argue that getrandstring() makes more sense than getrandbits() in both 
the API and SPI.

In the API, getrandstring() could be used for cryptographic keys, for 
example.  In contrast, getrandbits() is not a useful API function, since 
the API already offers randint().

As for the SPI, currently there is random() and getrandbits().  These 
require every new generator to be able to produce floats and 
arbitrary-sized longs.

Most generators are oriented towards producing strings of bits or bytes 
(M-T produces a sequence of 32-bit values, for example), so it would be 
easier for them to implement getrandstring() and let the base class 
handle conversion.


> On a relatively unrelated note, I'm leaning toward trying to get the 
> benefits of both SHA and MT with the following, though I lack the heavy 
> mathematical background to prove whether it is better than either.

I agree with Paul, there's no need to combine them.  The only benefit of 
MT vs. a SHA-based generator is speed.

Trevor



More information about the Python-list mailing list