[Python-Dev] random number generator state

Scott David Daniels Scott.Daniels at Acm.Org
Sun Aug 16 19:40:00 CEST 2009


Raymond Hettinger wrote:
> [Scott David Daniels]
>> I find I have a need in randomized testing for a shorter version
>> of getstate, even if it _is_ slower to restore.  [blah about big state]
> 
> Sounds like you could easily wrap the generator to get this.
> It would slow you down but would give the information you want.
Well, I was thinking that this might be generally useful for randomized
testing.

> I think it would be a mistake to complexify the API to accomodate
> short states -- I'm not even sure than they are generally useful
> (recording my initial seed and how many cycles I've run through
> is only helpful for sequences short enough that I'm willing to rerun
> them).
Right, that was what I was asking about.  The complexity of the change
grew on me; I hadn't realized at the outset it would be more than adding
a counter internally.  Consider me officially dissuaded.

> I'm curious what your use case is.  Why not just record the the sequence 
> as generated -- I don't see any analytic value to
> just knowing the initial seed and cycle count. 
I'm building data structures controlled by an rng, and then performing
sequences of (again randomly controlled) operations on those data
structures, check all invariants at each step.  I then lather, rinse,
repeat recording the start of each failing experiment.  In the morning I
come in and look for commonality in the cases I see.  Having the short
state means I  means I can easily rebuild the data structure and command
list to see what is going on.  I prune commands, simplify the tree, and
thus isolate the problem I found.

I did enough experimenting to see that if I simply provide access to run
N cycles of the block, I can actually do 2**32 cycles in feasible time,
so I have a pair of counters, and the code should take long enough for
eternity to show up before the wrap.

My short state is:
     seed, block_index, cycles_low, cycles_high, floating

(block_index + 625 * (cycles_low + (cycles_high << 32)) is the position,
and could be done as such; the pieces reflect the least-expensive cost
in performance to the rng. floating is simply the same final floating
piece that the state keeps now.

> Ability to print out a short state implies that you are using only a
> small subset of possible states (i.e. the ones you can get to with
> a short seed). 
Well, as you see above, I do capture the seed.  I realize that the time-
constructed seeds are distinct from identically provided values as small
ints, and I also mark when the rng gets called by set_state to indicate
that I then know nothing about the seed.
 >> mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
 >> but probably should be:
 >> mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array*/
 > Please file a bug report for this and assign to me....  Also, our
 > tests for MT exactly reproduce their published test sequence.
 >
I've been assured it is not a bug, and I filed no report since I had 
just arrived at the point of suspicion.

To summarize, I am officially dissuaded, and will post a recipe if I
get something nice working.

--Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-Dev mailing list