exercise: partition a list by equivalence

Xah Lee xah at xahlee.org
Sat Feb 19 20:26:51 EST 2005


an interesting problem so developed now is to write a function that
generate test cases for the purpose of testing performance. (just for
fun)

the design of this function could be interesting. We want to be able to
give parameters in this function so as to spit out all possible screw
test cases. First of all, a range of n (some millions) numbers. Then, a
fraction that specifies the percentage of these number are equivalent.
1 being all equivalent. 0 being all "distinct" (having only one
equivalent member (since the input comes in pairs)). Then we want to
have a parameter that says something like the sizes of the equivalence
groups. Suppose 50% of number are equal among themselves (i.e. have
more than one equivalent member). 1 would be all in one group. 0 would
mean all partitions have size 3 or 2. (there are more to it here...
since this is a distribution) ... Then, we need to turn this range of
integers into pairs. That's something more to think about.

So with this function at hand, we'll be able to say for sure which code
performs better (and under what type of input)

the Order notion in computing mathematics is fairly useless for finer
details.

PS it is not trivial to design this pair generating function. I don't
know if the above is on the right track, but essentially we want to
categorize the type of inputs according to the mathematical operational
performance aspect of partition by equivalence, and distill them into
parameters.

another func to write is one that canonicalize the output. Such as
sorting. So that all results can be compared simply by = in Python.

failing to design a elaborate pair_gen, we can just have pairs of
random numbers. But exactly what nature is such input... is more to
think about.

(in my original application, each number represent a computer file,
there are up to tens of thousands of files, and much less than 0.1% is
same as another, and for files that are same, each equivalent group
number no more than 10 or so.)

 Xah
 xah at xahlee.org
 http://xahlee.org/PageTwo_dir/more.html


John Machin wrote:
> FWIW, here's a brief UAT report:
> 
> Appears to work: Reinhold, David, Xah...
> Has bug(s): ...




More information about the Python-list mailing list