[Tutor] Filling an array - with a twist

Wed Nov 11 13:47:05 EST 2020

On 11/11/2020 19:30, Phil wrote:
> Another call on the Brains Trust I'm afraid.
> 
> I'd like to fill an 8 x 8 array (list of lists) with random numbers such 
> that no column or row repeats the same number more than twice in a 
> sequence.
> 
> e.g.. [12344562] is OK but [12223456] is not wanted.
> 
> Something like the following will give me a row or column that meets the 
> requirements but more often than not fails the complete array test.
> 
>  >>> import random
>  >>> r = random.sample(range(0,8),8)
>  >>> r
> [6, 2, 4, 5, 3, 0, 1, 7]
> 
> Searching through the array trying to repair an errant sequence is not 
> the answer. Can anyone suggest an algorithm to do what I need?
> 
> To add to the complication I really need numbers between 0 and 6 which 
> random.sample() won't allow because, as the value error states, the 
> sample is larger than the population.

"Stepwise Decomposition" involves taking a hard-to-solve 'large' problem 
and splitting it into smaller easier-to-solve problems - repeatedly - 
until the smallest problems can be solved.

In this case, rather than thinking only of an 8x8 matrix, ie the result; 
let's consider that each element (each 'cell' if you prefer) 
participates in two sub-problems:

1 does it obey 'the rules' for its row?
2 does it obey 'the rules' for its column?

So, now we have three data-structures:
1 the result - what we want
2 something which handles the row-rule
3 something which handles the column-rule

For the purposes of illustration, let me change the rules (simplify the 
'larger problem') by saying "*no repetition* in a single row or column" 
- call it "dn's Sudoku rule":-

The issue of repetition is most easily addressed in Python by realising 
that whilst a list (and thus a list of lists) may contain repeated 
values, a set may not.

Thus, create a list of eight (empty) sets representing (the values 
appearing in) each of the columns, and similar for the eight 'rows'.

Now to the process: each time the next element's value is computed from 
random(), before placing it in the 8x8 structure, check to see if it is 
already in one/both of the two applicable sets (by row and by column). 
If it is, then it would be a repetition (so 'reject'/ignore that 
computed-value. If not, store the value in the applicable sets, and in 
the list-of-lists.

Once you've understood that, and are coming back to me with the word 
"but...", let's get back to the real spec[ification].

We can't use a set to solve the 'one repetition allowed' rule, because 
sets refuse any repetition within their "members". There are a couple of 
ways to achieve 'counts' in Python, but let's stick with built-in data 
structures, and thus the pattern described (above). Change the eight 
column- and row-monitoring sets to dictionaries. Please recall that each 
dictionary (dict) element consists of a key and a value ("key-value pairs").

In this approach, when the next random-value is generated, check the two 
applicable dicts in-turn: does this value exist as a key (if value in 
dictionary!).

At first it will not, so add a dictionary entry with the random-value as 
key and a 'count' of one as its value.

If both dicts 'pass' the random-value, load it into the list-of-lists.

As the 8x8 fills, if a later random-value is generated which matches an 
existing row-dict (or column-dict) key (ie is "in" the dict), then the 
dict's value ("counter") must be checked. If it is one, increment to 
two*, and load the list-of-lists.
* ie this value currently appears in the applicable row (or column) 
twice - which, by definition, means that it may not appear again!

Thus, considering the above checks again, if the dict's value is two, 
reject the random-value because it breaks 'the rules'.

Rinse-and-repeat.

For extra credit:
if you liked playing with sets, then instead of using a dict to manage 
'counters' you could have two 'banks' of eight column-sets and the same 
for the rows. The first bank, used as above, will reveal if the number 
has/not been generated previously for this row/column. The second bank 
could be used to indicate if the value appears in the row a second time. 
Thus when the same value is generated for the third time, ie appears in 
the applicable row/column set in both 'banks', it should be rejected.

I have a feeling/hunch, that multiple sets might be faster than using a 
dict-counter.

Please note:
This spec is talking about 8x8 values, and thus the creation of an 
addition 16 dicts (or 32 sets). In today's world that does not seem 
'expensive' (demanding on storage). However, multiply that up to 
thousands/millions of values and then trading storage for CPU-time may 
become an issue!
-- 
Regards =dn