[Tutor] Filling an array - with a twist
dn
PyTutor at DancesWithMice.info
Wed Nov 11 13:47:05 EST 2020
On 11/11/2020 19:30, Phil wrote:
> Another call on the Brains Trust I'm afraid.
>
> I'd like to fill an 8 x 8 array (list of lists) with random numbers such
> that no column or row repeats the same number more than twice in a
> sequence.
>
> e.g.. [12344562] is OK but [12223456] is not wanted.
>
> Something like the following will give me a row or column that meets the
> requirements but more often than not fails the complete array test.
>
> >>> import random
> >>> r = random.sample(range(0,8),8)
> >>> r
> [6, 2, 4, 5, 3, 0, 1, 7]
>
> Searching through the array trying to repair an errant sequence is not
> the answer. Can anyone suggest an algorithm to do what I need?
>
> To add to the complication I really need numbers between 0 and 6 which
> random.sample() won't allow because, as the value error states, the
> sample is larger than the population.
"Stepwise Decomposition" involves taking a hard-to-solve 'large' problem
and splitting it into smaller easier-to-solve problems - repeatedly -
until the smallest problems can be solved.
In this case, rather than thinking only of an 8x8 matrix, ie the result;
let's consider that each element (each 'cell' if you prefer)
participates in two sub-problems:
1 does it obey 'the rules' for its row?
2 does it obey 'the rules' for its column?
So, now we have three data-structures:
1 the result - what we want
2 something which handles the row-rule
3 something which handles the column-rule
For the purposes of illustration, let me change the rules (simplify the
'larger problem') by saying "*no repetition* in a single row or column"
- call it "dn's Sudoku rule":-
The issue of repetition is most easily addressed in Python by realising
that whilst a list (and thus a list of lists) may contain repeated
values, a set may not.
Thus, create a list of eight (empty) sets representing (the values
appearing in) each of the columns, and similar for the eight 'rows'.
Now to the process: each time the next element's value is computed from
random(), before placing it in the 8x8 structure, check to see if it is
already in one/both of the two applicable sets (by row and by column).
If it is, then it would be a repetition (so 'reject'/ignore that
computed-value. If not, store the value in the applicable sets, and in
the list-of-lists.
Once you've understood that, and are coming back to me with the word
"but...", let's get back to the real spec[ification].
We can't use a set to solve the 'one repetition allowed' rule, because
sets refuse any repetition within their "members". There are a couple of
ways to achieve 'counts' in Python, but let's stick with built-in data
structures, and thus the pattern described (above). Change the eight
column- and row-monitoring sets to dictionaries. Please recall that each
dictionary (dict) element consists of a key and a value ("key-value pairs").
In this approach, when the next random-value is generated, check the two
applicable dicts in-turn: does this value exist as a key (if value in
dictionary!).
At first it will not, so add a dictionary entry with the random-value as
key and a 'count' of one as its value.
If both dicts 'pass' the random-value, load it into the list-of-lists.
As the 8x8 fills, if a later random-value is generated which matches an
existing row-dict (or column-dict) key (ie is "in" the dict), then the
dict's value ("counter") must be checked. If it is one, increment to
two*, and load the list-of-lists.
* ie this value currently appears in the applicable row (or column)
twice - which, by definition, means that it may not appear again!
Thus, considering the above checks again, if the dict's value is two,
reject the random-value because it breaks 'the rules'.
Rinse-and-repeat.
For extra credit:
if you liked playing with sets, then instead of using a dict to manage
'counters' you could have two 'banks' of eight column-sets and the same
for the rows. The first bank, used as above, will reveal if the number
has/not been generated previously for this row/column. The second bank
could be used to indicate if the value appears in the row a second time.
Thus when the same value is generated for the third time, ie appears in
the applicable row/column set in both 'banks', it should be rejected.
I have a feeling/hunch, that multiple sets might be faster than using a
dict-counter.
Please note:
This spec is talking about 8x8 values, and thus the creation of an
addition 16 dicts (or 32 sets). In today's world that does not seem
'expensive' (demanding on storage). However, multiply that up to
thousands/millions of values and then trading storage for CPU-time may
become an issue!
--
Regards =dn
More information about the Tutor
mailing list