Best attack order for groups of numbers trying to destroy each other, given a victory chance for number to number attack.

skybuck2000 at hotmail.com skybuck2000 at hotmail.com
Fri Dec 9 07:26:21 EST 2016


Hello,

(This problem is probably too computationally intensive to solve with Python, though Python + Cuda could be interesting, and also Python has some interesting expressive powers, so it could be interesting to see how Python programmers might be able to express this problem with Python code, so I am going to give this Python group a try... maybe it will surprise me ! :) At least now you have a nice computational problem for those boring rainy days (when the net is down?and the offline games bore you ;) LOL :))

There are two groups of numbers which try to destroy each other:

Group X = 1,2,3,4
Group Y = 1,2,3,4

There is a 4 by 4 victory table which describes the chance of a number 
destroying another number:

Victory table =
50,  3, 80, 70
90, 60, 20, 40
30, 90, 55, 65
75, 90, 98, 60

(Currently implemented as a chance by diving it by 100 and storing as 
floating point, but since these are subtracted only from 1.0 I guess they 
can be stored as integers instead, even bytes)

This leads to the following computational problem as far as I am concerned:

Each number has an attack order permutation as follows (factorial 4 = 
1x2x3x4 = 24)

1,2,3,4    // 1
1,2,4,3    // 2
1,3,2,4    // 3
1,3,4,2    // 4
1,4,2,3    // 5
1,4,3,2    // 6
2,1,3,4    // 7
2,1,4,3    // 8
2,3,1,4    // 9
2,3,4,1    // 10
2,4,1,3    // 11
2,4,3,1    // 12
3,1,2,4    // 13
3,1,4,2    // 14
3,2,1,4    // 15
3,2,4,1    // 16
3,4,1,2    // 17
3,4,2,1    // 18
4,1,2,3    // 19
4,1,3,2    // 20
4,2,1,3    // 21
4,2,3,1    // 22
4,3,1,2    // 23
4,3,2,1   // 24

(These attack orders can be numbered from 1 to 24 or 0 to 23 and then it's 
attack order/permutation can be looked up to safe memory.)

Each number has it's own attack order and thus this leads to the following 
combinational computational problem:

All combinations of permutations in which order group X can attack Group Y 
and vice versa:

Group X = 24 x 24 x 24 x 24
Group Y = 24 x 24 x 24 x 24

So this is 24 possibility to the power of 8.

Final computation complexity at the very minimum is (24 to the power of 8) 
multiplied by roughly 4 attacks perhaps even 5 or 6 to finally destroy a 
group of numbers.

24 to the power of 8 = 110.075.314.176

I have already written a computer program which can solve this, however the 
computer program estimates it takes 19 hours on a 2.0 gigahertz AMD Athlon 
X2 core.

Using dual core could solve the problem over night, though I do not feel 
comfortable running my PC at night unattended.

So now I have the idea to make this program run when my computer is idling 
during the day, it should also be able to store it's progress so that it can 
continue after it was shutdown.

(Idea for now is to make it multi threaded and assign a low thread priority 
so it can run during the day when I use my computer and it's not doing much 
so it can use the reserve computational horse power).
(I still have to try these "idle/reverse" ideas to see which one works best 
without interrupting my web browsing or music listening too much ;))

My system has 4 GB of ram, so other ideas could be to store a data structure 
partially which could keep some computations so that it doesn't have to be 
done again... Though memory lookups might be a bit slow so not sure if that 
makes any sense.

I might also try GPU/Cuda since there seems to be lots of loops/reoccurences 
of the same computations that will happen over and over again... So maybe 
cuda can detect "same branch execution" and some "computations" and might 
speed it up, not sure about that.

Just the 8 index loops already cost a lot of instructions. Since there are 
only 24 permutation it would be enough to store it in 5 bits. Perhaps a 
rounded floating point which increases by 1/24 might be enough to trigger 
the 4th bit from incrementing when it actually needs to.
2x2x2x2x2 = 32  (it should increment bit 6 when the 5 bits reach 24).
So the idea here was to create 8 indexes from just 1 index being incremented 
to create the 8 combinations of indexes "instruction cheaply".
Not sure if this will work, just an idea I might try :)

Then those bits would still need to be extract and makes. So perhaps on 
older systems this is not efficient.

The 8 indexes need at least 3 instructions, 1 index increment, 1 
comparision, 1 jump.

The inner loop contains some while loops to increment attack index per 
number.

Each number has a "alive" variable which starts at 1.0 and is decreased 
everytime it's attacked.

Once a number is dead below 0.0000001 it's considered dead and can no longer 
attack.

(Since victory table was described as integers above this can also be read 
as: Alive starts at 100 and once it goes zero or negative it's dead).

Anyway the main reason why I wrote to this/these groups is that the numbers 
themselfes are not that larger and can fit in a byte, even a few bits.

Thus they will fit into SSE registers and such and I also know SSE has some 
permutations instructions.

I am not expert at SSE but I am wondering if there are perhaps some 
SSE1/2/3/4/5 instructions which can help at solving/computing this problem ?

Now the question you might be wondering about is ? What am I actually trying 
to compute ?

Well the final output could simply be the number of victories that each 
attack order has had per number. I am particullary interested in victories 
of number 4. So that would be sufficient for now.

Number 4 has 24 permutations.

So I want to know how each permutation of number 4 performs under all other 
circumstances/permutations of all other numbers/combinations and so forth.

This basically requires the entire set to be computed and then record number 
of victories for for example Group X number 4.

Only the results of one group needs to be outputted since the two groups are 
a mirror of each other.

Also during the development of the computer program I finally had a 
successfull implementation by keeping it as simple as possible and working 
with direct variables, instead of much more complex arrays.

Later on I made a more code efficient version which uses arrays/lookups and 
such. It was much harder to try the array approach at first since it becomes 
mindly complex and can obscure "vision to a solution".

So I suggest anybody trying to implement a solver for this to keep the code 
as simple as possible at first, since this is already an 8 dimensional 
problem with a 9th dimension.

Also it was interesting to see the Delphi for loops not allowing array 
fields as loop variables, so those loops will need to be fleshed out anyway, 
or one big linear loop could be used and then 8 indexes calculated from 
that.

That approach will probably be necessary for a cuda solution anyway... 32 
bit might be sufficient if remainders are reincluded in the conversion from 
thread/block dimensions to linear dimension back to 8 dimensional indexes.

Perhaps such computation might even be a bit quicker than the 3 instructions 
per loop.

For example 8 divisions and 8 remainders will be necessary to compute 8 
indexes from one big linear indexes. Since div/mod can be the same 
instruction this might only require 8 instructions to compute these loop 
indexes from a big linear index.
However computing the big linear index already requires between 6 or 10 
instructions.

So just to compute those 100+ billion data entries requires something like 
20 instructions just to be able to compute those loop indexes.

This is my finding with bigger data sets which consists out of small little 
data units... The closer the number of entries/data items get to the 
actually processing computing power the more time/processing/instructions 
are wasted on just computing these loop indexes.

It makes sense from this perspective: A problem of 100 items only needs 100 
loops. Thus lots of processing power remains for the data. But for 2 billion 
data items, the number of items already matches the gigaherts of the core... 
so the core is already swamped... with just loop index calculations.

I hope this reasoning/explanation convinces some CPU/GPU designers to 
include more instructions to compute loop indexes in all kinds of ways that 
could speed that up somewhat and safe some processing time.

Anyway... let me know if you think SSE could be of some help in solving this 
permutation/combination problem ?!

Also for the python newsgroup:

Maybe python is more powerfull than I thought and it has some weird/kinda crazy/cool new permutation/combination algorithmic classes that can solve these kinds of problems faster than just the average run of the mill general code ! ;)
(Or perhaps build in solver support, that will be my next try/posting :P :))

Bye,
  Skybuck. 



More information about the Python-list mailing list