optimization of rule-based model on discrete variables

Mon Jun 14 20:19:44 EDT 2021

On 6/13/21 12:15 PM, Elena via Python-list wrote:
> Hi, I have, say 10 variables (x1 ... x10) which can assume discrete finite 
> values, for instance [0,1 or 2].
> I need to build a set of rules, such as:
>
> 1) if x1==0 and x2==1 and x10==2 then y = 1
> 2) if x2==1 and x3==1 and x4==2 and x6==0 then y = 0
> 3) if x2==0 and x3==1 then y = 2
> 4) if x6==0 and x7==2 then y = 0
> ...
> ...
> (actually it can be seen as a decision tree classifier).
>
> y can assume the same discrete value [0,1 or 2]
> I don't know a-priori anything about the number of rules and the 
> combinations of the tested inputs.
>
> Given a dataset of X={(x1... x10)} I can calculate Y=f(X) where f is this 
> rule-based function.
>
> I know an operator g that can calculate a real value from Y: e = g(Y)
> g is too complex to be written analytically.
>
> I would like to find a set of rules f able to minimize e on X.
>
> I know the problem can become NP-hard, but I would be fine also with a 
> suboptimal solution.
>
> What's the best way to approach the problem?
> In case, does something already exist in python?
>
>
> thank you

My first feeling is that this doesn't have a lot of structure to do a
lot of optimizations, and just brute forcing might be the answer.

Of course, one option is just sort the rules by Y, then iterate through
the rules and see if any data matches, for that putting the dataset into
something like an SQLite database with indexes on the various X columns
might work (with 10 columns, the 45 indexes on each column pair might
make things reasonably efficient.)

The biggest gain would happen if you could look at the rule set and find
patterns in it. The big question is making sure that the rule set covers
every value in the data array, and never gives one input value two
different y values.

-- 
Richard Damon