Programming challenge: wildcard exclusion in cartesian products

Dirk Thierbach dthierbach at usenet.arcornews.de
Fri Mar 24 05:53:46 EST 2006


Dinko Tenev <dinko.tenev at gmail.com> wrote:
> Dirk Thierbach wrote:

[One cannot escape exponential behaviour]

>> But you cannot get rid of this. Consider S = {a, b}, W = {a}.
>> Then there are |S|^n result elements for n > 1, and you have to enumerate
>> all of them.

> Yes, but then, they are in the target set.  

Which is the point. If they are in the target set, you have to enumerate
them. If the target set is of exponential size with respect to n,
then you'll need exponential time to do that.

> The point here is whether
> you can generate S^n - W in Theta( n * |S^n - W| ), which may be
> dramatically different from Theta( n * |S^n| ).

Exactly. Hence, you use a construction that guarantees that the time
needed is proportional to n*|S^n - W|: every step you take is
necessary to produce at least one word in the output set. Now, if
|S^n - W| is still exponential, then you'll still need exponential
time. But nevertheless, that's the best you can hope for.
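(Not code from the post, just a minimal sketch of the idea: enumerate
length-n words by walking a DFA from which the "stuck" state has already
been removed, so every prefix the walker extends is guaranteed to lead to
at least one output word. Function and variable names are mine.)

```python
def enumerate_words(delta, finals, start, n):
    """Enumerate all length-n words accepted by a DFA.

    delta maps (state, symbol) -> state.  The DFA is assumed to contain
    only "productive" states, so no recursion step is ever wasted and
    the total work is proportional to n * |output|.
    """
    out = []

    def walk(state, prefix):
        if len(prefix) == n:
            if state in finals:
                out.append(''.join(prefix))
            return
        for (st, sym), nxt in delta.items():
            if st == state:
                walk(nxt, prefix + [sym])

    walk(start, [])
    return out

# The three-state automaton quoted below: words over {a, b}
# that never mix a's and b's.
delta = {('S', 'a'): 'A', ('S', 'b'): 'B',
         ('A', 'a'): 'A', ('B', 'b'): 'B'}
print(enumerate_words(delta, {'S', 'A', 'B'}, 'S', 3))  # ['aaa', 'bbb']
```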

>> The automaton is:
>>
>> S: a -> A, b -> B
>> A: a -> A
>> B: b -> B

> The target set is specified as S^n - W, where W is everything matching
> (.*a.*b|.*b.*a).  Following the construction procedure in point, this
> exclusion set is matched exactly by my DFA with S initial and F final.
> Then, swapping final and non-final states makes {S, A, B} final, and F
> non-final.  Your DFA above may be equivalent, but to me it is far from
> clear exactly what algorithm would build it from the given data.

Well, it's just the result of the minimization algorithm, where my
variant of the algorithm also prunes away the "stuck" state, which can
never produce any output.
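(A sketch of that construction under my own naming, not code from the
post: complement the DFA for W by swapping final and non-final states,
then drop every state that can no longer reach a final state, which is
exactly the "stuck" state here.)

```python
def complement_and_prune(states, delta, finals):
    """Complement a complete DFA, then remove unproductive states.

    delta maps (state, symbol) -> state.  Returns the surviving states,
    the pruned transition table, and the new final states.
    """
    finals = set(states) - set(finals)      # swap final and non-final
    # A state is productive if some path from it reaches a final state.
    productive = set(finals)
    changed = True
    while changed:
        changed = False
        for (st, sym), nxt in delta.items():
            if nxt in productive and st not in productive:
                productive.add(st)
                changed = True
    pruned = {k: v for k, v in delta.items()
              if k[0] in productive and v in productive}
    return productive, pruned, finals & productive

# Complete DFA for W = (.*a.*b|.*b.*a), with F the accepting "mixed" state.
states = {'S', 'A', 'B', 'F'}
delta = {('S', 'a'): 'A', ('S', 'b'): 'B',
         ('A', 'a'): 'A', ('A', 'b'): 'F',
         ('B', 'a'): 'F', ('B', 'b'): 'B',
         ('F', 'a'): 'F', ('F', 'b'): 'F'}
prod, pruned, new_finals = complement_and_prune(states, delta, {'F'})
# After the swap, F is non-final and can only loop on itself, so it is
# pruned away, leaving the three-state automaton quoted above.
```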

>> The point is not to "cut efficiently", the point is to enumerate
>> only those words that are actually in the result set.

> No, this is not the point.  Naive filtering already does that. 

No, it doesn't. Naive filtering will always look at the complete
input set, so no matter what size |S^n - W| actually is, it will
always take time proportional to |S^n|.

>  By "cutting efficiently" I mean skipping over search sub-trees that
> don't contain any results from the target set.

Yes. Consider a different example: With the wildcard expressions
W = { b*, aa*, ab* }, you'll get S^* - W = { a }. The resulting
minimum FSM will just accept 'a' (start state, one final state, and
the "stuck" state if you insist on it), so you skip over every
other subtree when enumerating results from that automaton.

And for the previous example, you'll need something like 2*n time
to enumerate the output set instead of 2^n, because once you're in
the "a-branch", you're producing only a's, and you're pruning
away all the subtrees that start with a "b". Similarly in the 
"b-branch".
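(An illustration of that count, assuming the walker from my earlier
sketch: on the pruned automaton it visits one root prefix plus two
prefixes per length, i.e. 2*n + 1 nodes in total, while naive filtering
would inspect all 2^n candidate strings.)

```python
def count_visited(delta, start, n):
    """Count the prefixes a pruned-DFA enumerator actually visits."""
    count = 0

    def walk(state, depth):
        nonlocal count
        count += 1
        if depth == n:
            return
        for (st, sym), nxt in delta.items():
            if st == state:
                walk(nxt, depth + 1)

    walk(start, 0)
    return count

delta = {('S', 'a'): 'A', ('S', 'b'): 'B',
         ('A', 'a'): 'A', ('B', 'b'): 'B'}
print(count_visited(delta, 'S', 10))  # 21, i.e. 2*10 + 1, versus 2**10
```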

Now clearer?

- Dirk



More information about the Python-list mailing list