Programming challenge: wildcard exclusion in Cartesian products

Dinko Tenev dinko.tenev at gmail.com
Fri Mar 24 04:42:14 EST 2006


Dirk Thierbach wrote:
> Dinko Tenev <dinko.tenev at gmail.com> wrote:
> >> > I don't see immediately how exactly this is going to work.  Unless I'm
> >> > very much mistaken, a FSA in the classical sense will accept or reject
> >> > only after the whole sequence has been consumed, and this spells
> >> > exponential times.
>
> >> Exponential in respect to what?
>
> > With respect to sequence length.
>
> But you cannot get rid of this. Consider S = {a, b}, W = {a}.
> Then there are |S|^n result elements for n > 1, and you have to enumerate
> all of them.

Yes, but then they are in the target set.  The point here is whether
you can generate S^n - W in Theta( n * |S^n - W| ), which may be
dramatically smaller than Theta( n * |S^n| ).
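
To make the gap concrete, here's a quick Python sketch of the naive
approach (my own illustration, using the (.*a.*b|.*b.*a) pattern
discussed below): for S = {a, b} the target set contains only the two
words a^n and b^n, yet the filter still walks all 2^n candidates.

import itertools
import re

def naive_exclusion(S, n, pattern):
    # Enumerate all of S^n and drop the words matching W.  The output
    # is exactly S^n - W, but the loop still touches every one of the
    # |S|^n words, i.e. it takes Theta( n * |S^n| ) time no matter how
    # small the result set is.
    bad = re.compile(pattern)
    for letters in itertools.product(S, repeat=n):
        word = ''.join(letters)
        if not bad.fullmatch(word):
            yield word

list(naive_exclusion('ab', 3, '.*a.*b|.*b.*a')) gives ['aaa', 'bbb'],
but only after inspecting all eight words of length 3.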

> The best thing one can hope for is to just actually process those
> elements that are really in the result set. With the FSM, you're
> getting at least close to that.

If the FSA has to consume the whole sequence, you're only improving
performance by a constant factor.

> The automaton is:
>
> S: a -> A, b -> B
> A: a -> A
> B: b -> B

The target set is specified as S^n - W, where W is everything matching
(.*a.*b|.*b.*a).  Following the construction procedure in question, this
exclusion set is matched exactly by my DFA, with S initial and F final.
Swapping final and non-final states then makes {S, A, B} final and F
non-final.  Your DFA above may well be equivalent, but it is far from
clear to me exactly what algorithm would build it from the given data.
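
To spell the construction out, here's a minimal sketch (my own; a
dict-of-dicts transition table, state names as above):

# DFA matching the exclusion set W = (.*a.*b|.*b.*a) over {a, b}:
# S = nothing seen yet, A = only a's so far, B = only b's so far,
# F = both letters seen, i.e. the word is in W.
delta = {
    'S': {'a': 'A', 'b': 'B'},
    'A': {'a': 'A', 'b': 'F'},
    'B': {'a': 'F', 'b': 'B'},
    'F': {'a': 'F', 'b': 'F'},
}
finals = {'F'}

# Complementing a complete DFA is just swapping final and non-final
# states; the complement accepts exactly S^n - W for every n.
complement_finals = set(delta) - finals    # {'S', 'A', 'B'}

Once F is recognized as a dead state and removed (see below), what is
left is exactly your three-state automaton.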

> > Looking at the minimal DFA for the above example, it may be more
> > appropriate to detect being in a state from which there's no path to a
> > final state:
>
> This cannot happen, because minimizing removes all those states.
> (Or, in case you have a different definition of automaton in mind:
> There'll be just one "stuck" state, where all transitions that never
> go to a final state end up. Remove this state, and you'll end up
> with the other definition).

I have to admit that I've never considered this before, but now that I
think about it, a block of such states will never be split by
minimization, so they're bound to end up merged into a single state,
which can then be eliminated.  I can see your point now.
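
For what it's worth, here's a sketch of that elimination step (mine,
not anything from the thread): compute which states can still reach a
final state by walking the reversed transitions, and drop the rest.

def prune_dead_states(delta, finals):
    # delta: dict state -> {symbol: state}, with every state present
    # as a key; finals: set of final states.  Returns the automaton
    # restricted to "live" states, i.e. those from which some final
    # state is reachable.
    rev = {q: set() for q in delta}
    for q, edges in delta.items():
        for target in edges.values():
            rev[target].add(q)

    # Reverse reachability from the final states.
    live = set(finals)
    stack = list(finals)
    while stack:
        q = stack.pop()
        for p in rev[q]:
            if p not in live:
                live.add(p)
                stack.append(p)

    pruned = {q: {c: t for c, t in delta[q].items() if t in live}
              for q in delta if q in live}
    return pruned, finals & live

Applied to the complemented DFA above (finals {'S', 'A', 'B'}), this
deletes F together with the transitions into it and leaves precisely

S: a -> A, b -> B
A: a -> A
B: b -> B

Note that the result is in general a partial automaton: some
transitions simply disappear.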

> >> I guess one could even optimize on that: In the minimal automaton,
> >> every state has some transition sequence that will end up in a final
> >> state. Annotate each state with the minimum number of steps for
> >> such a sequence. While walking the automaton, keep the maximum
> >> number of letters to produce in a variable. If this number is smaller
> >> than the number in the annotation, stop and backtrack.
>
> > I don't think this could cut efficiently for patterns like *a*b.
>
> The point is not to "cut efficiently", the point is to enumerate
> only those words that are actually in the result set.

No, this is not the point.  Naive filtering already does that.  By
"cutting efficiently" I mean skipping over search sub-trees that don't
contain any results from the target set.  If you can do this, you can
enumerate the target set in time proportional to its size, not to the
size of the search space, which is the point.
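
For the record, here's one way to cut exactly (a sketch of my own, not
anything from the thread, reusing the dict-of-dicts DFA from above).
Instead of the minimum-distance annotation, precompute, for each
remaining length k, the set of states from which a final state is
reachable in exactly k steps, and only descend into subtrees that pass
that test.  Every branch taken then leads to at least one output word,
so the total work is proportional to n * |S^n - W| (times the alphabet
size), not to |S^n|.

def enumerate_excluded(delta, finals, start, alphabet, n):
    # delta/finals describe the *complemented* DFA, which accepts
    # S^n - W.  ok[k] is the set of states that can reach a final
    # state in exactly k steps, computed backwards.
    ok = [set(finals)]
    for _ in range(n):
        prev = ok[-1]
        ok.append({q for q in delta
                   if any(t in prev for t in delta[q].values())})

    def walk(state, k, prefix):
        if k == 0:
            yield ''.join(prefix)
            return
        for c in alphabet:
            t = delta[state].get(c)
            if t is not None and t in ok[k - 1]:
                prefix.append(c)
                yield from walk(t, k - 1, prefix)
                prefix.pop()

    if start in ok[n]:
        yield from walk(start, n, [])

With the complemented DFA from before,
sorted(enumerate_excluded(delta, {'S', 'A', 'B'}, 'S', 'ab', 3))
gives ['aaa', 'bbb'] without ever descending into the subtrees of the
other six words.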


Cheers,

Dinko



