Programming challenge: wildcard exclusion in cartesian products

Dirk Thierbach dthierbach at usenet.arcornews.de
Thu Mar 23 16:28:00 EST 2006


Dinko Tenev <dinko.tenev at gmail.com> wrote:
>> > I don't see immediately how exactly this is going to work.  Unless I'm
>> > very much mistaken, a FSA in the classical sense will accept or reject
>> > only after the whole sequence has been consumed, and this spells
>> > exponential times.

>> Exponential in respect to what?

> With respect to sequence length.  

But you cannot get rid of this. Consider S = {a, b}, W = {a}.
Then there are |S|^n result elements for n > 1, and you have to enumerate 
all of them.
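A tiny illustration of this point (the function name is my own, not from the post): with S = {a, b} and the single excluded word "a", every word of length n > 1 survives the exclusion, so the result set has |S|^n elements and any enumeration is necessarily exponential in n.

```python
from itertools import product

S = ["a", "b"]

def result_size(n):
    """Count words of length n over S after excluding the single word "a"."""
    words = ("".join(p) for p in product(S, repeat=n))
    return sum(1 for w in words if w != "a")

for n in range(1, 5):
    print(n, result_size(n), len(S) ** n)
```

For every n > 1 the two counts coincide, so no algorithm can do better than |S|^n work here.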

The best thing one can hope for is to just actually process those
elements that are really in the result set. With the FSM, you're
getting at least close to that.

>> However, if the result set is simple (as in the example), then the
>> result FSM will have states that won't have transitions for every
>> letter, so I guess the average case will be a lot better.

> I don't believe this case should ever arise out of the current
> definition of the problem: labels are specified explicitly only for the
> excluded subset, and you have to accept everything else by default.

If the result set is {a^n, b^n | n \in N}, then you have an automaton
where exactly this happens. Since minimal automata are unique up to
renaming of states, that's the result you'll get.

>> > For improved asymptotic complexity in this case,
>> > you need to be able to at least reject in mid-sequence,

>> One can do that if there is no valid transition out of some state.

> One possibly can, but such states should never be encountered (see
> above.)

The automaton is:

S: a -> A, b -> B
A: a -> A
B: b -> B

All the states are final. A and B have just one transition, so
you'll be able to generate either a^n or b^n efficiently.
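A minimal sketch of that walk in Python (the dict representation and the function name are my own): the transition table is exactly the one above, all three states are final, and a depth-first walk yields each accepted word exactly once, never touching a word outside the result set.

```python
# Transition table from the post; all states are final.
DFA = {
    "S": {"a": "A", "b": "B"},
    "A": {"a": "A"},
    "B": {"b": "B"},
}
FINAL = {"S", "A", "B"}

def enumerate_words(max_len, state="S", prefix=""):
    """Yield every accepted word of length <= max_len, depth-first."""
    if state in FINAL:
        yield prefix
    if len(prefix) < max_len:
        for letter, target in DFA[state].items():
            yield from enumerate_words(max_len, target, prefix + letter)

print(sorted(enumerate_words(3)))
# ['', 'a', 'aa', 'aaa', 'b', 'bb', 'bbb']
```

Because the automaton is deterministic, no prefix is ever explored twice, and the A and B branches each produce only a^n or b^n as claimed.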

> Looking at the minimal DFA for the above example, it may be more
> appropriate to detect being in a state from which there's no path to a
> final state:

This cannot happen, because minimizing removes all those states.
(Or, in case you have a different definition of automaton in mind:
there'll be just one "stuck" state, where all transitions that can
never lead to a final state end up. Remove this state, and you'll end
up with the other definition.)
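The removal of such states can be sketched as follows (DFA representation and function names are my own, not from the post): compute the set of "live" states, i.e. those from which some final state is reachable, by a fixed-point iteration, then drop everything else, including the transitions into the stuck state.

```python
def live_states(dfa, finals):
    """States from which a final state is reachable (fixed-point iteration)."""
    live = set(finals)
    changed = True
    while changed:
        changed = False
        for state, trans in dfa.items():
            if state not in live and any(t in live for t in trans.values()):
                live.add(state)
                changed = True
    return live

def prune(dfa, finals):
    """Remove dead states and all transitions leading into them."""
    live = live_states(dfa, finals)
    return {s: {c: t for c, t in trans.items() if t in live}
            for s, trans in dfa.items() if s in live}

# Example with an explicit stuck state D:
dfa = {"S": {"a": "A", "b": "D"},
       "A": {"a": "A"},
       "D": {"a": "D", "b": "D"}}
print(prune(dfa, finals={"A"}))
# {'S': {'a': 'A'}, 'A': {'a': 'A'}}
```

After pruning, "no valid transition out of the current state" is exactly the mid-sequence rejection condition discussed above.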

>> I guess one could even optimize on that: In the minimal automaton,
>> every state has some transition sequence that will end up in a final
>> state. Annotate each state with the minimum number of steps for
>> such a sequence. While walking the automaton, keep the maximum
>> number of letters to produce in a variable. If this number is
>> smaller than the number in the annotation, stop and backtrack.

> I don't think this could cut efficiently for patterns like *a*b.

The point is not to "cut efficiently", the point is to enumerate
only those words that are actually in the result set. If you are
enumerating all words shorter than a given length, the above method
should guarantee this.

- Dirk




More information about the Python-list mailing list