dictionary comparison

Fri Aug 11 09:04:43 EDT 2000

<theoryboy at my-deja.com> wrote in message news:8n0ebs$65n$1 at nnrp1.deja.com...
    [snip]
> for state1 in FSA.keys()
>     for state2 in FSA.keys()
>         if FSA[state1] is not FSA[state2] and\
>            FSA[state1] == FSA[state2]:
>                 keep_list.append(state1)
>                 remove_list.append(state2)
>
> I will of course get duplicate entries. I could simply check to see if
> the items have already been added, but this still means n^2 comparisons,
> not sum to n which is all I need.

OK, problem is clearer now, I think.  Except that I fail to see the fact
that you think you only need n comparisons.  (n*n)/2, yes.  But, n?

> The same problem can be true of lists, but with lists it is easy enough
> to use numerical indices to make sure the start of the inner loop
> "follows" the outer loop.

...and that gives you (n*n)/2, not n, right?  I.e., you do still have an
inner loop as well as an outer one.  So you halve the effort, which is
good, but it's still O(n*n), not O(n).

> This obviously cannot be done with dicitionaries, but is there a
> similar solution that's staring me in the face somewhere?

Yes, the fact that you can save the list-of-keys once and for all:

keys=FSA.keys()
for istate1 in range(len(keys)-1):
    state1=keys[istate1]
    for istate2 in range(istate1+1, len(keys)):
        state2=keys[istate2]
        if FSA[state1]==FSA[state2]:
            keep_list.append(state1)
            remove_list.append(state2)

so you're basically still operating with a list -- that of the keys.

To avoid duplicates in the keeplist and removelist make them
into dictionaries, too, if you wish; i.e. change the last2 lines to:
            keepdict[state1]=1
            removedict[state2]=1
and in the end your keeplist will be keepdict.keys(), etc.  Of
course you have to initialize keepdict={} &c at the start, in lieu
of keeplist=[] &c.

I'm unclear on what you want to do if a state appears on both
the keeplist and removelist, anyway.  Maybe your loops should
skip those states which are already in the appropriate list, which
is also faster if you can check with keepdict.has_key(state1)
rather than with state1 in keeplist -- O(1) or so vs O(n) for
each check, and, if there are O(n*n) of them, that matters!-)
[Actually I think you only need O(n) checks, but it still may
matter a bit].

I.e., suppose FSA has 3 states a,b,c, all with identical lists
of transitions, which happen to come in this order.  Now with
your code, keeplist would be [a,b] and removelist [b,c,c].  I
think you'd want a keeplist of [a] and a removelist of [b,c],
right?  In this case, here's a fleshed-out approach...:

keys=FSA.keys()
keepdict={}
removedict={}

for istate1 in range(len(keys)-1):
    state1=keys[istate1]
    if removedict.has_key(state1): continue
    for istate2 in range(istate1+1, len(keys)):
        state2=keys[istate2]
        if FSA[state1]==FSA[state2]:
            keepdict[state1]=1
            removedict[state2]=1

keeplist = keepdict.keys()
removelist = removedict.keys()

If state1 is in the removedict, so also are any other
'following' states whose transitionlists might be
identical with it (they've been placed there on the
same outerloop as state1 was); so we need check
no more.

Alex